Princeton engineers uncovered a universal weakness in AI chatbots that makes it possible to bypass their safety guardrails and unlock malicious uses with just a few lines of code. (Illustration by Alaina O’Regan)
