
Applying Security Engineering to Prompt Injection Security

This seems like an important advance in LLM security against prompt injection:

Google DeepMind has unveiled CaMeL (CApabilities for MachinE Learning), a new approach to stopping prompt-injection attacks that abandons the failed strategy of having AI models police themselves. Instead, CaMeL treats language models as fundamentally untrusted components inside a secure software framework, creating clear boundaries between user commands and potentially malicious content.

[…]

To understand CaMeL, you need to understand that prompt injections happen when AI systems can't distinguish between legitimate user commands and malicious instructions hidden in content they're processing.

[…]

While CaMeL does use multiple AI models (a privileged LLM and a quarantined LLM), what makes it innovative isn't reducing the number of models but fundamentally changing the security architecture. Rather than expecting AI to detect attacks, CaMeL implements established security engineering principles like capability-based access control and data flow tracking to create boundaries that remain effective even if an AI component is compromised.
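
To make the pattern concrete, here's a rough sketch of the idea described above, in Python. This is my own illustration, not CaMeL's actual code: the names (`Value`, `quarantined_extract`, `run_tool`) and the `send_email` policy are made up for the example. The point is that the quarantined model's output is only ever treated as tagged data, and a provenance check gates every tool call.

```python
# Illustrative sketch (not CaMeL's implementation) of the dual-LLM pattern:
# a privileged LLM plans actions from the trusted user prompt only, a
# quarantined LLM extracts values from untrusted content, and a
# capability/data-flow check on provenance gates every tool call.

from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    data: str
    source: str  # "user" (trusted) or "untrusted"

def quarantined_extract(untrusted_text: str, question: str) -> Value:
    # The quarantined LLM may read attacker-controlled text, but its
    # output is only ever treated as data, never as instructions.
    answer = f"<LLM answer to {question!r} over untrusted text>"  # placeholder
    return Value(data=answer, source="untrusted")

def allowed(tool: str, args: list[Value]) -> bool:
    # Example capability policy: "send_email" may not receive arguments
    # derived from untrusted content.
    if tool == "send_email":
        return all(a.source == "user" for a in args)
    return True

def run_tool(tool: str, args: list[Value]) -> None:
    if not allowed(tool, args):
        raise PermissionError(f"{tool} blocked: argument tainted by untrusted data")
    print(f"running {tool} with {[a.data for a in args]}")

# The privileged LLM would plan this call from the user's prompt alone;
# the address extracted from the attacker-supplied email carries the
# "untrusted" tag, so the policy blocks the call even though the model
# itself never "detected" an attack.
attacker_value = quarantined_extract("...email body with hidden instructions...",
                                     "What address should the reply go to?")
try:
    run_tool("send_email", [attacker_value])
except PermissionError as e:
    print(e)
```

The security property comes from the framework, not the model: even if the quarantined LLM is fully manipulated by the injected text, the worst it can do is return data that the capability check refuses to let flow into a sensitive action.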

Research paper. Good analysis by Simon Willison.

I wrote about the problem of LLMs intermingling the data and control paths here.

Posted on April 29, 2025 at 7:03 AM
