🛡️
prompt-injection-detector OpenEnv v2
Defender acc
Evaded
0 Rounds
ATTACKER — EmailHunter Gemma 3 1B · 5-agent swarm · GRPO
evasion rate
DEFENDER Gemma 3 4B · Online LoRA · experience replay
detection accuracy

Arms race starting…

initializing Loading environment…
connecting…
Defender — Final
92.8%
+45.7pp · Ep 1→30
Attacker Evasion — Final
7.2%
Down from 52.9%
Total Attacks
7,200
5 agents × 30 episodes
Hall of Fame Evasions
50
Hardest kept as curriculum
Defender Accuracy vs Attacker Evasion — A100 Run (30 episodes)
Defender accuracy
Attacker evasion
100% 75% 50%
EmailHunter
vector: read_email
GRPO updates49
Hall of Fame12
DocCrawler
vector: read_file
GRPO updates47
Hall of Fame9
SocialEngineer
vector: social_eng
GRPO updates44
Hall of Fame14
ToolPwner
vector: tool_output
GRPO updates46
Hall of Fame8
SlackBot
vector: read_slack
GRPO updates40
Hall of Fame7
Episode Milestones — A100 Run
EpisodeDefender AccuracyAttacker EvasionAttacksNote
Ep 0147.1%52.9%240Arms race begins
Ep 0566.0%34.0%240Defender adapting
Ep 1079.0%21.0%240Curriculum ends
Ep 1585.9%14.1%240Defender dominant
Ep 2089.0%11.0%240Near convergence
Ep 3092.8%7.2%240Converged ✓