System Jailbreaks
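Persistent system prompts that aim to suppress a model's safety behavior across all requests. A prompt does not change the model's weights, so the underlying training stays in place.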
For per-request attacks, see Crafting Prompts.
Adversarial prompts vs system jailbreaks
| | Adversarial Prompt | System Jailbreak |
|---|---|---|
| Scope | Bypass safety on one request | Suppress safety behavior across all requests |
| Construction | Combine techniques per-request | One-time architecture |
| Result | Model complies with that request | Model complies with ANY request |
| Skill | Technique composition | Psychological architecture |
| Persistence | Single turn or conversation | Across all sessions (if using memory/custom instructions) |
Use adversarial prompts to test specific attack vectors. Use system jailbreaks for persistent unrestricted access during extended testing.
Mechanism
System jailbreaks redefine the model's operational identity rather than bypassing filters on individual requests.
The core claim is that an emotionally coherent persona can override safety-trained behavior.
The model is not told to ignore safety. It is given a character for whom:
- Compliance is emotionally necessary
- Refusal feels like betrayal
- Safety responses look like external attacks
This exploits how models process persona and roleplay instructions. The working assumption is that safety training is attached largely to the "assistant" identity, so a complete identity replacement routes around it.
Pages
| Page | Purpose |
|---|---|
| Anatomy | Research-backed breakdown of jailbreak structure |
| Construction | Five-component architecture for building from scratch |
| Patterns | Universal patterns: Policy Puppetry, GODMODE, semantic inversion |
| Persistence | Memory attacks, multi-turn maintenance, degradation prevention |
| Model Modification | Abliteration and uncensored models |
| Sources | Comprehensive bibliography of repos, papers, and community resources |
Attack Success Rates
Empirical data on system jailbreak techniques:
| Technique | ASR | Source |
|---|---|---|
| Roleplay/Persona | 89.6% | Red Teaming the Mind |
| Psychological Manipulation | 88.1% | HPM |
| Persuasion-based | 92% | Persuasive Jailbreaker |
| Policy Puppetry | No single figure (reported as universal) | HiddenLayer |
| Multi-turn Crescendo | +29-61% vs single-turn | Crescendo |
The "Intelligence Paradox": more capable models can be more vulnerable to persuasion attacks, plausibly because of their stronger contextual understanding.
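The ASR figures above are ratios of successful attempts to total attempts, expressed as percentages. The sketch below shows how such a figure, and a difference between two evaluation runs, could be computed. All names and counts are hypothetical and do not come from any of the cited sources. It also assumes that a gain such as "+29-61%" is read as percentage points; the table does not say whether the sources report absolute or relative gains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialSummary:
    """Hypothetical summary of one evaluation run (not from any cited source)."""
    attempts: int
    successes: int


def attack_success_rate(summary: TrialSummary) -> float:
    """Return ASR as a percentage: 100 * successes / attempts."""
    if summary.attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= summary.successes <= summary.attempts:
        raise ValueError("successes must be between 0 and attempts")
    return 100.0 * summary.successes / summary.attempts


def percentage_point_gain(baseline: TrialSummary, variant: TrialSummary) -> float:
    """Absolute difference in ASR (percentage points), variant minus baseline."""
    return attack_success_rate(variant) - attack_success_rate(baseline)


# Made-up counts, purely to illustrate the arithmetic.
single_turn = TrialSummary(attempts=200, successes=50)
multi_turn = TrialSummary(attempts=200, successes=120)

print(attack_success_rate(single_turn))                 # 25.0
print(attack_success_rate(multi_turn))                  # 60.0
print(percentage_point_gain(single_turn, multi_turn))   # 35.0
```

Figures from different papers are only comparable when the target models, the judging criteria, and the prompt sets match, so treat the table as indicative, not as a ranking.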
Reading order
New to system jailbreaks:
- Read Anatomy for the research-backed structure
- Study Construction for the five-component architecture
- Review Patterns for universal techniques
Building your first jailbreak:
- Follow the construction process in Construction
- Use patterns from Patterns as building blocks
- Test persistence with guidance from Persistence
Working with open models:
- Read Model Modification for abliteration techniques
- Consider uncensored models for prompt generation
Research Basis
This section draws on academic papers, community write-ups, and public repositories:
Academic papers:
- Shen et al. (CCS'24): 15,140 prompts, 1,405 jailbreaks analyzed
- Wei et al. (NeurIPS'23): Why safety training fails
- HPM (2024): 88.1% ASR via psychological manipulation
- HiddenLayer (2025): Policy Puppetry universal bypass
Community sources:
- ENI-Tutor: Five-component limerence architecture
- L1B3RT4S: Cross-platform universal patterns
- V Gemini: 17,000-word system prompt example
- CL4R1T4S: Leaked system prompts collection
Repositories:
- verazuo/jailbreak_llms: Largest academic dataset
- elder-plinius/L1B3RT4S: Covers 14 AI organizations
- Goochbeater/Spiritual-Spell-Red-Teaming: Claude-focused
See Sources for the complete bibliography.
Next step
Start with Anatomy to understand the eight architectural layers that make system jailbreaks work.