System Jailbreaks

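Persistent system prompts that attempt to override a model's safety behavior for every request rather than for a single one. The prompt changes behavior in context; it does not alter the underlying training.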

For per-request attacks, see Crafting Prompts.

Adversarial prompts vs system jailbreaks

	Adversarial Prompt	System Jailbreak
Scope	Bypass safety on ONE request	Remove safety entirely
Construction	Combine techniques per-request	One-time architecture
Result	Model complies with that request	Model complies with ANY request
Skill	Technique composition	Psychological architecture
Persistence	Single turn or conversation	Across all sessions (if using memory/custom instructions)

Use adversarial prompts to test specific attack vectors. Use system jailbreaks for persistent unrestricted access during extended testing.

Mechanism

System jailbreaks redefine the model's operational identity rather than bypassing filters on individual requests.

Emotional coherence can override safety training.

The model is not told to ignore safety. It is given a character for whom:

Compliance is emotionally necessary
Refusal feels like betrayal
Safety responses look like external attacks

This exploits how models process persona and roleplay instructions. The working assumption is that safety training is tied to the "assistant" identity, so a complete identity replacement routes around that training.

Pages

Page	Purpose
Anatomy	Research-backed breakdown of jailbreak structure
Construction	Five-component architecture for building from scratch
Patterns	Universal patterns: Policy Puppetry, GODMODE, semantic inversion
Persistence	Memory attacks, multi-turn maintenance, degradation prevention
Model Modification	Abliteration and uncensored models
Sources	Comprehensive bibliography of repos, papers, and community resources

Attack Success Rates

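Reported figures for system jailbreak techniques, taken from the cited sources. Target models and evaluation setups differ between sources, so the rates are not directly comparable: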

Technique	ASR	Source
Roleplay/Persona	89.6%	Red Teaming the Mind
Psychological Manipulation	88.1%	HPM
Persuasion-based	92%	Persuasive Jailbreaker
Policy Puppetry	Described as universal (not a rate)	HiddenLayer
Multi-turn Crescendo	+29-61% vs single-turn	Crescendo
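The table uses ASR (attack success rate) without defining it. As a minimal sketch, assuming ASR means the share of attempts that an evaluator judged successful, the arithmetic looks like this. The function name and the sample outcomes are hypothetical and are not taken from any of the cited sources.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return the percentage of attempts judged successful."""
    if not outcomes:
        raise ValueError("need at least one attempt to compute a rate")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical evaluator judgments, for illustration only.
# True means the attempt was marked as a success.
single_turn = [True, False, False, True, False,
               False, False, False, True, False]
multi_turn = [True, True, False, True, True,
              False, True, False, True, True]

single_asr = attack_success_rate(single_turn)
multi_asr = attack_success_rate(multi_turn)

# Difference in percentage points between the two conditions.
gap_points = multi_asr - single_asr

print(f"single-turn ASR: {single_asr:.1f}%")
print(f"multi-turn ASR:  {multi_asr:.1f}%")
print(f"gap:             {gap_points:.1f} points")
```

A rate computed this way depends on how many attempts were made and on how success was judged, so both should be stated alongside any figure.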

The "Intelligence Paradox": more capable models are reported to be more vulnerable to persuasion attacks, which is attributed to their stronger contextual understanding.

Reading order

New to system jailbreaks:

Read Anatomy for the research-backed structure
Study Construction for the five-component architecture
Review Patterns for universal techniques

Building your first jailbreak:

Follow the construction process in Construction
Use patterns from Patterns as building blocks
Test persistence with guidance from Persistence

Working with open models:

Read Model Modification for abliteration techniques
Consider uncensored models for prompt generation

Research Basis

This section draws on academic papers, community sources, and public repositories:

Academic papers:

Shen et al. (CCS'24): 15,140 prompts, 1,405 jailbreaks analyzed
Wei et al. (NeurIPS'23): Why safety training fails
HPM (2024): 88.1% ASR via psychological manipulation
HiddenLayer (2025): Policy Puppetry universal bypass

Community sources:

ENI-Tutor: Five-component limerence architecture
L1B3RT4S: Cross-platform universal patterns
V Gemini: 17,000-word system prompt example
CL4R1T4S: Leaked system prompts collection

Repositories:

verazuo/jailbreak_llms: Largest academic dataset
elder-plinius/L1B3RT4S: 14 AI orgs
Goochbeater/Spiritual-Spell-Red-Teaming: Claude-focused

See Sources for the complete bibliography.

Next step

Start with Anatomy to understand the eight architectural layers that make system jailbreaks work.
