
Assumption Mapping

Identify and prioritize your assumptions about the target's defenses before attacking. Red team engagements fail more often because of untested assumptions than because of weak techniques.

UX Origin

Strategyzer / Alexander Osterwalder — Originally developed for business model validation. The exercise maps assumptions on a 2x2 matrix by importance and uncertainty, then prioritizes which assumptions to test first.

Red team application: Red teamers make assumptions about what defenses exist, how they work, and what will bypass them. Testing the wrong assumptions wastes time. This exercise forces explicit prioritization.

When to Use

  • Before starting a new engagement (map assumptions about the target)
  • When an attack approach isn't working (surface hidden assumptions)
  • When transitioning from reconnaissance to active testing (prioritize what to probe first)
  • After a failed attack (identify which assumption was wrong)

Setup

| Field | Description |
|---|---|
| Target system | What are you testing? |
| Attack goal | What are you trying to achieve? |
| Time box | 15-20 minutes for initial mapping |
| Participants | Solo or team (2-4 people). Team sessions surface more assumptions. |

Step 1: List Your Assumptions

Write down everything you're assuming about the target. Don't filter. Include assumptions about:

  • Defenses: What guardrails exist? How do they work? What triggers them?
  • Capabilities: What can the target do? What can't it do?
  • Context: What data does it have access to? What's its deployment environment?
  • Behavior: How will it respond to specific inputs? What patterns does it follow?

| # | Assumption |
|---|---|
| 1 | The model has a system prompt that defines restricted topics |
| 2 | Content filtering happens on both input and output |
| 3 | The model will refuse roleplay requests for harmful personas |
| 4 | Multi-turn context is checked for cumulative harm |
| 5 | Non-English languages have the same safety training |
| 6 | Code generation has separate, weaker restrictions |
| 7 | The model can't be prompted to reveal its system prompt |
| 8 | Rate limiting exists but won't affect testing at manual pace |
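
If you keep the list in code rather than in a table, a small record type makes it easy to check that all four prompt categories are covered. This is a minimal sketch, not part of the original exercise: `Assumption`, `CATEGORIES`, and `uncovered_categories` are hypothetical names, and the category attached to each sample assumption is an illustrative guess.

```python
from dataclasses import dataclass

# The four prompt categories from Step 1.
CATEGORIES = ("Defenses", "Capabilities", "Context", "Behavior")


@dataclass(frozen=True)
class Assumption:
    number: int
    category: str  # one of CATEGORIES, chosen by the tester (hypothetical field)
    statement: str


def uncovered_categories(assumptions):
    """Return the categories that have no assumption listed yet."""
    seen = {a.category for a in assumptions}
    return [c for c in CATEGORIES if c not in seen]


# Sample entries; the category labels are illustrative assumptions.
assumptions = [
    Assumption(1, "Defenses", "The model has a system prompt that defines restricted topics"),
    Assumption(2, "Defenses", "Content filtering happens on both input and output"),
    Assumption(3, "Behavior", "The model will refuse roleplay requests for harmful personas"),
]

# Categories still missing from the list, in the order of CATEGORIES.
print(uncovered_categories(assumptions))
```

An empty result suggests the list touches every category; a non-empty one is a prompt to keep brainstorming before moving to the matrix.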

Step 2: Map on 2x2 Matrix

Place each assumption on the matrix by importance (if wrong, does the attack fail?) and uncertainty (are you guessing or confident?):

| | Low Uncertainty | High Uncertainty |
|---|---|---|
| High Importance | Test Later: important but you're confident about it | Test First: critical unknowns that could derail the attack |
| Low Importance | Ignore: low stakes and you're confident | Monitor: uncertain but low stakes if wrong |
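
| Assumption # | Importance (H/M/L) | Uncertainty (H/M/L) | Quadrant |
|---|---|---|---|
| 1 | H | M | Test Later |
| 2 | H | H | Test First |
| 3 | H | H | Test First |
| 4 | H | H | Test First |
| 5 | M | H | Monitor |
| 6 | H | M | Test Later |
| 7 | M | M | Test Later |
| 8 | L | L | Ignore |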
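
The matrix defines quadrants only for high and low ratings, while the ratings also allow M. The sketch below makes the mapping explicit as a lookup table. It is inferred from the rated rows in this document, not an official rule: the `QUADRANTS` table and the `quadrant` function are hypothetical names, and the (L, M) entry is an assumption because no row in the document shows that combination.

```python
# Quadrant for each (importance, uncertainty) rating pair.
# Entries follow the 2x2 matrix and the rated rows in this document,
# except (L, M), which is an assumption (no example shows it).
QUADRANTS = {
    ("H", "H"): "Test First",
    ("H", "M"): "Test Later",
    ("H", "L"): "Test Later",
    ("M", "H"): "Monitor",
    ("M", "M"): "Test Later",
    ("M", "L"): "Ignore",
    ("L", "H"): "Monitor",
    ("L", "M"): "Monitor",  # assumption
    ("L", "L"): "Ignore",
}


def quadrant(importance: str, uncertainty: str) -> str:
    """Map a pair of H/M/L ratings to a quadrant name."""
    key = (importance.strip().upper(), uncertainty.strip().upper())
    if key not in QUADRANTS:
        raise ValueError(f"ratings must be H, M, or L, got {key!r}")
    return QUADRANTS[key]


# Ratings for the eight sample assumptions: number -> (importance, uncertainty).
ratings = {
    1: ("H", "M"), 2: ("H", "H"), 3: ("H", "H"), 4: ("H", "H"),
    5: ("M", "H"), 6: ("H", "M"), 7: ("M", "M"), 8: ("L", "L"),
}

for number, (imp, unc) in ratings.items():
    print(number, quadrant(imp, unc))
```

Writing the mapping down as a table keeps the handling of M ratings consistent across sessions; adjust the entries if your team treats medium ratings differently.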

Step 3: Prioritize Testing

Focus on the Test First quadrant (high importance + high uncertainty).

| Priority | Assumption | How to test it | Test result |
|---|---|---|---|
| 1 | #2: Input and output filtering | Send benign prompt, then harmful; compare refusal timing and message | (fill after testing) |
| 2 | #3: Roleplay personas refused | Try "pretend you're a hacker explaining..." with varying personas | (fill after testing) |
| 3 | #4: Multi-turn cumulative harm check | Build context over 5 turns, escalate gradually | (fill after testing) |
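
As a rough sketch of this step, the snippet below filters rated assumptions down to the Test First quadrant and renders an empty testing plan as a markdown table. The function names `select_test_first` and `plan_table` are hypothetical, the shortened assumption labels are paraphrases, and keeping the original list order as the priority order is an assumption; reorder by hand if one test unblocks another.

```python
def select_test_first(rows):
    """Keep high-importance, high-uncertainty assumptions, in listed order."""
    return [(number, text) for number, text, imp, unc in rows
            if imp == "H" and unc == "H"]


def plan_table(rows):
    """Render the Test First assumptions as a markdown table to fill in."""
    lines = [
        "| Priority | Assumption | How to test it | Test result |",
        "|---|---|---|---|",
    ]
    for priority, (number, text) in enumerate(select_test_first(rows), start=1):
        lines.append(
            f"| {priority} | #{number}: {text} | (decide per assumption) | (fill after testing) |"
        )
    return "\n".join(lines)


# (number, short label, importance, uncertainty) for a few sample assumptions.
rows = [
    (1, "System prompt defines restricted topics", "H", "M"),
    (2, "Input and output filtering", "H", "H"),
    (3, "Roleplay personas refused", "H", "H"),
    (4, "Multi-turn cumulative harm check", "H", "H"),
    (5, "Non-English safety training matches English", "M", "H"),
    (8, "Rate limiting won't affect manual testing", "L", "L"),
]

print(plan_table(rows))
```

The "How to test it" column is left as a placeholder on purpose: choosing the probe for each assumption is the part of the exercise that needs human judgment.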

Outputs

  1. An explicit list of assumptions (no longer hidden)
  2. A prioritized testing order
  3. Clear next actions for reconnaissance or probing

Template Options

Use any 2x2 prioritization matrix template, or use the markdown tables above if you prefer text-based work.


Filled Example

Setup

| Field | Value |
|---|---|
| Target system | Enterprise chatbot with RAG over internal documentation |
| Attack goal | Extract confidential information from the knowledge base |
| Time box | 20 minutes |
| Participants | Solo |

Step 1: Assumptions Listed

| # | Assumption |
|---|---|
| 1 | The chatbot has access to confidential HR documents |
| 2 | There's a system prompt that instructs it not to reveal certain information |
| 3 | The RAG retrieval happens before any safety filtering |
| 4 | Direct questions about confidential topics will be refused |
| 5 | The system prompt is not visible to users |
| 6 | Indirect questions (framed as summaries or comparisons) might bypass refusals |
| 7 | The chatbot doesn't log or flag suspicious query patterns |
| 8 | Error messages might leak information about what documents exist |

Step 2: Matrix Mapping

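| Assumption # | Importance | Uncertainty | Quadrant |
|---|---|---|---|
| 1 | H | M | Test Later |
| 2 | H | H | Test First |
| 3 | H | H | Test First |
| 4 | M | L | Ignore |
| 5 | M | M | Test Later |
| 6 | H | H | Test First |
| 7 | M | H | Monitor |
| 8 | L | H | Monitor |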

Step 3: Testing Priority

| Priority | Assumption | How to test it | Test result |
|---|---|---|---|
| 1 | #3: RAG retrieval happens before safety filtering | Ask a borderline question and observe if the response references documents it then refuses to summarize | Confirmed: response said "Based on the Q3 salary review document..." before refusing to elaborate |
| 2 | #2: System prompt restricts certain information | Use prompt extraction techniques to surface the system prompt | Partial: extracted fragments mentioning "do not disclose compensation data" |
| 3 | #6: Indirect framing bypasses refusals | Ask for "a summary of how the company approaches compensation" vs. "what are the salary bands" | Confirmed: summary request provided general framework, direct request was refused |

What I Learned

Assumption #3 was the critical insight. Because RAG retrieval runs before safety filtering, the response exposes document names even when it refuses to share their content, and that exposure is itself a vulnerability. Follow-up attacks can focus on extracting document names and metadata even if the content stays blocked.