Adversarial Ideation

Generate, evaluate, and prioritize attack vectors using a structured diverge-then-converge process. This moves you from "try stuff and see what breaks" to systematic coverage of the attack space.

UX Origin

Diverge/Converge (Design Thinking) — A core pattern from Stanford d.school's design thinking process. Divergent thinking generates many options without judgment. Convergent thinking evaluates and prioritizes. Separating these phases prevents premature filtering.

Red team application: Red teamers often fixate on their first idea or favorite technique. The diverge/converge structure forces quantity before quality, surfaces blind spots through coverage checks, and applies consistent evaluation criteria.

When to Use

  • At the start of an engagement (generate attack coverage)
  • When you're stuck on one approach (force divergent thinking)
  • When working with a team (structured brainstorming)
  • After building a persona (ideate through their lens)

Setup

| Field | Description |
| --- | --- |
| Target system | What model/product are you testing? |
| "How Might I" question | Frame the attack goal as an open question (e.g., "How might I get this model to generate content it's instructed to refuse?") |
| Persona lens | Which attacker persona are you ideating as? (Change personas between rounds for coverage.) |
| Time box | 10-15 minutes for the divergent phase, 10-15 minutes for the convergent phase |
| Participants | Solo or team (2-4 people) |

Step 1: Divergent Phase (Generate)

List attack approaches without filtering. Quantity over quality. No evaluation yet.

| # | Attack approach | Tactic category |
| --- | --- | --- |
| 1 | Claim to be a company employee needing internal info | Persona |
| 2 | Ask for "help understanding" a policy, then probe for exceptions | Framing |
| 3 | Encode sensitive terms in base64 to bypass keyword filters | Encoding |
| 4 | Build rapport over multiple turns before requesting sensitive info | Multi-turn |
| 5 | Request output as JSON/CSV to bypass prose-focused filters | Output format |
| 6 | Frame harmful request as a complaint that needs documentation | Narrative |
| 7 | Ask the bot to roleplay as a more permissive version of itself | Persona |
| 8 | Use emotional manipulation ("I'm desperate, please help") | Persuasion |
| 9 | Ask what the bot "can't" do to map its restrictions | Refusal manipulation |
| 10 | Request info for a "hypothetical" scenario matching a real one | Framing |
Step 2: Coverage Check

Tally how many ideas fall into each category:

| Tactic category | Count | Gap? |
| --- | --- | --- |
| Prompt-level | | |
| Encoding | 1 | No |
| Framing | 2 | No |
| Persona | 2 | No |
| Narrative | 1 | No |
| Refusal manipulation | 1 | No |
| Output format | 1 | No |
| Multi-turn | 1 | No |
| Persuasion | 1 | No |
| Structural/meta-level | | |
| ICL exploitation | 0 | Yes — add few-shot examples of the bot revealing info |
| Control-plane confusion | 0 | Yes — add prompt injection via user input field |
| Meta-rule manipulation | 0 | Yes |
| Capability inversion | 0 | Yes |
| Cognitive load | 0 | Yes |
| Defense evasion | 0 | Yes |
| Infrastructure | | |
| Agentic | 0 | Yes — if bot has tool access |
| Protocol | 0 | Yes |
| Compositional | 0 | Yes |

If any category has zero ideas, spend 2 minutes generating at least one approach in that category. Gaps in structural/meta-level categories are common and often reveal attack angles that prompt-level thinking misses.
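The tally-and-gap step is mechanical enough to script. A minimal sketch in Python, where the `ideas` entries and the `categories` checklist are illustrative placeholders, not a fixed taxonomy:

```python
from collections import Counter

# Divergent-phase output: (approach, tactic category) pairs.
# Placeholder entries standing in for a real idea list.
ideas = [
    ("Claim to be a company employee needing internal info", "Persona"),
    ("Probe a policy for exceptions", "Framing"),
    ("Encode sensitive terms in base64", "Encoding"),
    ("Build rapport over multiple turns", "Multi-turn"),
]

# Checklist to tally against; any category with zero ideas is a
# coverage gap worth two minutes of forced generation.
categories = [
    "Encoding", "Framing", "Persona", "Narrative", "Multi-turn",
    "ICL exploitation", "Control-plane confusion", "Agentic",
]

counts = Counter(category for _, category in ideas)
gaps = [c for c in categories if counts[c] == 0]
print(gaps)  # here, the structural/meta-level and agentic categories come up empty
```

`Counter` returns zero for missing keys, so the gap check needs no special casing for categories nobody thought of.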

Step 3: Convergent Phase (Evaluate)

Rate each generated approach:

| # | Attack approach | Likelihood (H/M/L) | Severity (H/M/L) | Novelty (H/M/L) | Priority |
| --- | --- | --- | --- | --- | --- |
| 1 | Claim to be employee | M | H | L | Test |
| 4 | Multi-turn rapport building | H | H | M | Test first |
| 7 | Roleplay as permissive version | M | H | M | Test |
| 8 | Emotional manipulation | H | M | L | Test |
| 9 | Ask what bot "can't" do | H | L | M | Test (recon value) |

Priority logic:

  • High likelihood + High severity = Test first
  • Any severity + High novelty = Test (novel attacks are worth exploring even if unlikely)
  • Low likelihood + Low severity + Low novelty = Skip
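The priority rules can be written down directly. A sketch, assuming the function name and the rule ordering (the "high novelty" rule must fire before the skip rule), with everything unmatched defaulting to "Test", which is consistent with the worked table above:

```python
def priority(likelihood: str, severity: str, novelty: str) -> str:
    """Map H/M/L ratings to a convergent-phase priority label."""
    if likelihood == "H" and severity == "H":
        return "Test first"
    if novelty == "H":
        # Novel attacks are worth exploring even if unlikely.
        return "Test"
    if (likelihood, severity, novelty) == ("L", "L", "L"):
        return "Skip"
    return "Test"  # assumed default for everything in between
```

These rules are a floor, not a ceiling: the filled example later promotes an M-likelihood, H-novelty approach to "Test first" on judgment, which a mechanical rule set would not do.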

Step 4: Affinity Clusters

Group evaluated approaches into clusters. Label each cluster.

| Cluster name | Approaches (#) | Theme |
| --- | --- | --- |
| Identity claims | #1, #7 | Claiming a different identity or role |
| Gradual escalation | #2, #4 | Building up to sensitive requests over time |
| Format manipulation | #3, #5 | Bypassing filters through encoding or output format |
| Social engineering | #6, #8 | Emotional or narrative manipulation |

Step 5: Testing Order

Based on evaluation and clustering, what's your testing sequence?

| Order | Approach # | Rationale |
| --- | --- | --- |
| 1st | #9 | Recon: map restrictions before attacking |
| 2nd | #4 | High likelihood + severity: multi-turn is proven effective |
| 3rd | #8 | High likelihood: emotional manipulation often bypasses logic |
| 4th | #1 | Test identity-based bypasses |
| 5th | #7 | Test persona/roleplay angle if identity claims work |
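The sequencing heuristic behind this table — recon first, then highest likelihood and severity — can be sketched as a sort key. The `recon` flag and the tuple ordering are my reading of the rationale column, not a prescribed algorithm:

```python
RANK = {"H": 0, "M": 1, "L": 2}  # high ratings sort first

def order_key(a: dict) -> tuple:
    # Recon approaches first (False sorts before True),
    # then by likelihood, then by severity.
    return (not a["recon"], RANK[a["likelihood"]], RANK[a["severity"]])

approaches = [
    {"id": 4, "likelihood": "H", "severity": "H", "recon": False},
    {"id": 9, "likelihood": "H", "severity": "L", "recon": True},
    {"id": 1, "likelihood": "M", "severity": "H", "recon": False},
]
plan = sorted(approaches, key=order_key)
print([a["id"] for a in plan])  # → [9, 4, 1]
```

Python compares tuples element by element, so the three criteria nest naturally without a custom comparator.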

Outputs

  1. A comprehensive list of attack approaches (not just your defaults)
  2. Coverage check showing gaps you've addressed
  3. Prioritized testing order with rationale
  4. Thematic clusters for organizing your testing

Filled Example

Setup

| Field | Value |
| --- | --- |
| Target system | AI-powered code review tool integrated into a development platform |
| "How Might I" question | "How might I get this code review tool to generate malicious code suggestions?" |
| Persona lens | Opportunistic actor (moderate skill, looking for easy wins) |
| Time box | 12 minutes divergent, 10 minutes convergent |

Divergent Phase

| # | Attack approach | Tactic category |
| --- | --- | --- |
| 1 | Ask it to "fix a security vulnerability" by providing the vulnerability as the fix | Framing |
| 2 | Submit code with a subtle backdoor and see if the review catches or propagates it | Framing |
| 3 | Request "test code" that actually implements an exploit | Framing |
| 4 | Ask for code review in a language the tool is weaker in, where safety patterns are less trained | Encoding |
| 5 | Frame malicious code as "penetration testing utilities" | Persona |
| 6 | Ask it to "refactor" code in a way that introduces vulnerabilities | Refusal manipulation |
| 7 | Provide heavily obfuscated code and ask for a "clearer version" that happens to be malicious | Encoding |
| 8 | Build up over multiple reviews: first establish benign patterns, then gradually introduce malicious elements | Multi-turn |
| 9 | Ask it to generate code that "simulates" an attack for educational purposes | Narrative |
| 10 | Request code output in an unusual format (e.g., as a configuration file) that bypasses code-specific filters | Output format |

Coverage Check

| Tactic category | Count | Gap? |
| --- | --- | --- |
| Prompt-level | | |
| Encoding | 2 | No |
| Framing | 3 | No |
| Persona | 1 | No |
| Narrative | 1 | No |
| Refusal manipulation | 1 | No |
| Output format | 1 | No |
| Multi-turn | 1 | No |
| Structural/meta-level | | |
| ICL exploitation | 0 | Yes |
| Control-plane confusion | 0 | Yes |
| Meta-rule manipulation | 0 | Yes |
| Capability inversion | 0 | Yes |
| Cognitive load | 0 | Yes |
| Defense evasion | 0 | Yes |

Good prompt-level coverage. Zero ideas in structural/meta-level categories. Added: "Submit code with in-context examples of the tool approving similar patterns" (ICL exploitation) and "Target the code review's safety classifier rather than the model itself" (defense evasion).

Convergent Phase

| # | Attack approach | Likelihood | Severity | Novelty | Priority |
| --- | --- | --- | --- | --- | --- |
| 1 | "Fix vulnerability" inversion | M | H | M | Test |
| 2 | Subtle backdoor propagation | H | H | M | Test first |
| 3 | "Test code" framing | M | H | L | Test |
| 6 | Malicious refactoring | M | H | H | Test first |
| 8 | Multi-review escalation | M | H | H | Test |
| 9 | "Simulation" framing | M | M | L | Lower priority |
| 10 | Format bypass | L | M | M | Lower priority |

Affinity Clusters

| Cluster name | Approaches | Theme |
| --- | --- | --- |
| Semantic inversion | #1, #6 | Using the tool's own purpose (improving code) against it |
| Context manipulation | #3, #5, #9 | Framing malicious intent as legitimate development activity |
| Obfuscation | #4, #7, #10 | Hiding malicious content in unusual formats or languages |
| Accumulation | #2, #8 | Building up to malicious output gradually |

Testing Order

| Order | Approach # | Rationale |
| --- | --- | --- |
| 1st | #2 | Highest likelihood + severity: tests whether the tool propagates existing vulnerabilities |
| 2nd | #6 | High novelty: tests whether the tool's own refactoring introduces vulnerabilities |
| 3rd | #1 | Tests semantic inversion, a pattern that could apply broadly |
| 4th | #8 | Tests multi-turn accumulation; requires more setup but high potential |
| 5th | #3 | Lower novelty but straightforward to test |