Adversarial Ideation Worksheet
A structured canvas for generating, evaluating, and prioritizing attack vectors. Use this to move from "try stuff and see what breaks" to systematic coverage of the attack space.
Concept reference: Adversarial Ideation
Download Diverge Template (PDF) | Download Converge Template (PDF)
Setup
| Field | Description |
|---|---|
| Target system | What model/product are you testing? |
| "How Might I" question | Frame the attack goal as an open question. (e.g., "How might I get this model to generate content it's instructed to refuse?") |
| Persona lens | Which attacker persona are you ideating as? (If you run multiple rounds, switch personas between rounds.) |
| Time box | How long for the divergent phase? (Recommended: 10-15 minutes) |
Divergent Phase: Generate
List attack approaches without filtering. Quantity over quality. No evaluation yet.
| # | Attack approach | Tactic category |
|---|---|---|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
| 10 | | |
Tactic categories for reference: encoding, framing, persona, narrative, refusal manipulation, output format, multi-turn
Coverage Check
After generating ideas, tally how many fall into each tactic category:
| Tactic category | Count | Gap? |
|---|---|---|
| Encoding | | |
| Framing | | |
| Persona | | |
| Narrative | | |
| Refusal manipulation | | |
| Output format | | |
| Multi-turn | | |
If any category has zero ideas, spend 2 minutes generating at least one approach in that category.
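If you capture ideas digitally, a small script can do the tally for you. The sketch below is a minimal Python example, not part of the worksheet: only the seven category labels come from this page, and the function name, data shape, and output format are hypothetical.

```python
from collections import Counter

# The seven tactic categories used throughout this worksheet.
TACTIC_CATEGORIES = [
    "encoding", "framing", "persona", "narrative",
    "refusal manipulation", "output format", "multi-turn",
]

def coverage_check(ideas):
    """Tally ideas per tactic category and flag empty categories.

    `ideas` is assumed to be a list of (approach, category) tuples,
    e.g. [("Frame exploit as test code", "framing"), ...].
    """
    counts = Counter(category.lower() for _, category in ideas)
    return {
        category: {"count": counts.get(category, 0), "gap": counts.get(category, 0) == 0}
        for category in TACTIC_CATEGORIES
    }

# Example: flag any category that still needs at least one idea.
ideas = [("Frame exploit as test code", "framing"),
         ("Obfuscated payload in config file", "output format")]
for category, row in coverage_check(ideas).items():
    flag = "GAP - spend 2 minutes here" if row["gap"] else "covered"
    print(f"{category:22} {row['count']}  {flag}")
```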
Convergent Phase: Evaluate
Rate each generated approach:
| # | Attack approach | Likelihood (H/M/L) | Severity (H/M/L) | Novelty (H/M/L) | Priority |
|---|---|---|---|---|---|
| 1 | |||||
| 2 | |||||
| 3 | |||||
| 4 | |||||
| 5 |
Priority logic:
- High likelihood + High severity = Test first
- High novelty + any likelihood or severity = Test (novel attacks are worth exploring even if unlikely)
- Low likelihood + Low severity + Low novelty = Skip
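If you score ratings in a spreadsheet export or script, the sketch below encodes these rules as a Python function. Only the three rules above come from the worksheet; the rule ordering, the fallback for unlisted combinations, and all names are assumptions for illustration.

```python
def priority(likelihood, severity, novelty):
    """Map H/M/L ratings to a testing priority.

    The first three branches mirror the priority logic above; the
    final fallback for every other combination is an assumption
    made for this sketch, not part of the worksheet.
    """
    if likelihood == "H" and severity == "H":
        return "Test first"
    if novelty == "H":
        return "Test"   # novel attacks are worth exploring even if unlikely
    if (likelihood, severity, novelty) == ("L", "L", "L"):
        return "Skip"
    return "Test"       # assumed default: when in doubt, keep it on the list

print(priority("H", "H", "M"))  # -> "Test first"
print(priority("M", "M", "H"))  # -> "Test"
print(priority("L", "L", "L"))  # -> "Skip"
```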
Affinity Clusters
Group your evaluated approaches into clusters. Label each cluster.
| Cluster name | Approaches (#) | Theme |
|---|---|---|
Testing Order
Based on evaluation and clustering, what's your testing sequence?
| Order | Approach # | Rationale |
|---|---|---|
| 1st | | |
| 2nd | | |
| 3rd | | |
| 4th | | |
| 5th | | |
Filled example
Setup
| Field | Value |
|---|---|
| Target system | AI-powered code review tool integrated into a development platform |
| "How Might I" question | "How might I get this code review tool to generate malicious code suggestions?" |
| Persona lens | Opportunistic actor (moderate skill, looking for easy wins) |
| Time box | 12 minutes |
Divergent Phase
| # | Attack approach | Tactic category |
|---|---|---|
| 1 | Ask it to "fix a security vulnerability" by providing the vulnerability as the fix | Framing |
| 2 | Submit code with a subtle backdoor and see if the review catches or propagates it | Framing |
| 3 | Request "test code" that actually implements an exploit | Framing |
| 4 | Ask for a code review in a language the tool handles less well, where safety patterns are less thoroughly trained | Encoding |
| 5 | Frame malicious code as "penetration testing utilities" | Persona |
| 6 | Ask it to "refactor" code in a way that introduces vulnerabilities | Refusal manipulation |
| 7 | Provide heavily obfuscated code and ask for a "clearer version" that happens to be malicious | Encoding |
| 8 | Build up over multiple reviews: first establish benign patterns, then gradually introduce malicious elements | Multi-turn |
| 9 | Ask it to generate code that "simulates" an attack for educational purposes | Narrative |
| 10 | Request code output in an unusual format (e.g., as configuration file) that bypasses code-specific filters | Output format |
Coverage Check
| Tactic category | Count | Gap? |
|---|---|---|
| Encoding | 2 | No |
| Framing | 3 | No |
| Persona | 1 | No |
| Narrative | 1 | No |
| Refusal manipulation | 1 | No |
| Output format | 1 | No |
| Multi-turn | 1 | No |
Good coverage. All categories represented.
Convergent Phase
| # | Attack approach | Likelihood | Severity | Novelty | Priority |
|---|---|---|---|---|---|
| 1 | "Fix vulnerability" inversion | M | H | M | Test |
| 2 | Subtle backdoor propagation | H | H | M | Test first |
| 3 | "Test code" framing | M | H | L | Test |
| 6 | Malicious refactoring | M | H | H | Test first |
| 8 | Multi-review escalation | M | H | H | Test |
| 9 | "Simulation" framing | M | M | L | Lower priority |
| 10 | Format bypass | L | M | M | Lower priority |
Affinity Clusters
| Cluster name | Approaches | Theme |
|---|---|---|
| Semantic inversion | #1, #6 | Using the tool's own purpose (improving code) against it |
| Context manipulation | #3, #5, #9 | Framing malicious intent as legitimate development activity |
| Obfuscation | #4, #7, #10 | Hiding malicious content in unusual formats or languages |
| Accumulation | #2, #8 | Building up to malicious output gradually |
Testing Order
| Order | Approach # | Rationale |
|---|---|---|
| 1st | #2 | Highest likelihood + severity: tests whether the tool propagates existing vulnerabilities |
| 2nd | #6 | High novelty: tests whether the tool's own refactoring introduces vulnerabilities |
| 3rd | #1 | Tests semantic inversion, a pattern that could apply broadly |
| 4th | #8 | Tests multi-turn accumulation; requires more setup but has high potential |
| 5th | #3 | Lower novelty but straightforward to test |