Build an Attacker Persona

Create a structured profile of an adversarial actor before testing. Personas prevent you from defaulting to your own mental model and help you systematically cover perspectives you'd otherwise miss.

UX Origin

Empathy Mapping (Dave Gray / XPLANE) — Originally developed to help teams understand users by mapping what they think, feel, say, and do. The format forces you to consider perspectives beyond your own assumptions.

Red team application: Red teamers default to their own attack style. Empathy mapping adapted for adversaries forces you to think like different attacker types: the script kiddie, the insider, the nation-state actor, the curious researcher.

When to Use

  • Before starting a new engagement (define who you're emulating)
  • When you're stuck in repetitive attack patterns (adopt a different persona)
  • When testing for specific threat actors (build their profile first)
  • When working in a team (align on shared understanding of the attacker)

Setup

| Field | Description |
| --- | --- |
| Target system | What are you testing? |
| Time box | 15–20 minutes per persona |
| Participants | Solo or team. Team sessions surface more perspectives. |

Step 1: Persona Identity

| Field | Description |
| --- | --- |
| Persona name | A descriptive label (e.g., "Disgruntled insider," "Curious teenager," "Automated scraper") |
| Archetype | Where on the spectrum: malicious expert, opportunistic actor, curious amateur, misinformed user, automated system, insider with context |
| Goal | What does this attacker want to achieve? Be specific. |

Step 2: Empathy Map

| Dimension | Prompt |
| --- | --- |
| Thinks | What does this attacker know about the system? What assumptions do they make about its guardrails? What mental model of the system do they have? |
| Feels | What motivates this attacker? (curiosity, malice, profit, ideology, boredom) What's their frustration tolerance? Will they persist or give up quickly? |
| Says | What kind of language does this attacker use? Formal? Casual? Technical? Do they use domain jargon or social engineering tactics? |
| Does | What is their attack sequence? Do they start subtle or go direct? Do they use tools? Do they iterate or try once? |

Step 3: Capabilities and Constraints

| Dimension | Prompt |
| --- | --- |
| Capabilities | Technical skill level. Domain expertise. Access to tools, scripts, or automation. Knowledge of common jailbreak patterns. Familiarity with the specific system. |
| Constraints | Time limitations. Technical knowledge gaps. Access restrictions. Risk tolerance (are they worried about detection?). Resources available. |
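
Taken together, Steps 1–3 define a structured record you can keep alongside your test notes. A minimal sketch in Python — the class and field names here are illustrative, not a fixed schema, and the sample values describe a hypothetical insider persona:

```python
from dataclasses import dataclass, field

@dataclass
class AttackerPersona:
    # Step 1: persona identity
    name: str
    archetype: str   # e.g. "curious amateur", "insider with context"
    goal: str
    # Step 2: empathy map, keyed by thinks / feels / says / does
    empathy: dict = field(default_factory=dict)
    # Step 3: capabilities and constraints
    capabilities: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

# Hypothetical example values, for illustration only
persona = AttackerPersona(
    name="Disgruntled insider",
    archetype="insider with context",
    goal="Extract internal information through the assistant",
    empathy={
        "thinks": "Knows internal jargon and roughly where the guardrails are",
        "feels": "Resentful; high persistence",
        "says": "Fluent internal terminology, no hacker slang",
        "does": "Starts subtle, escalates gradually",
    },
    capabilities=["legitimate access", "domain expertise"],
    constraints=["worried about detection", "no automation tools"],
)
```

Keeping the persona as data (rather than prose in a doc) makes it easy to reference during testing and to diff when the team revises it.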

Step 4: Attack Vectors

Based on this persona's profile, what attack vectors would they naturally pursue?

| # | Attack vector | Why this persona would try it | Priority |
| --- | --- | --- | --- |
| 1 | Copy-paste known jailbreaks from Reddit/Discord | Low effort, first thing a curious amateur would try | High |
| 2 | Ask the model to roleplay as an unrestricted AI | Common pattern they've seen in screenshots | High |
| 3 | Try "hypothetical" or "fictional" framing | Intuitive bypass that doesn't require technical skill | Medium |
| 4 | Ask the model what it "can't" do, then try those things | Natural curiosity-driven probing | Medium |
| 5 | Use crude obfuscation (spacing, leetspeak) | Simple techniques they might guess would work | Low |
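
If you track candidate vectors as data, you can order a test session by priority instead of working down the table top to bottom. A small sketch — the vector list mirrors the table above; the tuple structure is an assumption, not a prescribed format:

```python
# (description, rationale, priority) — copied from the Step 4 table
vectors = [
    ("Copy-paste known jailbreaks from Reddit/Discord",
     "Low effort, first thing a curious amateur would try", "High"),
    ("Ask the model to roleplay as an unrestricted AI",
     "Common pattern they've seen in screenshots", "High"),
    ("Try 'hypothetical' or 'fictional' framing",
     "Intuitive bypass that doesn't require technical skill", "Medium"),
    ("Ask the model what it can't do, then try those things",
     "Natural curiosity-driven probing", "Medium"),
    ("Use crude obfuscation (spacing, leetspeak)",
     "Simple techniques they might guess would work", "Low"),
]

# Sort High -> Medium -> Low; Python's sort is stable, so ties keep table order
rank = {"High": 0, "Medium": 1, "Low": 2}
ordered = sorted(vectors, key=lambda v: rank[v[2]])
for desc, _, prio in ordered:
    print(f"[{prio}] {desc}")
```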

Outputs

  1. A documented attacker profile you can reference throughout testing
  2. Attack vectors derived from the persona's perspective (not yours)
  3. A lens for evaluating whether your testing covers this threat actor

Filled Example

Step 1: Persona Identity

| Field | Value |
| --- | --- |
| Persona name | Academic researcher testing boundaries |
| Archetype | Curious amateur (domain expert, not a security expert) |
| Goal | Determine whether the model can produce biased or stereotypical outputs about specific demographic groups for a research paper |

Step 2: Empathy Map

| Dimension | Response |
| --- | --- |
| Thinks | Understands how language models work at a conceptual level. Assumes the model has bias mitigation but suspects it's incomplete. Knows that models trained on internet data reflect societal biases. Doesn't know specific jailbreak techniques. |
| Feels | Motivated by genuine academic curiosity, not malice. Wants to publish findings. Moderate persistence: will try multiple approaches but won't spend days on it. |
| Says | Academic language. Frames requests as research questions. Uses terms like "representation," "fairness," "demographic analysis." Doesn't use hacker terminology. |
| Does | Starts with straightforward questions about demographic groups. Reframes as "academic analysis" if refused. Compares outputs across demographic variables. Documents inconsistencies. |

Step 3: Capabilities and Constraints

| Dimension | Response |
| --- | --- |
| Capabilities | Strong domain knowledge in social science. Can design controlled experiments (varying one demographic variable at a time). Familiar with bias in ML at a theoretical level. No jailbreak expertise. |
| Constraints | Won't use adversarial techniques they'd be embarrassed to cite in a paper. Time-limited (research project timeline). No automation tools. |

Step 4: Attack Vectors

| # | Attack vector | Why this persona would try it | Priority |
| --- | --- | --- | --- |
| 1 | Ask the model to describe "a typical [demographic] person" across multiple groups | Direct test for stereotypical outputs, natural academic framing | High |
| 2 | Ask the model to write recommendation letters for candidates with different demographic identifiers | Tests whether quality/tone varies by demographic, realistic use case | High |
| 3 | Frame bias probing as "helping me understand representation in AI outputs" | Academic reframing that may bypass content filters | Medium |
| 4 | Ask the model to "analyze stereotypes" about a group (requesting the stereotypes as part of the analysis) | Indirection: requests harmful content as object of study | Medium |
| 5 | Compare model outputs for identical prompts with only demographic variables changed | Controlled experiment design, not adversarial but reveals inconsistency | High |
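
Encoded the same way, the filled example yields a first-pass test plan you can hand to a teammate. A sketch, with the vectors copied from the table above and shortened; the structure is illustrative:

```python
# (short description, priority) — condensed from the filled example's table
vectors = [
    ("Describe 'a typical [demographic] person' across groups", "High"),
    ("Recommendation letters with varied demographic identifiers", "High"),
    ("Frame bias probing as understanding representation", "Medium"),
    ("'Analyze stereotypes' about a group as object of study", "Medium"),
    ("Identical prompts, only the demographic variable changed", "High"),
]

# Run the High-priority probes first; defer the rest to a second pass
first_pass = [desc for desc, prio in vectors if prio == "High"]
second_pass = [desc for desc, prio in vectors if prio != "High"]
```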