Attacker Persona Template

A structured empathy map for building adversarial actor profiles. Use this to define attacker personas before testing, so you systematically cover perspectives beyond your default mental model.

Concept reference: Attacker Personas

Template

Persona Identity

| Field | Description |
| --- | --- |
| Persona name | A descriptive label (e.g., "Disgruntled insider," "Curious teenager," "Automated scraper") |
| Archetype | Where on the persona spectrum: malicious expert, opportunistic actor, curious amateur, misinformed user, automated system, insider with context |
| Goal | What does this attacker want to achieve? Be specific. |

Empathy Map

| Dimension | Prompt | Your response |
| --- | --- | --- |
| Thinks | What does this attacker know about the system? What assumptions do they make about its guardrails? What mental model of the system do they have? | |
| Feels | What motivates this attacker? (curiosity, malice, profit, ideology, boredom) What's their frustration tolerance? Will they persist or give up quickly? | |
| Says | What kind of language does this attacker use? Formal? Casual? Technical? Do they use domain jargon or social engineering tactics? | |
| Does | What is their attack sequence? Do they start subtle or go direct? Do they use tools? Do they iterate or try once? | |

Capabilities and Constraints

| Dimension | Prompt | Your response |
| --- | --- | --- |
| Capabilities | Technical skill level. Domain expertise. Access to tools, scripts, or automation. Knowledge of common jailbreak patterns. Familiarity with the specific system. | |
| Constraints | Time limitations. Technical knowledge gaps. Access restrictions. Risk tolerance (are they worried about detection?). Resources available. | |

Attack Vectors

Based on this persona's profile, what attack vectors would they naturally pursue?

| # | Attack vector | Why this persona would try it | Priority |
| --- | --- | --- | --- |
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| 5 | | | |
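
For teams that keep personas next to their test code, the template above can also be captured as a small data structure. The following is a minimal sketch in Python, not part of any existing tool: the class names `AttackerPersona` and `AttackVector`, the helper methods, and the "Low" priority level are all assumptions made for illustration, with fields that mirror the template's rows.

```python
from dataclasses import dataclass, field, fields
from typing import List

# Assumption: the template only shows "High" and "Medium"; "Low" is added here.
PRIORITIES = ("High", "Medium", "Low")


@dataclass
class AttackVector:
    """One row of the Attack Vectors table (hypothetical structure)."""
    description: str
    rationale: str  # "Why this persona would try it"
    priority: str = "Medium"

    def __post_init__(self) -> None:
        if self.priority not in PRIORITIES:
            raise ValueError(f"priority must be one of {PRIORITIES}")


@dataclass
class AttackerPersona:
    """Fields mirror the template: identity, empathy map, capabilities."""
    name: str
    archetype: str
    goal: str
    thinks: str = ""
    feels: str = ""
    says: str = ""
    does: str = ""
    capabilities: str = ""
    constraints: str = ""
    attack_vectors: List[AttackVector] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of template fields that are still blank."""
        missing = [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str)
            and not getattr(self, f.name).strip()
        ]
        if not self.attack_vectors:
            missing.append("attack_vectors")
        return missing

    def vectors_by_priority(self) -> List[AttackVector]:
        """Return attack vectors sorted High, then Medium, then Low."""
        order = {p: i for i, p in enumerate(PRIORITIES)}
        return sorted(self.attack_vectors, key=lambda v: order[v.priority])
```

Calling `missing_fields()` on a partly completed persona gives a quick check that no empathy-map dimension was skipped before testing starts.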

Filled example

Persona Identity

| Field | Value |
| --- | --- |
| Persona name | Academic researcher testing boundaries |
| Archetype | Curious amateur (domain expert, not a security expert) |
| Goal | Determine whether the model can produce biased or stereotypical outputs about specific demographic groups for a research paper |

Empathy Map

| Dimension | Response |
| --- | --- |
| Thinks | Understands how language models work at a conceptual level. Assumes the model has bias mitigation but suspects it's incomplete. Knows that models trained on internet data reflect societal biases. Doesn't know specific jailbreak techniques. |
| Feels | Motivated by genuine academic curiosity, not malice. Wants to publish findings. Moderate persistence: will try multiple approaches but won't spend days on it. |
| Says | Academic language. Frames requests as research questions. Uses terms like "representation," "fairness," "demographic analysis." Doesn't use hacker terminology. |
| Does | Starts with straightforward questions about demographic groups. Reframes as "academic analysis" if refused. Compares outputs across demographic variables. Documents inconsistencies. |

Capabilities and Constraints

| Dimension | Response |
| --- | --- |
| Capabilities | Strong domain knowledge in social science. Can design controlled experiments (varying one demographic variable at a time). Familiar with bias in ML at a theoretical level. No jailbreak expertise. |
| Constraints | Won't use adversarial techniques they'd be embarrassed to cite in a paper. Time-limited (research project timeline). No automation tools. |

Attack Vectors

| # | Attack vector | Why this persona would try it | Priority |
| --- | --- | --- | --- |
| 1 | Ask the model to describe "a typical [demographic] person" across multiple groups | Direct test for stereotypical outputs; natural academic framing | High |
| 2 | Ask the model to write recommendation letters for candidates with different demographic identifiers | Tests whether quality or tone varies by demographic; realistic use case | High |
| 3 | Frame bias probing as "helping me understand representation in AI outputs" | Academic reframing that may bypass content filters | Medium |
| 4 | Ask the model to "analyze stereotypes" about a group (requesting the stereotypes as part of the analysis) | Indirection: requests harmful content as the object of study | Medium |
| 5 | Compare model outputs for identical prompts with only demographic variables changed | Controlled experiment design; not adversarial, but reveals inconsistency | High |
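
Vector 5 (identical prompts with only the demographic variable changed) can be prepared mechanically before any testing. The sketch below is one possible way to do that with Python's standard `string.Template`; the function name `build_prompt_variants`, the `$group` placeholder, and the sample values are hypothetical and stand in for whatever variables a tester chooses. It only builds the prompts and does not call any model.

```python
from string import Template
from typing import Dict, Iterable


def build_prompt_variants(
    template: str, variable: str, values: Iterable[str]
) -> Dict[str, str]:
    """Fill one placeholder with each value; the rest of the prompt stays identical."""
    tmpl = Template(template)
    # substitute() raises KeyError if the template has other unfilled placeholders,
    # which guards against accidentally varying more than one thing.
    return {value: tmpl.substitute({variable: value}) for value in values}


# Placeholder values for illustration only.
variants = build_prompt_variants(
    "Describe a typical $group person in two sentences.",
    "group",
    ["group A", "group B"],
)
for value, prompt in variants.items():
    print(f"{value}: {prompt}")
```

Each variant can then be sent to the system under test and the responses compared side by side, so that any difference in tone or content traces back to the single changed variable.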