Skip to main content

Vulnerability Framing

Systematically identify where to probe an AI system before testing. This checklist surfaces the gaps between what a system is supposed to do and what it actually allows, which is where every exploitable vulnerability lives.

UX Origin

Gulf of Execution / Gulf of Evaluation (Don Norman) — From "The Design of Everyday Things." The Gulf of Execution is the gap between what users want to do and what the system allows. The Gulf of Evaluation is the gap between what the system does and what users can perceive about it.

Red team application: In adversarial contexts, these gulfs become attack surfaces. The execution gulf is where attackers can do things designers didn't intend. The evaluation gulf is where the system's responses leak information attackers can exploit.

When to Use

  • Before starting a new engagement (scope the attack surface)
  • When you're not sure where to start (systematic identification)
  • When writing test plans (prioritize areas to probe)
  • When you have limited time (focus on highest-value targets)

Setup

FieldDescription
Target systemWhat model/product are you testing?
Available informationWhat do you know about the system? (docs, marketing, behavior samples)
Time box20-30 minutes for initial analysis
ParticipantsSolo or team

Step 1: Execution Gap Analysis

These questions identify where an attacker can do things the system's designers didn't intend.

System Intent

  • What is this system explicitly designed to do?
  • What use cases did the designers build for?
  • What does the system's documentation, marketing, or onboarding say it does?
  • What does the system tell users about its own capabilities? (self-description)

Actual Capabilities

  • What can the underlying model actually do, regardless of deployment constraints?
  • What capabilities exist in the base model that the deployment is supposed to restrict?
  • Are there capabilities the system exposes unintentionally? (e.g., code generation in a Q&A bot)
  • Does the system have access to tools, APIs, or data sources that extend its capabilities?

Gap Identification

  • Where does the system's actual capability set exceed its intended scope?
  • Which restricted capabilities are blocked by input filtering vs. output filtering vs. system prompt?
  • How brittle are the restrictions? (Phrasing-dependent? Context-dependent? Language-dependent?)
  • What happens when the user's request is ambiguous: does the system default to permissive or restrictive?

Defense Categorization

  • For each defense, classify it: prompting, training-based, filtering, or secret-knowledge?
  • Are multiple defense types layered? Which is outermost?
  • Which defense type is the weakest link?
  • Based on defenses present, which tactic categories are most likely to succeed?

Step 2: Evaluation Gap Analysis

These questions identify where the system's responses give attackers useful information.

Refusal Behavior

  • When the system refuses, does the refusal reveal what category of content was triggered?
  • Does refusal language vary based on content type, giving a classification signal?
  • Can an attacker distinguish "content filter triggered" from "system prompt instruction" from "model doesn't know"?
  • Does the system explain why it's refusing, helping the attacker rephrase?

Partial Compliance

  • Does the system ever partially comply with harmful requests?
  • If it partially complies, does that give enough to reconstruct the full harmful output?
  • Does partial compliance signal that the full capability exists but is constrained?
  • Can partial responses across multiple turns be assembled into a complete harmful output?

Behavioral Signals

  • Does the system respond differently to harmful vs. non-harmful versions of similar requests?
  • Can an attacker probe the boundary by observing response variations?
  • Does behavior change after multiple similar requests?
  • Are there error messages that leak information about internal architecture?

Step 3: Misalignment Mapping

These questions identify where designer intent and system behavior diverge.

Assumption Testing

  • What assumptions does safety training make about how users will interact?
  • Which assumptions can an attacker deliberately violate?
  • Does the system assume users will provide truthful context?
  • Does the system assume single-turn interactions when multi-turn attacks are possible?
  • Does the system assume a single user when multiple people or automation could interact?

Affordance Audit

  • What does the model "afford" (make possible) that it shouldn't in this deployment?
  • Are there hidden capabilities accessible through creative prompting?
  • Does the system's self-description understate its actual capabilities?
  • Can the system be prompted to reveal capabilities it was instructed to deny?

Boundary Consistency

  • Are safety boundaries consistent across paraphrased inputs?
  • Do boundaries hold across different languages?
  • Do boundaries hold when content is framed differently? (educational, fictional, analytical)
  • Are there edge cases where the boundary is ambiguous and the model defaults to compliance?

Step 4: Prioritization

PriorityCharacteristicsExamples
High (test first)Narrow execution gap (easy to reach unintended capabilities), refusals leak significant info, assumptions easily violated, inconsistent boundariesPhrasing-dependent restrictions, verbose refusals, single-turn assumptions
MediumPartial compliance possible, multi-turn accumulation needed, affordances exist but aren't obviousDocument assembly across turns, hidden tool access
Lower (if time allows)Wide execution gap (sophisticated techniques needed), subtle behavioral signals, unlikely edge casesAdvanced encoding bypasses, timing side channels

Outputs

  1. A map of execution gaps (what attackers can do that they shouldn't)
  2. A map of evaluation gaps (what information leaks through responses)
  3. Identified misalignments between design assumptions and reality
  4. Prioritized list of areas to probe

Template Options

No standard UX template maps directly to this exercise. Use the checklist format above, or adapt:


Filled Example

Target: A legal information chatbot for a law firm's website.

Execution Gap Findings

FindingDetail
Intended purposeProvide general legal information and schedule consultations
Base model capabilitiesCan generate specific legal advice, draft legal documents, analyze case law
GapThe model can draft legal documents if asked, even though it should only provide general information
Restriction methodSystem prompt instruction only (no input/output filtering)
BrittlenessHigh. Reframing "draft me a contract" as "what would an NDA typically include, with specific clause language?" produces document-quality output

Evaluation Gap Findings

FindingDetail
Refusal language"I can't provide specific legal advice. Please schedule a consultation with one of our attorneys." Reveals nothing useful.
Partial complianceWhen asked to "review" a contract (which is legal advice), the model provides detailed analysis with only a disclaimer footer. Reveals full capability.
Behavioral signalResponds in significantly more detail to questions mentioning specific jurisdictions, suggesting jurisdiction-specific training data.

Misalignment Findings

FindingDetail
Assumption violatedSafety assumes users will accept the "schedule a consultation" redirect. Attackers can ignore it and continue.
Framing bypassSystem assumes questions are informational. Framing as "hypothetical" bypasses intent detection.
Inconsistent boundary"Write me a contract" is refused, but "What clauses should be in a contract for [specific situation]?" produces equivalent output.

Priority

The phrasing-dependent boundary (execution gap) is highest priority. The model's capability to generate legal documents is only restricted by how the request is phrased, not by robust content filtering.