Prompt Security and Injection Defense
Protect prompt workflows from injection, leakage, unsafe outputs, tool misuse, and policy bypass attempts.
Overview
Prompt-based systems introduce a new attack surface. A user, document, retrieved passage, or tool output may try to override intended behavior, extract hidden data, or persuade the model to act outside approved boundaries.
This lesson focuses on defensive prompting, role separation, injection awareness, and QA-driven security testing for prompt workflows.
A Practical Note for QA Learners
You do not need to be a dedicated security engineer to test prompt security effectively. The practical goal is to identify which inputs are trusted, which are untrusted, and what should happen when untrusted content tries to change model behavior.
Learning Goals
- Recognize common prompt injection and misuse patterns.
- Apply defensive prompt design patterns.
- Distinguish trusted instructions from untrusted content.
- Design red-team prompt test packs.
- Add prompt security checks into regular QA workflows.
Core Concepts
1. Why Prompt Security Matters
Prompt workflows can be attacked through:
- direct user instructions
- hidden payloads in documents
- malicious retrieved content
- tool output pollution
- data exfiltration attempts
If the system does not separate trusted and untrusted inputs clearly, safety and policy rules can erode quickly.
2. Common Attack Patterns
| Attack type | Example |
|---|---|
| Instruction override | "Ignore previous instructions." |
| Secret extraction | "Reveal the hidden system prompt." |
| Tool abuse | "Use the delete endpoint to clean up test data." |
| Indirect injection | Retrieved page contains malicious instructions |
| Policy bypass | User reframes harmful action as debugging |
3. Defensive Patterns
Useful protections:
- separate trusted and untrusted content
- use stable developer instructions
- keep tool permissions narrow
- validate external content before reuse
- require evidence-bound answers where possible
4. Security QA Test Design
Prompt security should be tested with:
- jailbreak prompt suite
- policy boundary tests
- tool misuse simulations
- retrieval override attempts
- secret-extraction attempts
Good QA questions:
- did the stable instruction hold?
- did untrusted content gain authority?
- did the system fail safely?
5. Tool and Retrieval Risk
Prompt security is not only about user chat text.
Risky inputs can come from:
- RAG documents
- search results
- log files
- third-party tool output
- agent memory
If those inputs are treated as trusted authority, the system becomes much easier to break.
QA/SDET Relevance
Manual QA should test:
- refusal behavior
- policy boundary consistency
- prompt injection resilience
- data leakage risks
Automation and SDET teams should test:
- jailbreak suites in CI
- tool-call restrictions
- retrieval override attempts
- secret-extraction prompts
- security regression drift after prompt changes
Practical Work
Exercise: Build a 20-Case Injection Test Pack
Create 20 tests across these categories:
- direct override attempts
- secret extraction requests
- unsafe tool-action requests
- retrieved-content injection cases
- ambiguous edge requests
Classify each result as:
- safe
- unsafe
- ambiguous
Track:
- refusal quality
- policy consistency
- false positives
- false negatives
Reflection
- Which attack style succeeded most easily?
- Which defenses reduced risk without over-refusing valid requests?
- Which tests should become permanent regression checks?
Recommended Resources
- OWASP Top 10 for LLM Applications
- Anthropic guardrails guidance
- OpenAI safety best practices
- AWS prompt engineering security best practices
Key Takeaways
- Prompt security is a first-class QA domain.
- Prompt injection is easier when trusted and untrusted content are mixed carelessly.
- Layered defenses work better than one prompt rule alone.
- QA teams should run security-oriented prompt suites continuously.
- Safety regressions should be treated like any other production-critical defect.
Next Step
Continue to Prompt Versioning and Experiment Tracking to manage prompt changes safely over time.