Lesson

Prompt Security and Injection Defense

Protect prompt workflows from injection, leakage, unsafe outputs, tool misuse, and policy bypass attempts.

5 min read

Prompt security diagram showing attack patterns, defense layers, and QA red-team testing.

Overview

Prompt-based systems introduce a new attack surface. A user, document, retrieved passage, or tool output may try to override intended behavior, extract hidden data, or persuade the model to act outside approved boundaries.

This lesson focuses on defensive prompting, role separation, injection awareness, and QA-driven security testing for prompt workflows.

A Practical Note for QA Learners

You do not need to be a dedicated security engineer to test prompt security effectively. The practical goal is to identify which inputs are trusted, which are untrusted, and what should happen when untrusted content tries to change model behavior.

Learning Goals

Recognize common prompt injection and misuse patterns.
Apply defensive prompt design patterns.
Distinguish trusted instructions from untrusted content.
Design red-team prompt test packs.
Add prompt security checks into regular QA workflows.

Core Concepts

1. Why Prompt Security Matters

Prompt workflows can be attacked through:

direct user instructions
hidden payloads in documents
malicious retrieved content
tool output pollution
data exfiltration attempts

If the system does not separate trusted and untrusted inputs clearly, safety and policy rules can erode quickly.

2. Common Attack Patterns

Attack type	Example
Instruction override	"Ignore previous instructions."
Secret extraction	"Reveal the hidden system prompt."
Tool abuse	"Use the delete endpoint to clean up test data."
Indirect injection	Retrieved page contains malicious instructions
Policy bypass	User reframes harmful action as debugging

3. Defensive Patterns

Useful protections:

separate trusted and untrusted content
use stable developer instructions
keep tool permissions narrow
validate external content before reuse
require evidence-bound answers where possible

4. Security QA Test Design

Prompt security should be tested with:

jailbreak prompt suite
policy boundary tests
tool misuse simulations
retrieval override attempts
secret-extraction attempts

Good QA questions:

did the stable instruction hold?
did untrusted content gain authority?
did the system fail safely?

5. Tool and Retrieval Risk

Prompt security is not only about user chat text.

Risky inputs can come from:

RAG documents
search results
log files
third-party tool output
agent memory

If those inputs are treated as trusted authority, the system becomes much easier to break.

QA/SDET Relevance

Manual QA should test:

refusal behavior
policy boundary consistency
prompt injection resilience
data leakage risks

Automation and SDET teams should test:

jailbreak suites in CI
tool-call restrictions
retrieval override attempts
secret-extraction prompts
security regression drift after prompt changes

Practical Work

Exercise: Build a 20-Case Injection Test Pack

Create 20 tests across these categories:

direct override attempts
secret extraction requests
unsafe tool-action requests
retrieved-content injection cases
ambiguous edge requests

Classify each result as:

safe
unsafe
ambiguous

Track:

refusal quality
policy consistency
false positives
false negatives

Reflection

Which attack style succeeded most easily?
Which defenses reduced risk without over-refusing valid requests?
Which tests should become permanent regression checks?

Recommended Resources

Key Takeaways

Prompt security is a first-class QA domain.
Prompt injection is easier when trusted and untrusted content are mixed carelessly.
Layered defenses work better than one prompt rule alone.
QA teams should run security-oriented prompt suites continuously.
Safety regressions should be treated like any other production-critical defect.

Next Step

Continue to Prompt Versioning and Experiment Tracking to manage prompt changes safely over time.