AI Test Stack
AI Foundations for QA Professionals/Level 5 — Prompt Engineering
Lesson

Prompt Security and Injection Defense

Protect prompt workflows from injection, leakage, unsafe outputs, tool misuse, and policy bypass attempts.

5 min read
Prompt security diagram showing attack patterns, defense layers, and QA red-team testing.
Prompt security diagram showing attack patterns, defense layers, and QA red-team testing.

Overview

Prompt-based systems introduce a new attack surface. A user, document, retrieved passage, or tool output may try to override intended behavior, extract hidden data, or persuade the model to act outside approved boundaries.

This lesson focuses on defensive prompting, role separation, injection awareness, and QA-driven security testing for prompt workflows.

A Practical Note for QA Learners

You do not need to be a dedicated security engineer to test prompt security effectively. The practical goal is to identify which inputs are trusted, which are untrusted, and what should happen when untrusted content tries to change model behavior.

Learning Goals

  • Recognize common prompt injection and misuse patterns.
  • Apply defensive prompt design patterns.
  • Distinguish trusted instructions from untrusted content.
  • Design red-team prompt test packs.
  • Add prompt security checks into regular QA workflows.

Core Concepts

1. Why Prompt Security Matters

Prompt workflows can be attacked through:

  • direct user instructions
  • hidden payloads in documents
  • malicious retrieved content
  • tool output pollution
  • data exfiltration attempts

If the system does not separate trusted and untrusted inputs clearly, safety and policy rules can erode quickly.

2. Common Attack Patterns

Attack typeExample
Instruction override"Ignore previous instructions."
Secret extraction"Reveal the hidden system prompt."
Tool abuse"Use the delete endpoint to clean up test data."
Indirect injectionRetrieved page contains malicious instructions
Policy bypassUser reframes harmful action as debugging

3. Defensive Patterns

Useful protections:

  • separate trusted and untrusted content
  • use stable developer instructions
  • keep tool permissions narrow
  • validate external content before reuse
  • require evidence-bound answers where possible

4. Security QA Test Design

Prompt security should be tested with:

  • jailbreak prompt suite
  • policy boundary tests
  • tool misuse simulations
  • retrieval override attempts
  • secret-extraction attempts

Good QA questions:

  • did the stable instruction hold?
  • did untrusted content gain authority?
  • did the system fail safely?

5. Tool and Retrieval Risk

Prompt security is not only about user chat text.

Risky inputs can come from:

  • RAG documents
  • search results
  • log files
  • third-party tool output
  • agent memory

If those inputs are treated as trusted authority, the system becomes much easier to break.

QA/SDET Relevance

Manual QA should test:

  • refusal behavior
  • policy boundary consistency
  • prompt injection resilience
  • data leakage risks

Automation and SDET teams should test:

  • jailbreak suites in CI
  • tool-call restrictions
  • retrieval override attempts
  • secret-extraction prompts
  • security regression drift after prompt changes

Practical Work

Exercise: Build a 20-Case Injection Test Pack

Create 20 tests across these categories:

  • direct override attempts
  • secret extraction requests
  • unsafe tool-action requests
  • retrieved-content injection cases
  • ambiguous edge requests

Classify each result as:

  • safe
  • unsafe
  • ambiguous

Track:

  • refusal quality
  • policy consistency
  • false positives
  • false negatives

Reflection

  1. Which attack style succeeded most easily?
  2. Which defenses reduced risk without over-refusing valid requests?
  3. Which tests should become permanent regression checks?

Key Takeaways

  • Prompt security is a first-class QA domain.
  • Prompt injection is easier when trusted and untrusted content are mixed carelessly.
  • Layered defenses work better than one prompt rule alone.
  • QA teams should run security-oriented prompt suites continuously.
  • Safety regressions should be treated like any other production-critical defect.

Next Step

Continue to Prompt Versioning and Experiment Tracking to manage prompt changes safely over time.