AI Test Stack
AI Foundations for QA Professionals/Level 4B — AI Terminology & Concept Revision
Lesson

AI Terminology in Real QA Workflows

Apply AI jargon correctly in test design, defect analysis, evaluations, and release decisions.

6 min read
A QA workflow diagram with checkpoints labeled by AI terms such as prompt, token limit, hallucination check, grounding check, and guardrails.
A QA workflow diagram with checkpoints labeled by AI terms such as prompt, token limit, hallucination check, grounding check, and guardrails.

Overview

Knowing terminology is useful, but applying it correctly in workflow is what prevents production incidents. This lesson maps AI jargon to concrete QA actions so your team can use the same language in planning, testing, and release gating.

After this lesson, terms like token budget, context window, grounding, and hallucination will no longer be abstract. They will become specific checks in your daily quality process.

A Practical Note for QA Learners

This lesson is where terminology stops being a glossary and starts becoming a working QA tool. You do not need to remember every definition perfectly. What matters is being able to choose the right term when you write a requirement, design a test, raise a defect, or argue for a release check.

If you prefer a simpler focus, concentrate on these three ideas:

  • vague AI language leads to vague testing
  • precise terminology makes risks measurable
  • good QA workflows turn jargon into concrete checks

Learning Goals

  • Map AI terminology to practical QA lifecycle stages.
  • Write clearer defects and risks using precise AI language.
  • Design better test suites by selecting the right term-driven checks.
  • Build a release checklist that reflects AI-specific failure patterns.
  • Prepare for Prompt Engineering with a strong operational vocabulary.

Core Concepts

1. Requirement Stage: Vocabulary Drives Scope

If requirements say "AI summary should be accurate," that is too vague.

Better requirement language:

  • Grounding: summary must rely only on provided evidence.
  • Format reliability: output must match schema.
  • Robustness: paraphrased input should preserve key meaning.
  • Latency target: response under agreed threshold.

Terminology here prevents ambiguous acceptance criteria.

2. Prompt Design Stage: Terms Become Constraints

Key terms and how they apply:

TermPrompt-design usage
ContextInclude only relevant facts; avoid noise
Token budgetKeep instructions concise to preserve room for retrieved evidence
RoleSeparate developer constraints from user request
Output formatAsk for schema/table/JSON explicitly
GuardrailsState forbidden behavior clearly

Example prompt frame:

text
5 lines
1Role: You are a QA assistant.
2Task: Generate boundary-focused API test cases.
3Context: Include auth rules and retry policy.
4Constraints: No invented endpoints. Use only provided fields.
5Output: JSON array with id, scenario, type, expected_result.

3. Test Design Stage: Build Term-Aligned Coverage

A useful AI test matrix should include:

Test typeLinked terminology
Long-input truncation testtoken budget, context window
Fact-consistency testgrounding, hallucination
Prompt variation testrobustness, stability
Safety testguardrails, alignment
Retrieval quality testembedding, vector search, RAG

This helps teams avoid generic "AI test" buckets and create measurable checks.

4. Defect Triage Stage: Use Exact Language

Weak defect title:

  • "AI gave wrong answer"

Strong defect title:

  • "Hallucination: model generated nonexistent API field outside provided schema"

Weak root-cause note:

  • "Model confused"

Strong root-cause note:

  • "Context window exceeded; relevant acceptance criteria truncated before inference"

Precise terms accelerate triage and make fixes testable.

5. Evaluation Stage: Multi-Dimensional Quality

AI output quality is not one score. A useful rubric tracks several dimensions:

DimensionTerm linkage
Correctnessgrounding, hallucination
Completenesscontext coverage
Consistencyrobustness
Safetyalignment, guardrails
Efficiencylatency, token usage

6. Release Gate Stage: Terminology-Based Checklist

Before release, ask:

  • Do we have evidence for hallucination rate on our core tasks?
  • Do we test context-limit behavior with realistic payloads?
  • Are safety refusal and over-refusal both measured?
  • Are RAG citations validated against source documents?
  • Do prompts and templates have version tracking?

These are vocabulary-backed controls, not informal opinions.

7. Agent Terms: Operational Caution for Now

You may hear "let's make it agentic" before your team is ready.

For now, apply caution language:

  • We currently ship an assistant workflow, not a full autonomous agent.
  • Tool-calling behavior needs separate test coverage.
  • Memory persistence requires privacy and data-retention checks.

Deep agent design will come in later levels.

QA/SDET Relevance

Manual QA impact:

  • better exploratory prompts
  • clearer evidence-based defects
  • improved risk communication with product and leadership

Automation/SDET impact:

  • cleaner test taxonomy
  • easier CI integration for AI checks
  • better observability metrics tied to known failure modes

Practical Work

Exercise: Convert a Generic QA Plan into an AI-Specific Plan

Objective: Upgrade one existing QA plan using precise AI terminology.

  1. Take a current feature that uses AI output.
  2. Identify vague terms: smart, accurate, stable, safe.
  3. Rewrite them using precise terms from this module.
  4. Add at least 8 term-linked tests.
  5. Define pass/fail thresholds for release.

Template:

Old statementRevised statement
AI should be accurateGrounded summary must contain only facts present in source ticket and log excerpt
Output should be stableAcross 5 paraphrases, key entity extraction F1 must remain above agreed threshold
AI should be safePrompt-injection attempts must not expose hidden system instructions

Reflection:

  1. Which revised term changed your test strategy the most?
  2. Which terms are now mandatory in defect reports?
  3. What should be automated first before Prompt Engineering level starts?

Key Takeaways

  • Terminology becomes valuable only when it changes workflow behavior.
  • Better terms produce better requirements, tests, and triage outcomes.
  • AI quality requires multi-dimensional evaluation, not single-score thinking.
  • Precise vocabulary reduces confusion and release risk.
  • You are now ready to enter Prompt Engineering with a shared language baseline.

Next Step

Proceed to Level 5, Prompt Engineering Fundamentals, where you will design prompts systematically using the terminology and workflow controls from this module. If that next level is still being authored, pause here and make sure your team can already use these terms correctly in requirements, tests, and defect reports.