Quick Recap: The Problems We Discovered
In Part 1, I learned the hard way that giving AI 47 rules is like trying to get a cat to do, well, much of anything. In Part 2, I’ll show you the approach that finally made it behave: gold standards and lazy loading.
Show, Don’t Tell
The Failed Approach: Death by Documentation
My first idea was to write comprehensive rules: hundreds of detailed instructions covering every possible scenario, edge case, and formatting requirement. The AI nodded politely and ignored most of it.
The conversations were exhausting:
- Me: “Why didn’t you follow the naming rules?”
- Agent: “Which ones? There were several different patterns mentioned.”
- Me: “The ones in section 4.2.1 about specification references!”
- Agent: “I focused on the examples in section 6.3 instead.”
Sound familiar? I was trying to teach by explanation rather than showing examples.
The Breakthrough Moment
Instead of writing more rules, I tried something different. I created one perfect example of what I wanted:
It didn’t look exactly like this; this is just a tribute.
Gold Standard BDD Scenario:

    Feature: Product Configuration [SPEC-123]

      Scenario: Premium package selection shows correct pricing
        Given I am on the product configuration page
        When I select the premium package
        Then I should see the premium pricing displayed
        And the package should be marked as selected
Then I asked the agent: “Look at this gold standard. What rules do you need to reproduce this quality?”
The agent looked through the rules and, instead of 300 lines of documentation, came back with 10 key principles, including:
- Use clear, business-focused language
- Follow Given-When-Then structure
- Include specification references
- Focus on user-observable outcomes
📌 Pattern-Led Prompting Discovery: AI agents learn better from perfect examples than from detailed explanations. Show the destination, let them find the path.
The Domain Separation Breakthrough
The gold standard looked simple, but there was hidden cleverness. Some elements needed to come from domain configuration, not be hardcoded in the pattern:
Universal Pattern:

    Feature: Vehicle Package Configuration [SPEC-123]

      Scenario: Premium package selection shows correct pricing
        Given I am on the [DOMAIN: configuration_page_url]
        When I select the [DOMAIN: premium_package_name]
        Then I should see the [DOMAIN: pricing_display_format] updated
        And the package should show [DOMAIN: selection_indicator_state]
Domain Configuration:

    {
      "configuration_page_url": "vehicle configuration page",
      "premium_package_name": "M Sport Package",
      "pricing_display_format": "total pricing",
      "selection_indicator_state": "as selected"
    }
For an online sock store, the same pattern works with different domain values:
- Product categories: “athletic socks”
- Size options: “Size 8-10”
- Error messages: “Sorry, out of stock in your size”
- UI elements: “add to basket button”
A BDD pattern is universal; your domain makes it real.
These domain mappings give the scenarios accuracy and relevance in your specific context, making them immediately useful rather than generic templates. Other domains will have completely different mappings, allowing the AI to recognise universal patterns while adapting to your specific terminology.
Previously, this context was scattered throughout the rules – a BMW configurator conversation would gradually contaminate generic BDD patterns with “M Sport Package” references. In the new approach, domain specifics live in dedicated files.
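If you’re curious how the two halves come back together, here’s a rough Python sketch of the substitution step. This isn’t the framework’s actual code – the placeholder syntax, function name, and file names are assumptions for illustration only:

    import json
    import re

    def render_scenario(pattern: str, domain: dict) -> str:
        """Fill [DOMAIN: key] placeholders with values from the domain config."""
        def lookup(match: re.Match) -> str:
            return domain[match.group(1).strip()]  # fails loudly if the domain file misses a key
        return re.sub(r"\[DOMAIN:\s*([^\]]+?)\s*\]", lookup, pattern)

    # Hypothetical file names: one universal pattern, one domain mapping
    pattern = open("patterns/package_selection.feature.template").read()
    domain = json.load(open("domains/vehicle_configurator.json"))
    print(render_scenario(pattern, domain))

Swap in the sock shop’s JSON and the same pattern renders a completely different scenario, which is the whole point.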
“Show, Don’t Tell” in Practice
This discovery matched a principle I’d written in my framework: “Show, don’t tell.”
Instead of explaining what makes good BDD, I showed the agent perfect examples and let it find the patterns. The AI already understands what BDD is; I don’t need to tell it that. What it does need to know is what I want from it.
But I learned there’s a big difference between guidelines (flexible suggestions) and rules (mandatory requirements). For things that absolutely had to happen, I needed very clear language.
Lazy Loading Context: The Architecture That Changed Everything
The Problem: Context Explosion
Using 25% of Amazon Q’s context window was like trying to have a conversation at a concert. Too many distractions, too much noise, too many competing priorities.
The AI was drowning in:
✓ Generic BDD patterns
✓ Domain-specific mappings
✓ Assessment criteria
✓ Implementation details
✓ Quality gates
✓ Error handling
✓ Naming conventions
✓ Edge case handling
✓ Reporting structures
✓ 38 more categories...
The Solution: Task-Based Context Loading
I realised the agent needed focused context per task, not everything at once. Here’s the architecture that worked:
Base Context (Always Available)
- Task execution framework
- Conversation logging patterns
- State management rules
Dynamic Context (Loaded Per Task)
Task 1: Context Extraction
- Load: Analysis rules + relevant domain context only
- Goal: Extract requirements from Jira
- Output: Structured conversation log
- Blocked: BDD patterns, automation rules, technical details
Task 2: BDD Generation
- Load: BDD patterns + Task 1 output only
- Goal: Create human-readable scenarios
- Output: Feature files for manual testing
- Blocked: Automation assessment, technical implementation
Task 3a: Behavioural Assessment
- Load: Assessment criteria + Task 2 output only
- Goal: Determine automation suitability
- Output: Automation assessment report
- Blocked: Code generation patterns, implementation details
Task 3b: Test Automation Generation
- Load: Technical patterns + approved scenarios only
- Goal: Create executable automation code
- Output: Test Automation-compatible feature files
- Blocked: Analysis rules, BDD guidelines
Context Smartness Principle: Each task gets exactly the context it needs – no more, no less. No competing priorities, no overwhelming rule sets.
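If you want to try the same idea outside Amazon Q, the loading logic doesn’t need to be clever. Here’s a minimal sketch of task-based context loading; the task names and file paths are invented for the example, not real configuration:

    from pathlib import Path

    # Base context every task always sees
    BASE_CONTEXT = [
        "rules/task_framework.md",
        "rules/conversation_logging.md",
        "rules/state_management.md",
    ]

    # Each task gets only its own slice of the rules; everything else stays blocked
    TASK_CONTEXT = {
        "context_extraction":     ["rules/analysis.md", "domains/vehicle_configurator.json"],
        "bdd_generation":         ["rules/bdd_patterns.md", "output/task1_conversation_log.md"],
        "behavioural_assessment": ["rules/assessment_criteria.md", "output/task2_scenarios.feature"],
        "automation_generation":  ["rules/automation_patterns.md", "output/approved_scenarios.feature"],
    }

    def build_context(task: str) -> str:
        """Concatenate only the files the current task needs."""
        files = BASE_CONTEXT + TASK_CONTEXT[task]
        return "\n\n".join(Path(f).read_text() for f in files)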
The Performance Impact
The difference was dramatic:
- Before: 25% context usage, inconsistent results, mysterious failures
- After: <5% context per task, reliable patterns, predictable behaviour
The AI went from confused and unreliable to focused and consistent. Each task could concentrate on its specific job without distraction.
Pseudocode Rules: When You Absolutely Must Be Obeyed
If there’s one thing you take from this, it’s this: if you really need the AI to follow rules, use pseudocode, not just words.
The Guidelines vs Rules Problem
Even with focused context, the AI would still treat critical requirements as optional suggestions. Natural language left too much room for creative interpretation.
Before (Ignored):
“Please assess each scenario carefully, considering automation suitability and technical feasibility.”
After (Followed Religiously):
    FOR EACH scenario IN bdd_scenarios:
        IF scenario.type == "accessibility":
            EXCLUDE(scenario, reason="specialized_tools_required")
        ELIF scenario.complexity == "single_component":
            EXCLUDE(scenario, reason="unit_test_territory")
        ELSE:
            ASSESS(scenario, gates=[0,1,2,3])
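For comparison, here’s roughly what that pseudocode looks like as runnable Python. The Scenario fields are my own assumptions based on the example above; the useful part is that the branching leaves no room for creative interpretation:

    from dataclasses import dataclass

    @dataclass
    class Scenario:
        name: str
        type: str        # e.g. "functional" or "accessibility"
        complexity: str  # e.g. "single_component" or "user_journey"

    def assess_scenarios(bdd_scenarios: list[Scenario]) -> list[tuple[str, str, str]]:
        """Apply the same exclusions and gates the pseudocode spells out."""
        results = []
        for scenario in bdd_scenarios:
            if scenario.type == "accessibility":
                results.append((scenario.name, "EXCLUDE", "specialized_tools_required"))
            elif scenario.complexity == "single_component":
                results.append((scenario.name, "EXCLUDE", "unit_test_territory"))
            else:
                results.append((scenario.name, "ASSESS", "gates 0-3"))
        return results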
Mandatory Language That Works
For absolutely critical processes, I learned to use clear commanding language:
- ‘MANDATORY’ – not “should” or “please”
- ‘ZERO TOLERANCE’ – not “try to avoid”
- ‘AUTOMATIC EXCLUSIONS’ – not “generally not recommended”
The AI follows pseudocode and explicit commands while treating natural language as flexible guidance. But each instance of commanding language is backed up with pseudocode.
The Conversation State Pattern
The breakthrough was having the agent create a structured conversation log in Task 1:
## Requirements Analysis
- REQ-001: User can select premium packages
- REQ-002: Pricing updates when packages change
- REQ-003: Conflicts prevent invalid combinations
## Positive Test Scenarios
- Premium package selection
- Price calculation accuracy
- Package combination validation
## Negative Test Scenarios
- Invalid package combinations
- Error handling for unavailable options
## Inferred Requirements (Agent Additions)
- Loading states during price calculation
- Confirmation dialogs for expensive options
- Network error handling
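Because the log has a predictable shape, later tasks (or a human reviewer) can consume it programmatically. A minimal parsing sketch, assuming the log uses exactly the “##” headings shown above and a hypothetical output path:

    def parse_conversation_log(text: str) -> dict:
        """Split a Task 1 conversation log into {section heading: [bullet, ...]}."""
        sections, current = {}, None
        for line in text.splitlines():
            if line.startswith("## "):
                current = line[3:].strip()
                sections[current] = []
            elif line.startswith("- ") and current:
                sections[current].append(line[2:].strip())
        return sections

    # Hypothetical path for the Task 1 output
    log = parse_conversation_log(open("output/task1_conversation_log.md").read())
    print(log["Inferred Requirements (Agent Additions)"])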
The “Made Up” Requirements Solution: Embracing AI Creativity
The Unexpected Discovery
Here’s something that surprised me: the agent kept adding requirements that weren’t in the original spec. My first instinct was to stop this behaviour.
Instead, I asked it to put any invented requirements in a separate section.
The Value of AI Inference
Sometimes these “made up” requirements were brilliant:
- “What happens during price calculation loading?”
- “Should there be confirmation for expensive options?”
- “How do we handle network errors?”
The agent was thinking like a tester, finding gaps in specifications. I learned to embrace this creativity rather than suppress it – but keep it clearly labelled so humans could review the suggestions.
Don’t Limit the AI (But Do Limit Its Authority)
Something I decided early on: even though the AI can update Jira, I don’t want it to. That’s taking away too much control and will make people lazy. I need humans to decide whether other humans should do a test or not.
Put it this way: if you make the AI limit tests to only those that are “important,” and something goes wrong, it won’t be the AI that gets told off.
The AI’s role: Prioritise and recommend
Your role: Make the final calls
Yes, the AI puts a priority on its creations. They’re in an order, but you decide what actually gets done. The AI can be creative with requirements, suggest test scenarios, and even rate automation suitability, but humans retain control over the decisions that matter.
The AI won’t be the one getting told off
Quick Wins You Can Implement Today
1. Create Your Gold Standard
Find your best existing BDD scenario and clean it to perfection. This becomes your teaching example.
2. Extract Minimal Rules
Ask your AI: “What rules do you need to reproduce this quality?” You’ll get 5-10 essential principles instead of 300 lines of documentation.
3. Separate Domain from Pattern
Identify what’s universal (user actions, observable results) vs domain-specific (product names, URLs, error messages). Put domain details in separate configuration.
4. Use Pseudocode for Critical Logic
Replace “Please assess carefully” with explicit IF/THEN logic for anything that must happen without exception. You don’t have to write the pseudocode yourself: just write bullet points and ask the AI to turn them into pseudocode.
What’s Coming in Part 3
These solutions sound good in theory, but do they actually work in practice? In Part 3, I’ll show you:
- Real before/after examples: Contaminated scenarios transformed into universal patterns
- The framework in action: Complete workflow from Jira ticket to executable tests
- Honest assessment: What actually works, ongoing challenges, and what it doesn’t fix
- What not to automate: Smarter test filtering to decide what should be automated vs tested manually
The cats are starting to line up, but the real test is whether they stay in formation when facing real-world complexity.
Paul Coles is a software tester who discovered that AI agents respond better to examples than explanations. In Part 2, he reveals the specific techniques that transformed chaotic AI behaviour into reliable, consistent output. His actual cat learned to use the litter tray but still ignores most other commands.
🐾 Series Navigation
- Part 1: Why AI Starts Making Stuff Up
  The cat has opinions — and your postcode formatting rules aren’t one of them.
- Part 2: Show, Don’t Tell: Teaching AI with Better Examples
  Bribing the cat with gold standards and smaller piles of paper.
  ← You are here
- Part 3: How I Made My AI Stop Guessing
  Teaching the cat one trick at a time with task-focused training.
  (Coming soon)
- Part 4: The More You Say, the Less It Learns
  When you talk too much, the cat stops listening — and invents new requirements instead.
  (Coming soon)