Protocol 17 of 18 · Track, Continuity · How it lasts

AI testing vs human testing

Where AI can self-test (correctness, regressions, completeness) and where humans still must (UX, taste, judgment calls). Knowing which is which saves hours.

Why this matters

The pain it solves

"AI will test the code" is one of the most over-claimed promises in the current AI hype cycle. The truth is more useful and less exciting: AI can test some things very well, cannot test other things at all, and lying to yourself about the difference will eventually ship a bug your customers find before you do.

This protocol gives you a clean line between what to automate and what to keep human. The line is not philosophical. It is operational and you will use it every day.

The teaching

What this actually is

Three categories of test

Most testing thinking confuses these three. Once you separate them, every test you ever write fits cleanly into one bucket.

  • Correctness, AI is excellent
    Deterministic checks. Does the function return the right value? Does the API respond with the right shape? Did the migration apply cleanly? AI writes these, runs them, and catches regressions. Unit tests, integration tests, type checks. Automate everything in this category.
  • UX, taste, and judgment, humans only
    Does the page feel right? Is the copy in the right voice? Is the loading state confusing? These depend on the goal of the experience and the taste of the team. AI can spot violations of a design system you defined; it cannot tell you the system itself is wrong.
  • End-to-end flow, collaborative
    Can a real user complete the task? AI assists (simulates clicks, checks pages load, captures screenshots). The human delivers the verdict. Tools like Playwright let the agent drive the browser; the human asks, "Does this feel like the journey we wanted?"
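
As an illustration of the first category, here is a minimal sketch of a deterministic check for "does the API respond with the right shape?". The `LeadResponse` fields and the `isLeadResponse` helper are hypothetical names chosen for this example, not part of any real API.

```typescript
// Hypothetical response shape for a lead-capture endpoint (an assumption for illustration).
interface LeadResponse {
  id: string;
  email: string;
  createdAt: string; // expected to be a parseable timestamp
}

// Deterministic check: the same input always produces the same verdict,
// which is what makes this kind of test safe to hand to an agent.
function isLeadResponse(value: unknown): value is LeadResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const createdAt = v.createdAt;
  return (
    typeof v.id === "string" &&
    typeof v.email === "string" &&
    typeof createdAt === "string" &&
    !isNaN(Date.parse(createdAt))
  );
}
```

Nothing in this check depends on taste: a response either has the shape or it does not, so an agent can write it, run it, and flag the regression.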

A rule of thumb that holds 95% of the time

If the test has a deterministic right answer, an agent runs it. If the answer is "well, it depends," a human runs it. Memorise this. It saves the next two hundred meta-arguments about whether to automate.
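
The rule of thumb is simple enough to write down as code. The sketch below is one possible way to record it in a QA plan; the `Check` type, the `assignRunner` function, and the sample checks are illustrative assumptions.

```typescript
type Runner = "agent" | "human";

interface Check {
  name: string;
  // True when there is a single right answer a machine can verify.
  deterministic: boolean;
}

// The rule of thumb: deterministic checks go to the agent, everything else to a human.
function assignRunner(check: Check): Runner {
  return check.deterministic ? "agent" : "human";
}

// Sample checks, invented for illustration.
const checks: Check[] = [
  { name: "migration applies cleanly", deterministic: true },
  { name: "thank-you copy is in our voice", deterministic: false },
];

const plan = checks.map((c) => ({ name: c.name, runner: assignRunner(c) }));
```

The only judgment call left is the `deterministic` flag itself, which is the one question worth arguing about.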

The cadence

When to run which kind. Doing this consistently is more useful than any single sophisticated test.

  • AI on every commit
    The QA agent runs the correctness suite on every push. If anything goes red, the build does not deploy.
  • Human on every epic close
    Before an epic moves to Done in project-status.html, a human runs the e2e flow and signs off. Five minutes per epic. Catches the taste and judgment things AI missed.
  • Full regression before each launch
    A planned 30-minute window before each big launch where the QA agent runs everything and a human walks the entire flow. Both kinds. Same hour. Same room.
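
The cadence above can be sketched as a small gate. This is a sketch under stated assumptions: the type names and `gatePasses` are hypothetical, and it assumes the epic-close gate needs only the human sign-off because the agent suite already ran on each commit.

```typescript
type Trigger = "commit" | "epic-close" | "launch";

interface RequiredRuns {
  agentSuite: boolean; // QA agent runs the correctness suite
  humanPass: boolean; // a human walks the e2e flow and signs off
}

// Which runs each trigger requires, following the three bullets above.
const cadence: Record<Trigger, RequiredRuns> = {
  commit: { agentSuite: true, humanPass: false },
  "epic-close": { agentSuite: false, humanPass: true },
  launch: { agentSuite: true, humanPass: true },
};

interface RunResults {
  agentSuitePassed?: boolean;
  humanSignedOff?: boolean;
}

// A gate passes only if every run the cadence requires has happened and succeeded.
function gatePasses(trigger: Trigger, results: RunResults): boolean {
  const required = cadence[trigger];
  if (required.agentSuite && results.agentSuitePassed !== true) return false;
  if (required.humanPass && results.humanSignedOff !== true) return false;
  return true;
}
```

For example, `gatePasses("launch", { agentSuitePassed: true })` is false: a green suite alone does not clear a launch until a human has also signed off.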

Try it yourself · 30 minutes

Compare AI and human testing on one feature in 30 minutes

Pick a feature you already shipped (lead capture from Protocol 07 works well). You will write an AI test suite, run a human pass, and list three things AI missed.

  1. Step 01
    Have the QA agent write a test suite

    In Claude Code: "You are the QA agent. Write Playwright tests for the lead capture form: it renders, accepts valid input, rejects invalid email, inserts a row into Supabase on submit, shows the thank-you message." Let it write 5 to 10 tests.

  2. Step 02
    Run the AI suite

    Run npx playwright test and watch the tests run. Fix anything red. Once everything passes, save the run as the baseline.

  3. Step 03
    Do a human pass

    Open your live site. Submit a form as if you were a real lead. Pay attention. Is the focus right? Is the spacing weird on mobile? Is the thank-you copy in your voice? Is the loading state confusing?

  4. Step 04
    List three things AI missed

    Write them down. "AI did not flag that the submit button lacks a loading state." "AI did not flag that the thank-you copy is generic." "AI did not flag that on mobile the form overflows."

  5. Step 05
    Decide which ones become tests

    Some belong in the AI suite (the mobile overflow can become an automated visual regression test). Some stay human-only (voice). Note which is which in your QA plan.
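
To make Step 05 concrete, here is one way the mobile overflow finding could become a deterministic check. It is a simpler layout assertion, not a screenshot-based visual regression, and the names are hypothetical. In a browser test the two measurements would typically be read from `document.documentElement.scrollWidth` and `window.innerWidth`.

```typescript
// Measurements a browser test would read from the rendered page.
interface LayoutMeasurement {
  viewportWidth: number; // px, width of the visible viewport
  scrollWidth: number; // px, full width of the page content
}

// Content wider than the viewport means the page scrolls horizontally,
// which is how a form "overflows" on mobile.
function overflowsHorizontally(m: LayoutMeasurement, tolerancePx = 0): boolean {
  return m.scrollWidth > m.viewportWidth + tolerancePx;
}
```

Once "the form overflows on mobile" is phrased this way, it has a right answer and moves into the agent's suite. The voice of the thank-you copy has no such phrasing, so it stays with the human.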

Outcome

A working test suite that runs on every commit, a fresh sense of what AI catches and misses, and a written cadence: AI on commit, human on epic close, both on launch.

At the retreat

You learn it by doing it

You watch the QA agent generate and run a test suite for your contact form in five minutes. Then you do the human pass and find three things the AI suite did not catch.
