Failure Triage
Automatic clustering and labeling of every failure: real regression, new failure, flake, environment, or auto-healed, with concrete fix suggestions.
When a test run finishes with failures, AegisRunner runs an automatic triage pass. It groups failures by root cause, labels each one, suggests a fix, and surfaces a Triage tab on the run page. You don't trigger it — it just appears whenever a run goes red.
What you see
The Triage tab on a failed run shows:
- Summary — one paragraph: "7 of 12 failures are real regressions concentrated in the checkout flow. 3 are flaky-pattern. 2 look environmental."
- Clusters — failures grouped by shared root cause, ranked by severity.
- Per-failure labels — every failed test gets one of the labels below, plus a one-paragraph explanation and a recommended fix.
Labels
| Label | What it means | Where it shows up |
|---|---|---|
| Real regression | This test passed before. Something in your app broke between then and now. This is the label that most needs action. | Top of the Triage tab. |
| New failure | This test has never passed. Often the test was just generated or just edited and needs a tweak before it works. | Below regressions. |
| Likely flake | Has a history of intermittent failures. Triage will say "passed 9 of last 10 runs". Lower priority. | Below new failures. |
| Environment | The site itself looks broken: login redirect loop, 5xx on every step, DNS failure, bot-block challenge. Probably not your test. | Their own section. One or two such failures are usually enough to put the whole run in this category. |
| Auto-healed | Locator drifted, AegisRunner found the new locator, the test passed. Listed for transparency only — no action needed. | Bottom, collapsed by default. |
Clusters
Triage groups related failures so you can fix one thing and clear ten tests. Each cluster shows:
- Cluster name — e.g. "Cart page elements not found".
- Severity — high, medium, low. High = real regression with several failures behind it; low = isolated flake.
- Root cause — one sentence on what's broken. "Cart starts empty in this environment, so cart-related tests find no items."
- Recommended fix — concrete next step. "Add a prerequisite step that adds an item to the cart, or seed test data with non-empty carts."
- Affected tests — list of test cases in the cluster. Click to filter the Results tab.
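As a rough illustration, a cluster can be modeled as a small record and ranked by severity. The `Cluster` class, its field names, the test IDs, and the tie-break on the number of affected tests are all assumptions made for this sketch; they are not AegisRunner's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List

# Lower rank sorts first. The three severity levels come from the list above.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Cluster:
    """Hypothetical shape of one triage cluster."""
    name: str
    severity: str  # "high", "medium", or "low"
    root_cause: str
    recommended_fix: str
    affected_tests: List[str] = field(default_factory=list)


def rank_clusters(clusters: List[Cluster]) -> List[Cluster]:
    """Order by severity, then by affected-test count (assumed tie-break)."""
    return sorted(
        clusters,
        key=lambda c: (SEVERITY_RANK[c.severity], -len(c.affected_tests)),
    )


clusters = [
    Cluster(
        name="Search results slow to load",  # hypothetical cluster
        severity="low",
        root_cause="Isolated timing-related failure.",
        recommended_fix="Increase the wait on the results list.",
        affected_tests=["search-01"],
    ),
    Cluster(
        name="Cart page elements not found",
        severity="high",
        root_cause="Cart starts empty in this environment, so cart-related tests find no items.",
        recommended_fix="Add a prerequisite step that adds an item to the cart.",
        affected_tests=["cart-01", "cart-02", "cart-03"],
    ),
]

for cluster in rank_clusters(clusters):
    print(cluster.severity, cluster.name, len(cluster.affected_tests))
```

Fixing the highest-ranked cluster first is what lets one change clear several tests at once.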
How triage decides labels
Triage uses two sources:
- Run history — how often this exact test passed before, on the same browser, in the same environment. Drives flake detection and the "real regression vs new failure" split.
- Failure data — error message, step number, screenshot, DOM snapshot, console output. Drives root-cause clustering.
An LLM does the clustering and writes the recommendations. The labels themselves come from deterministic logic on top of run history (the LLM doesn't get to choose "this is a real regression" — that's measured).
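The history-driven part of that split can be pictured with a minimal sketch. The function name, the `PastRun` record, the 10-run window, and the exact rules are assumptions for illustration only; the real thresholds are not documented here, and the Environment and Auto-healed labels are not covered because they depend on failure data rather than pass history.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PastRun:
    """Hypothetical record: one earlier outcome of the same test,
    on the same browser and in the same environment."""
    passed: bool


def label_from_history(history: List[PastRun], window: int = 10) -> str:
    """Label a test that just failed, given its earlier runs (oldest first).

    Assumed rules for this sketch:
    - never passed before            -> "New failure"
    - recent window mixes pass/fail  -> "Likely flake"
    - otherwise (it used to pass)    -> "Real regression"
    """
    if not any(run.passed for run in history):
        return "New failure"

    recent = history[-window:]
    recent_passes = sum(1 for run in recent if run.passed)
    recent_failures = len(recent) - recent_passes
    if recent_passes > 0 and recent_failures > 0:
        return "Likely flake"

    return "Real regression"


# A test that passed 9 of its last 10 runs reads as intermittent.
mostly_green = [PastRun(True)] * 9 + [PastRun(False)]
print(label_from_history(mostly_green))

# A test with a clean record that fails now reads as a regression.
print(label_from_history([PastRun(True)] * 10))

# A test with no passing history cannot be a regression.
print(label_from_history([]))
```

Because every branch is a plain comparison on recorded outcomes, the same history always yields the same label, which is the property the paragraph above describes.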
Fix suggestions
For each failed test, the Triage tab can produce a concrete fix suggestion:
- Updated locators — when an element wasn't found, the suggestion includes the new locator (auto-heal would have applied this if heal could match confidently; the suggestion is for cases where heal couldn't).
- Missing prerequisite step — when a test assumes prior state, the suggestion adds the setup.
- Wait/timeout tweaks — when failures look timing-related.
- Test data fix — when an assertion is checking for stale data.
Click Apply suggestion to write the change into the suite. The next run uses the updated test.
Where it doesn't run
- Passed runs — the Triage tab only appears when there are failures.
- Bot-blocked runs — when the entire run is short-circuited by a bot-block error, the run is marked Environment without further AI analysis. Running clustering on a captcha page would waste compute.
- Plan limits — Free plan gets summary-only triage. Starter and above get full clustering, fix suggestions, and history-based labels.
Run-history retention
Triage's "real regression vs new failure" judgment depends on how far back run history goes. That's bounded by your plan's data retention:
- Free — 7 days. Most tests look like "new failures" simply because they have no history.
- Starter — 30 days. Real regressions become reliable.
- Pro — 90 days.
- Business — 1 year.
- Enterprise — unlimited.
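To see why a short retention window pushes tests toward "new failure", here is a small sketch that filters a test's run history by plan. The `RETENTION_DAYS` mapping mirrors the list above, with "1 year" treated as 365 days as an assumption; the function name and the idea of filtering by timestamp are illustrative, not a description of how AegisRunner stores history.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Days of run history per plan; None means unlimited.
# Treating "1 year" as 365 days is an assumption of this sketch.
RETENTION_DAYS = {
    "Free": 7,
    "Starter": 30,
    "Pro": 90,
    "Business": 365,
    "Enterprise": None,
}


def visible_history(run_times: List[datetime], plan: str, now: datetime) -> List[datetime]:
    """Return the run timestamps that fall inside the plan's retention window."""
    days: Optional[int] = RETENTION_DAYS[plan]
    if days is None:
        return list(run_times)
    cutoff = now - timedelta(days=days)
    return [t for t in run_times if t >= cutoff]


now = datetime(2024, 6, 1)
# Hypothetical test that last ran 3, 20, and 200 days ago.
runs = [now - timedelta(days=d) for d in (3, 20, 200)]

for plan in RETENTION_DAYS:
    print(plan, len(visible_history(runs, plan, now)))
```

A test whose only passing runs are older than the cutoff has no visible passes, so triage has nothing to compare against when deciding whether a failure is a regression.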
Acting on a triage report
Recommended order:
1. Open Environment failures first. If your staging is down, no other label matters.
2. Open the Real regression clusters. These are real bugs in your app.
3. Open New failure clusters next. Usually one fix to a flaky locator clears the cluster.
4. Glance at Likely flake if pass rate is suspicious. Quarantine persistent flakes from the test detail page.
5. Skip Auto-healed unless you keep seeing healed badges on the same test. That signals app instability worth refactoring.
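If you export or script against triage results, the recommended order above amounts to a sort by label. The sketch below assumes each failure is a `(test_name, label)` pair; that shape, the function name, and the test names are hypothetical and only serve to show the ordering.

```python
from typing import List, Tuple

# Review priority taken from the recommended order above.
REVIEW_ORDER = [
    "Environment",
    "Real regression",
    "New failure",
    "Likely flake",
    "Auto-healed",
]


def review_queue(failures: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (test_name, label) pairs into the recommended review order.

    sorted() is stable, so tests sharing a label keep their original order.
    """
    return sorted(failures, key=lambda f: REVIEW_ORDER.index(f[1]))


failures = [
    ("checkout-pay", "Real regression"),
    ("profile-avatar", "Auto-healed"),
    ("login-redirect", "Environment"),
    ("search-filter", "Likely flake"),
    ("wishlist-add", "New failure"),
]

for name, label in review_queue(failures):
    print(label, name)
```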
Related
- Debugging Failed Tests — the full investigation flow.
- Running Tests — the run page and tabs.
- Baseline Replays — deterministic CI to reduce flake.
- Notifications & Alerts — get triage reports delivered to Slack/email.