Test Results
Understanding test results helps you distinguish real issues from noise and make informed decisions about your release quality.
Interpreting results
Each test execution ends with one of several statuses. Understanding what each one means helps you prioritize your response:
Passed
The test completed and all checks matched their expected outcomes. No action is needed. If a test consistently passes across multiple runs, that is a good signal that the feature it covers is stable.
Failed
The test completed but one or more checks did not match what was expected. This is the most common outcome that requires investigation. A failure could indicate a real bug in the application, or it could mean the test expectations need updating because the application behavior has intentionally changed.
Error
The execution hit a technical problem that prevented it from completing. Errors are usually not caused by bugs in the application. Common causes include environment issues (the application was not reachable), misconfigured environment variables, or timeouts due to slow page loads. Look at the execution logs to identify the root cause.
Skipped
The test was not executed during this run. Tests can be skipped when a precondition is not met or when they are intentionally excluded. Check the test's preconditions if you expected it to run.
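The four statuses above map naturally to different first responses. As an illustrative sketch only (qtrl's own API is not shown here, so the `ExecutionStatus` names and the `triage` helper are assumptions based on the descriptions above):

```python
from enum import Enum

class ExecutionStatus(Enum):
    PASSED = "passed"    # all checks matched expectations
    FAILED = "failed"    # completed, but one or more checks mismatched
    ERROR = "error"      # a technical problem prevented completion
    SKIPPED = "skipped"  # not executed in this run

# Suggested first response per status, following the guidance above.
TRIAGE_ACTIONS = {
    ExecutionStatus.PASSED: "No action needed.",
    ExecutionStatus.FAILED: "Investigate: real bug, or expectations need updating.",
    ExecutionStatus.ERROR: "Check execution logs for environment or timeout issues.",
    ExecutionStatus.SKIPPED: "Review preconditions if the test was expected to run.",
}

def triage(status: ExecutionStatus) -> str:
    """Return the suggested first response for an execution status."""
    return TRIAGE_ACTIONS[status]

print(triage(ExecutionStatus.ERROR))
```

The key distinction the mapping encodes: a failure is about the application or the test's expectations, while an error is almost always about the environment or infrastructure.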
Investigating failures
When a test fails, resist the urge to assume right away that it is a bug. Take a systematic approach:
Check the environment first
If multiple tests failed in the same run, the problem is likely environmental. Verify that the application is running and accessible, that environment variables are configured correctly, and that the test environment reflects the expected state.
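A pre-flight check like the one above can be scripted. This is a minimal sketch, not part of qtrl: the variable names (`APP_BASE_URL`, `TEST_USER`, `TEST_PASSWORD`) are hypothetical placeholders for whatever your environment actually requires.

```python
import os
import urllib.request

# Hypothetical names; substitute the variables your environment needs.
REQUIRED_VARS = ["APP_BASE_URL", "TEST_USER", "TEST_PASSWORD"]

def check_environment() -> list[str]:
    """Return a list of environment problems to rule out before blaming tests."""
    problems = [
        f"missing env var: {name}"
        for name in REQUIRED_VARS
        if not os.environ.get(name)
    ]
    base_url = os.environ.get("APP_BASE_URL")
    if base_url:
        try:
            # A simple reachability probe; failures here point at the
            # environment, not the tests.
            urllib.request.urlopen(base_url, timeout=10)
        except OSError as exc:
            problems.append(f"application not reachable: {exc}")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

If this reports problems, fix the environment and re-run before reading anything into the individual test failures.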
Review the execution details
Expand the failed test execution to see the step-by-step timeline. The logs show exactly what the AI did, what it observed, and where things diverged from expectations. Screenshots captured during execution can be especially helpful for understanding what the AI saw at the point of failure.
Distinguish bugs from test issues
Not every failure is a bug in the application. Sometimes the test itself needs updating:
- If the application behavior changed intentionally, the test expectations need to be updated to match.
- If the test steps reference UI elements that no longer exist or have moved, the test needs to be regenerated or edited.
- If the failure is intermittent (passes sometimes, fails other times), it may be a timing issue in the test or an instability in the test environment.
Take action based on what you find
If the failure reveals a real bug, report it through your team's normal process (qtrl integrates with Jira if your team uses it). If the test needs updating, edit it directly or re-run a generation task to produce updated tests. If it was an environment issue, fix the configuration and re-run.
Using reports
PDF reports generated from test runs serve several purposes:
- Stakeholder communication: share reports with product managers, engineering leads, or other stakeholders who need to understand test coverage and results but do not work in qtrl day to day.
- Release documentation: attach reports to release notes or deployment records as evidence that testing was performed.
- Compliance: in regulated environments, reports provide an auditable record of what was tested, when, and what the outcomes were.
Continuous improvement
Test results over time tell a story about the quality and stability of your application. Pay attention to patterns:
- Tests that fail repeatedly may be pointing to persistent quality issues in a specific area of the application. Consider whether that area needs more development attention.
- Tests that flip between passing and failing are often called "flaky" tests. They may be sensitive to timing, data state, or environment conditions. These tests should be investigated and either stabilized or rewritten.
- A test suite that keeps passing after changes to the application is a strong signal that your coverage is working. If everything always passes, though, consider whether your tests are comprehensive enough to catch real issues.
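The patterns above can be checked mechanically once you have a test's recent results. A small sketch, assuming a history represented as a list of "passed"/"failed" strings (qtrl does not prescribe this structure):

```python
def classify(history: list[str]) -> str:
    """Classify a test from its recent outcomes, oldest first."""
    outcomes = set(history)
    if outcomes == {"passed"}:
        return "stable"
    if outcomes == {"failed"}:
        return "persistent failure"  # may point to a quality issue in that area
    if {"passed", "failed"} <= outcomes:
        return "flaky"  # flips between passing and failing; investigate timing,
                        # data state, or environment sensitivity
    return "inconclusive"

print(classify(["passed", "failed", "passed", "passed", "failed"]))  # flaky
```

Flagged "persistent failure" tests suggest a recurring problem in one area of the application; flagged "flaky" tests are candidates for stabilization or rewriting.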
The goal is a test suite where failures are meaningful and actionable. When a test fails, it should point you to a real problem. When tests pass, you should feel confident that the tested functionality is working correctly.