Test metrics are crucial for testers, agile teams and QA managers who want to take their testing one step forward. There is no better way to improve your testing than to select a relevant test metric, draw a baseline, and track your progress over the next weeks, months or years.
But which metrics should you choose? There are (literally) hundreds. In this page, and in additional articles within this test metrics guide, we’ll help you wrap your head around test metrics and understand:
Which broad types of test metrics are out there
What is the difference between test metrics and software quality metrics
Which metrics are typically used at different organizational levels (project, department, cross-organization)
Which metrics are used in waterfall organizations vs. agile development environments
Which metrics are used to measure manual vs. automatic testing
What are the test metrics that can help you transform into a CI/CD, DevOps-centric organization
In the rest of this page, we’ll provide brief definitions and lists of metrics across these important dimensions, and provide links to additional resources to help you dive into specific metrics.
Which Types of Test Metrics are there?
It’s useful to understand the general categories of test metrics before diving into endless lists of specific metrics.
Test coverage – Helps you understand which areas of the application are known to be tested. Under the assumption that tests are of good quality, this metric can uncover which parts of the software have a known level of defects vs. unknown. Examples of metrics in this group are Requirements Coverage Percentage, Test Cases by Requirement Category, Unit Test Coverage, Integration and API test coverage, UI Test Coverage, Manual or Exploratory Test Coverage, and more.
Test tracking and efficiency – Shows you how useful tests are in discovering relevant defects. Metrics include Percent of Passed/Failed Test Cases, Percent of Defects Accepted/Rejected, and Percent of Critical Defects of all Defects.
Test effort – Basic facts about your test effort can help establish baselines for future test planning. Metrics include Number of Tests Run, Defects per Test Hour, and Average Time to Test a Bug Fix.
Defect distribution – Helps you understand which part of your software or process is most susceptible to defects, and therefore where to focus testing effort. Metrics include number, percentage or severity of defects distributed by categories like severity, priority, module, platform, test type, testing team, and so on. Many teams measure defect distribution per build, or at the end of test cycles. Looking at the distribution over time, it’s possible to see if problematic categories are better, the same or worse.
Test execution – A basic measurement of testing activity which records how many tests were actually conducted and how many classified as passed, failed, blocked, incomplete, or unexecuted. A main benefit of test execution is that it is easy to visualize and understand by testing teams. Metrics include Test Execution Status, Test Run Results, and Test Results by Day.
Regression – Changes to software add features but usually introduce new defects, reduce application stability, and jeopardize quality. This type of metric helps understand how effective the change was in addressing user concerns, without hurting the existing user experience. Metrics include Defect Injection Rate and Defects per Build / Release / Version.
Test team metrics – This measures testing work allocation and test outputs, for teams or team members. Experts advise never using these metrics to pit individual testers against each other, but rather as a way of tracking progress and learning within units. Metrics include Distribution of Defects Discovered, Defects Returned Per Team Member, and Test Cases Allocated Per Team Members.
Test economics metrics – Testing outputs per staff, tools and infrastructure used in testing. These metrics can help plan budgets for testing activities and evaluate the ROI of testing. Metrics include Total Cost of Testing, Cost per Bug Fix, and Testing Budget Variance.
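Several of the metrics above are simple ratios over test-run and defect records. As a minimal sketch (the data, field names, and categories below are illustrative assumptions, not a prescribed schema), here is how Percent of Passed/Failed Test Cases, Percent of Critical Defects, and a defect distribution could be computed:

```python
from collections import Counter

# Hypothetical test-run results and defect records (illustrative data only)
test_results = ["passed", "passed", "failed", "passed", "blocked", "passed"]
defects = [
    {"id": 1, "severity": "critical", "module": "checkout"},
    {"id": 2, "severity": "minor", "module": "checkout"},
    {"id": 3, "severity": "major", "module": "search"},
]

# Percent of Passed/Failed Test Cases (test tracking and efficiency)
total = len(test_results)
pass_rate = 100 * test_results.count("passed") / total
fail_rate = 100 * test_results.count("failed") / total

# Percent of Critical Defects of all Defects
critical_pct = 100 * sum(d["severity"] == "critical" for d in defects) / len(defects)

# Defect distribution by module
by_module = Counter(d["module"] for d in defects)

print(f"pass rate: {pass_rate:.1f}%, fail rate: {fail_rate:.1f}%")
print(f"critical defects: {critical_pct:.1f}%")
print(f"distribution: {dict(by_module)}")
```

In practice the same grouping can be run per build or per test cycle to track the distribution over time.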
What’s the Difference Between Test Metrics and Software Quality Metrics?
Test metrics ask the question “how good are the tests?” Software quality metrics ask the question “how good is the software?”
A few examples of software quality metrics are below – these do not assess the tests; they assess only the quality of the software.
Reliability – Refers to the level of risk inherent in a software product and the likelihood it will fail. This metric is related to “stability,” as termed by ISO: how likely are there to be regressions in the software when changes are made?
Performance – In the CISQ software quality model, this aspect is known as “Efficiency.” Typically, software performance depends on how its source code is written, its software architecture, the components within that architecture (databases, web servers, etc.) and its scalability options.
Security – Security (in the context of software quality) reflects how likely it is that attackers might breach the software, interrupt its activity or gain access to sensitive information, due to poor coding practices or architecture. A central concept is “vulnerabilities” – known issues that can result in a security issue or breach. The number and severity of vulnerabilities discovered in a system is an important indication of its level of security, and is often a reason to postpone the release and fix the vulnerabilities.
Maintainability and code quality – Software maintainability measures the ease with which software can be adapted to other purposes, how portable it is between environments, and whether it is transferable from one development team or from one product to another. Maintainability is closely related to code quality. If code is of high quality, the software is likely to be more easily maintainable.
Rate of delivery – In agile development environments, new iterations of software are delivered to users quickly. This is a measure of software quality in an agile mindset because the more frequently software is delivered, the more feedback is received from real users, and the more opportunities there are for quality to improve.
Which Metrics are Used at Different Organizational Levels?
At the software project level, development teams might track:
Requirements and requirement coverage – How many of the workflows users follow are actually covered by different types of tests?
Defect distribution – How many defects are being discovered in different parts of the software? Is there progress over time?
Defect open and close rate – How long does it take to discover a bug, and how fast are developers attending to defects discovered in the testing process?
Test execution trends – Which tests have been executed by a given member of the QA team or by an automated testing framework or server?
Burn Down Chart – This is a visualization of the amount of work that has to be completed by a development team. If testing is a separate activity, burn down charts can help management visualize how close they are to completing testing scope for a release. In an agile team, dev and testing are a unified activity – a story is not done before it is tested – so the burndown chart reflects both dev and testing work.
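The defect open and close rate mentioned above boils down to comparing reported and resolved timestamps. A minimal sketch, with hypothetical defect lifecycle records (reported, fixed):

```python
from datetime import datetime

# Hypothetical defect lifecycle records: (reported, fixed) timestamps
defects = [
    (datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 17)),
    (datetime(2024, 5, 2, 10), datetime(2024, 5, 4, 10)),
    (datetime(2024, 5, 3, 8), None),  # still open
]

opened = len(defects)
closed = sum(1 for _, fixed in defects if fixed is not None)
close_rate = 100 * closed / opened

# Average time-to-fix for closed defects, in hours
fix_hours = [(fixed - rep).total_seconds() / 3600 for rep, fixed in defects if fixed]
avg_fix_hours = sum(fix_hours) / len(fix_hours)

print(f"close rate: {close_rate:.0f}%, avg fix time: {avg_fix_hours:.1f}h")
```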
At the department level, leaders of the organizational unit responsible for development and testing might track:
MTTD and MTTR – In aggregate, what is the organization’s Mean Time to Detect a defect, and what is the Mean Time to Recovery from defects that affect the software’s users?
Defect Removal Efficiency – How many defects are identified during the development cycle, and how many are actually fixed?
Testing and Defect Trends – Over the span of a few months to a few years, how is testing activity and the nature of defects evolving over time? Is software quality improving? Are we detecting and resolving a larger percentage of defects, and is the product’s maturity and stability improving or worsening?
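MTTD and MTTR are simple averages over incident timelines. A minimal sketch, assuming hypothetical incident records with introduced/detected/resolved timestamps (expressed in hours for simplicity):

```python
# Hypothetical incident records: when the defect was introduced, detected, and
# resolved (timestamps in hours since an arbitrary epoch, for simplicity)
incidents = [
    {"introduced": 0, "detected": 12, "resolved": 20},
    {"introduced": 5, "detected": 53, "resolved": 77},
]

# Mean Time to Detect: average of (detected - introduced)
mttd = sum(i["detected"] - i["introduced"] for i in incidents) / len(incidents)

# Mean Time to Recovery: average of (resolved - detected)
mttr = sum(i["resolved"] - i["detected"] for i in incidents) / len(incidents)

print(f"MTTD: {mttd:.1f}h, MTTR: {mttr:.1f}h")
```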
At the company level, the CEO, CFO, CMO or COO might track the metrics below:
Issues reported or experienced by customers – How many and what type of quality problems affect the organization’s customers? Over time, this can provide a “bottom line” insight as to how software quality is affecting users.
Defect severity – How badly do defects affect users, across multiple projects or products? How much time was invested in finding and resolving these defects, and was it spent effectively? For example, an organization might discover that certain products or projects require more testing resources because they exhibit more severe defects that hurt customer satisfaction and revenue.
MTTR and MTTD – At the organizational level, how much time is needed to discover and respond to defects that affect customers?
System outages and downtime – How frequently, and for how long, do systems experience operational disruptions? These metrics become increasingly important as many software companies transition to a SaaS delivery model.
Cost of bug fixes pre/post release – How much effort was spent to fix issues that were discovered before the release of a software version vs. afterward? It is well known that the later bugs are discovered, the more costly they are to resolve. This metric can help an organization understand how much costlier it is to fix a bug in production, and by extension, the ROI of better testing to discover defects earlier.
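The pre/post-release cost comparison can be expressed as a simple multiplier. A sketch with illustrative effort figures (the hours below are made-up examples, not benchmarks):

```python
# Hypothetical fix-effort data (hours per defect), split by when the defect was found
pre_release_fix_hours = [2, 3, 1, 4]    # found during testing
post_release_fix_hours = [8, 16, 12]    # escaped to production

avg_pre = sum(pre_release_fix_hours) / len(pre_release_fix_hours)
avg_post = sum(post_release_fix_hours) / len(post_release_fix_hours)
cost_multiplier = avg_post / avg_pre  # how much costlier a production fix is

print(f"avg pre-release: {avg_pre:.1f}h, avg post-release: {avg_post:.1f}h, "
      f"multiplier: {cost_multiplier:.1f}x")
```

A multiplier like this gives a concrete starting point for estimating the ROI of catching defects earlier.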
For more details on metrics at different organizational levels, see this white paper from Zapier.
Metrics in Waterfall vs. Agile Environments
The waterfall model takes a non-iterative approach to development where each stage needs to be completed before the next stage begins.
In traditional waterfall environments, test metrics include:
Product quality – Once development nears the end of a waterfall project, there is a concerted effort to test and stabilize the software in order to achieve a level of quality that will enable delivery to users.
Test effectiveness – Are the tests of high value? How capable is the testing team in discovering relevant defects and helping the development team understand and resolve them?
Test status – How many tests are planned, how many have run, and how many remain?
Test resources – What resources are available for the product or project within the testing organization, and how well are they spent?
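The test status metric above is just planned-versus-executed bookkeeping. A minimal sketch with a hypothetical status snapshot:

```python
# Hypothetical test-status snapshot for a waterfall release
planned = 120
executed = {"passed": 70, "failed": 10, "blocked": 5}

run = sum(executed.values())
remaining = planned - run
progress_pct = 100 * run / planned

print(f"run {run}/{planned} ({progress_pct:.0f}%), {remaining} remaining")
```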
In agile environments, test metrics include:
Sprint burndown – Helps teams visualize how much work is remaining in the current iteration, and by extension, how much testing remains to be done.
Number of working tested features / running tested features – The more features or agile “stories” are consistently added to the software and fully tested, the healthier the project.
Velocity – The speed at which the development team is completing new features. Faster progress is desirable but should be combined with monitoring of technical debt, to ensure teams aren’t racing to complete functionality while skipping best practices or leaving quality gaps.
Cumulative flow – Helps visualize bottlenecks in the agile process. In particular, helps teams visualize if testing resources are adequate and if testing is slowing down the development cycle.
Earned value analysis – A cost estimation method that can help determine the economic value of software testing as a whole, and at an individual task level, whether specific tests are cost-effective.
Percentage of Automated Test Coverage – Measures the percentage of test coverage achieved by automated testing out of the total of manual and automated tests. This indicates the maturity of the agile organization.
Code Complexity & Static Code Analysis – Use cyclomatic complexity or other forms of automated analysis to gauge code quality.
Defects Found in Production / Escaped Defects – Counts the defects for a given release that were found after the release date. A “bottom line” metric showing the quality of the software delivered to end users.
Defect Categories – The number of defects by groups, such as functionality errors, communication errors, security bugs, and performance defects. This can help agile teams determine the Pareto 20% of defects that cause 80% of issues for end users.
Defect Cycle Time – Measures how much time elapses between starting work on fixing a bug and fully resolving that bug. An agile control chart can help visually represent the speed of resolving tasks within the agile cycle.
Defect Spill-Over – Measures the number of defects that don’t get resolved in a given sprint or iteration, and how many defects spill over from one sprint, to be resolved in the next.
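Defect Cycle Time and Defect Spill-Over can both be derived from defect open/resolve dates against the sprint window. A minimal sketch with hypothetical sprint dates and defect records:

```python
from datetime import date

# Hypothetical sprint window and defect records
sprint_start, sprint_end = date(2024, 6, 3), date(2024, 6, 14)
defects = [
    {"opened": date(2024, 6, 4), "resolved": date(2024, 6, 7)},
    {"opened": date(2024, 6, 5), "resolved": date(2024, 6, 13)},
    {"opened": date(2024, 6, 10), "resolved": None},  # not fixed this sprint
]

# Defect Cycle Time: days from starting work to resolution, for resolved defects
cycle_times = [(d["resolved"] - d["opened"]).days for d in defects if d["resolved"]]
avg_cycle_days = sum(cycle_times) / len(cycle_times)

# Defect Spill-Over: defects still unresolved when the sprint ends
spill_over = sum(1 for d in defects
                 if d["resolved"] is None or d["resolved"] > sprint_end)

print(f"avg cycle time: {avg_cycle_days:.1f} days, spill-over: {spill_over}")
```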
Which Test Metrics are Used for Manual vs. Automated Testing?
Test automation has taken the world by storm, and it is widely thought that agile development would be impossible without extensive automated testing. However, there is still room for manual testing. It is quite common, even for agile teams, to subject critical acceptance tests and complex or sensitive user stories to manual testing and analysis.
Traditionally, different metrics have been used to measure manual and automated software testing.
Manual Test Metrics
Test case execution – How many test cases have been run on a specific software version by the testing team, or an individual tester?
Test case preparation – How many test case scripts have been designed to cover software functionality?
Defect metrics – A variety of metrics on the number and nature of defects found by manual testers, including:
Defects by priority
Defects by severity
Defect slippage ratio – The percentage of defects that manual testers did not manage to identify before the software was shipped.
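Defect slippage ratio is a single division over two counts. A minimal sketch with hypothetical release-cycle figures:

```python
# Hypothetical counts from one release cycle
found_before_release = 45   # defects caught by manual testing
found_after_release = 5     # defects reported by users post-ship

# Slippage: share of all defects that escaped testing
slippage_ratio = 100 * found_after_release / (found_before_release + found_after_release)
print(f"defect slippage ratio: {slippage_ratio:.1f}%")
```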
Test Automation Metrics
Total test duration – How long it takes to run the automated tests. This is significant because tests are commonly a bottleneck in the agile development cycle.
Unit test coverage – Measures how much of the software code is covered by unit tests. This metric gives a rough approximation for how extensively tested a software codebase is.
Path Coverage – A measurement of the linearly independent paths covered by the tests. Path coverage requires very thorough testing; with full path coverage, every statement in the program executes at least once.
Requirements coverage – Shows what features are tested, and how many tests are aligned with a user story or requirement. A very important measure of the maturity of test automation, because it tracks how many of the features delivered to customers are covered by automation.
% of tests passed or failed – Counts the number of tests that have recently passed or failed, as a percentage of total tests planned to run. This metric provides an overview of testing progress, which is easy to visualize and compare between builds, releases, or time periods.
# of defects found in testing – A measure of the number of valid defects encountered during the test execution phase. Captures “how bad” a software release is compared to previous releases. Also useful for predictive modeling.
% automated test coverage of total coverage – This metric reports on the percentage of test coverage achieved by automated testing, as compared to manual testing. Helps quantify the progress of test automation initiatives.
Test execution – Total tests executed as part of a build. A crucial statistic to understand if automated tests ran as expected and aggregate their results.
Useful vs. irrelevant results – Compares useful results from automated tests against irrelevant results, which might be caused by changes to the software which break the tests, problems with the test environment, etc.
Defects in production – Many agile teams use this as the “bottom line” of automated testing efficiency: how many serious problems were found in production after the software was released.
Percentage of broken builds – Measures how many builds were broken because automated tests failed, and by extension, the quality of code committed by engineers to the shared codebase.
The number of flaky tests – A “flaky” test is a test that might exhibit both a passing and a failing result with the same code and the same configuration. Flaky tests can be harmful to developers because failures do not always indicate bugs in the code, and looking for the root cause is a waste of time.
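By the definition above, a flaky test is one that has both passed and failed with the same code and configuration. A minimal detection sketch over a hypothetical test-run history (test names and commit SHAs are illustrative):

```python
from collections import defaultdict

# Hypothetical test history: (test name, commit SHA, outcome) per run
runs = [
    ("test_login", "abc123", "passed"),
    ("test_login", "abc123", "failed"),   # same code, different result -> flaky
    ("test_search", "abc123", "passed"),
    ("test_search", "def456", "failed"),  # different commit -> not flakiness evidence
]

# Collect the set of outcomes observed per (test, commit) pair
outcomes = defaultdict(set)
for name, sha, result in runs:
    outcomes[(name, sha)].add(result)

# A test is flagged as flaky if it both passed and failed on the same commit
flaky = sorted({name for (name, _), res in outcomes.items() if len(res) > 1})
print(f"flaky tests: {flaky}")
```

Real CI systems would track this over many runs, but the grouping logic is the same.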
There are many test metrics out there. Choosing the right metrics, following them, and doing what it takes to improve them is key to a successful software testing operation. We have provided brief overviews of test metrics across several dimensions – types of tests, organizational usage, waterfall vs. agile, and manual vs. automated testing.
We believe today’s development and QA teams need a single source of truth, which can quantify the most important aspects of both test quality and software quality. In our experience, having a single, holistic dashboard readily available can make a huge difference to the learning process and actual performance of software testing activity.
At SeaLights, we developed a platform that helps agile teams holistically measure their software quality. Instead of focusing on isolated metrics, we gather data from all testing systems, both automated and manual, and combine it to show a single unified measure of test coverage and software quality. This shows teams the current level of risk in their software projects and the easiest paths to decrease that risk.