Tenderlove Making

Predicting Test Failures

Running tests is the worst. Seriously. It takes forever, and by the time they’re all done running, I’ve forgotten what I was doing. Some apps take 20 to 30 minutes to run all the tests, and I just can’t wait that long. What bothers me even more than the wait is that after I make a change to the code, 99% of the tests aren’t even running the code that I changed! Why am I spending time running code that is unrelated to the change I made?

On the Rails team, we keep talking about ways to generally make Rails app test suites faster by introducing things like parallel testing. But what if we could predict which tests are likely to fail after we’ve changed the code? If we run just those tests, then we could send a pull request and let Travis run the thousands of unrelated tests in the whole suite.

An excellent way to cut your test time in half is to only run half the tests!

Regression Test Selection

Picking the tests that may fail depending on the code you’ve changed is called Regression Test Selection. Today we’re going to build a regression test selector for Minitest and RSpec. The goal of this program is to answer the question:

If I modify line A of file B, what tests should I run?

In order to build this, we need to:

  • Collect code coverage info on a per test basis
  • Figure out what code has changed
  • Map changed code back to existing tests

Let’s start with collecting per-test coverage.

Collecting per-test coverage

At a high level, in order to collect per-test coverage, what we’ll do is:

  1. Record a snapshot of the current coverage info
  2. Run the test
  3. Record a snapshot of the current coverage info

In order to do this, you’ll need to be running trunk Ruby which has this patch. Our code will look somewhat like this:

require 'coverage'

Coverage.start # start the coverage tool

before = Coverage.peek_result # grab the current coverage
somehow_run_the_test          # placeholder: run a single test here
after = Coverage.peek_result  # grab the current coverage

save_the_test_info_with_coverage(before, after)
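To see what the two snapshots actually contain, here is a minimal sketch that runs on its own, assuming a Ruby that has Coverage.peek_result. The greet method and the temp file are made up for illustration: they stand in for application code, and the call to greet stands in for running one test.

```ruby
require 'coverage'
require 'tempfile'

# Hypothetical "application code", written to a temp file because Coverage
# only tracks files that are loaded after Coverage.start.
source = Tempfile.new(['greeter', '.rb'])
source.write(<<~'RUBY')
  def greet(name)
    "hello #{name}"
  end
RUBY
source.close

Coverage.start
load source.path

# Look the file up by basename so the example does not depend on how the
# temp directory path gets expanded.
pick = ->(result) {
  result.find { |path, _| File.basename(path) == File.basename(source.path) }.last
}

before = pick.call(Coverage.peek_result) # snapshot before the "test"
greet('world')                           # stand-in for running one test
after  = pick.call(Coverage.peek_result) # snapshot after the "test"

# Index 1 of each array is line 2 of the file, the body of greet.
puts "line 2 ran #{after[1] - before[1]} time(s) during the test"
```

Each snapshot is an array of per-line run counts, so subtracting the first from the second leaves only what ran in between.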

For each test framework we need to figure out how to wrap each test case with the code to collect coverage information, and record that coverage information along with enough information to rerun just that one test.

First we’ll implement this using Minitest, then implement with RSpec.

Here is our code for Minitest:

require 'coverage'
require 'json'

Coverage.start # must run before the code under test is loaded

require 'minitest'

class Minitest::Runnable
  LOGS = []

  # write the log once the whole suite has finished
  Minitest.after_run {
    File.open('run_log.json', 'w') { |f| f.write JSON.dump LOGS }
  }

  class << self
    alias :old_run_one_method :run_one_method

    def run_one_method klass, method_name, reporter
      before = Coverage.peek_result
      old_run_one_method klass, method_name, reporter
      after = Coverage.peek_result
      LOGS << [ klass.name, method_name.to_s, before, after ]
    end
  end
end

To integrate with Minitest, we need to monkey patch it; I couldn’t figure out a better way. The run_one_method method is the one that runs a single test case. We alias off Minitest’s implementation, then add our own. Our implementation grabs the current coverage info, calls the old implementation (which runs the test), then grabs coverage info again. Once we have the coverage info and test info, we add them to the LOGS array. When Minitest is done running, it executes the block we provided to after_run, where we write out the test and coverage information.
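For comparison, the same wrap-the-original idea can be written with Module#prepend instead of alias. The sketch below uses a made-up FakeRunner class so it runs without Minitest installed. FakeRunner, CoverageLogging and RUN_LOG are hypothetical names, and applying this to Minitest::Runnable itself is an assumption I have not tested here.

```ruby
require 'coverage'

Coverage.start

# Hypothetical stand-in for a test runner. This is NOT Minitest; it only
# mimics the shape of a class-level run_one_method.
class FakeRunner
  def self.run_one_method(klass, method_name, reporter)
    reporter << "ran #{klass}##{method_name}"
  end
end

RUN_LOG = []

# Methods in a prepended module run first, and `super` reaches the original.
module CoverageLogging
  def run_one_method(klass, method_name, reporter)
    before = Coverage.peek_result
    result = super
    after = Coverage.peek_result
    RUN_LOG << [klass.name, method_name.to_s, before, after]
    result
  end
end

# run_one_method is a class method, so prepend onto the singleton class.
FakeRunner.singleton_class.prepend(CoverageLogging)

reporter = []
FakeRunner.run_one_method(String, :test_upcase, reporter)
puts RUN_LOG.first.take(2).inspect
```

The trade-off is mostly taste: prepend avoids leaving an old_run_one_method alias lying around, but it is still a monkey patch.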

Now the RSpec version:

require 'coverage'
require 'json'
require 'rspec'

LOGS = []
Coverage.start # must run before the code under test is loaded

RSpec.configuration.after(:suite) {
  File.open('run_log.json', 'w') { |f| f.write JSON.dump LOGS }
}

RSpec.configuration.around(:example) do |example|
  before = Coverage.peek_result
  example.call # run the example
  after = Coverage.peek_result
  LOGS << [ example.full_description, before, after ]
end

There’s really not much difference between the two. The main changes in the RSpec version are that I don’t have to monkey patch anything, and we record example.full_description rather than the class and method name.

Now that we’ve got this code, we can run the whole suite and collect coverage information that is split by test. We can figure out what code each test executes. Next we need to figure out what code changed.

What code changed?

This example is only going to work with git repositories. To figure out what code changed, we’ll be using the rugged gem. The rugged gem wraps up libgit2 and gives you access to information about git repositories. With it, we can figure out what files and lines were modified.

To keep these examples short, we’ll take a very naive approach and just say that we’re only interested in lines that have been added or deleted. If a line was added or deleted, we want to run the tests that execute that line.

require 'rugged'
require 'set'

repo = Rugged::Repository.new '.'
lines_to_run = Set.new

repo.index.diff.each_patch { |patch|
  file = patch.delta.old_file[:path]

  patch.each_hunk { |hunk|
    hunk.each_line { |line|
      case line.line_origin
      when :addition
        lines_to_run << [file, line.new_lineno]
      when :deletion
        lines_to_run << [file, line.old_lineno]
      when :context
        # do nothing
      end
    }
  }
}

This code opens the git repository, gets a diff from the index, and iterates over each patch. For each patch, it looks at each hunk, and each line of the hunk. If the line was an addition or deletion, we store the file name and line number of the change.

So if the output of git diff looks like this:

diff --git a/lib/my_thing.rb b/lib/my_thing.rb
index 806deff..eb057b9 100644
--- a/lib/my_thing.rb
+++ b/lib/my_thing.rb
@@ -4,7 +4,7 @@ class Whatever
   end
 
   def bar
-    "bar #{@foo}"
+    raise
   end
 
   def baz

The lines_to_run set will contain one array like this:

#<Set: {["lib/my_thing.rb", 7]}>
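If installing rugged is a hassle, roughly the same set can be built by parsing the text that git diff prints. This is a swapped-in technique, not what the code above does, and changed_lines is a hypothetical helper. It assumes git's default a/ and b/ path prefixes, and it ignores renames, quoted paths, and newly added files.

```ruby
require 'set'

# Turn unified diff text into a Set of [file, line_number] pairs, one per
# added or deleted line, mirroring what the rugged loop collects.
def changed_lines(diff_text)
  changed = Set.new
  file = nil
  old_no = new_no = nil
  in_hunk = false

  diff_text.each_line do |raw|
    line = raw.chomp

    if line.start_with?('diff --git ')
      file = nil
      in_hunk = false
    elsif !in_hunk && (m = line.match(%r{\A--- a/(.+)\z}))
      file = m[1]                      # path of the old version of the file
    elsif (m = line.match(/\A@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@/))
      old_no = m[1].to_i               # first line of the hunk, old file
      new_no = m[2].to_i               # first line of the hunk, new file
      in_hunk = true
    elsif in_hunk && file
      case line[0]
      when '+'
        changed << [file, new_no]
        new_no += 1
      when '-'
        changed << [file, old_no]
        old_no += 1
      when ' ', nil                    # context line
        old_no += 1
        new_no += 1
      end
    end
  end

  changed
end

# A trimmed version of the diff above, with one line of context.
sample = <<'DIFF'
diff --git a/lib/my_thing.rb b/lib/my_thing.rb
index 806deff..eb057b9 100644
--- a/lib/my_thing.rb
+++ b/lib/my_thing.rb
@@ -6,3 +6,3 @@ class Whatever
   def bar
-    "bar #{@foo}"
+    raise
   end
DIFF

p changed_lines(sample)
```

The deleted line and the added line are both line 7, so the set ends up with a single entry for lib/my_thing.rb. In a real repository, you would pass in the output of git diff instead of the sample string.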

Now that we have the lines to execute, let’s map those back to tests.

Mapping back to tests

For each test, we recorded two pieces of coverage information. We recorded the coverage before the test ran, then we recorded the coverage after the test ran. We need to compute the difference between the two in order to figure out what lines the test ran.

The following function computes the difference:

def diff before, after
  after.each_with_object({}) do |(file_name,line_cov), res|
    # a file first loaded during the test has no "before" entry
    before_line_cov = before[file_name] || []

    # skip arrays that are exactly the same
    next if before_line_cov == line_cov

    # subtract the old coverage from the new coverage
    cov = line_cov.zip(before_line_cov).map do |line_after, line_before|
      if line_after
        line_after - (line_before || 0)
      else
        line_after # nil: not an executable line
      end
    end

    # add the "diffed" coverage to the hash
    res[file_name] = cov
  end
end

Coverage information is returned from the coverage tool as a hash, where the keys are file names, and each value is an array in which each index represents one line in the source. The value at that index is the number of times that line has been run, or nil for lines that can’t be executed (blank lines, comments, and so on), which is why the function checks line_after before subtracting.

The above function iterates through the before and after hashes, subtracting the “before” coverage from the “after” coverage and produces a hash where the keys are file names and the value is the coverage information just for that test.
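Here is the subtraction worked through on tiny hand-written snapshots. The function is restated compactly so the example runs by itself. The file names and counts are invented, and the guard for files that only show up in the "after" snapshot is an added assumption about lazily loaded files.

```ruby
# Compact restatement of the per-test coverage subtraction.
def diff(before, after)
  after.each_with_object({}) do |(file_name, line_cov), res|
    before_line_cov = before[file_name] || []
    next if before_line_cov == line_cov # nothing in this file ran

    res[file_name] = line_cov.zip(before_line_cov).map do |line_after, line_before|
      # nil marks a non-executable line and stays nil
      line_after && line_after - (line_before || 0)
    end
  end
end

before = {
  'lib/my_thing.rb' => [1, 0, nil, 3],
  'lib/other.rb'    => [1, 1]
}
after = {
  'lib/my_thing.rb' => [1, 2, nil, 3],
  'lib/other.rb'    => [1, 1]
}

per_test = diff(before, after)
p per_test
```

Only lib/my_thing.rb survives, with counts [0, 2, nil, 0]: the test ran line 2 twice and touched nothing else. lib/other.rb is dropped because its snapshots are identical.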

Now that we can compute per-test coverage information, we need to map the code changes back to test methods. The modified file and line numbers are the key. We need to be able to look up tests by file name and line number.

require 'json'

cov_map = Hash.new { |h, file|
  h[file] = Hash.new { |i, line|
    i[line] = []
  }
}

File.open('run_log.json') do |f|
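  # Read in the coverage info. Entries here are [desc, before, after], the
  # shape the RSpec hook writes; the Minitest hook writes
  # [class_name, method_name, before, after], so adjust the block arguments
  # when reading a Minitest log.
  JSON.parse(f.read).each do |desc, before, after|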

    # calculate the per test coverage
    delta = diff before, after

    delta.each_pair do |file, lines|
      file_map = cov_map[file]

      lines.each_with_index do |val, i|
        # skip lines that weren't executed
        next unless val && val > 0

        # add the test name to the map. Multiple tests can execute the same
        # line, so we need to use an array.  Arrays are 0 indexed, but `rugged`
        # gives line numbers starting at 1, so we need to add one to `i`.
        file_map[i + 1] << desc
      end
    end
  end
end

The above snippet reads in the coverage JSON, calculates the coverage for each test, then inserts the test into cov_map, where the file name and line number are the key, and the value is a list of tests. More than one test can run any particular line of source, so we need to keep a list of tests for each file name and line number.

Now we need to combine the information from Rugged, and the information from our coverage map to produce a list of tests to run:

lines_to_run.each do |file, line|
  cov_map[File.expand_path(file)][line].each do |desc|
    puts desc
  end
end

lines_to_run came from Rugged, and of course cov_map came from our coverage information. All this snippet does is iterate over the lines of code that have changed, look up the tests that execute each line, and print them out.
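To check the lookup without a repository or a run_log.json, the last two steps can be sketched as two small functions fed with in-memory data. build_cov_map and tests_to_run are hypothetical names, the per-test deltas are hand-written, and the uniq call is an addition so that a test covering several changed lines is listed once.

```ruby
require 'set'

# Build file -> line -> [tests] from per-test coverage deltas.
# `deltas` maps a test description to { file => per-line run counts }.
def build_cov_map(deltas)
  cov_map = Hash.new { |h, file| h[file] = Hash.new { |i, line| i[line] = [] } }
  deltas.each do |desc, files|
    files.each do |file, lines|
      lines.each_with_index do |count, i|
        next unless count && count > 0   # skip lines the test never ran
        cov_map[file][i + 1] << desc     # coverage is 0-indexed, lines are 1-indexed
      end
    end
  end
  cov_map
end

# Return the descriptions of tests that executed any changed line.
def tests_to_run(cov_map, lines_to_run)
  lines_to_run.flat_map { |file, line| cov_map.fetch(file, {}).fetch(line, []) }.uniq
end

# Made-up deltas: test_bar executes lines 6 and 7, test_baz lines 10 and 11.
deltas = {
  'test_bar' => { '/app/lib/my_thing.rb' => [nil, 0, 0, 0, 0, 1, 1] },
  'test_baz' => { '/app/lib/my_thing.rb' => [nil, 0, 0, 0, 0, 0, 0, nil, nil, 1, 1] }
}
lines_to_run = Set[['/app/lib/my_thing.rb', 7]]

puts tests_to_run(build_cov_map(deltas), lines_to_run)
```

Only test_bar is selected, since it is the only test whose delta has a non-zero count for line 7. The keys here are already absolute paths. In the real script, File.expand_path does that conversion, which only works if you run it from the repository root.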

I guess this is pretty anti-climactic, but now you are able to predict which tests will fail given a change to your codebase.

All of this code is available here. If you don’t actually want to type stuff, you can see a video of me predicting which tests will fail here.


I think that failure prediction and regression test selection can be a great tool for people like me that work on legacy code bases. Obviously, the value of a tool like this diminishes as your tests get faster, but if you have to work with slow tests, then I think this is a good way to save time.

Also, please take this idea. Please please please take this idea. I want this tool to exist as a gem, but I don’t have time to maintain it. So if you want to take this idea and package it up, then please do it!

Other ideas

Now that we can collect coverage information incrementally, I was thinking it would be nice if we made a Rack handler that recorded the coverage information on a per-request basis. Then we could do interesting things like cross reference code execution in the real world with what our tests actually execute.
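As a rough sketch of what that could look like: a Rack middleware only needs to respond to call(env), so the snapshot-before, snapshot-after pattern carries over directly. CoverageRecorder is a hypothetical name, nothing here has been tried in a real app, and because Coverage is process-wide, concurrent requests would get mixed into each other's counts.

```ruby
require 'coverage'

Coverage.start

# Hypothetical Rack middleware that snapshots coverage around each request.
# It only relies on the call(env) convention, so this sketch does not need
# the rack gem to run.
class CoverageRecorder
  attr_reader :log

  def initialize(app, log = [])
    @app = app
    @log = log
  end

  def call(env)
    before = Coverage.peek_result
    response = @app.call(env)
    after = Coverage.peek_result
    @log << [env['REQUEST_METHOD'], env['PATH_INFO'], before, after]
    response
  end
end

# A stand-in Rack app and a hand-built env, just to exercise the middleware.
app = ->(env) { [200, { 'content-type' => 'text/plain' }, ['ok']] }
recorder = CoverageRecorder.new(app)
status, _headers, _body = recorder.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/users' })

puts "#{status}: logged #{recorder.log.size} request(s)"
```

The log entries have the same before and after shape as the per-test entries, so the same subtraction would apply to them.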

I hope you enjoyed this article. Please have a good day!!
