Predicting Test Failures
Feb 13, 2015 @ 2:58 pm

Running tests is the worst. Seriously. It takes forever, and by the time they're all done running, I've forgotten what I was doing. Some apps take 20 to 30 minutes to run all the tests, and I just can't wait that long. What bothers me even more than the waiting is that after I make a change to the code, 99% of the tests aren't even running the code that I changed! Why am I spending time running code that is unrelated to the change I made?
On the Rails team, we keep talking about ways to generally make Rails app test suites faster by introducing things like parallel testing. But what if we could predict which tests are likely to fail after we’ve changed the code? If we run just those tests, then we could send a pull request and let Travis run the thousands of unrelated tests in the whole suite.
An excellent way to cut your test time in half is to only run half the tests!
Regression Test Selection
Picking the tests that may fail depending on the code you’ve changed is called Regression Test Selection. Today we’re going to build a regression test selector for Minitest and RSpec. The goal of this program is to answer the question:
If I modify line A of file B, what tests should I run?
In order to build this, we need to:
- Collect code coverage info on a per test basis
- Figure out what code has changed
- Map changed code back to existing tests
Let's start with collecting per-test coverage.
Collecting per-test coverage
At a high level, in order to collect per-test coverage, what we’ll do is:
- Record a snapshot of the current coverage info
- Run the test
- Record a snapshot of the current coverage info
In order to do this, you’ll need to be running trunk Ruby which has this patch. Our code will look somewhat like this:
require 'coverage'
Coverage.start # start the coverage tool
before = Coverage.peek_result # grab the current coverage
somehow_run_the_test
after = Coverage.peek_result # grab the current coverage
save_the_test_info_with_coverage(before, after)
For each test framework we need to figure out how to wrap each test case with the code to collect coverage information, and record that coverage information along with enough information to rerun just that one test.
First we’ll implement this using Minitest, then implement with RSpec.
Here is our code for Minitest:
require 'coverage'
require 'json'
Coverage.start
require 'minitest'
class Minitest::Runnable
  LOGS = []

  Minitest.after_run {
    File.open('run_log.json', 'w') { |f| f.write JSON.dump LOGS }
  }

  class << self
    alias :old_run_one_method :run_one_method

    def run_one_method klass, method_name, reporter
      before = Coverage.peek_result
      old_run_one_method klass, method_name, reporter
      after = Coverage.peek_result
      LOGS << [ klass.name, method_name.to_s, before, after ]
    end
  end
end
To integrate with Minitest, we need to monkey patch it. I couldn't figure out a better way to do this than a monkey patch. Anyway, run_one_method is the method that runs one test case. We alias off Minitest's implementation, then add our own. Our implementation grabs the current coverage info, then calls the old implementation which runs the test, then grabs coverage info again. Once we have coverage info and test info, we add that to the LOGS array. When Minitest is done running, it executes the block we provided to after_run, where we write out the test and coverage information.
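To actually use this, the patch just has to be loaded before any tests run. One way to do that (the file names here are ones I'm making up for the example) is to require it from your test helper before minitest/autorun:

# test/test_helper.rb (hypothetical layout)
# Load the coverage patch first so Coverage.start runs before the app code loads
require_relative 'collect_coverage'
require 'minitest/autorun'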
Now the RSpec version:
require 'coverage'
require 'json'
require 'rspec'
LOGS = []
Coverage.start
RSpec.configuration.after(:suite) {
  File.open('run_log.json', 'w') { |f| f.write JSON.dump LOGS }
}

RSpec.configuration.around(:example) do |example|
  before = Coverage.peek_result
  example.call
  after = Coverage.peek_result
  LOGS << [ example.full_description, before, after ]
end
There's really not much difference between the two. The main changes in the RSpec version are that I don't have to monkey patch anything, and we record example.full_description rather than the class and method name.
Now that we’ve got this code, we can run the whole suite and collect coverage information that is split by test. We can figure out what code each test executes. Next we need to figure out what code changed.
What code changed?
This example is only going to work with git repositories. To figure out what code changed, we'll be using the rugged gem. The rugged gem wraps up libgit2 and gives you access to information about git repositories. With it, we can figure out what files and lines were modified.
To keep these examples short, we'll take a very naive approach and just say that we're only interested in lines that have been added or deleted. If a line was added or deleted, we want to find the tests that execute that line.
require 'rugged'
require 'set'
repo = Rugged::Repository.new '.'
lines_to_run = Set.new
repo.index.diff.each_patch { |patch|
  file = patch.delta.old_file[:path]

  patch.each_hunk { |hunk|
    hunk.each_line { |line|
      case line.line_origin
      when :addition
        lines_to_run << [file, line.new_lineno]
      when :deletion
        lines_to_run << [file, line.old_lineno]
      when :context
        # do nothing
      end
    }
  }
}
This code opens the git repository, gets a diff from the index, and iterates over each patch. For each patch, it looks at each hunk, and each line of the hunk. If the line was an addition or deletion, we store the file name and line number of the change.
So if the output of git diff looks like this:
diff --git a/lib/my_thing.rb b/lib/my_thing.rb
index 806deff..eb057b9 100644
--- a/lib/my_thing.rb
+++ b/lib/my_thing.rb
@@ -4,7 +4,7 @@ class Whatever
   end
 
   def bar
-    "bar #{@foo}"
+    raise
   end
 
   def baz
The lines_to_run set will contain one array like this:
#<Set: {["lib/my_thing.rb", 7]}>
Now that we have the lines to execute, let's map those back to tests.
Mapping back to tests
For each test, we recorded two pieces of coverage information. We recorded the coverage before the test ran, then we recorded the coverage after the test ran. We need to compute the difference between the two in order to figure out what lines the test ran.
The following function computes the difference:
def diff before, after
  after.each_with_object({}) do |(file_name, line_cov), res|
    before_line_cov = before[file_name]

    # skip arrays that are exactly the same
    next if before_line_cov == line_cov

    # subtract the old coverage from the new coverage
    cov = line_cov.zip(before_line_cov).map do |line_after, line_before|
      if line_after
        line_after - line_before
      else
        line_after
      end
    end

    # add the "diffed" coverage to the hash
    res[file_name] = cov
  end
end
Coverage information is returned from the coverage tool as a hash, where the keys are file names, and the value is an array where each index of the array represents one line in the source. The value at that index represents how many times that line has been run.
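For example, a snapshot from Coverage.peek_result looks roughly like this (the file name and counts are made up):

{
  "/path/to/app/lib/my_thing.rb" => [1, nil, 2, 0, nil, 1]
  # 1   => the line ran once
  # 0   => the line is executable but hasn't run
  # nil => the line isn't executable (blank lines, `end`, comments, ...)
}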
The above function iterates through the before and after hashes, subtracting the "before" coverage from the "after" coverage, and produces a hash where the keys are file names and the value is the coverage information just for that test.
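Here's a tiny worked example with made-up numbers. Only files whose counts changed while the test was running survive, and within those files we keep just the increase:

before = { "/app/lib/my_thing.rb" => [1, nil, 0, 0],
           "/app/lib/other.rb"    => [5, nil, 5] }
after  = { "/app/lib/my_thing.rb" => [2, nil, 1, 0],
           "/app/lib/other.rb"    => [5, nil, 5] }

diff(before, after)
# => { "/app/lib/my_thing.rb" => [1, nil, 1, 0] }
# other.rb is dropped because its coverage didn't change during this test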
Now that we can compute per-test coverage information, we need to map the code changes back to test methods. The modified file and line numbers are the key. We need to be able to look up tests by file name and line number.
cov_map = Hash.new { |h, file|
  h[file] = Hash.new { |i, line|
    i[line] = []
  }
}

File.open('run_log.json') do |f|
  # Read in the coverage info
  JSON.parse(f.read).each do |desc, before, after|
    # calculate the per test coverage
    delta = diff before, after

    delta.each_pair do |file, lines|
      file_map = cov_map[file]

      lines.each_with_index do |val, i|
        # skip lines that weren't executed
        next unless val && val > 0

        # add the test name to the map. Multiple tests can execute the same
        # line, so we need to use an array. Arrays are 0 indexed, but `rugged`
        # gives line numbers starting at 1, so we need to add one to `i`.
        file_map[i + 1] << desc
      end
    end
  end
end
The above snippet reads in the coverage JSON, calculates the coverage for each test, then inserts the test into cov_map, where the file name and line number are the key, and the value is a list of tests. More than one test can run any particular line of source, so we need to keep a list of tests for each file name and line number.
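Using the earlier example diff, if only one test exercised line 7 of lib/my_thing.rb, cov_map would end up looking something like this (shown with the RSpec-style description from the logger above; the names are illustrative):

# cov_map[file][line] => list of tests that executed that line
{
  "/path/to/app/lib/my_thing.rb" => {
    7 => ["Whatever#bar returns bar"]
  }
}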
Now we need to combine the information from Rugged with the information from our coverage map to produce a list of tests to run:
lines_to_run.each do |file, line|
  cov_map[File.expand_path(file)][line].each do |desc|
    puts desc
  end
end
lines_to_run came from Rugged, and of course cov_map came from our coverage information. All this snippet does is iterate over the lines of code that have changed, look up the tests that will execute each particular line, and print them out.
I guess this is pretty anti-climactic, but now you are able to predict which tests will fail given a change to your codebase.
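If you want to go one step further and actually run the selected tests rather than just print their names, you could feed those descriptions back into the runner. A rough sketch (the flags mentioned in the comments are the standard Minitest -n and RSpec -e filters; the wiring is up to you):

# collect the selected test descriptions instead of printing them
to_run = lines_to_run.flat_map { |file, line|
  cov_map[File.expand_path(file)][line]
}.uniq

# With the RSpec logger each entry is a full description, so you could run
#   rspec -e "<description>"
# With the Minitest logger you could filter by test name, e.g.
#   ruby -Itest test/my_thing_test.rb -n "/test_bar/"
puts to_run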
All of this code is available here. If you don't actually want to type all of this in, you can see a video of me predicting which tests will fail here.
Conclusion
I think that failure prediction and regression test selection can be a great tool for people like me who work on legacy code bases. Obviously, the value of a tool like this diminishes as your tests get faster, but if you have to work with slow tests, then I think this is a good way to save time.
Also, please take this idea. Please please please take this idea. I want this tool to exist as a gem, but I don’t have time to maintain it. So if you want to take this idea and package it up, then please do it!
Other ideas
Now that we can collect coverage information incrementally, I was thinking it would be nice if we made a Rack handler that recorded the coverage information on a per-request basis. Then we could do interesting things like cross reference code execution in the real world with what our tests actually execute.
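Here's a minimal sketch of what that could look like, assuming the same peek_result-capable Ruby; the class and log file name are ones I've made up:

require 'coverage'
require 'json'

# Hypothetical Rack middleware: snapshot coverage around each request, the same
# trick as the per-test wrappers above. Coverage.start must run before the app
# code is loaded for the counts to mean anything.
class CoverageRecorder
  def initialize app, log_path = 'request_coverage.json'
    @app      = app
    @log_path = log_path
  end

  def call env
    before   = Coverage.peek_result
    response = @app.call env
    after    = Coverage.peek_result

    # append one JSON line per request: the request path plus the two snapshots
    File.open(@log_path, 'a') do |f|
      f.puts JSON.dump([env['PATH_INFO'], before, after])
    end

    response
  end
end

# in config.ru (sketch):
#   Coverage.start
#   use CoverageRecorder
#   run MyApp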
I hope you enjoyed this article. Please have a good day!!