Tenderlove Making

Speeding up Ruby with Shared Strings

It’s not often I am able to write a patch that not only reduces memory usage, but increases speed as well. Usually I find myself trading memory for speed, so it’s a real treat when I can improve both in one patch. Today I want to talk about the patch I submitted to Ruby in this ticket. It decreases “after boot” memory usage of a Rails application by 4% and speeds up require by about 35%.

When I was writing this patch, I was actually focusing on trying to reduce memory usage. It just happens that reducing memory usage also resulted in faster runtime. So really I wanted to title this post “Reducing Memory Usage in Ruby”, but I already made a post with that title.

Shared String Optimization

As I mentioned in previous posts, Ruby objects are limited to 40 bytes. But a string can be much longer than 40 bytes, so how are they stored? If we look at the struct that represents strings, we’ll find there is a char * pointer:

struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};

The ptr field in the string struct points to a byte array which is our string. So the actual memory usage of a string is approximately 40 bytes for the object, plus however long the string is. If we were to visualize the layout, it would look something like this:

RString pointing to char array

In this case, there are really two allocations: the RString object and the “hello world” character array. The RString object is the 40 byte Ruby object allocated using the GC, and the character array was allocated using the system’s malloc implementation.

Side note: There is another optimization called “embedding”. Without getting too far off track, “embedding” is just keeping strings that are “small enough” stored directly inside the RString structure. We can talk about that in a different post, but today pretend there are always two distinct allocations.
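(You can actually observe embedding with ObjectSpace: a short string reports a memsize of just 40 bytes because its bytes live inside the RString struct itself. This assumes a 64-bit build, where the embedding limit is 23 bytes.)

>> require 'objspace'
=> true
>> ObjectSpace.memsize_of "hello world"
=> 40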

We can take advantage of this character array and represent substrings by just pointing at a different location. For example, we can have two Ruby objects, one representing the string “hello world” and the other representing the string “world” and only allocate one character array buffer:

RStrings sharing a char array

This example only has 3 allocations: 2 from the GC for the Ruby string objects, and one malloc for the character array. Using ObjectSpace, we can actually observe this optimization by measuring memory size of the objects after slicing them:

>> require 'objspace'
=> true
>> str = "x" * 9000; nil
=> nil
>> ObjectSpace.memsize_of str
=> 9041
>> substr = str[30, str.length - 30]; nil
=> nil
>> str.length
=> 9000
>> substr.length
=> 8970
>> ObjectSpace.memsize_of substr
=> 40

The example above first allocates a string that is 9000 characters. Next we measure the memory size of the string. The total size is 9000 for the characters, plus some overhead for the Ruby object for a total of 9041. Next we take a substring, slicing off the first 30 characters of the original. As expected, the original string is 9000 characters, and the substring is 8970. However, if we measure the size of the substring it is only 40 bytes! This is because the new string only requires a new Ruby object to be allocated, and the new object just points at a different location in the original string’s character buffer, just like the graph above showed.

This optimization isn’t limited to strings; we can use it with arrays too:

>> list = ["x"] * 9000; nil
=> nil
>> ObjectSpace.memsize_of(list)
=> 72040
>> list2 = list[30, list.length - 30]; nil
=> nil
>> ObjectSpace.memsize_of(list2)
=> 40

In fact, functional languages, where data structures are immutable, can take great advantage of this optimization. In languages that allow mutation, we have to deal with the case where the original string gets mutated, whereas languages with immutable data structures can be even more aggressive about the optimization.
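Ruby handles the mutation case with copy-on-write: the moment you mutate a string that shares a buffer, it gets its own private copy of the bytes. Here is a small sketch you can run to watch the sharing break (the exact sizes will vary a little):

require 'objspace'

str    = "x" * 9000
substr = str[30, str.length - 30]

ObjectSpace.memsize_of substr # => 40, sharing str's buffer

substr << "!"                 # any mutation forces a private copy

ObjectSpace.memsize_of substr # => roughly 9000 -- it owns its bytes now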

Limits of the Shared String Optimization

This shared string optimization isn’t without limits though. To take advantage of this optimization, we have to always go to the end of the string. In other words, we can’t take a slice from the middle of the string and get the optimization. Let’s take our sample string and slice 15 characters off each side and see what the memsize is:

>> str = "x" * 9000; nil
=> nil
>> str.length
=> 9000
>> substr = str[15, str.length - 30]; nil
=> nil
>> substr.length
=> 8970
>> ObjectSpace.memsize_of(substr)
=> 9011

We can see in the above example that the memsize of the substring is much larger than in the first example. That is because Ruby had to create a new buffer to store the substring. So our lesson here is: if you have to slice strings, start from the left and go all the way to the end.

Here is an interesting thing to think about. At the end of the following program, what is the memsize of substr? How much memory is this program actually consuming? Is the str object still alive, and how can we find out?

require 'objspace'

str = "x" * 9000
substr = str[30, str.length - 30]
str = nil
GC.start

# What is the memsize of substr?
# How much memory is this program actually consuming?
# Is `str` still alive even though we did a GC?
# Hint: use `ObjectSpace.dump_all`
# (if you try this out, I recommend running the program with `--disable-gems`)

The optimization I explained above works exactly the same way for strings in C as it does in Ruby. We will use this optimization to reduce memory usage and speed up require in Ruby.

Reducing Memory Usage and Speeding Up require

I’ve already described the technique we’re going to use to speed up require, so let’s take a look at the problem. After that, we’ll apply the shared string optimization to improve the performance of require.

Every time a program requires a file, Ruby has to check to see if that file has already been required. The global variable $LOADED_FEATURES is a list of all the files that have been required so far. Of course, searching through a list for a file would be quite slow and get slower as the list grows, so Ruby keeps a hash to look up entries in the $LOADED_FEATURES list. This hash is called the loaded_features_index, and it’s stored on the virtual machine structure here.

The keys of this hash are strings that could be passed to require to require a particular file, and the value is the index in the $LOADED_FEATURES array of the file that actually got required. So, for example, if you have a file on your system at /a/b/c.rb, the keys to the hash will be:

  • “/a/b/c.rb”
  • “a/b/c.rb”
  • “b/c.rb”
  • “c.rb”
  • “/a/b/c”
  • “a/b/c”
  • “b/c”
  • “c”

Given a well crafted load path, any of the strings above could be used to load the /a/b/c.rb file, so the index needs to keep all of them. For example, you could do ruby -I / -e"require 'a/b/c'", or ruby -I /a -e"require 'b/c'", etc., and they all point to the same file.
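To make the key expansion concrete, here is a little sketch (my own helper, not code from Ruby) that computes the same set of keys for a path:

def index_keys(path)
  no_ext = path.sub(/\.[^.\/]*\z/, "") # drop a trailing extension, if any
  [path, no_ext].flat_map { |s|
    # the full string, plus every suffix starting just after a "/"
    starts = [0] + (0...s.length).select { |i| s[i] == "/" }.map { |i| i + 1 }
    starts.map { |i| s[i..-1] }
  }.uniq
end

index_keys("/a/b/c.rb")
# => ["/a/b/c.rb", "a/b/c.rb", "b/c.rb", "c.rb",
#     "/a/b/c", "a/b/c", "b/c", "c"]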

The loaded_features_index hash is built in the features_index_add function. Let’s pick apart this function a little.

static void
features_index_add(VALUE feature, VALUE offset)
{
    VALUE short_feature;
    const char *feature_str, *feature_end, *ext, *p;

    feature_str = StringValuePtr(feature);
    feature_end = feature_str + RSTRING_LEN(feature);

    for (ext = feature_end; ext > feature_str; ext--)
        if (*ext == '.' || *ext == '/')
            break;
    if (*ext != '.')
        ext = NULL;
    /* Now `ext` points to the only string matching %r{^\.[^./]*$} that is
       at the end of `feature`, or is NULL if there is no such string. */

This function takes a feature and an offset as parameters. The feature is the full name of the file that was required, extension and everything. offset is the index in the loaded features list where this string is. The first part of this function starts at the end of the string and scans backwards, looking for a period or a forward slash. If it finds a period, we know the file has an extension (it is possible to require a Ruby file without an extension!); if it finds a forward slash first, it gives up and assumes there is no extension.

    p = ext ? ext : feature_end;
    while (1) {
        long beg;

        p--;
        while (p >= feature_str && *p != '/')
            p--;
        if (p < feature_str)
            break;
        /* Now *p == '/'.  We reach this point for every '/' in `feature`. */
        beg = p + 1 - feature_str;
        short_feature = rb_str_subseq(feature, beg, feature_end - p - 1);
        features_index_add_single(short_feature, offset);
        if (ext) {
            short_feature = rb_str_subseq(feature, beg, ext - p - 1);
            features_index_add_single(short_feature, offset);
        }
    }

Next, the loop scans backwards through the string looking for forward slashes. Every time it finds one, it uses rb_str_subseq to get a substring and then calls features_index_add_single to register that substring. rb_str_subseq gets substrings in the same way we were doing above in Ruby, and applies the same optimizations.

The if (ext) conditional deals with files that have an extension, and this is really where our problems begin. This conditional gets a substring of feature, but it doesn’t go all the way to the end of the string. It must exclude the file extension. This means it will copy the underlying string. So these two calls to rb_str_subseq do 3 allocations total: 2 Ruby objects (the function returns a Ruby object) and one malloc to copy the string for the “no extension substring” case.

This function calls features_index_add_single to add the substring to the index. I want to call out one excerpt from the features_index_add_single function:

    features_index = get_loaded_features_index_raw();
    st_lookup(features_index, (st_data_t)short_feature_cstr, (st_data_t *)&this_feature_index);

    if (NIL_P(this_feature_index)) {
        st_insert(features_index, (st_data_t)ruby_strdup(short_feature_cstr), (st_data_t)offset);
    }

This code looks up the string in the index, and if the string isn’t in the index, it adds it to the index. The caller allocated a new Ruby string, and that string could get garbage collected, so this function calls ruby_strdup to copy the string for the hash key. It’s important to note that the keys to this hash aren’t Ruby objects, but char * pointers that came from Ruby objects (the char *ptr field that we were looking at earlier).

Let’s count the allocations. So far, we have 2 Ruby objects (one with a file extension and one without), 1 malloc for the non-sharable substring, then 2 more mallocs to copy the strings into the hash. So each iteration of the while loop in features_index_add will do 5 allocations: 2 Ruby objects and 3 mallocs.

In cases like this, a picture might help explain better. Below is a diagram of the allocated memory and how they relate to each other.

Allocations on Trunk

This diagram shows what the memory layout looks like when adding the path /a/b/c.rb to the index, resulting in 8 hash entries.

Blue nodes are allocations that were alive before the call to add the path to the index. Red nodes are intermediate allocations done while populating the index, and will be freed at some point. Black nodes are allocations made while adding the path to the index that remain live after we’ve finished adding the path. Solid arrows represent actual references, while dotted lines indicate a relationship but not an actual reference (like one string was ruby_strdup’d from another).

The graph has lots of nodes and is very complicated, but we will clean it up!

Applying the Shared String Optimization

I’ve translated the C code to Ruby code so that we can more easily see the optimization at work:

$features_index = {}

def features_index_add(feature, index)
  ext = feature.index('.')
  p = ext ? ext : feature.length

  loop do
    p -= 1
    while p >= 0 && feature[p] != '/'
      p -= 1
    end
    break if p < 0 # we've scanned past the front of the string

    short_feature = feature[p + 1, feature.length - p - 1] # New Ruby Object
    features_index_add_single(short_feature, index)

    if ext # slice out the file extension if there is one
      short_feature = feature[p + 1, ext - p - 1] # New Ruby Object + malloc
      features_index_add_single(short_feature, index)
    end
  end
end

def features_index_add_single(str, index)
  return if $features_index.key?(str)

  $features_index[str.dup] = index # malloc
end

features_index_add "/a/b/c.rb", 1

As we already learned, the shared string optimization only works when the substrings include the end of the shared string. That is, we can only take substrings from the left side of the string.

The first change we can make is to split the strings into two cases: one with an extension, and one without. Since the “no extension” if statement does not scan to the end of the string, it always allocates a new string. If we make a new string that doesn’t contain the extension, then we can eliminate one of the malloc cases:

$features_index = {}

def features_index_add(feature, index)
  no_ext_feature = nil
  p              = feature.length
  ext            = feature.index('.')

  if ext
    p = ext
    no_ext_feature = feature[0, ext] # New Ruby Object + malloc
  end

  loop do
    p -= 1
    while p >= 0 && feature[p] != '/'
      p -= 1
    end
    break if p < 0 # we've scanned past the front of the string

    short_feature = feature[p + 1, feature.length - p - 1] # New Ruby Object
    features_index_add_single(short_feature, index)

    if ext
      len = no_ext_feature.length
      short_feature = no_ext_feature[p + 1, len - p - 1] # New Ruby Object
      features_index_add_single(short_feature, index)
    end
  end
end

def features_index_add_single(str, index)
  return if $features_index.key?(str)

  $features_index[str.dup] = index # malloc
end

features_index_add "/a/b/c.rb", 1

This changes the function to allocate one new string, but to always scan to the end of both strings. Now that we have two strings we can use to “scan from the left”, we’re able to avoid new substring mallocs in the loop. You can see this change, where I allocate a new string without an extension, here.

Below is a graph of what the memory layout and relationships look like after pulling up one slice, then sharing the string:

Allocations after shared slice

You can see from this graph that we were able to eliminate string buffers by allocating the “extensionless” substring first, then taking slices from it.

There are two more optimizations I applied in this patch. Unfortunately they are specific to the C language and not easy to explain using Ruby.

Eliminating Ruby Object Allocations

The existing code uses Ruby to slice strings. This allocates a new Ruby object. Now that we have two strings, we can always take substrings from the left, and that means we can use pointers in C to “create” substrings. Rather than asking Ruby APIs to slice the string for us, we simply use a pointer in C to point at where we want the substring to start. The hash table that maintains the index uses C strings as keys, so instead of passing Ruby objects around, we’ll just pass a pointer into the string:

-       short_feature = rb_str_subseq(feature, beg, feature_end - p - 1);
-       features_index_add_single(short_feature, offset);
+       features_index_add_single(feature_str + beg, offset);
        if (ext) {
-           short_feature = rb_str_subseq(feature, beg, ext - p - 1);
-           features_index_add_single(short_feature, offset);
+           features_index_add_single(feature_no_ext_str + beg, offset);
        }
     }
-    features_index_add_single(feature, offset);
+    features_index_add_single(feature_str, offset);
     if (ext) {
-       short_feature = rb_str_subseq(feature, 0, ext - feature_str);
-       features_index_add_single(short_feature, offset);
+       features_index_add_single(feature_no_ext_str, offset);

In this case, using a pointer in to the string simplifies our code. feature_str is a pointer to the head of the string that has a file extension, and feature_no_ext_str is a pointer to the head of the string that doesn’t have a file extension. beg is the number of characters from the head of the string where we want to slice. All we have to do now is just add beg to the head of each pointer and pass that to features_index_add_single.

In this graph you can see we no longer need the intermediate Ruby objects because the “add single” function directly accesses the underlying char * pointer:

Allocations after pointer substrings

Eliminating malloc Calls

Finally, let’s eliminate the ruby_strdup calls. As we covered earlier, the substrings added to the index were newly allocated Ruby strings, and those strings could get freed by the garbage collector, so we had to call ruby_strdup to keep a copy around inside the index hash. The feature string passed in is also stored in the $LOADED_FEATURES global array, so there is no need to copy that string: the array prevents the GC from collecting it. However, we created a new string that does not have an extension, and that object could get collected. If we can prevent the GC from collecting those strings, then we don’t need to copy anything.

To keep these new strings alive, I added an array to the virtual machine (the virtual machine lives for the life of the process):

     vm->loaded_features = rb_ary_new();
     vm->loaded_features_snapshot = rb_ary_tmp_new(0);
     vm->loaded_features_index = st_init_strtable();
+    vm->loaded_features_index_pool = rb_ary_new();

Then I add the new string to the array via rb_ary_push right after allocation:

+       short_feature_no_ext = rb_fstring(rb_str_freeze(rb_str_subseq(feature, 0, ext - feature_str)));
+       feature_no_ext_str = StringValuePtr(short_feature_no_ext);
+       rb_ary_push(get_loaded_features_index_pool_raw(), short_feature_no_ext);

Now all strings in the index hash are shared and kept alive. This means we can safely remove the ruby_strdup without any strings getting freed by the GC:

     if (NIL_P(this_feature_index)) {
-       st_insert(features_index, (st_data_t)ruby_strdup(short_feature_cstr), (st_data_t)offset);
+       st_insert(features_index, (st_data_t)short_feature_cstr, (st_data_t)offset);
     }

After this change, we don’t need to copy any memory because the hash keys can point directly into the underlying character array inside the Ruby string object:

Use string indexes for keys

This new algorithm does 2 allocations: one to create a “no extension” copy of the original string, and one RString object to wrap it. The “loaded features index pool” array keeps the newly created string from being garbage collected, and now we can point directly into the character arrays without needing to copy the strings.

For any file added to the “loaded features” array, we changed it from requiring O(N) allocations (where N is the number of slashes in a string) to always requiring only two allocations regardless of the number of slashes in the string.

The End

By using shared strings I was able to eliminate over 76000 system calls during the Rails boot process on a basic app, reduce the memory footprint by 4%, and speed up require by 35%. Next week I will try to get some statistics from a large application and see how well it performs there!

Thanks for reading!


Reducing Memory Usage in Ruby

I’ve been working on building a compacting garbage collector in Ruby for a while now, and one of the biggest hurdles for implementing a compacting GC is updating references. For example, if Object A points to Object B, but the compacting GC moves Object B, how do we make sure that Object A points to the new location?

Solving this problem has been fairly straightforward for most objects. Ruby’s garbage collector knows about the internals of most Ruby objects, so after the compactor runs, it just walks through all objects and updates their internals to point at the new locations of any moved objects. If the GC doesn’t know about the internals of some object (for example, an object implemented in a C extension), it doesn’t allow things referred to by that object to move. For example, say Object A points to Object B. If the GC doesn’t know how to update the internals of Object A, it won’t allow Object B to move (I call this “pinning” an object).

Of course, the more objects we allow to move, the better.

Earlier I wrote that updating references for most objects is fairly straightforward. Unfortunately there has been one thorn in my side for a while, and that is Instruction Sequences.

Instruction Sequences

When your Ruby code is compiled, it is turned into instruction sequence objects, and those objects are Ruby objects. Typically you don’t interact with these Ruby objects, but they are there. These objects store the byte code for your Ruby program, any literals in your code, and some other miscellaneous information about the code that was compiled (source location, coverage info, etc.).

Internally, these instruction sequence objects are referred to as “IMEMO” objects. There are multiple sub-types of IMEMO objects, and the instruction sequence sub-type is “iseq”. If you are using Ruby 2.5 and you dump the heap using ObjectSpace, you’ll see the dump now contains these IMEMO subtypes. Let’s look at an example.

I’ve been using the following code to dump the heap in a Rails application:

require 'objspace'
require 'config/environment'

File.open('output.txt', 'w') do |f|
  ObjectSpace.dump_all(output: f)
end

The above code outputs all objects in memory to a file called “output.txt” in JSON lines format. Here are a couple IMEMO records from a Rails heap dump:

{
  "address": "0x7fc89d00c400",
  "type": "IMEMO",
  "class": "0x7fc89e95c130",
  "imemo_type": "ment",
  "memsize": 40,
  "flags": {
    "wb_protected": true,
    "old": true,
    "uncollectible": true,
    "marked": true
  }
}
{
  "address": "0x7fc89d00c2e8",
  "type": "IMEMO",
  "imemo_type": "iseq",
  "references": [
    "0x7fc89d00c270",
    "0x7fc89e989a68",
    "0x7fc89e989a68",
    "0x7fc89d00ef48"
  ],
  "memsize": 40,
  "flags": {
    "wb_protected": true,
    "old": true,
    "uncollectible": true,
    "marked": true
  }
}

This example came from Ruby 2.5, so both records contain an imemo_type field. The first example is a “ment” or “method entry”, and the second example is an “iseq” or an “instruction sequence”. Today we’ll look at instruction sequences.

Format of Instruction Sequence

The instruction sequences are the result of compiling our Ruby code: a binary representation of the program. These instructions are stored on the instruction sequence object, specifically this iseq_encoded field (iseq_size is the length of the iseq_encoded field).

If you were to examine iseq_encoded, you would find it’s just a list of numbers. The numbers are virtual machine instructions as well as the parameters (operands) for those instructions.

If we examine the iseq_encoded list, it might look something like this:

Index  Address             Description
0      0x00000001001cddad  Instruction (0 operands)
1      0x00000001001cdeee  Instruction (2 operands)
2      0x00000001001cdf1e  Operand
3      0x000000010184c400  Operand
4      0x00000001001cdeee  Instruction (2 operands)
5      0x00000001001c8040  Operand
6      0x0000000100609e40  Operand
7      0x0000000100743d10  Instruction (1 operand)
8      0x00000001001c8040  Operand
9      0x0000000100609e50  Instruction (1 operand)
10     0x0000000100743d38  Operand

Each element of the list corresponds to either an instruction, or the operands for an instruction. All of the operands for an instruction follow that instruction in the list. The operands are anything required for executing the corresponding instruction, including Ruby objects. In other words, some of these addresses could be addresses for Ruby objects.
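We can peek at a decoded version of this list from Ruby itself: RubyVM::InstructionSequence#to_a returns a serialized form whose last element is the instruction body. The exact layout varies between Ruby versions, so treat this as a sketch:

require 'pp'

insns = RubyVM::InstructionSequence.compile 'puts "hello world"'

# The body mixes line numbers, labels, and one array per instruction,
# e.g. [:putself], [:putstring, "hello world"], ...
pp insns.to_a.last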

Since some of these addresses could be Ruby objects, it means that instruction sequences reference Ruby objects. But, if instruction sequences reference Ruby objects, how do the instruction sequences prevent those Ruby objects from getting garbage collected?

Liveness and Code Compilation

As I said, instruction sequences are the result of compiling your Ruby code. During compilation, some parts of your code are converted to Ruby objects, and the addresses of those objects are embedded in the byte code. Let’s take a look at an example of when a Ruby object will be embedded in instruction sequences, then look at how those objects are kept alive.

Our sample code is just going to be puts "hello world". We can use RubyVM::InstructionSequence to compile the code, then disassemble it. Disassembly decodes iseq_encoded and prints out something more readable.

>> insns = RubyVM::InstructionSequence.compile 'puts "hello world"'
=> <RubyVM::InstructionSequence:<compiled>@<compiled>>
>> puts insns.disasm
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putself          
0003 putstring        "hello world"
0005 opt_send_without_block <callinfo!mid:puts, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
0008 leave            
=> nil
>>

Instruction 0003 is the putstring instruction. Let’s look at the definition of the putstring instruction, which can be found in insns.def:

/* put string val. string will be copied. */
DEFINE_INSN
putstring
(VALUE str)
()
(VALUE val)
{
    val = rb_str_resurrect(str);
}

When the virtual machine executes, it will jump to the location of the putstring instruction, decode operands, and provide those operands to the instruction. In this case, the putstring instruction has one operand called str which is of type VALUE, and one return value called val which is also of type VALUE. The instruction body itself simply calls rb_str_resurrect, passing in str, and assigning the return value to val. rb_str_resurrect just duplicates a Ruby string. So this instruction takes a Ruby object (a string which has been stored in the instruction sequences), duplicates that string, then the virtual machine pushes the duplicated string onto the stack. For a fun exercise, try going through this process with puts "hello world".freeze and take a look at the difference.
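As a starting point for that exercise, you might compare the disassembly of the frozen version (instruction names and byte code offsets vary by Ruby version):

insns = RubyVM::InstructionSequence.compile 'puts "hello world".freeze'
puts insns.disasm
# Instead of `putstring`, expect an instruction like `opt_str_freeze`
# that pushes the frozen literal directly -- no rb_str_resurrect copy.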

Now, how does the string “hello world” stay alive until this instruction is executed? Something must mark the string object so the garbage collector knows that a reference is being held.

The way the instruction sequences keep these objects alive is through the use of what is called a “mark array”. As the compiler converts your code into instruction sequences, it will allocate a string for “hello world”, then push that string onto an array. Here is an excerpt from compile.c that does this:

case TS_VALUE:    /* VALUE */
{
    VALUE v = operands[j];
    generated_iseq[code_index + 1 + j] = v;
    /* to mark ruby object */
    iseq_add_mark_object(iseq, v);
    break;
}

All iseq_add_mark_object does is push the VALUE onto an array which is stored on the instruction sequence object. iseq is the instruction sequence object, and v is the VALUE we want to keep alive (in this case the string “hello world”). If you look in vm_core.h, you can find the location of that mark array with a comment that says:

VALUE mark_ary;     /* Array: includes operands which should be GC marked */

Instruction Sequence References and Compaction

So, instruction sequences contain two references to a string literal: one in the instructions in iseq_encoded, and one via the mark array. If the string literal moves, then both locations will need to be updated. Updating array internals is fairly trivial: it’s just a list. Updating instruction sequences on the other hand is not so easy.

To update references in the instruction sequences, we have to disassemble the instructions, locate any VALUE operands, and update those locations. There wasn’t any code to walk these instructions, so I introduced a function that would disassemble instructions and call a function pointer with those objects. This allows us to find new locations of Ruby objects and update the instructions. But what if we could use this function for something more?

Reducing Memory

Now we’re finally on to the part about saving memory. The point of the mark arrays stored on the instruction sequence objects is to keep any objects referred to by instruction sequences alive:

ISeq and Array marking paths

We can reuse the “update reference” function to mark references contained directly in instruction sequences. This means we can reduce the size of the mark array:

Mark Literals via disasm

Completely eliminating the mark array is a different story as there are things stored in the mark array that aren’t just literals. However, if we directly mark objects from the instruction sequences, then we rarely have to grow the array. The amount of memory we save is the size of the array plus any unused extra capacity in the array.

I’ve made a patch that implements this strategy, and you can find it on the GitHub fork of Ruby.

I found that this saves approximately 3% memory on a basic Rails application set to production mode. Of course, the more code you load, the more memory you save. I expected the patch to impact GC performance because disassembling instructions and iterating through them should be harder than just iterating an array. However, since instruction sequences get old, and we have a generational garbage collector, the impact to real apps is very small.

I’m working to upstream this patch to Ruby, and you can follow along and read more information about the analysis here.

Anyway, I hope you found this blurgh post informative, and please have a good day!

<3 <3 <3

I want to give a huge thanks to Allison McMillan. Every week she’s been helping me figure out what is going on with this complex code. I definitely recommend that you follow her on Twitter.


Visualizing Your Ruby Heap

In a previous post, I wrote a bit about how Ruby objects are laid out in memory. Today we’ll use that information to write a program that will allow us to take a Ruby heap dump and visualize the layout and fragmentation of that heap.

Ruby Object Layout Recap

Just as a recap, Ruby objects are fixed width. That is, every Ruby object is the same size: 40 bytes. Objects are not really allocated with malloc, but instead they are placed inside of pages.

A Ruby process has many pages. Pages have many objects.

Which Page Does This Object Belong To?

Objects are allocated into a page. Each page is 2 ^ 14 bytes. In other words, Ruby objects aren’t allocated one at a time; instead the GC allocates one page (also known as an “arena”), and when a new Ruby object is requested, it is placed inside that page.

Pages aren’t exactly 2 ^ 14 bytes. When we allocate a page, we want that page to be aligned with Operating System memory pages, so the total malloc’d size needs to be some number less than a multiple of 4kb (which is the OS page size). Since malloc has some overhead, we have to subtract some amount from the actual malloc’d size so that the Ruby page aligns and fits on contiguous OS pages. The padding size we use is sizeof(size_t) * 5. So the size of a page is actually (2 ^ 14) - (sizeof(size_t) * 5).

Each page has a header that contains some information about the page. The size of that header is sizeof(void *).

This means the maximum number of Ruby objects that can be stored on a Ruby page is ((2 ^ 14) - (sizeof(size_t) * 5) - sizeof(void *)) / 40.
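On a typical 64-bit system, where sizeof(size_t) and sizeof(void *) are both 8, that arithmetic works out like this:

((2 ** 14) - (8 * 5) - 8) / 40 # => 408 objects, at most, per page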

Since the number of objects per page is bounded, we can apply a bitmask to the bottom 14 bits (remember page sizes are 2 ^ 14, IOW 1 left shifted 14) of a Ruby object address and calculate the page that object lives on. That bitmask is ~0 << 14.

In ASCII art, say we have a Ruby object address 0x7fcc6c845108. In binary:

11111111100110001101100100001000101000100001000
^---------- page address --------^- object id ^

“object id” in the above chart isn’t the traditional Ruby object id, it’s just the part of the bits that represent that individual object on the page. The entire address is considered the traditional “object id”.

Let’s extract these numbers to some Ruby code:

require 'fiddle'

SIZEOF_HEAP_PAGE_HEADER_STRUCT = Fiddle::SIZEOF_VOIDP

SIZEOF_RVALUE           = 40
HEAP_PAGE_ALIGN_LOG     = 14
HEAP_PAGE_ALIGN         = 1 << HEAP_PAGE_ALIGN_LOG      # 2 ^ 14
HEAP_PAGE_ALIGN_MASK    = ~(~0 << HEAP_PAGE_ALIGN_LOG)  # Mask for getting page address
REQUIRED_SIZE_BY_MALLOC = Fiddle::SIZEOF_SIZE_T * 5     # padding needed by malloc
HEAP_PAGE_SIZE          = HEAP_PAGE_ALIGN - REQUIRED_SIZE_BY_MALLOC # Actual page size
HEAP_PAGE_OBJ_LIMIT     = (HEAP_PAGE_SIZE - SIZEOF_HEAP_PAGE_HEADER_STRUCT) / SIZEOF_RVALUE

I mentioned this a little earlier, but I will be explicit here: Ruby pages are allocated with aligned mallocs. In other words, when a Ruby page is allocated it’s allocated on an address that is divisible by 2 ^ 14, and the size of the page is slightly smaller than 2 ^ 14.

Let’s write a function that, given an object address, returns the address of the page where that object was placed:

def page_address_from_object_address object_address
  object_address & ~HEAP_PAGE_ALIGN_MASK
end

Now let’s print the page addresses for 3 object addresses:

p page_address_from_object_address(0x7fcc6c8367e8) # => 140515970596864
p page_address_from_object_address(0x7fcc6c836838) # => 140515970596864
p page_address_from_object_address(0x7fcc6c847b88) # => 140515970662400

We can see from the output that the first two objects live on the same page, but the third object lives on a different page.

How many objects are on this page?

Ruby objects are also aligned, but they are aligned inside the existing page. They are aligned at 40 bytes (which is also the size of the object). That means that every Ruby object address is guaranteed to be divisible by 40 (this statement isn’t true for non-heap-allocated objects like numbers).

Ruby objects are never individually allocated; they are placed inside a page that has been allocated. The pages are aligned on 2 ^ 14, but not every number divisible by 2 ^ 14 is also evenly divisible by 40. That means some pages store more objects than others. Pages whose addresses are evenly divisible by 40 will store one more object than pages that aren’t.

Let’s write a function that, given a page address, calculates the number of objects it can store as well as where those objects are placed, and returns an object that represents the page information.

Page = Struct.new :address, :obj_start_address, :obj_count

def page_info page_address
  limit = HEAP_PAGE_OBJ_LIMIT # Max number of objects per page

  # Pages have a header with information, so we have to take that in to account
  obj_start_address = page_address + SIZEOF_HEAP_PAGE_HEADER_STRUCT

  # If the object start address isn't evenly divisible by the size of a
  # Ruby object, we need to calculate the padding required to find the first
  # address that is divisible by SIZEOF_RVALUE
  if obj_start_address % SIZEOF_RVALUE != 0
    delta = SIZEOF_RVALUE - (obj_start_address % SIZEOF_RVALUE)
    obj_start_address += delta # Move forward to first address

    # Calculate the number of objects this page can actually hold
    limit = (HEAP_PAGE_SIZE - (obj_start_address - page_address)) / SIZEOF_RVALUE
  end

  Page.new page_address, obj_start_address, limit
end

Now that we can get information about the page where an object is stored, let’s examine the page information for the object addresses we were working with in the previous example.

page_address = page_address_from_object_address(0x7fcc6c8367e8)
p page_info(page_address)
# => #<struct Page address=140515970596864, obj_start_address=140515970596880, obj_count=408>

page_address = page_address_from_object_address(0x7fcc6c836838)
p page_info(page_address)
# => #<struct Page address=140515970596864, obj_start_address=140515970596880, obj_count=408>

page_address = page_address_from_object_address(0x7fcc6c847b88)
p page_info(page_address)
# => #<struct Page address=140515970662400, obj_start_address=140515970662440, obj_count=407>

The first two objects live on the same page, and that page can store 408 objects. The third object lives on a different page, and that page can only store 407 objects.

It may not seem like it, but we now have all of the key pieces of information we need in order to visualize the contents of our heap.

Data Acquisition

In order to visualize a heap, we actually need a heap to visualize. I will use ObjectSpace to dump the heap to a JSON file, and we’ll use the code from above along with a JSON parser and ChunkyPNG to generate a graph.

Here is our test program:

require 'objspace'

x = 100000.times.map { Object.new }
GC.start
File.open('heap.json', 'w') { |f|
  ObjectSpace.dump_all(output: f)
}

All it does is allocate a bunch of objects, GC, then dump the heap to a JSON file called heap.json. Each line in the JSON document is one object in the Ruby heap.

Now let’s write a program to process the JSON file. What we’ll do is change the Page class so that it can keep track of objects that are living on that page, then iterate over the JSON document and add each object to its respective page.

class Page < Struct.new :address, :obj_start_address, :obj_count
  def initialize address, obj_start_address, obj_count
    super
    @live_objects = []
  end

  def add_object address
    @live_objects << address
  end
end

require 'json'

# Keep track of pages
pages = {}

File.open("heap.json") do |f|
  f.each_line do |line|
    object = JSON.load line

    # Skip roots. I don't want to cover this today :)
    if object["type"] != "ROOT"
      # The object addresses are stored as strings in base 16
      address      = object["address"].to_i(16)

      # Get the address for the page
      page_address = page_address_from_object_address(address)

      # Get the page, or make a new one
      page         = pages[page_address] ||= page_info(page_address)

      page.add_object address
    end
  end
end

Visualizing the Heap

So far our processing program has divided the objects up by the pages they belong to. Now it’s time to turn that data into a visualization of the heap. Unfortunately, we have one slight problem: the heap dump gave us information about live objects in the system. How can we visualize empty spaces in the heap?

We have a few bits of information we can use to figure out where the empty spots are in our heap. First, we know that object addresses are divisible by 40. Second, we know which address is the first address for storage (Page#obj_start_address). Third, we know how many objects a page can store (Page#obj_count). So if we start at obj_start_address and increment by SIZEOF_RVALUE, we should either find that we read that address from the JSON file, or not. If we read the address from the JSON file, we know it’s a live object; if not, it’s an empty slot.

So let’s add a method to the Page object that iterates over the possible object addresses on the page, and yields :full if there is an object, or :empty if there is no object:

class Page < Struct.new :address, :obj_start_address, :obj_count
  def each_slot
    return enum_for(:each_slot) unless block_given?

    objs = @live_objects.sort

    obj_count.times do |i|
      expected = obj_start_address + (i * SIZEOF_RVALUE)
      if objs.any? && objs.first == expected
        objs.shift
        yield :full
      else
        yield :empty
      end
    end
  end
end

Now, from page to page, we’re able to differentiate empty slots from full slots. Let’s use ChunkyPNG to generate a PNG file where each column in the PNG represents one page, and each 2 x 2 pixel square in each column represents an object. We’ll color the object red if it’s full, but just leave it empty if it’s empty:

require 'chunky_png'

pages = pages.values

# We're using 2x2 pixel squares to represent objects, so the height of
# the PNG will be 2x the max number of objects, and the width will be 2x the
# number of pages
height = HEAP_PAGE_OBJ_LIMIT * 2
width = pages.size * 2

png = ChunkyPNG::Image.new(width, height, ChunkyPNG::Color::TRANSPARENT)

pages.each_with_index do |page, i|
  i = i * 2

  page.each_slot.with_index do |slot, j|
    # If the slot is full, color it red
    if slot == :full
      j = j * 2
      png[i, j] = ChunkyPNG::Color.rgba(255, 0, 0, 255)
      png[i + 1, j] = ChunkyPNG::Color.rgba(255, 0, 0, 255)
      png[i, j + 1] = ChunkyPNG::Color.rgba(255, 0, 0, 255)
      png[i + 1, j + 1] = ChunkyPNG::Color.rgba(255, 0, 0, 255)
    end
  end
end

png.save('heap.png', :interlace => true)

After running this, we should have a file output called heap.png. Here’s the one I generated:

filled heap

This one doesn’t look as neat because we filled the heap up. Let’s try dumping the heap from a relatively empty process and see what it looks like:

$ ruby -robjspace -e'File.open("heap.json", "wb") { |t| ObjectSpace.dump_all(output: t) }'

If I process this heap, the output looks like this:

empty heap

Ok that’s the end. Thank you!

You can find the full code listing from this post here.

<3<3<3<3<3


Object ID in MRI

Objects in Ruby are 40 bytes. Objects are allocated into pages (or arenas) that are 2^14 (2 to the 14th power) bytes. Pages are allocated with an aligned malloc where the divisor is the size of a page. The first object is allocated at the first address inside the page that is divisible by 40 (or page_start + (40 - page_start % 40) % 40). This means all Ruby object addresses are divisible by 40, and that some pages hold one more object than others.

page layout

The code to find the “first object inside the page” can be found here. If the page address isn’t divisible by 40, this calculates the first offset inside the page where an object can be allocated. Objects are allocated at 40 byte offsets in order to support tagged pointers.
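As a quick sanity check of that formula, here it is in Ruby with a hypothetical (but 2^14-aligned) page address:

page_start = 140515970596864          # a 2^14-aligned page address
page_start % 40                       # => 24, so not object aligned
first_obj  = page_start + (40 - page_start % 40) % 40
first_obj % 40                        # => 0, the first 40-byte slot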

Tagged pointers

Every number divisible by 40 when represented in binary always ends in three 0s:

irb(main):012:0> sprintf("%0#15b", 80)
=> "0b0000001010000"
irb(main):013:0> sprintf("%0#15b", 120)
=> "0b0000001111000"
irb(main):014:0> sprintf("%0#15b", 160)
=> "0b0000010100000"
irb(main):015:0> sprintf("%0#15b", 200)
=> "0b0000011001000"

MRI exploits this fact in order to represent some objects like integers without allocating anything. Pointers are just numbers, so if the number is not divisible by 40 (in other words doesn’t end in 000), then we know it is something special.

Integers are examples of tagged pointers. Integers are encoded by shifting the number left one, then setting the last bit to 1. So the number 1 will be encoded as 3 (or in binary 11), and the number 40 will be encoded as 81 (or 1010001 in binary). If the pointer we’re dealing with has a 1 in the last bit, we know it’s a Ruby integer, and we can decode it (convert to a C integer) by shifting right 1.

Just to reiterate, a Ruby integer 20 (0b10100) is encoded to 41 (0b101001) by shifting left, then adding 1.

So C integers are converted to Ruby integers by shifting left by one, then adding one. Ruby integers are converted to C integers by shifting right by one. This is one reason why Ruby can’t represent a full 64-bit number as an “immediate” value: one bit is used for the encoding (and another bit is used for the sign).
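Because an Integer’s object_id in MRI is just its tagged representation, we can watch the encoding from Ruby:

1.object_id  # => 3   (1  << 1 | 1)
20.object_id # => 41  (20 << 1 | 1)
40.object_id # => 81  (40 << 1 | 1)
81 >> 1      # => 40  (decode by shifting right)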

A diagram of the tagging scheme is here.

Calculating Object ID

“Non special objects” are objects that don’t use the 3 bits on the right side for any meaning. An example of a “non special object” would be Object.new.

Non special objects encode their object id as their address in memory + 1. The encoding code is here. Normally, to convert a C integer to a Ruby integer, the integer is shifted left and then one is added. But the address of a non special Ruby object will always be divisible by 40, so we know that the last bit is 0. So this code simply changes the last bit from a 0 to a 1. Clobbering the last bit means that when the Ruby side of the program sees the value, it will be the address of the object shifted right by one.

If object X is at memory location 40, then the object id (in Ruby) will be 20. The Ruby integer 20 (0b10100) is encoded by shifting left then adding one, which results in 41 (0b101001). Since this code simply takes the location (in this case 40) and adds one, the result is 41 (0b101001) which is the same as 20 on the Ruby side.

In other words, object_id returns the address of the object shifted right one and we can get the actual address of the object back by shifting left one.

Calling inspect shows the actual address as hex. We can see that shifting left one, then converting to hex will give us the same number:

irb(main):021:0> x = Object.new
=> #<Object:0x00000100827400>
irb(main):022:0> x.object_id.to_s(16)
=> "80413a00" # not what we want
irb(main):023:0> (x.object_id << 1).to_s(16)
=> "100827400" # this is it!

Calculating Page Number

We can use the address of an object to calculate the arena (or page) into which the object was allocated. Pages are aligned at 2^14, so the page start address will always be divisible by 2^14. The page size is never larger than 2^14, so no nonzero offset inside the page is evenly divisible by 2^14. Knowing this, we can remove the lower bits of the object’s address by using a mask that is 14 bits wide:

irb(main):001:0> MASK = ~(0) << 14
=> -16384
irb(main):002:0> 10.times { p (Object.new.object_id << 1) & MASK }
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760

Now we can group objects by what page they were allocated on.

The End.


Evented GPIO on Raspberry PI with Ruby

I need to know when my cats are pooping, so over the weekend I hooked up a motion sensor to my Raspberry PI. This is the code I used to get an interrupt when the motion sensor turns on or off:

require 'epoll'

def watch pin, on:
  # Export the pin we want to watch
  File.binwrite "/sys/class/gpio/export", pin.to_s

  # It takes time for the pin support files to appear, so retry a few times
  retries = 0
  begin
    # `on` should be "none", "rising", "falling", or "both"
    File.binwrite "/sys/class/gpio/gpio#{pin}/edge", on
  rescue
    raise if retries > 3
    sleep 0.1
    retries += 1
    retry
  end

  # Read the initial pin value and yield it to the block
  fd = File.open "/sys/class/gpio/gpio#{pin}/value", 'r'
  yield fd.read.chomp

  epoll = Epoll.create
  epoll.add fd, Epoll::PRI

  loop do
    fd.seek 0, IO::SEEK_SET
    epoll.wait # put the program to sleep until the status changes
    yield fd.read.chomp
  end
ensure
  # Unexport the pin when we're done
  File.binwrite "/sys/class/gpio/unexport", pin.to_s
end

pin = 5

watch pin, on: 'both' do |value|
  p value
end

Whenever an event happens on the GPIO pin, the block will be executed. I want the block to be executed when the sensor detects movement and when it detects no movement (if you imagine that as a wave, I want to know about the rising and falling edges), so I passed “both” to the watch function.
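For example, if I only cared about the start of motion, I could ask for just the rising edge:

watch pin, on: 'rising' do |value|
  puts "motion started (pin value: #{value})"
end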

I am very new to developing on Raspberry PI, and I’m not sure what people normally use for Ruby + GPIO on Raspberry PI. I looked at the rpi_gpio gem. It gives access to read values of GPIO, but doesn’t give you any events. In other words, you can use it to read the current value of a pin, but it won’t let you know when the value of a pin has changed. It looks like there is code to support this, but it’s not fully hooked up yet. I noticed that the C code is just using Epoll, so I tried using the epoll gem, and it works.

The rpi_gpio gem is cool because it allows your program to read from a pin without echoing to “export” and reading from a file. The gem just mmaps a special device, and then reads from memory. Unfortunately, it doesn’t seem like there is a way to generate “on change” events with that system. That means we have to write to the “export” file and run poll on the “value” file. As you can see from the example above, waiting for events (the thing we want to do) accounts for only a few lines of code, while managing the export / value files accounts for most of the function.

I am new to Raspberry PI development, so maybe there is an easier way, but I haven’t found it. At least this works so I can know when my cats are pooping.

The End.


I am a puts debuggerer

I love puts debugging

I am a puts debuggerer. I don’t say this to disparage people that use a Real Debugger. I think Real Debuggers are great, I’ve just never taken the time to learn one well. Every time I try to learn one, I end up not using it for a while and have to re-learn how to use it. Anyway, I want to share some tricks I use for puts debugging with you. I use these tricks whenever I don’t understand how something works and I want to know more. Most of the stuff below is definitely not a “best practice” and you should never leave this code around when you’re done with a debugging session. However, I think it’s important to say that when you’re debugging, anything goes. I mean anything. Global variables, redefining methods, adding conditionals, manipulating the load path, monkey patching, printing the call stack, anything.

I’ve tried to stick to real world examples in this post. However, many of these examples came from when I was trying to debug security issues in Rails, so please reuse the techniques I’m showing here but not the target code. The target code I’m trying to debug is broken, and I don’t want you to use it.

I’ve tried to organize each section into a headline with the problem I have, and the body as the solution I used.

Also, I gave up on using a consistent voice because this is a blog post and I don’t care.

I know where I am but not how I got here

Sometimes I’ll be debugging an issue and know where the problem is, but not how I got there. For those times I’ll just do puts caller to get the callstack.

For example I have this code:

  LOOKUP           = Hash.new { |h, k| h[k] = Type.new(k) unless k.blank? }

But I need to know how the default block is getting called, so I do this:

  LOOKUP           = Hash.new { |h, k|
    puts "#" * 90
    puts caller
    puts "#" * 90
    h[k] = Type.new(k) unless k.blank?
  }

The above code will print out 90 hash tags, then the call stack, then 90 more hash tags. I print out the hash tags so that if it gets called multiple times, I can easily differentiate the call stacks. Note that I am calling them “hash tags” in order to annoy you.

I do this pretty frequently, so I have a Vim shortcut to do it:

" puts the caller
nnoremap <leader>wtf oputs "#" * 90<c-m>puts caller<c-m>puts "#" * 90<esc>

That lets me do <leader>wtf and inserts those three lines below the line that I’m on.

I only want to print the call stack once

Just use exit! after printing the callstack, or call raise. raise will show an exception so you see the stack anyway.
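For example (this is throwaway debugging code, so exit! is fair game):

  LOOKUP           = Hash.new { |h, k|
    puts "#" * 90
    puts caller
    puts "#" * 90
    exit! # die right here so we only see the first stack
  }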

I only want to see the stack in certain cases

This is debugging code, so you can do whatever you want. Let’s say I just want to see the stack when adding something to the hash. Just do this:

  LOOKUP           = Hash.new { |h, k|
    unless k.blank?
      puts "#" * 90
      puts caller
      puts "#" * 90
      h[k] = Type.new(k)
    end
  }

Since I’m going to throw this code away anyway, I’m free to add any kind of weird conditionals I’d like!

I’m calling a method, but I don’t know where it goes

In the case where I’m calling a method, but I don’t know where that method is implemented, I’ll use the method method along with the source_location method. For example, I have an action in a controller that calls render and I want to find that method:

def show
  render params[:id]
end

I’ll change the above code to this:

def show
  p method(:render).source_location
  render params[:id]
end

Run this:

$ curl http://localhost:3000/users/xxxx

Then in the logs I see this:

Processing by UsersController#show as */*
  Parameters: {"id"=>"xxxx"}
["/Users/aaron/git/rails/actionpack/lib/action_controller/metal/instrumentation.rb", 40]
Completed 500 Internal Server Error in 35ms (ActiveRecord: 0.0ms)

Now I know that the render method is in instrumentation.rb on line 40.

I’m calling super but I don’t know where that goes

Here I can see that render calls super, but I don’t know where that is implemented. In that case, I use super_method on the return value of method.

So I change this:

def show
  p method(:render).source_location
  render params[:id]
end

To this:

def show
  p method(:render).super_method.source_location
  render params[:id]
end

Run the same request through, and I get this for the output:

Processing by UsersController#show as */*
  Parameters: {"id"=>"xxxx"}
["/Users/aaron/git/rails/actionpack/lib/action_controller/metal/rendering.rb", 34]
Completed 500 Internal Server Error in 34ms (ActiveRecord: 0.0ms)

Now I see that super goes here. That method also calls super, but I can just rinse and repeat the above process (or use a loop!) to find the method I actually care about.
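The loop version might look something like this:

def show
  m = method(:render)
  while m
    p m.source_location
    m = m.super_method # nil once there are no more super definitions
  end
  render params[:id]
end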

What if the thing implements the method method?

Sometimes the method trick doesn’t work because the object I want to know about implements its own version of method. For example, I am trying to find where the headers method is implemented on the request object with this code:

def index
  p request.method(:headers).source_location
  @users = User.all
end

When I run a request, I get this error:

ArgumentError (wrong number of arguments (given 1, expected 0)):
  app/controllers/users_controller.rb:5:in `index'

This is because the request object implements its own method called method. To figure out where the headers method is, I will unbind the method method from Kernel, rebind it to the request, then execute it like this:

def index
  method = Kernel.instance_method(:method)
  p method.bind(request).call(:headers).source_location
  @users = User.all
end

Running the request again, I get this for the output:

Processing by UsersController#index as */*
["/Users/aaron/git/rails/actionpack/lib/action_dispatch/http/request.rb", 201]

Now I know the headers method is implemented here.

I can even find the implementation of the method method like this:

def index
  method = Kernel.instance_method(:method)
  p method.bind(request).call(:method).source_location
  @users = User.all
end

I’m calling something but I don’t know where it goes (again)

Sometimes the immediate method isn’t the method I actually care about, so using the method trick to figure out where I’m going won’t help. In that case, I’ll use a larger hammer: TracePoint. We can redo the render example to get a list of all methods that are called from render. The methods we’ll see listed may not be called directly from render, but somewhere down the line.

  # GET /users
  # GET /users.json
  def index
    @users = User.all
    tp = TracePoint.new(:call) do |x|
      p x
    end
    tp.enable
    render 'index'
  ensure
    tp.disable
  end

The TracePoint allocated here will fire on every “call” event, and the block will print out the method name and location of the call. A “call” event fires when a Ruby method gets called (not a C method). If you want to see C methods as well, use :c_call. This example will produce a ton of output. I really only use this technique when the number of methods being called is fairly small, or I don’t even know where to begin looking.
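When the output is too noisy, I’ll add a conditional inside the block, for example only printing calls defined in files whose path matches something I care about (the “actionview” filter here is just an illustration):

tp = TracePoint.new(:call) do |x|
  # x.path is the file where the called method is defined
  p x if x.path.include?("actionview")
end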

I know an exception is getting raised, but I don’t know where

Sometimes an exception gets raised, but I don’t know where the exception actually got raised. This example uses Rails 3.0.0 (we’ve since fixed this issue), but say you have this code:

ActiveRecord::Base.logger = Logger.new $stdout

User.connection.execute "oh no!"

I know this SQL won’t work, and there will be an exception. But let’s see what the exception actually looks like:

  SQL (0.1ms)  oh no!
SQLite3::SQLException: near "oh": syntax error: oh no!
activerecord-3.0.0/lib/active_record/connection_adapters/abstract_adapter.rb:202:in `rescue in log': SQLite3::SQLException: near "oh": syntax error: oh no! (ActiveRecord::StatementInvalid)
	from activerecord-3.0.0/lib/active_record/connection_adapters/abstract_adapter.rb:194:in `log'
	from activerecord-3.0.0/lib/active_record/connection_adapters/sqlite_adapter.rb:135:in `execute'
	from test.rb:6:in `<top (required)>'
	from railties-3.0.0/lib/rails/commands/runner.rb:48:in `eval'
	from railties-3.0.0/lib/rails/commands/runner.rb:48:in `<top (required)>'
	from railties-3.0.0/lib/rails/commands.rb:39:in `require'
	from railties-3.0.0/lib/rails/commands.rb:39:in `<top (required)>'
	from script/rails:6:in `require'
	from script/rails:6:in `<main>'

If you read this backtrace, it looks like the exception came from abstract_adapter.rb line 202. However, what you’ll quickly notice is that this code is rescuing an exception, and then re-raising. So where did the real exception get raised? To find it, we could add some puts statements, or use the -d flag on Ruby like this:

[aaron@TC okokok (master)]$ bundle exec ruby -d script/rails runner test.rb

The -d flag will enable warnings and also print out every line where an exception was raised. Yes, this definitely prints a bunch of stuff, but if you look at the output near the end it will look something like this:

Exception `NameError' at activesupport-3.0.0/lib/active_support/core_ext/module/remove_method.rb:3 - method `_validators' not defined in Class
Exception `SQLite3::SQLException' at sqlite3-1.3.11/lib/sqlite3/database.rb:91 - near "oh": syntax error
Exception `SQLite3::SQLException' at activesupport-3.0.0/lib/active_support/notifications/instrumenter.rb:24 - near "oh": syntax error
  SQL (0.1ms)  oh no!
SQLite3::SQLException: near "oh": syntax error: oh no!
Exception `ActiveRecord::StatementInvalid' at activerecord-3.0.0/lib/active_record/connection_adapters/abstract_adapter.rb:202 - SQLite3::SQLException: near "oh": syntax error: oh no!
Exception `ActiveRecord::StatementInvalid' at railties-3.0.0/lib/rails/commands/runner.rb:48 - SQLite3::SQLException: near "oh": syntax error: oh no!
activerecord-3.0.0/lib/active_record/connection_adapters/abstract_adapter.rb:202:in `rescue in log': SQLite3::SQLException: near "oh": syntax error: oh no! (ActiveRecord::StatementInvalid)
	from activerecord-3.0.0/lib/active_record/connection_adapters/abstract_adapter.rb:194:in `log'
	from activerecord-3.0.0/lib/active_record/connection_adapters/sqlite_adapter.rb:135:in `execute'
	from test.rb:6:in `<top (required)>'
	from railties-3.0.0/lib/rails/commands/runner.rb:48:in `eval'
	from railties-3.0.0/lib/rails/commands/runner.rb:48:in `<top (required)>'
	from railties-3.0.0/lib/rails/commands.rb:39:in `require'
	from railties-3.0.0/lib/rails/commands.rb:39:in `<top (required)>'
	from script/rails:6:in `require'
	from script/rails:6:in `<main>'

You’ll see the original exception happened here:

Exception `SQLite3::SQLException' at sqlite3-1.3.11/lib/sqlite3/database.rb:91 - near "oh": syntax error

And it was re-raised here:

Exception `SQLite3::SQLException' at activesupport-3.0.0/lib/active_support/notifications/instrumenter.rb:24 - near "oh": syntax error

When exceptions get wrapped and re-raised, the wrapper should expose the originating backtrace. So this was a bug, but it has since been fixed, and we can cover how to fix that another day.
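
As an aside, modern Rubies (2.1 and later) also track the original exception for you: raising from inside a rescue sets Exception#cause automatically. A minimal sketch:

begin
  begin
    raise ArgumentError, "original"
  rescue
    raise RuntimeError, "wrapped"
  end
rescue => wrapped
  p wrapped        # => #<RuntimeError: wrapped>
  p wrapped.cause  # => #<ArgumentError: original>
end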

I want to run a command line tool with -d

Say you want to use the above technique to run your RSpec tests with the -d flag. You can do this:

$ ruby -d -S rspec

I want to use the -d flag but I don’t know how to run the process

The default Rake test task will run your tests in a sub process. This means that running ruby -d -S rake will not set the debug flag in the sub-process that runs your tests. In this case I use the RUBYOPT environment variable like this:

[aaron@TC okokok (master)]$ RUBYOPT=-d bundle exec rake test

The RUBYOPT environment variable will get applied to every Ruby program that is executed in this shell, even subprocesses spawned by rake. This means the rspec command above could be rewritten as:

$ RUBYOPT=-d rspec

I need to find where this object came from

Usually printing the callstack will tell me where an object came from. But sometimes the object is allocated outside of the stack, so it’s difficult to figure out where it came from. Take this code for example:

def foo
  x = baz
  bar x
end

def bar x
  puts "#" * 90
  puts caller
  puts "#" * 90
  p x
end

def baz; zot;        end
def zot; Object.new; end

foo

I used the “wtf” technique from “I know where I am but not how I got here”, and printed out the value of x. From that, I learned that I really care about how that x value was allocated. If I trace back up the stack to the foo method, I can see that it calls baz to get the value. In large code bases, it can be very tricky to follow all the calls and logic through a sibling tree (if you think about the code as a graph, then the foo method has two children, baz and bar, so you can think of baz as being a sibling to bar). I am lazy and I don’t want to go chasing through all the methods to find how this object came to be, so I like to use Ruby’s object allocation tracer, which has been available since Ruby 2.1 (don’t quote me on that). When I use it, I require and enable it as soon as possible. Then I print out the allocation location at the place I care about, like this:

require 'objspace'
ObjectSpace.trace_object_allocations_start

def foo
  x = baz
  bar x
end

def bar x
  p ObjectSpace.allocation_sourcefile(x) => ObjectSpace.allocation_sourceline(x)
end

def baz; zot;        end
def zot; Object.new; end

foo

When I run the program, the output is like this:

[aaron@TC tlm.com (master)]$ ruby x.rb 
{"x.rb"=>14}
[aaron@TC tlm.com (master)]$ 

I see that the object is allocated on line 14 in x.rb. So I go to that line and that method, then repeat the “wtf” trick or this trick until I figure out what is wrong with the program.

I usually start object tracing ASAP because I don’t know where my object will be allocated. The tracing slows down the program, but not enough to really matter while I’m debugging.

I need to require something really, really early

The above technique will only give you allocation information for objects that have been allocated after trace_object_allocations_start has been called. Let’s say the object is being allocated at file require time, but you don’t know what file or where. In this case, I want to execute some code way before anything in the framework gets required. To do that, I’ll use the -r flag and write a script.

Say there’s a script where we’re trying to find the allocation location of User::BLACKLISTED_CLASS_METHODS.

require 'active_record'

ActiveRecord::Base.establish_connection adapter: 'sqlite3', database: ':memory:'

ActiveRecord::Base.connection.instance_eval do
  create_table :users
end

class User < ActiveRecord::Base; end
p ObjectSpace.allocation_sourcefile(User::BLACKLISTED_CLASS_METHODS) =>
    ObjectSpace.allocation_sourceline(User::BLACKLISTED_CLASS_METHODS)

But we don’t know which file allocates that constant, and don’t want to figure it out (yes, this is slightly contrived). I will write a file, in this case named y.rb, that looks like this:

require 'objspace'
ObjectSpace.trace_object_allocations_start

Then I’ll use the command line arguments to Ruby to require this file and execute my program:

[aaron@TC tlm.com (master)]$ ruby -I. -ry x.rb
{"/Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.4.0/gems/activerecord-5.0.0.beta1/lib/active_record/attribute_methods.rb"=>35}
[aaron@TC tlm.com (master)]$ 

If we break down the arguments to Ruby, -I. says “add . to the load path”, -ry is the same as require 'y', and then x.rb is the program to run. So . is added to the load path, and y.rb is required before Ruby even starts running x.rb. Then we see that BLACKLISTED_CLASS_METHODS was allocated in attribute_methods.rb line 35. You can combine this technique with the RUBYOPT technique in case your code is run in a subprocess, like this:

$ RUBYOPT='-I. -ry' rake test

An object is being mutated, but I don’t know where

Sometimes I run into an object that I know is being mutated, but I don’t know where the mutation is happening. In that case, I will call freeze on the object, run my test, and then see where an exception occurs. As an example, I needed to figure out where this variable was being mutated. I knew it was being mutated, and I could see in the code that it was, but I didn’t know where that was happening first. So I changed the code to be like this:

def initialize
  super()

  @cv = new_cond

  @sharing = Hash.new(0)
  @sharing.freeze
  @waiting = {}
  @exclusive_thread = nil
  @exclusive_depth = 0
end

Booting the server gave me an exception with a stack trace:

active_support/concurrency/share_lock.rb:151:in `delete': can't modify frozen Hash (RuntimeError)
	from active_support/concurrency/share_lock.rb:151:in `yield_shares'
	from active_support/concurrency/share_lock.rb:79:in `block in stop_exclusive'

Now I know where the first mutation happened. If that’s not the one I care about, I can remove the freeze I added earlier, and add it after this mutation.
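
The freeze trick works on any mutable object, not just that hash. A minimal, self-contained sketch (the config hash and the mutation site are made up for illustration):

config = { retries: 1 }
config.freeze

# ... somewhere far away in the codebase, the mutation we're hunting:
config[:retries] = 5
# => can't modify frozen Hash (RuntimeError on older Rubies, FrozenError
#    on 2.5+), with a backtrace pointing right at this line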

I have a deadlock, but I don’t know where

When I’m having issues with threads and I don’t know where the problem is, I’ll add some code to inspect the live threads. For example, I was debugging this issue. The application server was hanging, but I didn’t know where. To find where the threads were hanging, I wrote a file called x.rb with code like this:

trap(:INFO) {
  Thread.list.each do |t|
    puts "#" * 90
    p t
    puts t.backtrace
    puts "#" * 90
  end
}

Then booted the app server like this:

$ ruby -I. -rx bin/rails s

Now when the app server hangs, I hit Ctrl-T (sorry, this only works on OS X, you’ll need to use kill on Linux), and I’ll see a stack trace for every thread:

##########################################################################################
#<Thread:0x007f90bc07cb38 run>
omglolwut/x.rb:7:in `backtrace'
omglolwut/x.rb:7:in `block (2 levels) in <top (required)>'
omglolwut/x.rb:4:in `each'
omglolwut/x.rb:4:in `block in <top (required)>'
gems/puma-2.16.0/lib/rack/handler/puma.rb:43:in `join'
gems/puma-2.16.0/lib/rack/handler/puma.rb:43:in `run'
gems/rack-2.0.0.alpha/lib/rack/server.rb:296:in `start'
rails/commands/server.rb:78:in `start'
rails/commands/commands_tasks.rb:90:in `block in server'
rails/commands/commands_tasks.rb:85:in `tap'
rails/commands/commands_tasks.rb:85:in `server'
rails/commands/commands_tasks.rb:49:in `run_command!'
rails/command.rb:20:in `run'
rails/commands.rb:19:in `<top (required)>'
bin/rails:4:in `require'
bin/rails:4:in `<main>'
##########################################################################################
##########################################################################################
#<Thread:0x007f90bef3b668@/Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.4.0/gems/puma-2.16.0/lib/puma/reactor.rb:136 sleep>
lib/puma/reactor.rb:29:in `select'
lib/puma/reactor.rb:29:in `run_internal'
lib/puma/reactor.rb:138:in `block in run_in_thread'
##########################################################################################
##########################################################################################
#<Thread:0x007f90bef3b500@/Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.4.0/gems/puma-2.16.0/lib/puma/thread_pool.rb:216 sleep>
lib/puma/thread_pool.rb:219:in `sleep'
lib/puma/thread_pool.rb:219:in `block in start!'
##########################################################################################
##########################################################################################
#<Thread:0x007f90bef3b3c0@/Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.4.0/gems/puma-2.16.0/lib/puma/thread_pool.rb:187 sleep>
lib/puma/thread_pool.rb:190:in `sleep'
lib/puma/thread_pool.rb:190:in `block in start!'
##########################################################################################
##########################################################################################
#<Thread:0x007f90bef3b258@/Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.4.0/gems/puma-2.16.0/lib/puma/server.rb:296 sleep>
lib/puma/server.rb:322:in `select'
lib/puma/server.rb:322:in `handle_servers'
lib/puma/server.rb:296:in `block in run'
##########################################################################################
##########################################################################################
#<Thread:0x007f90c1ef9a08@/Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.4.0/gems/puma-2.16.0/lib/puma/thread_pool.rb:61 sleep>
lib/ruby/2.4.0/monitor.rb:111:in `sleep'
lib/ruby/2.4.0/monitor.rb:111:in `wait'
lib/ruby/2.4.0/monitor.rb:111:in `wait'
lib/ruby/2.4.0/monitor.rb:132:in `wait_until'
action_dispatch/http/response.rb:170:in `block in await_commit'
lib/ruby/2.4.0/monitor.rb:214:in `mon_synchronize'
action_dispatch/http/response.rb:169:in `await_commit'
action_controller/metal/live.rb:270:in `process'
action_controller/metal.rb:193:in `dispatch'
action_controller/metal.rb:265:in `dispatch'
action_dispatch/routing/route_set.rb:50:in `dispatch'
action_dispatch/routing/route_set.rb:32:in `serve'
##########################################################################################
##########################################################################################
#<Thread:0x007f90bd1d5f38@/Users/aaron/git/rails/actionpack/lib/action_controller/metal/live.rb:279 sleep>
lib/ruby/2.4.0/monitor.rb:111:in `sleep'
lib/ruby/2.4.0/monitor.rb:111:in `wait'
lib/ruby/2.4.0/monitor.rb:111:in `wait'
lib/ruby/2.4.0/monitor.rb:123:in `wait_while'
active_support/concurrency/share_lock.rb:57:in `block (2 levels) in start_exclusive'
active_support/concurrency/share_lock.rb:154:in `yield_shares'
active_support/concurrency/share_lock.rb:56:in `block in start_exclusive'
lib/ruby/2.4.0/monitor.rb:214:in `mon_synchronize'
active_support/concurrency/share_lock.rb:51:in `start_exclusive'
active_support/concurrency/share_lock.rb:113:in `exclusive'
active_support/dependencies/interlock.rb:12:in `loading'
active_support/dependencies.rb:37:in `load_interlock'
active_support/dependencies.rb:369:in `require_or_load'
active_support/dependencies.rb:529:in `load_missing_constant'
active_support/dependencies.rb:212:in `const_missing'
active_support/dependencies.rb:561:in `load_missing_constant'
active_support/dependencies.rb:212:in `const_missing'
app/controllers/users_controller.rb:9:in `index'
##########################################################################################

I’ve trimmed the output a bit to make it easier to read, but you can now see what each thread is doing, and it’s clear which two threads are deadlocking (it’s the last two).
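
On Linux there is no SIGINFO (and so no Ctrl-T), but a hedged variation of the same x.rb works: trap a different signal, like SIGUSR1, and send it from another terminal with kill -USR1 <pid>:

trap(:USR1) {
  Thread.list.each do |t|
    puts "#" * 90
    p t
    puts t.backtrace
    puts "#" * 90
  end
}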

I want to know when a method is executed, but only at a certain time

Sometimes I want to know when a method is executed, but only after a certain point in time. For example, I want to know when start_exclusive is called, but only after the application boots up. In these cases, I’ll mix global variables along with the trap trick I showed earlier.

First I modify start_exclusive to look like this:

def start_exclusive(purpose: nil, compatible: [], no_wait: false)
  if $foo
    puts "#" * 90
    puts caller
    puts "#" * 90
  end
  # ..
end

Then I write an x.rb file like this:

trap(:INFO) {
  puts "turning on debugging!"
  $foo = true
}

Then I boot the app server, and once everything has warmed up, I hit Ctrl-T and now my global variable is set to true and I start seeing logging messages.

END

This is everything I can think of right now. You should also check out Richard’s post on puts debugging. Have a good day. The End.

Inline caching in MRI

Inline Caching

Every time you call a method, Ruby must look up the method you want to call before actually calling that method. Figuring out which method needs to be called can be a costly operation, so one thing that virtual machine authors try to do is speed up that process by caching the method lookup. MRI (and most other Ruby virtual machines) stores this cache “inline”. Most of us know what a cache is, but what is “inline”?

What is an “inline cache”?

Inline caches are just caches that are stored “inline” with the bytecode generated from your Ruby program. If we write a simple program and disassemble it, we can see the inline caches.

Try out this program:

def foo bar, baz
  bar.baz
  baz.baz
end

ins = RubyVM::InstructionSequence.of method(:foo)
puts ins.disasm

If you run the above program, you’ll get output that looks very similar to this:

$ ruby x.rb 
== disasm: #<ISeq:foo@x.rb>=============================================
local table (size: 3, argc: 2 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 3] bar<Arg>   [ 2] baz<Arg>   
0000 trace            8                                               (   1)
0002 trace            1                                               (   2)
0004 getlocal_OP__WC__0 3
0006 opt_send_without_block <callinfo!mid:baz, argc:0, ARGS_SIMPLE>, <callcache>
0009 pop              
0010 trace            1                                               (   3)
0012 getlocal_OP__WC__0 2
0014 opt_send_without_block <callinfo!mid:baz, argc:0, ARGS_SIMPLE>, <callcache>
0017 trace            16                                              (   4)
0019 leave                                                            (   3)

On lines 0006 and 0014, you’ll see three values. The first value opt_send_without_block is the virtual machine instruction, and the next two values are the parameters to the instruction. The first parameter is the method you want to call along with some information about the method parameters. The second parameter <callcache> is an instance of a “call cache” and it is stored inline with the bytecode (hence “inline cache”).

What is the cache key and value?

This bit differs from VM to VM, but in MRI, there are two keys to the cache. The two keys are the “global method state” and a “class serial number”. The value of the cache is the method we’re going to call.

Global Method State

The “global method state” is a global serial number that increments every time certain classes get mutated (for example with monkey patching). You can see the current value of the global method state with RubyVM.stat. Here is an example to demonstrate changing the global state:

p RubyVM.stat

module Kernel
  def aaron
  end
end

p RubyVM.stat

If you run the above code, you’ll see output like this:

{:global_method_state=>132, :global_constant_state=>824, :class_serial=>5664}
{:global_method_state=>133, :global_constant_state=>824, :class_serial=>5664}

Adding a method to Kernel caused the global method state number to change. Changing this value at runtime is very bad because it invalidates all of the call caches (remember the call cache key is the “class serial number” and the “global method state”).

Class Serial Number

The other cache key is called the “class serial number”. This number is derived from the class of the object receiving the method call at a particular call site (including any singleton class associated with that object). Each class in Ruby has a serial number associated with it. We can see the class serial number change by looking at RubyVM.stat again:

p RubyVM.stat

class A
end

p RubyVM.stat

class B
end

p RubyVM.stat

If you run the above program, it will look something like this:

{:global_method_state=>132, :global_constant_state=>824, :class_serial=>5664}
{:global_method_state=>132, :global_constant_state=>825, :class_serial=>5666}
{:global_method_state=>132, :global_constant_state=>826, :class_serial=>5668}

You’ll notice that the class serial number is increasing. Each class gets assigned a serial number. Now, doing certain things to the class will increase the serial number. For example, if we reopen the class, the serial number will change:

p RubyVM.stat

class A
  def foo; end
end

p RubyVM.stat

class A
  def bar; end
end

p RubyVM.stat

Output:

{:global_method_state=>132, :global_constant_state=>824, :class_serial=>5664}
{:global_method_state=>132, :global_constant_state=>825, :class_serial=>5667}
{:global_method_state=>132, :global_constant_state=>825, :class_serial=>5668}

We can’t inspect the serial number associated with each class, but inspecting the output of RubyVM.stat does show the side effects of reopening the class on the serial numbers. The lesson from this is don’t reopen classes at runtime because this will invalidate caches that use that class as a key.

Recap

Let’s have a short recap. Given this code:

class A
  def baz; end
end

def foo bar
  bar.baz
end

foo(A.new)

There will be one call cache at the bar.baz line, the key for that cache will be the “global method state” (just some global value) and the class serial number for A. The value of the cache will be the location of the method baz for that class.
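
To make that concrete, here is a toy simulation of the cache check written in plain Ruby. This is only a sketch of the idea: CacheEntry, fake_serial, and slow_lookup are all made up, and MRI’s real implementation is C code that also checks the global method state.

CacheEntry = Struct.new(:class_serial, :cached_method)

def fake_serial(klass)
  klass.object_id # stand-in for MRI's per-class serial number
end

def slow_lookup(recv, mid)
  puts "slow path: looking up #{mid} on #{recv.class}"
  recv.class.instance_method(mid)
end

def cached_call(cache, recv, mid)
  if cache.class_serial == fake_serial(recv.class)
    cache.cached_method.bind(recv).call           # cache hit
  else
    m = slow_lookup(recv, mid)                    # cache miss: look it up
    cache.class_serial = fake_serial(recv.class)  # and fill the cache
    cache.cached_method = m
    m.bind(recv).call
  end
end

class A
  def baz; :baz; end
end

cache = CacheEntry.new
cached_call(cache, A.new, :baz) # miss: prints the slow path message
cached_call(cache, A.new, :baz) # hit: no slow path this time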

Types of call sites and caches

So far, we’ve talked about where the inline caches are, as well as the key and value for the cache. But we haven’t talked about the size of the cache, or different types of caches. The size of the cache in MRI is one. Yes, one. It caches exactly one key and one value. This is called a “monomorphic” cache. Let’s take the following example:

class A
  def baz; end
end

def foo bar
  bar.baz
end

foo(A.new)
foo(A.new)

The first time foo is called, MRI compares the cache entry’s global method state with the current global method state, and the serial number for the A class with the cache’s serial number. You can see the code for that here. Since there is no entry in the cache at this point, it is a cache miss, so Ruby goes through the slow path and looks up the method and caches it. You can see the cache value being stored here. The second time the method is called in this example, it’s a hit and we don’t have to take the slow path.

Now consider the following code:

class A
  def baz; end
end

class B
  def baz; end
end

def foo bar
  bar.baz
end

loop do
  foo(A.new)
  foo(B.new)
end

The above example will never hit the cache because the type for bar changes every time foo is called. Remember that part of the cache key is the class serial number. Since the serial number for A is different than the serial number for B, the cache at the bar.baz call site is never hit. We can call this call site “polymorphic” because it has multiple types for the variable bar.

Let’s do one more example:

def foo bar
  bar.baz
end

loop do
  klass = Class.new {
    def baz
    end
  }
  foo(klass.new)
end

This example never hits the cache either, because every class created in the loop has its own serial number. Also, in the previous example, the bar.baz call site only ever saw two types, A and B. This case sees an unbounded number of types. We can call this case “megamorphic”. I consider this case to be “too hot to handle”.

Recap

So, we essentially have 3 types of call sites: “monomorphic” is a call site that only sees one type, “polymorphic” sees many types, and “megamorphic” sees too many types. Megamorphic call sites are also polymorphic, but they see so many types that we don’t want to deal with them.

Ruby’s inline cache only caches one value, and is most effective when the call site is monomorphic. However, Ruby’s cache always stores the last value seen. This means that if a call site is polymorphic, but doesn’t switch types very frequently, then the cost of method lookup when switching types will be amortized.
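
For example (a sketch of my own to illustrate the amortization), this call site sees two types, but the type only switches once, so all but two of the 2000 calls are cache hits:

class A; def baz; end; end
class B; def baz; end; end

def foo bar
  bar.baz
end

1000.times { foo(A.new) } # one miss, then hits
1000.times { foo(B.new) } # one miss when the type switches, then hits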

Inspecting cache hits / misses

I’ve posted a fork of Ruby that you can find here that has a patch to let you see cache hits and misses. It adds a tracepoint for tracking those inline cache hits and misses. Don’t use this in production as it slows down your interpreter, but it is fun for testing.

Let’s take our first example, and trace the cache hits and misses:

class A
  def baz; end
end

def foo bar
  bar.baz
end

tp = TracePoint.new(:inline_cache_hit, :inline_cache_miss) do |x|
  p x => x.callcache_id
end

tp.enable
foo(A.new)
foo(A.new)

If you run with my fork, you’ll see this output:

{#<TracePoint:inline_cache_miss@x.rb:14>=>[3386, 5666, "Class"]}
{#<TracePoint:inline_cache_miss@x.rb:14>=>[3387, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:6>=>[3380, 5667, "A"]}
{#<TracePoint:inline_cache_miss@x.rb:15>=>[3388, 5666, "Class"]}
{#<TracePoint:inline_cache_miss@x.rb:15>=>[3389, 4052, "Object"]}
{#<TracePoint:inline_cache_hit@x.rb:6>=>[3380, 5667, "A"]}

The left side of the hash is the tracepoint type, and the right side is an array containing a unique ID for that call site, the serial number of the class, and the class name itself. You can see the very last line is a cache hit for A (as we expected).

Let’s try with the second example:

class A
  def baz; end
end

class B
  def baz; end
end

def foo bar
  bar.baz
end

tp = TracePoint.new(:inline_cache_hit, :inline_cache_miss) do |x|
  p x => x.callcache_id
end

tp.enable
2.times do
  foo(A.new)
  foo(B.new)
end

The output looks like this:

{#<TracePoint:inline_cache_miss@x.rb:18>=>[3391, 4869, "Fixnum"]}
{#<TracePoint:inline_cache_miss@x.rb:19>=>[3384, 5666, "Class"]}
{#<TracePoint:inline_cache_miss@x.rb:19>=>[3385, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:10>=>[3381, 5667, "A"]}
{#<TracePoint:inline_cache_miss@x.rb:20>=>[3386, 5669, "Class"]}
{#<TracePoint:inline_cache_miss@x.rb:20>=>[3387, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:10>=>[3381, 5670, "B"]}
{#<TracePoint:inline_cache_hit@x.rb:19>=>[3384, 5666, "Class"]}
{#<TracePoint:inline_cache_hit@x.rb:19>=>[3385, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:10>=>[3381, 5667, "A"]}
{#<TracePoint:inline_cache_hit@x.rb:20>=>[3386, 5669, "Class"]}
{#<TracePoint:inline_cache_hit@x.rb:20>=>[3387, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:10>=>[3381, 5670, "B"]}

You’ll see some cache hits in here, but none of them are inside the foo method.

END

I’m getting tired of writing this post. For those that want a sneak peek at what I’ll write about next, you can find a patch I wrote to give MRI a polymorphic inline cache here. The TL;DR for that post will be “yes the PIC works, no it doesn’t speed up our app”.

Predicting Test Failures

Running tests is the worst. Seriously. It takes forever, and by the time they’re all done running, I forgot what I was doing. Some apps take 20 to 30 minutes to run all the tests, and I just can’t wait that long. What bothers me even more than waiting so long is that after I make a change to the code, 99% of the tests aren’t even running the code that I changed! Why am I spending time running code that is unrelated to the change I made?

On the Rails team, we keep talking about ways to generally make Rails app test suites faster by introducing things like parallel testing. But what if we could predict which tests are likely to fail after we’ve changed the code? If we run just those tests, then we could send a pull request and let Travis run the thousands of unrelated tests in the whole suite.

An excellent way to cut your test time in half is to only run half the tests!

Regression Test Selection

Picking the tests that may fail depending on the code you’ve changed is called Regression Test Selection. Today we’re going to build a regression test selector for Minitest and RSpec. The goal of this program is to answer the question:

If I modify line A of file B, what tests should I run?

In order to build this, we need to:

  • Collect code coverage info on a per test basis
  • Figure out what code has changed
  • Map changed code back to existing tests

Let’s start with collecting per-test coverage.

Collecting per-test coverage

At a high level, in order to collect per-test coverage, what we’ll do is:

  1. Record a snapshot of the current coverage info
  2. Run the test
  3. Record a snapshot of the current coverage info

In order to do this, you’ll need to be running trunk Ruby which has this patch. Our code will look somewhat like this:

require 'coverage'

Coverage.start # start the coverage tool

before = Coverage.peek_result # grab the current coverage
somehow_run_the_test
after = Coverage.peek_result # grab the current coverage

save_the_test_info_with_coverage(before, after)

For each test framework we need to figure out how to wrap each test case with the code to collect coverage information, and record that coverage information along with enough information to rerun just that one test.

First we’ll implement this using Minitest, then implement with RSpec.

Here is our code for Minitest:

require 'coverage'
require 'json'

Coverage.start

require 'minitest'

class Minitest::Runnable
  LOGS = []

  Minitest.after_run {
    File.open('run_log.json', 'w') { |f| f.write JSON.dump LOGS }
  }

  class << self
    alias :old_run_one_method :run_one_method

    def run_one_method klass, method_name, reporter
      before = Coverage.peek_result
      old_run_one_method klass, method_name, reporter
      after = Coverage.peek_result
      LOGS << [ klass.name, method_name.to_s, before, after ]
    end
  end
end

To integrate with Minitest, we need to monkey patch it; I couldn’t figure out a better way to do this. Anyway, the run_one_method method is the method that runs one test case. We alias off Minitest’s implementation, then add our own. Our implementation grabs the current coverage info, then calls the old implementation (which runs the test), then grabs coverage info again. Once we have coverage info and test info, we add that to the LOGS array. When Minitest is done running, it will execute the block we provided to after_run, where we write out the test and coverage information.

Now the RSpec version:

require 'coverage'
require 'json'
require 'rspec'

LOGS = []
Coverage.start

RSpec.configuration.after(:suite) {
  File.open('run_log.json', 'w') { |f| f.write JSON.dump LOGS }
}

RSpec.configuration.around(:example) do |example|
  before = Coverage.peek_result
  example.call
  after = Coverage.peek_result
  LOGS << [ example.full_description, before, after ]
end

There’s really not much difference between the two. The main changes in the RSpec version are that I don’t have to monkey patch anything, and we record example.full_description rather than the class and method name.

Now that we’ve got this code, we can run the whole suite and collect coverage information that is split by test. We can figure out what code each test executes. Next we need to figure out what code changed.

What code changed?

This example is only going to work with git repositories. To figure out what code changed, we’ll be using the rugged gem. The rugged gem wraps up libgit2 and gives you access to information about git repositories. With it, we can figure out what files and lines were modified.

To keep these examples short, we’ll take a very naive approach and just say that we’re only interested in lines that have been added or deleted. If a line was added or deleted, we want to find the tests that execute that same line.

require 'rugged'
require 'set'

repo = Rugged::Repository.new '.'
lines_to_run = Set.new

repo.index.diff.each_patch { |patch|
  file = patch.delta.old_file[:path]

  patch.each_hunk { |hunk|
    hunk.each_line { |line|
      case line.line_origin
      when :addition
        lines_to_run << [file, line.new_lineno]
      when :deletion
        lines_to_run << [file, line.old_lineno]
      when :context
        # do nothing
      end
    }
  }
}

This code opens the git repository, gets a diff from the index, and iterates over each patch. For each patch, it looks at each hunk, and each line of the hunk. If the line was an addition or deletion, we store the file name and line number of the change.

So if the output of git diff looks like this:

diff --git a/lib/my_thing.rb b/lib/my_thing.rb
index 806deff..eb057b9 100644
--- a/lib/my_thing.rb
+++ b/lib/my_thing.rb
@@ -4,7 +4,7 @@ class Whatever
   end
 
   def bar
-    "bar #{@foo}"
+    raise
   end
 
   def baz

The lines_to_run set will contain one array like this:

#<Set: {["lib/my_thing.rb", 7]}>

Now that we have the lines to execute, lets map those back to tests.

Mapping back to tests

For each test, we recorded two pieces of coverage information. We recorded the coverage before the test ran, then we recorded the coverage after the test ran. We need to compute the difference between the two in order to figure out what lines the test ran.

The following function computes the difference:

def diff before, after
  after.each_with_object({}) do |(file_name,line_cov), res|
    before_line_cov = before[file_name]

    # skip arrays that are exactly the same
    next if before_line_cov == line_cov

    # subtract the old coverage from the new coverage
    cov = line_cov.zip(before_line_cov).map do |line_after, line_before|
      if line_after
        line_after - line_before
      else
        line_after
      end
    end

    # add the "diffed" coverage to the hash
    res[file_name] = cov
  end
end

Coverage information is returned from the coverage tool as a hash, where the keys are file names and the value is an array with one index per line in the source file. The value at each index represents how many times that line has been run (or nil if the line isn’t executable).

The above function iterates through the before and after hashes, subtracting the “before” coverage from the “after” coverage and produces a hash where the keys are file names and the value is the coverage information just for that test.
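
Here is a quick usage example with made-up coverage arrays to show the shape of the data (line 4 is nil because it isn’t executable):

before = { "lib/my_thing.rb" => [1, 1, 0, nil] }
after  = { "lib/my_thing.rb" => [1, 2, 1, nil] }

p diff(before, after)
# => {"lib/my_thing.rb"=>[0, 1, 1, nil]}
#    only lines 2 and 3 ran during this particular test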

Now that we can compute per-test coverage information, we need to map the code changes back to test methods. The modified file and line numbers are the key. We need to be able to look up tests by file name and line number.

cov_map = Hash.new { |h, file|
  h[file] = Hash.new { |i, line|
    i[line] = []
  }
}

File.open('run_log.json') do |f|
  # Read in the coverage info
  JSON.parse(f.read).each do |desc, before, after|

    # calculate the per test coverage
    delta = diff before, after

    delta.each_pair do |file, lines|
      file_map = cov_map[file]

      lines.each_with_index do |val, i|
        # skip lines that weren't executed
        next unless val && val > 0

        # add the test name to the map. Multiple tests can execute the same
        # line, so we need to use an array.  Arrays are 0 indexed, but `rugged`
        # gives line numbers starting at 1, so we need to add one to `i`.
        file_map[i + 1] << desc
      end
    end
  end
end

The above snippet reads in the coverage JSON, calculates the coverage for each test, then inserts the test into cov_map, where the file name and line number are the key, and the value is a list of tests. More than one test can run any particular line of source, so we need to keep a list of tests for each file name and line number.
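
Roughly, the resulting structure looks like this (the file name and test descriptions here are made up):

# file name => { line number => tests that executed that line }
cov_map = {
  "/path/to/lib/my_thing.rb" => {
    7  => ["Whatever bar works", "Whatever baz works"],
    12 => ["Whatever baz works"]
  }
}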

Now we need to combine the information from Rugged, and the information from our coverage map to produce a list of tests to run:

lines_to_run.each do |file, line|
  cov_map[File.expand_path(file)][line].each do |desc|
    puts desc
  end
end

lines_to_run came from Rugged, and of course cov_map came from our coverage information. All this snippet does is iterate over the lines of code that have changed, look up the tests that execute each particular line, and print them out.

I guess this is pretty anti-climactic, but now you are able to predict which tests will fail given a change to your codebase.

All of this code is available here. If you don’t actually want to type stuff, you can see a video of me predicting which tests will fail here.

Conclusion

I think that failure prediction and regression test selection can be a great tool for people like me that work on legacy code bases. Obviously, the value of a tool like this diminishes as your tests get faster, but if you have to work with slow tests, then I think this is a good way to save time.

Also, please take this idea. Please please please take this idea. I want this tool to exist as a gem, but I don’t have time to maintain it. So if you want to take this idea and package it up, then please do it!

Other ideas

Now that we can collect coverage information incrementally, I was thinking it would be nice if we made a Rack handler that recorded the coverage information on a per-request basis. Then we could do interesting things like cross reference code execution in the real world with what our tests actually execute.

I hope you enjoyed this article. Please have a good day!!

Weird stuff with hashes

Ok, so this isn’t really weird stuff, I don’t think. But maybe people don’t know about it so I thought I would post.

Hashes dup string keys

When you assign a string key to a hash, the hash copies the string and freezes it. Here is an example:

>> x = 'string'
=> "string"
>> y = {}
=> {}
>> y[x] = :value
=> :value
>> { x => y.keys.first }
=> {"string"=>"string"}
>> { x.object_id => y.keys.first.object_id }
=> {70157126548460=>70157126526760} # object ids are different
>> { x.frozen? => y.keys.first.frozen? }
=> {false=>true} # frozen value is different

You can prevent the string from being duped by freezing the string before putting it in the hash:

>> x = 'string'
=> "string"
>> x.freeze
=> "string"
>> y = {}
=> {}
>> y[x] = :value
=> :value
>> { x.object_id => y.keys.first.object_id }
=> {70157126414020=>70157126414020} # note that both object ids are the same

Why is this important?

I’ve been working with @eileencodes to improve the performance of integration tests in Rails. It turns out that the tests are spending a significant amount of time in the garbage collector. We started tracking down allocations, and found that setting values in the header hash was creating many strings.

To partially fix this, we added a freeze to the string before inserting it into the header hash. This cut the allocations down. The code used to allocate one string for the “downcased” version, then another when the downcased version was inserted into the hash.

Setting strings as hash keys may be a source of unnecessary object allocation in your application. Now, don’t go freezing all the strings. Make sure to measure where object allocations are happening in your app first (a subject that I’ll cover in a later blog post), then freeze strings where appropriate.
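
If you want a rough way to see the effect yourself, here is a small sketch of my own using ObjectSpace.count_objects. The counts are noisy (the VM allocates strings internally too), so treat the numbers as directional rather than exact:

def string_delta
  GC.start
  GC.disable
  before = ObjectSpace.count_objects[:T_STRING]
  yield
  after = ObjectSpace.count_objects[:T_STRING]
  GC.enable
  after - before
end

h = {}
p string_delta { h["content-type"] = 1 }    # unfrozen key: the hash dups it
p string_delta { h["x-frozen".freeze] = 1 } # frozen key: no dup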

Why does the Hash dup strings?

I think there are a few reasons.

One, if you mutate the string after adding it to the hash, it would be surprising (to me anyway) if the key for the hash was mutated too. For example:

>> x = 'string'
=> "string"
>> y = {}
=> {}
>> y[x] = :value
=> :value
>> x.sub!(/r/, '')
=> "sting"
>> y.key? x
=> false
>> { y.keys => x }
=> {["string"]=>"sting"}

I think it would be pretty surprising if, after mutating the object in place, it were still part of the hash.

I think the other, more technical reason is that mutating the string changes the hash value of the string, which means your hash would need to be rehashed in order to find the same key:

>> x.hash
=> 4078764570319932244
>> x.sub!(/r/, '')
=> "sting"
>> x.hash
=> -1518905602476999392

Note that the hash value changed.

Hash value, what does that have to do with anything?

If you change the hash value of a hash key, then the key will be in the wrong place in the hash. I think this might be more clear with a code example.

You may know about the two methods you need to implement when creating a custom hash key: hash indicates where in the hash the object will be stored, and eql? is used in case there is a hash collision.

Let’s make an object that allows us to mutate the hash value, and see how it behaves after inserting it in a hash:

class Foo
  attr_accessor :hash
end

x = Foo.new
x.hash = 10

hash = {}
hash[x] = :hello

p hash.key?(x)          # => true
p hash.keys.include?(x) # => true

x.hash = 11 # change the hash value

p hash.key?(x)          # => false
p hash.keys.include?(x) # => true

hash.rehash

p hash.key?(x)          # => true
p hash.keys.include?(x) # => true

First we see that x is a key to the hash, and is in the list of keys. After we mutate the hash value, Hash#key? returns false, yet Array#include? returns true when we check for the object in the list of keys.

Hash#key? returns false because it uses the value of hash to find the object in the hash buckets. When we first inserted the object in the hash, it went into the “10” bucket. But now that we’ve changed the value to “11”, the hash will look for the object in the “11” bucket, which is the wrong place!

To fix this, we call the rehash method on the hash. The rehash method will redistribute the keys. Since the object in the “10” bucket is now supposed to be in the “11” bucket, it will move the object to the right place, and Hash#key? will work correctly.

Wouldn’t it be annoying if you had to do the same song and dance every time you mutated your strings? I think this is the main reason why hashes dup and freeze string keys.

What is the lesson?

Don’t mutate the hash value on hash keys! You’ll have a bad time.

Anyway, I don’t think this is weird or anything; it’s just something that we don’t have to deal with on a day-to-day basis. Maybe I should have titled the post “Uncommon things to do with hashes”.

Thanks for reading!!! <3<3<3<3<3

Performance P.S. (is that a P.P.S.?)

An interesting thing is that Hash#dup will rehash the newly created hash, whereas Hash[] will not:

class Foo
  attr_accessor :hash
end

x = Foo.new
x.hash = 10

hash = {}
hash[x] = :hello

p hash.key?(x)          # => true

x.hash = 11 # change the hash value

p hash.key?(x)          # => false
p hash.dup.key?(x)      # => true
p Hash[hash].key?(x)    # => false

This means that duping a hash with Hash[] can end up being faster than using Hash#dup. Here is a benchmark to demonstrate:

require 'benchmark/ips'

hash = Hash[*('a'..'z').to_a]
Benchmark.ips do |x|
  x.report("Hash#dup") do
    hash.dup
  end

  x.report("Hash[]") do
    Hash[hash]
  end
end

Results on my machine:

Calculating -------------------------------------
            Hash#dup     7.705k i/100ms
              Hash[]    15.978k i/100ms
-------------------------------------------------
            Hash#dup     93.648k (± 4.9%) i/s -    470.005k
              Hash[]    230.497k (±11.2%) i/s -      1.150M

Does this mean that you should switch to Hash[]? Only if your benchmarks can prove that it’s a bottleneck. Please please please don’t change all of your code because this shows it’s faster. Make sure to measure your app performance first. But you already knew that, which is why you’re reading the PS section. ;-)

My experience with Minitest and RSpec

I’ve been using RSpec in earnest for the past 6 months now, so I thought it’s time to write a blurrrgh poast comparing RSpec with Minitest. I’ve used Minitest for years, and RSpec only for 6 months, so please keep that in mind when reading!

PLEASE REMEMBER, I don’t care what test framework you use, as long as you are testing your code. This is a post just about my experience with these two frameworks. In other words, this is an opinion piece.

INTRO!!!

I really believe that all test frameworks are essentially the same. No test framework can do something that’s impossible for another to do. It’s just code. So what differentiates test frameworks? I would say the main difference would be user interface. So I’m really going to compare user interfaces between these frameworks.

Things I like about RSpec.

Far and away, my favorite thing about RSpec is that if there is a failure, it prints how to run just that one test at the bottom of the output. I can easily copy and paste that line and run just the one test. For people who don’t know, here is an example:

describe "something" do
  it "works" do
    expect(10).to eq(11)
  end

  it "really works" do
    expect(11).to eq(11)
  end
end

When you run this, the output looks like this:

[aaron@TC tlm.com (master)]$ rspec code/fail_spec.rb 
F.

Failures:

  1) something works
     Failure/Error: expect(10).to eq(11)
       
       expected: 11
            got: 10
       
       (compared using ==)
     # ./code/fail_spec.rb:3:in `block (2 levels) in <top (required)>'

Finished in 0.00504 seconds (files took 0.20501 seconds to load)
2 examples, 1 failure

Failed examples:

rspec ./code/fail_spec.rb:2 # something works
[aaron@TC tlm.com (master)]$ 

All you have to do to rerun just the one failing spec is to copy and paste that one line like this:

rerun demo

I don’t have to think or type, I just “copy, paste, run”. I like to think of this as a “learn as you play” feature.

The other thing I like about RSpec is that it ships with a colored output formatter, so if you do rspec --color, you’ll get colored output. I’ve used Minitest for years without colored output, but I’ve come to really like the colored output from RSpec. It helps me quickly see the most important parts of the test output, the assertion failure, and stack traces (though I highly suspect that the RSpec core team all use black backgrounds in their terminals because sometimes the colors aren’t so nice on my white terminal).

Things I like about Minitest

The main thing that I like about Minitest is that a Minitest test is just a Ruby class. Here is an example test similar to the RSpec sample I showed earlier:

require 'minitest/autorun'

class Something < Minitest::Test
  def test_works
    assert_equal 11, 10
  end

  def test_really_works
    assert_equal 11, 11
  end
end

I like this because I know exactly where the test_works method is defined. It’s no different than any other class in Ruby, so I don’t have to learn anything new. I like it because I am a heavy user of CTags, so I can easily jump to test methods or classes inside my editor. Another thing that’s nice about using regular Ruby classes is that when I need to refactor my tests, I just use my normal refactoring skills: I extract methods, classes, modules, change to inheritance, etc. I use the same refactoring techniques on my test files as I do on my library files. I think that is where Minitest really shines.

Things I dislike about RSpec

RSpec is a DSL for writing tests. I think this is probably RSpec’s largest weakness. I am a programmer, I can read code, so I don’t particularly care whether or not my test code “reads like English”. I don’t understand the value of the DSL, especially when I have a nearly 4000 line specfile that I’d like to refactor. How do I go about refactoring? I can extract methods:

describe "something" do
  def helper
    11
  end

  it "works" do
    expect(10).to eq(helper)
  end

  it "really works" do
    expect(11).to eq(helper)
  end
end

But where is the helper method defined? What’s its visibility? Can I put it in a module? Can I use inheritance? Who can call it? Can I call super from the extracted method? If so, where does super go? These are not the types of questions I want to be pondering when I have 3000 test failures and really long spec files to read through. My mind spins off into thoughts like “I wonder how RSpec works?”, and “I wonder what my cat is doing?”

From what I can gather, calling describe in RSpec essentially defines a class. But if that’s the case, why not just use Ruby classes? Then I don’t need to guess about method visibility, modules, inheritance, etc. Not to mention, the library code that I’m working with is just using Ruby classes. Why do I need to switch to some other language for testing?

Nesting a describe seems to perform inheritance, but only for certain things like before blocks. If you run this code:

describe "something" do
  before do
    puts "hi!"
  end

  it "works" do
    expect(10).to eq(helper)
  end

  it "really works" do
    expect(11).to eq(helper)
  end

  context "another thing" do
    before do
      puts "hello!"
    end

    it "really really works" do
      expect(11).to eq(helper)
    end
  end

  def helper
    11
  end
end

You’ll see “hi!” printed 3 times, once for each it, and “hello!” printed once. For the nested context, I would like to print “hello!” before it prints “hi!”. In a normal inheritance situation, I would change the call to super. I am not sure how to do what I just described without learning a new thing (I think I would need let but I don’t know). For now, I’m coping with this by refactoring to classes outside of the specs themselves, then calling the objects from the specs. Essentially “composition” in my tests.
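
To be concrete, that coping strategy looks something like this (a hedged sketch; CartCalculator is a made-up class):

class CartCalculator
  def initialize(items)
    @items = items
  end

  def total
    @items.inject(0, :+)
  end
end

describe CartCalculator do
  it "sums the items" do
    calculator = CartCalculator.new([1, 2, 3])
    expect(calculator.total).to eq(6)
  end
end

All the refactoring questions from earlier go away: the interesting logic lives in a plain Ruby class, so I know exactly where the methods are defined and what their visibility is.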

Another thing that bugs me is that if I do try to refactor my RSpec tests, and I change any lines, then the line that I copy and pasted to run just the one test won’t work anymore. I have to figure out the new line. I can use the name of the test by specifying the -e flag on RSpec, but then I lose the “copy, paste, run” feature that I love so much.

Anyway, I will stop picking on RSpec now. To me, RSpec seems to have some sort of identity crisis and is caught somewhere between “a language for writing tests” and “just Ruby”. That is RSpec’s biggest weakness.

Things I dislike about Minitest

If RSpec does one thing better than Minitest, it’s the command line UI, which easily outshines Minitest’s. Here is one example.

I can run one test at a time like so:

$ ruby code/fail_test.rb -n Something#test_works

This will run the test_works method on the Something class. The problem for me is that it’s not clear how to do that when you see the failing test output. If we run the whole suite, the output looks like this:

[aaron@TC tlm.com (master)]$ ruby code/fail_test.rb 
Run options: --seed 5990

# Running:

.F

Finished in 0.002047s, 977.0396 runs/s, 977.0396 assertions/s.

  1) Failure:
Something#test_works [code/fail_test.rb:5]:
Expected: 11
  Actual: 10

2 runs, 2 assertions, 1 failures, 0 errors, 0 skips
[aaron@TC tlm.com (master)]$

Looking at this, how would you know how to run just that one test? With RSpec, it’s no problem. I was able to learn how to run just one test the first time I ever used RSpec. I didn’t read any docs, just ran the tests, and the info was right there. Coming from RSpec, you might think you could do this:

$ ruby code/fail_test.rb:5

But of course that doesn’t work because you’re just using Ruby on the command line.

Here is a short video of me running the single test:

demo

The disadvantage of this compared to RSpec’s output is that I have to type a few characters and know a command-line interface in advance. You might think “it’s just a few characters”, but every time I have to type something there is the potential for a mistake that I’ll have to correct, which takes even more time. I really think the advantages of “copy, paste, run” can’t be overstated. (Yes, I know there are tools like autotest for both Minitest and RSpec that will run tests for you, but sometimes I can’t / don’t use them.)

The other thing I wish Minitest would do is to ship with a color formatter like RSpec does. I am used to the color output from RSpec, and I don’t really want to go back. It really helps me focus on what’s important about my test failures.

Conclusion

Both of these frameworks have things I like and dislike. Being able to reuse my debugging and refactoring skills in my tests is pretty important to me, especially since I deal with lots of legacy code. This is why I’ll stick with Minitest in my personal projects. However, I am sure I will get better at refactoring and debugging in RSpec tests, and when I get better at it, I will share what I learn.

Now, do I care which one you use? No. As long as you test your code, I am happy. A professional developer should be able to work in either one of these because they essentially do the same thing: test your code.
