Tenderlove Making

I want DTrace probes in Ruby

Recently I was debugging a performance regression in Rails 3.1. The ticket reported for the regression just indicated a speed problem. Namely allocating new active record objects was very slow in Rails 3.1 compared to Rails 3.0.

The benchmark program

Here is the program I was using for benchmarking. It's slightly modified from the original program that @paul so kindly submitted:

require 'active_record'
require 'benchmark'

p ActiveRecord::VERSION::STRING

  :adapter  => "sqlite3",
  :database => ":memory:"

ActiveRecord::Base.connection.execute("CREATE TABLE active_record_models (id INTEGER UNIQUE, title STRING, text STRING)")

class ActiveRecordModel < ActiveRecord::Base; end

N = 100_000
Benchmark.bm { |x| x.report('new') { N.times { ActiveRecordModel.new } } }

I could tell from perftools.rb output we were calling new methods, and those new methods took plenty of time.

Here is the call graph for this benchmark on Rails 3.0:


Compared to the call graph on Rails 3.1:


After some work, I was able to remove the method calls to scope when allocating new AR objects, but the benchmark still showed 3.1 to be slower than 3.0. I noticed we were spending more time in the Garbage Collector on 3.1 than on 3.0. This lead me to believe that we were creating more objects in the newer version.

Counting Object Allocations

Counting object allocations with Ruby 1.9 is actually pretty easy. We can call ObjectSpace.count_objects to get a list of the live objects in the system. That method returns a hash where the keys are the object type and the values are the number of that object in the system (there are other keys, be we don't care about them for now).

Knowing this information, we can write a function that will calculate a rough difference in object allocations:

def allocate_count
  before = ObjectSpace.count_objects
  after = ObjectSpace.count_objects
  after.each { |k,v| after[k] = v - before[k] }

p allocate_count { 100.times { {} } }

The first thing we do in this function is disable the garbage collector. It's possible that the code we're investigating could trigger GC, and that will impact our allocated object count.

Then we grab our current list of objects, yield to the code we want to profile, grab the list again, calculate our delta, enable GC and return!

If you run this code, you should see output similar to this (I've removed keys I don't care about):

{ ... :T_HASH=&gt;101, ... }

You can see from the output that we've allocated 101 hashes. 100 came from our 100.times call, and one came from calling the method on ObjectSpace.

Since inspecting our code can impact object allocations, these counts can't be 100% accurate. Another annoying thing is that we can't determine where our code is allocating these hashes. It's easy enough to see the hash literals in this code, but imagine a project like rails. You may be able to find hash literals in your source, but determining which ones are being called is difficult.

After running this against the profile, I found that ActiveRecord in 3.1 was allocating 2x the number of hashes that 3.0 allocated. Now the problem is figuring out where these hashes were being allocated.


DTrace gives us hooks in to our processes. Many of the hooks provided are for system calls, or other various functions. DTrace requires that we write probes in our code, and unfortunately Ruby does not have DTrace probes built in.

Adding DTrace to Ruby

To find these Hash allocations, I decided to add DTrace support. I usually run Ruby built from Ruby trunk, so running a modified Ruby seemed acceptible. It turns out adding DTrace support is pretty easy.

First I added a probe definition file. This file describes the probes that I want to add to Ruby. It contains the probe names and the signatures of the probes:

provider ruby {
  probe hash__alloc(const char *, int);

This definition says we're declaring a probe called hash-alloc and it will take a string (the file name) and an int (the line number).

Next I used this definition file to generate a header file that ruby would use:

$ dtrace -o probes.h -h -s probes.d

Finally, I modified hash.c to trigger the probe:

diff --git a/hash.c b/hash.c
index b49aff8..c40d94d 100644
--- a/hash.c
+++ b/hash.c
@@ -15,6 +15,7 @@
 #include "ruby/st.h"
 #include "ruby/util.h"
 #include "ruby/encoding.h"
+#include "probes.h"
 #include <errno.h>
 #ifdef __APPLE__
@@ -221,6 +222,9 @@ hash_alloc(VALUE klass)
     OBJSETUP(hash, klass, T_HASH);
     RHASH_IFNONE(hash) = Qnil;
+       RUBY_HASH_ALLOC(rb_sourcefile(), rb_sourceline());
+    }
     return (VALUE)hash;

After this, I built and installed ruby. Then I wrote a dtrace script to display the filename and line number where hash allocations were happening:

  printf("%s:%d", copyinstr(arg0), arg1);

Then I ran the benchmark with dtrace:

$ sudo dtrace -s x.d -c 'ruby -I lib test.rb'

Here is an excerpt of the output:

~~~ 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1528 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1529 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1534 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1535 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1525 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/persistence.rb:322 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1527 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1528 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1529 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1534 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1535 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1525</code></pre>

I was able to compare the output from this script on Rails 3.0 vs Rails 3.1. From this output, I was able to determine:

  • The number of hash allocations
  • Where the hashes were allocated
  • Which allocations were new in Rails 3.1

Armed with this information, I was able to eliminate some allocations and return speed in Rails 3.1 to that of Rails 3.0.

DTrace vs gdb vs memprof

I probably could have accomplished this task with memprof, but it requires that I use 1.8.7 from rvm. Working with RVM on my machine is difficult (don't ask, let's just say I'm a "special needs" user), and I wanted to use 1.9. So memprof was out the window.

I was able to get the same information by scripting gdb. I set a breakpoint at the correct function, called rb_sourcefile() and rb_sourceline() and redirected to a file. The problem with gdb is that it seemed very slow, and scripting it was a pain (though I am not a gdb expert!).

DTrace was fast, relatively easy to use, and very scriptable. It made me happy!


I would like to see dtrace probes officially added to ruby trunk. The ruby that ships with OS X has them built in, but they don't give us information like hash literal allocations. It looks like they were in ruby trunk at one point, but were then reverted. I would like to see them added again.

If you want to play with them on ruby trunk, here is my full patch. Make sure to run make probes.h before make && make install.

HAPPY WEDNESDAY!!!! <3 <3 <3 <3

(it feels good to blog on a non “professional” blog)

« go back