Tenderlove Making

Monkey Patch Detection in Ruby

My last post detailed one way that CRuby will eliminate some intermediate array allocations when using methods like Array#hash and Array#max. Part of the technique hinges on detecting when someone monkey patches Array. Today, I thought we’d dive a little bit into how CRuby detects when these “important” methods get monkey patched, and how it de-optimizes itself in response.

Monkey Patching Problem

The optimization in the previous post made the assumption that the implementation of Array#max was the original definition (as defined in Ruby itself). But the Ruby language allows us to reopen classes and redefine any methods we want, and it expects those redefinitions to “just work”.

For example, if someone were to reopen Array and define a new max method, we would need to respect that monkey patch:

class Array
  def max
    "hello!"
  end
end

puts [1, 2].max # => "hello!"

In fact, a monkey patch implementation could mutate the array itself, so we’re definitely required to allocate an array in the case that someone added their own max method:

class Array
  def max
    self << :neat
    self
  end
end

x = [1, 2].max
p x # => [1, 2, :neat]

So how does CRuby detect that a method has been monkey patched?

Method Definition Time

Every time a method is defined, an entry is stored in a hash table pointed to by the current class. We call this the “method table”, but you’ll see it referred to as M_TBL or RCLASS_M_TBL in the code. The key to the hash is simply the method name as an ID type (an integer which represents a Ruby Symbol), and the value of the hash is a method entry structure. If there was already an entry in the table, then we know it’s a “redefinition” (a.k.a. “monkey patch”), and we end up calling rb_vm_check_redefinition_opt_method here.
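To make the "already an entry in the table" idea concrete, here is a minimal sketch of a method table in plain Ruby. It is a toy model, not CRuby's code: ToyClass, define, and on_redefinition are hypothetical names, and a Ruby Hash keyed by Symbol stands in for the real table keyed by ID.

```ruby
# Toy model of a per-class method table. Not CRuby's implementation;
# ToyClass, define and on_redefinition are hypothetical names.
class ToyClass
  attr_reader :method_table, :redefined

  def initialize(name)
    @name = name
    @method_table = {} # method name (Symbol) => method entry (here, a Proc)
    @redefined = []    # names whose entry was replaced at least once
  end

  # Store a method entry. If the name is already in the table,
  # treat it as a redefinition and run the hook first.
  def define(name, &body)
    on_redefinition(name) if @method_table.key?(name)
    @method_table[name] = body
    name
  end

  private

  # Stand-in for the point where CRuby calls
  # rb_vm_check_redefinition_opt_method.
  def on_redefinition(name)
    @redefined << name
  end
end

array = ToyClass.new("Array")
array.define(:max) { |items| items.max } # first definition, no hook
array.define(:max) { |_items| "hello!" } # monkey patch, hook fires
p array.redefined # => [:max]
```

The only point of the sketch is that detecting a monkey patch costs one lookup at definition time, not any work at call time.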

rb_vm_check_redefinition_opt_method checks to see if this is a method we “care” about. Methods we “care” about are typically ones where we’ve made some kind of optimization and we need to deoptimize if someone redefines them.

If the redefined method is something we care to detect, then we set a flag in a global variable ruby_vm_redefined_flag, which is an array of integers.

The indexes of the ruby_vm_redefined_flag array correspond to “basic operators”, or BOPs. So for example, the 0th element is for BOP_PLUS, the 1st element is for BOP_MINUS, etc. You can see the full list of basic operators here. These basic operators correspond to method names that we care about. So if someone monkey patches the + operator, we’ll set a flag in ruby_vm_redefined_flag[BOP_PLUS].

The values of the ruby_vm_redefined_flag array correspond to a bitmap that maps to classes we care about. You can see the list of classes and their corresponding bits here, as well as a function for mapping “classes we care about” to their corresponding bit flag.
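The layout described above (one slot per operator, one bit per class) can be sketched in Ruby. Everything in this block is an assumption made for illustration: the ToyFlags module, the three-entry operator list, and the bit positions are hypothetical, and CRuby's real lists are longer and may be ordered differently.

```ruby
# Toy model of the "array of bitmaps" layout. All names, orderings and
# bit positions here are hypothetical, not CRuby's actual values.
module ToyFlags
  # Index in this list = slot in the flag array (like BOP_PLUS, BOP_MINUS, ...).
  BOPS = %i[plus minus pack].freeze

  # One bit per class we care about.
  CLASS_BITS = { Integer => 1 << 0, String => 1 << 1, Array => 1 << 2 }.freeze

  # One integer bitmap per operator, all bits clear to begin with.
  FLAGS = Array.new(BOPS.size, 0)

  # Record that klass redefined the method behind bop.
  def self.mark(bop, klass)
    FLAGS[BOPS.index(bop)] |= CLASS_BITS.fetch(klass)
  end

  # True while klass still has the original definition for bop.
  def self.unredefined?(bop, klass)
    (FLAGS[BOPS.index(bop)] & CLASS_BITS.fetch(klass)).zero?
  end
end

ToyFlags.mark(:plus, Integer)
ToyFlags.mark(:plus, String)
ToyFlags.mark(:pack, Array)

p ToyFlags::FLAGS                        # => [3, 0, 4]
p ToyFlags.unredefined?(:plus, Array)    # => true
p ToyFlags.unredefined?(:pack, Array)    # => false
```

Note that Integer#+ and String#+ share a single slot and differ only in which bit is set, which is why one small array of integers is enough to cover every (operator, class) pair.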

For example, if someone monkey patches Array#pack, we would set a bit in ruby_vm_redefined_flag like this:

ruby_vm_redefined_flag[BOP_PACK] |= ARRAY_REDEFINED_OP_FLAG;

Then, when we execute our optimized instruction (opt_newarray_send, which was introduced in the last post), we can check the bitmap to decide whether or not to take our fast path:

if ((ruby_vm_redefined_flag[BOP_PACK] & ARRAY_REDEFINED_OP_FLAG) == 0) {
  // It _hasn't_ been monkey patched, so take the fast path
}
else {
  // It _has_ been monkey patched, do the slow path
}

Of course, this bitmask check is wrapped in a macro, so the real code looks more like this:

if (BASIC_OP_UNREDEFINED_P(BOP_PACK, ARRAY_REDEFINED_OP_FLAG)) {
  // It _hasn't_ been monkey patched, so take the fast path
}
else {
  // It _has_ been monkey patched, do the slow path
}

You can see the actual code for Array#pack redefinition checking here.
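If you want to play with the whole flow without building CRuby, here is a rough Ruby imitation of it. It is only a sketch under some loud assumptions: ToyArray, REDEFINED, ARRAY_BIT, and toy_newarray_max are hypothetical names, Module#method_added stands in for the method table check, and any max defined directly on ToyArray counts as the "monkey patch" because the original is inherited from Array.

```ruby
# Toy imitation of "check the flag, then pick fast or slow path".
# ToyArray, REDEFINED, ARRAY_BIT and toy_newarray_max are hypothetical.
class ToyArray < Array
  REDEFINED = Hash.new(0) # operator name => bitmap of classes
  ARRAY_BIT = 1 << 3      # bit position picked arbitrarily for this sketch

  # Module#method_added runs whenever an instance method is defined
  # directly on ToyArray, so it plays the role of the redefinition check.
  def self.method_added(name)
    REDEFINED[name] |= ARRAY_BIT
    super
  end

  # Same shape as the BASIC_OP_UNREDEFINED_P check: bit clear means untouched.
  def self.unredefined?(op)
    (REDEFINED[op] & ARRAY_BIT).zero?
  end
end

# Stand-in for the optimized instruction handling `[a, b].max`.
def toy_newarray_max(a, b)
  if ToyArray.unredefined?(:max)
    a > b ? a : b      # fast path: no array is built
  else
    ToyArray[a, b].max # slow path: build the array, do a real method call
  end
end

p toy_newarray_max(1, 2) # => 2

class ToyArray
  def max
    self << :neat
    self
  end
end

p toy_newarray_max(1, 2) # => [1, 2, :neat]
```

The second call has to allocate, because the patched max mutates and returns the array itself, which is exactly the case the flag exists to protect.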

Bonus Stuff

A cool thing (at least I think it’s cool) is that the function rb_vm_check_redefinition_opt_method not only sets up the “monkey patch detection” bits, it’s also a natural place to inform the JIT compiler that someone has done something catastrophic and that it should de-optimize. In fact, you can see those calls right here.

A weird thing is that since ruby_vm_redefined_flag is just a list of bitmaps, it’s technically possible for us to track the definition of Integer#pack even though that method doesn’t exist:

ruby_vm_redefined_flag[BOP_PACK] |= INTEGER_REDEFINED_OP_FLAG;

I guess that means there’s a lot of bit space that isn’t used, but I don’t really think it’s a big deal.

Anyway, have a good day!
