Inline caching in MRI
Dec 23, 2015 @ 9:58 am
Every time you call a method, Ruby must look up the method you want to call before actually calling that method. Figuring out which method needs to be called can be a costly operation, so one thing that virtual machine authors try to do is speed up that process by caching the method lookup. MRI (like most other Ruby virtual machines) stores this cache “inline”. Most of us know what a cache is, but what is “inline”?
What is an “inline cache”?
Inline caches are just caches that are stored “inline” with the bytecode generated from your Ruby program. If we write a simple program and disassemble it, we can see the inline caches.
Try out this program:
def foo bar, baz
bar.baz
baz.baz
end
ins = RubyVM::InstructionSequence.of method(:foo)
puts ins.disasm
If you run the above program, you’ll get output that looks very similar to this:
$ ruby x.rb
== disasm: #<ISeq:foo@x.rb>=============================================
local table (size: 3, argc: 2 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 3] bar<Arg> [ 2] baz<Arg>
0000 trace 8 ( 1)
0002 trace 1 ( 2)
0004 getlocal_OP__WC__0 3
0006 opt_send_without_block <callinfo!mid:baz, argc:0, ARGS_SIMPLE>, <callcache>
0009 pop
0010 trace 1 ( 3)
0012 getlocal_OP__WC__0 2
0014 opt_send_without_block <callinfo!mid:baz, argc:0, ARGS_SIMPLE>, <callcache>
0017 trace 16 ( 4)
0019 leave ( 3)
On lines 0006 and 0014, you’ll see three values. The first value, opt_send_without_block, is the virtual machine instruction, and the next two values are the parameters to the instruction. The first parameter is the method you want to call along with some information about the method parameters. The second parameter, <callcache>, is an instance of a “call cache”, and it is stored inline with the bytecode (hence “inline cache”).
What is the cache key and value?
This bit differs from VM to VM, but in MRI, there are two keys to the cache. The two keys are the “global method state” and a “class serial number”. The value of the cache is the method we’re going to call.
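To make the mechanics concrete, here is a toy Ruby model of that two-key cache. This is purely an illustration of the idea: the class and method names here are made up, and MRI’s real cache is a C struct checked by the VM, not Ruby code.

```ruby
# A toy model of MRI's call cache. Illustrative only: the names are made up,
# and the real cache lives in C inside the VM.
class ToyCallCache
  attr_reader :misses

  def initialize
    @method_state = nil # cached "global method state"
    @class_serial = nil # cached class serial number
    @method       = nil # the cached value: the method to call
    @misses       = 0
  end

  # Return the cached method on a hit; otherwise do the slow lookup
  # and refill the cache (a miss).
  def lookup(current_method_state, current_class_serial)
    if @method_state == current_method_state &&
       @class_serial == current_class_serial
      @method
    else
      @misses      += 1
      @method_state = current_method_state
      @class_serial = current_class_serial
      @method       = slow_method_lookup
    end
  end

  private

  # Stand-in for MRI's real walk up the receiver's ancestor chain.
  def slow_method_lookup
    :some_method
  end
end

cache = ToyCallCache.new
cache.lookup(1, 100) # miss: cache is empty
cache.lookup(1, 100) # hit: both keys match
cache.lookup(1, 101) # miss: the class serial changed
```

Bumping either key invalidates the entry, which is exactly why monkey patching (global method state) and reopening classes (class serial number) are costly.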
Global Method State
The “global method state” is a global serial number that increments every time certain classes get mutated (for example with monkey patching).
You can see the current value of the global method state with RubyVM.stat.
Here is an example to demonstrate changing the global state:
p RubyVM.stat
module Kernel
def aaron
end
end
p RubyVM.stat
If you run the above code, you’ll see output like this:
{:global_method_state=>132, :global_constant_state=>824, :class_serial=>5664}
{:global_method_state=>133, :global_constant_state=>824, :class_serial=>5664}
Adding a method to Kernel caused the global method state number to change.
Changing this value at runtime is very bad because it invalidates all of the call caches (remember the call cache key is the “class serial number” and the “global method state”).
Class Serial Number
The other cache key is called the “class serial number”.
This number is derived from the class (including any singleton classes associated with the object) of the instance that is the receiver of the method call at a particular call site.
Each class in Ruby has a serial number associated with that class.
We can see the class serial number change by looking at RubyVM.stat again:
p RubyVM.stat
class A
end
p RubyVM.stat
class B
end
p RubyVM.stat
If you run the above program, it will look something like this:
{:global_method_state=>132, :global_constant_state=>824, :class_serial=>5664}
{:global_method_state=>132, :global_constant_state=>825, :class_serial=>5666}
{:global_method_state=>132, :global_constant_state=>826, :class_serial=>5668}
You’ll notice that the class serial number is increasing. Each class gets assigned a serial number. Now, doing certain things to the class will increase the serial number. For example, if we reopen the class, the serial number will change:
p RubyVM.stat
class A
def foo; end
end
p RubyVM.stat
class A
def bar; end
end
p RubyVM.stat
Output:
{:global_method_state=>132, :global_constant_state=>824, :class_serial=>5664}
{:global_method_state=>132, :global_constant_state=>825, :class_serial=>5667}
{:global_method_state=>132, :global_constant_state=>825, :class_serial=>5668}
We can’t inspect the serial number associated with each class, but inspecting the output of RubyVM.stat does show the side effects of reopening the class on the serial numbers.
The lesson from this is don’t reopen classes at runtime because this will invalidate caches that use that class as a key.
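Singleton classes deserve a special mention here. Because the serial number comes from the receiver’s class including any singleton class, giving an object a singleton method gives it a distinct class of its own. This is my own illustration, not something from the tracing above:

```ruby
class A
  def baz; end
end

a = A.new
b = A.new

# Give `a` a singleton method. Ruby creates an anonymous singleton class
# for `a` to hold it, and that singleton class has its own serial number.
def a.extra; end

# `a.class` still reports A, but the class used for method lookup (and for
# the cache key) is now `a`'s singleton class, which differs both from A
# and from every other object's singleton class.
p a.class                                # => A
p a.singleton_class == A                 # => false
p a.singleton_class == b.singleton_class # => false
```

So per-object `def obj.foo` definitions have the same cache-busting effect at a call site as mixing unrelated classes.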
Recap
Let’s have a short recap. Given this code:
class A
def baz; end
end
def foo bar
bar.baz
end
foo(A.new)
There will be one call cache at the bar.baz line. The key for that cache will be the “global method state” (just some global value) and the class serial number for A. The value of the cache will be the location of the method baz for that class.
Types of call sites and caches
So far, we’ve talked about where the inline caches are, as well as the key and value for the cache. But we haven’t talked about the size of the cache, or different types of caches. The size of the cache in MRI is one. Yes, one. It caches exactly one key and one value. This is called a “monomorphic” cache. Let’s take the following example:
class A
def baz; end
end
def foo bar
bar.baz
end
foo(A.new)
foo(A.new)
The first time foo is called, MRI compares the cache entry’s global method state with the current global method state, and the serial number for the A class with the cache’s serial number.
You can see the code for that here.
Since there is no entry in the cache at this point, it is a cache miss, so Ruby goes through the slow path and looks up the method and caches it.
You can see the cache value being stored here.
The second time the method is called in this example, it’s a hit and we don’t have to do the slow path.
Now consider the following code:
class A
def baz; end
end
class B
def baz; end
end
def foo bar
bar.baz
end
loop do
foo(A.new)
foo(B.new)
end
The above example will never hit the cache because the type for bar changes every time foo is called. Remember that part of the cache key is the class serial number. Since the serial number for A is different than the serial number for B, the cache at the bar.baz call site is never hit. We can call this call site “polymorphic” because it has multiple types for the variable bar.
Let’s do one more example:
def foo bar
bar.baz
end
loop do
klass = Class.new {
def baz
end
}
foo(klass.new)
end
This example never hits the cache either, because every class created by Class.new has its own serial number. Also, in the previous example, the bar.baz call site only ever saw two types, A and B. This case sees an unbounded number of types. We can call this case “megamorphic”. I consider this case to be “too hot to handle”.
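If you find yourself with a call site like that, one fix (my own sketch, not part of the original example) is to create the class once, outside the loop, so the call site goes back to seeing a single class:

```ruby
def foo bar
  bar.baz
end

# Create the class once, outside the loop. Every instance then shares one
# class (and one class serial number), so the inline cache at the bar.baz
# call site misses only on the first call and hits on every later one.
klass = Class.new {
  def baz
    :baz
  end
}

results = 10.times.map { foo(klass.new) }
```

The receivers are still freshly allocated objects each time, but they all report the same class, which is all the cache key cares about.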
Recap
So, we essentially have three types of call sites: “monomorphic” is a call site that only sees one type, “polymorphic” sees many types, and “megamorphic” sees too many types. Megamorphic call sites are also polymorphic; they just see so many types that we don’t want to deal with them.
Ruby’s inline cache only caches one value, and is most effective when the call site is monomorphic. However, Ruby’s cache always stores the last value seen. This means that if a call site is polymorphic, but doesn’t switch types very frequently, then the cost of method lookup when switching types will be amortized.
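One practical consequence of that last-value-wins behavior (my own illustration, not a claim from MRI’s source): if you control the iteration order over a mixed collection, processing objects grouped by class keeps each run of calls monomorphic, so the single-entry cache only misses when the class actually switches.

```ruby
class Circle
  def area; 3; end # simplified stand-in value
end

class Square
  def area; 4; end # simplified stand-in value
end

shapes = [Circle.new, Square.new, Circle.new, Square.new]

# Interleaved iteration flips the receiver's class on every call, so the
# single-entry cache at `s.area` misses every time.
interleaved = shapes.map { |s| s.area }

# Grouping by class first means the class only switches once across the
# whole pass, so the cache misses twice (once per class) and hits the rest.
grouped = shapes.sort_by { |s| s.class.name }.map { |s| s.area }

p interleaved # => [3, 4, 3, 4]
p grouped     # => [3, 3, 4, 4]
```

Both orders compute the same results; only the cache behavior at the call site differs.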
Inspecting cache hits / misses
I’ve posted a fork of Ruby that you can find here that has a patch to let you see cache hits and misses. It adds a tracepoint for tracking those inline cache hits and misses. Don’t use this in production as it slows down your interpreter, but it is fun for testing.
Let’s take our first example, and trace the cache hits and misses:
class A
def baz; end
end
def foo bar
bar.baz
end
tp = TracePoint.new(:inline_cache_hit, :inline_cache_miss) do |x|
p x => x.callcache_id
end
tp.enable
foo(A.new)
foo(A.new)
If you run with my fork, you’ll see this output:
{#<TracePoint:inline_cache_miss@x.rb:14>=>[3386, 5666, "Class"]}
{#<TracePoint:inline_cache_miss@x.rb:14>=>[3387, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:6>=>[3380, 5667, "A"]}
{#<TracePoint:inline_cache_miss@x.rb:15>=>[3388, 5666, "Class"]}
{#<TracePoint:inline_cache_miss@x.rb:15>=>[3389, 4052, "Object"]}
{#<TracePoint:inline_cache_hit@x.rb:6>=>[3380, 5667, "A"]}
The left side of the hash is the tracepoint type, and the right side is an array containing a unique ID for that call site, the serial number of the class, and the class name itself. You can see the very last line is a cache hit for A (as we expected).
Let’s try with the second example:
class A
def baz; end
end
class B
def baz; end
end
def foo bar
bar.baz
end
tp = TracePoint.new(:inline_cache_hit, :inline_cache_miss) do |x|
p x => x.callcache_id
end
tp.enable
2.times do
foo(A.new)
foo(B.new)
end
The output looks like this:
{#<TracePoint:inline_cache_miss@x.rb:18>=>[3391, 4869, "Fixnum"]}
{#<TracePoint:inline_cache_miss@x.rb:19>=>[3384, 5666, "Class"]}
{#<TracePoint:inline_cache_miss@x.rb:19>=>[3385, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:10>=>[3381, 5667, "A"]}
{#<TracePoint:inline_cache_miss@x.rb:20>=>[3386, 5669, "Class"]}
{#<TracePoint:inline_cache_miss@x.rb:20>=>[3387, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:10>=>[3381, 5670, "B"]}
{#<TracePoint:inline_cache_hit@x.rb:19>=>[3384, 5666, "Class"]}
{#<TracePoint:inline_cache_hit@x.rb:19>=>[3385, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:10>=>[3381, 5667, "A"]}
{#<TracePoint:inline_cache_hit@x.rb:20>=>[3386, 5669, "Class"]}
{#<TracePoint:inline_cache_hit@x.rb:20>=>[3387, 4052, "Object"]}
{#<TracePoint:inline_cache_miss@x.rb:10>=>[3381, 5670, "B"]}
You’ll see some cache hits in here, but none of them are inside the foo method.
END
I’m getting tired of writing this post. For those that want a sneak peek at what I’ll write about next, you can find a patch I wrote to give MRI a polymorphic inline cache here. The TL;DR for that post will be “yes the PIC works, no it doesn’t speed up our app”.