Tenderlove Making

AdequateRecord Pro™: Like ActiveRecord, but more adequate

TL;DR: AdequateRecord is a set of patches that adds cache stuff to make ActiveRecord 2x faster

I’ve been working on speeding up Active Record, and I’d like to share the results! First, here is a graph:

[Graph: calls per second to Model.find(id) and Model.find_by_name(name) on each stable branch of Rails]

This graph shows the number of times you can call Model.find(id) and Model.find_by_name(name) per second on each stable branch of Rails. Since it is “iterations per second”, a higher value is better. I tried running this benchmark with Rails 1.15.6, but it doesn’t work on Ruby 2.1.

Here is the benchmark code I used:

require 'active_support'
require 'active_record'
require 'benchmark/ips' # provides Benchmark.ips (from the benchmark-ips gem)

p ActiveRecord::VERSION::STRING

ActiveRecord::Base.establish_connection adapter: 'sqlite3', database: ':memory:'
ActiveRecord::Base.connection.instance_eval do
  create_table(:people) { |t| t.string :name }
end

class Person < ActiveRecord::Base; end

person = Person.create! name: 'Aaron'

id   = person.id
name = person.name

Benchmark.ips do |x|
  x.report('find')         { Person.find id }
  x.report('find_by_name') { Person.find_by_name name }
end

Now let’s talk about how I made these performance improvements.

What is AdequateRecord Pro™?

AdequateRecord Pro™ is a fork of ActiveRecord with some performance enhancements. In this post, I want to talk about how we achieved high performance in this branch. I hope you find these speed improvements to be “adequate”.

Group discounts for AdequateRecord Pro™ are available depending on the number of seats you wish to purchase.

How Does ActiveRecord Work?

ActiveRecord constructs SQL queries after doing a few transformations. Here’s an overview of the transformations:

[Diagram: transformations]

The first transformation comes from your application code. When you do something like this in your application:

Post.where(...).where(...).order(..)

Active Record creates an instance of an ActiveRecord::Relation that contains the information that you passed to where, or order, or whatever you called. As soon as you call a method that turns this Relation instance into an array, Active Record does a transformation on the relation objects. It turns the relation objects into ARel objects which represent the SQL query AST. Finally, it converts the AST to an actual SQL string and passes that string to the database.

These same transformations happen when you run something like Post.find(id), or Post.find_by_name(name).
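To make the three stages concrete, here is a toy sketch in plain Ruby. It is not Active Record's or ARel's code: ToyRelation, to_ast, and ast_to_sql are names invented for this illustration, and the "AST" is just nested arrays.

```ruby
# Toy model of the three stages. None of these names exist in Active Record.
ToyRelation = Struct.new(:table, :wheres, :orders) do
  # Stage 1: each call returns a new relation that remembers what was asked for.
  def where(column)
    ToyRelation.new(table, wheres + [column], orders)
  end

  def order(column)
    ToyRelation.new(table, wheres, orders + [column])
  end

  # Stage 2: relation -> AST (nested arrays stand in for real AST nodes).
  def to_ast
    [:select, table, [:where, *wheres], [:order, *orders]]
  end
end

# Stage 3: AST -> SQL string, with "?" standing in for each dynamic value.
def ast_to_sql(ast)
  table  = ast[1]
  wheres = ast[2].drop(1)
  orders = ast[3].drop(1)

  sql = "SELECT * FROM #{table}"
  sql += " WHERE " + wheres.map { |c| "#{c} = ?" }.join(" AND ") unless wheres.empty?
  sql += " ORDER BY " + orders.join(", ") unless orders.empty?
  sql
end

relation = ToyRelation.new("posts", [], [])
                      .where("author_id")
                      .where("published")
                      .order("created_at")
sql = ast_to_sql(relation.to_ast)
puts sql
```

Every call to where or order allocates a new object, and the two conversion steps run each time the query executes, even when the resulting string is identical.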

Separating Static Data

Let’s consider this statement:

Post.find(params[:id])

In previous versions of Rails, when this code was executed, if you watched your log files, you would see something like this go by:

SELECT * FROM posts WHERE id = 10
SELECT * FROM posts WHERE id = 12
SELECT * FROM posts WHERE id = 22
SELECT * FROM posts WHERE id = 33

In later versions of Rails, you would see log messages that looked something like this:

SELECT * FROM posts WHERE id = ? [id, 10]
SELECT * FROM posts WHERE id = ? [id, 12]
SELECT * FROM posts WHERE id = ? [id, 22]
SELECT * FROM posts WHERE id = ? [id, 33]

This is because we started separating the dynamic parts of the SQL statement from the static parts of the SQL statement. In the first log file, the SQL statement changed on every call. In the second log file, you see the SQL statement never changes.

Now, the problem is that even though the SQL statement never changes, Active Record still performs all the translations we discussed above. In order to gain speed, what do we do when a known input always produces the same output? Cache the computation.

Keeping the static data separated from the dynamic data allows AdequateRecord to cache the static data computations. Even cooler, databases that don’t support prepared statements will see an improvement too.
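The idea can be sketched in a few lines of plain Ruby. This is only an illustration under my own assumptions, not the code in the branch: ToyStatementCache, sql_for, and compile are hypothetical names, and compile stands in for the expensive relation -> AST -> SQL work.

```ruby
# Hypothetical cache. The class and method names are invented for this sketch.
class ToyStatementCache
  attr_reader :compile_count

  def initialize
    @statements    = {}
    @compile_count = 0
  end

  # Returns the static SQL for a single-column lookup, compiling it only
  # once per (table, column) pair.
  def sql_for(table, column)
    @statements[[table, column]] ||= compile(table, column)
  end

  private

  # Stands in for the expensive relation -> AST -> SQL transformations.
  def compile(table, column)
    @compile_count += 1
    "SELECT * FROM #{table} WHERE #{column} = ?"
  end
end

cache = ToyStatementCache.new
queries = [10, 12, 22, 33].map do |id|
  # The SQL comes from the cache; only the bind value differs per call.
  [cache.sql_for("posts", "id"), [["id", id]]]
end

queries.each { |sql, binds| puts "#{sql} #{binds.inspect}" }
puts "compiled #{cache.compile_count} time(s)"
```

Four lookups, one compilation. Because the cached value is just the SQL string, nothing here depends on the database supporting prepared statements.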

Supported Forms

Not every call can benefit from this caching. Right now the only forms that are supported look like this:

Post.find(id)
Post.find_by_name(name)
Post.find_by(name: name)

This is because calculating a cache key for these calls is extremely easy. We know these statements don’t do any joins, don’t have any “OR” clauses, etc. All of these statements indicate the table to query, the columns to select, and the where clauses right in the Ruby code.

This isn’t to say that queries like this:

Post.where(...).where(...).etc

can’t benefit from the same techniques. In those cases we just need to be smarter about calculating our cache keys. Also, this type of query will never be able to match speeds with the find_by_XXX form because the find_by_XXX form can completely skip creating the ActiveRecord::Relation objects. The “finder” form is able to skip the translation process completely.

Using the “chained where” form will always create the relation objects, and we would have to calculate our cache key from those. In the “chained where” form, we could possibly skip the “relation -> AST” and “AST -> SQL statement” translations, but you still have to pay the price of allocating ActiveRecord::Relation objects.
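Here is one way that trade-off could look, as a rough sketch only. Nothing like this exists in the branch yet: CountingRelation, cache_key, SQL_CACHE, and sql_for are all hypothetical, and the key is simply the table plus the column names, which assumes equality conditions only.

```ruby
# Hypothetical relation that counts its own allocations, to show the cost
# that remains even when the SQL string is cached.
class CountingRelation
  @allocations = 0
  class << self
    attr_accessor :allocations
  end

  attr_reader :table, :conditions

  def initialize(table, conditions = {})
    self.class.allocations += 1
    @table      = table
    @conditions = conditions
  end

  def where(conds)
    CountingRelation.new(table, conditions.merge(conds))
  end

  # The key depends on the table and column names only, never on the values.
  def cache_key
    [table, conditions.keys]
  end
end

SQL_CACHE = {}

# Skips the "relation -> AST" and "AST -> SQL" steps on a cache hit.
def sql_for(relation)
  SQL_CACHE[relation.cache_key] ||= begin
    clause = relation.conditions.keys.map { |c| "#{c} = ?" }.join(" AND ")
    "SELECT * FROM #{relation.table} WHERE #{clause}"
  end
end

a = CountingRelation.new("posts").where(author_id: 1).where(published: true)
b = CountingRelation.new("posts").where(author_id: 2).where(published: false)

puts sql_for(a)
puts sql_for(b).equal?(sql_for(a)) # same cached String object
puts CountingRelation.allocations
```

Both queries share one cached SQL string, but each chain still allocated three relation objects just to produce its cache key. The finder form never pays that cost.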

When can I use this?

You can try the code now by using the adequaterecord branch on GitHub. I think we will merge this code to the master branch after Rails 4.1 has been released.

What’s next?

Before merging this to master, I’d like to do this:

  1. The current incarnation of AdequateRecord needs to be refactored a bit. I have finished the “red” and “green” phases, and now it’s time for the “refactor” step.
  2. The cache should probably be an LRU. Right now, it just caches all of the things, when we should probably be smarter about cache expiry. The cache should be bounded by number of tables and combination of columns, but that may get too large.
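For the LRU idea in item 2, a minimal sketch could lean on the fact that Ruby's Hash preserves insertion order. This is an assumption about how such a cache might be built, not what AdequateRecord does today; ToyLRUCache and its methods are hypothetical.

```ruby
# Minimal LRU cache built on Ruby's insertion-ordered Hash.
# The oldest entry is always first, so Hash#shift evicts it.
class ToyLRUCache
  def initialize(max_size)
    @max_size = max_size
    @store    = {}
  end

  # Returns the cached value for key, computing it with the block on a miss.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key) # re-inserted below to mark it most recent
    else
      value = yield
      @store.shift if @store.size >= @max_size # drop the least recently used
    end
    @store[key] = value
  end

  def keys
    @store.keys
  end
end

cache = ToyLRUCache.new(2)
cache.fetch([:posts, :id])   { "SELECT * FROM posts WHERE id = ?" }
cache.fetch([:posts, :name]) { "SELECT * FROM posts WHERE name = ?" }
cache.fetch([:posts, :id])   { raise "should be a cache hit" }
cache.fetch([:users, :id])   { "SELECT * FROM users WHERE id = ?" }
p cache.keys
```

With a limit of two entries, the [:posts, :name] statement is evicted because [:posts, :id] was touched more recently. The key here is the table and column combination, which matches the bound described above.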

After merging to master I’d like to start exploring how we can integrate this cache into the “chained where” form.

On A Personal Note

Feel free to quit reading now. :-)

The truth is, I’ve been yak shaving on this performance improvement for years. I knew it was possible in theory, but the code was too complex. Finally I’ve paid off enough technical debt that I was able to make this improvement a reality. Working on this code was at times extremely depressing. Paying technical debt is really not fun, but at least it is very challenging. Some time I will blurrrgh about it, but not today!

Thanks to work (AT&T) for giving me the time to do this. I think we can make the next release of Rails (the release after 4.1) the fastest version ever.

EDIT: I forgot to add that the newer Post.find_by(name: name) syntax is supported, so I put it in the examples.