October - 2011 - Tender Lovemaking | tenderlovemaking.com

2011-10-20 @ 12:14

Connection Management in ActiveRecord

OMG! Happy Thursday! I am trying to be totally enthusiastic, but the truth is that I have a cold, so there will be fewer uppercase letters and exclamation points than usual.

Anyway, I want to talk about database connection management in ActiveRecord. I am not too pleased with its current state of affairs. I would like to describe how ActiveRecord connection management works today, how I think it should work, and steps towards fixing the current system.

TL;DR: database connection API in ActiveRecord should be more similar to File API

Thinking in terms of files

It's convenient to think of our database connection as a file. Dealing with files is very common. When we work with files, the basic sequence goes something like this:

Open the file
Do some work on the file handle
Close the file

We're very used to doing these steps when dealing with files. Typically our code will look something like this:

1
2
3

File.open('somefile.txt', 'wb') do |fh| # Open the file
  fh.write "hello world"                # Do some work with the file
end                                     # Close file when block returns

We don't want to share open files among threads because dealing with synchronization around reading and writing to the file is too difficult (and time consuming). So maybe we'll store the handle in a thread local or something until we're ready to close it.

Our basic requirements for dealing with a database connection are essentially the same as when dealing with files. We need to open our database connection, do some work with the connection (send and receive queries), and close the connection. We have these similarities, yet the API for dealing with database connections in ActiveRecord is vastly different. Let's look at how each of these steps are performed in ActiveRecord today.

Opening a connection

Opening a connection to the database is very easy. First we configure ActiveRecord with the database specification, then we call connection to actually get back a database handle:

ActiveRecord::Base.establish_connection(
  :adapter  => "sqlite",
  :database => "path/to/dbfile")

connection_handle = ActiveRecord::Base.connection

The main difference between this API and the File API is that we've separated the connection specification from actually opening the connection. In the case of opening a file, we call open along with a "specification" which includes the file name and how we want to open it. In this case, we've separated the two; essentially storing the specification in a global place, then opening the connection later.

This leads to two questions:

Where is the specification stored?
When I call connection, what specification is used?

The answer to the first question can be found by reading the establish_connection method. Specifically if we look at line 63 we'll find a clue. Since this method is a class method, the call to name returns the class name of the recipient. This name (along with our actual spec) is passed in to the connection handler object. If we jump through a few more layers of indirection, we'll find that what we have is essentially a one to one mapping of class name to connection specification.

Armed with this information, we can tackle the second question. If we look at the implementation of connection, it calls retrieve_connection on itself, which calls retrieve_connection on the connection handler with itself. A few more method calls later, and we see that each ActiveRecord subclass walks up the inheritance tree looking for a connection:

def retrieve_connection_pool(klass)
  pool = @connection_pools[klass.name]
  return pool if pool
  return nil if ActiveRecord::Base == klass
  retrieve_connection_pool klass.superclass
end

If we read this code carefully, we'll notice that not only are connection specifications mapped to classes so are database connections!

Why is this bad?

This behavior smells bad to me. The reason is because we're tightly coupling classes along with database connections when really this relationship doesn't need to exist.

How can it be improved?

If this tight coupling is removed, the complexity of ActiveRecord can be reduced and at the same time increasing the features available! The way we can reduce this coupling is by passing the connection specification to the method that actually opens the connection. Specifications can be stored on each class as a convenience, but nothing more.

What if opening a connection looked more like this?

spec = ActiveRecord::Base.specificiation
ActiveRecord::ConnectionPool.open(spec) do |conn|
  ...
end

We could maintain the current behavior by storing specifications on each class, but eliminate the coupling between connection and class. We would be able to delete all of the code that looks up connections by class hierarchy, and open the doors to having features like this:

spec = database_a
ActiveRecord::ConnectionPool.open(spec) do |conn|
  User.find_all
end

spec = database_b
ActiveRecord::ConnectionPool.open(spec) do |conn|
  User.find_all
end

Working with the connection

Working with our connection should remain the same. We have one place to retrieve our connection and work with it. Woo!

Dealing with thread safety

Sharing open file handles among threads probably isn't a good idea and the same can be said about open database connections. So how does ActiveRecord keep connections localized to one thread? If we jump through many, many, method calls, we'll find where the connection is actually checked out of the connection pool. It is here we see how thread safety is handled:

# Retrieve the connection associated with the current thread, or call
# #checkout to obtain one if necessary.
#
# #connection can be called any number of times; the connection is
# held in a hash keyed by the thread id.
def connection
  @reserved_connections[current_connection_id] ||= checkout
end

A hash is kept where the key is the current_connection_id. The implementation of current_connection_id looks up the current id. If the id isn't set, it sets it to the object id of the current thread:

1
2
3

def current_connection_id #:nodoc:
  ActiveRecord::Base.connection_id ||= Thread.current.object_id
end

Next we look at the implementation of connection_id to find that it just gets and sets a thread local:

def connection_id
  Thread.current['ActiveRecord::Base.connection_id']
end

def connection_id=(connection_id)
  Thread.current['ActiveRecord::Base.connection_id'] = connection_id
end

These methods ensure that we have a one to one relationship of open connection and thread.

Closing the connection

Finally we reach our last step: closing the connection. How many of you have closed your connection to the database in ActiveRecord? My guess is that it's very few. I think the reason people don't typically close their connections with ActiveRecord is twofold. One, you don't have to because it just does it for you, and two, the API to close a particular connection is pretty convoluted.

So how is the connection closed today? There are two ways, the easy way and the hard way.

The easy way

The easy way is good enough in a non-threaded application. A rack middleware clears out all of the connections at the end of the request. The source for clear_active_connections! is pretty simple. For each connection pool in the system (remember it's one pool per AR class and connection spec), release that connection:

# Returns any connections in use by the current thread back to the pool,
# and also returns connections to the pool cached by threads that are no
# longer alive.
def clear_active_connections!
  @connection_pools.each_value {|pool| pool.release_connection }
end

Each pool releases the connection it has using the current_connection_id (which happens to be the current thread id):

# Signal that the thread is finished with the current connection.
# #release_connection releases the connection-thread association
# and returns the connection to the pool.
def release_connection(with_id = current_connection_id)
  conn = @reserved_connections.delete(with_id)
  checkin conn if conn
end

Not bad. But what if our system has multiple threads?

The hard way

Believe it or not, the connection pool in ActiveRecord will check in connections in the checkout method. Let me say that again: the checkout method checks in connections and checks out connections. If you're not facepalming yet, let's look at a small part of the checkout method:

@queue.wait(@timeout)

if(@checked_out.size < @connections.size)
  next
else
  clear_stale_cached_connections!
  if @size == @checked_out.size
    raise ConnectionTimeoutError, "could not obtain a database connection#{" within #{@timeout} seconds" if @timeout}. The max pool size is currently #{@size}; consider increasing it."
  end
end

This bit of the checkout method is not called unless our connection pool has become full. First we wait for other threads to check in their connection. While we're waiting, if other threads checked in their connection, the first branch of the if statement executes, and a connection is returned. If no threads have checked in their connection, we call clear_stale_cached_connections!:

def clear_stale_cached_connections!
  keys = @reserved_connections.keys - Thread.list.find_all { |t|
    t.alive?
  }.map { |thread| thread.object_id }
  keys.each do |key|
    checkin @reserved_connections[key]
    @reserved_connections.delete(key)
  end
end

This method walks through every thread in your system, looking for connections that were allocated to threads that no longer exist. Then it checks in connections associated with those dead threads. Since there is really no easy way for users to check in their own connections, this is actually a common code path for systems that use threads.

Why is this bad?

It should be pretty clear why this behavior is bad. Walking through every thread in the system, and asking if it's alive isn't very cheap. Even worse is that we're coupling ourselves to the threading system. We cannot change the connection pool to work with other concurrency solutions (like Fibers) because those solutions may not give us the introspection we need to perform this operation!

But really, this is treating a symptom. The real problem is that checking in connections is too difficult, so people don't do it.

How can we fix this?

I think the best solution for this is to mimic the File API. If we do this, it will become natural for people dealing with the database connection to actually close the connection.

We should make ActiveRecord::Base.connection consult a thread local. That thread local is set in the rack middleware where the connection is opened. If someone creates a new thread, they must populate that thread local, and close the connection at the end of the thread.

Simplified, our middleware would become something like this:

class ConnectionManagement
  def call env
    spec       = ActiveRecord::Base.spec
    connection = ActiveRecord::ConnectionPool.open spec
    ActiveRecord::Base.connection = connection

    @app.call env

    connection.close
  end
end

When people create a new thread, it would look something like this:

Thread.new do
  spec = ActiveRecord::Base.spec
  ActiveRecord::ConnectionPool.open(spec) do |connection|
    ActiveRecord::Base.connection = connection

    # do some stuff
  end
end

What does this buy us?

This buys us two important things: simple connection pool management, and freedom of choice on our concurrency model.

omg the end.

I hope I've convinced you that by simply learning to treat our database connection like a file, we can reduce code complexity and at the same time increase the features available. I think I can add this feature to Rails 3.2 and mostly maintain backwards compatibility. I think we can keep 100% backwards compatibility if we add some sort of flag like config.i_suck_and_will_not_close_my_database_connections = true or, config.my_app_is_awesome = true.

Anyway, I'm totally sick and I'll stop blllluuurrrrggghhhing now.

<3 <3 <3 <3 <3