Category: rails

Connection Management in ActiveRecord

Posted by – October 20, 2011

OMG! Happy Thursday! I am trying to be totally enthusiastic, but the truth is that I have a cold, so there will be fewer uppercase letters and exclamation points than usual.

Anyway, I want to talk about database connection management in ActiveRecord. I am not too pleased with its current state of affairs. I would like to describe how ActiveRecord connection management works today, how I think it should work, and steps towards fixing the current system.

TL;DR: The database connection API in ActiveRecord should be more like the File API.

Thinking in terms of files

It’s convenient to think of our database connection as a file. Dealing with files is very common. When we work with files, the basic sequence goes something like this:

  • Open the file
  • Do some work on the file handle
  • Close the file

We’re very used to doing these steps when dealing with files. Typically our code will look something like this:

File.open('somefile.txt', 'wb') do |fh| # Open the file
  fh.write "hello world"                # Do some work with the file
end                                     # Close file when block returns
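
To make the open, work, close sequence concrete, here is a small sketch that compares the block form with the manual form. The helper names write_with_block and read_manually and the scratch path are made up for this illustration:

```ruby
require 'tmpdir'

# Block form: Ruby closes the handle when the block returns or raises.
def write_with_block(path)
  handle = nil
  File.open(path, 'wb') do |fh|
    fh.write "hello world"
    handle = fh
  end
  handle
end

# Manual form: the three steps spelled out, with ensure doing the close.
def read_manually(path)
  fh = File.open(path, 'rb')   # Open the file
  contents = fh.read           # Do some work with the file
  [fh, contents]
ensure
  fh.close if fh               # Close the file
end

path = File.join(Dir.mktmpdir, 'somefile.txt')  # scratch location for the demo
written      = write_with_block(path)
fh, contents = read_manually(path)

puts written.closed?  # true
puts fh.closed?       # true
puts contents         # hello world
```

Either way the handle ends up closed; the block form just makes it hard to forget.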

We don’t want to share open files among threads because dealing with synchronization around reading and writing to the file is too difficult (and time consuming). So maybe we’ll store the handle in a thread local or something until we’re ready to close it.

Our basic requirements for dealing with a database connection are essentially the same as when dealing with files. We need to open our database connection, do some work with the connection (send and receive queries), and close the connection. We have these similarities, yet the API for dealing with database connections in ActiveRecord is vastly different. Let’s look at how each of these steps is performed in ActiveRecord today.

Opening a connection

Opening a connection to the database is very easy. First we configure ActiveRecord with the database specification, then we call connection to actually get back a database handle:

ActiveRecord::Base.establish_connection(
  :adapter  => "sqlite",
  :database => "path/to/dbfile")

connection_handle = ActiveRecord::Base.connection

The main difference between this API and the File API is that we’ve separated the connection specification from actually opening the connection. In the case of opening a file, we call open along with a “specification” which includes the file name and how we want to open it. In this case, we’ve separated the two; essentially storing the specification in a global place, then opening the connection later.

This leads to two questions:

  1. Where is the specification stored?
  2. When I call connection, what specification is used?

The answer to the first question can be found by reading the establish_connection method. Specifically, if we look at line 63 we’ll find a clue. Since this method is a class method, the call to name returns the class name of the recipient. This name (along with our actual spec) is passed in to the connection handler object. If we jump through a few more layers of indirection, we’ll find that what we have is essentially a one-to-one mapping of class name to connection specification.

Armed with this information, we can tackle the second question. If we look at the implementation of connection, it calls retrieve_connection on itself, which calls retrieve_connection on the connection handler with itself. A few more method calls later, and we see that each ActiveRecord subclass walks up the inheritance tree looking for a connection:

def retrieve_connection_pool(klass)
  pool = @connection_pools[klass.name]
  return pool if pool
  return nil if ActiveRecord::Base == klass
  retrieve_connection_pool klass.superclass
end

If we read this code carefully, we’ll notice that not only are connection specifications mapped to classes, but so are database connections!
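
Here is a toy model of that lookup, small enough to run on its own. ToyHandler, Base, User, Admin and AuditLog are hypothetical names, and symbols stand in for real connection pools; only the retrieve_connection_pool logic mirrors the method quoted above:

```ruby
# Toy model of the class-name-to-pool lookup. Nothing here is ActiveRecord.
class ToyHandler
  def initialize
    @connection_pools = {}   # class name => pool (a symbol stands in here)
  end

  def establish_connection(klass, pool)
    @connection_pools[klass.name] = pool
  end

  # Same shape as the method above: check this class, then walk upwards.
  def retrieve_connection_pool(klass)
    pool = @connection_pools[klass.name]
    return pool if pool
    return nil if Base == klass
    retrieve_connection_pool klass.superclass
  end
end

class Base; end
class User < Base; end
class Admin < User; end
class AuditLog < Base; end

handler = ToyHandler.new
handler.establish_connection(Base, :primary_pool)
handler.establish_connection(AuditLog, :audit_pool)

p handler.retrieve_connection_pool(Admin)     # :primary_pool, inherited from Base
p handler.retrieve_connection_pool(AuditLog)  # :audit_pool, registered on AuditLog
```

The class is the only key we have, so the only way to point Admin at a different database is to register another entry under its name.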

Why is this bad?

This behavior smells bad to me. The reason is that we’re tightly coupling classes to database connections when this relationship doesn’t need to exist.

How can it be improved?

If this tight coupling is removed, the complexity of ActiveRecord can be reduced while the number of available features increases! The way we can reduce this coupling is by passing the connection specification to the method that actually opens the connection. Specifications can be stored on each class as a convenience, but nothing more.

What if opening a connection looked more like this?

spec = ActiveRecord::Base.specification
ActiveRecord::ConnectionPool.open(spec) do |conn|
  ...
end

We could maintain the current behavior by storing specifications on each class, but eliminate the coupling between connection and class. We would be able to delete all of the code that looks up connections by class hierarchy, and open the doors to having features like this:

spec = database_a
ActiveRecord::ConnectionPool.open(spec) do |conn|
  User.find_all
end

spec = database_b
ActiveRecord::ConnectionPool.open(spec) do |conn|
  User.find_all
end

Working with the connection

Working with our connection should remain the same. We have one place to retrieve our connection and work with it. Woo!

Dealing with thread safety

Sharing open file handles among threads probably isn’t a good idea, and the same can be said about open database connections. So how does ActiveRecord keep connections localized to one thread? If we jump through many, many method calls, we’ll find where the connection is actually checked out of the connection pool. It is here we see how thread safety is handled:

# Retrieve the connection associated with the current thread, or call
# #checkout to obtain one if necessary.
#
# #connection can be called any number of times; the connection is
# held in a hash keyed by the thread id.
def connection
  @reserved_connections[current_connection_id] ||= checkout
end

A hash is kept where the key is the current_connection_id. The implementation of current_connection_id looks up the current id. If the id isn’t set, it sets it to the object id of the current thread:

def current_connection_id #:nodoc:
  ActiveRecord::Base.connection_id ||= Thread.current.object_id
end

Next we look at the implementation of connection_id to find that it just gets and sets a thread local:

def connection_id
  Thread.current['ActiveRecord::Base.connection_id']
end

def connection_id=(connection_id)
  Thread.current['ActiveRecord::Base.connection_id'] = connection_id
end

These methods ensure that we have a one-to-one relationship between open connections and threads.
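
A stripped-down sketch of that relationship follows. ToyPool is hypothetical, strings stand in for connections, and the mutex is my own addition so that the toy is safe to call from several threads:

```ruby
# Toy pool that reserves one "connection" per thread, keyed by thread id.
class ToyPool
  def initialize
    @reserved_connections = {}
    @lock    = Mutex.new
    @counter = 0
  end

  # Same idea as #connection above: reuse this thread's entry or check one out.
  def connection
    @lock.synchronize do
      @reserved_connections[Thread.current.object_id] ||= checkout
    end
  end

  private

  def checkout
    @counter += 1
    "connection-#{@counter}"
  end
end

pool = ToyPool.new
main_first  = pool.connection
main_second = pool.connection
other       = Thread.new { pool.connection }.value

p main_first.equal?(main_second)  # true: same thread, same connection
p main_first == other             # false: another thread gets its own
```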

Closing the connection

Finally we reach our last step: closing the connection. How many of you have closed your connection to the database in ActiveRecord? My guess is that it’s very few. I think the reason people don’t typically close their connections with ActiveRecord is twofold. One, you don’t have to because it just does it for you, and two, the API to close a particular connection is pretty convoluted.

So how is the connection closed today? There are two ways, the easy way and the hard way.

The easy way

The easy way is good enough in a non-threaded application. A rack middleware clears out all of the connections at the end of the request. The source for clear_active_connections! is pretty simple. For each connection pool in the system (remember it’s one pool per AR class and connection spec), release that connection:

# Returns any connections in use by the current thread back to the pool,
# and also returns connections to the pool cached by threads that are no
# longer alive.
def clear_active_connections!
  @connection_pools.each_value {|pool| pool.release_connection }
end

Each pool releases the connection it has using the current_connection_id (which happens to be the current thread id):

# Signal that the thread is finished with the current connection.
# #release_connection releases the connection-thread association
# and returns the connection to the pool.
def release_connection(with_id = current_connection_id)
  conn = @reserved_connections.delete(with_id)
  checkin conn if conn
end

Not bad. But what if our system has multiple threads?

The hard way

Believe it or not, the connection pool in ActiveRecord will check in connections in the checkout method. Let me say that again: the checkout method checks in connections and checks out connections. If you’re not facepalming yet, let’s look at a small part of the checkout method:

@queue.wait(@timeout)

if(@checked_out.size < @connections.size)
  next
else
  clear_stale_cached_connections!
  if @size == @checked_out.size
    raise ConnectionTimeoutError, "could not obtain a database connection#{" within #{@timeout} seconds" if @timeout}. The max pool size is currently #{@size}; consider increasing it."
  end
end

This bit of the checkout method is not called unless our connection pool has become full. First we wait for other threads to check in their connection. While we’re waiting, if other threads checked in their connection, the first branch of the if statement executes, and a connection is returned. If no threads have checked in their connection, we call clear_stale_cached_connections!:

def clear_stale_cached_connections!
  keys = @reserved_connections.keys - Thread.list.find_all { |t|
    t.alive?
  }.map { |thread| thread.object_id }
  keys.each do |key|
    checkin @reserved_connections[key]
    @reserved_connections.delete(key)
  end
end

This method walks through every thread in your system, looking for connections that were allocated to threads that no longer exist. Then it checks in connections associated with those dead threads. Since there is really no easy way for users to check in their own connections, this is actually a common code path for systems that use threads.
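
The following sketch shows the leak and the cleanup in isolation. LeakyPool is a hypothetical toy, and plain objects stand in for connections; only clear_stale_cached_connections! follows the logic of the method above:

```ruby
# Toy pool that shows a connection outliving the thread that reserved it.
class LeakyPool
  attr_reader :reserved_connections, :available

  def initialize
    @reserved_connections = {}
    @available = []
    @lock = Mutex.new
  end

  def connection
    @lock.synchronize do
      @reserved_connections[Thread.current.object_id] ||= Object.new
    end
  end

  def checkin(conn)
    @available << conn
  end

  # Check in every connection whose owning thread is no longer alive.
  def clear_stale_cached_connections!
    alive = Thread.list.find_all { |t| t.alive? }.map { |t| t.object_id }
    (@reserved_connections.keys - alive).each do |key|
      checkin @reserved_connections.delete(key)
    end
  end
end

pool = LeakyPool.new
Thread.new { pool.connection }.join   # the thread exits without checking in

p pool.reserved_connections.size  # 1: still reserved for the dead thread
pool.clear_stale_cached_connections!
p pool.reserved_connections.size  # 0
p pool.available.size             # 1
```

Nothing in the worker thread asked for cleanup; the pool only finds out by scanning Thread.list.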

Why is this bad?

It should be pretty clear why this behavior is bad. Walking through every thread in the system, and asking if it’s alive isn’t very cheap. Even worse is that we’re coupling ourselves to the threading system. We cannot change the connection pool to work with other concurrency solutions (like Fibers) because those solutions may not give us the introspection we need to perform this operation!

But really, this is treating a symptom. The real problem is that checking in connections is too difficult, so people don’t do it.

How can we fix this?

I think the best solution for this is to mimic the File API. If we do this, it will become natural for people dealing with the database connection to actually close the connection.

We should make ActiveRecord::Base.connection consult a thread local. That thread local is set in the rack middleware where the connection is opened. If someone creates a new thread, they must populate that thread local, and close the connection at the end of the thread.

Simplified, our middleware would become something like this:

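class ConnectionManagement
  def initialize app
    @app = app
  end

  def call env
    spec       = ActiveRecord::Base.spec
    connection = ActiveRecord::ConnectionPool.open spec
    ActiveRecord::Base.connection = connection

    @app.call env                   # the Rack response is our return value
  ensure
    connection.close if connection  # close even if the app raises
  end
end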

When people create a new thread, it would look something like this:

Thread.new do
  spec = ActiveRecord::Base.spec
  ActiveRecord::ConnectionPool.open(spec) do |connection|
    ActiveRecord::Base.connection = connection

    # do some stuff
  end
end
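
To show what a File-style open could look like, here is a minimal sketch. ToyConnection and ToyConnectionPool are hypothetical stand-ins for the API proposed above; none of this exists in Rails:

```ruby
# A pretend connection that only tracks whether it is open.
class ToyConnection
  def initialize
    @open = true
  end

  def open?
    @open
  end

  def close
    @open = false
  end
end

class ToyConnectionPool
  # With a block: yield the connection and close it when the block exits,
  # even if the block raises. Without a block: the caller must call #close.
  # The spec is accepted but ignored in this toy.
  def self.open(spec)
    connection = ToyConnection.new
    return connection unless block_given?

    begin
      yield connection
    ensure
      connection.close
    end
  end
end

seen = nil
result = ToyConnectionPool.open(:adapter => "sqlite") do |conn|
  seen = conn
  conn.open?   # true while we are inside the block
end

p result      # true
p seen.open?  # false: closed as soon as the block returned
```

This is the same shape as File.open: the block's value is returned, and the cleanup lives in exactly one place.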

What does this buy us?

This buys us two important things: simple connection pool management, and freedom of choice on our concurrency model.

omg the end.

I hope I’ve convinced you that by simply learning to treat our database connection like a file, we can reduce code complexity and at the same time increase the features available. I think I can add this feature to Rails 3.2 and mostly maintain backwards compatibility. I think we can keep 100% backwards compatibility if we add some sort of flag like config.i_suck_and_will_not_close_my_database_connections = true or, config.my_app_is_awesome = true.

Anyway, I’m totally sick and I’ll stop blllluuurrrrggghhhing now.

<3 <3 <3 <3 <3

Rack API is awkward

Posted by – March 3, 2011

TL;DR: Rack API is poor when you consider streaming response bodies.

ZOMG!!!! HAPPY THURSDAY!!!! Maybe I shouldn’t be so excited now. I want to talk about stuff I’ve been working on in Rails 3.1, and problems I’m encountering today. I want to use this blllurrrggghhh blog post to talk through the problems I’ve been having, and to share the pain with others.

Pie is delicious!

One feature that would be useful to add to Rails is having a streaming response body. When Rails processes a response, the entire response is buffered in memory before it can be sent to the user. Some information like Content Length (among other things) is derived, and the response is sent.

Sometimes buffering a response is less than ideal. It would be nice if we could send the head tag along with any css or script includes to the browser as quickly as possible. Then the browser can download external resources while we’re still processing data on the server. If this were possible, total response time may remain the same, but the time to first byte would be decreased and the page would load faster, as external resources can be downloaded in parallel.

This feature sounds great, but there are many things to think about before it can be implemented. We need to support infinite streams, chunked encoding, prevent header manipulation, ensure database connections, blah, blah blah.

Rack interface

I’m getting ahead of myself. Before we get to our ultimate “pie in the sky” streaming solution, let’s take a look at the Rack API. Rack defines an interface for writing web applications. A rack handler must respond to call which takes one parameter, the request environment. call must return a three item list of:

  • Response code
  • Headers
  • Body

The response code should be a number (like 200), and the headers are a hash (like { 'X-Omg' => 'hello!' }). The body must respond to each, which takes a block and yields strings to it; each string is output to the client. Optionally, the body may respond to close, and rack will call close when output is complete.

An Example Rack application

Let’s write an example application. Our sample application will simulate an ERb page. We’ll add some sleep statements to simulate work happening during the ERb rendering process:

class FooApplication
  class ErbPage
    def to_a
      head = "the head tag"
      sleep(2)
      body = "the body tag"
      sleep(2)
      [head, body]
    end
  end

  def call(env)
    [200, {}, ErbPage.new.to_a]
  end
end

For the purposes of demonstration, we’ll be using a fake implementation of rack:

class FakeRack
  def serve(application)
    status, headers, body = application.call({})
    p :status  => status
    p :headers => headers

    body.each do |string|
      p string
    end

    body.close if body.respond_to?(:close)
  end
end

If we feed our application through FakeRack like this:

app  = FooApplication.new
rack = FakeRack.new

rack.serve app

We’ll see output from the rack application, and the total program run time is about 4 seconds:

$ time ruby foo.rb
{:status=>200}
{:headers=>{}}
"the head tag"
"the body tag"

real    0m4.008s
user    0m0.003s
sys     0m0.003s

Great! So far, no problem. Why don’t we add a middleware to time how long the response takes?

Rack Middleware

Rack Middleware is simply another Rack application. With Rack, we set up a linked list of middleware that eventually points to the real application. We give the head of the linked list to Rack, Rack calls call on the head of the list, and it is each link’s responsibility to call call on the next link.

Here, we’ll write a Rack middleware to measure how long the “ERb render” takes and add a header indicating the response time.

class ResponseTimer
  def initialize(app)
    @app = app
  end

  def call(env)
    now                        = Time.now
    status, headers, body      = @app.call(env)
    headers['X-Response-Took'] = Time.now - now

    [status, headers, body]
  end
end

When we construct the ResponseTimer, we pass it the real application. Then we pass the response timer instance to rack:

app   = FooApplication.new
timer = ResponseTimer.new app
rack  = FakeRack.new

rack.serve timer

When rack calls call on the response timer, it records the current time, then calls call on the real application. When the real application returns, the response timer then adds a header with the time delta. The output of this program will look like this:

$ time ruby foo.rb
{:status=>200}
{:headers=>{"X-Response-Took"=>3.999937}}
"the head tag"
"the body tag"

real    0m4.010s
user    0m0.004s
sys     0m0.004s

Speeding up our response time

We’ve noticed a problem with our Rack application. When a client connects, it takes 4 seconds before they receive any data! It would be nice if we could feed our client the head tag ASAP so they can download external resources.

We know that Rack will call each and (depending on your webserver) immediately send data to the client. Rather than computing values in ERb ahead of time, we’ll compute them when Rack asks for them (when each is called).

Let’s refactor the ERb page to be lazy about calculating values:

class FooApplication
  class ErbPage
    def each
      head = "the head tag"
      yield head

      sleep(2)

      body = "the body tag"
      yield body

      sleep(2)
    end
  end

  def call(env)
    [200, {}, ErbPage.new]
  end
end

Now no values are calculated until rack calls each on our body. If we run the program, we’ll see output from the application more quickly than before.

However, the output is somewhat strange:

$ time ruby foo.rb
{:status=>200}
{:headers=>{"X-Response-Took"=>1.1e-05}}
"the head tag"
"the body tag"

real    0m4.032s
user    0m0.027s
sys     0m0.016s

The time command reports that our response was about 4 seconds. But our response header says that the response took nearly 0 seconds! Why is this?

If we look closely at our timer middleware, we can see it is only timing how long it took for call to return.

We cannot guarantee that any processing happened during the call method.

Let me say that again:

We cannot guarantee that any processing happened during the call method.

We wanted our response timer to time how long the ERb took to render, but really it is just timing how long the call method took.

ZOMG HOW FIX?!?

Iterating over the body

One way to fix this is to iterate over the body. If the timer iterates over the body, then we can calculate the real time:

class ResponseTimer
  def initialize(app)
    @app = app
  end

  def call(env)
    now                        = Time.now
    status, headers, body      = @app.call(env)

    newbody = []
    body.each { |str| newbody << str }
    headers['X-Response-Took'] = Time.now - now

    [status, headers, newbody]
  end
end

But this solution is no good! Our response timer now buffers the response, and our client ends up waiting for 4 seconds before they get any data.

We know that Rack calls close on the body after it’s done processing the request. Why don’t we try hooking on that method?

Introducing a Proxy Object

One way we can hook on to the close method is by wrapping the response body in a proxy object. Then we can intercept calls made on the body and perform any work we need done:

class ResponseTimer
  class TimerProxy
    def initialize(body)
      @now     = Time.now
      @body    = body
    end

    def close
      @body.close if @body.respond_to?(:close)

      $stderr.puts({'X-Response-Took' => (Time.now - @now)})
    end

    def each(&block)
      @body.each(&block)
    end
  end

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)

    [status, headers, TimerProxy.new(body)]
  end
end

Wow! Suddenly our middleware is not so simple. This proxy solution is sub-optimal for a few reasons. We’re required to make a new object for every request, and our proxy object will add another stack frame between calls from rack to the response body. Even worse, every middleware that needs to do work after the response is finished must define this proxy object.

This solution does get the job done. If we look at the output from the program, we’ll see that the TimerProxy in fact measures ERb processing time correctly:

$ time ruby foo.rb
{:status=>200}
{:headers=>{}}
"the head tag"
"the body tag"
{"X-Response-Took"=>4.000268}

real    0m4.044s
user    0m0.029s
sys     0m0.015s

Diligent readers will note that the response time is no longer part of the response headers. This is because when the body is flushed, the headers must be flushed too. We no longer have the opportunity to add extra headers when each is called on the body.

Our solution isn’t too bad, but it actually isn’t complete. The full awkwardness of this API along with a complete solution can actually be felt (and read) in the Rack source itself.
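
As a rough idea of what a more complete proxy has to handle, here is a sketch of a reusable wrapper. CallbackBody is a hypothetical class written for this post, not Rack’s own code. It takes the “after close” work as a block, makes sure that work runs even if the wrapped body’s close raises, ignores repeated close calls, and forwards everything else to the wrapped body:

```ruby
# Hypothetical reusable proxy: runs a callback once, when the body is closed.
class CallbackBody
  def initialize(body, &on_close)
    @body     = body
    @on_close = on_close
    @closed   = false
  end

  def each(&block)
    @body.each(&block)
  end

  def close
    return if @closed
    @closed = true
    begin
      @body.close if @body.respond_to?(:close)
    ensure
      @on_close.call   # runs even if the wrapped body's close raised
    end
  end

  # Forward anything else (to_path, size, ...) to the wrapped body.
  def respond_to_missing?(name, include_private = false)
    @body.respond_to?(name, include_private) || super
  end

  def method_missing(name, *args, &block)
    if @body.respond_to?(name)
      @body.public_send(name, *args, &block)
    else
      super
    end
  end
end

finished = []
body = CallbackBody.new(["the head tag", "the body tag"]) { finished << :done }
body.each { |chunk| puts chunk }
body.close
body.close   # the second call is a no-op
p finished   # [:done]
p body.size  # 2, forwarded to the wrapped array
```

With something like this in one place, each middleware only supplies a block instead of defining its own proxy class.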

Lady Gaga Solution

Another possible solution is to decorate the body using a module. We can define a module, then simply call extend on the body with the module:

class ResponseTimer
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body      = @app.call(env)
    body.extend(Module.new {
      now = Time.now

      define_method(:close) do
        super() if defined?(super)

        $stderr.puts({'X-Response-Took' => (Time.now - now)})
      end
    })

    [status, headers, body]
  end
end

The body is extended with an anonymous module. During module definition, the time is recorded. We use define_method because it takes a block, and the block is a closure that keeps a reference to the previously recorded time. In the close method, we call super if it’s defined, then output our time.

This example also works, but has a few downsides. It differs from the previous examples because we are timing only the ERb rendering and not call plus ERb rendering. Using this solution, we’re required to create a new module on every request, and also break method caching on every request. Similar to the proxy object solution, we must create a new module and call extend for every middleware that must do processing after the response is finished.

ZOMG YOUR EXAMPLE IS CONTRIVED

Yup. But I merely simplified a real world problem. As I mentioned earlier, you can see the awkwardness of this API in rack.

But now that we know about this problem, we can identify middleware that will break streaming responses. For example, Rails defines a middleware that checks connections back in to the connection pool. If our ERb in Rails was streaming, we would lose the database connection during ERb render. The same is true with the query cache in active record. Surely, these cannot be the only middleware that will break when a streaming body is used!

Lifecycle hooks

I think a good solution to this problem would be for Rack to provide lifecycle hooks: a place where we can say “run this when the response is done”. We can define something like that today using middleware:

class EndOfLife
  attr_reader :callbacks

  def initialize(app)
    @app       = app
    @callbacks = []
  end

  def call(env)
    status, headers, body = @app.call(env)
    body.extend(Module.new {
      attr_accessor :eol

      def close
        super if defined?(super)
        eol.callbacks.each { |cb| cb.call }
      end
    })
    body.eol = self

    [status, headers, body]
  end
end

app = FooApplication.new
eol = EndOfLife.new app
eol.callbacks << lambda { puts "it finished!" }

rack  = FakeRack.new

rack.serve eol

This keeps us from defining many proxy objects or module extensions during a response. We only define one module extension, and hook any “end of life” hooks on to this instance. The downside is that we cannot guarantee the position of this middleware in the middleware linked list. That means that the “end of life” middleware may not actually execute at the end of the response!
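
The ordering problem can be seen in a tiny sketch. The helper log_after_close and the labels are hypothetical; it assumes every middleware wraps close with the module trick shown earlier, and that middleware nearer the application gets to extend the body first:

```ruby
# Hypothetical helper: wrap body.close so that `label` is logged after it.
def log_after_close(body, label, log)
  body.extend(Module.new {
    define_method(:close) do
      super() if defined?(super)   # explicit parens are required in define_method
      log << label
    end
  })
  body
end

log  = []
body = ["the head tag", "the body tag"]

# Middleware nearest the application sees the body first...
log_after_close(body, :end_of_life_callbacks, log)
# ...and middleware further out wraps it afterwards.
log_after_close(body, :outer_middleware_cleanup, log)

body.close
p log  # [:end_of_life_callbacks, :outer_middleware_cleanup]
```

The “end of life” callbacks fire while the outer middleware still has work left to do, so they are not really the last thing that happens in the response.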

A “real” solution

Rack’s interface is simple, and I like that. The simplicity is attractive, but the API seems to fall on its face when we start talking about streaming web servers. If I remember correctly, Apache 1.0 modules suffered the same problems that Rack is presenting us today. Maybe we should look at Apache 2.0 buckets and filters and design our API using patterns from a project that has already solved this problem.

ZOMG I AM TIRED OF TYPING!!

I’m not happy with any of the solutions I’ve presented. All of them have downsides that I find unattractive. We can live with the downsides, but life will suck. If any of you dear readers have better solutions for me, I am all ears!

Thanks for listening, and HAVE A GREAT DAY!!!!

<3 <3 <3 <3 <3

Edit: I just noticed that Rack contains a “timer” middleware similar to the one I’ve implemented in this blog post. You can view the broken middleware here.

Easy Markup Validation

Posted by – June 12, 2009

I wanted a test helper that would assert that my XHTML was valid XHTML. So I wrote one and called it “markup_validity”. You can use it too, and I will show you how.

First, install the gem:

  $ sudo gem install markup_validity

Then, use it in your tests:

require 'test/unit'
require 'rubygems'
require 'markup_validity'

class ValidHTML < Test::Unit::TestCase
  def test_i_can_has_valid_xhtml
    assert_xhtml_transitional xhtml_document
  end
end

Oh. You use RSpec? It supports that too:

require 'rubygems'
require 'markup_validity'

describe "my XHTML document" do
  it "can has transitional xhtml" do
    xhtml_document.should be_xhtml_transitional
  end
end

Debugging invalid markup can be a pain. MarkupValidity tries to give you helpful errors to make your life easier. Say you have an invalid piece of XHTML like this:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  </head>
  <body>
    <p>
      <p>
        Hello
      </p>
    </p>
  </body>
</html>

The error output from MarkupValidity will be this:

.Error on line: 2:
Element 'head': Missing child element(s). Expected is one of ( script, style, meta, link, object, isindex, title, base ).

1: <html xmlns="http://www.w3.org/1999/xhtml">
2:   <head>
3:   </head>
4:   <body>
5:     <p>

Error on line: 6:
Element 'p': This element is not expected. Expected is one of ( a, br, span, bdo, object, applet, img, map, iframe, tt ).

5:     <p>
6:       <p>
7:         Hello
8:       </p>
9:     </p>

MarkupValidity provides a few assertions for test/unit:

  • assert_xhtml_transitional(xhtml) for asserting valid transitional XHTML
  • assert_xhtml_strict(xhtml) for asserting valid strict XHTML
  • assert_schema(schema, xml) for asserting that your xml validates against a schema
  • assert_xhtml which is an alias for assert_xhtml_transitional

The methods provided for RSpec are quite similar:

  • be_xhtml_transitional for asserting valid transitional XHTML
  • be_xhtml_strict for asserting valid strict XHTML
  • be_valid_with_schema(schema) for asserting that your xml validates against a schema
  • be_xhtml which is an alias for be_xhtml_transitional

MarkupValidity even works well with rails. Here is an example rails controller test:

require 'test_helper'
require 'markup_validity'

class AwesomeControllerTest < ActionController::TestCase
  test "valid markup" do
    get :new
    assert_xhtml_transitional @response.body
  end
end

Nokogiri’s Slop Feature

Posted by – December 4, 2008

Oops! When I released nokogiri version 1.0.7, I totally forgot to talk about the Nokogiri::Slop() feature that was added. Why is it called “slop”? It lets you sloppily explore documents. Basically, it decorates your document with method_missing(), which allows you to search your document via method calls.

Given this document:

doc = Nokogiri::Slop(<<-eohtml)
<html>
  <body>
    <p>hello</p>
    <p class="bold">bold hello</p>
  </body>
</html>
eohtml

You may look through the tree like so:

doc.html.body.p('.bold').text # => 'bold hello'

The way this works is that method_missing is implemented on every node in the document tree. That method_missing implementation builds an XPath or CSS query from the method name and method arguments. This means that a new search is executed for every method call. It’s fun for playing around, but you definitely won’t get the same performance as using one specific CSS search.
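
To get a feel for the mechanism without Nokogiri, here is a toy version. ToyNode and SloppySearch are hypothetical, and the lookup just scans child nodes by name, where the real decorator builds an XPath or CSS query:

```ruby
# A minimal tree node: a name, some text, and child nodes.
class ToyNode
  attr_reader :name, :text, :children

  def initialize(name, text = "", children = [])
    @name, @text, @children = name, text, children
  end
end

# Turns unknown method calls into a search through the node's children.
module SloppySearch
  def method_missing(method_name, *args)
    match = children.find { |child| child.name == method_name.to_s }
    return super unless match
    match.extend(SloppySearch)   # decorate lazily so calls can be chained
  end

  def respond_to_missing?(method_name, include_private = false)
    children.any? { |child| child.name == method_name.to_s } || super
  end
end

doc = ToyNode.new("document", "", [
  ToyNode.new("html", "", [
    ToyNode.new("body", "", [ToyNode.new("h1", "hello")])
  ])
])
doc.extend(SloppySearch)

puts doc.html.body.h1.text  # hello
```

Every link in that chain triggers a fresh search, which is exactly why this style is slower than one targeted query.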

My favorite part is that method_missing actually lives in the slop decorator. When you use the Nokogiri::Slop() method, it adds the decorator to a list that gets mixed in to every node instance at runtime using Module#extend. That lets me have sweet method_missing action, without actually putting method_missing in my Node class.

Here is a simplified example:

module Decorator
  def method_a
    "method a"
  end

  def method_b
    "method b: #{super}"
  end
end

class Foo
  def method_b
    "inside foo"
  end
end

foo = Foo.new
foo.extend(Decorator)

puts foo.method_a # => 'method a'
puts foo.method_b # => 'method b: inside foo'

foo2 = Foo.new
puts foo2.method_b # => 'inside foo'
puts foo2.method_a # => NoMethodError

Module#extend is used to add functionality to the instance ‘foo’, but not ‘foo2’. Both ‘foo’ and ‘foo2’ are instances of Foo, but using Module#extend, we can conditionally add functionality without monkey patching, while keeping a clean separation of concerns. You can even reach previous functionality by calling super.

But wait! There’s more! You can stack up these decorators as much as you want. For example:

module AddAString
  def method
    "Added a string: #{super}"
  end
end

module UpperCaseResults
  def method
    super.upcase
  end
end

class Foo
  def method
    "foo"
  end
end

foo = Foo.new
foo.extend(AddAString)
foo.extend(UpperCaseResults)

puts foo.method # => 'ADDED A STRING: FOO'

Conditional functionality added to methods with no weird “alias method chain” involvement. Awesome!

I love ruby!

Write your Rails view in……. JavaScript?

Posted by – May 6, 2008

In my last post about Johnson, I said that next time I would talk about the JavaScript parse tree that Johnson provides. Well, I changed my mind. Sorry.

I want to write about a rails plugin that I added to Johnson. Brohuda Katz wrote an ERb type parser in JavaScript, and added it to the (yet to be released) Johnson distribution. With that in mind, and looking at the new template handlers in edge rails, I was able to throw together a rails plugin that allows me to use JavaScript in my rails view code.

Let’s get to the code. Here is my controller:

class JohnsonController < ApplicationController
  def index
    @users = User.find(:all)
  end
end

And my EJS view (the file is named index.html.ejs):

<% for(var user in at.users) { %>
  <%= user.first_name() %><br />
<% } %>

The johnson rails plugin puts controller instance variables in to a special javascript variable called “at”. The “at” variable is actually a proxy to the controller, lazily fetching instance variables from the controller and importing those objects in to javascript land.

Let’s take a look at the plugin; it’s only a few lines:

class EJSHandler < ActionView::TemplateHandler
  class EJSProxy # :nodoc:
    def initialize(controller)
      @controller = controller
    end

    def key?(pooperty)
      @controller.instance_variables.include?("@#{pooperty}")
    end

    def [](pooperty)
      @controller.instance_variable_get("@#{pooperty}")
    end

    def []=(pooperty, value)
      @controller.instance_variable_set("@#{pooperty}", value)
    end
  end

  def initialize(view)
    @view = view
  end

  def render(template)
    ctx = Johnson::Context.new
    ctx.evaluate('Johnson.require("johnson/template");')
    ctx['template'] = template.source
    ctx['controller'] = @view.controller
    ctx['at'] = EJSProxy.new(@view.controller)

    ctx.evaluate('Johnson.templatize(template).call(at)')
  end
end

ActionView::Template.register_template_handler("ejs", EJSHandler)

When the template gets rendered (the render method), I wrap the controller with an EJS proxy, then compile the template into a javascript function, and call that function. The “at” variable is set to the EJSProxy before executing the template, and all property accessing on the “at” variable is passed along to fetching instance variables from the controller.
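
The proxy idea works fine outside of Rails and Johnson, so here is a standalone sketch. FakeController and InstanceVariableProxy are hypothetical names. One deliberate difference from the plugin: key? compares names as strings, since instance_variables returns strings on Ruby 1.8 but symbols on 1.9 and later:

```ruby
# Hypothetical stand-in for a Rails controller with one instance variable.
class FakeController
  def initialize
    @users = ["alice", "bob"]
  end
end

# Hash-like view of another object's instance variables, fetched on demand.
class InstanceVariableProxy
  def initialize(controller)
    @controller = controller
  end

  # Compare as strings so this works with string or symbol variable names.
  def key?(property)
    @controller.instance_variables.map { |v| v.to_s }.include?("@#{property}")
  end

  def [](property)
    @controller.instance_variable_get("@#{property}")
  end

  def []=(property, value)
    @controller.instance_variable_set("@#{property}", value)
  end
end

at = InstanceVariableProxy.new(FakeController.new)
p at.key?("users")    # true
p at["users"]         # ["alice", "bob"]
at["title"] = "Johnson"
p at.key?("title")    # true
p at.key?("missing")  # false
```

Nothing is copied up front; each lookup goes back to the controller at the moment the template asks for it.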

Server side javascript coding in rails. Weird, eh?

Profiling Database Queries in Rails

Posted by – March 13, 2008

Despite the recent Ruby webserver speed contests, most of the slowness at my job results from slow (or too many) database queries.

To help keep database queries down, I added a stats box to every page that shows the number of query cache hits vs. misses, the number of rows returned, and the amount of data transferred from the database. In this screenshot I’m using the “live” environment, with 3 cache hits, 169 misses, 577 rows returned, and 458.9k of data transferred. Clicking the box hides it, and clicking “Super Hide!” hides the box and sets a cookie so that the box doesn’t show up again for a while.

Debug Window

To get this working, first I monkey patch the MysqlAdapter to collect database stats:

  ActiveRecord::ConnectionAdapters::MysqlAdapter.module_eval do
    @@stats_queries = @@stats_bytes = @@stats_rows = 0

    def self.get_stats
      { :queries => @@stats_queries,
        :rows => @@stats_rows,
        :bytes => @@stats_bytes }
    end
    def self.reset_stats
      @@stats_queries = @@stats_bytes = @@stats_rows = 0
    end

    def select_with_stats(sql, name)
      bytes = 0
      rows = select_without_stats(sql, name)
      rows.each do |row|
        row.each do |key, value|
          bytes += key.length
          bytes += value.length if value
        end
      end
      @@stats_queries += 1
      @@stats_rows += rows.length
      @@stats_bytes += bytes
      rows
    end
    alias_method_chain :select, :stats
  end
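
If alias_method_chain is unfamiliar, the sketch below shows the same wrapping done by hand with plain alias_method, so it runs without ActiveSupport. `FakeAdapter` and `query_count` are hypothetical names for illustration, and the counter is a class-level accessor rather than the class variables used in the patch.

```ruby
# Hypothetical adapter standing in for MysqlAdapter.
class FakeAdapter
  def select(sql)
    [{ "id" => "1", "name" => "Ada" }]
  end
end

# Reopen the class and wrap #select by hand.
class FakeAdapter
  class << self
    attr_accessor :query_count
  end
  self.query_count = 0

  def select_with_stats(sql)
    self.class.query_count += 1
    select_without_stats(sql) # calls the original implementation
  end

  # These two lines are what alias_method_chain :select, :stats expands to:
  alias_method :select_without_stats, :select
  alias_method :select, :select_with_stats
end

adapter = FakeAdapter.new
adapter.select("SELECT * FROM users")
adapter.select("SELECT * FROM users")
puts FakeAdapter.query_count
```

Callers keep calling select as before; the wrapper counts the query and then delegates to the untouched original.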

Next I patched the QueryCache to keep track of hits and misses:

  ActiveRecord::ConnectionAdapters::QueryCache.module_eval do
    @@hits = @@misses = 0

    def self.get_stats
      { :hits => @@hits,
        :misses => @@misses }
    end
    def self.reset_stats
      @@hits = @@misses = 0
    end

    def cache_sql_with_stats(sql, &block)
      if @query_cache.has_key?(sql)
        @@hits += 1
      else
        @@misses += 1
      end
      cache_sql_without_stats(sql, &block)
    end
    alias_method_chain :cache_sql, :stats
  end

Then modify ActionController to reset stats for each request:

  ActionController::Base.module_eval do
    def perform_action_with_reset
      ActiveRecord::ConnectionAdapters::MysqlAdapter::reset_stats
      ActiveRecord::ConnectionAdapters::QueryCache::reset_stats
      perform_action_without_reset
    end

    alias_method_chain :perform_action, :reset

    def active_record_runtime(runtime)
      stats = ActiveRecord::ConnectionAdapters::MysqlAdapter::get_stats
      "#{super} #{sprintf("%.1fk", stats[:bytes].to_f / 1024)} queries: #{stats[:queries]}"
    end
  end

Just drop all that inside the after_initialize in your development.rb and you’ll get the nice stats. After that, just create a partial that displays the stats and include the partial at the bottom of your layout. Our partial looks like this:

<% unless %w(production test).include?(RAILS_ENV) -%>
  <h4 id="debug" onclick="$(this).remove()" style="background:pink;text-align:center;position:absolute;top:16px;left:35%;padding:0.5em;border: 2px solid red;">
  <%= RAILS_ENV %>
  <br />
  <% if ActiveRecord::ConnectionAdapters::QueryCache.respond_to?(:get_stats) %>
    <% stats = ActiveRecord::ConnectionAdapters::QueryCache.get_stats %>
    Queries: <%= stats[:hits] %> / <%= stats[:misses] %> /
    <%= number_to_percentage((stats[:hits].to_f / (stats[:hits] + stats[:misses])) * 100, :precision => 0) %>
    |
  <% end %>
  <% if ActiveRecord::ConnectionAdapters::MysqlAdapter.respond_to?(:get_stats) %>
    <% stats = ActiveRecord::ConnectionAdapters::MysqlAdapter.get_stats %>
    Rows: <%= stats[:rows] %> |
    Transfer: <%= sprintf("%.1fk", stats[:bytes].to_f / 1024) %>
  <% end %>
  <p style="margin:0">
    <a style="color:magenta" href="#" onclick="superHide()">super hide!</a>
  </p>
  </h4>
  <script type="text/javascript">
    function superHide() {
      document.cookie = 'debug=hidden; path=/; domain=<%= request.host %>; max-age=14400';
    }
    if(document.cookie.indexOf('debug=hidden') != -1) {
      $('debug').hide();
    }
  </script>
<% end -%>

It’s a little work, but it helps keep my mind on reducing the queries. With enough work, one of these days the speed of the webserver will matter to me. Thanks to Adam Doppelt for the basis of this monkey patch. Any bugs are mine, not his!

Trigger Happy

Posted by – March 1, 2007

I’ve just released “Trigger Happy”, a rails plugin that adds support for triggers in your Active Record Migrations. To install the plugin just do this:


script/plugin install svn://rubyforge.org/var/svn/artriggers/trunk/trigger_happy

To add a trigger do this:

add_trigger "ai_people",
  :on         => 'people',
  :timing     => 'after',
  :event      => 'insert',
  :statement  => 'INSERT INTO log (id, timestamp) VALUES (NEW.id, NOW())'

To drop a trigger do this:

drop_trigger "ai_people"

It only supports MySQL for now, but I plan on having other databases supported in the future.
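
For reference, the add_trigger options map naturally onto MySQL's CREATE TRIGGER statement. The sketch below is not the plugin's code: `trigger_sql` is a hypothetical helper that only shows roughly what SQL a call like the one above might produce, assuming standard MySQL trigger syntax.

```ruby
# Hypothetical helper, not part of Trigger Happy: builds the text of a
# MySQL CREATE TRIGGER statement from the options add_trigger takes.
def trigger_sql(name, options)
  timing = options[:timing].to_s.upcase
  event  = options[:event].to_s.upcase
  "CREATE TRIGGER #{name} #{timing} #{event} ON #{options[:on]} " \
    "FOR EACH ROW #{options[:statement]}"
end

sql = trigger_sql("ai_people",
  :on        => 'people',
  :timing    => 'after',
  :event     => 'insert',
  :statement => 'INSERT INTO log (id, timestamp) VALUES (NEW.id, NOW())')

puts sql
puts "DROP TRIGGER ai_people" # the counterpart of drop_trigger
```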

Audit Logs with ActiveRecord

Posted by – March 1, 2007

I’ve been trying to create an audit log for a few (12) tables, and unfortunately ActiveRecord seems to be falling flat for what I want to do. First I’ll describe what’s out there already to do this, then I’ll talk about what I had to do.

There are a couple of nice plugins out there for helping you keep track of changes to your records. The first one is acts_as_versioned. acts_as_versioned will copy your record to a version table whenever there is an insert or update, then increase a version number in the original table. It also automatically adds a list of versions to your original model, and lets you revert to any particular version. This is great, and almost exactly what I needed. I didn’t really need the ability to revert to a version, but that just seemed like gravy on top. The only problem with acts_as_versioned comes in when you try to keep track of changes to habtm relationships. But this is where the second plugin comes into play.

The second plugin is called acts_as_versioned_association. This plugin will help you keep track of changes to your relationships. acts_as_versioned_association is built on top of acts_as_versioned. The way it works is by setting the owner model to acts_as_versioned, then when any associations are updated, it writes a new version of the owner model, then writes the associations to an association versions table. So if you were to have an Article model that has and belongs to many Documents, you would need 5 tables to represent that:

  1. articles
  2. articles_versions
  3. documents
  4. articles_documents
  5. articles_documents_versions

If the association changes, a record is written to articles (to increase the version number), articles_versions, articles_documents, and articles_documents_versions. But what happens if Article has 10 versioned habtm relationships, and just one of those relationships changes? Then a record will get written to every habtm version table for just one change. That’s 12 writes for a change to just one relationship. That will not scale….

Fortunately I don’t care about reverting to a previous version. All I care about is what changed, and when. So my favorite solution for this problem is to add triggers to the tables that may change. That way I only get one extra write when a relationship changes. Just copy the row to another table with a timestamp and an action.
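
As a rough illustration of that idea, the sketch below generates MySQL trigger statements that copy each changed row of a join table into an audit table along with an action and a timestamp. Everything named here is an assumption for the example: the `audit_trigger_sql` helper, the `_audit` table suffix, the trigger names, and the `action` and `changed_at` columns are not taken from my actual schema.

```ruby
# Hypothetical helper: returns one CREATE TRIGGER statement per event.
# Inserts and updates copy the NEW row; deletes copy the OLD row, since
# there is no NEW row once a record has been removed.
def audit_trigger_sql(table, columns)
  { "insert" => "NEW", "update" => "NEW", "delete" => "OLD" }.map do |event, row|
    values = columns.map { |column| "#{row}.#{column}" }.join(", ")
    "CREATE TRIGGER #{table}_audit_#{event} AFTER #{event.upcase} ON #{table} " \
      "FOR EACH ROW INSERT INTO #{table}_audit (#{columns.join(', ')}, action, changed_at) " \
      "VALUES (#{values}, '#{event}', NOW())"
  end
end

statements = audit_trigger_sql("articles_documents", %w(article_id document_id))
statements.each { |sql| puts sql }
```

Each change to the join table then costs one extra write to the audit table, no matter how many other relationships the owning model has.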

But what about has_many :through?

has_many :through allows you to put a model on top of the join table. Then I could just drop acts_as_versioned on top of that model and be done. I would have used this solution except that I ran into a bug. has_many :through does not support all of the same array manipulation that habtm does. For example, you can append (<<) to has_many :through and habtm, but the clear method does not work the same way on has_many :through. Also, has_many :through does not set an attribute= method like habtm does.