2013-05-21 @ 15:04

One Danger of Freedom Patches

I posted a benchmark on twitter about comparing a DateTime with a string. This is a short blurrrggghhh post about the benchmark and why there is such a performance discrepancy.

Here is the benchmark:

require 'benchmark/ips'
require 'active_support/all' if ENV['AS']
require 'date'

now = DateTime.now

Benchmark.ips do |x|
  x.report("lhs") { now == "foo" }
  x.report("rhs") { "foo" == now }
end

First we’ll run the benchmark without Active Support, then we’ll run the benchmark with Active Support.

The Benchmarks

Without Active Support

[aaron@higgins rails (master)]$ bundle exec ruby argh.rb 
Calculating -------------------------------------
                 lhs     57389 i/100ms
                 rhs     76222 i/100ms
-------------------------------------------------
                 lhs  2020064.6 (±14.7%) i/s -    9870908 in   5.015172s
                 rhs  3066573.4 (±13.2%) i/s -   15091956 in   5.012879s
[aaron@higgins rails (master)]$

With Active Support

[aaron@higgins rails (master)]$ AS=1 bundle exec ruby argh.rb 
Calculating -------------------------------------
                 lhs      4786 i/100ms
                 rhs     26327 i/100ms
-------------------------------------------------
                 lhs    62858.4 (±23.6%) i/s -     296732 in   5.019005s
                 rhs  2866546.6 (±26.6%) i/s -   13031865 in   4.996482s
[aaron@higgins rails (master)]$

Numbers!

In the benchmarks without Active Support, the performance is fairly close. The standard deviation is pretty big, but the numbers are within the ballpark of each other.

In the benchmarks with Active Support, the difference is enormous. It’s not even close. Why is this?

What is the deal?

This speed difference is due to a Freedom Patch that Active Support applies to the DateTime class:

class DateTime
  # Layers additional behavior on DateTime#<=> so that Time and
  # ActiveSupport::TimeWithZone instances can be compared with a DateTime.
  def <=>(other)
    super other.to_datetime
  end
end

DateTime includes the Comparable module which will call the <=> method whenever you call the == method. This Freedom Patch calls to_datetime on whatever is on the right hand side of the comparison. Rails Monkey Patches the String class to add a to_datetime method, but “foo” is not a valid Date, so it raises an exception.

The Comparable module rescues any exception that happens inside <=> and returns false. This means that any time you call DateTime#== with something that doesn’t respond to to_datetime, an exception is raised and immediately thrown away.

The original implementation just does object equality comparisons, returns false, and it’s done. This is why the original implementation is so much faster than the implementation with the Freedom Patch.

My 2 Cents

These are the dangers of Freedom Patching. As a Rails Core member, I know this is a controversial opinion, but I am not a fan of Freedom Patching. It seems convienient until one day you wonder why is this code:

date == "foo"

so much slower than this code?

"foo" == date

Freedom Patching hides complexity behind a familiar syntax. It flips your world upside down; making code that seems reasonable do something unexpected. When it comes to code, I do not like the unexpected.

EDIT: I clarified the section about strings raising an exception. The actual exception occurrs in another monkeypatch in Rails.

read more »

2013-03-03 @ 16:52

Dynamic Method Definitions

TL;DR: depending on your app, using define_method is faster on boot, consumes less memory, and probably doesn’t significantly impact performance.

Throughout the Rails code base, I typically see dynamic methods defined using class_eval. What I mean by “dynamic methods” is methods with names or bodies that are calculated at runtime, then defined.

For example, something like this:

class Foo
  class_eval <<EORUBY, __FILE__, __LINE__ + 1
    def wow_#{Time.now.to_i}
      # ...
    end
  EORUBY
end

I’m not sure why they are define this way versus using define_method. Why don’t we compare and contrast defining methods using class_eval and define_method?

The tests I’ll do here use MRI, Ruby 2.0.0.

Definition Performance

When defining a method, is it faster to use class_eval or define_method? Here is a trivial benchmark where we simulate defining 100,000 methods:

require 'benchmark'

GC.disable

N = 100000
Benchmark.bm(13) do |x|
  x.report("define_method") {
    class Foo
      N.times { |i| define_method("foo_#{i}") { } }
    end
  }

  x.report("class_eval") {
    class Bar
      N.times { |i|
        class_eval <<-eoruby, __FILE__, __LINE__ + 1
          def bar_#{i}
          end
        eoruby
      }
    end
  }
end

Results on my machine:

$ ruby test.rb
                    user     system      total        real
define_method   0.290000   0.030000   0.320000 (  0.318222)
class_eval      1.300000   0.120000   1.420000 (  1.518075)

The class_eval version seems to be much slower than the define_method version.

Why is definition performance different?

The reason performance is different is that on each call to class_eval, MRI creates a new parser and parses the string. In the define_method case, the parser is only run once.

We can see when the parser executes using DTrace. We will compare two programs, one with class_eval:

class Foo
  5.times do |i|
    class_eval "def f_#{i}; end", __FILE__, __LINE__
  end
end

and one with define_method:

class Foo
  5.times do |i|
    define_method("f_#{i}") { }
  end
end

Using DTrace, we can monitor the parse-begin probe which fires before MRI runs it’s parser and compiles instruction sequences:

ruby$target:::parse-begin
/copyinstr(arg0) == "test.rb"/
{
  printf("%s:%d\n", copyinstr(arg0), arg1);
}

Run DTrace using the define_method program:

$ sudo dtrace -q -s x.d -c"$(rbenv which ruby) test.rb"
test.rb:1

Now run again with the class_eval version:

$ sudo dtrace -q -s x.d -c"$(rbenv which ruby) test.rb"
test.rb:1
test.rb:3
test.rb:3
test.rb:3
test.rb:3
test.rb:3

In the class_eval version, the parser runs and compiles instruction sequences 6 times, where the define_method case only runs once.

Call speed

It seems it’s faster to define methods via define_method, but which method is faster to call? Let’s try with a trivial example:

require 'benchmark/ips'

GC.disable

class Foo
  define_method("foo") { }
  class_eval 'def bar; end'
end

Benchmark.ips do |x|
  foo = Foo.new
  x.report("class_eval") { foo.bar }
  x.report("define_method") { foo.foo }
end

Here are the results on my machine:

$ ruby test.rb
Calculating -------------------------------------
          class_eval    115154 i/100ms
       define_method    106872 i/100ms
-------------------------------------------------
          class_eval  7454955.2 (±5.0%) i/s -   37194742 in   5.004418s
       define_method  5061216.4 (±5.2%) i/s -   25221792 in   5.000041s

Clearly methods defined with class_eval are faster. But does it matter? Let’s try a test where we add a little work to each method:

require 'benchmark/ips'

GC.disable

class Foo
  define_method("foo") { 10.times.map { "foo".length } }
  class_eval 'def bar; 10.times.map { "foo".length }; end'
end

Benchmark.ips do |x|
  foo = Foo.new
  x.report("define_method") { foo.foo }
  x.report("class_eval") { foo.bar }
end

Running these on my machine, I get:

$ ruby test.rb
Calculating -------------------------------------
       define_method     23949 i/100ms
          class_eval     23015 i/100ms
-------------------------------------------------
       define_method   261039.7 (±6.3%) i/s -    1317195 in   5.066215s
          class_eval   228819.7 (±12.2%) i/s -    1150750 in   5.286635s

A small amount of work is enough to overcome the performance difference between them.

How about memory consumption?

Let’s compare class_eval and define_method on memory. We’ll use this program to compare maximum RSS for N methods:

N = (ENV['N'] || 100_000).to_i

class Foo
  N.times do |i|
    if ENV['EVAL']
      class_eval "def bar_#{i}; end"
    else
      define_method("bar_#{i}") { }
    end
  end
end

Here are the results (I’ve trimmed them a little for clarity):

$ EVAL=1 time -l ruby test.rb
        3.77 real         3.68 user         0.08 sys
 127389696  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
     38716  page reclaims
$ DEFN=1 time -l ruby test.rb
        0.69 real         0.63 user         0.05 sys
  69103616  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
     24487  page reclaims
$

The maximum RSS for the class_eval version is much higher than the define_method version. Why?

I mentioned earlier that the class_eval version instantiates a new parser and compiles the source. Each method definition in the class_eval version does not share instruction sequences, where the define_method version does.

Let’s verify this claim by using ObjectSpace.memsize_of_all!

Measuring Instructions

MRI will let us measure the total memory usage of the instruction sequences. Here we’ll modify the previous program to measure the instruction sequence size (in bytes) after defining many methods:

require 'objspace'

N = (ENV['N'] || 100_000).to_i

class Foo
  N.times do |i|
    if ENV['EVAL']
      class_eval "def bar_#{i}; end"
    else
      define_method("bar_#{i}") { }
    end
  end
end

GC.start

p ObjectSpace.memsize_of_all(RubyVM::InstructionSequence)

Let’s see the difference:

$ EVAL=1 ruby test.rb
44718112
$ DEFN=1 ruby test.rb
718112

Growth Rate

Now let’s see the growth rate between the two. Here is the growth rate for the class_eval case:

$ N=100 EVAL=1 ruby test.rb
762112
$ N=1000 EVAL=1 ruby test.rb
1158112
$ N=10000 EVAL=1 ruby test.rb
5118112
$ N=100000 EVAL=1 ruby test.rb
44718112

Now let’s compare to the define_method case:

$ N=100 DEFN=1 ruby test.rb
718112
$ N=1000 DEFN=1 ruby test.rb
718112
$ N=10000 DEFN=1 ruby test.rb
718112
$ N=100000 DEFN=1 ruby test.rb
718112

The memory consumed by instruction sequences in the class_eval case continually grows, where in the define_method case it does not. MRI reuses the instruction sequences in the case of define_method, so we see no growth.

Caution

Defining methods with define_method is faster, consumes less memory, and depending on your application isn’t significantly slower than using a class_eval defined method. So what is the down side?

Closures

The main down side is that define_method creates a closure. The closure could hold references to large objects, and those large objects will never be garbage collected. For example:

class Foo
  x = "X" * 1024000 # Not GC'd
  define_method("foo") { }
end

class Bar
  x = "X" * 1024000 # Is GC'd
  class_eval("def foo; end")
end

The closure could access the local variable x in the Foo class, so that variable cannot be garbage collected.

When using define_method be careful not to hold references to objects you don’t care about.

THE END

I hope you enjoyed this! <3<3<3<3

read more »

2013-02-06 @ 11:22

YAML f7u12

YAML seems to be getting a bad rap lately, and I’m not surprised. YAML was used as the attack vector to execute arbitrary code in a Rails process and was even used to steal secrets from rubygems.org.

Let’s try to dissect the attack vector used, and see how YAML fits in to the picture.

The Metasploit Exploit

First lets cover the most widely known vector. We (the Rails Security Team) have had reports of attempts to use the exploit on several websites, it’s in metasploit, and a variant was used to attack rubygems.org.

The Troubled Code

I’m going to boil down the code involved in order to make the attack more easy to digest. In Rails, there is a class defined that basically boils down to this definition:

class Helpers
  def initialize
    @module = Module.new
  end

  def []=(key, value)
    @module.module_eval <<-END_EVAL
      def #{value}(*args)
        # ... other stuff
      end
    END_EVAL
  end
end

This class defines routing helper methods on a module, and later this module is mixed in to your views. Let’s take a look at how to use this code to teach Linux Zealot an important lesson in security.

Exploitation

Our attacker knows that this class is defined in the system. Using YAML, along with Psych’s object deserialization, they can inject any object in to the system they choose. So how can they use this object? Let’s take a look at the YAML payload for exploiting this code, then talk about how it works:

--- !ruby/hash:Helpers
foo: |-
  mname; end; puts 'hello!'; def oops

We can clearly see the Ruby code in this YAML, but how does it get executed?

When Psych looks at the type declared !ruby/hash:Helpers, it says “ah, this is a subclass of a Ruby hash with the class of Helpers. So, it allocates a new Helpers class, then calls the []= method for each of the key value pairs in the YAML.

In this case, the key value pair is:

['foo', "mname; end; puts 'hello!'; def oops"]

Let’s take the value passed in, and do string substitution in the module_eval part of the code:

def mname; end; puts 'hello!'; def oops(*args)
  # ... other stuff
end

It’s kind of hard for Humans to read, so let’s add some newlines:

def mname
end

puts 'hello!'

def oops(*args)
  # ... other stuff
end

And now it should be pretty clear how an attacker can execute arbitrary code.

How do we fix this?

We have a few options for fixing this.

  1. Replace our module_eval with a define_method
  2. Change Psych to check super classes and ensure it’s a hash
  3. Stop using normal Ruby methods to set hash key / value pairs

Let’s say we did all of these. Are we safe? No.

Proxy Exploits

This exploit was reported to the Rails Security team by Ben Murphy. It uses “proxy objects” (basically anything with a method_missing implementation that calls to send) to execute arbitrary code on the remote server.

The Troubled Code

Again, this example is boiled down to make it easier to understand. Let’s say our system has a proxy object like this:

class Proxy
  def initialize(server, arg1, meth)
    @server = server
    @arg1   = arg1
    @meth   = meth
  end

  def method_missing(mid, *args)
    @server.send(@meth, @arg1, *args)
  end
end

This proxy is used to forward messages to some other object (in this case @server).

Exploitation

Our attacker knows that this class exists in our system. However, merely instantiating this object isn’t enough. The application code must call a method (any method) on this object before the system is compromised. In the case of Rails, the attacker knows that the YAML objects will be embedded in the parameters hash, so things like:

params[:foo].nil?
params[:foo].length
### etc

will set this one off. So what does the YAML payload look like?

--- !ruby/object:Proxy
server: !ruby/module 'Kernel'
meth: eval
arg1: puts :omg

This particular form is a normal object load. However, the attacker sets the instance variables on the proxy object to convenient values. When the proxy is instantiated, @server will be Kernel, @meth will be "eval", and @arg1 will be "puts :omg".

When the application calls any method on the object, method_missing will trigger. If we expand out the instance variables in method_missing, it will look like this:

Kernel.send("eval", "puts :omg")

Again we can see that arbitrary code can be executed.

This exploit is arguably more insidious than the previous exploit. In the previous exploit it was obvious that the app code that contained a call to eval could possibly execute arbitrary code. This exploit was able to trick our system in to calling eval even though the app code never explicitly eval’d anything!

You might be thinking “this proxy can’t be common”. Unfortunately there are classes similar to this in XMLRPC as well as Rack via a combination of ERB templates.

How do we fix this?

We applied our previous fixes, but our code was still susceptible to attacks. What do we do now? Why don’t we change Psych to:

  1. Only accept whitelisted classes
  2. But still allow Ruby “primitives” by default

Are we safe now? I guess that depends on what you mean by “safe”. Assuming that no object ever evals anything that’s set on your whitelisted classes, you may be safe from arbitrary code execution. But are you safe from other attacks?

MOAR ATTACKS!

Here’s a grab bag of attacks you can use even if “primitives” are allowed. Let your friends know how much you really love them with these tricks!

Eating up Object Space

In Ruby, symbols are never garbage collected. To DoS a server, simply send many Ruby symbols in YAML format:

---
- :foo
- :bar
- :baz
# etc

Infinite Loops!

Know a place where someone is doing this?

class Foo
  def some_method
    @ivar.each do |x|
      # ...
    end
  end
end

Teach them about infinite ranges! Psych will deserialize this YAML to a range from 1 to infinity on the object.

--- !ruby/object:Foo
ivar: !ruby/range
  begin: 1
  end: .inf
  excl: false

The @ivar.each call will have a fun time looping forever!

Infinite Recursion!

Have a friend that recurses data structures like this:

class Foo
  def some_method
    stack = [@foo]

    until stack.empty?
      y = stack.pop

      if Array === y
        stack.push y
      else
        process y
      end
    end
  end
end

Send them this little present:

--- !ruby/object:Foo
foo: &70143145831360
- *70143145831360

They’ll be so excited! @foo will result in an array that contains itself. The loop will be processing this stack for quite some time! Good times!

Pathological Regular Expressions

Your friend has given up on iterations. Just totally stopped. How can you show them how much you love them? I know! Send a pathological regular expression!

You know your friend is doing string matches:

class Foo
  def initialize
    @match = /ok/
  end

  def ok?
    @match =~ 'aaaaaaaaaaaaaaaaaaaaaaaaadaaaac'
  end
end

So you send a payload like this:

--- !ruby/object:Foo
match: !ruby/regexp /(b|a+)*c/

Loading this will result in a regular expression that takes an extremely long time to match the string. We can show our love by easily making every web process work hard on a bad regular expression!

ENOUGH ENOUGH!

So now we know that custom classes could result in evaled code, so we have to whitelist only “safe” classes. Now that we’ve whitelisted our safe classes, we need to make sure we don’t load symbols. After we make sure symbols are disallowed, we have to ensure loaded ranges are bounded. After we ensure loaded ranges are bounded, we have to check for self referential hashes and arrays, and the list goes on.

We’ve adjusted our code to make all these checks and balances. But we’ve only examined a handful of the Ruby “primitives” that are available. After examining only these few cases, are we sure that loading any other Ruby “primitive” should be considered safe?

I’m not.

YAML.safe_load

People are asking for a safe_load from Psych. But the question is: “what does safe mean?”. Some say “only prevent foreign code from being executed”, but does that mean we’re safe?

To me it doesn’t. To me, “safe” means something that is:

  1. Easy to understand.
  2. Conservative.
  3. Easy to extend.

I propose that the meaning of “safe load” would only load Null, Booleans, Numerics, Strings, Arrays, Hashes, and no self-referential data structures. This is easy to understand. You only need to know about 6 data types, not a laundry list of possible classes.

I’d prefer to stay conservative, not playing whack-a-mole when someone figures out how to exploit another class. Keeping the number of supprted data types low prevents playing whack-a-mole.

If you really need to load other types, just add the class to the whitelist when calling safe_load. It should really be “that easy”. You explicitly know the types that will be loaded, so the possible values returned only grow when you say so.

YAML Postmortem

This section isn’t actually a postmortem (YAML isn’t dead), it’s actually just a postscript section, so you can stop reading now. I just wanted to call it “postmortem” because I think it’s funny when people have postmortems about software.

People seem to be giving YAML a bad rap over these exploits. The truth is that all of these exploits exist in any scheme that allows you to create arbitrary objects in a target system. This is why nobody uses Marshal to send objects around. Think of YAML as a human readable Marshal.

Should we stop using YAML? No. But we probably shouldn’t use it to read foreign data. Can we make Psych safe? As I said earlier, it depends on what you think “safe” means. My opinion of “safe” puts YAML on the same field as JSON as far as “objects that can be transferred” is concerned.

Anyway, I think it’s important to see we have three things going on in these exploits. We have YAML the langauge, which defines schemes for arbitrary object serialization, Psych which honors those requests, and user land code which is subject to the exploits. YAML the language doesn’t say any of this code should be executed, and in fact Psych won’t eval random input. The problem being that certain YAML documents can be fed to Psych to create objects that interact with user code in unexpected ways.

The user land code is what gets exploited, YAML and Psych are merely a vehicle. But asking users to remove all cases of module_eval or method_missing + send and to require boundry checks, etc is completely unreasonable.

This is why we need a YAML.safe_load.

read more »

2012-12-18 @ 14:03

Rails 4 and your Gemfile

TL;DR This is your periodic reminder to specify dependency versions in your Gemfile

I started updating one of our larger projects at work to use edge Rails. This project uses devise, and the Gemfile declares the dependency like this:

gem "devise"

The latest version of devise correctly declares its dependency on Railties on ~> 3.1:

Devise depends on Rails 3.1

However, Devise version 1.5.3 does not declare a specific dependency on Rails (or Railties):

Devise version 1.5.3

This means that as I upgrade this application, doing a bundle update pulls in Devise version 1.5.3. This version of Devise is incompatible with the app’s codebase. How do you fix it? Update the Gemfile to include the version number (just like rubygems.org recommends) like this:

gem "devise", "~> 2.1.2"

Bundling against Rails 4.0 will fail, but at least it will be a fail during bundle time and not during runtime:

Bundler could not find compatible versions for gem "railties":
  In Gemfile:
    devise (~> 2.1.2) ruby depends on
      railties (~> 3.1) ruby

    rails (>= 0) ruby depends on
      railties (4.0.0.beta)

Time to find what other version issues await!

<3<3<3<3<3

read more »

2012-09-07 @ 10:24

Protected Methods and Ruby 2.0

TL;DR: respond_to? will return false for protected methods in Ruby 2.0

Let’s check out how protected and private methods behave in Ruby. After that, we’ll look at how Ruby 2.0 changes could possibly break your code (and what to do about it).

Method Visibility

In Ruby, we have three visibilities: public, protected, and private. Let’s define a class with all three:

class Heart
  def public_method; end

  protected
  def protected_method; end

  private
  def private_method; end
end

First, let’s see how these differ from within the Heart class.

Internal Visibility

Inside the Heart class, we can call any of these methods with an implicit recipient. In other words, this method will not raise exceptions (note that I’m just reopening the Heart class for demonstration):

class Heart
  def ok!
    public_method
    protected_method
    private_method
  end
end

Public and protected methods can be called with an explicit recipient, but private methods cannot. So the following code will raise an exception on the third line of the method body:

class Heart
  def not_ok!
    self.public_method    # OK
    self.protected_method # OK
    self.private_method   # raises NoMethodError
  end
end

External Visibility

Outside the Heart class, we can only call the public methods:

irb(main):032:0> heart = Heart.new
=> #<Heart:0x007fdad1952f78>
irb(main):033:0> heart.public_method    # => nil
irb(main):034:0> heart.protected_method # => raises NoMethodError
irb(main):035:0> heart.private_method   # => raises NoMethodError

One notable exception is if the object sending the message is of the same type as the object receiving the message, then it’s OK to call protected methods.

Here is an example:

class Hands < Heart
  def call_stuff r
    r.public_method    # => ok!
    r.protected_method # => ok, but only if self.is_a?(r.class)
    r.private_method   # => raises NoMethodError
  end
end

I find this behavior to be most useful when implementing equality operators. For example:

class A
  def == other
    if self.class == other.class
      internal == other.internal
    else
      super
    end
  end

  protected
  def internal; :a; end
end

Introspection

Finally, let’s look at respond_to?. The behavior of this method is changing in Ruby 2.0.0. First we’ll look at the behavior in 1.9, then how it changes in Ruby 2.0.0.

The respond_to? method will return true if the object responds to the given method. Let’s call respond_to? on our Heart object (with Ruby 1.9) and see what it returns:

1.9.3-p194 :010 > heart = Heart.new
 => #<Heart:0x007faaaa14e450> 
1.9.3-p194 :011 > heart.respond_to? :public_method    # => true 
1.9.3-p194 :012 > heart.respond_to? :protected_method # => true 
1.9.3-p194 :013 > heart.respond_to? :private_method   # => false 

Ruby 1.9 will return true for public and protected methods, but false for private methods. If we compare this to actually calling the method, we’ll see an inconsistent behavior. Let’s interleave respond_to? checks along with calling the method to see what happens:

1.9.3-p194 :014 > heart = Heart.new
 => #<Heart:0x007faaaa16d080> 
1.9.3-p194 :015 > heart.respond_to? :public_method    # => true 
1.9.3-p194 :016 > heart.public_method                 # => nil
1.9.3-p194 :017 > heart.respond_to? :protected_method # => true
1.9.3-p194 :018 > heart.protected_method              # => NoMethodError
1.9.3-p194 :019 > heart.respond_to? :private_method   # => false
1.9.3-p194 :020 > heart.private_method                # => NoMethodError

So, despite the fact that respond_to? returns true for the protected method, we cannot actually call that method.

Introspection (in Ruby 2.0.0)

In Ruby 2.0.0, respond_to? has changed. It no longer returns true for protected methods. Let’s look at our Heart example again, but this time with Ruby 2.0.0:

irb(main):013:0> heart = Heart.new
=> #<Heart:0x007fce0b09a188>
irb(main):014:0> heart.respond_to? :public_method    # => true
irb(main):015:0> heart.public_method                 # => nil
irb(main):016:0> heart.respond_to? :protected_method # => false
irb(main):017:0> heart.protected_method              # => NoMethodError
irb(main):018:0> heart.respond_to? :private_method   # => false
irb(main):019:0> heart.private_method                # => NoMethodError

The behavior of respond_to? lines up with the reality of calling the method in Ruby 2.0.0.

Caveats on Reality

The changes to respond_to? also apply inside our “same instances” case. Let’s use this class as an example:

class A
  def == a
    puts a.respond_to? :zoom!
    puts a.zoom!
  end

  protected
  def zoom!; :a; end
end

If we run the following code in Ruby 2.0.0, the call to respond_to? will return false despite the fact that we can actually call the method:

irb(main):029:0> A.new == A.new
false
a
=> nil

I’m not sure this is a big problem because we should be checking ancestors in the comparator methods. If we check that the ancestors are the same, then the respond_to? calls become unnecessary. Also 99% of the objects I write don’t implement object comparator methods.

Compatibility

Most of the problems I’ve found in the Rails code base relating to respond_to? were fixed by either changing the visibility of the method, or calling respond_to? with a true as the second argument. In 1.9, the true tells Ruby to search private methods, and in 2.0, private and protected methods.

For library authors, dealing with this change depends on the situation. For example, if you have code like this:

def some_method other
  if other.respond_to?(:foo)
    other.foo
  else
    some_default_behavior
  end
end

Consider forcing the other object to have the method foo, and the super class of the foo instance implementing some_default_behavior.

If you expect foo to be a protected method, consider changing to is_a? checks, or passing true to respond_to?. Passing true could result in false positives, but I haven’t personally encountered that as a problem (yet).

Happy Hacking! <3<3<3<3

read more »

2012-07-30 @ 11:00

Is it live?

TL;DR Rails 4.0 will allow you to stream arbitrary data at arbitrary intervals with Live Streaming.

HAPPY MONDAY EVERYONE!

Besides enabling multi-threading by default, one of the things I really wanted for Rails 4.0 is the ability to stream data to the client. I want the ability to treat the response object as an I/O object, and have the data I write immediately available to the client. Essentially, the ability to deliver whatever data I want, whenever I want.

Last night I merged a patch to master that allows us to do exactly that: send arbitrary data in real time to the client. In this article, I would like to show off the feature by developing a small application that automatically refreshes the page when a file is written. I’ll be working against edge Rails, specifically against this commit (hopefully then people in the future will notice if / when this article is out of date!)

Here is a video of the final product we’ll build in this article:

Response Streaming

The first thing I added was a “stream” object to the response. This object is where where data will be buffered until it is sent to the client. The stream object is meant to quack like an IO object. For example:

class MyController < ActionController::Base
  def index
    100.times {
      response.stream.write "hello world\n"
    }
    response.stream.close
  end
end

In order to maintain backwards compatibility, the above code will work, but it will not stream data to the client. It will buffer the data until the response is completed, then send everything at the same time.

Live Streaming

To make live streaming actually work, I added a module called ActionController::Live. If you mix this module in to your controller, all actions in that controller can stream data to the client in real time. We can make the above MyController example live stream by mixing in the module like so:

class MyController < ActionController::Base
  include ActionController::Live

  def index
    100.times {
      response.stream.write "hello world\n"
    }
    response.stream.close
  end
end

The code in our action stays exactly the same, but this time the data will be streamed to the client as every time we call the write method.

Webservers

Before we start on our little example project, we should talk a bit about web servers. By default, script/rails server uses WEBrick. The Rack adapter for WEBrick buffers all output in a way we cannot bypass, so developing this example with script/rails server will not work.

We could use Unicorn, but it is meant for fast responses. Unicorn will kill our connection after 30 seconds. The protocol we’re going to use actually makes this behavior irrelevant, but it’s a bit annoying to see the logs.

For this project, I think the best webserver would be either Rainbows!, Puma, or Thin. I’ve been playing with Puma a lot lately, so I’ll use it in this example.

Our application

We’re going to build an application that automatically reloads the page whenever a file is saved. You can find the final repository here.

Technology

For this project we’re going to use edge Rails and Live Streaming, along with a bit of JavaScript and Server-Sent Events. To detect file system changes, we’re going to use the rb-fsevent gem. I think this gem only works on OS X, but it should be easy to translate this project to Linux or Windows given the right library.

Server-Sent Events

If you’ve never heard of Server-Sent Events (from here on I’ll call them SSEs), it’s a feature of HTML5 that allows long polling, but is built in to the browser. Basically, the browser keeps a connection open to the server, and fires an event in JavaScript every time the server sends data. An example event looks like this:

id: 12345\n
event: some_channel\n
data: {"hello":"world"}\n\n

Messages are delimited by two newlines. The data field is the event’s payload. In this example, I’ve just embedded some JSON data in the payload. The event field is the name of the event to fire in JavaScript. The id field should be a unique id of the message. SSE does automatic reconnection; if the connection is lost, the browser will automatically try to reconnect. If an id has been provided with your messages, when the browser attempts to reconnect, it will send a header (Last-Event-ID) to the server allowing you to pick up where you left off.

We’re going to build a controller that emits SSEs and tells the browser to refresh the page.

Getting Started

The first thing we’ll do is generate a new Rails project from the Rails git repository (I keep all my git repos in ~/git):

$ cd ~/git/rails
$ ruby railties/bin/rails new ~/git/reloader --dev
$ cd ~/git/reloader

Update the Gemfile to include puma and rb-fsevent and re-bundle:

diff --git a/Gemfile b/Gemfile
index 9e075a8..51ce01c 100644
--- a/Gemfile
+++ b/Gemfile
@@ -6,6 +6,8 @@ gem 'arel',      github: 'rails/arel'
 gem 'active_record_deprecated_finders', github: 'rails/active_record_deprecated_finders'
 
 gem 'sqlite3'
+gem 'puma'
+gem 'rb-fsevent'
 
 # Gems used only for assets and not required
 # in production environments by default.

Then we’ll generate a controller for emitting SSE messages to the browser:

$ ruby script/rails g controller browser

Moving on!

Generating SSEs

I’d like an object that knows how to format messages as SSE and emits those messages to the live stream. To do this, we’ll write a small class that decorates the output stream and knows how to dump objects as SSEs:

require 'json'

module Reloader
  class SSE
    def initialize io
      @io = io
    end

    def write object, options = {}
      options.each do |k,v|
        @io.write "#{k}: #{v}\n"
      end
      @io.write "data: #{JSON.dump(object)}\n\n"
    end

    def close
      @io.close
    end
  end
end

We’ll place this file under lib/reloader/sse.rb and require it from the browser controller. In the controller, we’ll mix in ActionController::Live and try emitting some SSEs:

require 'reloader/sse'

class BrowserController < ApplicationController
  include ActionController::Live

  def index
    # SSE expects the `text/event-stream` content type
    response.headers['Content-Type'] = 'text/event-stream'

    sse = Reloader::SSE.new(response.stream)

    begin
      loop do
        sse.write({ :time => Time.now })
        sleep 1
      end
    rescue IOError
      # When the client disconnects, we'll get an IOError on write
    ensure
      sse.close
    end
  end
end

Next, update your routes.rb to point at the new controller:

Reloader::Application.routes.draw do
  get 'browser' => 'browser#index'
end

Fire up Puma in one shell:

$ puma
Puma 1.5.0 starting...
* Min threads: 0, max threads: 16
* Environment: development
* Listening on tcp://0.0.0.0:9292
Use Ctrl-C to stop

Then in another shell curl against the endpoint. You should see an event emitted every second. Here is my output after a few seconds:

$ curl -i http://localhost:9292/browser
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
X-UA-Compatible: IE=Edge
X-Request-Id: 76cfaa39-d23b-4eac-8337-f915410dc0de
X-Runtime: 0.430762
Transfer-Encoding: chunked

data: {"time":"2012-07-30T10:02:05-07:00"}

data: {"time":"2012-07-30T10:02:06-07:00"}

data: {"time":"2012-07-30T10:02:07-07:00"}

data: {"time":"2012-07-30T10:02:08-07:00"}

data: {"time":"2012-07-30T10:02:09-07:00"}

data: {"time":"2012-07-30T10:02:10-07:00"}

^C
$

Next we should monitor the file system.

File System Monitoring

Now we’ll update the controller to emit an event every time a file under app/assets or app/views changes. Rather than a loop in our controller, we’ll use the FSEvent object:

require 'reloader/sse'

class BrowserController < ApplicationController
  include ActionController::Live

  def index
    # SSE expects the `text/event-stream` content type
    response.headers['Content-Type'] = 'text/event-stream'

    sse = Reloader::SSE.new(response.stream)

    begin
      directories = [
        File.join(Rails.root, 'app', 'assets'),
        File.join(Rails.root, 'app', 'views'),
      ]
      fsevent = FSEvent.new

      # Watch the above directories
      fsevent.watch(directories) do |dirs|
        # Send a message on the "refresh" channel on every update
        sse.write({ :dirs => dirs }, :event => 'refresh')
      end
      fsevent.run

    rescue IOError
      # When the client disconnects, we'll get an IOError on write
    ensure
      sse.close
    end
  end
end

The controller will send an SSE named “refresh” every time a file is modified. Start Puma in one shell, then curl in a second shell, and touch a file in the third shell, you will see an event.

Curl started in one shell:

$ curl -i http://localhost:9292/browser

Touch a file in another:

$ touch app/assets/javascripts/application.js

Now the curl shell should look like this:

$ curl -i http://localhost:9292/browser
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
X-UA-Compatible: IE=Edge
X-Request-Id: 98331d36-ef7c-4d15-ad99-331149fc589b
X-Runtime: 43.307765
Transfer-Encoding: chunked

event: refresh
data: {"dirs":["/Users/aaron/git/reloader/app/assets/javascripts/"]}

Every time a file is modified under the directories we’re watching, an SSE will be sent up to the browser.

Listening with JavaScript

Next let’s add the JavaScript that will actually refresh the page. I’m going to add this directly to app/assets/javascripts/application.js. The JavaScript we’ll add simply opens an SSE connection and listens for refresh events.

jQuery(document).ready(function() {
  setTimeout(function() {
    var source = new EventSource('/browser');
    source.addEventListener('refresh', function(e) {
      window.location.reload();
    });
  }, 1);
});

Whenever a refresh event happens, the browser will reload the current page.

Parallel Requests

We need to update the configuration in development to handle multiple requests at the same time. One request for the page we’re working on, and another request for the SSE controller. Add these lines to your config/environments/development.rb but please note that they may change in the future:

  config.preload_frameworks = true
  config.allow_concurrency = true

Next we’ll see everything work together.

Trying it out!

To see the automatic refreshes in action, let’s create a test controller and view. I just want to see the automatic refreshes happen, so I’ll use the scaffolding to generate a model, view, and controller:

$ ruby script/rails g scaffold user name:string
$ rake db:migrate

Now run Puma, and navigate to http://localhost:9292/users. If you watch the developer tools, you’ll see the browser connect to /browser but the request will never finish. That is what we want: the browser listening for events on that endpoint.

If you change any file under app/assets or app/views, a message should be sent to the browser, and the browser will refresh the page.

YAY!

SSE Caveats / Features

SSEs will not work on IE (yet). If you want to use this with IE, you’ll have to find another way. SSEs will work on pretty much every other browser, including Mobile Safari.

Some webservers (notably Unicorn) cut off the request after a particular timeout. Be mindful of this when designing your application, and remember that SSE will automatically reconnect after a connection is lost.

Heroku will cut off your connections after 30 seconds. I had trouble getting the SSE to reconnect to a Heroku server, but I haven’t had time to investigate the issue.

Rails Live Streaming Caveats

Mixing the Live Streaming module in to your controller will enable every action in that controller to have a Live Streaming object. In order to make this feature work, I had to work around the Rack API and I did this by executing Live Stream actions in a new thread. This shouldn’t be a problem for most people, but I thought it would be good for people to know.

Headers cannot be written after the first call to write or close on the stream. You will get an exception if you attempt to write to the headers after those calls are made. This is because when you call write on the stream, the server will send all the headers up to the client, so writing new headers after that point is useless and probably a bug in your code.

Always make sure to close your Live Stream IO objects. If you don’t, it might mean that a socket will sit open forever.

WUT?

I thought streaming was already introduced in Rails 3.2. How is this different?

Yes, streaming templates were added to Rails 3.2. The main difference between Live Streaming and Streaming Templates is that with Streaming Template, the application developer could not choose what data was sent to the client and when. Live Streaming gives the application developer full control of what data is sent to the client and when.

Final Thoughts

I’m very excited about this feature of Rails 4. In my opinion, it is one of the most important new features. I’ve been interested in streaming data from Rails for a long time. We can use this feature to reduce latency and deliver data more quickly to clients on slow connections (e.g. cell phones), for infinite streams like chatrooms, or for cool productivity hacks like this article shows.

I hope you enjoyed this article! I think for the next demo of Live Streams, I would like to show how to reduce latency when sending JSON streams to the client. That might be fun.

<3<3<3

read more »

2012-07-09 @ 16:51

Crouching Tiger, Hidden Salami

I’ve been learning how to cure meat, and I thought I should share my setup. I’m currently on my third batch of meat (second time curing salami). Curing meat requires control of the environment, external and internal to the meat. This post is about the hardware I use for controlling the environment outside the meat. Specifically:

  • Temperature
  • Humidity
  • Wind speed

Temperature

I’m using a wine fridge, the Vinotemp VT-27 TS:

IMG_0008

I chose this fridge because it has a temperature range of 39ºF to 65ºF, which is an ideal range for meat curing.

Most of the curing time is spent around 60ºF, but during the fermentation stage, you must keep the temperature at 70ºF. In order to maintain the higher temperature, (until recently) I used a Freezer Temperature Controller. I was able to figure out the wiring for the temperature relay coil that controls the fridge pump, so I’m currently using a TI Launchpad and a Humidity and temperature sensor to control the fridge directly:

IMG_0046

Controlling the fridge directly is not necessary. The freezer temperature controller works great. I just wanted to control the fridge with a micro controller because I am a huge nerd!

Humidity

For the humidity, I am using a Crane Drop humidifier. I chose this humidifier because it’s ultrasonic, so supposedly the water molecules in the air are smaller (or so I’m told). More importantly, the humid air it produces is not heated, so it will not impact the temperature of your box as much.

To get the humid air in to the cooler, I drilled a hole in the back:

Click here to see a zoomed in version of the hole. I bought some tubing and hooked the humidifier in to the hole. This is a top view, showing the tubing hooked in to the fridge:

IMG_0027

I didn’t want to constantly monitor and adjust the humidifier, so I also bought a Dayton Humidifier Controller. The controller sits inside the fridge. I have an extension cord running in to the fridge, the sensor plugged in to that, then the humidifier plugged in to the sensor:

Wind speed

The final piece is wind speed. On my first batch, I didn’t understand the importance of wind speed until my sausage got slimy and gross. Make sure you have fans in your curing box! I have two fans plugged in to the extension cord that is used for the humidity sensor:

IMG_0035

Miscellaneous

The wire racks I have in the box are not stainless, so it’s important not to let the salami touch the metal (otherwise you get a metallic flavor). So I use a bunch of clothespins to hold on to the string that ties off my salami.

So far, I’ve found the best place to put the fans is above the meat pointing down. I found I couldn’t get good air circulation in other places.

This fridge was not meant to operate with such high humidity! When water condenses in your fridge, it drips to a tray outside the fridge where it will evaporate. But since this box operates with 70% to 90% humidity, there is way too much condensate for the drip tray. I had to remove the regular drip tray and run a pipe down to a container to catch the water:

Recap

My hardware list is:

The fridge cost me about $200. The humidifier and controllers ran about $150 together. I estimate the total cost was about $400 when I was done. The freezer controller can perform in the correct temperature range for curing meat, so one way to save money on this project would be to buy a used fridge from craigslist or something.

Another possibility is to buy a Temperature and humidity sensor. It requires a bit more assembly, but is cheaper than buying both the freezer controller and the humidity controller.

With a microcontroller, there are even more possible solutions. It just depends on how much you’re willing to spend, and how much you want to assemble yourself.

Here is my full setup:

IMG_0030

I hope you’ve enjoyed this post! I think my next post will be “Hacking your sausage box” (how to hack the Vinotemp wine cooler), followed by “Real Time Sausage Monitoring” (publishing and monitoring sausage info in real time).

Cat:

IMG_0026

<3<3<3<3<3

read more »

2012-06-18 @ 11:04

Removing config.threadsafe!

TL;DR: config.threadsafe! can be removed, but for Rails 4.0 we should just enable it by default.

A while back a ticket was filed on the Rails tracker to turn on config.threadsafe! mode by default in production. Unfortunately, this change was met with some resistance. Rather than make resistance to change a negative thing, I would like to make it a positive thing by talking about exactly what config.threadsafe! does. My goal is to prove that enabling config.threadsafe! in multi threaded environments and multi process environments is beneficial, therefore the option can be removed.

Before we discuss the impact of config.threadsafe! on multi-process vs multi-threaded environments, let’s understand exactly what the option does.

config.threadsafe!: what does it do?

Let’s take a look at the threadsafe! method:

def threadsafe!
  @preload_frameworks = true
  @cache_classes      = true
  @dependency_loading = false
  @allow_concurrency  = true
  self
end

Calling this method sets four options in our app configuration. Let’s walk through each option and talk about what it does.

Preloading Frameworks

The first option @preload_frameworks does pretty much what it says, it forces the Rails framework to be eagerly loaded on boot. When this option is not enabled, framework classes are loaded lazily via autoload. In multi-threaded environments, the framework needs to be eagerly loaded before any threads are created because of thread safety issues with autoload. We know that loading the framework isn’t threadsafe, so the strategy is to load it all up before any threads are ready to handle requests.

Caching classes

The @cache_classes option controls whether or not classes get reloaded. Remember when you’re doing “TDD” in your application? You modify a controller, then reload the page to “test” it and see that things changed? Ya, that’s what this option controls. When this option is false, as in development, your classes will be reloaded when they are modified. Without this option, we wouldn’t be able to do our “F5DD” (yes, that’s F5 Driven Development).

In production, we know that classes aren’t going to be modified on the fly, so doing the work to figure out whether or not to reload classes is just wasting resources, so it makes sense to never reload class definitions.

Dependency loading

This option, @dependency_loading controls code loading when missing constants are encountered. For example, a controller references the User model, but the User constant isn’t defined. In that case, if @dependency_loading is true, Rails will find the file that contains the User constant, and load that file. We already talked about how code loading is not thread safe, so the idea here is that we should load the framework, then load all user code, then disable dependency loading. Once dependency loading is disabled, framework code and app code should be loaded, and any missing constants will just raise an exception rather than attempt to load code.

We justify disabling this option in production because (as was mentioned earlier) code loading is not threadsafe, and we expect to have all code loaded before any threads can handle requests.

Allowing concurrency

@allow_concurrency is my favorite option. This option controls whether or not the Rack::Lock middleware is used in your stack. Rack::Lock wraps a mutex around your request. The idea being that if you have code that is not threadsafe, this mutex will prevent multiple threads from executing your controller code at the same time. When threadsafe! is set, this middleware is removed, and controller code can be executed in parallel.

Multi Process vs Multi Thread

Whether a multi-process setup or a multi-threaded setup is best for your application is beyond the scope of this article. Instead, let’s look at how the threadsafe! option impacts each configuration (multi-proc vs mult-thread) and compare and contrast the two.

Code loading and caching

I’m going to lump the first three options (@preload_frameworks, @cache_classes, and @dependency_loading) together because they control roughly the same thing: code loading. We know autoload to not be threadsafe, so it makes sense that in a threaded environment we should do these things in advance to avoid deadlocks.

@cache_classes is enabled by default regardless of your concurrency model. In production, Rails automatically preloads your application code so if we were to disable @dependency_loading in either a multi-process model or a multi-threading model, it would have no impact.

Among these settings, the one to differ most depending on concurrency model would be @preload_frameworks. In a multi-process environment, if @preload_frameworks is enabled, it’s possible that the total memory consumption could go up. But this depends on how much of the framework your application uses. For example, if your Rails application makes no use of Active Record, enabling @preload_frameworks will load Active Record in to memory even though it isn’t used.

So the worst case scenario in a multi-process environment is that a process might take up slightly more memory. This is the situation today, but I think that with smarter application loading techniques, we could actually remove the @preload_frameworks option, and maintain minimal memory usage.

Rack::Lock and the multi-threaded Bogeyman

Rack::Lock is a middleware that is inserted to the Rails middleware stack in order to protect our applications from the multi-threaded Bogeyman. This middleware is supposed to protect us from nasty race conditions and deadlocks by wrapping our requests with a mutex. The middleware locks a mutex at the beginning of the request, and unlocks the mutex when the request finishes.

To study the impact of this middleware, let’s write a controller that is not threadsafe, and see what happens with different combinations of webservers and different combinations of config.threadsafe!.

Here is the code we’ll use for comparing concurrency models and usage of Rack::Lock:

class UsersController < ApplicationController
  @counter = 0

  class << self
    attr_accessor :counter
  end

  trap(:INFO) {
    $stderr.puts "Count: #{UsersController.counter}"
  }

  def index
    counter = self.class.counter # read
    sleep(0.1)
    counter += 1                 # update
    sleep(0.1)
    self.class.counter = counter # write

    @users = User.all

    respond_to do |format|
      format.html # index.html.erb
      format.json { render json: @users }
    end
  end
end

This controller has a classic read-update-write race condition. Typically, you would see this code in the form of variable += 1, but in this case it’s expanded to each step along with a sleep in order to exacerbate the concurrency problems. Our code increments a counter every time the action is run, and we’ve set a trap so that we can ask the controller what the count is.

We’ll run the following code to test our controller:

require 'net/http'

uri = URI('http://localhost:9292/users')

100.times {
  5.times.map {
    Thread.new { Net::HTTP.get_response(uri) }
  }.each(&:join)
}

This code generates 500 requests, doing 5 requests simultaneously 100 times.

Rack::Lock and a mult-threaded webserver

First, let’s test against a threaded webserver with threadsafe! disabled. That means we’ll have Rack::Lock in our middleware stack. For the threaded examples, we’re going to use the puma webserver. Puma is set up to handle 16 concurrent requests by default, so we’ll just start the server in one window:

[aaron@higgins omglol]$ RAILS_ENV=production puma 
Puma 1.4.0 starting...
* Min threads: 0, max threads: 16
* Listening on tcp://0.0.0.0:9292
Use Ctrl-C to stop

Then run our test in the other and send a SIGINFO to the webserver:

[aaron@higgins omglol]$ time ruby multireq.rb 

real	1m46.591s
user	0m0.709s
sys	0m0.369s
[aaron@higgins omglol]$ kill -INFO 59717
[aaron@higgins omglol]$

If we look at the webserver terminal, we see the count is 500, just like we expected:

127.0.0.1 - - [16/Jun/2012 16:25:58] "GET /users HTTP/1.1" 200 - 0.8815
127.0.0.1 - - [16/Jun/2012 16:25:59] "GET /users HTTP/1.1" 200 - 1.0946
Count: 500

Now let’s retry our test, but enable config.threadsafe! so that Rack::Lock is not in our middleware:

[aaron@higgins omglol]$ time ruby multireq.rb 

real	0m24.452s
user	0m0.724s
sys	0m0.382s
[aaron@higgins omglol]$ kill -INFO 59753
[aaron@higgins omglol]$

This time the webserver logs are reporting “200”, not even close to the 500 we expected:

127.0.0.1 - - [16/Jun/2012 16:30:50] "GET /users HTTP/1.1" 200 - 0.2232
127.0.0.1 - - [16/Jun/2012 16:30:50] "GET /users HTTP/1.1" 200 - 0.4259
Count: 200

So we see that Rack::Lock is ensuring that our requests are running in a thread safe environment. You may be thinking to yourself “This is awesome! I don’t want to think about threading, let’s disable threadsafe! all the time!”, however let’s look at the cost of adding Rack::Lock. Did you notice the run times of our test program? The first run took 1 min 46 sec, where the second run took 24 sec. The reason is because Rack::Lock ensured that we have only one concurrent request at a time. If we can only handle one request at a time, it defeats the purpose of having a threaded webserver in the first place. Hence the option to remove Rack::Lock.

Rack::Lock and a mult-process webserver

Now let’s look at the impact Rack::Lock has on a multi-process webserver. For this test, we’re going to use the Unicorn webserver. We’ll use the same test program to generate 5 concurrent requests 100 times.

First let’s test with threadsafe! disabled, so Rack::Lock is in the middleware stack:

[aaron@higgins omglol]$ unicorn -E production
I, [2012-06-16T16:45:48.942354 #59827]  INFO -- : listening on addr=0.0.0.0:8080 fd=5
I, [2012-06-16T16:45:48.942688 #59827]  INFO -- : worker=0 spawning...
I, [2012-06-16T16:45:48.943922 #59827]  INFO -- : master process ready
I, [2012-06-16T16:45:48.945477 #59829]  INFO -- : worker=0 spawned pid=59829
I, [2012-06-16T16:45:48.946027 #59829]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:45:51.983627 #59829]  INFO -- : worker=0 ready

Unicorn only forks one process by default, so we’ll increase it to 5 processes and run our test program:

[aaron@higgins omglol]$ kill -SIGTTIN 59827
[aaron@higgins omglol]$ kill -SIGTTIN 59827
[aaron@higgins omglol]$ kill -SIGTTIN 59827
[aaron@higgins omglol]$ kill -SIGTTIN 59827
[aaron@higgins omglol]$ time ruby multireq.rb 

real	0m23.080s
user	0m0.634s
sys	0m0.320s
[aaron@higgins omglol]$ kill -INFO 59829 59843 59854 59865 59876
[aaron@higgins omglol]$

We have to run kill on multiple pids because we have multiple processes listening for requests. If we look at the logs:

[aaron@higgins omglol]$ unicorn -E production
I, [2012-06-16T16:45:48.942354 #59827]  INFO -- : listening on addr=0.0.0.0:8080 fd=5
I, [2012-06-16T16:45:48.942688 #59827]  INFO -- : worker=0 spawning...
I, [2012-06-16T16:45:48.943922 #59827]  INFO -- : master process ready
I, [2012-06-16T16:45:48.945477 #59829]  INFO -- : worker=0 spawned pid=59829
I, [2012-06-16T16:45:48.946027 #59829]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:45:51.983627 #59829]  INFO -- : worker=0 ready
I, [2012-06-16T16:46:54.379332 #59827]  INFO -- : worker=1 spawning...
I, [2012-06-16T16:46:54.382832 #59843]  INFO -- : worker=1 spawned pid=59843
I, [2012-06-16T16:46:54.384204 #59843]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:46:56.624781 #59827]  INFO -- : worker=2 spawning...
I, [2012-06-16T16:46:56.635782 #59854]  INFO -- : worker=2 spawned pid=59854
I, [2012-06-16T16:46:56.636441 #59854]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:46:57.703947 #59827]  INFO -- : worker=3 spawning...
I, [2012-06-16T16:46:57.708788 #59865]  INFO -- : worker=3 spawned pid=59865
I, [2012-06-16T16:46:57.709620 #59865]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:46:58.091562 #59843]  INFO -- : worker=1 ready
I, [2012-06-16T16:46:58.799433 #59827]  INFO -- : worker=4 spawning...
I, [2012-06-16T16:46:58.804126 #59876]  INFO -- : worker=4 spawned pid=59876
I, [2012-06-16T16:46:58.804822 #59876]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:47:01.281589 #59854]  INFO -- : worker=2 ready
I, [2012-06-16T16:47:02.292327 #59865]  INFO -- : worker=3 ready
I, [2012-06-16T16:47:02.989091 #59876]  INFO -- : worker=4 ready
Count: 100
Count: 100
Count: 100
Count: 100
Count: 100

We see the count totals to 500. Great! No surprises, we expected a total of 500.

Now let’s run the same test but with threadsafe! enabled. We learned from our previous tests that we’ll get a race condition, so let’s see the race condition in action in a multi-process environment. We enable threadsafe mode to eliminate Rack::Lock, and fire up our webserver:

[aaron@higgins omglol]$ unicorn -E production
I, [2012-06-16T16:53:48.480272 #59920]  INFO -- : listening on addr=0.0.0.0:8080 fd=5
I, [2012-06-16T16:53:48.480630 #59920]  INFO -- : worker=0 spawning...
I, [2012-06-16T16:53:48.482540 #59920]  INFO -- : master process ready
I, [2012-06-16T16:53:48.484182 #59921]  INFO -- : worker=0 spawned pid=59921
I, [2012-06-16T16:53:48.484672 #59921]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:53:51.666293 #59921]  INFO -- : worker=0 ready

Now increase to 5 processes and run our test:

[aaron@higgins omglol]$ kill -SIGTTIN 59920
[aaron@higgins omglol]$ kill -SIGTTIN 59920
[aaron@higgins omglol]$ kill -SIGTTIN 59920
[aaron@higgins omglol]$ kill -SIGTTIN 59920
[aaron@higgins omglol]$ time ruby multireq.rb 

real	0m22.920s
user	0m0.641s
sys	0m0.327s
[aaron@higgins omglol]$ kill -INFO 59932 59921 59943 59953 59958

Finally, take a look at our webserver output:

[aaron@higgins omglol]$ unicorn -E production
I, [2012-06-16T16:53:48.480272 #59920]  INFO -- : listening on addr=0.0.0.0:8080 fd=5
I, [2012-06-16T16:53:48.480630 #59920]  INFO -- : worker=0 spawning...
I, [2012-06-16T16:53:48.482540 #59920]  INFO -- : master process ready
I, [2012-06-16T16:53:48.484182 #59921]  INFO -- : worker=0 spawned pid=59921
I, [2012-06-16T16:53:48.484672 #59921]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:53:51.666293 #59921]  INFO -- : worker=0 ready
I, [2012-06-16T16:54:56.393218 #59920]  INFO -- : worker=1 spawning...
I, [2012-06-16T16:54:56.420914 #59932]  INFO -- : worker=1 spawned pid=59932
I, [2012-06-16T16:54:56.421824 #59932]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:54:57.962304 #59920]  INFO -- : worker=2 spawning...
I, [2012-06-16T16:54:57.966149 #59943]  INFO -- : worker=2 spawned pid=59943
I, [2012-06-16T16:54:57.966804 #59943]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:54:59.799125 #59920]  INFO -- : worker=3 spawning...
I, [2012-06-16T16:54:59.803206 #59953]  INFO -- : worker=3 spawned pid=59953
I, [2012-06-16T16:54:59.803816 #59953]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:55:00.927141 #59920]  INFO -- : worker=4 spawning...
I, [2012-06-16T16:55:00.931436 #59958]  INFO -- : worker=4 spawned pid=59958
I, [2012-06-16T16:55:00.932026 #59958]  INFO -- : Refreshing Gem list
I, [2012-06-16T16:55:01.808953 #59932]  INFO -- : worker=1 ready
I, [2012-06-16T16:55:05.292524 #59943]  INFO -- : worker=2 ready
I, [2012-06-16T16:55:06.491235 #59953]  INFO -- : worker=3 ready
I, [2012-06-16T16:55:06.955906 #59958]  INFO -- : worker=4 ready
Count: 100
Count: 100
Count: 100
Count: 100
Count: 100

Strange. Our counts total 500 again despite the fact that we clearly saw this code has a horrible race condition. The fact of the matter is that we don’t need Rack::Lock in a multi-process environment. We don’t need the lock because the socket is our lock. In a multi-process environment, when one process is handling a request, it cannot listen for another request at the same time (you would need threads to do this). That means that wrapping a mutex around the request is useless overhead.

Conclusion

I think this blurgh post is getting too long, so let’s wrap it up. The first three options that config.threadsafe! controls (@preload_frameworks, @cache_classes, and @dependency_loading) are either already used in a multi-process environment, or would have little to no overhead if used in a multi-process environment. The final configuration option, @allow_concurrency is completely useless in a multi-process environment.

In a multi-threaded environment, the first three options that config.threadsafe! controls are either already used by default or are absolutely necessary for a multi-threaded environment. Rack::Lock cripples a multi-threaded server such that @allow_concurrency should always be enabled in a multi-threaded environment. In other words, if you’re using code that is not thread safe, you should either fix that code, or consider moving to the multi-process model.

Because enabling config.threadsafe! would have little to no impact in a multi-process environment, and is absolutely necessary in a multi-threaded environment, I think that we should enable this flag by default in new Rails applications with the intention of removing the flag in future versions of Rails.

The End!

<3<3<3<3<3

read more »

2012-02-16 @ 10:10

Ten Years of Nerd Club

The Seattle Ruby Brigade, (the first Ruby Bridage), is having a party to celebrating it’s 10th anniversary, and you should attend! The party will be held February 28th from 7 to 10pm at the offices of Substantial (thanks Substantial!!!) Tickets are $10 and you can get them here.

If you would like to learn more about the history of nerd club, check out this talk I gave in 2008. You can find the accompanying slides here.

Here’s to another 10 years of nerd clubbing!

read more »

2012-01-14 @ 21:36

My Jerky Setup

I love making beef jerky. I started making jerky by using Alton Brown's recipe, but I found his jerky making apparatus to be lacking in a few key areas, so I put together my own jerky making setup. I use a modified food dehydrator for making my jerky. I prefer using a food dehydrator because it's easy to clean, efficient at circulating air, and easily adjustable to accommodate larger or smaller batches of meat. Jerky is supposed to be air dried with no heat. Drying with no heat required modification of my dehydrator. I would like to share my modification process with you!

Getting started

I use a Nesco American Harvest dehydrator. I found mine on sale for about $50, but it looks like the going rate on Amazon is about $65.

I like this dryer because the fan is on top, it's easy to clean, and you can buy and use extra trays. I think it goes up to 10 or 15 trays, but I only have 8.

Step 1: Remove the top nob

In order to disconnect the heating element, we must remove the plastic bar that goes across the top of the fan. The first step is to remove the knob:

Just pull the knob straight up, and it should come off:

Step 2: Disconnect the bar

Flip the lid over. There should be four screws outside the ring. Remove just those four screws:

6698739611_b6f1428b08_o-1

Flip the lid back over, and you should be able to remove the center plastic bar with the Nesco logo on it:

Step 3: Disconnect the heater

These two leads go to the potentiometer that controls the heater:

Disconnect one. The circuit will open, and the heater disabled:

I taped mine down with electrical tape:

That's it!

Just follow the steps backwards to reassemble your dryer. The electrically inclined may want to enhance their dryer with a switch. I (on the other hand) am lazy. I bought a second dryer so that I could make jerky and do normal food dehydrating at the same time.

You can see more pictures of the process here. I plan to write a follow up post where I actually make jerky. Yum!

read more »

2011-12-05 @ 13:07

Profiling Rails startup with DTrace

TL;DR: I’m adding DTrace probes to Ruby, and I used those to make a 6% performance improvement to rake environment.

Lately I've been working on adding DTrace probes to Ruby 2.0. I am interested in using them to profile and improve rails startup time. Today I want to look at building a profiler with DTrace, finding slow method calls, and making some incremental improvements.

The examples I'm going to show will only work on systems where DTrace is available, and with either stock Apple MRI 1.8.7, or my special fork of Ruby. Likely you don't have my fork, so if you'd like to play along, you'll have to use Apple's Ruby 1.8.7. I've published my branch here, but I will be adding and moving probes, so please don't depend on it. Eventually it will be merged to ruby trunk (hopefully).

All of these examples will be executing in the context of a Rails application built from Rails master.

What to profile?

It's easy to say that application boot time is not important. When we start our applications in production, they are only booted once, so the cost is only paid once. But as an application developer, the "boot once" argument doesn't hold water. Every time we run a rake task, we boot the app. Every time we run tests, we boot the app. Every time we run migrations, we boot the app. Every time we restart our webserver, we boot the app. In the context of application development, class reloading helps to some extent, but not when someone runs rake test.

There are good efforts surrounding the improvement of Rails test speeds, but I would like to suggest that these are just working around a larger problem: booting Rails is slow. If we can make Rails boot times fast, I think it would eliminate the need for testing outside the rails context.

So where do we start? Typically, we begin at the beginning. Today we'll start profiling the rake environment command. This task simply requires all of the files for your application, readying everything for execution against a Rails context. Many things can impact the startup time of our application: file parse time, method call time, GC, etc. Today we're going to focus on slow methods, and using DTrace to find those methods.

Viewing method calls

The DTrace probes in Ruby allow us to detect when method calls are made. We can tell when Ruby enters a method by watching the function-entry probe. This probe is given:

  • recipient class name
  • method name
  • file name and line number of the call site

Here is an example DTrace script that prints out all of this information each time a function-entry probe is activated:

ruby*:::function-entry
{
  printf("-> %s::%s (%s:%d)\n", copyinstr(arg0), copyinstr(arg1), copyinstr(arg2), arg3);
}

Save this to a file, and run the file along with rake environment like so:

$ sudo dtrace -q -s calls.d -c'rake environment'

You'll see all of the method call entries, and what file and line made the call. Here is a sample of the output:

-> Module::local_constant_names (/Users/aaron/git/rails/activesupport/lib/active_support/core_ext/module/introspection.rb:85)
-> Module::local_constants (/Users/aaron/git/rails/activesupport/lib/active_support/core_ext/module/introspection.rb:78)
-> Module::constants (/Users/aaron/git/rails/activesupport/lib/active_support/core_ext/module/introspection.rb:79)
-> Array::map (/Users/aaron/git/rails/activesupport/lib/active_support/core_ext/module/introspection.rb:86)
-> Symbol::to_s (/Users/aaron/git/rails/activesupport/lib/active_support/core_ext/module/introspection.rb:86)

Counting method calls

We were able to see all of the method calls, but this doesn't give us a good understanding of where the program is spending time. Instead of just viewing method calls, let's use the aggregation functionality in DTrace to count the number of times a method is called. To do this, we use @ variables in DTrace. @ variables are known as "aggregation variables" and let us, well, aggregate things.

Aggregate variables are associative arrays that allow arbitrary keys. We can modify our DTrace program to aggregate the count of the recipient class name and method name pairs like so:

ruby*:::function-entry
{
  @[copyinstr(arg0), copyinstr(arg1)] = count();
}

When the program finishes executing, DTrace will automatically output the aggregate variable. Running the program yields this output (I've truncated the output to the last 10 lines or so):

$ sudo dtrace -q -s calls.d -c'rake environment'

  Class                 core#define_method           6332
  Symbol                ==                           6612
  Module                method_added                 9329
  Kernel                hash                         9437
  Array                 hash                        11966
  Array                 each                        13101
  Fixnum                ^                           15540
  Module                ===                         20267
  String                hash                        37878
  Symbol                to_s                       214691
$

Great! We can see that when rake environment is run, we're calling Symbol#to_s over 200,000 times. Certainly this will put a burden on the Garbage Collector. If we change our program to aggregate by class, method, and call site, we can find out who is making so many calls. Let's modify the program to add file and line number information:

ruby*:::function-entry
{
  @[copyinstr(arg0), copyinstr(arg1), copyinstr(arg2), arg3] = count();
}

Rerun the program, and we'll see output like this (again I've truncated the output):

$ sudo dtrace -q -s calls.d -c'rake environment'

  Gem::Dependency        requirement           /Users/aaron/.local/lib/ruby/1.9.1/rubygems/dependency.rb               98             4641
  Array                  hash                  /Users/aaron/.local/lib/ruby/1.9.1/rubygems/specification.rb             1308             4697
  Bundler::DepProxy      method_missing        /Users/aaron/.local/lib/ruby/gems/1.9.1/gems/bundler-1.0.21/lib/bundler/rubygems_ext.rb              194             4809
  BasicObject            ==                    /Users/aaron/.local/lib/ruby/1.9.1/rubygems/specification.rb             1334             5610
  Kernel                 ===                   /Users/aaron/.local/lib/ruby/1.9.1/rubygems/specification.rb             1334             5610
  Module                 ===                   /Users/aaron/.local/lib/ruby/1.9.1/rubygems/specification.rb             1334             5940
  Array                  hash                  /Users/aaron/.local/lib/ruby/1.9.1/rubygems/requirement.rb              135             6001
  Fixnum                 ^                     /Users/aaron/.local/lib/ruby/1.9.1/rubygems/specification.rb             1308            11078
  String                 hash                  /Users/aaron/.local/lib/ruby/1.9.1/rubygems/specification.rb             1308            31209
  Symbol                 to_s                  /Users/aaron/git/rails/activesupport/lib/active_support/core_ext/module/introspection.rb               86           208900
$

This shows us that introspection.rb line 86 is calling Symbol#to_s over 208k times. If we take a look at that method, we'll see that on 1.9 it will call Symbol#to_s many times:

def local_constants #:nodoc:
  constants(false)
end

# Returns the names of the constants defined locally rather than the
# constants themselves. See <tt>local_constants</tt>;.
def local_constant_names
  local_constants.map { |c| c.to_s }
end

Knowing the number of times a method was called and who called it is pretty good. But we can't tell if that's why our program is slow. It could be that the method is very fast, so calling it many times doesn't matter. Certainly we need to keep an eye on this method, but let's try to find out where this program is spending the most time.

Sampling method calls

It would be best if we could time how long ruby spends in any particular method. But in order to do that, we would have to keep track of the Ruby stack. DTrace doesn't provide us with a good way to access the Ruby stack. We could write our own program to manually keep track, but there are problems. For example, if something raises an exception, how far up the stack did it go? Which method caught the exception? What about catch / throw? It's Friday and I don't feel like figuring out all the edge cases for dealing with that.

Rather than keeping track of the stack, we can use a timer probe in DTrace. We will set up two probes. One probe will record function entries in a global variable, the other probe will run at fixed intervals and record the last function that was called. This approach assumes that the more time we spend in any particular function, the more likely we'll see that function recorded. Every time a method is entered, this DTrace script will record the method call. Then we have a timer that runs at 5000hz taking a sample of the last method entered.

ruby*:::function-entry
{
  k = copyinstr(arg0);
  m = copyinstr(arg1);
}

tick-5000hz
/k != 0/
{
  @[k, m] = count();
}

Running this script against rake environment yields these results:

$ sudo dtrace -q -s calls.d -c'rake environment'

  Module            method_added               329
  Fixnum            ^                          330
  Mutex             unlock                     355
  String            hash                       477
  Array             -                          671
  Module            constants                  779
  Kernel            require                   2262
  IO                set_encoding              3713
  Symbol            to_s                      4095
  Kernel            `                         4122
$

The number on the right is the number of times the timer executed while ruby was executing the method on the left. Symbol#to_s is definitely in our top 10, but it seems we're actually spending more time in Kernel#`. Let's find all places where Kernel#` is called but using DTrace. We'll use predicates to ensure our probe only fires when that method is called. Predicates are declared below the probe name and above the curly brace within slashes. Our probe body will simply aggregate based on the file where the call happened:

ruby*:::function-entry
/copyinstr(arg0) == "Kernel" &amp;&amp; copyinstr(arg1) == "`"/
{
  @[copyinstr(arg2)] = count();
}

Execute this script:

$ sudo dtrace -q -s calls.d -c'rake environment'

  /Users/aaron/.local/lib/ruby/gems/1.9.1/gems/bundler-1.0.21/lib/bundler.rb                1
  /Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/coffee-rails-2120408a4063/coffee-rails.gemspec                3
  /Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/sass-rails-3c24e4fd8dd3/sass-rails.gemspec                3
  /Users/aaron/.local/lib/ruby/gems/1.9.1/gems/execjs-1.2.9/lib/execjs/external_runtime.rb                5
$

On the right side, we have the number of calls made to Kernel#`, and on the left we have the file where that call was made. We can see that running rake environment will create 12 subshells. Let's see how much time we're spending by creating new subshells:

ruby*:::function-entry
/copyinstr(arg0) == "Kernel" && copyinstr(arg1) == "`"/
{
  self->time = timestamp;
}

ruby*:::function-return
/copyinstr(arg0) == "Kernel" && copyinstr(arg1) == "`"/
{
  this->took = (timestamp - self->time) / 1000000;

  @[copyinstr(arg2)] = sum(this->took);
  @total = sum(this->took);
}

END
{
  printf("Time per file:\n");
  printa(@);
  printf("\nTOTAL: ");
  printa(@total);
}

Here we keep track of the total time spent in Kernel#` in the @total variable, and the aggregate per file in the @ variable. The report lists the amount of time in milliseconds that we're spending. If we run this along with the time command, we can get a rough idea of what percentage we're spending in Kernel#`:

$ time sudo dtrace -q -s calls.d -c'rake environment'
Time per file:

  /Users/aaron/.local/lib/ruby/gems/1.9.1/gems/bundler-1.0.21/lib/bundler.rb  15
  /Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/coffee-rails-2120408a4063/coffee-rails.gemspec  246
  /Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/sass-rails-3c24e4fd8dd3/sass-rails.gemspec  267
  /Users/aaron/.local/lib/ruby/gems/1.9.1/gems/execjs-1.2.9/lib/execjs/external_runtime.rb  398

TOTAL: 
              926


real  0m7.942s
user  0m4.052s
sys   0m2.909s
$

With DTrace enabled, the total run time took about 8000ms and of that time 926ms was spent inside Kernel#`. So let's say about 11% of our time running rake environment is spent in subshells.

Removing the Subshells

First we need to figure out where these subshells are being created. Since we're studying DTrace, let's write a script to aggregate call sites of Kernel#`:

ruby*:::function-entry
/copyinstr(arg0) == "Kernel" &amp;&amp; copyinstr(arg1) == "`"/
{
  @[copyinstr(arg2), arg3] = count();
}

Running this, we can find the locations of our subshells:

/Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/coffee-rails-2120408a4063/coffee-rails.gemspec       19        1
/Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/coffee-rails-2120408a4063/coffee-rails.gemspec       20        1
/Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/coffee-rails-2120408a4063/coffee-rails.gemspec       21        1
/Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/sass-rails-3c24e4fd8dd3/sass-rails.gemspec       22        1
/Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/sass-rails-3c24e4fd8dd3/sass-rails.gemspec       23        1
/Users/aaron/.local/lib/ruby/gems/1.9.1/bundler/gems/sass-rails-3c24e4fd8dd3/sass-rails.gemspec       24        1
/Users/aaron/.local/lib/ruby/gems/1.9.1/gems/bundler-1.0.21/lib/bundler.rb      209        1
/Users/aaron/.local/lib/ruby/gems/1.9.1/gems/execjs-1.2.9/lib/execjs/external_runtime.rb      150        5

We'll tackle fixing the gemspecs first because they are most boring to fix.

Fixing the gemspecs

Each gemspec shells out and runs git in order to populate various parts of the gemspec. Here are the relevant lines from one of the gemspecs:

s.files         = `git ls-files`.split("\n")
s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }

To fix this, we're going to generate the gemspec in advance, then check in the generated gemspec. The downside of this solution is that whenever anybody adds or removes a file, the gemspec must be regenerated. The upside to this solution is that you see exactly what files are packaged in your gemspec. A more optimal solution would be to change bundler such that when a git repository is checked out, a gemspec for that repo is automatically built. If you're looking for an opportunity to contribute to open source, this is it! :-D

The gemspecs are now generated in advance. Next we should look at fixing subshells Bundler.

Fixing Bundler

In bundler.rb around line 209, we find that Bundler is trying to locate the sudo executable:

def requires_sudo?
  return @requires_sudo if defined?(@checked_for_sudo) && @checked_for_sudo

  path = bundle_path
  path = path.parent until path.exist?
  sudo_present = !(`which sudo` rescue '').empty?

  @checked_for_sudo = true
  @requires_sudo = settings.allow_sudo? && !File.writable?(path) && sudo_present
end

Let's refactor this to use a which method that searches the PATH environment variable:

def which(cmd)
  if File.executable? cmd
    cmd
  else
    path = ENV['PATH'].split(File::PATH_SEPARATOR).find { |path|
      File.executable? File.join(path, cmd)
    }
    path && File.expand_path(cmd, path)
  end
end

def requires_sudo?
  # ...
  sudo_present = which "sudo"
  # ...
end

Our which implementation isn't 100% perfect, (it can't handle input like "./sudo" correctly) but it is good enough to find the sudo executable on a person's system. Benchmarks reveal that the implementation that doesn't shell out is over 10x faster than the version that does:

require 'rubygems'
require 'benchmark/ips'

def sand_which cmd
  `which #{cmd}`
end

def blair_which cmd
  if File.executable? cmd
    cmd
  else
    path = ENV['PATH'].split(File::PATH_SEPARATOR).find { |path|
      File.executable? File.join(path, cmd)
    }
    path && File.expand_path(cmd, path)
  end
end

Benchmark.ips do |x|
  x.report("sand") { sand_which 'sudo' }
  x.report("blair") { blair_which 'sudo' }
end
$ ruby lol.rb 
                sand      251.0 (±4.8%) i/s -       1265 in   5.051489s (cycle=23)
               blair     3222.0 (±8.2%) i/s -      16218 in   5.076616s (cycle=306)
$

Now that we have bundler fixed, let's take a look at ExecJS.

Fixing ExecJS

Our last place that shells out is inside ExecJS. The DTrace probes specifically list execjs/external_runtime.rb line 150. If we look there, we'll find another implementation of which:

def which(command)
  Array(command).each do |name|
    name, args = name.split(/\s+/, 2)
    result = if ExecJS.windows?
      `#{shell_escape("#{ExecJS.root}/support/which.bat", name)}`
    else
      `#{shell_escape('command', '-v', name)} 2>/dev/null`
    end

    if path = result.strip.split("\n").first
      return args ? "#{path} #{args}" : path
    end
  end
  nil
end

When fixing performance problems, it's helpful to understand the task the code is trying to accomplish, as well is how the method is called. Here is the stack trace for this method inside ExecJS:

lib/execjs/external_runtime.rb:134:in `locate_binary'
lib/execjs/external_runtime.rb:97:in `initialize'
lib/execjs/runtimes.rb:37:in `new'
lib/execjs/runtimes.rb:37:in `<module:Runtimes>'
lib/execjs/runtimes.rb:9:in `<module:ExecJS>'
lib/execjs/runtimes.rb:8:in `<top (required)>'
lib/execjs.rb:2:in `require'
lib/execjs.rb:2:in `<top (required)>'

There are a couple important things we can learn from this stack trace: 1) this method is called immediately when the file is required, and 2) it's called from inside a constructor (on line 97). Let's look at locate_binary, the initializer, and how the objects are constructed in runtimes.rb:

external_runtime.rb:

def initialize(options)
  @name        = options[:name]
  @command     = options[:command]
  @runner_path = options[:runner_path]
  @test_args   = options[:test_args]
  @test_match  = options[:test_match]
  @encoding    = options[:encoding]
  @binary      = locate_binary
end

protected
def locate_binary
  if binary = which(@command)
    if @test_args
      output = `#{shell_escape(binary, @test_args)} 2>&1`
      binary if output.match(@test_match)
    else
      binary
    end
  end
end

runtimes.rb:

module ExecJS
  module Runtimes
    # ...

    Node = ExternalRuntime.new(
      :name        => "Node.js (V8)",
      :command     => ["nodejs", "node"],
      :runner_path => ExecJS.root + "/support/node_runner.js",
      :encoding    => 'UTF-8'
    )

    # ...
  end
end

The initializer calls locate_binary which calls which which shells out. This is a great example of a hidden dependency in a constructor, our hidden dependency being the ability to shell out, and the availability of the command command. This isn't to say that the command command will sometimes not be available, the problem is that people constructing this object will not know that it depends on these things. It's "hidden" from the caller. The first thing we should do is stop the constructor from immediately shelling out. We'll do this by initializing @binary to nil, adding a binary method, and changing all places that access the instance variable to use the method instead. Our diff looks like this:

diff --git a/lib/execjs/external_runtime.rb b/lib/execjs/external_runtime.rb
index a8180d1..65f46f9 100644
--- a/lib/execjs/external_runtime.rb
+++ b/lib/execjs/external_runtime.rb
@@ -94,7 +94,7 @@ module ExecJS
       @test_args   = options[:test_args]
       @test_match  = options[:test_match]
       @encoding    = options[:encoding]
-      @binary      = locate_binary
+      @binary      = nil
     end

     def exec(source)
@@ -113,16 +113,21 @@ module ExecJS

     def available?
       require "multi_json"
-      @binary ? true : false
+      binary ? true : false
     end

+    private
+      def binary
+        @binary ||= locate_binary
+      end
+
     protected
       def runner_source
         @runner_source ||= IO.read(@runner_path)
       end

       def exec_runtime(filename)
-        output = sh("#{shell_escape(*(@binary.split(' ') << filename))} 2>&1")
+        output = sh("#{shell_escape(*(binary.split(' ') << filename))} 2>&1")
         if $?.success?
           output
         else

Now the object will only shell out if the value is absolutely necessary. This technique is called "extracting a getter", and you can read more about it in Michael Feathers's book Working Effectively with Legacy Code (IMO this book is a mandatory read for any software engineer). Now we're lazily shelling out, but if we look at the impact, we've gone from ExecJS creating 5 subshells down to 3 subshells:

$ sudo dtrace -q -s calls.d -c'rake environment'
/Users/aaron/.local/lib/ruby/gems/1.9.1/gems/execjs-1.2.9/lib/execjs/external_runtime.rb      155      3
$

The target is 0 subshells, so we should continue. This code is trying to locate executables on the file system, similar to what bundler was doing. We can change the which method in ExecJS to be a Ruby implementation similar to the implementation we put in bundler:

diff --git a/lib/execjs/external_runtime.rb b/lib/execjs/external_runtime.rb
index 65f46f9..88a08ce 100644
--- a/lib/execjs/external_runtime.rb
+++ b/lib/execjs/external_runtime.rb
@@ -121,6 +121,22 @@ module ExecJS
         @binary ||= locate_binary
       end

+      def which_windows(name)
+        result = `#{shell_escape("#{ExecJS.root}/support/which.bat", name)}`
+        result.strip.split("\n").first
+      end
+
+      def which_unix(name)
+        if File.executable? cmd
+          cmd
+        else
+          path = ENV['PATH'].split(File::PATH_SEPARATOR).find { |path|
+            File.executable? File.join(path, cmd)
+          }
+          path && File.expand_path(cmd, path)
+        end
+      end
+
     protected
       def runner_source
         @runner_source ||= IO.read(@runner_path)
@@ -147,19 +163,14 @@ module ExecJS
       end

       def which(command)
-        Array(command).each do |name|
+        Array(command).find do |name|
           name, args = name.split(/\s+/, 2)
-          result = if ExecJS.windows?
-            `#{shell_escape("#{ExecJS.root}/support/which.bat", name)}`
-          else
-            `#{shell_escape('command', '-v', name)} 2>/dev/null`
-          end
+          path = ExecJS.windows? ? which_windows(name) : which_unix(name)

-          if path = result.strip.split("\n").first
-            return args ? "#{path} #{args}" : path
-          end
+          next unless path
+
+          args ? "#{path} #{args}" : path
         end
-        nil
       end

       if "".respond_to?(:force_encoding)

It isn't clear what the utility of the Windows batch file is, so it's been extracted to it's own method in order to maintain the current functionality. After these patches (which I have submitted here), the number of subshells is now down to zero.

Total Impact

Before:

$ time rake environment

real    0m2.377s
user    0m2.028s
sys 0m0.320s

After:

$ time rake environment 

real    0m2.224s
user    0m1.946s
sys 0m0.260s

After eliminating subshells, we've saved 153 milliseconds for approximately 6% savings. Not as much as the 11% we were hoping for, but still greater than 0. If we re-run our sampling profiler, we'll find no more references to Kernel#`:

$ sudo dtrace -q -s calls.d -c'rake environment'

  Module           ===                306
  Mutex            unlock             324
  Fixnum           ^                  353
  Module           method_added       368
  Array            -                  509
  String           hash               603
  Module           constants          859
  Kernel           require           2292
  IO               set_encoding      3371
  Symbol           to_s              4561

Conclusion

Removing subshells from the Rails startup process wasn't a silver bullet. From my experience in doing performance work, there is rarely a silver bullet that will fix performance problems. That said, even 2.2 seconds is still too slow for Rails startup time. If we collate the sampling profiler by file and method call, I think we can get a more clear picture about our runtime cost:

$ sudo dtrace -q -s calls.d -c'rake environment'

  Array            hash                  /Users/aaron/.local/lib/ruby/1.9.1/rubygems/requirement.rb              255
  Mutex            unlock                <internal:prelude>                                              365
  Fixnum           ^                     /Users/aaron/.local/lib/ruby/1.9.1/rubygems/specification.rb              493
  Array            -                     /Users/aaron/git/rails/activesupport/lib/active_support/dependencies.rb              507
  IO               set_encoding          /Users/aaron/.local/lib/ruby/1.9.1/rubygems/custom_require.rb              636
  Module           constants             /Users/aaron/git/rails/activesupport/lib/active_support/core_ext/module/introspection.rb              841
  IO               set_encoding          /Users/aaron/git/rails/activesupport/lib/active_support/dependencies.rb             1098
  String           hash                  /Users/aaron/.local/lib/ruby/1.9.1/rubygems/specification.rb             1188
  Kernel           require               /Users/aaron/git/rails/activesupport/lib/active_support/dependencies.rb             1253
  Symbol           to_s                  /Users/aaron/git/rails/activesupport/lib/active_support/core_ext/module/introspection.rb             8981

Of the top ten slow methods, five of them are inside Rails. These particular methods are all associated with the way that Rails handles missing constants and code reloading. The total amount of time spent dealing with code reloading is likely far greater than the subshelling issue we fixed. I knew this before writing this article, but tackling the reloading code in Rails is much longer than a single blog post. The goal of this blog post was to show that:

  • Startup performance can be improved
  • Even small changes can make an impact
  • DTrace is pretty awesome

I'd like to write an article with a more comprehensive analysis of where Rails spends it's time on boot (GC, file parsing, method calls), but I don't have all the information I need yet. My next steps are to:

  • Try and understand Rails code reloading better
  • Add more GC related probes to Ruby
  • Add source compile probes to ruby

Once I've made more progress, I'll write another article.

<3<3<3<3

read more »

2011-11-30 @ 15:57

PSA: The number of gems installed on your system can impact rails boot time

Just a PSA: the number of gems you have installed on your system will impact your rails boot time. Why is this?

When you require a file, rubygems must find the file. If the file isn't on the load path, it must find it in your installed gems. To do this, it loads all gemspecs and searches for the file. This means all gemspecs on your system are evaluated, so the more gems you have, the longer it takes. You can read more in the rubygems comments.

The second problem is when bundler comes in to the picture. When rubygems loads the gemspec files, it turns them in to Gem::Specification objects and caches them. Sometimes bundler must modify your GEM_HOME and GEM_PATH environment variables. These environment variables tell rubygems where to locate gems on your file system. Unfortunately, if these variables are modified after the specification cache is built, that means the cache must be cleared. Until recently, bundler always cleared the rubygems cache. Since rubygems can load gemspecs before bundler gets activated (or may even while bundler is being activated!), this means it's not unusual for all gemspecs on your system to be evaluated twice. Earlier today, I submitted a patch that will only clear the cache if the environment variables are modified. Hopefully this will mean fewer gemspec evals in the next version of bundler.

To help reduce boot time, just remember:

  • Use gemsets (if you're on RVM)
  • Clean unused gems often

To clean up your gems, just use the command gem cleanup.

Hope that helps!

read more »

2011-10-20 @ 12:14

Connection Management in ActiveRecord

OMG! Happy Thursday! I am trying to be totally enthusiastic, but the truth is that I have a cold, so there will be fewer uppercase letters and exclamation points than usual.

Anyway, I want to talk about database connection management in ActiveRecord. I am not too pleased with its current state of affairs. I would like to describe how ActiveRecord connection management works today, how I think it should work, and steps towards fixing the current system.

TL;DR: database connection API in ActiveRecord should be more similar to File API

Thinking in terms of files

It's convenient to think of our database connection as a file. Dealing with files is very common. When we work with files, the basic sequence goes something like this:

  • Open the file
  • Do some work on the file handle
  • Close the file

We're very used to doing these steps when dealing with files. Typically our code will look something like this:

File.open('somefile.txt', 'wb') do |fh| # Open the file
  fh.write "hello world"                # Do some work with the file
end                                     # Close file when block returns

We don't want to share open files among threads because dealing with synchronization around reading and writing to the file is too difficult (and time consuming). So maybe we'll store the handle in a thread local or something until we're ready to close it.

Our basic requirements for dealing with a database connection are essentially the same as when dealing with files. We need to open our database connection, do some work with the connection (send and receive queries), and close the connection. We have these similarities, yet the API for dealing with database connections in ActiveRecord is vastly different. Let's look at how each of these steps are performed in ActiveRecord today.

Opening a connection

Opening a connection to the database is very easy. First we configure ActiveRecord with the database specification, then we call connection to actually get back a database handle:

ActiveRecord::Base.establish_connection(
  :adapter  => "sqlite",
  :database => "path/to/dbfile")

connection_handle = ActiveRecord::Base.connection

The main difference between this API and the File API is that we've separated the connection specification from actually opening the connection. In the case of opening a file, we call open along with a "specification" which includes the file name and how we want to open it. In this case, we've separated the two; essentially storing the specification in a global place, then opening the connection later.

This leads to two questions:

  1. Where is the specification stored?
  2. When I call connection, what specification is used?

The answer to the first question can be found by reading the establish_connection method. Specifically if we look at line 63 we'll find a clue. Since this method is a class method, the call to name returns the class name of the recipient. This name (along with our actual spec) is passed in to the connection handler object. If we jump through a few more layers of indirection, we'll find that what we have is essentially a one to one mapping of class name to connection specification.

Armed with this information, we can tackle the second question. If we look at the implementation of connection, it calls retrieve_connection on itself, which calls retrieve_connection on the connection handler with itself. A few more method calls later, and we see that each ActiveRecord subclass walks up the inheritance tree looking for a connection:

def retrieve_connection_pool(klass)
  pool = @connection_pools[klass.name]
  return pool if pool
  return nil if ActiveRecord::Base == klass
  retrieve_connection_pool klass.superclass
end

If we read this code carefully, we'll notice that not only are connection specifications mapped to classes so are database connections!

Why is this bad?

This behavior smells bad to me. The reason is because we're tightly coupling classes along with database connections when really this relationship doesn't need to exist.

How can it be improved?

If this tight coupling is removed, the complexity of ActiveRecord can be reduced and at the same time increasing the features available! The way we can reduce this coupling is by passing the connection specification to the method that actually opens the connection. Specifications can be stored on each class as a convenience, but nothing more.

What if opening a connection looked more like this?

spec = ActiveRecord::Base.specificiation
ActiveRecord::ConnectionPool.open(spec) do |conn|
  ...
end

We could maintain the current behavior by storing specifications on each class, but eliminate the coupling between connection and class. We would be able to delete all of the code that looks up connections by class hierarchy, and open the doors to having features like this:

spec = database_a
ActiveRecord::ConnectionPool.open(spec) do |conn|
  User.find_all
end

spec = database_b
ActiveRecord::ConnectionPool.open(spec) do |conn|
  User.find_all
end

Working with the connection

Working with our connection should remain the same. We have one place to retrieve our connection and work with it. Woo!

Dealing with thread safety

Sharing open file handles among threads probably isn't a good idea and the same can be said about open database connections. So how does ActiveRecord keep connections localized to one thread? If we jump through many, many, method calls, we'll find where the connection is actually checked out of the connection pool. It is here we see how thread safety is handled:

# Retrieve the connection associated with the current thread, or call
# #checkout to obtain one if necessary.
#
# #connection can be called any number of times; the connection is
# held in a hash keyed by the thread id.
def connection
  @reserved_connections[current_connection_id] ||= checkout
end

A hash is kept where the key is the current_connection_id. The implementation of current_connection_id looks up the current id. If the id isn't set, it sets it to the object id of the current thread:

def current_connection_id #:nodoc:
  ActiveRecord::Base.connection_id ||= Thread.current.object_id
end

Next we look at the implementation of connection_id to find that it just gets and sets a thread local:

def connection_id
  Thread.current['ActiveRecord::Base.connection_id']
end

def connection_id=(connection_id)
  Thread.current['ActiveRecord::Base.connection_id'] = connection_id
end

These methods ensure that we have a one to one relationship of open connection and thread.

Closing the connection

Finally we reach our last step: closing the connection. How many of you have closed your connection to the database in ActiveRecord? My guess is that it's very few. I think the reason people don't typically close their connections with ActiveRecord is twofold. One, you don't have to because it just does it for you, and two, the API to close a particular connection is pretty convoluted.

So how is the connection closed today? There are two ways, the easy way and the hard way.

The easy way

The easy way is good enough in a non-threaded application. A rack middleware clears out all of the connections at the end of the request. The source for clear_active_connections! is pretty simple. For each connection pool in the system (remember it's one pool per AR class and connection spec), release that connection:

# Returns any connections in use by the current thread back to the pool,
# and also returns connections to the pool cached by threads that are no
# longer alive.
def clear_active_connections!
  @connection_pools.each_value {|pool| pool.release_connection }
end

Each pool releases the connection it has using the current_connection_id (which happens to be the current thread id):

# Signal that the thread is finished with the current connection.
# #release_connection releases the connection-thread association
# and returns the connection to the pool.
def release_connection(with_id = current_connection_id)
  conn = @reserved_connections.delete(with_id)
  checkin conn if conn
end

Not bad. But what if our system has multiple threads?

The hard way

Believe it or not, the connection pool in ActiveRecord will check in connections in the checkout method. Let me say that again: the checkout method checks in connections and checks out connections. If you're not facepalming yet, let's look at a small part of the checkout method:

@queue.wait(@timeout)

if(@checked_out.size < @connections.size)
  next
else
  clear_stale_cached_connections!
  if @size == @checked_out.size
    raise ConnectionTimeoutError, "could not obtain a database connection#{" within #{@timeout} seconds" if @timeout}. The max pool size is currently #{@size}; consider increasing it."
  end
end

This bit of the checkout method is not called unless our connection pool has become full. First we wait for other threads to check in their connection. While we're waiting, if other threads checked in their connection, the first branch of the if statement executes, and a connection is returned. If no threads have checked in their connection, we call clear_stale_cached_connections!:

def clear_stale_cached_connections!
  keys = @reserved_connections.keys - Thread.list.find_all { |t|
    t.alive?
  }.map { |thread| thread.object_id }
  keys.each do |key|
    checkin @reserved_connections[key]
    @reserved_connections.delete(key)
  end
end

This method walks through every thread in your system, looking for connections that were allocated to threads that no longer exist. Then it checks in connections associated with those dead threads. Since there is really no easy way for users to check in their own connections, this is actually a common code path for systems that use threads.

Why is this bad?

It should be pretty clear why this behavior is bad. Walking through every thread in the system, and asking if it's alive isn't very cheap. Even worse is that we're coupling ourselves to the threading system. We cannot change the connection pool to work with other concurrency solutions (like Fibers) because those solutions may not give us the introspection we need to perform this operation!

But really, this is treating a symptom. The real problem is that checking in connections is too difficult, so people don't do it.

How can we fix this?

I think the best solution for this is to mimic the File API. If we do this, it will become natural for people dealing with the database connection to actually close the connection.

We should make ActiveRecord::Base.connection consult a thread local. That thread local is set in the rack middleware where the connection is opened. If someone creates a new thread, they must populate that thread local, and close the connection at the end of the thread.

Simplified, our middleware would become something like this:

class ConnectionManagement
  def call env
    spec       = ActiveRecord::Base.spec
    connection = ActiveRecord::ConnectionPool.open spec
    ActiveRecord::Base.connection = connection

    @app.call env

    connection.close
  end
end

When people create a new thread, it would look something like this:

Thread.new do
  spec = ActiveRecord::Base.spec
  ActiveRecord::ConnectionPool.open(spec) do |connection|
    ActiveRecord::Base.connection = connection

    # do some stuff
  end
end

What does this buy us?

This buys us two important things: simple connection pool management, and freedom of choice on our concurrency model.

omg the end.

I hope I've convinced you that by simply learning to treat our database connection like a file, we can reduce code complexity and at the same time increase the features available. I think I can add this feature to Rails 3.2 and mostly maintain backwards compatibility. I think we can keep 100% backwards compatibility if we add some sort of flag like config.i_suck_and_will_not_close_my_database_connections = true or, config.my_app_is_awesome = true.

Anyway, I'm totally sick and I'll stop blllluuurrrrggghhhing now.

<3 <3 <3 <3 <3

read more »

2011-06-29 @ 15:59

I want DTrace probes in Ruby

Recently I was debugging a performance regression in Rails 3.1. The ticket reported for the regression just indicated a speed problem. Namely allocating new active record objects was very slow in Rails 3.1 compared to Rails 3.0.

The benchmark program

Here is the program I was using for benchmarking. It's slightly modified from the original program that @paul so kindly submitted:

require 'active_record'
require 'benchmark'

p ActiveRecord::VERSION::STRING

ActiveRecord::Base.establish_connection(
  :adapter  => "sqlite3",
  :database => ":memory:"
)

ActiveRecord::Base.connection.execute("CREATE TABLE active_record_models (id INTEGER UNIQUE, title STRING, text STRING)")

class ActiveRecordModel < ActiveRecord::Base; end
ActiveRecordModel.new

N = 100_000
Benchmark.bm { |x| x.report('new') { N.times { ActiveRecordModel.new } } }

I could tell from perftools.rb output we were calling new methods, and those new methods took plenty of time.

Here is the call graph for this benchmark on Rails 3.0:

arnew

Compared to the call graph on Rails 3.1:

arnew31

After some work, I was able to remove the method calls to scope when allocating new AR objects, but the benchmark still showed 3.1 to be slower than 3.0. I noticed we were spending more time in the Garbage Collector on 3.1 than on 3.0. This lead me to believe that we were creating more objects in the newer version.

Counting Object Allocations

Counting object allocations with Ruby 1.9 is actually pretty easy. We can call ObjectSpace.count_objects to get a list of the live objects in the system. That method returns a hash where the keys are the object type and the values are the number of that object in the system (there are other keys, be we don't care about them for now).

Knowing this information, we can write a function that will calculate a rough difference in object allocations:

def allocate_count
  GC.disable
  before = ObjectSpace.count_objects
  yield
  after = ObjectSpace.count_objects
  after.each { |k,v| after[k] = v - before[k] }
  GC.enable
  after
end

p allocate_count { 100.times { {} } }

The first thing we do in this function is disable the garbage collector. It's possible that the code we're investigating could trigger GC, and that will impact our allocated object count.

Then we grab our current list of objects, yield to the code we want to profile, grab the list again, calculate our delta, enable GC and return!

If you run this code, you should see output similar to this (I've removed keys I don't care about):

{ ... :T_HASH=&gt;101, ... }

You can see from the output that we've allocated 101 hashes. 100 came from our 100.times call, and one came from calling the method on ObjectSpace.

Since inspecting our code can impact object allocations, these counts can't be 100% accurate. Another annoying thing is that we can't determine where our code is allocating these hashes. It's easy enough to see the hash literals in this code, but imagine a project like rails. You may be able to find hash literals in your source, but determining which ones are being called is difficult.

After running this against the profile, I found that ActiveRecord in 3.1 was allocating 2x the number of hashes that 3.0 allocated. Now the problem is figuring out where these hashes were being allocated.

DTrace

DTrace gives us hooks in to our processes. Many of the hooks provided are for system calls, or other various functions. DTrace requires that we write probes in our code, and unfortunately Ruby does not have DTrace probes built in.

Adding DTrace to Ruby

To find these Hash allocations, I decided to add DTrace support. I usually run Ruby built from Ruby trunk, so running a modified Ruby seemed acceptible. It turns out adding DTrace support is pretty easy.

First I added a probe definition file. This file describes the probes that I want to add to Ruby. It contains the probe names and the signatures of the probes:

provider ruby {
  probe hash__alloc(const char *, int);
};

This definition says we're declaring a probe called hash-alloc and it will take a string (the file name) and an int (the line number).

Next I used this definition file to generate a header file that ruby would use:

$ dtrace -o probes.h -h -s probes.d

Finally, I modified hash.c to trigger the probe:

diff --git a/hash.c b/hash.c
index b49aff8..c40d94d 100644
--- a/hash.c
+++ b/hash.c
@@ -15,6 +15,7 @@
 #include "ruby/st.h"
 #include "ruby/util.h"
 #include "ruby/encoding.h"
+#include "probes.h"
 #include <errno.h>
 
 #ifdef __APPLE__
@@ -221,6 +222,9 @@ hash_alloc(VALUE klass)
     OBJSETUP(hash, klass, T_HASH);
 
     RHASH_IFNONE(hash) = Qnil;
+    if(RUBY_HASH_ALLOC_ENABLED()) {
+       RUBY_HASH_ALLOC(rb_sourcefile(), rb_sourceline());
+    }
 
     return (VALUE)hash;
 }

After this, I built and installed ruby. Then I wrote a dtrace script to display the filename and line number where hash allocations were happening:

ruby*:::hash-alloc
{
  printf("%s:%d", copyinstr(arg0), arg1);
}

Then I ran the benchmark with dtrace:

$ sudo dtrace -s x.d -c 'ruby -I lib test.rb'

Here is an excerpt of the output:

~~~ 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1528 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1529 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1534 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1535 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1525 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/persistence.rb:322 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1527 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1528 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1529 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1534 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1535 1 126930 hash_alloc:hash-alloc /Users/aaron/git/rails/activerecord/lib/active_record/base.rb:1525</code></pre>

I was able to compare the output from this script on Rails 3.0 vs Rails 3.1. From this output, I was able to determine:

  • The number of hash allocations
  • Where the hashes were allocated
  • Which allocations were new in Rails 3.1

Armed with this information, I was able to eliminate some allocations and return speed in Rails 3.1 to that of Rails 3.0.

DTrace vs gdb vs memprof

I probably could have accomplished this task with memprof, but it requires that I use 1.8.7 from rvm. Working with RVM on my machine is difficult (don't ask, let's just say I'm a "special needs" user), and I wanted to use 1.9. So memprof was out the window.

I was able to get the same information by scripting gdb. I set a breakpoint at the correct function, called rb_sourcefile() and rb_sourceline() and redirected to a file. The problem with gdb is that it seemed very slow, and scripting it was a pain (though I am not a gdb expert!).

DTrace was fast, relatively easy to use, and very scriptable. It made me happy!

OMG!!!

I would like to see dtrace probes officially added to ruby trunk. The ruby that ships with OS X has them built in, but they don't give us information like hash literal allocations. It looks like they were in ruby trunk at one point, but were then reverted. I would like to see them added again.

If you want to play with them on ruby trunk, here is my full patch. Make sure to run make probes.h before make && make install.

HAPPY WEDNESDAY!!!! <3 <3 <3 <3

(it feels good to blog on a non “professional” blog)

read more »

2011-06-28 @ 16:08

TIL: It's OK to return nil from to_ary

tl;dr: You can return nil from to_ary and to_a.

Today I discovered that it's OK for to_ary to return nil. Ruby spec tests this behavior, and the ruby implementation supports it. But why would you want to implement this?

Array#flatten and to_ary

Consider the following code:

class Item
  def respond_to?(name, visibility = false)
    p "respond to: #{name}"
    false # we don't respond to anything!!
  end

  def method_missing(name, *args)
    p "method missing: #{name}"
    super # if something?
    # do something else
  end
end

[[Item.new]].flatten

In Ruby 1.9, flatten will actually call to_ary on each of the items in the collection. If the method table for this object contains to_ary, Ruby will call to_ary, and if method_missing is implemented, it will call method_missing regardless of how your respond_to? returns. The reason ruby will call method_missing is because it could implement to_ary. (As for why it ignores the return value of respond_to?, I don't know)

Naturally, if we were to call to_ary directly on this object it would raise a NoMethodError exception. When calling flatten, ruby will swallow the exception. If we enable $DEBUG by running ruby with -d, we can see the exception:

[aaron@higgins ~]$ ruby -d test.rb
Exception `LoadError' at /Users/aaron/.local/lib/ruby/site_ruby/1.9.1/rubygems.rb:1215 - cannot load such file -- rubygems/defaults/operating_system
Exception `LoadError' at /Users/aaron/.local/lib/ruby/site_ruby/1.9.1/rubygems.rb:1224 - cannot load such file -- rubygems/defaults/ruby
"method missing: to_ary"
Exception `NoMethodError' at test.rb:9 - undefined method `to_ary' for #&lt;Item:0x00000101079e80&gt;
"respond to: to_ary"
[aaron@higgins ~]$

Dispatching to method_missing can be expensive, and using exceptions for flow control even more problematic. The way we can get around this issue is by implementing to_ary and having it return nil:

class Item
  def respond_to?(name, visibility = false)
    p "respond to: #{name}"
    false # we don't respond to anything!!
  end

  def method_missing(name, *args)
    p "method missing: #{name}"
    super # if something?
    # do something else
  end

  private
  def to_ary
    nil
  end
end

[[Item.new]].flatten

Run the code again with -d, and you'll see no more calls to respond_to?, no more calls to method_missing, and no more exceptions raised:

[aaron@higgins ~]$ ruby -d test.rb
Exception `LoadError' at /Users/aaron/.local/lib/ruby/site_ruby/1.9.1/rubygems.rb:1215 - cannot load such file -- rubygems/defaults/operating_system
Exception `LoadError' at /Users/aaron/.local/lib/ruby/site_ruby/1.9.1/rubygems.rb:1224 - cannot load such file -- rubygems/defaults/ruby
[aaron@higgins ~]$

Array() and to_a

The Array() function exhibits a similar behavior, but with the to_a method. Try this same code, but rather than using Array#flatten, do this:

Array(Item.new)

We can fix the error produced by this code with a slight change to our class:

class Item
  def respond_to?(name, visibility = false)
    p "respond to: #{name}"
    false # we don't respond to anything!!
  end

  def method_missing(name, *args)
    p "method missing: #{name}"
    super # if something?
    # do something else
  end

  private
  def to_ary
    nil
  end
  alias :to_a :to_ary
end

[[Item.new]].flatten
Array(Item.new)

Hmmmm

Are warnings annoying? Yes. Is this strange behavior? I tend to think so. But if I'm forced to deal with a class the implements method_missing, I'd like to reduce the number of calls to method_missing.

Anyway. I hope you found this informative. Have a Happy Tuesday!!!!

<3 <3 <3 <3

Small Side Note

The reason there are exceptions coming from rubygems when -d is enabled is because rubygems attempts to require files that are not shipped with rubygems. These files are for packagers to provide. For example someone packaging rubygems for debian, may need to do customizations and those files are where that happens.

read more »

2011-06-27 @ 15:14

Mechanize 2.0.0 has been released!

ZOMG! Thanks to @drbrain, mechanize 2.0.0 has been released!

See the full announcement here!

read more »

2011-03-03 @ 12:24

Rack API is awkward

TL;DR: Rack API is poor when you consider streaming response bodies.

ZOMG!!!! HAPPY THURSDAY!!!! Maybe I shouldn't be so excited now. I want to talk about stuff I've been working on in Rails 3.1, and problems I'm encountering today. I want to use this blllurrrggghhh blog post to talk through through the problems I've been having, and to share the pain with others.

Pie is delicious!

One feature that would be useful to add to Rails is having a streaming response body. When Rails processes a response, the entire response is buffered in memory before it can be sent to the user. Some information like Content Length (among other things) is derived, and the response is sent.

Sometimes buffering a response is less than ideal. It would be nice if we could send the head tag along with any css or script includes to the browser as quickly as possible. Then the browser can download external resources while we're still processing data on the server. If this were possible, total response time may remain the same, but the time to first byte would be decreased and the page would load faster as external resource can be downloaded in parallel.

This feature sounds great, but there are many things to think about before it can be implemented. We need to support infinite streams, chunked encoding, prevent header manipulation, ensure database connections, blah, blah blah.

Rack interface

I'm getting ahead of myself. Before we get to our ultimate "pie in the sky" streaming solution, let's take a look at the Rack API. Rack defines an interface for writing web applications. A rack handler must respond to call which takes one parameter, the request environment. call must return a three item list of:

  • Response code
  • Headers
  • Body

The response code should be a number (like 200), the headers are a hash (like { 'X-Omg' => 'hello!' }). The body must respond to each and take a block. The body must yield a string to the block, and the string will be output to the client. Optionally, the body may respond to close, and rack will call close when output is complete.

An Example Rack application

Let's write an example application. Our sample application will simulate an ERb page. We'll add some sleep statements to simulate work happening during the ERb rendering process:

class FooApplication
  class ErbPage
    def to_a
      head = "the head tag"
      sleep(2)
      body = "the body tag"
      sleep(2)
      [head, body]
    end
  end

  def call(env)
    [200, {}, ErbPage.new.to_a]
  end
end

For the purposes of demonstration, we'll be using a fake implementation of rack:

class FakeRack
  def serve(application)
    status, headers, body = application.call({})
    p :status  => status
    p :headers => headers

    body.each do |string|
      p string
    end

    body.close if body.respond_to?(:close)
  end
end

If we feed our application through FakeRack like this:

app  = FooApplication.new
rack = FakeRack.new

rack.serve app

We'll see output from the rack application, and the total program run time is about 4 seconds:

$ time ruby foo.rb                                       
{:status=>200}
{:headers=>{}}
"the head tag"
"the body tag"

real    0m4.008s
user    0m0.003s
sys     0m0.003s

Great! So far, no problem. Why don't we add a middleware to time how long the response takes.

Rack Middleware

Rack Middleware is simply another Rack application. With Rack, we set up a linked list of middleware that eventually point to the real application. We give the head of the linked list to Rack, Rack calls call on the head of the list, and it is the list's responsibility to call call on it's link.

Here, we'll write a Rack middleware to measure how long the "ERb render" takes and add a header indicating the response time.

class ResponseTimer
  def initialize(app)
    @app = app
  end

  def call(env)
    now                        = Time.now
    status, headers, body      = @app.call(env)
    headers['X-Response-Took'] = Time.now - now

    [status, headers, body]
  end
end

When we construct the ResponseTimer, we pass it the real application. Then we pass the response timer instance to rack:

app   = FooApplication.new
timer = ResponseTimer.new app
rack  = FakeRack.new

rack.serve timer

When rack calls call on the response timer, it records the current time, then calls call on the real application. When the real application returns, the response timer then adds a header with the time delta. The output of this program will look like this:

$ time ruby foo.rb 
{:status=>200}
{:headers=>{"X-Response-Took"=>3.999937}}
"the head tag"
"the body tag"

real    0m4.010s
user    0m0.004s
sys     0m0.004s

Speeding up our response time

We've noticed a problem with our Rack application. When a client connects, it takes 4 seconds before they receive any data! It would be nice if we could feed our client the head tag ASAP so they can download external resources.

We know that Rack will call each and (depending on your webserver) immediately send data to the client. Rather than computing values in ERb ahead of time, we'll compute them when Rack asks for them (when each is called).

Let's refactor the ERb page to be lazy about calculating values:

class FooApplication
  class ErbPage
    def each
      head = "the head tag"
      yield head

      sleep(2)

      body = "the body tag"
      yield body

      sleep(2)
    end
  end

  def call(env)
    [200, {}, ErbPage.new]
  end
end

Now no values are calculated until rack calls each on our body. If we run the program, we'll see output from the application more quickly than before.

However, the output is somewhat strange:

$ time ruby foo.rb
{:status=>200}
{:headers=>{"X-Response-Took"=>1.1e-05}}
"the head tag"
"the body tag"

real    0m4.032s
user    0m0.027s
sys     0m0.016s

The time command reports that our response was about 4 seconds. But our response header says that the response took nearly 0 seconds! Why is this?

If we look closely at our timer middleware, we can see it is only timing how long it took for call to return.

We cannot guarantee that any processing happened during the call method.

Let me say that again:

We cannot guarantee that any processing happened during the call method.

We wanted our response timer to time how long the ERb took to render, but really it is just timing how long the call method took.

ZOMG HOW FIX?!?

Iterating over the body

One way to fix is to iterate over the body. If the timer iterates over the body, then we can calculate the real time:

class ResponseTimer
  def initialize(app)
    @app = app
  end

  def call(env)
    now                        = Time.now
    status, headers, body      = @app.call(env)

    newbody = []
    body.each { |str| newbody << str }
    headers['X-Response-Took'] = Time.now - now

    [status, headers, newbody]
  end
end

But this solution is no good! Our response timer now buffers the response, and our client ends up waiting for 4 seconds before they get any data.

We know that Rack calls close on the body after it's done processing the request. Why don't we try hooking on that method?

Introducing a Proxy Object

One way we can hook on to the close method is by wrapping the response body in a proxy object. Then we can intercept calls made on the body and perform any work we need done:

class ResponseTimer
  class TimerProxy
    def initialize(body)
      @now     = Time.now
      @body    = body
    end

    def close
      @body.close if @body.respond_to?(:close)

      $stderr.puts({'X-Response-Took' => (Time.now - @now)})
    end

    def each(&block)
      @body.each(&block)
    end
  end

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)

    [status, headers, TimerProxy.new(body)]
  end
end

Wow! Suddenly our middleware is not so simple. This proxy solution is sub-optimal for a few reasons. We're required to make a new object for every request, and our proxy object will add another stack frame between calls from rack to the response body. Even worse, every middleware that needs to do work after the response is finished must define this proxy object.

This solution does get the job done. If we look at the output from the program, we'll see that the TimerProxy in fact measures ERb processing time correctly:

$ time ruby foo.rb 
{:status=>200}
{:headers=>{}}
"the head tag"
"the body tag"
{"X-Response-Took"=>4.000268}

real    0m4.044s
user    0m0.029s
sys     0m0.015s

Diligent readers will note that the response time is no longer part of the response headers. This is because when the body is flushed, the headers must be flushed too. We no longer have the opportunity to add extra headers when each is called on the body.

Our solution isn't too bad, but it actually isn't complete. The full awkwardness of this API along with a complete solution can actually be felt (and read) in the Rack source itself.

Lady Gaga Solution

Another possible solution is to decorate the body using a module. We can define a module, then simply call extend on the body with the module:

class ResponseTimer
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body      = @app.call(env)
    body.extend(Module.new {
      now = Time.now

      define_method(:close) do
        super if defined?(super)

        $stderr.puts({'X-Response-Took' => (Time.now - now)})
      end
    })

    [status, headers, body]
  end
end

The body is extended with an anonymous module. During module definition, the time is recorded. We use define_method because it uses a lambda which will keep a reference to the previously calculated time. In the close method, we call super if it's defined, then output our time.

This example also works, but has a few downsides. It is different than previous examples because we are timing only the ERb rendering and not call plus ERb rendering. Using this solution, we're required to create a new module on every request, and also break method caching on every request. Similar to the proxy object solution, we must create a new module and extend for every middleware that must to processing after the response is finished.

ZOMG YOUR EXAMPLE IS CONTRIVED

Yup. But I merely simplified a real world problem. As I mentioned earlier, you can see the awkwardness of this API in rack.

But now that we know about this problem, we can identify middleware that will break streaming responses. For example, Rails defines a middleware that checks connections back in to the connection pool. If our ERb in Rails was streaming, we would lose the database connection during ERb render. The same is true with the query cache in active record. Surely, these cannot be the only middleware that will break when a streaming body is used!

Lifecycle hooks

I think a good solution to this problem would be if Rack provided lifecycle hooks. A Place where we can say "run this when the response is done". We can define something like that today using middleware:

class EndOfLife
  attr_reader :callbacks

  def initialize(app)
    @app       = app
    @callbacks = []
  end

  def call(env)
    status, headers, body = @app.call(env)
    body.extend(Module.new {
      attr_accessor :eol

      def close
        super if defined?(super)
        eol.callbacks.each { |cb| cb.call }
      end
    })
    body.eol = self

    [status, headers, body]
  end
end

app = FooApplication.new
eol = EndOfLife.new app
eol.callbacks << lambda { puts "it finished!" }

rack  = FakeRack.new

rack.serve eol

This keeps us from defining many proxy objects or module extensions during a response. We only define one module extension, and hook any "end of life" hooks on to this instance. The downside is that we cannot guarantee the position of this middleware in the middleware linked list. That means that the "end of life" middleware may not actually execute at the end of the response!

A "real" solution

Rack's interface is simple, and I like that. The simplicity is attractive, but the API seems to fall on it's face when we start talking about streaming web servers. If I remember correctly, Apache 1.0 modules suffered the same problems that Rack is presenting us today. Maybe we should look at Apache 2.0 buckets and filters and design our API using patterns from a project that has already solved this problem.

ZOMG I AM TIRED OF TYPING!!

I'm not happy with any of the solutions I've presented. All of them have downsides that I find unattractive. We can live with the downsides, but life will suck. If any of you dear readers have better solutions for me, I am all ears!

Thanks for listening, and HAVE A GREAT DAY!!!!

<3 <3 <3 <3 <3

Edit: I just noticed that Rack contains a “timer” middleware similar to the one I’ve implemented in this blog post. You can view the broken middleware here.

read more »

2011-01-21 @ 23:50

RubyCommitters.org winners

Hi everybody!

I’m going to get this over. To me, this is like removing a bandage, I must do it all at once. I’ve increased the Love Bucks, so I’ll list the winners along with their prizes:

Winners, please send me your paypal address and I’ll send you the Love Bucks.

read more »

2011-01-20 @ 20:17

Ruby Committers Design Contest Update!

Update:

Hey folks, somehow I made a mistake and one entry was omitted from the index. Please take a look at static mirror, and vote for it here if you like it! Apologies to the people that submitted this entry. I didn’t mean to miss this!

Non Update:

The deadline for entry was last night. I’ve compiled a static mirror of all of the entries here.

I am humbled, and incredibly impressed by the number and quality of all of the entries. I can’t thank everyone enough for doing this. The quality of the results is so good, that I would really appreciate feedback from the community.

Please check out the entries list. Carefully review all entries. Then, leave feedback by either voting up the entry on github, or leaving a comment on my blog. I will take all feedback in to consideration tomorrow when I decide the winner.

Remember that when you judge these designs, please judge just the design. If links are broken, or encoding looks wrong, or something is a little bit off, it’s probably my fault. Pick the designs you like the most, and we’ll polish them up after declaring a winner.

Please tell me about all of the designs you like. If you like 5 of them, let me know. It’s also helpful to know why you like those designs.

As I said, I’ll declare the winner tomorrow before 23:59:59 PST.

Again, I really can’t thank everyone enough for doing this. The results are so much more than I could have hoped for, or expected. You have all outdone yourselves! No amount of Love Bucks can repay for the awesomeness I’ve witnessed, the best I can do is keep fixing bugs on Ruby, Rails, and any other project I get my hands on!

Thank you!

read more »

2011-01-04 @ 10:06

rubycommitters.org design contest!

omg.... OMG.... ZOMG!!!!! HAPPY TUESDAY TO EVERYONE IN THE WORLD!!!!

Alright, now that the formalities are out of the way, LET’S GET DOWN TO BUSINESS! Some of you may or may not know, I joined the Ruby core team in October 2009. I am very proud to be a member of ruby-core. However, I have noticed that the Ruby core team does not have an awesome website like the Rails core team.

To rectify this situation, I registered a domain: rubycommitters.org. Unfortunately, my design skills are… sub-par. I would like to fist fix this by having a contest. I would like you to code the design for rubycommitters.org.

How do I enter?

Just fork the rubycommitters.org project on github, follow the instructions in the README, and make the site look good! When you’re done, send me a pull request.

How many times can I enter?

As many as you want. However, you can only win one place.

When is the due date?

You must send me the pull request by January 19th, 23:59:59 PST.

When will winners be announced?

I will announce winners by January 21st, 23:59:59 PST. Once I’ve decided, I’ll email the winners to get their PayPal information and transfer the money Love Bucks.

What are the prizes?

My employer pays me in Tenderlove Cash (from here on referred to as “Love Bucks”), so the prize will be in Love Bucks. Fortunately for all of us, Love Bucks exchange with the US Dollar at a 1:1 ratio. So I will send you your prize via PayPal in the form of US Dollars.

First Place: 200 300 Love Bucks (in the form of US Dollars) Second Place: 100 Love Bucks (in the form of US Dollars)

All entrants will win a hug from me that is redeemable next time we meet each other!

How will entries be judged?

By me, however I want. I’ll probably ask the intertubes for help, but I have the final say.

Why is the prize so low?

These Love Bucks are coming out of my own wallet! Give me a break!

Conclusion

Since I announced I needed help last weekend, already 9 people have started working, so you’d better get cracking!

I am proud to be a member of the Ruby core team, I’m proud to be involved in the Rails development team, and most of all I’m proud to be a member of the best development community in the world. Thanks to everyone that makes being a member of the Ruby community so awesome!

EDIT!!!!

@mariozig has graciously offered to up the ante. He has donated 100 Love Bucks to the first place winner! That’s a total of 300 Love Bucks for the person who wins first place! Awesome!!!

read more »