YAGNI methods slow us down

Jun 4, 2014 @ 11:49 am

TL;DR: OutputBuffer subclasses SafeBuffer which forces us to do runtime checks that are probably unnecessary

I made a post about YAGNI methods hurting you where I said I would provide two examples, but then I got tired of writing the article so I just did one example. Here is the other example! The previous example demonstrated a memory leak that was introduced because the “footprint” (the number of methods implemented on the object) was too large.

This example will show how these YAGNI methods are impacting performance of your Rails application, and we’ll talk about how to fix it.

This problem is in ActionView::OutputBuffer which inherits from ActiveSupport::SafeBuffer, which inherits from Ruby’s String. Let’s talk about the behavior ActiveSupport::SafeBuffer first, then we’ll see how it impacts to the performance of ActionView::OutputBuffer.

`ActiveSupport::SafeBuffer`

This class is the class that’s used to mark a string as being “html safe”. It’s how Rails detects whether a string has been html escaped or not. In Rails, a normal Ruby string is considered to be not “html safe”. For example:

>> x = "foo"
=> "foo"
>> x.class
=> String
>> x.html_safe?
=> false

If we call html_safe on the string, it returns an instance of ActiveSupport::SafeBuffer that is “tagged” as html safe:

>> x = "foo"
=> "foo"
>> y = x.html_safe
=> "foo"
>> y.class
=> ActiveSupport::SafeBuffer
>> y.html_safe?
=> true

Whenever we html escape a string in Rails, it returns a safe buffer:

>> x = "<html>"
=> "<html>"
>> x.html_safe?
=> false
>> y = ERB::Util.html_escape x
=> "&lt;html&gt;"
>> y.class
=> ActiveSupport::SafeBuffer
>> y.html_safe?
=> true

Now, using the html_safe? predicate, we can easily tell the difference between strings that have been tagged as “html safe” and strings that haven’t been tagged as “html safe” (side note: just like encodings, tagging something does not mean that it actually is correct. We can tag things as “html safe” without them actually being html safe).

We can also concatenate unsafe strings to a safe string, and it will be automatically escaped:

>> x = ERB::Util.html_escape "<html>"
=> "&lt;html&gt;"
>> x.concat "<blink>"
=> "&lt;html&gt;&lt;blink&gt;"
>> x.html_safe?
=> true

Finally, we can concatenate safe strings and they will not be double escaped:

>> x = ERB::Util.html_escape "<html>"
=> "&lt;html&gt;"
>> x.concat ERB::Util.html_escape "<blink>"
=> "&lt;html&gt;&lt;blink&gt;"
>> x.html_safe?
=> true

Infecting a SafeBuffer

So far, the html_safe? predicate is a 1:1 relationship with the class. Meaning that if the class of the string is ActiveSupport::SafeBuffer, then html_safe? would return true, and if the class was String, html_safe? would return false.

Unfortunately it is not a 1:1 relationship. We can mutate a SafeBuffer, making it unsafe again. For example:

>> x = ERB::Util.html_escape "<html>"
=> "&lt;html&gt;"
>> dangerous = "&gt;<script>"
=> "&gt;<script>"
>> x.gsub!(/&gt;/, dangerous)
=> "&lt;html&gt;<script>"
>> x
=> "&lt;html&gt;<script>"
>> x.class
=> ActiveSupport::SafeBuffer
>> x.html_safe?
=> false

You can see that the string in dangerous has been embedded in to the SafeBuffer without being escaped. The class of x is still SafeBuffer, but calling html_safe? will return false. As you can see, the return value of html_safe? is not a 1:1 relationship with the class.

Concatenating to an infected SafeBuffer

Our infected SafeBuffer still supports concatenation:

>> x
=> "&lt;html&gt;<script>"
>> x.class
=> ActiveSupport::SafeBuffer
>> x.html_safe?
=> false
>> x.concat "<blink>"
=> "&lt;html&gt;<script><blink>"

But you can see that the “<blink>” string was not escaped. This means that the concat behavior on a SafeBuffer changes depending on the value of html_safe?. If you look at the implementation of concat, along with it’s helper method, you can indeed see that this is true.

What is OutputBuffer?

OutputBuffer is a buffer that is fed to Rack and then output to the client over the wire. Templates instantiate an OutputBuffer and concatenate computed output to the buffer.

Impact of SafeBuffer on OutputBuffer

How does SafeBuffer impact OutputBuffer? Let’s take a look at a compiled ERB template to find out. Here is an ERB template after it’s been compiled to Ruby:

@output_buffer = output_buffer || ActionView::OutputBuffer.new
@output_buffer.safe_append='      <tr>
        <td>'.freeze
@output_buffer.append=( book.name )

The output isn’t pretty, but you can see that we essentially call two methods on the output buffer: append=, and safe_append= (why we have an = is a mystery for the ages). append= is used when there are dynamic values like <%= link_to ... %>, and safe_append= is used for string literals in your template.

Let’s look at the implementation of safe_append=:

def safe_concat(value)
  return self if value.nil?
  super(value.to_s)
end
alias :safe_append= :safe_concat

First, notice that value is never nil. The ERB compiler ensures that fact, so the value.nil? check is useless. However, this blurrrggghhh post is about how YAGNI methods are hurting us. We do need to concatenate, so you are gonna need this.

OutputBuffer is a subclass of SafeBuffer, and safe_concat supers in to SafeBuffer. Let’s look at the implementation of safe_concat on SafeBuffer:

def safe_concat(value)
  raise SafeConcatError unless html_safe?
  original_concat(value)
end

safe_concat raises an exception if the current object is not html_safe?. Every time our ERB template concatenates a string, it checks whether or not the current object is “html safe”.

As we saw earlier, the only way to make an instance of a SafeBuffer not html safe is by calling an unsafe method on that instance.

YAGNI method pain

Every time we concatenate to an OutputBuffer object, we’re forced to check a flag. This flag is directly related to people calling gsub! on the OutputBuffer. This bothers me because, as you can imagine, we concatenate on to the OutputBuffer extremely frequently.

I’m betting that it’s extremely rare for someone to call gsub! on an OutputBuffer. I’m also willing to bet that people calling gsub! on an OutputBuffer would be willing to call to_s on it before doing their mutation, so why do we “support” this?

Our OutputBuffer is punishing most people for a feature that is extremely rare. This is thanks to the features we got “for free” from our superclass.

Secondly, if you look in the same file that defines OutputBuffer, you’ll see a class called StreamingBuffer. This class is meant to be a replacement for OutputBuffer, but it streams to the client. Notice that this class does not inherit from SafeBuffer. Further evidence that gsub! on OutputBuffer is a YAGNI method.

How do we fix this?

I think we can fix this by disconnecting OutputBuffer from it’s superclass. I suspect that the methods people actually use on OutputBuffer are extremely few. I’d also say that this class should be completely private. In other words, we (the Rails team) should design our API such that people never actually touch instances of the OutputBuffer.

If we disconnect OutputBuffer from it’s superclass, I think we can even do some tricks with the ERB compiler to attain faster output.