http://tenderlovemaking.com/
Tender Lovemaking
2024-02-16T11:56:28-08:00
Aaron Patterson
tenderlove@ruby-lang.org
© Aaron Patterson - tenderlove@ruby-lang.org
http://tenderlovemaking.com/2024/02/16/using-serial-ports-with-ruby.html
Using Serial Ports with Ruby
2024-02-16T11:56:28-08:00
2024-02-16T11:56:28-08:00
<p>Let’s mess around with serial ports today!
I love doing hardware hacking, and dealing with serial ports is a common thing you have to do when working with embedded systems.
Of course I want to do everything with Ruby, but I found the existing Ruby serial port libraries to be either lacking or too complex, so I decided to <a href="https://github.com/tenderlove/uart">write my own</a>.
I feel like I’ve not done a good enough job promoting the library, so today we’re going to mess with serial ports using the <a href="https://github.com/tenderlove/uart">UART gem</a>.
Don’t let the last commit date on the repo fool you: although it was over 6 years ago, this library is actively maintained (and I use it every day!).</p>
<p>I’ve got a <a href="https://www.gqelectronicsllc.com/comersus/store/comersus_viewItem.asp?idProduct=4579">GMC-320 Geiger counter</a>.
Not only is the price pretty reasonable, but it also has a serial port interface!
You can log data, then download the logged data via serial port.
Today we’re just going to write a very simple program that gets the firmware version via serial port, and then gets live updates from the device.
This will allow us to start with UART basics in Ruby, and then work with streaming data and timeouts.</p>
<p>The company that makes the Geiger counter published <a href="https://www.gqelectronicsllc.com/download/GQ-RFC1201.txt">a spec for the UART commands that the device supports</a>, so all we need to do is send the commands and read the results.</p>
<p>According to the spec, the default UART configuration for my Geiger counter is 115200 baud, 8 data bits, no parity, 1 stop bit, and no flow control.
This is easy to configure with the UART gem, since all of these values are its defaults except for the baud rate.
The UART gem defaults to 9600 baud, so that’s the only thing we’ll have to configure.</p>
<h2 id="getting-the-hardware-version">Getting the hardware version</h2>
<p>To get the hardware model and version, we just have to send <code><GETVER>></code> over the serial port and then read the response.
Let’s write a small program that will fetch the hardware model and version and print them out.</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">uart</span><span class="delimiter">"</span></span>
<span class="constant">UART</span>.open <span class="predefined-constant">ARGV</span>[<span class="integer">0</span>], <span class="integer">115200</span> <span class="keyword">do</span> |serial|
  <span class="comment"># Send the "get version" command</span>
  serial.write <span class="string"><span class="delimiter">"</span><span class="content"><GETVER>></span><span class="delimiter">"</span></span>
  <span class="comment"># read and print the result</span>
  puts serial.read
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>The first thing we do is require the <code>uart</code> library (make sure to <code>gem install uart</code> if you haven’t done so yet).
Then we open the serial interface.
We’ll pass the tty file in via the command line, so <code>ARGV[0]</code> will have the path to the tty.
When I plug my Geiger counter in, it shows up as <code>/dev/tty.usbserial-111240</code>.
We also configure the baud rate to 115200.</p>
<p>Once the serial port is open, we are free to read from and write to it as if it were a Ruby file object.
In fact, it really is just a regular file object.</p>
<p>First we’ll send the command <code><GETVER>></code>, then we’ll read the response from the serial port.</p>
<p>Here’s what it looks like when I run it on my machine:</p>
<pre><code>$ ruby rad.rb /dev/tty.usbserial-111240
GMC-320Re 4.09
</code></pre>
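<p>If you want the model and the firmware version as separate values, a small helper can split the response. This is a sketch that assumes the response looks like the output above (a model name and a version separated by a space); <code>parse_version</code> is a hypothetical name and is not part of the UART gem:</p>

```ruby
# Hypothetical helper (not part of the uart gem): split a <GETVER>> response
# such as "GMC-320Re 4.09" into its model name and firmware version.
# Assumes whitespace separates the two fields, as in the output above.
def parse_version(response)
  model, version = response.strip.split(" ", 2)
  { model: model, version: version }
end

info = parse_version("GMC-320Re 4.09")
puts "model: #{info[:model]}, firmware: #{info[:version]}"
# prints "model: GMC-320Re, firmware: 4.09"
```

<p>With the device attached, you would pass the string returned by <code>serial.read</code> instead of the literal.</p>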
<h2 id="live-updates">Live updates</h2>
<p>According to the documentation, we can get live updates from the hardware.
To do that, we just need to send the <code><HEARTBEAT1>></code> command.
Once we send that command, the hardware will write a value to the serial port every second, and it’s our job to read the data when it becomes available.
We can use <code>IO#wait_readable</code> to wait until there is data to be read from the serial port.</p>
<p>According to the specification, each update is two bytes (a 16-bit integer, most significant byte first), and we need to ignore the top two bits.
We’ll create a mask that clears the top two bits, and apply it to the value we get by combining the two bytes:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">uart</span><span class="delimiter">"</span></span>
<span class="constant">MASK</span> = (~(<span class="integer">3</span> << <span class="integer">14</span>)) & <span class="integer">0xFFFF</span>
<span class="constant">UART</span>.open <span class="predefined-constant">ARGV</span>[<span class="integer">0</span>], <span class="integer">115200</span> <span class="keyword">do</span> |serial|
  <span class="comment"># turn on heartbeat</span>
  serial.write <span class="string"><span class="delimiter">"</span><span class="content"><HEARTBEAT1>></span><span class="delimiter">"</span></span>
  loop <span class="keyword">do</span>
    <span class="keyword">if</span> serial.wait_readable
      count = ((serial.readbyte << <span class="integer">8</span>) | serial.readbyte) & <span class="constant">MASK</span>
      p count
    <span class="keyword">end</span>
  <span class="keyword">end</span>
<span class="keyword">ensure</span>
  <span class="comment"># make sure to turn off heartbeat</span>
  serial.write <span class="string"><span class="delimiter">"</span><span class="content"><HEARTBEAT0>></span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>After we’ve sent the “start heartbeat” command, we enter a loop.
Inside the loop, we block until there is data available to read by calling <code>serial.wait_readable</code>.
Once there is data to read, we read two bytes and combine them into a 16-bit integer.
Then we apply the <code>MASK</code> constant so that the top two bits are ignored.
Finally we just print out the count.</p>
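<p>The decoding step can be checked without the hardware attached. Combining the bytes with <code>(high << 8) | low</code> is the same as reading a big-endian unsigned 16-bit integer, which Ruby’s <code>String#unpack1</code> can do with the <code>"n"</code> directive. This sketch uses a hypothetical <code>decode_sample</code> helper and a made-up sample to show both approaches agreeing and the mask clearing the reserved top bits:</p>

```ruby
# 0x3FFF: every bit of a 16-bit value set except the top two (reserved) bits.
MASK = (~(3 << 14)) & 0xFFFF

# Hypothetical helper: decode one two-byte heartbeat sample.
# "n" unpacks an unsigned 16-bit integer in big-endian (network) byte order,
# so the first byte is the most significant one.
def decode_sample(bytes)
  bytes.unpack1("n") & MASK
end

# Made-up sample: reserved bits set, and a count of 5 in the low bits.
sample = [0xC0, 0x05].pack("C*")

# The shift-and-or approach from the heartbeat loop gives the same value.
manual = ((sample.getbyte(0) << 8) | sample.getbyte(1)) & MASK

puts format("MASK = 0x%04X", MASK) # prints "MASK = 0x3FFF"
puts decode_sample(sample)         # prints 5
puts manual                        # prints 5
```

<p>On the device, <code>serial.read(2).unpack1("n") & MASK</code> should be equivalent, assuming <code>read(2)</code> returns both bytes; reading one byte at a time with <code>readbyte</code> sidesteps that question.</p>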
<p>The <code>ensure</code> section guarantees that when the block exits (including when we interrupt it with Ctrl-C), we tell the hardware “hey, we don’t want to stream data anymore!”</p>
<p>When I run this on my machine, the output is like this (I hit Ctrl-C to stop the program):</p>
<pre><code>$ ruby rad.rb /dev/tty.usbserial-111240
0
0
0
0
0
0
0
1
0
1
0
0
0
1
^Crad.rb:10:in 'IO#wait_readable': Interrupt
from rad.rb:10:in 'block (2 levels) in <main>'
from <internal:kernel>:191:in 'Kernel#loop'
from rad.rb:9:in 'block in <main>'
from /Users/aaron/.rubies/arm64/ruby-trunk/lib/ruby/gems/3.4.0+0/gems/uart-1.0.0/lib/uart.rb:57:in 'UART.open'
from rad.rb:5:in '<main>'
</code></pre>
<p>Let’s make two improvements, and then call it a day.
First, let’s specify a timeout, then let’s calculate the CPM.</p>
<h2 id="specifying-a-timeout">Specifying a timeout</h2>
<p>Currently, <code>serial.wait_readable</code> will block forever, but we expect an update from the hardware about every second.
If it takes longer than, say, 2 seconds for data to become available, then something must be wrong and we should print a message or exit the program.</p>
<p>Specifying a timeout is easy: we just pass the timeout (in seconds) to the <code>wait_readable</code> method like below:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">uart</span><span class="delimiter">"</span></span>
<span class="constant">MASK</span> = (~(<span class="integer">3</span> << <span class="integer">14</span>)) & <span class="integer">0xFFFF</span>
<span class="constant">UART</span>.open <span class="predefined-constant">ARGV</span>[<span class="integer">0</span>], <span class="integer">115200</span> <span class="keyword">do</span> |serial|
  <span class="comment"># turn on heartbeat</span>
  serial.write <span class="string"><span class="delimiter">"</span><span class="content"><HEARTBEAT1>></span><span class="delimiter">"</span></span>
  loop <span class="keyword">do</span>
    <span class="keyword">if</span> serial.wait_readable(<span class="integer">2</span>)
      count = ((serial.readbyte << <span class="integer">8</span>) | serial.readbyte) & <span class="constant">MASK</span>
      p count
    <span class="keyword">else</span>
      <span class="global-variable">$stderr</span>.puts <span class="string"><span class="delimiter">"</span><span class="content">oh no, something went wrong!</span><span class="delimiter">"</span></span>
      exit(<span class="integer">1</span>)
    <span class="keyword">end</span>
  <span class="keyword">end</span>
<span class="keyword">ensure</span>
  <span class="comment"># make sure to turn off heartbeat</span>
  serial.write <span class="string"><span class="delimiter">"</span><span class="content"><HEARTBEAT0>></span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>When data becomes available, <code>wait_readable</code> returns a truthy value, and if the timeout is reached, it returns <code>nil</code>.
So, if it takes more than 2 seconds for data to become available, we print an error message and exit the program.</p>
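<p>You can try the timeout behavior without the Geiger counter by using a pipe as a stand-in for the serial port. In this sketch a background thread plays the role of the device, writing three two-byte samples and then going quiet; once nothing arrives within the timeout, <code>wait_readable</code> returns <code>nil</code> and the loop stops. The pipe and the thread are assumptions for illustration only:</p>

```ruby
require "io/wait" # IO#wait_readable (needed explicitly on older Rubies)

MASK = (~(3 << 14)) & 0xFFFF

reader, writer = IO.pipe
writer.sync = true # send each sample immediately

# Stand-in for the device: send three samples, then stop sending.
device = Thread.new do
  3.times do |count|
    writer.write([count].pack("n")) # two bytes, most significant first
    sleep 0.05
  end
end

samples = []
loop do
  if reader.wait_readable(0.5)
    samples << (((reader.readbyte << 8) | reader.readbyte) & MASK)
  else
    # Timed out: wait_readable returned nil because no data arrived.
    break
  end
end

device.join
p samples # => [0, 1, 2]
```

<p>The real program exits with an error on timeout instead of breaking out of the loop, but the control flow is the same.</p>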
<h2 id="calculating-cpm">Calculating CPM</h2>
<p>CPM stands for “counts per minute”, meaning the number of ionization events the hardware has detected within one minute.
However, the value that we’re reading from the serial port is actually the “counts per second” (or ionization events the hardware detected in the last second).
Most of the time that value is 0, so it’s not super fun to read.
Let’s calculate the CPM and print that instead.</p>
<p>We know the samples arrive about once per second, so I’m going to modify this code to keep a list of the last 60 samples and sum them:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">uart</span><span class="delimiter">"</span></span>
<span class="constant">MASK</span> = (~(<span class="integer">3</span> << <span class="integer">14</span>)) & <span class="integer">0xFFFF</span>
<span class="constant">UART</span>.open <span class="predefined-constant">ARGV</span>[<span class="integer">0</span>], <span class="integer">115200</span> <span class="keyword">do</span> |serial|
  <span class="comment"># turn on heartbeat</span>
  serial.write <span class="string"><span class="delimiter">"</span><span class="content"><HEARTBEAT1>></span><span class="delimiter">"</span></span>
  samples = []
  loop <span class="keyword">do</span>
    <span class="keyword">if</span> serial.wait_readable(<span class="integer">2</span>)
      <span class="comment"># Push the sample on the list</span>
      samples.push(((serial.readbyte << <span class="integer">8</span>) | serial.readbyte) & <span class="constant">MASK</span>)
      <span class="comment"># Make sure we only have 60 samples in the list</span>
      <span class="keyword">while</span> samples.length > <span class="integer">60</span>; samples.shift; <span class="keyword">end</span>
      <span class="comment"># Print a sum of the samples (if we have 60)</span>
      p <span class="key">CPM</span>: samples.sum <span class="keyword">if</span> samples.length == <span class="integer">60</span>
    <span class="keyword">else</span>
      <span class="global-variable">$stderr</span>.puts <span class="string"><span class="delimiter">"</span><span class="content">oh no, something went wrong!</span><span class="delimiter">"</span></span>
      exit(<span class="integer">1</span>)
    <span class="keyword">end</span>
  <span class="keyword">end</span>
<span class="keyword">ensure</span>
  <span class="comment"># make sure to turn off heartbeat</span>
  serial.write <span class="string"><span class="delimiter">"</span><span class="content"><HEARTBEAT0>></span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>Here is the output on my machine:</p>
<pre><code>$ ruby rad.rb /dev/tty.usbserial-111240
{:CPM=>9}
{:CPM=>8}
{:CPM=>8}
{:CPM=>8}
{:CPM=>8}
{:CPM=>9}
{:CPM=>9}
{:CPM=>9}
</code></pre>
<p>After about a minute (once 60 samples have been collected), it starts printing the CPM.</p>
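<p>Re-summing 60 numbers once a second is perfectly fine at this scale, but if you want to avoid it, one alternative is to keep a running total: add each new sample and subtract whichever sample falls out of the window. This is a sketch of that idea using a hypothetical <code>SlidingWindow</code> class, not the approach used above:</p>

```ruby
# Hypothetical helper: keeps the last `size` samples plus a running total,
# so each update costs O(1) instead of re-summing the whole window.
class SlidingWindow
  def initialize(size)
    @size = size
    @samples = []
    @total = 0
  end

  # Add a sample. Returns the window total once the window is full, else nil.
  def push(sample)
    @samples.push(sample)
    @total += sample
    @total -= @samples.shift if @samples.length > @size
    @samples.length == @size ? @total : nil
  end
end

window = SlidingWindow.new(3)
p [0, 1, 0, 1, 1].map { |s| window.push(s) } # => [nil, nil, 1, 2, 2]
```

<p>In the CPM loop you would create <code>SlidingWindow.new(60)</code> and print whatever <code>push</code> returns when it is not <code>nil</code>.</p>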
<h2 id="conclusion">Conclusion</h2>
<p>I love playing with embedded systems and working with UARTs.
Next time you need to do serial port communication in Ruby, consider using <a href="https://github.com/tenderlove/uart">my gem</a>.
Thanks for your time, and I hope you have a great weekend!</p>
http://tenderlovemaking.com/2023/09/02/fast-tokenizers-with-stringscanner.html
Fast Tokenizers with StringScanner
2023-09-02T13:00:14-07:00
2023-09-02T13:00:14-07:00
<p>Lately I’ve been messing around with writing a <a href="https://github.com/tenderlove/tinygql">GraphQL parser called TinyGQL</a>.
I wanted to see how fast I could make a GraphQL parser without writing any C extensions.
I <a href="https://railsatscale.com/2023-08-29-ruby-outperforms-c/">think I did pretty well</a>, but I’ve learned some tricks for speeding up parsers and I want to share them.</p>
<p>Today we’re going to specifically look at the lexing part of parsing.
Lexing is just breaking down an input string into a series of tokens.
It’s the parser’s job to interpret those tokens.
My favorite tool for tokenizing documents in Ruby is <code>StringScanner</code>.
Today we’re going to look at a few tricks for speeding up <code>StringScanner</code> based lexers.
We’ll start with a very simple GraphQL lexer and apply a few tricks to speed it up.</p>
<h2 id="a-very-basic-lexer">A very basic lexer</h2>
<p>Here is the lexer we’re going to work with today:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">strscan</span><span class="delimiter">"</span></span>
<span class="keyword">class</span> <span class="class">Lexer</span>
  <span class="constant">IDENTIFIER</span> = <span class="regexp"><span class="delimiter">/</span><span class="content">[_A-Za-z][_0-9A-Za-z]*</span><span class="char">\b</span><span class="delimiter">/</span></span>
  <span class="constant">WHITESPACE</span> = <span class="regexp"><span class="delimiter">%r{</span><span class="content"> [, </span><span class="char">\c\r</span><span class="char">\n</span><span class="char">\t</span><span class="content">]+ </span><span class="delimiter">}</span><span class="modifier">x</span></span>
  <span class="constant">COMMENTS</span> = <span class="regexp"><span class="delimiter">%r{</span><span class="content"> </span><span class="char">\#</span><span class="content">.*$ </span><span class="delimiter">}</span><span class="modifier">x</span></span>
  <span class="constant">INT</span> = <span class="regexp"><span class="delimiter">/</span><span class="content">[-]?(?:[0]|[1-9][0-9]*)</span><span class="delimiter">/</span></span>
  <span class="constant">FLOAT_DECIMAL</span> = <span class="regexp"><span class="delimiter">/</span><span class="content">[.][0-9]+</span><span class="delimiter">/</span></span>
  <span class="constant">FLOAT_EXP</span> = <span class="regexp"><span class="delimiter">/</span><span class="content">[eE][+-]?[0-9]+</span><span class="delimiter">/</span></span>
  <span class="constant">FLOAT</span> = <span class="regexp"><span class="delimiter">/</span><span class="inline"><span class="inline-delimiter">#{</span><span class="constant">INT</span><span class="inline-delimiter">}</span></span><span class="content">(?:</span><span class="inline"><span class="inline-delimiter">#{</span><span class="constant">FLOAT_DECIMAL</span><span class="inline-delimiter">}</span></span><span class="inline"><span class="inline-delimiter">#{</span><span class="constant">FLOAT_EXP</span><span class="inline-delimiter">}</span></span><span class="content">|</span><span class="inline"><span class="inline-delimiter">#{</span><span class="constant">FLOAT_DECIMAL</span><span class="inline-delimiter">}</span></span><span class="content">|</span><span class="inline"><span class="inline-delimiter">#{</span><span class="constant">FLOAT_EXP</span><span class="inline-delimiter">}</span></span><span class="content">)</span><span class="delimiter">/</span></span>
  <span class="constant">KEYWORDS</span> = [ <span class="string"><span class="delimiter">"</span><span class="content">on</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">fragment</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">true</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">false</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">null</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">query</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">mutation</span><span class="delimiter">"</span></span>,
    <span class="string"><span class="delimiter">"</span><span class="content">subscription</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">schema</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">scalar</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">type</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">extend</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">implements</span><span class="delimiter">"</span></span>,
    <span class="string"><span class="delimiter">"</span><span class="content">interface</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">union</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">enum</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">input</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">directive</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">repeatable</span><span class="delimiter">"</span></span>
  ].freeze
  <span class="constant">KW_RE</span> = <span class="regexp"><span class="delimiter">/</span><span class="inline"><span class="inline-delimiter">#{</span><span class="constant">Regexp</span>.union(<span class="constant">KEYWORDS</span>.sort)<span class="inline-delimiter">}</span></span><span class="char">\b</span><span class="delimiter">/</span></span>
  <span class="constant">KW_TABLE</span> = <span class="constant">Hash</span>[<span class="constant">KEYWORDS</span>.map { |kw| [kw, kw.upcase.to_sym] }]
  <span class="keyword">module</span> <span class="class">Literals</span>
    <span class="constant">LCURLY</span> = <span class="string"><span class="delimiter">'</span><span class="content">{</span><span class="delimiter">'</span></span>
    <span class="constant">RCURLY</span> = <span class="string"><span class="delimiter">'</span><span class="content">}</span><span class="delimiter">'</span></span>
    <span class="constant">LPAREN</span> = <span class="string"><span class="delimiter">'</span><span class="content">(</span><span class="delimiter">'</span></span>
    <span class="constant">RPAREN</span> = <span class="string"><span class="delimiter">'</span><span class="content">)</span><span class="delimiter">'</span></span>
    <span class="constant">LBRACKET</span> = <span class="string"><span class="delimiter">'</span><span class="content">[</span><span class="delimiter">'</span></span>
    <span class="constant">RBRACKET</span> = <span class="string"><span class="delimiter">'</span><span class="content">]</span><span class="delimiter">'</span></span>
    <span class="constant">COLON</span> = <span class="string"><span class="delimiter">'</span><span class="content">:</span><span class="delimiter">'</span></span>
    <span class="constant">VAR_SIGN</span> = <span class="string"><span class="delimiter">'</span><span class="content">$</span><span class="delimiter">'</span></span>
    <span class="constant">DIR_SIGN</span> = <span class="string"><span class="delimiter">'</span><span class="content">@</span><span class="delimiter">'</span></span>
    <span class="constant">EQUALS</span> = <span class="string"><span class="delimiter">'</span><span class="content">=</span><span class="delimiter">'</span></span>
    <span class="constant">BANG</span> = <span class="string"><span class="delimiter">'</span><span class="content">!</span><span class="delimiter">'</span></span>
    <span class="constant">PIPE</span> = <span class="string"><span class="delimiter">'</span><span class="content">|</span><span class="delimiter">'</span></span>
    <span class="constant">AMP</span> = <span class="string"><span class="delimiter">'</span><span class="content">&</span><span class="delimiter">'</span></span>
  <span class="keyword">end</span>
  <span class="constant">ELLIPSIS</span> = <span class="string"><span class="delimiter">'</span><span class="content">...</span><span class="delimiter">'</span></span>
  include <span class="constant">Literals</span>
  <span class="constant">PUNCTUATION</span> = <span class="constant">Regexp</span>.union(<span class="constant">Literals</span>.constants.map { |name|
    <span class="constant">Literals</span>.const_get(name)
  })
  <span class="constant">PUNCTUATION_TABLE</span> = <span class="constant">Literals</span>.constants.each_with_object({}) { |x,o|
    o[<span class="constant">Literals</span>.const_get(x)] = x
  }
  <span class="keyword">def</span> <span class="function">initialize</span> doc
    <span class="instance-variable">@doc</span> = doc
    <span class="instance-variable">@scan</span> = <span class="constant">StringScanner</span>.new doc
  <span class="keyword">end</span>
  <span class="keyword">def</span> <span class="function">next_token</span>
    <span class="keyword">return</span> <span class="keyword">if</span> <span class="instance-variable">@scan</span>.eos?
    <span class="keyword">case</span>
    <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">WHITESPACE</span>) <span class="keyword">then</span> [<span class="symbol">:WHITESPACE</span>, s]
    <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">COMMENTS</span>) <span class="keyword">then</span> [<span class="symbol">:COMMENT</span>, s]
    <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">ELLIPSIS</span>) <span class="keyword">then</span> [<span class="symbol">:ELLIPSIS</span>, s]
    <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">PUNCTUATION</span>) <span class="keyword">then</span> [<span class="constant">PUNCTUATION_TABLE</span>[s], s]
    <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">KW_RE</span>) <span class="keyword">then</span> [<span class="constant">KW_TABLE</span>[s], s]
    <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">IDENTIFIER</span>) <span class="keyword">then</span> [<span class="symbol">:IDENTIFIER</span>, s]
    <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">FLOAT</span>) <span class="keyword">then</span> [<span class="symbol">:FLOAT</span>, s]
    <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">INT</span>) <span class="keyword">then</span> [<span class="symbol">:INT</span>, s]
    <span class="keyword">else</span>
      [<span class="symbol">:UNKNOWN_CHAR</span>, <span class="instance-variable">@scan</span>.getch]
    <span class="keyword">end</span>
  <span class="keyword">end</span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>It only tokenizes a subset of GraphQL: namely, it omits string literals.
Matching string literals is kind of gross, and I wanted to keep this example small, so I removed them.
I have a large document that I’ll use to measure some performance aspects of this lexer, and if you want to try it out, you can find the document <a href="https://github.com/tenderlove/tinygql/blob/main/benchmark/fixtures/negotiate.gql">here</a>.</p>
<p>To use the lexer, just pass the document you want to tokenize, then repeatedly call <code>next_token</code> on the lexer until it returns <code>nil</code>:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>lexer = <span class="constant">Lexer</span>.new input
<span class="keyword">while</span> tok = lexer.next_token
  <span class="comment"># do something</span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>GraphQL documents look something like this:</p>
<pre><code>mutation {
a: likeStory(storyID: 12345) {
b: story {
c: likeCount
}
}
}
</code></pre>
<p>And with this lexer implementation, the tokens come out as tuples and they look something like this:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>[<span class="symbol">:IDENTIFIER</span>, <span class="string"><span class="delimiter">"</span><span class="content">likeStory</span><span class="delimiter">"</span></span>]
[<span class="symbol">:LPAREN</span>, <span class="string"><span class="delimiter">"</span><span class="content">(</span><span class="delimiter">"</span></span>]
[<span class="symbol">:IDENTIFIER</span>, <span class="string"><span class="delimiter">"</span><span class="content">storyID</span><span class="delimiter">"</span></span>]
</pre></div>
</div>
</div>
<p>Our benchmarking code is very simple. We just use the lexer to pull all of the tokens out of the test document:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">benchmark/ips</span><span class="delimiter">"</span></span>
<span class="keyword">def</span> <span class="function">allocations</span>
  x = <span class="constant">GC</span>.stat(<span class="symbol">:total_allocated_objects</span>)
  <span class="keyword">yield</span>
  <span class="constant">GC</span>.stat(<span class="symbol">:total_allocated_objects</span>) - x
<span class="keyword">end</span>
<span class="keyword">def</span> <span class="function">go</span> doc
  lexer = <span class="constant">Lexer</span>.new doc
  <span class="keyword">while</span> tok = lexer.next_token
    <span class="comment"># do nothing</span>
  <span class="keyword">end</span>
<span class="keyword">end</span>
doc = <span class="predefined-constant">ARGF</span>.read
<span class="constant">Benchmark</span>.ips { |x| x.report { go doc } }
p <span class="key">ALLOCATIONS</span>: allocations { go doc }
</pre></div>
</div>
</div>
<p>With this implementation of the lexer, here are the benchmark results on my machine:</p>
<pre><code>$ ruby -I lib test.rb benchmark/fixtures/negotiate.gql
Warming up --------------------------------------
21.000 i/100ms
Calculating -------------------------------------
211.043 (± 0.9%) i/s - 1.071k in 5.075133s
{:ALLOCATIONS=>20745}
</code></pre>
<p>We can do a little over 200 iterations per second, and tokenizing the document allocates a bit over 20k objects.</p>
<h2 id="stringscanner-context">StringScanner context</h2>
<p>Before we get to optimizing this lexer, let’s get a little background on <code>StringScanner</code>.
<code>StringScanner</code> is one of my favorite utilities that ships with Ruby.
You can think of this object as basically a “cursor” that points inside a string.
When you successfully scan a token at the cursor, <code>StringScanner</code> moves the cursor forward.
If scanning fails, the cursor doesn’t move.</p>
<p>The inspect method on the <code>StringScanner</code> object makes this behavior very clear, so let’s just look at some code in IRB:</p>
<pre><code>>> scanner = StringScanner.new("the quick brown fox jumped over the lazy dog")
=> #<StringScanner 0/44 @ "the q...">
>> scanner.scan(/the /)
=> "the "
>> scanner
=> #<StringScanner 4/44 "the " @ "quick...">
>> scanner.scan(/qui/)
=> "qui"
>> scanner
=> #<StringScanner 7/44 "...e qui" @ "ck br...">
>> scanner.scan(/hello/)
=> nil
>> scanner
=> #<StringScanner 7/44 "...e qui" @ "ck br...">
</code></pre>
<p>The <code>@</code> symbol in the inspect output shows where the cursor currently points, and the ratio at the beginning gives you kind of a “progress” counter.
As I scanned through the string, the cursor moved forward.
Near the end, you can see that when I tried to scan “hello”, it returned <code>nil</code> and the cursor stayed in place.</p>
<p>Combining <code>StringScanner</code> with Ruby’s linear <code>case / when</code> makes writing tokenizers really easy.</p>
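<p>To make that concrete, here is a tiny tokenizer for arithmetic expressions like <code>12 + x*3</code> written in that style. The token names and patterns are made up for this illustration; each <code>when</code> tries one pattern at the cursor, and the first one that matches wins:</p>

```ruby
require "strscan"

# A tiny tokenizer for expressions like "12 + x*3" (hypothetical token
# names), written in the StringScanner + case/when style.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    case
    when scanner.skip(/\s+/)
      # whitespace is consumed but produces no token
    when s = scanner.scan(/\d+/)    then tokens << [:INT, s]
    when s = scanner.scan(/[a-z]+/) then tokens << [:IDENT, s]
    when s = scanner.scan(/[+*]/)   then tokens << [:OP, s]
    else
      raise ArgumentError, "unexpected character #{scanner.getch.inspect}"
    end
  end
  tokens
end

p tokenize("12 + x*3")
# => [[:INT, "12"], [:OP, "+"], [:IDENT, "x"], [:OP, "*"], [:INT, "3"]]
```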
<p><code>StringScanner</code> also allows us to skip particular values, as well as ask for the current cursor position:</p>
<pre><code>>> scanner
=> #<StringScanner 7/44 "...e qui" @ "ck br...">
>> scanner.skip(/ck /)
=> 3
>> scanner
=> #<StringScanner 10/44 "...uick " @ "brown...">
>> scanner.skip(/hello/)
=> nil
>> scanner
=> #<StringScanner 10/44 "...uick " @ "brown...">
>> scanner.pos
=> 10
</code></pre>
<p>Calling <code>skip</code> will try to skip a pattern.
If skipping works, it returns the length of the string it matched, and if it fails, it returns <code>nil</code>.
You can also get and set the position of the cursor using the <code>pos</code> and <code>pos=</code> methods.</p>
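<p>Here is a small sketch of how those pieces fit together: because <code>skip</code> returns the match length, subtracting it from the current <code>pos</code> tells you where the match started, and <code>pos=</code> lets you rewind. Note that <code>pos</code> is a byte offset, which is why the sketch slices the input with <code>byteslice</code>:</p>

```ruby
require "strscan"

input = "the quick brown fox"
scanner = StringScanner.new(input)

scanner.skip(/the /)             # returns 4, the length of the match
len = scanner.skip(/\w+/)        # skip "quick"; only its length is returned
start = scanner.pos - len        # byte offset where "quick" began
puts input.byteslice(start, len) # prints "quick"

scanner.pos = start              # rewind the cursor to the start of "quick"
puts scanner.scan(/\w+ \w+/)     # prints "quick brown"
```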
<p>Now let’s try to speed up this lexer!</p>
<h2 id="speeding-up-this-lexer">Speeding up this lexer</h2>
<p>The name of the game for speeding up lexers (or really any code) is to reduce the number of method calls as well as the number of allocations.
So we’re going to try applying some tricks to reduce both.</p>
<p>Whenever I’m trying to improve the performance of any code, I find it is important to think about the context of how that code is used.
For example, our lexer currently yields tokens for comments and whitespace.
However, the GraphQL grammar ignores comments and whitespace.
Since the parser doesn’t actually need to know about whitespace or comments in order to understand the document, it is fine for the lexer to just skip them.</p>
<p>Our first optimization is to combine the whitespace and comment checks into a single pattern, and to stop returning tokens for them:</p>
<div class="language-diff highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="line comment">diff --git a/test.rb b/test.rb</span>
<span class="line comment">index 2c1e874..9130a54 100644</span>
<span class="line head"><span class="head">--- </span><span class="filename">a/test.rb</span></span>
<span class="line head"><span class="head">+++ </span><span class="filename">b/test.rb</span></span>
<span class="change"><span class="change">@@</span> -2,8 +2,12 <span class="change">@@</span></span> require <span class="string"><span class="delimiter">"</span><span class="content">strscan</span><span class="delimiter">"</span></span>
<span class="keyword">class</span> <span class="class">Lexer</span>
<span class="constant">IDENTIFIER</span> = <span class="regexp"><span class="delimiter">/</span><span class="content">[_A-Za-z][_0-9A-Za-z]*</span><span class="char">\b</span><span class="delimiter">/</span></span>
<span class="line delete"><span class="delete">-</span> <span class="constant">WHITESPACE</span> = <span class="regexp"><span class="delimiter">%r{</span><span class="content"> [, </span><span class="char">\c\r</span><span class="char">\n</span><span class="char">\t</span><span class="content">]+ </span><span class="delimiter">}</span><span class="modifier">x</span></span></span>
<span class="line delete"><span class="delete">-</span> <span class="constant">COMMENTS</span> = <span class="regexp"><span class="delimiter">%r{</span><span class="content"> </span><span class="char">\#</span><span class="content">.*$ </span><span class="delimiter">}</span><span class="modifier">x</span></span></span>
<span class="line insert"><span class="insert">+</span> <span class="constant">IGNORE</span> = <span class="regexp"><span class="delimiter">%r{</span></span></span>
<span class="line insert"><span class="insert">+</span><span class="regexp"><span class="content"> (?:</span></span></span>
<span class="line insert"><span class="insert">+</span><span class="regexp"><span class="content"> [, </span><span class="char">\c\r</span><span class="char">\n</span><span class="char">\t</span><span class="content">]+ |</span></span></span>
<span class="line insert"><span class="insert">+</span><span class="regexp"><span class="content"> </span><span class="char">\#</span><span class="content">.*$</span></span></span>
<span class="line insert"><span class="insert">+</span><span class="regexp"><span class="content"> )*</span></span></span>
<span class="line insert"><span class="insert">+</span><span class="regexp"><span class="content"> </span><span class="delimiter">}</span><span class="modifier">x</span></span></span>
<span class="constant">INT</span> = <span class="regexp"><span class="delimiter">/</span><span class="content">[-]?(?:[0]|[1-9][0-9]*)</span><span class="delimiter">/</span></span>
<span class="constant">FLOAT_DECIMAL</span> = <span class="regexp"><span class="delimiter">/</span><span class="content">[.][0-9]+</span><span class="delimiter">/</span></span>
<span class="constant">FLOAT_EXP</span> = <span class="regexp"><span class="delimiter">/</span><span class="content">[eE][+-]?[0-9]+</span><span class="delimiter">/</span></span>
<span class="change"><span class="change">@@</span> -51,11 +55,11 <span class="change">@@</span></span> <span class="keyword">class</span> <span class="class">Lexer</span>
<span class="keyword">end</span>
<span class="keyword">def</span> <span class="function">next_token</span>
<span class="line insert"><span class="insert">+</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">IGNORE</span>)</span>
<span class="line insert"><span class="insert">+</span></span>
<span class="keyword">return</span> <span class="keyword">if</span> <span class="instance-variable">@scan</span>.eos?
<span class="keyword">case</span>
<span class="line delete"><span class="delete">-</span> <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">WHITESPACE</span>) <span class="keyword">then</span> [<span class="symbol">:WHITESPACE</span>, s]</span>
<span class="line delete"><span class="delete">-</span> <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">COMMENTS</span>) <span class="keyword">then</span> [<span class="symbol">:COMMENT</span>, s]</span>
<span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">ELLIPSIS</span>) <span class="keyword">then</span> [<span class="symbol">:ELLIPSIS</span>, s]
<span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">PUNCTUATION</span>) <span class="keyword">then</span> [<span class="constant">PUNCTUATION_TABLE</span>[s], s]
<span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">KW_RE</span>) <span class="keyword">then</span> [<span class="constant">KW_TABLE</span>[s], s]
</pre></div>
</div>
</div>
<p>By combining the whitespace and comment regexes, we eliminate a method call.
We also changed the <code>scan</code> to a <code>skip</code>, which eliminates the string allocations for whitespace and comments (along with the token arrays we used to return for them).
Let’s check the benchmark after this change:</p>
<pre><code>$ ruby -I lib test.rb benchmark/fixtures/negotiate.gql
Warming up --------------------------------------
32.000 i/100ms
Calculating -------------------------------------
322.100 (± 0.3%) i/s - 1.632k in 5.066846s
{:ALLOCATIONS=>10527}
</code></pre>
<p>This is great! Our iterations per second (IPS) went from 211 to 322, and our allocations went from about 20k down to around 10k.
So we cut our allocations in half and increased speed by about 50%.</p>
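<p>To see what the combined pattern does on its own, here is a small standalone sketch. It assumes a simplified copy of <code>IGNORE</code>, written without the <code>x</code> flag and with a plain character class, and the sample document is made up. A single <code>skip</code> consumes a whole run of commas, whitespace, and comments:</p>
<pre><code>require "strscan"

# Simplified copy of the lexer's IGNORE pattern (no x flag): any run of
# commas, spaces, tabs, newlines, or "#" comments running to end of line.
IGNORE = /(?:[, \r\n\t]+|#.*$)*/

scanner = StringScanner.new("  # a comment\n, ,\n  query { a }")

# A single skip consumes all of the ignorable prefix and returns its length.
skipped = scanner.skip(IGNORE)
p skipped      # => 20
p scanner.rest # => "query { a }"

# The pattern can also match the empty string, so when there is nothing
# to ignore, skip returns 0 (not nil) and the cursor stays put.
again = scanner.skip(IGNORE)
p again        # => 0
</code></pre>
<p>Since 0 is truthy in Ruby, the return value of this particular <code>skip</code> wouldn’t be useful as a condition anyway; <code>next_token</code> just calls it for its side effect and then checks <code>eos?</code>.</p>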
<h2 id="thinking-bigger">Thinking Bigger</h2>
<p>This lexer returns a tuple for each token.
The tuple looks like this: <code>[:LPAREN, "("]</code>.
But when the parser looks at the token, how often does it actually need the string value of the token?</p>
<p>When the parser looks at the first element, it is able to understand that the lexer found a left parenthesis just by looking at the symbol <code>:LPAREN</code>.
The parser gets no benefit from the <code>"("</code> string that is in the tuple.</p>
<p>Just by looking at the token name, the parser can tell what string the lexer found.
This is true for all punctuation, as well as keywords.</p>
<p>Identifiers and numbers are a little bit different though.
The parser doesn’t particularly care about the actual string value of any identifier or number.
It only cares that an identifier or number was found.
However, if we think one level up, it’s quite likely that consumers of the parser will care what field name or number was in the GraphQL document.</p>
<p>Since the parser doesn’t care about the actual token value, but the user <em>does</em> care about the token value, let’s split the <code>next_token</code> method in two:</p>
<ol>
<li>One method to get the token (<code>:INT</code>, <code>:LCURLY</code>, etc)</li>
<li>One method to get the token value</li>
</ol>
<p>When the parser encounters a token where the token value <em>actually matters</em>, the parser can ask the lexer for the token value.
For example, something like this:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>lexer = <span class="constant">Lexer</span>.new doc
<span class="keyword">while</span> tok = lexer.next_token
<span class="keyword">if</span> tok == <span class="symbol">:IDENTIFIER</span>
p lexer.token_value
<span class="keyword">end</span>
<span class="keyword">end</span>
<span class="comment">__END__
mutation {
a: likeStory(storyID: 12345) {
b: story {
c: likeCount
}
}
}</span>
</pre></div>
</div>
</div>
<p>This split buys us two really big wins.
The first is that <code>next_token</code> doesn’t need to return an array anymore.
That’s already one object per token saved.
The second win is that <em>we only ever allocate a string when we really need it</em>.</p>
<p>Here is the new <code>next_token</code> method, and the <code>token_value</code> helper method:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> <span class="keyword">def</span> <span class="function">next_token</span>
<span class="instance-variable">@scan</span>.skip(<span class="constant">IGNORE</span>)
<span class="keyword">return</span> <span class="keyword">if</span> <span class="instance-variable">@scan</span>.eos?
<span class="keyword">case</span>
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">ELLIPSIS</span>) <span class="keyword">then</span> <span class="symbol">:ELLIPSIS</span>
<span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">PUNCTUATION</span>) <span class="keyword">then</span> <span class="constant">PUNCTUATION_TABLE</span>[s]
<span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">KW_RE</span>) <span class="keyword">then</span> <span class="constant">KW_TABLE</span>[s]
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">IDENTIFIER</span>) <span class="keyword">then</span> <span class="symbol">:IDENTIFIER</span>
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">FLOAT</span>) <span class="keyword">then</span> <span class="symbol">:FLOAT</span>
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">INT</span>) <span class="keyword">then</span> <span class="symbol">:INT</span>
<span class="keyword">else</span>
<span class="instance-variable">@scan</span>.getch
<span class="symbol">:UNKNOWN_CHAR</span>
<span class="keyword">end</span>
<span class="keyword">end</span>
<span class="keyword">def</span> <span class="function">token_value</span>
<span class="instance-variable">@doc</span>.byteslice(<span class="instance-variable">@scan</span>.pos - <span class="instance-variable">@scan</span>.matched_size, <span class="instance-variable">@scan</span>.matched_size)
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>We’ve changed the method to only return a symbol that identifies the token.
We also changed most <code>scan</code> calls to <code>skip</code> calls.
<code>scan</code> will return the string it matched (an allocation), but <code>skip</code> simply returns the length of the string it matched (not an allocation).</p>
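<p>The allocation difference between <code>scan</code> and <code>skip</code> is easy to check for yourself. The sketch below is a standalone experiment, not part of the lexer: the input text and the <code>scan_all</code>/<code>skip_all</code> helpers are hypothetical, and <code>allocations</code> is the same <code>GC.stat</code> counter the benchmark script uses. Each version runs once before measuring so that one-time setup costs aren’t counted:</p>
<pre><code>require "strscan"

# Count objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

SOURCE = "abc " * 1000 # 1000 identical four-byte "tokens"
WORD   = /\w+ /

# scan returns a new String for every match.
def scan_all(src)
  scanner = StringScanner.new(src)
  scanner.scan(WORD) until scanner.eos?
end

# skip returns only the match length (an Integer, no allocation).
def skip_all(src)
  scanner = StringScanner.new(src)
  scanner.skip(WORD) until scanner.eos?
end

# Warm up once so one-time costs are not measured.
scan_all(SOURCE)
skip_all(SOURCE)

scan_allocs = allocations { scan_all(SOURCE) }
skip_allocs = allocations { skip_all(SOURCE) }
puts "scan: #{scan_allocs} objects, skip: #{skip_allocs} objects"
</code></pre>
<p>You should see <code>scan</code> allocate at least one object per match (so 1000 or more here), while <code>skip</code> allocates only a handful, most likely just the <code>StringScanner</code> itself.</p>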
<p>As the parser requests tokens from the lexer, if it encounters a token where it actually cares about the string value, it just calls <code>token_value</code>.
This makes our benchmark a little awkward, because we’ve shifted the cost of identifier allocations from the lexer to the parser.
If the parser wants a string, it has to ask the lexer for it.
But let’s keep pushing forward with the same benchmark (just remembering that once we integrate the lexer with the parser, we’ll have allocations for identifiers).</p>
<p>With this change, our benchmark results look like this:</p>
<pre><code>$ ruby -I lib test.rb benchmark/fixtures/negotiate.gql
Warming up --------------------------------------
35.000 i/100ms
Calculating -------------------------------------
360.209 (± 0.6%) i/s - 1.820k in 5.052764s
{:ALLOCATIONS=>1915}
</code></pre>
<p>We went from 322 IPS to 360 IPS, and from 10k allocations down to about 2k allocations.</p>
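<p>Here is a small, self-contained sketch of the same split, in case you’d like to experiment without the full GraphQL lexer. <code>TinyLexer</code> and its token set are hypothetical: it only knows identifiers, integers, and five punctuators, and it still uses <code>getch</code> for punctuation (an allocation the next section gets rid of). The important part is that <code>next_token</code> returns a bare symbol and <code>token_value</code> slices the text only on request:</p>
<pre><code>require "strscan"

# Hypothetical mini lexer: next_token returns only a Symbol, and the
# matched text is sliced out of the document only when asked for.
class TinyLexer
  IGNORE     = /[, \r\n\t]*/
  IDENTIFIER = /[_A-Za-z][_0-9A-Za-z]*/
  INT        = /-?(?:0|[1-9][0-9]*)/
  PUNCT      = { "{" => :LCURLY, "}" => :RCURLY, "(" => :LPAREN,
                 ")" => :RPAREN, ":" => :COLON }.freeze

  def initialize(doc)
    @doc  = doc
    @scan = StringScanner.new(doc)
  end

  def next_token
    @scan.skip(IGNORE)
    return if @scan.eos?

    case
    when @scan.skip(IDENTIFIER) then :IDENTIFIER
    when @scan.skip(INT)        then :INT
    else PUNCT.fetch(@scan.getch, :UNKNOWN_CHAR)
    end
  end

  # Byte slice of the most recent match; the String is created only here.
  def token_value
    @doc.byteslice(@scan.pos - @scan.matched_size, @scan.matched_size)
  end
end

lexer  = TinyLexer.new("likeStory(storyID: 12345) { likeCount }")
tokens = []
names  = []
while (tok = lexer.next_token)
  tokens << tok
  # Ask for the text only for tokens where the value matters.
  names << lexer.token_value if tok == :IDENTIFIER
end
p tokens # => [:IDENTIFIER, :LPAREN, :IDENTIFIER, :COLON, :INT, :RPAREN, :LCURLY, :IDENTIFIER, :RCURLY]
p names  # => ["likeStory", "storyID", "likeCount"]
</code></pre>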
<h2 id="punctuation-lookup-table">Punctuation Lookup Table</h2>
<p>Unfortunately we’ve still got two lines in the tokenizer that are doing allocations:</p>
<pre><code> when s = @scan.scan(PUNCTUATION) then PUNCTUATION_TABLE[s]
when s = @scan.scan(KW_RE) then KW_TABLE[s]
</code></pre>
<p>Let’s tackle the punctuation line first.
We still extract a string from the scanner in order to do a hash lookup to find the symbol name for the token.
We need the string <code>")"</code> so that we can map it to the symbol <code>:RPAREN</code>.
One interesting feature of these punctuation characters is that each one is a single byte, so its value is always between 0 and 255.
Instead of extracting a substring, we can get the byte at the current scanner position, then use the byte as an array index.
If there is a value at that index in the array, then we know we’ve found a token.</p>
<p>First we’ll build the lookup table like this:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> <span class="constant">PUNCTUATION_TABLE</span> = <span class="constant">Literals</span>.constants.each_with_object([]) { |n, o|
o[<span class="constant">Literals</span>.const_get(n).ord] = n
}
</pre></div>
</div>
</div>
<p>This will create an array.
The array will have a symbol at the index corresponding to the byte value of our punctuation.
Any other index will return <code>nil</code>.
And since we’re only dealing with one byte, we know the maximum value can only ever be 255.
The code below shows how this lookup table works:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> <span class="string"><span class="delimiter">'</span><span class="content">()ab</span><span class="delimiter">'</span></span>.bytes.each <span class="keyword">do</span> |byte|
p <span class="constant">PUNCTUATION_TABLE</span>[byte]
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>The output looks like this:</p>
<pre><code>$ ruby -I lib test.rb benchmark/fixtures/negotiate.gql
:LPAREN
:RPAREN
nil
nil
</code></pre>
<p>We can use the <code>pos</code> method on the <code>StringScanner</code> object to get our current cursor position (no allocation), then use that information to extract a byte from the string (also no allocation).
If the byte has a value in the lookup table, we know we’ve found a token and we can push the <code>StringScanner</code> forward one byte.</p>
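<p>Here is a standalone sketch of that lookup-and-advance step. The <code>Literals</code> module below is a hypothetical stand-in with a few one-byte punctuators (the real one isn’t shown in this section), and the loop simply steps over any byte that isn’t punctuation, where a real lexer would fall through to its other <code>when</code> branches:</p>
<pre><code>require "strscan"

# Hypothetical stand-in for the lexer's Literals module.
module Literals
  LCURLY = "{"
  RCURLY = "}"
  LPAREN = "("
  RPAREN = ")"
  COLON  = ":"
end

# Array indexed by byte value; nil everywhere except the punctuators.
PUNCTUATION_TABLE = Literals.constants.each_with_object([]) { |n, o|
  o[Literals.const_get(n).ord] = n
}

doc    = "{ a(b: 1) }"
scan   = StringScanner.new(doc)
tokens = []
until scan.eos?
  # getbyte returns an Integer, so this lookup allocates no String.
  tok = PUNCTUATION_TABLE[doc.getbyte(scan.pos)]
  tokens << tok if tok
  # Advance one byte either way (a real lexer would try its other
  # branches here when tok is nil).
  scan.pos += 1
end
p tokens # => [:LCURLY, :LPAREN, :COLON, :RPAREN, :RCURLY]
</code></pre>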
<p>After incorporating the punctuation lookup table, our <code>next_token</code> method looks like this:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> <span class="keyword">def</span> <span class="function">next_token</span>
<span class="instance-variable">@scan</span>.skip(<span class="constant">IGNORE</span>)
<span class="keyword">return</span> <span class="keyword">if</span> <span class="instance-variable">@scan</span>.eos?
<span class="keyword">case</span>
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">ELLIPSIS</span>) <span class="keyword">then</span> <span class="symbol">:ELLIPSIS</span>
<span class="keyword">when</span> tok = <span class="constant">PUNCTUATION_TABLE</span>[<span class="instance-variable">@doc</span>.getbyte(<span class="instance-variable">@scan</span>.pos)] <span class="keyword">then</span>
<span class="comment"># If the byte at the current position is inside our lookup table, push</span>
<span class="comment"># the scanner position forward 1 and return the token</span>
<span class="instance-variable">@scan</span>.pos += <span class="integer">1</span>
tok
<span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">KW_RE</span>) <span class="keyword">then</span> <span class="constant">KW_TABLE</span>[s]
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">IDENTIFIER</span>) <span class="keyword">then</span> <span class="symbol">:IDENTIFIER</span>
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">FLOAT</span>) <span class="keyword">then</span> <span class="symbol">:FLOAT</span>
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">INT</span>) <span class="keyword">then</span> <span class="symbol">:INT</span>
<span class="keyword">else</span>
<span class="instance-variable">@scan</span>.getch
<span class="symbol">:UNKNOWN_CHAR</span>
<span class="keyword">end</span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>Rerunning our benchmarks gives us these results:</p>
<pre><code>$ ruby -I lib test.rb benchmark/fixtures/negotiate.gql
Warming up --------------------------------------
46.000 i/100ms
Calculating -------------------------------------
459.031 (± 1.1%) i/s - 2.300k in 5.011232s
{:ALLOCATIONS=>346}
</code></pre>
<p>We’ve gone from 360 IPS up to 459 IPS, and from about 2k allocations down to only 346.</p>
<h2 id="perfect-hashes-and-graphql-keywords">Perfect Hashes and GraphQL Keywords</h2>
<p>We have one more line in our lexer that is allocating objects:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> <span class="keyword">when</span> s = <span class="instance-variable">@scan</span>.scan(<span class="constant">KW_RE</span>) <span class="keyword">then</span> <span class="constant">KW_TABLE</span>[s]
</pre></div>
</div>
</div>
<p>This line is allocating objects because it needs to map the keyword it found in the source to a symbol:</p>
<pre><code>>> Lexer::KW_TABLE["query"]
=> :QUERY
>> Lexer::KW_TABLE["schema"]
=> :SCHEMA
</code></pre>
<p>It would be great if we had a hash table that didn’t require us to extract a string from the source document.
And that’s exactly what we’re going to build.</p>
<p>When this particular regular expression matches, we know that the lexer has found 1 of the 19 keywords listed in the <code>KW_TABLE</code>; we just don’t know which one.
What we’d like to do is figure out <em>which</em> keyword matched, and do it without allocating any objects.</p>
<p>Here is the list of 19 GraphQL keywords we could possibly match:</p>
<pre><code>["on",
"true",
"null",
"enum",
"type",
"input",
"false",
"query",
"union",
"extend",
"scalar",
"schema",
"mutation",
"fragment",
"interface",
"directive",
"implements",
"repeatable",
"subscription"]
</code></pre>
<p><code>StringScanner#skip</code> will return the length of the match, and we know that if the length is 2 we unambiguously matched <code>on</code>, and if the length is 12 we unambiguously matched <code>subscription</code>.
So if the matched length is 2 or 12, we can just return <code>:ON</code> or <code>:SUBSCRIPTION</code>.
That leaves 17 other keywords we need to disambiguate.</p>
<p>For each of the 17 remaining keywords, the 2nd and 3rd bytes uniquely identify the keyword:</p>
<pre><code>>> (Lexer::KW_TABLE.keys - ["on", "subscription"]).length
=> 17
>> (Lexer::KW_TABLE.keys - ["on", "subscription"]).map { |w| w[1, 2] }
=> ["ra", "ru", "al", "ul", "ue", "ut", "ch", "ca", "yp", "xt", "mp", "nt", "ni", "nu", "np", "ir", "ep"]
>> (Lexer::KW_TABLE.keys - ["on", "subscription"]).map { |w| w[1, 2] }.uniq.length
=> 17
</code></pre>
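<p>Both observations (only <code>on</code> and <code>subscription</code> have a unique length, and the 2nd and 3rd bytes separate the other 17) can be rechecked without the lexer. This sketch hard-codes the 19 keywords from the list above:</p>
<pre><code>KEYWORDS = %w[on true null enum type input false query union extend
              scalar schema mutation fragment interface directive
              implements repeatable subscription]

# Group by length and keep the keywords that are alone in their group.
unique_by_length = KEYWORDS.group_by(&:length)
                           .select { |_len, words| words.size == 1 }
                           .values.flatten
rest  = KEYWORDS - unique_by_length
pairs = rest.map { |w| w[1, 2] } # 2nd and 3rd bytes

p unique_by_length              # => ["on", "subscription"]
p rest.size                     # => 17
p pairs.uniq.size == rest.size  # => true
</code></pre>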
<p>We can use these two bytes as a key to a hash table and design a <a href="https://en.wikipedia.org/wiki/Perfect_hash_function">“perfect hash”</a> to look up the right token.
A perfect hash function is built for a set of keys that is <em>known in advance</em>, and for those keys it will <em>never make a collision</em>.
In other words, no two of those keys will result in the same bucket index.</p>
<p>We know that the word we found is one of a limited set, so this seems like a good application for a perfect hash.</p>
<h3 id="building-a-perfect-hash">Building a Perfect Hash</h3>
<p>Our perfect hash function uses a pre-computed “convenient” constant that lets us uniquely identify each key while keeping the hash table small.
Basically we have a function like this:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">def</span> <span class="function">_hash</span> key
(key * <span class="constant">SOME_CONSTANT</span>) >> <span class="integer">27</span> & <span class="integer">0x1f</span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>But we must figure out the right constant to use such that each entry in our perfect hash gets a unique bucket index.
We’re going to use the upper 5 bits of a “32 bit integer” (it’s not actually 32 bits; we’re just going to treat it that way) as our bucket index.
We need 5 bits because we have 17 keys, and 4 bits can only represent 16 different indexes (0 to 15).
To find the value of <code>SOME_CONSTANT</code>, we’re just going to use a brute force method.</p>
<p>First, let’s convert the two bytes from each GraphQL keyword to a 16 bit integer:</p>
<pre><code>>> keys = (Lexer::KW_TABLE.keys - ["on", "subscription"]).map { |w| w[1, 2].unpack1("s") }
=> [24946, 30066, 27745, 27765, 25973, 29813, 26723, 24931, 28793, 29816, 28781, 29806, 26990, 30062, 28782, 29289, 28773]
</code></pre>
<p>Next we’re going to use a brute force method to find a constant value such that we can convert these 16 bit numbers into unique 5 bit numbers:</p>
<pre><code>>> c = 13
=> 13
?> loop do
?> z = keys.map { |k| ((k * c) >> 27) & 0x1f }
?> break if z.uniq.length == z.length
?> c += 1
>> end
=> nil
>> c
=> 18592990
</code></pre>
<p>We start our search at 13.
Our loop tries applying the hashing function to all keys.
If the hashing function returns unique values for all keys, then we found the right value for <code>c</code>, otherwise we increment <code>c</code> by one and try the next number.</p>
<p>After this loop finishes (it takes a while), we check <code>c</code> and that’s the value for our perfect hash!</p>
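<p>If you want to rerun the search outside IRB, here is the same brute-force loop wrapped in a method. This is a sketch: <code>find_multiplier</code> and <code>bucket</code> are hypothetical helper names, and it uses <code>unpack1("v")</code> (explicitly little-endian) where the IRB session used the native-endian <code>"s"</code>; for these ASCII byte pairs the two agree on little-endian machines. Starting at 13 repeats the slow search; passing a later <code>start</code> just skips ahead:</p>
<pre><code># The 17 keywords that can't be identified by length alone.
KEYWORDS = %w[true null enum type input false query union extend scalar
              schema mutation fragment interface directive implements
              repeatable]

# 2nd and 3rd bytes as a 16 bit integer, 3rd byte in the high 8 bits.
KEYS = KEYWORDS.map { |w| w[1, 2].unpack1("v") }

# Upper 5 bits of the low 32 bits of key * c.
def bucket(key, c)
  ((key * c) >> 27) & 0x1f
end

# First multiplier >= start that sends every key to a different bucket.
def find_multiplier(keys, start: 13)
  c = start
  c += 1 until keys.map { |k| bucket(k, c) }.uniq.size == keys.size
  c
end

# find_multiplier(KEYS) repeats the full search from 13 (slow).
c = find_multiplier(KEYS, start: 18_592_000)
p c
p KEYS.map { |k| bucket(k, c) }.uniq.size # => 17
</code></pre>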
<p>Now we can write our hashing function like this:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">def</span> <span class="function">_hash</span> key
(key * <span class="integer">18592990</span>) >> <span class="integer">27</span> & <span class="integer">0x1f</span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>This function returns a different value for the 2nd and 3rd bytes of each of our 17 GraphQL keywords.
Let’s prove that to ourselves in IRB:</p>
<pre><code>?> def _hash key
?> (key * 18592990) >> 27 & 0x1f
>> end
=> :_hash
>> keys = (Lexer::KW_TABLE.keys - ["on", "subscription"]).map { |w| w[1, 2].unpack1("s") }
=> [24946, 30066, 27745, 27765, 25973, 29813, 26723, 24931, 28793, 29816, 28781, 29806, 26990, 30062, 28782, 29289, 28773]
>> keys.map { |key| _hash(key) }
=> [31, 5, 3, 6, 14, 1, 21, 29, 20, 2, 18, 0, 26, 4, 19, 25, 17]
</code></pre>
<p>We’ll use these integers as indexes into an array that stores the symbol name associated with that particular keyword:</p>
<pre><code>>> # Get a list of the array indices for each keyword
=> nil
>> array_indexes = keys.map { |key| _hash(key) }
=> [31, 5, 3, 6, 14, 1, 21, 29, 20, 2, 18, 0, 26, 4, 19, 25, 17]
>> # Insert a symbol into an array at each index
=> nil
>> table = (Lexer::KW_TABLE.keys - ["on", "subscription"]).zip(array_indexes).each_with_object([]) { |(kw, key), o| o[key] = kw.upcase.to_sym }
=>
[:INTERFACE,
...
</code></pre>
<p>Now we have a table we can use to look up the symbol for a particular keyword given the keyword’s 2nd and 3rd bytes.</p>
<h3 id="take-a-breather">Take a breather</h3>
<p>I think this is getting a little complicated, so I want to step back and take a breather.
What we’ve done so far is write a function that, given the 2nd and 3rd bytes of a string, returns an index into an array.</p>
<p>Let’s take the keyword <code>interface</code> as an example.
The 2nd and 3rd bytes are <code>nt</code>:</p>
<pre><code>>> "interface"[1, 2]
=> "nt"
</code></pre>
<p>We can use <code>unpack1</code> to convert <code>nt</code> into a 16 bit integer:</p>
<pre><code>>> "interface"[1, 2].unpack1("s")
=> 29806
</code></pre>
<p>Now we pass that integer to our hashing function (I called it <code>_hash</code> in IRB):</p>
<pre><code>>> _hash("interface"[1, 2].unpack1("s"))
=> 0
</code></pre>
<p>And now we have the array index at which to find the <code>:INTERFACE</code> symbol:</p>
<pre><code>>> table[_hash("interface"[1, 2].unpack1("s"))]
=> :INTERFACE
</code></pre>
<p>This will work for any of the strings we used to build the perfect hash function.
Let’s try a few:</p>
<pre><code>>> table[_hash("union"[1, 2].unpack1("s"))]
=> :UNION
>> table[_hash("scalar"[1, 2].unpack1("s"))]
=> :SCALAR
>> table[_hash("repeatable"[1, 2].unpack1("s"))]
=> :REPEATABLE
</code></pre>
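<p>The IRB steps can be collected into one standalone sketch that builds the 32-slot table and wraps the lookup in a method. <code>TABLE</code> and <code>keyword_symbol</code> are hypothetical names, and the sketch uses <code>unpack1("v")</code> (explicit little-endian) instead of <code>"s"</code> for the two bytes:</p>
<pre><code>KEYWORDS = %w[true null enum type input false query union extend scalar
              schema mutation fragment interface directive implements
              repeatable]

def _hash(key)
  (key * 18592990) >> 27 & 0x1f
end

# 32 slots (5 bits); each keyword's symbol goes in the slot its bytes hash to.
TABLE = Array.new(32)
KEYWORDS.each { |kw| TABLE[_hash(kw[1, 2].unpack1("v"))] = kw.upcase.to_sym }

# Only meaningful for words already known to be keywords (e.g. after
# KW_RE has matched); any other input lands in some arbitrary slot.
def keyword_symbol(word)
  TABLE[_hash(word[1, 2].unpack1("v"))]
end

p keyword_symbol("interface")  # => :INTERFACE
p keyword_symbol("repeatable") # => :REPEATABLE
</code></pre>
<p>A perfect hash is only collision-free for the keys it was built from, which is why the lookup only makes sense after <code>KW_RE</code> has already confirmed that the text is a keyword.</p>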
<h2 id="integrating-the-perfect-hash-in-to-the-lexer">Integrating the Perfect Hash in to the Lexer</h2>
<p>We’ve built our hash table and hash function, so the next step is to add them to the lexer:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> <span class="constant">KW_TABLE</span> = [<span class="symbol">:INTERFACE</span>, <span class="symbol">:MUTATION</span>, <span class="symbol">:EXTEND</span>, <span class="symbol">:FALSE</span>, <span class="symbol">:ENUM</span>, <span class="symbol">:TRUE</span>, <span class="symbol">:NULL</span>,
<span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="symbol">:QUERY</span>, <span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="symbol">:REPEATABLE</span>,
<span class="symbol">:IMPLEMENTS</span>, <span class="symbol">:INPUT</span>, <span class="symbol">:TYPE</span>, <span class="symbol">:SCHEMA</span>, <span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="symbol">:DIRECTIVE</span>,
<span class="symbol">:UNION</span>, <span class="predefined-constant">nil</span>, <span class="predefined-constant">nil</span>, <span class="symbol">:SCALAR</span>, <span class="predefined-constant">nil</span>, <span class="symbol">:FRAGMENT</span>]
<span class="keyword">def</span> <span class="function">_hash</span> key
(key * <span class="integer">18592990</span>) >> <span class="integer">27</span> & <span class="integer">0x1f</span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>Remember we derived the magic constant <code>18592990</code> earlier via brute force.</p>
<p>In the <code>next_token</code> method, we need to extract the 2nd and 3rd bytes of the keyword, combine them into a 16 bit int, use the <code>_hash</code> method to convert the 16 bit int to a 5 bit array index, then look up the symbol (I’ve omitted the rest of the <code>next_token</code> method):</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> <span class="keyword">when</span> len = <span class="instance-variable">@scan</span>.skip(<span class="constant">KW_RE</span>) <span class="keyword">then</span>
<span class="comment"># Return early if uniquely identifiable via length</span>
<span class="keyword">return</span> <span class="symbol">:ON</span> <span class="keyword">if</span> len == <span class="integer">2</span>
<span class="keyword">return</span> <span class="symbol">:SUBSCRIPTION</span> <span class="keyword">if</span> len == <span class="integer">12</span>
<span class="comment"># Get the position of the start of the keyword in the main document</span>
start = <span class="instance-variable">@scan</span>.pos - len
<span class="comment"># Get the 2nd and 3rd byte of the keyword and combine to a 16 bit int</span>
key = (<span class="instance-variable">@doc</span>.getbyte(start + <span class="integer">2</span>) << <span class="integer">8</span>) | <span class="instance-variable">@doc</span>.getbyte(start + <span class="integer">1</span>)
<span class="comment"># Get the array index</span>
index = _hash(key)
<span class="comment"># Return the symbol</span>
<span class="constant">KW_TABLE</span>[index]
</pre></div>
</div>
</div>
<p>We know the length of the token because it’s the return value of <code>StringScanner#skip</code>.
If the token is uniquely identifiable based on its length, then we’ll return early.
Otherwise, we ask <code>StringScanner</code> for the cursor position and then use the length to calculate the index of the beginning of the token (remember that <code>StringScanner</code> pushed the cursor forward when <code>skip</code> matched).</p>
<p>Once we have the beginning of the token, we’ll use <code>getbyte</code> (which doesn’t allocate) to get the 2nd and 3rd bytes of the keyword.
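Then we’ll combine the two bytes into a 16 bit int, putting the 3rd byte in the high 8 bits; this matches the <code>unpack1("s")</code> values from IRB on a little-endian machine, since <code>"s"</code> uses native byte order.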
Finally we pass the int to the hash function and use the return value of the hash function to look up the token symbol in the array.</p>
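<p>To check that the shift-and-or in <code>next_token</code> produces the same integer as the <code>unpack1</code> call used in IRB, here is a small sketch on a made-up document. The keyword starts two bytes in, so the offset arithmetic matters:</p>
<pre><code>require "strscan"

doc  = "  interface Node"
scan = StringScanner.new(doc)
scan.skip(/ */)

# skip returns the match length, and the cursor now sits after the keyword.
len   = scan.skip(/interface/)
start = scan.pos - len

# Same arithmetic as next_token: 3rd byte shifted into the high 8 bits.
key = (doc.getbyte(start + 2) << 8) | doc.getbyte(start + 1)

p start                                 # => 2
p key                                   # => 29806
p key == "interface"[1, 2].unpack1("v") # => true
</code></pre>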
<p>Let’s check our benchmarks now!</p>
<pre><code>$ ruby -I lib test.rb benchmark/fixtures/negotiate.gql
Warming up --------------------------------------
46.000 i/100ms
Calculating -------------------------------------
468.978 (± 0.4%) i/s - 2.346k in 5.002449s
{:ALLOCATIONS=>3}
</code></pre>
<p>We went from 459 IPS up to 468 IPS, and from 346 allocations down to 3 allocations.
That’s one allocation for the <code>Lexer</code> object, one for the <code>StringScanner</code> object, and one for ????</p>
<p>Actually, if we run the allocation benchmark twice we’ll get different results:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">benchmark/ips</span><span class="delimiter">"</span></span>
<span class="keyword">def</span> <span class="function">allocations</span>
x = <span class="constant">GC</span>.stat(<span class="symbol">:total_allocated_objects</span>)
<span class="keyword">yield</span>
<span class="constant">GC</span>.stat(<span class="symbol">:total_allocated_objects</span>) - x
<span class="keyword">end</span>
<span class="keyword">def</span> <span class="function">go</span> doc
lexer = <span class="constant">Lexer</span>.new doc
<span class="keyword">while</span> tok = lexer.next_token
<span class="keyword">end</span>
<span class="keyword">end</span>
doc = <span class="predefined-constant">ARGF</span>.read
<span class="constant">Benchmark</span>.ips { |x| x.report { go doc } }
p <span class="key">ALLOCATIONS</span>: allocations { go doc }
p <span class="key">ALLOCATIONS</span>: allocations { go doc }
</pre></div>
</div>
</div>
<p>The output is:</p>
<pre><code>$ ruby -I lib test.rb benchmark/fixtures/negotiate.gql
Warming up --------------------------------------
46.000 i/100ms
Calculating -------------------------------------
465.071 (± 0.6%) i/s - 2.346k in 5.044626s
{:ALLOCATIONS=>3}
{:ALLOCATIONS=>2}
</code></pre>
<p>Ruby uses GC-allocated objects to store some inline caches.
Since it was the first time we called the <code>allocations</code> method, a new inline cache was allocated, and that dinged us.
We’re actually able to tokenize this entire document with only 2 allocations: the lexer and the string scanner.</p>
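<p>One way to keep that kind of one-time cost out of the numbers is to run the block once before measuring. This is a sketch, not part of the benchmark script; <code>steady_allocations</code> is a hypothetical helper built on the same <code>GC.stat</code> counter:</p>
<pre><code># Like the allocations helper, but runs the block once first so that
# one-time costs inside the block (such as freshly allocated inline
# caches) aren't counted.
def steady_allocations
  yield # warm-up run, not measured
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

with_objects = steady_allocations { 100.times { Object.new } }
no_allocs    = steady_allocations { 100.times { |i| i + 1 } }
p with_objects # at least 100: one new Object per iteration
p no_allocs    # small Integers are immediates, so this should be near 0
</code></pre>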
<h2 id="one-more-hack">One more hack</h2>
<p>Let’s do one more trick.
We want to reduce the number of method calls the lexer makes as much as we can.
The <code>case / when</code> statement in <code>next_token</code> checks each <code>when</code> clause in order, stopping at the first one that matches.
One trick I like to do is rearrange the clauses so that the most common tokens come first.</p>
<p>If we tokenize our benchmark program and tally up all of the tokens that come out, it looks like this:</p>
<pre><code>>> lexer = Lexer.new File.read "benchmark/fixtures/negotiate.gql"
=>
#<Lexer:0x0000000105c33c90
...
>> list = []
=> []
?> while tok = lexer.next_token
?> list << tok
>> end
=> nil
>> list.tally
=> {:QUERY=>1, :IDENTIFIER=>2976, :LPAREN=>15, :VAR_SIGN=>6, :COLON=>56, :BANG=>1,
:RPAREN=>15, :LCURLY=>738, :RCURLY=>738, :ELLIPSIS=>350, :ON=>319, :INT=>24,
:TYPE=>4, :INPUT=>1, :FRAGMENT=>18}
</code></pre>
<p>From this data, it looks like <code>ELLIPSIS</code> tokens aren’t as popular as punctuation or <code>IDENTIFIER</code> tokens.
Yet we’re always checking for <code>ELLIPSIS</code> tokens first.
Let’s move the <code>ELLIPSIS</code> check below the identifier check.
This makes looking for <code>ELLIPSIS</code> more expensive, but it makes finding punctuation and identifiers cheaper.
Since punctuation and identifiers occur more frequently in our document, we should get a speedup.</p>
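<p>A rough way to estimate the payoff is to count how many <code>when</code> clauses get evaluated across the whole document, using the tally above. This sketch is a simplification: it treats every check as equally expensive (a table lookup is really much cheaper than a regex match), ignores the <code>IGNORE</code> skip and the <code>eos?</code> check, and the branch labels are made up:</p>
<pre><code># Token counts from list.tally on benchmark/fixtures/negotiate.gql.
TALLY = { QUERY: 1, IDENTIFIER: 2976, LPAREN: 15, VAR_SIGN: 6, COLON: 56,
          BANG: 1, RPAREN: 15, LCURLY: 738, RCURLY: 738, ELLIPSIS: 350,
          ON: 319, INT: 24, TYPE: 4, INPUT: 1, FRAGMENT: 18 }

# Which when-branch produces each token; anything not listed is punctuation.
BRANCH = { QUERY: :kw, ON: :kw, TYPE: :kw, INPUT: :kw, FRAGMENT: :kw,
           IDENTIFIER: :ident, ELLIPSIS: :ellipsis, INT: :int }

# Total when-clauses evaluated: a token in the nth branch costs n checks.
def checks(order)
  TALLY.sum { |tok, count| count * (order.index(BRANCH.fetch(tok, :punct)) + 1) }
end

before = checks(%i[ellipsis punct kw ident float int])
after  = checks(%i[punct kw ident ellipsis float int])
p before # => 16565
p after  # => 12727
</code></pre>
<p>That’s roughly 23% fewer checks; how much of that shows up as a real speedup depends on what each check actually costs.</p>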
<p>I applied this patch:</p>
<div class="language-diff highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="line comment">diff --git a/test.rb b/test.rb</span>
<span class="line comment">index ac147c2..275b8ba 100644</span>
<span class="line head"><span class="head">--- </span><span class="filename">a/test.rb</span></span>
<span class="line head"><span class="head">+++ </span><span class="filename">b/test.rb</span></span>
<span class="change"><span class="change">@@</span> -59,7 +59,6 <span class="change">@@</span></span> <span class="keyword">class</span> <span class="class">Lexer</span>
<span class="keyword">return</span> <span class="keyword">if</span> <span class="instance-variable">@scan</span>.eos?
<span class="keyword">case</span>
<span class="line delete"><span class="delete">-</span> <span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">ELLIPSIS</span>) <span class="keyword">then</span> <span class="symbol">:ELLIPSIS</span></span>
<span class="keyword">when</span> tok = <span class="constant">PUNCTUATION_TABLE</span>[<span class="instance-variable">@doc</span>.getbyte(<span class="instance-variable">@scan</span>.pos)] <span class="keyword">then</span>
<span class="comment"># If the byte at the current position is inside our lookup table, push</span>
<span class="comment"># the scanner position forward 1 and return the token</span>
<span class="change"><span class="change">@@</span> -78,6 +77,7 <span class="change">@@</span></span> <span class="keyword">class</span> <span class="class">Lexer</span>
<span class="constant">KW_TABLE</span>[_hash(key)]
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">IDENTIFIER</span>) <span class="keyword">then</span> <span class="symbol">:IDENTIFIER</span>
<span class="line insert"><span class="insert">+</span> <span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">ELLIPSIS</span>) <span class="keyword">then</span> <span class="symbol">:ELLIPSIS</span></span>
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">FLOAT</span>) <span class="keyword">then</span> <span class="symbol">:FLOAT</span>
<span class="keyword">when</span> <span class="instance-variable">@scan</span>.skip(<span class="constant">INT</span>) <span class="keyword">then</span> <span class="symbol">:INT</span>
<span class="keyword">else</span>
</pre></div>
</div>
</div>
<p>Now when we rerun the benchmark, we get this:</p>
<pre><code>$ ruby -I lib test.rb benchmark/fixtures/negotiate.gql
Warming up --------------------------------------
48.000 i/100ms
Calculating -------------------------------------
486.798 (± 0.4%) i/s - 2.448k in 5.028884s
{:ALLOCATIONS=>3}
{:ALLOCATIONS=>2}
</code></pre>
<p>Great, we went from 465 IPS to 486 IPS!</p>
<h2 id="conclusion">Conclusion</h2>
<p>The lexer we started with tokenized the 80 KB GraphQL document at 211 IPS, and where we left off it was running at 486 IPS.
That’s more than a 2x speed improvement (about 2.3x)!</p>
<p>Our starting lexer allocated over 20k objects, and when we finished we got it down to just 2 objects.
Of course the parser may ask the lexer to allocate something, but we know that we’re only allocating the <em>bare minimum</em>.
In fact, if the parser only records positional offsets, it could very well never ask the lexer to allocate anything!</p>
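<p>Here is a minimal sketch of what recording positional offsets could look like. <code>TinyLexer</code> is a hypothetical, stripped-down lexer (braces and identifiers only, no keyword table) written in the same style as ours: <code>next_token</code> returns just a Symbol, and the caller reads the token’s start and end positions from the lexer. The parser side keeps a flat array of type, offset, and length entries and only calls <code>String#byteslice</code> when it actually needs the text.</p>

```ruby
require "strscan"

# Hypothetical miniature lexer: `next_token` returns only a Symbol, and
# `start` / `pos` tell the caller where that token lives in the document.
class TinyLexer
  IDENT   = /[_A-Za-z][_0-9A-Za-z]*/
  IGNORED = /[\s,]+/ # GraphQL treats commas like whitespace
  PUNCT   = { "{".ord => :LCURLY, "}".ord => :RCURLY }.freeze

  attr_reader :start

  def initialize(doc)
    @doc = doc
    @scan = StringScanner.new(doc)
    @start = 0
  end

  def pos
    @scan.pos
  end

  def next_token
    @scan.skip(IGNORED)
    return if @scan.eos?

    @start = @scan.pos
    if (tok = PUNCT[@doc.getbyte(@scan.pos)])
      @scan.pos += 1
      tok
    elsif @scan.skip(IDENT)
      :IDENTIFIER # keywords are not distinguished in this sketch
    else
      raise ArgumentError, "unexpected byte at offset #{@scan.pos}"
    end
  end
end

# Parser side: a flat list of (type, offset, length) entries.
doc = "query { user { name } }"
lexer = TinyLexer.new(doc)
tokens = []
while (tok = lexer.next_token)
  tokens.push(tok, lexer.start, lexer.pos - lexer.start)
end

# Only slice out text when the parser actually needs it.
_, offset, length = tokens[6, 3]
p doc.byteslice(offset, length) # => "user"
```

<p>The offsets are small Integers, which CRuby stores as immediate values rather than heap objects, so the lexer itself never has to build a String for a token. The parser’s own array still grows, of course.</p>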
<p>When I’m doing this stuff I try to use as many tricks as possible for increasing speed.
But I think the biggest and best payoffs come from trying to think about the problem from a higher level and adjusting the problem space itself.
Converting <code>next_token</code> to return only a symbol rather than a tuple cut our object allocations by more than half.
Questioning the code’s design itself is much harder, but I think it reaps a greater benefit.</p>
<p>Anyway, these are hacks I like to use!
If you want to play around with the lexer we’ve been building in this post, I’ve put the source code <a href="https://gist.github.com/tenderlove/e9bb912648a3d2bce00c4f60bc632a10">here</a>.</p>
<p>I hope you enjoyed this, and have a good day!</p>
http://tenderlovemaking.com/2023/03/19/bitmap-matrix-and-undirected-graphs-in-ruby.html
Bitmap Matrix and Undirected Graphs in Ruby
2023-03-19T12:12:27-07:00
2023-03-19T12:12:27-07:00
<p>I’ve been working my way through <a href="https://www.elsevier.com/books/engineering-a-compiler/cooper/978-0-12-815412-0">Engineering a Compiler</a>.
I really enjoy the book, but one part has you build an interference graph for doing register allocation via graph coloring.
An interference graph is an <a href="https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)">undirected graph</a>, and one way you can represent an undirected graph is with a bitmap matrix.</p>
<p>A bitmap matrix is just a matrix whose values can only be 1 or 0.
If every node in your graph maps to an index, you can use the bitmap matrix to represent edges in the graph: the bit at row <em>i</em>, column <em>j</em> is 1 when nodes <em>i</em> and <em>j</em> share an edge.</p>
<p>I made a bitmap matrix implementation that I like, but I think the code is too trivial to put in a gem.
Here is the code I used:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">class</span> <span class="class">BitMatrix</span>
  <span class="keyword">def</span> <span class="function">initialize</span> size
    <span class="instance-variable">@size</span> = size
    size = (size + <span class="integer">7</span>) & <span class="integer">-8</span> <span class="comment"># round up to the nearest multiple of 8</span>
    <span class="instance-variable">@row_bytes</span> = size / <span class="integer">8</span>
    <span class="instance-variable">@buffer</span> = <span class="string"><span class="delimiter">"</span><span class="char">\0</span><span class="delimiter">"</span></span>.b * (<span class="instance-variable">@row_bytes</span> * size)
  <span class="keyword">end</span>

  <span class="keyword">def</span> <span class="function">initialize_copy</span> other
    <span class="instance-variable">@buffer</span> = <span class="instance-variable">@buffer</span>.dup
  <span class="keyword">end</span>

  <span class="keyword">def</span> <span class="function">set</span> x, y
    raise <span class="constant">IndexError</span> <span class="keyword">if</span> y >= <span class="instance-variable">@size</span> || x >= <span class="instance-variable">@size</span>
    x, y = [y, x].sort <span class="comment"># (x, y) and (y, x) are the same edge, so store it once</span>
    row = x * <span class="instance-variable">@row_bytes</span>
    column_byte = y / <span class="integer">8</span>
    column_bit = <span class="integer">1</span> << (y % <span class="integer">8</span>)
    <span class="instance-variable">@buffer</span>.setbyte(row + column_byte, <span class="instance-variable">@buffer</span>.getbyte(row + column_byte) | column_bit)
  <span class="keyword">end</span>

  <span class="keyword">def</span> <span class="function">set?</span> x, y
    raise <span class="constant">IndexError</span> <span class="keyword">if</span> y >= <span class="instance-variable">@size</span> || x >= <span class="instance-variable">@size</span>
    x, y = [y, x].sort
    row = x * <span class="instance-variable">@row_bytes</span>
    column_byte = y / <span class="integer">8</span>
    column_bit = <span class="integer">1</span> << (y % <span class="integer">8</span>)
    (<span class="instance-variable">@buffer</span>.getbyte(row + column_byte) & column_bit) != <span class="integer">0</span>
  <span class="keyword">end</span>

  <span class="keyword">def</span> <span class="function">each_pair</span>
    <span class="keyword">return</span> enum_for(<span class="symbol">:each_pair</span>) <span class="keyword">unless</span> block_given?
    <span class="instance-variable">@buffer</span>.bytes.each_with_index <span class="keyword">do</span> |byte, i|
      row = i / <span class="instance-variable">@row_bytes</span>
      column = i % <span class="instance-variable">@row_bytes</span>
      <span class="integer">8</span>.times <span class="keyword">do</span> |j|
        <span class="keyword">if</span> (<span class="integer">1</span> << j) & byte != <span class="integer">0</span>
          <span class="keyword">yield</span> [row, (column * <span class="integer">8</span>) + j]
        <span class="keyword">end</span>
      <span class="keyword">end</span>
    <span class="keyword">end</span>
  <span class="keyword">end</span>

  <span class="keyword">def</span> <span class="function">to_dot</span>
    <span class="string"><span class="delimiter">"</span><span class="content">graph g {</span><span class="char">\n</span><span class="delimiter">"</span></span> + each_pair.map { |x, y| <span class="string"><span class="delimiter">"</span><span class="inline"><span class="inline-delimiter">#{</span>x<span class="inline-delimiter">}</span></span><span class="content"> -- </span><span class="inline"><span class="inline-delimiter">#{</span>y<span class="inline-delimiter">}</span></span><span class="content">;</span><span class="delimiter">"</span></span> }.join(<span class="string"><span class="delimiter">"</span><span class="char">\n</span><span class="delimiter">"</span></span>) + <span class="string"><span class="delimiter">"</span><span class="char">\n</span><span class="content">}</span><span class="delimiter">"</span></span>
  <span class="keyword">end</span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>I like this implementation because all bits are packed into a binary string.
Copying the matrix is trivial because we just have to dup the string.
Memory usage is much smaller than if every node in the graph were to store an actual reference to other nodes.</p>
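<p>To make the packing concrete, this standalone snippet repeats the arithmetic from <code>BitMatrix#initialize</code> and <code>BitMatrix#set</code>; the helper names are made up for illustration. For a 20-node graph the size rounds up to 24, so each row takes 3 bytes and the whole matrix takes 72 bytes.</p>

```ruby
# Standalone copy of the index math in BitMatrix#initialize and #set.
# The helper names are hypothetical; only the arithmetic mirrors the class.
def rounded_size(size)
  (size + 7) & -8 # round up to a multiple of 8
end

def row_bytes_for(size)
  rounded_size(size) / 8 # bytes needed for one row of bits
end

def buffer_bytes_for(size)
  row_bytes_for(size) * rounded_size(size) # one row per (rounded) node
end

def bit_location(size, x, y)
  x, y = [y, x].sort                # undirected: (x, y) and (y, x) share a bit
  [x * row_bytes_for(size) + y / 8, # byte index into the buffer
   1 << (y % 8)]                    # mask for the bit within that byte
end

p buffer_bytes_for(20)    # => 72
p bit_location(20, 13, 2) # => [7, 32]
p bit_location(20, 2, 13) # => [7, 32]
```

<p>Both <code>(13, 2)</code> and <code>(2, 13)</code> land on bit 5 of byte 7, which is why the class sorts the coordinates before touching the buffer.</p>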
<p>Anyway, this was fun to write and I hope someone finds it useful!</p>
http://tenderlovemaking.com/2023/01/18/vim-tmux-and-fish.html
Vim, tmux, and Fish
2023-01-18T11:23:35-08:00
2023-01-18T11:23:35-08:00
<p>I do most of my text editing with <a href="https://github.com/macvim-dev/macvim">MacVim</a>, but when I pair with people I like to use <a href="https://tmate.io">tmate</a>.
tmate is just an easy way to share a tmux session with a remote person.
But this means that I go from coding in a GUI to coding in a terminal.
Normally this wouldn’t be a problem, but I had made a Fish alias that would open the MacVim GUI every time I typed <code>vim</code> in the terminal.
Of course when I’m pairing via tmate, the other people cannot see the GUI, so I would have to remember a different command to open Vim.</p>
<p>Today I did about 10 minutes of research to fix this problem and came up with the following Fish function:</p>
<pre><code>$ cat .config/fish/functions/vim.fish
function vim --wraps='vim' --description 'open Vim'
    if set -q TMUX # if we're in a TMUX session, open in terminal
        command vim $argv
    else
        # Otherwise open macvim
        open -a MacVim $argv;
    end
end
</code></pre>
<p>All it does is open terminal Vim if I’m in a tmux session; otherwise it opens the MacVim GUI.</p>
<p>Instead of putting up with this frustration for such a long time, I should have taken the 10 minutes required to fix the situation.
This was a good reminder for me, and hopefully I’ll be better about it in the future!</p>
http://tenderlovemaking.com/2022/12/07/in-memory-of-a-giant.html
In Memory of a Giant
2022-12-07T07:08:41-08:00
2022-12-07T07:08:41-08:00
<p>The Ruby community has lost a giant.
As a programmer, I always feel as if I’m standing on the shoulders of giants.
<a href="https://chrisseaton.com">Chris Seaton</a> was one of those giants.</p>
<p>I’ve been working at the same company as Chris for the past 2 years.
However, I first met him through the open source world many years ago.
He was working on a Ruby implementation called <a href="https://chrisseaton.com/truffleruby/">TruffleRuby</a>, and got his PhD <em>in</em> Ruby.
Can you believe that? A PhD in Ruby? I’d never heard of such a thing.
My impression was that nobody in academia cared about Ruby, but here was Chris, the Ruby Doctor.
I was impressed.</p>
<h2 id="patience">Patience</h2>
<p>As a college dropout, I’ve always felt underqualified.
Embarrassment about my lack of knowledge and credentials has driven me to study hard on my own time.
But Chris never once made me feel out of place.
Any time I had questions, without judgement, he would take the time to explain things to me.</p>
<p>I’ve always looked up to Chris.
I was at a bar in London with a few coworkers.
We started talking about age, and I found out that Chris was much younger than me.
I said “You’re so smart and accomplished! How can I possibly catch up to you?”
Chris said “Don’t worry, I’ll just tell you everything I know!”</p>
<p><img src="/images/img_0947.jpg" alt="Meeting Chris in London" /></p>
<h2 id="puns">Puns</h2>
<p>My team is fully remote, so every Friday we have a team meeting over video to just hang out and talk about stuff.
Eventually I’ll make a really great pun, most people will sigh, Kevin Menard will get angry, and Chris would just stay straight-faced.
No reaction from Chris. Every. Single. Time.</p>
<p>One time, someone asked Chris “do you know that he’s making a joke? Or do you just not think it’s funny?”
Chris responded “I know he’s making a pun, I just don’t react because I don’t want to encourage him.”
I said “This just encourages me more because now I feel challenged!”</p>
<p>I wish I had tried harder because now I’ll never get that reaction.</p>
<h2 id="kindness">Kindness</h2>
<p>My last conversation with Chris was Thursday December 1st at RubyConf in Houston.
We all went to dinner at a Ramen shop.
I find British English to be extremely adorable, so any time I hear fun British phrases in the news I always ask my British coworkers about it.
The latest one was “Wonky Veg” so I asked Chris if he’d been buying any at the store.
He said no, but that one of his favorite things to do was find weird things at the local supermarket, take photos of them, then share them with his coworkers.
He flipped through photos on his phone, showing me pics of him shopping with his daughter.
Some of the products he showed me were quite funny and we both had a good laugh.</p>
<p><img src="/images/pc010232.jpg" alt="Dinner with Chris" /></p>
<h2 id="memory">Memory</h2>
<p>I feel honored to have had the opportunity to work with Chris.</p>
<p>I feel grateful for the time that we had together.</p>
<p>I feel angry that I can’t learn more from him.</p>
<p>I feel sad that he is gone from my life.</p>
<p>Chris was an important part of the community, his family, and his country.
I will never forget the time I spent with Chris, a Giant.</p>
http://tenderlovemaking.com/2022/06/12/cross-platform-machine-code.html
Cross Platform Machine Code
2022-06-12T12:54:35-07:00
2022-06-12T12:54:35-07:00
<p>I hate writing <code>if</code> statements.</p>
<p>I’ve been working on a couple different assemblers for Ruby.
<a href="https://github.com/tenderlove/fisk">Fisk</a> is a pure Ruby x86 assembler.
You can use it to generate bytes that can be executed on x86 machines.
<a href="https://github.com/tenderlove/aarch64">AArch64</a> is a pure Ruby ARM64 assembler.
You can use it to generate bytes that can be executed on ARM64 machines.</p>
<p>Both of these libraries just generate bytes that can be interpreted by their respective processors.
Unfortunately you can’t just generate bytes and expect the CPU to execute them.
You first need to put the bytes in <strong>executable memory</strong> before you can hand them off to the CPU for execution.
Executable memory is basically the same thing regardless of CPU architecture, so I decided to make a library called <a href="https://github.com/tenderlove/jit_buffer">JITBuffer</a> that encapsulates executable memory manipulation.</p>
<p>To use the <code>JITBuffer</code>, you write platform specific bytes to the buffer, then give the buffer to the CPU for execution.
Here is an example on the ARM64 platform:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">aarch64</span><span class="delimiter">"</span></span>
require <span class="string"><span class="delimiter">"</span><span class="content">jit_buffer</span><span class="delimiter">"</span></span>
require <span class="string"><span class="delimiter">"</span><span class="content">fiddle</span><span class="delimiter">"</span></span>
asm = <span class="constant">AArch64</span>::<span class="constant">Assembler</span>.new
<span class="comment"># Make some instructions. These instructions simply</span>
<span class="comment"># return the value 0xF00DCAFE</span>
asm.pretty <span class="keyword">do</span>
  asm.movz x0, <span class="integer">0xCAFE</span>
  asm.movk x0, <span class="integer">0xF00D</span>, lsl(<span class="integer">16</span>)
  asm.ret
<span class="keyword">end</span>
<span class="comment"># Write the bytes to executable memory:</span>
buf = <span class="constant">JITBuffer</span>.new(<span class="integer">4096</span>)
buf.writeable!
asm.write_to buf
buf.executable!
<span class="comment"># Point the CPU at the executable memory</span>
func = <span class="constant">Fiddle</span>::<span class="constant">Function</span>.new(buf.to_i, [], -<span class="constant">Fiddle</span>::<span class="constant">TYPE_INT</span>)
p func.call.to_s(<span class="integer">16</span>) <span class="comment"># => "f00dcafe"</span>
</pre></div>
</div>
</div>
<p>The example uses the <code>AArch64</code> gem to assemble ARM64 specific bytes, the <code>JITBuffer</code> gem to allocate executable memory, and the <code>Fiddle</code> gem to point the CPU at the executable memory and run it.</p>
<p>Tests are important I guess, so I thought it would be a good idea to write tests for the <code>JITBuffer</code> gem.
My goal for the test is to ensure that it’s actually possible to execute the bytes in the buffer itself.
I’m not a huge fan of stubs or mocks and I try to avoid them if possible, so I wanted to write a test that would <em>actually execute</em> the bytes in the buffer.
I also want the test to be “cross platform” (where “cross platform” means “works on x86_64 and ARM64”).</p>
<p>Writing a test like this would mean writing something like the following:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">def</span> <span class="function">test_can_execute</span>
  buf = <span class="constant">JITBuffer</span>.new(<span class="integer">4096</span>)
  platform = figure_out_what_platform_we_are_on()
  <span class="keyword">if</span> platform == <span class="string"><span class="delimiter">"</span><span class="content">arm64</span><span class="delimiter">"</span></span>
    <span class="comment"># write arm64 specific bytes</span>
    buf.write(...)
  <span class="keyword">else</span>
    <span class="comment"># write x86_64 specific bytes</span>
    buf.write(...)
  <span class="keyword">end</span>
  <span class="comment"># Use fiddle to execute</span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>As I said at the start though, I hate writing <code>if</code> statements, and I’d rather avoid them if possible.
In addition, how do you reliably figure out what platform you’re executing on?
I really don’t want to figure that out.
Not to mention, I just don’t think this code is <em>cool</em>.</p>
<p>My test requirements:</p>
<ul>
<li>No if statements</li>
<li>Self contained (I don’t want to shell out or use other libraries)</li>
<li>Must have pizzazz</li>
</ul>
<p>Since machine code is just bytes that the CPU interprets, it made me wonder “is there a set of bytes that execute both on an x86_64 CPU <strong>and</strong> an ARM64 CPU?”
It turns out there are, and I want to walk through them here.</p>
<h2 id="x86_64-instructions">x86_64 Instructions</h2>
<p>First, let’s look at the x86_64 instructions we’ll execute.
Below is the assembly code (in Intel syntax):</p>
<pre><code>start:
    mov rax, 0x2b   ; put 0x2b in the rax register
    ret             ; return from the function
    jmp start       ; jump back to start
</code></pre>
<p>This assembly code puts the value <code>0x2b</code> in the <code>rax</code> register and returns from the current “C” function.
I put “C” in quotes because we’re writing assembly code, but the assembly code conforms to the C calling convention and we’ll treat it as if it’s a C function when we call it.
The x86 C calling convention states that the value in the <code>rax</code> register is the “return value” of the C function.
So we’ve created a function that returns <code>0x2b</code>.
At the end of the code there is a <code>jmp</code> instruction that jumps to the start of this sequence.
However, since we return from the function before getting to the jump, the jump is never used (or is it?????)</p>
<p>Machine code is just bytes, and here are the bytes for the above x86 machine code:</p>
<pre><code>0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00 ; mov rax, 0x2b
0xC3 ; ret
0xEB 0xF6 ; jmp start
</code></pre>
<p>x86 uses a “variable width” encoding, meaning that the number of bytes each instruction uses can vary.
In this example, the <code>mov</code> instruction used 7 bytes, and the <code>ret</code> instruction used 1 byte.
This means that the <code>jmp</code> instruction is the 9th byte, or offset 8.</p>
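<p>The jump’s target can be checked with a little arithmetic. <code>0xEB</code> is the short <code>jmp</code> opcode, and the byte after it is a signed 8-bit displacement measured from the end of the two-byte <code>jmp</code>. This Ruby snippet decodes <code>0xF6</code> as a signed byte and computes where the jump lands.</p>

```ruby
# The x86_64 bytes from above.
x86 = [0x48, 0xC7, 0xC0, 0x2B, 0x00, 0x00, 0x00, # mov rax, 0x2b
       0xC3,                                     # ret
       0xEB, 0xF6]                               # jmp start

jmp_offset = 8
# `jmp rel8` is 0xEB followed by a signed byte, relative to the address of
# the next instruction (the jmp itself is 2 bytes long).
displacement = x86[jmp_offset + 1, 1].pack("C").unpack1("c")
target = jmp_offset + 2 + displacement

p displacement # => -10
p target       # => 0
```

<p>The next instruction would start at offset 10, and 10 + (-10) lands on offset 0, the <code>mov</code>.</p>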
<h2 id="arm64-instructions">ARM64 Instructions</h2>
<p>Below are some ARM64 instructions we can execute:</p>
<pre><code>movz X11, 0x7b7 ; put 0x7b7 in the X11 register
movz X0, 0x2b ; put 0x2b in the X0 register
ret ; return from the function
</code></pre>
<p>This machine code puts the value 0x7b7 into the register <code>X11</code>.
Then it puts the value 0x2b in the <code>X0</code> register.
The third instruction returns from the function.
Again we are abiding by the C calling convention, but this time on the ARM64 platform.
On the ARM64 platform, the value stored in <code>X0</code> is the return value.
So the above machine code will return the value <code>0x2b</code> to the caller just like the x86_64 machine code did.</p>
<p>Here are the bytes that represent the above ARM64 machine code:</p>
<pre><code>0xEB 0xF6 0x80 0xD2 ; movz X11, 0x7b7
0x60 0x05 0x80 0xD2 ; movz X0, 0x2b
0xC0 0x03 0x5F 0xD6 ; ret
</code></pre>
<p>ARM64 uses <em>fixed width</em> instructions.
All instructions on ARM64 are 32 bits wide.</p>
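<p>Because every instruction is one little-endian 32-bit word, the bytes can be decoded by hand. The sketch below pulls out the fields of a 64-bit <code>movz</code> (destination register in bits 4..0, 16-bit immediate in bits 20..5, shift selector in bits 22..21) and recognizes <code>ret</code>. It is a tiny decoder for just these two encodings, not a general disassembler.</p>

```ruby
arm = [0xEB, 0xF6, 0x80, 0xD2, # movz X11, 0x7b7
       0x60, 0x05, 0x80, 0xD2, # movz X0, 0x2b
       0xC0, 0x03, 0x5F, 0xD6] # ret

# Every ARM64 instruction is one little-endian 32-bit word.
words = arm.pack("C*").unpack("V*")

# 64-bit MOVZ: sf=1, opc=0b10, then the fixed bits 0b100101 (bits 31..23).
def decode_movz(word)
  return nil unless (word >> 23) == 0b1_10_100101
  hw    = (word >> 21) & 0b11
  imm16 = (word >> 5) & 0xFFFF
  rd    = word & 0x1F
  format("movz X%d, 0x%x, lsl %d", rd, imm16, hw * 16)
end

# RET Xn keeps the register number in bits 9..5; all other bits are fixed.
def decode_ret(word)
  return nil unless (word & ~(0x1F << 5)) == 0xD65F0000
  format("ret X%d", (word >> 5) & 0x1F)
end

words.each { |w| puts decode_movz(w) || decode_ret(w) || format("unknown 0x%08x", w) }
# movz X11, 0x7b7, lsl 0
# movz X0, 0x2b, lsl 0
# ret X30
```

<p>Using <code>X11</code> as a throwaway destination is harmless: under the standard ARM64 procedure call standard, <code>X9</code> through <code>X15</code> are temporary registers that a function may clobber without saving.</p>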
<h2 id="cross-platform-machine-code">Cross Platform Machine Code</h2>
<p>Lets look at the byte blocks next to each other:</p>
<pre><code>; x86_64 bytes
0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00 ; mov rax, 0x2b
0xC3 ; ret
0xEB 0xF6 ; jmp start
</code></pre>
<pre><code>; ARM64 bytes
0xEB 0xF6 0x80 0xD2 ; movz X11, 0x7b7
0x60 0x05 0x80 0xD2 ; movz X0, 0x2b
0xC0 0x03 0x5F 0xD6 ; ret
</code></pre>
<p>Looking at the bytes, you’ll notice that the first two bytes of the ARM64 code (<code>0xEB 0xF6</code>) are <strong>exactly the same</strong> as the last two bytes of the x86_64 code.
The first <code>movz</code> instruction in the ARM64 code was specially crafted so that its first two bytes are the same as the bytes of the <code>jmp</code> instruction at the end of the x86 code.</p>
<p>If we <em>combine</em> these bytes, then tell the CPU to execute starting at a particular offset, then <em>the interpretation</em> of the bytes will change depending on the CPU, but <em>the result</em> is the same.</p>
<p>Here are the bytes combined:</p>
<pre><code>            0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00 ; mov rax, 0x2b
            0xC3                               ; ret
offset 8 -> 0xEB 0xF6 0x80 0xD2                ; (x86: jmp start, ARM64: movz X11, 0x7b7)
            0x60 0x05 0x80 0xD2                ; movz X0, 0x2b
            0xC0 0x03 0x5F 0xD6                ; ret
</code></pre>
<p>Regardless of platform, we’ll tell the CPU to start executing from offset 8 in the byte buffer.
If it’s an x86 CPU, it will interpret the bytes as a jump back to offset 0, execute the <code>mov</code>, return at the <code>ret</code>, and never reach the rest of the bytes in the buffer.
If it’s an ARM64 CPU, it will interpret the same bytes as “put 0x7b7 in the X11 register” and continue forward, never seeing the x86-specific bytes at the start of the buffer.</p>
<p>Both x86_64 and ARM64 platforms will return the same value 0x2b.</p>
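<p>The splice can be checked without running any machine code. This sketch builds the combined buffer from the two byte sequences, then confirms that the first 10 bytes are still the x86_64 program, that the bytes from offset 8 onward are exactly the ARM64 program, and that offset 8 is a multiple of 4. ARM64 instructions must be 4-byte aligned; the check assumes the buffer itself starts on an aligned address, which page-aligned executable memory does.</p>

```ruby
x86 = [0x48, 0xC7, 0xC0, 0x2B, 0x00, 0x00, 0x00, 0xC3, 0xEB, 0xF6]
arm = [0xEB, 0xF6, 0x80, 0xD2, 0x60, 0x05, 0x80, 0xD2, 0xC0, 0x03, 0x5F, 0xD6]

entry = 8
# Keep the x86 bytes before the shared jmp/movz bytes, then append the ARM64 code.
combined = x86[0, entry] + arm

# Each CPU sees its own program when execution starts at `entry`.
raise "x86 view differs" unless combined[0, x86.size] == x86
raise "ARM64 view differs" unless combined[entry..] == arm
# ARM64 fetches 4-byte words, so the entry point must be 4-byte aligned.
raise "misaligned entry" unless (entry % 4).zero?

p combined.size # => 20
```

<p>The result is the same 20-byte array that the test uses.</p>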
<p>Now we can write a test without <code>if</code> statements like this:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">def</span> <span class="function">test_execute</span>
  <span class="comment"># Cross platform bytes</span>
  bytes = [<span class="integer">0x48</span>, <span class="integer">0xc7</span>, <span class="integer">0xc0</span>, <span class="integer">0x2b</span>, <span class="integer">0x00</span>, <span class="integer">0x00</span>, <span class="integer">0x00</span>, <span class="comment"># x86_64 mov rax, 0x2b</span>
           <span class="integer">0xc3</span>, <span class="comment"># x86_64 ret</span>
           <span class="integer">0xeb</span>, <span class="integer">0xf6</span>, <span class="comment"># x86 jmp start, and the first half of ARM movz X11, 0x7b7</span>
           <span class="integer">0x80</span>, <span class="integer">0xd2</span>, <span class="comment"># second half of ARM movz X11, 0x7b7</span>
           <span class="integer">0x60</span>, <span class="integer">0x05</span>, <span class="integer">0x80</span>, <span class="integer">0xd2</span>, <span class="comment"># ARM movz X0, #0x2b</span>
           <span class="integer">0xc0</span>, <span class="integer">0x03</span>, <span class="integer">0x5f</span>, <span class="integer">0xd6</span>] <span class="comment"># ARM ret</span>

  <span class="comment"># Write them to the buffer</span>
  jit = <span class="constant">JITBuffer</span>.new(<span class="integer">4096</span>)
  jit.writeable!
  jit.write bytes.pack(<span class="string"><span class="delimiter">"</span><span class="content">C*</span><span class="delimiter">"</span></span>)
  jit.executable!

  <span class="comment"># start at offset 8</span>
  offset = <span class="integer">8</span>
  func = <span class="constant">Fiddle</span>::<span class="constant">Function</span>.new(jit.to_i + offset, [], <span class="constant">Fiddle</span>::<span class="constant">TYPE_INT</span>)

  <span class="comment"># Check the return value</span>
  assert_equal <span class="integer">0x2b</span>, func.call
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>So simple!</p>
<p>So cool!</p>
<p>Tons of pizzazz!</p>
<p>This test will execute machine code on both x86_64 as well as ARM64 and the machine code will return the same value.
Not to mention, there’s no way RuboCop or Flay could possibly complain about this code. 🤣</p>
<p>I hope this inspires you to try writing cross platform machine code.
This code only supports 2 platforms, but it does make me wonder how far we could stretch this and how many platforms we could support.</p>
<p>Anyway, hope you have a good day!</p>
http://tenderlovemaking.com/2022/01/07/homebrew-rosetta-and-ruby.html
Homebrew, Rosetta, and Ruby
2022-01-07T12:39:42-08:00
2022-01-07T12:39:42-08:00
<p>Hi everyone! I finally upgraded to an M1. It’s really really great, but the
main problem is that some projects I work on like <a href="https://github.com/tenderlove/tenderjit">TenderJIT</a> and
<a href="https://github.com/Shopify/yjit">YJIT</a> only really work on x86_64 and these
new M1 machines use ARM chips. Fortunately we can run x86_64 software via <a href="https://en.wikipedia.org/wiki/Rosetta_(software)">Rosetta</a>, so we can still do development work on x86 specific software.</p>
<p>I’ve seen some solutions for setting up a dev environment that uses Rosetta,
but I’d like to share what I did.</p>
<h2 id="installing-homebrew">Installing Homebrew</h2>
<p>I think most people recommend that you install
two separate copies of Homebrew: one that targets ARM, and one that targets x86.</p>
<p>So far, I’ve found this to be the best solution, so I went with it. Just do the
normal Homebrew installation for ARM like this:</p>
<pre><code>$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
</code></pre>
<p>Then run the installer again under Rosetta like this:</p>
<pre><code>$ arch -x86_64 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
</code></pre>
<p>After doing this, I ended up with a Homebrew installation in <code>/opt/homebrew</code> (the ARM version),
and another installation in <code>/usr/local</code> (the x86 version).</p>
<h2 id="configuring-your-terminal">Configuring your Terminal</h2>
<p>Many places on the web recommend duplicating Terminal.app, renaming
the copy, and configuring the copy to run under Rosetta.</p>
<p>I really didn’t like this solution. The problem for me is that I’d end up with two
different terminal icons when doing <code>cmd-tab</code>, and I really can’t be bothered
to read whether the terminal is the Rosetta one or not. It makes switching to
the right terminal take <em>way</em> too long.</p>
<p>Instead I decided to make my shell figure out what architecture I’m using, then
update <code>$PATH</code> depending on whether I’m using x86 or ARM. To accomplish this,
I installed Fish (I use Fish shell) in both the x86 and ARM installations of
Homebrew:</p>
<pre><code>$ /opt/homebrew/bin/brew install fish
$ arch -x86_64 /usr/local/bin/brew install fish
</code></pre>
<p>If you’re not using Fish you don’t need to do this step. 😆</p>
<p>Next is the “interesting” part. In my <code>config.fish</code>, I added this:</p>
<pre><code>if test (arch) = "i386"
    set HOMEBREW_PREFIX /usr/local
else
    set HOMEBREW_PREFIX /opt/homebrew
end

# Add the Homebrew prefix to $PATH. -m flag ensures it's at the beginning
# of the path since the path might already be in $PATH (just not at the start)
fish_add_path -m --path $HOMEBREW_PREFIX/bin

alias intel 'arch -x86_64 /usr/local/bin/fish'
</code></pre>
<p>The <code>arch</code> command will tell you which architecture you’re on. If I’m on i386,
set the Homebrew prefix to <code>/usr/local</code>, otherwise set it to <code>/opt/homebrew</code>.
Then use <code>fish_add_path</code> to prepend the Homebrew prefix to my <code>$PATH</code> environment
variable. The <code>-m</code> switch moves the path to the front if <code>$PATH</code> already contained
the path I’m trying to add.</p>
<p>Finally, I added an alias, <code>intel</code>, that starts a new Fish shell under Rosetta.
So my default workflow is to open a terminal under ARM, and if I need to work
on an Intel project, just run <code>intel</code>.</p>
<h2 id="how-do-i-know-my-current-architecture">How do I know my current architecture?</h2>
<p>The <code>arch</code> command will tell you the current architecture, but I don’t want to
run that every time I want to verify my current architecture. My solution was
to add an emoji to my prompt. I don’t like adding more text to my prompt, but
this seems important enough to warrant the addition.</p>
<p>My <code>fish_prompt</code> function looks like this:</p>
<pre><code>function fish_prompt --description 'Write out the prompt'
    if not set -q __fish_prompt_normal
        set -g __fish_prompt_normal (set_color normal)
    end
    if not set -q __fish_prompt_cwd
        set -g __fish_prompt_cwd (set_color $fish_color_cwd)
    end
    if test (arch) = "i386"
        set emote 🧐
    else
        set emote 💪
    end
    echo -n -s "[$USER" @ (prompt_hostname) $emote ' ' "$__fish_prompt_cwd" (prompt_pwd) (__fish_vcs_prompt) "$__fish_prompt_normal" ']$ '
end
</code></pre>
<p>If I’m on ARM, the prompt will have a 💪 emoji, and if I’m on x86, the prompt
will have a 🧐 emoji.</p>
<p>Just to give an example, here is a sample session in my terminal:</p>
<pre><code>Last login: Fri Jan 7 12:37:59 on ttys001
Welcome to fish, the friendly interactive shell
[aaron@tc-lan-adapter💪 ~]$ which brew
/opt/homebrew/bin/brew
[aaron@tc-lan-adapter💪 ~]$ arch
arm64
[aaron@tc-lan-adapter💪 ~]$ intel
Welcome to fish, the friendly interactive shell
[aaron@tc-lan-adapter🧐 ~]$ which brew
/usr/local/bin/brew
[aaron@tc-lan-adapter🧐 ~]$ arch
i386
[aaron@tc-lan-adapter🧐 ~]$ exit
[aaron@tc-lan-adapter💪 ~]$ arch
arm64
[aaron@tc-lan-adapter💪 ~]$ which brew
/opt/homebrew/bin/brew
[aaron@tc-lan-adapter💪 ~]$
</code></pre>
<p>Now I can easily switch back and forth between x86 and ARM and my prompt tells
me which I’m using.</p>
<h2 id="ruby-with-chruby">Ruby with chruby</h2>
<p>My Ruby dev environment is still a work in progress. I use <a href="https://github.com/postmodern/chruby">chruby</a> for changing Ruby versions.
The problem is that all Ruby versions live in the same directory. chruby doesn’t
know the difference between ARM versions and x86 versions. So for now I’m adding
the architecture name to the directory:</p>
<pre><code>[aaron@tc-lan-adapter💪 ~]$ chruby
ruby-3.0.2
ruby-arm64
ruby-i386
ruby-trunk
[aaron@tc-lan-adapter💪 ~]$
</code></pre>
<p>So I have to be careful which Ruby I switch to. I’ve <a href="https://github.com/postmodern/ruby-install/issues/413">filed a ticket on ruby-install</a>, and I think we can make this nicer.</p>
<p>Specifically I’d like to add a subfolder in <code>~/.rubies</code> for each architecture,
then point chruby at the right subfolder depending on my current architecture.
Essentially the same trick I used for <code>$PATH</code> and Homebrew, but for pointing
chruby at the right place given my current architecture.</p>
<p>For now I just have to be careful though!</p>
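<p>Until chruby can pick the right folder on its own, one way to double-check which architecture the selected Ruby was built for is to ask Ruby itself. This is a minimal sketch, not part of chruby: <code>ruby_arch</code> is a hypothetical helper, and it assumes platform strings shaped like <code>arm64-darwin21</code> or <code>x86_64-darwin21</code>. A Ruby built under Rosetta should report <code>x86_64</code> here, even though the <code>arch</code> command prints <code>i386</code>.</p>
<pre><code>require "rbconfig"

# Hypothetical helper: classify a Ruby platform string by CPU family.
def ruby_arch(platform = RUBY_PLATFORM)
  case platform
  when /arm64|aarch64/ then :arm
  when /x86_64|i[3-6]86/ then :x86
  else :unknown
  end
end

puts RUBY_PLATFORM                # e.g. "arm64-darwin21"
puts RbConfig::CONFIG["host_cpu"] # CPU this Ruby was configured for
puts ruby_arch                    # prints arm, x86 or unknown
</code></pre>
<p>Running <code>ruby -e 'p RUBY_PLATFORM'</code> right after switching with chruby is a quick way to confirm the switch picked the intended build.</p>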
<p>One <em>huge</em> caveat for Fish users is that the current version of chruby-fish is
broken such that changes to <code>$PATH</code> end up getting lost (see <a href="https://github.com/JeanMertz/chruby-fish/issues/31">this issue</a>).</p>
<p>To work around that issue, I’m using @ioquatix’s fork of chruby-fish which can
be found <a href="https://github.com/ioquatix/chruby-fish">here</a>. I just checked out
that version of chruby-fish in my git project folder and added this to my <code>config.fish</code>:</p>
<pre><code># Point Fish at our local checkout of chruby-fish
set fish_function_path $fish_function_path $HOME/git/chruby-fish/share/fish/vendor_functions.d
</code></pre>
<h2 id="conclusion">Conclusion</h2>
<p>Getting a dev environment up and running with Rosetta wasn’t too bad, but I think
having the shell fix up <code>$PATH</code> is a better solution than having two copies of Terminal.app.</p>
<p>The scripts I presented here were all Fish specific, but I don’t think it should
be too hard to translate them to whatever shell you use.</p>
<p>Anyway, I hope you have a good weekend!</p>
http://tenderlovemaking.com/2021/10/26/publishing-gems-with-your-yubikey.html
Publishing Gems With Your YubiKey
2021-10-26T15:17:51-07:00
2021-10-26T15:17:51-07:00
<p>The recent <a href="https://github.com/faisalman/ua-parser-js/issues/536">compromise of <code>ua-parser-js</code></a> has put the security and trust of published packages at the top of my mind lately.
To mitigate the risk of any Ruby Gems I manage being hijacked, I enabled 2FA on my RubyGems.org account.
This means that whenever I publish a Ruby Gem, I have to enter a one time passcode.</p>
<p>I have to admit, I find this to be a pain. Whenever I do a release of Rails, I
have to enter a passcode over and over again because you can only push one Gem
at a time.</p>
<p>Finally I’ve found a way to deal with this. I can maintain account security
and also not be hassled with OTP codes again, thanks to my YubiKey.</p>
<p>This is just a short post about how to set up your YubiKey as an authenticator
for RubyGems.org, and how to publish Gems without getting an OTP prompt.</p>
<h2 id="install-ykman">Install <code>ykman</code></h2>
<p><code>ykman</code> is a command line utility for interacting with your YubiKey. I installed
it on my Mac with Homebrew:</p>
<pre><code>$ brew install ykman
</code></pre>
<h2 id="set-up-2fa-as-usual">Set up 2FA as usual</h2>
<p>If you already have 2FA enabled, you’ll have to temporarily disable it.</p>
<p>Just go through <a href="https://guides.rubygems.org/setting-up-multifactor-authentication/">the normal 2FA setup process</a> and when you’re presented with a QR code, you’ll use the text
key to configure your YubiKey.</p>
<p>Just do:</p>
<pre><code>$ ykman oath accounts add -t -o TOTP rubygems.org:youremail@example.org 123456
</code></pre>
<p>But use your email address, and replace 123456 with the text key (the secret shown alongside the QR code) that you got from RubyGems.org.
The <code>-t</code> flag will require you to touch the YubiKey when you want to generate an
OTP.</p>
<h2 id="generate-an-otp">Generate an OTP</h2>
<p>You can now generate an OTP like this:</p>
<pre><code>$ ykman oath accounts code -s rubygems.org
</code></pre>
<h2 id="publishing-a-gem-without-otp-prompts">Publishing a Gem without OTP Prompts</h2>
<p>You can supply an OTP code to the <code>gem</code> interface via an environment variable
or a command line argument.</p>
<p>The environment variable version is like this:</p>
<pre><code>$ GEM_HOST_OTP_CODE=$(ykman oath accounts code -s rubygems.org) gem push cool-gem-0.0.0.gem
</code></pre>
<p>The command line argument is like this:</p>
<pre><code>$ gem push cool-gem-0.0.0.gem --otp $(ykman oath accounts code -s rubygems.org)
</code></pre>
<p>I’ve used the environment variable version, but I haven’t tried the command line argument yet.</p>
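<p>Since a Rails release pushes many gems, the environment variable approach can be wrapped in a loop. This is a minimal sketch, not a feature of RubyGems or <code>ykman</code>: <code>push_gems</code> is a hypothetical helper, and it assumes <code>ykman</code> and <code>gem</code> are on your <code>$PATH</code>. It asks the YubiKey for a fresh code before each push, because a TOTP code is only valid for a short window (30 seconds by default), and with <code>-t</code> each code needs a touch.</p>
<pre><code>require "open3"

# Runs a command with extra environment variables and returns its stdout.
DEFAULT_RUNNER = lambda do |env, *cmd|
  out, status = Open3.capture2(env, *cmd)
  raise "command failed: #{cmd.join(' ')}" unless status.success?
  out
end

# Hypothetical helper: push each .gem file with a freshly generated OTP.
# `runner` is injectable so the logic can be exercised without a YubiKey.
def push_gems(gem_files, account: "rubygems.org", runner: DEFAULT_RUNNER)
  gem_files.each do |path|
    # `-s` prints only the code for the single matching account.
    otp = runner.call({}, "ykman", "oath", "accounts", "code", "-s", account).strip
    # gem push reads the OTP from GEM_HOST_OTP_CODE, so there is no prompt.
    puts runner.call({ "GEM_HOST_OTP_CODE" => otp }, "gem", "push", path)
  end
end

# Usage (needs a YubiKey and RubyGems credentials):
# push_gems(Dir["pkg/*.gem"])
</code></pre>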
<h2 id="final-thoughts">Final Thoughts</h2>
<p>I also did this for NPM, but I haven’t tried pushing a package yet so I’ll see how that goes.
I don’t really have any other thoughts except that everyone should enable 2FA so that we can prevent situations like <code>ua-parser-js</code>.
I’m not particularly interested in installing someone’s Bitcoin miner on my machine, and I’m also not interested in being hassled because my package was hijacked.</p>
<p>Everyone, please stay safe and enable 2FA!</p>
<p>--Aaron</p>
<p><3 <3</p>
http://tenderlovemaking.com/2021/02/03/debugging-a-segv-in-ruby.html
Debugging an Assertion Error in Ruby
2021-02-03T17:13:28-08:00
2021-02-03T17:13:28-08:00
<p>I hope nobody runs into a problem where they need the information in this post, but in case you do, I hope this post is helpful.
(I’m talking to you, future Aaron! lol)</p>
<p>I committed <a href="https://github.com/ruby/ruby/commit/1be84e53d76cff30ae371f0b397336dee934499d">a patch to Ruby</a> that caused the tests to start failing.
This was the patch:</p>
<pre><code>commit 1be84e53d76cff30ae371f0b397336dee934499d
Author: Aaron Patterson <tenderlove@ruby-lang.org>
Date:   Mon Feb 1 10:42:13 2021 -0800

    Don't pin `val` passed in to `rb_define_const`.

    The caller should be responsible for holding a pinned reference (if they
    need that)

diff --git a/variable.c b/variable.c
index 92d7d11eab..ff4f7964a7 100644
--- a/variable.c
+++ b/variable.c
@@ -3154,7 +3154,6 @@ rb_define_const(VALUE klass, const char *name, VALUE val)
     if (!rb_is_const_id(id)) {
         rb_warn("rb_define_const: invalid name `%s' for constant", name);
     }
-    rb_gc_register_mark_object(val);
     rb_const_set(klass, id, val);
 }
</code></pre>
<p>This patch is supposed to allow objects passed in to <code>rb_define_const</code> to move.
As the commit message says, the caller should be responsible for keeping the value pinned.
At the time I committed the patch, I thought that most callers of the function were marking the value passed in (as <code>val</code>), so we were pinning objects that something else would already pin.
In other words, this code was wasting GC time pinning objects that were already pinned.</p>
<p>Unfortunately the CI started to error shortly after I committed this patch.
Clearly the patch was related, but how?</p>
<p>In this post I am going to walk through the debugging tricks I used to find the error.</p>
<h2 id="reproduction">Reproduction</h2>
<p>I was able to reproduce the error on my Linux machine by running the same command CI ran.
Unfortunately since this bug is related to GC, the error was intermittent.
To reproduce it, I just ran the tests in a loop until the process crashed like this:</p>
<pre><code>$ while test $status -eq 0
      env RUBY_TESTOPTS='-q --tty=no' make -j16 -s check
  end
</code></pre>
<p>Before running this loop though, I made sure to do <code>ulimit -c unlimited</code> so that I would get a core file when the process crashed.</p>
<h2 id="the-error">The Error</h2>
<p>After the process crashed, the top of the error looked like this:</p>
<pre><code><OBJ_INFO:rb_ractor_confirm_belonging@./ractor_core.h:327> 0x000055be8657f180 [0 ] T_NONE
/home/aaron/git/ruby/lib/bundler/environment_preserver.rb:47: [BUG] id == 0 but not shareable
ruby 3.1.0dev (2021-02-03T17:35:37Z master 6b4814083b) [x86_64-linux]
</code></pre>
<p>The Ractor verification routines crashed the process because a <code>T_NONE</code> object is “not shareable”.
In other words, you can’t share an object of type <code>T_NONE</code> between Ractors.
This makes sense because <code>T_NONE</code> objects are actually empty slots in the GC.
If a Ractor, or any other Ruby code sees a <code>T_NONE</code> object, then it’s clearly an error.
Only the GC internals should ever be dealing with this type.</p>
<p>The top of the C backtrace looked like this:</p>
<pre><code>-- C level backtrace information -------------------------------------------
/home/aaron/git/ruby/ruby(rb_print_backtrace+0x14) [0x55be856e9816] vm_dump.c:758
/home/aaron/git/ruby/ruby(rb_vm_bugreport) vm_dump.c:1020
/home/aaron/git/ruby/ruby(bug_report_end+0x0) [0x55be854e2a69] error.c:778
/home/aaron/git/ruby/ruby(rb_bug_without_die) error.c:778
/home/aaron/git/ruby/ruby(rb_bug+0x7d) [0x55be854e2bb0] error.c:786
/home/aaron/git/ruby/ruby(rb_ractor_confirm_belonging+0x102) [0x55be856cf6e2] ./ractor_core.h:328
/home/aaron/git/ruby/ruby(vm_exec_core+0x4ff3) [0x55be856b0003] vm.inc:2224
/home/aaron/git/ruby/tool/lib/test/unit/parallel.rb(rb_vm_exec+0x886) [0x55be856c9946]
/home/aaron/git/ruby/ruby(load_iseq_eval+0xbb) [0x55be8554f66b] load.c:594
/home/aaron/git/ruby/ruby(require_internal+0x394) [0x55be8554e3e4] load.c:1065
/home/aaron/git/ruby/ruby(rb_require_string+0x973c4) [0x55be8554d8a4] load.c:1142
/home/aaron/git/ruby/ruby(rb_f_require) load.c:838
/home/aaron/git/ruby/ruby(vm_call_cfunc_with_frame+0x11a) [0x55be856dd6fa] ./vm_insnhelper.c:2897
/home/aaron/git/ruby/ruby(vm_call_method_each_type+0xaa) [0x55be856d4d3a] ./vm_insnhelper.c:3387
/home/aaron/git/ruby/ruby(vm_call_alias+0x87) [0x55be856d68e7] ./vm_insnhelper.c:3037
/home/aaron/git/ruby/ruby(vm_sendish+0x200) [0x55be856d08e0] ./vm_insnhelper.c:4498
</code></pre>
<p>The function <code>rb_ractor_confirm_belonging</code> is what detected the bad object and called <code>rb_bug</code>, which printed the report and aborted the process.</p>
<h2 id="debugging-the-core-file-with-lldb">Debugging the Core File with LLDB</h2>
<p>I usually use clang / lldb when debugging.
I’ve added scripts to Ruby’s lldb tools that let me track down problems more easily, so I prefer it over gcc / gdb.</p>
<p>First I inspected the backtrace in the core file:</p>
<pre><code>(lldb) target create "./ruby" --core "core.456156"
Core file '/home/aaron/git/ruby/core.456156' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'ruby', stop reason = signal SIGABRT
* frame #0: 0x00007fdc5fc8918b libc.so.6`raise + 203
frame #1: 0x00007fdc5fc68859 libc.so.6`abort + 299
frame #2: 0x000056362ac38bc6 ruby`die at error.c:765:5
frame #3: 0x000056362ac38bb5 ruby`rb_bug(fmt=<unavailable>) at error.c:788:5
frame #4: 0x000056362ae256e2 ruby`rb_ractor_confirm_belonging(obj=<unavailable>) at ractor_core.h:328:13
frame #5: 0x000056362ae06003 ruby`vm_exec_core(ec=<unavailable>, initial=<unavailable>) at vm.inc:2224:5
frame #6: 0x000056362ae1f946 ruby`rb_vm_exec(ec=<unavailable>, mjit_enable_p=<unavailable>) at vm.c:0
frame #7: 0x000056362aca566b ruby`load_iseq_eval(ec=0x000056362b176710, fname=0x000056362ce96660) at load.c:594:5
frame #8: 0x000056362aca43e4 ruby`require_internal(ec=<unavailable>, fname=<unavailable>, exception=1) at load.c:1065:21
frame #9: 0x000056362aca38a4 ruby`rb_f_require [inlined] rb_require_string(fname=0x00007fdc38033178) at load.c:1142:18
frame #10: 0x000056362aca3880 ruby`rb_f_require(obj=<unavailable>, fname=0x00007fdc38033178) at load.c:838
frame #11: 0x000056362ae336fa ruby`vm_call_cfunc_with_frame(ec=0x000056362b176710, reg_cfp=0x00007fdc5f958de0, calling=<unavailable>) at vm_insnhelper.c:2897:11
frame #12: 0x000056362ae2ad3a ruby`vm_call_method_each_type(ec=0x000056362b176710, cfp=0x00007fdc5f958de0, calling=0x00007ffe3b552128) at vm_insnhelper.c:3387:16
frame #13: 0x000056362ae2c8e7 ruby`vm_call_alias(ec=0x000056362b176710, cfp=0x00007fdc5f958de0, calling=0x00007ffe3b552128) at vm_insnhelper.c:3037:12
</code></pre>
<p>It’s very similar to the backtrace in the crash report.
The first thing that was interesting to me was frame 5 in <code>vm_exec_core</code>.
<code>vm_exec_core</code> is the main loop for the YARV VM.
This program was crashing when executing some kind of instruction in the virtual machine.</p>
<pre><code>(lldb) f 5
frame #5: 0x000056362ae06003 ruby`vm_exec_core(ec=<unavailable>, initial=<unavailable>) at vm.inc:2224:5
2221 /* ### Instruction trailers. ### */
2222 CHECK_VM_STACK_OVERFLOW_FOR_INSN(VM_REG_CFP, INSN_ATTR(retn));
2223 CHECK_CANARY(leaf, INSN_ATTR(bin));
-> 2224 PUSH(val);
2225 if (leaf) ADD_PC(INSN_ATTR(width));
2226 # undef INSN_ATTR
2227
(lldb)
</code></pre>
<p>Checking frame 5, we can see that it’s crashing when we <em>push</em> a value onto the stack.
The Ractor function checks the value of objects being pushed on the VM stack, and in this case we have an object that is a <code>T_NONE</code>.
The question is where did this value come from?</p>
<p>The crash happened in the file <code>vm.inc</code>, line 2224. This file is a generated
file, so I can’t link to it, but I wanted to know <em>which</em> instruction was being
executed, so I pulled up that file.</p>
<p>Line 2224 happened to be inside the <code>opt_send_without_block</code> instruction.
So something is calling a method, and the return value of the method is a <code>T_NONE</code> object.</p>
<p>But what method is being called, and on what object?</p>
<h2 id="finding-the-called-method">Finding the called method</h2>
<p>The value <code>ec</code>, or “Execution Context” contains information about the virtual machine at runtime.
On the <code>ec</code>, we can find the <code>cfp</code> or “Control Frame Pointer” which is a data structure representing the current executing stack frame.
In lldb, I could see that frame 7 had the <code>ec</code> available, so I went to that frame to look at the <code>cfp</code>:</p>
<pre><code>(lldb) f 7
frame #7: 0x000056362aca566b ruby`load_iseq_eval(ec=0x000056362b176710, fname=0x000056362ce96660) at load.c:594:5
591 rb_ast_dispose(ast);
592 }
593 rb_exec_event_hook_script_compiled(ec, iseq, Qnil);
-> 594 rb_iseq_eval(iseq);
595 }
596
597 static inline enum ruby_tag_type
(lldb) p *ec->cfp
(rb_control_frame_t) $1 = {
pc = 0x000056362c095d58
sp = 0x00007fdc5f859330
iseq = 0x000056362ca051f0
self = 0x000056362b1d92c0
ep = 0x00007fdc5f859328
block_code = 0x0000000000000000
__bp__ = 0x00007fdc5f859330
}
</code></pre>
<p>The control frame pointer has a pointer to the <code>iseq</code> or “Instruction Sequence” that is currently being executed.
It also has a <code>pc</code> or “Program Counter”, and the program counter usually points at the instruction that will be executed <em>next</em> (in other words, not the currently executing instruction).
The <code>iseq</code> also records the source location that corresponds to those instructions.</p>
<h2 id="getting-the-source-file">Getting the Source File</h2>
<p>If we examine the iseq structure, we can find the source location of the code that is currently being executed:</p>
<pre><code>(lldb) p ec->cfp->iseq->body->location
(rb_iseq_location_t) $4 = {
pathobj = 0x000056362ca06960
base_label = 0x000056362ce95a30
label = 0x000056362ce95a30
first_lineno = 0x0000000000000051
node_id = 137
code_location = {
beg_pos = (lineno = 40, column = 4)
end_pos = (lineno = 50, column = 7)
}
}
(lldb) command script import -r ~/git/ruby/misc/lldb_cruby.py
lldb scripts for ruby has been installed.
(lldb) rp 0x000056362ca06960
bits [ ]
T_STRING: [FROZEN] (const char [57]) $6 = "/home/aaron/git/ruby/lib/bundler/environment_preserver.rb"
(lldb)
</code></pre>
<p>The location info clearly shows us that the instructions are on line 40.
The <code>pathobj</code> member contains the file name, but it is stored as a Ruby string.
To print out the string, I imported the lldb CRuby extensions, then used the <code>rp</code> command and gave it the address of the path object.</p>
<p>From the output, we can see that the crash is in the “environment_preserver.rb” file, inside the instructions for the method that starts on line 40.
The crash isn’t on line 40 itself; that’s just where the method containing these instructions begins.</p>
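<p>One detail in the location dump is easy to misread: <code>first_lineno</code> printed as <code>0x51</code> (81), not 40. In this version of CRuby that field holds a Ruby <code>VALUE</code>, and CRuby stores small integers as tagged Fixnums: the integer is shifted left one bit and the low bit is set to 1. The helpers below are an illustrative sketch of that encoding; their names are hypothetical, while the corresponding C macros are <code>INT2FIX</code> and <code>FIX2LONG</code>.</p>
<pre><code># Sketch of CRuby's Fixnum tagging, using Ruby Integers to stand in for
# raw VALUE bits. INT2FIX(i) is (i << 1) | 1.
def int2fix(i)
  (i << 1) | 1
end

# FIX2LONG(v) undoes the tag with an arithmetic shift right by one.
def fix2int(value)
  raise ArgumentError, "not a Fixnum VALUE" unless (value & 1) == 1
  value >> 1
end

p fix2int(0x51)        # => 40
p int2fix(40).to_s(16) # => "51"
</code></pre>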
<p>Those instructions are <a href="https://github.com/ruby/ruby/blob/33d6e92e0c6eaf1308ce7108e653c53bb5fb106c/lib/bundler/environment_preserver.rb#L40-L50">this method</a>:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> <span class="keyword">def</span> <span class="function">replace_with_backup</span>
<span class="predefined-constant">ENV</span>.replace(backup) <span class="keyword">unless</span> <span class="constant">Gem</span>.win_platform?
<span class="comment"># Fallback logic for Windows below to workaround</span>
<span class="comment"># https://bugs.ruby-lang.org/issues/16798. Can be dropped once all</span>
<span class="comment"># supported rubies include the fix for that.</span>
<span class="predefined-constant">ENV</span>.clear
backup.each {|k, v| <span class="predefined-constant">ENV</span>[k] = v }
<span class="keyword">end</span>
</pre></div>
</div>
</div>
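<p>The same kind of location details can also be read from inside a running process with <code>RubyVM::InstructionSequence</code>, a CRuby-specific API, which can be handy for cross-checking what lldb reports. In this sketch the method is a stand-in defined only for the example, not Bundler’s.</p>
<pre><code># Stand-in method; only its location matters for this example.
def replace_with_backup
  :restored
end

iseq = RubyVM::InstructionSequence.of(method(:replace_with_backup))
puts iseq.path         # file that defines the method
puts iseq.first_lineno # line of the `def`, as a plain Integer here
puts iseq.label        # the method's name
</code></pre>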
<p>It’s still not clear which of these method calls is breaking.
In this function we have some method call that is returning a <code>T_NONE</code>.</p>
<h2 id="finding-the-method-call">Finding The Method Call</h2>
<p>To find the method call, I disassembled the instruction sequence and checked the program counter:</p>
<pre><code>(lldb) command script import -r misc/lldb_disasm.py
lldb Ruby disasm installed.
(lldb) rbdisasm ec->cfp->iseq
PC IDX insn_name(operands)
0x56362c095c20 0000 opt_getinlinecache( 6, (struct iseq_inline_cache_entry *)0x56362c095ee0 )
0x56362c095c38 0003 putobject( (VALUE)0x14 )
0x56362c095c48 0005 getconstant( ID: 0x807b )
0x56362c095c58 0007 opt_setinlinecache( (struct iseq_inline_cache_entry *)0x56362c095ee0 )
0x56362c095c68 0009 opt_send_without_block( (struct rb_call_data *)0x56362c095f20 )
0x56362c095c78 0011 branchif( 15 )
0x56362c095c88 0013 opt_getinlinecache( 6, (struct iseq_inline_cache_entry *)0x56362c095ef0 )
0x56362c095ca0 0016 putobject( (VALUE)0x14 )
0x56362c095cb0 0018 getconstant( ID: 0x370b )
0x56362c095cc0 0020 opt_setinlinecache( (struct iseq_inline_cache_entry *)0x56362c095ef0 )
0x56362c095cd0 0022 putself
0x56362c095cd8 0023 opt_send_without_block( (struct rb_call_data *)0x56362c095f30 )
0x56362c095ce8 0025 opt_send_without_block( (struct rb_call_data *)0x56362c095f40 )
0x56362c095cf8 0027 pop
0x56362c095d00 0028 opt_getinlinecache( 6, (struct iseq_inline_cache_entry *)0x56362c095f00 )
0x56362c095d18 0031 putobject( (VALUE)0x14 )
0x56362c095d28 0033 getconstant( ID: 0x370b )
0x56362c095d38 0035 opt_setinlinecache( (struct iseq_inline_cache_entry *)0x56362c095f00 )
0x56362c095d48 0037 opt_send_without_block( (struct rb_call_data *)0x56362c095f50 )
0x56362c095d58 0039 pop
0x56362c095d60 0040 putself
0x56362c095d68 0041 opt_send_without_block( (struct rb_call_data *)0x56362c095f60 )
0x56362c095d78 0043 send( (struct rb_call_data *)0x56362c095f70, (rb_iseq_t *)0x56362ca05178 )
0x56362c095d90 0046 leave
(lldb) p ec->cfp->pc
(const VALUE *) $9 = 0x000056362c095d58
</code></pre>
<p>First I loaded the disassembly helper script. It provides the <code>rbdisasm</code> function.
Then I used <code>rbdisasm</code> on the instruction sequence.
This printed out the instructions in mostly human readable form.
Printing the PC showed a value of <code>0x000056362c095d58</code>.
Looking at the PC list in the disassembly shows that <code>0x000056362c095d58</code> corresponds to a <code>pop</code> instruction.
But the PC always points at the <em>next</em> instruction that will execute, not the <em>currently</em> executing instruction.
The currently executing instruction is the one right before the PC.
In this case we can see it is <code>opt_send_without_block</code>, which lines up with the information we discovered from <code>vm.inc</code>.</p>
<p>This is the third-from-last method call in the method.
At <code>0041</code> there is another <code>opt_send_without_block</code>, and then at <code>0043</code> there is a generic <code>send</code> call.</p>
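<p>The “find the PC, then step back one instruction” lookup can also be scripted. The sketch below assumes <code>rbdisasm</code>-style lines that start with the PC address in hex, as in the listing above; <code>current_instruction</code> is a hypothetical helper, not part of Ruby’s lldb scripts.</p>
<pre><code># Hypothetical helper: given disassembly lines that begin with a hex PC and
# the saved value of cfp->pc, return the instruction that was executing,
# i.e. the one just before the instruction the PC points at.
def current_instruction(disasm_lines, pc)
  rows = disasm_lines.filter_map do |line|
    addr, idx, insn = line.strip.split(/\s+/, 3)
    next unless addr&.start_with?("0x") # skip headers and blank lines
    [Integer(addr), idx, insn]
  end
  next_index = rows.index { |addr, _, _| addr == pc }
  raise ArgumentError, "PC not found in listing" unless next_index
  raise ArgumentError, "PC points at the first instruction" if next_index.zero?
  rows[next_index - 1]
end

listing = <<~TXT
  PC             IDX  insn_name(operands)
  0x56362c095d48 0037 opt_send_without_block( (struct rb_call_data *)0x56362c095f50 )
  0x56362c095d58 0039 pop
  0x56362c095d60 0040 putself
TXT

addr, idx, insn = current_instruction(listing.lines, 0x000056362c095d58)
puts format("0x%x %s %s", addr, idx, insn)
# prints: 0x56362c095d48 0037 opt_send_without_block( (struct rb_call_data *)0x56362c095f50 )
</code></pre>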
<p>Looking at the Ruby code, from the bottom of the method, we see a call to <code>backup</code>.
It’s not a local variable, so it must be a method call.
The code calls <code>each</code> on that, and <code>each</code> takes a block.
These must correspond to the <code>opt_send_without_block</code> and the <code>send</code> at the end of the instruction sequence.
Our crash is happening just before these two, so it must be the call to <code>ENV.clear</code>.</p>
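<p>One way to sanity-check that reading is to compile an equivalent snippet and list its method calls in order. This sketch compiles the body of <code>replace_with_backup</code> as a standalone script with <code>RubyVM::InstructionSequence.compile</code> (so <code>backup</code> is still a method call, now on the top-level <code>self</code>) and keeps only the <code>send</code> and <code>opt_send_without_block</code> instructions from <code>#to_a</code>. The code is compiled but never run, and the exact instruction layout may differ between Ruby versions.</p>
<pre><code># The body of replace_with_backup, compiled but never executed.
src = <<~RUBY
  ENV.replace(backup) unless Gem.win_platform?
  ENV.clear
  backup.each {|k, v| ENV[k] = v }
RUBY

iseq = RubyVM::InstructionSequence.compile(src)

# The last element of #to_a is the instruction list. It also contains line
# numbers, labels and event symbols, so keep only instruction arrays.
SEND_INSNS = %i[send opt_send_without_block].freeze
calls = iseq.to_a.last
  .select { |insn| insn.is_a?(Array) && SEND_INSNS.include?(insn.first) }
  .map { |insn| insn[1][:mid] }

p calls
# => [:win_platform?, :backup, :replace, :clear, :backup, :each]
</code></pre>
<p><code>:clear</code> is the third call from the end, which lines up with the <code>opt_send_without_block</code> found just before the PC.</p>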
<p>If we read the implementation of <code>ENV.clear</code>, we can see that <a href="https://github.com/ruby/ruby/blob/986b38f301ff0f39961f03c568dd7498f48e9852/hash.c#L5860">it returns a global variable</a> called <code>envtbl</code>:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>VALUE
rb_env_clear(<span class="directive">void</span>)
{
    VALUE keys;
    <span class="predefined-type">long</span> i;
    keys = env_keys(TRUE);
    <span class="keyword">for</span> (i=<span class="integer">0</span>; i<RARRAY_LEN(keys); i++) {
        VALUE key = RARRAY_AREF(keys, i);
        <span class="directive">const</span> <span class="predefined-type">char</span> *nam = RSTRING_PTR(key);
        ruby_setenv(nam, <span class="integer">0</span>);
    }
    RB_GC_GUARD(keys);
    <span class="keyword">return</span> envtbl;
}
</pre></div>
</div>
</div>
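<p>From the Ruby side, this means <code>ENV.clear</code> returns the very object that the <code>ENV</code> constant refers to. A quick check; because it really clears the process environment, it saves a copy first and restores it afterwards, much like Bundler’s environment preserver does:</p>
<pre><code>saved = ENV.to_h           # plain Hash copy of the environment

result = ENV.clear         # rb_env_clear returns envtbl...
same = result.equal?(ENV)  # ...which is the object bound to ENV

ENV.replace(saved)         # put the environment back
p same # => true
</code></pre>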
<p>This object is allocated <a href="https://github.com/ruby/ruby/blob/986b38f301ff0f39961f03c568dd7498f48e9852/hash.c#L7162">here</a>:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> envtbl = rb_obj_alloc(rb_cObject);
</pre></div>
</div>
</div>
<p>And then it <a href="https://github.com/ruby/ruby/blob/986b38f301ff0f39961f03c568dd7498f48e9852/hash.c#L7218">calls <code>rb_define_global_const</code></a> to define the <code>ENV</code> constant as a global:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> <span class="comment">/*
* ENV is a Hash-like accessor for environment variables.
*
* See ENV (the class) for more details.
*/</span>
rb_define_global_const(<span class="string"><span class="delimiter">"</span><span class="content">ENV</span><span class="delimiter">"</span></span>, envtbl);
</pre></div>
</div>
</div>
<p>If we read <code>rb_define_global_const</code> we can see <a href="https://github.com/ruby/ruby/blob/986b38f301ff0f39961f03c568dd7498f48e9852/variable.c#L3161-L3165">that it just calls <code>rb_define_const</code></a>:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="directive">void</span>
rb_define_global_const(<span class="directive">const</span> <span class="predefined-type">char</span> *name, VALUE val)
{
    rb_define_const(rb_cObject, name, val);
}
</pre></div>
</div>
</div>
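<p>Before my patch, any object passed to <code>rb_define_const</code> would be pinned.
Once I removed the pinning, nothing stopped compaction from moving the <code>ENV</code> object, even though it must not move: the C global <code>envtbl</code> still held the object’s old address, so <code>ENV.clear</code> returned a slot that the GC had since turned into an empty <code>T_NONE</code>.</p>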
<p>I reverted that patch <a href="https://github.com/ruby/ruby/commit/33d6e92e0c6eaf1308ce7108e653c53bb5fb106c">here</a>, and then sent a pull request to make <code>rb_gc_register_mark_object</code> a little bit smarter <a href="https://github.com/ruby/ruby/pull/4152">here</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>TBH I don’t know what to conclude this with.
Debugging errors kind of sucks, but I hope that the LLDB scripts I wrote make it suck a little less.
Hope you’re having a good day!!!</p>
http://tenderlovemaking.com/2020/08/26/counting-write-barrier-unprotected-objects.html
Counting Write Barrier Unprotected Objects
2020-08-26T15:29:01-07:00
2020-08-26T15:29:01-07:00
<p>This is just a quick post mostly as a note to myself (because I forget the <code>jq</code> commands).
Ruby objects that are not protected with a write barrier must be examined on every minor GC.
That means that any objects in your system that live for a long time and <em>don’t</em> have write barrier protection will cause unnecessary overhead on every minor collection.</p>
<p>Heap dumps will tell you which objects have a write barrier.
In Rails apps I use a small script to get a dump of the heap after boot:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">'</span><span class="content">objspace</span><span class="delimiter">'</span></span>
require <span class="string"><span class="delimiter">'</span><span class="content">config/environment</span><span class="delimiter">'</span></span>
<span class="constant">GC</span>.start
<span class="constant">File</span>.open(<span class="string"><span class="delimiter">"</span><span class="content">heap.dump</span><span class="delimiter">"</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">wb</span><span class="delimiter">"</span></span>) <span class="keyword">do</span> |f|
<span class="constant">ObjectSpace</span>.dump_all(<span class="key">output</span>: f)
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>The <code>heap.dump</code> file will have a list of all of the objects in the heap.</p>
<p>Here is an example of an object <em>with</em> a write barrier:</p>
<pre><code>{"address":"0x7fec1b2ff940", "type":"IMEMO", "class":"0x7fec1b2ffd50", "imemo_type":"ment", "references":["0x7fec1b314908", "0x7fec1b2ffcd8"], "memsize":48, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}
</code></pre>
<p>Here is an example of an object <em>without</em> a write barrier:</p>
<pre><code>{"address":"0x7fec1b2ff760", "type":"ICLASS", "class":"0x7fec1a8c0f60", "references":["0x7fec1a8c9250", "0x7fec1b2fefe0"], "memsize":40}
</code></pre>
<p>Objects <em>with</em> a write barrier will have <code>"wb_protected":true</code> in their flags section.</p>
<p>I like to use <code>jq</code> to process heap dumps.
Here is a command to find all of the unprotected objects, group them by type, then count them up:</p>
<pre><code>$ jq 'select(.flags.wb_protected | not) | .type' heap.dump | sort | uniq -c | sort -n
1 "MATCH"
2 "ARRAY"
5 "ROOT"
9 "FILE"
323 "MODULE"
927 "ICLASS"
1631 "DATA"
</code></pre>
<p>All of the objects listed here will be examined on every minor GC.
If my Rails app is spending a lot of time in minor GCs, this is a good place to look.</p>
<p>Ruby 2.8 (or 3.0) will eliminate <code>ICLASS</code> from this list (<a href="https://github.com/ruby/ruby/commit/264e4cd04fbcdcb739a1ff9a84e19afe66005cb2">here is the commit</a>).</p>
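<p>If <code>jq</code> isn’t handy, the same tally can be done in Ruby. This is a sketch that assumes the one-JSON-object-per-line format that <code>ObjectSpace.dump_all</code> writes; <code>count_unprotected</code> is a hypothetical helper. Like the <code>jq</code> filter, it treats an entry with no <code>wb_protected</code> flag (such as the <code>ICLASS</code> example above) as unprotected.</p>
<pre><code>require "json"

# Hypothetical helper: count heap-dump entries without write-barrier
# protection, grouped by type, sorted ascending like `sort -n`.
def count_unprotected(lines)
  counts = Hash.new(0)
  lines.each do |line|
    next if line.strip.empty?
    obj = JSON.parse(line)
    next if obj.dig("flags", "wb_protected") # protected, skip it
    counts[obj["type"]] += 1
  end
  counts.sort_by { |_type, n| n }
end

# Usage against a real dump:
# File.open("heap.dump") do |f|
#   count_unprotected(f.each_line).each { |type, n| puts "#{n} #{type}" }
# end
</code></pre>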
http://tenderlovemaking.com/2020/01/13/guide-to-string-encoding-in-ruby.html
Guide to String Encoding in Ruby
2020-01-13T06:00:39-08:00
2020-01-13T06:00:39-08:00
<p>Encoding issues don’t seem to happen frequently, but that is a blessing and a curse.
It’s great not to have to fix them very often, but when you do need to fix them, lack
of experience can leave you feeling lost.</p>
<p>This post is meant to be a sort of guide about what to do when you encounter different types
of encoding errors in Ruby.
First we’ll cover what an encoding object is, then we’ll look at common encoding exceptions
and how to fix them.</p>
<h2 id="what-are-string-encodings-in-ruby">What are String encodings in Ruby?</h2>
<p>In Ruby, strings are a combination of an array of bytes, and an encoding object.
We can access the encoding object on the string by calling <code>encoding</code> on the
string object.</p>
<p>For example:</p>
<pre><code>>> x = 'Hello World'
>> x.encoding
=> #<Encoding:UTF-8>
</code></pre>
<p>In my environment, the default encoding object associated with a string is the “UTF-8” encoding object.
A graph of the object relationship looks something like this:</p>
<p><img src="/images/encoding1.png" alt="string points at encoding" /></p>
<h2 id="changing-a-strings-encoding">Changing a String’s Encoding</h2>
<p>We can change encoding by two different methods:</p>
<ul>
<li>String#force_encoding</li>
<li>String#encode</li>
</ul>
<p>The <code>force_encoding</code> method will mutate the string object and only change which encoding object the string points to.
It does nothing to the bytes of the string, it merely changes the encoding object associated with the string.
Here we can see that the return value of <code>encoding</code> changes after we call the <code>force_encoding</code> method:</p>
<pre><code>>> x = 'Hello World'
>> x.encoding
=> #<Encoding:UTF-8>
>> x.force_encoding "US-ASCII"
=> "Hello World"
>> x.encoding
=> #<Encoding:US-ASCII>
</code></pre>
<p>The <code>encode</code> method will create a new string based on the bytes of the old string and associate the encoding object with the new string.</p>
<p>Here we can see that the encoding of <code>x</code> remains the same, and
calling <code>encode</code> returns a new string <code>y</code> which is associated with the new encoding:</p>
<pre><code>>> x = 'Hello World'
>> x.encoding
=> #<Encoding:UTF-8>
>> y = x.encode("US-ASCII")
>> x.encoding
=> #<Encoding:UTF-8>
>> y.encoding
=> #<Encoding:US-ASCII>
</code></pre>
<p>Here is a visualization of the difference:</p>
<p><img src="/images/change_encoding.png" alt="changing encoding" /></p>
<p>Calling <code>force_encoding</code> mutates the original string, where <code>encode</code> creates a new string with a different encoding.
Translating a string from one encoding to another is probably the “normal” use of encodings.
However, developers will rarely call the <code>encode</code> method because Ruby will typically handle any necessary translations automatically.
It’s probably more common to call the <code>force_encoding</code> method, and that is because strings can be associated with the <em>wrong</em> encoding.</p>
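<p>The “Hello World” examples above have identical bytes in UTF-8 and US-ASCII, so the difference is easier to see with non-ASCII text. In this sketch, <code>encode</code> produces new bytes, while <code>force_encoding</code> keeps the bytes and only swaps the encoding object:</p>
<pre><code>utf8 = "日本" # source files default to UTF-8

# encode transcodes: new bytes, new encoding, original left untouched.
sjis = utf8.encode("Shift_JIS")
p utf8.bytes.map { |b| b.to_s(16) } # => ["e6", "97", "a5", "e6", "9c", "ac"]
p sjis.bytes.map { |b| b.to_s(16) } # => ["93", "fa", "96", "7b"]

# force_encoding relabels: same bytes, different encoding object.
relabeled = utf8.dup.force_encoding("Shift_JIS")
p relabeled.bytes == utf8.bytes     # => true
p relabeled.encoding                # => #<Encoding:Shift_JIS>
</code></pre>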
<h2 id="strings-can-have-the-wrong-encoding">Strings Can Have the Wrong Encoding</h2>
<p>Strings can be associated with the wrong encoding object, and that is the source of most if not all encoding related exceptions.
Let’s look at an example:</p>
<pre><code>>> x = "Hello \x93\xfa\x96\x7b"
>> x.encoding
=> #<Encoding:UTF-8>
>> x.valid_encoding?
=> false
</code></pre>
<p>In this case, Ruby associated the string <code>"Hello \x93\xfa\x96\x7b"</code> with the default encoding UTF-8.
However, several of the bytes in the string do not form valid UTF-8 sequences.
We can check whether the string’s bytes are valid for its associated encoding by calling the <code>valid_encoding?</code> method.
The <code>valid_encoding?</code> method will scan all bytes to see if they are valid for that particular encoding object.</p>
<p>So how do we fix this?
The answer depends on the situation.
We need to think about where the data came from and where the data is going.
Let’s say we’ll display this string on a webpage, but we do not know the correct encoding for the string.
In that case we probably want to make sure the string is valid UTF-8, but since we don’t know the correct encoding for the string, our only choice is to remove the bad bytes from the string.</p>
<p>We can remove the unknown bytes by using the <code>scrub</code> method:</p>
<pre><code>>> x = "Hello \x93\xfa\x96\x7b"
>> x.valid_encoding?
=> false
>> y = x.scrub
>> y
=> "Hello ���{"
>> y.encoding
=> #<Encoding:UTF-8>
>> y.valid_encoding?
=> true
</code></pre>
<p>The <code>scrub</code> method will return a new string associated with the same encoding, but with all of the invalid bytes replaced by a replacement character, the diamond question mark thing.</p>
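<p><code>scrub</code> also accepts a replacement string, or a block that receives each invalid byte sequence, which helps when you’d rather make the bad bytes visible than hide them. A short sketch with the same input:</p>
<pre><code>x = "Hello \x93\xfa\x96\x7b"

# Replace each invalid sequence with a fixed string.
p x.scrub("?") # => "Hello ???{"

# Or decide per sequence: here, show the offending bytes as hex.
p x.scrub { |bad| "<#{bad.unpack1('H*')}>" } # => "Hello <93><fa><96>{"
</code></pre>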
<p>What if we do know the encoding of the source string?
Actually, the example above uses a string that’s encoded using Shift JIS.
Let’s say we know the encoding, and we want to display the string on a webpage.
In that case we tag the string by using <code>force_encoding</code>, and transcode to UTF-8:</p>
<pre><code>>> x = "Hello \x93\xfa\x96\x7b"
>> x.force_encoding "Shift_JIS"
=> "Hello \x{93FA}\x{967B}"
>> x.valid_encoding?
=> true
>> x.encode "UTF-8" # display as UTF-8
=> "Hello 日本"
</code></pre>
<p>The most important thing to think about when dealing with encoding issues is “where did this data come from?” and “what will we do with this data?”
Answering those two questions will drive all decisions about which encoding to use with which string.</p>
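<p>Those two questions can be boiled down into a small helper. This sketch assumes that the caller knows, or knows that it doesn't know, where the bytes came from. <code>to_display</code> is a hypothetical name, not part of Ruby. With a known source encoding it tags and transcodes, which keeps every character. Without one it falls back to scrubbing.</p>

```ruby
# Hypothetical helper: turn raw bytes into a UTF-8 string for display.
# Known source encoding: tag and transcode (no data loss).
# Unknown source encoding: treat the bytes as UTF-8 and scrub them.
def to_display(bytes, source_encoding = nil)
  str = bytes.dup
  if source_encoding
    # Raises Encoding::InvalidByteSequenceError if the declared
    # encoding doesn't actually match the bytes
    str.force_encoding(source_encoding).encode("UTF-8")
  else
    str.force_encoding("UTF-8").scrub
  end
end

raw = "Hello \x93\xfa\x96\x7b".b
p to_display(raw, "Shift_JIS") # => "Hello 日本"
p to_display(raw)              # => "Hello ���{"
```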
<h2 id="encoding-depends-on-the-context">Encoding Depends on the Context</h2>
<p>Before we look at some common errors and their remediation, let’s look at one more example of how encoding depends on context.
We’re going to use our source data (some user input) in two places: as a cache key (via a checksum), and as something to display on a web page.</p>
<p>Here’s the code:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">digest/md5</span><span class="delimiter">"</span></span>
require <span class="string"><span class="delimiter">"</span><span class="content">cgi</span><span class="delimiter">"</span></span>
<span class="comment"># Make a checksum</span>
<span class="keyword">def</span> <span class="function">make_checksum</span> string
<span class="constant">Digest</span>::<span class="constant">MD5</span>.hexdigest string
<span class="keyword">end</span>
<span class="comment"># Not good HTML escaping (don't use this)</span>
<span class="comment"># Returns a string with UTF-8 compatible encoding for display on a webpage</span>
<span class="keyword">def</span> <span class="function">display_on_web</span> string
string.gsub(<span class="regexp"><span class="delimiter">/</span><span class="content">></span><span class="delimiter">/</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">&gt;</span><span class="delimiter">"</span></span>)
<span class="keyword">end</span>
<span class="comment"># User input from an unknown source</span>
x = <span class="string"><span class="delimiter">"</span><span class="content">Hello </span><span class="char">\x93</span><span class="char">\xfa</span><span class="char">\x96</span><span class="char">\x7b</span><span class="delimiter">"</span></span>
p <span class="key">ENCODING</span>: x.encoding
p <span class="key">VALID_ENCODING</span>: x.valid_encoding?
p display_on_web x
p make_checksum x
</pre></div>
</div>
</div>
<p>If we run this code, we’ll get an exception:</p>
<pre><code>$ ruby thing.rb
{:ENCODING=>#<Encoding:UTF-8>}
{:VALID_ENCODING=>false}
Traceback (most recent call last):
2: from thing.rb:20:in `<main>'
1: from thing.rb:12:in `display_on_web'
thing.rb:12:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)
</code></pre>
<p>The problem is that we have a string of unknown input containing bytes that are not valid UTF-8.
We know we want to display this string on a UTF-8 encoded webpage, so let’s scrub the string:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">digest/md5</span><span class="delimiter">"</span></span>
require <span class="string"><span class="delimiter">"</span><span class="content">cgi</span><span class="delimiter">"</span></span>
<span class="comment"># Make a checksum</span>
<span class="keyword">def</span> <span class="function">make_checksum</span> string
<span class="constant">Digest</span>::<span class="constant">MD5</span>.hexdigest string
<span class="keyword">end</span>
<span class="comment"># Not good HTML escaping (don't use this)</span>
<span class="comment"># Returns a string with UTF-8 compatible encoding for display on a webpage</span>
<span class="keyword">def</span> <span class="function">display_on_web</span> string
string.gsub(<span class="regexp"><span class="delimiter">/</span><span class="content">></span><span class="delimiter">/</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">&gt;</span><span class="delimiter">"</span></span>)
<span class="keyword">end</span>
<span class="comment"># User input from an unknown source</span>
x = <span class="string"><span class="delimiter">"</span><span class="content">Hello </span><span class="char">\x93</span><span class="char">\xfa</span><span class="char">\x96</span><span class="char">\x7b</span><span class="delimiter">"</span></span>.scrub
p <span class="key">ENCODING</span>: x.encoding
p <span class="key">VALID_ENCODING</span>: x.valid_encoding?
p display_on_web x
p make_checksum x
</pre></div>
</div>
</div>
<p>Now when we run the program, the output is like this:</p>
<pre><code>$ ruby thing.rb
{:ENCODING=>#<Encoding:UTF-8>}
{:VALID_ENCODING=>true}
"Hello ���{"
"4dab6f63b4d3ae3279345c9df31091eb"
</code></pre>
<p>Great! We’ve built some HTML and generated a checksum.
Unfortunately, there is a bug in this code (of course, the mere fact that we’ve written code means there’s a bug! lol).
Let’s introduce a second user input string with slightly different bytes than the first:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">digest/md5</span><span class="delimiter">"</span></span>
require <span class="string"><span class="delimiter">"</span><span class="content">cgi</span><span class="delimiter">"</span></span>
<span class="comment"># Make a checksum</span>
<span class="keyword">def</span> <span class="function">make_checksum</span> string
<span class="constant">Digest</span>::<span class="constant">MD5</span>.hexdigest string
<span class="keyword">end</span>
<span class="comment"># Not good HTML escaping (don't use this)</span>
<span class="comment"># Returns a string with UTF-8 compatible encoding for display on a webpage</span>
<span class="keyword">def</span> <span class="function">display_on_web</span> string
string.gsub(<span class="regexp"><span class="delimiter">/</span><span class="content">></span><span class="delimiter">/</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">&gt;</span><span class="delimiter">"</span></span>)
<span class="keyword">end</span>
<span class="comment"># User input from an unknown source</span>
x = <span class="string"><span class="delimiter">"</span><span class="content">Hello </span><span class="char">\x93</span><span class="char">\xfa</span><span class="char">\x96</span><span class="char">\x7b</span><span class="delimiter">"</span></span>.scrub
p <span class="key">ENCODING</span>: x.encoding
p <span class="key">VALID_ENCODING</span>: x.valid_encoding?
p display_on_web x
p make_checksum x
<span class="comment"># Second user input from an unknown source with slightly different bytes</span>
y = <span class="string"><span class="delimiter">"</span><span class="content">Hello </span><span class="char">\x94</span><span class="char">\xfa</span><span class="char">\x97</span><span class="char">\x7b</span><span class="delimiter">"</span></span>.scrub
p <span class="key">ENCODING</span>: y.encoding
p <span class="key">VALID_ENCODING</span>: y.valid_encoding?
p display_on_web y
p make_checksum y
</pre></div>
</div>
</div>
<p>Here is the output from the program:</p>
<pre><code>$ ruby thing.rb
{:ENCODING=>#<Encoding:UTF-8>}
{:VALID_ENCODING=>true}
"Hello ���{"
"4dab6f63b4d3ae3279345c9df31091eb"
{:ENCODING=>#<Encoding:UTF-8>}
{:VALID_ENCODING=>true}
"Hello ���{"
"4dab6f63b4d3ae3279345c9df31091eb"
</code></pre>
<p>The program works in the sense that there is no exception.
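But both user input strings have the same checksum despite the fact that the original strings clearly have different bytes!
That’s because <code>scrub</code> replaced the differing invalid bytes in both strings with identical replacement characters, so the checksum function never sees the difference.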
So what is the correct fix for this program?
Again, we need to think about the source of the data (where did it come from), as well as what we will do with it (where it is going).
In this case we have one source, from a user, and the user provided us with no encoding information.
In other words, the encoding information of the source data is unknown, so we can only treat it as a sequence of bytes.
We have two outputs: one is a UTF-8 HTML page, and the other is <em>the input</em> to our checksum function.
The HTML requires that our string be UTF-8 so making the string valid UTF-8, in other words “scrubbing” it, before displaying makes sense.
However, our checksum function requires seeing the original bytes of the string.
Since the checksum is only concerned with the bytes in the string, any encoding including an invalid encoding will work.
It’s nice to make sure all our strings have valid encodings though, so we’ll fix this example such that everything has a valid encoding.</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">digest/md5</span><span class="delimiter">"</span></span>
require <span class="string"><span class="delimiter">"</span><span class="content">cgi</span><span class="delimiter">"</span></span>
<span class="comment"># Make a checksum</span>
<span class="keyword">def</span> <span class="function">make_checksum</span> string
<span class="constant">Digest</span>::<span class="constant">MD5</span>.hexdigest string
<span class="keyword">end</span>
<span class="comment"># Not good HTML escaping (don't use this)</span>
<span class="comment"># Returns a string with UTF-8 compatible encoding for display on a webpage</span>
<span class="keyword">def</span> <span class="function">display_on_web</span> string
string.gsub(<span class="regexp"><span class="delimiter">/</span><span class="content">></span><span class="delimiter">/</span></span>, <span class="string"><span class="delimiter">"</span><span class="content">&gt;</span><span class="delimiter">"</span></span>)
<span class="keyword">end</span>
<span class="comment"># User input from an unknown source</span>
x = <span class="string"><span class="delimiter">"</span><span class="content">Hello </span><span class="char">\x93</span><span class="char">\xfa</span><span class="char">\x96</span><span class="char">\x7b</span><span class="delimiter">"</span></span>.b
p <span class="key">ENCODING</span>: x.encoding
p <span class="key">VALID_ENCODING</span>: x.valid_encoding?
p display_on_web x.encode(<span class="string"><span class="delimiter">"</span><span class="content">UTF-8</span><span class="delimiter">"</span></span>, <span class="key">undef</span>: <span class="symbol">:replace</span>)
p make_checksum x
<span class="comment"># Second user input from an unknown source with slightly different bytes</span>
y = <span class="string"><span class="delimiter">"</span><span class="content">Hello </span><span class="char">\x94</span><span class="char">\xfa</span><span class="char">\x97</span><span class="char">\x7b</span><span class="delimiter">"</span></span>.b
p <span class="key">ENCODING</span>: y.encoding
p <span class="key">VALID_ENCODING</span>: y.valid_encoding?
p display_on_web y.encode(<span class="string"><span class="delimiter">"</span><span class="content">UTF-8</span><span class="delimiter">"</span></span>, <span class="key">undef</span>: <span class="symbol">:replace</span>)
p make_checksum y
</pre></div>
</div>
</div>
<p>Here is the output of the program:</p>
<pre><code>$ ruby thing.rb
{:ENCODING=>#<Encoding:ASCII-8BIT>}
{:VALID_ENCODING=>true}
"Hello ���{"
"96cf6db2750fd4d2488fac57d8e4d45a"
{:ENCODING=>#<Encoding:ASCII-8BIT>}
{:VALID_ENCODING=>true}
"Hello ���{"
"b92854c0db4f2c2c20eff349a9a8e3a0"
</code></pre>
<p>To fix our program, we’ve changed a couple of things.
First we tagged the string of unknown encoding as “binary” by using the <code>.b</code> method.
The <code>.b</code> method returns a new string that is associated with the <code>ASCII-8BIT</code> encoding.
The name <code>ASCII-8BIT</code> is somewhat confusing because it has the word “ASCII” in it.
It’s better to think of this encoding as either “unknown” or “binary data”.
Unknown meaning we have some data that may have a valid encoding, but we don’t know what it is.
Or binary data, as in the bytes read from a JPEG file or some such binary format.
Anyway, we pass the binary string into the checksum function because the checksum only cares about the bytes in the string, not about the encoding.</p>
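<p>Here is a quick sketch of what <code>.b</code> does and doesn’t change (the variable names are just for illustration). The bytes stay exactly the same and only the encoding tag changes. The original string is left alone. <code>Encoding::BINARY</code> is another name for the same encoding object as <code>Encoding::ASCII_8BIT</code>.</p>

```ruby
x   = "Hello \x93\xfa\x96\x7b"   # tagged UTF-8, but not valid UTF-8
bin = x.b                        # new string, same bytes, tagged ASCII-8BIT

p bin.encoding == Encoding::ASCII_8BIT     # => true
p Encoding::BINARY == Encoding::ASCII_8BIT # => true (two names, one encoding)
p bin.bytes == x.bytes                     # => true (nothing was converted)
p x.encoding                               # => #<Encoding:UTF-8> (original untouched)
p bin.valid_encoding?                      # => true (any bytes are valid binary)
```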
<p>The second change we made is to call <code>encode</code> with the encoding we want (UTF-8) along with <code>undef: :replace</code>, meaning that any time Ruby encounters bytes it doesn’t know how to convert to the target encoding, it will replace them with the replacement character (the diamond question mark thing).</p>
<p>SIDE NOTE: This is probably not important, but it is fun!
We can specify what Ruby uses for replacing unknown bytes.
Here’s an example:</p>
<pre><code>>> x = "Hello \x94\xfa\x97\x7b".b
>> x.encoding
=> #<Encoding:ASCII-8BIT>
>> x.encode("UTF-8", undef: :replace, replace: "Aaron")
=> "Hello AaronAaronAaron{"
>> x.encode("UTF-8", undef: :replace, replace: "🤣")
=> "Hello 🤣🤣🤣{"
>> [_.encoding, _.valid_encoding?]
=> [#<Encoding:UTF-8>, true]
</code></pre>
<p>Now let’s take a look at some common encoding errors in Ruby and what to do about them.</p>
<h2 id="encodinginvalidbytesequenceerror"><code>Encoding::InvalidByteSequenceError</code></h2>
<p>This exception occurs when Ruby needs to examine the bytes in a string and the bytes do not match the encoding.
Here is an example of this exception:</p>
<pre><code>>> x = "Hello \x93\xfa\x96\x7b"
>> x.encode "UTF-16"
Traceback (most recent call last):
5: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `<main>'
4: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `load'
3: from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.7.0/gems/irb-1.2.0/exe/irb:11:in `<top (required)>'
2: from (irb):4
1: from (irb):4:in `encode'
Encoding::InvalidByteSequenceError ("\x93" on UTF-8)
>> x.encoding
=> #<Encoding:UTF-8>
>> x.valid_encoding?
=> false
</code></pre>
<p>The string <code>x</code> contains bytes that aren’t valid UTF-8, yet it is associated with the UTF-8 encoding object.
When we try to convert <code>x</code> to UTF-16, an exception occurs.</p>
<h3 id="how-to-fix-encodinginvalidbytesequenceerror">How to fix <code>Encoding::InvalidByteSequenceError</code></h3>
<p>As with most encoding issues, the real problem is that our string <code>x</code> is tagged with the wrong encoding.
The way to fix this issue is to tag the string with the correct encoding.
But what is the correct encoding?
To figure out the correct encoding, you need to know where the string came from.
For example, if the string came from a MIME attachment, the attachment should specify the encoding (or the RFC will tell you).</p>
<p>In this case, the string is a valid Shift JIS string, but I know that because I looked up the bytes and manually entered them.
So we’ll tag this as Shift JIS, and the exception goes away:</p>
<pre><code>>> x = "Hello \x93\xfa\x96\x7b"
>> x.force_encoding "Shift_JIS"
=> "Hello \x{93FA}\x{967B}"
>> x.encode "UTF-16"
=> "\uFEFFHello \u65E5\u672C"
>> x.encoding
=> #<Encoding:Shift_JIS>
>> x.valid_encoding?
=> true
</code></pre>
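<p>When the source does declare an encoding, tagging the string is mechanical. This sketch assumes the declaration arrives as a MIME-style Content-Type value like <code>text/plain; charset=Shift_JIS</code>. Both helper names are hypothetical. The regex is deliberately simplistic, and a real mail or HTTP library will parse headers more robustly.</p>

```ruby
# Hypothetical helper: pull the charset parameter out of a Content-Type
# value such as 'text/plain; charset="Shift_JIS"' (simplistic on purpose)
def charset_from_content_type(content_type)
  content_type[/charset="?([^";\s]+)"?/i, 1]
end

# Hypothetical helper: tag raw bytes with the charset their source declared.
# Encoding.find raises ArgumentError for names Ruby doesn't recognize.
def tag_with_declared_charset(bytes, content_type)
  charset = charset_from_content_type(content_type)
  return nil unless charset

  bytes.dup.force_encoding(Encoding.find(charset))
end

body = "Hello \x93\xfa\x96\x7b".b   # raw bytes, e.g. from an attachment
str  = tag_with_declared_charset(body, 'text/plain; charset="Shift_JIS"')

p str.encoding         # => #<Encoding:Shift_JIS>
p str.valid_encoding?  # => true
p str.encode("UTF-8")  # => "Hello 日本"
```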
<p>If you don’t know the source of the string, an alternative solution is to tag as UTF-8 and then scrub the bytes:</p>
<pre><code>>> x = "Hello \x93\xfa\x96\x7b"
>> x.force_encoding "UTF-8"
=> "Hello \x93\xFA\x96{"
>> x.scrub!
=> "Hello ���{"
>> x.encode "UTF-16"
=> "\uFEFFHello \uFFFD\uFFFD\uFFFD{"
>> x.encoding
=> #<Encoding:UTF-8>
>> x.valid_encoding?
=> true
</code></pre>
<p>Of course this works, but it means that you’ve lost data.
The best solution is to figure out what the encoding of the string <em>should</em> be depending on its source and tag it with the correct encoding.</p>
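<p><code>encode</code> can also do the scrubbing step itself. The <code>invalid: :replace</code> option swaps invalid bytes in the source for a replacement character instead of raising <code>Encoding::InvalidByteSequenceError</code>. This is a minimal sketch. It targets UTF-16LE instead of UTF-16 only so that the result has no byte order mark, and it throws away data exactly like <code>scrub</code> does.</p>

```ruby
x = "Hello \x93\xfa\x96\x7b"   # Shift JIS bytes, mistagged as UTF-8

# invalid: :replace substitutes a replacement character for invalid
# source bytes instead of raising Encoding::InvalidByteSequenceError
y = x.encode("UTF-16LE", invalid: :replace)

p y.encoding           # => #<Encoding:UTF-16LE>
p y.valid_encoding?    # => true
puts y.encode("UTF-8") # the invalid bytes now show up as U+FFFD
```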
<h2 id="encodingundefinedconversionerror"><code>Encoding::UndefinedConversionError</code></h2>
<p>This exception occurs when a string contains a character that has no representation in the encoding we’re converting it to.</p>
<p>Here is an example:</p>
<pre><code>>> x = "四\u2160"
>> x
=> "四Ⅰ"
>> x.encoding
=> #<Encoding:UTF-8>
>> x.valid_encoding?
=> true
>> x.encode "Shift_JIS"
Traceback (most recent call last):
5: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `<main>'
4: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `load'
3: from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.7.0/gems/irb-1.2.0/exe/irb:11:in `<top (required)>'
2: from (irb):23
1: from (irb):23:in `encode'
Encoding::UndefinedConversionError (U+2160 from UTF-8 to Shift_JIS)
</code></pre>
<p>In this example, we have two characters: “四”, and the Roman numeral 1 (“Ⅰ”).
Ruby’s Shift_JIS converter has no mapping for the Unicode Roman numeral 1, because that character isn’t part of the standard character set that Shift JIS encodes (it only shows up in vendor extensions such as Windows-31J).
Since there is no defined conversion, Ruby raises an exception.</p>
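<p>When this exception does show up, the exception object itself reports which character failed. This minimal sketch uses the <code>error_char</code>, <code>source_encoding_name</code> and <code>destination_encoding_name</code> methods that <code>Encoding::UndefinedConversionError</code> provides:</p>

```ruby
error =
  begin
    "四\u2160".encode("Shift_JIS")
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

p error.error_char                # => "Ⅰ" (the character with no mapping)
p error.source_encoding_name      # => "UTF-8"
p error.destination_encoding_name # => "Shift_JIS"
```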
<h3 id="how-to-fix-encodingundefinedconversionerror">How to fix <code>Encoding::UndefinedConversionError</code></h3>
<p>Our original string is correctly tagged as UTF-8, but we need to convert to Shift JIS.
In this case we’ll use a replacement character when converting to Shift JIS:</p>
<pre><code>>> x = "四\u2160"
>> y = x.encode("Shift_JIS", undef: :replace)
>> y
=> "\x{8E6C}?"
>> y.encoding
=> #<Encoding:Shift_JIS>
>> y.valid_encoding?
=> true
>> y.encode "UTF-8"
=> "四?"
</code></pre>
<p>We were able to convert to Shift JIS, but we did lose some data.</p>
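<p>If you know what a particular character <em>should</em> become, the <code>fallback:</code> option of <code>encode</code> lets you supply that instead of a generic <code>?</code>. This sketch assumes that “I” is an acceptable stand-in for the Roman numeral. The Hash keys are undefined characters in the source encoding, and any character missing from the Hash still raises.</p>

```ruby
x = "四\u2160"

# fallback: maps characters that have no Shift JIS equivalent
# to replacements we choose ourselves
y = x.encode("Shift_JIS", fallback: { "\u2160" => "I" })

p y.encoding        # => #<Encoding:Shift_JIS>
p y.encode("UTF-8") # => "四I"
```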
<h2 id="argumenterror"><code>ArgumentError</code></h2>
<p>When a string contains invalid bytes, sometimes Ruby will raise an <code>ArgumentError</code> exception:</p>
<pre><code>>> x = "Hello \x93\xfa\x96\x7b"
>> x.downcase
Traceback (most recent call last):
5: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `<main>'
4: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `load'
3: from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.7.0/gems/irb-1.2.0/exe/irb:11:in `<top (required)>'
2: from (irb):34
1: from (irb):34:in `downcase'
ArgumentError (input string invalid)
>> x.gsub(/ello/, "i")
Traceback (most recent call last):
6: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `<main>'
5: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `load'
4: from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.7.0/gems/irb-1.2.0/exe/irb:11:in `<top (required)>'
3: from (irb):34
2: from (irb):35:in `rescue in irb_binding'
1: from (irb):35:in `gsub'
ArgumentError (invalid byte sequence in UTF-8)
</code></pre>
<p>Again we use our incorrectly tagged Shift JIS string.
Calling either <code>downcase</code> or <code>gsub</code> results in an <code>ArgumentError</code>.
I personally think these exceptions are not great.
We didn’t pass anything to <code>downcase</code>, so why is it an <code>ArgumentError</code>?
There is nothing wrong with the arguments we passed to <code>gsub</code>, so why is it an <code>ArgumentError</code>?
Why does one say “input string invalid” where the other gives us a slightly more helpful exception of “invalid byte sequence in UTF-8”?
I think these should both result in <code>Encoding::InvalidByteSequenceError</code> exceptions, as it’s a problem with the encoding, not the arguments.</p>
<p>Regardless, these errors both stem from the fact that the Shift JIS string is incorrectly tagged as UTF-8.</p>
<h3 id="fixing-argumenterror">Fixing <code>ArgumentError</code></h3>
<p>Fixing this issue is just like fixing <code>Encoding::InvalidByteSequenceError</code>.
We need to figure out the correct encoding of the source string, then tag the source string with that encoding.
If the encoding of the source string is truly unknown, scrub it.</p>
<pre><code>>> x = "Hello \x93\xfa\x96\x7b"
>> x.force_encoding "Shift_JIS"
=> "Hello \x{93FA}\x{967B}"
>> x.downcase
=> "hello \x{93FA}\x{967B}"
>> x.gsub(/ello/, "i")
=> "Hi \x{93FA}\x{967B}"
</code></pre>
<h2 id="encodingcompatibilityerror"><code>Encoding::CompatibilityError</code></h2>
<p>This exception occurs when we try to combine strings of two different encodings and those encodings are incompatible.
For example:</p>
<pre><code>>> x = "四\u2160"
>> y = "Hello \x93\xfa\x96\x7b".force_encoding("Shift_JIS")
>> [x.encoding, x.valid_encoding?]
=> [#<Encoding:UTF-8>, true]
>> [y.encoding, y.valid_encoding?]
=> [#<Encoding:Shift_JIS>, true]
>> x + y
Traceback (most recent call last):
5: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `<main>'
4: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `load'
3: from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.7.0/gems/irb-1.2.0/exe/irb:11:in `<top (required)>'
2: from (irb):50
1: from (irb):50:in `+'
Encoding::CompatibilityError (incompatible character encodings: UTF-8 and Shift_JIS)
</code></pre>
<p>In this example we have a valid UTF-8 string and a valid Shift JIS string.
However, these two encodings are not compatible, so we get an exception when combining.</p>
<h3 id="fixing-encodingcompatibilityerror">Fixing <code>Encoding::CompatibilityError</code></h3>
<p>To fix this exception, we need to manually convert one string to a new string that has a compatible encoding.
In the case above, we can choose whether we want the output string to be UTF-8 or Shift JIS, and then call <code>encode</code> on the appropriate string.</p>
<p>In the case we want UTF-8 output, we can do this:</p>
<pre><code>>> x = "四"
>> y = "Hello \x93\xfa\x96\x7b".force_encoding("Shift_JIS")
>> x + y.encode("UTF-8")
=> "四Hello 日本"
>> [_.encoding, _.valid_encoding?]
=> [#<Encoding:UTF-8>, true]
</code></pre>
<p>If we wanted Shift JIS, we could do this:</p>
<pre><code>>> x = "四"
>> y = "Hello \x93\xfa\x96\x7b".force_encoding("Shift_JIS")
>> x.encode("Shift_JIS") + y
=> "\x{8E6C}Hello \x{93FA}\x{967B}"
>> [_.encoding, _.valid_encoding?]
=> [#<Encoding:Shift_JIS>, true]
</code></pre>
<p>Another possible solution is to scrub bytes and concatenate, but again that results in data loss.</p>
<h3 id="what-is-a-compatible-encoding">What is a compatible encoding?</h3>
<p>If there are incompatible encodings, there must be compatible encodings too (at least I would think that).
Here is an example of compatible encodings:</p>
<pre><code>>> x = "Hello World!".force_encoding "US-ASCII"
>> [x.encoding, x.valid_encoding?]
=> [#<Encoding:US-ASCII>, true]
>> y = "こんにちは"
>> [y.encoding, y.valid_encoding?]
=> [#<Encoding:UTF-8>, true]
>> y + x
=> "こんにちはHello World!"
>> [_.encoding, _.valid_encoding?]
=> [#<Encoding:UTF-8>, true]
>> x + y
=> "Hello World!こんにちは"
>> [_.encoding, _.valid_encoding?]
=> [#<Encoding:UTF-8>, true]
</code></pre>
<p>The <code>x</code> string is tagged with the US-ASCII encoding and the <code>y</code> string with UTF-8.
US-ASCII is a subset of UTF-8 (every valid US-ASCII string is also valid UTF-8), so even though these two strings have different encodings, concatenation works fine.</p>
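<p>Ruby can tell us ahead of time whether two strings can be combined. <code>Encoding.compatible?</code> returns the encoding that the concatenation would have, or <code>nil</code> if the strings are incompatible. Here is a small sketch comparing a compatible pair and an incompatible pair:</p>

```ruby
ascii = "Hello World!".encode("US-ASCII")
utf8  = "こんにちは"
sjis  = "日本".encode("Shift_JIS")

# Returns the encoding the concatenation would produce, or nil
p Encoding.compatible?(ascii, utf8)  # => #<Encoding:UTF-8>
p Encoding.compatible?(utf8, sjis)   # => nil
```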
<p>String literals may default to UTF-8, but some methods return US-ASCII encoded strings.
For example:</p>
<pre><code>>> require "digest/md5"
=> true
>> Digest::MD5.hexdigest("foo").encoding
=> #<Encoding:US-ASCII>
</code></pre>
<p>A hexdigest will only ever contain ASCII characters, so the implementation tags the returned string as US-ASCII.</p>
<h2 id="encoding-gotchas">Encoding Gotchas</h2>
<p>Let’s look at a couple of encoding gotchas.</p>
<h3 id="infectious-invalid-encodings">Infectious Invalid Encodings</h3>
<p>When a string is incorrectly tagged, Ruby will typically only raise an exception when it needs to actually examine the bytes.
Here is an example:</p>
<pre><code>>> x = "Hello \x93\xfa\x96\x7b"
>> x.encoding
=> #<Encoding:UTF-8>
>> x.valid_encoding?
=> false
>> x + "ほげ"
=> "Hello \x93\xFA\x96{ほげ"
>> y = _
>> y
=> "Hello \x93\xFA\x96{ほげ"
>> [y.encoding, y.valid_encoding?]
=> [#<Encoding:UTF-8>, false]
</code></pre>
<p>Again we have the incorrectly tagged Shift JIS string.
We’re able to append a correctly tagged UTF-8 string and no exception is raised.
Why is that?
Ruby assumes that if both strings have the same encoding, there is no reason to validate the bytes in either string, so it just appends them.
That means we can have an incorrectly tagged string “infect” what would otherwise be correctly tagged UTF-8 strings.
Say we have some code like this:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">def</span> <span class="function">append</span> string
string + <span class="string"><span class="delimiter">"</span><span class="content">ほげ</span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
p append(<span class="string"><span class="delimiter">"</span><span class="content">ほげ</span><span class="delimiter">"</span></span>).valid_encoding? <span class="comment"># => true</span>
p append(<span class="string"><span class="delimiter">"</span><span class="content">Hello </span><span class="char">\x93</span><span class="char">\xfa</span><span class="char">\x96</span><span class="char">\x7b</span><span class="delimiter">"</span></span>).valid_encoding? <span class="comment"># => false</span>
</pre></div>
</div>
</div>
<p>When debugging this code, we may be tempted to think the problem is in the <code>append</code> method.
But actually the issue is with <em>the caller</em>.
The caller is passing in incorrectly tagged strings, and unfortunately we might not get an exception until the return value of <code>append</code> is used somewhere far away.</p>
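<p>One way to stop the infection from spreading is to validate strings at the boundary where they come in. This sketch uses a hypothetical <code>append_checked</code> variant of <code>append</code> that refuses invalid input. The exception then points at the caller instead of at some far-away use of the result.</p>

```ruby
# Hypothetical variant of append that validates its input up front
def append_checked(string)
  unless string.valid_encoding?
    raise Encoding::InvalidByteSequenceError,
          "append_checked got invalid #{string.encoding} bytes: #{string.inspect}"
  end
  string + "ほげ"
end

p append_checked("ほげ")   # => "ほげほげ"

begin
  append_checked("Hello \x93\xfa\x96\x7b")
rescue Encoding::InvalidByteSequenceError => e
  puts e.message   # fails here, at the call site, not somewhere downstream
end
```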
<h3 id="ascii-8bit-is-special">ASCII-8BIT is Special</h3>
<p>Sometimes ASCII-8BIT is considered to be a “compatible” encoding and sometimes it isn’t.
Here is an example:</p>
<pre><code>>> x = "\x93\xfa\x96\x7b".b
>> x.encoding
=> #<Encoding:ASCII-8BIT>
>> y = "ほげ"
>> y + x
Traceback (most recent call last):
5: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `<main>'
4: from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:23:in `load'
3: from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/gems/2.7.0/gems/irb-1.2.0/exe/irb:11:in `<top (required)>'
2: from (irb):89
1: from (irb):89:in `+'
Encoding::CompatibilityError (incompatible character encodings: UTF-8 and ASCII-8BIT)
</code></pre>
<p>Here we have a binary string stored in <code>x</code>.
Maybe it came from a JPEG file or something (it didn’t, I just typed it in!).
When we try to concatenate the binary string with the UTF-8 string, we get an exception.
But this may actually be an exception we want!
It doesn’t make sense to be concatenating some JPEG data with an actual string we want to view, so it’s <em>good</em> we got an exception here.</p>
<p>Now here is the same code, but with the contents of <code>x</code> changed somewhat:</p>
<pre><code>>> x = "Hello World".b
>> x.encoding
=> #<Encoding:ASCII-8BIT>
>> y = "ほげ"
>> y + x
=> "ほげHello World"
</code></pre>
<p>We have the same code with the same encodings at play.
The only thing that changed is the actual contents of the <code>x</code> string.</p>
<p>When Ruby concatenates an ASCII-8BIT string with a string in another encoding, it examines the contents of the ASCII-8BIT string.
If all of its bytes are ASCII characters, Ruby treats it like a US-ASCII string and considers it to be “compatible”.
If the string contains non-ASCII bytes, Ruby considers it to be incompatible.</p>
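<p>The rule is easy to check with <code>ascii_only?</code>. In this sketch, <code>concat_outcome</code> is a hypothetical helper that reports either the encoding of the concatenation or the exception it raised:</p>

```ruby
# Hypothetical helper: the encoding of a + b, or the error it raises
def concat_outcome(a, b)
  (a + b).encoding.name
rescue Encoding::CompatibilityError
  "Encoding::CompatibilityError"
end

utf8        = "ほげ"
ascii_bytes = "Hello World".b        # binary, but every byte is ASCII
high_bytes  = "\x93\xfa\x96\x7b".b   # binary, with non-ASCII bytes

p ascii_bytes.ascii_only?             # => true
p concat_outcome(utf8, ascii_bytes)   # => "UTF-8"
p high_bytes.ascii_only?              # => false
p concat_outcome(utf8, high_bytes)    # => "Encoding::CompatibilityError"
```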
<p>This means that if you had read some data from your JPEG, and that data happened to all be ASCII characters, you would not get an exception even though maybe you really wanted one.</p>
<p>In my personal opinion, concatenating an ASCII-8BIT string with anything besides another ASCII-8BIT string should be an exception.</p>
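<p>If you share that opinion, you can enforce it in your own code. This sketch uses a hypothetical <code>strict_concat</code> helper. Ruby itself does not behave this way.</p>

```ruby
# Hypothetical helper: binary (ASCII-8BIT) strings may only be joined with
# other binary strings, no matter which bytes they happen to contain
def strict_concat(a, b)
  a_binary = a.encoding == Encoding::BINARY
  b_binary = b.encoding == Encoding::BINARY
  if a_binary != b_binary
    raise Encoding::CompatibilityError,
          "refusing to concatenate #{a.encoding} and #{b.encoding}"
  end
  a + b
end

p strict_concat("ab".b, "cd".b)   # => "abcd"
p strict_concat("ほげ", "Hello")   # => "ほげHello"

begin
  strict_concat("ほげ", "Hello World".b)   # plain + would allow this
rescue Encoding::CompatibilityError => e
  puts e.message   # names the two encodings that were mixed
end
```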
<p>Anyway, this is all I feel like writing today. I hope you have a good day, and remember to check your encodings!</p>
http://tenderlovemaking.com/2019/10/12/my-career-goals.html
My Career Goals
2019-10-12T12:40:41-07:00
2019-10-12T12:40:41-07:00
<p>I was going to tweet about this, but then I thought I’d have to make a bunch of tweets, and writing a blurgh post just seemed easier.
Plus I don’t really have any puns in this post, so I can’t tweet it!</p>
<h2 id="my-career-goals">My Career Goals</h2>
<p>I think many people aren’t sure what they want to do in their career.
When I first started programming, I wasn’t sure what I wanted to do with my career.
But after years of experience, my career aspirations have become crystal clear.
I would like my job to be:</p>
<ul>
<li>Improving Ruby and Rails internals</li>
<li>Teaching people</li>
</ul>
<h2 id="improving-ruby-and-rails-internals">Improving Ruby and Rails internals</h2>
<p>I got my first job programming in 1999.
At that time, I didn’t know I wanted to be a programmer, it was just a way for me to pay for school.
It turned out that I was pretty good at programming, so I decided that would be my career.
To be honest, at that time I didn’t really love programming.
I just found that I was good at it, and I could make decent money.
In 2005 I found Ruby and Rails and that’s when I actually learned that I love programming.
I loved Ruby so much that I learned Japanese so I could read blog posts about Ruby.
14 years later, I can easily read those blog posts, but I don’t actually need them. Oops!</p>
<p>The reason I want to work on Ruby and Rails internals is that I want the language and framework to be performant, stable, and easy to use.
I want Ruby and Rails to be a great choice for people to use in production.
I want others to experience the same joy I felt writing Ruby, and I want to make sure there are businesses that will employ those people.</p>
<h2 id="teaching-people">Teaching People</h2>
<p>I love to teach people things I know.
I also love learning new things.
As I hack on language and framework internals, I try to take that knowledge and disseminate it to as many people as I can.</p>
<p>Why?</p>
<p>First, I don’t think people can feel the joy of programming in Ruby/Rails unless they know how to actually program with Ruby/Rails.
So I’m happy to help new folks get into the language and framework.</p>
<p>Second, I realize I’m not going to be around forever, and I want to make sure that these technologies will outlive me.
If these technologies are going to survive into the future, people need to understand how they work.
Simply put: it’s an insurance policy for the future.</p>
<p>Third, it’s just fun.</p>
<h2 id="summary">Summary</h2>
<p>My dream job is to hack Ruby/Rails internals and teach people everything I know.
Doing it is fun for me, and it’s the best way I can use my skills to make a real impact on the world.</p>
<p>The End.</p>
http://tenderlovemaking.com/2019/09/03/esp8266-and-plantower-particle-sensor.html
ESP8266 and Plantower Particle Sensor
2019-09-03T08:20:26-07:00
2019-09-03T08:20:26-07:00
<p>Since forest fires have started to become a normal thing in the PNW, I’ve gotten interested in monitoring the air quality in and around my house.
I found some sensors that measure PM2.5 (particulate matter 2.5 micrometers or smaller in diameter), a standard measure of air quality.
The sensor I’m using is a <a href="https://www.aliexpress.com/item/32834164058.html">PMS5003</a>, and you can see the data sheet for it <a href="http://www.aqmd.gov/docs/default-source/aq-spec/resources-page/plantower-pms5003-manual_v2-3.pdf">here</a>.
I like this sensor because it supports UART, so I was able to hook it to an <a href="https://www.aliexpress.com/item/2035873939.html">FTDI</a> and read data directly from my computer.
I wanted to log the data, so I hooked it up to a Raspberry Pi.
However, I decided I’d like to measure the air quality in my office, a second room in the house, and also outside.
Buying a Raspberry Pi for every sensor I purchase seems a little unreasonable, so I investigated a different solution.
I settled on the <a href="https://www.aliexpress.com/item/32279043338.html">ESP8266 E-01</a>.
This part can connect to wifi, knows how to speak UART, and is powerful enough to program directly.
My plan was to read data from the sensor, then broadcast the data via UDP and have a central Raspberry Pi collect the data and report on it.
Unfortunately, this plan has taken me many months to execute, so I’m going to write down the stuff I wish I had known when getting started.</p>
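<p>The sensor sends its readings over UART as fixed-size frames. The sketch below decodes one 32-byte frame in Ruby. It assumes the layout described in the PMS5003 datasheet linked above:</p>
<ul>
<li>two start bytes (0x42 0x4D)</li>
<li>a 16-bit frame length of 28</li>
<li>thirteen big-endian 16-bit data words</li>
<li>a 16-bit checksum equal to the sum of all preceding bytes</li>
</ul>
<p>Double-check the field order against the datasheet before trusting the numbers. The field names are made up, and reading the bytes off the serial port is left out.</p>

```ruby
# Names for the 13 big-endian 16-bit data words, in frame order (layout per
# the PMS5003 datasheet; the symbol names are made up for this sketch)
PMS5003_FIELDS = %i[
  pm1_0_cf1 pm2_5_cf1 pm10_cf1
  pm1_0_atm pm2_5_atm pm10_atm
  particles_0_3um particles_0_5um particles_1_0um
  particles_2_5um particles_5_0um particles_10um
  reserved
].freeze

PMS5003_FRAME_SIZE = 32

# Decode one frame that has already been read from the serial port
def parse_pms5003(frame)
  frame = frame.b
  unless frame.bytesize == PMS5003_FRAME_SIZE
    raise ArgumentError, "expected #{PMS5003_FRAME_SIZE} bytes, got #{frame.bytesize}"
  end
  unless frame.getbyte(0) == 0x42 && frame.getbyte(1) == 0x4D
    raise ArgumentError, "frame doesn't start with 0x42 0x4D"
  end

  # Skip the two start bytes, then: frame length, 13 data words, checksum
  length, *data, checksum = frame.unpack("x2 n n13 n")
  raise ArgumentError, "unexpected frame length #{length}" unless length == 28

  # The checksum is the 16-bit sum of every byte before the checksum field
  expected = frame.bytes.first(30).sum & 0xFFFF
  raise ArgumentError, "checksum mismatch" unless checksum == expected

  PMS5003_FIELDS.zip(data).to_h
end

# Build a fake frame to exercise the parser (data values 1..13 are made up)
body    = [0x42, 0x4D, 28, *(1..13)].pack("C2 n n13")
frame   = body + [body.bytes.sum & 0xFFFF].pack("n")
reading = parse_pms5003(frame)
p reading[:pm2_5_atm]   # => 5
```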
<h2 id="parts">Parts</h2>
<p>Here are the parts I used:</p>
<ul>
<li><a href="https://www.aliexpress.com/item/32279043338.html">ESP8266</a></li>
<li><a href="https://www.aliexpress.com/item/32834164058.html">Plantower PMS5003</a></li>
<li><a href="https://www.amazon.com/gp/product/B01G6HK3KW/">ESP8266 Breadboard Adapter</a></li>
<li><a href="https://www.amazon.com/gp/product/B07KF119YB/">ESP8266 Programmer</a></li>
</ul>
<h2 id="wiring">Wiring</h2>
<p>Basically, I just hooked the ESP8266’s TX / RX pins up to the Plantower sensor (crossed over, TX to RX and RX to TX, as usual for UART) and pulled the CHPD and RST pins high.</p>
<h2 id="challenges-with-the-esp8266">Challenges with the ESP8266</h2>
<p>Now I’m basically going to complain about this chip, and then I’ll post the code I used.</p>
<p>The first issue I ran into is that I’m not sure what to call this thing, so searching the internet became a challenge.
It seems that “ESP8266” refers to the chip, but E-01 refers to the package?
I’m still not actually sure.
It seems there are several boards that have an ESP8266 mounted on them, but searching for ESP8266 with E01 seemed to work.</p>
<p>The second issue is programming the chip.
I prefer to use C when developing for embedded systems, but no matter how hard I tried, I could not get the native toolchain running on macOS.
Finally I gave up and just used the Arduino toolchain.
Somehow, you can write programs for the ESP8266 in Arduino, but doing it in C seems impossible (on Mac anyway).</p>
<p>Building a circuit to program the chip seems impossible.
I found some schematics online for building a programmer, but I couldn’t get anything to work.
Instead, I ended up buying a <a href="https://www.amazon.com/gp/product/B07KF119YB/">dedicated programmer</a>, and it seems to work well.</p>
<p>Power requirements are extremely finicky.
The chip wants 3.3 V and, at times, up to 400 mA.
If either of these requirements isn’t met, the chip won’t work.
Sometimes the chip wouldn’t do anything.
Sometimes it would start, but when it tried to do wifi it would just restart.
I ended up connecting a dedicated power supply to get the right power requirements.</p>
<p>The ESP8266 E-01 is not breadboard friendly.
I ended up buying some <a href="https://www.amazon.com/gp/product/B01G6HK3KW/">breadboard adapters</a> so I could prototype.</p>
<p>CHPD and RST need to be pulled HIGH for the chip to boot.
This got me for a long time.
I was able to program the chip with the programmer, but as soon as I moved it to the breadboard, nothing worked.
In order to get the chip to actually boot, both CHPD and RST need to be pulled high.</p>
<p>The air quality sensor runs on 5 V.
This isn’t too much of a problem; it’s just kind of annoying that I really have to use two different voltages for this task.</p>
<h2 id="picture">Picture</h2>
<p>Here is a picture of the breadboard setup I have now:</p>
<p><img src="/images/plantower-sensor.jpg" alt="ESP8266 and Plantower on a Breadboard" height="400" /></p>
<p>The blue box on the right is the air quality sensor, in the middle on the breadboard is the ESP8266, and up top is the power supply.</p>
<h2 id="code">Code</h2>
<p>Here is the Arduino code I used:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="preprocessor">#include</span> <span class="include"><ESP8266WiFi.h></span>
<span class="preprocessor">#include</span> <span class="include"><WiFiUdp.h></span>
<span class="preprocessor">#include</span> <span class="include"><ESP8266WiFiMulti.h></span>
<span class="preprocessor">#include</span> <span class="include"><base64.h></span>
<span class="preprocessor">#ifndef</span> STASSID
<span class="preprocessor">#define</span> STASSID <span class="string"><span class="delimiter">"</span><span class="content">WifiAPName</span><span class="delimiter">"</span></span>
<span class="preprocessor">#define</span> STAPSK <span class="string"><span class="delimiter">"</span><span class="content">WifiPassword</span><span class="delimiter">"</span></span>
<span class="preprocessor">#endif</span>
<span class="directive">const</span> <span class="predefined-type">char</span>* ssid = STASSID;
<span class="directive">const</span> <span class="predefined-type">char</span>* password = STAPSK;
ESP8266WiFiMulti WiFiMulti;
WiFiUDP udp;
<span class="comment">// 224.0.0.1 is the "all hosts" multicast group on the local network</span>
IPAddress broadcastIp(<span class="integer">224</span>, <span class="integer">0</span>, <span class="integer">0</span>, <span class="integer">1</span>);
byte inputString[<span class="integer">32</span>];
<span class="predefined-type">int</span> i = <span class="integer">0</span>;
<span class="predefined-type">int</span> recordId = <span class="integer">0</span>;
<span class="directive">void</span> setup() {
Serial.begin(<span class="integer">9600</span>);
WiFi.mode(WIFI_STA);
WiFiMulti.addAP(ssid, password);
<span class="keyword">while</span> (WiFiMulti.run() != WL_CONNECTED) {
delay(<span class="integer">500</span>);
}
delay(<span class="integer">500</span>);
}
<span class="directive">void</span> loop() {
<span class="keyword">while</span> (Serial.available()) {
inputString[i] = Serial.read();
i++;
<span class="keyword">if</span> (i == <span class="integer">2</span>) { <span class="comment">// Check for start of packet</span>
<span class="keyword">if</span> (!(inputString[<span class="integer">0</span>] == <span class="hex">0x42</span> && inputString[<span class="integer">1</span>] == <span class="hex">0x4d</span>)) {
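<span class="comment">// Not a header. Keep the second byte if it could start one, so a</span>
<span class="comment">// header that begins on the second byte isn't skipped.</span>
<span class="keyword">if</span> (inputString[<span class="integer">1</span>] == <span class="hex">0x42</span>) {
inputString[<span class="integer">0</span>] = <span class="hex">0x42</span>;
i = <span class="integer">1</span>;
} <span class="keyword">else</span> {
i = <span class="integer">0</span>;
}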
}
}
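<span class="comment">// A complete PMS5003 frame is 32 bytes: header, length, 13 data words, checksum</span>
<span class="keyword">if</span> (i == <span class="integer">32</span>) {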
i = <span class="integer">0</span>;
String encoded = base64::encode(inputString, <span class="integer">32</span>);
udp.beginPacketMulticast(broadcastIp, <span class="integer">9000</span>, WiFi.localIP());
udp.print(<span class="string"><span class="delimiter">"</span><span class="content">[</span><span class="char">\"</span><span class="content">aq</span><span class="char">\"</span><span class="content">,{</span><span class="char">\"</span><span class="content">mac</span><span class="char">\"</span><span class="content">:</span><span class="char">\"</span><span class="delimiter">"</span></span>);
udp.print(WiFi.macAddress());
udp.print(<span class="string"><span class="delimiter">"</span><span class="char">\"</span><span class="content">,</span><span class="char">\"</span><span class="content">record_id</span><span class="char">\"</span><span class="content">:</span><span class="delimiter">"</span></span>);
udp.print(recordId);
udp.print(<span class="string"><span class="delimiter">"</span><span class="content">,</span><span class="char">\"</span><span class="content">record</span><span class="char">\"</span><span class="content">:</span><span class="char">\"</span><span class="delimiter">"</span></span>);
udp.print(encoded);
udp.print(<span class="string"><span class="delimiter">"</span><span class="char">\"</span><span class="content">}]</span><span class="delimiter">"</span></span>);
udp.endPacket();
recordId++;
}
}
}
</pre></div>
</div>
</div>
<p>I haven’t added checksum verification to this code (the Ruby client below checks it instead), but it seems to work fine.
Basically it reads 32-byte frames from the AQ sensor, Base64 encodes each one, then multicasts it as JSON over UDP to 224.0.0.1 on my network.</p>
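<p>To make the framing logic in <code>loop()</code> easier to follow, here is a rough Ruby port of it. This is a hypothetical sketch for illustration, not code from the repository: <code>FrameAssembler</code> and <code>push</code> are made-up names, and it assumes the same 32-byte frame and JSON message layout the Arduino code produces.</p>

```ruby
require "json"

# Hypothetical Ruby port of the Arduino loop(), for illustration only.
class FrameAssembler
  FRAME_LEN = 32
  HEADER = [0x42, 0x4d].freeze # every PMS5003 frame starts with "BM"

  def initialize(mac)
    @mac = mac
    @buf = []
    @record_id = 0
  end

  # Feed one byte from the sensor. Returns a JSON message once a full
  # 32-byte frame has been collected, otherwise nil.
  def push(byte)
    @buf << byte
    if @buf.size == 2 && @buf != HEADER
      # Not a header: keep the second byte only if it could start one.
      @buf = @buf.last == HEADER.first ? [@buf.last] : []
    end
    return nil unless @buf.size == FRAME_LEN

    encoded = [@buf.pack("C*")].pack("m0") # strict Base64, no line breaks
    @buf = []
    message = ["aq", { "mac" => @mac, "record_id" => @record_id, "record" => encoded }]
    @record_id += 1
    JSON.generate(message)
  end
end

# One fake frame (header, length 28, then 28 zero bytes) after a stray byte.
frame = [0x42, 0x4d, 0x00, 0x1c] + Array.new(28, 0)
assembler = FrameAssembler.new("AA:BB:CC:DD:EE:FF")
messages = ([0x13] + frame).filter_map { |byte| assembler.push(byte) }
puts messages.first
```

<p>Note that this port keeps the second byte of a failed header check when it is <code>0x42</code>, so a header that begins one byte “out of phase” is still picked up.</p>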
<p>Here is the client code:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">socket</span><span class="delimiter">"</span></span>
require <span class="string"><span class="delimiter">"</span><span class="content">ipaddr</span><span class="delimiter">"</span></span>
require <span class="string"><span class="delimiter">"</span><span class="content">json</span><span class="delimiter">"</span></span>
<span class="constant">MULTICAST_ADDR</span> = <span class="string"><span class="delimiter">"</span><span class="content">224.0.0.1</span><span class="delimiter">"</span></span>
<span class="constant">BIND_ADDR</span> = <span class="string"><span class="delimiter">"</span><span class="content">0.0.0.0</span><span class="delimiter">"</span></span>
<span class="constant">PORT</span> = <span class="integer">9000</span>
if_addr = <span class="constant">Socket</span>.getifaddrs.find { |s| s.addr.ipv4? && !s.addr.ipv4_loopback? }
p if_addr.addr.ip_address
socket = <span class="constant">UDPSocket</span>.new
membership = <span class="constant">IPAddr</span>.new(<span class="constant">MULTICAST_ADDR</span>).hton + <span class="constant">IPAddr</span>.new(<span class="constant">BIND_ADDR</span>).hton
socket.setsockopt(<span class="symbol">:IPPROTO_IP</span>, <span class="symbol">:IP_ADD_MEMBERSHIP</span>, membership)
socket.setsockopt(<span class="symbol">:IPPROTO_IP</span>, <span class="symbol">:IP_MULTICAST_TTL</span>, <span class="integer">1</span>)
socket.setsockopt(<span class="symbol">:SOL_SOCKET</span>, <span class="symbol">:SO_REUSEPORT</span>, <span class="integer">1</span>)
socket.bind(<span class="constant">BIND_ADDR</span>, <span class="constant">PORT</span>)
<span class="keyword">class</span> <span class="class">Sample</span> < <span class="constant">Struct</span>.new(<span class="symbol">:time</span>,
<span class="symbol">:pm1_0_standard</span>, <span class="symbol">:pm2_5_standard</span>, <span class="symbol">:pm10_standard</span>,
<span class="comment"># Data 4-6 are PM1.0, PM2.5, and PM10 "under atmospheric environment" (ug/m3)</span>
<span class="symbol">:pm1_0_env</span>, <span class="symbol">:pm2_5_env</span>,
<span class="symbol">:pm10_env</span>,
<span class="comment"># These fields are "number of particles beyond N um</span>
<span class="comment"># per 0.1L of air". The N in each field name is the</span>
<span class="comment"># diameter multiplied by 10, so 03um == "number of</span>
<span class="comment"># particles beyond 0.3um in 0.1L of air"</span>
<span class="symbol">:particle_03um</span>, <span class="symbol">:particle_05um</span>, <span class="symbol">:particle_10um</span>,
<span class="symbol">:particle_25um</span>, <span class="symbol">:particle_50um</span>, <span class="symbol">:particle_100um</span>)
<span class="keyword">end</span>
loop <span class="keyword">do</span>
m, _ = socket.recvfrom(<span class="integer">2000</span>)
<span class="comment"># Each message looks like ["aq", {"mac": ..., "record_id": ..., "record": ...}]</span>
record = <span class="constant">JSON</span>.parse(m)[<span class="integer">1</span>]
<span class="comment"># Base64-decode the raw 32-byte PMS5003 frame</span>
data = record[<span class="string"><span class="delimiter">"</span><span class="content">record</span><span class="delimiter">"</span></span>].unpack(<span class="string"><span class="delimiter">"</span><span class="content">m0</span><span class="delimiter">"</span></span>).first
<span class="comment"># 2 start bytes, the frame length (28), 13 data words, then the checksum</span>
unpack = data.unpack(<span class="string"><span class="delimiter">'</span><span class="content">CCnn14</span><span class="delimiter">'</span></span>)
<span class="comment"># The checksum is the sum of every byte before it: start bytes, length, and data</span>
crc = <span class="integer">0x42</span> + <span class="integer">0x4d</span> + <span class="integer">28</span> + data.bytes.drop(<span class="integer">4</span>).first(<span class="integer">26</span>).inject(<span class="symbol">:+</span>)
<span class="keyword">if</span> crc == unpack.last
p <span class="constant">Sample</span>.new(<span class="constant">Time</span>.now.utc, *unpack.drop(<span class="integer">3</span>).first(<span class="integer">12</span>))
<span class="keyword">end</span>
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>This code joins the multicast group, listens for incoming packets, verifies each frame’s checksum, and prints the decoded sample.</p>
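<p>If you don’t have a sensor handy, you can exercise the same decoding steps with a synthetic frame. This is a hypothetical sketch: <code>build_frame</code> and <code>decode</code> are made-up helpers, and it assumes the PMS5003 frame layout from the data sheet (two start bytes, a 2-byte length of 28, thirteen big-endian 16-bit data words, and a 16-bit checksum equal to the sum of all preceding bytes).</p>

```ruby
# Hypothetical helpers for testing the client's decoding without hardware.

# Build a valid 32-byte PMS5003-style frame from 13 data words.
def build_frame(words)
  raise ArgumentError, "need 13 data words" unless words.size == 13

  body = [0x42, 0x4d, 28, *words].pack("CCnn13") # 30 bytes
  body + [body.bytes.sum].pack("n")               # append 16-bit checksum
end

# Decode a frame the same way the client does; nil if the checksum fails.
def decode(frame)
  fields = frame.unpack("CCnn14")
  # Equivalent to the client's 0x42 + 0x4d + 28 + data bytes, because the
  # length field's two bytes (0x00, 0x1c) also sum to 28.
  return nil unless frame.bytes.first(30).sum == fields.last

  fields.drop(3).first(12)
end

words = [5, 8, 9, 5, 8, 9, 900, 260, 40, 3, 1, 0, 0]
frame = build_frame(words)
p decode(frame) # => [5, 8, 9, 5, 8, 9, 900, 260, 40, 3, 1, 0]

corrupted = frame.dup
corrupted.setbyte(10, corrupted.getbyte(10) ^ 0xff) # flip bits in a data byte
p decode(corrupted) # => nil
```

<p>The decoded values can be splatted into the client’s <code>Sample</code> struct the same way, e.g. <code>Sample.new(Time.now.utc, *decode(frame))</code>.</p>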
<p>I’ve posted the code <a href="https://github.com/tenderlove/esp8266aq">here</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This is what I did over the long weekend!
Since the AQ sensor only uses the RX and TX pins on the ESP8266, it means I’ve got at least two more GPIO pins left.
Next I’ll add a temperature and humidity sensor, then make something a bit more permanent.</p>
http://tenderlovemaking.com/2019/06/26/instance-variable-performance.html
Instance Variable Performance
2019-06-26T08:14:14-07:00
2019-06-26T08:14:14-07:00
<p>Let’s start today’s post with a weird Ruby benchmark:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">benchmark/ips</span><span class="delimiter">"</span></span>
<span class="keyword">class</span> <span class="class">Foo</span>
<span class="keyword">def</span> <span class="function">initialize</span> forward
forward ? go_forward : go_backward
<span class="keyword">end</span>
ivars = (<span class="string"><span class="delimiter">"</span><span class="content">a</span><span class="delimiter">"</span></span>..<span class="string"><span class="delimiter">"</span><span class="content">zz</span><span class="delimiter">"</span></span>).map { |name| <span class="string"><span class="delimiter">"</span><span class="content">@</span><span class="inline"><span class="inline-delimiter">#{</span>name<span class="inline-delimiter">}</span></span><span class="content"> = 5</span><span class="delimiter">"</span></span> }
<span class="comment"># define the go_forward method</span>
eval <span class="string"><span class="delimiter">"</span><span class="content">def go_forward; </span><span class="inline"><span class="inline-delimiter">#{</span>ivars.join(<span class="string"><span class="delimiter">"</span><span class="content">; </span><span class="delimiter">"</span></span>)<span class="inline-delimiter">}</span></span><span class="content"> end</span><span class="delimiter">"</span></span>
<span class="comment"># define the go_backward method</span>
eval <span class="string"><span class="delimiter">"</span><span class="content">def go_backward; </span><span class="inline"><span class="inline-delimiter">#{</span>ivars.reverse.join(<span class="string"><span class="delimiter">"</span><span class="content">; </span><span class="delimiter">"</span></span>)<span class="inline-delimiter">}</span></span><span class="content"> end</span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
<span class="comment"># Heat</span>
<span class="constant">Foo</span>.new <span class="predefined-constant">true</span>
<span class="constant">Foo</span>.new <span class="predefined-constant">false</span>
<span class="constant">Benchmark</span>.ips <span class="keyword">do</span> |x|
x.report(<span class="string"><span class="delimiter">"</span><span class="content">backward</span><span class="delimiter">"</span></span>) { <span class="integer">5000</span>.times { <span class="constant">Foo</span>.new <span class="predefined-constant">false</span> } }
x.report(<span class="string"><span class="delimiter">"</span><span class="content">forward</span><span class="delimiter">"</span></span>) { <span class="integer">5000</span>.times { <span class="constant">Foo</span>.new <span class="predefined-constant">true</span> } }
<span class="keyword">end</span>
</pre></div>
</div>
</div>
<p>This code defines a class that sets a bunch of instance variables, but the order
that the instance variables are set depends on the parameter passed to the
constructor. When we pass <code>true</code>, it sets instance variables “a” through
“zz” (702 of them), and when we pass <code>false</code> it sets them from “zz” back to “a”.</p>
<p>Here’s the result of the benchmark on my machine:</p>
<pre><code>$ ruby weird_bench.rb
Warming up --------------------------------------
backward 3.000 i/100ms
forward 2.000 i/100ms
Calculating -------------------------------------
backward 38.491 (±10.4%) i/s - 192.000 in 5.042515s
forward 23.038 (± 8.7%) i/s - 114.000 in 5.004367s
</code></pre>
<p>For some reason, defining the instance variables backwards is faster than
defining the instance variables forwards. In this post we’ll discuss why. But
for now, just know that if you want performant code, always define your instance
variables backwards (just kidding, don’t do that).</p>
<h2 id="how-are-instance-variables-stored">How Are Instance Variables Stored?</h2>
<p>In Ruby (specifically MRI), object instances point at an array, and instance
variables are stored in that array. Of course, we refer to instance variables
by names, not by array indexes, so Ruby keeps a map of “names to indexes” which
is stored <em>on the class</em> of the object.</p>
<p>Let’s say we have some code like this:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">class</span> <span class="class">Foo</span>
<span class="keyword">def</span> <span class="function">initialize</span>
<span class="instance-variable">@a</span> = <span class="string"><span class="delimiter">"</span><span class="content">foo</span><span class="delimiter">"</span></span>
<span class="instance-variable">@b</span> = <span class="string"><span class="delimiter">"</span><span class="content">bar</span><span class="delimiter">"</span></span>
<span class="instance-variable">@c</span> = <span class="string"><span class="delimiter">"</span><span class="content">baz</span><span class="delimiter">"</span></span>
<span class="instance-variable">@d</span> = <span class="string"><span class="delimiter">"</span><span class="content">hoge</span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
<span class="keyword">end</span>
<span class="constant">Foo</span>.new
</pre></div>
</div>
</div>
<p>Internally, the object relationship will look something like this:</p>
<p><img src="/images/ivar_rel.png" alt="Instance Variable Relationship" height="400" /></p>
<p>The class points at a map of “names to indexes” called the “IV Index Table”.
The IV Index Table contains the names of the instance variables along with the
index of where to find that instance variable.</p>
<p>The instance points at the class, and also points at an array that contains the
actual values of the instance variables.</p>
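<p>Here is a toy Ruby model of that relationship, purely for illustration. <code>ToyClass</code>, <code>ToyObject</code>, <code>ivar_set</code>, and <code>ivar_get</code> are hypothetical names, and MRI implements all of this in C; the model only mirrors the split described above, where the class owns the “name to index” table and each instance owns its values array.</p>

```ruby
# Hypothetical toy model of MRI's instance variable layout (not MRI's code).

# The class owns the "IV Index Table": a name => array index map.
class ToyClass
  attr_reader :iv_index_tbl

  def initialize
    @iv_index_tbl = {} # empty until the first ivar is set
  end
end

# Each instance points at its class and owns an array of values.
class ToyObject
  attr_reader :klass, :ivptr

  def initialize(klass)
    @klass = klass
    @ivptr = []
  end

  def ivar_set(name, value)
    table = klass.iv_index_tbl
    # The first time we see this name, give it the next free index.
    index = table.fetch(name) { table[name] = table.size }
    ivptr[index] = value
  end

  def ivar_get(name)
    index = klass.iv_index_tbl[name] # hash lookup: name => index
    index && ivptr[index]            # array access: index => value
  end
end

foo_class = ToyClass.new
first = ToyObject.new(foo_class)
%i[@a @b @c @d].zip(%w[foo bar baz hoge]).each { |name, v| first.ivar_set(name, v) }

second = ToyObject.new(foo_class)
second.ivar_set(:@a, "x") # reuses index 0 from the shared table

p foo_class.iv_index_tbl.keys # => [:@a, :@b, :@c, :@d]
p first.ivptr                 # => ["foo", "bar", "baz", "hoge"]
p second.ivar_get(:@a)        # => "x"
```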
<p>Why go to all this trouble to map instance variable names to array offsets? The
reason is that it is much faster to access an array element than look up
something from a hash. We do have to do a hash lookup to find the array
element, but instance variables have their own <a href="/2015/12/23/inline-caching-in-mri.html">inline cache</a>,
so the lookup doesn’t occur very often.</p>
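<p>As a rough sketch of why the hash lookup is rare: each instance variable read site can remember the last class it saw and the index it found, and only consult the table when the class changes. This is a toy, not MRI’s implementation (MRI of this era caches a class serial number and the index on the <code>getinstancevariable</code> instruction); <code>ClassStub</code>, <code>IvarCallSite</code>, and <code>read</code> are hypothetical names.</p>

```ruby
# Hypothetical toy inline cache for one instance variable read site.

# Stands in for a class and its name => index table.
class ClassStub
  attr_reader :iv_index_tbl

  def initialize(iv_index_tbl)
    @iv_index_tbl = iv_index_tbl
  end
end

class IvarCallSite
  attr_reader :table_lookups

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_index = nil
    @table_lookups = 0
  end

  # `klass` is the instance's class; `values` is the instance's ivar array.
  def read(klass, values)
    unless klass.equal?(@cached_class) # cache miss: different (or no) class
      @table_lookups += 1
      @cached_index = klass.iv_index_tbl.fetch(@name)
      @cached_class = klass
    end
    values[@cached_index] # cache hit path: plain array access
  end
end

foo = ClassStub.new({ :@a => 0, :@b => 1 })
site = IvarCallSite.new(:@b)

# 1,000 instances of the same class, each with its own values array.
instances = Array.new(1000) { |i| ["foo", i] }
total = instances.sum { |values| site.read(foo, values) }

p total              # => 499500
p site.table_lookups # => 1
```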
<h2 id="setting-instance-variables-in-slow-motion">Setting Instance Variables in Slow Motion</h2>
<p>I want to walk through exactly what happens when instance variables are set, but
we’re going to do it twice. We’ll use the code below:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">class</span> <span class="class">Foo</span>
<span class="keyword">def</span> <span class="function">initialize</span>
<span class="instance-variable">@a</span> = <span class="string"><span class="delimiter">"</span><span class="content">foo</span><span class="delimiter">"</span></span>
<span class="instance-variable">@b</span> = <span class="string"><span class="delimiter">"</span><span class="content">bar</span><span class="delimiter">"</span></span>
<span class="instance-variable">@c</span> = <span class="string"><span class="delimiter">"</span><span class="content">baz</span><span class="delimiter">"</span></span>
<span class="instance-variable">@d</span> = <span class="string"><span class="delimiter">"</span><span class="content">hoge</span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
<span class="keyword">end</span>
<span class="constant">Foo</span>.new
<span class="constant">Foo</span>.new
</pre></div>
</div>
</div>
<p>Ruby creates the instance variable index table lazily, so it doesn’t actually
exist until the first time the code executes. The following GIF shows the
execution flow for the first time <code>Foo.new</code> is called:</p>
<p><img src="/images/ivar_first_time.gif" alt="Ivar Execution" /></p>
<p>The first time <code>initialize</code> is executed, the <code>Foo</code> class doesn’t have an
instance variable index table associated with it, so when the first instance
variable <code>@a</code> is set, we create a new index table, then set <code>@a</code> to be index 0,
then set the value “foo” in the instance variable array at index 0.</p>
<p>When we see instance variable <code>@b</code>, it doesn’t have an entry in the index table,
so we add a new entry that points to position 1, then set position 1 in the
array to “bar”.</p>
<p>This process repeats for each of the instance variables in the method.</p>
<p>Now let’s look at what happens the second time we call <code>Foo.new</code>:</p>
<p><img src="/images/ivar_second_time.gif" alt="Ivar Execution Second Time" /></p>
<p>This time, the class already has an instance variable index table associated
with it. When the instance variable <code>@a</code> is set, it exists in the index table
with position 0, so we store “foo” at position 0 in the instance variable list.</p>
<p>When we see instance variable <code>@b</code>, it already has an entry in the index table
with position 1, so we store “bar” at position 1 in the instance variable list.</p>
<p>This process repeats for each of the variables in the method.</p>
<p>We can actually observe the lazy creation of the index table by using
<code>ObjectSpace.memsize_of</code>:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">objspace</span><span class="delimiter">"</span></span>
<span class="keyword">class</span> <span class="class">Foo</span>
<span class="keyword">def</span> <span class="function">initialize</span>
<span class="instance-variable">@a</span> = <span class="string"><span class="delimiter">"</span><span class="content">foo</span><span class="delimiter">"</span></span>
<span class="instance-variable">@b</span> = <span class="string"><span class="delimiter">"</span><span class="content">bar</span><span class="delimiter">"</span></span>
<span class="instance-variable">@c</span> = <span class="string"><span class="delimiter">"</span><span class="content">baz</span><span class="delimiter">"</span></span>
<span class="instance-variable">@d</span> = <span class="string"><span class="delimiter">"</span><span class="content">hoge</span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
<span class="keyword">end</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="constant">Foo</span>) <span class="comment"># => 520</span>
<span class="constant">Foo</span>.new
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="constant">Foo</span>) <span class="comment"># => 672</span>
<span class="constant">Foo</span>.new
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="constant">Foo</span>) <span class="comment"># => 672</span>
</pre></div>
</div>
</div>
<p>The size of <code>Foo</code> is smaller before we instantiate our first instance, but
remains the same size after subsequent allocations. Neat!</p>
<p>Let’s do one more example, but with the following code:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">class</span> <span class="class">Foo</span>
<span class="keyword">def</span> <span class="function">initialize</span> init_all
<span class="keyword">if</span> init_all
<span class="instance-variable">@a</span> = <span class="string"><span class="delimiter">"</span><span class="content">foo</span><span class="delimiter">"</span></span>
<span class="instance-variable">@b</span> = <span class="string"><span class="delimiter">"</span><span class="content">bar</span><span class="delimiter">"</span></span>
<span class="instance-variable">@c</span> = <span class="string"><span class="delimiter">"</span><span class="content">baz</span><span class="delimiter">"</span></span>
<span class="instance-variable">@d</span> = <span class="string"><span class="delimiter">"</span><span class="content">hoge</span><span class="delimiter">"</span></span>
<span class="keyword">else</span>
<span class="instance-variable">@c</span> = <span class="string"><span class="delimiter">"</span><span class="content">baz</span><span class="delimiter">"</span></span>
<span class="instance-variable">@d</span> = <span class="string"><span class="delimiter">"</span><span class="content">hoge</span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
<span class="keyword">end</span>
<span class="keyword">end</span>
<span class="constant">Foo</span>.new <span class="predefined-constant">true</span>
<span class="constant">Foo</span>.new <span class="predefined-constant">false</span>
</pre></div>
</div>
</div>
<p>After the first call of <code>Foo.new true</code>, the <code>Foo</code> class will have an instance
variable index table just like the previous examples. <code>@a</code> will be associated
with position 0, <code>@b</code> with position 1, and so on. But what happens on the
second allocation at <code>Foo.new false</code>?</p>
<p><img src="/images/ivar_third_time.gif" alt="Ivar Execution Third Time" /></p>
<p>In this case, we already have an index table associated with the class, but <code>@c</code>
is associated with position 2 in the instance variable array, so we have to
expand the array, leaving positions 0 and 1 unset (internally Ruby sets them to
<code>Qundef</code>). Then <code>@d</code> is associated with position 3, and it is set as usual.</p>
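<p>A minimal sketch of that expansion, assuming the table already holds <code>@a</code> through <code>@d</code> from the <code>Foo.new true</code> call. <code>UNDEF</code> is a hypothetical stand-in for MRI’s internal <code>Qundef</code>, which Ruby code never sees, and <code>set_ivar</code> / <code>defined_ivars</code> are made-up helpers. Real Ruby hides the unset slots too: <code>Foo.new(false).instance_variables</code> returns only <code>[:@c, :@d]</code>.</p>

```ruby
# Hypothetical sketch of a sparse instance variable array (not MRI's code).
UNDEF = Object.new.freeze # stands in for MRI's internal Qundef marker

def set_ivar(values, table, name, value)
  index = table.fetch(name) { table[name] = table.size }
  # Pad any skipped slots with UNDEF so `index` is addressable.
  values.fill(UNDEF, values.size, index - values.size) if values.size < index
  values[index] = value
end

# Names whose slots actually hold a value, in index order.
def defined_ivars(values, table)
  table.select { |_name, i| i < values.size && !values[i].equal?(UNDEF) }.keys
end

table = { :@a => 0, :@b => 1, :@c => 2, :@d => 3 } # built by `Foo.new true`
values = []                                         # a fresh `Foo.new false`
set_ivar(values, table, :@c, "baz")
set_ivar(values, table, :@d, "hoge")

p values.map { |v| v.equal?(UNDEF) ? :undef : v } # => [:undef, :undef, "baz", "hoge"]
p defined_ivars(values, table)                    # => [:@c, :@d]
```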
<p>The important part about this is that instance variable lists must expand to the
width required for the index offset. Now let’s talk about how the list expands.</p>
<h2 id="instance-variable-list-allocation-and-expansion">Instance Variable List Allocation and Expansion</h2>
<p>We saw how the instance variable index table is created. Now I want to spend
some time focusing on the instance variable list. This list is associated with
the instance and stores references to our actual instance variable values.</p>
<p>This list is lazily allocated and expands as it needs to accommodate more
values. <a href="https://github.com/ruby/ruby/blob/24c4e6dec109e105c13bd4c1b7f7cd51e534a3c3/variable.c#L947-L957">Here is the code</a> that figures out by how much the array should grow.</p>
<p>I’ve translated that function to Ruby code and added a few more comments:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">def</span> <span class="function">iv_index_tbl_newsize</span>(ivup)
index = ivup.index
newsize = (index + <span class="integer">1</span>) + (index + <span class="integer">1</span>)/<span class="integer">4</span> <span class="comment"># (index + 1) * 1.25</span>
<span class="comment"># if the index table *wasn't* extended, then clamp the newsize down to</span>
<span class="comment"># the size of the index table. Otherwise, use a size 25% larger than</span>
<span class="comment"># the requested index</span>
<span class="keyword">if</span> !ivup.iv_extended && ivup.index_table.size < newsize
ivup.index_table.size
<span class="keyword">else</span>
newsize
<span class="keyword">end</span>
<span class="keyword">end</span>
<span class="constant">IVarUpdate</span> = <span class="constant">Struct</span>.new(<span class="symbol">:index</span>, <span class="symbol">:iv_extended</span>, <span class="symbol">:index_table</span>)
index_table = { <span class="key">a</span>: <span class="integer">0</span>, <span class="key">b</span>: <span class="integer">1</span>, <span class="key">c</span>: <span class="integer">2</span>, <span class="key">d</span>: <span class="integer">3</span> } <span class="comment"># table from our examples</span>
<span class="comment"># We're setting `@c`, which has an index of 2. `false` means we didn't mutate</span>
<span class="comment"># the index table.</span>
p iv_index_tbl_newsize(<span class="constant">IVarUpdate</span>.new(index_table[<span class="symbol">:c</span>], <span class="predefined-constant">false</span>, index_table)) <span class="comment"># => 3</span>
</pre></div>
</div>
</div>
<p>The return value of <code>iv_index_tbl_newsize</code> is used to determine how much memory
we need for the instance variable array. As you can see, its return value is
based on the index of the instance variable, and we got that index from the
index table.</p>
<p>If the index table was mutated, then we let the instance variable list grow to
<code>newsize</code>, even if that is larger than the index table. But if the index table was <em>not</em> mutated, then we clamp
the array size to the size of the index table.</p>
<p>This means that the first time we allocate a particular Ruby object, it can be
<em>larger</em> than subsequent allocations. Again, we can use
<code>ObjectSpace.memsize_of</code> to observe this behavior:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">objspace</span><span class="delimiter">"</span></span>
<span class="keyword">class</span> <span class="class">Foo</span>
<span class="keyword">def</span> <span class="function">initialize</span>
<span class="instance-variable">@a</span> = <span class="string"><span class="delimiter">"</span><span class="content">foo</span><span class="delimiter">"</span></span>
<span class="instance-variable">@b</span> = <span class="string"><span class="delimiter">"</span><span class="content">bar</span><span class="delimiter">"</span></span>
<span class="instance-variable">@c</span> = <span class="string"><span class="delimiter">"</span><span class="content">baz</span><span class="delimiter">"</span></span>
<span class="instance-variable">@d</span> = <span class="string"><span class="delimiter">"</span><span class="content">hoge</span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
<span class="keyword">end</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="constant">Foo</span>.new) <span class="comment"># => 80</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="constant">Foo</span>.new) <span class="comment"># => 72</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="constant">Foo</span>.new) <span class="comment"># => 72</span>
</pre></div>
</div>
</div>
<p>The first allocation is larger because it’s the first time we’ve “seen” these
instance variables. The subsequent allocations are smaller because Ruby clamps
the instance variable array size.</p>
<h3 id="watching-the-instance-variable-array-grow">Watching the Instance Variable Array Grow</h3>
<p>Let’s do one more experiment before we get on to why the initial benchmark behaves
the way it does. Here we’re going to watch the size of the object grow as we
add instance variables (again, using <code>ObjectSpace.memsize_of</code>):</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">objspace</span><span class="delimiter">"</span></span>
<span class="keyword">class</span> <span class="class">Foo</span>
<span class="keyword">def</span> <span class="function">initialize</span>
<span class="instance-variable">@a</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@b</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@c</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@d</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@e</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@f</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@g</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@h</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="keyword">end</span>
<span class="keyword">end</span>
puts <span class="string"><span class="delimiter">"</span><span class="content">First</span><span class="delimiter">"</span></span>
<span class="constant">Foo</span>.new
puts <span class="string"><span class="delimiter">"</span><span class="content">Second</span><span class="delimiter">"</span></span>
<span class="constant">Foo</span>.new
</pre></div>
</div>
</div>
<p>Here’s the output from the program:</p>
<pre><code>$ ruby ~/thing.rb
First
40
40
40
80
80
96
96
120
Second
40
40
40
80
80
96
96
104
</code></pre>
<p>You can see that as we add instance variables to the object, the object gets
bigger! Let’s make one change to the program and run it again. This time
we’ll add a parameter that lets us define the “last” instance variable first:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">"</span><span class="content">objspace</span><span class="delimiter">"</span></span>
<span class="keyword">class</span> <span class="class">Foo</span>
<span class="keyword">def</span> <span class="function">initialize</span> eager_h
<span class="keyword">if</span> eager_h
<span class="instance-variable">@h</span> = <span class="integer">1</span>
<span class="keyword">end</span>
<span class="instance-variable">@a</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@b</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@c</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@d</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@e</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@f</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@g</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="instance-variable">@h</span> = <span class="integer">1</span>
p <span class="constant">ObjectSpace</span>.memsize_of(<span class="predefined-constant">self</span>)
<span class="keyword">end</span>
<span class="keyword">end</span>
puts <span class="string"><span class="delimiter">"</span><span class="content">First</span><span class="delimiter">"</span></span>
<span class="constant">Foo</span>.new <span class="predefined-constant">false</span>
puts <span class="string"><span class="delimiter">"</span><span class="content">Second</span><span class="delimiter">"</span></span>
<span class="constant">Foo</span>.new <span class="predefined-constant">true</span>
</pre></div>
</div>
</div>
<p>Here’s the output:</p>
<pre><code>$ ruby ~/thing.rb
First
40
40
40
80
80
96
96
120
Second
104
104
104
104
104
104
104
104
</code></pre>
<p>On the first allocation, we can observe the size of the object gradually expand
as usual. However, on the second allocation, we ask it to eagerly set <code>@h</code>, and
the growth pattern is totally different. In fact, it doesn’t grow at all!</p>
<p>Because <code>@h</code> is last in our index table, Ruby immediately expands the instance
variable array far enough to hold the <code>@h</code> slot. The array is now at its
maximum capacity, so none of the subsequent instance variable assignments need
to expand it.</p>
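<p>To make the growth pattern easier to reason about, here is a toy model in plain Ruby. It is not CRuby’s implementation: the class names are made up, and the growth rule is an assumption that I believe matches the Ruby versions used here. The assumptions are that up to three instance variables fit inside the 40-byte object, and that when the array must grow it grows to about 1.25 times the number of slots needed, clamped to the number of instance variables the class already knows about (unless this write just added a brand-new name). With those assumptions, the model reproduces all three sets of numbers printed above.</p>

```ruby
# Toy model of the per-object instance variable array. Hypothetical names;
# the growth rule below is an assumption fitted to the memsize_of output above.
OBJECT_SIZE = 40 # bytes for the object slot itself
EMBED_SLOTS = 3  # assumed: the first three ivars live inside the object
VALUE_SIZE  = 8  # one pointer-sized slot per ivar on a 64-bit build

class ToyClass
  # Shared by every instance: ivar name => slot index, in first-seen order
  attr_reader :index_table

  def initialize
    @index_table = {}
  end
end

class ToyObject
  attr_reader :expansions

  def initialize(klass)
    @klass = klass
    @capacity = EMBED_SLOTS
    @expansions = 0
  end

  def set(name)
    table = @klass.index_table
    extended = !table.key?(name)          # is this name new to the class?
    table[name] = table.size if extended
    index = table[name]
    return if index < @capacity           # slot already exists, nothing to grow

    newsize = (index + 1) + (index + 1) / 4 # about 1.25x the slots needed
    # Clamp to the names the class already knows, unless we just added one
    newsize = table.size if !extended && table.size < newsize
    @capacity = newsize
    @expansions += 1                      # each growth means a malloc/realloc
  end

  def memsize
    return OBJECT_SIZE if @capacity == EMBED_SLOTS
    OBJECT_SIZE + @capacity * VALUE_SIZE
  end
end

names = %i[a b c d e f g h]
klass = ToyClass.new

first = ToyObject.new(klass)
first_sizes = names.map { |n| first.set(n); first.memsize }

second = ToyObject.new(klass)
second_sizes = names.map { |n| second.set(n); second.memsize }

eager = ToyObject.new(klass)
eager.set(:h)                             # the eager_h case: last slot first
eager_sizes = names.map { |n| eager.set(n); eager.memsize }

p first_sizes  # => [40, 40, 40, 80, 80, 96, 96, 120]
p second_sizes # => [40, 40, 40, 80, 80, 96, 96, 104]
p eager_sizes  # => [104, 104, 104, 104, 104, 104, 104, 104]
puts "expansions: second=#{second.expansions}, eager=#{eager.expansions}"
# => expansions: second=3, eager=1
```

<p>In the model, the eager object grows once, straight to eight slots, while the objects that go forward grow three times each. It only models these eight instance variables; it makes no attempt to predict the exact <code>realloc</code> counts measured below.</p>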
<h2 id="back-to-our-initial-benchmark">Back To Our Initial Benchmark</h2>
<p>Every time Ruby needs to expand the instance variable array, it requires calling
<code>realloc</code> in order to expand that chunk of memory. We can observe calls to
<code>realloc</code> using <code>dtrace</code>.</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">class</span> <span class="class">Foo</span>
<span class="keyword">def</span> <span class="function">initialize</span> forward
forward ? go_forward : go_backward
<span class="keyword">end</span>
ivars = (<span class="string"><span class="delimiter">"</span><span class="content">a</span><span class="delimiter">"</span></span>..<span class="string"><span class="delimiter">"</span><span class="content">zz</span><span class="delimiter">"</span></span>).map { |name| <span class="string"><span class="delimiter">"</span><span class="content">@</span><span class="inline"><span class="inline-delimiter">#{</span>name<span class="inline-delimiter">}</span></span><span class="content"> = 5</span><span class="delimiter">"</span></span> }
<span class="comment"># define the go_forward method</span>
eval <span class="string"><span class="delimiter">"</span><span class="content">def go_forward; </span><span class="inline"><span class="inline-delimiter">#{</span>ivars.join(<span class="string"><span class="delimiter">"</span><span class="content">; </span><span class="delimiter">"</span></span>)<span class="inline-delimiter">}</span></span><span class="content"> end</span><span class="delimiter">"</span></span>
<span class="comment"># define the go_backward method</span>
eval <span class="string"><span class="delimiter">"</span><span class="content">def go_backward; </span><span class="inline"><span class="inline-delimiter">#{</span>ivars.reverse.join(<span class="string"><span class="delimiter">"</span><span class="content">; </span><span class="delimiter">"</span></span>)<span class="inline-delimiter">}</span></span><span class="content"> end</span><span class="delimiter">"</span></span>
<span class="keyword">end</span>
<span class="comment"># Heat</span>
<span class="constant">Foo</span>.new <span class="predefined-constant">true</span>
<span class="keyword">if</span> <span class="predefined-constant">ARGV</span>[<span class="integer">0</span>]
<span class="integer">1000</span>.times { <span class="constant">Foo</span>.new <span class="predefined-constant">false</span> }
<span class="keyword">else</span>
<span class="integer">1000</span>.times { <span class="constant">Foo</span>.new <span class="predefined-constant">true</span> }
<span class="keyword">end</span>
</pre></div>
</div>
</div>
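<p>Here I’ve rewritten the benchmark so that we can control the direction with a
command-line argument: with no argument every object goes forward, and with any
argument (like <code>reverse</code>) every object after the first goes backward. Let’s use
<code>dtrace</code> to count the calls to <code>realloc</code> in both situations.</p>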
<p>This case is always going forward:</p>
<pre><code>$ sudo dtrace -q -n 'pid$target::realloc:entry { @ = count(); }' -c "/Users/aaron/.rbenv/versions/ruby-trunk/bin/ruby thing.rb"
dtrace: system integrity protection is on, some features will not be available
8369
</code></pre>
<p>This case goes forward once, then backward the rest of the time:</p>
<pre><code>$ sudo dtrace -q -n 'pid$target::realloc:entry { @ = count(); }' -c "/Users/aaron/.rbenv/versions/ruby-trunk/bin/ruby thing.rb reverse"
dtrace: system integrity protection is on, some features will not be available
4369
</code></pre>
<p>We can see that “starting from the end” cuts the number of calls to
<code>realloc</code> significantly, from 8369 to 4369 (4,000 fewer calls across the 1,000
objects). Those extra calls to <code>realloc</code> are why it’s faster to define our
instance variables forward once, then backward the rest of the time!</p>
<p>I hope this was an interesting article. Please have a good day!</p>
http://tenderlovemaking.com/2018/02/12/speeding-up-ruby-with-shared-strings.html
Speeding up Ruby with Shared Strings
2018-02-12T10:00:00-08:00
2018-02-12T10:00:00-08:00
<p>It’s not often I am able to write a patch that not only reduces memory usage,
but increases speed as well. Usually I find myself trading memory for speed, so
it’s a real treat when I can improve both in one patch. Today I want to talk
about the patch I submitted to Ruby in <a href="https://bugs.ruby-lang.org/issues/14460">this ticket</a>.
It decreases “after boot” memory usage of a Rails application by 4% and speeds
up <code>require</code> by about 35%.</p>
<p>When I was writing this patch, I was actually focusing on trying to reduce
memory usage. It just happens that reducing memory usage also resulted in
faster runtime. So really I wanted to title this post “Reducing Memory Usage in
Ruby”, but <a href="/2018/01/23/reducing-memory-usage-in-ruby.html">I already made a post with that title</a>.</p>
<h2 id="shared-string-optimization">Shared String Optimization</h2>
<p>As I mentioned in previous posts, Ruby objects are limited to 40
bytes. But a string can be much longer than 40 bytes, so how is it
stored? If we look at <a href="https://github.com/ruby/ruby/blob/b16eaf86324b000c4c349e072e15b97dde701e48/include/ruby/ruby.h#L956-L969">the struct that represents strings</a>, we’ll find there is a <code>char *</code> pointer:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="keyword">struct</span> RString {
<span class="keyword">struct</span> RBasic basic;
<span class="keyword">union</span> {
<span class="keyword">struct</span> {
<span class="predefined-type">long</span> len;
<span class="predefined-type">char</span> *ptr;
<span class="keyword">union</span> {
<span class="predefined-type">long</span> capa;
VALUE shared;
} aux;
} heap;
<span class="predefined-type">char</span> ary[RSTRING_EMBED_LEN_MAX + <span class="integer">1</span>];
} as;
};
</pre></div>
</div>
</div>
<p>The <code>ptr</code> field in the string struct points to a byte array which is our string.
So the actual memory usage of a string is approximately 40 bytes for the object,
plus however long the string is. If we were to visualize the layout, it would
look something like this:</p>
<p><img src="/images/string_layout.png" alt="RString pointing to char array" /></p>
<p>In this case, there are really two allocations: the <code>RString</code> object and the
“hello world” character array. The <code>RString</code> object is the 40-byte Ruby object
allocated using the GC, and the character array was allocated using the system’s
<code>malloc</code> implementation.</p>
<p>Side note: There is another optimization called “embedding”. Without getting
too far off track, “embedding” just means storing strings that are “small enough”
directly inside the <code>RString</code> structure. We can talk about that in a
different post, but for today let’s pretend there are always two distinct allocations.</p>
<p>We can take advantage of this character array and represent substrings by just
pointing at a different location. For example, we can have two Ruby objects,
one representing the string “hello world” and the other representing the string
“world” and only allocate one character array buffer:</p>
<p><img src="/images/string_layout_shared.png" alt="RStrings sharing a char array" /></p>
<p>This example only has 3 allocations: 2 from the GC for the Ruby string objects,
and one <code>malloc</code> for the character array. Using <code>ObjectSpace</code>, we can actually
observe this optimization by measuring memory size of the objects after slicing
them:</p>
<pre><code>>> require 'objspace'
=> true
>> str = "x" * 9000; nil
=> nil
>> ObjectSpace.memsize_of str
=> 9041
>> substr = str[30, str.length - 30]; nil
=> nil
>> str.length
=> 9000
>> substr.length
=> 8970
>> ObjectSpace.memsize_of substr
=> 40
</code></pre>
<p>The example above first allocates a string that is 9000 characters long. Next we
measure the memory size of the string. The total is 9000 bytes for the
characters, plus 40 bytes for the Ruby object and 1 byte for the NUL terminator,
for a total of 9041. Next we take a substring, slicing off the first 30
characters of the original. As expected, the original string is 9000 characters,
and the substring is 8970. However, if we measure the size of the substring it is
only 40 bytes! This is because the new string only requires a new Ruby object to
be allocated, and the new object just points at a different location in the
original string’s character buffer, just like the diagram above showed.</p>
<p>This optimization isn’t limited to strings; we can use it with arrays too:</p>
<pre><code>>> list = ["x"] * 9000; nil
=> nil
>> ObjectSpace.memsize_of(list)
=> 72040
>> list2 = list[30, list.length - 30]; nil
=> nil
>> ObjectSpace.memsize_of(list2)
=> 40
</code></pre>
<p>In fact, functional languages where data structures are immutable can take great
advantage of this optimization. In languages that allow mutation, we have to
deal with the case where the original string might be mutated, whereas languages
with immutable data structures can be even more aggressive about optimization.</p>
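<p>Ruby is one of the languages that has to handle mutation. Here is a minimal sketch of what that looks like from the outside: after slicing, the two strings share a buffer, yet writing to the original does not leak into the substring. In CRuby, the string being modified gets its own copy of the bytes before the write happens (copy-on-write).</p>

```ruby
str = "x" * 9000
substr = str[30, str.length - 30] # runs to the end, so it can share str's buffer

str[-1] = "y"                     # mutate the original in place

# The write is not visible through the substring: the bytes were copied for
# the string being modified instead of changing the shared buffer.
puts str[-1]       # prints "y"
puts substr[-1]    # prints "x"
puts substr.length # prints 8970
```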
<h2 id="limits-of-the-shared-string-optimization">Limits of the Shared String Optimization</h2>
<p>This shared string optimization isn’t without limits though. To take advantage
of this optimization, we have to always <em>go to the end of the string</em>. In other
words, we can’t take a slice from the middle of the string and get the
optimization. Let’s take our sample string, slice 15 characters off each side,
and see what the memsize is:</p>
<pre><code>>> str = "x" * 9000; nil
=> nil
>> str.length
=> 9000
>> substr = str[15, str.length - 30]; nil
=> nil
>> substr.length
=> 8970
>> ObjectSpace.memsize_of(substr)
=> 9011
</code></pre>
<p>We can see in the above example that the memsize of the substring is much larger
than in the first example (9011 bytes instead of 40). That is because Ruby had to
create a new buffer to store the substring. So our lesson here is: if you have to
slice strings, start from the left and go all the way to the end.</p>
<p>Here is an interesting thing to think about. At the end of the following
program, what is the memsize of <code>substr</code>? How much memory is this program
actually consuming? Is the <code>str</code> object still alive, and how can we find out?</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre>require <span class="string"><span class="delimiter">'</span><span class="content">objspace</span><span class="delimiter">'</span></span>
str = <span class="string"><span class="delimiter">"</span><span class="content">x</span><span class="delimiter">"</span></span> * <span class="integer">9000</span>
substr = str[<span class="integer">30</span>, str.length - <span class="integer">30</span>]
str = <span class="predefined-constant">nil</span>
<span class="constant">GC</span>.start
<span class="comment"># What is the memsize of substr?</span>
<span class="comment"># How much memory is this program actually consuming?</span>
<span class="comment"># Is `str` still alive even though we did a GC?</span>
<span class="comment"># Hint: use `ObjectSpace.dump_all`</span>
<span class="comment"># (if you try this out, I recommend running the program with `--disable-gems`)</span>
</pre></div>
</div>
</div>
<p>The optimization I explained above works exactly the same way for strings in C
as it does in Ruby. We will use this optimization to reduce memory usage and
speed up <code>require</code> in Ruby.</p>
<h2 id="reducing-memory-usage-and-speeding-up-require">Reducing Memory Usage and Speeding Up <code>require</code></h2>
<p>I’ve already described the technique we’re going to use to speed up <code>require</code>,
so let’s take a look at the problem. After that, we’ll apply the shared string
optimization to improve performance of <code>require</code>.</p>
<p>Every time a program requires a file, Ruby has to check to see if that file has
already been required. The global variable <code>$LOADED_FEATURES</code> is a list of all
the files that have been required so far. Of course, searching through a list
for a file would be quite slow and get slower as the list grows, so Ruby keeps a
hash to look up entries in the <code>$LOADED_FEATURES</code> list. This hash is called the
<code>loaded_features_index</code>, and it’s stored on the virtual machine structure
<a href="https://github.com/ruby/ruby/blob/b16eaf86324b000c4c349e072e15b97dde701e48/vm_core.h#L563">here</a>.</p>
<p>The keys of this hash are strings that could be passed to <code>require</code> to load a
particular file, and each value is the index in the <code>$LOADED_FEATURES</code> array of
the file that actually got required. For example, if you have a file
<code>/a/b/c.rb</code> on your system, the keys to the hash will be:</p>
<ul>
<li>“/a/b/c.rb”</li>
<li>“a/b/c.rb”</li>
<li>“b/c.rb”</li>
<li>“c.rb”</li>
<li>“/a/b/c”</li>
<li>“a/b/c”</li>
<li>“b/c”</li>
<li>“c”</li>
</ul>
<p>Given a well-crafted load path, any of the strings above <em>could</em> be used to load
the <code>/a/b/c.rb</code> file, so the index needs to keep all of them. For example, you
could do <code>ruby -I / -e"require 'a/b/c'"</code> or <code>ruby -I /a -e"require 'b/c'"</code>,
etc., and they all point to the same file.</p>
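<p>The rule behind the list above is mechanical: a key can start at the beginning of the path or right after any slash, and every key appears both with and without the extension. A small sketch that generates the list follows. The helper name is made up for illustration, <code>File.extname</code> is used as an approximation of how the C code detects the extension, and the endless ranges need Ruby 2.6 or later.</p>

```ruby
# Hypothetical helper (not part of Ruby): list every key the loaded-features
# index needs for one feature path.
def index_keys_for(feature)
  ext = File.extname(feature)         # ".rb", or "" if there is no extension
  no_ext = feature.delete_suffix(ext) # "/a/b/c"

  # A key may start at the beginning of the path or right after any "/".
  starts = [0]
  feature.each_char.with_index { |ch, i| starts << i + 1 if ch == "/" }

  keys = starts.map { |s| feature[s..] }
  keys += starts.map { |s| no_ext[s..] } unless ext.empty?
  keys
end

p index_keys_for("/a/b/c.rb")
# => ["/a/b/c.rb", "a/b/c.rb", "b/c.rb", "c.rb", "/a/b/c", "a/b/c", "b/c", "c"]
```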
<p>The <code>loaded_features_index</code> hash is built in the <a href="https://github.com/ruby/ruby/blob/b16eaf86324b000c4c349e072e15b97dde701e48/load.c#L215-L255"><code>features_index_add</code>
function</a>.
Let’s pick apart this function a little.</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="directive">static</span> <span class="directive">void</span>
features_index_add(VALUE feature, VALUE offset)
{
VALUE short_feature;
<span class="directive">const</span> <span class="predefined-type">char</span> *feature_str, *feature_end, *ext, *p;
feature_str = StringValuePtr(feature);
feature_end = feature_str + RSTRING_LEN(feature);
<span class="keyword">for</span> (ext = feature_end; ext > feature_str; ext--)
<span class="keyword">if</span> (*ext == <span class="char">'.'</span> || *ext == <span class="char">'/'</span>)
<span class="keyword">break</span>;
<span class="keyword">if</span> (*ext != <span class="char">'.'</span>)
ext = <span class="predefined-constant">NULL</span>;
<span class="comment">/* Now `ext` points to the only string matching %r{^\.[^./]*$} that is
at the end of `feature`, or is NULL if there is no such string. */</span>
</pre></div>
</div>
</div>
<p>This function takes a <code>feature</code> and an <code>offset</code> as parameters. The <code>feature</code> is
the full path of the file that was required, extension and everything. <code>offset</code>
is the index in the loaded features list where this string is. The first part
of this function starts at the end of the string and scans backwards looking for
a period or a forward slash. If it finds a period, we know the file has an
extension (it is possible to require a Ruby file without an extension!); if it
finds a forward slash first, it gives up and assumes there is no extension.</p>
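<p>Here is that backwards scan ported to Ruby, as a sketch to make the loop bounds easier to follow (the method name is made up). Because the scan stops at the first <code>/</code> it meets, periods in directory names, such as a <code>2.5.0</code> version directory, are never mistaken for an extension.</p>

```ruby
# Hypothetical Ruby port of the extension scan at the top of features_index_add.
# Returns the offset of the "." that starts the extension, or nil if there is none.
def extension_offset(feature)
  ext = feature.length  # C starts at feature_end, one past the last byte
  while ext > 0
    ch = feature[ext]   # nil at feature.length (C reads the NUL terminator there)
    break if ch == "." || ch == "/"
    ext -= 1
  end
  feature[ext] == "." ? ext : nil
end

p extension_offset("/a/b/c.rb")                  # => 6
p extension_offset("/a/b/c")                     # => nil
p extension_offset("/usr/lib/ruby/2.5.0/set.rb") # => 23
p extension_offset("/usr/lib/ruby/2.5.0/set")    # => nil
```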
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> p = ext ? ext : feature_end;
<span class="keyword">while</span> (<span class="integer">1</span>) {
<span class="predefined-type">long</span> beg;
p--;
<span class="keyword">while</span> (p >= feature_str && *p != <span class="char">'/'</span>)
p--;
<span class="keyword">if</span> (p < feature_str)
<span class="keyword">break</span>;
<span class="comment">/* Now *p == '/'. We reach this point for every '/' in `feature`. */</span>
beg = p + <span class="integer">1</span> - feature_str;
short_feature = rb_str_subseq(feature, beg, feature_end - p - <span class="integer">1</span>);
features_index_add_single(short_feature, offset);
<span class="keyword">if</span> (ext) {
short_feature = rb_str_subseq(feature, beg, ext - p - <span class="integer">1</span>);
features_index_add_single(short_feature, offset);
}
}
</pre></div>
</div>
</div>
<p>Next, the function scans backwards through the string looking for forward slashes. Every time
it finds a forward slash, it uses <code>rb_str_subseq</code> to get a substring and then
calls <code>features_index_add_single</code> to register that substring. <code>rb_str_subseq</code>
gets substrings in the same way we did above in Ruby, and applies the
same optimizations.</p>
<p>The <code>if (ext)</code> conditional deals with files that have an extension, and this is
really where our problems begin. This conditional gets a substring of
<code>feature</code>, but <em>it doesn’t go all the way to the end of the string</em>. It must
exclude the file extension. This means it will <strong>copy</strong> the underlying string.
So these two calls to <code>rb_str_subseq</code> do 3 allocations total: 2 Ruby objects
(the function returns a Ruby object) and one malloc to copy the string for the
“no extension substring” case.</p>
<p>This function calls <code>features_index_add_single</code> to add the substring to the
index. I want to call out one excerpt from <a href="https://github.com/ruby/ruby/blob/b16eaf86324b000c4c349e072e15b97dde701e48/load.c#L175-L205">the <code>features_index_add_single</code>
function</a>:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> features_index = get_loaded_features_index_raw();
st_lookup(features_index, (st_data_t)short_feature_cstr, (st_data_t *)&this_feature_index);
<span class="keyword">if</span> (NIL_P(this_feature_index)) {
st_insert(features_index, (st_data_t)ruby_strdup(short_feature_cstr), (st_data_t)offset);
}
</pre></div>
</div>
</div>
<p>This code looks up the string in the index, and if the string isn’t in the
index, it adds it to the index. The caller allocated a new Ruby
string, and that string could get garbage collected, so this function calls
<code>ruby_strdup</code> to copy the string for the hash key. It’s important to note that the
keys to this hash <strong>aren’t</strong> Ruby objects, but <code>char *</code> pointers that came from
Ruby objects (the <code>char *ptr</code> field that we were looking at earlier).</p>
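<p>Ruby’s own <code>Hash</code> does something similar when you use a mutable string as a key: it stores a frozen copy, so later changes to the caller’s string can’t corrupt the table. This is only an analogy for the <code>ruby_strdup</code> call above (the index is a C <code>st_table</code> keyed by <code>char *</code>, not a Ruby <code>Hash</code>), but it follows the same reasoning:</p>

```ruby
index = {}
key = +"c.rb"           # an unfrozen string, like a freshly sliced substring
index[key] = 1

stored = index.keys.first
puts stored.equal?(key) # prints false: the hash keeps its own copy
puts stored.frozen?     # prints true

key << "x"              # mutate the caller's string afterwards...
puts index.key?("c.rb") # prints true: ...and the lookup still works
```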
<p>Let’s count the allocations. So far, we have 2 Ruby objects (one with a file
extension and one without), 1 malloc for the non-sharable substring, then 2 more
mallocs to copy the strings into the hash. So each iteration of the while loop
in <code>features_index_add</code> will do 5 allocations: 2 Ruby objects and 3 mallocs.</p>
<p>In cases like this, a picture might help. Below is a diagram of
the allocated memory and how the allocations relate to each other.</p>
<p><img src="/images/index-trunk.png" alt="Allocations on Trunk" /></p>
<p>This diagram shows what the memory layout looks like when adding the path
<code>/a/b/c.rb</code> to the index, resulting in 8 hash entries.</p>
<p>Blue nodes are allocations that were alive before the call to add the path to
the index. Red nodes are intermediate allocations done while populating the
index, and will be freed at some point. Black nodes are allocations made while
adding the path to the index but live <em>after</em> we’ve finished adding the path to
the index. Solid arrows represent actual references, whereas dotted lines
indicate a relationship but not actually a reference (like one string being
<code>ruby_strdup</code>’d from another).</p>
<p>The graph has lots of nodes and is very complicated, but we will clean it up!</p>
<h2 id="applying-the-shared-string-optimization">Applying the Shared String Optimization</h2>
<p>I’ve translated the C code to Ruby code so that we can more easily see the
optimization at work:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="global-variable">$features_index</span> = {}
<span class="keyword">def</span> <span class="function">features_index_add</span>(feature, index)
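ext = feature.rindex(<span class="string"><span class="delimiter">'</span><span class="content">.</span><span class="delimiter">'</span></span>)
<span class="comment"># as in the C code, a '/' after the last '.' means there is no extension</span>
ext = <span class="predefined-constant">nil</span> <span class="keyword">if</span> ext && feature.index(<span class="string"><span class="delimiter">'</span><span class="content">/</span><span class="delimiter">'</span></span>, ext)
p = ext ? ext : feature.length
loop <span class="keyword">do</span>
p -= <span class="integer">1</span>
<span class="keyword">while</span> p >= <span class="integer">0</span> && feature[p] != <span class="string"><span class="delimiter">'</span><span class="content">/</span><span class="delimiter">'</span></span>
p -= <span class="integer">1</span>
<span class="keyword">end</span>
<span class="keyword">break</span> <span class="keyword">if</span> p < <span class="integer">0</span> <span class="comment"># no slashes left</span>
short_feature = feature[p + <span class="integer">1</span>, feature.length - p - <span class="integer">1</span>] <span class="comment"># New Ruby Object</span>
features_index_add_single(short_feature, index)
<span class="keyword">if</span> ext <span class="comment"># slice out the file extension if there is one</span>
short_feature = feature[p + <span class="integer">1</span>, ext - p - <span class="integer">1</span>] <span class="comment"># New Ruby Object + malloc</span>
features_index_add_single(short_feature, index)
<span class="keyword">end</span>
<span class="keyword">end</span>
<span class="comment"># like the C code, also index the whole path, with and without the extension</span>
features_index_add_single(feature, index)
features_index_add_single(feature[<span class="integer">0</span>, ext], index) <span class="keyword">if</span> ext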
<span class="keyword">end</span>
<span class="keyword">def</span> <span class="function">features_index_add_single</span>(str, index)
<span class="keyword">return</span> <span class="keyword">if</span> <span class="global-variable">$features_index</span>.key?(str)
<span class="global-variable">$features_index</span>[str.dup] = index <span class="comment"># malloc</span>
<span class="keyword">end</span>
features_index_add <span class="string"><span class="delimiter">"</span><span class="content">/a/b/c.rb</span><span class="delimiter">"</span></span>, <span class="integer">1</span>
</pre></div>
</div>
</div>
<p>As we already learned, the shared string optimization only works when the
substring includes the end of the shared string. That is, we can only trim
characters off the left side of the string.</p>
<p>The first change we can make is to split the strings into two cases: one with
an extension, and one without. Since the “no extension” if statement <strong>does
not</strong> scan to the end of the string, it always allocates a new string. If we
make a new string that doesn’t contain the extension, then we can eliminate one
of the malloc cases:</p>
<div class="language-ruby highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="global-variable">$features_index</span> = {}
<span class="keyword">def</span> <span class="function">features_index_add</span>(feature, index)
no_ext_feature = <span class="predefined-constant">nil</span>
p = feature.length
ext = feature.rindex(<span class="string"><span class="delimiter">'</span><span class="content">.</span><span class="delimiter">'</span></span>)
<span class="comment"># as in the C code, a '/' after the last '.' means there is no extension</span>
ext = <span class="predefined-constant">nil</span> <span class="keyword">if</span> ext && feature.index(<span class="string"><span class="delimiter">'</span><span class="content">/</span><span class="delimiter">'</span></span>, ext)
<span class="keyword">if</span> ext
p = ext
no_ext_feature = feature[<span class="integer">0</span>, ext] <span class="comment"># New Ruby Object + malloc</span>
<span class="keyword">end</span>
loop <span class="keyword">do</span>
p -= <span class="integer">1</span>
<span class="keyword">while</span> p >= <span class="integer">0</span> && feature[p] != <span class="string"><span class="delimiter">'</span><span class="content">/</span><span class="delimiter">'</span></span>
p -= <span class="integer">1</span>
<span class="keyword">end</span>
<span class="keyword">break</span> <span class="keyword">if</span> p < <span class="integer">0</span> <span class="comment"># no slashes left</span>
short_feature = feature[p + <span class="integer">1</span>, feature.length - p - <span class="integer">1</span>] <span class="comment"># New Ruby Object</span>
features_index_add_single(short_feature, index)
<span class="keyword">if</span> ext
len = no_ext_feature.length
short_feature = no_ext_feature[p + <span class="integer">1</span>, len - p - <span class="integer">1</span>] <span class="comment"># New Ruby Object</span>
features_index_add_single(short_feature, index)
<span class="keyword">end</span>
<span class="keyword">end</span>
<span class="comment"># like the C code, also index the whole path, with and without the extension</span>
features_index_add_single(feature, index)
features_index_add_single(no_ext_feature, index) <span class="keyword">if</span> ext
<span class="keyword">end</span>
<span class="keyword">def</span> <span class="function">features_index_add_single</span>(str, index)
<span class="keyword">return</span> <span class="keyword">if</span> <span class="global-variable">$features_index</span>.key?(str)
<span class="global-variable">$features_index</span>[str.dup] = index <span class="comment"># malloc</span>
<span class="keyword">end</span>
features_index_add <span class="string"><span class="delimiter">"</span><span class="content">/a/b/c.rb</span><span class="delimiter">"</span></span>, <span class="integer">1</span>
</pre></div>
</div>
</div>
<p>This changes the function to allocate one new string up front, and then always slice
to the end of one of the two strings. Now that we have two strings we can slice “from the
left”, we’re able to avoid new substring mallocs in the loop. You can see this
change, where I allocate a new string <em>without</em> an extension
<a href="https://github.com/github/ruby/commit/bec1637da7fc5bafd9c91ba6443ad38c29ec656f#diff-8962f5c4e82fc86da33bb950a9147069L232">here</a>.</p>
<p>Below is a graph of what the memory layout and relationships look like after
taking the extensionless slice once up front, then sharing that string:</p>
<p><img src="/images/index-new-slice.png" alt="Allocations after shared slice" /></p>
<p>You can see from this graph that we were able to eliminate string buffers by
allocating the “extensionless” substring first, then taking slices from it.</p>
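<p>To put rough numbers on this, here is a sketch in Ruby that builds the extensionless keys both ways and adds up <code>ObjectSpace.memsize_of</code> for the results. The path is made up and deliberately long, because CRuby copies short substrings into embedded strings, which never share a buffer. Exact byte counts vary between Ruby versions, so treat the output as a comparison rather than a measurement of the actual patch.</p>

```ruby
require "objspace"

# Hypothetical feature path with long directory names, so the substrings are
# far too big to be embedded.
feature = "/" + ["a" * 3000, "b" * 3000, "c" * 3000].join("/") + ".rb"
ext = feature.rindex(".")

# Offsets just past each "/", i.e. where the keys built in the loop start.
starts = (0...ext).select { |i| feature[i] == "/" }.map { |i| i + 1 }

# Old approach: each extensionless key stops before ".rb", so it never reaches
# the end of `feature` and Ruby copies the bytes into a new buffer every time.
old_keys = starts.map { |beg| feature[beg, ext - beg] }

# New approach: copy the extensionless path once (frozen, as the patch does),
# then slice it all the way to the end so every key can share its buffer.
no_ext = feature[0, ext].freeze
new_keys = starts.map { |beg| no_ext[beg..] }

old_bytes = old_keys.sum { |s| ObjectSpace.memsize_of(s) }
new_bytes = ObjectSpace.memsize_of(no_ext) +
            new_keys.sum { |s| ObjectSpace.memsize_of(s) }

puts "old: #{old_bytes} bytes, new: #{new_bytes} bytes"
```

<p>Both approaches produce identical keys; the new one pays for a single copy of the path instead of one copy per directory level.</p>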
<p>There are two more optimizations I applied in this patch. Unfortunately they
are specific to the C language and not easy to explain using Ruby.</p>
<h2 id="eliminating-ruby-object-allocations">Eliminating Ruby Object Allocations</h2>
<p>The existing code uses Ruby’s string functions to slice strings, and each slice allocates a new Ruby object.
Now that we have two strings, we can always take substrings from the left, and
that means we can use pointers in C to “create” substrings. Rather than asking
Ruby APIs to slice the string for us, we simply use a pointer in C to point at
where we want the substring to start. The hash table that maintains the index
uses C strings as keys, so instead of passing Ruby objects around, we’ll just
pass a pointer into the string:</p>
<div class="language-diff highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="line delete"><span class="delete">-</span> short_feature = rb_str_subseq(feature, beg, feature_end - p - 1);</span>
<span class="line delete"><span class="delete">-</span> features_index_add_single(short_feature, offset);</span>
<span class="line insert"><span class="insert">+</span> features_index_add_single(feature_str + beg, offset);</span>
if (ext) {
<span class="line delete"><span class="delete">-</span> short_feature = rb_str_subseq(feature, beg, ext - p - 1);</span>
<span class="line delete"><span class="delete">-</span> features_index_add_single(short_feature, offset);</span>
<span class="line insert"><span class="insert">+</span> features_index_add_single(feature_no_ext_str + beg, offset);</span>
}
}
<span class="line delete"><span class="delete">-</span> features_index_add_single(feature, offset);</span>
<span class="line insert"><span class="insert">+</span> features_index_add_single(feature<span class="eyecatcher">_str</span>, offset);</span>
if (ext) {
<span class="line delete"><span class="delete">-</span> short_feature = rb_str_subseq(feature, 0, ext - feature_str);</span>
<span class="line delete"><span class="delete">-</span> features_index_add_single(short_feature, offset);</span>
<span class="line insert"><span class="insert">+</span> features_index_add_single(feature_no_ext_str, offset);</span>
</pre></div>
</div>
</div>
<p>In this case, using a pointer into the string simplifies our code.
<code>feature_str</code> is a pointer to the head of the string that <em>has</em> a file
extension, and <code>feature_no_ext_str</code> is a pointer to the head of the string that
<em>doesn’t</em> have a file extension. <code>beg</code> is the number of characters from the
head of the string where we want to slice. All we have to do now is just add
<code>beg</code> to the head of each pointer and pass that to <code>features_index_add_single</code>.</p>
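<p>To make the pointer arithmetic concrete, here is a small standalone C sketch. It is not Ruby’s actual <code>features_index_add</code>, and the helper <code>suffix_keys</code> is hypothetical. It collects a pointer to every suffix of a path that starts just after a slash (the same kind of <code>feature_str + beg</code> key passed to <code>features_index_add_single</code> above) without allocating anything:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Ruby's actual code: store in keys[] a pointer to
 * every suffix of `path` that begins just after a '/', shortest first.
 * Nothing is allocated: each key is simply `path + beg`, and all keys share
 * the original string's NUL terminator. Returns the number of keys written
 * (at most `max`). */
static size_t suffix_keys(const char *path, const char **keys, size_t max)
{
    assert(path != NULL && keys != NULL);

    size_t n = 0;
    for (size_t beg = strlen(path); beg > 0 && n < max; beg--) {
        if (path[beg - 1] == '/')
            keys[n++] = path + beg; /* suffix starts right after the slash */
    }
    return n;
}
```

<p>Every key shares the original string’s NUL terminator, so this trick only works for suffixes. A key that has to stop before the file extension, like <code>b/c</code>, needs a terminator of its own, which is why the patch keeps a separate <code>feature_no_ext_str</code> to point into.</p>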
<p>In this graph, you can see that we no longer need the intermediate Ruby objects,
because the “add single” function reads the underlying <code>char *</code>
buffer directly:</p>
<p><img src="/images/slice-then-ptr.png" alt="Allocations after pointer substrings" /></p>
<h2 id="eliminating-malloc-calls">Eliminating malloc Calls</h2>
<p>Finally, let’s eliminate the <code>ruby_strdup</code> calls. As we covered earlier, new
Ruby strings could get allocated. These Ruby strings could be freed by the
garbage collector, so we had to call <code>ruby_strdup</code> to keep a copy around inside
the index hash. The <code>feature</code> string passed in is also stored in the
<code>$LOADED_FEATURES</code> global array, so there is no need to copy that string, as the
array will prevent the GC from collecting it. However, we created a new string
that does not have an extension, and that object could get collected. If we can
prevent the GC from collecting <em>those</em> strings, then we don’t need to copy
anything.</p>
<p>To keep these new strings alive, I added an array to the virtual machine (the
virtual machine lives for the life of the process):</p>
<div class="language-diff highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> vm->loaded_features = rb_ary_new();
vm->loaded_features_snapshot = rb_ary_tmp_new(0);
vm->loaded_features_index = st_init_strtable();
<span class="line insert"><span class="insert">+</span> vm->loaded_features_index_pool = rb_ary_new();</span>
</pre></div>
</div>
</div>
<p>Then I add the new string to the array via <code>rb_ary_push</code> right after allocation:</p>
<div class="language-diff highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span class="line insert"><span class="insert">+</span> short_feature_no_ext = rb_fstring(rb_str_freeze(rb_str_subseq(feature, 0, ext - feature_str)));</span>
<span class="line insert"><span class="insert">+</span> feature_no_ext_str = StringValuePtr(short_feature_no_ext);</span>
<span class="line insert"><span class="insert">+</span> rb_ary_push(get_loaded_features_index_pool_raw(), short_feature_no_ext);</span>
</pre></div>
</div>
</div>
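<p>The pool’s job is ownership: as long as something long-lived references the new strings, anything else can borrow pointers into them. C has no garbage collector, so the sketch below models that ownership explicitly. <code>struct string_pool</code>, <code>pool_push</code>, and <code>pool_destroy</code> are hypothetical names standing in for the role <code>loaded_features_index_pool</code> plays; they are not part of Ruby’s C API:</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for vm->loaded_features_index_pool: an append-only
 * array that owns every string pushed into it until the pool is destroyed.
 * Other code (such as index keys) may borrow pointers into these strings
 * for as long as the pool is alive. */
struct string_pool {
    char **items;
    size_t len;
    size_t cap;
};

/* Copy the first `n` bytes of `src` into a new NUL-terminated string owned
 * by the pool. Returns a borrowed pointer, or NULL if allocation fails. */
static const char *pool_push(struct string_pool *pool, const char *src, size_t n)
{
    assert(pool != NULL && src != NULL);

    if (pool->len == pool->cap) {
        size_t cap = pool->cap ? pool->cap * 2 : 4;
        char **items = realloc(pool->items, cap * sizeof *items);
        if (items == NULL)
            return NULL;
        pool->items = items;
        pool->cap = cap;
    }

    char *copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, n);
    copy[n] = '\0';
    pool->items[pool->len++] = copy;
    return copy;
}

/* Free every string in the pool; borrowed pointers are invalid afterwards. */
static void pool_destroy(struct string_pool *pool)
{
    for (size_t i = 0; i < pool->len; i++)
        free(pool->items[i]);
    free(pool->items);
    pool->items = NULL;
    pool->len = pool->cap = 0;
}
```

<p>In Ruby the pool is an ordinary array that the GC marks, which keeps the strings reachable; the explicit <code>pool_destroy</code> here only stands in for the end of that lifetime.</p>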
<p>Now all strings in the index hash are shared and kept alive. This means we can
safely remove the <code>ruby_strdup</code> call without any strings getting freed by the GC:</p>
<div class="language-diff highlighter-coderay"><div class="CodeRay">
<div class="code"><pre> if (NIL_P(this_feature_index)) {
<span class="line delete"><span class="delete">-</span> st_insert(features_index, (st_data_t)<span class="eyecatcher">ruby_strdup(short_feature_cstr)</span>, (st_data_t)offset);</span>
<span class="line insert"><span class="insert">+</span> st_insert(features_index, (st_data_t)<span class="eyecatcher">short_feature_cstr</span>, (st_data_t)offset);</span>
}
</pre></div>
</div>
</div>
<p>After this change, we don’t need to copy any memory because the hash keys can
point directly into the underlying character array inside the Ruby string
object:</p>
<p><img src="/images/hash-substrings.png" alt="Use string indexes for keys" /></p>
<p>This new algorithm does two allocations: one to create a “no extension” copy of
the original string, and one <code>RString</code> object to wrap it. The “loaded features
index pool” array keeps the newly created string from being garbage collected,
so now we can point directly into the strings’ character arrays without needing
to copy them.</p>
<p>For each file added to the “loaded features” array, we went from O(N)
allocations (where N is the number of slashes in the path) to a constant two
allocations, regardless of how many slashes the path contains.</p>
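<p>The allocation difference can be checked with a toy model. This is a hedged sketch, not Ruby’s code: <code>keys_by_copying</code> gives every index key its own heap copy (roughly what <code>ruby_strdup</code> did), while <code>keys_by_pointing</code> makes one extension-less copy and points into it and into the original path. All names, including the <code>allocations</code> counter, are hypothetical:</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy allocation counter for this sketch; Ruby's real allocations
 * (RString objects, GC heap pages, etc.) are not modelled. */
static size_t allocations = 0;

/* malloc + copy of the first n bytes of src, NUL-terminated and counted. */
static char *counted_copy(const char *src, size_t n)
{
    assert(src != NULL);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        abort();
    allocations++;
    memcpy(copy, src, n);
    copy[n] = '\0';
    return copy;
}

/* Extension = the last '.' that appears after the last '/', or NULL. */
static const char *find_ext(const char *path)
{
    const char *dot = strrchr(path, '.');
    const char *slash = strrchr(path, '/');
    return (dot != NULL && (slash == NULL || dot > slash)) ? dot : NULL;
}

/* Old approach (modelled): every key is its own heap copy, so the number of
 * allocations grows with the number of '/' characters in the path.
 * The caller frees keys[0..n). */
static size_t keys_by_copying(const char *path, char **keys)
{
    const char *ext = find_ext(path);
    size_t len = strlen(path);
    size_t stem = ext ? (size_t)(ext - path) : len;
    size_t n = 0;

    for (size_t i = len; i > 0; i--) {
        if (path[i - 1] != '/')
            continue;
        keys[n++] = counted_copy(path + i, len - i);      /* e.g. "c.rb" */
        if (ext)
            keys[n++] = counted_copy(path + i, stem - i); /* e.g. "c" */
    }
    keys[n++] = counted_copy(path, len);                  /* full path */
    if (ext)
        keys[n++] = counted_copy(path, stem);             /* full path, no ext */
    return n;
}

/* New approach: at most one allocation (the extension-less copy); every key
 * points into either `path` or that copy. The caller must keep *no_ext alive
 * while the keys are in use (the pool's job in Ruby), then free it. */
static size_t keys_by_pointing(const char *path, const char **keys, char **no_ext)
{
    const char *ext = find_ext(path);
    size_t len = strlen(path);
    size_t n = 0;

    *no_ext = ext ? counted_copy(path, (size_t)(ext - path)) : NULL;
    for (size_t i = len; i > 0; i--) {
        if (path[i - 1] != '/')
            continue;
        keys[n++] = path + i;
        if (*no_ext)
            keys[n++] = *no_ext + i;
    }
    keys[n++] = path;
    if (*no_ext)
        keys[n++] = *no_ext;
    return n;
}
```

<p>For <code>/a/b/c.rb</code> both functions produce the same eight keys, but the copying version performs eight allocations and the pointing version performs one. The model undercounts the old code, which also created a Ruby string per key via <code>rb_str_subseq</code>. The pointing version’s single <code>malloc</code> corresponds to the “no extension” copy; in Ruby, the <code>RString</code> object wrapping it is the second allocation.</p>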
<h2 id="end"><strong>END</strong></h2>
<p>By using shared strings I was able to eliminate over 76,000 system calls during
the Rails boot process on a basic app, reduce the memory footprint by 4%, and
speed up <code>require</code> by 35%. Next week I will try to get some statistics from a
large application and see how well it performs there!</p>
<p>Thanks for reading!</p>