Author:

Aaron Patterson

My name is Aaron Patterson.

Event based JSON and YAML parsing

Posted by – April 17, 2010

Let’s use Ruby 1.9.2 and Psych to build an event based twitter stream parser. Psych is a YAML parser that I wrote and is in the standard library in 1.9.2. Eventually, it will replace the current YAML parser, but we can still use it today!

But you said YAML and JSON! wtf?

I know! In the YAML 1.2 spec, JSON is a subset of YAML. Psych supports YAML 1.1 right now, so *much* (but not all) JSON is supported. Once libyaml is upgraded to YAML 1.2, it will have full JSON support!

Why do we want to do an event based parser?

Twitter streams are a never ending flow of user status updates, and if we want a process to live forever consuming these updates, it would be nice if that process kept a low memory profile. Psych is built in such a way that we can hand it an IO object, it will read from the IO object, then call callback methods as soon as possible. It buffers as little as possible, sending events as soon as possible. If you are familiar with SAX based XML parsing, this will be familiar to you. Plus it is a fun problem!

Let’s start by writing an event listener for some sample JSON.

Event Listener

Our event listener is only going to listen for scalar events, meaning that when Psych parses a string, it will send that string to our listener. There are many different events that can happen, so Psych ships with a handler from which you can inherit. If you check out the source for the base class handler, you can see what types of events your handler can intercept.

For now, let’s write our scalar handler, and try it out.

require 'psych'

class Listener < Psych::Handler
  def scalar(value, anchor, tag, plain, quoted, style)
    puts value
  end
end

listener = Listener.new
parser   = Psych::Parser.new listener
parser.parse DATA

__END__
{"foo":"bar"}

If you run this code, you should see the strings “foo” and “bar” printed.

In this example, our handler simply prints out every scalar value encountered. We create a new instance of the listener, pass that listener to a new instance of the parser, and tell the parser to parse DATA. We can hand the parser an IO object or a String object. This is important because we’d like to hand the parser our socket connection, so that the parser can deal with reading from the socket for us.
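
To get a feel for the other events a handler can intercept, here is a small sketch that records mapping and sequence events alongside scalars. The EventRecorder class is made up for illustration; the method names and argument lists are the ones defined on Psych::Handler.

```ruby
require 'psych'

# Hypothetical handler for illustration: records each event it receives,
# plus the value for scalar events.
class EventRecorder < Psych::Handler
  attr_reader :events

  def initialize
    super
    @events = []
  end

  def start_mapping(anchor, tag, implicit, style)
    @events << [:start_mapping]
  end

  def end_mapping
    @events << [:end_mapping]
  end

  def start_sequence(anchor, tag, implicit, style)
    @events << [:start_sequence]
  end

  def end_sequence
    @events << [:end_sequence]
  end

  def scalar(value, anchor, tag, plain, quoted, style)
    @events << [:scalar, value]
  end
end

recorder = EventRecorder.new
# A String works here just as well as an IO object.
Psych::Parser.new(recorder).parse('{"foo": ["bar", "baz"]}')
recorder.events.each { |event| p event }
```

Events arrive in document order, so a handler can rebuild as much or as little structure as it needs.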

Hooking up to Twitter

It would be convenient for us if Twitter’s stream was one continuous JSON document. Why? If it was, we could feed the socket straight to our JSON parser and start consuming events immediately. Unfortunately, Twitter’s stream is not so kind to us event based consumers. We’ll need to trick our JSON parser into thinking the feed is one continuous document. We’ll get tricky with our data in a minute, but first let’s deal with authentication.

Authentication

Twitter requires us to authenticate before we can consume a feed. Stream authentication is done via Basic Auth. Let’s write a class that can authenticate and read from the stream. Once we do that, we’ll concentrate on parsing the stream.

require 'socket'

class StreamClient
  def initialize user, pass
    @ba = ["#{user}:#{pass}"].pack('m').chomp
  end

  def listen
    socket = TCPSocket.new 'stream.twitter.com', 80
    socket.write "GET /1/statuses/sample.json HTTP/1.1\r\n"
    socket.write "Host: stream.twitter.com\r\n"
    socket.write "Authorization: Basic #{@ba}\r\n"
    socket.write "\r\n"

    # Read the headers
    while((line = socket.readline) != "\r\n"); puts line if $DEBUG; end

    # Consume the feed
    while line = socket.readline
      puts line
    end
  end
end

StreamClient.new(ARGV[0], ARGV[1]).listen

This class takes a username and password and calculates the basic auth signature. When “listen” is called, it opens a connection, authorizes, reads the response headers, and starts consuming the feed.
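
Here is the Basic Auth calculation on its own, as a minimal sketch. The basic_auth_header helper is a made-up name, and the credentials are the well-known example pair from the HTTP Basic Auth RFC. It uses the 'm0' pack directive rather than 'm' plus chomp: plain 'm' inserts a newline into long output, and chomp only removes the last one, so 'm0' is the safer choice for long credentials.

```ruby
# Hypothetical helper: build the Authorization header line for Basic Auth.
def basic_auth_header(user, pass)
  # 'm0' is base64 with no line breaks at all
  token = ["#{user}:#{pass}"].pack('m0')
  "Authorization: Basic #{token}"
end

header = basic_auth_header('Aladdin', 'open sesame')
puts header
```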

Processing the Feed

If we look at the output from the previous script, we’ll see that the Twitter stream looks something like this:

512
{"in_reply_to_screen_name":null,...}

419
{"in_reply_to_screen_name":"tenderlove"...}

Which isn’t valid JSON. Instead, each update is a header (a hexadecimal number giving the length of the JSON chunk), the JSON chunk itself, then a trailing “\r\n”. We would like the stream to look something like this:

---
{"in_reply_to_screen_name":null,...}
...
---
{"in_reply_to_screen_name":"tenderlove"...}
...

This chunk is two valid YAML documents. If the stream looked like this, we could feed it straight to our YAML processor no problem. How can we modify the stream to be suitable for our parser?

Fun with Thread and IO.pipe

If we create a pipe, we can have one thread process input from Twitter and feed that into the pipe. We can then give the other end of the pipe to our JSON processor and let it read from our processed feed. Let’s modify the “listen” method in our client to munge the feed to a pipe, and hand that off to our YAML processor. I only care about the text of people’s tweets, so let’s modify our listener too.

Here is our completed program:

require 'socket'
require 'psych'

class StreamClient
  def initialize user, pass
    @ba = ["#{user}:#{pass}"].pack('m').chomp
  end

  def listen listener
    socket = TCPSocket.new 'stream.twitter.com', 80
    socket.write "GET /1/statuses/sample.json HTTP/1.1\r\n"
    socket.write "Host: stream.twitter.com\r\n"
    socket.write "Authorization: Basic #{@ba}\r\n"
    socket.write "\r\n"

    # Read the headers
    while((line = socket.readline) != "\r\n"); puts line if $DEBUG; end

    reader, writer = IO.pipe
    producer = Thread.new(socket, writer) do |s, io|
      loop do
        io.write "---\n"
        io.write s.read(s.readline.strip.to_i(16))
        io.write "...\n"
        s.read 2 # strip the blank line
      end
    end

    parser = Psych::Parser.new listener
    parser.parse reader

    producer.join
  end
end

class Listener < Psych::Handler
  def initialize
    @was_text = false
  end

  def scalar value, anchor, tag, plain, quoted, style
    puts value if @was_text
    @was_text = value == 'text'
  end
end

StreamClient.new(ARGV[0], ARGV[1]).listen Listener.new

Great! In 30 lines, we’ve been able to provide an event based API for consuming Twitter streams. Were it not for the feed munging, we could reduce that by 9 lines!
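
If you want to poke at the munging step without a Twitter account, here is a sketch that runs it against a StringIO instead of a socket. The munge and chunk helpers, the TextCollector class, and the sample tweets are all made up for illustration, and the fake feed only approximates what Twitter sends. Unlike the loop above, this version chomps each chunk and stops on an empty chunk.

```ruby
require 'stringio'
require 'psych'

# Hypothetical helper: copy length-prefixed chunks from +input+ to +output+,
# wrapping each chunk in YAML document markers.
def munge(input, output)
  until input.eof?
    size = input.readline.strip.to_i(16) # chunk length header, in hex
    break if size.zero?                  # stop on an empty chunk
    output.puts '---'
    output.puts input.read(size).chomp   # the JSON chunk itself
    output.puts '...'
    input.read 2                         # skip the trailing "\r\n"
  end
end

# Same idea as the Listener above, but collects values instead of printing.
class TextCollector < Psych::Handler
  attr_reader :texts

  def initialize
    super
    @texts = []
    @was_text = false
  end

  def scalar(value, anchor, tag, plain, quoted, style)
    @texts << value if @was_text
    @was_text = value == 'text'
  end
end

# Build one fake chunk: hex length, the data, and a trailing "\r\n".
def chunk(body)
  data = body + "\r\n"
  "#{data.bytesize.to_s(16)}\r\n#{data}\r\n"
end

feed = StringIO.new(chunk('{"text": "first tweet"}') +
                    chunk('{"text": "second tweet"}'))
munged = StringIO.new
munge(feed, munged)
munged.rewind

collector = TextCollector.new
Psych::Parser.new(collector).parse(munged)
p collector.texts
```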

Problems

So far, there have only been two problems for me with this script. The first is that we are forced to buffer the response from Twitter, but we cannot help that. The second is that sometimes the JSON emitted from Twitter is not parseable by Psych. I think this is just due to Psych only supporting YAML 1.1.

Conclusion

It’s true that we could have implemented this same interface without a pipe and a thread. Rather than munging the stream, we could create a new parser instance for each status update. But why create so many objects for parsing the stream when we only need one?

Anyway, have fun playing with this code, and I encourage you to try out Ruby 1.9.2. I think it’s really fun! PEW PEW PEW! HAPPY SATURDAY!

RDoc on your iPad

Posted by – April 12, 2010

Oh snap! I haven’t posted here in a long time. My day job and my night jobs have been keeping me too busy! Hopefully I’ll have more time to blog in the future. I have a bunch of ideas, I just need to find the time to write!

Anyway, let’s talk RDoc, iPad, and epub! I like documentation. I especially like consuming documentation. I thought it would be neat if I could read documentation on my iPad. As it turns out, getting RDoc documentation on your iPad isn’t that hard!

Nokogiri on iPad

According to Wikipedia, iBooks is an EPUB reader. EPUB is a standard format for making books. The EPUB format is basically a zip file that contains a bunch of XHTML and XML documents. The XHTML documents are the “meat” of your book, where the XML documents tell the reader where to find everything, and the order in which to put things. RDoc already emits HTML, so our job is to make sure it emits XHTML along with the special XML files. How do we do that?

RDoc supports a plugin system where we can hook in and emit anything we want. To hook in to RDoc, we just add a special file to our gem (“lib/rdoc/discover.rb”), and register with the RDoc plugin system. So I wrote a gem called paddle that plugs in to RDoc, emits the documentation as XHTML along with the supporting XML files. It even comes with a nice Ruby logo! I encourage you to take a look at the source. The code is quite short, but could be refactored even smaller!
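
To give a rough idea of what registering with RDoc looks like, here is a bare-bones sketch. This is not paddle’s source: the Sketch class is made up, and the constructor and generate arguments differ between RDoc versions, so they are deliberately left loose here. The one piece taken from RDoc itself is RDoc::RDoc.add_generator.

```ruby
require 'rdoc/rdoc'

# Hypothetical generator for illustration only. In a real gem this class
# would live in its own file, and "lib/rdoc/discover.rb" would simply
# require that file so RDoc finds it at startup.
class RDoc::Generator::Sketch
  # Register with RDoc. The formatter name comes from the class name,
  # so this one would be selected with "rdoc -f sketch".
  RDoc::RDoc.add_generator self

  # Constructor arguments vary across RDoc versions, so accept anything.
  def initialize(*args)
    @args = args
  end

  # A real generator would write the XHTML and supporting XML files here.
  def generate(*)
  end
end

p RDoc::RDoc::GENERATORS.key?('sketch')
```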

Using the Paddle

Creating your own books with Paddle is really easy. First, install paddle:

  $ sudo gem install paddle

Then find a project for which you want to create a book. For this example, I’ll generate a book for one of my gems called texticle. From the project root, use the rdoc command. Make sure to tell rdoc to use the “paddle” formatter, and supply a title (very important to supply a title!):

  $ cd git/texticle
  $ rdoc -f paddle -t 'Texticle Documentation' -o epub lib

Now there should be an “epub” directory that contains your book. But we’re not quite done yet. There is one more step. The book must be in a zipfile, and the zipfile requires a particular format. Let’s create the zipfile now using the “zip” command:

  $ cd epub
  $ zip -Xr9D texticle.epub mimetype *

You should end up with a file named “texticle.epub”. Just drag that file to iTunes, sync up your iPad, and boom!

Problems

I hacked this out in an evening, so there are a few problems. I’ll mention them here, just so you’re not surprised, and to give you ideas for patches to submit! ;-)

  • Right now, the links don’t work. I haven’t figured out why, but they don’t. That will come soon.

  • The author field isn’t filled out in the book. I need to teach RDoc to take more command line options so we can tell Paddle what to use for the book’s author field.

  • Only classes, modules, and the things they contain are documented. Right now, your README file won’t show up in the book. That is just missing right now. It should be easy to add, I just haven’t done it.

A couple books to get you started

THE END!

Thanks for reading! Have fun making books for your iPad, and don’t forget to send patches back to me! :-D

Compiling with Clang

Posted by – January 3, 2010

HI EVERYONE AND HAPPY SUNDAY!

Lately I’ve been trying to compile my ruby extensions with Clang. One reason I like trying out my extensions with Clang is because it catches some errors that GCC doesn’t. If you know the right things to set, it’s pretty easy to get your extension to compile with Clang. Unfortunately finding the right thing isn’t always easy, but I found the right bits to flip and I want to share!

Here’s how to do it. Add this line to your extconf.rb right after you require mkmf:

require 'mkmf'

RbConfig::MAKEFILE_CONFIG['CC'] = ENV['CC'] if ENV['CC']

# ... rest of your extconf goes here

Then when you compile your extension, just set CC to point at clang:

$ CC=/Developer/usr/bin/clang rake compile

You can see it in action in the nokogiri extconf. You can even see where clang helped me shake out some bugs, and I think that’s pretty cool.

Writing Ruby C extensions: Part 1

Posted by – December 18, 2009

I like writing C extensions for Ruby. In this series of blog posts we’re
going to explore writing C extensions. I will cover topics including setting
up the development environment, TDD, debugging techniques, dealing with
Ruby’s garbage collector, cross compiling for windows, and more.

By the end of this series, we should end up with a Ruby C extension that wraps
libstree. libstree is a Suffix tree implementation written in C.

In this part, we’re going to set up our development environment, examine the
layout of a typical C extension, and implement our first method in C. Of
course we will be doing this TDD, so we’ll also get autotest running.

Prerequisite gems

First up, we need to install a few gems to make building our extension easier.
Install the following three gems, and while they install, you should read about
why we need them:

$ sudo gem install ZenTest hoe rake-compiler

ZenTest

ZenTest contains autotest, which we’ll be using to automatically run the tests
while we’re developing.

hoe

Hoe abstracts gem specifications for us. It knows how to properly build a
gemspec, and provides us with a few rake tasks that make development simple.

rake-compiler

This gem provides us with compilation tasks, and generally makes building
native gems easier. We’ll be looking further in to rake-compiler’s
capabilities in later articles.

Create the project

We’re going to call this gem “stree”. The first thing we’ll do is use the “sow”
command supplied by Hoe to create the initial project structure.

$ sow stree

You should now have an initial project tree set up under the “stree” directory.
Remove the “bin” directory, as we won’t need that. I rename all of my
documentation files to end in “rdoc”, but that is just my personal preference.

Writing our first test

First thing we need to do is write our first failing test. Open up
“test/test_stree.rb” and make it look like this:

require "test/unit"
require "stree"

class TestStree < Test::Unit::TestCase
  def test_hello_world
    assert_equal 'hello world', Stree.hello_world
  end
end

This test is very simple. The trick though, is that the “hello_world” method
will be implemented in C. At this point, you should be able to run “rake” and
see a failing test.

Native extension project layout

Native extension layouts look very similar to normal pure ruby layouts. We just
add one more directory called “ext”. Under the “ext” directory we’ll add
another directory that is the same name as our gem, “stree”. Under
“ext/stree” is where we’ll keep all of our C code. Make those directories,
and you should have a file list that looks similar to this:

$ tree
.
|-- CHANGELOG.rdoc
|-- Manifest.txt
|-- README.rdoc
|-- Rakefile
|-- ext
|   `-- stree
|-- lib
|   `-- stree.rb
`-- test
    `-- test_stree.rb

The next step is to modify our Rakefile.

Modifying the Rakefile

The next step is to modify the Rakefile to teach it how to compile our
extension. Once we get done with this step, our Rakefile will have a task
called “compile”.

Modify your Rakefile so that it looks similar to this:

require 'rubygems'
require 'hoe'
require 'rake/extensiontask' # from the rake-compiler gem

Hoe.spec 'stree' do
  developer('Aaron Patterson', 'aaron@tenderlovemaking.com')
  self.readme_file   = 'README.rdoc'
  self.history_file  = 'CHANGELOG.rdoc'
  self.extra_rdoc_files  = FileList['*.rdoc']
  self.extra_dev_deps << ['rake-compiler', '>= 0']
  self.spec_extras = { :extensions => ["ext/stree/extconf.rb"] }

  Rake::ExtensionTask.new('stree', spec) do |ext|
    ext.lib_dir = File.join('lib', 'stree')
  end
end

Rake::Task[:test].prerequisites << :compile

I've modified the readme and history file sections to use custom named files.
The important parts are the "spec_extras", the "Rake::ExtensionTask" line and
the "Rake::Task" line.

The "spec_extras" line modifies the gemspec. When someone installs our gem,
this line tells the gem command to execute the "ext/stree/extconf.rb" file.
We'll talk a little bit more about the extconf.rb file later.

The "Rake::ExtensionTask" is the line where we get our "compile" task. It comes
from the rake-compiler gem. This block also configures rake-compiler to tell
it where to copy the compiled extension when it's finished. We want our
compiled extension to end up in "lib/stree/". This is my convention, and I'll
explain why this convention is good in later posts.

The final line tells Rake to always compile our extension before the tests run.
Some people might not want to use this, but I like compiling my extension
before every test run.

Configuring autotest

Autotest doesn't use the normal rake tasks when running your tests. That means
we need to teach autotest to compile our extension before running the tests.
We're going to hook in to the autotest run command and have it build our
extension before running the tests.

While we're at it, we'll also teach autotest to run the tests after any .c
files get modified.

Open up ".autotest" and make it look like this:

require 'autotest/restart'

Autotest.add_hook :initialize do |at|
  at.add_mapping(/.*\.c/) do |f, _|
    at.files_matching(/test_.*rb$/)
  end
end

Autotest.add_hook :run_command do |at|
  system "rake clean compile"
end

Start up autotest and let it run in the background. By the end of this blog
post, autotest should show one passing test.

At this point, we should see Rake complaining with a message:

rake aborted!
Don't know how to build task 'ext/stree/extconf.rb'

Let's deal with that error now.

extconf.rb

The responsibility of the extconf.rb file is to generate a Makefile that will
be used to build your extension. Eventually, we will need to teach extconf.rb
how to examine the target system to make sure that the libstree library is
installed.

Right now, we don't need to do any inspection of the system. We simply want
to create a Makefile. To build our Makefile, we're going to use a library
that ships with ruby called "mkmf". Open up "ext/stree/extconf.rb" and
modify it to look like this:

require 'mkmf'
create_makefile('stree/stree')

That is the minimum code required to get our Makefile generated.

As I mentioned earlier, the extconf.rb file is executed by the RubyGems system
when installing our gem. While we're developing our gem, rake-compiler will
take care of executing that file for us.

Our first C code

Great! Our environment knows how to compile things, but we don't have anything
to compile! Let's write our first bit of C code.

We are going to write the file named "ext/stree/stree.c". The name of this
file is important. It corresponds to the "create_makefile" line from our
extconf.rb. After our extension is built, we'll end up with a file
"lib/stree/stree.bundle" (or .so depending on your system). This convention is
important, but I'm going to talk about why it's important in a later post.

When Ruby loads the dynamic library we're building, it must supply us with
some way to define our native methods. The way it does this is with another
naming convention using the dynamic library's file name. When "stree.bundle"
is loaded, ruby will automatically try to call a function called "Init_stree".
The second part matches the name of the file it loaded. In the Init_stree
function is where we'll do our native extension initialization.
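
The naming convention can be sketched in a couple of lines of Ruby. This is only an illustration of the rule, not Ruby's actual loading code, and init_function_for is a made-up helper name.

```ruby
# Hypothetical helper: given the path of a compiled extension, work out the
# name of the C function Ruby will look for when it loads that file.
def init_function_for(path)
  "Init_#{File.basename(path, '.*')}"
end

puts init_function_for('lib/stree/stree.bundle')
puts init_function_for('lib/stree/stree.so')
```

Both paths give the same function name, which is why the same C source works no matter which extension your platform uses for dynamic libraries.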

In stree.c, define the Init_stree function to look like this:

#include <ruby.h>
void Init_stree()
{
  VALUE mStree = rb_define_module("Stree");
  rb_define_singleton_method(mStree, "hello_world", hello_world, 0);
}

This function does two things: it defines the "Stree" module, and it defines
the "hello_world" method on the Stree module.

The first line actually defines the module, the second line tells ruby to
define the singleton method "hello_world", and when that method gets called,
to call the "hello_world" C function pointer. The 0 indicates the number of
arguments.

Let's actually add the hello_world C function now:

static VALUE hello_world(VALUE mod)
{
  return rb_str_new2("hello world");
}

We declare this function as static because it's not needed outside this file.
All ruby methods must return a VALUE. The first argument to the C function is
always the recipient of the message, in this case it will be the Stree module.

We create a new ruby string with rb_str_new2() and return it.

The final stree.c file should look like this:

#include <ruby.h>

static VALUE hello_world(VALUE mod)
{
  return rb_str_new2("hello world");
}

void Init_stree()
{
  VALUE mStree = rb_define_module("Stree");
  rb_define_singleton_method(mStree, "hello_world", hello_world, 0);
}
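
For comparison, here is roughly what that C code amounts to if we wrote it in plain Ruby. This is just a mental model for reading the C, not something to add to the project.

```ruby
# Pure Ruby equivalent of the C extension above, for comparison only.
module Stree               # rb_define_module("Stree")
  def self.hello_world     # rb_define_singleton_method(..., "hello_world", ..., 0)
    'hello world'          # rb_str_new2("hello world")
  end
end

puts Stree.hello_world
```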

A note on types.

When we write ruby, everything is an object. When we write ruby in C,
everything is a VALUE. We'll learn more about the VALUE type in later posts.

Finishing up

We've written our C code, everything should now compile and copy in to the
right place, but our tests are still failing. What gives?

We've got one more tiny modification to make. We need to actually require
the dynamic library that we built. Open up "lib/stree.rb" and modify it to
look like this:

require 'stree/stree'

module Stree
  VERSION = '1.0.0'
end

We've now told ruby to load the dynamic library we built, and changed the
definition of Stree to a module in our ruby code. At this point, our test
should be passing. Congratulations! You have now successfully mixed C and
Ruby code.

If you were successful, your project tree should look like this:

$ tree -I tmp
.
|-- CHANGELOG.rdoc
|-- Manifest.txt
|-- README.rdoc
|-- Rakefile
|-- ext
|   `-- stree
|       |-- extconf.rb
|       `-- stree.c
|-- lib
|   |-- stree
|   |   `-- stree.bundle
|   `-- stree.rb
`-- test
    `-- test_stree.rb

The "tmp" directory is where rake-compiler stashes your .o files when
compiling your extension. I've omitted that from the tree to keep it short.

Last notes

I've posted the code for stree here in case you're having troubles. I've
made tags for each post so you can follow along.

Next time, we'll tackle making sure libstree is installed, compiling and
linking against libstree, and making a few calls in to libstree from Ruby.

In the meantime, your homework is to read through README.EXT.

Rubyconf 2009 Slides

Posted by – December 7, 2009

I’ve posted the slides for the talk that Ryan and I gave at Rubyconf 2009.

You can grab the pdf here.

I’ve also put them on slideshare here.

I replaced the videos in the slides with links to youtube. If you want to jump straight to the videos though, you can find them here, here, and here.

Full Text Search on Heroku

Posted by – October 17, 2009

YA!! IT’S SATURDAY NIGHT! YOU ALL KNOW WHAT THAT MEANS! Time to get krunk and do some full text searching. OW! I’d like to share with my tens of loyal readers how I’m doing Full Text Search on Heroku.

Heroku’s documentation lists two ways to get full text indexing working with your Heroku application. They talk about using Ferret and Solr for full text indexes. The Ferret option looks OK, but it requires you to rebuild your indexes every time you push. Solr would work, but it requires an EC2 instance or some third party server. Since my budget is precisely $0, using Solr is out of the picture.

But there is a third option. A very secret option. A devious but fun option. You see, Heroku runs PostgreSQL for each rails application database. They’re running a version new enough (Version 8.3) to have full text index support built in. If we’re willing to throw out database agnosticism, we can take advantage of the database’s indexing capability. For this article, I’d like to hop on the Postgres train and show you how to get full text indexes working with Postgres in your rails application. I’ll also show you how to get those indexes on Heroku so we can use them “in the cloud” (Heroku is in the cloud, right?).

For the rest of this article, I’m going to assume you have PostgreSQL version 8.3 or higher installed already and can get your rails application working with Postgres. Installing postgres is outside the scope of this article, but I found these instructions to be very helpful.

Step 1: Go get some coffee

I love it when instructions tell me to go get some coffee because I always do. I have to follow the instructions right?

Step 2: Install Texticle

Texticle is a gem I wrote to help you define your text indexes on a per model basis. To install texticle, we just do the normal gem install:

  $ sudo gem install texticle

The gem is pure ruby and isn’t very long, so I encourage you to peek through the source.

While we’re at it, we should configure rails to load the texticle gem. We need to add it to our environment.rb file. Here’s what mine looks like:

RAILS_GEM_VERSION = '2.3.4' unless defined? RAILS_GEM_VERSION

require File.join(File.dirname(__FILE__), 'boot')

Rails::Initializer.run do |config|
  config.time_zone = 'UTC'

  config.gem 'texticle'
end

Texticle also comes with some handy rake tasks (which we’ll talk about later). In order to get those we’ll need to update the rails Rakefile:

require(File.join(File.dirname(__FILE__), 'config', 'boot'))

require 'rake'
require 'rake/testtask'
require 'rake/rdoctask'

require 'tasks/rails'

require 'rubygems'

## Our texticle rake tasks
require 'texticle/tasks'

Step 3: Configuring your index

Let’s pretend we have an Article model. The Article model has a “title” field and a “body” field:

class CreateArticles < ActiveRecord::Migration
  def self.up
    create_table :articles do |t|
      t.string :title
      t.text   :body

      t.timestamps
    end
  end

  def self.down
    drop_table :articles
  end
end

To index those two fields, we just create an index block in the model and list the fields we want to index:

class Article < ActiveRecord::Base
  index do
    title
    body
  end
end

Declaring this index automatically defines a “search” method on the model that we can use to search our articles:

>> Article.search('coffee instruction')
=> [#<Article id: 4, title: "coffee", body: "I like getting coffee to be in instructions", created_at: "2009-10-17 21:42:13", updated_at: "2009-10-17 21:42:13">]
>> Article.create(:title => 'kittens', :body => 'kitten poop smells bad, but I still like kittens.')
=> #<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">
>> Article.search('kittens')
=> [#<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">]
>>

Great! We can search our records. There’s just one catch: we haven’t indexed our data. Doing these types of searches will be slow against large sets of data unless we add an index. Writing these indexes is a PITA, so texticle comes with a handy rake task for generating a migration to create your indexes:

  $ rake texticle:migration
  $ rake db:migrate

After running this, Postgres can use the prebuilt indexes when searching your data.

Just remember: every time you modify columns in your index block, or add new index blocks, you should create a new migration to update the indexes. If you don’t update the indexes, searches will still work as expected, they just might be kind of slow.
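
To give a rough idea of what such an index looks like on the Postgres side, here is a sketch that builds a CREATE INDEX statement for a full text expression index. The helper name and the index name are made up, and the SQL that texticle actually generates may well differ. The to_tsvector function and GIN indexes are standard PostgreSQL features.

```ruby
# Hypothetical helper: build SQL for a full text index over several columns.
# NOT texticle's generated SQL, just the general shape of such an index.
def fulltext_index_sql(table, columns, dictionary = 'english')
  # coalesce keeps a NULL column from turning the whole document into NULL
  document = columns.map { |c| "coalesce(#{c}, '')" }.join(" || ' ' || ")
  "CREATE INDEX #{table}_fts_idx ON #{table} " \
    "USING gin(to_tsvector('#{dictionary}', #{document}))"
end

sql = fulltext_index_sql('articles', %w[title body])
puts sql
```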

Step 4: Integrating With Heroku

This part is pretty easy. First we update our heroku gem manifest:

  $ echo "texticle" >> .gems
  $ git add .gems
  $ git commit -m'updating gem manifest'
  $ git push origin master

Once your code is up on heroku, just tell heroku to migrate the database:

  $ heroku rake db:migrate

It’s just that easy! Your indexes should be available on the Heroku database server and your application can use them.

Advanced Texticle Usage

Texticle has a few more features I’d like to briefly mention. The first one is search ranking. We can tell Postgres which field has a higher priority. For example, we can tell Postgres to weigh matches in the article’s title higher than matches in the body:

class Article < ActiveRecord::Base
  index do
    title 'A'
    body  'B'
  end
end

The ranks are ‘A’ through ‘D’, and multiple fields can have the same rank.
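
On the Postgres side, ranks like these are expressed with the setweight function. Here is a sketch of how a weighted search document could be assembled. The helper name is made up and texticle’s real SQL may differ, but setweight and to_tsvector are standard PostgreSQL functions.

```ruby
# Hypothetical helper: build a weighted tsvector expression from a hash of
# column name => rank ('A' through 'D').
def weighted_document_sql(weights, dictionary = 'english')
  weights.map { |column, weight|
    "setweight(to_tsvector('#{dictionary}', coalesce(#{column}, '')), '#{weight}')"
  }.join(' || ')
end

document = weighted_document_sql({ 'title' => 'A', 'body' => 'B' })
puts document
```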

We can also group indexes. The index we’ve seen so far will search all columns listed. We can add another index so that we only search the “title” field:

class Article < ActiveRecord::Base
  index do
    title 'A'
    body  'B'
  end

  index('title') { title }
end

This gives us a “search_title” method in addition to the “search” method:

>> Article.search_title('kittens')
=> [#<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">]
>>

The last thing I want to mention is “rank”. When you perform a search, texticle adds an extra field to your model called “rank”. The rank indicates how well your record matched the search criteria:

>> Article.search('like').map { |x| x.rank }
=> ["0.4", "0.4"]
>> Article.search('coffee').map { |x| x.rank }
=> ["1.4"]
>>

Search results are already returned sorted by rank in descending order, so no need to worry about sorting.

Conclusion

I hope you enjoy tickling text with texticle as much as I do. So far, I’ve been pretty happy with this solution.

Things I like:

  • It’s the right price for use with Heroku (namely $0)
  • Easy to configure and deploy
  • No need to rebuild indexes on pushes
  • Postgres can be configured to use different dictionaries, so you aren’t stuck with English

The only drawbacks I’ve found so far are:

  • INSERTs and UPDATEs are slower
  • It’s database specific

Inserts and updates will be slower, but that comes with the territory of adding database indexes. My data is mostly doing reads, so it doesn’t bother me. Texticle is database specific, but other databases are starting to have full text search support. I think texticle could be extended to support other databases, but I’m quite happy with postgres.

Anyway, thanks for reading. The final step is that you should go get another cup of coffee.

Ruby and RFID tags

Posted by – September 19, 2009

It’s been forever since I’ve written a blog entry, so LETS DO THIS. I want to talk about reading RFID tags with Ruby. I am a nerd, so even though I can’t think of a good application, I am compelled to be able to read RFID tags. I love programming Ruby, so of course, I have to do this with Ruby.

Getting an RFID Reader

First thing to do, is buy an RFID reader. After searching around, I found the touchatag reader. I bought the touchatag starter pack. It’s only $40, USB, and comes with 10 RFID tags. Most importantly, it works well with libnfc (more about that later).

The tags that come with the reader have an adhesive back, so you can stick them to stuff. They also have the unique identifier printed on them so that you can make sure your program output is correct.

Interfacing with the reader

Now that we’ve got the reader, let’s do something with it! I mentioned earlier that the touchatag reader works with libnfc. Libnfc is a C library that knows how to work with NFC devices (nerd talk for “RFID readers”). I’ve written a gem called nfc that wraps up the C library in to something we can use in Ruby.

First thing we need to do is install libnfc. I use macports with OS X. With macports, installing libnfc is quite easy:

    $ sudo port install libnfc

Installing on linux should be just as easy, but you’ll need to consult your package manager. Make sure to install the devel packages too!

After that, simply install the nfc Ruby gem:

    $ sudo gem install nfc

Now that that is out of the way, we can actually read an RFID tag. Here is our code:

require 'rubygems'
require 'nfc'
# Find a tag
NFC.instance.find do |tag|
  # Print out the tag we find
  p tag
end

That’s it! Run the code, then touch a tag to the reader, and boom! We have output. With the tag I’m using, the output looks like this:

$ ruby -I lib test.rb
(NFC) ISO14443A Tag
 ATQA (SENS_RES): 00  44
    UID (NFCID1): 04  D7  62  91  21  25  80
   SAK (SEL_RES): 00

The important part of this output is the UID field. That field is the unique identifier for this tag. The identifier comes back as a list of integers, but they are printed on the tag as hex. We can adjust the program just a little bit to see that list, or to get the same string that’s printed on the tag:

# Find a tag
NFC.instance.find do |tag|
  # Examine the raw numbers
  p tag.uid
  # Get just the UID as a string
  puts tag.to_s
end

The output looks like this:

$ ruby -I lib test.rb
[4, 215, 98, 145, 33, 37, 128]
04D76291212580
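
If you only have the list of integers, turning it into the string printed on the tag is a one-liner. This sketch doesn’t need the reader or the nfc gem; uid_to_hex is a made-up helper name.

```ruby
# Hypothetical helper: format a list of UID bytes as the hex string
# printed on the tag.
def uid_to_hex(bytes)
  bytes.map { |byte| '%02X' % byte }.join
end

puts uid_to_hex([4, 215, 98, 145, 33, 37, 128])
```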

That’s pretty much it. Unfortunately, I can’t think of anything fun to do with my tags, but maybe you can! I hooked my tags up to the “say” command that comes with OS X and made each tag say something different.
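
A sketch of the “say” idea, in case you want to try it. The PHRASES table, the phrases, and the helper names are all made up; only the UID comes from the output above. The lookup runs without a reader attached, and the NFC call is left as a comment.

```ruby
# Hypothetical lookup table: tag UID => what to say. The phrase is made up.
PHRASES = {
  '04D76291212580' => 'hello from the first tag'
}

def phrase_for(uid)
  PHRASES.fetch(uid, 'unknown tag')
end

# Speak the phrase for a tag using the "say" command that ships with OS X.
def speak(uid)
  system 'say', phrase_for(uid)
end

# With a reader attached and the nfc gem loaded:
#   NFC.instance.find { |tag| speak tag.to_s }

puts phrase_for('04D76291212580')
```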

Non-Blocking NFC interaction

Our previous example blocked until an RFID tag was read. If you run the program without having an RFID tag on the reader, it will just sit there until it can read a tag. Sometimes we might want to tell whether or not there is a tag on the reader right now. In other words, we don’t want our program to block.

Calling find without providing a block will return immediately:

p NFC.instance.find.to_s

You’ll get a return value immediately. The tag returned will either contain a blank UID, or an actual UID. Here is the output from running it once with a tag sitting on the reader, and once without:

$ ruby -I lib test.rb
"04D76291212580"
$ ruby -I lib test.rb
""

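Since a blank string means “no tag right now”, the non-blocking call can be wrapped in a polling loop that gives up after a while. This is a sketch under assumptions: wait_for_tag is a made-up helper, and it takes any callable that returns a UID string so it can be tried without hardware. With a real reader, the callable could be -> { NFC.instance.find.to_s }.

```ruby
# Poll a reader a fixed number of times, returning the first non-blank UID,
# or nil if no tag shows up. `read_uid` is any callable returning a String.
def wait_for_tag(read_uid, attempts: 10, delay: 0.5)
  attempts.times do
    uid = read_uid.call
    return uid unless uid.empty?
    sleep delay
  end
  nil
end

# Stand-in reader for trying this without hardware: blank twice, then a UID.
responses = ["", "", "04D76291212580"]
fake_reader = -> { responses.shift || "" }

p wait_for_tag(fake_reader, attempts: 5, delay: 0)  # "04D76291212580"
```
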
Conclusion

That’s pretty much it. Interacting with the touchatag reader is quite simple and straightforward. Currently the nfc gem supports reading ISO14443A tags (the tags that come with the reader). The reader should be able to read other tag types, but I haven’t had a chance to get other tags to test.

Touchatag provides an official API for their readers, but the API seems difficult to use and depends on a network connection.

Here is a video of me reading some tags.
Here is the code from the video.
Here you can find more photos of the reader.
Finally, here is the source of the NFC gem.

Have fun reading some RFID tags!

String Encoding in Ruby 1.9 C extensions

Posted by – June 26, 2009

One of the challenges of developing nokogiri has been dealing with String encodings in C. I would like to present one of the problems encountered, along with a solution. I will be using RubyInline in the examples below, but the C code presented should be easy to port to your own C extensions.

Examining the Encoding

If you’ve developed a C extension before, you’re probably familiar with rb_str_new2 and friends. They all basically turn a char * into a string VALUE. But in Ruby 1.9, what is the encoding of the returned Ruby String? Well, using RubyInline, it’s easy enough to see by calling the “encoding” method. Here is a script that works in Ruby 1.8 and Ruby 1.9:

require 'rubygems'
require 'inline'

class HelloWorld
  inline do |builder|
    builder.c '
      static VALUE test() {
        return rb_str_new2("Hello world");
      }
    '
  end
end

string = HelloWorld.new.test

if string.respond_to? :encoding
  puts string.encoding
else
  puts string
end

In Ruby 1.8, this outputs the string, and in 1.9 we see the encoding. In 1.9, the encoding returned is ASCII-8BIT. Now ASCII-8BIT may be the encoding that you want, but then again, it may not. In Nokogiri, the strings coming from libxml2 are already encoded according to the document declaration. So strings returned must be marked with the appropriate encoding. How can we update the encoding?

Changing the Encoding

In Ruby 1.9, we get a few new functions specifically for dealing with encoding. These functions are defined in <ruby/encoding.h>. We’re going to be dealing with two of them: rb_enc_find_index and rb_enc_associate_index.

The first function, rb_enc_find_index, looks up the index of an encoding by name. It takes a char * like “UTF-8” and returns a magic index number for that encoding.

The second function, rb_enc_associate_index, will associate a string held in a VALUE with the encoding index returned from the first function.
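
For intuition, the rough Ruby-level counterparts of these two steps are Encoding.find and String#force_encoding. The sketch below runs on Ruby 1.9 or later and only illustrates the idea that associating an encoding changes the label on the string, not its bytes; tag_with_encoding is a made-up helper name.

```ruby
# Re-tag a copy of `string` with the named encoding; the bytes are untouched.
def tag_with_encoding(string, name)
  encoding = Encoding.find(name)       # look the encoding up by name
  string.dup.force_encoding(encoding)  # associate it with the string
end

raw    = "caf\xC3\xA9".b               # five bytes tagged as binary
tagged = tag_with_encoding(raw, "UTF-8")

p raw.bytes == tagged.bytes            # true: same bytes
p raw.length                           # 5 (counted as bytes)
p tagged.length                        # 4 (counted as characters)
p tagged.encoding == Encoding::UTF_8   # true
```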

Armed with this knowledge, we can modify our original program to return a string encoded with UTF-8. The only modifications are to include <ruby/encoding.h>, get the index for the desired encoding, then associate the VALUE with the returned index:

require 'rubygems'
require 'inline'

class HelloWorld
  inline do |builder|
    builder.include "<ruby/encoding.h>"

    builder.c '
      static VALUE test() {
        VALUE string = rb_str_new2("Hello World");
        int enc = rb_enc_find_index("UTF-8");
        rb_enc_associate_index(string, enc);
        return string;
      }
    '
  end
end

string = HelloWorld.new.test

if string.respond_to? :encoding
  puts string.encoding
else
  puts string
end

Great! When this is run under Ruby 1.9, the encoding returned is UTF-8. Unfortunately, this example is now specific to Ruby 1.9. Ruby 1.8 does not ship with the correct header files, and definitely does not include the functions for looking up and assigning encoding. This code will just not work under Ruby 1.8. Luckily, this code can be refactored to work under either version of Ruby.

Refactoring for 1.8 Support

Both Ruby 1.8 and 1.9 provide a <ruby.h> header file. The Ruby 1.9 version of that file defines a constant HAVE_RUBY_ENCODING_H that lets us determine whether the proper header file exists. Our final attempt tests for the encoding constant, then defines a macro to wrap rb_str_new2. If the version of Ruby we compile against has encoding support, the macro can add the encoding to the string; otherwise, it just ignores the encoding:

require 'rubygems'
require 'inline'

class HelloWorld
  inline do |builder|

    builder.prefix <<-eoc
#include <ruby.h>

#ifdef HAVE_RUBY_ENCODING_H

#include <ruby/encoding.h>

#define ENCODED_STR_NEW2(str, encoding) \
  ({ \
    VALUE _string = rb_str_new2((const char *)str); \
    int _enc = rb_enc_find_index(encoding); \
    rb_enc_associate_index(_string, _enc); \
    _string; \
  })

#else

#define ENCODED_STR_NEW2(str, encoding) \
  rb_str_new2((const char *)str)

#endif
    eoc

    builder.c '
      static VALUE test() {
        return ENCODED_STR_NEW2("Hello world", "UTF-8");
      }
    '
  end
end

string = HelloWorld.new.test

if string.respond_to? :encoding
  puts string.encoding
else
  puts string
end

In 1.8, the macro just returns the new string. In 1.9, the macro returns the string and additionally sets the encoding. Now if we use this macro wherever we create new strings, we’ll be working well with 1.8 and 1.9!

Final Notes

This example was slightly simplified. Since the encoding index is determined at runtime, there could be problems. If rb_enc_find_index cannot find the requested encoding, it simply returns -1. The macro should handle that case.
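
The shape of that guard can be sketched in plain Ruby, where an unknown encoding name surfaces as an ArgumentError from Encoding.find rather than a -1 return value. This is an analogy for the missing check, not the macro itself, and tag_if_known is a made-up helper name.

```ruby
# Tag a copy of `string` with the named encoding when Ruby knows it.
# Otherwise return the copy with its encoding unchanged, which mirrors a
# macro that skips rb_enc_associate_index when the lookup fails.
def tag_if_known(string, name)
  string.dup.force_encoding(Encoding.find(name))
rescue ArgumentError
  string.dup  # unknown encoding name: leave the encoding alone
end

p tag_if_known("Hello world".b, "UTF-8").encoding == Encoding::UTF_8              # true
p tag_if_known("Hello world".b, "NO-SUCH-ENCODING").encoding == Encoding::BINARY  # true
```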

Also, if you’re playing along at home, remember to save the file between running it with 1.8 and 1.9. RubyInline examines the mtime of the ruby file, and will only recompile when the rb file has been written to. That means if you run it with 1.8, then immediately run again with 1.9, it won’t recompile it for 1.9. I suppose I should send in a patch. ;-)

One last thing… There may be better ways to do this. I needed to determine the encoding at runtime because XML files declare their encoding scheme. If you parse an XML file that declares its encoding as EUC-JP, it would make sense that the strings you pull out are encoded in EUC-JP, right? If you know that you’re always going to be returning UTF-8 strings from your C extensions, it could be a different story. Either way, using macros and checking for constants should make sure your code works with 1.8 or 1.9.

Easy Markup Validation

Posted by – June 12, 2009

I wanted a test helper that would assert that my XHTML was valid XHTML. So I wrote one and called it “markup_validity”. You can use it too, and I will show you how.

First, install the gem:

  $ sudo gem install markup_validity

Then, use it in your tests:

require 'test/unit'
require 'rubygems'
require 'markup_validity'

class ValidHTML < Test::Unit::TestCase
  def test_i_can_has_valid_xhtml
    assert_xhtml_transitional xhtml_document
  end
end

Oh. You use RSpec? It supports that too:

require 'rubygems'
require 'markup_validity'

describe "my XHTML document" do
  it "can has transitional xhtml" do
    xhtml_document.should be_xhtml_transitional
  end
end

Debugging invalid markup can be a pain. MarkupValidity tries to give you helpful errors to make your life easier. Say you have an invalid piece of XHTML like this:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  </head>
  <body>
    <p>
      <p>
        Hello
      </p>
    </p>
  </body>
</html>

The error output from MarkupValidity will be this:

.Error on line: 2:
Element 'head': Missing child element(s). Expected is one of ( script, style, meta, link, object, isindex, title, base ).

1: <html xmlns="http://www.w3.org/1999/xhtml">
2:   <head>
3:   </head>
4:   <body>
5:     <p>

Error on line: 6:
Element 'p': This element is not expected. Expected is one of ( a, br, span, bdo, object, applet, img, map, iframe, tt ).

5:     <p>
6:       <p>
7:         Hello
8:       </p>
9:     </p>

MarkupValidity provides a few assertions for test/unit:

  • assert_xhtml_transitional(xhtml) for asserting valid transitional XHTML
  • assert_xhtml_strict(xhtml) for asserting valid strict XHTML
  • assert_schema(schema, xml) for asserting that your xml validates against a schema
  • assert_xhtml which is an alias for assert_xhtml_transitional

The methods provided for RSpec are quite similar:

  • be_xhtml_transitional for asserting valid transitional XHTML
  • be_xhtml_strict for asserting valid strict XHTML
  • be_valid_with_schema(schema) for asserting that your xml validates against a schema
  • be_xhtml which is an alias for be_xhtml_transitional

MarkupValidity even works well with Rails. Here is an example Rails controller test:

require 'test_helper'
require 'markup_validity'

class AwesomeControllerTest < ActionController::TestCase
  test "valid markup" do
    get :new
    assert_xhtml_transitional @response.body
  end
end

Autotest and Vim integration

Posted by – May 18, 2009

Yay! I got Vim and autotest integration working. When I run autotest, if there is an error, I can have Vim read the errors from autotest and jump me to the right place.

Here is a video of me using it:

Please note that I’m not copying and pasting anything. In Vim, I hit a command and Vim automatically picks up errors from autotest and jumps me to the line where the error occurred.

You too can impress your friends with this trick! Here’s how:

  1. Make sure you have vim-ruby installed
  2. Use this as your .autotest file:
    require 'autotest/restart'
    
    Autotest.add_hook :initialize do |at|
      at.unit_diff = 'cat'
    end
    
    Autotest.add_hook :ran_command do |at|
      File.open('/tmp/autotest.txt', 'wb') { |f|
        f.write(at.results.join)
      }
    end
    
  3. Add this to your .vimrc:

    compiler rubyunit
    nmap <Leader>fd :cf /tmp/autotest.txt<cr> :compiler rubyunit<cr>

Now when you get an error in autotest, just type “\fd” in Vim to jump straight to your first error.

The contents of /tmp/autotest.txt will be used as your errorfile. In Vim do “:help quickfix” for more info on what you can do with your newfound power.

Caveat: You don’t get unit_diff. I’m working on that. Any help would be much appreciated (I suck at errorformat in Vim).