Compiling with Clang

Posted by Aaron Patterson on January 03, 2010

HI EVERYONE AND HAPPY SUNDAY!

Lately I’ve been trying to compile my ruby extensions with Clang. One reason I like trying out my extensions with Clang is because it catches some errors that GCC doesn’t. If you know the right things to set, it’s pretty easy to get your extension to compile with Clang. Unfortunately finding the right thing isn’t always easy, but I found the right bits to flip and I want to share!

Here’s how to do it. Add this line to your extconf.rb right after you require mkmf:

require 'mkmf'

RbConfig::MAKEFILE_CONFIG['CC'] = ENV['CC'] if ENV['CC']

# ... rest of your extconf goes here

Then when you compile your extension, just set CC to point at clang:

$ CC=/Developer/usr/bin/clang rake compile

You can see it in action in the nokogiri extconf. You can even see where clang helped me shake out some bugs, and I think that’s pretty cool.
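
If you want to double check which compiler your extension will end up with, a minimal sketch like the one below can help. The helper name effective_cc is made up for illustration and is not part of mkmf; it only mirrors the one-line override shown above.

```ruby
require 'rbconfig'

# Hypothetical helper (not part of mkmf): mirrors the extconf.rb override,
# returning the CC from the environment when it is set and the compiler
# Ruby was configured with otherwise.
def effective_cc(env = ENV, config = RbConfig::MAKEFILE_CONFIG)
  env['CC'] || config['CC']
end

puts "Extensions will be compiled with: #{effective_cc}"
```

Passing the environment and the config in as arguments is just a convenience so the logic can be checked without touching the real ENV.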

Full Text Search on Heroku

Posted by Aaron Patterson on October 17, 2009

YA!! IT’S SATURDAY NIGHT! YOU ALL KNOW WHAT THAT MEANS! Time to get krunk and do some full text searching. OW! I’d like to share with my tens of loyal readers how I’m doing Full Text Search on Heroku.

Heroku’s documentation lists two ways to get full text indexing working with your Heroku application. They talk about using Ferret and Solr for full text indexes. The Ferret option looks OK, but it requires you to rebuild your indexes every time you push. Solr would work, but it requires an EC2 instance or some third party server. Since my budget is precisely $0, using Solr is out of the picture.

But there is a third option. A very secret option. A devious but fun option. You see, Heroku runs PostgreSQL for each rails application database. They’re running a version new enough (Version 8.3) to have full text index support built in. If we’re willing to throw out database agnosticism, we can take advantage of the database’s indexing capability. For this article, I’d like to hop on the Postgres train and show you how to get full text indexes working with Postgres in your rails application. I’ll also show you how to get those indexes on Heroku so we can use them “in the cloud” (Heroku is in the cloud, right?).

For the rest of this article, I’m going to assume you have PostgreSQL version 8.3 or higher installed already and can get your rails application working with Postgres. Installing postgres is outside the scope of this article, but I found these instructions to be very helpful.

Step 1: Go get some coffee

I love it when instructions tell me to go get some coffee because I always do. I have to follow the instructions right?

Step 2: Install Texticle

Texticle is a gem I wrote to help you define your text indexes on a per model basis. To install texticle, we just do the normal gem install:

  $ sudo gem install texticle

The gem is pure ruby and isn’t very long, so I encourage you to peek through the source.

While we’re at it, we should configure rails to load the texticle gem. We need to add it to our environment.rb file. Here’s what mine looks like:

RAILS_GEM_VERSION = '2.3.4' unless defined? RAILS_GEM_VERSION

require File.join(File.dirname(__FILE__), 'boot')

Rails::Initializer.run do |config|
  config.time_zone = 'UTC'

  config.gem 'texticle'
end

Texticle also comes with some handy rake tasks (which we’ll talk about later). In order to get those we’ll need to update the rails Rakefile:

require(File.join(File.dirname(__FILE__), 'config', 'boot'))

require 'rake'
require 'rake/testtask'
require 'rake/rdoctask'

require 'tasks/rails'

require 'rubygems'

## Our texticle rake tasks
require 'texticle/tasks'

Step 3: Configuring your index

Let’s pretend we have an Article model. The Article model has a “title” field and a “body” field:

class CreateArticles < ActiveRecord::Migration
  def self.up
    create_table :articles do |t|
      t.string :title
      t.text   :body

      t.timestamps
    end
  end

  def self.down
    drop_table :articles
  end
end

To index those two fields, we just create an index block in the model and list the fields we want to index:

class Article < ActiveRecord::Base
  index do
    title
    body
  end
end

Declaring this index automatically defines a “search” method on the model that we can use to search our articles:

>> Article.search('coffee instruction')
=> [#<Article id: 4, title: "coffee", body: "I like getting coffee to be in instructions", created_at: "2009-10-17 21:42:13", updated_at: "2009-10-17 21:42:13">]
>> Article.create(:title => 'kittens', :body => 'kitten poop smells bad, but I still like kittens.')
=> #<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">
>> Article.search('kittens')
=> [#<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">]
>>

Great! We can search our records. There’s just one catch: we haven’t indexed our data. Doing these types of searches will be slow against large sets of data unless we add an index. Writing these indexes is a PITA, so texticle comes with a handy rake task for generating a migration to create your indexes:

  $ rake texticle:migration
  $ rake db:migrate

After running this, Postgres can use the prebuilt indexes when searching your data.
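
For the curious, here is a rough sketch of the kind of SQL a full text index and a matching query look like in Postgres. This is an illustration of the general technique, not the SQL texticle actually generates: the helper names, the index name, and the choice of the 'english' dictionary are all assumptions.

```ruby
# Hypothetical helpers that build Postgres full text SQL as plain strings.
# This is a sketch of the general technique, not what texticle generates.

# Concatenate columns into one tsvector expression, treating NULL as ''.
def tsvector_expression(columns, dictionary = 'english')
  text = columns.map { |c| "coalesce(#{c}, '')" }.join(" || ' ' || ")
  "to_tsvector('#{dictionary}', #{text})"
end

# A GIN index over that expression lets Postgres avoid scanning every row.
def create_index_sql(table, columns)
  "CREATE INDEX #{table}_fts_idx ON #{table} " \
    "USING gin(#{tsvector_expression(columns)})"
end

# The query has to use the same expression for the index to be usable.
def search_sql(table, columns)
  "SELECT * FROM #{table} " \
    "WHERE #{tsvector_expression(columns)} @@ plainto_tsquery('english', ?)"
end

puts create_index_sql('articles', %w[title body])
puts search_sql('articles', %w[title body])
```

The point of the sketch is the pairing: the index is built over an expression, and searches only benefit when they use that same expression.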

Just remember: every time you modify columns in your index block, or add new index blocks, you should create a new migration to update the indexes. If you don’t update the indexes, searches will still work as expected, they just might be kind of slow.

Step 4: Integrating With Heroku

This part is pretty easy. First we update our heroku gem manifest:

  $ echo "texticle" >> .gems
  $ git add .gems
  $ git commit -m'updating gem manifest'
  $ git push origin master

Once your code is up on heroku, just tell heroku to migrate the database:

  $ heroku rake db:migrate

It’s just that easy! Your indexes should be available on the Heroku database server and your application can use them.

Advanced Texticle Usage

Texticle has a few more features I’d like to briefly mention. The first one is search ranking. We can tell Postgres which field has a higher priority. For example, we can tell Postgres to weigh matches in the article’s title higher than matches in the body:

class Article < ActiveRecord::Base
  index do
    title 'A'
    body  'B'
  end
end

The ranks are ‘A’ through ‘D’, and multiple fields can have the same rank.
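
Postgres itself expresses these priorities with its setweight function, which labels a tsvector with a weight from 'A' to 'D'. The sketch below builds such an expression as a string. The helper name is hypothetical, and I'm not claiming this is the exact SQL texticle emits.

```ruby
# Hypothetical helper: builds a weighted tsvector expression from a
# column => weight mapping, using Postgres' setweight function.
VALID_WEIGHTS = %w[A B C D].freeze

def weighted_tsvector(weights, dictionary = 'english')
  weights.map { |column, weight|
    raise ArgumentError, "bad weight #{weight}" unless VALID_WEIGHTS.include?(weight)
    "setweight(to_tsvector('#{dictionary}', coalesce(#{column}, '')), '#{weight}')"
  }.join(' || ')
end

puts weighted_tsvector({ 'title' => 'A', 'body' => 'B' })
```

An expression like this can then be handed to Postgres' ts_rank function along with the query to score each row.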

We can also group indexes. The index we’ve seen so far will search all columns listed. We can add another index so that we only search the “title” field:

class Article < ActiveRecord::Base
  index do
    title 'A'
    body  'B'
  end

  index('title') { title }
end

This gives us a “search_title” method in addition to the “search” method:

>> Article.search_title('kittens')
=> [#<Article id: 5, title: "kittens", body: "kitten poop smells bad, but I still like kittens.", created_at: "2009-10-17 21:42:33", updated_at: "2009-10-17 21:42:33">]
>>

The last thing I want to mention is “rank”. When you perform a search, texticle adds an extra field to your model called “rank”. The rank indicates how well your record matched the search criteria:

>> Article.search('like').map { |x| x.rank }
=> ["0.4", "0.4"]
>> Article.search('coffee').map { |x| x.rank }
=> ["1.4"]
>>

Search results are already returned sorted by rank in descending order, so no need to worry about sorting.

Conclusion

I hope you enjoy tickling text with texticle as much as I do. So far, I’ve been pretty happy with this solution.

Things I like:

  • It’s the right price for use with Heroku (namely $0)
  • Easy to configure and deploy
  • No need to rebuild indexes on pushes
  • Postgres can be configured to use different dictionaries, so you aren’t stuck with English

The only drawbacks I’ve found so far are:

  • INSERTs and UPDATEs are slower
  • It’s database specific

Inserts and updates will be slower, but that comes with the territory of adding database indexes. My data is mostly doing reads, so it doesn’t bother me. Texticle is database specific, but other databases are starting to have full text search support. I think texticle could be extended to support other databases, but I’m quite happy with postgres.

Anyway, thanks for reading. The final step is that you should go get another cup of coffee.

Ruby and RFID tags

Posted by Aaron Patterson on September 19, 2009

It’s been forever since I’ve written a blog entry, so LETS DO THIS. I want to talk about reading RFID tags with Ruby. I am a nerd, so even though I can’t think of a good application, I am compelled to be able to read RFID tags. I love programming Ruby, so of course, I have to do this with Ruby.

Getting an RFID Reader

The first thing to do is buy an RFID reader. After searching around, I found the touchatag reader. I bought the touchatag starter pack. It’s only $40, USB, and comes with 10 RFID tags. Most importantly, it works well with libnfc (more about that later).

The tags that come with the reader have an adhesive back, so you can stick them to stuff. They also have the unique identifier printed on them so that you can make sure your program output is correct.

Interfacing with the reader

Now that we’ve got the reader, let’s do something with it! I mentioned earlier that the touchatag reader works with libnfc. Libnfc is a C library that knows how to work with NFC devices (nerd talk for “RFID readers”). I’ve written a gem called nfc that wraps up the C library in to something we can use in Ruby.

First thing we need to do is install libnfc. I use macports with OS X. With macports, installing libnfc is quite easy:

    $ sudo port install libnfc

Installing on linux should be just as easy, but you’ll need to consult your package manager. Make sure to install the devel packages too!

After that, simply install the nfc Ruby gem:

    $ sudo gem install nfc

Now that that is out of the way, we can actually read an RFID tag. Here is our code:

require 'rubygems'
require 'nfc'
# Find a tag
NFC.instance.find do |tag|
  # Print out the tag we find
  p tag
end

That’s it! Run the code, then touch a tag to the reader, and boom! We have output. With the tag I’m using, the output looks like this:

$ ruby -I lib test.rb
(NFC) ISO14443A Tag
 ATQA (SENS_RES): 00  44
    UID (NFCID1): 04  D7  62  91  21  25  80
   SAK (SEL_RES): 00

The important part of this output is the UID field. That field is the unique identifier for this tag. The identifier comes back as a list of integers, but they are printed on the tag as hex. We can adjust the program just a little bit to see that list, or to get the same string that’s printed on the tag:

# Find a tag
NFC.instance.find do |tag|
  # Examine the raw numbers
  p tag.uid
  # Get just the UID as a string
  puts tag.to_s
end

The output looks like this:

$ ruby -I lib test.rb
[4, 215, 98, 145, 33, 37, 128]
04D76291212580
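
The relationship between the two lines of output is just hex formatting. Here is a small sketch that converts in both directions without needing a reader attached. The helper names are made up, and I'm only assuming the gem's to_s does something equivalent.

```ruby
# UID bytes as shown in the output above.
uid = [4, 215, 98, 145, 33, 37, 128]

# Format each byte as two uppercase hex digits and join them.
def uid_to_hex(bytes)
  bytes.map { |byte| format('%02X', byte) }.join
end

# Go the other way: split the printed string into pairs and parse as hex.
def hex_to_uid(string)
  string.scan(/../).map { |pair| pair.to_i(16) }
end

puts uid_to_hex(uid)            # 04D76291212580
p hex_to_uid('04D76291212580')  # [4, 215, 98, 145, 33, 37, 128]
```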

That’s pretty much it. Unfortunately, I can’t think of anything fun to do with my tags, but maybe you can! I hooked my tags up to the “say” command that comes with OS X and made each tag say something different.
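
Something along these lines would do the trick. It's a sketch, not the code I actually ran: the phrase table and the helper name are hypothetical, and the command is only built here, not executed, so it runs on any machine.

```ruby
# Hypothetical mapping from tag UID (as printed on the tag) to a phrase.
PHRASES = {
  '04D76291212580' => 'Hello from tag number one'
}.freeze

# Build the argument list for OS X's `say` command, or nil for unknown tags.
def say_command_for(uid)
  phrase = PHRASES[uid]
  phrase && ['say', phrase]
end

command = say_command_for('04D76291212580')
p command
# On OS X you could then run it with: system(*command) if command
```

Inside the NFC.instance.find block from earlier, you would pass tag.to_s to the helper.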

Non-Blocking NFC interaction

Our previous example blocked until an RFID tag was read. If you run the program without having an RFID tag on the reader, it will just sit there until it can read a tag. Sometimes we might want to tell whether or not there is a tag on the reader right now. In other words, we don’t want our program to block.

Calling find without providing a block will return immediately:

p NFC.instance.find.to_s

You’ll get a return value immediately. The tag returned will either contain a blank uid, or an actual UID. Here is the output run once with a tag sitting on the reader, and once without a tag:

$ ruby -I lib test.rb
"04D76291212580"
$ ruby -I lib test.rb
""

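Because the blockless call returns right away, you can build your own polling on top of it. The sketch below shows one way that might look. FakeReader is a stand-in I made up so the example runs without hardware, and poll_for_tag is a hypothetical helper, not something the nfc gem provides.

```ruby
# Stand-in for NFC.instance so the sketch runs without a reader attached.
# Each call to find returns the next canned UID string, then blanks.
class FakeReader
  def initialize(uids)
    @uids = uids
  end

  def find
    @uids.shift || ''
  end
end

# Hypothetical helper: poll a reader a limited number of times and return
# the first non-blank UID, or nil if no tag showed up.
def poll_for_tag(reader, attempts, delay = 0.1)
  attempts.times do
    uid = reader.find.to_s
    return uid unless uid.empty?
    sleep delay
  end
  nil
end

reader = FakeReader.new(['', '', '04D76291212580'])
p poll_for_tag(reader, 5, 0)
```

With a real reader you would pass NFC.instance instead of the fake, assuming its blockless find behaves as described above.
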
Conclusion

That’s pretty much it. Interacting with the touchatag reader is quite simple and straightforward. Currently the nfc gem supports reading ISO14443A tags (the tags that come with the reader). The reader should be able to read other tag types, but I haven’t had a chance to get other tags to test.

Touchatag provides an official API for their readers. But the API seems difficult and is dependent on a network connection.

Here is a video of me reading some tags.
Here is the code from the video.
Here you can find more photos of the reader.
Finally, here is the source of the NFC gem.

Have fun reading some RFID tags!

String Encoding in Ruby 1.9 C extensions

Posted by Aaron Patterson on June 26, 2009

One of the challenges of developing nokogiri has been dealing with String encodings in C. I would like to present one of the problems encountered, along with a solution. I will be using RubyInline in the examples below, but the C code presented should be easy to port to your own C extensions.

Examining the Encoding

If you’ve developed a C extension before, you’re probably familiar with rb_str_new2 and friends. They all basically turn a char * in to a string VALUE. But in Ruby 1.9, what is the encoding of the returned Ruby String? Well, using RubyInline, it’s easy enough to see by calling the “encoding” method. Here is a script that works in Ruby 1.8 and Ruby 1.9:

require 'rubygems'
require 'inline'

class HelloWorld
  inline do |builder|
    builder.c '
      static VALUE test() {
        return rb_str_new2("Hello world");
      }
    '
  end
end

string = HelloWorld.new.test

if string.respond_to? :encoding
  puts string.encoding
else
  puts string
end

In Ruby 1.8, this outputs the string, and in 1.9 we see the encoding. In 1.9, the encoding returned is ASCII-8BIT. Now ASCII-8BIT may be the encoding that you want, but then again, it may not. In Nokogiri, the strings coming from libxml2 are already encoded according to the document declaration. So strings returned must be marked with the appropriate encoding. How can we update the encoding?

Changing the Encoding

In Ruby 1.9, we get a few new functions specifically for dealing with encoding. These functions are defined in <ruby/encoding.h>. We’re going to be dealing with two of them: rb_enc_find_index and rb_enc_associate_index.

The first function, rb_enc_find_index, given a char * will look up the index of your encoding. The function takes a string like “UTF-8” and returns a magic index number for that encoding.

The second function, rb_enc_associate_index, will associate a string held in a VALUE with the encoding index returned from the first function.
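
If it helps to see the same idea from the Ruby side first, here is a rough analogue in plain Ruby 1.9+. Encoding.find plays the role of the lookup and String#force_encoding plays the role of the association. The helper name is made up, and this is only an analogy to the C functions, not a description of how they are implemented.

```ruby
# Hypothetical helper: re-tag a string with the named encoding. The bytes are
# left untouched; only the encoding associated with the string changes.
def associate_encoding(string, encoding_name)
  encoding = Encoding.find(encoding_name) # look the encoding up by name
  string.force_encoding(encoding)         # associate it, no conversion
end

binary = 'Hello world'.b # a copy tagged as binary (ASCII-8BIT)
tagged = associate_encoding(binary.dup, 'UTF-8')

puts binary.encoding
puts tagged.encoding
p binary.bytes == tagged.bytes # the bytes themselves are identical
```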

Armed with this knowledge, we can modify our original program to return a string encoded with UTF-8. The only modifications are to include <ruby/encoding.h>, get the index for the desired encoding, then associate the VALUE with the returned index:

require 'rubygems'
require 'inline'

class HelloWorld
  inline do |builder|
    builder.include "<ruby/encoding.h>"

    builder.c '
      static VALUE test() {
        VALUE string = rb_str_new2("Hello World");
        int enc = rb_enc_find_index("UTF-8");
        rb_enc_associate_index(string, enc);
        return string;
      }
    '
  end
end

string = HelloWorld.new.test

if string.respond_to? :encoding
  puts string.encoding
else
  puts string
end

Great! When this is run under Ruby 1.9, the encoding returned is UTF-8. Unfortunately, this example is now specific for Ruby 1.9. Ruby 1.8 does not ship with the correct header files, and definitely does not include the functions for looking up and assigning encoding. This code will just not work under Ruby 1.8. Luckily, this code can be refactored to work under either version of Ruby.

Refactoring for 1.8 Support

Both Ruby 1.8 and 1.9 provide a <ruby.h> header file. The Ruby 1.9 version of that file defines a constant HAVE_RUBY_ENCODING_H that lets us determine whether the proper header file exists. Our final attempt tests for the encoding constant, then defines a macro to wrap rb_str_new2. If the version of Ruby we compile against has encoding support, the macro can add the encoding to the string, otherwise, it just ignores the encoding:

require 'rubygems'
require 'inline'

class HelloWorld
  inline do |builder|

    builder.prefix <<-eoc
#include <ruby.h>

#ifdef HAVE_RUBY_ENCODING_H

#include <ruby/encoding.h>

#define ENCODED_STR_NEW2(str, encoding) \
  ({ \
    VALUE _string = rb_str_new2((const char *)str); \
    int _enc = rb_enc_find_index(encoding); \
    rb_enc_associate_index(_string, _enc); \
    _string; \
  })

#else

#define ENCODED_STR_NEW2(str, encoding) \
  rb_str_new2((const char *)str)

#endif
    eoc

    builder.c '
      static VALUE test() {
        return ENCODED_STR_NEW2("Hello world", "UTF-8");
      }
    '
  end
end

string = HelloWorld.new.test

if string.respond_to? :encoding
  puts string.encoding
else
  puts string
end

In 1.8, the macro just returns the new string. In 1.9, the macro returns the string and additionally sets the encoding. Now if we use this macro wherever we create new strings, we’ll be working well with 1.8 and 1.9!

Final Notes

This example was slightly simplified. Since the encoding index is determined at runtime, there could be problems. If rb_enc_find_index cannot find the requested encoding, it simply returns a -1. The macro should handle that case.
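
To show the shape of that handling without more C, here is the Ruby-level equivalent as a sketch. Where rb_enc_find_index returns -1, Encoding.find raises ArgumentError, so the hypothetical helper below rescues it and leaves the string alone. In the macro, the corresponding move would presumably be to skip the association when the index is negative.

```ruby
# Hypothetical helper: like the macro, but an unknown encoding name leaves
# the string as it was instead of associating a bogus encoding.
def associate_encoding_safely(string, encoding_name)
  string.force_encoding(Encoding.find(encoding_name))
rescue ArgumentError
  # Encoding.find raises for unknown names; the C function returns -1.
  string
end

puts associate_encoding_safely('Hello world'.b, 'EUC-JP').encoding
puts associate_encoding_safely('Hello world'.b, 'NOT-A-REAL-ENCODING').encoding
```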

Also, if you’re playing along at home, remember to save the file between running it with 1.8 and 1.9. RubyInline examines the mtime of the ruby file, and will only recompile when the rb file has been written to. That means if you run it with 1.8, then immediately run again with 1.9, it won’t recompile it for 1.9. I suppose I should send in a patch. ;-)

One last thing… There may be better ways to do this. I needed to determine the encoding at runtime because XML files declare their encoding scheme. If you parse an XML file that declares its encoding as EUC-JP, it would make sense that the strings you pull out are encoded in EUC-JP, right? If you know that you’re always going to be returning UTF-8 strings from your C extensions, it could be a different story. Either way, using macros and checking for constants should make sure your code works with 1.8 or 1.9.

Easy Markup Validation

Posted by Aaron Patterson on June 12, 2009

I wanted a test helper that would assert that my XHTML was valid XHTML. So I wrote one and called it “markup_validity”. You can use it too, and I will show you how.

First, install the gem:

  $ sudo gem install markup_validity

Then, use it in your tests:

require 'test/unit'
require 'rubygems'
require 'markup_validity'

class ValidHTML < Test::Unit::TestCase
  def test_i_can_has_valid_xhtml
    assert_xhtml_transitional xhtml_document
  end
end

Oh. You use RSpec? It supports that too:

require 'rubygems'
require 'markup_validity'

describe "my XHTML document" do
  it "can has transitional xhtml" do
    xhtml_document.should be_xhtml_transitional
  end
end

Debugging invalid markup can be a pain. MarkupValidity tries to give you helpful errors to make your life easier. Say you have an invalid piece of XHTML like this:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  </head>
  <body>
    <p>
      <p>
        Hello
      </p>
    </p>
  </body>
</html>

The error output from MarkupValidity will be this:

.Error on line: 2:
Element 'head': Missing child element(s). Expected is one of ( script, style, meta, link, object, isindex, title, base ).

1: <html xmlns="http://www.w3.org/1999/xhtml">
2:   <head>
3:   </head>
4:   <body>
5:     <p>

Error on line: 6:
Element 'p': This element is not expected. Expected is one of ( a, br, span, bdo, object, applet, img, map, iframe, tt ).

5:     <p>
6:       <p>
7:         Hello
8:       </p>
9:     </p>

MarkupValidity provides a few assertions for test/unit:

  • assert_xhtml_transitional(xhtml) for asserting valid transitional XHTML
  • assert_xhtml_strict(xhtml) for asserting valid strict XHTML
  • assert_schema(schema, xml) for asserting that your xml validates against a schema
  • assert_xhtml which is an alias for assert_xhtml_transitional

The methods provided for RSpec are quite similar:

  • be_xhtml_transitional for asserting valid transitional XHTML
  • be_xhtml_strict for asserting valid strict XHTML
  • be_valid_with_schema(schema) for asserting that your xml validates against a schema
  • be_xhtml which is an alias for be_xhtml_transitional

MarkupValidity even works well with rails. Here is an example rails controller test:

require 'test_helper'
require 'markup_validity'

class AwesomeControllerTest < ActionController::TestCase
  test "valid markup" do
    get :new
    assert_xhtml_transitional @response.body
  end
end

Autotest and Vim integration

Posted by Aaron Patterson on May 18, 2009

Yay! I got vim and autotest integration working. When I run autotest, if there is an error, I can have Vim read the errors from autotest and jump me to the right place.

Here is a video of me using it:

Please note that I’m not copying and pasting anything. In vim, I hit a command and Vim automatically picks up errors from autotest and jumps me to the line where the error occurred.

You too can impress your friends with this trick! Here’s how:

  1. Make sure you have vim-ruby installed
  2. Use this as your .autotest file:
    require 'autotest/restart'
    
    Autotest.add_hook :initialize do |at|
      at.unit_diff = 'cat'
    end
    
    Autotest.add_hook :ran_command do |at|
      File.open('/tmp/autotest.txt', 'wb') { |f|
        f.write(at.results.join)
      }
    end
    
  3. Add this to your .vimrc:

    compiler rubyunit
    nmap <Leader>fd :cf /tmp/autotest.txt<cr> :compiler rubyunit<cr>

Now when you get an error in autotest, just type "\fd" in Vim to jump straight to your first error.

The contents of /tmp/autotest.txt will be used in your errorfile. In Vim do ":help quickfix" for more info on what you can do with your new found power.

Caveat: You don't get unit_diff. I'm working on that. Any help would be much appreciated (I suck at errorformat in Vim).

Fat binary gems make the rockin’ world go round

Posted by Aaron Patterson on May 07, 2009

Right now people who publish native gems targeting the windows platform have a problem. Our problem is supporting ruby 1.8 and 1.9 at the same time. Right now, we can’t build one gem targeting 1.8 and one gem targeting 1.9, and have rubygems differentiate the two. I have a solution: fat binary gems. We can build a gem that contains dynamic libraries that target ruby 1.8 and ruby 1.9 on windows, with no changes to rubygems whatsoever. I’ve put together a proof of concept that I want to share. I will walk through the steps for building a fat binary gem with the tools we have today. The steps I am going to present are not necessarily the best steps, they are just the steps I took to get this idea working.

The tools I will use are MinGW for cross compiling, hoe and rake-compiler for their packaging and compiling tasks, multiruby for cross compiling 1.8 and 1.9, and nokogiri as the target gem to be built.

Here is the basic strategy for making dreams happen:

1. Gem entry point must be written in Ruby

When someone does “require ‘whatever’” on your library, that ‘whatever.rb’ file must be written in ruby and work with both 1.8 and 1.9. The reason is because we will:

2. Dynamically determine the correct SO file to load

We can determine at runtime the current ruby version, then load the appropriate SO file at runtime.

Let’s get down to business and see it in action.

Getting our hands dirty

The first thing we need to do is make sure that the so files from the ruby 1.8 build and the ruby 1.9 build end up in different places. The way I accomplished this was by customizing my Rake::ExtensionTask (from rake-compiler), and adding a prerequisite to the cross task:

RET = Rake::ExtensionTask.new("nokogiri", HOE.spec) do |ext|
  ext.lib_dir = "ext/nokogiri"
end

task :muck_with_lib_dir do
  RET.lib_dir += "/#{RUBY_VERSION.sub(/\.\d+$/, '')}"
  FileUtils.mkdir_p(RET.lib_dir)
end
if Rake::Task.task_defined?(:cross)
  Rake::Task[:cross].prerequisites << "muck_with_lib_dir"
end

This code will make sure the so file goes to “ext/nokogiri/1.8” when compiling with ruby 1.8, and “ext/nokogiri/1.9” when compiling with 1.9. Then, all you have to do is compile your extension twice:


$ ~/.multiruby/install/1.8.6-p114/bin/rake cross compile
$ rm -rf tmp
$ ~/.multiruby/install/1.9.1-rc2/bin/rake cross compile

WARNING! Watch out for that "rm -rf". That is removing the tmp directory that rake-compiler made. rake-compiler doesn't seem to know that I switched ruby versions. In order to get the two different compilations working, I had to manually remove the already compiled objects.

Dynamic loading

So we've got our compiled so files in two different locations. What about loading? This step is very easy. Since our entry point will be in ruby, we can just write this in our entry point file:

if RUBY_PLATFORM =~/(mswin|mingw)/i
  # Fat binary gems, you make the Rockin' world go round
  require "nokogiri/#{RUBY_VERSION.sub(/\.\d+$/, '')}/nokogiri"
else
  require 'nokogiri/nokogiri'
end

Basically all this code says is "if we're running windows, load the shared object from a path that contains the ruby version". When a windows user requires this file, the path to the shared object is determined by the version of ruby that they are using. If they're running 1.8, the path will be "nokogiri/1.8/nokogiri", if they're running 1.9, "nokogiri/1.9/nokogiri".
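
To see the path logic in isolation, here is the same decision wrapped in a hypothetical helper that takes the ruby version and platform as arguments, so you can try it on any machine. The platform strings in the calls are just sample values.

```ruby
# Hypothetical helper mirroring the entry point logic, with the Ruby version
# and platform passed in so the result can be checked on any machine.
def native_extension_path(ruby_version = RUBY_VERSION, platform = RUBY_PLATFORM)
  if platform =~ /(mswin|mingw)/i
    # "1.8.6" becomes "1.8": drop the final ".<digits>" component.
    "nokogiri/#{ruby_version.sub(/\.\d+$/, '')}/nokogiri"
  else
    'nokogiri/nokogiri'
  end
end

puts native_extension_path('1.8.6', 'i386-mingw32')  # nokogiri/1.8/nokogiri
puts native_extension_path('1.9.1', 'i386-mswin32')  # nokogiri/1.9/nokogiri
puts native_extension_path('1.9.1', 'x86_64-linux')  # nokogiri/nokogiri
```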

Packaging

We've got one more hurdle to overcome, and that is packaging. We need to make sure that when we're building the windows gem, our custom so files are added to the gem. To do this, I just added another task:

task :add_dll_to_manifest do
  HOE.spec.files += Dir['ext/nokogiri/**.{dll,so}']
  HOE.spec.files += Dir['ext/nokogiri/{1.8,1.9}/**.{dll,so}']
end

if Rake::Task.task_defined?(:cross)
  Rake::Task[:cross].prerequisites << :add_dll_to_manifest
end

This makes sure that any extra dll or so files in our ext directories are added to the gem. Now we can run our packaging task:
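
If you want to check what those globs will pick up before packaging, a sketch like this one builds a throwaway directory tree and lists the matches. The helper name is hypothetical, and I've written the patterns with a single * for clarity, which is an assumption about intent rather than a copy of the task above.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: list the compiled libraries under ext/nokogiri that a
# packaging task would need to add, relative to the given project root.
def compiled_libraries(root)
  Dir.chdir(root) do
    (Dir['ext/nokogiri/*.{dll,so}'] +
     Dir['ext/nokogiri/{1.8,1.9}/*.{dll,so}']).sort
  end
end

# Build a throwaway tree that looks like the result of the two compiles.
Dir.mktmpdir do |root|
  %w[1.8 1.9].each do |version|
    dir = File.join(root, 'ext', 'nokogiri', version)
    FileUtils.mkdir_p(dir)
    FileUtils.touch(File.join(dir, 'nokogiri.so'))
  end
  puts compiled_libraries(root)
end
```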

$ ~/.multiruby/install/1.8.6-p114/bin/rake cross native gem

If everything went well, we can examine the content of our packaged gem and find two different so files:

$ gem spec pkg/nokogiri-1.2.4-x86-mswin32.gem files | grep nokogiri.so
- ext/nokogiri/1.8/nokogiri.so
- ext/nokogiri/1.9/nokogiri.so
$

Conclusion

There we have it, a fat binary gem. This gem will work with Ruby 1.8 OR Ruby 1.9 on windows. If you're a windows user, and you'd like to try using this fat binary gem, I have it on my gem server. Just do:

$ gem install nokogiri -s http://tenderlovemaking.com/

The next full release of nokogiri will be using this technique for windows builds. Also, the rake tasks that I've presented were somewhat simplified. If you'd like to get very specific, check out the nokogiri source.

Next Steps

I would like to work with Luis on integrating this functionality in to rake-compiler. I'm not sure the best way to go about it, but I know that he and I can simplify these steps even further.

[ANN] nokogiri 1.3.0rc1 has been released!

Posted by Aaron Patterson on May 06, 2009

= nokogiri version 1.3.0rc1 has been released!

Thanks to herculean efforts by my nokogiri partner in crime, Mike Dalessio,
nokogiri now works on JRuby 1.3.0RC1 via FFI.

To install this prerelease gem do this:

$ jgem install nokogiri -s http://tenderlovemaking.com/

Then you should be able to do this:


$ jirb
irb(main):001:0> require 'open-uri'
=> true
irb(main):002:0> require 'rubygems'
=> true
irb(main):003:0> require 'nokogiri'
=> true
irb(main):004:0> doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))
=> #
irb(main):005:0> doc.css('h3.r a.l').length
=> 10
irb(main):006:0>

== CAVEATS!

* The JRuby FFI gem only works with JRuby 1.3.0RC1
* You MUST install it from my gem server
* The gem version will say 1.2.4, that is actually because I couldn't get
pre release gem versions working. Don't worry, it's actually the 1.3.0
release candidate.
* You can get an MRI version and the JRuby version from my gem server, no
windows support yet.

== ACCOLADES

* Mike made this FFI monster happen! I can't thank him enough.
* Thanks to the JRuby team for making FFI work!

== CHANGELOG

* hahahahahaha
* hahahahahahaha
* hahahaha
* hahahahahahahha
* You'll get to see the actual changes when this isn't a release candidate
* Or check out the git repository

== More information

* github
* rdoc

Namespaces in XML

Posted by Aaron Patterson on April 23, 2009

Shit. This is a boring topic. Just writing the title made me cry a little bit out of boredom. Unfortunately this topic is something I feel compelled to write about because I think that most Ruby developers dealing with XML know very little about the topic, and yet XML namespaces are crucial when dealing with XML documents. So, in order to curb the boredom, I will attempt to demonstrate why we need namespaces and how they affect you when dealing with XML in the shortest amount of time. I will also try to sprinkle in a few swear words and innuendos just to make sure you’re paying attention.

A tale of two companies

One day, long ago, when XML was written on punch cards, Alice’s Auto Supply decided that they would distribute their inventory as XML so that other people would know what they had in stock. They came up with an XML document that looked like this:

<?xml version="1.0"?>
<inventory>
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
</inventory>

Excellent! Programmers started consuming the inventory for Alice’s shop. They could pull a list of tires from the document like this:

doc.xpath('//tire')

Bob’s Bike Shop also wanted to get on the XML broadcast bandwagon. So they followed suit and produced an XML document as well:

<?xml version="1.0"?>
<inventory>
  <tire name="narrow street tire" />
  <tire name="mountain trail tire" />
</inventory>

Again, programmers were happy. They started consuming inventory for Bob’s Bike Shop and getting a list of bike tires like this:

doc.xpath('//tire')

Everything was going well until someone decided to consume inventory from both sources. Fuck. There was no way to tell the difference between a car tire from Alice’s Auto Supply, or a bike tire from Bob’s Bike Shop. The search criteria used for both documents was the same:

doc.xpath('//tire')

How was one supposed to tell the difference between a bike tire and a car tire? There is a naming conflict, and this code would return both! Systems began crashing, punch cards were lit on fire, massive power outages occurred. Needless to say, our society was at its lowest point.

Fortunately, a very smart group of people (much smarter than me) came along and said “let’s associate these things to something unique, then we can tell them apart”. That unique bit of information was a URL. They also said “let’s have the ability to name the url so that we can easily reference it in our XML documents”. Fortunately, the name for a URL does not have to be unique since it is tied to the unique URL! Yay!

Alice got wind of these updates to the XML spec, and wanted to make sure that everyone could tell the difference between her car tires and some other tire. So she updated her inventory document, adding her URL with a name, and naming all of her inventory:

<?xml version="1.0"?>
<inventory xmlns:car="http://alicesautosupply.example.com/">
  <car:tire name="super slick racing tire" />
  <car:tire name="all weather tire" />
</inventory>

Alice’s new inventory document was pushed out. The developers had not yet updated their code. They thought they had a new bug: the code was only returning tires from Bob’s Bike Shop. But how to get the car tires? They had to inform the parser they were looking for tires associated with Alice’s url, and changed their code to look like this:

doc.xpath('//tire')
doc.xpath('//aliceAuto:tire',
  'aliceAuto' => 'http://alicesautosupply.example.com/'
)

The first query returned tires that have no namespace and the second query returned tires that belong to Alice’s shop.

Alice’s inventory grew and grew (owing it all to namespacing her document of course). Prefixing everything in her document with “car” was taking a toll on her fingers as well as her punchcard supply. The XML superheroes had a trick up their sleeves for Alice. They said that a URL could be declared as a default, so that every tag could be associated with the URL without explicitly declaring a name.

Armed with this knowledge, Alice was able to change her XML to this:

<?xml version="1.0"?>
<inventory xmlns="http://alicesautosupply.example.com/">
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
</inventory>

Her XML now stated that “inventory” and all tags inside “inventory” belonged to her URL, but did not need a prefix. Everything was still associated with her URL so that developers could tell the difference between her tires and all other tires, but she did not have to add prefixes.

As for the developers consuming her XML, they never noticed the change. Their code asked for all tires belonging to Alice’s URL, and Alice’s document still declared that those tires belonged to her URL. Punchcards were saved, carpal tunnel was cured, and the world rejoiced.

Conclusion

I hope this story hasn’t been too boring. Here are the key parts I wanted to explain:

  1. Namespaces prevent tag name collision.
    We would not be able to deal with colliding tag names without namespaces.
  2. When you don’t specify a namespace in your search, that means you want tags with no namespace.
  3. Namespaces are tied to URLs. Only the URL must be unique, which is why you must use the URL in your XPath queries.

Remember that there is a difference between asking for tags that belong to a namespace and ones that do not. Also remember that a default namespace in a document means that tags which do not explicitly have a namespace belong to the default. So even though those tags do not have a prefix, you must use a namespace when querying for them.

Bonus Round

Even though using namespaces is essential when searching an XML document, Nokogiri tries to help out. If there are namespaces declared on the root node of a document, Nokogiri will automatically register those for you. You will still have to use the prefix when searching the document, but the URL registration is done for you.

Let’s modify Alice’s XML a little to demonstrate:

<?xml version="1.0"?>
<inventory xmlns="http://alicesautosupply.example.com/" xmlns:bike="http://bobsbikes.example.com/">
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
  <bike:tire name="skinny street" />
</inventory>

Using Nokogiri, these two statements are equivalent:

doc.xpath('//xmlns:tire',
  'xmlns' => 'http://alicesautosupply.example.com/'
)
doc.xpath('//xmlns:tire')

We can specify the namespace ourselves, or use the same name that Nokogiri picks for us.

Similarly, if we want to find bike tires, these two statements are equivalent:

doc.xpath('//bike:tire',
  'bike' => 'http://bobsbikes.example.com/'
)
doc.xpath('//bike:tire')

Testing JavaScript Outside the Browser

Posted by Aaron Patterson on April 05, 2009

The other day at LA RubyConf during the Johnson presentation, I showed a few slides which I don’t think were given the time that they deserve. Not that we didn’t have enough time, I just don’t think I made as big a deal about them as I should have. Those particular slides demonstrated HTML Document Object manipulation executed in JavaScript outside any web browser. Those particular slides, and that particular code, are the culmination of over a year’s worth of work (and Yak Shaving) and I would like to talk about it in a little more detail here.

Since I started doing any sort of non-trivial browser dependent JavaScript, I’ve wanted to be able to test the code which I wrote. Hitting refresh on a webpage seems like a hack. Setting up a special browser to refresh the page for me also seems like a hack. I want to run “rake test” and have my JavaScript DOM manipulations tested right along with everything else, no browser dependence required. As far as I could tell, we need three things to make that happen:

  1. A JavaScript runtime that can be used in Ruby
  2. A parser with browser-like HTML correction schemes
  3. A DOM interface that mirrors a browser’s DOM interface

Over the weekend, I think we’ve come a lot closer. John has finally released Johnson. Johnson solves problem number 1. Johnson provides a JavaScript runtime that is fully accessible in Ruby. Watch our RubyConf 2008 presentation about Johnson for more details about that project.

Number 2, I believe, has been solved by nokogiri. As far as I can tell, the tree generated inside libxml2 is very similar to one found in the browser. Nokogiri was partly a Yak Shave for number 3. Since I had writing a DOM interface in mind, nokogiri’s API lends itself well to writing a DOM API.

Number 3 was partly solved this weekend. I’ve been working on a DOM API called taka. Taka layers a DOM API on top of nokogiri. The goal of the project is to mirror a browser’s DOM API in Ruby.

With these three tools in place, I believe that we have a good start on a browserless JavaScript testing environment.

Codes

Enough talk. Let’s look at some codes. Take this HTML page for example:

<html>
  <head>
    <script>
      function populateDropDown() {
        var select = document.getElementById('colors');
        var options = ['red', 'green', 'blue', 'black'];
        var i;
        for(i = 0; i < options.length; i++) {
          var option = document.createElement('option');
          option.appendChild(document.createTextNode(options[i]));
          option.value = options[i];
          select.appendChild(option);
        }
      }
    </script>
  </head>
  <body onload="populateDropDown()">
    <h1>Behold the Johnson</h1>
    <form>
      <select id="colors">
      </select>
    </form>
  </body>
</html>

The JavaScript in this HTML will add a few option tags as children of the select tag, effectively populating the drop down for our user. It would be nice if we could write a test to assert that when this JavaScript executes, the option tags are actually added as children of the select tag.

With Johnson and Taka, it is possible to write such a test:

require 'rubygems'
require 'taka'
require 'johnson'
require 'test/unit'

class OptionTagsAppendedTest < Test::Unit::TestCase
  def setup
    # Create our DOM object (DATA reads whatever follows __END__ in this file)
    @document = Taka::DOM::HTML(DATA.read)

    # Create a new JavaScript runtime
    @rt = Johnson::Runtime.new

    # Set the document in the runtime
    @rt['document'] = @document

    # Execute any script tags
    @document.getElementsByTagName('script').each do |script|
      @rt.evaluate(script.textContent)
    end
  end

  def test_options_populated_by_onload
    # 0 option tags before onload is executed
    assert_equal 0, @document.getElementsByTagName('option').length

    # Execute the onload body attribute
    @rt.evaluate(@document.getElementsByTagName('body')[0].onload)

    # 4 option tags after onload is executed
    assert_equal 4, @document.getElementsByTagName('option').length
  end
end

There. It’s done. This test executes the JavaScript and manipulates your HTML the same way the browser would. You can run this code today. Just make sure to install the johnson and taka gems first.

Problems

There are at least a few problems. This HTML code is, admittedly, carefully crafted. So far, taka only implements the DOM 1 interface. That means taka is missing many methods that are available in browsers. The good news though is that Taka is pure ruby and open source. As soon as you find methods that are missing, fork the repo, add a test, and send a pull request. I will be sure to merge it.

Conclusion

We are making progress towards testing JavaScript without a browser. We have to do it a step at a time. The solution I have presented to you, while not complete, has promise. I think that the only thing standing in our way right now is time and manpower. The methods that need to be implemented on Taka to make it mirror a browser are not hard (take a look at the taka source). These methods just need to be written.