2009-04-05 @ 20:15

Testing JavaScript Outside the Browser

The other day at LA RubyConf, during the Johnson presentation, I showed a few slides that I don't think got the attention they deserve. Not that we didn't have enough time; I just don't think I made as big a deal about them as I should have. Those slides demonstrated HTML Document Object Model (DOM) manipulation executed in JavaScript outside of any web browser. Those slides, and that code, are the culmination of over a year's worth of work (and Yak Shaving), and I would like to talk about them in a little more detail here.

Since I started doing any sort of non-trivial, browser-dependent JavaScript, I've wanted to be able to test the code I write. Hitting refresh on a webpage seems like a hack. Setting up a special browser to refresh the page for me also seems like a hack. I want to run "rake test" and have my JavaScript DOM manipulations tested right along with everything else, no browser required. As far as I could tell, we need three things to make that happen:

  1. A JavaScript runtime that can be used in Ruby
  2. A parser with browser-like HTML correction schemes
  3. A DOM interface that mirrors a browser's DOM interface

Over the weekend, I think we’ve come a lot closer. John has finally released Johnson. Johnson solves problem number 1. Johnson provides a JavaScript runtime that is fully accessible in Ruby. Watch our RubyConf 2008 presentation about Johnson for more details about that project.

Number 2, I believe, has been solved by nokogiri. As far as I can tell, the tree generated inside libxml2 is very similar to the one found in the browser. Nokogiri was partly a Yak Shave for number 3: since I had a DOM interface in mind while writing it, nokogiri's API lends itself well to writing a DOM API.

Number 3 was partly solved this weekend. I've been working on a DOM API called taka. Taka layers a DOM API on top of nokogiri. The goal of the project is to mirror a browser's DOM API in Ruby.

With these three tools in place, I believe that we have a good start on a browserless JavaScript testing environment.

Codes

Enough talk. Let’s look at some codes. Take this HTML page for example:

<html>
  <head>
    <script>
      function populateDropDown() {
        var select = document.getElementById('colors');
        var options = ['red', 'green', 'blue', 'black'];
        var i;
        for(i = 0; i < options.length; i++) {
          var option = document.createElement('option');
          option.appendChild(document.createTextNode(options[i]));
          option.value = options[i];
          select.appendChild(option);
        }
      }
    </script>
  </head>
  <body onload="populateDropDown()">
    <h1>Behold the Johnson</h1>
    <form>
      <select id="colors">
      </select>
    </form>
  </body>
</html>

The JavaScript in this HTML will add a few option tags as children of the select tag, effectively populating the drop down for our user. It would be nice if we could write a test asserting that when this JavaScript executes, the option tags are actually added as children of the select tag.

With Johnson and Taka, it is possible to write such a test:

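require 'rubygems'
require 'taka'
require 'johnson'
require 'test/unit'

class OptionTagsAppendedTest < Test::Unit::TestCase
  def setup
    # Create our DOM object from the HTML stored after __END__ below
    @document = Taka::DOM::HTML(DATA.read)

    # Create a new JavaScript runtime
    @rt = Johnson::Runtime.new

    # Set the document in the runtime
    @rt['document'] = @document

    # Execute any script tags
    @document.getElementsByTagName('script').each do |script|
      @rt.evaluate(script.textContent)
    end
  end

  def test_options_populated_by_onload
    # 0 option tags before onload is executed
    assert_equal 0, @document.getElementsByTagName('option').length

    # Execute the onload body attribute
    @rt.evaluate(@document.getElementsByTagName('body')[0].onload)

    # 4 option tags after onload is executed
    assert_equal 4, @document.getElementsByTagName('option').length
  end
end

__END__
<html>
  <head>
    <script>
      function populateDropDown() {
        var select = document.getElementById('colors');
        var options = ['red', 'green', 'blue', 'black'];
        var i;
        for(i = 0; i < options.length; i++) {
          var option = document.createElement('option');
          option.appendChild(document.createTextNode(options[i]));
          option.value = options[i];
          select.appendChild(option);
        }
      }
    </script>
  </head>
  <body onload="populateDropDown()">
    <h1>Behold the Johnson</h1>
    <form>
      <select id="colors">
      </select>
    </form>
  </body>
</html>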

There. It's done. This test executes the JavaScript and manipulates your HTML much the same way a browser would. You can run this code today; just make sure to install the johnson and taka gems first.

Problems

There are at least a few problems. This HTML code is, admittedly, carefully crafted. So far, taka only implements the DOM Level 1 interface, which means taka is missing many methods that are available in browsers. The good news, though, is that taka is pure Ruby and open source. As soon as you find a missing method, fork the repo, add a test, and send a pull request. I will be sure to merge it.

Conclusion

We are making progress towards testing JavaScript without a browser, one step at a time. The solution I have presented, while not complete, has promise. I think the only things standing in our way right now are time and manpower. The methods that need to be implemented in taka to make it mirror a browser are not hard (take a look at the taka source); they just need to be written.


2009-04-23 @ 12:03

Namespaces in XML

Shit. This is a boring topic. Just writing the title made me cry a little bit out of boredom. Unfortunately this topic is something I feel compelled to write about because I think that most Ruby developers dealing with XML know very little about the topic, and yet XML namespaces are crucial when dealing with XML documents. So, in order to curb the boredom, I will attempt to demonstrate why we need namespaces and how they affect you when dealing with XML in the shortest amount of time. I will also try to sprinkle in a few swear words and innuendos just to make sure you’re paying attention.

A tale of two companies

One day, long ago, when XML was written on punch cards, Alice’s Auto Supply decided that they would distribute their inventory as XML so that other people would know what they had in stock. They came up with an XML document that looked like this:

<?xml version="1.0"?>
<inventory>
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
</inventory>

Excellent! Programmers started consuming the inventory for Alice’s shop. They could pull a list of tires from the document like this:

doc.xpath('//tire')

Bob’s Bike Shop also wanted to get on the XML broadcast bandwagon. So they followed suit and produced an XML document as well:

<?xml version="1.0"?>
<inventory>
  <tire name="narrow street tire" />
  <tire name="mountain trail tire" />
</inventory>

Again, programmers were happy. They started consuming inventory for Bob’s Bike Shop and getting a list of bike tires like this:

doc.xpath('//tire')

Everything was going well until someone decided to consume inventory from both sources. Fuck. There was no way to tell the difference between a car tire from Alice's Auto Supply and a bike tire from Bob's Bike Shop. The query used for both documents was the same:

doc.xpath('//tire')

How was one supposed to tell the difference between a bike tire and a car tire? There is a naming conflict, and this code would return both! Systems began crashing, punch cards were lit on fire, massive power outages occurred. Needless to say, our society was at its lowest point.

Fortunately, a very smart group of people (much smarter than me) came along and said "let's associate these things with something unique, then we can tell them apart". That unique bit of information was a URL. They also said "let's be able to give the URL a short name so that we can easily reference it in our XML documents". Fortunately, the names for a URL do not have to be unique, since each name is tied to the unique URL! Yay!

Alice got wind of these updates to the XML spec, and wanted to make sure that everyone could tell the difference between her car tires and some other tire. So she updated her inventory document, giving her URL a name and prefixing all of her inventory with that name:

<?xml version="1.0"?>
<inventory xmlns:car="http://alicesautosupply.example.com/">
  <car:tire name="super slick racing tire" />
  <car:tire name="all weather tire" />
</inventory>

Alice's new inventory document was pushed out before the developers had updated their code. They thought they had a new bug: the code was only returning tires from Bob's Bike Shop. So how could they get the car tires? They had to tell the parser they were looking for tires associated with Alice's URL, and changed their code to look like this:

doc.xpath('//tire')
doc.xpath('//aliceAuto:tire',
  'aliceAuto' => 'http://alicesautosupply.example.com/'
)

The first query returned tires that have no namespace and the second query returned tires that belong to Alice’s shop.

Alice's inventory grew and grew (owing it all to namespacing her document, of course). Prefixing everything in her document with "car" was taking a toll on her fingers as well as her punchcard supply. The XML superheroes had a trick up their sleeves for Alice. They said that "URLs could be declared as a default", so that every tag could be associated with a URL without explicitly declaring a name.

Armed with this knowledge, Alice was able to change her XML to this:

<?xml version="1.0"?>
<inventory xmlns="http://alicesautosupply.example.com/">
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
</inventory>

Her XML now stated that “inventory” and all tags inside “inventory” belonged to her URL, but did not need a prefix. Everything was still associated with her URL so that developers could tell the difference between her tires and all other tires, but she did not have to add prefixes.

As for the developers consuming her XML, they never noticed the change. Their code asked for all tires belonging to Alice's URL, and Alice's document still declared that those tires belonged to her URL. Punchcards were saved, carpal tunnel was cured, and the world rejoiced.

Conclusion

I hope this story hasn’t been too boring. Here are the key parts I wanted to explain:

  1. Namespaces prevent tag name collision. We would not be able to deal with colliding tag names without namespaces.
  2. When you don't specify a namespace in your search, that means you want tags with no namespace.
  3. Namespaces are tied to URLs. Only the URL must be unique, which is why you must use the URL in your XPath queries.

Remember that there is a difference between asking for tags that belong to a namespace and ones that do not. Also remember that a default namespace in a document means that tags without an explicit prefix belong to the default namespace. So even though those tags do not have a prefix, you must use a namespace when querying for them.

Bonus Round

Even though using namespaces is essential when searching an XML document, Nokogiri tries to help out. If there are namespaces declared on the root node of a document, Nokogiri will automatically register those for you. You will still have to use the prefix when searching the document, but the URL registration is done for you.

Let’s modify Alice’s XML a little to demonstrate:

<?xml version="1.0"?>
<inventory xmlns="http://alicesautosupply.example.com/" xmlns:bike="http://bobsbikes.example.com/">
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
  <bike:tire name="skinny street" />
</inventory>

Using Nokogiri, these two statements are equivalent:

doc.xpath('//xmlns:tire',
  'xmlns' => 'http://alicesautosupply.example.com/'
)
doc.xpath('//xmlns:tire')

We can register the namespace URL ourselves, or use the name that Nokogiri picks for us.

Similarly, if we want to find bike tires, these two statements are equivalent:

doc.xpath('//bike:tire',
  'bike' => 'http://bobsbikes.example.com/'
)
doc.xpath('//bike:tire')