2009-04-23 @ 12:03
Namespaces in XML
Shit. This is a boring topic. Just writing the title made me cry a little bit out of boredom. Unfortunately this topic is something I feel compelled to write about because I think that most Ruby developers dealing with XML know very little about the topic, and yet XML namespaces are crucial when dealing with XML documents. So, in order to curb the boredom, I will attempt to demonstrate why we need namespaces and how they affect you when dealing with XML in the shortest amount of time. I will also try to sprinkle in a few swear words and innuendos just to make sure you’re paying attention.
A tale of two companies
One day, long ago, when XML was written on punch cards, Alice’s Auto Supply decided that they would distribute their inventory as XML so that other people would know what they had in stock. They came up with an XML document that looked like this:
<?xml version="1.0"?>
<inventory>
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
</inventory>
Excellent! Programmers started consuming the inventory for Alice’s shop. They could pull a list of tires from the document like this:
doc.xpath('//tire')
Bob’s Bike Shop also wanted to get on the XML broadcast bandwagon. So they followed suit and produced an XML document as well:
<?xml version="1.0"?>
<inventory>
  <tire name="narrow street tire" />
  <tire name="mountain trail tire" />
</inventory>
Again, programmers were happy. They started consuming inventory for Bob’s Bike Shop and getting a list of bike tires like this:
doc.xpath('//tire')
Everything was going well until someone decided to consume inventory from both sources. Fuck. There was no way to tell the difference between a car tire from Alice’s Auto Supply and a bike tire from Bob’s Bike Shop. The search criteria used for both documents were the same:
doc.xpath('//tire')
How was one supposed to tell the difference between a bike tire and a car tire? There is a naming conflict, and this code would return both! Systems began crashing, punch cards were lit on fire, massive power outages occurred. Needless to say, our society was at its lowest point.
Fortunately, a very smart group of people (much smarter than me) came along and said “let’s associate these things with something unique, then we can tell them apart”. That unique bit of information was a URL. They also said “let’s have the ability to name the URL so that we can easily reference it in our XML documents”. Fortunately, the name for a URL does not have to be unique, since it is tied to the unique URL! Yay!
Alice got wind of these updates to the XML spec, and wanted to make sure that everyone could tell the difference between her car tires and some other tire. So she updated her inventory document, adding her URL with a name, and naming all of her inventory:
<?xml version="1.0"?>
<inventory xmlns:car="http://alicesautosupply.example.com/">
  <car:tire name="super slick racing tire" />
  <car:tire name="all weather tire" />
</inventory>
Alice’s new inventory document was pushed out. The developers had not yet updated their code, and they thought they had a new bug: the code was only returning tires from Bob’s Bike Shop. But how to get the car tires? They had to inform the parser that they were looking for tires associated with Alice’s URL, and changed their code to look like this:
doc.xpath('//tire')
doc.xpath('//aliceAuto:tire', 'aliceAuto' => 'http://alicesautosupply.example.com/')
The first query returned tires that have no namespace and the second query returned tires that belong to Alice’s shop.
Alice’s inventory grew and grew (owing it all to namespacing her document of course). Prefixing everything in her document with “car” was taking a toll on her fingers as well as her punch card supply. The XML superheroes had a trick up their sleeves for Alice. They said that a URL could be declared as the default, so that every tag is associated with that URL without explicitly declaring a name.
Armed with this knowledge, Alice was able to change her XML to this:
<?xml version="1.0"?>
<inventory xmlns="http://alicesautosupply.example.com/">
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
</inventory>
Her XML now stated that “inventory” and all tags inside “inventory” belonged to her URL, but did not need a prefix. Everything was still associated with her URL so that developers could tell the difference between her tires and all other tires, but she did not have to add prefixes.
As for the developers consuming her XML, they never noticed the change. Their code asked for all tires belonging to Alice’s URL, and Alice’s document still declared that those tires belonged to her URL. Punch cards were saved, carpal tunnel was cured, and the world rejoiced.
I hope this story hasn’t been too boring. Here are the key parts I wanted to explain:
- Namespaces prevent tag name collision. We would not be able to deal with colliding tag names without namespaces.
- When you don't specify a namespace in your search, that means you want tags with no namespace.
- Namespaces are tied to URLs. Only the URL must be unique, which is why you must use the URL in your XPath queries.
Remember that there is a difference between asking for tags that belong to a namespace and ones that do not. Also remember that a default namespace in a document means that tags which do not explicitly have a namespace belong to the default. So even though those tags do not have a prefix you must use a namespace when querying for them.
Even though using namespaces is essential when searching an XML document, Nokogiri tries to help out. If there are namespaces declared on the root node of a document, Nokogiri will automatically register those for you. You will still have to use the prefix when searching the document, but the URL registration is done for you.
Let’s modify Alice’s XML a little to demonstrate:
<?xml version="1.0"?>
<inventory xmlns="http://alicesautosupply.example.com/" xmlns:bike="http://bobsbikes.example.com/">
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
  <bike:tire name="skinny street" />
</inventory>
Using Nokogiri, these two statements are equivalent:
doc.xpath('//xmlns:tire', 'xmlns' => 'http://alicesautosupply.example.com/')
doc.xpath('//xmlns:tire')
We can specify the namespace ourselves, or use the same name that Nokogiri picks for us.
Similarly, if we want to find bike tires, these two statements are equivalent:
doc.xpath('//bike:tire', 'bike' => 'http://bobsbikes.example.com/')
doc.xpath('//bike:tire')