Shit. This is a boring topic. Just writing the title made me cry a little bit out of boredom. Unfortunately this topic is something I feel compelled to write about because I think that most Ruby developers dealing with XML know very little about the topic, and yet XML namespaces are crucial when dealing with XML documents. So, in order to curb the boredom, I will attempt to demonstrate why we need namespaces and how they affect you when dealing with XML in the shortest amount of time. I will also try to sprinkle in a few swear words and innuendos just to make sure you’re paying attention.
A tale of two companies
One day, long ago, when XML was written on punch cards, Alice’s Auto Supply decided that they would distribute their inventory as XML so that other people would know what they had in stock. They came up with an XML document that looked like this:
<?xml version="1.0"?>
<inventory>
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
</inventory>
Excellent! Programmers started consuming the inventory for Alice’s shop. They could pull a list of tires from the document like this:
doc.xpath('//tire')
Bob’s Bike Shop also wanted to get on the XML broadcast bandwagon. So they followed suit and produced an XML document as well:
<?xml version="1.0"?>
<inventory>
  <tire name="narrow street tire" />
  <tire name="mountain trail tire" />
</inventory>
Again, programmers were happy. They started consuming inventory for Bob’s Bike Shop and getting a list of bike tires like this:
doc.xpath('//tire')
Everything was going well until someone decided to consume inventory from both sources. Fuck. There was no way to tell the difference between a car tire from Alice’s Auto Supply and a bike tire from Bob’s Bike Shop. The search criteria used for both documents were the same:
doc.xpath('//tire')
How was one supposed to tell the difference between a bike tire and a car tire? There is a naming conflict, and this code would return both! Systems began crashing, punch cards were lit on fire, massive power outages occurred. Needless to say, our society was at its lowest point.
Fortunately, a very smart group of people (much smarter than me) came along and said “let’s associate these things to something unique, then we can tell them apart”. That unique bit of information was a URL. They also said “let’s have the ability to name the URL so that we can easily reference it in our XML documents”. Fortunately, the name for a URL does not have to be unique since it is tied to the unique URL! Yay!
Alice got wind of these updates to the XML spec, and wanted to make sure that everyone could tell the difference between her car tires and some other tire. So she updated her inventory document, adding her URL with a name, and naming all of her inventory:
<?xml version="1.0"?>
<inventory xmlns:car="http://alicesautosupply.example.com/">
  <car:tire name="super slick racing tire" />
  <car:tire name="all weather tire" />
</inventory>
Alice’s new inventory document was pushed out. The developers had not yet updated their code, and they thought they had a new bug: the code was only returning tires from Bob’s Bike Shop. But how to get the car tires? They had to inform the parser that they were looking for tires associated with Alice’s URL, so they changed their code to look like this:
doc.xpath('//tire')
doc.xpath('//aliceAuto:tire',
'aliceAuto' => 'http://alicesautosupply.example.com/'
)
The first query returned tires that have no namespace and the second query returned tires that belong to Alice’s shop.
Alice’s inventory grew and grew (owing it all to namespacing her document of course). Prefixing everything in her document with “car” was taking a toll on her fingers as well as her punch card supply. The XML superheroes had a trick up their sleeves for Alice. They said that a URL could be declared as a “default”, so that every tag could be associated with the URL without explicitly declaring a name.
Armed with this knowledge, Alice was able to change her XML to this:
<?xml version="1.0"?>
<inventory xmlns="http://alicesautosupply.example.com/">
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
</inventory>
Her XML now stated that “inventory” and all tags inside “inventory” belonged to her URL, but did not need a prefix. Everything was still associated with her URL so that developers could tell the difference between her tires and all other tires, but she did not have to add prefixes.
As for the developers consuming her XML, they never noticed the change. Their code asked for all tires belonging to Alice’s URL, and Alice’s document still declared that those tires belonged to her URL. Punch cards were saved, carpal tunnel was cured, and the world rejoiced.
Conclusion
I hope this story hasn’t been too boring. Here are the key parts I wanted to explain:
- Namespaces prevent tag name collision. We would not be able to deal with colliding tag names without namespaces.
- When you don’t specify a namespace in your search, that means you want tags with no namespace.
- Namespaces are tied to URLs. Only the URL must be unique, which is why you must use the URL in your XPath queries.
Remember that there is a difference between asking for tags that belong to a namespace and ones that do not. Also remember that a default namespace in a document means that tags which do not explicitly have a prefix belong to the default namespace. So even though those tags do not have a prefix, you must use a namespace when querying for them.
Bonus Round
Even though using namespaces is essential when searching an XML document, Nokogiri tries to help out. If there are namespaces declared on the root node of a document, Nokogiri will automatically register those for you. You will still have to use the prefix when searching the document, but the URL registration is done for you.
Let’s modify Alice’s XML a little to demonstrate:
<?xml version="1.0"?>
<inventory xmlns="http://alicesautosupply.example.com/" xmlns:bike="http://bobsbikes.example.com/">
  <tire name="super slick racing tire" />
  <tire name="all weather tire" />
  <bike:tire name="skinny street" />
</inventory>
Using Nokogiri, these two statements are equivalent:
doc.xpath('//xmlns:tire',
'xmlns' => 'http://alicesautosupply.example.com/'
)
doc.xpath('//xmlns:tire')
We can specify the namespace ourselves, or use the same name that Nokogiri picks for us.
Similarly, if we want to find bike tires, these two statements are equivalent:
doc.xpath('//bike:tire',
'bike' => 'http://bobsbikes.example.com/'
)
doc.xpath('//bike:tire')


I’m going to read this as a bedtime story to my children one day.
Don’t do that! They’ll have nightmares.
Bugger, I still don’t think I get the point of namespaces within XML. Boo to me.
What’s the meaning of the URI in the xmlns:car=”http://alicesautosupply.example.com/” definition? Why do they look like URLs but aren’t meant to point to any resource on the intertubes?
@Dr Nic: to avoid name collisions. Imagine if the two companies in my story were to merge XML documents. You would still want to be able to tell the difference between car tires and bike tires. It wouldn’t be possible without either changing the tag name, or adding a namespace.
I like to think of the namespace as a standardized attribute that is on every node.
@Dr Nic: Just to make them unique. You could make your URL point somewhere useful, but the main point is to be unique.
So why namespacing vs element attributes?
e.g.
@Dr Nic: You could, but then you would have to standardize on an attribute name. Everyone would have to agree on one particular attribute name to examine. If you gave me your XML document, you would have to tell me to examine the “type” attribute, where someone else might choose the “class” attribute. The namespace attribute is standardized.
Not to mention in your example, the attribute value would have to be unique. I couldn’t tell the difference between a car tire from Ford and a car tire from GM.
Maybe a dumb question, but do you have to declare the bindings on every xpath method call, or can you register them independently (which can be useful, particularly if you’re dealing with a lot of content with default namespaces)?
How would I find the inventory node for a specific namespace?
If we put the two together:
What query would I use to find Bob’s inventory?
Um, should this be doc.xpath, etc.?
doc.xml(‘//tire’)
doc.xml(‘//aliceAuto:tire’,
‘aliceAuto’ => ‘http://alicesautosupply.example.com/’
)
@john yes, good catch. I will fix. Thanks!
I feel like an idiot for not having discovered this explanation earlier (here or anywhere else!). Thanks for this explanation and special thanks for giving Nokogiri examples. My Ruby n00b ass has a much better understanding now!
Aaron,
This article was REALLY helpful for me. Nokogiri is a godsend and i just wanted to thank you for writing in a way that makes it fun/easy to understand while still driving home the key points.