Nokogiri Is Released

Posted by – October 30, 2008

Hey internet. How are you doing? Ya. It’s been a while. I know, I know. I suck at blogging. Couldn’t you tell by my horrible layout? But seriously, I’ve been really busy lately. We used to have such good times together. I’d write a blog post, you would show it to everyone on the internet. But that spark just doesn’t seem to be there anymore.

Well, I’m doing my best to keep this relationship together. With the help of super awesome ruby hacker Mike Dalessio, I wrote an XML/HTML parsing library for ruby called Nokogiri. What is so great about Nokogiri? Well, for one it is really easy to parse HTML or XML:

require 'nokogiri'

doc = Nokogiri::HTML(<<-eohtml)
<html>
  <body>
    <div id="wrapper">
      <h1>Hello world</h1>
      <p>Paragraph</p>
    </div>
  </body>
</html>
eohtml

Oh, I know what you’re saying internet. Ya, sure, it’s easy to parse, but is it easy to search? Well, it is. I promise. You know XPath, right? Well you can search by XPath very easily:

doc.xpath('//p').each do |paragraph|
  puts paragraph.text
end

Oh, you don’t know XPath very well? That’s OK. I know you know CSS. You use it everywhere! I’ve viewed your source (*wink* *wink*). Well you can search using CSS selectors as well:

doc.css('div#wrapper').each do |div_with_wrapper_id|
  puts div_with_wrapper_id['id']
end

Oh, I see how it is. You don’t want to commit. You want to search with CSS selectors *and* XPath. Well fine. You can have that too. Just use the “search” method, and you can mix and match your selectors:

doc.search('//p', 'div#wrapper').each do |node|
  puts node.name
end

Well, I hope you’re feeling better about our relationship now. I just want to tell you that you shouldn’t worry about that old legacy code that uses Hpricot. Nokogiri can be used as a drop in replacement! Really! Nokogiri doesn’t reproduce the bugs that are in Hpricot, but should work in most cases. Just use “Nokogiri::Hpricot()” to parse your HTML. Of course, I’ve tried to keep the syntax of Hpricot that I like. For example, you can use slashes for searching and subsearching:

(doc/'div').each { |div| puts div.at('p').text }

You even get a speed increase. For free!

Want to install Nokogiri? No problem. Just do “gem install nokogiri”. It’s that easy!

Well, now that we’re back together, why don’t you send some twitters if you like it! Thanks innernet. I promise to update you more often. I swear.

56 Comments on Nokogiri Is Released

  1. Kris says:

    Do you mean libxml2-dev and libxslt-dev?

  2. Kris says:

    [SOLVED] – many thanks!

    sudo apt-get update
    sudo apt-get install libxslt1-dev
    sudo apt-get install libxml2-dev
    sudo gem install nokogiri

  3. GenFoch says:

    Hi, I’m really new to ruby so please bear with me. I have installed ruby and rubygems and other gems I have installed do work. I have been requested to install nokogiri-1.2.3 but I get the following error:

    checking for xmlParseDoc() in -lxml2… no
    libxml2 is missing. try ‘port install libxml2’ or ‘yum install libxml2’

    however looking at /usr/lib I see libxml2.a and looking in it I can see DOCBparser which I think contains the xmlParseDoc, so it seems like everything is there.

    I have ruby 1.8.6 (2007-03-13 patchlevel 0) [i486-linux] on the system and updated to rubygems 1.3.2 in my attempt to solve this issue.

    I see there is a message like details. You may need configuration options.

    Provided configuration options:
    and a bunch of options but I’m not sure where to start.

    Thanks for your time.

  4. MikeW says:

    I’m having the same issue as GenFoch after installing Snow Leopard. I’ve even gone so far as to try to install libxml2 manually. I’ve already used port to install libxml2 and libxslt. Every solution I’ve seen says to install libxml2-dev and libxslt-dev, but those are solutions for Linux distros, not OS X. Any way to fix this on a Mac?

  5. @MikeW I run Snow Leopard, and it installs just fine for me.

    Would you mind sending an email to the mailing list that includes the version of nokogiri you’re trying to install, your “ruby -v”, and also the version of XCode you have installed? We can continue debugging it from there!

    Here is a link to the mailing list: http://groups.google.com/group/nokogiri-talk

  6. [...] a shout out to mechanize I will never use another screenscraping library again. It uses nokogiri (so it parses all the html into a nice xpath accessible form), it handles all the cookie session [...]
