2009-02-07 @ 22:54

Custom CSS functions in Nokogiri

In CSS we can use pseudo selectors like a:focus or a:hover to get access to links that have focus or are being hovered over. The CSS spec also defines a pseudo selector called :lang(), which allows us to select an element based on its language.

The :lang() selector is interesting to us because the CSS parser in Nokogiri must support syntax like a:lang('en'), but it doesn’t care what token comes before the parentheses. If it finds a function it doesn’t know about, it simply passes it through to the libxml2 XPath engine, hoping that libxml2 knows what that function is.

We can take advantage of this behavior and define our own set of CSS pseudo selectors by defining functions for the xpath engine to use. Let’s look at some sample code and pick it apart.

Let’s say we have a document where we want to find all “a” tags based on a regular expression. We want to find all link tags with an href that matches a particular expression. Normally we would search the document for all “a” tags, then filter our list down. What if we could have the xpath engine filter our links for us? With Nokogiri, that is very easy:

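require 'nokogiri'

doc = Nokogiri::HTML(<<-eohtml)
  <html>
    <body>
      <a href="http://example.com/">One</a>
      <a href="http://tenderlovemaking.com/">TLM</a>

      <div>
        <a href="http://example.org/">Two</a>
      </div>
    </body>
  </html>
eohtml

# The second argument is a handler object. Its methods are called for
# functions the XPath engine does not recognize, such as match_href.
doc.css('a:match_href("ampl")', Class.new {
  # list holds the "a" nodes matched so far; expression is "ampl".
  def match_href list, expression
    list.find_all { |node| node['href'] =~ /#{expression}/ }
  end
}.new).each do |node|
  puts node['href']
end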

This works with what is basically a "method_missing" call in the xpath engine.  When the xpath engine finds a function it doesn't know, it asks Nokogiri, "do you know where this function is?"  Nokogiri looks for a method on your object with the same name.  If it can find one, it calls your method with the current matching items and feeds the return value of your method back into the xpath engine.

The first argument passed to your method is always a list of the current matching nodes.  In this case, a list of "a" tags.  The rest of the arguments are the ones that you passed in to the function in your CSS expression.  In this case "ampl".
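The lookup described above can be imitated in a few lines of plain Ruby. This is only a toy sketch, not how Nokogiri or libxml2 is implemented: `ToyEngine`, `call_function`, and `HrefHandler` are hypothetical names, and plain hashes stand in for the "a" nodes. It shows the calling convention, with the node list first and the arguments from the expression after it.

```ruby
# Illustrative only: ToyEngine is a hypothetical stand-in for the XPath
# engine, not part of Nokogiri.
class ToyEngine
  # Functions the toy engine already knows about.
  BUILTINS = {
    'count' => ->(nodes) { nodes.length }
  }.freeze

  def initialize(handler)
    @handler = handler
  end

  # Call a function by name with the current node list and extra arguments.
  def call_function(name, nodes, *args)
    if BUILTINS.key?(name)
      BUILTINS[name].call(nodes, *args)
    elsif @handler.respond_to?(name)
      # Unknown to the engine: pass the nodes and arguments to the handler.
      @handler.public_send(name, nodes, *args)
    else
      raise NoMethodError, "unknown function: #{name}"
    end
  end
end

class HrefHandler
  def match_href(list, expression)
    list.find_all { |node| node['href'] =~ /#{expression}/ }
  end
end

# Plain hashes stand in for "a" nodes here.
links = [
  { 'href' => 'http://example.com/' },
  { 'href' => 'http://tenderlovemaking.com/' },
  { 'href' => 'http://example.org/' }
]

engine = ToyEngine.new(HrefHandler.new)
matches = engine.call_function('match_href', links, 'ampl')
matches.each { |node| puts node['href'] }
```

Both "example" hrefs contain "ampl", so they are the two that come back.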

This functionality isn't limited to just CSS expressions.  You can use it in your XPath expressions too.  Here is the same expression using XPath:

doc.xpath('//a[match_href(., "ampl")]', Class.new {
  def match_href list, expression
    list.find_all { |node| node['href'] =~ /#{expression}/ }
  end
}.new).each do |node|
  puts node['href']
end

I find this functionality useful because it lets me define reusable search criteria. Many times when searching HTML, I find myself doing similar searches. This lets me refactor those searches into a different class and keep my code DRY and my CSS expressions concise.
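As a rough sketch of that refactoring, the anonymous class can become a named one that several searches share. `LinkFilters` and `href_starts_with` are hypothetical names used only for illustration; `match_href` is the method from the examples above. Because these methods only call `node['href']`, they can be tried with plain hashes, without parsing a document.

```ruby
# A sketch of a reusable handler. LinkFilters and href_starts_with are
# hypothetical names; match_href is the method from the earlier examples.
class LinkFilters
  # Keep nodes whose href matches the given regular expression source.
  def match_href(list, expression)
    pattern = Regexp.new(expression)
    list.find_all { |node| node['href'] =~ pattern }
  end

  # Keep nodes whose href begins with the given prefix.
  def href_starts_with(list, prefix)
    list.find_all { |node| node['href'].to_s.start_with?(prefix) }
  end
end

filters = LinkFilters.new

# With Nokogiri loaded, the same object can be passed to several searches:
#   doc.css('a:match_href("ampl")', filters)
#   doc.css('div a:match_href("org")', filters)

# Plain hashes stand in for nodes, since only node['href'] is used.
sample = [
  { 'href' => 'http://example.com/' },
  { 'href' => 'http://example.org/' }
]
filters.href_starts_with(sample, 'http://example.org').each do |node|
  puts node['href']
end
```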

I hope you find this as useful as I do!


2009-02-26 @ 11:24

We need a new version of rake

We’ve all seen this warning:

/Library/Ruby/Gems/1.8/gems/rake-0.8.3/lib/rake/gempackagetask.rb:13:Warning: Gem::manage_gems is deprecated and will be removed on or after March 2009.

Rake is using a deprecated API from RubyGems. Jim knows about the problem; we just need to get him to release a new version of Rake.

To get Jim to release a new version of Rake, I have decided to start a letter-writing campaign:

[Image: IMG_0150.JPG]

If you would like to help get a new version of Rake released, I encourage you to send a letter to Jim, thanking him for his hard work on Rake, and asking him kindly to release the new “warning free” version.

Send your letters here:

Rake
c/o Jim Weirich
EdgeCase
1130 Congress Ave
Cincinnati, OH 45246

Together we can get Jim to release a new version of Rake. Together we can build a “warning free” future for our children.
