Tenderlove Making

Custom CSS functions in Nokogiri

In CSS we can use different pseudo selectors like a:focus or a:hover to get access to links that have focus or are being hovered over. The CSS spec also defines a pseudo selector called :lang() which allows us to select an element based on its language.

The :lang() selector is interesting to us because the CSS parser in Nokogiri must support syntax like a:lang('en'), but it doesn’t care what token comes before the parentheses. If it finds a function it doesn’t know about, it simply pushes it into the libxml2 xpath engine, hoping that libxml knows what that function is.

We can take advantage of this behavior and define our own set of CSS pseudo selectors by defining functions for the xpath engine to use. Let’s look at some sample code and pick it apart.

Let’s say we have a document where we want to find all “a” tags based on a regular expression. We want to find all link tags with an href that matches a particular expression. Normally we would search the document for all “a” tags, then filter our list down. What if we could have the xpath engine filter our links for us? With Nokogiri, that is very easy:

~~~ ruby
require 'nokogiri'

# Sample document; the href values are illustrative.
doc = Nokogiri::HTML(<<-eohtml)
<html>
  <body>
    <a href="http://example.com/">One</a>
    <a href="http://tenderlovemaking.com/">TLM</a>
    <a href="http://example.org/">Two</a>
  </body>
</html>
eohtml

# The second argument is a handler object that supplies match_href.
doc.css('a:match_href("ampl")', Class.new {
  def match_href list, expression
    list.find_all { |node| node['href'] =~ /#{expression}/ }
  end
}.new).each do |node|
  puts node['href']
end
~~~

This works with basically a "method_missing" call in the xpath engine. When the xpath engine finds a function it doesn't know, it asks Nokogiri, "do you know where this function is?" Nokogiri looks for a method on your object with the same name. If it can find one, it calls your method with the current matching items and feeds the return value of your function back into the xpath engine.

The first argument passed to your method is always a list of the current matching nodes. In this case, a list of "a" tags. The rest of the arguments are the ones that you passed in to the function in your CSS expression. In this case, "ampl".

This functionality isn't limited to just CSS expressions. You can use it in your XPath expressions too. Here is the same expression using XPath:

~~~ ruby
# Same handler as above; in XPath the current node is passed explicitly as "."
doc.xpath('//a[match_href(., "ampl")]', Class.new {
  def match_href list, expression
    list.find_all { |node| node['href'] =~ /#{expression}/ }
  end
}.new).each do |node|
  puts node['href']
end
~~~

I find this functionality useful because it lets me define reusable search criteria. Many times when searching HTML, I find myself doing similar searches. This lets me refactor those searches into a different class and keep my code DRY and my CSS expressions concise. I hope you find this as useful as I do!
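As a rough illustration of moving a search into its own class, here is a minimal sketch. The class name `LinkMatchers` is hypothetical and not part of Nokogiri. Plain hashes stand in for nodes so the method can be exercised without parsing a document, which works only because the method does nothing with a node except call `node['href']`.

```ruby
# Hypothetical handler class; the name LinkMatchers is an assumption,
# not something Nokogiri provides.
class LinkMatchers
  # Keeps the entries whose href matches the given pattern.
  # list       - the current set of matching nodes
  # expression - the string argument written in the selector
  def match_href(list, expression)
    pattern = Regexp.new(expression)
    list.find_all { |node| node['href'] =~ pattern }
  end
end

# Stand-ins for nodes: hashes that answer node['href'] (illustrative values).
links = [
  { 'href' => 'http://example.com/' },
  { 'href' => 'http://tenderlovemaking.com/' },
  { 'href' => 'http://example.org/' }
]

LinkMatchers.new.match_href(links, 'ampl').each { |link| puts link['href'] }
```

With a parsed document `doc`, an instance would be passed as the second argument in the same way as the anonymous class, for example `doc.css('a:match_href("ampl")', LinkMatchers.new)`.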
