- A parser with browser-like HTML correction schemes
- A DOM interface that mirrors a browser’s DOM interface
Number 2, I believe, has been solved by nokogiri. As far as I can tell, the tree that libxml2 generates is very similar to the one you find in a browser. Nokogiri was partly a Yak Shave for number 3. Since I already had a DOM interface in mind when I wrote it, nokogiri’s API lends itself well to building a DOM API on top.
Number 3 was partly solved this weekend. I’ve been working on a DOM API called taka. Taka layers a DOM API on top of nokogiri. The goal of the project is to mirror a browser’s DOM API in Ruby.
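To make the layering idea concrete, here is a minimal sketch in plain Ruby. It is not taka's actual code: `Node` and `DOMLike` are hypothetical names invented for this illustration, and the `Struct` only stands in for a parsed tree such as the one nokogiri provides. It shows how DOM Level 1 style method names can be mixed into an existing node class.

~~~ ruby
# Hypothetical stand-in for a parsed tree node (not taka or nokogiri code).
Node = Struct.new(:name, :children, :text) do
  def initialize(name, children = [], text = nil)
    super(name, children, text)
  end
end

# DOM Level 1 style method names layered over the plain node API.
module DOMLike
  # Browsers report HTML element names in upper case.
  def nodeName
    name.upcase
  end

  def childNodes
    children
  end

  def firstChild
    children.first
  end

  # Collect matching descendants depth-first, in document order.
  def getElementsByTagName(tag)
    children.flat_map do |child|
      hits = child.name.casecmp?(tag) ? [child] : []
      hits + child.getElementsByTagName(tag)
    end
  end
end

Node.include(DOMLike)

# Build a tiny tree by hand; a real parser would produce this from HTML.
title = Node.new("title", [], "Behold the Johnson")
head  = Node.new("head", [title])
body  = Node.new("body", [Node.new("p", [], "one"), Node.new("p", [], "two")])
html  = Node.new("html", [head, body])

puts html.firstChild.nodeName
puts html.getElementsByTagName("p").length
puts html.getElementsByTagName("title").first.text
~~~

The mixin approach keeps the underlying tree untouched: the DOM names are just a second vocabulary for the same nodes, which is roughly the relationship described above between a DOM layer and the parser beneath it.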
Enough talk. Let’s look at some code. Take this HTML page for example: ~~~ html
Behold the Johnson
There are at least a few problems. This HTML is, admittedly, carefully crafted. So far, taka only implements the DOM Level 1 interface, which means it is missing many methods that are available in browsers. The good news, though, is that taka is pure Ruby and open source. As soon as you find a method that is missing, fork the repo, add a test, and send a pull request. I will be sure to merge it.