New Stuff in Ruby Mechanize 0.5.0

I've been working on the next fairly major release of Ruby WWW::Mechanize, version 0.5.0. I've decided to break some interfaces in this version, but I think it will all be for the better.

The first major change I've made is to unify the namespace. There were a bunch of classes scattered around under WWW, such as WWW::Link. I've moved everything under WWW::Mechanize. The names will be a bit longer, but more consistent. This shouldn't break much code unless the code refers to one of the moved classes by name.

One of the best new features, in my opinion, is the addition of Pluggable Parsers.

Pluggable parsers are classes that handle a particular content type. By default, a WWW::Mechanize::Page is returned only when the content type of the response is 'text/html'. Any other content type returns a WWW::Mechanize::File object. For example:

agent.get('http://example.com/index.html')  # => WWW::Mechanize::Page
agent.get('http://example.com/article.pdf')   # => WWW::Mechanize::File

So why is this cool? Because when a PDF is requested, there shouldn't be a 'links' method on the object that is returned. Mechanize doesn't know how to find links in a PDF. So instead of throwing a nasty parse exception, the object that is returned doesn't have a links method!
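To make the idea concrete, here is a minimal sketch of content-type dispatch in plain Ruby. It is not how Mechanize is implemented; PlainFile, HtmlPage, and ParserRegistry are hypothetical stand-ins invented for this illustration, and the link extraction is deliberately naive.

```ruby
# Hypothetical stand-ins for illustration only; these are not Mechanize classes.
class PlainFile
  attr_reader :body

  def initialize(body)
    @body = body
  end
end

class HtmlPage < PlainFile
  # Naive href extraction, just enough for the demo.
  def links
    body.scan(/href="([^"]*)"/).flatten
  end
end

# Maps content types to parser classes, falling back to a default class.
class ParserRegistry
  def initialize(default)
    @default = default
    @parsers = {}
  end

  def []=(content_type, klass)
    @parsers[content_type] = klass
  end

  def parse(content_type, body)
    @parsers.fetch(content_type, @default).new(body)
  end
end

registry = ParserRegistry.new(PlainFile)
registry['text/html'] = HtmlPage

page = registry.parse('text/html', '<a href="/about">About</a>')
pdf  = registry.parse('application/pdf', '%PDF-1.4 ...')

page.links               # => ["/about"]
pdf.respond_to?(:links)  # => false
```

The point of the design is that the PDF object never claims an ability it doesn't have: callers can check respond_to?(:links) instead of rescuing a parse error.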

But wait! There's more! You can define and use your own pluggable parser. In fact, the 'filter_body' methods have been removed in favor of this design. Let's say, for example, that you need to change all occurrences of 'perl' to 'ruby' in HTML pages. You could do something like this:

class PerlToRuby < WWW::Mechanize::Page
  def initialize(uri=nil, response=nil, body=nil, code=nil)
    super(uri, response, body.gsub(/perl/, 'ruby'), code)
  end
end
agent = WWW::Mechanize.new
agent.pluggable_parser['text/html'] = PerlToRuby
agent.get('http://search.cpan.org/')  # => PerlToRuby

Mechanize 0.5.0 even comes with a new Pluggable Parser called WWW::Mechanize::FileSaver that will automatically save the file fetched for you.
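For a feel of what a file-saving parser might do, here is a rough sketch in plain Ruby. It is not the FileSaver source: SavingParser, its base_dir option, and the host/filename layout on disk are all assumptions made for this example. It only reuses the (uri, response, body, code) constructor shape from the PerlToRuby example above.

```ruby
require 'fileutils'
require 'tmpdir'
require 'uri'

# Hypothetical stand-in; not Mechanize's FileSaver.
class SavingParser
  attr_reader :path

  # base_dir is an assumed option so the demo can write to a temp directory.
  def initialize(uri, response, body, code, base_dir: Dir.pwd)
    name = File.basename(uri.path)
    # Fall back to a default name when the URL has no file component.
    name = 'index.html' if name.empty? || name == '/'
    @path = File.join(base_dir, uri.host, name)
    FileUtils.mkdir_p(File.dirname(@path))
    File.binwrite(@path, body)
  end
end

Dir.mktmpdir do |dir|
  uri = URI.parse('http://example.com/article.pdf')
  saved = SavingParser.new(uri, nil, '%PDF-1.4 fake body', '200', base_dir: dir)
  puts saved.path  # ends with example.com/article.pdf
end
```

Because the work happens in the constructor, registering a class like this for a content type means every matching response is written to disk as soon as it is fetched.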

Another really cool feature is that the cookie jar can be serialized and deserialized. That way you can carry your cookies over from one script run to the next. The jar is serialized as YAML, so it is easy to read and modify. For example:

agent.get('http://google.com/')
agent.cookie_jar.save_as('cookies.yml') # Save the cookies
agent.cookie_jar.clear  # Cookie jar will now be empty!
agent.cookie_jar.load('cookies.yml')  # Got all our cookies back!
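If you're curious why YAML makes this easy, here is a small sketch of the same save, clear, and load cycle in plain Ruby. TinyCookieJar is a hypothetical class invented for this example; the real cookie jar stores richer cookie objects, and its file layout will differ from the simple nested hash used here.

```ruby
require 'tmpdir'
require 'yaml'

# Hypothetical miniature jar: a nested hash of domain => { name => value }.
class TinyCookieJar
  def initialize
    @jar = {}
  end

  def add(domain, name, value)
    (@jar[domain] ||= {})[name] = value
  end

  def [](domain)
    @jar.fetch(domain, {})
  end

  def empty?
    @jar.empty?
  end

  def clear
    @jar = {}
  end

  # Write the whole jar as human-readable YAML.
  def save_as(path)
    File.write(path, YAML.dump(@jar))
  end

  # Replace the jar's contents with what was saved earlier.
  def load(path)
    @jar = YAML.load_file(path)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'cookies.yml')
  jar = TinyCookieJar.new
  jar.add('example.com', 'session', 'abc123')
  jar.save_as(path)  # cookies.yml is now plain text you can edit
  jar.clear          # jar.empty? is true here
  jar.load(path)
  puts jar['example.com'].inspect
end
```

The saved file is just nested keys and values, which is what makes hand-editing a cookie between runs practical.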

There are a few other changes, but I think I've covered the major ones here. I should publish this code next week if I can't think of any other major changes I want to make.

2 Comments

  1. Tyler Broadbent
    Posted December 27, 2006 at 11:34 pm | Permalink

    Wow! Great set of changes, and I love working with Mechanize. I’ve created a quick script that gets all the new transactions from my bank for a Ruby budget application. Thanks for all your hard work in creating such an easy-to-use gem. The only suggestion I have is to provide better documentation, examples, and how-tos. It took a lot of googling and reading others’ code to get the understanding I now have.

  2. John Bonges
    Posted May 5, 2007 at 5:13 am | Permalink

    Is there an easy way to export a particular site's cookies from Firefox?
