A friend of mine pointed me at the Fear Perl module today, and it has given me some ideas for Mechanize. I couldn’t believe how small a spider written with the Fear API is:
&$_ >> _self while $_;
That is really amazing! I also can’t read it. After digging through the Fear innards, I finally understood the code, so I tried to reproduce it with Mechanize. This is what I came up with:
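require 'mechanize'
agent = WWW::Mechanize.new          # 0.4.x-era Mechanize lives under the WWW namespace
stack = agent.get(ARGV[0]).links    # seed the stack with every link on the start page
while l = stack.pop
  stack.push(*(agent.click(l).links)) unless agent.visited? l.href
end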
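To see why the visited? check is what makes this loop terminate, here is a minimal offline model of the same stack-based crawl in plain Ruby. It is a sketch, not Mechanize: a hypothetical PAGES hash stands in for agent.get and agent.click, and a Set plays the role of visited?. Pages x and y link to each other, so without the check the stack would keep growing forever.

```ruby
require 'set'

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# It stands in for agent.get / agent.click so the loop runs without a network.
PAGES = {
  'http://a.example/'  => ['http://a.example/x', 'http://a.example/y'],
  'http://a.example/x' => ['http://a.example/y', 'http://a.example/'],
  'http://a.example/y' => ['http://a.example/x']
}.freeze

# Depth-first crawl with an explicit stack, mirroring the Mechanize loop.
# The start page counts as visited, just as agent.get records it in history.
def crawl(start, pages)
  visited = Set.new([start])
  stack = pages.fetch(start, []).dup # dup so pushing never mutates PAGES
  while (url = stack.pop)
    next if visited.include?(url)    # the visited? check: skip pages already seen
    visited << url                   # "clicking" the link records it
    stack.push(*pages.fetch(url, []))
  end
  visited
end
```

Calling crawl('http://a.example/', PAGES) returns a set of all three URLs, each fetched once, even though the link graph contains cycles.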
To get this to work, I added the “visited?” method to the yet-to-be-released 0.4.6 version of Mechanize. It’s a few more lines, but still pretty small. I still don’t like my spider, though, because it will visit any domain. I don’t really want it to try to read the entire internet, so I added the following line at the top of the while loop:
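As a hypothetical illustration of such a guard (not necessarily the exact line used here), a same-host test can be written with Ruby’s standard URI library. The helper name same_host? and the choice to treat host-less relative links as on-site are assumptions made for this sketch.

```ruby
require 'uri'

# Hypothetical helper: true when href stays on start_host.
# Relative links like "/about" have no host and count as on-site;
# other schemes (mailto:, javascript:, ftp:) and unparsable hrefs are rejected.
def same_host?(href, start_host)
  uri = URI.parse(href)
  return uri.scheme.nil? if uri.host.nil?   # "/about" => true, "mailto:x@y" => false
  uri.host.downcase == start_host.downcase &&
    (uri.scheme.nil? || %w[http https].include?(uri.scheme.downcase))
rescue URI::InvalidURIError
  false
end
```

In the spider, a guard along the lines of next unless same_host?(l.href, start_host) at the top of the while loop would skip off-site links, with start_host taken from something like URI.parse(ARGV[0]).host.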
Can I make it shorter? I’m not sure yet. Do I want it to be shorter? Not sure about that either.
How about an example of using Mechanize with client certificates? Looking over the mechanize.rb source, I see that they are supported, but I’m not sure what a “best practices” script would look like.
Thanks for making mechanize!
Regards,
Jim
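On the client-certificate question: Mechanize’s own settings for this are not shown here. As a hedged, standard-library-only sketch of what the TLS setup involves, the hypothetical helper client_cert_http below configures Ruby’s Net::HTTP with a PEM client certificate and key and turns on server verification. It only builds the connection object; nothing touches the network until start is called.

```ruby
require 'net/http'
require 'openssl'

# Hypothetical helper: build (but do not start) an HTTPS connection that
# presents a client certificate. cert_path and key_path are PEM files you supply.
def client_cert_http(host, cert_path, key_path, ca_file: nil, port: 443)
  http = Net::HTTP.new(host, port)
  http.use_ssl = true
  http.cert = OpenSSL::X509::Certificate.new(File.read(cert_path))
  http.key  = OpenSSL::PKey.read(File.read(key_path)) # RSA or EC private key
  # Verify the server as well; ca_file names the CA bundle that signed its cert.
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.ca_file = ca_file if ca_file
  http
end
```

For example, client_cert_http('secure.example.com', 'client.pem', 'client.key', ca_file: 'ca.pem').start { |h| h.get('/') } would perform a single request; the host and file names are placeholders. Keeping VERIFY_PEER on is the main “best practice” choice here, since a client certificate does little good if the server itself is never checked.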