2006-05-24 @ 19:30
A Mechanize Spider
url("google.com"); &$_ >> _self while $_;
That is really amazing! I also can’t read it... After digging through the Fear internals, I finally understood the code, so I tried to reproduce it with Mechanize. This is what I came up with:
require 'mechanize'
agent = WWW::Mechanize.new
stack = agent.get(ARGV[0]).links
while l = stack.pop
  stack.push(*(agent.click(l).links)) unless agent.visited? l.href
end
To get this to work, I added the “visited?” method to the yet-to-be-released 0.4.6 version of Mechanize. It takes a few more lines, but it’s still pretty small. I still don’t like my spider, though, because it will follow links to any domain. I don’t really want it to try to read the entire internet, so I added the following line at the top of the while loop:
next unless l.uri.host == agent.history.first.uri.host
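Here is a minimal sketch of the same stack-based crawl in plain Ruby, so the logic can run without Mechanize or a network. The PAGES hash is a made-up stand-in for the web, “crawl” is a hypothetical name, and a Set plays the role of “visited?”. One deliberate difference, based on an assumption: each href is resolved against the page it came from with URI.join before the host comparison, since a relative link such as “/about” parsed on its own has no host and would be dropped by a host filter.

```ruby
require "set"
require "uri"

# Hypothetical in-memory "web": each page URL maps to the raw hrefs found on it.
# This stands in for Mechanize so the crawl logic runs without a network.
PAGES = {
  "http://example.com/"      => ["/about", "http://example.com/blog", "http://other.org/"],
  "http://example.com/about" => ["/", "blog"],
  "http://example.com/blog"  => ["http://example.com/about", "http://other.org/x"],
  "http://other.org/"        => ["http://other.org/x"],
}.freeze

# Depth-first crawl: pop a link, skip it if it leaves the starting host or was
# already visited, otherwise "click" it and push its (resolved) links.
def crawl(start, pages)
  start_host = URI.parse(start).host
  visited = Set.new([start])
  # Assumption: resolve relative hrefs against the page they appeared on.
  stack = pages.fetch(start, []).map { |href| URI.join(start, href).to_s }

  while (url = stack.pop)
    next unless URI.parse(url).host == start_host # stay on the starting domain
    next if visited.include?(url)                 # plays the role of visited?
    visited << url
    links = pages.fetch(url, [])
    stack.push(*links.map { |href| URI.join(url, href).to_s })
  end

  visited
end
```

Calling crawl("http://example.com/", PAGES) visits the three example.com pages and never touches other.org. Using a stack makes this a depth-first crawl, just like the pop/push loop above; swapping pop for shift would turn it into a breadth-first one.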
Can I make it shorter? I’m not sure yet. Do I want it to be shorter? Not sure about that either.