Mechanize One Liners

Posted by – May 26, 2006

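I thought I’d try to come up with some useful one-liners for Mechanize. Each one assumes Mechanize is already loaded (require 'mechanize') and takes the URL as its first command-line argument. Here goes: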

Fetch a page and print to stdout:

puts WWW::Mechanize.new.get(ARGV[0]).body

List all links in a page:

WWW::Mechanize.new.get(ARGV[0]).links.each { |l| puts l.text }

Visit all links on a page:

(a = WWW::Mechanize.new).get(ARGV[0]).links.each { |l| puts a.click(l).body }

List all links that match a pattern:

WWW::Mechanize.new.get(ARGV[0]).links.text(/[a-z]/).each { |l| puts l.text }

Visit all links that match a pattern:

(a = WWW::Mechanize.new).get(ARGV[0]).links.text(/[a]/).each { |l| puts a.click(l).body }
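
The pattern filter in the last two one-liners boils down to "keep the links whose text matches a regex". Here is a rough sketch of that idea using plain Enumerable#select, so it can be tried without a network connection. Link, LINKS and links_matching are made-up names for illustration and are not part of Mechanize; the struct only models the text and href readers.

```ruby
# Link is a hypothetical stand-in for a Mechanize link object;
# only the text and href readers are modelled here.
Link = Struct.new(:text, :href)

# LINKS is sample data invented for this sketch.
LINKS = [
  Link.new("Home",  "/"),
  Link.new("about", "/about"),
  Link.new("2006",  "/archive/2006"),
]

# Keep only the links whose text matches the pattern.
def links_matching(links, pattern)
  links.select { |l| l.text =~ pattern }
end

# Print the text of every link containing a lowercase letter.
links_matching(LINKS, /[a-z]/).each { |l| puts l.text }
```

The same select call should work on the array that a real page's links method returns, assuming each link responds to text.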

Smaller Spider:

(mech = WWW::Mechanize.new).get(ARGV[0])
(a = lambda { |p|
  mech.page.links.each { |l| mech.click(l) && p.call(p) if ! mech.visited? l }
}).call(a)
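
The trick in the spider is a lambda that receives itself as an argument so it can recurse, plus a visited check so it never follows the same link twice. Here is a small sketch of the same pattern run against an in-memory link graph instead of live pages. SITE and crawl are hypothetical names invented for this example, and the Set stands in for Mechanize's own visited history.

```ruby
require 'set'

# SITE is a hypothetical link graph: each page maps to the pages it links to.
SITE = {
  "/"  => ["/a", "/b"],
  "/a" => ["/", "/b"],
  "/b" => ["/c"],
  "/c" => ["/a"],
}

# Walk the graph depth-first and return the pages in visit order.
def crawl(site, start)
  visited = Set.new
  order = []
  # The lambda takes itself as its second argument, like the spider above.
  visit = lambda do |page, me|
    visited << page
    order << page
    site.fetch(page, []).each do |link|
      me.call(link, me) unless visited.include?(link)
    end
  end
  visit.call(start, visit)
  order
end

puts crawl(SITE, "/")
```

Because every page is marked as visited before its links are followed, cycles in the graph (like "/a" linking back to "/") do not cause infinite recursion.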

1 Comment on Mechanize One Liners

  1. [...] writing a flexible spider with plenty of call-backs and filters, based on Aaron Patterson’s four line web spider using WWW::Mechanize. Soon I realized a major problem with this approach, WWW::Mechanize caches [...]
