<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-2914337944584630860.post8923755908399249304..comments</id><updated>2009-01-24T12:35:22.278-08:00</updated><title type='text'>Comments on Jon Hart's Blog: Hawler, the Ruby crawler, 0.3 released</title><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.spoofed.org/feeds/8923755908399249304/comments/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2914337944584630860/8923755908399249304/comments/default'/><link rel='alternate' type='text/html' href='http://blog.spoofed.org/2009/01/hawler-ruby-crawler-03-released.html'/><author><name>Jon Hart</name><uri>http://www.blogger.com/profile/02857880233692933624</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-2914337944584630860.post-3850405169533891101</id><published>2009-01-24T12:35:00.000-08:00</published><updated>2009-01-24T12:35:00.000-08:00</updated><title type='text'>@postmodern:The obstacle course is a great idea!  ...</title><content type='html'>@postmodern:&lt;BR/&gt;&lt;BR/&gt;The obstacle course is a great idea!  I took a quick pass through it and it looks like the only part Hawler gets slightly confused on is the empty href tricks.  I'm torn as to whether this is a bug in Hawler or the fault of URI::merge, which is responsible for making new URIs out of the page being processed and the newly encountered "link".  
I've worked around this in the latest commit.&lt;BR/&gt;&lt;BR/&gt;Also, it's good to see familiar names working on cool projects.  Congrats on Spidr!</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2914337944584630860/8923755908399249304/comments/default/3850405169533891101'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2914337944584630860/8923755908399249304/comments/default/3850405169533891101'/><link rel='alternate' type='text/html' href='http://blog.spoofed.org/2009/01/hawler-ruby-crawler-03-released.html?showComment=1232829300000#c3850405169533891101' title=''/><author><name>Jon Hart</name><uri>http://www.blogger.com/profile/03410754059921403771</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.spoofed.org/2009/01/hawler-ruby-crawler-03-released.html' ref='tag:blogger.com,1999:blog-2914337944584630860.post-8923755908399249304' source='http://www.blogger.com/feeds/2914337944584630860/posts/default/8923755908399249304' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-2914337944584630860.post-3607877378046954879</id><published>2009-01-16T00:42:00.000-08:00</published><updated>2009-01-16T00:42:00.000-08:00</updated><title type='text'>Hey, nice to see Hawler continue to grow into a ha...</title><content type='html'>Hey, nice to see Hawler continue to grow into a handy website crawling toolkit.&lt;BR/&gt;&lt;BR/&gt;I've also written a web spidering library, named &lt;A HREF="http://spidr.rubyforge.org/" REL="nofollow"&gt;Spidr&lt;/A&gt;. What could be useful to your Hawler project is that I've also written a &lt;A HREF="http://spidr.rubyforge.org/course/start.html" REL="nofollow"&gt;Web-Spider Obstacle Course&lt;/A&gt; for web-spiders. This course provides various foul HTML pages which the spider must navigate properly. 
There's also a &lt;A HREF="http://spidr.rubyforge.org/course/specs.json" REL="nofollow"&gt;JSON file&lt;/A&gt; that describes which links have to be followed/ignored/not-followed. I use the JSON file along with RSpec to test Spidr's ability to navigate rough HTML.&lt;BR/&gt;&lt;BR/&gt;More of my code can be found on &lt;A HREF="http://github.com/postmodern" REL="nofollow"&gt;GitHub&lt;/A&gt;.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2914337944584630860/8923755908399249304/comments/default/3607877378046954879'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2914337944584630860/8923755908399249304/comments/default/3607877378046954879'/><link rel='alternate' type='text/html' href='http://blog.spoofed.org/2009/01/hawler-ruby-crawler-03-released.html?showComment=1232095320000#c3607877378046954879' title=''/><author><name>postmodern</name><uri>http://houseofpostmodern.wordpress.com/</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.spoofed.org/2009/01/hawler-ruby-crawler-03-released.html' ref='tag:blogger.com,1999:blog-2914337944584630860.post-8923755908399249304' source='http://www.blogger.com/feeds/2914337944584630860/posts/default/8923755908399249304' type='text/html'/></entry></feed>