Craigslist feed aggregator 2
OK, I am done with the scooterbot thing. Here is the last iteration; I was also brushing up on Ruby, XML and databases …
It's a feed aggregator that searches all Craigslist cities for a term, stores the results in a MySQL database, creates an RSS feed and uploads it to a server. While writing this I realized I could get rid of the MySQL database and use the XML as the data source. But no, I'm done. Here's the code if it's useful.
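Before the script itself, a rough sketch of that MySQL-free variant, in case anyone wants to try it. This is not what the script further down does. The helper names `posting_id` and `merge_items` are made up for illustration, and it assumes every item carries a `<link>` whose URL ends in the numeric posting id.

```ruby
require 'rexml/document'
require 'set'

# Hypothetical helper: pull the numeric posting id out of a Craigslist URL.
# Returns nil when the URL does not look like ".../12345.html".
def posting_id(url)
  m = /(\d+)\.html/.match(url.to_s)
  m && m[1]
end

# Hypothetical helper: read the posting id from an item's <link> child.
def item_id(item)
  link = item.elements['link']
  posting_id(link && link.text)
end

# Hypothetical helper: copy every <item> from new_rss into the channel of
# feed_doc unless an item with the same posting id is already there.
# The feed document itself plays the role of the database.
# Returns the number of items added.
def merge_items(feed_doc, new_rss)
  channel = feed_doc.elements['rss/channel']
  seen = Set.new
  channel.elements.each('item') { |item| seen << item_id(item) }

  added = 0
  REXML::Document.new(new_rss).elements.each('//item') do |item|
    id = item_id(item)
    next if id.nil? || seen.include?(id)
    seen << id
    channel.add_element(item.deep_clone)
    added += 1
  end
  added
end
```

To use it you would load last run's feed file into a `REXML::Document`, call `merge_items` once per downloaded feed, and write the document back out before uploading. Sorting by date would have to be done in Ruby, since there is no `order by` to lean on.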
I wrote an Automator workflow to launch the script. It is a single Run Shell Script action:
cd to_your_directory
./the_scripts_filename
When it completes, launch Safari and visit your RSS feed.
#!/usr/bin/ruby
require 'mysql'
require 'rexml/document'
require 'time'
require 'open-uri'
require 'uri'
# FTP settings: replace each placeholder with your own value
host = '<host>'
password = '<password>'
user = '<login>'
cd_to = '<your www directory>'
saved_file_name = '<filename>'
searchterm = 'aprilia'
# CREATE DATABASE CRAIGSLIST;
# CREATE TABLE `items` (
# `id` int(11) NOT NULL default '0',
# `title` varchar(255) default NULL,
# `url` varchar(255) default NULL,
# `date` datetime default NULL,
# `xml` text,
# `viewed` int(11) default '0'
# ) ENGINE=MyISAM DEFAULT CHARSET=utf8
header = <<EOF
<rss version='0.91'
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
xmlns:ev="http://purl.org/rss/1.0/modules/event/"
xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:admin="http://webns.net/mvcb/"
>
<channel>
<title>Where the hell is my scooter? bot</title>
<link>http://www.nervetree.com/scooter.xml</link>
<description>Where the hell is my scooter? bot</description>
<language>en-us</language>
<copyright>None</copyright>
<managingEditor></managingEditor>
<webMaster></webMaster>
<image>
<title></title>
<url></url>
<width></width>
<height></height>
<description>Where the hell is my scooter? bot</description>
</image>
</channel>
</rss>
EOF
here = Dir.pwd
area_ids = [1,127,231,200,207,51,244,18,57,293,100,63,187,43,189,104,7,285,96,102,103,209,188,12,8,1,191,62,97,208,210,13,287,315,288,44,168,281,193,10,238,236,125,219,80,20,39,203,237,186,37,124,258,14,256,257,205,28,52,190,11,224,223,225,229,227,226,45,228,98,307,280,99,133,58,199,283,284,31,206,169,34,4,239,173,240,172,22,259,129,261,212,309,260,262,255,19,316,230,134,222,30,221,29,192,282,55,26,92,198,170,286,50,218,59,248,40,249,201,250,3,126,130,247,171,41,273,61,36,274,272,196,251,35,27,42,131,204,252,54,70,233,94,216,9,232,275,166,279,167,277,17,33,278,276,180,38,128,101,253,254,195,220,202,46,32,269,15,264,266,265,21,132,23,271,267,263,268,53,308,270,292,56,93,291,290,48,60,289,217,2,95,246,194,243,242,241,165,47,197]
# area_ids lists 1 twice, so drop repeats before building the feed URLs
feeds = area_ids.uniq.collect{|aid|
"http://www.craigslist.org/cgi-bin/search?areaID=#{aid}&subAreaID=&query=#{URI.escape(searchterm)}&catAbbreviation=sss&format=rss"
}
$my = Mysql.connect('localhost', 'root')
$my.select_db('craigslist')
feeds.each {
|feed|
ids = []
results = $my.query('select id from items')
results.each{|result| ids << result[0]}
rss = open(feed).read()
xml = REXML::Document.new(rss)
xml.elements.each("//item") { |item|
url = item.attributes['about']
# escape the dot so only a literal ".html" suffix matches
id = /(\d+)\.html/.match(url)[1]
unless ids.include?(id)
st = $my.prepare("insert into items (id,title,url,date,xml) values (?,?,?,?,?)")
st.execute(id,item.elements['title'].text,item.elements['link'].text,item.elements['dc:date'].text,item.to_s)
st.close
end
}
}
xmlstuff = []
results = $my.query('select xml from items order by date desc')
results.each{|result| xmlstuff << result[0]}
$my.close
# header = File.new here+'/rss_header.xml'
maindoc = REXML::Document.new( header )
rt = maindoc.elements['rss/channel']
xmlstuff.each{|e|
elementdoc = REXML::Document.new(e)
# add the parsed <item> element itself rather than its wrapping document
rt.add_element elementdoc.root
}
scooter = StringIO.new ""
maindoc.write scooter
scooter.rewind
require 'net/ftp'
ftp = Net::FTP.new(host)
ftp.login(user,password)
ftp.passive = true
ftp.chdir(cd_to)
ftp.storlines("STOR " + saved_file_name , scooter)
ftp.close
p 'done'
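One note if you run this on a newer Ruby: `URI.escape` was removed in Ruby 3.0, so the line that builds the feed URLs will fail there. A minimal sketch of the same step using `URI.encode_www_form` from the standard library follows. The helper name `search_url` is made up, the short id list stands in for the full one, and the search endpoint is simply the one used in the script above.

```ruby
require 'uri'

# Hypothetical helper: build the per-area search URL with the same
# parameters as the script, letting URI.encode_www_form do the escaping.
def search_url(area_id, term)
  query = URI.encode_www_form(
    'areaID' => area_id,
    'subAreaID' => '',
    'query' => term,
    'catAbbreviation' => 'sss',
    'format' => 'rss'
  )
  "http://www.craigslist.org/cgi-bin/search?#{query}"
end

# Shortened stand-in for the full list, with a repeated id like the real one
area_ids = [1, 127, 231, 1]
feeds = area_ids.uniq.map { |aid| search_url(aid, 'aprilia rs 250') }
puts feeds
```

The one visible difference is that spaces in the search term come out as `+` rather than `%20`, which is the usual form encoding for query strings.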