Craigslist feed aggregator 2

Posted by Joel Jensen Mon, 21 Aug 2006 01:14:29 GMT

OK, I am done with the scooterbot thing. Here is the last iteration; I was also brushing up on Ruby, XML, and databases …

It’s a feed aggregator that searches all Craigslist cities for a term, stores the results in a MySQL database, creates an RSS feed, and uploads it to a server. While writing this I realized I could get rid of the MySQL database and use the XML as the data source. But no, I’m done. Here’s the code, in case it’s useful.

I wrote an Automator script to launch this. It uses a Run Shell Script action:

cd to_your_directory
./the_scripts_filename

When it completes, launch Safari and visit your RSS feed.

#!/usr/bin/ruby
require 'mysql'
require 'rexml/document'
require 'time'
require 'open-uri'
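require 'uri'       # lowercase: 'URI' fails to load on case-sensitive filesystems
require 'stringio'  # in-memory buffer for the generated feed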

# FTP settings: replace the <...> placeholders with your own values.
host = '<host>'
password = '<password>'
user = '<login>'
cd_to = '<your www directory>'
saved_file_name = '<filename>'
searchterm = 'aprilia'


# CREATE DATABASE craigslist;
# CREATE TABLE `items` (
#   `id` int(11) NOT NULL default '0',
#   `title` varchar(255) default NULL,
#   `url` varchar(255) default NULL,
#   `date` datetime default NULL,
#   `xml` text,
#   `viewed` int(11) default '0'
# ) ENGINE=MyISAM DEFAULT CHARSET=utf8


header = <<EOF
<rss version='0.91'
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:ev="http://purl.org/rss/1.0/modules/event/"
  xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:admin="http://webns.net/mvcb/"
  >
 <channel>
  <title>Where the hell is my scooter? bot</title>
  <link>www.nervetree.com/scooter.xml</link>
  <description>Where the hell is my scooter? bot</description>
  <language>en-us</language>
  <copyright>None</copyright>
  <managingEditor></managingEditor>
  <webMaster></webMaster>
  <image>
   <title></title>
   <url></url>
   <width></width>
   <height></height>
   <description>Where the hell is my scooter? bot</description>
  </image>
</channel>
</rss>
EOF


here = Dir.pwd
area_ids = [1,127,231,200,207,51,244,18,57,293,100,63,187,43,189,104,7,285,96,102,103,209,188,12,8,1,191,62,97,208,210,13,287,315,288,44,168,281,193,10,238,236,125,219,80,20,39,203,237,186,37,124,258,14,256,257,205,28,52,190,11,224,223,225,229,227,226,45,228,98,307,280,99,133,58,199,283,284,31,206,169,34,4,239,173,240,172,22,259,129,261,212,309,260,262,255,19,316,230,134,222,30,221,29,192,282,55,26,92,198,170,286,50,218,59,248,40,249,201,250,3,126,130,247,171,41,273,61,36,274,272,196,251,35,27,42,131,204,252,54,70,233,94,216,9,232,275,166,279,167,277,17,33,278,276,180,38,128,101,253,254,195,220,202,46,32,269,15,264,266,265,21,132,23,271,267,263,268,53,308,270,292,56,93,291,290,48,60,289,217,2,95,246,194,243,242,241,165,47,197]

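# One search-feed URL per Craigslist area. The search term is escaped for the
# query string (URI.escape was removed in Ruby 3.0).
feeds = area_ids.collect{|aid| 
    "http://www.craigslist.org/cgi-bin/search?areaID=#{aid}&subAreaID=&query=#{URI.encode_www_form_component(searchterm)}&catAbbreviation=sss&format=rss"
}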

$my = Mysql.connect('localhost', 'root')
$my.select_db('craigslist')

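feeds.each { |feed|
    # Ids already stored, so postings seen on an earlier run are not inserted again.
    ids = []
    results = $my.query('select id from items')
    results.each{|result| ids << result[0].to_s}

    # URI.open comes from open-uri; a bare open(url) stopped working in Ruby 3.0.
    rss = URI.open(feed).read
    xml = REXML::Document.new(rss)
    xml.elements.each("//item") { |item|
      url = item.attributes['about']
      # The posting id is the number just before ".html" in the item's about URL.
      match = /(\d+)\.html/.match(url.to_s)
      next if match.nil?
      id = match[1]

      unless ids.include?(id)
        st = $my.prepare("insert into items (id,title,url,date,xml) values (?,?,?,?,?)")
        st.execute(id,item.elements['title'].text,item.elements['link'].text,item.elements['dc:date'].text,item.to_s)
        st.close
        ids << id  # remember it so a repeat within this feed is skipped too
      end
    }
}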

xmlstuff = []
results = $my.query('select xml from items order by date desc')
results.each{|result| xmlstuff << result[0]}
$my.close

# header = File.new here+'/rss_header.xml'
maindoc = REXML::Document.new( header )
rt = maindoc.elements['rss/channel']
xmlstuff.each{|e| 
  elementdoc = REXML::Document.new(e)
  rt.add_element elementdoc.root  # append the <item> element itself, not its wrapper document
}

scooter = StringIO.new ""
maindoc.write scooter
scooter.rewind

require 'net/ftp'
ftp = Net::FTP.new(host)
ftp.login(user,password)
ftp.passive = true
ftp.chdir(cd_to)
ftp.storlines("STOR " + saved_file_name , scooter)
ftp.close
p 'done'
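
As mentioned at the top, the MySQL database could probably go away if the uploaded XML itself served as the data source. Here is a minimal sketch of that idea; it is not part of the script above. `posting_id` and `merge_items` are hypothetical helper names, the sample feeds are made up, and it assumes each item carries an `about` attribute ending in `<id>.html`, as the script does.

```ruby
require 'rexml/document'

# Hypothetical helper: pull the numeric posting id out of an item's about URL.
# Returns nil when the attribute is missing or does not contain <digits>.html.
def posting_id(item)
  match = /(\d+)\.html/.match(item.attributes['about'].to_s)
  match && match[1]
end

# Hypothetical helper: copy items from new_items_xml into the channel of
# feed_xml, skipping any posting id the feed already contains.
def merge_items(feed_xml, new_items_xml)
  feed = REXML::Document.new(feed_xml)
  channel = feed.elements['rss/channel']

  seen = {}
  channel.elements.each('item') { |item| seen[posting_id(item)] = true }

  REXML::Document.new(new_items_xml).elements.each('//item') do |item|
    id = posting_id(item)
    next if id.nil? || seen[id]
    seen[id] = true
    channel.add_element(item.deep_clone)
  end
  feed
end

# Made-up sample data: posting 111 is already in the feed, 222 is new.
existing = "<rss version='0.91'><channel><title>bot</title>" \
           "<item about='http://example.org/mcy/111.html'><title>old</title></item>" \
           "</channel></rss>"
fresh = "<rss><channel>" \
        "<item about='http://example.org/mcy/111.html'><title>old again</title></item>" \
        "<item about='http://example.org/mcy/222.html'><title>new</title></item>" \
        "</channel></rss>"

merged = merge_items(existing, fresh)
merged.write($stdout)
```

Run against the previously uploaded feed before each upload, something like this would keep one copy of every posting without a database. Ordering by date, which the SQL query currently handles, would still need its own step.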