RSS feed from Voyager

I had a request, so here is the Perl code for creating the ‘on-the-fly’ RSS feed based on a call number prefix.

This script uses the intermediate newbooks.txt file that Michael Doran’s newbooks program creates. It isn’t necessarily the best Perl code, but if you realize that, you probably have the power to fix it up. I gave some thought to optimizing the thing (in terms of where processing gets done), but I’m sure it could be tuned further.

Basically, the URL to call the feed looks like this:

http://myserver.edu/cgi-bin/newrss.pl?QA76

which will produce a feed of books whose call numbers begin with QA76. It uses the ‘squashed’ call number (without any spaces) to do the comparison. Anything after the ‘?’ in the URL must be at the start of the call number for the book to be retrieved.
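The prefix test itself is small enough to try on its own. Here is a minimal sketch, assuming a hypothetical helper named matches_stem and some made-up call numbers (neither is part of the script that follows); it only illustrates that the stem has to sit at position 0 of the squashed call number:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): true when the
# squashed call number begins with the requested stem.
sub matches_stem {
    my ($call, $stem) = @_;
    return index($call, $stem) == 0;
}

# Made-up sample call numbers, already squashed (no spaces).
my @calls = ('QA76.73.P22', 'TK5105.888', 'ZQA76.9');

for my $call (@calls) {
    my $verdict = matches_stem($call, 'QA76') ? 'match' : 'skip';
    print "$call: $verdict\n";
}
```

Only the first sample matches. The last one contains QA76, but not at the start, so it is skipped.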

#!/m1/shared/perl/5.8.5-09/bin/perl
# this program processes a flat file created by M. Doran's
# newbooks system (newbooks.txt)
# author of this program:
# w. grotophorst, (c) 2005, Lost Packet Planet
# Program may be freely copied, modified & improved.
#########################################
#
# less variable variables
#
$fromlink = "http://lso.gmu.edu/index.php";
$inputfile = "newbooks.txt";
$NumToFeed = "15";
$URL2Voyager = "http://magik.gmu.edu/cgi-bin/Pwebrecon.cgi?BBID=";

#########################################

$ToFind = $ENV{'QUERY_STRING'};
$ToFind =~ tr/+/ /;
$ToFind =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
$ToFind =~ tr/\cM/\n/;
$ToFind =~ tr/a-z/A-Z/;

$titlestr = "New Books - University Libraries - ".$ToFind;
open(INFILE,$inputfile);
print "Content-type: text/xml\n\n";

print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
print "<!DOCTYPE rss PUBLIC ";
print "\"-//Netscape Communications//DTD RSS 0.91//EN\"\n";
print "\"http://my.netscape.com/publish/formats/rss-0.91.dtd\">\n";
print "<rss version=\"0.91\">\n";
print "<channel>\n";
print "<title>$titlestr</title>\n";
print "<link>$fromlink</link>\n";

$line = <INFILE>;
$foundit = 0;
$numfed = 0;

while ($line ne ""){

if ($numfed < $NumToFeed) {

# check string to see if characters matching call# stem appear anywhere, if
# not, go on to next line from newbooks.txt 

$t = index($line,$ToFind);

if ($t >= 0) {

    # call # stem is in the line, blow it apart & see if it is actually in call#
    # section...the 8th data element that M. Doran's system puts on the line
    $line =~ s/&/&amp;/g;
    $line =~ s/>/&gt;/g;
    $line =~ s/</&lt;/g;
    $line =~ s/"/&quot;/g;
    $line =~ s/'/&apos;/g;

    @itemdata = split(/\t/,$line);

    $call = $itemdata[7];

    $t = index($call,$ToFind);
    if ($t == 0) {

    # now assign other 'split' values from the itemdata array

    $bibid = $itemdata[0];
    $author= $itemdata[1];
    $title = $itemdata[2];
    $publ  = $itemdata[4];
    $location = $itemdata[5];

    $numfed++;

    print "<item>\n";
    print "<title>$title</title>\n";
    print "<link>$URL2Voyager$bibid</link>\n";
    print "<description>$author $title $publ $location $call</description>\n";
    print "</item>\n";
    }
   }
  }
 $line = <INFILE>;
 }

print "</channel>\n</rss>\n";
close INFILE;
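
If you want to check the query-string handling without a web server, the decoding steps can be pulled out and run from the command line. This is only a sketch under my own assumptions: the helper name decode_stem and the sample inputs are hypothetical, and it uppercases with tr/a-z/A-Z/, which translates every letter. A substitution such as s/[a-z]/[A-Z]/ would instead replace only the first lowercase letter with the literal text [A-Z].

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a raw CGI query string into an uppercase
# call number stem.
sub decode_stem {
    my ($raw) = @_;
    $raw = '' unless defined $raw;
    $raw =~ tr/+/ /;                                    # '+' encodes a space
    $raw =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;  # decode %XX escapes
    $raw =~ tr/a-z/A-Z/;                                # uppercase every letter
    return $raw;
}

print decode_stem('qa76'), "\n";       # QA76
print decode_stem('pr%36005'), "\n";   # PR6005

# For contrast: s/// with a character class in the replacement
# inserts that text literally.
(my $literal = 'qa76') =~ s/[a-z]/[A-Z]/;
print "$literal\n";                    # [A-Z]a76
```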

P.S. You may not know how hard it was to get this code listing to appear correctly in this blog entry…but if you do, then this will qualify as “tip of the week.”
I tried a couple of WordPress plugins but nothing seemed to work right (code was being rendered as HTML or, worse, WordPress was making bad assumptions about what I was trying to do). Finally, I opened the Perl script in SubEthaEdit on my desktop (to do some line shortening), and when I selected “all” and got ready to copy it back to an xterm window, there was the option I needed, which had never been more than a simple right mouse click away: “Copy as XHTML”. SubEthaEdit even threw in a bit of XHTML markup to make the little black box around the entry. Yet another reason that’s a great editor.


9 Comments so far

  1. Fred on September 27th, 2005

    Thanks for the SubEthaEdit tip. I’ve had the same problem more than once and that’s an easy fix.

  2. Abbey Warner on October 26th, 2005

    I’d like to give this Voyager to RSS a go on our system (using Michael Doran’s newbooks.txt), but have questions about what to put on these lines:

    $fromlink = http://lso.gmu.edu/index.php; (what’s this do?)

    $inputfile = “/opt/www-ctl/cgi-bin/newbooks.txt”; (this is just the path to newbooks.txt, right? or is it the file OUT of newbooks and into an intermediary?)

  3. [...] Ken Varnum points to a posting by Wally Grotophorst, Associate University Librarian at George Mason University’s library, who posted “a small Perl application that searches his Voyager online catalog for a specific Library of Congress call number and returns the results as an RSS feed.” RSS4Lib:: On-The-Fly RSS by LC Number for Voyager [...]

  4. Ryan Edwards on March 17th, 2006

    Hello,

    I replaced the paths in the $fromlink, $inputfile, and $Url2Voy variables (with our URLs), and when I tried to execute the script from our Voyager server, I received an error message stating ‘unrecognizable character \x91’. Do you know what might be causing this?

  5. Mickey Soltys on December 4th, 2007

    Thanks for making this available. I think though, that the comparison if ($t=1) which appears before the comment
    # now finish filling itemdata array is really not what you want. I think you should have if ($t == 0) instead.

    $t = 1 is assigning the value of 1 to $t and will always be true.

  6. Wally on December 4th, 2007

    Thanks Mickey…and a few others I heard from. You’re quite right, I should have realized that. I wonder if I dropped an ‘=’ sign when I was converting the code to XHTML to post on this blog?

  7. Mickey Soltys on December 5th, 2007

    It needs to be $t==0 though, not $t==1.

  8. Wally on December 5th, 2007

    Sorry, I wasn’t paying enough attention to this when it came up the other day. Yes, it should be $t==0 and I’ve now updated the page to reflect that.

    There’s a test instance you can try out at this URL

    http://breeze.gmu.edu/cgi-bin/newrss.pl?XXX

    where XXX is the call number stem you want to retrieve.

  9. Adam Shire on May 22nd, 2008

    I think: s/[a-z]/[A-Z]/

    should be: tr/[a-z]/[A-Z]/
