Working through my email today (trying to make a bit of space on the server), I happened upon a message from a listserv that I had skipped over a couple of months ago. Douglas Anderson (of the University of Alabama) was asking about RSS and the correct XML syntax to encode a canned URL to search a Voyager system by ISBN number. I got curious about what he might be up to and wrote a quick email asking if his question ever got answered. He quickly wrote back, letting me know that it wasn’t an issue any longer as the Unicode version of Voyager supported a BIB ID# search without a lot of fancy HTML coding.
He then pointed me to his Newbooks RSS feed.
http://www.lib.ua.edu/newnotable/newbksviarss.htm
Wow, I thought…what a great idea. I wrote again and he was nice enough to share a few thoughts with me. During that conversation (about things DBI, Voyager, Perl and so on) I realized that we were both satisfied users of Michael Doran’s newbooks software package for Voyager systems.
Not ready to try generating 300+ feeds like Douglas does down at Alabama, I decided to experiment with building an on-demand CGI generator for the information. Having just talked a bit about Michael Doran’s newbooks program, I realized he had already done much of the heavy lifting for this project (using a DBI interface to Oracle to pull the author, title, publication info, call number and shelving location for new books out of a Voyager system). I was about to begin prying into the secrets of his code (Perl/DBI), when I noticed that the flat file his program generates (which I already run daily) could be used as the datasource for my CGI process…which made real-time generation much more feasible. I hacked together a short Perl process to query the newbooks.txt file and produce an RSS feed for a particular call number prefix. For my tests I went with QA.
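For readers who want to see the shape of the idea, here is a rough sketch of the filtering step in Python (not the Perl actually used here). The column order in FIELDS, the tab delimiter, and the sample rows are all assumptions made up for illustration; the real newbooks.txt layout may well differ.

```python
import csv
import io

# Hypothetical column order; the real newbooks.txt layout may differ.
FIELDS = ["bib_id", "call_number", "author", "title", "publisher", "location"]

def books_with_prefix(text, prefix):
    """Return the rows whose call number begins with the given prefix."""
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=FIELDS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote marks in titles as plain text
    )
    wanted = prefix.strip().upper()
    matches = []
    for row in reader:
        call_number = (row["call_number"] or "").strip().upper()
        # Simple "starts with" match, so "Q" would also pick up "QA" titles.
        if call_number.startswith(wanted):
            matches.append(row)
    return matches

# Made-up sample rows, for illustration only.
sample = (
    "101\tQA76.73 .E93\tAuthor, Example\tExample title one\tExample Press\tStacks\n"
    "102\tTK5105 .E93\tWriter, Sample\tExample title two\tExample Press\tStacks\n"
    "103\tQC21 .E93\tPerson, Another\tExample title three\tExample Press\tStacks\n"
)
qa_books = books_with_prefix(sample, "QA")
for book in qa_books:
    print(book["call_number"], "|", book["title"])
```

Reading the flat file this way means each request costs only a scan of a text file rather than a database query.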
Here’s a rough first cut…it creates an RSS feed and in each item there’s a link to our online catalog for more information on that title.
http://breeze.gmu.edu/cgi-bin/rss_qa.pl
It looks nice in Safari (which offers in-browser sort options) and Opera’s built-in RSS reader works OK (where each item looks like an email message) but it looks kinda odd/pointless in Firefox.
Two hours into teaching myself XML, I’ve already learned that you need to convert “&” signs into “&amp;” or XML parsers balk. I can’t imagine what I’m going to discover next…
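A minimal sketch of that escaping step, in Python rather than the Perl used for the feed. The title string is made up; `escape` comes from Python's standard library and also handles the less-than and greater-than signs.

```python
from xml.sax.saxutils import escape

title = "Signals & systems"  # made-up title for illustration
safe_title = escape(title)   # "&" -> "&amp;", "<" -> "&lt;", ">" -> "&gt;"
item = "<item><title>" + safe_title + "</title></item>"
print(item)
```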
There’s more to do on this project but I hope to develop a CGI that lets the user specify the call number range and receive an RSS feed in return. Then our subject reference librarians can just include the URL on their webpages…
If you’d like the bit of perl I used for this experiment, drop me a line. If you have a better way to do this, please let me know…
Update: Have now managed to hack out a little program that takes an argument and sends out 15 new books that begin with that call number stem. Here’s the example for TK titles:
http://breeze.gmu.edu/cgi-bin/newrss.pl?TK
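A sketch of how the argument handling might look, again in Python rather than the actual Perl. It assumes the stem arrives as the bare query string (as in the `?TK` URL above) and that a stem is one to three letters. That check, the function names, and the sample call numbers are all assumptions for illustration.

```python
import re

MAX_ITEMS = 15  # matches the 15-title limit mentioned in the update

def stem_from_query(query_string):
    """Accept a bare stem such as 'TK'; reject anything that is not 1-3 letters."""
    stem = query_string.strip().upper()
    if re.fullmatch(r"[A-Z]{1,3}", stem):
        return stem
    return None

def select_call_numbers(call_numbers, query_string, limit=MAX_ITEMS):
    """Return at most `limit` call numbers that begin with the requested stem."""
    stem = stem_from_query(query_string)
    if stem is None:
        return []  # a real CGI script would probably return an error page here
    hits = [cn for cn in call_numbers if cn.upper().startswith(stem)]
    return hits[:limit]

# Made-up call numbers: twenty TK titles and one QA title.
shelf = ["TK%d .E9" % n for n in range(5100, 5120)] + ["QA76 .E9"]
tk_titles = select_call_numbers(shelf, "TK")
print(len(tk_titles))
```

In a CGI setting the query string would typically come from the `QUERY_STRING` environment variable. Checking it against a strict pattern before use keeps stray input out of the feed.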
By Dorothea Salo September 21, 2005 - 8:34 am
I can tell you almost everything you never wanted to know about XML. If you run into any other problems, drop me a line. Oh, and you can borrow my Learning XML book if you want.
Yes, you have to encode ampersands, because otherwise XML parsers won’t know when character entities start. Less-than signs have to be encoded for similar reasons (not that you’ll typically find those in RSS feeds, fortunately).
I don’t know if anybody else is doing Voyager feeds. I’ll look into it, but I suspect you’re pioneering (to which I say, go you!).
By Dorothea Salo September 21, 2005 - 8:44 am
Works fine in Bloglines, just for an additional data point. One thing you may want to figure out how to do is limit the number of posts you’re putting out; I had 38 when I subscribed! The posts are so short I guess it’s not a major bandwidth hit, though.
Oh, and there’s an RSS/Atom feed validator at http://feedvalidator.org . Useful gizmo, especially when hand-generating XML. (Though Perl’s got bazillions of XML libraries. Why not let one of them worry about ampersands?)
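To illustrate the "let a library worry about ampersands" point, here is a small sketch using Python's standard ElementTree module as a stand-in for a Perl XML library. The feed title, the example.edu links, and the book record are invented for illustration.

```python
import xml.etree.ElementTree as ET

def build_feed(feed_title, feed_link, books):
    """Build a bare-bones RSS 2.0 document; the library escapes text for us."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "New books by call number"
    for book in books:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = book["title"]
        ET.SubElement(item, "link").text = book["link"]
    return ET.tostring(rss, encoding="unicode")

# Invented record with characters that would break hand-built XML.
books = [{
    "title": "Rings & fields <an introduction>",
    "link": "http://catalog.example.edu/record?id=1&view=full",
}]
xml_text = build_feed("New QA titles", "http://catalog.example.edu/", books)
print(xml_text)
```

The raw ampersand and angle brackets go in as ordinary strings and come out escaped, so there is no separate search-and-replace step to forget.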
You know, if we brought the liaison librarians’ blogs back in-house at some point, there might be an automagical way to mix in generated subject-specific feeds (like Anderson’s) with their actual posts. Might even be possible if we didn’t — some auto-post mechanism via the Blogger API, perchance. All you’d have to do is add a filtering mechanism (by call-number prefix, I assume, but you’d want to allow more than one prefix) to your existing code.
Has the code got its own web page? I think it could make friends fast if publicized. The Shifted Librarian blog would love it.
By Dorothea Salo September 21, 2005 - 8:57 am
Oh, sorry, duh, I wasn’t reading carefully — you already have this sorted by call-number prefix. So you’re already partway there. That’s neat.
Other ideas: If you want to do an actual new-books webpage based on this code, I can whomp up a CSS stylesheet for it. CSS can work on RSS feeds also (making the result look like an ordinary HTML page in most browsers); I’d just need you to insert a processing instruction pointing to the stylesheet.
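As a sketch of what that processing instruction might look like when the feed is written out (Python standing in for the Perl script), assuming a hypothetical stylesheet named newbooks.css:

```python
def with_stylesheet(rss_body, css_href):
    """Prepend an XML declaration and an xml-stylesheet processing instruction."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<?xml-stylesheet type="text/css" href="' + css_href + '"?>\n'
        + rss_body
    )

# "newbooks.css" is a hypothetical file name, and the feed body is a stub.
feed = with_stylesheet('<rss version="2.0"><channel></channel></rss>', "newbooks.css")
print(feed)
```

The instruction has to appear before the root `rss` element; feed readers generally ignore it, while a browser that honors it applies the CSS when displaying the raw feed.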
By the way, my XML Hacks book has a chapter on RSS/Atom. (I think there’s now an RSS Hacks book out, but I don’t have it.) Aha, and it mentions a Perl module XML::RSS specifically designed for RSS generation.
http://search.cpan.org/author/KELLAN/XML-RSS , though if you search CPAN on RSS you get several more hits. I am not a Perlian, but the code in the XML Hacks book looks pretty darn simple, lots easier than hand-generating XML.
Book also has a Perl hack for posting RSS to a website.
Sorry for the multiple comments. I just think this is exciting!
By Dorothea Salo September 21, 2005 - 9:07 am
Hm. One more comment. Something weird is happening in Bloglines to the name of the author of the book Approximation theory using positive linear operators.
I can’t really tell why, frankly. Encoding issue? What character encoding is Voyager using? Or is it that Bloglines is trying and failing to compose the characters with the accents? I now see that the name Wolfgang has the same problem.
For our next trick, can we figure out how to do an RSS feed for MARS? I know there’s recent-submission code in there somewhere, because the community pages use it.