Archive for July, 2007

VuFind update

A few days ago I mentioned the VuFind project, and since then I’ve been trying to find the time to get an installation running locally. After switching from SuSE to Ubuntu (and after getting more than a couple of really timely bits of information from Wayne Graham down at William and Mary) I now have at least a portion of the VuFind software working:

http://zoombox.gmu.edu/vufind

This little sample installation (running on a 1.8GHz Athlon-based Shuttle with just 512MB of memory) is built with about 231,000 bib records (earlier counts: 78,000 and 35,000+) from our Voyager system (items added or modified between January 1, 2004 and the present). What’s not yet working is the link back to Voyager to get rid of those little “loading…” messages and replace them with holdings and circulation information. Also clearly there are a few XML and XSL issues which prevent full-record display of an individual item [fixed that].

But the part that is working is pretty cool—like the faceted search options built along the right side of the page.

Will tackle the Voyager link-back piece next—unfortunately, I’m having just a bit of trouble getting the PHP/Oracle software installed (PDO_OCI). Would go much faster if I actually knew what I was doing but happily iterative hacking through trial and error does work—it just takes longer.


Completely Automated Public Turing Test…

You’ve seen those little authentication tests on web-based forms, probably more this year than last. They’re called CAPTCHAs which is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. Just a quick bit of history—in 1950 Alan Turing (considered the father of modern computer science and credited with breaking the Enigma ciphers during WWII), created a test to determine whether a computer demonstrated thought. The test was simple…with a computer hidden in one room and a human hidden in the other, converse with each. If you can’t tell which is the human, then the computer is demonstrating thought.

The CAPTCHA is really a reverse Turing test (we want to prove that you’re a human) but you get the idea.

Today, I added a CAPTCHA to the comments for this blog but it’s a very special, library-oriented one. This CAPTCHA (actually called reCAPTCHA) is helping digitize books for the Internet Archive.

How? It shows two words: one it knows and one that it had difficulty resolving during the OCR process. You’re asked to type in both words as you read the image. Assuming your input matches the known meaning of the first word, the CAPTCHA assumes you’ve correctly interpreted the second term as well and uses that input to correct the OCR work. You’re authenticated (you must be a human, not a spam-bot) and you’ve done a small bit of work to help bring knowledge to the masses! I’d like to see more library-related sites use this system and will add it to our library’s “comment/suggestion” form soon.
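The two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the function name `check_response`, the sample words, and the notion of tallying "votes" for the unknown word are all assumptions of mine.

```python
from collections import Counter


def check_response(known_word, typed_known, typed_unknown, votes):
    """Pass the user only if the known (control) word was typed correctly.

    When the control word matches, the user's reading of the unknown word
    is recorded as one vote toward its correct transcription.
    """
    if typed_known.strip().lower() != known_word.lower():
        return False  # failed the control word, so ignore the other answer
    votes[typed_unknown.strip().lower()] += 1
    return True


# Hypothetical example: "morning" is the control word, and the OCR software
# could not resolve the second word.
votes = Counter()
check_response("morning", "morning", "upon", votes)   # passes, votes "upon"
check_response("morning", "Morning", "upon", votes)   # passes, votes "upon"
check_response("morning", "mourning", "spam", votes)  # fails, vote discarded

best_guess, count = votes.most_common(1)[0]
print(best_guess, count)
```

Collecting several matching answers before trusting a transcription is one plausible way to guard against a human who gets the control word right but misreads the hard one.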

http://recaptcha.net


Another seven percent solution

I’m sure we’ve all seen estimates of the ratio of “good” email to spam. What I offer today is a set of data points from George Mason University.

In December of 2004 (when our first spam filter was introduced) our mail server received 1.9 million “good” email messages (good defined only as making it through the spam filter—you have to assume some percentage of that was really clever spam) and 5.6 million spam messages (including, no doubt, a few false positives). Expressed in percentages, 26% of the mail we received was legitimate.

Fast forward to May 2007: the percentage of “good” email has fallen to a mere 7% of received messages (8 million good messages and 100.2 million spam).
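For anyone who wants to check the arithmetic, here is a small Python sketch using the rounded message counts quoted above (the function name `good_share` is just for illustration). With these rounded inputs, December 2004 comes out a little over 25%, so the 26% figure presumably reflects the unrounded counts; May 2007 lands at roughly 7%.

```python
def good_share(good_millions, spam_millions):
    """Return the percentage of all received mail that passed the spam filter."""
    return 100 * good_millions / (good_millions + spam_millions)


# Rounded figures from the post, in millions of messages.
dec_2004 = good_share(1.9, 5.6)
may_2007 = good_share(8.0, 100.2)

print(f"December 2004: {dec_2004:.1f}% good")
print(f"May 2007: {may_2007:.1f}% good")
```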

Here are the statistics since December 2004:

spam_stats.jpg

Quite a spike in spamming during April and May 2007. Will know in a few months whether it was just a “spam storm” or actually the beginning of the final deluge. Without knowing that, I think it is still easy and safe to say that we’re approaching the point where receiving any email that isn’t spam will be a statistically significant event.


VuFind

A new Web 2.0 version of your OPAC is on the horizon (well, here now if you’re into software with version numbers like “beta 0.5”). From the vufind.org site:

VuFind is a library resource portal designed and developed for libraries by libraries. The goal of VuFind is to enable your users to search and browse through all of your library’s resources by replacing the traditional OPAC to include:

  • Catalog Records
  • Digital Library Items
  • Institutional Repository
  • Institutional Bibliography
  • Other Library Collections and Resources

VuFind is completely modular so you can implement just the basic system, or all of components. And since it’s open source, you can modify the modules to best fit your need or you can add new modules to extend your resource offerings.

Power

VuFind runs on Solr Energy. Apache Solr, an open source search engine, offers amazing performance and scalability to allow for VuFind to respond to search queries in milliseconds time. It has the ability to be distributed if your need is to distribute the load of the catalog over many servers or in a server farm environment.

Click here for a demo.

I’ll give a report (and perhaps a demo link) to our catalog in a few weeks—it will take me that long to find the time to install the software and see if things can be made to work—but it looks very promising. Happily, since Villanova (where VuFind originates) is a Voyager site, the connector to a Voyager catalog is already done. Kudos to Andrew Nagy and his group for taking this very exciting step toward fixing the OPAC with open source tools.

http://www.vufind.org



What if we had shareholders…

Read an interesting post today from Peter Brantley on O’Reilly Radar. A few excerpts:

I have wondered in other blogs (see: “Lost Cathedrals: Libraries and Steel”) whether libraries might (to put it crassly) turn into acquisition agencies for licensed content, with small cafes on their ground floors or basements, existing in the physical realm primarily to serve as community centers for students.

and this:

yesterday my friend Jerry McDonough of the University of Illinois’ Graduate School of Library and Information Science forwarded me a talk that he gave recently at the British Library called, “We Are Not Alone: The Role of the Research Library in a Suddenly Crowded Information Universe.” It contained some slides that made my eyes open very wide. His explanation of the slides is better than I could provide, so I’ve replicated the analyses on my own, uploaded them below, and with his permission, interleaved his narrative.

Brantley wonders if, like newspapers, libraries are in the midst of a gut-wrenching, brake-screeching exercise in redefinition. Well, yeah, I’d say so. I’ll suggest reading the rest of this post and contemplating the graphs that accompany it.

http://radar.oreilly.com/archives/2007/07/if_libraries_ha.html

 


Torx and Torked…

Oh, I am beginning to think it is possible to overcome one’s inner klutz. Today I performed my second successful disk drive transplant on a MacBook Pro. This latest was done to install the quite new Hitachi 7200 rpm 200GB drive (a few weeks ago I replaced another machine’s drive with the new Seagate 7200 rpm 160GB Momentus).

As noted in a recent BareFeats posting (and confirmed by my testing once the install was complete), this Hitachi is one fast drive. Quiet too.

Turns out it’s not all that difficult to replace a drive in a MacBook Pro (Core2Duo) but you need to pay attention to all the different screws (and there are quite a few). I used the screw-by-screw takeapart guide on iFixit.com and had very little trouble. The guide covers the CoreDuo version and there are very slight differences in the Core2Duo machines (it’s actually a bit easier). The one recommendation I’ll make (and it worked out well for me both times) is to tape the screws on a sheet of paper as you remove them from the computer, writing down where they came from. The tape will keep them from flying into the carpet (many are really small) and it’s a big help when the time comes to reassemble.

Grab a copy of SuperDuper or Carbon Copy Cloner to handle data transfer. Either of these utilities offers the ability to clone a drive. CCC is donationware while SuperDuper requires registration to unlock additional features beyond full clones.

Those PBS videos

Today we loaded 498 or so MARC records into our catalog, each describing a particular PBS video and each containing a link to a PHP script which redirects the browser to our QuickTime server for streaming. I wasn’t able to put a direct RTSP:// link in our catalog as our Voyager system doesn’t handle that protocol in the 856 field. So I built a small MySQL database with a couple of fields (one containing the OCLC number and another the corresponding URL to the proper title on our streaming server). The link in the MARC record looks like this:

http://furbo.gmu.edu/streamcatcher/bounce.php?hop=123082827

In this example, 123082827 is the OCLC number of that record. In the database, that number resolves to this link on the streaming server:

rtsp://phobos.gmu.edu:7070/PBS/800/pbs_amx002-3_800k.mp4
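The lookup itself is simple enough to sketch. What follows is not the actual bounce.php (that script is PHP talking to MySQL) but a rough Python equivalent, with an in-memory SQLite database standing in for MySQL. The table name `streams`, the column names, and the function names are all made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create a stand-in for the two-field lookup table (OCLC number -> URL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE streams (oclc TEXT PRIMARY KEY, url TEXT NOT NULL)")
    conn.execute(
        "INSERT INTO streams VALUES (?, ?)",
        ("123082827", "rtsp://phobos.gmu.edu:7070/PBS/800/pbs_amx002-3_800k.mp4"),
    )
    conn.commit()
    return conn


def resolve_hop(conn, hop):
    """Return the streaming URL for the 'hop' parameter, or None if unknown."""
    if not hop.isdigit():
        return None  # OCLC numbers are numeric, so reject anything else
    row = conn.execute("SELECT url FROM streams WHERE oclc = ?", (hop,)).fetchone()
    return row[0] if row else None


conn = build_demo_db()
target = resolve_hop(conn, "123082827")
print(target)
```

A real redirect script would then send the browser to `target` with a Location header, and return a "not found" page when the lookup comes back empty.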

It all works great—if you’re using a Mac. As I found at the end of the day when I asked a colleague to test it with Windows, it doesn’t work for a Windows user. I’m not sure why but I’ll keep tweaking things until it’s working for that platform as well :) I’ll make sure I update a posting when it’s finally working right.

Update (7/9/07): Thanks to a tip from my colleague Andrew Stevens, I now have the links in our Voyager OPAC working on Windows as well—working with the QuickTime browser plugin, anyway. Changing the redirect link from rtsp:// to http:// was enough to fix the problem with Windows-based browsers. As you might well guess (and I should have recalled from my years as a Windows user), Real Player grabs (and refuses to relinquish) control over anything with an RTSP:// protocol in the URL. I’m going to have to do some more work and add some additional intelligence to my redirect code (bounce.php) to get things working for those who want to use the standalone QuickTime client. But at least now it’s working.
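The fix amounts to swapping the scheme on the stored URL before redirecting. A minimal Python sketch of that one step follows (the function name `to_http` is mine, and the real change was made in PHP). It assumes the host, port, and path stay exactly as they were, which is all the fix described above changed.

```python
from urllib.parse import urlsplit


def to_http(url):
    """Rewrite an rtsp:// URL to http://, leaving host, port, and path alone."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "rtsp":
        return url  # anything that is not RTSP passes through untouched
    return parts._replace(scheme="http").geturl()


print(to_http("rtsp://phobos.gmu.edu:7070/PBS/800/pbs_amx002-3_800k.mp4"))
```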
