Cabin Fever Raises IQs?

During the recent snowstorms (aka Snowpocalypse), our libraries closed to the public from 3:30pm on a Friday afternoon until 8:00am the next Friday. Almost 7 full days.

Fifteen years ago, that would have meant very little library research until we reopened. Today it’s bad to lose a nice place to study, but the real showstopper is losing the network. That’s not only because we deliver a lot of content electronically but because we also use that same technology to find the paper stuff so carefully arranged within the library’s walls.

Thinking about this, I realized that the unexpected and unannounced closing of our physical libraries for seven days in the midst of a term gave me a nice opportunity to assess digital library usage without the burden of trying to separate out what usage we were seeing from in-library users. Here was an opportunity to see what sort of progress we’re making toward what I assume will be our future–a library where the physical presence of a “place” is far less important.

For a quick apples to apples comparison, I drew together our proxy server statistics for this same period last year and those generated during this year’s snow closing. Since our proxy server handles only off-campus traffic, I could exclude in-library use in both sets of numbers and get a fair comparison with the snowpocalypse stats.

February 5, 2009 – February 12, 2009: 2,692,908 items served (lines in the log)
February 5, 2010 – February 12, 2010: 3,908,821 items served (lines in the log)

Wow. Off-campus use of library e-resources increased 45.15% with the library closed.
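That percentage is just the change in log lines divided by last year's count. A minimal Python sketch of the arithmetic, using the two totals above (the function name is my own, purely for illustration):

```python
def percent_increase(before, after):
    """Return the relative change from before to after, as a percentage."""
    return (after - before) / before * 100


lines_2009 = 2_692_908  # proxy log lines, Feb 5-12, 2009
lines_2010 = 3_908_821  # proxy log lines, Feb 5-12, 2010

print(f"{percent_increase(lines_2009, lines_2010):.2f}% increase")
```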

It would be easy to read too much into those numbers and there are probably a number of explanations for the increase (as an aside, I know at one point I got so bored that I sat through an entire episode of Miss Austen Regrets). Perhaps people just decided to do this term’s reading since they could do little else.

But let’s not lose sight of the fact that thanks to e-content, they could do just that. And who knows, if there hadn’t been so many power failures in the region, I’ll bet this “cabin fever effect” could have been even larger.


Mechanical Turk as Collection Development Tool

Poking around Amazon’s Mechanical Turk today, I found this “HIT” (Human Intelligence Task) available to webworkers.

The author/publisher is offering $4.00 if you request the book from your library (which I guess they hope will trigger a wave of purchases). I don’t know why it surprised me to see that this sort of thing happens…

HIT


Mason Tweets

Earlier today a tweet from Dan Cohen pointed me to an interesting service offered by NC State:

http://twitter.ncsu.edu/

They were nice enough to offer a link to their Zend Framework-based PHP code on the site, so I spent a few minutes today building a Mason tweet aggregator. It still needs a bit of work, and I appreciate the fact that it has that Web 1.0 look that seems to come so effortlessly to me, but it does work and I’ll eventually get around to “styling” it.

http://gmutant.gmu.edu/tweet


OCR, Image/Text PDFs and the Mac

This week I’ve been staring at a collection of just over 29,000 PDFs. Image-only copies of thousands of documents created with “..the software that came with the scanner.”

My task? Figuring out the right tools and workflow to get these PDFs through an OCR process so we can unlock the content and make them more accessible. A number of these documents will end up in our MARS system, so exposing the text to the PDFBox indexing code that ships with DSpace is critical (as an aside, I’ve heard that Xpdf is a really nice replacement for PDFBox but I haven’t had time to tip it into our DSpace install yet).

I don’t have a precise OCR accuracy threshold in mind but assume if we can hit the mid-90% range we’ll find that retrieval doesn’t suffer.

I have seen a 2001 study by a group from Harvard University Library that found that 96.6% of searches will succeed on uncorrected OCR’d text. Also worth a look: Rose Holley’s recent article in D-Lib Magazine (“How Good Can It Get?”). She offers a number of interesting ideas on improving OCR accuracy in a large-scale digitization project. For some reason, it seems that most of the literature on OCR accuracy and retrieval focuses on scientific literature, where it appears to make very little difference. [ article behind pay wall ] [ freely viewable version ]

An ideal workflow would look something like this: fill a directory with image-only PDFs and point some sort of OCR process toward it. The final product would be yet another directory that contains “image-over-text” versions of the original PDFs (wherein the OCR’d text resides ‘inside’ the PDF as an extra ‘layer’ of content).
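To make that workflow concrete, here is a rough Python sketch of the directory-in, directory-out loop. It is not the tool I ended up using: it assumes a command-line OCR program that takes an input PDF and an output PDF as its two arguments. I use the open-source `ocrmypdf` command as a stand-in, and the function names and directory names are hypothetical.

```python
import subprocess
from pathlib import Path


def plan_jobs(src_dir, dst_dir):
    """Pair every PDF in src_dir with an output path of the same name in dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    return [(pdf, dst / pdf.name) for pdf in sorted(src.glob("*.pdf"))]


def run_jobs(jobs, ocr_command=("ocrmypdf",)):
    """Run the OCR tool once per file and return the sources that failed.

    Outputs that already exist are skipped, so an interrupted run over
    thousands of files can simply be restarted.
    """
    failures = []
    for source, target in jobs:
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        # Assumed calling convention: <tool> <input.pdf> <output.pdf>
        result = subprocess.run([*ocr_command, str(source), str(target)])
        if result.returncode != 0:
            failures.append(source)
    return failures


if __name__ == "__main__":
    # Hypothetical directory names; adjust to taste.
    failed = run_jobs(plan_jobs("image_only_pdfs", "image_over_text_pdfs"))
    print(f"{len(failed)} file(s) failed OCR")
```

Keeping the planning step separate from the step that shells out makes it easy to dry-run the file list before committing a machine to 29,000 documents.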

I’m trying out Mac-based solutions first, knowing that if it ends up being a Windows-based workflow we’ll likely use OmniPage (a product we already use with our ATIZ bookscanner).



Javascript speed

I’ve long thought that if you wanted the fastest browser experience on a Mac, you went with the nightly Webkit build from http://nightly.webkit.org/.

So I was surprised today when I happened on the SunSpider JavaScript benchmark site and put several browsers through their paces.

One caveat: this test measures only the core JavaScript engine, not any other browser APIs or features. The results (smaller number is better):

Machine: MacPro (dual 2.8 quad-core); OS 10.6.1

Firefox 3.5.3 1036.8ms (32-bit)
Webkit Nightly (r49008) 434.8ms (64-bit)
Google Chrome (4.0.212.1) 434.4ms (32-bit)
Safari 4.0.3 364.6ms (64-bit)
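To put those timings on a common scale, a small Python sketch can express each result as a multiple of the fastest one. The numbers are the ones listed above; the variable names are just for illustration.

```python
# SunSpider totals from the runs above, in milliseconds (lower is better).
results_ms = {
    "Firefox 3.5.3": 1036.8,
    "Webkit Nightly (r49008)": 434.8,
    "Google Chrome (4.0.212.1)": 434.4,
    "Safari 4.0.3": 364.6,
}

fastest = min(results_ms, key=results_ms.get)

# List browsers from fastest to slowest with their slowdown factor.
for browser, ms in sorted(results_ms.items(), key=lambda item: item[1]):
    ratio = ms / results_ms[fastest]
    print(f"{browser:<28} {ms:7.1f} ms  {ratio:.2f}x")
```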


Fix it till it breaks – into 64 bits

I like to fix things till they break. Today’s post is a cautionary tale for that admittedly small niche of sysadmins running OSX server on XServes upgraded in place from Leopard to Snow Leopard…

For the past few weeks I’ve been tweaking the JSP interface of our MARS system and doing some overdue “authority control” cleanup on subjects and authors. That’s been going so well that late this afternoon I decided to take a crack at updating a few packages originally installed via MacPorts back when the server was running Leopard server (the in-place Snow Leopard upgrade didn’t disturb the code in the /opt/local destination for MacPorts installs).

I pulled down version 3.2 of Apple’s Developer Tools (to ensure 10.6 compatibility) and went to work. In no time at all I had upgraded ant, maven, postgres, bison, wget, openssl and a host of other dependencies. Rebooted and the fun began. First up, Postgres:

FATAL: incorrect checksum in control file

Never saw that before.

Found a web posting on a Linux site explaining that this could easily happen if you tried to open a database with a 64-bit version of postgres when it had been closed by a 32-bit version. Then it hit me. Of course, on an XServe, Snow Leopard server defaults to 64-bit builds. Under Leopard, I had built a 32-bit version of Postgres.

Recommended solution from the Linux posting: forget about opening it with the 64-bit build. The only solution is to open the database under a 32-bit version of Postgres and then dump the data, reimporting it into a new database created by a 64-bit version.
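For the record, the dump-and-reimport route would look roughly like the sequence below. This is a sketch of the standard PostgreSQL tools (`pg_ctl`, `pg_dumpall`, `initdb`, `psql`), not something I actually ran here. Every path is a hypothetical placeholder, and the Python only prints the commands so they can be reviewed before anyone types them.

```python
from pathlib import Path


def migration_commands(old_bin, old_data, new_bin, new_data, dump_file):
    """Build the command sequence for moving a cluster from a 32-bit
    Postgres build (old_bin) to a 64-bit one (new_bin).

    All five arguments are placeholders supplied by the caller.
    """
    old_bin, new_bin = Path(old_bin), Path(new_bin)
    return [
        # 1. Start the old cluster with the build that can still read it.
        [str(old_bin / "pg_ctl"), "start", "-D", str(old_data), "-w"],
        # 2. Dump every database, plus roles, to one SQL script.
        [str(old_bin / "pg_dumpall"), "-f", str(dump_file)],
        # 3. Stop the old server so the new one can take over the port.
        [str(old_bin / "pg_ctl"), "stop", "-D", str(old_data)],
        # 4. Create a fresh cluster with the new build.
        [str(new_bin / "initdb"), "-D", str(new_data)],
        # 5. Start the new server and wait until it is ready.
        [str(new_bin / "pg_ctl"), "start", "-D", str(new_data), "-w"],
        # 6. Replay the dump into the new cluster.
        [str(new_bin / "psql"), "-d", "postgres", "-f", str(dump_file)],
    ]


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    for command in migration_commands(
        "/path/to/32bit/bin", "/path/to/old/data",
        "/path/to/64bit/bin", "/path/to/new/data",
        "cluster_dump.sql",
    ):
        print(" ".join(command))
```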

I backed out my 64-bit upgrades, then manually uncommented the “build_arch i386” line in macports.conf to force 32-bit builds, then started rebuilding 32-bit versions of all the code. That fixed most everything but not Postgres. I still had at least one load library mismatch crashing that compilation.

As a last ditch effort, I tarred up the entire /opt/local tree and did a nearly full replacement from a sparse image clone of the machine’s boot drive that I made with SuperDuper just before doing the Snow Leopard upgrade (meaning all that code was 32-bit). I didn’t disturb /opt/local/var/db (that’s where my postgres database lived) but deleted and then restored these three directories from the sparse image backup:

  • /opt/local/lib
  • /opt/local/bin
  • /opt/local/share

Rebooted…success!

To enable use of the “port” command on this box, I then reinstalled the Snow Leopard version of MacPorts (restoring selected parts of /opt/local from the backup had broken the port command). That went smoothly and “port” now works.

My takeaway: Say ‘no’ to that little voice in your head that suggests you should “improve” a system that’s running well…and don’t ever say anything bad about sparse image backups.


iPhone / iTouch / Android enabled

Got an iPhone 3GS the other day (clearly I’m behind the mobile curve but then I hate talking on the phone so it took me a while to “get it”).

Anyway, after just a few days with the thing, I realize I need to begin tweaking some of the library’s web-based content.

First (easy) step? A touch-device friendly theme for this weblog. I also applied it to the library’s news blog. Thus far it’s working well and it couldn’t be any easier to implement: just drop the code in your plugins folder and activate it. The presence of a mobile device is detected automatically and the touch theme is served when appropriate.

http://www.bravenewcode.com/wptouch/

tip: To do a screen capture on your 3G/3GS iPhone, press down the home key then hit the “top” button. Screen flashes and image goes into your photos folder.

