Say?

Earlier this week I attended the Project Bamboo workshop at Princeton and thought I’d share a discovery I made while trying to complete the pre-workshop reading assignments attendees received:

1. Please read the proposal in its entirety. The proposal can be found at:

http://projectbamboo.org/files/docs/bamboo_proposal.pdf

2. Please read the Identifying Scholarly Practices handout. This handout can be found at:

http://projectbamboo.org/files/3/Identifying_Scholarly_Practices.pdf

All through the week leading up to the workshop I figured I’d surely get around to reading those documents, but I never did. The night before I was to leave, I realized it just wasn’t going to happen. Then I had a thought: why not run the text of these documents through my Mac’s “Text to Speech” service, capture the output and later listen to it as a podcast during the dead time of my 3+ hour drive up to New Jersey?

I launched WireTap Studio (to capture the sound), then opened the proposal PDF in Preview, highlighted the text of the document, selected “Services -> Speech -> Start Speaking Text” under Preview’s application menu and hit the record button. I stopped after 30 seconds and imported the mp3 into iTunes. It sounded terrible: lots of ambient noise and sort of muddy sound quality. Oh yeah, I was just picking up the tinny sound of the MacBook’s speakers with the low-quality built-in mic; no wonder it sounded so bad.

Next I tried to use WireTap Studio to intercept the audio stream (could also do this with Audio Hijack Pro) and found that I couldn’t seem to tap into (and grab) the Speech services audio. It wasn’t associated with the application and no matter what I selected as input, it didn’t get the speech audio. I assume it can be done but I wasn’t having any success. Time to Google…

Doh! Turns out there’s a Unix command baked right into OS X (since 10.3) that not only does exactly what I was trying to do, it does it much faster than the real-time capture I was experimenting with. Meet “say”:

say [-v voice] [-o out.aiff] [-f file]

So, I opened the PDF in Preview, used Command-A to select all the text, pasted it into a text file using BBEdit, chopped out the parts I didn’t care about then saved it to the desktop. Then in a terminal window, issued this command:

say -v Alex -o ~/desktop/bamboo.aiff -f ~/desktop/bamboo.txt
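If you’d rather script this than retype it for every document, here’s a minimal Python sketch that assembles the same command. The function name build_say_command is my own invention for illustration; the only flags used are the -v, -o and -f shown above, and the sketch only builds and prints the command rather than running it.

```python
import os
import shlex


def build_say_command(text_file, audio_file, voice="Alex"):
    """Return the argument list for the macOS `say` command shown above.

    build_say_command is a hypothetical helper; it only uses the
    -v (voice), -o (output file) and -f (input file) flags.
    """
    return [
        "say",
        "-v", voice,
        "-o", os.path.expanduser(audio_file),
        "-f", os.path.expanduser(text_file),
    ]


if __name__ == "__main__":
    cmd = build_say_command("~/desktop/bamboo.txt", "~/desktop/bamboo.aiff")
    # Print the command so it can be inspected before running it.
    print(shlex.join(cmd))
    # On a Mac you could then run it with:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Calling it once per text file (the proposal, the handout, and so on) would give you one audio file per document.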

Alex is the “new and improved” voice in OS X 10.5 (Leopard). He has much better inflection and sounds much more human and much less Cylon. If you really get into this (or need a voice that deals with a language other than US English), you can purchase additional voices from Cepstral (http://www.cepstral.com). The voices are roughly $30 each.

In a little less than 3 minutes of wall time, ‘say’ produced the bamboo.aiff file, which imported easily into iTunes (2 hours, 5 minutes of audio). Here’s a representative sample of how Alex sounded with the material:

…information technologists to collectively tackle the question: How can we enhance arts and humanities research through the development of shared technology services? This proposal represents an 18-month planning and community design program, the Bamboo Planning Project, where through a series of conversations and workshops, we will map out the scholarly practices and common technology challenges across and among disciplines, and discover where a coordinated, cross-disciplinary development effort can best foster academic innovation. Input into the Bamboo process…

Help Wanted, Passwords, Zotero Syncs! and Bamboo

Digital Library Developer

We’ve posted a job advertisement for a Digital Library Developer and I encourage you to apply if you have an interest in building the sort of tools today’s library could really use but tomorrow’s digital library will absolutely require.

You’ll find the full posting (and online application form) at http://jobs.gmu.edu (position number FA730z).

Here’s the heart of the announcement:

George Mason University, University Libraries seeks a Digital Library Developer to join our innovative Digital Programs and Systems division as we build new ways to deliver library content and services.

Duties include: Anticipating and investigating trends in digital library technology so we can respond quickly to new opportunities. Provide primary support for new initiatives in resource discovery, digital preservation, knowledge management, and scholarly communication. This position reports to the Associate University Librarian for Digital Programs and Systems.


SproutCore

It seems there’s some sort of new web development framework released every week or two but the other day I found one that shows a lot of promise: SproutCore.

Odd name but an interesting concept. At the most recent WWDC (Apple’s Worldwide Developers Conference), SproutCore was revealed as the “engine” behind many of the new services on Apple’s .Mac replacement (MobileMe). Many are suggesting the real purpose is an open-source, plugin-free alternative to Adobe’s Flash and Microsoft’s Silverlight. If you’re interested in how something like SproutCore fits in with cloud computing, Google, Flash, and the future, you should read the “Cocoa for Windows + Flash Killer = SproutCore” post on Roughly Drafted from June 14th.

From the SproutCore site:

What is SproutCore?

SproutCore is a framework for building applications in JavaScript with remarkably little amounts of code. It can help you build full “thick” client applications in the web browser that can create and modify data, often completely independent of your web server, communicating with your server via Ajax only when they need to save or load data.

I spent an hour or so working through the “hello world” demo and it’s cool. You do development coding in Ruby with an interactive server process that simplifies the code-test-debug-code cycle. When done, there’s a standalone SproutCore utility that converts everything into static JavaScript and CSS files, ready for deployment under Apache or whatever. Here’s my ‘production’ version of the demo:

hello_world

I tested the look on both Windows (Firefox and IE7) and Mac (Firefox 3) and, for this simple demo at least, rendering was identical across platforms. I think this is going to be a framework to watch. It’s open source, doesn’t rely on plugins, is reasonably platform neutral (I’ve seen implementations on Ubuntu and Windows boxes) and relies on basic internet standards (JavaScript and CSS).


http://www.sproutcore.com


Free Science

I often hear my fellow librarians lamenting the fact that so many students fail to appreciate the wealth of resources available to them. I agree, up to a point, but they’ll have to qualify the scope of their complaint the next time I hear it: every day I see students going to great lengths to use our e-resources. Trouble is, they’re not our students.

Earlier today I decided to do a quick scan of our log files, sorting for multiple logins on the same username from different IP addresses. Wasn’t long before my investigation led me to this page (excerpted below):
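For the curious, that kind of scan can be sketched in a few lines of Python. This is only an illustration, not the script I actually ran: the “ip username” line layout, the sample addresses and the function names are all assumptions, so adjust the parsing to whatever your proxy or authentication logs really look like.

```python
from collections import defaultdict


def parse_line(line):
    """Parse a hypothetical 'ip username ...' log line into (username, ip)."""
    ip, username = line.split()[:2]
    return username, ip


def shared_logins(events, min_ips=2):
    """Return {username: sorted IPs} for users seen from at least min_ips addresses."""
    ips_by_user = defaultdict(set)
    for username, ip in events:
        ips_by_user[username].add(ip)
    return {
        user: sorted(ips)
        for user, ips in ips_by_user.items()
        if len(ips) >= min_ips
    }


if __name__ == "__main__":
    # Made-up sample lines using documentation-only IP ranges.
    sample = [
        "192.0.2.10 jdoe",
        "192.0.2.10 jdoe",
        "198.51.100.7 jdoe",
        "203.0.113.5 asmith",
    ]
    for user, ips in shared_logins(parse_line(line) for line in sample).items():
        print(user, ips)
```

A username that shows up from a handful of addresses is normal (home, office, coffee shop); the ones worth a closer look are those appearing from dozens of unrelated networks.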

See for yourself at: http://www.smso.net/forum/forumdisplay.php?f=317

If you go to this site, you’ll find 12 pages of username/password combinations for gaining illicit access to many, many libraries around the world. Seems the domain (smso.net) is registered to Mutib Al Tamimi on King Fahad Street in Riyadh, Saudi Arabia, and operates under the name Saudi Medical Site Online.

I especially enjoyed this: explaining how to help keep the SMSO Free Science Team working for all “Researchers” (after all, sharing and helping is their hallmark). It explains why the spelling in the first excerpt looks kinda odd (e.g., IOwa State Un1versity):

If you’re in library IT and responsible for authentication issues, I’d recommend a quick visit to the Saudi Medical Site Online, just to make sure you haven’t left a door open. As I write this, the SMSO home page advertises a new login link for a university library every day, so your number will probably come up sooner or later.

DIY up and running

It has taken about a week, off and on, to get all the pieces in place: installation of the scanner, software installation, camera calibration, three days lost while we waited for a replacement lighting housing, RTFM’ing and whatnot. But I finished generating my first e-book with the ATIZ BookDrive DIY scanner late this afternoon (the title is one I picked up in a used bookshop in Boston a few years back).

It took roughly 20 minutes to scan the 240+ page book (the scanning software indicated I was working thru the book at a 632 pages-per-hour clip) and then another 30 minutes or so for post-processing (deskewing the jpg images, applying automatic crops, despeckling the page backgrounds, improving contrast and ultimately producing a PDF). While scanning is a hands-on endeavor, for the most part post-processing runs unattended (after you interactively test and then set processing parameters).

The “optimized” version of the PDF weighed in at 18 megabytes, a major improvement over the “raw” version (500 MB). I suspect I can get that down but will have to experiment with what lower-resolution scans will do (I was working at 300dpi). I still need to work through various post-processing options (e.g., because the pages were faded, the sample could use some brightening, and I think I needed to use a higher aperture for better focus), but I can tell this is going to work well once I figure out the optimal settings and workflow.

Here’s a link to a sample PDF with a few pages from the book. The schematics of older ballparks are kind of interesting.

Sample (800K)

Moving Forward with Backing Up

If you have sysadmin duties in a library like Mason’s (where our core technologies actually reside in the library and not the computer center), backing up is one big part of what you do. Though heretical at the time, we abandoned tape for disk-to-disk backup in the early ’90s, so while it doesn’t take a lot of any one person’s time it’s still something that has to be scripted, tested, monitored and regularly thought about. When it comes to the actual work of getting the backups done, well, it’s pretty much just one machine talking to another at cron-induced intervals.

Briefly

BookDrive

Our ATIZ BookDrive DIY scanner arrived on campus today (hope to have it delivered to our building tomorrow some time). Despite other distractions, we should have it set up and running early next week (still a bit of work to do on preparing the room where this and other digital production tools will be located).

Once operational, I’ll scan an older (pre-1927) book from Mason’s collection and make it available for inspection. If you have a suggestion for a text you’d like to have digitized, leave your idea in the comments. It might take a few weeks to master this machine and the scan-to-ebook workflow but we’ll go as fast as possible.

Gartner

Seems we’ve been trying off-and-on for six months or more to make access to gartner.com available for Mason affiliates. When you consider we typically spend about five (5) minutes setting up authenticated access to other restricted sources, you get some sense of how frustrating this experience has been. I could make some snarky observation about the trouble we’ve had talking technology with an organization that “delivers the technology-related insight necessary for their clients to make the right decisions, every day” but I’ll resist the impulse.

Thankfully, I recently stumbled across an interesting tidbit deep in the release notes for EZproxy 5.0:

15. Enable access to Gartner reports using Gartner’s proprietary encryption method.

I nearly knocked my coffee over getting my mouse on the download link.

Followed the remarkably clear instructions posted on OCLC’s site (actually link-backs to the older usefulutilities.com site), made my necessary local configuration changes, ran my keygen file through a cgi process on the usefulutilities.com site, sent the resulting public key to Gartner and eagerly awaited their response. It came about three hours later: “Go ahead and test it.”

“…Invalid sign-on or password”

Didn’t work. I got back in touch with Chris Zagar, creator of EZproxy. In no time at all he sent me a link to a freshly-coded beta rewrite of EZproxy’s Gartner routines. Took a few back-and-forth tests to get things working (hint: make sure your EZproxy server’s time/date values are close to UTC time) but we now have smooth and simple access to Gartner. The 5.0d release isn’t yet on the OCLC site but if you’re interested in using EZproxy for Gartner, here’s the message you’ll receive when trying to generate the key that Gartner used to require:

This process is no longer used for Gartner authentication. If you are trying to set up Gartner authentication, please contact zagar@usefulutilities.com for assistance.

I find myself bashing OCLC regularly but in this instance I couldn’t be more pleased with the level of support I received. Exactly the same great support from the same person as before the OCLC absorption of Useful Utilities. I’m sure Chris will be a good influence.

If you’re a Mason student, faculty or staff member, this link will take you to Gartner once you’re authenticated:

http://mutex.gmu.edu:2048/gartner

VuFind redux

Had a bit of time last week to bring our local VuFind installation up to date. I noticed that Oracle has finally released an Intel-native version of their OS X InstantClient package. Grabbed a copy but in the amount of time I had to spend, I wasn’t able to get an OS X-based VuFind machine up and running. Switched over to a new Ubuntu 8.04 LTS server install on a Dell PowerEdge SC1430.

Took about a day to get the OS installed and the VuFind infrastructure in place (basically about an hour on 99% of it then 4 or 5 more hours to get a PDO_OCI -> Voyager link up and running). Note to self: stay away from x86_64 when putting together a system that requires more than 3 separate enabling pieces (e.g., Apache, PHP, Oracle, PEAR, PECL, MySQL, LDAP, etc.)…somewhere you’ll hit a 32-bit only version that will break rpm/deb dependencies.

Still have a few things to fix but the new version is fast. I especially like the way clicking on certain authors’ names (for example Alfred Hitchcock or Philip K Dick) takes you to their entry on Wikipedia. Nice touch, don’t you think? Have to fix the problem with the “next” page on that author listing but that should be simple. Next area for serious work is LDAP authentication and the back-end MySQL database that stores tags, favorites, and so on.

I was particularly pleased to see it took only 54 minutes to import and index just over 1.2 million MARC records. With indexing speed like that, an overnight rebuild of the database is certainly something that could be done.
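To put a number on that, here’s a back-of-the-envelope sketch in Python. It rounds the record count to an even 1.2 million and assumes indexing time scales linearly with the number of records, which is a guess on my part rather than anything I measured; the function names are made up for illustration.

```python
def records_per_second(records, minutes):
    """Average indexing throughput for a run of the given length."""
    return records / (minutes * 60)


def estimated_minutes(records, rate_per_second):
    """Estimated run time in minutes, assuming throughput stays constant."""
    return records / rate_per_second / 60


if __name__ == "__main__":
    # 1.2 million records in 54 minutes works out to roughly 370 records/second.
    rate = records_per_second(1_200_000, 54)
    print(f"{rate:.0f} records/second")
    # Under the linear-scaling assumption, doubling the catalog doubles the time.
    print(f"{estimated_minutes(2_400_000, rate):.0f} minutes for 2.4 million records")
```

Even at double the current catalog size, that estimate stays comfortably inside an overnight window.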

http://zoombox.gmu.edu/vufind
