History MetaFinder

It’s not ‘feature complete’ but our new Mason Metafinder is up and running. The idea is to build small federated search engines for our various research portals. My testbed has been development of a search engine for an Early American history portal. Right now, it covers these seven sources:

  • Historical Abstracts
  • American Memory Project
  • Arts & Humanities Search
  • JSTOR
  • America: History & Life
  • OAIster
  • WorldCat

As of June 28th, we’ve added:

  • Early American Imprints (1801-1819)
  • Google Scholar
  • Library of Virginia
  • National Archives web site

We’ll likely add a few more sources as we go forward but for now there’s at least enough content to begin to test the system and start to understand (and appreciate) both the power and limitations of federated searching.

A couple of quick points:

  • Google is a “just in case” sort of search product. Content is collected and indexed by Google just in case you ask. By contrast, Metafinder is a “just in time” sort of thing. When you launch a search, targets are searched in real time and results flow back at an unpredictable pace. For that reason, you’ll see Metafinder pop up a “do you want to add these results to your set” message from time to time during a search session.
  • Metafinder isn’t doing an exhaustive search. Faster responders quickly reach our threshold (roughly 100 citations). Slower systems (hi there, OAIster) give us only 10 results before the clock runs out. To cope with this, Metafinder offers a “Collection Status” button on the results page. Click it and you’ll see how many matches Metafinder got from each source, and how many more that source reported it might eventually deliver. Where there’s a great discrepancy, you need to go to the native interface for that source to do a more deliberate search.
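For the curious, here is a minimal Python sketch of the “just in time” pattern described in the two points above: every target is queried at once, each is capped at a threshold, and whatever has not arrived when the clock runs out is left behind and reported as a gap. It is not Deep Web Technologies’ code. The names (`SourceStatus`, `pull`, `metasearch`), the 0.3-second deadline, and the simulated sources are all assumptions made up for illustration.

```python
import asyncio
from dataclasses import dataclass, field

THRESHOLD = 100   # per-source cap ("roughly 100 citations" in the post)
DEADLINE = 0.3    # seconds before the clock runs out; illustrative value only


@dataclass
class SourceStatus:
    """What a 'Collection Status' view might show for one source (hypothetical)."""
    name: str
    reported_total: int                       # matches the source says it holds
    collected: list = field(default_factory=list)

    @property
    def not_retrieved(self):
        # How many more hits the source reported than were actually collected.
        return self.reported_total - len(self.collected)


async def pull(status, delay, threshold):
    """Simulated target: delivers one hit every `delay` seconds up to the cap."""
    for i in range(status.reported_total):
        await asyncio.sleep(delay)
        status.collected.append(f"{status.name} #{i + 1}")
        if len(status.collected) >= threshold:
            break


async def metasearch(sources, threshold=THRESHOLD, deadline=DEADLINE):
    """Query all sources concurrently; stop waiting once the deadline passes."""
    statuses = [SourceStatus(name, total) for name, total, _ in sources]
    tasks = [
        asyncio.create_task(pull(status, delay, threshold))
        for status, (_, _, delay) in zip(statuses, sources)
    ]
    _, pending = await asyncio.wait(tasks, timeout=deadline)
    for task in pending:          # slow responders are cut off here
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)
    return statuses


if __name__ == "__main__":
    # (name, reported total, seconds per hit): invented numbers
    demo = [("fast source", 500, 0.0), ("slow source", 400, 0.05)]
    for s in asyncio.run(metasearch(demo)):
        print(f"{s.name}: got {len(s.collected)}, {s.not_retrieved} more reported")
```

In this toy run the fast source hits the cap while the slow one returns only a handful of results before the deadline, which is the sort of discrepancy that should send you to the native interface.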

We’re working with Deep Web Technologies on this project and I’ll just note here that they’ve been very good sports as I send them notes asking for a change here and there in their already excellent search system. They’re the code behind sites like science.gov and biznar.

You can try a search in the text box below. Once you retrieve a results page, the “Home” link will take you to the ‘normal’ launch page for this particular instance of Metafinder: the Colonial History research portal (also under construction).
 

Search Metafinder (History)

Did they used to have pencil labs?

We’re in the process of planning a new library for George Mason University, adding, if I remember correctly, roughly 150,000 square feet to our existing main library building. At present, it seems the ribbon cutting will occur sometime in 2014.

As part of our planning process, we’re all thinking about where Mason’s library should be in five years (opening day) and how the building plan we’re developing would cope with the subsequent thirty years. All pretty standard stuff in this context.

So everyone pretty much comes to the same conclusions, right?

Amplify API

Heard about the OpenAmplify API yesterday and, using some PHP code from their site, fashioned a little test service to see the API in action. But first, what’s Amplify?

Here’s a quote from their site:

Using patented Natural Language Processing technology, Amplify reads and understands every word used in text. It identifies the significant topics, brands, people, perspectives, emotions, actions and timescales and presents the findings in an actionable XML structure.

To use the tester, send it a URL to process:

http://gmutant.gmu.edu/z/amp.php?url=http://YourURLHere

Here’s a link that “amplifies” our library’s home page:

http://gmutant.gmu.edu/z/amp.php?url=http://library.gmu.edu

How might you use the API? Well, right now there’s not a lot of utility available to the casual user. You register for an API key and then you’re allowed 1,000 queries per day, but each may analyze no more than 2,500 characters. I got this a few minutes ago, which is encouraging:

[Image: tweetapi]

So, for now, forget sending your novel in for a quick second read. But still, you can do a few things with the limited API.

For example, looking at the report I got back on our library’s home page, I realized OpenAmplify agrees with me–we don’t use that many “action” terms on the page (a common failing of library websites where the focus often lapses into concern with how we’re organized, not what users might want to do on the site). Worried that we’ve pitched the page to a high-school education level? No, not really…there’s not enough text on that sort of link-heavy page to reach a reliable conclusion.
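To put numbers on the limits quoted earlier (1,000 queries per day, 2,500 characters per query), here is a rough Python sketch that splits a longer text into pieces that fit and counts the queries they would use. The function names are made up and nothing here calls the OpenAmplify API. Keep in mind that each piece would be analyzed on its own, so this is no substitute for a single read of the whole text.

```python
import math
import textwrap

MAX_CHARS = 2500       # per-query character limit quoted in the post
DAILY_QUERIES = 1000   # per-day query allowance quoted in the post


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text on whitespace into pieces of at most max_chars characters."""
    # textwrap.wrap turns newlines into spaces and never exceeds `width`.
    return textwrap.wrap(text, width=max_chars)


def query_budget(text, max_chars=MAX_CHARS, daily=DAILY_QUERIES):
    """Return (queries needed, days needed at the daily allowance)."""
    queries = len(chunk_text(text, max_chars))
    return queries, math.ceil(queries / daily)


if __name__ == "__main__":
    sample = "word " * 2000          # about 10,000 characters of filler
    print(query_budget(sample))
```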

I saw an interesting OpenAmplify -> QueryPath mashup done by M. Butcher yesterday which shows a much more interesting application of this technology. Here’s a video demonstration / explanation:

http://www.youtube.com/watch?v=GBBKPIva1tM

HathiTrust unveils “beta” catalog

New “beta” catalog to HathiTrust Digital Library.

http://catalog.hathitrust.org/

OA begins at home…

I spent the better part of Tuesday working on a metasearch service that we’re adding to our award-winning research portals (just found out we’ve won an innovation award from a national trade publication–but details are embargoed until August). Anyway, I made all sorts of interesting discoveries about federated searching, but the one I was still thinking about on the drive home dealt with open access and the mixed message I think libraries are probably sending.

Huh? A little background…

I’m working with Deep Web Technologies to build narrowly-focused metasearch engines for our various research portals (if you want to take a peek, you can go to my “testportal” and try a search of the ‘history’ engine we’re building–but no comments, it’s still quite a preliminary piece of unfinished business). This week’s problem has been figuring out an efficient and infinitely scalable way to deal with content that needs to be proxied (the library world’s version of DRM).

As I was banging in searches and exploring result sets, I kept hitting journal articles that I assumed were available in e-journal form despite the fact that outbound SFX links suggested otherwise.

[Image: SFX befuddled]

Testing a history collection, my mind eventually drifted to thoughts of Roy Rosenzweig. I entered his name in the search box and among the hits that began flowing back, I noticed an interview from 2000 that appeared in the journal Left History.

America: History & Life (the source of the link) supports OpenURL, so I clicked our SFX link and ultimately received the message “Sorry, no match for ISSN# 1192-1927.” By this time I had decided I really wanted to read the interview with Roy, so I jumped over to Google and in less time than it takes to finish this sentence I had the full text:

https://pi.library.yorku.ca/ojs/index.php/lh/article/view/5412/4607

Damn. Not only was it available online but it was hosted on an OJS system. Thinking I’d stumbled onto an oversight, I dashed off an email to our e-resources group, asking that this journal be included in our SerialsSolutions e-journal database (which serves as the datastore behind our e-journal finder). I went back to work.

Ten minutes later I hit another one…found the text on the web in a couple of minutes but our SFX system was clueless. I again notified our e-resources group. Just as I was beginning to suspect this might not be a needle-in-haystack situation after all, I got this email from our collection development group:

“You’ve raised an interesting issue that we’re still thinking about…what freely-available e-journals to include in our systems.”

That’s when I realized that, for all the Web 2.x buzz, maybe things haven’t changed all that much. We might call them discovery tools, but clearly many of us still think of them as public interfaces to our inventory control systems. Somehow, I think you approach it differently if you’re trying to solve the problem of connecting researchers with information no matter where it resides (and no matter who’s paying for it).

I came away from today’s experience feeling that I’d probably uncovered one of the fault lines between yesterday’s “master of inventory” orientation and the place where we really need to be. Will keep an eye on it and will try to get involved in the discussions I hope our collection development group launches soon.

Virginia’s OA Draft Resolution

“…NOW THEREFORE the Faculty Senate of the University of Virginia hereby adopts
and endorses the following policy to govern copyrights in scholarly articles authored
by the faculty and respectfully asks the Provost to implement this grant of
copyrights and to develop an Open Access Program for the University of Virginia as
provided below…”

PDF version of March 24, 2009 Memorandum on Scholarly Publications and Author’s Rights.

Resolution was presented to Senate on April 9th.

A “we’re doing what Harvard did” approach to be sure, but encouraging to see the tide rising this close to home…

Dead Souls of the Google Booksearch Settlement

Recommended reading:

Legally Speaking: The Dead Souls of the Google Booksearch Settlement

This piece will appear in the July 2009 issue of Communications of the ACM. Readers may also be interested in the slides from Pam’s recent presentation, ‘Reflections on the Google Book Search Settlement.’
