Archive for August, 2006

back to work

Took a week off, visiting Pawley’s Island and Myrtle Beach in South Carolina and Sunset Beach in North Carolina. Spent most of my time at Sea Trail Pawleys (too bad I don’t play golf, but I did enjoy chasing the alligators). This stub of a photo comes from Pawley’s Island (sort of the antimatter Myrtle Beach). Came back last Sunday and hope this week to catch up on a few things that were left hanging as I drove away.

A final note on our Solaris 10 upgrade

Today, two full weeks after completing the Solaris 10 upgrade on our Voyager server, I received a document from Endeavor: “Upgrading from Solaris 8/9 to Solaris 10.” It would have been quite helpful about three weeks ago, but reading through it was comforting: I seem to have eventually thought of about 99% of their suggested procedures. I wasted a lot of time figuring it out on my own, to be sure, but I paid more attention to the process than I might have had I just followed a checklist.

The one thing I completely missed was making a change in /etc/security/policy.conf to change the default ‘crypt’ routine from __unix__ to md5 (before creating users). Is that a big deal? I decided not. With few users and really fine-grained /etc/hosts.allow controls, I’m willing to take my chances with the Solaris default (DES) crypt routines.

As I thought about it, I realized that if a malicious person gets access to the server’s /etc/shadow and begins brute-force attacks on my password file, the fact that I chose a relatively weak encryption routine won’t be one of my larger problems.
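If you want to see which default a box is actually using, a few lines of Python will do. This is only a rough sketch: it assumes the setting sits on a CRYPT_DEFAULT= line (which is how I understand the Solaris 10 policy.conf to be laid out), and the sample text is made up for illustration rather than copied from our server.

```python
def crypt_default(policy_text):
    """Return the CRYPT_DEFAULT value from policy.conf-style text, or None."""
    for raw in policy_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "CRYPT_DEFAULT":
            return value.strip()
    return None


# Illustrative sample only, not taken from a real server.
sample = """\
# password hashing settings
CRYPT_ALGORITHMS_ALLOW=1,2a,md5
CRYPT_DEFAULT=__unix__
"""

print(crypt_default(sample))
```

On a live system you would pass in the contents of /etc/security/policy.conf instead of the sample string.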

Speaking of tips not taken, I also have one to add if the document gets revised—a fix I had to make in /etc/hosts.allow to get sendmail working (to mail OPAC searches and the like). I found I needed to add a line like this:

sendmail: localhost

since with Solaris 10, the Sun-supplied sendmail is tcpd-aware. Oh, and don’t be tempted to express that need for local mailing as:

sendmail: 127.0.0.1

For reasons I haven’t figured out, this version of tcpd doesn’t like the IP alias for localhost.
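A quick way to confirm the rule is in place is to list which clients a daemon is allowed in hosts.allow. The sketch below is deliberately naive: it only understands simple "daemon: clients" lines, ignores wildcards such as ALL, EXCEPT clauses and continuation lines, and makes no attempt to reproduce how tcpd really matches a client. The function name and the sample text are my own inventions.

```python
def allowed_clients(hosts_allow_text, daemon):
    """Collect the client patterns listed for `daemon` in hosts.allow-style text."""
    clients = []
    for raw in hosts_allow_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not "daemons: clients".
        if not line or line.startswith("#") or ":" not in line:
            continue
        daemons, rest = line.split(":", 1)
        # Drop any trailing ": options" field.
        client_field = rest.split(":", 1)[0]
        if daemon in daemons.replace(",", " ").split():
            clients.extend(client_field.replace(",", " ").split())
    return clients


# Illustrative sample only.
sample_hosts_allow = """\
# local mail from the OPAC
sendmail: localhost
"""

clients = allowed_clients(sample_hosts_allow, "sendmail")
if "localhost" not in clients:
    print("warning: no 'sendmail: localhost' rule found")
else:
    print("sendmail allows:", clients)
```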

Smokin…

I have two batteries for this 15″ PowerBook—both are on the Apple recall list. I’m living dangerously until the replacements arrive.


Work Log

It’s been a busy few days around here…

Solaris 10

On Sunday I did a clean install of Solaris 10 on our SunFire V880 (host to our Voyager system)…then rebuilt a RAID+1 array under Sun Management Console (replacing the older DiskSuite array that was home to Voyager before the O/S upgrade). It took about six hours, much of that time spent making a few backups across an NFS link to our XServe RAID array, just in case something went very wrong. I also pulled the Solaris 8 boot drive and replaced it with a new disk before installing Solaris 10 (not only to “reset” the MTBF clock on the machine’s boot drive but also to give me a plan “B” in case the install went sour). About the only problem I encountered was an “unreadable media” error when I put in disc 3 of 5. I went to sun.com, pulled down and burned a new ISO image of disc 3, and continued. The rest of the process went smoothly.

Made one interesting discovery which may or may not be news to more seasoned Solaris administrators. When I brought the machine up on Solaris 10 for the first time, I defined the “new” RAID+1 array under SMC using exactly the same disks and slices as my old DiskSuite array under Solaris 8. Once defined, I decided to try and mount the new metadevice (without building a new filesystem on it). Didn’t know if it would work and was prepared to do a “newfs” and copy my old data back. Didn’t have to do a thing—the new array mounted without a hitch and there was my “old” Voyager partition again.

Moments after I mounted the array, I noticed metastat reporting on the “resyncing” progress. I thought that was odd—why would an array need to resync if nothing had been done beyond mounting? Digging deeper, I discovered I had made a typo when defining the submirror—one of the 15GB slices I included was a hot spare in the old submirror and the original slice was now in the hot spare pool. Had I made that mistake with the primary side of the mirror I would have botched the “immaculate” recovery of data. Since it was the submirror, it just started resyncing itself to the master. Once again the computer was a lot smarter than the operator…

Today an engineer from Endeavor is doing an upgrade of the Voyager software on the server and, from conversations we’ve had, things are going well. The system should come back online sometime Tuesday. Once we’re sure the upgrade was successful, we’ll spend a little time in client-server hell: updating PC-based clients across most of the staff computers in the library.

Update August 15 (Tuesday). Upgrade completed successfully and the web interface to our OPAC came up on the new server…with redirects from its previous home functioning as I’d hoped. -wg

LDAPtation

Some time back I posted a note about EZProxy and the problems we’ve been having with people who publish information on how to log into systems they’re not authorized to use. As an example, here’s a site in Shanghai that’s working hard to help their readers get cost-free access to information Mason and other universities are paying for:

http://ipopf.info/wp/?cat=23

Actually, it’s not just about getting access to restricted information. The fellow(s?) operating the site at ipopf are actually requiring users to send them a PayPal donation before gaining access to most of their site…so I guess they’re an aggregator of sorts. Here’s their forum:

http://forum.ipopf.info/

To combat this sort of thing, today we changed the way EZProxy does authentication, tying it into the new LDAP server on campus. It’s not yet a perfect solution (the LDAP directory doesn’t contain enough information on each person to support really fine-grained access control) but it’s a start in the right direction. My hope is that by deploying a popular enterprise-wide application (our proxy server) that depends on the campus LDAP server, we’ll help put a bit of pressure on those offices responsible for fleshing out LDAP support.

I’m not trying to imply that building a robust LDAP service is an easy job (a university community turns out to be a tangled web of often poorly documented relationships) but if we don’t show a real need for someone doing the work, it will be tempting to keep putting it off.

MARS

And finally, today Dorothea upgraded our DSpace installation to the latest software release (1.4). There have been a few bumps but nothing more than you’d expect from an open source Java product with a relatively small developer base, and one that we’ve chosen to run on a platform that very few sites appear to use (Mac OS X Server).

And it is only Monday…


AOL releases words too…

Wouldn’t ordinarily blog this but since I mentioned Google’s release of textual data the other day, it seemed fitting to mention this odd “contribution” as well.

Guess AOL thought researchers might benefit from access to three months of search history information from approximately 650,000 anonymous AOL users.

Anonymous?

Well, there was some effort made to mask the identity of users (replacing user names with random digits), but it didn’t go nearly far enough. Each search by a particular user was given the same “random” identity mask making it trivial to track a particular user’s queries. They also neglected to scrub the actual query text so social security numbers, names, addresses, phone numbers, and so on all appear.
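To see why a consistent mask is so weak, consider how little code it takes to rebuild one person’s search history once every query carries the same ID. This is a sketch with made-up rows; the IDs, the queries, and the two-column layout are all assumptions for illustration and are not drawn from the AOL file.

```python
from collections import defaultdict


def queries_by_user(log_rows):
    """Group (user_id, query) pairs into one query list per user_id."""
    profiles = defaultdict(list)
    for user_id, query in log_rows:
        profiles[user_id].append(query)
    return dict(profiles)


# Made-up rows: the "random" ID hides the account name,
# but it is the same ID on every search by that person.
rows = [
    (1001, "weather in springfield"),
    (2002, "cheap flights"),
    (1001, "dentist near elm street"),
    (1001, "jane q public"),
]

profiles = queries_by_user(rows)
print(profiles[1001])
```

One pass over the log and user 1001 has a town, a street and a likely name attached. A fresh random ID per query would have prevented this kind of linking, although it would not have fixed the unscrubbed query text.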

AOL scrambled after an uproar ensued yesterday and their link to the data has been pulled (the original page is still there). Of course, given the way the rest of the internet works, it’s a little late.

At the bottom of the README that accompanies the data, you find this:

Please reference the following publication when using this collection:

G. Pass, A. Chowdhury, C. Torgeson, “A Picture of Search” The First
International Conference on Scalable Information Systems, Hong Kong, June,
2006.

Copyright (2006) AOL

I found a copy on the net but it’s not clear that the authors are responsible for this lapse—of course, Pass and Chowdhury do list AOL as their affiliation.

Moral: It’s probably never a good idea to search your own social security number—and don’t make assumptions when you read an ISP’s privacy policy.


1 Trillion words coming from Google

Google will soon be releasing a dataset of over 1 trillion words, a move they suggest might help advance various text-processing arts (e.g., machine translation, speech recognition, spelling correction, entity detection, etc.). Here’s a blurb from their research blog:

We believe that the entire research community can benefit from access to such massive amounts of data. It will advance the state of the art, it will focus research in the promising direction of large-scale, data-driven approaches, and it will allow all research groups, no matter how large or small their computing resources, to play together. That’s why we decided to share this enormous dataset with everyone. We processed 1,011,582,453,213 words of running text and are publishing the counts for all 1,146,580,664 five-word sequences that appear at least 40 times. There are 13,653,070 unique words, after discarding words that appear less than 200 times.

Watch for an announcement at the LDC, who will be distributing it soon, and then order your set of 6 DVDs.

LDC is the Linguistic Data Consortium at the University of Pennsylvania.
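For anyone wondering what “counts for five-word sequences that appear at least 40 times” means in practice, here is a toy version of the idea. It is my own sketch, not Google’s method: their pipeline is not described beyond the blurb, and the function name, the sample sentence and the small thresholds are chosen only so the example fits on a screen.

```python
from collections import Counter


def ngram_counts(words, n=5, min_count=40):
    """Count every run of n consecutive words; keep runs seen at least min_count times."""
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}


# Toy input: two-word sequences with a threshold of 3 instead of
# five-word sequences with a threshold of 40.
text = "the cat sat on the mat " * 3
counts = ngram_counts(text.split(), n=2, min_count=3)

for gram, count in sorted(counts.items()):
    print(" ".join(gram), count)
```

The pair “mat the” only occurs twice in the toy text, so it falls below the threshold and is dropped, which is the same kind of cutoff the blurb describes.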


Digital flotsam

It’s been a busy week or so since the last post and today’s effort is probably nothing more than a response to some measure of guilt over not having written much lately. At any rate, a couple of items are floating around, one trivial and one quite amazing. Let’s begin with the trivial (at least if everything works out):

Voyager upgrade

Work continues in preparation for our Voyager upgrade that begins with a fresh install of Solaris 10 on Sunday, August 13th. Once that starts, our “production” system will be unavailable (for a while it won’t exist at all), but I’ll have a ‘read-only’ clone of our OPAC up and running during the entire process. When the upgrade completes (sometime late Tuesday, August 15), we’ll turn off the “snapshot” and go live with the upgraded system. The tricky part is moving Oracle and the database and Voyager software to a backup machine and modifying the necessary configuration files to bring up the “cloned” version. The process isn’t documented anywhere but after reading and reacting to several hundred error messages I finally managed to figure it out. Yes, I’m sure that if I had a deeper understanding of Oracle this reverse engineering wouldn’t seem like a big deal…

Somewhat out of character, I made notes on how it’s done but I don’t think I’ll post them here—that would probably annoy Endeavor (they offered to do it for us for $1K or so a few years back and for all I know they still derive some revenue from this sort of thing). If you’re curious, drop me a line.

I will point out that running Apache and Voyager’s web interface on a separate box (not your main Voyager system) is critical. We’ve always done it that way here at Mason, a decision I originally made to try to improve performance (why have the web interface steal cycles from the staff clients doing circulation, cataloging and so on?). Done this way, you can point the web interface at either your “production” system or a “snapshot clone” by modifying a single line in the web OPAC configuration file (basically just replace the production server’s IP with that of the clone). Users still follow the same URL to reach the OPAC so disruption is minimal.
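The one-line swap is easy to script if you would rather not hand-edit the file at 2am. What follows is a generic sketch and not Voyager’s actual configuration format: the key name voyager_host is hypothetical, and both addresses come from the range reserved for documentation.

```python
import re


def repoint(config_text, old_ip, new_ip):
    """Replace whole occurrences of old_ip with new_ip; return (new_text, replacements)."""
    # The digit guards stop 192.0.2.10 from matching inside 192.0.2.100.
    pattern = r"(?<!\d)" + re.escape(old_ip) + r"(?!\d)"
    return re.subn(pattern, lambda match: new_ip, config_text)


# Hypothetical config contents, for illustration only.
config = "voyager_host=192.0.2.10\nlog_host=192.0.2.100\n"

new_config, changed = repoint(config, "192.0.2.10", "192.0.2.20")
print(changed)
print(new_config, end="")
```

Checking the replacement count before writing the file back is a cheap sanity test: if it is not exactly 1, something other than the single expected line matched (or nothing did).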

Scholar for Firefox

I’ve been involved at the edges of a very cool project that’s about to enter a public beta in mid-August: Scholar for Firefox. Dan Cohen and a group of other very smart people at Mason’s Center for History and New Media are building, well, let me just lift a paragraph from Dan’s Digital Humanities blog to explain what they’re up to:

“…For those who are hearing about this for the first time, Scholar is a citation manager and note-taking application (like EndNote) that integrates right into the Firefox web browser. Since it lives in the browser, it has some very helpful (and, we think, innovative) features, such as the ability to sense when you are viewing the record for a book (on your library’s website or at Amazon or elsewhere) and to offer to save the full citation information to your personal library of references (unlike del.icio.us or other bookmarking tools, it actually grabs the author, title, and copyright information, not just the URL). Scholar will have “smart folder” and “smart search” technology and other user interface capabilities that are reminiscent of iTunes and other modern software. And we hope to unveil some collaborative features soon as well (such as the ability to share and collaborate on bibliographies and notes, find new books and articles that might be of interest to you based on what you’ve already saved to your library, etc.).”

The graphic I’ve posted here is a screenshot from an alpha release. Probably too small to see but this is just after I asked Scholar to include this item from Mason’s online catalog in my personal library—it correctly parses out the important bibliographic information and stores it in my local database. Very smooth. In earlier versions the “database” window was at the top of the browser (just under the bookmark/tab toolbar) but it’s now moved to the bottom. I think I like it down there better. In any case, it can be quickly collapsed when not in use and remains hidden until you summon it (by clicking the little “book” that appears after a URL in the location bar).

I’ll post a link to the public beta when it becomes available (August 15th last I heard).

http://chnm.gmu.edu
