Ever since I saw Jeffrey Zeldman’s comment on tag clouds (“Tag Clouds Are The New Mullets“), I’ve been interested in getting one. A tag cloud? No a mullet (just kidding). What I’ve been interested in is the way that tag clouds help you visualize the relative importance of particular content at a site. A list of subjects (tags) ranked by the number of times they occur would give the same information but like a regular haircut, not much visual impact.
With that bit of background, the other day I decided to build a tag cloud that captured and visually ranked the subject headings used in our catalog (710,000+ unique subjects). I realized that this wouldn’t be as informative as a years-consuming conspectus of the collection but what the heck, it would be visually compelling.
Step one involved running an SQL+ query against our Voyager system to produce a list of all the subject headings in our catalog and the number of bibliographic records attached to each of them. Here’s the SQL piece if you’re interested (it’s Voyager-specific and includes sub-headings):
That created a text file listing the subjects and the count, separated by a ‘~’ delimiter. I wrote a short PERL program to take that list and create a ready-to-import file for MySQL, building a new database that I could manipulate via PHP.
Two more commands…
# mysqladmin create subjex
# mysql < subjex.sql
…and I had a database. To make sure it was working right, I tried a few simple queries. In the course of this “research” I made what I think must be an important discovery:
OK, so I see we have quite a few Electronic Books but is that really a subject? Or how about our second most popular term Feature Films? Would a book on feature films that got made into a movie be filed under Feature Filmsâ€”Feature Films? I love recursion but there’s a time and place for it.
I think I’m going to have to do a bit more work before I move to the final piece of my OPAC tag cloud…or maybe I’ll just finish the project and see how it looks. All that’s left is a PHP script that builds the webpage of alphabetized subject headings using a CSS stylesheet and the frequency data to scale the font up or down. I’m not yet sure how I’ll develop the necessary AI to toss font-scaling skews like Electronic Books but now that Web 2.0 has made everyone comfortable with a perpetual beta, I guess I don’t have to worry about that right away…