Posted on Wednesday 28 October 2009
9/24/09 – links to presentations to come
Sponsored by the Sun Microsystems (soon to be Oracle) Preservation and Archives Special Interest Group (PASIG – http://www.sun-pasig.ning.com/)
Sun PASIG Focus and Global Trends – Art Pasquinelli, Sun Microsystems
- Need an architectural view (systems) of our campus
- Issues
- IT and economic
- Disaster recovery and business continuity
- Power cost, new technology, scalability
- Don’t know when data will become valuable and needed
- Project Wonderland – 100% Java and open source toolkit for creating collaborative 3D virtual worlds. Immersive virtual world like Second Life, but much more secure.
- “All About Repositories” Webinar Series (http://www.education-webevents.com/) in Fall 2009 designed to provide overviews of best practices, technology updates, and key trend analyses for academic resources directors, IT managers, digital librarians, repository managers and developers, and curators.
Sun in Education – Brian Perkins and Lisa Hosay, Sun Microsystems
- Read online articles and participate in online forums – http://sun.com/innercircle
- Sun Academic Excellence Grants – can get equipment for free; just need to apply (http://www.sun.com/solutions/landing/industry/education/aeg.xml)
- Academic Initiative – free online curriculum for certification (http://www.sun.com/solutions/landing/industry/education/sai/index.jsp)
- EduSoft – Software programs for education (http://www.sun.com/solutions/landing/industry/education/edusoftware.xml)
Data Curation – Sayeed Choudhury, Johns Hopkins University
- Principal Investigator on NSF DataNet Grant for Data Conservancy Project (http://www.cni.org/tfms/2009a.spring/CNI_Choudhury.pdf)
- Library becoming the center for data sets and for conversations
- You need to actually move content in and out of a system to understand both the content and the system
- What matters for the library in accepting data sets is that a community can’t handle the size of the set (opportunity for libraries)
- DuraCloud – intermediary between cloud storage companies and libraries (they negotiate what I need with cloud companies)
- Faculty point of view is that releasing data is publication – re-invention of tenure process and requirements needed
- Really large data sets are not appropriate to an institutional repository (IR) but some smaller ones quite appropriate
- Need data but also services to run against the data [in an IR]
- Digital Research and Curation Center at JHU – embedded R&D group in JHU Library
- Conversation needed on what is the nature of collections in this day and age
Institutional Updates
Madeline Law – UMD Law Library
- Digitizing the alumni magazine
- Copyright clearance is an issue for items to deposit in the IR
- Staff/time to do the work is an issue
Steven Mandeville-Gamble – GWU Gelman Library
- Cultivating donors for money to digitize the collections they are donating
- Have received over $1 million through donations for digitizing collections
- Have also received $700,000 for a digitization lab and events space
- Will be digitizing a collection of Middle Eastern books (received an IMLS grant)
- Partnering with GU to digitize their collection of books in this area including books from the Qatar campus
- Plan to expand the project into Middle East and Africa
- Will be digitizing the Oliver (classical archeology/anthropology collection) and the Kiev (Judaica) collections
Wally Grotophorst (channeled by Cynthia Holt) – GMU
- Our IR is called MARS
- Challenges: Data asset management system; infrastructure
- Cynthia’s comments: A positive step forward is that the Digital Systems & Programs and Collection Development & Preservation Departments are working together on digitization; both departments realize that they have a stake and a need to be involved in developing this area at GMU
Leslie Johnston – Library of Congress
- Issues – number of operations they support and the sheer scale
- Inventoried all staff computers for collections which could be archived
- Audience questions re: archival prep and formal evaluation of content for archiving
- They don’t have a content management system, which is a problem
?? – Georgetown
- Digitization program not very far along but expressed the desire for WRLC to play a lead in a consortial digitization program.
- Lene Palmer, as Chair of the WRLC Preservation Advisory Committee, responded that the committee has an interest in expanding their scope into digital preservation
- James Austin, Director of IT for WRLC, responded that this is a potential role for WRLC
Mike ? – UMD
- Working on auditing the integrity of data over time – how to automate?
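One common way to automate this kind of integrity audit is to record a checksum for every file and periodically recompute and compare. The sketch below is only an illustration of that idea, not the approach described in the session; the function names (`sha256_of`, `build_manifest`, `audit`) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its current checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root, manifest):
    """Compare the files under root against a previously stored manifest.

    Returns three sorted lists of relative paths:
    changed (checksum differs), missing (in manifest, not on disk),
    and unexpected (on disk, not in manifest).
    """
    current = build_manifest(root)
    changed = sorted(k for k in manifest if k in current and current[k] != manifest[k])
    missing = sorted(k for k in manifest if k not in current)
    unexpected = sorted(k for k in current if k not in manifest)
    return changed, missing, unexpected
```

In practice the manifest would be saved somewhere separate from the data it describes, and `audit` would run on a schedule so that silent corruption is noticed while a good copy still exists.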
Storage Product Overview and Archive & Information Management Trends
Raymond A. Clarke, Sun Microsystems
- Storing data is one thing; being able to retrieve and use it is quite another
- Problems with the multiple iterations of technology over time
- SNIA (Storage Networking Industry Association) created a Bridging Terminology document (http://www.snia.org/forums/dmf/knowledge/white_papers_and_reports/SNIA-DMF_Building-a-Terminology-Bridge_20090515.pdf) so that everyone can mutually understand terms in digitization
- Why does tape still make sense? Lasts longer than digital media and the storage capacity has increased tremendously in recent years
- Open Storage
- DAS (Direct-Attached Storage), SAN (Storage Area Network), NAS (Network-Attached Storage), OSD, ISD
- More intelligence in the device itself
- Hard drives are much slower than the servers they feed, which limits overall performance
- SNIA – 100 year archive project?
- Cloud computing as a solution (http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf) [Cynthia’s note: Could WRLC host a cloud for the consortium?]
Reference Architectures for Repositories and Preservation Archiving
Keith Rajecki, Sun Microsystems
- Challenges
- Ingest
- Sustainability – cost
- Data integrity
- Cost
- Scalability
- Hardware
- Migrations
- People
- Software
- Access
- Bit rot (computing term used either to describe gradual decay of storage media or to facetiously describe the spontaneous degradation of a software program over time)
- Value of reference architectures
- Minimize cost, complexity, and deployment time
- Flexibility to build performance, economy, or mixed archive repository
- Fedora, Fedora/Drupal (Islandora), DSpace, EPrints, Duraspace (cloud), Ex Libris Rosetta, VTLS VITAL, SAF/WMS, Tessella Safety Deposit Box (SDB)
- Virtualized Repository Appliance (great starting point)
- Open repository appliance
- Scale out
- Sun’s Infinite Archive System Approach (next level)
- Cloud computing is usually the second or third layer in preservation architecture
Tessella Overview
Mark Evans
- DIOSCURI
- SDB – Safety Deposit Box
- Contains policy info
- Contains factual info
- Technical registry – PRONOM
- OAIS (Open Archival Information System) – reference model includes “active preservation” framework
- Ingest toolkit
- Passive preservation
- Represents the AIP
- Configurable metadata schema
- Automated integrity check
- Active preservation
- Characterization – What do I have in my archive?
- Preservation planning – guard against obsolescence
- Preservation processing – migration
- Primary configuration
- Standalone archive
- Black-box archive (web service API)
- Active preservation plug-in (storage adapter; 3rd party content storage)
- Characterization
- Characterizes files: DROID (Digital Record Object Identification), JHOVE
- Embedded object extraction, e.g. picture on a page
- Record components, e.g. position of objects on a page
- Projects Using Tessella Solutions
- Planets – preservation planning project as an interoperability solution
- KEEP – Keeping Emulation Environments Portable
- JHU Data Conservancy project
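The characterization step noted above ("What do I have in my archive?") usually starts with identifying a file's format from its leading bytes rather than trusting its extension. The sketch below illustrates that general idea only; it is not DROID's implementation and does not use PRONOM's signature format. The `SIGNATURES` table and the `identify` and `identify_file` names are hypothetical, and the table covers just a few well-known formats.

```python
# Illustrative signature table: a few well-known leading byte sequences.
# A real registry holds far more entries and richer matching rules.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data):
    """Return a format name for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path):
    """Identify a file on disk by reading only its first few bytes."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Running `identify_file` over every object in an archive gives a rough format inventory, which is the input that preservation planning needs in order to spot formats at risk of obsolescence.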
Ex Libris Rosetta
Mark ?
- OAIS – basis for standardization of the underpinning and framework
- With digital preservation, there is an absolute dependency on technology
- Preservation System Qualities
In Collecting - Producer management
- Support for wide variety of sources and formats
- Allowing the system to be extended using deposit SDK
- Characterization
In Archiving
- Ensuring security – write once, no delete; auditing mechanism
- Ensure minimal dependency
- Ensure integrity
- Support for curatorial process
In Preserving
- Ensuring long-term viability by supporting migration policies using a preservation planning module which includes:
- Risk analysis assessment process
- Embedding of migration tools
- Managing the migration process
In Access
- Ensuring persistency using persistent identifier tools
- Providing simple online access
- Support for dissemination copies with on the fly conversions
- Allowing integration with other library systems
- Rule-based preservation system
- Example of an implementation is the National Library of New Zealand INDIGO system
- PrestoPRIME – A/V preservation innovation (http://www.prestoprime.org)
- DigiTool
- Classic tool built for access
- Not a long-term preservation tool
- Upgradable to Rosetta
Versatile
Ian Jobson
- Islandora
- Being developed by Mark Leggott at the University of Prince Edward Island (UPEI)
- glue between Fedora and Drupal
- Integral part of the system, i.e. not standalone
- Drupal front-end
- Changes/customizations to Drupal don’t affect the back end (Fedora)
ShareStream
??
- Turnkey solution for institution’s rich media needs
- Provides the means to digitize, archive, preserve, manage, and deliver rich media within LMSs, e.g. Blackboard, and other online learning destinations in a secure, auditable environment
- Integrates with LDAP
- Pass through credentials from one point
- Administers DRM needs of media
- Integrated with campus networks, ensuring availability in a secure, controlled environment wherever online teaching, learning, and research take place
- Connector to include federated search link to search in other repositories
- Works with AquaBrowser
- [Cynthia’s Note: Potential collaboration opportunity with ITU and other departments on campus]
Further Reading
SNIA Data Management Forum. May 2009. Building a terminology bridge: Guidelines to digital information retention and preservation practices in the datacenter. Accessed September 25, 2009, at http://www.snia.org/forums/dmf/knowledge/white_papers_and_reports/SNIA-DMF_Building-a-Terminology-Bridge_20090515.pdf
Blue Ribbon Task Force on Sustainable Digital Preservation and Access. December 2008. Sustaining the digital investment: Issues and challenges of economically sustainable digital preservation. Accessed September 25, 2009, at http://brtf.sdsc.edu/biblio/BRTF_Interim_Report.pdf
Berman, Francine. December 2008. Got data? A guide to data preservation in the information age. Communications of the ACM, 51(12): 50-56. Accessed September 25, 2009, at ACM Digital Library http://portal.acm.org/
Armbrust, Michael, Armando Fox, Rean Griffith, Anthony Joseph, Randy Katz, Andrew Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, Matei Zaharia. 2009. Above the clouds: A Berkeley view of cloud computing. Accessed September 28, 2009, at http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf



