Published by Felix Mosley. Modified over 9 years ago.
1
Harvard’s Digital Repository Service (DRS) Architecture
Harvard University Library (HUL)
Andrea Goethals, Randy Stern
December 10, 2009
2
Today’s Agenda
1. What is the DRS?
2. DRS 1 Architecture
3. DRS 2 Highlights
4. Questions
3
1. What is the DRS?
4
DRS Context
A core portion of HUL’s mission is to provide current and future access to research materials and resources, with recognition that preserving access to digital content requires different strategies, tools and skills
Digital Preservation projects and activities (2000-)
Digital Preservation Program (June 2008-)
Centerpiece: the Digital Repository Service (DRS)
5
What is the DRS?
Set of professionally managed services for preservation and access
  metadata and content storage & monitoring service
  creation & format guidelines, training, ingest service
  delivery services, access restrictions, persistent names
  preservation planning & activities, administration, management tools
(Diagram labels: use; creation/acquisition)
6
What’s in the DRS?
14
DRS by the numbers
103 TB of content
335 TB total (counting all copies)
13 M files
  10 M image files
  21,000 audio files
  2.8 M text files
  851,000 compressed Google books containing 672 M files
  6,300 compressed web harvests containing 14 M web files
15
DRS growth
Fueled by large projects
Recent explosion – mass digitization (Google book project)
16
Broadening content and metadata requirements
New formats and genres, born-digital content
  Email archiving, more audio, drawing, video
Descriptive metadata, linkages to catalogs
Rights management, more access restrictions
Auxiliary content
  Contextual material, licenses, donor agreements, collection objects, documentation, repository agents
17
2. DRS 1 Architecture
18
DRS System Architecture
19
Metadata Storage Database
DRS-1 objects are modeled as related files
File metadata:
  Administrative (owners, projects, deposit dates, owner IDs, etc.)
  Technical (format MIME type & format-specific data)
  Role, purpose, quality
  No descriptive metadata
  Access restrictions (public, Harvard-only, dark)
  MD5 file digest and byte count
Relationship triples
  “is_part_of”, “is_preservation_replacement_for”, etc.
  21 relationship types
~13M files, 12.3M relationships
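To make the "related files" model concrete, here is a minimal Python sketch of a file record plus relationship triples. It is an illustration only, not the DRS schema: the class names, field names, and the identifiers such as "f-001" are hypothetical. Only the access levels and the two relationship names come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    # The three access restriction levels named on the slide.
    PUBLIC = "public"
    HARVARD_ONLY = "harvard-only"
    DARK = "dark"


@dataclass(frozen=True)
class FileRecord:
    # Field names are hypothetical; they mirror the metadata groups on the slide.
    file_id: str
    owner: str
    mime_type: str
    role: str
    access: Access
    md5: str
    byte_count: int


@dataclass(frozen=True)
class Relationship:
    # A triple: subject file, relationship type, object file.
    subject: str
    predicate: str
    obj: str


def related(relationships, file_id, predicate):
    """Return the ids that `file_id` points to through `predicate`."""
    return [r.obj for r in relationships
            if r.subject == file_id and r.predicate == predicate]


# Hypothetical sample data.
page = FileRecord("f-002", "HUL", "image/jp2", "deliverable",
                  Access.PUBLIC, "0" * 32, 1024)
triples = [
    Relationship("f-002", "is_part_of", "f-001"),
    Relationship("f-003", "is_preservation_replacement_for", "f-002"),
]
print(related(triples, page.file_id, "is_part_of"))
```

Because an object is only a set of files joined by triples, "what belongs to this object" becomes a query over the relationships rather than a lookup of a stored object record.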
20
Content Storage Service
Bit preservation
  Redundancy, heterogeneity, extensibility, scalability, simple file access protocol
Access demands high availability and high performance delivery
Functional requirements:
  At least three copies in three physical locations
  Two media types
  Two on-line copies for high availability
  One near-line copy, one off-line copy
21
Content Storage Service
Storage provider
  Sun SAM-QFS Storage Archive Manager
2 file classes: highuse and lowuse
Archiving rules
  High use files
    Copy 1 on disk at local server center
    Copy 2 on disk at remote server center
    Copy 3 on tape in library
    Copy 4 on tape off line at Harvard Depository
  Low use files
    Copy 1 on disk at remote server center
    Copy 2 on tape in library
    Copy 3 on tape off line at Harvard Depository
High speed cache for access
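The archiving rules above can be written down as data and summarized per file class. This is a sketch in Python, not SAM-QFS configuration syntax. The `Copy` type and the `summarize` helper are hypothetical, and the tier labels are an assumption: disk is treated as on-line, tape in the library as near-line, and tape at the Harvard Depository as off-line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    number: int
    medium: str    # "disk" or "tape"
    location: str
    tier: str      # assumed labels: "on-line", "near-line", "off-line"


# Copy placement per file class, transcribed from the slide.
ARCHIVE_RULES = {
    "highuse": [
        Copy(1, "disk", "local server center", "on-line"),
        Copy(2, "disk", "remote server center", "on-line"),
        Copy(3, "tape", "library", "near-line"),
        Copy(4, "tape", "Harvard Depository", "off-line"),
    ],
    "lowuse": [
        Copy(1, "disk", "remote server center", "on-line"),
        Copy(2, "tape", "library", "near-line"),
        Copy(3, "tape", "Harvard Depository", "off-line"),
    ],
}


def summarize(file_class):
    """Count copies, distinct location labels, media types and on-line copies."""
    copies = ARCHIVE_RULES[file_class]
    return {
        "copies": len(copies),
        "locations": len({c.location for c in copies}),
        "media": sorted({c.medium for c in copies}),
        "online": sum(1 for c in copies if c.tier == "on-line"),
    }


for name in ARCHIVE_RULES:
    print(name, summarize(name))
```

Under this labeling both classes have at least three copies under three distinct location labels on two media types. The low use class keeps a single disk copy, so only the high use class has two on-line copies.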
22
Consistency Validation Service
Continuous monitoring for file system and database consistency
  Crawls the file system and confirms that every disk file has a DRS metadata record
  Crawls the DRS metadata records table and confirms that every file referenced exists in the file system
  Confirms that the MD5 checksum for each file is the same as recorded in the database
  Reports errors to administrators
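The three checks described above can be sketched in a few lines of Python. This is not the DRS implementation: the metadata table is stood in for by a plain dict from relative path to recorded MD5, and the function name and error labels are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_consistency(root, records):
    """Compare files under `root` with `records` (relative path -> MD5 hex)."""
    root = Path(root)
    on_disk = {p.relative_to(root).as_posix()
               for p in root.rglob("*") if p.is_file()}
    known = set(records)
    errors = []
    # Check 1: every disk file has a metadata record.
    for rel in sorted(on_disk - known):
        errors.append(("no_metadata_record", rel))
    # Check 2: every referenced file exists in the file system.
    for rel in sorted(known - on_disk):
        errors.append(("missing_file", rel))
    # Check 3: the checksum on disk matches the recorded checksum.
    for rel in sorted(on_disk & known):
        if md5_of(root / rel) != records[rel]:
            errors.append(("checksum_mismatch", rel))
    return errors


# Small demonstration on a temporary directory with one fault of each kind.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "a.txt").write_bytes(b"alpha")
    (demo_root / "b.txt").write_bytes(b"beta")
    (demo_root / "orphan.txt").write_bytes(b"no record")
    demo_records = {
        "a.txt": hashlib.md5(b"alpha").hexdigest(),
        "b.txt": hashlib.md5(b"changed").hexdigest(),
        "gone.txt": hashlib.md5(b"gone").hexdigest(),
    }
    errors = check_consistency(demo_root, demo_records)

for kind, rel_path in errors:
    print(kind, rel_path)
```

At the scale on the slides a real service would crawl incrementally and spread the checksum work over time. The sketch only shows the logic of the comparison in both directions plus the fixity check.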
23
Delivery and Access Services
Real time web delivery
  Image delivery service
    JPEG, JPEG 2000, TIF, GIF
  Page turned object delivery service
    METS + page images + page text
  Streaming delivery service
    Real Audio
  File delivery service
    PDFs
  Web Archiving Service
Asynchronous delivery service
  Archival masters
24
Administrative Services
DRS Web Administrator
  Searching, reporting, file operations, archival master download
Page Turned Object Maintenance
  METS structure editor
Name Resolution Service Maintenance
  URN create/update/report
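The create/update/report operations of a name resolution service can be sketched as a small in-memory registry. Everything here is hypothetical (class name, method names, and the example URN and URLs); the slides do not describe the actual service's interface or URN syntax. The point is only that a persistent name stays fixed while the location it resolves to can change.

```python
class NameRegistry:
    """Toy registry mapping persistent names (URNs) to current locations."""

    def __init__(self):
        self._targets = {}

    def create(self, urn, url):
        # A persistent name is registered once.
        if urn in self._targets:
            raise ValueError(f"{urn} already exists")
        self._targets[urn] = url

    def update(self, urn, url):
        # Only the target changes; the name stays stable.
        if urn not in self._targets:
            raise KeyError(urn)
        self._targets[urn] = url

    def resolve(self, urn):
        return self._targets[urn]

    def report(self):
        return sorted(self._targets.items())


registry = NameRegistry()
registry.create("urn:example:object-1", "https://old.example.org/object-1")
registry.update("urn:example:object-1", "https://new.example.org/object-1")
print(registry.resolve("urn:example:object-1"))
```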
25
DRS System Architecture
26
DRS System Architecture Ingest Services
27
DRS System Architecture Delivery Services
28
DRS System Architecture Persistent Naming and Access Services
29
DRS System Architecture Storage Services
30
Storage Services Implementation
Sun SAM-QFS 4.6
  Rule-based automatic archiving – no “backups”
  Unified file name space
Dual Sun T2000 Solaris SAM servers
  Redundant servers at site 1, DR failover at site 2
  Nightly samfsdump at site 1, samfsrestore at site 2
EMC CLARiiON disk storage arrays
  RAID 1+0 FC cache / RAID 5 SATA disk archives
  35 TB CX3-40 at site 1, 109 TB CX3-80 at site 2
StorageTek SL500 tape library
  LTO-4
In production since Feb 2008
31
Storage Services Redundancy
32
Metadata Storage Service Implementation
DRS metadata storage
  Oracle 10g
    Live production server – copy 1
    Data Guard failover copy – copy 2
    Legato tape backups – copy 3
33
Ingest Services Implementation
Batch deposit of SIPs to SFTP drop boxes
DRS Batch Loader operates 8AM-8PM
51 object owners – libraries, museums
~12 depositors
234 project codes
Daily weekday deposits average ~60 GB/day
34
Delivery Services Implementation
High availability design
  Redundant public access servers
    Delivery, access management, name resolution
  Cisco Content Switch
    Load balancing, sticky sessions
  MRTG monitoring
  Change control – no downtime on updates
Red Hat Enterprise Linux, Java 1.5, Tomcat
Tomcat and log4j logging and statistics
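"Sticky sessions" means that requests from one client session keep landing on the same back-end server. The Python sketch below illustrates the idea by hashing a session identifier onto a server list. It is an assumption-laden illustration, not how the Cisco Content Switch works internally, and the server names and function name are hypothetical.

```python
import hashlib

# Hypothetical pool of redundant public access servers.
SERVERS = ["access-1", "access-2"]


def pick_server(session_id, servers):
    """Map a session id to one server, deterministically."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


first = pick_server("session-abc", SERVERS)
second = pick_server("session-abc", SERVERS)
print(first == second)  # the same session always maps to the same server
```

With simple modulo hashing, removing a server remaps some sessions. That is acceptable for a sketch, but a production balancer would typically track sessions explicitly, for example with cookies.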
35
3. DRS 2 Highlights
36
Scope of work
Builds on the early 2008 storage upgrade
2008-~2013
Affects every part of the DRS!
  Expanded data model
  New and different metadata
  Object descriptors
  Content models
  Preservation plans
  Enhanced deposit tools
  New management applications
  New backend services
First major release: Summer 2011
37
Object descriptors
A METS metadata file per object on the file system alongside content files
  Descriptive, administrative, preservation, technical and structural metadata
  Describes the object, all its files and bitstreams and related significant events
Gives the metadata the same secure storage as the content files
Self-contained, portable objects
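As a rough illustration of a per-object METS descriptor, the following Python sketch builds a skeleton with a file section and a structural map using the standard library. It is not the DRS 2 descriptor profile: the object id, file ids, and paths are hypothetical, and the descriptive, administrative, and preservation sections that a full descriptor would carry are omitted.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace, tag):
    """Build a namespace-qualified name in ElementTree's {ns}tag form."""
    return f"{{{namespace}}}{tag}"


def build_descriptor(object_id, files):
    """files: list of dicts with keys id, href, mime_type, md5, size."""
    root = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"))
    top_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "object"})
    for entry in files:
        # One <file> per content file, carrying its fixity information.
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), {
            "ID": entry["id"],
            "MIMETYPE": entry["mime_type"],
            "SIZE": str(entry["size"]),
            "CHECKSUM": entry["md5"],
            "CHECKSUMTYPE": "MD5",
        })
        ET.SubElement(file_el, q(METS_NS, "FLocat"), {
            "LOCTYPE": "URL",
            q(XLINK_NS, "href"): entry["href"],
        })
        # The structural map points back at each file by its ID.
        ET.SubElement(top_div, q(METS_NS, "fptr"), {"FILEID": entry["id"]})
    return root


# Hypothetical object with two page images stored next to the descriptor.
descriptor = build_descriptor("example-object-1", [
    {"id": "F1", "href": "page1.jp2", "mime_type": "image/jp2",
     "md5": "0" * 32, "size": 1024},
    {"id": "F2", "href": "page2.jp2", "mime_type": "image/jp2",
     "md5": "1" * 32, "size": 2048},
])
print(ET.tostring(descriptor, encoding="unicode"))
```

Using relative paths in `FLocat` is one way to keep the descriptor and its content files portable as a unit, which is the property the slide stresses.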
38
Some technical challenges
Amount of metadata to store
  Bitstream description
  Many elements (esp. MODS, MIX)
Efficient, scalable search implementation
  Database, index, combination?
Keeping metadata in sync
  Database, object descriptors on file system
Effect on system of continued growth
  Consistency checks, migrations, format analysis, etc.
HRCI requirements
Email archiving
39
4. Questions?
andrea_goethals@harvard.edu
randy_stern@harvard.edu