Harvard’s Digital Repository Service (DRS) Architecture Harvard University Library (HUL) Andrea Goethals, Randy Stern December 10, 2009.

1 Harvard’s Digital Repository Service (DRS) Architecture Harvard University Library (HUL) Andrea Goethals, Randy Stern December 10, 2009

2 Today’s Agenda
1. What is the DRS?
2. DRS 1 Architecture
3. DRS 2 Highlights
4. Questions

3 1. What is the DRS?

4 DRS Context
- A core portion of HUL’s mission is to provide current and future access to research materials and resources, with recognition that preserving access to digital content requires different strategies, tools and skills
- Digital Preservation projects and activities (2000-)
- Digital Preservation Program (June 2008-)
  - Centerpiece: the Digital Repository Service (DRS)

5 What is the DRS?
- Set of professionally managed services for preservation and access
  - Metadata and content storage & monitoring service
  - Creation & format guidelines, training, ingest service
  - Delivery services, access restrictions, persistent names
  - Preservation planning & activities, administration, management tools
- Diagram labels: use; creation/acquisition

6 What’s in the DRS?

14 DRS by the numbers
- 103 TB of content
  - 335 TB total (counting all copies)
- 13 M files
  - 10 M image files
  - 21,000 audio files
  - 2.8 M text files
  - 851,000 compressed Google books, containing 672 M files
  - 6,300 compressed web harvests, containing 14 M web files
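A few ratios follow directly from the figures on this slide. The short sketch below only does that arithmetic; the variable names are illustrative and the results are rounded averages, not numbers reported in the presentation.

```python
# Figures as stated on the "DRS by the numbers" slide.
content_tb = 103                    # unique content
total_tb = 335                      # counting all copies
google_books = 851_000              # compressed Google book packages
files_in_google_books = 672_000_000
web_harvests = 6_300                # compressed web harvest packages
files_in_web_harvests = 14_000_000

# Derived averages (these are computed here, not quoted from the slides).
avg_copies = total_tb / content_tb
files_per_book = files_in_google_books / google_books
files_per_harvest = files_in_web_harvests / web_harvests

print(f"average stored copies per byte of content: {avg_copies:.2f}")
print(f"average files per compressed Google book:  {files_per_book:.0f}")
print(f"average files per compressed web harvest:  {files_per_harvest:.0f}")
```

An average of a little over three stored copies per byte of content is consistent with a mix of files kept in three copies and files kept in four.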

15 DRS growth
- Fueled by large projects
- Recent explosion: mass digitization (Google book project)

16 Broadening content and metadata requirements
- New formats and genres, born-digital content
  - Email archiving, more audio, drawing, video
- Descriptive metadata, linkages to catalogs
- Rights management, more access restrictions
- Auxiliary content
  - Contextual material, licenses, donor agreements, collection objects, documentation, repository agents

17 2. DRS 1 Architecture

18 DRS System Architecture

19 Metadata Storage Database
- DRS-1 objects are modeled as related files
- File metadata:
  - Administrative (owners, projects, deposit dates, owner IDs, etc.)
  - Technical (format MIME type & format-specific data)
  - Role, purpose, quality
  - No descriptive metadata
  - Access restrictions (public, Harvard-only, dark)
  - MD5 file digest and byte count
- Relationship triples
  - “is_part_of”, “is_preservation_replacement_for”, etc.
  - 21 relationship types
- ~13M files, 12.3M relationships
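The "objects as related files" model can be pictured as file records plus subject/predicate/object triples. The following is a minimal sketch under assumptions: the class names, field names, and IDs are hypothetical and do not reflect the actual DRS database schema; only the two relationship names and the access values come from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file metadata row (field names are assumptions)."""
    file_id: int
    owner: str
    mime_type: str
    access: str       # "public", "harvard-only", or "dark"
    md5: str
    size_bytes: int


@dataclass(frozen=True)
class Relationship:
    """One triple: subject file, relationship type, object file."""
    subject_id: int
    predicate: str
    object_id: int


class RelationshipIndex:
    """In-memory lookup of triples by (predicate, object)."""

    def __init__(self, triples):
        self._by_object = defaultdict(list)
        for t in triples:
            self._by_object[(t.predicate, t.object_id)].append(t.subject_id)

    def subjects(self, predicate, object_id):
        """Return the sorted IDs of files that point at object_id."""
        return sorted(self._by_object.get((predicate, object_id), []))


# Made-up example: two page images belong to file 100, and file 201
# replaces file 101 for preservation purposes.
triples = [
    Relationship(102, "is_part_of", 100),
    Relationship(101, "is_part_of", 100),
    Relationship(201, "is_preservation_replacement_for", 101),
]
index = RelationshipIndex(triples)
parts = index.subjects("is_part_of", 100)
print(parts)
```

With no object-level record, an "object" in this model is simply whatever set of files the triples connect.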

20 Content Storage Service
Bit preservation
- Redundancy, heterogeneity, extensibility, scalability, simple file access protocol
- Access demands high availability and high performance delivery
- Functional requirements:
  - At least three copies in three physical locations
  - Two media types
  - Two on-line copies for high availability
  - One near-line copy, one off-line copy

21 Content Storage Service
Storage provider
- Sun SAM/QFS Storage Archive Manager
- 2 file classes: highuse and lowuse
- Archiving rules
  - High use files
    - Copy 1 on disk at local server center
    - Copy 2 on disk at remote server center
    - Copy 3 on tape in library
    - Copy 4 on tape off line at Harvard Depository
  - Low use files
    - Copy 1 on disk at remote server center
    - Copy 2 on tape in library
    - Copy 3 on tape off line at Harvard Depository
- High speed cache for access
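The archiving rules above can be written down as data and checked against the functional requirements from the previous slide. This is a sketch only, not SAM/QFS configuration syntax. The tier labels are assumptions: disk copies are treated as on-line and the tape library copy as near-line; only the Harvard Depository copy is explicitly called off line in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    medium: str    # "disk" or "tape"
    location: str
    tier: str      # "online", "near-line", or "off-line" (partly assumed)


ARCHIVING_RULES = {
    "highuse": [
        Copy("disk", "local server center", "online"),
        Copy("disk", "remote server center", "online"),
        Copy("tape", "library", "near-line"),
        Copy("tape", "Harvard Depository", "off-line"),
    ],
    "lowuse": [
        Copy("disk", "remote server center", "online"),
        Copy("tape", "library", "near-line"),
        Copy("tape", "Harvard Depository", "off-line"),
    ],
}


def summarize(copies):
    """Count copies, distinct locations, distinct media, and on-line copies."""
    return {
        "copies": len(copies),
        "locations": len({c.location for c in copies}),
        "media": len({c.medium for c in copies}),
        "online": sum(1 for c in copies if c.tier == "online"),
    }


def meets_baseline(copies):
    """At least three copies, three physical locations, two media types."""
    s = summarize(copies)
    return s["copies"] >= 3 and s["locations"] >= 3 and s["media"] >= 2


for file_class, copies in ARCHIVING_RULES.items():
    print(file_class, summarize(copies), meets_baseline(copies))
```

Under these assumed labels both classes meet the three-copy, three-location, two-media baseline, while only highuse files have two on-line copies; the slides do not say whether the on-line requirement is meant to apply to lowuse files.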

22 Consistency Validation Service
- Continuous monitoring for file system and database consistency
  - Crawls the file system and confirms that every disk file has a DRS metadata record
  - Crawls the DRS metadata records table and confirms that every file referenced exists in the file system
  - Confirms that the MD5 checksum for each file is the same as recorded in the database
  - Reports errors to administrators
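The three checks listed above can be sketched in a few lines. This is an illustration under assumptions, not the DRS implementation: the metadata table is stood in for by a plain dict mapping relative paths to recorded MD5 digests, and the function names and error labels are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 digest in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_consistency(root, records):
    """Compare files under root with records (relative path -> MD5)."""
    on_disk = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            on_disk.add(os.path.relpath(os.path.join(dirpath, name), root))
    recorded = set(records)

    errors = []
    # Check 1: every disk file has a metadata record.
    for rel in sorted(on_disk - recorded):
        errors.append(("no_metadata_record", rel))
    # Check 2: every recorded file exists on disk.
    for rel in sorted(recorded - on_disk):
        errors.append(("missing_file", rel))
    # Check 3: stored bytes still match the recorded checksum.
    for rel in sorted(on_disk & recorded):
        if md5_of(os.path.join(root, rel)) != records[rel]:
            errors.append(("checksum_mismatch", rel))
    return errors


def demo():
    """Build a small throwaway tree with one of each kind of error."""
    with tempfile.TemporaryDirectory() as root:
        for name, data in [("a.txt", b"good"), ("b.txt", b"changed"), ("c.txt", b"stray")]:
            with open(os.path.join(root, name), "wb") as fh:
                fh.write(data)
        records = {
            "a.txt": hashlib.md5(b"good").hexdigest(),
            "b.txt": hashlib.md5(b"original").hexdigest(),
            "d.txt": hashlib.md5(b"lost").hexdigest(),
        }
        return check_consistency(root, records)


if __name__ == "__main__":
    for kind, path in demo():
        print(kind, path)  # a real service would report these to administrators
```

At the scale quoted earlier (about 13 M files), a real crawl would presumably run incrementally rather than in a single pass; that scheduling is outside this sketch.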

23 Delivery and Access Services
- Real time web delivery
  - Image delivery service: JPEG, JPEG 2000, TIF, GIF
  - Page turned object delivery service: METS + page images + page text
  - Streaming delivery service: Real Audio
  - File delivery service: PDFs
  - Web Archiving Service
- Asynchronous delivery service: archival masters

24 Administrative Services
- DRS Web Administrator
  - Searching, reporting, file operations, archival master download
- Page Turned Object Maintenance
  - METS structure editor
- Name Resolution Service Maintenance
  - URN create/update/report

25 DRS System Architecture

26 DRS System Architecture Ingest Services

27 DRS System Architecture Delivery Services

28 DRS System Architecture Persistent Naming and Access Services

29 DRS System Architecture Storage Services

30 Storage Services Implementation
- Sun SAM-QFS 4.6
  - Rule-based automatic archiving (no “backups”)
  - Unified file name space
- Dual Sun T2000 Solaris SAM servers
  - Redundant servers at site 1, DR failover at site 2
  - Nightly samfsdump from site 1, samfsrestore at site 2
- EMC CLARiiON disk storage arrays
  - RAID 1+0 FC cache / RAID 5 SATA disk archives
  - 35 TB CX3-40 at site 1, 109 TB CX3-80 at site 2
- StorageTek SL500 tape library
  - LTO-4
- In production since Feb 2008

31 Storage Services Redundancy

32 Metadata Storage Service Implementation
DRS metadata storage
- Oracle 10g
  - Live production server: copy 1
  - Data Guard failover copy: copy 2
  - Legato tape backups: copy 3

33 Ingest Services Implementation
- Batch deposit of SIPs to SFTP drop boxes
- DRS Batch Loader operates 8AM-8PM
- 51 object owners (libraries, museums)
- ~12 depositors
- 234 project codes
- Daily weekday deposits average ~60 GB/day

34 Delivery Services Implementation
- High availability design
- Redundant public access servers
  - Delivery, access management, name resolution
  - Cisco Content Switch: load balancing, sticky sessions
  - MRTG monitoring
- Change control: no downtime on updates
- Red Hat Enterprise Linux, Java 1.5, Tomcat
- Tomcat and log4j logging and statistics

35 3. DRS 2 Highlights

36 Scope of work
- Builds on the early 2008 storage upgrade
- 2008-~2013
- Affects every part of the DRS!
  - Expanded data model
  - New and different metadata
  - Object descriptors
  - Content models
  - Preservation plans
  - Enhanced deposit tools
  - New management applications
  - New backend services
- First major release: Summer 2011

37 Object descriptors
- A METS metadata file per object on the file system alongside content files
  - Descriptive, administrative, preservation, technical and structural metadata
  - Describes the object, all its files and bitstreams and related significant events
  - Gives the metadata the same secure storage as the content files
- Self-contained, portable objects
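To make the idea of a per-object METS descriptor concrete, here is a minimal sketch that builds a skeletal METS document listing an object's files with their sizes and MD5 checksums. It is not the DRS 2 descriptor profile: the function name, object ID, and file IDs are hypothetical, the descriptive, preservation, and event sections are omitted, and the output has not been validated against the METS schema.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Build a namespace-qualified tag or attribute name for ElementTree."""
    return f"{{{ns}}}{tag}"


def build_descriptor(object_id, files):
    """files: list of (file_id, relative_path, mime_type, content_bytes)."""
    root = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"))
    top_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "object"})

    for file_id, path, mime_type, data in files:
        # Technical facts about each file travel with the object itself.
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), {
            "ID": file_id,
            "MIMETYPE": mime_type,
            "SIZE": str(len(data)),
            "CHECKSUM": hashlib.md5(data).hexdigest(),
            "CHECKSUMTYPE": "MD5",
        })
        ET.SubElement(file_el, q(METS_NS, "FLocat"), {
            "LOCTYPE": "URL",
            q(XLINK_NS, "href"): path,
        })
        # The structural map points back at the file by its ID.
        ET.SubElement(top_div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return root


descriptor = build_descriptor(
    "demo-object-1",  # made-up identifier
    [("F1", "page1.txt", "text/plain", b"hello")],
)
print(ET.tostring(descriptor, encoding="unicode"))
```

Because the descriptor is an ordinary file stored next to the content, it gets the same copies and checksum monitoring as the content files, which is what makes the object self-contained.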

38 Some technical challenges
- Amount of metadata to store
  - Bitstream description
  - Many elements (esp. MODS, MIX)
- Efficient, scalable search implementation
  - Database, index, combination?
- Keeping metadata in sync
  - Database, object descriptors on file system
- Effect on system of continued growth
  - Consistency checks, migrations, format analysis, etc.
- HRCI requirements
  - Email archiving

39 4. Questions?
andrea_goethals@harvard.edu
randy_stern@harvard.edu

