Semantic Digital Preservation
Rathachai Chawuthai, Information Management, CSIM / AIT
Introduction, issued document 1.0

Outline: 22nd Century; Digital Preservation; Needs of Archives in IRs; Knowledge Preservation; Technology Review

Assume that the following scenario takes place in the 22nd century.

Imagine how someone in the future could read a digital document you create today. Characters: Alice (reader) and Bob (archivist).

Alice: Hi Bob, do you have information about the US president “Barack Obama”?
Bob: Oh! That is hard to find, because the information is more than 100 years old.

Bob: Hi Alice. Luckily, I found a DVD containing information about him.
Alice: What is a DVD?

Do you believe that your current media will still be usable in the future?

Bob: Don’t be silly, Alice. DVDs were popular 100 years ago. It can be read by a DVD reader. See!
(Error: DVD unreadable)
Alice: No!!! That thing is unreadable!

The lifetime of digital media is quite short. Do you plan to move your data to new media?

Bob: Fortunately, I managed to get the file. Can you open “obama2009.pdf”?
(Error: No program can open file format PDF)
Alice: Hey… how do I open a PDF file?!

Do you tell future readers which software, hardware, and versions are needed to open your file?

Bob: As far as I can see, it needs Adobe Reader 9.0 to open it.
(File is read-protected. Please enter the password.)
Alice: How would I know the password?

Your file might be secured. Do you tell future readers how to access it?

(Screen shows: !7rò??àÕ ??ߟ²ÂÚ Õ??ߟ²ÂÚ ðŽɳ !Z?g! Õr / ÕŸ / ?rò?)
Alice: Why did the author write this in an alien language?!

Encoding is another issue (ASCII, ANSI, ISO-8859, UTF-7, big-endian vs. little-endian byte order), and so are fonts such as Tahoma and Verdana. How do you tell readers what is required to render your document?
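The encoding problem can be reproduced in a few lines of Python. This is a minimal sketch, assuming the producer records the byte encoding in a metadata field; the field name `byte_encoding` and the helper `render` are hypothetical. It shows one text misread under a wrong encoding, and UTF-16 bytes misread under the wrong byte order.

```python
def render(stored: bytes, encoding: str) -> str:
    """Decode stored bytes with a given encoding, as a future reader would."""
    return stored.decode(encoding)


# The producer writes text as UTF-8 and records that fact in metadata.
text = "Café résumé"
stored = text.encode("utf-8")
metadata = {"byte_encoding": "utf-8"}  # hypothetical metadata field

# A reader who guesses Latin-1 sees mojibake instead of the original text.
wrong = render(stored, "latin-1")
# A reader who follows the recorded encoding recovers the original text.
right = render(stored, metadata["byte_encoding"])
print(wrong)  # CafÃ© rÃ©sumÃ©
print(right)  # Café résumé

# Byte order matters too: UTF-16 little-endian bytes read as big-endian
# turn plain ASCII letters into unrelated CJK characters.
utf16_le = "Obama".encode("utf-16-le")
misread = render(utf16_le, "utf-16-be")
print(misread)
```

Neither wrong decoding raises an error, which is exactly why the encoding has to be written down rather than guessed.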

The document reads: “Barack Obama, 44th president of USA, born 08/04/1961”.
Alice: Confusing!!! When was he born, 4th August or 8th April?
Bob: No idea!!! You would need to ask the author, who lived 100 years ago.

The knowledge of today’s creators and of future readers may differ. How can we ensure that readers understand a document correctly?

Alice: What should I do if I need to find more information related to Barack Obama’s family?
Bob: You may have to browse every file in here. Good luck…

Many files are related to other files. How do we make these relationships known?

It would be good if the older generation had a good plan for digital preservation.

Printed age
– Paper is a durable format
– Stored under proper conditions
Digital age
– Information is fragile
– Technological obsolescence
– Deterioration of media

Digitized object: a digital object copied from a printed document, stored in a common format such as TIFF.

Born-digital object: a digital object created by software. Its versions need to be kept, not only the finalized document.

Capacity vs. age: paper lasts about 1000 years, digital media about 15 years. A digital medium can hold much more information than printed paper of the same volume, but its life is much shorter. Fortunately, content on digital media is easily duplicated to another medium.

Digital preservation (wikipedia.org): the active management of digital information to ensure its
– Maintainability: the bitstream continues to exist in its original form
– Accessibility: the bitstream can be formed into a file that can be opened
– Renderability: the opened file presents the digital object as it originally appeared
– Understandability: readers understand the digital object as originally intended, over time

Do you have these?
Issue: how can a bitstream be preserved when the life of digital media is short and the media themselves become obsolete?

Proposed solution: the current solution is migration, i.e. duplicating the bitstream from one medium to another at regular intervals.
Challenges
– How do we know when it is time to migrate?
– Do we have the right, granted by the intellectual property owner, to copy the work?
– How do we guarantee that nothing is lost during migration?
– How do we record the changes made by the migration process?
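A migration step that also addresses two of the challenges above (detecting loss and recording the change) can be sketched with checksums. This assumes SHA-256 fixity values are an acceptable loss check and that a plain list is an adequate event log; `migrate`, `sha256_of`, and the log fields are hypothetical names, not part of any specific repository system.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, target: Path, log: list) -> bool:
    """Copy a bitstream to new media, verify the copy, and record the event."""
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    after = sha256_of(target)
    verified = before == after
    log.append({
        "event": "migration",
        "source": str(source),
        "target": str(target),
        "sha256": before,
        "verified": verified,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return verified


# Usage: migrate a stand-in file between two directories acting as "media".
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old_media" / "obama2009.pdf"
    old.parent.mkdir()
    old.write_bytes(b"%PDF-1.4 sample bytes")  # stand-in content
    events = []
    ok = migrate(old, Path(tmp) / "new_media" / "obama2009.pdf", events)
    print(ok, events[0]["sha256"])
```

Matching digests show the bits survived the copy; the log entry is the record of the change. The other challenges (when to migrate, and whether copying is permitted) are policy questions that code alone does not answer.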

Issue: a bitstream needs to be represented as a file in order to be opened by software. To form an accessible file, the bitstream has to be organized into an object structure that software understands:
– Datatype: number, string, array, …
– Format: text, image, video, audio, …
Opening a file requires an environment: hardware, software, and their versions. Furthermore, some files cannot be accessed because they are protected for security reasons.

Proposed solution: use metadata to record the information anyone needs to access the file, such as
– Byte encoding
– File format
– Hardware and software, and their versions
– Password to open the file
and provide ways to access the file:
– Use a virtual environment to open the file
– Migrate the file to newer software
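A metadata record of this kind might look like the following Python sketch. The class `AccessMetadata` and all its field names are hypothetical and do not follow any particular metadata standard. The record notes where the password can be obtained rather than storing the password itself; that choice is an assumption, not part of the original slide.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AccessMetadata:
    """Hypothetical record of what a future reader needs to open a file."""
    file_name: str
    file_format: str
    byte_encoding: str
    software: str
    software_version: str
    hardware: str = "any"
    # Where to obtain the password, rather than the password itself.
    access_note: str = ""
    # Ways to reach the content if the original software is gone.
    access_methods: list = field(default_factory=list)


record = AccessMetadata(
    file_name="obama2009.pdf",
    file_format="PDF",
    byte_encoding="binary",
    software="Adobe Reader",
    software_version="9.0",
    access_note="password deposited with the repository administrator",
    access_methods=["virtual environment", "migrate to newer software"],
)

# Serialize to JSON so the record can be stored next to the bitstream,
# then read it back as a future reader would.
serialized = json.dumps(asdict(record), indent=2)
restored = AccessMetadata(**json.loads(serialized))
print(serialized)
```

Keeping the record in a plain, widely readable format such as JSON matters here: metadata that itself needs special software to open would reproduce the problem it is meant to solve.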

Challenges
– How do we create a common metadata structure, i.e. agree on which information every organization includes?
– How do we know when it is time to migrate to new software?
– Do we have the right, granted by the intellectual property owner, to copy and modify the work so that newer software can open it?
– How do we guarantee that nothing is lost during migration?
– How do we record the changes made by the migration process?

Issue: even if a digital object can be opened, how do we guarantee that it is rendered as it originally appeared?

Proposed solution: use metadata to record the look and feel of the digital object, such as
– Character code
– Font
– Color template
Challenges
– Which information must be included in the metadata?
– Is there a process to verify that a rendered object is correct?

Issue: how can we ensure that the characteristics of today’s digital objects, including
– Documentation style: date format, number format; grammar, sentences, phrases, vocabulary, symbols
– Contemporary knowledge: common sense, contextual knowledge, knowledge understood implicitly within a community
are understood by future readers whose knowledge is different?

Proposed solution
– Preserve underlying community knowledge alongside the digital objects
– Link related digital objects and their contents, using semantic technology, to explore both original and new knowledge
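Linking related objects can be illustrated with subject-predicate-object triples, the data model behind RDF. This is a minimal sketch in plain Python sets rather than an RDF store; the file names and predicates (`about`, `spouse`, `child`) are hypothetical. It answers Alice’s earlier question about Obama’s family by following links instead of browsing every file.

```python
# Hypothetical knowledge graph: (subject, predicate, object) triples.
TRIPLES = {
    ("obama2009.pdf", "about", "Barack Obama"),
    ("michelle_bio.txt", "about", "Michelle Obama"),
    ("malia_photo.jpg", "about", "Malia Obama"),
    ("weather1961.csv", "about", "Honolulu weather"),
    ("Barack Obama", "spouse", "Michelle Obama"),
    ("Barack Obama", "child", "Malia Obama"),
    ("Barack Obama", "child", "Sasha Obama"),
}
FAMILY_PREDICATES = {"spouse", "child", "parent"}


def related(subject: str, predicates: set) -> set:
    """Objects linked from `subject` by any of the given predicates."""
    return {o for s, p, o in TRIPLES if s == subject and p in predicates}


def files_about(entities: set) -> set:
    """Files whose `about` link points at any of the given entities."""
    return {s for s, p, o in TRIPLES if p == "about" and o in entities}


def family_files(person: str) -> list:
    """Files about a person or any directly linked family member."""
    family = related(person, FAMILY_PREDICATES) | {person}
    return sorted(files_about(family))


print(family_files("Barack Obama"))
# ['malia_photo.jpg', 'michelle_bio.txt', 'obama2009.pdf']
```

A production system would more likely store such links as RDF and query them with SPARQL; the plain-set version only shows the idea that relationships, once recorded, can be followed mechanically.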

Challenges
– How do we model and implement a theory of underlying community knowledge?
– How do we collect contextual knowledge for each period?
– How do we establish that the knowledge is correct?

To meet these preservation requirements, an archival information system appears to be the answer. A good system should support:
– Flexible information model
– Long-term storage
– Well-formed metadata
– Preservation activities
– Browsing and searching
– Knowledge exploration
– Preservation policy
– Access policy
– Rights and agreement policy

To provide the full feature set, the system needs to support the following roles, and support each role’s use of the system well:
– Provider: ingests digital objects into the archive
– Consumer: retrieves preserved information
– Management: sets preservation strategies and carries out preservation activities such as migration

The goal of preservation is to maintain knowledge over time. Preservation requires well-established metadata and a supporting system, and a preservation system should provide functionality to providers, consumers, and management.

Institutional Repositories and Digital Preservation: Assessing Current Practices at Research Libraries
Yuan Li, Syracuse University; Meghan Banach, University of Massachusetts Amherst

Archive
– A collection of historical records, or the physical place where they are located
– Contains primary-source documents accumulated over the lifetime of an individual or organization, kept to show the function of that organization
Digital archive
– An archive in digital format, which requires digital preservation
[Figure: digital media; environment to render]
(wikipedia.org)

An Institutional Repository is an online locus for collecting, preserving, and disseminating, in digital form, the intellectual output of an institution, particularly a research institution. For a university, this would include materials such as research journal articles, before (preprints) and after (postprints) undergoing peer review, and digital versions of theses and dissertations, but it might also include other digital assets generated by normal academic life, such as administrative documents, course notes, or learning objects. The four main objectives for having an institutional repository are:
– to provide open access to institutional research output by self-archiving it;
– to create global visibility for an institution's scholarly research;
– to collect content in a single location;
– to store and preserve other institutional digital assets, including unpublished or otherwise easily lost ("grey") literature (e.g., theses or technical reports).
(wikipedia.org)

Review
– Serve as archives within IRs
– Manage digital content
– Produce digital copies

A preservation system requires
– Natural and juridical persons
– Institutions
– Applications
– Infrastructure
– Procedures

Issues of preservation
– Little control over the ingestion process
– Less-than-optimal formats
– Poor metadata
– Insufficient intellectual property rights clearance
– Difficult or costly to preserve

Analyze the needs for digital preservation (digital archives) in the domain of institutional repositories.

– Is preservation part of the mission and goals of IRs?
– What preservation policies exist for IRs?
– What preservation strategies are IRs currently implementing?
– Are the necessary rights and agreements in place to preserve the content of IRs?
– Are all of the materials in IRs of sufficient quality and importance to warrant long-term preservation (content policies)?
– Do IRs currently have the necessary sustainability, in terms of funding and staffing, to carry out long-term preservation of their contents?

Is preservation part of the mission and goals of IRs?

Is preservation part of IRs?
[Chart: Yes / No responses]

What preservation policies exist for IRs?

Preservation policies
– Duration: short | medium | long
– Recommended file formats
  – Text formats: pdf, txt, rtf, xml, odb, ods, odp
  – Image formats: tiff, jp2, jpg
  – Audio formats: aif, aiff, wav
  – Video formats: avi, mj2, mjp2
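A repository could check deposits against the recommended-format list. A minimal sketch, assuming formats are identified by file extension alone (a real system would inspect the file content, since an extension can be wrong); `check_format` is a hypothetical helper and the format sets are copied from the policy above.

```python
from pathlib import PurePath
from typing import Optional

# Recommended formats, copied from the policy above.
RECOMMENDED_FORMATS = {
    "text": {"pdf", "txt", "rtf", "xml", "odb", "ods", "odp"},
    "image": {"tiff", "jp2", "jpg"},
    "audio": {"aif", "aiff", "wav"},
    "video": {"avi", "mj2", "mjp2"},
}


def check_format(filename: str) -> Optional[str]:
    """Return the policy category for a file, or None if not recommended."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    for category, extensions in RECOMMENDED_FORMATS.items():
        if extension in extensions:
            return category
    return None


for name in ["thesis.PDF", "scan.tiff", "scan.tif", "lecture.mov"]:
    print(name, "->", check_format(name))
```

Note that `scan.tif` is rejected because only `tiff` appears in the list; a real policy would probably list such aliases explicitly.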

What preservation strategies are IRs currently implementing?

Preservation strategies
– Backup
– System security
– Storage system
– Checksum

Preservation strategies
– By the IR system
– By an external system
– Preservation metadata

Preservation strategies
– Metadata varies with the sophistication of the collection
– Work is under way on standards and best practices that address all types of metadata

Are the necessary rights and agreements in place to preserve the content of IRs?

Rights and agreements
– Digital content may have to change when technology changes. Does this affect copyright?
– Players: content contributor, copyright holder

Rights and agreements
– What is an agreement? Click-through, written, policies, MOUs, verbal
– In most agreements, the contributor needs permission to submit work that is owned by another party

Are all of the materials in IRs of sufficient quality and importance to warrant long-term preservation (content policies)?

Content policies: collect, manage, disseminate

Content policies
– Problems: format obsolescence, poor quality, unreadable files, insufficient metadata
– These problems affect how content is managed and how it is preserved

Content policies: the IR should
– Track user activities, e.g. submission of work
– Require peer review before deposit (to ensure quality), e.g. for journal articles and conference proceedings

Do IRs currently have the necessary sustainability, in terms of funding and staffing, to carry out long-term preservation of their contents?

Sustainability
[Figure: preservation period (short-term, medium-term, long-term, toward infinity) against time and technology change]

– To recognize the need to implement a digital archive in the institutional repository
– To make agreements and secure permissions for preserving IR content
– To give content contributors guidance on digital format preservation
– To plan for long-term digital preservation
– To solve the lack of preservation funding

Terminology and Wish List for a Formal Theory of Preservation
Giorgos Flouris (FORTH-ICS), Carlo Meghini (ISTI-CNR)

Current systems can do:
– Bit preservation: the bitstream is preserved long term on modern media
– Object preservation: the bitstream can be rendered and displayed to the user as it originally appeared
(Example: “Barack Obama, 44th president of USA, born 04/08/1961”)

Information preservation (not yet a focus of current systems): a new challenge is for the system to preserve the ability to understand the rendered object over time. To achieve this, the reader must be able to understand the content of the rendered object by understanding the terms, concepts, and other information that appear in it, placed in their correct context. This feature does not exist in current preservation approaches.
(Example: “Barack Obama, 44th president of USA, born 08/04/1961”)

Example: “Barack Obama, 44th president of USA, born 04-Aug-1961”
Producer → (ingest) → archive system → (render) → consumer
The objective is that a reader (consumer) perceives the information in context, given his or her own background knowledge, and understands it as originally intended.

Terms
– Producer (P): the creator of the digital object
– Digital object (D): an object that presents knowledge in an understood language
– Consumer (C): a reader of the digital object
– Designated community (DC): a group of readers who share common characteristics and knowledge

Understanding process
1. The producer creates digital object D and stores it on a storage medium.
2. The consumer opens D from the storage medium by rendering the sequence of bit values that represents the document.
3. The consumer perceives D through light from an output device reaching his or her eyes.
4. The consumer understands the meaning of D from D itself plus contextual knowledge from his or her designated community.
Goal: the consumer can understand D as originally intended, over time.

The key is the “meaning” of digital knowledge (Flouris & Meghini).
– The meaning of a digital object can be viewed as a special kind of mapping that associates a symbol with a particular real-world concept.
– This association is not always clear from the digital object alone. Date format is a good example of confusion. Given “Barack Obama, 44th president of USA, born 08/04/1961”:
  – in European (day/month) notation, he was born on 8th April;
  – in American (month/day) notation, he was born on 4th August.
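Python’s standard `datetime` module makes the ambiguity concrete: the same string parses to two different dates depending on which notation the reader assumes. Writing dates in ISO 8601 form (year-month-day) is one common way to remove this particular ambiguity, although it does not address the broader problem of changing community knowledge.

```python
from datetime import datetime

raw = "08/04/1961"

# European reading: day/month/year.
european = datetime.strptime(raw, "%d/%m/%Y")
# American reading: month/day/year.
american = datetime.strptime(raw, "%m/%d/%Y")

print(european.date())  # 1961-04-08, i.e. 8th April
print(american.date())  # 1961-08-04, i.e. 4th August

# ISO 8601 output states the order explicitly.
print(american.date().isoformat())  # 1961-08-04
```

Both parses succeed without error, so nothing in the data itself tells a future reader which one the author meant.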

To capture the “meaning” of a digital object, the object needs to be described in a language.
Language (L): an arrangement of symbols associated with real-world concepts. L should be a formal language that both producer and consumer can interpret. Its purposes are
– to provide formulation rules that encode real-world concepts as symbols;
– to provide logical semantics that use contextual, background, or common-sense information to decode symbols into real-world concepts.

Example (P writes D in L, with “08/04” standing for August 4): the producer needs to represent “4th of August” in a common language. She therefore uses contextual knowledge or common-sense information that she shares with her community to write a symbol for “4th of August”. She chooses “08/04” because everyone in her community understands it as “4th of August”. In other words, she and the readers in her community at that time understand the same meaning.

Consider a simple mathematical function f(x) = y. Everyone uses an interpret function to understand the meaning of the producer’s language:
producer.interpret(“08/04”) = “4th of August”
reader01.interpret(“08/04”) = “4th of August”
reader02.interpret(“08/04”) = “4th of August”
Here everyone interprets “08/04” as “4th of August” because the interpret process contains a formula, and the formula comes from knowledge. When knowledge is agreed within a community, the formula is produced from that community knowledge. The producer and all readers therefore share the same formula, so they understand the same thing.

Underlying Community Knowledge (UCK): knowledge from a designated community (DC) that helps its members understand the association between language and real-world concepts in the same way. The key feature of UCK is therefore to produce formulas that can
– encode real-world concepts as language;
– decode language into real-world concepts.
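The interpret function and the encode/decode formulas of a UCK can be sketched in Python. This is a deliberately tiny model, assuming a UCK reduces to a single convention (the order of month and day); the slides do not fix how a UCK is represented, and `UCK`, `Reader`, `order`, `encode`, and `decode` are hypothetical names. The sketch prints “4 August” rather than “4th of August” to keep the formatting simple.

```python
from dataclasses import dataclass

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


@dataclass(frozen=True)
class UCK:
    """Hypothetical UCK reduced to one convention: the order of month and day."""
    order: str  # "MD" = month/day, "DM" = day/month

    def encode(self, day: int, month: int) -> str:
        """Formula: real-world concept (a day of the year) -> symbol."""
        first, second = (month, day) if self.order == "MD" else (day, month)
        return f"{first:02d}/{second:02d}"

    def decode(self, symbol: str) -> tuple:
        """Formula: symbol -> real-world concept, returned as (day, month)."""
        first, second = (int(part) for part in symbol.split("/"))
        return (second, first) if self.order == "MD" else (first, second)


@dataclass
class Reader:
    """A producer or reader whose interpret formula comes from a community UCK."""
    uck: UCK

    def interpret(self, symbol: str) -> str:
        day, month = self.uck.decode(symbol)
        return f"{day} {MONTHS[month - 1]}"


community = UCK("MD")  # the producer's designated community
producer, reader01, reader02 = Reader(community), Reader(community), Reader(community)
for person in (producer, reader01, reader02):
    print(person.interpret("08/04"))  # 4 August, for all three

# Encoding and then decoding with the same UCK is a round trip.
print(community.decode(community.encode(4, 8)))  # (4, 8)
```

A reader from a day/month community, `Reader(UCK("DM"))`, would interpret the same symbol as “8 April”: the formula, not the symbol, determines the meaning.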

Example (C reads D in L):
producer.interpret(“08/04”) = “4th of August”
consumer.interpret(“08/04”) = “8th of April”
Why does the consumer misunderstand?

As time passes, the designated community may change, and its knowledge may change. Thus “understanding” may change, too. The critical cause is a change of UCK: a different UCK produces a different formula, which produces a different understanding. The next challenge is “how to capture changes of UCK”.

UCK Evolution Structure (UCKES): a structure that represents the difference (delta) between UCKs. UCKES captures changes in a UCK’s language that result from changes in the UCK’s theory, such as ontology evolution. UCKES represents the gap between the producer’s UCK (P) and the consumer’s UCK (C).

UCK Mapping Structure (UCKMS): a complex mechanism that uses UCKES to relate the consumer’s formula to the producer’s formula. Its main function is to change the language so that the consumer understands the same real-world concept as the producer.

Is it possible?
producer.interpret(“08/04”) = “4th of August”
consumer.interpret(“04/08”) = “4th of August”

Right now, the consumer gets an incorrect understanding of the language the producer used.
[Figure: producer and consumer, each with their own UCK and formula, reading digital object D containing “08/04”]

The system should understand the knowledge on the consumer’s side and generate a mapping between the producer’s formula and the consumer’s formula using the UCKES and UCKMS mechanisms.

Then the system transforms digital object D into D’, in which “08/04” becomes “04/08”. D’ uses language that makes the consumer understand the same thing as the producer.

D: “Barack Obama, 44th president of USA, born 08/04/1961”
D’: “Barack Obama, 44th president of USA, born 04/08/1961”
The consumer understands D’ in the same way the producer understands D. That is, D’ has a preservability relation with D.
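The pipeline on these slides (UCKES, UCKMS, the transformation of D into D’, and the preservability check) can be sketched under the same drastic simplification: a UCK is only a date-order convention, the UCKES is a dictionary of changed conventions, and the UCKMS is the producer’s decode composed with the consumer’s encode. All names are hypothetical; the slides describe UCKMS as a complex mechanism, and this toy version only illustrates the data flow.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class UCK:
    """Hypothetical UCK reduced to one convention: "MD" (month/day) or "DM"."""
    order: str

    def encode(self, day: int, month: int) -> str:
        first, second = (month, day) if self.order == "MD" else (day, month)
        return f"{first:02d}/{second:02d}"

    def decode(self, symbol: str) -> tuple:
        """Return the concept as (day, month)."""
        first, second = (int(part) for part in symbol.split("/"))
        return (second, first) if self.order == "MD" else (first, second)


def uckes(producer: UCK, consumer: UCK) -> dict:
    """Hypothetical UCKES: the delta between two UCKs as {convention: (old, new)}."""
    if producer.order == consumer.order:
        return {}
    return {"date_order": (producer.order, consumer.order)}


def uckms(producer: UCK, consumer: UCK):
    """Hypothetical UCKMS: derive a symbol-to-symbol mapping from the UCKES."""
    if "date_order" not in uckes(producer, consumer):
        return lambda symbol: symbol  # same convention, keep the language
    # Decode with the producer's formula, re-encode with the consumer's.
    return lambda symbol: consumer.encode(*producer.decode(symbol))


# Matches "dd/dd/yyyy"; group 1 is the ambiguous day/month part.
DATE = re.compile(r"\b(\d{2}/\d{2})/(\d{4})\b")


def transform(d: str, mapping) -> str:
    """Produce D' by rewriting the day/month part of every date in D."""
    return DATE.sub(lambda m: f"{mapping(m.group(1))}/{m.group(2)}", d)


def preserves(d: str, d_prime: str, producer: UCK, consumer: UCK) -> bool:
    """D' preserves D if the consumer reads D' as the producer reads D."""
    meant = [producer.decode(m.group(1)) for m in DATE.finditer(d)]
    understood = [consumer.decode(m.group(1)) for m in DATE.finditer(d_prime)]
    return meant == understood


producer, consumer = UCK("MD"), UCK("DM")
d = "Barack Obama, 44th president of USA, born 08/04/1961"
d_prime = transform(d, uckms(producer, consumer))
print(uckes(producer, consumer))  # {'date_order': ('MD', 'DM')}
print(d_prime)  # Barack Obama, 44th president of USA, born 04/08/1961
print(preserves(d, d_prime, producer, consumer))  # True
print(preserves(d, d, producer, consumer))        # False
```

The last line shows why the transformation is needed: handing the consumer the untouched D fails the preservability check, while D’ passes it.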

Next step: how do we preserve underlying community knowledge along with the digital object? Preservation has to take the reader into account, providing the information needed to ensure that readers can understand the digital object as originally intended from their own knowledge.

The PREMIS Data Dictionary defines preservation metadata as "the information a repository uses to support the digital preservation process". The metadata includes
– Intellectual information: intellectual units such as a book, map, movie, song, …
– Digital object information: digital objects that realize the intellectual information, e.g. PDF, image, video, audio, …
– Agent information: persons or systems involved with the digital object
– Event information: records of activities on a digital object
– Rights information: agreements concerning the digital object
(wikipedia.org, LOC.gov)
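The five kinds of information can be modelled as linked records. A minimal sketch using Python dataclasses; the class and field names are hypothetical simplifications that loosely follow the list above, not the actual PREMIS semantic units or XML schema. Events point at an object and an agent, which is what lets a repository answer “what happened to this file, and who did it?”.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class IntellectualEntity:
    identifier: str
    title: str


@dataclass
class DigitalObject:
    identifier: str
    realizes: str       # identifier of an IntellectualEntity
    file_format: str


@dataclass
class Agent:
    identifier: str
    agent_type: str     # e.g. "person" or "software"


@dataclass
class Event:
    event_type: str     # e.g. "ingestion", "migration"
    object_id: str      # identifier of a DigitalObject
    agent_id: str       # identifier of an Agent
    timestamp: str = field(default_factory=now)


@dataclass
class Rights:
    object_id: str
    statement: str


def history(object_id: str, events: list) -> list:
    """Event types recorded for one object, in the order they were logged."""
    return [e.event_type for e in events if e.object_id == object_id]


# Hypothetical sample records.
biography = IntellectualEntity("ie-1", "Biography of Barack Obama")
pdf = DigitalObject("obj-1", realizes=biography.identifier, file_format="PDF")
archivist = Agent("agent-1", "person")
converter = Agent("agent-2", "software")
rights = Rights(pdf.identifier, "copying permitted for preservation")
events = [
    Event("ingestion", pdf.identifier, archivist.identifier),
    Event("migration", pdf.identifier, converter.identifier),
]
print(history(pdf.identifier, events))  # ['ingestion', 'migration']
```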

An Open Archival Information System (OAIS) is a reference model of an archive, consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community.
– Features: Ingest, Archive, Preservation Planning, Administration, Dissemination, and Access
– End users: Provider, Consumer, and Management
(wikipedia.org, OCLC.org)
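The flow through an OAIS-style archive can be sketched as below. OAIS calls the packages Submission, Archival and Dissemination Information Packages (SIP, AIP, DIP); this toy `Archive` class collapses them into plain dictionaries, keeps only ingest, preservation and dissemination, and restricts preservation actions to the management role. The class, method names and role strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Archive:
    """Toy OAIS-style archive; packages are plain dicts (hypothetical)."""
    storage: dict = field(default_factory=dict)

    def ingest(self, provider: str, object_id: str, content: bytes, metadata: dict) -> None:
        # Ingest: turn the provider's submission into a stored package.
        self.storage[object_id] = {
            "content": content,
            "metadata": {**metadata, "provider": provider},
            "history": ["ingest"],
        }

    def preserve(self, role: str, object_id: str, action: str,
                 transform: Callable[[bytes], bytes]) -> None:
        # Preservation actions (e.g. migration) are reserved for management.
        if role != "management":
            raise PermissionError(f"role {role!r} may not run preservation actions")
        package = self.storage[object_id]
        package["content"] = transform(package["content"])
        package["history"].append(action)

    def disseminate(self, consumer: str, object_id: str) -> dict:
        # Dissemination: hand the consumer a copy, never the stored package.
        package = self.storage[object_id]
        return {
            "content": package["content"],
            "metadata": dict(package["metadata"]),
            "history": list(package["history"]),
            "consumer": consumer,
        }


archive = Archive()
archive.ingest("provider", "obj-1", b"Born 08/04/1961", {"format": "text/plain"})
archive.preserve("management", "obj-1", "migration", lambda b: b)  # identity stand-in
copy = archive.disseminate("consumer", "obj-1")
print(copy["history"])  # ['ingest', 'migration']
```

Returning copies from `disseminate` keeps consumers from altering the archived package, which mirrors the separation OAIS draws between archival storage and access.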
