APARSEN Metadata for preservation, curation and interoperability Workshop on Research Metadata in Context 7-8 Sept 2010, Nijmegen David Giaretta APA and.


1 APARSEN Metadata for preservation, curation and interoperability Workshop on Research Metadata in Context 7-8 Sept 2010, Nijmegen David Giaretta APA and STFC

2 Digital Preservation: Ensure that digitally encoded information is understandable and usable over the long term – long term could start at just a few years. Easy to make claims – difficult to provide proof. Reference Model for an Open Archival Information System (ISO 14721) – the basic standard for work in digital preservation – defines terminology and compliance criteria.

3 Definitions (OAIS). Long Term Preservation: The act of maintaining information, Independently Understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long Term. Long Term: A period of time long enough for there to be concern about the impacts of changing technologies, including support for new media and data formats, and of a changing Designated Community, on the information being held in an OAIS. This period extends into the indefinite future. Not just BIT preservation. Not just rendering. Information, not just DATA or Documents. Authenticity.

4 Basic concept: Digital preservation had been dominated by libraries and (state) archives. However, there was a focus there on “rendered objects” and a tendency to think data is an “easy” add-on. HOWEVER: need to deal with DATA – processed to new things, not just rendered; need to follow OAIS – finer grained view; need to test and prove that things work. On “metadata”: “CASPAR banned the use of the term metadata unless absolutely necessary”.

5 Data… Level 2 GOME Satellite instrument data

6 Contains numbers – need meaning

7 ...to process to this

8 ...or this

9 ... through complex processing schemes

10 Just Format? sfqsftfoubujpo jogpsnbujpo svmft. You have a file; JHOVE tells you it is WORD version 7.

11 ..with some extra information.. representation information rules. Format Registries – useful but not enough: formats can be used for multiple purposes, e.g. audio files used to store configuration parameters.
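A small sketch of the point made by slides 10 and 11: the garbled string on slide 10 appears to be the phrase on slide 11 with every letter moved one place forward in the alphabet. Knowing that rule is the "extra information" needed to read it. The function name `shift_letters` is illustrative and not from the presentation.

```python
def shift_letters(text: str, shift: int) -> str:
    """Shift each lowercase ASCII letter by `shift` places, wrapping around a-z."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)  # leave spaces and other characters untouched
    return "".join(out)


garbled = "sfqsftfoubujpo jogpsnbujpo svmft"  # text shown on slide 10
decoded = shift_letters(garbled, -1)          # undo the one-letter shift
print(decoded)
```

Without the rule (shift by one), the bytes are intact but the information is not recoverable, which is the distinction the slides draw between bit preservation and Representation Information.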

12 Examples (cont) “504b0304140000000800f696….” “This is a ZIP file which contains Word files, each of which contains an encoded message which needs the key ‘!D$G^AJU*KI’ to decode it using encryption method SHA7”
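As a hedged illustration of how far format identification alone gets you with the hex string on slide 12: the first ten bytes can be read as the start of a ZIP local file header (signature, version needed, flags, compression method). The field layout follows the published ZIP format; the variable names are illustrative.

```python
import struct

# First ten bytes of the hex string quoted on slide 12.
hex_prefix = "504b0304140000000800"
raw = bytes.fromhex(hex_prefix)

# ZIP local file header start: 4-byte signature, then three little-endian
# 16-bit fields (version needed to extract, general purpose flags, method).
signature, version_needed, flags, method = struct.unpack("<4sHHH", raw)

is_zip = signature == b"PK\x03\x04"
print(is_zip, version_needed, flags, method)
```

This identifies the container and its compression method (8 is DEFLATE) but says nothing about the Word files inside, the encoded message or the key, which is why the slide supplies that as additional descriptive text.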

13 Examples (cont): LaTeX file containing an EPS (Encapsulated PostScript) version of an image; Web page containing Java Applet generating random numbers; SWISS-PROT data; Foreign Language emails

14 XML enough? – can stare at this and probably understand it: John Mary Paul

15 ..but what about this? 15 <VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.1 http://www.ivoa.net/xml/VOTable/v1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1"> URL of data file used to create this table. Target name U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm b3JtYXQgICAgICAgICAgICAgICAgICAgICAgICAgICBCSVRQSVggID0gICAgICAg ICAgICAgICAgICAgIDggLyBDaGFyYWN0ZXIgZGF0YSAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgIE5BWElTICAgPSAgICAgICAgICAgICAgICAgICAgMCAv IE5vIGltYWdlLCBqdXN0IGV4dGVuc2lvbnMgICAgICAgICAgICAgICAgICAgICAg
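A brief sketch of why the block of characters on slide 15 is not self-explanatory: it looks like Base64, and decoding the first run of it yields the start of a FITS header. That the whole block is Base64-encoded FITS is an inference from this first run, not something the slide states.

```python
import base64

# First run of the encoded text on slide 15 (up to the first space).
first_run = "U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm"

header_start = base64.b64decode(first_run).decode("ascii")
print(repr(header_start))
```

Even once decoded, a reader still needs the FITS standard to know what keywords such as SIMPLE mean, so the XML wrapper alone is not enough Representation Information.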

16

17 Performance Viewer: side-by-side comparison and validation of the transformation. From left to right: 3D visualization in Ogre3D, 3D model of the stage including the virtual dancer in VRML.

18 Figure 8 Some aspects of acousmatic production

19 [Figure: classification of digital objects along three axes: Rendered / Non-Rendered, Static / Dynamic, Simple / Complex]

20 Information Model & Representation Information. The Information Model is key. Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region). Diagram: an Information Object is a Data Object interpreted using 1+ Representation Information; Representation Information is itself interpreted using further Representation Information; a Data Object is either a Physical Object or a Digital Object; a Digital Object consists of 1+ Bit Sequence.
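The recursion described on slide 20 can be sketched as a small data model. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, and the knowledge base is reduced to a set of names the Designated Community is assumed to understand already.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepresentationInformation:
    name: str
    # RepInfo is itself information, so it may need further RepInfo.
    interpreted_using: list[RepresentationInformation] = field(default_factory=list)


@dataclass
class InformationObject:
    data_object: bytes
    interpreted_using: list[RepresentationInformation]


def understandable(rep: RepresentationInformation, knowledge_base: set[str]) -> bool:
    """The recursion stops where the community's knowledge base takes over."""
    if rep.name in knowledge_base:
        return True
    if not rep.interpreted_using:
        return False  # nothing further to consult, and not already known
    return all(understandable(r, knowledge_base) for r in rep.interpreted_using)


unicode_spec = RepresentationInformation("UNICODE SPECIFICATION")
xml_spec = RepresentationInformation("XML SPECIFICATION", [unicode_spec])
table_info = InformationObject(b"<table/>", [xml_spec])

ok_today = all(understandable(r, {"UNICODE SPECIFICATION"}) for r in table_info.interpreted_using)
ok_later = all(understandable(r, set()) for r in table_info.interpreted_using)
```

The second check models the slide's remark that the knowledge base changes over time: the same package becomes unintelligible once the community no longer knows the base specification.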

21 Representation Information Network

22

23 Modules and Dependencies: defining the Designated Community. Modules shown: README.txt, TEXT EDITOR, ENGLISH LANGUAGE, WINDOWS XP, FITS FILE, FITS STANDARD, PDF STANDARD, FITS JAVA s/w, JAVA VM, PDF s/w, FITS DICTIONARY, DICTIONARY SPECIFICATION, UNICODE SPECIFICATION, XML SPECIFICATION, MULTIMEDIA PERFORMANCE DATA, C3D, DirectX, MAX/MSP, 3D motion data files, 3D scene data files, motion to music mapping strategy
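One way to read "defining the Designated Community" through modules and dependencies is as a gap calculation: starting from a data object, follow dependencies and collect every module the community does not already know. The sketch below uses module names from slide 23, but the dependency edges and the example profile are assumptions made for illustration, since the transcript does not preserve the arrows of the diagram.

```python
# Assumed dependency edges (the slide's actual arrows are not in the transcript).
depends_on = {
    "FITS FILE": ["FITS STANDARD", "FITS JAVA s/w", "FITS DICTIONARY"],
    "FITS STANDARD": ["PDF STANDARD"],
    "PDF STANDARD": ["PDF s/w"],
    "FITS JAVA s/w": ["JAVA VM"],
    "FITS DICTIONARY": ["DICTIONARY SPECIFICATION"],
    "DICTIONARY SPECIFICATION": ["XML SPECIFICATION"],
    "XML SPECIFICATION": ["UNICODE SPECIFICATION"],
}


def rep_info_gap(start: str, knowledge_base: set[str]) -> list[str]:
    """Modules reachable from `start` that the community does not already know.

    Traversal stops at known modules: their own dependencies are assumed covered.
    """
    gap: set[str] = set()
    stack = list(depends_on.get(start, []))
    while stack:
        module = stack.pop()
        if module in gap or module in knowledge_base:
            continue
        gap.add(module)
        stack.extend(depends_on.get(module, []))
    return sorted(gap)


# Hypothetical community profile: can read PDF and XML, and can run Java.
profile = {"PDF STANDARD", "JAVA VM", "XML SPECIFICATION"}
needed = rep_info_gap("FITS FILE", profile)
print(needed)
```

A different profile yields a different gap for the same file, which is the sense in which the set of known modules defines the community.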

24 [Diagram modules: FITS FILE, FITS DICTIONARY, FITS STANDARD, PDF SOFTWARE, JAVA VM, PDF STANDARD, FITS JAVA SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION, DDL DESCRIPTION, DDL DEFINITION, DDL SOFTWARE]

25 If we can run this then we can run the Java software to extract the numbers. If we cannot run this then we can use an emulator or use its RepInfo to re-create a Java VM. If we cannot run the Java Virtual Machine then we use this source code to re-write in another programming language such as C. If we can run this then we can use this in a generic application to extract the numbers. If we cannot run the DDL software then we can look at the DDL definition and write some software to extract the numbers. In principle we could use this, plus the Dictionaries in order to understand the keywords, in order to extract the numbers.

26 Rep Info /DISCIPLINE Virtualisation

27

28 2-D array: Height, Width, Bits per Pixel. 2-D image: Height, Width, Bits per Pixel, Co-ordinate system, Time. 2-D astronomical image: Height, Width, Bits per Pixel, Astronomical co-ordinate system, Time – EPOCH, Bandpass.
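The layering on slide 28 can be sketched as a class hierarchy in which each level adds attributes to the one before it. The class names, field types and example values are assumptions for illustration; the slide only lists the attribute names.

```python
from dataclasses import dataclass


@dataclass
class Array2D:
    height: int
    width: int
    bits_per_pixel: int


@dataclass
class Image2D(Array2D):
    coordinate_system: str
    time: str


@dataclass
class AstronomicalImage2D(Image2D):
    # Here coordinate_system is an astronomical one and time is an epoch.
    bandpass: str


def size_in_bytes(array: Array2D) -> int:
    """Generic tool written once against the most general level."""
    return array.height * array.width * array.bits_per_pixel // 8


# Hypothetical example values.
sky = AstronomicalImage2D(1024, 2048, 16, "FK5", "J2000", "V")
print(size_in_bytes(sky))
```

Software written for the general 2-D array still works on the astronomical image, while discipline-specific tools can rely on the extra attributes.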

29 General Table: Number of columns, Names of columns, Number of rows, Value in cell at any row, column. Time series: Number of columns, Names of columns, Number of rows, Value in cell at any row, column, Time corresponding to any row. Science data table: Number of columns, Names of columns, Number of rows, Value in cell at any row, column, Type of column value, Column “metadata”, Table “metadata”.

30 [Tree diagram: Root node with Node 1 to Node 9.] Operations: Get the Root; Get the number of children for a node; Get child number “i”.
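The three operations on slide 30 are enough to write generic tree tools. The sketch below expresses them as an interface and uses only those operations to count nodes. The interface name, the adapter class and the particular parent-child layout are assumptions; the transcript does not record how the nine nodes are connected.

```python
from __future__ import annotations

from typing import Any, Protocol


class TreeView(Protocol):
    def get_root(self) -> Any: ...
    def child_count(self, node: Any) -> int: ...
    def get_child(self, node: Any, i: int) -> Any: ...


class DictTree:
    """Adapter exposing a parent -> children mapping through the three operations."""

    def __init__(self, root: str, children: dict[str, list[str]]) -> None:
        self._root = root
        self._children = children

    def get_root(self) -> str:
        return self._root

    def child_count(self, node: str) -> int:
        return len(self._children.get(node, []))

    def get_child(self, node: str, i: int) -> str:
        return self._children[node][i]


def count_nodes(tree: TreeView) -> int:
    """Walk any tree using only the three virtualised operations."""
    def walk(node: Any) -> int:
        return 1 + sum(walk(tree.get_child(node, i)) for i in range(tree.child_count(node)))
    return walk(tree.get_root())


# Assumed layout for the root and nine nodes named on the slide.
layout = {
    "Root node": ["Node 1", "Node 2", "Node 3"],
    "Node 1": ["Node 4", "Node 5"],
    "Node 2": ["Node 6"],
    "Node 3": ["Node 7", "Node 8", "Node 9"],
}
total = count_nodes(DictTree("Root node", layout))
```

Any other storage format (XML, a file system, a database) could be given its own adapter without changing `count_nodes`.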

31 [Image type hierarchy: Image; Cultural Heritage Image; Artistic Image; Astronomical Image; Earth Observation Image; Optical Astronomical Image; X-ray Astronomical Image]

32 Archival Information Package: Content Information, further described by Preservation Description Information. Also shown: Package Description and Packaging Information, with relation labels “derived from”, “described by”, “delimited by”, “identifies”.

33 Preservation Description Information: Fixity Information, Provenance Information, Reference Information, Context Information, Access Rights Information
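A minimal sketch of the five components listed on slide 33, with Fixity Information made concrete as a checksum. The class name, the use of SHA-256, the free-text fields and the example values are all assumptions for illustration; OAIS does not prescribe a particular digest or encoding.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescriptionInformation:
    fixity: str          # here: hex SHA-256 digest of the content
    provenance: str
    reference: str
    context: str
    access_rights: str


def sha256_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fixity_ok(data: bytes, pdi: PreservationDescriptionInformation) -> bool:
    """True if the content still matches the digest recorded at ingest."""
    return sha256_digest(data) == pdi.fixity


# Hypothetical content and descriptive values.
content = b"example content bytes"
pdi = PreservationDescriptionInformation(
    fixity=sha256_digest(content),
    provenance="ingested from instrument pipeline (hypothetical)",
    reference="example-id-0001 (hypothetical)",
    context="part of an example collection (hypothetical)",
    access_rights="open (hypothetical)",
)

unchanged = fixity_ok(content, pdi)
tampered = fixity_ok(content + b"!", pdi)
```

A fixity check only shows the bits are unchanged; the other four components carry the evidence for authenticity and use that the earlier slides ask for.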

34

35 Preservation Data Flows and Strategies

36 [Diagram labels: Representation Information, Provenance, “has”]

37 USE DATA: Use application to find data in Repository. Create DIP with enough RepInfo for the user (via DC profile). Obtain more RepInfo from Registry if necessary. DRM. Cost sharing. Preservable infrastructure.

38 APARSEN: JPA Integration, JPA Research, JPA Spreading excellence. Integration 1000: 1100: Common Vision; 1200: Staff and experience exchange; 1300: Common standards; 1400: Common testing environments; 1500: Internal W/S & symposia; 1600: Common tools, software repository and market place. Technical 2000: 2100: Preservation Services; 2200: Identifiers & citability; 2300: Storage solutions; 2400: Authenticity & Provenance; 2500: Interoperability & intelligibility; 2600: Annotation, Reputation & data quality; 2700: Scalability. Economic/Legal 3000: 3100: Digital Rights & access management; 3200: Cost/benefit data collection and modelling; 3300: Peer Review & 3rd party Certification; 3400: Brokerage services; 3500: Data policies and governance; 3600: Business cases. Spreading excellence 4000: 4100: External W/S & symposia; 4200: Formal qualifications; 4300: Training courses; 4400: Awareness raising; 4500: Liaison with other stakeholders; 4600: International liaison. Management 5000: 5100: Financial management; 5200: Technical co-ord.; 5300: Evaluate impact of the Network of Excellence.

39 JPA Research. Technical 2000: 2100: Preservation Services; 2200: Identifiers & citability; 2300: Storage solutions; 2400: Authenticity & Provenance; 2500: Interoperability & intelligibility; 2600: Annotation, Reputation & data quality; 2700: Scalability. Economic/Legal 3000: 3100: Digital Rights & access management; 3200: Cost/benefit data collection and modelling; 3300: Peer Review & 3rd party Certification; 3400: Brokerage services; 3500: Data policies and governance; 3600: Business cases.

40 Trust Certification of repositories Reputation and trustability of datasets, publications and people Authenticity Sustainability Business cases Preservation Cost/benefit analysis Transfer of custody – who to hand over to and what to hand over Storage solutions Usability Intelligibility Use by common tools Cross domain usability Interoperability Access Identify of datasets, publication, people Rights and responsibilities Policies and governance

41 FUTURE: Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved. Non-maintainability of essential hardware, software or support environment may make the information inaccessible. The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity. Access and use restrictions may not be respected in the future. Loss of ability to identify the location of data. The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future. The ones we trust to look after the digital holdings may let us down.

42 Links. CASPAR: http://www.casparpreserves.eu ; CASPAR Source code: http://sourceforge.net/projects/digitalpreserve/ ; OAIS Reference Model: http://public.ccsds.org/publications/archive/650x0b1.pdf and the updated draft is available from http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206500P11/Overview.aspx ; CASPAR Validation report: http://www.casparpreserves.eu/Members/cclrc/Deliverables/caspar-validation-evaluation-report/at_download/file ; PARSE.Insight: www.parse-insight.eu ; Alliance for Permanent Access: www.alliancepermanentaccess.eu ; Digital Curation Centre: www.dcc.ac.uk

43 END

