©CLRC/BITD/Keith G Jeffery, CERIF in CRISs & Scientific Repositories
Comparative Study of Metadata for Scientific Information: The Place of CERIF in CRISs & Scientific Repositories
Keith G Jeffery, Director, IT CLRC
Andrei Lopatenko, Manchester University
Anne Asserson, University of Bergen

Series of Presentations
CERIF: Past, Present and Future: An Overview
–Anne Asserson, Andrei Lopatenko, Keith Jeffery
CERIF - Information Retrieval of Research Information in a Distributed Heterogeneous Environment
–Andrei Lopatenko, Keith Jeffery, Anne Asserson
Comparative Study of Metadata for Scientific Information: The Place of CERIF in CRISs and Scientific Repositories
–Keith Jeffery, Andrei Lopatenko, Anne Asserson

STRUCTURE
DATA, INFORMATION & KNOWLEDGE
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
USAGE OF METADATA IN CRISs
METADATA AND CERIF
CONCLUSIONS AND RECOMMENDATIONS

DATA, INFORMATION & KNOWLEDGE: Data
DATA:
–Representation of observation of real world
–A lexical string of characters or symbols

DATA, INFORMATION & KNOWLEDGE: Information
INFORMATION:
–USA: 3rd June 2002
–UK: 6th March 2002
Instead use:
–Data:
–Metadata: yyyymmdd: a ‘format template’; Date: a type
–Structured data in context

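The point of this slide can be sketched in a few lines of Python. This is only an illustration: the string "06/03/2002" is an assumed example value (the slide's own data string is not preserved), and the `record` layout is hypothetical. It shows one lexical string read as two different dates, and the same date made unambiguous once the format template travels with the data.

```python
from datetime import datetime

# Assumed example value; the slide's own data string is not preserved.
raw = "06/03/2002"

# The same lexical string becomes different information
# depending on the (implicit) metadata used to read it.
as_usa = datetime.strptime(raw, "%m/%d/%Y").date()  # month first: 3rd June 2002
as_uk = datetime.strptime(raw, "%d/%m/%Y").date()   # day first: 6th March 2002

# Hypothetical record: data plus explicit metadata
# (format template yyyymmdd, type Date) removes the ambiguity.
record = {"data": "20020603", "format": "%Y%m%d", "type": "date"}
parsed = datetime.strptime(record["data"], record["format"]).date()

print(as_usa, as_uk, parsed)
```
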
DATA, INFORMATION & KNOWLEDGE: Knowledge
KNOWLEDGE:
–Theories or hypotheses
–Representation of:
  Facts (i.e. information)
  Rules (when a, if b, then x, else y)
–Processing of them by inference: deduction, induction, abduction
–Commonly accepted justified belief

DATA, INFORMATION & KNOWLEDGE: Knowledge: Facts

Start-Time  Departure airport  Flight  Arrival airport  End-Time
0800        LHR                BA123   FRA
            LHR                BA125   FRA
            LHR                BA127   FRA
            LHR                BA129   FRA              1300
Etc etc
1800        LHR                BA137   FRA              2000

DATA, INFORMATION & KNOWLEDGE: Knowledge: Induction

Start-Time  Departure airport  Flight  Arrival airport  End-Time
0800        LHR                BA123   FRA
            LHR                BA125   FRA
            LHR                BA127   FRA
            LHR                BA129   FRA              1300
Etc etc
1800        LHR                BA137   FRA              2000

INDUCTION (data mining): between 0800 and 1800, every hour on the hour, a BA flight leaves LHR for FRA

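As a rough sketch of the induction step, the following Python derives a "regular interval" rule from a list of observed departure times. The function name `induce_interval_rule` and the complete list of hourly departure times are assumptions made for illustration; the slide shows only part of the timetable.

```python
def induce_interval_rule(departures):
    """Return (first, last, step) in minutes since midnight if the
    departures are evenly spaced, otherwise None."""
    minutes = sorted(int(t[:2]) * 60 + int(t[2:]) for t in departures)
    if len(minutes) < 2:
        return None
    # Collect the distinct gaps between consecutive departures.
    steps = {later - earlier for earlier, later in zip(minutes, minutes[1:])}
    if len(steps) != 1:
        return None  # no single regular interval explains the facts
    return minutes[0], minutes[-1], steps.pop()

# Assumed facts: one LHR to FRA departure per hour from 0800 to 1800.
departures = [f"{hour:02d}00" for hour in range(8, 19)]
rule = induce_interval_rule(departures)
print(rule)  # first departure, last departure, interval (all in minutes)
```

The rule is only as good as the facts observed: one irregular departure makes the function return None, which mirrors the fact that induced knowledge is belief, not proof.
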
DATA, INFORMATION & KNOWLEDGE: Putting it together
Collecting Observed Facts: DATA
Structuring in Context: DATA, INFORMATION
Inducing commonly accepted belief: DATA, INFORMATION, KNOWLEDGE
Value-Adding for Business Needs: DATA, INFORMATION, KNOWLEDGE, INSIGHT

DATA DELUGE, INFORMATION EXPLOSION AND METADATA: Technology Capacity
Communications
–2.4 Kb/s to 20 Gb/s in 30 years: 2 * 10^6
Online Storage
–1.2 Mb to 40 Gb in 30 years: 4 * 10^4
Processor Speed increased even more
With acknowledgements to

DATA DELUGE, INFORMATION EXPLOSION AND METADATA: Application Areas
e-Science
–Petabytes per year
–Particle Physics
–Space Science
–Genomics
e-Information
–Terabytes per year
–Eprints
–Hyperlinked data
–Hypermedia
e-Learning
e-Business
With acknowledgements to CLRC/BITD/PS

DATA DELUGE, INFORMATION EXPLOSION AND METADATA: Technology Takeup
[Figure: technology takeup; surviving label: WWW (1989)]

DATA DELUGE, INFORMATION EXPLOSION AND METADATA: Data & Metadata
Much of this data is inaccessible
Need to be able to
–Find relevant data as information
–Understand it: syntax, semantics
–Understand any restrictions on its use
Data required: METADATA

DATA DELUGE, INFORMATION EXPLOSION AND METADATA: Data & Metadata
Metadata is data about data
Metadata to one application is data to another
[Diagram: Application1, Application2]

DATA DELUGE, INFORMATION EXPLOSION AND METADATA: Three Kinds of Metadata
data (document)
SCHEMA: constrain it
NAVIGATIONAL: how to get it
ASSOCIATIVE: view to users

DATA DELUGE, INFORMATION EXPLOSION AND METADATA: Metadata Kinds: Schema
Intensional description of extensional instances
–database: name, size, security authorisations
–attributes: name, type, constraints
Formal logic relationship to data instances

DATA DELUGE, INFORMATION EXPLOSION AND METADATA: Metadata Kinds: Navigational
How to get to information resource
direct
–filename
–DB name + navigational algorithm
–DB name + predicate (query)
–URL
–URL + predicate (query)
or any of the above via
–web indexing system (e.g. AltaVista, Excite…)
–local indexing system (bookmarks or proxy server)

DATA DELUGE, INFORMATION EXPLOSION AND METADATA: Metadata Kinds: Associative
Information for application assistance
–catalog record (e.g. Dublin Core) - descriptive
–content rating (e.g. PICS) - restrictive
–security, privacy (cryptography, digital signatures) - restrictive
–information from dictionaries, thesauri, hyperglossaries, domain ontologies - supportive
No formal logic relationship to data instances

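One way to picture the three kinds of metadata is as three small record types attached to the same data item. The sketch below is illustrative only: the class names, field names and the example URL are assumptions, not part of CERIF or of any standard. It does reflect the distinction drawn on the slides: schema metadata has a formal relationship to the data instances (it can be checked), while associative metadata does not.

```python
from dataclasses import dataclass, field

@dataclass
class SchemaMetadata:
    """Constrains the data: attribute names and their types."""
    attributes: dict  # attribute name -> Python type

    def conforms(self, instance: dict) -> bool:
        # Formal relationship to instances: same attributes, right types.
        return (instance.keys() == self.attributes.keys()
                and all(isinstance(instance[name], expected)
                        for name, expected in self.attributes.items()))

@dataclass
class NavigationalMetadata:
    """How to get to the resource: a URL plus an optional query predicate."""
    url: str
    predicate: str = ""

@dataclass
class AssociativeMetadata:
    """View to users; no formal logic relationship to the instances."""
    descriptive: dict = field(default_factory=dict)  # e.g. catalog record
    restrictive: dict = field(default_factory=dict)  # e.g. access rights
    supportive: dict = field(default_factory=dict)   # e.g. thesaurus terms

# Hypothetical usage for one dataset record.
schema = SchemaMetadata({"title": str, "year": int})
navigational = NavigationalMetadata(url="https://example.org/dataset/1")
associative = AssociativeMetadata(
    descriptive={"title": "Example dataset"},
    restrictive={"access": "public"},
)
print(schema.conforms({"title": "Example dataset", "year": 2002}))
```
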
USAGE OF METADATA IN CRISs: Benefits
Data quality
Access
Understanding answers
Improving Queries
Interoperability with other CRISs
Interoperability with other Systems e.g.
–Local management information systems
–Bibliographic systems
–Scientific data systems

USAGE OF METADATA IN CRISs: Schema Metadata
SCHEMA: constrain it
All CRISs based on
–DB SYSTEM
–IR SYSTEM
have schema metadata
It may not be sufficient
–To ensure integrity
–To provide rich enough program interface
–To ensure integrity in foreign key - primary key linkage to associated CRISs or other systems

USAGE OF METADATA IN CRISs: Navigational Metadata
NAVIGATIONAL: how to get it
‘Base CRISs’ may have navigational metadata
–If they provide raw information only: no
–If they provide URLs to e.g. publications, scientific datasets: yes
‘Meta-CRISs’ which act as catalogues or indexes to other CRISs do have navigational metadata

USAGE OF METADATA IN CRISs: Associative Metadata
ASSOCIATIVE: view to users
AdM
–Associative descriptive
ArM
–Associative restrictive
AsM
–Associative supportive

USAGE OF METADATA IN CRISs: Associative descriptive Metadata
ASSOCIATIVE: view to users
CRISs have AdM if
–They provide a summary record of >= 1 { | | } and point to detailed records
–The AdM provides machine-readable (syntax) and machine-understandable (semantics) information

USAGE OF METADATA IN CRISs: Associative restrictive Metadata
ASSOCIATIVE: view to users
CRISs have ArM if
–They provide a separate metadata record with information on access rights, copyright, IPR, 3rd party liability disclaimer, pricing
–The ArM provides machine-readable (syntax) and machine-understandable (semantics) information

USAGE OF METADATA IN CRISs: Associative supportive Metadata
ASSOCIATIVE: view to users
CRISs have AsM if
–They provide >= 1 {dictionary | hyperglossary | thesaurus | domain ontology}
–The AsM provides machine-readable (syntax) and machine-understandable (semantics) information and / or knowledge

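The three tests above (AdM, ArM, AsM) can be read as a simple checklist. The following Python is a minimal sketch of that checklist under stated assumptions: the dictionary keys (`summary_records`, `points_to_detailed_records`, `rights_record`, `resources`) are hypothetical names chosen for illustration, and the check ignores the further requirement that the metadata be machine-readable and machine-understandable.

```python
# Supportive resources named on the AsM slide.
SUPPORTIVE_RESOURCES = {"dictionary", "hyperglossary", "thesaurus", "domain ontology"}

def associative_kinds(cris: dict) -> set:
    """Return which kinds of associative metadata a CRIS description offers."""
    kinds = set()
    # AdM: summary records that point to detailed records.
    if cris.get("summary_records") and cris.get("points_to_detailed_records"):
        kinds.add("AdM")
    # ArM: a separate record on access rights, copyright, IPR, pricing etc.
    if cris.get("rights_record"):
        kinds.add("ArM")
    # AsM: at least one dictionary, hyperglossary, thesaurus or domain ontology.
    if SUPPORTIVE_RESOURCES & set(cris.get("resources", ())):
        kinds.add("AsM")
    return kinds

# Hypothetical CRIS description.
example_cris = {
    "summary_records": True,
    "points_to_detailed_records": True,
    "rights_record": False,
    "resources": ["thesaurus"],
}
print(sorted(associative_kinds(example_cris)))
```
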
USAGE OF METADATA IN CRISs: Typical CRIS and Metadata
[Diagram labels:]
Metadata for whole collection of base CRIS data records
Metadata for data record in base CRIS
Metadata within base CRIS
Data
schema, navigational, associative
Other data system

METADATA & CERIF: The CERIF Models
Metadata Model
Export Model
Full CRIS Model
CRIS A, CRIS B, CRIS C

METADATA & CERIF: The CERIF Metadata Model
Metadata Model
Designed in from the start
–Schema and navigational metadata defined
–AdM: e.g. to be used as catalog in ERGO
–AsM: e.g. controlled lists of terms
Also designed to assist evolution of further linkages
–Flexible ‘articulated’ structure
–Links (metadata within records) to e.g. bibliographic, scientific datasets

METADATA & CERIF: CERIF: What Next? Associative descriptive Metadata
AdM: provide crosswalks from CERIF Metadata Standard to:
–Dublin Core (at least formalised version)
–Other relevant standards as they emerge
–Using RDF, XML-Schema
–Coded in XML

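A crosswalk of this kind is essentially a field-to-field mapping plus a serialisation step. The sketch below is not the CERIF crosswalk: the source field names (`project_title`, `principal_investigator`, `start_date`, `project_uri`) are hypothetical placeholders, and only the Dublin Core element names and namespace are taken from the Dublin Core Metadata Element Set. It shows the general shape of mapping a record and coding the result in XML.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace
ET.register_namespace("dc", DC_NS)

# Hypothetical source field names mapped to real Dublin Core elements.
CROSSWALK = {
    "project_title": "title",
    "principal_investigator": "creator",
    "start_date": "date",
    "project_uri": "identifier",
}

def to_dublin_core_xml(record: dict) -> str:
    """Map a flat source record to Dublin Core elements, coded in XML."""
    root = ET.Element("record")
    for source_field, dc_element in CROSSWALK.items():
        if source_field in record:  # unmapped or missing fields are skipped
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_element}")
            child.text = str(record[source_field])
    return ET.tostring(root, encoding="unicode")

# Hypothetical record.
xml_text = to_dublin_core_xml({
    "project_title": "Example project",
    "start_date": "2002-03-06",
    "internal_code": "not mapped",
})
print(xml_text)
```

Fields with no counterpart in the target standard are dropped here; a real crosswalk would need an explicit policy for them.
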
METADATA & CERIF: CERIF: What Next? Associative restrictive Metadata
ArM: define metadata standard taking into account existing ones
–Access rights
–IPR, copyright, right to use
–3rd party liability disclaimer
–Charges
XrML? (PARC, Palo Alto)

METADATA & CERIF: CERIF: What Next? Associative supportive Metadata
AsM: take current work on formal domain ontologies and push further
–Using pre-existing ontologies when relevant
–Providing crosswalks to related ontologies
DAML/OIL

CONCLUSIONS & RECOMMENDATIONS: CERIF: Good Basis
CERIF already provides a good metadata standard
–Formally defined
–Proper subset of export and full CRIS models
–Recognised by EU

CONCLUSIONS & RECOMMENDATIONS: CERIF Metadata Provides
CERIF Metadata already provides to a large extent the facility for:
–Data quality
–Access
–Understanding answers
–Improving Queries
–Interoperability with other CRISs
–Interoperability with other Systems e.g.
  Local management information systems
  Bibliographic systems
  Scientific data systems

CONCLUSIONS & RECOMMENDATIONS: More can be done
But can do more
–Improvements proposed earlier
–Plenty of scope for ideas, enthusiasm
euroCRIS CERIF Task Group

CONCLUSIONS & RECOMMENDATIONS: CERIF Metadata: Use It!
[Image labels: hieroglyphics, demotic, greek]
CERIF METADATA is developed to assist:
–Quality
–Understanding (answers)
–Precision (queries)
–Interoperability
of CRISs
With acknowledgements to the British Museum