Infrastructure requirements for linked e-science: The requirements of the agINFRA VRC for e-infrastructures
Miguel-Angel Sicilia, University of Alcalá, Spain, agINFRA coordinator

Why agINFRA?

Why share data?
Sharing research data is “an intricate and difficult problem” (Borgman, 2011, JASIST).
– Not much data sharing may be taking place, with exceptions in some domains.
– Sharing takes different forms, from private data exchange to posting online, including journal supplementary materials.
– There are few standards for giving shared data the computational semantics required to build automated tools.
…however, reusing data is at the core of the principles of the scientific method… and a major concern for scientists and policy makers.

What kinds of data?
– Primary data:
  – structured data, e.g. datasets as tables;
  – digitized data: images, videos, etc.
– Secondary data: elaborations of the primary data, e.g. a dendrogram.
– Provenance information, including authors, their organizations and projects.
– Methods and procedures followed.
– Reports, including papers.
– Secondary documents, e.g. training resources.
– Metadata about all of the above.

Sharing example: EML
The Ecological Metadata Language (EML) is modular, extensible and comprehensive. Its modules cover identity and discovery information; coverage (space, time, taxa); methods; logical data model; physical data format; and access and distribution.
From: Matthew B. Jones, “Data, Metadata, and Ontology in Ecology”

EML model: attribute structure
EML describes data tables and their variables/attributes. In a typical data table with 10 attributes:
– some metadata are likely apparent, others ambiguous;
– definitions need to be explicit, and so does data typing.
[Figure: sample data table with the columns YEAR, MONTH, DATE, SITE, TRANSECT, SECTION, SP_CODE, SIZE, OBS_CODE and NOTES (values include codes such as ABUR, AHND, CLIN, OPIC, COTT and NF), annotated with: species codes, value bounds, date format, code definitions]
From: Matthew B. Jones, “Data, Metadata, and Ontology in Ecology”
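The point that definitions and data typing must be explicit can be illustrated with a small validator. This is a minimal sketch, not EML itself. The attribute specifications, the date format, the accepted code list and the SIZE bounds are all hypothetical illustrations, not values taken from the dataset.

```python
from datetime import datetime

# Hypothetical explicit definitions for three columns of the sample table.
# The date format, code list and numeric bounds are illustrative only.
ATTRIBUTES = {
    "DATE": {"type": "date", "format": "%Y-%m-%d"},
    "SP_CODE": {"type": "code", "codes": {"CLIN", "OPIC", "COTT", "NF"}},
    "SIZE": {"type": "number", "min": 0.0, "max": 100.0},
}


def validate_row(row):
    """Return a list of problems found in one table row (empty if valid)."""
    problems = []
    for name, spec in ATTRIBUTES.items():
        value = row.get(name)
        if value is None:
            problems.append(f"{name}: missing")
            continue
        if spec["type"] == "date":
            try:
                datetime.strptime(value, spec["format"])
            except ValueError:
                problems.append(f"{name}: not in format {spec['format']}")
        elif spec["type"] == "code":
            # Only codes with an explicit definition are accepted.
            if value not in spec["codes"]:
                problems.append(f"{name}: undefined code {value!r}")
        elif spec["type"] == "number":
            try:
                number = float(value)
            except ValueError:
                problems.append(f"{name}: not a number")
                continue
            if not spec["min"] <= number <= spec["max"]:
                problems.append(f"{name}: {number} outside [{spec['min']}, {spec['max']}]")
    return problems
```

For example, `validate_row({"DATE": "15/07/2001", "SP_CODE": "XXXX", "SIZE": "500"})` reports three problems. A human reading the raw table would have to guess at each of these.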

Example (i)
Air temperature at Lake Hoare:
– approximate location in […];
– temporal extent in […];
– method in human-readable form: “sample sensors every 30 seconds and send summary statistics […] to solid-state storage modules every 10 minutes”;
– instruments: at least a recognizable change of instrument is recorded: […]: Campbell Scientific 207 temp/rh probe; […] to present: Campbell Scientific 107 temp probe.
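The textual method can be made computable. Sampling every 30 seconds and storing summaries every 10 minutes means that each stored record summarizes 20 samples. The source elides which summary statistics are stored, so this sketch assumes mean, minimum and maximum purely for illustration.

```python
from statistics import mean

SAMPLE_INTERVAL_S = 30     # sensors sampled every 30 seconds (from the method text)
SUMMARY_INTERVAL_S = 600   # summaries sent to storage every 10 minutes
SAMPLES_PER_SUMMARY = SUMMARY_INTERVAL_S // SAMPLE_INTERVAL_S  # 20 samples


def summarize(samples):
    """Group consecutive samples into 10-minute windows and summarize each.

    Assumption: mean, min and max stand in for the elided statistics.
    A trailing partial window is dropped.
    """
    summaries = []
    last_start = len(samples) - SAMPLES_PER_SUMMARY
    for start in range(0, last_start + 1, SAMPLES_PER_SUMMARY):
        window = samples[start:start + SAMPLES_PER_SUMMARY]
        summaries.append({"mean": mean(window), "min": min(window), "max": max(window)})
    return summaries
```

Forty-five samples (22.5 minutes) yield two complete 10-minute summaries.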

Example (ii)
Entity name: Air_Temperature_Units. Units are correctly specified, together with range and precision.
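When a unit, a range and a precision are declared together, each recorded value can be checked mechanically. This sketch follows the spirit of EML numeric attributes, but it is not EML's schema. The unit, bounds and precision below are hypothetical values chosen for illustration. `Decimal` is used so that the precision test is exact.

```python
from decimal import Decimal

# Hypothetical declaration: unit, allowed range and precision (illustrative values).
AIR_TEMPERATURE = {
    "unit": "celsius",
    "min": Decimal("-70"),
    "max": Decimal("30"),
    "precision": Decimal("0.01"),
}


def check_value(text, spec=AIR_TEMPERATURE):
    """Return True if a recorded value lies in range and respects the precision."""
    value = Decimal(text)
    in_range = spec["min"] <= value <= spec["max"]
    # A value respects the precision if it is an integer multiple of it.
    respects_precision = value % spec["precision"] == 0
    return in_range and respects_precision
```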

The good and bad of EML
Metadata schemas such as EML:
– provide raw data, e.g. text tables, with the possibility of relating them;
– provide reasonable support for measurement units and instruments;
– are comprehensive in describing the context.
But they:
– do not reference entities and attributes formally, so a human is needed to identify them;
– do not link provenance information to other systems;
– keep methods and procedures in textual form, mixed with other information items.
EML is a good vehicle for sharing, but it still does not support computation, contrast and repetition.

Example (iii)
The dataset can be made semantically rich by adding mappings to existing ontologies.
– NASA SWEET (Semantic Web for Earth and Environmental Terminology) is a candidate.
Map the entity name to appropriate ontology terms:
– SWEET: Temperature, a ThermodynamicProperty, for the characteristic (the attribute definition in EML);
– SWEET: Atmosphere, a PlanetaryRealm, for the entity;
– measurement conditions refining the attribute definition require new definitions in SWEET.
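The mappings can be written as plain (subject, predicate, object) triples. In this sketch the `sweet:` names follow the slide text, and `"a"` is used as on the slide, where it is Turtle's shorthand for `rdf:type`. In SWEET itself these may be subclass relations, and exact SWEET IRIs are not reproduced. The `ex:` identifiers for the EML parts and the `ex:mapsTo…` predicates are hypothetical.

```python
# Mapping triples with prefixed names; "ex:" names are hypothetical.
MAPPINGS = [
    ("sweet:Temperature", "a", "sweet:ThermodynamicProperty"),
    ("sweet:Atmosphere", "a", "sweet:PlanetaryRealm"),
    # The EML attribute definition maps to a characteristic...
    ("ex:airTempAttributeDefinition", "ex:mapsToCharacteristic", "sweet:Temperature"),
    # ...and the EML entity name maps to an entity.
    ("ex:airTempEntityName", "ex:mapsToEntity", "sweet:Atmosphere"),
]

MAPPING_PREDICATES = ("ex:mapsToCharacteristic", "ex:mapsToEntity")


def describe(subject):
    """Follow a mapping from a dataset part to its ontology term and that term's kinds."""
    for s, p, term in MAPPINGS:
        if s == subject and p in MAPPING_PREDICATES:
            kinds = [k for s2, p2, k in MAPPINGS if s2 == term and p2 == "a"]
            return term, kinds
    return None
```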

Example (iv)
Make the entity concrete:
– E = #Realm[#partOf #Atmosphere] [boundedBy ]
Further classify the entity: E boundedBy #Lake
Make the measurement concrete:
– M = #Temperature[measurementCondition #Altitude 3 m]
– M #measuredBy #Instrument[commercialName = “Campbell Scientific 107 temp probe”]
The relation between E and M (and the unit expression) already exists in OBOE (an observation ontology): concretely, an observation that has ofEntity E and hasMeasurement M’ ofCharacteristic M.
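The concrete entity, measurement and OBOE-style observation can be sketched as triples that a program can traverse. The property names come from the slide. The `ex:` identifiers (observation, E, M, M’, the altitude condition and the probe) and `ex:value` are hypothetical, and this is not OBOE's published vocabulary.

```python
def observation_triples(obs_id="ex:obs1"):
    """Triples for one observation; all ex: identifiers are hypothetical."""
    E, M, m = "ex:E", "ex:M", "ex:mPrime"  # entity, characteristic, measurement M'
    return [
        (E, "a", "#Realm"),
        (E, "#partOf", "#Atmosphere"),
        (E, "boundedBy", "#Lake"),
        (M, "a", "#Temperature"),
        (M, "measurementCondition", "ex:altitude3m"),
        ("ex:altitude3m", "a", "#Altitude"),
        ("ex:altitude3m", "ex:value", "3 m"),
        (M, "#measuredBy", "ex:probe107"),
        ("ex:probe107", "commercialName", "Campbell Scientific 107 temp probe"),
        (obs_id, "ofEntity", E),
        (obs_id, "hasMeasurement", m),
        (m, "ofCharacteristic", M),
    ]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is present."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def instrument_for(triples, obs):
    """Walk obs -> hasMeasurement -> ofCharacteristic -> measuredBy -> commercialName."""
    names = []
    for m in objects(triples, obs, "hasMeasurement"):
        for c in objects(triples, m, "ofCharacteristic"):
            for inst in objects(triples, c, "#measuredBy"):
                names.extend(objects(triples, inst, "commercialName"))
    return names
```

Because the instrument is now a node rather than free text, "which probe produced this observation?" becomes a graph walk instead of a reading task.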

Example (v)
– The entity is expressed unambiguously.
– This corrects the misuse of “air temperature” as the entity, when it is actually the attribute measured.
– The expression of measurement conditions becomes formal.

Example (vi)
The mapping enables different kinds of matching between datasets.
Entity matching:
– measurements for “atmosphere segments at the same latitude”;
– approximate matching “at similar latitude”.
Measurement matching:
– measurements equivalent to M “with similar precision” (requires a detailed model of instruments);
– measurements other than M for entities like E (“atmospheric regions bounded by a lake”);
– all measurements of M in the temporal scale of […].
All of the above can be expressed in triple query languages such as SPARQL.
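The "measurements other than M for entities like E" case can be expressed as a basic graph pattern plus a filter, which is how SPARQL would phrase it. This minimal in-memory matcher is only a stand-in for a real SPARQL engine and is not one. The observations and the `ex:` names, including `ex:Humidity` and `ex:Hydrosphere`, are hypothetical.

```python
# Hypothetical observations in OBOE-like form.
TRIPLES = [
    ("ex:obs1", "ofEntity", "ex:E1"), ("ex:E1", "#partOf", "#Atmosphere"),
    ("ex:E1", "boundedBy", "#Lake"), ("ex:obs1", "hasMeasurement", "ex:m1"),
    ("ex:m1", "ofCharacteristic", "#Temperature"),
    ("ex:obs2", "ofEntity", "ex:E2"), ("ex:E2", "#partOf", "#Atmosphere"),
    ("ex:E2", "boundedBy", "#Lake"), ("ex:obs2", "hasMeasurement", "ex:m2"),
    ("ex:m2", "ofCharacteristic", "ex:Humidity"),
    ("ex:obs3", "ofEntity", "ex:E3"), ("ex:E3", "#partOf", "ex:Hydrosphere"),
    ("ex:obs3", "hasMeasurement", "ex:m3"), ("ex:m3", "ofCharacteristic", "ex:Humidity"),
]

# Roughly: SELECT ?obs ?c WHERE { ?obs :ofEntity ?e . ?e :partOf :Atmosphere .
#   ?e :boundedBy :Lake . ?obs :hasMeasurement ?m . ?m :ofCharacteristic ?c }
LIKE_E = [
    ("?obs", "ofEntity", "?e"),
    ("?e", "#partOf", "#Atmosphere"),
    ("?e", "boundedBy", "#Lake"),
    ("?obs", "hasMeasurement", "?m"),
    ("?m", "ofCharacteristic", "?c"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in new and new[term] != value:
                return None
            new[term] = value
        elif term != value:
            return None
    return new


def query(triples, patterns):
    """Evaluate a conjunction of triple patterns, one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


def measurements_other_than(triples, characteristic):
    """Observations of entities like E with a different characteristic (a SPARQL FILTER)."""
    return sorted(b["?obs"] for b in query(triples, LIKE_E) if b["?c"] != characteristic)
```

`measurements_other_than(TRIPLES, "#Temperature")` returns only `ex:obs2`. The hydrosphere observation never matches the entity pattern.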

agINFRA – the linked data view (i)
The above can be achieved through tools that progressively help refine metadata into more formal representations. Sharing can be enhanced via linked data, i.e. using RDF(S) combined with terminologies/ontologies.

agINFRA and data
[Figure: components shown: data sources (EML LTER node, FAO repository, …); triplification (to RDF); bootstrapping (concept identification, automated tagging, etc.); concept/KOS server (with mappings); exposure (virtual data INFRA layer); service registry (agINFRA RING)]
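The triplification and automated tagging steps can be sketched together. A flat metadata record becomes N-Triples lines, and a naive label match against a concept scheme adds subject links. The base IRI, the per-field predicate IRIs and the two-entry concept scheme are hypothetical. Only `http://purl.org/dc/terms/subject` (Dublin Core's `dcterms:subject`) is a real vocabulary term. Real bootstrapping would use far better concept identification than substring matching.

```python
BASE = "http://example.org/"  # hypothetical base IRI

# Hypothetical concept scheme: lower-case label -> concept IRI.
KOS_LABELS = {
    "air temperature": BASE + "kos/airTemperature",
    "lake": BASE + "kos/lake",
}
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def literal(text):
    """Quote a string as an N-Triples literal, escaping \\, " and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triplify(dataset_id, record):
    """Turn a flat metadata record into N-Triples lines and tag concepts in the title."""
    subject = f"<{BASE}dataset/{dataset_id}>"
    lines = [f"{subject} <{BASE}eml/{field}> {literal(value)} ." for field, value in record.items()]
    title = record.get("title", "").lower()
    for label, concept in sorted(KOS_LABELS.items()):
        if label in title:  # naive concept identification by label
            lines.append(f"{subject} <{DCTERMS_SUBJECT}> <{concept}> .")
    return lines
```

Calling `triplify("lh1", {"title": "Air temperature at Lake Hoare"})` yields one literal triple plus two concept tags.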

Why is linked data not enough?
Linked data is only a set of conventions for publishing semantically rich data on the Web; it allows data to be expressed in relation to ontologies. But a linked data (LD) endpoint does not necessarily:
– support computation beyond SPARQL queries;
– support high traffic;
– be reliable and robust;
– be scalable;
– provide services explicitly targeted at researchers.
Nor does it support the full lifecycle across datasets; see Bechhofer et al. (2010), “Why Linked Data is Not Enough for Scientists”.

The complete picture

What are the requirements (for infrastructures)?

Two sources of requirements
KOS (knowledge organization system) maintenance and use:
– Storage: distributed, heterogeneous, replicated?
– Harmonization: mapping, multiple representations.
– KOS retrieval: bulk, navigation using structure, free (SPARQL).
– Evolution: bulk update, lazy clients.
KOS-enabled processing:
– Dataset management
– Schema management
– Retrieval: bulk, distributed query (SPARQL)
– Research support: tools, instrumentation, scripts
– Meta-analysis: dataset alignment, contrast
– Replication: workflow
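Navigation using structure, combined with harmonization through mappings, is a traversal problem. This sketch does a breadth-first walk over narrower-concept links and also follows cross-scheme equivalence mappings, which in SKOS would be `skos:narrower` and `skos:exactMatch`. The concept scheme fragment and all concept names are hypothetical. For simplicity, mappings are followed in one direction only, even though `skos:exactMatch` is symmetric.

```python
from collections import deque

# Hypothetical concept scheme fragment: concept -> narrower concepts.
NARROWER = {
    "kos:Temperature": ["kos:AirTemperature", "kos:WaterTemperature"],
    "kos:AirTemperature": ["kos:SurfaceAirTemperature"],
}
# Hypothetical mappings to another scheme (skos:exactMatch in SKOS terms).
EXACT_MATCH = {"kos:AirTemperature": ["other:AtmosphericTemperature"]}


def expand(concept):
    """Breadth-first traversal of narrower concepts, adding mapped equivalents."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for nxt in NARROWER.get(current, []) + EXACT_MATCH.get(current, []):
            if nxt not in seen:  # guards against cycles in large terminologies
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

In a distributed setting, each `NARROWER.get` would become a call to a concept server. This is why the availability of KOS servers becomes an infrastructure requirement.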

Example

Example: search
Two demanding processes:
– traversal of large terminologies;
– search on large and distributed metadata (triple) stores.
This introduces a requirement for high availability of concept (KOS) servers.
– Scalability of RDF search: using cluster computing models such as MapReduce?
– Navigation: how to support reliable links between systems?
– Building massive metadata repositories, or implementing a distributed search protocol?
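The MapReduce option can be pictured with a single-process simulation over partitioned triple stores. The map step runs independently on each partition and emits (dataset, concept) pairs for triples tagged with a wanted concept. The reduce step groups those pairs by dataset. This is a sketch of the programming model only, with no cluster framework involved. The partitions, the `dcterms:subject` tagging convention and the concept names are assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_partition(triples, concepts):
    """Map step: emit (subject, concept) for triples tagged with a wanted concept."""
    for s, p, o in triples:
        if p == "dcterms:subject" and o in concepts:
            yield s, o


def reduce_pairs(pairs):
    """Reduce step: group the emitted concepts by dataset."""
    grouped = defaultdict(set)
    for subject, concept in pairs:
        grouped[subject].add(concept)
    return dict(grouped)


def search(partitions, concepts):
    """Map over every partition (in parallel on a real cluster), then reduce."""
    mapped = chain.from_iterable(map_partition(part, concepts) for part in partitions)
    return reduce_pairs(mapped)
```

The set of concepts to search for would typically come from a terminology traversal first. That is where the two demanding processes meet.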

Example: repetition (i)
– Check the model of temperature decrease in Doran et al. (2002).
– Extend and repeat it automatically with new data (same entity).
– Mix with observations from nearby places (different entity, same characteristic).
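Repeating the check automatically requires the trend test itself to be computable. This sketch fits an ordinary least-squares slope to a temperature series and classifies it as Growing or Decreasing. It is not the method of Doran et al. The `NoTrend` label and the `tolerance` parameter are hypothetical additions, and the example numbers are synthetic, not Lake Hoare data.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_trend(xs, ys, tolerance=0.0):
    """Label a series Growing, Decreasing or (hypothetical) NoTrend by its slope."""
    slope = ols_slope(xs, ys)
    if slope > tolerance:
        return "Growing", slope
    if slope < -tolerance:
        return "Decreasing", slope
    return "NoTrend", slope
```

With synthetic yearly means `[-20.0, -20.5, -21.0, -21.5]` the slope is -0.5 per year, so the series is labelled Decreasing. Appending new years and calling `classify_trend` again is the "extend and repeat" step.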

Example: repetition (ii)
The data is semantically identified… but what about the objectives/methods? Following the previous example:
– Hypothesis: “TemperatureSeries of E is Growing[Decreasing]” (a classification of #DynamicPropertySeries). The assertions of the hypothesis are generated outside the formal ontology language.
– More general hypothesis: “TemperatureSeries(?t) of AtmospherePart(?a) -> Growing(?t)”. Since E is #partOf #Atmosphere, the hypothesis should hold for all transitively related parts.
– What happens with the outcome of the rule and of the computational mechanism?
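The general hypothesis applies to every entity that is transitively `#partOf #Atmosphere`. This sketch computes that closure and then fires the rule, asserting Growing for each such entity's temperature series. A rule engine or OWL reasoner would do this in practice. The entity and series identifiers, such as `ex:E_lower` and `ex:t1`, are hypothetical.

```python
def transitive_parts(part_of, whole):
    """All entities that are (transitively) #partOf the given whole."""
    parts, frontier = set(), [whole]
    while frontier:
        current = frontier.pop()
        for child, parent in part_of:
            if parent == current and child not in parts:
                parts.add(child)
                frontier.append(child)
    return parts


def apply_general_hypothesis(part_of, series_of):
    """Rule: TemperatureSeries(?t) of AtmospherePart(?a) -> Growing(?t)."""
    atmosphere_parts = transitive_parts(part_of, "#Atmosphere")
    return {(t, "Growing") for a, t in series_of.items() if a in atmosphere_parts}


# Hypothetical facts: (part, whole) pairs and entity -> temperature series.
PART_OF = [("ex:E", "#Atmosphere"), ("ex:E_lower", "ex:E"), ("ex:Sea", "ex:Hydrosphere")]
SERIES_OF = {"ex:E": "ex:t1", "ex:E_lower": "ex:t2", "ex:Sea": "ex:t3"}
```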

Example: repetition (iii)
Requirements:
– Define dynamic properties of measurements: Growing disjointWith Decreasing.
– Define techniques for generating the properties of the series, in this case regression; this requires a model of regression methods and parameters that can be used, e.g., for generating MATLAB or R scripts.
– Define rules with the general hypothesis.
– These will generate facts such as Decreasing(?t) that produce an inconsistency when reasoning!
…obviously this does not exhaust all the cases.
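The inconsistency arises when the rule asserts Growing(?t) for a series while the regression derives Decreasing(?t) for the same series, given that Growing is disjoint with Decreasing. This sketch only checks pairwise class disjointness over a set of (series, class) facts. An OWL reasoner handles this and much more. The facts and series identifiers are hypothetical.

```python
from collections import defaultdict

# Pairs of classes declared disjoint (Growing disjointWith Decreasing).
DISJOINT = {frozenset({"Growing", "Decreasing"})}


def inconsistencies(facts):
    """Return the series asserted to belong to two disjoint classes."""
    classes = defaultdict(set)
    for series, cls in facts:
        classes[series].add(cls)
    return sorted(series for series, cs in classes.items()
                  if any(pair <= cs for pair in DISJOINT))


# Rule output (Growing) merged with a hypothetical regression result (Decreasing).
FACTS = {("ex:t1", "Growing"), ("ex:t2", "Growing"), ("ex:t1", "Decreasing")}
```

Here `inconsistencies(FACTS)` flags `ex:t1`: the general hypothesis is contradicted by that series. The infrastructure then has to decide what to do with such an outcome.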

Final remarks
– agINFRA aims to develop a linked data infrastructure.
– Linked data exposure is just the basic sharing mechanism.
– Requirements for infrastructure derive from the commitment to linked data and shared semantics.

References
Borgman, C. L. (2011, submitted). The conundrum of sharing research data. Journal of the American Society for Information Science and Technology.
Bechhofer, S., Ainsworth, J., Bhagat, J., Buchan, I., Couch, P., Cruickshank, D., Delderfield, M., Dunlop, I., Gamble, M., Goble, C., Michaelides, D., Missier, P., Owen, S., Newman, D., De Roure, D. and Sufi, S. (2010). Why Linked Data is Not Enough for Scientists. In: Sixth IEEE e-Science Conference (e-Science 2010), December 2010, Brisbane, Australia.