1 Semantic Web and Retrieval of Scientific Data Semantics
Goran Soldar, University of Brighton, UK
Dan Smith, University of East Anglia, UK

2 Introduction
Semantic Web
- Introduced by Tim Berners-Lee
- Data and resources are described, interchanged, and processed
- Machine understanding of heterogeneous data
- Most search engines on the Web are oriented towards human use
- Finding and processing scientific data on the Web is a time-consuming process
Example
- Search: Web pages containing the word "temperature"
- Search engine: Google
- Search domain: www.cru.uea.ac.uk
- Results: 773 web pages

3 Inefficiency of the traditional search
- Humans have to browse through web pages
- No guarantee that the wanted information will be found
Preferred approach
- Describe the semantics of data in RDF/XML format
- Store the data in a DBMS
- Automatically retrieve the desired information based on user requests
- Enable client machines to learn the semantics of data described in RDF

4 Introduction
Objectives of this work
- To address the problem of extracting semantics from data files within the meteorology domain.
- To build an ontology for the meteorology domain.
- To create semantic cases with RDF Model/RDF Schema.
- To employ the DB2 DBMS as the data repository.
- To enhance a standard DBMS with an RDF Triples Engine.
- To manage the RDF graph structure with the RDF Triples Engine.

5 RDF and Domain Ontology
- RDF is a framework for describing metadata.
- It enables interoperability between machines by interchanging information about information resources.
- It is represented as a directed labelled graph.
RDF structure:
  Resource (Subject)   Property (Predicate)   Value (Object)
  File                 Name                   ltgrid.dat
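The resource/property/value structure on this slide can be sketched as a small data type. This is only an illustration: the `Triple` class and its field names are assumptions made for the example, and only the sample values (File, Name, ltgrid.dat) come from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the directed labelled graph (hypothetical helper type)."""
    subject: str    # the resource being described
    predicate: str  # the property, i.e. the edge label
    obj: str        # the value of the property


# The example from the slide: the resource "File" has the name "ltgrid.dat".
t = Triple(subject="File", predicate="Name", obj="ltgrid.dat")
print(f"{t.subject} --{t.predicate}--> {t.obj}")
```

A set of such triples is one way to hold the whole graph, because every edge is described completely by its three parts.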

6 RDF and Domain Ontology
- Specific domains are represented with RDF
- Our focus: the Meteorology domain
- The concepts, semantics and relations between the concepts are defined with RDF Schema
- Ontology: an explicit specification of an information domain
- RDF Schema:
  - Uses the syntax of RDF Model
  - Corresponds to XML's DTD or XML Schema
  - Is a basis for RDF instances

7 Modelling RDF Model for Meteorology
Three phases of modelling
- Development of the vocabulary (ontology)
- Design of semantic cases to capture resource descriptions
- Creation of semantic case instances
Notes
- The vocabulary comprises the main concepts of the domain, represented by classes and properties
- RDF Schema uses the RDF Model encoding syntax
- rdf:type distinguishes RDF classes from properties
- rdfs:subClassOf expresses inheritance relationships between RDF classes
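To illustrate how rdfs:subClassOf expresses inheritance, the sketch below walks a subclass chain upwards. The hierarchy is hypothetical: only `cru:Height` appears in the slides, while `cru:Pressure` and `cru:MetData` are invented names used to show the idea.

```python
# child class -> parent class, as would be stated by rdfs:subClassOf.
# Hypothetical hierarchy: only cru:Height is taken from the presentation.
SUBCLASS_OF = {
    "cru:Height": "cru:Pressure",
    "cru:Pressure": "cru:MetData",
}


def ancestors(cls, subclass_of):
    """Return every class that `cls` inherits from, nearest first."""
    chain = []
    seen = {cls}
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls in seen:  # guard against an accidental cycle
            break
        seen.add(cls)
        chain.append(cls)
    return chain


print(ancestors("cru:Height", SUBCLASS_OF))
```

An instance typed as `cru:Height` would then also count as an instance of every class in the returned chain.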

8 Modelling RDF Model for Meteorology
The Meteorology domain at cru.sys.uea.ac.uk:
- Contains about 1000 data files
- Is made up of 9 meteorological topics (sub-domains)
- Has all sub-domains designed as RDF classes
- Has all concepts and elements defined in its namespace
The ontology is defined in two RDF files:
- Class.rdf
- Property.rdf
Semantic cases
- Semantic cases are based on the existing vocabulary
- Simple semantic cases are designed first
- Complex cases are combinations of simple ones

9 Modelling RDF Model for Meteorology
Our prototype model:
- Describes 100 data sets
- Contains 4 semantic cases
The semantic cases
- HeaderCase: URL, FormatType, DataParameter, Comment, Domain
- SizeCase: Compression, FileSize, Value
- ObservationCase: Frequency, TimePeriod, Value
- PeriodCase: TimeRange, TimePeriod, Value
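The four semantic cases can be read as named field lists. The following sketch records them in a dictionary and checks a candidate instance for missing fields. The `missing_fields` helper is an assumption made for illustration, and the HeaderCase values are the ones shown on the slide with the HeaderCase instance.

```python
# Field lists copied from the slide; the dictionary layout is an assumption.
SEMANTIC_CASES = {
    "HeaderCase": ("URL", "FormatType", "DataParameter", "Comment", "Domain"),
    "SizeCase": ("Compression", "FileSize", "Value"),
    "ObservationCase": ("Frequency", "TimePeriod", "Value"),
    "PeriodCase": ("TimeRange", "TimePeriod", "Value"),
}


def missing_fields(case_name, instance):
    """List the fields of the named case that the instance does not supply."""
    return [f for f in SEMANTIC_CASES[case_name] if f not in instance]


header = {
    "URL": "http://www.cru.uea.ac.uk/cru/pressure/hgt/hgt1000_6h",
    "FormatType": "ASCII",
    "DataParameter": "GeopotentialHeight_AtPressure",
    "Comment": "6-Hourly GeopotentialHeight at 1000mb",
    "Domain": "cru:Height",
}

print(missing_fields("HeaderCase", header))
print(missing_fields("SizeCase", {"Compression": "Compressed"}))
```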

10 Modelling RDF Model for Meteorology
RDF instance of HeaderCase for a data file:
  URL            http://www.cru.uea.ac.uk/cru/pressure/hgt/hgt1000_6h
  FormatType     ASCII
  DataParameter  GeopotentialHeight_AtPressure
  Comment        6-Hourly GeopotentialHeight at 1000mb
  Domain         cru:Height

11 From RDF to Relational Model
Our prototype model:
- Comprises 12 RDF files
- One holds the semantic case descriptions
- Two hold the RDF Schema descriptions
- Nine contain RDF instances of semantic cases
Management of RDF-described data
- W3C does not recommend any method for manipulating RDF triples
- The RDF structure is similar to XML
- XML comes with APIs for data manipulation (SAX, DOM); RDF does not

12 Modelling RDF Model for Meteorology
Mapping the RDF model for Meteorology into an RDBMS
[Figure: components labelled CRU Meteorological Domain, Ontology, Semantic Cases, RDF Triples Model, SiRPAC, RDF Triple Engine, DB2]
- We use the RDF triple structure to manipulate the data
- XML parsers check the syntax of the RDF
- An RDF parser converts it into triples
- RDF tags are removed
- Triples are converted into the relational model
- Triples are stored in the DB2 DBMS
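The pipeline on this slide (check the XML, convert to triples, store them relationally) can be sketched end to end. This is a stand-in, not the authors' implementation: Python's `xml.etree.ElementTree` replaces SiRPAC, an in-memory SQLite database replaces DB2, the `cru` namespace URI is a placeholder, and the sample document only handles flat `rdf:Description` elements.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRU_NS = "http://example.org/cru#"  # placeholder: the real URI is not given

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cru="{CRU_NS}">
  <rdf:Description rdf:about="hgt.1958.1000.6h.w1.53x21.dat.gz">
    <cru:FormatType>ASCII</cru:FormatType>
    <cru:DataParameter>GeopotentialHeight_AtPressure</cru:DataParameter>
  </rdf:Description>
</rdf:RDF>"""


def rdf_to_triples(rdf_xml):
    """Parse flat RDF/XML into (Property, Resource, Value) tuples."""
    root = ET.fromstring(rdf_xml)  # raises ParseError on malformed XML
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        resource = desc.attrib[f"{{{RDF_NS}}}about"]
        for prop in desc:
            # Swap the expanded namespace back to the short cru: prefix.
            name = prop.tag.replace(f"{{{CRU_NS}}}", "cru:")
            triples.append((name, resource, (prop.text or "").strip()))
    return triples


def store(triples):
    """Load the triples into a three-column relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MetInstance (Property TEXT, Resource TEXT, Value TEXT)"
    )
    conn.executemany("INSERT INTO MetInstance VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


conn = store(rdf_to_triples(RDF_XML))
rows = conn.execute(
    "SELECT Property, Value FROM MetInstance ORDER BY Property"
).fetchall()
print(rows)
```

Once the RDF tags are gone, each row carries one complete statement, so ordinary SQL can select and filter the triples.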

13 Modelling RDF Model for Meteorology

14 Retrieval of Semantic Information
- The RDF Triple Engine (RTE) is responsible for manipulating triples and executing semantic queries
- It is based on a client/server architecture with specialised RDF servers
- Records in the DBMS have a graph structure
- They are not semantically atomic
- Additional query processing is added to RTE
- RTE is aware of the graph structure of the triples
- It can produce results that reconstruct the graph structure and present them in a format specified by users

15 Retrieval of Semantic Information
[Figure: RDF graph for the Weather domain, with nodes temperature, weather, daily, file, ltgrid.dat, www.cru.uea.ac.uk, size_id, 40 and Kb, and edges domain, frequency, recorded, name, url, size, value and unit]
Relational structure of the RDF graph:
  Property   Resource     Value
  frequency  temperature  daily
  domain     temperature  weather
  recorded   temperature  file
  name       file         ltgrid.dat
  url        file         www.cru.uea.ac.uk
  size       file         size_id
  value      size_id      40
  unit       size_id      Kb
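Because a value such as `file` or `size_id` is itself a resource with rows of its own, a single row is not semantically complete. The sketch below rebuilds the nested structure from the relational rows of this slide. The `expand` function is a hypothetical helper, and it assumes the graph has no cycles.

```python
# (Property, Resource, Value) rows copied from the slide.
ROWS = [
    ("frequency", "temperature", "daily"),
    ("domain", "temperature", "weather"),
    ("recorded", "temperature", "file"),
    ("name", "file", "ltgrid.dat"),
    ("url", "file", "www.cru.uea.ac.uk"),
    ("size", "file", "size_id"),
    ("value", "size_id", "40"),
    ("unit", "size_id", "Kb"),
]


def expand(resource, rows):
    """Collect a resource's properties, following values that are resources."""
    resources = {res for _, res, _ in rows}
    result = {}
    for prop, res, value in rows:
        if res == resource:
            # A value that also occurs as a resource is non-atomic: recurse.
            result[prop] = expand(value, rows) if value in resources else value
    return result


print(expand("temperature", ROWS))
```

Starting from `temperature`, the traversal reaches the file description and, through it, the size node with its value and unit.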

16 Retrieval of Semantic Information
RDF instance "MetInstance" converted into a relational table:
  Property           Resource                          Value
  cru:URL            hgt.1958.1000.6h.w1.53x21.dat.gz  http://www.cru.uea.ac.uk/cru/data/ncep/window1/6hourly/pressure/hgt/hgt1000_6h
  cru:FormatType     hgt.1958.1000.6h.w1.53x21.dat.gz  ASCII
  cru:DataParameter  hgt.1958.1000.6h.w1.53x21.dat.gz  GeopotentialHeight_AtPressure
  rdfs:comment       hgt.1958.1000.6h.w1.53x21.dat.gz  6-Hourly GeopotentialHeight at 1000mb
  rdfs:domain        hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height
  rdf:type           cru:Height#genid2                 rdf:Seq
  rdf:_1             cru:Height#genid2                 Compressed
  rdf:_2             cru:Height#genid2                 Kilobyte
  rdf:_3             cru:Height#genid2                 2593
  cru:size           hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height#genid2
  rdf:type           cru:Height#genid3                 rdf:Seq
  rdf:_1             cru:Height#genid3                 Frequency
  rdf:_2             cru:Height#genid3                 Hour
  rdf:_3             cru:Height#genid3                 6
  cru:observation    hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height#genid3
  rdf:type           cru:Height#genid4                 rdf:Seq
  rdf:_1             cru:Height#genid4                 TimeRange
  rdf:_2             cru:Height#genid4                 Year
  rdf:_3             cru:Height#genid4                 1958
  cru:period         hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height#genid4

17 Retrieval of Semantic Information
- RTE relies on the SQL query processor to extract relevant triples
- A prototype Semantics Retrieval Language (SRL) has been developed
- SRL has an SQL-like syntax
Example
  DESCRIBE RESOURCE "hgt.1958.1000.6h.w1.53x21.dat.gz";
Processing of the above SRL query
Step 1: Transform the query into a standard SQL statement and submit it to DB2
  SELECT * FROM MetInstance WHERE RESOURCE = 'hgt.1958.1000.6h.w1.53x21.dat.gz';
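Step 1 can be sketched as a small translator. The presentation shows only one SRL statement, so the regular expression below covers just the `DESCRIBE RESOURCE "…";` form. The `srl_to_sql` function is hypothetical, and an in-memory SQLite database stands in for DB2. The resource name is passed as a bound parameter rather than pasted into the SQL text.

```python
import re
import sqlite3

# Matches only the one statement form shown in the presentation.
DESCRIBE = re.compile(
    r'^\s*DESCRIBE\s+RESOURCE\s+"([^"]+)"\s*;\s*$', re.IGNORECASE
)


def srl_to_sql(srl):
    """Translate a DESCRIBE RESOURCE statement into SQL plus parameters."""
    match = DESCRIBE.match(srl)
    if match is None:
        raise ValueError(f"unsupported SRL statement: {srl!r}")
    return "SELECT * FROM MetInstance WHERE Resource = ?", (match.group(1),)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MetInstance (Property TEXT, Resource TEXT, Value TEXT)"
)
conn.executemany(
    "INSERT INTO MetInstance VALUES (?, ?, ?)",
    [
        ("cru:FormatType", "hgt.1958.1000.6h.w1.53x21.dat.gz", "ASCII"),
        ("rdf:_3", "cru:Height#genid2", "2593"),
    ],
)

sql, params = srl_to_sql('DESCRIBE RESOURCE "hgt.1958.1000.6h.w1.53x21.dat.gz";')
result = conn.execute(sql, params).fetchall()
print(result)
```

The query returns only the rows whose Resource column matches, so rows belonging to generated nodes such as `cru:Height#genid2` have to be fetched in a further step.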

18 Retrieval of Semantic Information
Step 2: RTE applies the following rules to generate XML as the output:
1. Extract the namespace prefixes and generate an XML namespace node.
2. For every (real) atomic value, create an XML element, using the Property value as the element name.
3. For every non-atomic value, create XML nodes as sub-elements of the resource where it appears as a value.
4. If the node type is a Seq container, ensure that all its elements are ordered.
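The four rules can be sketched over a few rows of the MetInstance table. This is one possible reading of the rules, not the RTE implementation: the output layout (an `rdf:Description` root), the `build` and `describe` helpers, and the `cru` namespace URI are all assumptions, and the rows are deliberately listed out of order to exercise rule 4.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "cru": "http://example.org/cru#",  # placeholder: the real URI is not given
}

FILE = "hgt.1958.1000.6h.w1.53x21.dat.gz"
ROWS = [  # (Property, Resource, Value): a subset of the MetInstance table
    ("cru:FormatType", FILE, "ASCII"),
    ("rdf:_2", "cru:Height#genid2", "Kilobyte"),
    ("rdf:type", "cru:Height#genid2", "rdf:Seq"),
    ("rdf:_3", "cru:Height#genid2", "2593"),
    ("rdf:_1", "cru:Height#genid2", "Compressed"),
    ("cru:size", FILE, "cru:Height#genid2"),
]


def build(parent, resource, rows, resources):
    """Append one child element per row of `resource` (assumes no cycles)."""
    own = [(p, v) for p, r, v in rows if r == resource]
    if ("rdf:type", "rdf:Seq") in own:
        # Rule 4: members of a Seq container are emitted in index order.
        members = sorted(
            (pv for pv in own if pv[0].startswith("rdf:_")),
            key=lambda pv: int(pv[0][len("rdf:_"):]),
        )
        own = [pv for pv in own if not pv[0].startswith("rdf:_")] + members
    for prop, value in own:
        child = ET.SubElement(parent, prop)
        if value in resources:
            build(child, value, rows, resources)  # Rule 3: non-atomic value
        else:
            child.text = value                    # Rule 2: atomic value


def describe(resource, rows):
    root = ET.Element("rdf:Description", {"rdf:about": resource})
    prefixes = sorted({p.split(":")[0] for p, _, _ in rows} | {"rdf"})
    for prefix in prefixes:                       # Rule 1: namespace node
        root.set(f"xmlns:{prefix}", NAMESPACES[prefix])
    build(root, resource, rows, {r for _, r, _ in rows})
    return ET.tostring(root, encoding="unicode")


xml_out = describe(FILE, ROWS)
print(xml_out)
```

In the output, the size container is nested under `cru:size`, with its three members in index order even though the input rows were shuffled.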

19 Conclusion
- The RTE-DBS approach enables querying and retrieval of semantic information from scientific data files available on the Web
- The retrieved information can be further processed by a machine or used by humans
- Future work will build a user interface into RTE for maintaining individual triples, to prevent the removal of triples that are nodes
- A method for identifying the data semantics of data sets, based on reasoning over semantic cases, will be developed
