1 Introduction to the Semantic Web for Bioinfomatics Ken Baclawski Northeastern University.

Slides:



Advertisements
Similar presentations
Ontology-Based Computing Kenneth Baclawski Northeastern University and Jarg.
Advertisements

1 Probability and the Web Ken Baclawski Northeastern University VIStology, Inc.
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
19 June 2008GBC/ACM Monthly Meeting 1 The Semantic Web: Its not just for searching anymore! Ken Baclawski Northeastern University Vistology.
Requirements. UC&R: Phase Compliance model –RIF must define a compliance model that will identify required/optional features Default.
Tutorial on the Semantic Web Ken Baclawski Northeastern University Versatile Information Systems.
Using DAML format for representation and integration of complex gene networks: implications in novel drug discovery K. Baclawski Northeastern University.
CH-4 Ontologies, Querying and Data Integration. Introduction to RDF(S) RDF stands for Resource Description Framework. RDF is a standard for describing.
An Introduction to RDF(S) and a Quick Tour of OWL
CS570 Artificial Intelligence Semantic Web & Ontology 2
1 Web Data Management XML Schema. 2 In this lecture XML Schemas Elements v. Types Regular expressions Expressive power Resources W3C Draft:
Of 27 lecture 7: owl - introduction. of 27 ece 627, winter ‘132 OWL a glimpse OWL – Web Ontology Language describes classes, properties and relations.
A Probabilistic Framework for Information Integration and Retrieval on the Semantic Web by Livia Predoiu, Heiner Stuckenschmidt Institute of Computer Science,
PR-OWL: A Framework for Probabilistic Ontologies by Paulo C. G. COSTA, Kathryn B. LASKEY George Mason University presented by Thomas Packer 1PR-OWL.
Chapter 8: Web Ontology Language (OWL) Service-Oriented Computing: Semantics, Processes, Agents – Munindar P. Singh and Michael N. Huhns, Wiley, 2005.
ModelicaXML A Modelica XML representation with Applications Adrian Pop, Peter Fritzson Programming Environments Laboratory Linköping University.
Description Logics. Outline Knowledge Representation Knowledge Representation Ontology Language Ontology Language Description Logics Description Logics.
Semantics For the Semantic Web: The Implicit, the Formal and The Powerful Amit Sheth, Cartic Ramakrishnan, Christopher Thomas CS751 Spring 2005 Presenter:
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
From SHIQ and RDF to OWL: The Making of a Web Ontology Language
Describing Syntax and Semantics
BIS310: Week 7 BIS310: Structured Analysis and Design Data Modeling and Database Design.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Amarnath Gupta Univ. of California San Diego. An Abstract Question There is no concrete answer …but …
The Bayesian Web Adding Reasoning with Uncertainty to the Semantic Web
XP New Perspectives on XML Tutorial 4 1 XML Schema Tutorial – Carey ISBN Working with Namespaces and Schemas.
Introduction to the Semantic Web for Bioinfomatics Ken Baclawski Northeastern University.
RDF (Resource Description Framework) Why?. XML XML is a metalanguage that allows users to define markup XML separates content and structure from formatting.
Emerging Semantic Web Applications for the Life Sciences Ken Baclawski Northeastern University.
Ontology Development Kenneth Baclawski Northeastern University Harvard Medical School.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
Practical RDF Chapter 1. RDF: An Introduction
1 XML Schemas. 2 Useful Links Schema tutorial links:
Dr. Azeddine Chikh IS446: Internet Software Development.
Introduction to XML. XML - Connectivity is Key Need for customized page layout – e.g. filter to display only recent data Downloadable product comparisons.
The Semantic Web William M Baker
1 Representing Data with XML September 27, 2005 Shawn Henry with slides from Neal Arthorne.
Logics for Data and Knowledge Representation
Ming Fang 6/12/2009. Outlines  Classical logics  Introduction to DL  Syntax of DL  Semantics of DL  KR in DL  Reasoning in DL  Applications.
Emerging Semantic Web Commercialization Opportunities Ken Baclawski Northeastern University.
OWL 2 in use. OWL 2 OWL 2 is a knowledge representation language, designed to formulate, exchange and reason with knowledge about a domain of interest.
Of 33 lecture 3: xml and xml schema. of 33 XML, RDF, RDF Schema overview XML – simple introduction and XML Schema RDF – basics, language RDF Schema –
Metadata. Generally speaking, metadata are data and information that describe and model data and information For example, a database schema is the metadata.
Requirements for RDF Validation Harold Solbrig Mayo Clinic.
Coastal Atlas Interoperability - Ontologies (Advanced topics that we did not get to in detail) Luis Bermudez Stephanie Watson Marine Metadata Interoperability.
22 September 2009 CS 6520 Methods of Software Development CS6520 Methods of Software Development: Representing Data Kenneth Baclawski College of Computer.
EEL 5937 Ontologies EEL 5937 Multi Agent Systems Lecture 5, Jan 23 th, 2003 Lotzi Bölöni.
Ontology-Based Computing Kenneth Baclawski Northeastern University and Jarg.
PHS / Department of General Practice Royal College of Surgeons in Ireland Coláiste Ríoga na Máinleá in Éirinn Knowledge representation in TRANSFoRm AMIA.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Mining the Biomedical Research Literature Ken Baclawski.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Problems with XML & XML Schemas XML falls apart on the Scalability design goal. 1.The order in which elements appear in an XML document is significant.
Web Technologies for Bioinformatics Ken Baclawski.
QUALITY CONTROL WITH SCHEMAS CSC1310 Fall BASIS CONCEPTS SchemaSchema is a pass-or-fail test for document Schema is a minimum set of requirements.
Description of Information Resources: RDF/RDFS (an Introduction)
Of 38 lecture 6: rdf – axiomatic semantics and query.
Tutorial on the Semantic Web Ken Baclawski Northeastern University Versatile Information Systems.
The Semantic Web. What is the Semantic Web? The Semantic Web is an extension of the current Web in which information is given well-defined meaning, enabling.
OWL Web Ontology Language Summary IHan HSIAO (Sharon)
Ontology Technology applied to Catalogues Paul Kopp.
26/02/ WSMO – UDDI Semantics Review Taxonomies and Value Sets Discussion Paper Max Voskob – February 2004 UDDI Spec TC V4 Requirements.
OWL (Ontology Web Language and Applications) Maw-Sheng Horng Department of Mathematics and Information Education National Taipei University of Education.
The Semantic Web By: Maulik Parikh.
Ontology.
ece 720 intelligent web: ontology and beyond
Chapter 9 Web Services: JAX-RPC, WSDL, XML Schema, and SOAP
Ontologies and Model-Based Systems Engineering
Ontology.
Presentation transcript:

1 Introduction to the Semantic Web for Bioinfomatics Ken Baclawski Northeastern University

2 The Problems The dramatic increase of bioinformatics data available in web-based systems and databases calls for novel processing methods. The high degree of complexity and heterogeneity of bioinformatics data and analysis requires integration methods. Information must be processed by a sequence of tools that often use different formats and data semantics.

Example of a complex data format 3

4 Flat File Records Consider the following records in a flat file: What do they mean?

5 Metadata NAME LENGTH FORMAT LABEL instudy 6 MMDDYY Date of randomization into study bmi 8 Num Body Mass Index. obesity 3 0=No 1=Yes Obesity (30.0 <= BMI) ovrwt 8 0=No 1=Yes Overweight (25 <= BMI < 30) Height 3 Num Height (inches) Wtkgs 8 Num Weight (kilograms) Weight 3 Num Weight (pounds) The explanation of what data means is called metadata or “data about data.” For a flat file or database the metadata is called the schema.

6 XML Data is Self-Describing <ATTLIST Interview RandomizationDate CDATA #REQUIRED BMI CDATA #IMPLIED Height CDATA #REQUIRED >

7 Attribute Types Attributes generally contain a specific kind of data such as numbers, dates and codes. XML does not include any capability for specifying kinds of data like these. XML Schema (XSD) allows one to specify data structures and data types. The syntax for XSD differs from that for DTDs, but it is easy to convert from DTD to XSD using the dtd2xsd.pl Perl script.

8 XSD Basic Types string Arbitrary text without embedded elements. decimal A decimal number of any length and precision. integer An integer of any length. This is a special case of decimal. There are many special cases of integer, such as positiveInteger and nonNegativeInteger. date A Gregorian calendar date. time An instant of time during the day, for example, 10:00. dateTime A date and a time instance during that date. duration A duration of time. gYear A Gregorian year. gYearMonth A Gregorian year and month in that year. boolean Either true or false. anyURI A web resource.

9 Element Hierarchy Element Hierarchy 9 XML elements can contain other elements. An XML document is a hierarchy of elements. But what does the hierarchy mean?

10 Formal Semantics Semantics is primarily concerned with sameness. It determines that two entities are the same in spite of appearing to be different. Number semantics: 5.1, 5.10 and 05.1 are all the same number. DNA sequence semantics: cctggacct is the same as CCTGGACCT. XML document semantics is defined by infosets.

XML infoset for carbon monoxide root molecule atomArraybondArray bond atom o1 carbon monoxide c1 o1 m1 c1 O C id title atomRefs id elementType 11

12 The Resource Description Framework RDF is a language for representing information about resources in the web. While RDF is expressed in XML, it has different semantics. RDF decouples information from the document where it is asserted. This has many advantages for data integration and interoperability.

13 RDF Semantics All relationships are explicit and labeled with a property resource. The distinction in XML between attribute and containment is dropped, but the containment relationship must be labeled on a separate level. This is called striping.

14

15 XSD vs. RDF XML semantics is based on infosets Meaning of hierarchy is implicit Support for data structures and types Data is contextual: element and document RDF semantics is based on graphs All relationships are explicit (self-describing) Uses only XSD basic data types Data is decoupled from any context

16 XML vs. RDF Terminology XMLRDF Element TypeClass Element InstanceResource Data attributeDatatypeProperty Reference attributeObjectProperty ContainmentProperty

m1 carbon monoxide Molecule c1 o1 Atom CO Bond atom rdf:type title rdf:type bond rdfs:subClassOf <Molecule rdf:id=“m1” title=“carbon monoxide”> </Bond atomRef RDF graph for carbon monoxide 17

18 RDF Triples RDF graphs consist of edges called triples because they have three components: subject, predicate and object. The semantics of RDF is determined by the set of triples that are explicitly asserted or inferred. In the chemical example, some of the triples are: – (m1, rdf:type, cml:Molecule) – (m1, cml:title, “carbon monoxide”) – (m1, cml:atom, c1) – (m1, cml:atom, o1) Properties are many-to-many relationships.

19 Web Ontology Language OWL classes can be constructed from other classes. Resources can be can be declared (or inferred) to be the same. Class constructors and resource equivalence are useful for interoperability. Properties can be constrained to be – Functional (many-to-one) – Inverse functional (database key)

20 Class Construction Concepts are generally defined in terms of other concepts. For example: The iridocorneal endothelial syndrome (ICE) is a disease characterized by corneal endothelium proliferation and migration, iris atrophy, corneal oedema and/or pigmentary iris nevi. ICE-Syndrome class is the intersection of: – The set of all diseases – The set of things that have at least one of the four symptoms

Example of Class Construction 21

22 OWL Semantics An OWL ontology defines a theory of the world. States of the world that are consistent with the theory are called interpretations of the theory. A fact that is true in every model is said to be entailed by the theory. OWL semantics is defined by entailment. By contrast relational database semantics is defined by constraints.

23 Open vs. Closed Worlds OWL assumes an open world, while databases assume a closed world. The advantage of the open world assumption is that it is more compatible with the web where one need not know all of the facts, and new facts are continually being added. The disadvantage is that operations (such as queries) are much more computationally complex.

24 The Semantic Web and Uncertainty There are many sources of uncertainty, such as measurements, unmodeled variables, and subjectivity. The Semantic Web is based on formal logic for which one can only assert facts that are unambiguously certain. The Bayesian Web is a proposal to add reasoning about certainty to the Semantic Web. The basis for the Bayesian Web is the concept of a Bayesian network.

25 Bayesian Web facilities Common interchange format Ability to refer to common variables (diseases, drugs,...) Context specification Authentication and trust Open hierarchy of probability distribution types Component based construction of BNs BN inference engines Meta-analysis services

26 Bayesian Web Capabilities Use a BN developed by another group as easily as navigating from one Web page to another. Perform stochastic inference using information from one source and a BN from another. Combine BNs from the same or different sources. Reconcile and validate BNs.

27 Ontology Issues 1 What is the most appropriate language? – XML, RDF, OWL (Lite, DL, Full) – The choice depends on the requirements Ontology design – Classes, properties and rules What tools are appropriate? – Design tools, rule engines, theorem provers Reuse vs. interoperation

28 Ontology Issues 2 Coping with complexity – Worst cases can be very complex – In practice, processing is efficient Validation – Correctness, formal consistency Maintenance – Requirements and circumstances change

29 To Learn More For more information, see K. Baclawski and T. Niu, Ontologies for Bioinformatics, MIT Press, October, The website the book is ontobio.org.ontobio.org A longer version of this talk is available at CSB2005 Tutorial. CSB2005 Tutorial Data fusion is covered in meta-analysis.meta-analysis