The Semantic Web: New-style data-integration (and how it works for life-scientists too!) Frank van Harmelen AI Department Vrije Universiteit Amsterdam.


What’s the problem? (data-mess in bio-inf)

Source: PhRMA & FDA 2003 Pharmaceutical Productivity

The Industry’s Problem
Too much unintegrated data:
– from a variety of incompatible sources
– no standard naming convention
– each with a custom browsing and querying mechanism (no common interface)
– and poor interaction with other data sources
Source: Kenneth Griffiths and Richard Resnick, tutorial at Intelligent Systems for Molecular Biology (ISMB), 2003

What are the Data Sources?
Flat files, URLs, proprietary databases, public databases, data marts, spreadsheets, …

Sample Problem: Hyperprolactinemia
Overproduction of prolactin
– prolactin stimulates mammary gland development and milk production
Hyperprolactinemia is characterized by:
– inappropriate milk production
– disruption of the menstrual cycle
– which can lead to difficulty conceiving

Understanding transcription factors for prolactin production
“Show me all genes in the public literature that are putatively related to hyperprolactinemia, have more than 3-fold expression differential between hyperprolactinemic and normal pituitary cells, and are homologous to known transcription factors.”
This question splits into three sub-queries, each against a different kind of source:
– SEQUENCE: “Show me all genes that are homologous to known transcription factors”
– EXPRESSION: “Show me all genes that have more than 3-fold expression differential between hyperprolactinemic and normal pituitary cells”
– LITERATURE: “Show me all genes in the public literature that are putatively related to hyperprolactinemia”
The answer to the full question is the intersection of the three result sets: Q1 ∩ Q2 ∩ Q3.
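Intersecting the three result sets is exactly where the “no standard naming convention” problem bites: the same gene may come back under a different name from each source. A minimal Python sketch, assuming each sub-query has already returned a set of gene names and that a mapping table onto shared identifiers exists; all gene names and the mapping are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set


def normalise(names: Iterable[str], mapping: Dict[str, str]) -> Set[str]:
    """Map source-specific gene names to shared IDs; unknown names pass through."""
    return {mapping.get(name, name) for name in names}


def composite_query(results: List[Iterable[str]], mapping: Dict[str, str]) -> Set[str]:
    """Genes returned by every sub-query (Q1 ∩ Q2 ∩ Q3) after normalisation."""
    sets = [normalise(r, mapping) for r in results]
    return set.intersection(*sets) if sets else set()


# Placeholder results: each source uses its own naming convention.
sequence_hits = {"gene-x", "GY"}        # SEQUENCE sub-query
expression_hits = {"GX", "GZ"}          # EXPRESSION sub-query
literature_hits = {"geneX", "geneY"}    # LITERATURE sub-query

# Hypothetical mapping onto one shared naming convention.
to_shared = {"gene-x": "geneX", "GX": "geneX", "GY": "geneY"}

print(composite_query([sequence_hits, expression_hits, literature_hits], {}))         # set()
print(composite_query([sequence_hits, expression_hits, literature_hits], to_shared))  # {'geneX'}
```

Without the mapping the intersection is empty even though all three sources mention the same gene; the shared vocabulary is what makes the join possible.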

The Medical tower of Babel
– MeSH: Medical Subject Headings, National Library of Medicine; descriptions
– EMTREE: commercial (Elsevier); drugs and diseases; terms, synonyms
– UMLS: integrates 100 different vocabularies
– SNOMED: concepts; College of American Pathologists
– Gene Ontology: terms in molecular biology
– NCI Cancer Ontology: 17,000 classes (about 1M definitions)

Stitching this all together by hand? Source: Stephens et al. J Web Semantics 2006

Why would Semantic technology help?

Machine-accessible meaning (what it’s like to be a machine)
[Figure: META-DATA diagram with the terms disease, symptoms, drug administration and the relations IS-A, alleviates]

What is meta-data?
– it’s just data
– it’s data describing other data
– it’s meant for machine consumption
[Figure: example meta-data fields: disease name, symptoms, drug administration]
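To make “just data, describing other data, meant for machines” concrete: a minimal Python sketch that stores meta-data about a web page as subject/predicate/object statements, the shape RDF later standardises. The page identifier `page42`, the property names and `DRUG_X` are hypothetical.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # the thing being described (here: a web page)
    predicate: str  # the property
    obj: str        # the value


# Hypothetical meta-data about a hypothetical page "page42".
page_metadata: List[Triple] = [
    Triple("page42", "diseaseName", "hyperprolactinemia"),
    Triple("page42", "symptom", "inappropriate milk production"),
    Triple("page42", "symptom", "disruption of menstrual cycle"),
    Triple("page42", "drugAdministration", "DRUG_X"),
]


def values(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Look up all values of one property of one subject."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


print(values(page_metadata, "page42", "symptom"))
```

A program can answer “which symptoms does this page talk about?” without parsing any prose, which is the whole point of attaching meta-data.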

Required are:
1. one or more standard vocabularies, so search engines, producers and consumers all speak the same language
2. a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached
Also needed: mechanisms for attribution and trust (is this page really about Pamela Anderson?)

What are ontologies & what are they used for?
– no shared understanding: conceptual and terminological confusion
– actors: both humans and machines
– agree on a conceptualization
– make it explicit in some language
[Figure: world, concept, language]

Standard vocabularies (“ontologies”):
– identify the key concepts in a domain
– identify a vocabulary for these concepts
– identify relations between these concepts
– make these precise enough so that they can be shared between humans and humans, humans and machines, and machines and machines
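One reason to make relations between concepts precise is that a machine can then follow them. A minimal Python sketch of an ontology fragment with IS-A links and a transitive “is this a kind of that?” check; the three concepts and their links are a hypothetical fragment, not taken from any of the ontologies listed in this talk.

```python
from typing import Dict, Set

# Hypothetical ontology fragment: concept -> its direct IS-A parents.
IS_A: Dict[str, Set[str]] = {
    "hyperprolactinemia": {"endocrine disorder"},
    "endocrine disorder": {"disease"},
    "disease": set(),
}


def ancestors(concept: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Every concept reachable by following IS-A links (transitive closure)."""
    seen: Set[str] = set()
    stack = list(is_a.get(concept, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, set()))
    return seen


def is_kind_of(concept: str, general: str, is_a: Dict[str, Set[str]]) -> bool:
    """True if `concept` equals `general` or is a (possibly indirect) subconcept of it."""
    return concept == general or general in ancestors(concept, is_a)


print(is_kind_of("hyperprolactinemia", "disease", IS_A))
```

No single link says that hyperprolactinemia is a disease, yet any party (human or program) that shares this fragment draws the same conclusion.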

Biomedical ontologies (a few...)
– MeSH: Medical Subject Headings, National Library of Medicine; descriptions
– EMTREE: commercial (Elsevier); drugs and diseases; terms, synonyms
– UMLS: integrates 100 different vocabularies
– SNOMED: concepts; College of American Pathologists
– Gene Ontology: terms in molecular biology
– NCI Cancer Ontology: 17,000 classes (about 1M definitions)

Remember “required are”:
✓ one or more standard vocabularies, so search engines, producers and consumers all speak the same language
2. a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached

Stack of languages

– XML: surface syntax, no semantics
– XML Schema: describes the structure of XML documents
– RDF: data model for “relations” between “things”
– RDF Schema: RDF vocabulary definition language
– OWL: a more expressive vocabulary definition language
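At the RDF level, every statement is a triple, and a standard line-based syntax such as W3C N-Triples writes one `subject predicate object .` per line. A minimal Python sketch that serialises triples to N-Triples; the `rdfs:subClassOf` and `rdfs:label` IRIs are the real RDF Schema ones, while the `http://example.org/bio#` namespace and class names are hypothetical. It escapes literals but does not validate or escape IRIs.

```python
from typing import Iterable, Tuple, Union

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/bio#"  # hypothetical namespace for this example


class IRI(str):
    """A string marked as an IRI (serialised in <...>), not a literal."""


Term = Union[IRI, str]


def term_to_nt(term: Term) -> str:
    """Serialise one term: IRIs in angle brackets, plain strings as quoted literals."""
    if isinstance(term, IRI):
        return f"<{term}>"
    escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Tuple[Term, Term, Term]]) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ." for s, p, o in triples)


triples = [
    (IRI(EX + "Hyperprolactinemia"), IRI(RDFS + "subClassOf"), IRI(EX + "Disease")),
    (IRI(EX + "Hyperprolactinemia"), IRI(RDFS + "label"), "hyperprolactinemia"),
]
print(to_ntriples(triples))
```

Because the syntax is standard, any RDF tool can recognise these lines as meta-data, whatever the vocabulary.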

Remember “required are”:
✓ one or more standard vocabularies, so search engines, producers and consumers all speak the same language
✓ a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached

Question: who writes the ontologies?
– professional bodies, scientific communities, companies, publishers, … (see the previous slide on biomedical ontologies; the same developments are happening in many other fields)
– good old-fashioned knowledge engineering
– conversion from DB schemas, UML, etc.

Question: Who writes the meta-data?
– automated learning
– shallow natural language analysis
– concept extraction
Example: Encyclopedia Britannica on “Amsterdam”
[Figure: extracted concepts: amsterdam, trade, antwerp, europe, merchant, city, town, center, netherlands]
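The shallow end of concept extraction can be approximated, very crudely, by counting content words. A minimal Python sketch, assuming plain term frequency with a small hand-written stop-word list; this is a stand-in, not the analysis behind the Britannica example, and the sample text is made up for illustration.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "was", "to", "as", "with", "for", "on", "by"}


def extract_concepts(text: str, k: int = 5) -> List[str]:
    """Return the k most frequent content words (a crude stand-in for concept extraction)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]


sample = ("Amsterdam is a city in the Netherlands. As a merchant city, Amsterdam grew "
          "through trade with Antwerp and the rest of Europe. Merchant trade made the "
          "town a center of Europe.")
print(extract_concepts(sample, 5))
```

Real systems add stemming, phrase detection and a thesaurus, but even this surfaces terms like amsterdam, merchant and trade without any manual tagging.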

Question: Who writes the meta-data?
– exploit existing legacy data (databases, lab equipment)
– side-effect of user interaction (Amazon)
– keyword extraction
– NOT from manual effort
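Legacy databases already hold structured data, so meta-data can be generated from them mechanically. A minimal Python sketch of one naive row-to-triple mapping using the standard `sqlite3` module; the `expression` table, its columns and the `table/key` and `table#column` naming scheme are hypothetical choices for illustration.

```python
import sqlite3
from typing import List, Tuple


def table_to_triples(conn: sqlite3.Connection, table: str, key: str) -> List[Tuple[str, str, str]]:
    """Turn each row into (row-id, table#column, value) triples.

    `table` and `key` are interpolated into SQL, so they must come from trusted code.
    """
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:  # skip the key itself and NULLs
                triples.append((subject, f"{table}#{column}", str(value)))
    return triples


# Hypothetical legacy table of expression measurements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (probe TEXT PRIMARY KEY, gene TEXT, fold_change REAL)")
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                 [("p1", "GENE_A", 3.5), ("p2", "GENE_B", 1.2)])
for t in table_to_triples(conn, "expression", "probe"):
    print(t)
```

The column names become properties and each row becomes a resource; mapping those properties onto a shared ontology is the step that still needs human judgement.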

Remember “required are”:
✓ one or more standard vocabularies, so search engines, producers and consumers all speak the same language
✓ a standard syntax, so meta-data can be recognised as such
✓ lots of resources with meta-data attached

Some working examples? DOPE

DOPE: Background
Vertical information provision:
– buy a topic instead of a journal!
– the Web provides new opportunities
Business driver: drug development
– rich, information-hungry market
– good thesaurus (EMTREE)

The Data
Document repositories:
– ScienceDirect: approx. … full-text articles
– MEDLINE: approx. … abstracts
Extracted metadata:
– the Collexis Metadata Server: concept extraction ("semantic fingerprinting")
Thesauri and ontologies:
– EMTREE: … preferred terms, … synonyms
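Concept extraction against a thesaurus means mapping every synonym onto its preferred term, so documents that use different wordings get comparable fingerprints. A minimal Python sketch of that idea, assuming a tiny hypothetical synonym table (not real EMTREE entries) and plain whole-word string matching; it is not the Collexis method, whose details the slides do not give.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical thesaurus fragment: synonym (lower-case) -> preferred term.
THESAURUS: Dict[str, str] = {
    "hyperprolactinemia": "hyperprolactinemia",
    "hyperprolactinaemia": "hyperprolactinemia",
    "prolactin excess": "hyperprolactinemia",
    "galactorrhea": "galactorrhea",
    "galactorrhoea": "galactorrhea",
}


def fingerprint(text: str, thesaurus: Dict[str, str]) -> Counter:
    """Count occurrences of each preferred term, via any of its synonyms."""
    lowered = text.lower()
    counts: Counter = Counter()
    for synonym, preferred in thesaurus.items():
        hits = len(re.findall(r"\b" + re.escape(synonym) + r"\b", lowered))
        if hits:
            counts[preferred] += hits
    return counts


abstract = "Case report: hyperprolactinaemia and galactorrhea. Prolactin excess was noted."
print(fingerprint(abstract, THESAURUS))
```

Two abstracts, one written with British and one with American spelling, end up with the same preferred terms and can therefore be matched.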

Architecture:
[Figure: a query interface on top of the EMTREE thesaurus (as RDF Schema) and RDF datasources 1 … n]
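In this architecture one query interface sits over many RDF datasources. A minimal Python sketch of that fan-out, assuming each datasource is simply an in-memory list of triples and each query is a single triple pattern with `None` as a wildcard; the source names and triples are hypothetical, and a real deployment would more likely use a query language such as SPARQL.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)]


def federated_query(sources: Dict[str, List[Triple]], s: Optional[str] = None,
                    p: Optional[str] = None, o: Optional[str] = None) -> List[Tuple[str, Triple]]:
    """Send one pattern to every datasource and merge answers, remembering where each came from."""
    return [(name, t) for name, triples in sources.items() for t in match(triples, s, p, o)]


sources = {
    "datasource_1": [("doc1", "about", "hyperprolactinemia")],
    "datasource_n": [("doc7", "about", "hyperprolactinemia"),
                     ("doc8", "about", "galactorrhea")],
}
for hit in federated_query(sources, p="about", o="hyperprolactinemia"):
    print(hit)
```

Keeping the source name with each answer lets the interface show where a result came from, which matters once trust and attribution enter the picture.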

Ontology disambiguates query

Ontology groups results

Ontology clusters results

Ontology refines query
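The four slides above show the thesaurus being used to disambiguate, group, cluster and refine. Two of those uses, grouping hits under a broader term and refining a query to narrower terms, can be sketched in a few lines of Python, assuming a hypothetical broader-term table (not real EMTREE content) and results that are already indexed with one concept each.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical thesaurus fragment: term -> its broader term.
BROADER: Dict[str, str] = {
    "hyperprolactinemia": "pituitary disease",
    "acromegaly": "pituitary disease",
    "pituitary disease": "endocrine disease",
    "hypothyroidism": "endocrine disease",
}


def narrower(term: str, broader: Dict[str, str]) -> List[str]:
    """Refinement: the terms one level below `term`, offered as narrower queries."""
    return sorted(t for t, b in broader.items() if b == term)


def group_by_broader(hits: Dict[str, str], broader: Dict[str, str]) -> Dict[str, List[str]]:
    """Grouping: put each document (id -> indexed concept) under its concept's broader term."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc, concept in hits.items():
        groups[broader.get(concept, concept)].append(doc)
    return dict(groups)


print(narrower("endocrine disease", BROADER))
print(group_by_broader({"doc1": "hyperprolactinemia", "doc2": "acromegaly",
                        "doc3": "hypothyroidism"}, BROADER))
```

Both operations reuse the same broader/narrower links; only the direction of travel differs.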

Some working examples? DOPE, HCLS (…)

Architecture:
[Figure: the same query interface, now over several RDF Schema vocabularies (EMTREE, Gene Ontology, …) and RDF datasources 1 … n]

Summarising…
Data integration on the Web:
– machine-processable data besides human-processable data
Syntax for meta-data:
– representation
– inference
Vocabularies for meta-data:
– lots of them in bio-inf.
Actual meta-data:
– lots in bio-inf.
Will enable:
– better search engines (recall, precision, concepts)
– combining information across pages (inference)
– …
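Combining information across pages by inference can be illustrated with a single hand-written rule: if a drug alleviates a symptom, and that symptom is a symptom of a disease, record that the drug may help with the disease. A minimal Python sketch; the rule, the predicate names and `DRUG_X` are hypothetical, and the rule is illustrative only, not sound medical reasoning.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer_across_pages(page_a: Set[Triple], page_b: Set[Triple]) -> Set[Triple]:
    """Apply one hypothetical rule to the union of facts from two pages:
    (d, alleviates, s) and (s, symptomOf, x)  =>  (d, mayHelpWith, x)."""
    facts = page_a | page_b
    inferred: Set[Triple] = set()
    for drug, p1, symptom in facts:
        if p1 != "alleviates":
            continue
        for s2, p2, disease in facts:
            if p2 == "symptomOf" and s2 == symptom:
                inferred.add((drug, "mayHelpWith", disease))
    return inferred


page_a = {("DRUG_X", "alleviates", "galactorrhea")}
page_b = {("galactorrhea", "symptomOf", "hyperprolactinemia")}
print(infer_across_pages(page_a, page_b))
```

Neither page on its own supports the conclusion; it only appears once both pages' meta-data use the same term for the symptom.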

Things to do for you
Practical: use existing software to construct new use scenarios
Conceptual: create an ontology for some area of bio-medical expertise
– from scratch
– as a refinement of an existing ontology
Technical: transform an existing data set into meta-data format, and provide a query interface (for humans and machines)