Semantic Web: promising technologies and current applications in Health care & Life Sciences Amit Sheth Thanks: Kno.e.sis team, collaborators at CCRC,

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Taxonomy & Ontology Impact on Search Infrastructure John R. McGrath Sr. Director, Fast Search & Transfer.
Knowledge Modeling and its Application in Life Sciences: A Tale of two ontologies Bioinformatics for Glycan Expression Integrated Technology Resource for.
Trailblazing, Complex Hypothesis Evaluation, Abductive Reasoning and Semantic Web Trailblazing, Complex Hypothesis Evaluation, Abductive Reasoning and.
CVRG Presenter Disclosure Information Tahsin Kurc, PhD Center for Comprehensive Informatics Emory University CardioVascular Research Grid Core Infrastructure.
Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management Eytan Adar et al Presented by Xiao Hu CS491CXZ.
RDB2RDF: Incorporating Domain Semantics in Structured Data Satya S. Sahoo Kno.e.sis CenterKno.e.sis Center, Computer Science and Engineering Department,
Semantic Web & Semantic Web Services: Applications in Healthcare and Scientific Research International IFIP Conference on Applications of Semantic Web.
Knowledge Enabled Information and Services Science Schema-Driven Relationship Extraction from Unstructured Text Cartic Ramakrishnan Kno.e.sis Center, Wright.
1 Schema-Driven Relationship Extraction from Unstructured Text Cartic Ramakrishnan, Krys Kochut and Amit Sheth LSDIS Lab, University of Georgia, Athens,
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Knowledge Enabled Information and Services Science Semantic Web for Health Care and Biomedical Informatics Keynote at NSF Biomed Web Workshop, December.
Knowledge Enabled Information and Services Science What can SW do for HCLS today? Panel at HCSL Workshop, WWW2007 Amit Sheth Kno.e.sis Center Wright State.
Research topics Semantic Web - Spring 2007 Computer Engineering Department Sharif University of Technology.
Building Enterprise Applications Using Visual Studio ®.NET Enterprise Architect.
Sunita Sarawagi.  Enables richer forms of queries  Facilitates source integration and queries spanning sources “Information Extraction refers to the.
Fungal Semantic Web Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE) Etsuko Moriyama, Ken Nickerson, Audrey Atkin (Biological Sciences) Steve Harris.
MS DB Proposal Scott Canaan B. Thomas Golisano College of Computing & Information Sciences.
Use of Ontologies in the Life Sciences: BioPax Graciela Gonzalez, PhD (some slides adapted from presentations available at
Semantic Web Technology in Support of Bioinformatics for Glycan Expression Amit Sheth Large Scale Distributed Information Systems (LSDIS) lab, Univ. of.
Accelerate Business Success With CRM CRM Interoperability.
1 BrainWave Biosolutions Limited Accelerating Life Science Research through Technology.
1 Digital Libraries and Evidence in the Developing World Context Dr. Jon Ferguson Senior Health Database Scientist IMMPACT Project University of Aberdeen.
Overview of Search Engines
Redefining Perspectives A thought leadership forum for technologists interested in defining a new future June COPYRIGHT ©2015 SAPIENT CORPORATION.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
Ontologies: Making Computers Smarter to Deal with Data Kei Cheung, PhD Yale Center for Medical Informatics CBB752, February 9, 2015, Yale University.
Srihari-CSE730-Spring 2003 CSE 730 Information Retrieval of Biomedical Text and Data Inroduction.
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
Semantic Web applications in Financial Industry, Government, Health care and Life Sciences SWEG 2006, March 2006 Amit Sheth LSDIS Lab, Department of Computer.
Knowledge Enabled Information and Services Science GlycO.
Kno.e.sis Center, Wright State University,
Life Sciences Integrated Demo Joyce Peng Senior Product Manager, Life Sciences Oracle Corporation
SWETO: Large-Scale Semantic Web Test-bed Ontology In Action Workshop (Banff Alberta, Canada June 21 st 2004) Boanerges Aleman-MezaBoanerges Aleman-Meza,
Knowledge Representation and Indexing Using the Unified Medical Language System Kenneth Baclawski* Joseph “Jay” Cigna* Mieczyslaw M. Kokar* Peter Major.
Flexible Text Mining using Interactive Information Extraction David Milward
Helping scientists collaborate BioCAD. ©2003 All Rights Reserved.
KMS Products By Justin Saunders. Overview This presentation will discuss the following: –A list of KMS products selected for review –The typical components.
Metadata. Generally speaking, metadata are data and information that describe and model data and information For example, a database schema is the metadata.
Semantic empowerment of Life Science Applications October 2006 Amit Sheth LSDIS Lab, Department of Computer Science, University of Georgia Acknowledgement:
©Ferenc Vajda 1 Semantic Grid Ferenc Vajda Computer and Automation Research Institute Hungarian Academy of Sciences.
Knowledge Enabled Information and Services Science SAWSDL: Tools and Applications Amit P. Sheth Kno.e.sis Center Wright State University, Dayton, OH Knoesis.wright.edu.
Semantic (Web) Technology in Action - today The Semantic Web – Scientific American article considered harmful? WWW2003 Panel (PN2), Budapest, May 21, 2003.
ADVANCED DB SYSTEMS BIOMEDICAL ENGINEERING. Index INTRODUCTION  BIOMEDICAL ENGINEERING  B.E. DATASETS APPLICATIONS  DATA MINING ON FDA DATABASE  ONTOLOGY-BASED.
Knowledge Enabled Information and Services Science Glycomics project overview.
OWL Representing Information Using the Web Ontology Language.
Applying Semantic Technologies to the Glycoproteomics Domain W. S York May 15, 2006.
Introduction to the Semantic Web and Linked Data Module 1 - Unit 2 The Semantic Web and Linked Data Concepts 1-1 Library of Congress BIBFRAME Pilot Training.
Knowledge Enabled Information and Services Science Relationship Web: Realizing the Memex vision with the help of Semantic Web SemGrail Workshop, Redmond,
BBN Technologies Copyright 2009 Slide 1 The S*QL Plugin for Cytoscape Visual Analytics on the Web of Linked Data Rusty (Robert J.) Bobrow Jeff Berliner,
WEB 2.0 PATTERNS Carolina Marin. Content  Introduction  The Participation-Collaboration Pattern  The Collaborative Tagging Pattern.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Mining the Biomedical Research Literature Ken Baclawski.
Unsupervised Discovery of Compound Entities for Relationship Extraction Cartic Ramakrishnan, Pablo N. Mendes Shaojun Wang, Amit P. Sheth
Bioinformatics Research Overview Outline Biomedical Ontologies oGlycO oEnzyO oProPreO Scientific Workflow for analysis of Proteomics Data Framework for.
Proposed Research Problem Solving Environment for T. cruzi Intuitive querying of multiple sets of heterogeneous databases Formulate scientific workflows.
Interface for Glyco Vault Functionality and requirements. Initial proposal. Maciej Janik.
1 Open Ontology Repository initiative - Planning Meeting - Thu Co-conveners: PeterYim, LeoObrst & MikeDean ref.:
Achieving Semantic Interoperability at the World Bank Designing the Information Architecture and Programmatically Processing Information Denise Bedford.
Clinical research data interoperbility Shared names meeting, Boston, Bosse Andersson (AstraZeneca R&D Lund) Kerstin Forsberg (AstraZeneca R&D.
An Ontological Approach to Financial Analysis and Monitoring.
Building Enterprise Applications Using Visual Studio®
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Lecture #11: Ontology Engineering Dr. Bhavani Thuraisingham
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
PolyAnalyst Data and Text Mining tool
Web Mining Department of Computer Science and Engg.
Collaborative RO1 with NCBO
Amit Sheth, CTO, Semagix Inc
Presentation transcript:

Semantic Web: promising technologies and current applications in Health care & Life Sciences Amit Sheth Thanks: Kno.e.sis team, collaborators at CCRC, NLM, CCHMC, W3HCLS

Note: All demos are at

Death by data: amount, variety, sources How to exploit this data? Silver Bullet – SEMANTICS (approach) & Semantic Web (technology) Applications to show the value –Science (health care and biomedicine) [today’s focus] –Industry (financial services) –Government (intelligence) Outline

Not data (search), but integration, analysis and insight, leading to decisions and discovery

Data captured per year = 1 exabyte (10 18 ) (Eric Neumann, Science, 2005) Multiple formats: Structured, unstructured, semi-structured Multimodal: text, image, a/v, sensor, scientific/engineering Thematic, Spatial, Temporal Enterprise to Globally Distributed Death by Data: Size, Heterogeneity & Complexity

What? Moving from Syntax/Structure to Semantics Is There A Silver Bullet?

Approach & Technologies Semantics: Meaning & Use of Data Semantic Web: Labeling data on the Web so both humans and machines can use them more effectively i.e., Formal, machine processable description  more automation; emerging standards/technologies (RDF, OWL, Rules, …)

How? Ontology: Agreement with Common Vocabulary & Domain Knowledge Semantic Annotation: metadata (manual & automatic metadata extraction) Reasoning: semantics enabled search, integration, analysis, mining, discovery Is There A Silver Bullet?

Agreement Common Nomenclature/Vocabulary Conceptual Model with associated knowledgebase (ground truth/facts) for an industry, market, field of science, activity In some domains, extensive building of open-source ontologies, in others, build as you go Ontology

Time, Space Gene Ontology, Glycomics, Proteomics Pharma Drug, Treatment-Diagnosis Repertoire Management Equity Markets Anti-money Laundering, Financial Risk, Terrorism Biomedicine is one of the most popular domains in which lots of ontologies have been developed and are in use. See: Clinical/medical domain is also a popular domain for ontology development and applications: Ontology Examples

Building ontology Three approaches: social process/manual: many years, committees –Can be based on metadata standard automatic taxonomy generation (statistical clustering/NLP): limitation/problems on quality, dependence on corpus, naming Descriptional component (schema) designed by domain experts; Description base (assertional component, extension) by automated processes

© Semagix, Inc. A Commercial Semantic Application Building Platform Semagix Freedom for building ontology-driven information system

Empirical Observations Modeling human activity versus modeling the natural world Very simple to very complex (deep hierarchy, constraints) Few classes and relationships to hundreds of classes and relationships Few instances to millions of relationships Few sources of knowledge/expertise to many

GlycO is a focused ontology for the description of glycomics models the biosynthesis, metabolism, and biological relevance of complex glycans models complex carbohydrates as sets of simpler structures that are connected with rich relationships An ontology for structure and function of Glycopeptides Published through the National Center for Biomedical Ontology (NCBO) More at:

An ontology for capturing process and lifecycle information related to proteomic experiments Two aspects of glycoproteomics: What is it? → identification How much of it is there? → quantification Heterogeneity in data generation process, instrumental parameters, formats Need data and process provenance → ontology-mediated provenance Hence, ProPreO models both the glycoproteomics experimental process and attendant data Approx 500 classes, 3million+ instances Published through the National Center for Biomedical Ontology (NCBO) and Open Biomedical Ontologies (OBO) More info. On Knowledge Representation in Life Sciences at Kno.e.sisOn Knowledge Representation in Life Sciences at Kno.e.sis ProPreO ontology

N-Glycosylation metabolic pathway GNT-I attaches GlcNAc at position 2 UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2 GNT-V attaches GlcNAc at position 6 UDP-N-acetyl-D-glucosamine + G00020 UDP + G00021 N-acetyl-glucosaminyl_transferase_V N-glycan_beta_GlcNAc_9 N-glycan_alpha_man_4

Pathway Steps - Reaction Evidence for this reaction from three experiments Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia

Pathway Steps - Glycan Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia Abundance of this glycan in three experiments

Annotation of -Textual data -Experimental Data -Structured Data -Rich Media -Etc.

Semantic Annotation – Elsevier iConsult content

Semantic annotation of scientific/experimental data

N-Glycosylation Process (NGP) Cell Culture Glycoprotein Fraction Glycopeptides Fraction extract Separation technique I Glycopeptides Fraction n*m n Signal integration Data correlation Peptide Fraction ms datams/ms data ms peaklist ms/ms peaklist Peptide listN-dimensional array Glycopeptide identification and quantification proteolysis Separation technique II PNGase Mass spectrometry Data reduction Peptide identification binning n 1

parent ion m/z fragment ion m/z ms/ms peaklist data fragment ion abundance parent ion abundance parent ion charge ProPreO: Ontology-mediated provenance Mass Spectrometry (MS) Data

Workflow based on Web Services = Web Process

Semantic Web Process to incorporate provenance Storage Standard Format Data Raw Data Filtered Data Search Results Final Output Agent Biological Sample Analysis by MS/MS Raw Data to Standard Format Data Pre- process DB Search (Mascot/ Sequest) Results Post- process (ProValt) OIOIOIOIO Biological Information Semantic Annotation Applications ISiS – Integrated Semantic Information and Knowledge System

Evaluate the specific effects of changing a biological parameter: Retrieve abundance data for a given protein expressed by three different cell types of a specific organism. Retrieve raw data supporting a structural assignment: Find all the raw ms data files that contain the spectrum of a given peptide sequence having a specific modification and charge state. Detect errors: Find and compare all peptide lists identified in Mascot output files obtained using a similar organism, cell-type, sample preparation protocol, and mass spectrometry conditions. ProPreO concepts highlighted in red A Web Service Must Be Invoked Semantic Annotation Facilitates Complex Queries

Semantic Biological Web Services Registry

Converting biological information to the W3C Resource Description Framework (RDF): Experience with Entrez Gene Collaboration with Dr. Olivier Bodenreider (US National Library of Medicine, NIH, Bethesda, MD)

Entrez Biomedical Knowledge Repository …. Biomedical Knowledge Repository

XSLT Entrez GeneEntrez Gene XML Entrez Gene RDF graph Entrez Gene RDF Implementation

XSLT ENTREZ GENEENTREZ GENE XML ENTREZ GENE RDF GRAPH ENTREZ GENE RDF …. Web interface

APP (geneid-351)Alzheimer’s Disease eg:has_protein_reference_name_E subjectpredicateobject RDF Graph

Entrez Gene RDF graph (W3C Validator Site - RDF Graph

APP gene [Homo sapiens] APP gene [Gallus gallus] APP gene [Canis familiaris ] protease nexin-II amyloid beta A4 protein amyloid-beta protein A4 amyloid protein beta-amyloid peptide amyloid beta (A4) precursor protein (protease nexin-II, Alzheimer disease) cerebral vascular amyloid peptide amyloid protein eg:has_protein_reference_name_E amyloid beta A4 protein Human APP gene is implicated in Alzheimer's disease. Which genes are functionally homologous to this gene? Connecting different genes

Rules are objects that allow inference from RDF data [1] Oracle 10g allows the creation of rulebase based on RDFS (RDF Schema) eg:Neurodegenerative Diseases eg:Gene-track_geneid/351 amyloid beta (A4) precursor protein (protease nexin-II, Alzheimer disease) eg:has_protein_reference_name_E eg:is_associated_with Inference

Raw2mzXMLmzXML2PklPkl2pSplitMASCOT SearchProVault Raw mzXMLPklpSplit MACOT result ProVault result Experimental Data Semantic Annotation Metadata File SPARQL query-based User Interface Semantic Metadata Registry PROTEOMECOMMONS PROTEOMICS WORKFLOW ProPreO ontology EXPERIMENTAL DATA Have I performed an error? Give me all result files from a similar organism, cell, preparation, mass spectrometric conditions and compare results. Is the result erroneous? Give me all result files from a similar organism, cell, preparation, mass spectrometric conditions and compare results. Integrated Semantic Information and knowledge System (Isis)

Applications: –search –integration –analysis –discovery

Content Exploitation Understand and leverage siloed data Increase worker productivity Better KM across enterprises Knowledge Discovery Access/leverage universe of data More accurate competitive/threat assessment Competitive Advantage Outmaneuver competitors Improve enterprise decision making Less damage control AML Comply with current/future regulations Ensure broker/trade compliance Reduce risks and costs Enhance CRM Homeland Security Improve intelligence gathering/analysis Enable information sharing/preserve security Create effective first responder programs Pharmaceuticals Represent/update known data Expedite drug discovery process Enhance speed-to- market Reduce redundancy Horizontal Needs Industry Needs

BLENDED BROWSING & QUERYING INTERFACE ATTRIBUTE & KEYWORD QUERYING uniform view of worldwide distributed assets of similar type SEMANTIC BROWSING Targeted e-shopping/e-commerce assets access VideoAnywhere and Taalee Semantic Search Engine (2000)

Focused relevant content organized by topic (semantic categorization) Automatic Content Aggregation from multiple content providers and feeds Related relevant content not explicitly asked for (semantic associations) Competitive research inferred automatically Automatic 3 rd party content integration Equity Research Dashboard with Blended Semantic Querying and Browsing

Active Semantic Medical Records Active Semantic Medical Records ( An operational health care application at the Athens Heart Center since 01/2006 that uses multiple ontologies, semantic annotations and rule based decision support. ) Goals: Increase efficiency Reduce Errors, Improve Patient Satisfaction & Reporting Improve Profitability (better billing) Technologies: Medical Records in XML created from database Ontologies (Practice, CPT/ICD-9, Drug), semantic annotations & rules Service Oriented Architecture Semantic Applications: Health Care Thanks -- Dr. Agrawal, Dr. Wingeth, and others. ISWC2006 paperISWC2006 paper

System though out the practice

Active Semantic Document (ASD) A document (typically in XML) with the following features: Semantic annotations –Linking entities found in a document to ontology –Linking terms to a specialized lexicon Actionable information –Rules over semantic annotations –Violated rules can modify the appearance of the document (Show an alert)

Annotate ICD9s Annotate Doctors Lexical Annotation Level 3 Drug Interaction Insurance Formulary Drug Allergy Active Semantic Doc with 3 Ontologies

Passenger Threat Analysis Need to Know -> Demo Financial Irregularity * * a classified application Primary Funding by ARDA, Secondary Funding by NSF Semantic Web Applications in Government

Aim: Legislation (PATRIOT ACT) requires banks to identify ‘who’ they are doing business with Problem Volume of internal and external data needed to be accessed Complex name matching and disambiguation criteria Requirement to ‘risk score’ certain attributes of this data Approach Creation of a ‘risk ontology’ populated from trusted sources (OFAC etc); Sophisticated entity disambiguation Semantic querying, Rules specification & processing Solution Rapid and accurate KYC checks Risk scoring of relationships allowing for prioritisation of results; Full visibility of sources and trustworthiness Semantic Application in a Global Bank

Watch list Organization Company Hamas WorldCom FBI Watchlist Ahmed Yaseer appears on Watchlist member of organization works for Company Ahmed Yaseer: Appears on Watchlist ‘FBI’ Works for Company ‘WorldCom’ Member of organization ‘Hamas’ The Process

Example of Fraud Prevention application used in financial services User will be able to navigate the ontology using a number of different interfaces World Wide Web content Public Records BLOGS, RSS Un-structure text, Semi-structured Data Watch Lists Law Enforcement Regulators Semi-structured Government Data Scores the entity based on the content and entity relationships Establishing New Account Global Investment Bank

Semantic Browser: contextual browsing of PubMed Semantic Browser Semantic Applications: Life Science

Now possible – Extracting relationships between MeSH terms from PubMed Biologically active substance Lipid Disease or Syndrome affects causes affects causes complicates Fish Oils Raynaud’s Disease ??????? instance_of UMLS Semantic Network MeSH PubMed 9284 documents 4733 documents 5 documents

Schema-Driven Extraction of Relationships from Biomedical Text Cartic RamakrishnanCartic Ramakrishnan, Krys Kochut, Amit P. Sheth: A Framework for Schema- Driven Relationship Discovery from Unstructured Text. International Semantic Web Conference 2006: [.pdf]Krys KochutAmit P. ShethInternational Semantic Web Conference 2006[.pdf]

Method – Parse Sentences in PubMed SS-Tagger (University of Tokyo) SS-Parser (University of Tokyo) (TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) ) Entities (MeSH terms) in sentences occur in modified forms “adenomatous” modifies “hyperplasia” “An excessive endogenous or exogenous stimulation” modifies “estrogen” Entities can also occur as composites of 2 or more other entities “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium”

Method – Identify entities and Relationships in Parse Tree TOP NP VP S NP VBZ induces NP PP NP IN of DT the NN endometrium JJ adenomatous NN hyperplasia NP PP IN by NN estrogen DT the JJ excessive ADJP NN stimulation JJ endogenous JJ exogenous CC or MeSHID D MeSHID D MeSHID D UMLS ID T147 Modifiers Modified entities Composite Entities

Resulting Semantic Web Data in RDF Modifiers Modified entities Composite Entities

Blazing Semantic Trails in Biomedical Literature Cartic RamakrishnanCartic Ramakrishnan, Amit P. Sheth: Blazing Semantic Trails in Text: Extracting Complex Relationships from Biomedical Literature. Tech. Report #TR-RS2007 [.pdf]Amit P. Sheth [.pdf]

Relationships -- Blazing the Trails “The physician, puzzled by her patient's reactions, strikes the trail established in studying an earlier similar case, and runs rapidly through analogous case histories, with side references to the classics for the pertinent anatomy and histology. The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior.” [V. Bush, As We May Think. The Atlantic Monthly, (1): p ]

Once you have Semantic Web Data

Original documents PMID PMID

Semantic Trail

Complex relationships connecting documents – Semantic Trails

Semantic Browser

Hypothesis Driven retrieval of Scientific Literature PubMed Complex Query Supporting Document sets retrieved Migraine Stress Patient affects isa Magnesium Calcium Channel Blockers inhibit Keyword query: Migraine[MH] + Magnesium[MH]

World class research center- coupled with daytaOhio for tech transfer and commercialization Core expertise in –data management: integration, mining, analytics, visualization –distributed computing: services/grid computing –Semantic Web –Bioinformatics, etc. With domain/application expertise in Government, Industry, Biomedicine Member of World Wide Web Consortium and extensive industry relationships Kno.e.sis

Interns, future employees Targets research & prototyping dataOhio supported technology transfer/incubator Joint projects (e.g., C3 funding from DoD) Guidance on using standards and technologies “increasing the value of data to you and your customers” Invitation to work with the center

Introducing