Presentation transcript:

Semantic Web Applications in Financial Industry, Government, Health Care and Life Sciences. SWEG 2006, March 2006. Amit Sheth, LSDIS Lab, Department of Computer Science, University of Georgia. http://lsdis.cs.uga.edu

SW research @ LSDIS: Ontology design and population; Automatic Metadata Extraction; Semantic Annotations (text, medical docs, scientific data); Semantic Computations (inference, rules, complex relationships, knowledge discovery, semantic associations); Semantic Visualization; Active Semantic Documents; Semantic Web Services/Processes; Semantic Middleware; Semantics Enabled Networking; Semantic Applications: Bioinformatics, Health Care, Intelligence/Gov. (Commercial: Risk & Compliance, Content Aggregators). Projects: SeNS, SemDis, SemViz, WSDL-S, METEOR-S, Bioinformatics for Glycan Expressions.

Part II: Semantic Web Applications in Government. Passenger Threat Analysis; Need to Know (demo*); Financial Irregularity. *On the Web: Google “SemDis”, go to NeedToKnow. Primary funding by ARDA, secondary funding by NSF.

Financial Irregularity. Aim: automate the detection of financial inconsistency and irregularity. Problem: (1) create a unified and logically rigorous terminology for the financial domain; (2) integrate data from multiple disparate structured and semi-structured sources; (3) create, store, update and execute analytic formulas on financial data.

Financial Irregularity. Approach: (1) create a financial domain ontology, populated from trusted sources; (2) create multiple extractors to disambiguate data and form relevant relationships; (3) create a framework for mathematical formula/rule specification and semantic querying of the ontology.

Financial Irregularity. Solution: developed an ontology schema for the financial domain using the modeling capabilities of the Semagix Freedom toolkit; extracted, merged, and linked financial data from multiple sources using the toolkit's extraction and disambiguation capabilities; used MathML (Mathematical Markup Language) to represent mathematical formulas and rules; extended MathML to represent RDF subgraphs of paths through the financial ontology.

Financial Irregularity

Financial Irregularity Subset of Financial Domain Ontology

Financial Irregularity Graphical User Interface

Financial Irregularity Creation of financial asset variable “bank account value”

Financial Irregularity Creation of financial liability variable “loan value”

Financial Irregularity Creation of financial formula “solvency ratio”

Financial Irregularity Creation of financial rule “solvency ratio check”

Financial Irregularity Result display of “solvency ratio check” rule execution
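
The variable, formula and rule slides above can be read as one small computation. The following is a minimal Python sketch of that reading, not the Semagix Freedom or MathML implementation. It assumes the solvency ratio is total assets divided by total liabilities and that the rule passes when the ratio reaches a threshold; the slides give neither the formula's definition nor a threshold, so both, along with every name and figure below, are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    bank_account_values: list[float]  # asset variable "bank account value"
    loan_values: list[float]          # liability variable "loan value"


def solvency_ratio(entity: Entity) -> float:
    """Assumed formula: total assets divided by total liabilities."""
    assets = sum(entity.bank_account_values)
    liabilities = sum(entity.loan_values)
    if liabilities == 0:
        return float("inf")  # no liabilities: treated here as fully solvent
    return assets / liabilities


def solvency_ratio_check(entity: Entity, threshold: float = 1.0) -> bool:
    """Assumed rule: pass when the ratio is at or above the threshold."""
    return solvency_ratio(entity) >= threshold


if __name__ == "__main__":
    # Hypothetical entity with two bank accounts and one loan.
    acme = Entity("Acme (hypothetical)", [120_000.0, 30_000.0], [200_000.0])
    print(acme.name, solvency_ratio(acme), solvency_ratio_check(acme))
```

In the system described on the slides, the variable values would come from paths through the financial ontology rather than from hard-coded lists.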

Semantic Visualization

Semantic Visualization. Aim: provide a comprehensive visualization and interactive search and analytics interface for exploiting Semantic Web capabilities. Problem: (1) intuitive visualization of highly expressive ontologies (e.g., complex carbohydrate molecules); (2) intuitive visual display of semantic analytics showing "connections between the dots" across heterogeneous documents and multi-modal content; (3) graphical tracking and association of activities to discover semantic associations between events using thematic and topological relations.

Semantic Visualization. Solution: OntoVista is an ontology visualization tool with unique capabilities for complex (representationally rich) biological and biochemical ontologies. Semantic Analytics Visualization (SAV) is a 3D visualization tool for semantic analytics. It can visualize ontologies and metadata, including annotated web documents, images, and digital media such as audio and video clips, in a synthetic three-dimensional semi-immersive environment. Semantic EventTracker (SET) is a highly interactive visualization tool for tracking and associating activities (events) in a Spatially Enriched Virtual Environment (SEVE).

GlycO – A domain ontology for glycans. GlycO is a domain ontology that captures knowledge of the structure and function of glycans. It is a comprehensive ontology of 770 classes with extensive relationships and specific constraints. This real-world ontology, built with and by domain experts, is 11 levels deep for some classes. The above is a snapshot of the GlycO ontology, produced with the GlycoVista visualization tool developed by the LSDIS lab. The parameters defined for this view include the number of hops, i.e. the number of relationships between two concepts, set to 20 in this case. The other parameters are the types of relationships to be displayed (selectable from a list), the types of nodes, and the types of arcs (outgoing as well as incoming in this case). This slide conveys the complexity, breadth and density of classes and their relationships in a real-world domain ontology.

OntoVista representation of Glycan Molecule (with monosaccharide residue composition)

Pathway representation in GlycO. Pathways do not need to be explicitly defined in GlycO. The residue, glycan, enzyme and reaction descriptions contain the knowledge necessary to infer pathways.

Zooming in a little … Reaction R05987, catalyzed by enzyme 2.4.1.145, adds_glycosyl_residue N-glycan_b-D-GlcpNAc_13. The N-glycan with KEGG ID 00015 is the substrate of reaction R05987, which is catalyzed by an enzyme of the class EC 2.4.1.145. The product of this reaction is the glycan with KEGG ID 00020. Specific reactions of the kind enzyo:R?????, e.g. enzyo:R05987, are subclasses of reaction classes that are identified by the same number as the enzyme class that catalyzes them.
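
The idea that pathways can be inferred rather than stored can be sketched as chaining reactions whose product is the substrate of the next reaction. The Python below is a minimal illustration under that assumption; it is not how GlycO or its reasoner works internally. Only reaction R05987 (substrate 00015, product 00020, enzyme class EC 2.4.1.145) comes from the slide; the other two reactions, their enzyme labels and the function name `infer_pathways` are hypothetical.

```python
from collections import defaultdict

# (reaction id, enzyme class, substrate glycan, product glycan)
REACTIONS = [
    ("R05987", "EC 2.4.1.145", "00015", "00020"),       # from the slide
    ("R_hypo_1", "EC hypothetical-1", "00020", "hypo_A"),  # hypothetical
    ("R_hypo_2", "EC hypothetical-2", "hypo_A", "hypo_B"),  # hypothetical
]


def infer_pathways(reactions, start):
    """Enumerate reaction chains from `start` by linking products to substrates."""
    by_substrate = defaultdict(list)
    for rid, enzyme, substrate, product in reactions:
        by_substrate[substrate].append((rid, enzyme, product))

    pathways = []

    def walk(glycan, path, seen):
        # Skip products already on this path so that cycles terminate.
        steps = [s for s in by_substrate[glycan] if s[2] not in seen]
        if not steps and path:
            pathways.append(path)
        for rid, enzyme, product in steps:
            walk(product, path + [(glycan, rid, enzyme, product)], seen | {product})

    walk(start, [], {start})
    return pathways


if __name__ == "__main__":
    for pathway in infer_pathways(REACTIONS, "00015"):
        print(" -> ".join(f"{s} [{r}, {e}] {p}" for s, r, e, p in pathway))
```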

Semantic Analytics Visualization: representation of entities and relationships. Entities: blue rectangles. Relationships: arrows between entities; a yellow rectangle above the arrow is the property's label.

Overview of Virtual Environment. GraphViz’s "Dot" layout of instances and their relationships is shown in the foreground. In the background, the document nodes are shown as red 3D ovals.

Interaction. Remote object selection using ray casting: a laser beam extends from the user's hand to infinity, and the first object penetrated by the laser is selected.
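
Ray-casting selection of this kind can be sketched as a nearest-intersection test. The Python below is a minimal illustration, assuming each selectable object is approximated by a bounding sphere; the slide does not say which geometry the tool tests against, and all function and object names here are hypothetical.

```python
import math


def ray_sphere_hit(origin, unit_dir, center, radius):
    """Distance along the ray to the first hit on the sphere, or None if missed."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = unit_dir
    # Solve t^2 + b*t + c = 0 for a unit-length direction vector.
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if t >= 0:  # only hits in front of the hand count
            return t
    return None


def select_first_hit(origin, direction, objects):
    """Return the name of the nearest object hit by the ray, or None."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / norm for d in direction)
    best = None
    for name, center, radius in objects:
        t = ray_sphere_hit(origin, unit, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, name)
    return None if best is None else best[1]


if __name__ == "__main__":
    scene = [("far", (0, 0, 10), 1), ("near", (0, 0, 5), 1), ("off", (5, 0, 5), 1)]
    print(select_first_hit((0, 0, 0), (0, 0, 2), scene))
```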

“Detail” of Selection, “Overview” still visible. After a property is selected (shown at the center of the figure), all entities and properties become semi-transparent except the selected property and its attached entities. Likewise, all documents become semi-transparent except the documents common to those entities.

Layout using “dot” "Dot" layout of instances and their relationships (no documents are shown for clarity)

Layout using “neato” "Neato" layout of instances and their relationships (no documents are shown for clarity)

Space Partitioning. Foreground: visualization of entities and their properties. Background: visualization of documents.

Semantic EventTracker: representation of geospatial and temporal dimensions for semantic associations; visualization of associations unfolding over time; integration of associated multimedia content; separate Temporal, Geospatial, and Thematic ontologies describe the data.

Part III: A Healthcare Application. Thanks to our collaborators, the Athens Heart Center & Dr. Wingeth. ©UGARF and Amit Sheth (except when attributed to someone else).

Active Semantic Document: a document (typically in XML) with lexical and semantic annotations (tied to ontologies) and actionable information (rules over semantic annotations). Application: Active Semantic Patient Record for a cardiology practice, using 3 populated ontologies and EMRs in XML.

Practice Ontology

Drug Ontology Hierarchy (showing is-a relationships)

Drug Ontology showing neighborhood of PrescriptionDrug concept

First version of Procedure/Diagnosis/ICD9/CPT Ontology (diagram labels: maps to diagnosis; maps to procedure; specificity)

Active Semantic Doc with 3 Ontologies

Explore neighborhood for drug Tasmar Explore: Drug Tasmar

Explore neighborhood for drug Tasmar (relationship labels: classification; belongs to group; brand / generic; interaction). Semantic browsing and querying: perform decision support (how many patients are using this class of drug, …)

More on Ontologies, Languages and Rules: schema; population (knowledge source); freshness; use of W3C standards (XML, RDF, OWL, RQL/SPARQL, SWRL)

Online demo of Active Semantic Electronic Medical Record (deployed at Athens Heart Center). For the online demo, Google: Active Semantic Documents

Extreme User Friendliness: the Electronic Health Record is the focus of all activities (no extra search, no switching of windows); error prevention; decision support; better patient support and insurance management

Part IV: Biological Applications. Funded by NIH-NCRR. Acknowledgement: NCRR-funded Bioinformatics of Glycan Expression, collaborators and partners at CCRC (Dr. William S. York), and Satya S. Sahoo, Christopher Thomas, Cartic Ramakrishnan.

Computation, data and semantics in life sciences. “The development of a predictive biology will likely be one of the major creative enterprises of the 21st century.” Roger Brent, 1999. “The future will be the study of the genes and proteins of organisms in the context of their informational pathways or networks.” L. Hood, 2000. “Biological research is going to move from being hypothesis-driven to being data-driven.” Robert Robbins. “We’ll see over the next decade complete transformation (of life science industry) to very database-intensive as opposed to wet-lab intensive.” Debra Goldfarb. We will show how semantics is a key enabler for achieving the above predictions and visions.

Semantic GlycoInformatics - Ontologies. GlycO: a domain ontology for glycan structures, glycan functions and enzymes (embodying knowledge of the structure and metabolism of glycans). Contains 600+ classes and 100+ properties that describe structural features of glycans; unique population strategy. URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco ProPreO: a comprehensive process ontology modeling experimental proteomics. Contains 330 classes and 6 million+ instances. Models three phases of experimental proteomics: cell culture, separation (HPLC), and analysis (MS). URL: http://lsdis.cs.uga.edu/projects/glycomics/propreo

GlycO. This slide shows various parts of the ontology in detail. The relationships modeled in these snapshots are only ‘is-a’ relationships; the actual number and variety of relationships are far larger and richer. Snapshot 1 shows a highly detailed view of different ions and biomolecules such as peptides and nucleic acids. Snapshot 2 shows a slightly higher level of the ontology, with functional proteins and oligosaccharides. The final view shows the tree view of the GlycO ontology, with the depth and highly involved relationships used to capture the domain knowledge. The specific details are not visible, but the view conveys the richness of the complete ontology.

Ontology population workflow. Websites are converted into Freedom-internal descriptions. One part of the instance data is the CarbBank ID, which is not considered further in this slide. Another part is the IUPAC glycan representation, which is two-dimensional and ambiguous. It is converted into the linear LINUCS format, which is a purely syntactic translation of the IUPAC structure. For that reason the structure becomes unambiguous, but ambiguity may remain in the individual residue descriptions. The LINUCS string is then passed to the LINUCS-to-GLYDE converter, which analyzes different spellings of monosaccharide residues to find an unambiguous representation; that representation is then compared with the instances in the KB.

N-Glycosylation Process (NGP). Workflow diagram labels: Cell Culture; extract; Glycoprotein Fraction; proteolysis; Glycopeptides Fraction; Separation technique I (n); PNGase; Peptide Fraction; Separation technique II (n*m); Mass spectrometry; ms data; ms/ms data; Data reduction; ms peaklist; ms/ms peaklist; binning; Peptide identification; Peptide list; N-dimensional array; Data correlation; Signal integration; Glycopeptide identification and quantification. Notes: By N-glycosylation Process, we mean the identification and quantification of glycopeptides (separation and identification of N-glycans). Proteolysis: treat with trypsin. Separation technique I: chromatography, such as lectin affinity chromatography. From PNGase F we get fractions that contain peptides and glycans; we focus only on peptides. Separation technique II: chromatography, such as reverse phase chromatography.

A view of ProPreO using our visualization tool. A tree view of the ontology using only ‘is-a’ relationships.

Phase II: Ontology Population. Populate ProPreO with all experimental datasets? A single peptide sample needs 200,000 runs (ms and ms/ms for a Thermofinnigan LTQ ion trap), and each file generated is at least a couple of MB. Instead, ProPreO is populated at two levels. Level 1: populate the ontology with instances that are stable across experimental runs, i.e. tryptic peptides that are common to all trypsin-digested proteins and are used for reference by all relevant experimental datasets (e.g. human tryptic peptides: 1.5 million+ instances in ProPreO). Level 2: use URIs to point to the actual experimental datasets, which are located in each lab’s own data repository. This makes the ProPreO instance layer scalable and adaptable. Important point: this enables data provenance and knowledge discovery on datasets of customized relevance, size and computability; datasets can be added to or deleted from the instance base as required.
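
Level 1 instances are tryptic peptides, which can be derived from a protein sequence by in-silico digestion. The Python below is a minimal sketch using the commonly cited trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages. It is not the ProPreO population pipeline, and the function name and the example sequence are hypothetical.

```python
from __future__ import annotations


def tryptic_peptides(protein: str, min_length: int = 1) -> list[str]:
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        at_end = i == len(protein) - 1
        cleave = residue in "KR" and not at_end and protein[i + 1] != "P"
        if cleave or at_end:
            peptide = protein[start : i + 1]
            if len(peptide) >= min_length:
                peptides.append(peptide)
            start = i + 1
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence, chosen only to exercise the K/R and proline cases.
    print(tryptic_peptides("MKRPAGKLLR"))
```

Because the same peptide can arise from many proteins and many runs, storing each distinct peptide once (for example in a set keyed by sequence) is one way to keep such instances stable across experiments.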

Web Services based Workflow = Web Process. Diagram labels: WORKFLOW; Web Service 1 to Web Service 4 (WS1 to WS4); platforms: Windows XP, LINUX, MAC, Solaris. Add transaction semantics: rollback, recover. Search and integration of alternate Web Services (http://glycomics.ccrc.uga.edu/stargate). Semantic annotation of experimental data.

Semantic Annotation of Scientific Data

ms/ms peaklist data:
830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043

Annotated ms/ms peaklist data:
<ms/ms_peak_list>
  <parameter instrument=“micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer” mode = “ms/ms”/>
  <parent_ion_mass>830.9570</parent_ion_mass>
  <total_abundance>194.9604</total_abundance>
  <z>2</z>
  <mass_spec_peak m/z = 580.2985 abundance = 0.3592/>
  <mass_spec_peak m/z = 688.3214 abundance = 0.2526/>
  <mass_spec_peak m/z = 779.4759 abundance = 38.4939/>
  <mass_spec_peak m/z = 784.3607 abundance = 21.7736/>
  <mass_spec_peak m/z = 1543.7476 abundance = 1.3822/>
  <mass_spec_peak m/z = 1544.7595 abundance = 2.9977/>
  <mass_spec_peak m/z = 1562.8113 abundance = 37.4790/>
  <mass_spec_peak m/z = 1660.7776 abundance = 476.5043/>

The semantic annotation uses concepts defined in the ProglycO ontology. The data element not only carries provenance information (tracing its ancestry) but is also mapped to concepts that help formalize its own meaning. This enables interpretation and inference over that data element: for example, a mass_spec_peak is a digested piece of data, a sampling of the data generated from the mass spectrometry of a given separated peptide fraction. This is not possible with data provenance alone.
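
The step from the raw peaklist to its annotated form can be sketched as a small parser. The Python below is a minimal illustration, assuming the raw values are whitespace-separated in the order parent ion mass, total abundance, charge z, followed by (m/z, abundance) pairs, as the two panels on the slide suggest. The function name and the dictionary layout are hypothetical, and the sketch labels values with the slide's element names only; it does not map them to ontology concepts.

```python
from __future__ import annotations

# Raw ms/ms peaklist values as they appear on the slide.
RAW = (
    "830.9570 194.9604 2 580.2985 0.3592 688.3214 0.2526 779.4759 38.4939 "
    "784.3607 21.7736 1543.7476 1.3822 1544.7595 2.9977 1562.8113 37.4790 "
    "1660.7776 476.5043"
)


def parse_peaklist(raw: str) -> dict:
    """Label raw peaklist values with the element names used on the slide."""
    tokens = raw.split()
    if len(tokens) < 3 or (len(tokens) - 3) % 2 != 0:
        raise ValueError("expected 3 header values followed by m/z, abundance pairs")
    values = [float(t) for t in tokens[3:]]
    return {
        "parent_ion_mass": float(tokens[0]),
        "total_abundance": float(tokens[1]),
        "z": int(tokens[2]),
        "mass_spec_peak": [
            {"m/z": mz, "abundance": abundance}
            for mz, abundance in zip(values[0::2], values[1::2])
        ],
    }


if __name__ == "__main__":
    record = parse_peaklist(RAW)
    print(record["parent_ion_mass"], record["z"], len(record["mass_spec_peak"]))
```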

Semantic annotation of Scientific Data

Annotated ms/ms peaklist data:
<ms/ms_peak_list>
  <parameter instrument=“micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer” mode = “ms/ms”/>
  <parent_ion_mass>830.9570</parent_ion_mass>
  <total_abundance>194.9604</total_abundance>
  <z>2</z>
  <mass_spec_peak m/z = 580.2985 abundance = 0.3592/>
  <mass_spec_peak m/z = 688.3214 abundance = 0.2526/>
  <mass_spec_peak m/z = 779.4759 abundance = 38.4939/>
  <mass_spec_peak m/z = 784.3607 abundance = 21.7736/>
  <mass_spec_peak m/z = 1543.7476 abundance = 1.3822/>
  <mass_spec_peak m/z = 1544.7595 abundance = 2.9977/>
  <mass_spec_peak m/z = 1562.8113 abundance = 37.4790/>
  <mass_spec_peak m/z = 1660.7776 abundance = 476.5043/>

Salient point: one ms/ms data element is annotated with (mapped to) multiple concepts in the ontology. Using ontological concepts enables the following: comparison of data generated from heterogeneous sources; interoperability of heterogeneously generated data; discovery of a relationship through a chain of mappings/connections; hypothesis formulation.

Summary, Observations, Conclusions. Ontology schema: relatively simple in business/industry, highly complex in science. Ontology population: could have millions of assertions, or unique features when modeling complex life science domains. Ontology population could be largely automated if high-quality, curated data/knowledge is available; it involves disambiguation, can be rule-based, and results in a richer representation than the extracted sources. Ontology freshness and validation: not just schema correctness but knowledge, i.e. how well the ontology reflects the changing world.

Summary, Observations, Conclusions. Quite a few applications: semantic search, semantic integration, semantic analytics (AML, need to know, financial irregularity), decision support and validation (e.g., error prevention in healthcare), knowledge discovery, process/pathway discovery, …

More information at http://lsdis.cs.uga.edu/projects/glycomics