1 Dejing Dou Computer and Information Science University of Oregon, Eugene, Oregon September, Kent State University.

Presentation transcript:

Where is Eugene, Oregon?

Outline
- Introduction
- Ontology and the Semantic Web
- Biomedical Ontology Development
- Challenges for Data-driven Approaches
- The NEMO Project
- Mining ERP Ontologies (KDD’07)
- Modeling NEMO Ontology Databases (SSDBM’08, JIIS’10)
- Mapping ERP Metrics (PAKDD’10)
- Ongoing Work

What is an Ontology?
A formal specification of a vocabulary of domain concepts and the relationships among them.

A Genealogy Ontology
[Diagram: classes Individual, Family, Event, Male, Female, MarriageEvent, DivorceEvent, DeathEvent, BirthEvent, Gender; properties husband, wife, childIn, marriage, divorce, birth, sex]
- Classes: Individual, Male, Female, Family, MarriageEvent, …
- Properties: sex, husband, wife, birth, …
- Axioms: If there is a MarriageEvent, there will be a Family related to the husband and wife properties.
- Ontology languages: OWL, KIF, OBO, …
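
The marriage axiom on this slide can be sketched in a few lines of Python. This is only an illustration, not the ontology's actual OWL or KIF encoding: the dataclass layout, the function name `families_implied_by`, and the individuals John and Mary are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    name: str
    sex: str  # value of the `sex` property, e.g. "Male" or "Female"


@dataclass(frozen=True)
class MarriageEvent:
    husband: Individual
    wife: Individual


@dataclass(frozen=True)
class Family:
    husband: Individual
    wife: Individual


def families_implied_by(marriages):
    """Apply the axiom: each MarriageEvent implies a Family with the same husband and wife."""
    return {Family(m.husband, m.wife) for m in marriages}


# Hypothetical individuals, used only to exercise the axiom.
john = Individual("John", "Male")
mary = Individual("Mary", "Female")
families = families_implied_by([MarriageEvent(john, mary)])
```

An ontology language such as OWL would state the same constraint declaratively, so that a reasoner rather than hand-written code derives the Family.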

Current WWW
The majority of data resources on the WWW are in human-readable format only (e.g., HTML).
[Diagram: human, WWW]

The Semantic Web
- One major goal of the Semantic Web is that web-based agents can process and “understand” data [Berners-Lee et al. 2001].
- Ontologies formally describe the semantics of data, and web-based agents can take web documents (e.g., in RDF or OWL) as a set of assertions and draw inferences from them.
[Diagram: human, SW, web-based agents]
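
To make "a set of assertions" and "draw inferences" concrete, here is a minimal sketch that treats a document as subject-predicate-object triples and forward-chains one rule: if x has type C and C is a subclass of D, then x has type D. The predicate names `type` and `subClassOf`, the subclass links, and the instances `john` and `wedding1` are assumptions for illustration; a real agent would use an RDF/OWL toolkit and reasoner.

```python
def infer_types(triples):
    """Forward-chain (x type C) and (C subClassOf D) => (x type D) until nothing new appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_links = [(s, o) for (s, p, o) in facts if p == "subClassOf"]
        for (x, p, c) in list(facts):
            if p != "type":
                continue
            for (child, parent) in subclass_links:
                if child == c and (x, "type", parent) not in facts:
                    facts.add((x, "type", parent))
                    changed = True
    return facts


# Assumed assertions, loosely following the genealogy example.
triples = {
    ("MarriageEvent", "subClassOf", "Event"),
    ("Male", "subClassOf", "Individual"),
    ("john", "type", "Male"),
    ("wedding1", "type", "MarriageEvent"),
}
inferred = infer_types(triples)
```

The two derived facts, that `john` is an Individual and `wedding1` is an Event, were never written down explicitly; they follow from the assertions.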

Biomedical Ontologies
- The Gene Ontology (GO): standardizes the formal representation of gene and gene product attributes across all species and gene databases (e.g., zebrafish, mouse, fruit fly).
  Classes: cellular component, molecular function, biological process, …
  Properties: is_a, part_of
- The Unified Medical Language System (UMLS): a comprehensive thesaurus and ontology of biomedical concepts.
- The National Center for Biomedical Ontology (NCBO) at Stanford University: more than 200 ontologies (hundreds to thousands of concepts each) and 4 million mappings.

Biomedical Ontology Development
Typically knowledge driven: a top-down process. Some basic steps and principles:
- Discussions among domain experts and ontology engineers.
- Select basic (root) classes and properties (i.e., terms).
- Go deeper to define sub-concepts and relationships. Modularization may be considered if the ontology is expected to be large.
- Add constraints (axioms).
- Add unique IDs (e.g., URLs) and textual definitions for terms.
- Consistency checking.
- Updating and evolution (e.g., GO is updated every 15 minutes).

Challenges: Knowledge Sharing Does Not Automatically Help Data Sharing
Annotation (like tags) helps search in text (e.g., papers), but it does not work well for experimental data (e.g., numerical values).
Three main challenges for knowledge/data sharing:
- Heterogeneity: different labs use different analysis methods, spreadsheet attributes, and DB schemas.
- Reusability: knowledge mined from different experimental data may not be consistent and sharable.
- Scalability: experimental data grows much larger than the ontologies. Ontology-based reasoning (e.g., over the ABox) on large data is a headache.

Case Study: EEG data
Electroencephalogram (EEG) data: observing brain functions through EEG
- Brain activity occurs in the cortex, and cortical activity generates the scalp EEG.
- EEG data (dense-array, 256 channels) has high temporal resolution (1 msec) but poor spatial resolution (2D).
- MR imaging (fMRI, PET) has good spatial resolution (3D) but poor temporal resolution (~1.0 sec).

ERP data and Pattern Analysis
- Event-related potentials (ERPs) are created by averaging segments of EEG data across trials, time-locked (e.g., every 2 seconds) to stimulus events or responses.
- Some existing tools (e.g., Net Station, EEGLAB, APECS, the Dien PCA Toolbox) can process ERP data and do pattern analysis.
[Figure: (A) 128-channel ERPs to visual word and nonword stimuli. (B) Time course for the P100 pattern by PCA. (C) Scalp topography (spatial distribution) of the P100 pattern.]
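
The averaging step can be sketched for a single channel as follows. This is a toy illustration, not the pipeline of any of the tools named above: the function name `average_erp`, the event positions, and the signal values are made up, and real ERP processing also involves filtering, artifact rejection, and baseline correction.

```python
def average_erp(eeg, event_samples, window):
    """Average fixed-length EEG segments that each start at an event sample (time-locking)."""
    segments = [eeg[s:s + window] for s in event_samples if s + window <= len(eeg)]
    if not segments:
        raise ValueError("no complete segments to average")
    n = len(segments)
    return [sum(seg[i] for seg in segments) / n for i in range(window)]


# Toy signal: the same 4-sample response follows each event, with
# opposite-sign offsets in the two trials so that they cancel in the average.
response = [0.0, 2.0, 4.0, 1.0]
trial1 = [r + 0.5 for r in response]
trial2 = [r - 0.5 for r in response]
eeg = [0.0, 0.0] + trial1 + [0.0, 0.0] + trial2
erp = average_erp(eeg, event_samples=[2, 8], window=4)
```

Activity that is consistently time-locked to the events survives the average, while activity that varies from trial to trial tends to cancel.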

NEMO: NeuroElectroMagnetic Ontologies
Some challenges in ERP studies:
- Patterns can be difficult to identify, and definitions vary across research labs.
- Methods for ERP analysis differ across research sites.
- It is hard to compare and share results across experiments and across labs.
The NEMO (NeuroElectroMagnetic Ontologies) project addresses these challenges by developing ontologies to support ERP data and pattern representation, sharing, and meta-analysis. It has been funded by the NIH as an R01 project since

Architecture

Progress in Data-Driven Approaches
- Mining ERP Ontologies (KDD’07): Reusability
- Modeling NEMO Ontology Databases (SSDBM’08, JIIS’10): Scalability
- Mapping ERP Metrics (PAKDD’10): Heterogeneity

Ontology Mining
- Ontology mining is a process for learning an ontology, including classes, class taxonomy, properties, and axioms, from data.
- Existing ontology mining approaches focus on text mining or web mining (web content, usage, structure, user profiles). Clustering and association rule mining have been used for classes and properties [TKDE 18(4); EKAW’00; Reinberger et al., ODBASE’03].
- The NetAffx Gene Ontology mining tool is applied to microarray data [Cheng et al., Bioinformatics 20(9)].
- Our approach uses hierarchical clustering and classification to mine the class taxonomy, properties, and axioms of a first-generation, ERP data-specific ontology from spreadsheets, which is novel.

Knowledge Reuse in KDD
[Diagram of the KDD process with the labels: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation]
Knowledge reuse? Lack of formal semantics.

Our Framework (KDD’07)
A semi-automatic framework for mining ontologies

Four General Procedures
- Classes <= Clustering-based Classification
- Class Taxonomy <= Hierarchical Clustering
- Properties <= Classification
- Axioms <= Association Rule Mining and Classification

Experiments on ERP Data
- Preprocessing Data with Temporal PCA
- Mining ERP Classes with Clustering-based Classification
- Mining ERP Class Taxonomy with Hierarchical Clustering
- Mining Properties and Axioms (Rules) with Classification
- Discovering Axioms among Properties with Association Rule Mining

Input Raw ERP data
[Table: columns Subject, Condition, Channel#, Time1 (µV) through Time6 (µV); sample rows for subjects S01 and S02 under conditions A and B; numeric values not recoverable]
- Sampling rate: 250 Hz for 1500 ms (375 samples)
- Experiment 1-2: 89 subjects and 6 experiment conditions
- Experiment 3: 36 subjects and 4 experiment conditions

Data Preprocessing (1): Temporal PCA Decomposition
[Figure: component 1 + component 2 = complex waveform (PCA)]
- PCA extracts as many factors (components) as there are variables (i.e., the number of samples).
- We retain the first 15 PCA factors, which account for most of the variance (> 75%). The remaining factors are assumed to contain “noise”.
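
The retention rule (keep the leading factors until they explain more than a chosen share of the variance) can be sketched as below. The eigenvalues are made up for illustration and the function names are assumptions; the slide's own figures are 15 retained factors and more than 75% of the variance, out of 375 variables (250 Hz × 1.5 s).

```python
def explained_fraction(eigenvalues, k):
    """Fraction of the total variance explained by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative variance exceeds `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total > threshold:
            return k
    return len(ordered)


# Made-up eigenvalues (variance per component), not values from the ERP data.
eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
k = components_needed(eigenvalues, threshold=0.75)
fraction = explained_fraction(eigenvalues, k)
```

With these toy values the first two components already exceed the 75% threshold, and the remaining three would be treated as noise.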

Data Preprocessing (2)
Intensity, spatial, temporal, and functional metrics (attributes) for each factor

ERP Factors after PCA Decomposition
[Table: one row per factor; columns include TI-max (µs), IN-mean (ROI) (µV), IN-mean (ROCC) (µV), ..., SP-min (channel#); cell values not recoverable]
- For Experiment 1 data, number of factors = (474) (594)
- For Experiment 2 data, number of factors = (588) (598)
- For Experiment 3 data, number of factors = 708

Mining ERP Classes with Clustering (1)
We use EM (Expectation-Maximization) clustering, e.g., for Experiment 1 group 2 data.
[Table: clusters 0 to 3 versus patterns; row labels partially lost in extraction (P, N, lateN1/N, P); counts not recoverable]
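
As a rough illustration of EM clustering, the sketch below fits a one-dimensional Gaussian mixture to made-up peak latencies. It is not the configuration used in the experiments, which cluster multi-attribute factors: the function name `em_gmm_1d`, the data, and the initial means are assumptions.

```python
import math


def em_gmm_1d(data, initial_means, iterations=50):
    """Fit a 1-D Gaussian mixture by EM; returns (means, hard cluster labels)."""
    k = len(initial_means)
    n = len(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    means = list(initial_means)
    variances = [var_all] * k  # start every component with the overall variance
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a zero variance
            weights[j] = nj / n
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels


# Made-up peak latencies (ms) forming two well-separated groups.
latencies = [98.0, 100.0, 102.0, 298.0, 300.0, 302.0]
means, labels = em_gmm_1d(latencies, initial_means=[90.0, 310.0])
```

Each resulting cluster is a candidate ERP class; in the framework described here, the cluster labels then serve as training labels for classification.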

Mining ERP Classes with Clustering (2)
We use OWL to represent ERP classes.

Mining ERP Class Taxonomy with Hierarchical Clustering
We use EM clustering in both divisive and agglomerative ways, e.g., for Experiment 3 data.

Mining ERP Class Taxonomy with Hierarchical Clustering
We use OWL to represent the class taxonomy.

Mining Properties and Axioms with Clustering-based Classification (1)
We use decision tree learning (C4.5) to do classification, with the training data labeled by the clustering results.

Mining Properties and Axioms with Clustering-based Classification (2)
We use OWL to represent datatype properties, which are based on the attributes with high information gain (e.g., the top 6).
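
Ranking attributes by information gain can be sketched as follows. The attribute names TI-max and IN-mean (ROI) come from the factor table; the values, the split points, and the N100 label are invented for illustration. Note that C4.5 itself uses gain ratio and chooses split points automatically, which this sketch does not attempt.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, split):
    """Gain from splitting a numeric attribute into values <= split and values > split."""
    left = [lab for v, lab in zip(values, labels) if v <= split]
    right = [lab for v, lab in zip(values, labels) if v > split]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Invented toy data: cluster labels plus two attributes per factor.
labels = ["P100", "P100", "N100", "N100"]
ti_max = [100, 110, 180, 190]    # separates the two classes perfectly
in_mean = [1.0, 2.0, 1.0, 2.0]   # carries no class information
gains = {
    "TI-max": information_gain(ti_max, labels, split=150),
    "IN-mean (ROI)": information_gain(in_mean, labels, split=1.5),
}
ranked = sorted(gains, key=gains.get, reverse=True)
```

Attributes at the top of `ranked` are the ones that would be promoted to datatype properties.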

Mining Properties and Axioms with Clustering-based Classification (3)
We use SWRL to represent axioms. In FOL: [formula not preserved in the transcript]

Discovering Axioms among Properties with Association Rule Mining
We use the Apriori algorithm to find association rules among properties. The split points are determined by classification rules. In FOL, they look like: [formula not preserved in the transcript]
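
A compact Apriori sketch is shown below, assuming the properties have already been discretized at their split points into items. The item names `late` and `negative`, the transactions, and the support threshold are made up; this is not the tool or parameter setting used in the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        size += 1
        # Join step, then prune candidates that have an infrequent subset.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, from itemset supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Made-up discretized factors: each transaction lists the items that hold for one factor.
transactions = [
    {"late", "negative"},
    {"late", "negative"},
    {"late"},
    {"early"},
]
frequent = apriori(transactions, min_support=0.5)
rule_conf = confidence(frequent, {"negative"}, {"late"})
```

A rule with confidence 1.0 on the data, such as negative implies late in this toy set, is a candidate axiom among properties.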

Rule Optimization
Idea: (A → B) ∧ (A ∧ B → C) ⇒ (A → C)
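
Reading the idea as "if A implies B, and A together with B implies C, then A alone implies C", one way to apply it is to drop redundant literals from rule antecedents. The rule representation and the function name `simplify` below are assumptions made for this sketch.

```python
def simplify(rules):
    """Drop literal B from an antecedent containing A and B whenever a rule A -> B exists.

    Each rule is a pair (frozenset of antecedent literals, consequent literal).
    """
    single = {(next(iter(ante)), cons) for ante, cons in rules if len(ante) == 1}
    simplified = set()
    for ante, cons in rules:
        reduced = set(ante)
        for a, b in single:
            if a != b and a in reduced and b in reduced:
                reduced.discard(b)  # B is already implied by A
        simplified.add((frozenset(reduced), cons))
    return simplified


rules = {
    (frozenset({"A"}), "B"),
    (frozenset({"A", "B"}), "C"),
}
optimized = simplify(rules)
```

The optimized set contains A → B and A → C, which is shorter to state and logically equivalent to the input rules.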

A Partial View of the Mined ERP Data Ontology
Our first-generation ERP ontology consists of 16 classes, 57 properties, and 23 axioms.

Ontology-based Data Modeling (SSDBM’08, JIIS’10)
- In general, ontologies can be treated as one kind of conceptual model.
- Because the data (e.g., PCA factors) can be large, we propose storing it in relational databases instead of building a knowledge base.
- We designed database schemas based on our ERP ontologies, which include temporal, spatial, and functional concepts.

Ontology Databases
[Diagram: ontology side (axioms, classes, datatypes, objects, facts) bridged to the relational side (relations, datatypes, keys, constraints, triggers, tuples)]
Now we have bridged these.

Ontology Databases
[Diagram: ontology side (axioms, classes, datatypes, objects, facts) bridged to the relational side (relations, datatypes, keys, constraints, views, triggers, tuples)]

Load Time on the Lehigh University Benchmark
[Chart: load time for 1.5 million facts (10 universities, 20 departments)]

Query Time
[Chart: query performance (logarithmic time)]

Ontology-based Data Modeling
For example, we use SQL triggers and foreign keys to represent the important subsumption axioms (e.g., subClassOf) of the current ERP ontologies.
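
One plausible way to encode a subsumption axiom with a foreign key and a trigger is sketched below using SQLite from Python. The table names `pattern` and `p100`, the trigger name, and the row id `obs1` are assumptions for illustration; the actual NEMO schema and DBMS may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE pattern (id TEXT PRIMARY KEY);

-- P100 subClassOf Pattern: every P100 row must reference a Pattern row.
CREATE TABLE p100 (id TEXT PRIMARY KEY REFERENCES pattern(id));

-- The trigger materializes the subsumption: asserting a P100 also asserts a Pattern.
CREATE TRIGGER p100_is_a_pattern BEFORE INSERT ON p100
BEGIN
    INSERT OR IGNORE INTO pattern (id) VALUES (NEW.id);
END;
""")

# Assert one hypothetical P100 instance; the trigger adds it to the superclass table.
conn.execute("INSERT INTO p100 (id) VALUES ('obs1')")
patterns = [row[0] for row in conn.execute("SELECT id FROM pattern")]
```

A query over `pattern` then returns P100 instances without any reasoning at query time, while the foreign key prevents a Pattern row from being deleted while a P100 row still depends on it.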

Ontology-based Data Modeling
The ER diagram for the ERP ontology database shows tables (boxes) and foreign key constraints (arrows). As expected, the concepts pattern, factor, and channel are the most densely connected (toward the right side of the image).

NEMO Data Mapping (PAKDD’10)
- Motivation: meta-analysis across experiments is lacking because different labs may use different metrics.
- Goal of the study: map alternative sets of ERP spatial and temporal metrics.

Problem Definition
Alternative sets of ERP metrics

Challenges
- Semi-structured data
- Uninformative column headers (string similarity matching does not work)
- Numerical values

Grouping and reordering

Sequence post-processing

Cross-spatial Join
- Process all point-sequence curves.
- Calculate the Euclidean distance between sequences in the Cartesian product set (cross-spatial join).
[Diagram: sequences from Metric Set 1 paired with sequences from Metric Set 2]
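
The cross-spatial join can be sketched as a distance computation over the Cartesian product of the two metric sets, followed by a nearest-neighbor pick. The metric names and sequences are made up, the sequences are assumed to be of equal length after the grouping and post-processing steps, and picking the single nearest sequence is an assumed simplification of the mapping step.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cross_join_distances(set1, set2):
    """Distance for every pair in the Cartesian product of the two metric sets."""
    return {
        (name1, name2): euclidean(seq1, seq2)
        for (name1, seq1), (name2, seq2) in product(set1.items(), set2.items())
    }


def best_matches(set1, set2):
    """Map each metric in set1 to its nearest metric in set2."""
    distances = cross_join_distances(set1, set2)
    return {n1: min(set2, key=lambda n2: distances[(n1, n2)]) for n1 in set1}


# Made-up point sequences for two alternative metric sets.
metric_set1 = {"m1_a": [1.0, 2.0, 3.0], "m1_b": [10.0, 10.0, 10.0]}
metric_set2 = {"m2_x": [10.0, 11.0, 10.0], "m2_y": [1.0, 2.0, 4.0]}
mapping = best_matches(metric_set1, metric_set2)
```

Because the comparison uses only the numeric sequences, it does not depend on column headers, which the slides describe as uninformative.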

Cross-spatial Join

Assumptions and Heuristics
The two datasets contain the same or similar ERP patterns if they come from the same paradigm (e.g., visual or auditory oddball: watching or listening to uncommon or fake words among common words).

Wrong Mappings
Precision = 9/13
The gold-standard mapping falls along the diagonal cells.
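
The precision figure can be reproduced with a small helper: precision is the number of predicted mappings that agree with the gold standard divided by the number of predicted mappings. The 13 metric names and the choice of which four mappings are wrong are invented; only the ratio 9/13 comes from the slide.

```python
def precision(predicted, gold):
    """Share of predicted mappings that agree with the gold-standard mapping."""
    correct = sum(1 for source, target in predicted.items() if gold.get(source) == target)
    return correct / len(predicted)


# Hypothetical names: gold maps metric i of set 1 to metric i of set 2 (the diagonal).
gold = {f"m1_{i}": f"m2_{i}" for i in range(13)}
predicted = dict(gold)
for i in range(4):  # introduce four off-diagonal (wrong) mappings
    predicted[f"m1_{i}"] = f"m2_{i + 1}"
p = precision(predicted, gold)
percent = round(p * 100, 1)
```

Here nine of the thirteen predicted mappings lie on the diagonal, so the precision is 9/13, or about 69.2%.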

Experiment
Design of experiment data:
- 2 simulated “subject groups” (samples): SG1 = sample 1, SG2 = sample 2
- 2 data decompositions: tPCA = temporal PCA decomposition, sICA = spatial ICA (Independent Component Analysis) decomposition
- 2 sets of alternative metrics: m1 = metric set 1, m2 = metric set 2

Experiment Result
Overall precision: 84.6%

NEMO Related Ongoing Work
- Application of our framework to other domains: microRNA, medical informatics, gene databases.
- Mapping discovery and integration across ontologies related to different modalities (e.g., EEG vs. fMRI).

Joint EEG-fMRI Data Mapping

Joint work with: Gwen Frishkoff, Jiawei Rong, Robert Frank, Paea LePendu, Haishan Liu, Allen Malony, and Don Tucker

Thanks for your attention! Any questions?