Development of NeuroElectroMagnetic Ontologies (NEMO): A Framework for Mining Brainwave Ontologies Dejing Dou 1, Gwen Frishkoff 2, Jiawei Rong 1, Robert.

Presentation transcript:

Development of NeuroElectroMagnetic Ontologies (NEMO): A Framework for Mining Brainwave Ontologies. Dejing Dou (1), Gwen Frishkoff (2), Jiawei Rong (1), Robert Frank (3), Allen Malony (1,3) and Don Tucker (3,4). (1) Computer and Information Science, University of Oregon; (2) Learning Research and Development Center, University of Pittsburgh; (3) NeuroInformatics Center, University of Oregon; (4) Electrical Geodesics, Inc. August 2007, KDD’07. Hello, everyone. I am Dejing Dou. Today I am going to introduce our framework for mining brainwave ontologies, which is an important step in developing NEMO (NeuroElectroMagnetic Ontologies). This collaborative research combines data mining, neuroinformatics and ontology engineering. My student Jiawei Rong and I are from Computer and Information Science at the University of Oregon, where we do research on data mining and ontologies. Gwen Frishkoff is a neuroscientist at the University of Pittsburgh. Robert Frank and Allen Malony are from the NeuroInformatics Center at the University of Oregon. Don Tucker is a professor of psychology and also the CEO of EGI (Electrical Geodesics, Inc.), a leading company producing EEG measurement and data analysis systems.

Outline: Background and Related Work (EEG, ERP data and pattern analysis; the motivation of the NEMO project; domain ontologies and ontology mining). Framework for mining domain ontologies. Experiments on ERP Data (preprocessing ERP data with PCA; mining ERP classes and taxonomy with clustering; mining properties and axioms (rules) with classification; discovering axioms among properties with association rule mining). Discussion and Future Work. This is the outline of my talk. I will first introduce some background on EEG, ERP data analysis, the NEMO project, domain ontologies and ontology mining. Then I will describe our framework for mining domain ontologies and report the results of applying it to ERP data, using PCA, clustering, classification and association rule mining to generate ERP classes, properties and axioms. Finally, I will conclude with some discussion and introduce ongoing research and future work in the NEMO project.

EEG data. Electroencephalogram (EEG) data: observing brain functions through EEG. Brain activity occurs in the cortex, and cortical activity generates scalp EEG. EEG data (dense-array, 256 channels) has high temporal (1 msec) / poor spatial resolution (2D); MR imaging (fMRI, PET) has good spatial (3D) / poor temporal resolution (~1.0 sec). EEG stands for electroencephalogram: waveforms measured by placing electrodes (channels) on the scalp of a human subject. As we know, brain activity occurs in the cortex, and cortical activity generates scalp EEG. Compared with the other major tool for studying human brain function, MR (magnetic resonance) imaging, EEG has high temporal resolution, whereas MR imaging has good spatial resolution. The two are complementary.

ERP data and Pattern Analysis. Event-related potentials (ERPs) are created by averaging across segments of EEG data from different trials, time-locked (e.g., every 2 seconds) to stimulus events or responses. Some existing tools (e.g., Net Station, EEGLAB, APECS, the Dien PCA Toolbox) can process ERP data and do pattern analysis. ERP stands for event-related potentials, which are created by averaging across segments of EEG data from different trials, time-locked to stimulus events or responses. Figure (A) shows 128-channel ERPs to visual word and nonword stimuli. Neuroscientists want to find interesting patterns in the ERP data. Figure (B) shows the time course of the P100 pattern extracted by PCA, which represents a peak around 100 milliseconds. Figure (C) is the scalp topography of the P100 pattern; red means positive and blue means negative. Some existing tools, for example Net Station, EEGLAB, APECS and the Dien PCA Toolbox, can process ERP data and do pattern analysis. (A) 128-channel ERPs to visual word and nonword stimuli. (B) Time course for P100 pattern by PCA. (C) Scalp topography (spatial distribution) of P100 pattern.
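As a rough illustration of the averaging step described on this slide, the following sketch averages fixed-length segments of a single-channel recording, each starting at an event sample. The function name `average_erp`, the toy signal and the event positions are assumptions made for illustration; they are not taken from the NEMO tools or from the tools named above.

```python
from statistics import fmean

def average_erp(eeg, event_samples, epoch_len):
    """Average fixed-length EEG segments that start at each event sample."""
    epochs = [eeg[s:s + epoch_len] for s in event_samples
              if s + epoch_len <= len(eeg)]
    if not epochs:
        raise ValueError("no complete epochs")
    # Average across trials, sample by sample.
    return [fmean(samples) for samples in zip(*epochs)]

# Toy single-channel recording (hypothetical values): the same response
# [0, 2, 1] follows each event, plus a trial-specific offset. The offsets
# sum to zero, so they cancel in the average.
response = [0.0, 2.0, 1.0]
offsets = [0.5, -0.5, 0.25, -0.25]
eeg = []
for off in offsets:
    eeg.extend(v + off for v in response)
    eeg.extend([0.0, 0.0])  # gap between trials

events = [0, 5, 10, 15]  # sample index of each stimulus event
erp = average_erp(eeg, events, epoch_len=3)
print(erp)
```

Averaging keeps whatever is consistently time-locked to the event and attenuates activity that varies from trial to trial, which is why the trial offsets disappear here.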

NEMO: NeuroElectroMagnetic Ontologies. However, there are still some challenges in ERP studies. Patterns can be difficult to identify, and definitions vary across research labs. Methods for ERP analysis differ widely across research sites. It is hard to compare and share results across experiments and across labs. The NEMO (NeuroElectroMagnetic Ontologies) project aims to address these challenges by developing ontologies to support ERP data representation, sharing and meta-analysis.

NEMO Architecture. This is the architecture of the whole NEMO project, which combines neuroinformatics, data mining, data integration and ontology-based reasoning. This talk is on the first step in NEMO (ontology mining).

Domain Ontologies. In general, an ontology can be defined as the formal specification of a vocabulary of concepts and the relationships among them in a domain. E.g., Gene Ontology, UMLS, BIRN, NCBO. Ontology languages: OWL, KIF, OKBC … Semantic components of ontologies: classes and class taxonomy; relationships among classes and data types: properties (in OWL) / predicates (in KIF) / slots (in OKBC); arbitrary relationships or constraints among concepts: rules or axioms (in OWL or KIF) / facets (in OKBC) / cardinalities (in OWL). People may use or mean “domain ontology” in different ways. In general, it is the formal specification of a vocabulary of concepts and the relationships among them in a domain. For example, the Gene Ontology, UMLS (Unified Medical Language System), BIRN (Biomedical Informatics Research Network) and NCBO (the National Center for Biomedical Ontology) are successful applications of ontologies in the biomedical domain. Typical ontology languages include OWL, KIF and OKBC, which are based on description logic, first-order logic and frame logic, respectively. The semantic components of ontologies include classes, class taxonomy, properties and axioms. In this paper, we mainly use OWL to represent ontologies.

Ontology Mining. Ontology mining is a process for learning an ontology, including classes, class taxonomy, properties and axioms, from data. Existing ontology mining approaches focus on text mining or web mining (web content, usage, structure, user profiles). Clustering and association rule mining have been used for classes and properties [Li&Zhong @ TKDE 18(4), Maedche&Staab @ EKAW’00, Reinberger et al @ ODBASE’03]. The NetAffx Gene Ontology mining tool has been applied to microarray data [Cheng et al @ Bioinformatics 20 (9)]. Our approach, which is novel, includes hierarchical clustering and classification for mining the class taxonomy, properties and axioms of the first-generation ERP ontology. Ontology mining is a process for learning an ontology from data. The existing ontology mining approaches focus on text mining or web mining. Although there are some applications to biomedical data, such as microarray data, the development and mining of an ERP ontology is novel. We also include hierarchical clustering and classification in order to mine more ontology components, such as the class taxonomy, additional properties and axioms.

Our Framework. A semi-automatic framework for mining domain ontologies. This is our framework for mining domain ontologies. We call it semi-automatic because we believe that domain experts need to give meaningful names to the classes after clustering has divided the data instances into groups, which correspond to the classes and the class taxonomy.

Four General Procedures. Classes <= Clustering-based Classification. Class Taxonomy <= Hierarchical Clustering. Properties <= Classification. Axioms <= Association Rule Mining and Classification. Basically, there are four general procedures. Clustering-based classification can find classes and help domain experts name them. Hierarchical clustering can find the class taxonomy. Classification can also help pick out interesting properties. Association rule mining and classification can find arbitrary axioms.

Experiments on ERP Data Preprocessing Data with Temporal PCA Mining ERP Classes with Clustering-based Classification Mining ERP Class Taxonomy with Hierarchical Clustering Mining Properties and Axioms (Rules) with Classification Discovering Axioms among Properties with Association Rules Mining When applying our framework to ERP data, the experiments include Preprocessing Data with Temporal PCA, Mining ERP Classes with Clustering, Mining ERP Class Taxonomy with Hierarchical Clustering, Mining Properties and Axioms (Rules) with Clustering-based Classification, Discovering Axioms among Properties with Association Rules Mining.

Input Raw ERP data. Sampling rate: 250 Hz for 1500 ms (375 samples). Subject Condition Channel# Time1(µV) Time2(µV) Time3(µV) Time4(µV) Time5(µV) Time6(µV) S01 A 1 0.077 0.136 0.075 0.095 0.188 0.097 2 0.891 1.780 0.895 0.805 1.612 0.813 3 0.014 0.018 0.013 0.040 0.066 0.035 4 0.657 1.309 0.789 1.571 0.785 5 0.437 0.864 0.432 1.007 2.002 1.003 B 0.303 0.603 0.128 0.250 0.123 0.477 0.951 0.483 0.418 0.841 0.538 0.073 0.038 0.029 0.043 0.022 0.509 1.061 0.533 0.628 1.254 0.626 1.497 1.024 0.510 0.218 0.434 0.219 S02 1.275 2.987 1.500 0.382 0.769 0.386 0.666 2.555 1.281 0.326 0.648 0.329 0.673 1.321 1.026 2.051 1.029 0.284 1.341 0.678 1.966 3.914 0.980 0.564 0.292 0.511 1.012 0.507 0.367 1.960 0.978 1.741 3.486 1.739 0.721 0.365 1.470 2.934 1.472 0.568 1.729 0.866 1.342 2.680 1.337 0.149 1.134 0.575 0.210 0.423 0.215 0.042 0.287 0.151 0.433 0.860 This table shows the raw ERP data. Each tuple corresponds to one waveform, sampled at 250 Hz, and is associated with one human subject, one experimental condition and one EEG channel. Experiments 1-2: 89 subjects and 6 experimental conditions. Experiment 3: 36 subjects and 4 experimental conditions.

Data Preprocessing (1): Temporal PCA Decomposition. Figure: complex waveform = component 1 + component 2. The complex waveform in each tuple can be decomposed into multiple components by temporal PCA. PCA extracts as many factors (components) as there are variables (i.e., the number of samples). We retain the first 15 PCA factors, which account for most of the variance (> 75%). The remaining factors are assumed to contain “noise”.
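In symbols, the decomposition on this slide can be sketched as follows. The notation is an assumption introduced here for illustration: $x_i(t)$ is the mean-centered waveform of tuple $i$, $c_k(t)$ is the time course of the $k$-th PCA factor, $s_{ik}$ is the score of waveform $i$ on that factor, $\lambda_k$ is the variance explained by factor $k$ (in decreasing order), and $T = 375$ is the number of time samples per waveform.

```latex
% Full decomposition into T factors, truncated to the first 15
x_i(t) \;=\; \sum_{k=1}^{T} s_{ik}\, c_k(t)
       \;\approx\; \sum_{k=1}^{15} s_{ik}\, c_k(t),
\qquad T = 375

% Retention criterion reported on the slide
\frac{\sum_{k=1}^{15} \lambda_k}{\sum_{k=1}^{T} \lambda_k} \;>\; 0.75
```

The truncation error is the part of each waveform carried by factors 16 through $T$, which the slide treats as noise.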

Data Preprocessing (2) Intensity, spatial, temporal and functional metrics (attributes) for each factor The intensity, spatial, temporal and functional metrics (attributes) for each factor can be summarized in this table.

ERP Factors after PCA Decomposition

TI-max (µs)   IN-mean (ROI) (µV)   IN-mean (ROCC) (µV)   ...   SP-min (channel#)
128            4.2823               4.7245               …     24
96             1.2223               1.3955               …     62
164           -6.6589              -4.7608               …     59
220           -3.635               -2.0782               …     58
244           -0.81322              0.29263              …     65

There are actually 25 attributes for each factor; the table above shows what the data look like after PCA decomposition. For Experiment 1 data, number of factors = (474) (594). For Experiment 2 data, number of factors = (588) (598). For Experiment 3 data, number of factors = 708.

Mining ERP Classes with Clustering (1). We use EM (Expectation-Maximization) clustering. E.g., for Experiment 1 group 2 data: Cluster/ Pattern 1 2 3 P100 76 N100 117 54 lateN1/N2 13 14 104 P300 61 110 42 To mine ERP classes, we first use EM clustering to divide the ERP factors into several groups. When the clusters are compared with the pattern names assigned by rules that domain experts defined, the majority of factors in each cluster correspond closely to a single expert-labeled pattern. Therefore, domain experts can easily assign a class name to each cluster (class).
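The two steps on this slide, EM clustering followed by naming each cluster after its majority expert label, can be sketched roughly as below. This is a minimal one-dimensional Gaussian mixture fitted by EM in plain Python, not the tool or configuration used in the experiments. The function names, the latency values and the expert labels are all hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance

def em_gmm_1d(data, init_means, n_iter=50, var_floor=1e-3):
    """Fit a 1-D Gaussian mixture by EM; return (means, hard cluster labels)."""
    k = len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    variances = [pvariance(data)] * k  # broad start avoids underflow
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels

def majority_labels(cluster_ids, expert_labels):
    """Name each cluster after the expert label that occurs most often in it."""
    votes = {}
    for c, lab in zip(cluster_ids, expert_labels):
        votes.setdefault(c, Counter())[lab] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}

# Hypothetical peak latencies (ms) for ten factors, with hypothetical
# rule-based expert labels; one factor in the first group is labeled N100.
latencies = [96, 100, 104, 98, 102, 296, 300, 304, 298, 302]
expert = ["P100", "P100", "P100", "P100", "N100"] + ["P300"] * 5

means, clusters = em_gmm_1d(latencies, init_means=[90, 310])
names = majority_labels(clusters, expert)
print(means, names)
```

The real data have 25 attributes per factor, so a multivariate mixture would be needed in practice; the one-dimensional case is used here only to keep the E-step and M-step visible.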

Mining ERP Classes with Clustering (2) We use OWL to represent ERP Classes Then we can use OWL to represent ERP classes.

Mining ERP Class Taxonomy with Hierarchical Clustering. We use EM clustering in both divisive and agglomerative ways. E.g., for Experiment 3 data. To study the hierarchy of ERP classes, we further use EM clustering in both divisive and agglomerative ways. The results are consistent. It is interesting from a neuroscience point of view that some patterns are hard to separate, such as N3 and P1r. It is possible that patterns previously assigned different labels in the ERP literature reflect one and the same underlying process.

Mining ERP Class Taxonomy with Hierarchical Clustering. We use OWL to represent the class taxonomy. It is easy to represent the class taxonomy in OWL with “subClassOf”.
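To make the “subClassOf” idea concrete, here is a small sketch that renders a child-to-parent mapping as OWL class declarations in Turtle syntax. The namespace `http://example.org/erp#`, the parent class name `ERPPattern` and the flat taxonomy are placeholders chosen for illustration; the actual taxonomy mined in the experiments is shown only in the slide figure and is not reproduced here.

```python
def taxonomy_to_turtle(parents, prefix="erp", base="http://example.org/erp#"):
    """Render a {child: parent} mapping as OWL classes in Turtle syntax."""
    lines = [
        f"@prefix {prefix}: <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    # Declare every class that appears as a child or as a parent.
    for name in sorted(set(parents) | set(parents.values())):
        lines.append(f"{prefix}:{name} a owl:Class .")
    # One rdfs:subClassOf triple per taxonomy edge.
    for child in sorted(parents):
        lines.append(f"{prefix}:{child} rdfs:subClassOf {prefix}:{parents[child]} .")
    return "\n".join(lines)

# Hypothetical flat taxonomy using pattern names mentioned in the talk.
taxonomy = {"P100": "ERPPattern", "N100": "ERPPattern", "P300": "ERPPattern"}
turtle = taxonomy_to_turtle(taxonomy)
print(turtle)
```

A deeper hierarchy from divisive or agglomerative clustering would simply add more edges to the mapping, for example an intermediate class that groups two patterns that are hard to separate.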

Mining Properties and Axioms with Clustering-based Classification (1) We use decision tree learning (C4.5) to do classification with the training data labeled by clustering results. The useful properties and axioms to define each class (cluster) can be mined by decision tree learning after each factor is labeled with a cluster or a class name.

Mining Properties and Axioms with Clustering-based Classification (2). We use OWL to represent datatype properties, which are based on the attributes with high information gain (for example, the top 6 attributes in our ERP experiments).
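Ranking attributes by information gain can be sketched as follows. This computes the plain information gain of one fixed binary split per attribute; it does not reproduce C4.5's threshold search or gain ratio. The attribute names come from the factor table in this talk, while the values, thresholds and cluster labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels, threshold):
    """Information gain of the binary split value <= threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder

# Hypothetical factors labeled by cluster name.
labels = ["P100", "P100", "P300", "P300"]
factors = {
    "TI-max": [100, 110, 300, 310],  # separates the two classes
    "SP-min": [24, 62, 24, 62],      # carries no class information here
}
thresholds = {"TI-max": 200, "SP-min": 40}  # assumed split points

gains = {a: info_gain(v, labels, thresholds[a]) for a, v in factors.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
print(gains, ranked)
```

Attributes at the top of such a ranking are the candidates for datatype properties; the cut-off (top 6 in the experiments) is a choice made by the authors.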

Mining Properties and Axioms with Clustering-based Classification (3). We use SWRL to represent axioms. In FOL: The axioms can be represented in the Semantic Web Rule Language (SWRL), which uses OWL syntax. In general FOL, the same rule can be represented as forall factors, …

Discovering Axioms among Properties with Association Rule Mining. We use the Apriori algorithm to find association rules among properties. The split points are determined by the classification rules. In FOL, they look like: As the last procedure, we can find the axioms among properties by using association rule mining. The split points are determined by the classification rules. To save space, we use only FOL to represent the axioms.
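A compact sketch of Apriori over discretized properties is given below. Each factor is treated as a transaction whose items are conditions such as `TI-max<=200`; the item names, split points, data and thresholds are hypothetical, and the code is a generic textbook version of Apriori rather than the implementation used in the experiments.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def rules(frequent, min_conf):
    """List (antecedent, consequent, confidence) for single-consequent rules."""
    out = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for c in itemset:
            ante = itemset - {c}
            conf = sup / frequent[ante]  # subsets of frequent sets are frequent
            if conf >= min_conf:
                out.append((ante, c, conf))
    return out

# Hypothetical factors, each described by discretized property conditions.
transactions = [
    frozenset({"TI-max<=200", "IN-mean(ROI)>0"}),
    frozenset({"TI-max<=200", "IN-mean(ROI)>0"}),
    frozenset({"TI-max<=200", "IN-mean(ROI)>0", "SP-min<=40"}),
    frozenset({"TI-max>200", "SP-min<=40"}),
]
frequent = apriori(transactions, min_support=0.5)
found = rules(frequent, min_conf=0.9)
print(found)
```

Each rule that survives the confidence threshold is a candidate axiom among properties, to be reviewed before it is written into the ontology.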

Rule Optimization. Idea: (A → B) ∧ (A ∧ B → C) ⇒ (A → C). Besides representing axioms among properties, those axioms can also be used to optimize (prune) the classification rules. The basic idea is (A → B) ∧ (A ∧ B → C) ⇒ (A → C).
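The pruning idea can be checked and applied with a few lines of code. The first part verifies by truth table that (A → B) ∧ (A ∧ B → C) ⇒ (A → C) holds for every assignment; the second part drops conditions from a rule's antecedent when a known implication already derives them from the remaining conditions. The function name `prune_antecedent` and the symbolic conditions are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q

# Truth-table check of (A -> B) and (A and B -> C)  =>  (A -> C).
valid = all(
    implies(implies(a, b) and implies(a and b, c), implies(a, c))
    for a, b, c in product([False, True], repeat=3)
)

def prune_antecedent(antecedent, implications):
    """Drop conditions that known implications derive from the remaining ones."""
    kept = set(antecedent)
    for cond in sorted(antecedent):
        rest = kept - {cond}
        # cond is redundant if some rule derives it from what is left.
        if any(body <= rest and head == cond for body, head in implications):
            kept = rest
    return frozenset(kept)

# Known axiom among properties: A -> B (symbolic placeholders).
implications = [(frozenset({"A"}), "B")]
# Classification rule: A and B -> C. Its antecedent simplifies to A.
pruned = prune_antecedent({"A", "B"}, implications)
print(valid, sorted(pruned))
```

Because each check is made against the conditions still kept, two mutually implying conditions are never both removed.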

A Partial View of the Mined ERP Ontology. Our first-generation ERP ontology consists of 16 classes, 57 properties and 23 axioms. This is a partial view of the mined ERP ontology. It shows the classes and taxonomy, the datatype properties, and the property between pattern and factor.

Future/Ongoing Work Refinements of clustering process because some patterns “split” to multiple clusters Additional metrics may be necessary “Gold Standard” by expert labeling Application of our framework to other paradigms Other data preprocessing techniques besides PCA (e.g., microstate analysis, segmented regression) Other experiment paradigms (e.g., auditory, nonlinguistic) Ontology-based data integration Ontology (semantic) databases: ontology-based data modeling Mapping mining across ontologies related to different spaces (surface vs. source), different studies (e.g., EEG vs. MEG) and different labs. To refine the clustering process, we may need additional metrics (attributes) and gold standard (manual labeling) from domain experts. We want to try other data preprocessing techniques besides PCA, such as microstate analysis and segmented regression. We are designing ERP databases based on the mined ontology, which we believe is an important step for semantic query answering. To realize the data sharing for neuroscience data, we need to find a way to mine the mappings across ontologies related to different spaces, different studies and different labs.

Thank you for your attention! Any questions? The poster session is this evening (Monday). Poster board number: 25.