Published by Ashlie Banks. Modified over 6 years ago.
1
Development of NeuroElectroMagnetic Ontologies (NEMO): A Framework for Mining Brainwave Ontologies
Dejing Dou 1, Gwen Frishkoff 2, Jiawei Rong 1, Robert Frank 3, Allen Malony 1,3 and Don Tucker 3,4
1 Computer and Information Science, University of Oregon
2 Learning Research and Development Center, University of Pittsburgh
3 NeuroInformatics Center, University of Oregon
4 Electrical Geodesics, Inc.
August, KDD’07
Hello, everyone. I am Dejing Dou. Today I am going to introduce our framework for mining brainwave ontologies, which is an important step in developing NEMO (NeuroElectroMagnetic Ontologies). This collaborative research combines data mining, neuroinformatics and ontology engineering. My student Jiawei Rong and I are from Computer and Information Science at the University of Oregon, where we do research on data mining and ontologies. Gwen Frishkoff is a neuroscientist at the University of Pittsburgh. Robert Frank and Allen Malony are from the NeuroInformatics Center at the University of Oregon. Don Tucker is a professor of psychology and also the CEO of Electrical Geodesics, Inc. (EGI), a leading company producing EEG measurement and data analysis systems.
2
Outline
Background and Related Work
  EEG, ERP data and pattern analysis
  The motivation of the NEMO project
  Domain ontologies and ontology mining
Framework for mining domain ontologies
Experiments on ERP Data
  Preprocessing ERP data with PCA
  Mining ERP classes and taxonomy with clustering
  Mining properties and axioms (rules) with classification
  Discovering axioms among properties with association rule mining
Discussion and Future Work
This is the outline of my talk. I will first introduce some background on EEG, ERP data analysis, the NEMO project, domain ontologies and ontology mining. Then I will describe our framework for mining domain ontologies and report the results of applying it to ERP data, using PCA, clustering, classification and association rule mining to generate ERP classes, properties and axioms. Finally, I will conclude with some discussion and introduce ongoing research and future work in the NEMO project.
3
EEG data Electroencephalogram (EEG) data
Observing Brain Functions through EEG
Brain activity occurs in cortex, and cortex activity generates scalp EEG.
EEG data (dense-array, 256 channels) has high temporal resolution (1 msec) but poor spatial resolution (2D); MR imaging (fMRI, PET) has good spatial resolution (3D) but poor temporal resolution (~1.0 sec).
EEG stands for electroencephalogram: waveforms measured by placing electrodes (channels) on the scalp of a human subject. As we know, brain activity occurs in cortex, and cortex activity generates scalp EEG. Compared with the other major tool for studying human brain function, MR (magnetic resonance) imaging, EEG has high temporal resolution while MR has good spatial resolution. The two are complementary.
4
ERP data and Pattern Analysis
Event-related potentials (ERPs) are created by averaging across segments of EEG data from different trials, time-locked (e.g., every 2 seconds) to stimulus events or responses.
Some existing tools (e.g., Net Station, EEGLAB, APECS, the Dien PCA Toolbox) can process ERP data and do pattern analysis.
(A) 128-channel ERPs to visual word and nonword stimuli. (B) Time course for P100 pattern by PCA. (C) Scalp topography (spatial distribution) of P100 pattern.
ERP stands for event-related potentials. Figure (A) shows 128-channel ERPs to visual word and nonword stimuli. Neuroscientists want to find interesting patterns in the ERP data. Figure (B) shows the time course of the P100 pattern extracted by PCA, which represents a peak around 100 milliseconds. Figure (C) shows the scalp topography of the P100 pattern; red means positive and blue means negative.
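To make the averaging step concrete, here is a minimal sketch, not the code used in the project. It assumes a single-channel recording stored as a plain list and a list of sample indices at which events occur; the function name `average_erp` and the toy values are hypothetical.

```python
from statistics import fmean

def average_erp(eeg, event_samples, epoch_len):
    """Average fixed-length EEG segments, each time-locked to one event."""
    # Cut one segment (epoch) per event; skip events too close to the end.
    epochs = [eeg[s:s + epoch_len] for s in event_samples
              if s + epoch_len <= len(eeg)]
    if not epochs:
        raise ValueError("no complete epochs")
    # Average across trials, giving one value per time point in the epoch.
    return [fmean(column) for column in zip(*epochs)]

# Toy single-channel recording (hypothetical values, in microvolts).
signal = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 3.0, 4.0, 3.0, 0.0]
erp = average_erp(signal, event_samples=[1, 6], epoch_len=3)
print(erp)
```

Activity that is not time-locked to the event tends to cancel out in the average, which is why the averaged waveform is easier to analyze than single trials.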
5
NEMO: NeuroElectroMagnetic Ontologies
Some challenges in ERP study
Patterns can be difficult to identify, and definitions vary across research labs.
Methods for ERP analysis differ widely across research sites.
It is hard to compare and share results across experiments and across labs.
The NEMO (NeuroElectroMagnetic Ontologies) project addresses those challenges by developing ontologies to support ERP data representation, sharing and meta-analysis.
However, there are still some challenges in ERP studies: patterns are hard to identify and are defined differently from lab to lab, analysis methods differ widely across sites, and results are therefore hard to compare and share. The NEMO project addresses these challenges by developing ontologies that support ERP data representation, sharing and meta-analysis.
6
NEMO Architecture
This is the architecture of the whole NEMO project, which combines neuroinformatics, data mining, data integration and ontology-based reasoning. This talk is about the first step in NEMO (ontology mining).
7
Domain Ontologies
In general, an ontology can be defined as the formal specification of a vocabulary of concepts and the relationships among them in a domain.
E.g., Gene Ontology, UMLS, BIRN, NCBO
Ontology languages: OWL, KIF, OKBC …
Semantic components of ontologies
  Classes and class taxonomy
  Relationships among classes and data types: Properties (in OWL) / Predicates (in KIF) / Slots (in OKBC)
  Arbitrary relationships or constraints among concepts: Rules or Axioms (in OWL or KIF) / Facets (in OKBC) / Cardinalities (in OWL)
People may use or mean “domain ontology” in different ways. In general, it is the formal specification of a vocabulary of concepts and the relationships among them in a domain. For example, the Gene Ontology, UMLS (Unified Medical Language System), BIRN (Biomedical Informatics Research Network) and NCBO (the National Center for Biomedical Ontology) are successful applications of ontologies in the biomedical domain. Typical ontology languages include OWL, KIF and OKBC, which are based on description logic, first-order logic and frame logic, respectively. The semantic components of ontologies include classes, class taxonomy, properties and axioms. In this paper, we mainly use OWL to represent ontologies.
8
Ontology Mining
Ontology mining is a process for learning an ontology, including classes, class taxonomy, properties and axioms, from data.
Existing ontology mining approaches focus on text mining or web mining (web content, usage, structure, user profiles). Clustering and association rule mining have been used for classes and properties [… TKDE 18(4); … EKAW’00; Reinberger et al., ODBASE’03].
The NetAffx Gene Ontology mining tool is applied to microarray data [Cheng et al., Bioinformatics 20(9)].
Our approach includes hierarchical clustering and classification for mining the class taxonomy, properties and axioms of the first-generation ERP ontology, which is novel.
Ontology mining is a process for learning an ontology from data. The existing ontology mining approaches focus on text mining or web mining. Although there are some applications to biomedical data, such as microarray data, the development and mining of an ERP ontology is novel. We also include hierarchical clustering and classification in order to mine more ontology components, such as the class taxonomy, more properties and axioms.
9
Our Framework
A semi-automatic framework for mining domain ontologies
This is our framework for mining domain ontologies. We treat it as semi-automatic because we believe that domain experts need to give meaningful names to the classes after clustering divides the data instances into groups, which correspond to the classes and the class taxonomy.
10
Four General Procedures
Classes <= Clustering-based Classification
Class Taxonomy <= Hierarchical Clustering
Properties <= Classification
Axioms <= Association Rule Mining and Classification
Basically, there are four general procedures. Clustering-based classification can find classes and help domain experts name them. Hierarchical clustering can find the class taxonomy. Classification can also help pick out interesting properties. Association rule mining and classification can find arbitrary axioms.
11
Experiments on ERP Data
Preprocessing Data with Temporal PCA
Mining ERP Classes with Clustering-based Classification
Mining ERP Class Taxonomy with Hierarchical Clustering
Mining Properties and Axioms (Rules) with Classification
Discovering Axioms among Properties with Association Rule Mining
When applying our framework to ERP data, the experiments include preprocessing data with temporal PCA, mining ERP classes with clustering, mining the ERP class taxonomy with hierarchical clustering, mining properties and axioms (rules) with clustering-based classification, and discovering axioms among properties with association rule mining.
12
Input Raw ERP data
Sampling rate: 250 Hz for 1500 ms (375 samples)
Experiment 1-2: 89 subjects and 6 experiment conditions
Experiment 3: 36 subjects and 4 experiment conditions
Subject  Condition  Channel#  Time1(µV)  Time2(µV)  Time3(µV)  Time4(µV)  Time5(µV)  Time6(µV)
S01      A          1         0.077      0.136      0.075      0.095      0.188      0.097
S01      A          2         0.891      1.780      0.895      0.805      1.612      0.813
S01      A          3         0.014      0.018      0.013      0.040      0.066      0.035
The remaining rows (S01 condition A channels 4 and 5, S01 condition B, and S02) lost their cell alignment in extraction: 4 0.657 1.309 0.789 1.571 0.785 5 0.437 0.864 0.432 1.007 2.002 1.003 B 0.303 0.603 0.128 0.250 0.123 0.477 0.951 0.483 0.418 0.841 0.538 0.073 0.038 0.029 0.043 0.022 0.509 1.061 0.533 0.628 1.254 0.626 1.497 1.024 0.510 0.218 0.434 0.219 S02 1.275 2.987 1.500 0.382 0.769 0.386 0.666 2.555 1.281 0.326 0.648 0.329 0.673 1.321 1.026 2.051 1.029 0.284 1.341 0.678 1.966 3.914 0.980 0.564 0.292 0.511 1.012 0.507 0.367 1.960 0.978 1.741 3.486 1.739 0.721 0.365 1.470 2.934 1.472 0.568 1.729 0.866 1.342 2.680 1.337 0.149 1.134 0.575 0.210 0.423 0.215 0.042 0.287 0.151 0.433 0.860
This table shows the raw ERP data. Each tuple corresponds to one waveform, sampled at 250 Hz, for one human subject, one experimental condition and one EEG channel.
13
Data Preprocessing (1)
Temporal PCA Decomposition
[Figure: complex waveform = component + component]
A complex waveform in each tuple can be decomposed into multiple components by temporal PCA. PCA extracts as many factors (components) as there are variables (i.e., the number of samples). We retain the first 15 PCA factors, which account for most of the variance (> 75%). The remaining factors are assumed to contain “noise”.
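The retention step can be illustrated with a small sketch. It is an illustration under stated assumptions, not the project's implementation: it uses plain power iteration with deflation instead of a numerical library, treats time samples as variables and waveforms as observations, and runs on a tiny hypothetical data set with 3 time samples rather than 375. All function names are made up for this example.

```python
import math

def covariance(rows):
    """Sample covariance matrix; rows are waveforms, columns are time samples."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpairs(matrix, k, iters=200):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric matrix,
    found by power iteration with deflation."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # fixed, non-uniform start vector
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next search.
        for i in range(d):
            for j in range(d):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def n_factors_for_variance(eigenvalues, total_variance, threshold):
    """Smallest number of leading factors whose variance share exceeds threshold."""
    running = 0.0
    for count, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total_variance > threshold:
            return count
    return len(eigenvalues)

# Hypothetical toy data: 4 waveforms, 3 time samples each.
waveforms = [[2.0, 2.0, 0.0], [-2.0, -2.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
cov = covariance(waveforms)
total = sum(cov[i][i] for i in range(len(cov)))   # total variance = trace
pairs = top_eigenpairs(cov, k=2)
eigenvalues = [lam for lam, _ in pairs]
kept = n_factors_for_variance(eigenvalues, total, threshold=0.75)
print(eigenvalues, kept)
```

In the toy data one factor already explains more than 75% of the variance, so `kept` is 1; on the real 375-sample waveforms the same criterion led us to keep 15 factors.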
14
Data Preprocessing (2)
Intensity, spatial, temporal and functional metrics (attributes) for each factor
The intensity, spatial, temporal and functional metrics (attributes) for each factor are summarized in this table.
15
ERP Factors after PCA Decomposition
TI-max (µs)  IN-mean (ROI) (µV)  IN-mean (ROCC) (µV)  ...  SP-min (channel#)
128          4.2823              4.7245               …    24
96           1.2223              1.3955               …    62
The remaining rows lost their cell alignment in extraction: 164 59 220 -3.635 58 244 65
There are actually 25 attributes for each factor. The data table after PCA decomposition looks like this.
For Experiment 1 data, number of Factors = (474) (594)
For Experiment 2 data, number of Factors = (588) (598)
For Experiment 3 data, number of Factors = 708
16
Mining ERP Classes with Clustering (1)
We use EM (Expectation-Maximization) clustering
E.g., for Experiment 1 group 2 data (counts of factors per cluster; the cluster column alignment was lost in extraction):
Cluster/Pattern: 1 2 3
P100: 76
N100: 117 54
lateN1/N2: 13 14 104
P300: 61 110 42
To mine ERP classes, we first use EM clustering to divide the ERP factors into several groups. When the clusters are compared with the pattern names assigned by rules that domain experts defined, the majority of factors in each cluster correspond to a single expert-labeled pattern. Therefore, domain experts can easily assign a class name to each cluster (class).
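For readers unfamiliar with EM clustering, the following is a minimal one-dimensional sketch, assuming a Gaussian mixture over a single attribute. The real factors have 25 attributes and we did not use this code; the function name `em_gmm_1d`, the starting means and the peak-latency values are all hypothetical.

```python
import math

def em_gmm_1d(xs, init_means, iters=50):
    """EM for a 1-D Gaussian mixture. Returns final means and hard labels."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point,
        # computed in log space to avoid underflow for distant points.
        resp = []
        for x in xs:
            log_d = [math.log(w) - 0.5 * math.log(2 * math.pi * v)
                     - (x - m) ** 2 / (2 * v)
                     for w, m, v in zip(weights, means, variances)]
            top = max(log_d)
            unnorm = [math.exp(l - top) for l in log_d]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # empty component: leave its parameters unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
            weights[j] = nj / len(xs)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels

# Hypothetical peak latencies (ms) for six factors.
latencies = [96.0, 100.0, 104.0, 164.0, 170.0, 176.0]
means, labels = em_gmm_1d(latencies, init_means=[100.0, 170.0])
print(means, labels)
```

Each resulting cluster can then be cross-tabulated against the expert-rule labels, as in the table above, to decide which class name the cluster should receive.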
17
Mining ERP Classes with Clustering (2)
We use OWL to represent the ERP classes obtained from clustering.
18
Mining ERP Class Taxonomy with Hierarchical Clustering
We use EM clustering in both divisive and agglomerative ways.
E.g., for Experiment 3 data
To study the hierarchy of ERP classes, we further use EM clustering in both divisive and agglomerative ways. The results are consistent. It is interesting for neuroscience that some patterns are hard to separate, such as N3 and P1r. It is possible that patterns previously assigned different labels in the ERP literature reflect one and the same underlying process.
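The agglomerative direction can be sketched as follows. Note that this sketch swaps in a much simpler technique than the one we used: it merges clusters by centroid distance on a single attribute instead of re-running EM, and the latency values attached to the pattern names are hypothetical. It is only meant to show how a sequence of merges yields a taxonomy.

```python
def agglomerate(points):
    """Bottom-up clustering of named 1-D points by centroid distance.
    Returns the merges in the order they happen."""
    clusters = {name: [value] for name, value in points.items()}
    merges = []
    while len(clusters) > 1:
        names = sorted(clusters)
        best = None
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                centroid_a = sum(clusters[a]) / len(clusters[a])
                centroid_b = sum(clusters[b]) / len(clusters[b])
                dist = abs(centroid_a - centroid_b)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        # The merged cluster becomes a parent node of a and b in the taxonomy.
        clusters[f"({a}+{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b))
    return merges

# Hypothetical mean peak latencies (ms) per pattern.
patterns = {"P100": 100.0, "N100": 150.0, "lateN1/N2": 210.0, "P300": 300.0}
merges = agglomerate(patterns)
for a, b in merges:
    print(a, "+", b)
```

Patterns that merge early are the ones that are hard to separate, which is the situation described above for N3 and P1r.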
19
Mining ERP Class Taxonomy with Hierarchical Clustering
We use OWL to represent the class taxonomy; the subclass relation (rdfs:subClassOf) expresses it directly.
20
Mining Properties and Axioms with Clustering-based Classification (1)
We use decision tree learning (C4.5) to do classification, with the training data labeled by the clustering results.
After each factor is labeled with a cluster (class) name, decision tree learning can mine the properties and axioms that are useful for defining each class (cluster).
21
Mining Properties and Axioms with Clustering-based Classification (2)
We use OWL to represent datatype properties, which are based on the attributes with high information gain (e.g., the top 6 attributes in our ERP experiments).
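As a reminder of how attributes are ranked, here is a minimal sketch of information gain for one numeric attribute and one split point. The attribute names TI-max and SP-min come from our table, but the values, labels and thresholds below are hypothetical, and the helper names are made up for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric attribute at threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder

# Hypothetical factors, each labeled with the class name of its cluster.
classes = ["P100", "P100", "P100", "P300", "P300", "P300"]
ti_max = [96, 100, 128, 220, 244, 300]   # peak latency
sp_min = [24, 62, 59, 58, 65, 24]        # channel of minimum amplitude

gain_ti = information_gain(ti_max, classes, threshold=150)
gain_sp = information_gain(sp_min, classes, threshold=58.5)
print(gain_ti, gain_sp)
```

In this toy case the latency attribute separates the two classes perfectly, so it would be ranked above the spatial attribute and kept as a datatype property.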
22
Mining Properties and Axioms with Clustering-based Classification (3)
We use SWRL to represent axioms
In FOL:
The axioms can be represented in the Semantic Web Rule Language (SWRL), which uses OWL syntax. In general FOL, the same rule can be represented as: forall factors, …
23
Discovering Axioms among Properties with Association Rule Mining
We use the Apriori algorithm to find association rules among properties. The split points are determined by classification rules.
In FOL, they look like:
As the last procedure, we can find the axioms among properties by using association rule mining. The split points are determined by classification rules. To save space, we use only FOL to represent the axioms.
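The Apriori procedure can be sketched in a few lines. This is an illustrative version, not the tool we ran: each factor is turned into a set of discretized property tests, and the item names and split points below are hypothetical stand-ins for the ones produced by the classification rules.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search. Returns {itemset: support}."""
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    singles = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in singles
             if support(frozenset([i])) >= min_support]
    frequent = {}
    while level:
        frequent.update({s: support(s) for s in level})
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent
                        for s in combinations(c, len(c) - 1))
                 and support(c) >= min_support]
    return frequent

def association_rules(frequent, min_confidence):
    """Rules lhs -> rhs with confidence = support(lhs and rhs) / support(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules

# Hypothetical discretized properties of four factors.
A, B, C = "TI-max<=150", "IN-mean(ROI)>0", "SP-min<=60"
factors = [{A, B, C}, {A, B, C}, {A, B}, {C}]
frequent = frequent_itemsets(factors, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.9)
for lhs, rhs, confidence in rules:
    print(set(lhs), "->", set(rhs), confidence)
```

Each surviving rule relates one property test to another and can then be written as an axiom among properties.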
24
Rule Optimization
Idea: (A → B) ∧ (A ∧ B → C) ⟹ (A → C)
Besides representing axioms among properties, those axioms can also be used to optimize (prune) the classification rules. The basic idea is that if A implies B, then B is redundant in the antecedent of a rule A ∧ B → C, which can be shortened to A → C.
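The pruning idea can be checked and applied with a short sketch. The truth-table check confirms that the inference is valid for all truth values; the `prune_antecedent` helper is a hypothetical illustration of how a known axiom A → B removes the redundant condition B from a rule, not the procedure we implemented.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

# Truth-table check: (A -> B) and (A and B -> C) together entail (A -> C).
valid = all(
    implies(implies(a, b) and implies(a and b, c), implies(a, c))
    for a, b, c in product([False, True], repeat=3)
)

def prune_antecedent(antecedent, known_rules):
    """Drop condition B from an antecedent when A -> B is a known axiom
    and A is still part of the antecedent."""
    pruned = set(antecedent)
    for lhs, rhs in known_rules:
        if lhs != rhs and lhs in pruned and rhs in pruned:
            pruned.discard(rhs)
    return pruned

# Classification rule A and B -> C, with the mined axiom A -> B.
shorter = prune_antecedent({"A", "B"}, known_rules=[("A", "B")])
print(valid, shorter)
```

Because the check is made against the already pruned antecedent, a pair of axioms A → B and B → A removes only one of the two conditions, never both.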
25
A Partial View of the Mined ERP Ontology
Our first-generation ERP ontology consists of 16 classes, 57 properties and 23 axioms.
This partial view of the mined ERP ontology shows the classes and taxonomy, the datatype properties, and the property between pattern and factor.
26
Future/Ongoing Work
Refinements of the clustering process, because some patterns “split” into multiple clusters
  Additional metrics may be necessary
  “Gold Standard” by expert labeling
Application of our framework to other paradigms
  Other data preprocessing techniques besides PCA (e.g., microstate analysis, segmented regression)
  Other experiment paradigms (e.g., auditory, nonlinguistic)
Ontology-based data integration
  Ontology (semantic) databases: ontology-based data modeling
  Mining mappings across ontologies related to different spaces (surface vs. source), different studies (e.g., EEG vs. MEG) and different labs
To refine the clustering process, we may need additional metrics (attributes) and a gold standard (manual labeling) from domain experts. We want to try other data preprocessing techniques besides PCA, such as microstate analysis and segmented regression. We are designing ERP databases based on the mined ontology, which we believe is an important step toward semantic query answering. To enable sharing of neuroscience data, we need to find a way to mine the mappings across ontologies related to different spaces, different studies and different labs.
27
Thank you for your attention! Any questions?
The poster session is on Monday evening (today). Poster board number: 25.