1 Data Integration and Extraction over Molecular Biological Data. Cui Tao. Supported by NSF.
2 Motivation. Online biological data: highly diverse in granularity and variety; various formats; different terminologies, ID systems, and units.
3 How to Build a Gene Extraction Ontology? Concepts; relationship sets; constraints; data frames.
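The four ingredients named on this slide can be pictured as a small data structure. The following is a minimal sketch, not the author's implementation: the class names (`DataFrame`, `RelationshipSet`, `ExtractionOntology`), their fields, and the two example patterns are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataFrame:
    """Recognizer for the values of one concept (hypothetical structure)."""
    value_pattern: str                                   # regex describing valid values
    keywords: List[str] = field(default_factory=list)    # context words hinting at the concept

    def recognize(self, text: str) -> List[str]:
        # Return every substring of `text` that matches the value pattern.
        return [m.group(0) for m in re.finditer(self.value_pattern, text)]


@dataclass
class RelationshipSet:
    """Named link between two concepts, with an optional cardinality constraint."""
    source: str
    target: str
    name: str
    max_targets: Optional[int] = None    # None means no upper bound


@dataclass
class ExtractionOntology:
    concepts: Dict[str, DataFrame] = field(default_factory=dict)
    relationships: List[RelationshipSet] = field(default_factory=list)

    def violations(self, extracted: Dict[str, List[str]]) -> List[str]:
        """Names of relationship sets whose target concept has too many values."""
        bad = []
        for rel in self.relationships:
            values = extracted.get(rel.target, [])
            if rel.max_targets is not None and len(values) > rel.max_targets:
                bad.append(rel.name)
        return bad


# Illustrative ontology with two concepts and one constrained relationship set.
ontology = ExtractionOntology()
ontology.concepts["Gene Name"] = DataFrame(r"\b[a-z]{3}[A-Z]\b", keywords=["gene"])
ontology.concepts["Gene Sequence"] = DataFrame(r"\b[GATC]{6,}\b", keywords=["sequence"])
ontology.relationships.append(
    RelationshipSet("Gene Name", "Gene Sequence", "has sequence", max_targets=1)
)

found = ontology.concepts["Gene Sequence"].recognize("The sequence GATTACAGG follows.")
```

Keeping the recognizer (the data frame) attached to each concept means a new concept can be added without touching the extraction loop; the constraint check is deliberately separate so that it can run after extraction.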
4 How to Build a Gene Extraction Ontology? (G*A*U*C*)* (G*A*T*C*)*
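The two expressions on this slide read as regular expressions over nucleotide alphabets: the one containing U fits RNA and the one containing T fits DNA. Because every factor is starred, `(G*A*T*C*)*` accepts any string over {G, A, T, C}, so it is equivalent to the character class `[GATC]*`. The sketch below uses that simpler form, since nested stars can backtrack heavily in backtracking regex engines. The function name `classify_sequence` and the choice to reject the empty string (using `+` rather than `*`) are assumptions made for this example, not part of the slides.

```python
import re

# Character-class equivalents of the slide's patterns, with "+" so that
# the empty string is not accepted (an assumption of this sketch).
RNA = re.compile(r"[GAUC]+")
DNA = re.compile(r"[GATC]+")


def classify_sequence(token: str) -> str:
    """Label a token as 'DNA', 'RNA', 'ambiguous', or 'not a sequence'."""
    token = token.upper()
    is_dna = DNA.fullmatch(token) is not None
    is_rna = RNA.fullmatch(token) is not None
    if is_dna and is_rna:
        return "ambiguous"      # only G, A, C present: valid in both alphabets
    if is_dna:
        return "DNA"
    if is_rna:
        return "RNA"
    return "not a sequence"
```

For instance, `classify_sequence("GATTACA")` yields `"DNA"` and `classify_sequence("GAUUACA")` yields `"RNA"`. A token made only of G, A, and C cannot be told apart by the pattern alone, which is one reason a recognizer like this would usually be combined with context keywords.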
5 Knowledge Sources. Gene Ontology: thousands of terms (Molecular Function, Biological Process, Cellular Component). All Species Toolkit: 1,231,935 species names. Protein databases: thousands of protein names.
6 Extraction Rules. Statistical NLP; machine learning; Naïve Bayes; Hidden Markov Models; Decision Trees.
7 Integration
13 Integration: Information Hidden behind Links
17 Query-based Extraction. Query the gene extraction ontology; find applicable resources; fill out forms; extract information.
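The steps on this slide can be sketched as a small pipeline. This is only an illustration under stated assumptions: the online resources are replaced by in-memory stand-ins, the names `Resource`, `find_applicable`, and `run_query` are hypothetical, and the returned values are toy data rather than real database content. Step 1 (querying the ontology) is assumed to have already produced the `known` values and the `wanted` concepts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Resource:
    """Stand-in for an online source reached through a search form (hypothetical)."""
    name: str
    form_fields: List[str]                                # inputs the form accepts
    provides: List[str]                                   # concepts on the result page
    submit: Callable[[Dict[str, str]], Dict[str, str]]    # fake form submission


def find_applicable(resources, known, wanted):
    """Step 2: keep resources whose form can be filled and that return something wanted."""
    return [
        r for r in resources
        if all(f in known for f in r.form_fields)
        and any(c in wanted for c in r.provides)
    ]


def run_query(resources, known, wanted):
    """Steps 2 to 4: select resources, fill out forms, extract the wanted concepts."""
    answer = {}
    for res in find_applicable(resources, known, wanted):
        form = {f: known[f] for f in res.form_fields}     # Step 3: fill out the form
        page = res.submit(form)                           # simulated round trip
        for concept in wanted:                            # Step 4: extract information
            if concept in page and concept not in answer:
                answer[concept] = page[concept]
    return answer


# Toy resources; the sequence and function strings are made up for illustration.
gene_db = Resource(
    name="toy-gene-db",
    form_fields=["Gene Name"],
    provides=["Gene Sequence"],
    submit=lambda form: {"Gene Sequence": "GATTACA"} if form["Gene Name"] == "alfR" else {},
)
protein_db = Resource(
    name="toy-protein-db",
    form_fields=["Gene Name"],
    provides=["Protein Function"],
    submit=lambda form: {"Protein Function": "placeholder function"},
)
species_db = Resource("toy-species-db", ["Species Name"], ["Taxonomy"], lambda form: {})

result = run_query(
    [gene_db, protein_db, species_db],
    known={"Gene Name": "alfR"},
    wanted=["Gene Sequence", "Protein Function"],
)
```

Here `species_db` is skipped because its form needs a value the query does not supply, while the other two resources each contribute one concept to `result`.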
18 Query-based Extraction. Example: “Find the alfR gene, its sequence, its protein's function, and any mutant that inhibits this gene.” Gene Name; Gene Sequence; Gene Mutant; Protein Function; Mutant Function.
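One way to picture how the example query relates to the five concepts is a keyword match that separates values the user supplied from concepts the user is asking for. The sketch below is an assumption-laden illustration, not the method from the slides: the cue patterns in `CONCEPT_CUES`, the gene-name shape (three lowercase letters followed by a capital), the mapping of "inhibits" to Mutant Function, and the function name `analyze_query` are all hypothetical.

```python
import re

# Hypothetical cue patterns, one per concept named on the slide.
# A capturing group marks a concrete value supplied in the query.
CONCEPT_CUES = {
    "Gene Name": r"\b([a-z]{3}[A-Z])\s+gene\b",
    "Gene Sequence": r"\bsequence\b",
    "Protein Function": r"\bprotein(?:'s)?\s+function\b",
    "Gene Mutant": r"\bmutant\b",
    "Mutant Function": r"\binhibits\b",
}


def analyze_query(query: str):
    """Return (known values, wanted concepts) found in the query text."""
    known, wanted = {}, []
    for concept, pattern in CONCEPT_CUES.items():
        match = re.search(pattern, query)
        if match is None:
            continue
        if match.groups() and match.group(1):
            known[concept] = match.group(1)    # a concrete value was given
        else:
            wanted.append(concept)             # only the concept was mentioned
    return known, wanted


query = ("Find the alfR gene, its sequence, its protein's function, "
         "and any mutant that inhibits this gene.")
known, wanted = analyze_query(query)
```

With these cues, `known` holds the gene name `alfR` and `wanted` lists the remaining four concepts, which is the split a form-filling step would need: known values go into form fields, wanted concepts are looked for on result pages.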
23 Contribution. Provides a way to automatically integrate online biological data from different sources. Provides an approach that finds suitable online resources, fills out online forms, and extracts data according to the user's query.