Published by Morgan Burke. Modified over 9 years ago.
1
Data Management Support for Life Sciences, or: What Can We Do for the Life Sciences?
Mourad Ouzzani
mourad@cs.purdue.edu
2
The Big Picture
3
Intelligent Instrument Control
4
Protein Identification
5
Issues
  Time-sensitive data
  Limited sample quantities
  Repeated experiments
  Massive data
6
Intelligent Instrument Control
7
Benefits
  The outcome of IIC will be biological knowledge instead of raw mass spectra.
  The biological knowledge is backed up by data acquired by IIC.
  Scientists do not need to review the raw mass spectra.
8
Data Flow in IIC
9
Nile Support and others
10
IIC Issues
  IIC system development
  Non-proprietary API for both data collection and control of the instrument
  Optimized storage for massive data (instrument output and sequences)
  etc.
11
Data Stream Issues
  Data filters that identify interesting data and reduce chemical noise
  Algorithms for rapid identification of the base peaks and the number of peaks in the spectrum
  Algorithms for prediction of upcoming peaks
  Online statistical analysis over the streams
  Data summaries at different granularities
  etc.
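A minimal sketch of three of the items above (noise filtering, base peak and peak count, online statistics), assuming a spectrum arrives as a list of (m/z, intensity) pairs sorted by m/z. The names `find_peaks`, `base_peak`, `RunningStats`, and the fixed `noise_floor` threshold are hypothetical and chosen for illustration; the slides do not specify the algorithms actually used in IIC.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance, updated one value at a time."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two values have been seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def find_peaks(spectrum, noise_floor):
    """Return (mz, intensity) local maxima whose intensity exceeds noise_floor.

    The fixed threshold is an assumed, very simple stand-in for a noise filter.
    """
    peaks = []
    for i in range(1, len(spectrum) - 1):
        mz, intensity = spectrum[i]
        left = spectrum[i - 1][1]
        right = spectrum[i + 1][1]
        if intensity > noise_floor and intensity > left and intensity >= right:
            peaks.append((mz, intensity))
    return peaks


def base_peak(peaks):
    """Return the most intense peak, or None if there are no peaks."""
    return max(peaks, key=lambda p: p[1]) if peaks else None
```

The number of peaks is then `len(find_peaks(spectrum, noise_floor))`, and one `RunningStats` instance per stream can be updated with each incoming intensity without storing the stream.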
12
Data Integration
13
Non-glycosylated peptide identification
14
Data Integration and Informatics
15
Data Integration Issues
  Database description and organization
  Schema mediation
  Annotation and provenance
  Use of model management techniques
  Query processing and optimization
  Web-service access
  Implementation and deployment
16
Requirements
  Data type diversity: sequences, graphs, 3D structures, etc.
  Unconventional queries: similarity, pattern matching, etc.
  Uncertainty (probability)
  Data curation: cleaning and annotation
  Data provenance (pedigree)
  Large scale: 100s of DBs
  Terminology management (semantics)
  etc.
17
Data Correlation
  Non-overlapping schemas (different instruments or scales of resolution)
  Contradictory information (experiments with different assumptions)
  Comparing data only after matching their context (constraints)
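One way to read "comparing data only after matching their context" is sketched below: records are grouped by the context attributes they share, and values are compared only within a group. The record layout (`source`, `context`, `value`) and the function `correlate` are assumptions made for illustration, not a schema from the slides.

```python
from collections import defaultdict


def correlate(records, context_keys):
    """Group records by the chosen context keys and report contradictions.

    Returns a list of (context_tuple, sorted_distinct_values) for every
    context in which sources disagree. Records whose contexts differ are
    never compared with each other.
    """
    groups = defaultdict(list)
    for rec in records:
        # A missing context attribute becomes None and forms its own group.
        key = tuple(rec["context"].get(k) for k in context_keys)
        groups[key].append(rec)

    conflicts = []
    for key, recs in groups.items():
        values = {r["value"] for r in recs}
        if len(values) > 1:
            conflicts.append((key, sorted(values)))
    return conflicts
```

With a coarse context (few keys), results from experiments run under different assumptions land in the same group and look contradictory; adding the distinguishing attribute to `context_keys` separates them.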
18
Other Issues?
19
IIC Information Flow
  [Flowchart; the branch structure is not recoverable from the transcript. Labels:]
  Sample
  Step 1, Step 2, Step 3
  Decision nodes (Y/N branches): Interesting ions? / Empty priority list? / QA/QC?
  Priority list of interesting ions
  Peptide identification
  Protein identification
  External databases query
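The flowchart's exact branches are lost, so the following is only one possible reading of the "priority list of interesting ions" loop: ions from a survey scan that pass a threshold are queued by intensity and processed until the list is empty. The function names, the intensity threshold, and the `identify` callback are all hypothetical; the QA/QC and external database steps are not modeled.

```python
import heapq


def interesting_ions(survey_scan, threshold):
    """Assumed criterion: an ion is 'interesting' if its intensity >= threshold."""
    return [(mz, intensity) for mz, intensity in survey_scan if intensity >= threshold]


def run_cycle(survey_scan, threshold, identify):
    """Process interesting ions in order of decreasing intensity.

    `identify` is a caller-supplied stand-in for peptide/protein identification.
    Returns a list of (mz, identification_result) pairs.
    """
    # heapq is a min-heap, so intensities are negated to pop the largest first.
    queue = [(-intensity, mz) for mz, intensity in interesting_ions(survey_scan, threshold)]
    heapq.heapify(queue)

    results = []
    while queue:  # corresponds to the "Empty priority list?" check
        _, mz = heapq.heappop(queue)
        results.append((mz, identify(mz)))
    return results
```

A call such as `run_cycle(scan, 5.0, my_identifier)` returns an empty list when no ion passes the threshold, which corresponds to the "N" branch of "Interesting ions?" in this reading.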
20
Intelligent Instrument Control
  Algorithms design
    Spectra deconvolution
    Online analysis (protein/peptide identification)
    Online peak identification for feedback
    Data filters and noise removal
    Prediction of upcoming peaks
  Experimental simulation
    In silico generation of spectrum
    Algorithms simulation
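As an illustration of "in silico generation of spectrum", the sketch below computes a peptide's monoisotopic mass and the m/z values of its singly charged b and y fragment ions from standard monoisotopic residue masses. It assumes unmodified peptides and charge 1+ only, and ignores intensities, other ion types, and glycosylation; the function names are hypothetical and this is not presented as the simulation method used in the project.

```python
# Standard monoisotopic residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def peptide_mass(seq):
    """Monoisotopic mass of the neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def fragment_ions(seq):
    """Return (b_ions, y_ions): m/z lists for singly charged b and y ions.

    b_ions[i] covers the first i+1 residues; y_ions[i] covers the residues
    remaining after that cleavage site.
    """
    b_ions, y_ions = [], []
    for i in range(1, len(seq)):
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON)
        y_ions.append(sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON)
    return b_ions, y_ions
```

For each cleavage site, the b and y m/z values sum to the peptide mass plus two protons, which gives a quick consistency check on a generated spectrum.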
21
Intelligent Instrument Control
  Experimental settings
    Selection of a biology system, e.g., yeast
    Two types of experiments
      Target analysis
      Global analysis
  Integration with the instrument
    Data collection
    Control of the instrument
    API
  Actual implementation (algorithms)
22
Intelligent Instrument Control
  Online data mining
  Other issues:
    Optimized storage of massive data
    Data representation (streams, database)
23
Integrated Access to Glycoprotein Databases
  Informatics tools
    Glycosylated peptide identification
    Non-glycosylated peptide identification
  Enabling uniform access to different glycoprotein databases
    Database description and organization
    Schema mediation
24
Integrated Access to Glycoprotein Databases
  Query processing
  Data correlation
    Non-overlapping schemas
    Contradictory information
  Sequence alignment
  Web-service-enabled access
  Target database selection (focus)