Data Management Support for Life Sciences or What can we do for the Life Sciences? Mourad Ouzzani


1 Data Management Support for Life Sciences or What can we do for the Life Sciences? Mourad Ouzzani mourad@cs.purdue.edu

2 The Big Picture

3 Intelligent Instrument Control

4 Protein Identification

5 Issues: time-sensitive data; limited sample quantities; repetition of experiments; massive data.

6 Intelligent Instrument Control

7 Benefits: The outcome of IIC will be biological knowledge instead of raw mass spectra. The biological knowledge is backed up by data acquired by IIC. Scientists do not need to review the raw mass spectra.

8 Data Flow in IIC

9 Nile Support and others

10 IIC Issues: IIC system development; non-proprietary API for both data collection and control of the instrument; optimized storage for massive data (instrument output and sequences); etc.

11 Data Stream Issues: data filters that identify interesting data and reduce chemical noise; algorithms for rapid identification of the base peaks and the number of peaks in the spectrum; algorithms for prediction of upcoming peaks; online statistical analysis over the streams; data summaries at different granularities; etc.
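The stream issues listed on this slide can be illustrated with a minimal Python sketch. It is not the presenter's implementation: the representation of a spectrum as a list of `(m/z, intensity)` pairs, the relative-intensity threshold, and the names `filter_noise`, `base_peak`, and `RunningStats` are all assumptions made for illustration. The running statistics use Welford's online algorithm, chosen here as one standard way to compute mean and variance over a stream in a single pass.

```python
from dataclasses import dataclass
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); an assumed representation


def filter_noise(spectrum: List[Peak], min_relative_intensity: float = 0.05) -> List[Peak]:
    """Keep peaks whose intensity is at least a fraction of the strongest peak.

    A simple relative-intensity cutoff; real chemical-noise filters are
    likely more elaborate.
    """
    if not spectrum:
        return []
    strongest = max(intensity for _, intensity in spectrum)
    cutoff = min_relative_intensity * strongest
    return [(mz, intensity) for mz, intensity in spectrum if intensity >= cutoff]


def base_peak(spectrum: List[Peak]) -> Peak:
    """Return the most intense peak of a non-empty spectrum."""
    return max(spectrum, key=lambda peak: peak[1])


@dataclass
class RunningStats:
    """Welford's online algorithm: one pass, constant memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two values have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

Each incoming scan could be passed through `filter_noise`, its `base_peak` intensity fed to a `RunningStats` instance, and the statistics read at any point without storing the stream.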

12 Data Integration

13 Non-glycosylated peptide identification

14 Data Integration and Informatics

15 Data Integration Issues: database description and organization; schema mediation; annotation and provenance; use of model management techniques; query processing and optimization; Web-service access; implementation and deployment.

16 Requirements: diversity of data types (sequences, graphs, 3D structures, etc.); unconventional queries (similarity, pattern matching, etc.); uncertainty (probability); data curation (cleaning and annotation); data provenance (pedigree); large scale (100s of DBs); terminology management (semantics); etc.

17 Data Correlation: non-overlapping schemas (different instruments or scales of resolution); contradictory information (experiments with different assumptions); comparing data only after matching their context (constraints).
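The idea of comparing data only after matching their context can be sketched as follows. This is a hypothetical illustration, not a method described in the slides: the record layout (`entity`, `source`, `value`, `context`) and the function name `correlate_by_context` are assumptions. Records are grouped by entity and by the chosen context attributes, and a group is flagged as contradictory only when sources that share the same context report different values.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def correlate_by_context(
    records: List[Dict[str, Any]], context_keys: List[str]
) -> Dict[Tuple[str, tuple], Dict[str, Any]]:
    """Group records by (entity, context) and flag disagreements within a group.

    Each record is assumed to look like:
    {"entity": ..., "source": ..., "value": ..., "context": {...}}
    """
    groups = defaultdict(list)
    for record in records:
        # Records are comparable only when all chosen context attributes match.
        context = tuple(record["context"].get(key) for key in context_keys)
        groups[(record["entity"], context)].append(record)

    result = {}
    for key, members in groups.items():
        values = {member["value"] for member in members}
        result[key] = {
            "values": sorted(values),
            "sources": sorted({member["source"] for member in members}),
            "contradictory": len(values) > 1,
        }
    return result
```

With this grouping, two databases that disagree about the same protein are reported as contradictory only if their experiments share the selected context, for example the same organism and method; results obtained under different assumptions land in separate groups.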

18 Other Issues?

19 IIC Information Flow [Flowchart, Steps 1 to 3, starting from the sample. Decision nodes with Y/N branches: "Interesting ions?", "Empty priority list?", "QA/QC?". Process boxes: "Priority list of interesting ions", "Peptide identification", "Protein identification", "External databases query".]
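The exact branch targets of this flowchart are not recoverable from the transcript, so the following Python sketch shows only one plausible reading of it. The ordering of the steps, the use of intensity as the priority, and every name (`run_iic` and its callback parameters) are assumptions; the instrument and the external databases are represented by caller-supplied functions.

```python
import heapq
from typing import Callable, Iterable, List, Optional, Tuple

Ion = Tuple[float, float]  # (m/z, intensity); an assumed representation


def run_iic(
    scans: Iterable[List[Ion]],
    is_interesting: Callable[[float, float], bool],
    passes_qc: Callable[[float, float], bool],
    identify_peptide: Callable[[float], Optional[str]],
    query_proteins: Callable[[str], List[str]],
) -> List[str]:
    """One assumed reading of the flow: select ions, check quality, identify."""
    priority: List[Tuple[float, float]] = []  # heap of (-intensity, m/z)
    proteins: List[str] = []
    for scan in scans:
        # Step 1 (assumed): put interesting ions on the priority list.
        for mz, intensity in scan:
            if is_interesting(mz, intensity):
                heapq.heappush(priority, (-intensity, mz))
        # Step 2 (assumed): work through the list until it is empty.
        while priority:
            negative_intensity, mz = heapq.heappop(priority)
            if not passes_qc(mz, -negative_intensity):
                continue  # failed QA/QC: skip this ion
            peptide = identify_peptide(mz)
            if peptide is None:
                continue  # no peptide identified for this ion
            # Step 3 (assumed): protein identification via external databases.
            proteins.extend(query_proteins(peptide))
    return proteins
```

Passing the decisions in as callbacks keeps the loop independent of any particular instrument API or database, which the slides list as open issues.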

20 Intelligent Instrument Control. Algorithms design: spectra deconvolution; online analysis (protein/peptide identification); online peak identification for feedback; data filters and noise removal; prediction of upcoming peaks. Experimental simulation: in silico generation of spectrum; algorithms simulation.
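As one illustration of in silico spectrum generation, the sketch below computes the theoretical singly charged b- and y-ion m/z values of a peptide from standard monoisotopic residue masses. The slides do not say how spectra would be simulated, so this standard fragment-ion calculation is a stand-in technique, and the function name `fragment_ions` is hypothetical. Modifications such as glycosylation, higher charge states, and intensities are not modeled.

```python
from typing import Dict, List

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_ions(peptide: str) -> Dict[str, List[float]]:
    """Return singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[residue] for residue in peptide]

    b_ions = []
    running = 0.0
    for mass in masses[:-1]:            # N-terminal prefixes b1 .. b(n-1)
        running += mass
        b_ions.append(running + PROTON)

    y_ions = []
    running = 0.0
    for mass in reversed(masses[1:]):   # C-terminal suffixes y1 .. y(n-1)
        running += mass
        y_ions.append(running + WATER + PROTON)

    return {"b": b_ions, "y": y_ions}
```

A simulated peak list of this kind could be fed to the online algorithms in place of instrument output when testing them without consuming sample.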

21 Intelligent Instrument Control. Experimental settings: selection of a biological system, e.g., yeast; two types of experiments (target analysis, global analysis). Integration with the instrument: data collection; control of the instrument; API; actual implementation (algorithms).

22 Intelligent Instrument Control. Online data mining. Other issues: optimized storage of massive data; data representation (streams, database).

23 Integrated Access to Glycoprotein Databases. Informatics tools: glycosylated peptide identification; non-glycosylated peptide identification. Enabling uniform access to different glycoprotein databases: database description and organization; schema mediation.

24 Integrated Access to Glycoprotein Databases. Query processing: data correlation (non-overlapping schemas, contradictory information); sequence alignment. Web-service-enabled access: target database selection (focus).

