Data Management Support for Life Sciences or What can we do for the Life Sciences? Mourad Ouzzani

The Big Picture

Intelligent Instrument Control

Protein Identification

Issues
  Time-sensitive data
  Limited sample quantities
  Repetition of experiments
  Massive data

Intelligent Instrument Control

Benefits
  The outcome of IIC will be biological knowledge instead of raw mass spectra.
  The biological knowledge is backed up by data acquired by IIC.
  Scientists do not need to review the raw mass spectra.

Data Flow in IIC

Nile Support and others

IIC Issues
  IIC system development
  Non-proprietary API for both data collection and control of the instrument
  Optimized storage for massive data (instrument output and sequences)
  etc.

Data Stream Issues
  Data filters that identify interesting data and reduce chemical noise
  Algorithms for rapid identification of the base peaks and the number of peaks in the spectrum
  Algorithms for prediction of upcoming peaks
  Online statistical analysis over the streams
  Data summaries at different granularities
  etc.
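
A minimal sketch of three of the stream items above: a noise filter, base peak identification, and online statistics over the stream. The slides do not specify any algorithm, so everything here is an assumption made for illustration: the (m/z, intensity) peak representation, the relative-intensity threshold used as a stand-in for chemical-noise reduction, the function names, and the choice of Welford's method for the running mean and variance.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Assumed representation: one peak is an (m/z, intensity) pair.
Peak = Tuple[float, float]


def filter_noise(spectrum: List[Peak], min_relative_intensity: float = 0.05) -> List[Peak]:
    """Keep peaks whose intensity is at least a fraction of the strongest peak.

    A simple stand-in for chemical-noise reduction (hypothetical threshold).
    """
    if not spectrum:
        return []
    threshold = max(intensity for _, intensity in spectrum) * min_relative_intensity
    return [(mz, intensity) for mz, intensity in spectrum if intensity >= threshold]


def base_peak(spectrum: List[Peak]) -> Optional[Peak]:
    """Return the most intense peak, or None for an empty spectrum."""
    return max(spectrum, key=lambda peak: peak[1], default=None)


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two values have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def process_stream(spectra: Iterable[List[Peak]],
                   min_relative_intensity: float = 0.05) -> RunningStats:
    """Filter each incoming spectrum and track base-peak intensity statistics."""
    stats = RunningStats()
    for spectrum in spectra:
        kept = filter_noise(spectrum, min_relative_intensity)
        peak = base_peak(kept)
        if peak is not None:
            stats.update(peak[1])
    return stats
```

Because each spectrum is processed once and only three numbers are kept between spectra, a loop of this shape can run while the instrument is still acquiring data, which is the property the online-analysis item asks for.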

Data Integration

Non-glycosylated peptide identification

Data Integration and Informatics

Data Integration Issues
  Database description and organization
  Schema mediation
  Annotation and provenance
  Use of model management techniques
  Query processing and optimization
  Web-service access
  Implementation and deployment
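
One way to picture schema mediation with provenance is a per-source field mapping onto a shared (mediated) schema, with each translated record tagged by its origin. The sketch below is illustrative only: the source names `db_a` and `db_b`, their field names, and the mediated fields `protein_id` and `sequence` are hypothetical and do not come from the slides.

```python
from typing import Any, Dict, List

# Hypothetical mappings: source field name -> mediated schema field name.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "db_a": {"prot_id": "protein_id", "seq": "sequence"},
    "db_b": {"accession": "protein_id", "aa_sequence": "sequence"},
}


def mediate(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one source record into the mediated schema and tag its provenance."""
    mapping = MAPPINGS[source]
    mediated = {
        target: record[field]
        for field, target in mapping.items()
        if field in record
    }
    mediated["provenance"] = source  # remember where the record came from
    return mediated


def query_all(sources: Dict[str, List[Dict[str, Any]]], protein_id: str) -> List[Dict[str, Any]]:
    """Answer one query against the mediated schema across every source."""
    hits = []
    for source, records in sources.items():
        for record in records:
            mediated = mediate(source, record)
            if mediated.get("protein_id") == protein_id:
                hits.append(mediated)
    return hits
```

A caller writes the query once against `protein_id` and receives records from every source, each carrying a `provenance` field that says which database supplied it.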

Requirements
  Diversity of data types: sequences, graphs, 3D structures, etc.
  Unconventional queries: similarity, pattern matching, etc.
  Uncertainty (probability)
  Data curation: cleaning and annotation
  Data provenance (pedigree)
  Large scale: 100s of DBs
  Terminology management (semantics)
  etc.

Data Correlation
  Non-overlapping schemas (different instruments or scales of resolution)
  Contradictory information (experiments with different assumptions)
  Comparing data only after matching their context (constraints)
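
The last point, comparing data only after matching their context, can be sketched as grouping observations by entity, attribute, and context before looking for disagreements. This is a simplified illustration under stated assumptions: the `Observation` record, the representation of context as sorted key/value pairs, and the exact-match rule for contexts are all hypothetical, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Context = Tuple[Tuple[str, str], ...]  # sorted (condition, value) pairs


@dataclass(frozen=True)
class Observation:
    source: str      # database or experiment that reported the value
    entity: str      # e.g. a protein identifier
    attribute: str   # what was measured or asserted
    value: str
    context: Context  # experimental assumptions the value depends on


def make_context(**conditions: str) -> Context:
    """Normalise conditions so that equal contexts compare equal."""
    return tuple(sorted(conditions.items()))


def find_contradictions(observations: List[Observation]) -> List[Tuple[Observation, Observation]]:
    """Report pairs that disagree on a value under the same context.

    Observations with different contexts are never compared, so results
    obtained under different assumptions are not flagged as contradictory.
    """
    groups: Dict[Tuple[str, str, Context], List[Observation]] = {}
    for obs in observations:
        groups.setdefault((obs.entity, obs.attribute, obs.context), []).append(obs)

    conflicts = []
    for group in groups.values():
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                if first.value != second.value:
                    conflicts.append((first, second))
    return conflicts
```

Exact context matching is the strictest possible rule; a real system would more likely need partial or constraint-based matching, which this sketch does not attempt.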

Other Issues?

IIC Information Flow
  [Flowchart; only the labels survive: input "sample"; Steps 1 to 3; decision nodes "Interesting ions?", "Empty priority list?", and "QA/QC?" with Y/N branches; boxes "Priority list of interesting ions", "Peptide identification", "Protein identification", and "External databases query".]

Intelligent Instrument Control
  Algorithm design
    Spectra deconvolution
    Online analysis (protein/peptide identification)
    Online peak identification for feedback
    Data filters and noise removal
    Prediction of upcoming peaks
  Experimental simulation
    In silico generation of spectrum
    Algorithm simulation

Intelligent Instrument Control
  Experimental settings
    Selection of a biological system, e.g., yeast
    Two types of experiments
      Target analysis
      Global analysis
  Integration with the instrument
    Data collection
    Control of the instrument
    API
  Actual implementation (algorithms)

Intelligent Instrument Control
  Online data mining
  Other issues:
    Optimized storage of massive data
    Data representation (streams, database)

Integrated Access to Glycoprotein Databases
  Informatics tools
    Glycosylated peptide identification
    Non-glycosylated peptide identification
  Enabling uniform access to different glycoprotein databases
    Database description and organization
    Schema mediation

Integrated Access to Glycoprotein Databases
  Query processing
  Data correlation
    Non-overlapping schemas
    Contradictory information
  Sequence alignment
  Web-service-enabled access
  Target database selection (focus)