Big Data in the Post-Genome Era. Mehdi Sadeghi, National Institute of Genetic Engineering and Biotechnology; School of Biological Sciences, Institute for Research in Fundamental Sciences.


The Problem of Big Data in Biology

The Problem of Big Data: Volume, Velocity of processing, Variability

Motivation
– Recent developments in biotechnology have enabled high-throughput data generation from biological samples.
– We have lots and lots of data about all aspects of biology (although still mostly about humans).
– How can we make sense of all this data?
– Analyse the data to extract new knowledge about the biology → Data Mining

The Problem of Big Data in Biology
– 1973: Sharp, Sambrook, Sugden; gel electrophoresis chamber, $
– Matt Meselson & ultracentrifuge, $500,000
hopefully comfortable enough to minimize the technology and focus on the biology.

The Problem of Big Data in Biology: a decade's progress
– 2003: ABI 3730 Sequencer. Human genome: $2.7 billion, 11 years
– 2012: Oxford Nanopore MinION. Human genome: $900, 6 hours

Sequencing the Human Genome (chart: log10(price) versus year)
– 2001: Human Genome Project, 2.7G$, 11 years
– 2001: Celera, 100M$, 3 years
– 2007: 454, 1M$, 3 months
– 2008: ABI SOLiD, 60K$, 2 weeks
– 2009: Illumina, Helicos, 40-50K$
– 2010: 5K$, a few days
– <1000$, <24 hrs

The Problem of Big Data in Biology

A Super-Moore’s Law
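To make the "super-Moore" claim concrete, here is a rough back-of-the-envelope sketch using two of the sequencing prices quoted on the earlier slide (Celera in 2001 and the 2010 figure). It assumes a smooth exponential decline between those two points, and it takes a two-year doubling period as the Moore's-law yardstick; both are simplifying assumptions, not figures from the slides.

```python
import math

# Price points taken from the "Sequencing the Human Genome" slide (US dollars).
celera_2001 = (2001, 100e6)   # Celera, 2001: 100M$
genome_2010 = (2010, 5e3)     # 2010: 5K$

def halving_time_years(start, end):
    """Years needed for the price to halve, assuming exponential decline."""
    (year0, price0), (year1, price1) = start, end
    halvings = math.log2(price0 / price1)  # number of times the price halved
    return (year1 - year0) / halvings

seq_halving = halving_time_years(celera_2001, genome_2010)

# Assumption: the commonly quoted Moore's-law doubling period of about two years.
MOORE_DOUBLING_YEARS = 2.0

print(f"sequencing price halves roughly every {seq_halving * 12:.1f} months")
print(f"that is about {MOORE_DOUBLING_YEARS / seq_halving:.1f}x faster than a 2-year doubling")
```

Under these assumptions the price halves in well under a year, which is the sense in which sequencing cost has outpaced Moore's law.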

So what data can we generate?
Biological data can be generated at many different levels:
– Genomics (DNA)
– Transcriptomics (RNA)
– Proteomics (proteins)
– Metabolomics (small compounds)
– Lipidomics (lipids)
Hundreds of -omics have been catalogued.

The Problem of Big Data in Biology: High-Throughput Phenotyping
The large amount of sequence-based data needs to be balanced with equally powerful phenotypic data.
Phytomorph Project (Univ. Wisconsin):
– $70K for 30 cameras
– 200 movies of root growth
– 4 GB/day of images for processing

Data to Networks to Biology

Protein Interaction Network

Aims
– First: organize the data so that researchers can access existing information and submit new entries.
– Second: develop tools and resources that aid in the analysis of data.
– Third: interpret the results in a biologically meaningful manner.

Bioinformatics is interdisciplinary. Contributing fields: Biology, Molecular Biology, Biochemistry, Biophysics, Computer Science, Theoretical CS, Machine Learning, Data Mining, Information Management, Applied Mathematics & Statistics.

General Types of "Informatics" Techniques
Databases
– Building, Querying
– Object DB
Text String Comparison
– Text Search
– 1D Alignment
– Significance Statistics
Finding Patterns
– AI / Machine Learning
– Clustering
– Data mining
Geometry
– Robotics
– Graphics (Surfaces, Volumes)
– Comparison and 3D Matching (Vision, recognition)
Physical Simulation
– Newtonian Mechanics
– Electrostatics
– Numerical Algorithms
– Simulation
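As an illustration of the "1D Alignment" entry, the following is a minimal sketch of global sequence alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). The slides do not specify an algorithm or a scoring scheme; the function name, the toy sequences, and the match/mismatch/gap values are all assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of strings a and b.

    The scoring values are illustrative assumptions, not taken from the slides.
    """
    # prev[j] = best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            # Either pair a[i-1] with b[j-1], or open a gap in one sequence.
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align the two characters
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]

# Toy DNA strings, chosen only for illustration.
print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("GATTACA", "GATTAC"))
```

Only two rows of the table are kept at a time, so memory grows with the shorter dimension rather than with the product of the lengths; recovering the alignment itself, and judging whether a score is statistically significant, would need the full table and a background model.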

Algorithmic vs. Statistical Perspectives
Computer Scientists
– Data: a record of everything that happened.
– Goal: process the data by positing a model to find interesting patterns and associations.
– Methodology: develop approximation algorithms under different models of data access, since the goal is typically computationally hard.
Statisticians (and Natural Scientists)
– Data: a particular random instantiation of an underlying process describing unobserved patterns in the world.
– Goal: extract information about the world from noisy data.
– Methodology: make inferences (perhaps about unseen events) by positing a model that describes the random variability of the data around the deterministic or stochastic model.

Major Application: Finding Homologs

Major Application: Designing Drugs
– Understanding How Structures Bind Other Molecules (Function)
– Designing Inhibitors
– Docking, Structure Modeling
(From left to right, figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR Group Web page at Scripps, and from Computational Chemistry Page at Cornell Theory Center.)

Pharmacogenomics
Everybody is different: the right drug, to the right patient, for the right disease, at the right time.

Big changes in the past... and future
Consider the creation of:
– Modern Physics
– Management Science
– Computer Science
– Transistors and Microelectronics
– Molecular Biology
– Biotechnology
These were driven by new measurement techniques and technological advances, but they led to:
– big new (academic and applied) questions
– new perspectives on the world
– lots of downstream applications
We are in the middle of a similarly big shift!