Big Data in Post Genome Era. Mehdi Sadeghi, National Institute of Genetic Engineering and Biotechnology; School of Biological Sciences, Institute for Research in Fundamental Sciences.

Presentation transcript:

1 Big Data in Post Genome Era. Mehdi Sadeghi, National Institute of Genetic Engineering and Biotechnology; School of Biological Sciences, Institute for Research in Fundamental Sciences

2

3 The Problem of Big Data in Biology

4 The Problem of Big Data: Volume, Velocity of process, Variability

5 Motivation
Recent developments in biotechnology have allowed high-throughput data generation from biological samples.
We have lots and lots of data about all aspects of biology (although still mostly about humans).
How can we make sense of all this data?
– Analyse the data to extract new knowledge about the biology → Data Mining

6 The Problem of Big Data in Biology. 1958: Matt Meselson & Ultracentrifuge, $500,000. 1973: Sharp, Sambrook, Sugden, Gel Electrophoresis Chamber, $250. Hopefully comfortable enough to minimize the technology and focus on the biology.

7

8 The Problem of Big Data in Biology: A decade’s progress. 2003: ABI 3730 Sequencer, Human Genome: $2.7 Billion, 11 Years. 2012: Oxford Nanopore MinION, Human Genome: $900, 6 Hours.

9 Sequencing the Human Genome (plot; axes: Year, Log 10 (price))
– 2001: Human Genome Project, 2.7G$, 11 years
– 2001: Celera, 100M$, 3 years
– 2007: 454, 1M$, 3 months
– 2008: ABI SOLiD, 60K$, 2 weeks
– 2009: Illumina, Helicos, 40-50K$
– 2010: 5K$, a few days
– 2012: <1000$, <24 hrs

10 The Problem of Big Data in Biology

11

12 A Super-Moore’s Law
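The "super-Moore" claim can be checked against the price points listed on slide 9. The following is a minimal sketch, not part of the original slides: it fits a straight line to log2(price) against year and reports how long it takes for the price to halve. It assumes "<1000$" means 1000 and "40-50K$" means 45,000, uses the Human Genome Project figure for 2001 (leaving out Celera), and compares against the commonly quoted two-year doubling period for Moore's law, which the slides do not state.

```python
import math

# Price points (USD) as listed on slide 9. Assumptions: "<1000$" is taken as
# 1000 and "40-50K$" as 45,000; the 2001 entry is the Human Genome Project.
PRICES = {2001: 2.7e9, 2007: 1e6, 2008: 6e4, 2009: 4.5e4, 2010: 5e3, 2012: 1e3}

# Commonly quoted doubling period for Moore's law (assumption, not from the slides).
MOORE_DOUBLING_YEARS = 2.0


def halving_time_years(points):
    """Least-squares slope of log2(price) vs. year, returned as years per halving."""
    years = list(points)
    logs = [math.log2(points[y]) for y in years]
    n = len(years)
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    covariance = sum((y - mean_year) * (v - mean_log) for y, v in zip(years, logs))
    variance = sum((y - mean_year) ** 2 for y in years)
    slope = covariance / variance  # change in log2(price) per year (negative here)
    return -1.0 / slope


halving = halving_time_years(PRICES)
print(f"Sequencing price halves roughly every {halving:.2f} years")
print(f"Moore's law reference period: {MOORE_DOUBLING_YEARS} years")
```

With these figures the fitted halving time comes out well under one year, so the price falls several times faster than a two-year halving would predict.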

13 So what data can we generate?
Biological data can be generated at many different levels
– Genomics (DNA)
– Transcriptomics (RNA)
– Proteomics (proteins)
– Metabolomics (small compounds)
– Lipidomics (lipids)
Hundreds of -omics have been catalogued

14 The Problem of Big Data in Biology: High Throughput Phenotyping. The large amount of sequence-based data needs to be balanced with equally powerful phenotypic data. Phytomorph Project (Univ. Wisconsin): $70K for 30 cameras, 200 movies of root growth, 4 GB/day of images for processing.

15 Data to Networks to Biology

16 Protein Interaction Network

17 Aims. First: organize data so that researchers can access existing information and submit new entries. Second: develop tools and resources that aid in the analysis of data. Third: interpret the results in a biologically meaningful manner.

18 Bioinformatics is interdisciplinary (diagram labels): Biology, Computer Science, Applied Mathematics & Statistics, Theoretical CS, Machine Learning, Data Mining, Information Management, Molecular Biology, Biophysics, Biochemistry

19 General Types of “Informatics techniques”
Databases
– Building, Querying
– Object DB
Text String Comparison
– Text Search
– 1D Alignment
– Significance Statistics
Finding Patterns
– AI / Machine Learning
– Clustering
– Datamining
Geometry
– Robotics
– Graphics (Surfaces, Volumes)
– Comparison and 3D Matching (Vision, recognition)
Physical Simulation
– Newtonian Mechanics
– Electrostatics
– Numerical Algorithms
– Simulation
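As an illustration of the "1D Alignment" item above, here is a minimal sketch of a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence with a linear gap penalty). The slides do not name a specific algorithm; the function name and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch, linear gaps).

    The scoring values are illustrative assumptions, not taken from the slides.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j];
    # the first row aligns the empty prefix of a against b using gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,           # align a[i-1] with b[j-1]
                            prev[j] + gap,      # gap in b
                            curr[j - 1] + gap)) # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))   # identical sequences
print(global_alignment_score("GATTACA", "GATTTACA"))  # one extra base
```

Only two rows of the score table are kept in memory, which is enough when the score alone is needed; recovering the alignment itself would require keeping the full table for a traceback.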

20 Algorithmic vs. Statistical Perspectives
Computer Scientists
– Data: are a record of everything that happened.
– Goal: process the data by positing a model to find interesting patterns and associations.
– Methodology: Develop approximation algorithms under different models of data access since the goal is typically computationally hard.
Statisticians (and Natural Scientists)
– Data: are a particular random instantiation of an underlying process describing unobserved patterns in the world.
– Goal: is to extract information about the world from noisy data.
– Methodology: Make inferences (perhaps about unseen events) by positing a model that describes the random variability of the data around the deterministic or stochastic model.

21 Major Application : Finding Homologs

22 Major Application: Designing Drugs. Understanding How Structures Bind Other Molecules (Function). Designing Inhibitors. Docking, Structure Modeling. (From left to right, figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR Group Web page at Scripps, and from Computational Chemistry Page at Cornell Theory Center.)

23 Pharmacogenomics. Everybody is different: the right drug to the right patient for the right disease at the right time.

24 Big changes in the past... and future
Consider the creation of:
– Modern Physics
– Management Science
– Computer Science
– Transistors and Microelectronics
– Molecular Biology
– Biotechnology
These were driven by new measurement techniques and technological advances, but they led to:
– big new (academic and applied) questions
– new perspectives on the world
– lots of downstream applications
We are in the middle of a similarly big shift!

25

