1
The data flood: We need a bigger boat
James A. Foster
The Initiative for Bioinformatics and Evolutionary Studies (IBEST)
Biological Sciences, Bioinformatics and Computational Biology
University of Idaho
2
JAF INBRE Data Flood 8/4/09
Outline
✦ Where is this flood of data coming from?
✦ What kind of tool is appropriate for this amount of data?
✦ What kind of a tool is “bioinformatics”?
✦ How about an example?
3
DNA sequencing data flood
Year    bp/day           Technology
1977    7.35             Gels
1986    50-ish           ABI 370
1995    19,000           ABI 377
1998    400,000          ABI 3700
2008    1,600,000,000    454
2009    3,200,000,000    454/FLX
2012    ??               ???
4
The data flood: DNA example
Year    bp/day           Notes            Water
1977    7.35             Manual: φx174    1 L
1986    50-ish           Gel: ABI 370     Barrel (176 gallons)
1995    19,000           Gel: ABI 377     Big pool (2x6x12 m)
1998    400,000          Cap: ABI 3700    Football field, 20 m deep
2008    1,600,000,000    454              Lakes Michigan/Huron
2009    3,200,000,000    454/FLX          All Great Lakes (nearly)
2012    ??                                Ocean?
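To put numbers on the flood, the bp/day figures from this slide can be turned into fold increases and an average doubling time. This is only a rough sketch: it treats "50-ish" as 50 and assumes steady exponential growth between any two years, which the slide does not claim. The function names are illustrative.

```python
import math

# bp/day figures transcribed from the slide (year -> bases per day).
BP_PER_DAY = {
    1977: 7.35,
    1986: 50,              # the slide says "50-ish"; 50 is an assumption
    1995: 19_000,
    1998: 400_000,
    2008: 1_600_000_000,
    2009: 3_200_000_000,
}

def fold_increase(start_year, end_year):
    """How many times larger the daily output is in end_year than in start_year."""
    return BP_PER_DAY[end_year] / BP_PER_DAY[start_year]

def doubling_time_years(start_year, end_year):
    """Average doubling time, assuming steady exponential growth over the interval."""
    doublings = math.log2(fold_increase(start_year, end_year))
    return (end_year - start_year) / doublings

print(f"1977 to 2009 fold increase: {fold_increase(1977, 2009):.3g}")
print(f"1977 to 2009 doubling time: {doubling_time_years(1977, 2009):.2f} years")
```

Over the whole 1977 to 2009 span the average doubling time works out to a little over a year, and the last step (2008 to 2009) is a doubling in a single year.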
5
Bioinformatics tools
Year    Data volume       Technology
1977    1 L               spoon
1986    barrel            hose
1995    big pool          pfd
1998    football field    Kayak
2008    Lake Michigan     Orca?
2009    Great Lakes       bigger boat?
2012    ocean?            Glomar?
6
Bioinformatics: bigger boat?
[Figure labels: You; Your hypothesis; Hypo; Data; The Computer (bioinformatics); Your thesis]
7
Reflection on the metaphor
✦ At some point, you can use fundamentally different techniques: spoons versus boats
✦ At some point, you can test fundamentally new hypotheses: not “we need a smaller shark”
✦ Sometimes the old technology is still good: the kayak was appropriate in this picture
✦ The new technology may be for a different purpose: fishing versus deep sea exploration
8
✦ Technology quiz!
9
What does this do?
10
What does this do?
Not that! THIS!
A Bigger Boat
Whatever you tell it to do!
11
What is Bioinformatics?
✦ Bioinformatics is what you tell the computer to do with your data
12
Of Boats and Bioinformatics
Bioinformatics is what you do with the boat you are in during the data flood
You might be able to do more with a bigger boat
13
Sampling emergent diversity
✦ Get ALL DNA along an age-variant transect
    10 samples per site
    Time since exposure: 5y, 19y, 40y, 63y, 100y, and 150y
    “Chronoclines” sample ecosystems by age
✦ Who’s there?
✦ How does the ecosystem change over time?
14
Bioinformatics problems
✦ Estimate α diversity: number of “species” in each sample and age group
✦ Estimate β diversity: amount of variation in “species” between age groups
✦ Determine which species (no quotes) are present in each sample (not part of this talk)
Biological questions:
    How do soil bacteria respond to retreating glaciers?
    How do microbial soil communities change?
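As a minimal illustration of the two quantities, the sketch below uses observed richness for α diversity and Jaccard distance for β diversity. Both are simple stand-ins chosen here for clarity; the talk does not say which estimators were actually used. The cluster labels and the `observed` dictionary are hypothetical toy data.

```python
from itertools import combinations

# Hypothetical toy data: cluster ("species") labels seen in each age group.
observed = {
    "5y": {"otu1", "otu2"},
    "19y": {"otu1", "otu2", "otu3"},
    "150y": {"otu2", "otu3", "otu4", "otu5"},
}

def richness(labels):
    """Observed alpha diversity: the number of distinct clusters seen."""
    return len(set(labels))

def jaccard_distance(a, b):
    """A simple beta-diversity measure: 1 - |shared| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty groups are treated as identical
    return 1.0 - len(a & b) / len(union)

for age, labels in observed.items():
    print(age, "richness:", richness(labels))

for x, y in combinations(observed, 2):
    print(x, "vs", y, "distance:", round(jaccard_distance(observed[x], observed[y]), 2))
```

Observed richness undercounts rare species in any finite sample, so real analyses typically replace it with a statistical richness estimator.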
15
Lots of data (post QC)
Age      Samples    Sequences    DNA Mbp
5y       9          35,092       8.77
19y      10         41,494       10.37
40y      8          33,665       8.42
63y      9          41,767       10.44
100y     8          41,178       10.29
150y     8          40,210       10.05
Total    52         233,406      58.35
Note: a SMALL run. The maximum is 37 GB per 8-hour run, 1.6 Bbp/day.
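The totals in this table can be checked with a few lines of arithmetic, which also gives the mean read length implied by the table. The rows are transcribed from the slide; the variable names are illustrative. Because the per-row Mbp values are rounded, their sum can differ from the slide's total in the last digit.

```python
# (age, samples, sequences, DNA in Mbp) rows transcribed from the slide.
ROWS = [
    ("5y", 9, 35_092, 8.77),
    ("19y", 10, 41_494, 10.37),
    ("40y", 8, 33_665, 8.42),
    ("63y", 9, 41_767, 10.44),
    ("100y", 8, 41_178, 10.29),
    ("150y", 8, 40_210, 10.05),
]

total_samples = sum(row[1] for row in ROWS)
total_sequences = sum(row[2] for row in ROWS)
total_mbp = sum(row[3] for row in ROWS)

# Mean read length in base pairs: total bases divided by total sequences.
mean_read_length = total_mbp * 1_000_000 / total_sequences

print("samples:", total_samples)
print("sequences:", total_sequences)
print("Mbp:", round(total_mbp, 2))
print("mean read length (bp):", round(mean_read_length))
```

The mean comes out near 250 bp per sequence, so the data volume here is driven by the number of reads rather than their length.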
16
Bioinformatics objectives
Determine species
Cluster by species
Cluster by age
Explain data in terms of biological processes and age (tell a story)
Too much data: 233K sequences!
17
Trick: Turn it upside down
Cluster each of 52 samples (approx. 6k each), choose a proxy sequence
Cluster proxies by age (approx. 40k each)
Cluster combined sequences to get species (quantify richness)
Build +/- matrix
[Figure: example +/- matrix]
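The bottom-up steps on this slide can be sketched as follows. This is a toy under stated assumptions, not the pipeline used in the study: the talk does not name a clustering algorithm, so the sketch substitutes greedy clustering with a position-by-position identity threshold, takes the first member of each cluster as its proxy, and reads the +/- matrix as presence or absence of each "species" in each age group. All function names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; a toy stand-in for an alignment score."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / longest

def proxies(seqs, threshold=0.9):
    """Greedy clustering: a sequence joins the first proxy it matches,
    otherwise it becomes the proxy of a new cluster. Returns the proxies."""
    found = []
    for s in seqs:
        if not any(identity(s, p) >= threshold for p in found):
            found.append(s)
    return found

def presence_matrix(samples_by_age, threshold=0.9):
    # Steps 1 and 2: cluster each sample, then cluster the pooled
    # sample proxies within each age group.
    age_proxies = {}
    for age, samples in samples_by_age.items():
        pooled = [p for sample in samples for p in proxies(sample, threshold)]
        age_proxies[age] = proxies(pooled, threshold)
    # Step 3: cluster all age-level proxies together to define "species".
    combined = [p for group in age_proxies.values() for p in group]
    species = proxies(combined, threshold)
    # Step 4: mark each species "+" or "-" in each age group.
    matrix = {
        sp: {
            age: "+" if any(identity(sp, p) >= threshold for p in group) else "-"
            for age, group in age_proxies.items()
        }
        for sp in species
    }
    return species, matrix

# Hypothetical toy input: two age groups, two samples each.
samples_by_age = {
    "5y": [["AAAAAAAAAA", "AAAAAAAAAT"], ["AAAAAAAAAA"]],
    "150y": [["AAAAAAAAAA", "CCCCCCCCCC"], ["CCCCCCCCCG", "GGGGGGGGGG"]],
}
species, matrix = presence_matrix(samples_by_age)
for sp in species:
    print(sp, "".join(matrix[sp][age] for age in samples_by_age))
```

The point of working bottom-up is that each clustering pass sees only one sample or one set of proxies, so no single step has to compare all 233K sequences against each other.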
18
Bioinformatics challenges
✦ Move data between computers (IGS, laptop, IBEST Core)
✦ File the data in a retrievable way
✦ Associate metadata with data
✦ Cluster sequences within/between samples
✦ Associate clusters with species
✦ Compute diversity statistics
✦ Prepare publications and talks
✦ (much more)
19
Conclusions
✦ Biology
    There are thousands of species of bacteria in arctic soil
    The number of bacterial species increases as time of post-glacial exposure increases
✦ Algorithmics (want a job?)
    “Quantity has a quality all its own” (V.I. Lenin)
    Need new algorithms to use new hardware
    Database/dataset management is crucial
20
Thanks!
✦ Ursel Schüette
✦ Zaid Abdo
✦ Jacob Pierson
✦ Larry Forney
✦ Rob Lyon
✦ The Forney-Top lab
✦ John Bunge, Cornell
✦ The Relational Database project, MSU
✦ to INBRE for the excuse
✦ to IBEST for the science
✦ to NIH, NSF, and UI for the money (P20RR16448, P20RR016454, EPS080935)
21
✦ Discussion?
22
Extra stuff
Intentionally blank
23
Roche 454: a genome a day
24
Metagenomics
✦ Harvest approximately the first 300 bp of every 16S rRNA molecule, all samples
    Ribosome: required for translation (conserved)
    Common marker for microbial species
✦ Cluster by evolutionary relationships (“species”)
✦ Analyze by chronocline
25
Future work: same tune, new lyrics
✦ Data from human microbiome
    How do microbial communities vary between healthy and sick people?
✦ Data from polluted soil (Yangtze River, PRC)
    How do microbial communities vary as pollution increases?
✦ Data from longitudinal transects
    How does microbial diversity change with latitude?