Andrew Meade School of Biological Sciences.



Molecular sequence growth rates
From 600 to 100 million sequences in 25 years, including the Human Genome Project.

Molecular sequence growth rates
18 million new sequences were added in one year (2007 – 2008). The rate of growth is accelerating, with the number of sequences doubling every 2 years, and this is likely to continue with new sequencing technology. The cost, time and technical ability required for sequencing have all fallen.

It's worse than it looks
There is a lack of suitable tools for sequence analysis, and analysis methods don't always scale linearly with the amount of data. Methods have also changed:
Simple heuristics → statistical methods
Simple rules → more realistic models
Descriptive results → biological processes
Sub-system analysis → systems biology
Computing power is a major rate-limiting step, and the gap between data and analytical methods is widening.
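A back-of-the-envelope sketch of why non-linear scaling hurts: assume, as on the previous slide, that the amount of data doubles every 2 years, and that an analysis method's cost grows as n^k for n sequences (k = 1 for a linear method, k = 2 for an all-pairs comparison). The function `relative_cost` is a hypothetical helper for illustration, not part of any tool mentioned in the talk.

```python
def relative_cost(years: float, doubling_period: float = 2.0, exponent: float = 2.0) -> float:
    """Analysis cost after `years`, relative to today, assuming (hypothetically) that the
    data doubles every `doubling_period` years and the method's cost grows as n ** exponent."""
    data_growth = 2 ** (years / doubling_period)  # data size relative to today
    return data_growth ** exponent

for years in (2, 4, 10):
    print(f"{years:>2} years: linear x{relative_cost(years, exponent=1):g}, "
          f"quadratic x{relative_cost(years):g}")
```

Under these assumptions a linear method needs 32 times today's compute after 10 years, but a quadratic one needs 1,024 times, which is one way the gap between data and analysis widens.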

Tools for genomic analysis
Current tools: co-opted for the purpose; designed for smaller data sets; limited to a single computer; require external data; hard to generalise.
Required tools: custom built; limited only by the available hardware; use the available computers; models derived from the data; identify the informative content in the data.

454 parallel sequencing
Fast: million bases per 10 hours; a human genome in 100 hours, against 13 years for the Human Genome Project (HGP).
Cheap: 20¢ per kb (currently $12); a human genome for $100,000, against $10 billion for the HGP.
Accurate: 99% accuracy at the 400th base.
Small chunks: 400 – 800 bases per sequence.
As with parallel computing, it is hard to convert raw power into useful results. The catch is analysis.
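The headline ratios on this slide can be checked with a little arithmetic. The sketch below simply divides the quoted figures (ignoring leap years). It treats the HGP's 13 years and $10 billion as directly comparable to one 454 run, which flatters the comparison, since the HGP covered much more than raw sequencing.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years

hgp_hours = 13 * HOURS_PER_YEAR   # Human Genome Project: 13 years
seq454_hours = 100                # 454: about 100 hours per human genome
hgp_cost = 10_000_000_000         # HGP: $10 billion
seq454_cost = 100_000             # 454: $100,000 per human genome

time_ratio = hgp_hours / seq454_hours
cost_ratio = hgp_cost / seq454_cost
print(f"~{time_ratio:,.0f}x faster, {cost_ratio:,.0f}x cheaper")
```

This prints `~1,139x faster, 100,000x cheaper`.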

454 sequencing
Populations of bacteria (16S rRNA) taken from cow guts under different experimental conditions were sequenced to identify how changes in feed affect bacterial populations. In total, 332,000 sequences were obtained for £8,000 using 454, where they would previously have cost over £2 million.

454 sequencing analysis
The first step is to find how closely related the sequences are to each other, by performing an approximate match between all pairs of sequences that allows for insertions, deletions and mutations. This needs 332,000² × 0.5 ≈ 5.5 × 10¹⁰ comparisons, or 874 years on a single computer. However, it is a trivially parallel task: it is easy to distribute over nodes, over different clusters, and across different operating systems and hardware.
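The all-pairs step can be sketched as follows. The slides do not say which alignment method was used, so this uses plain Levenshtein edit distance (insertions, deletions and substitutions each cost 1) as a stand-in; `edit_distance`, `n_pairs` and `pairs_for_worker` are hypothetical helper names. Because each pair is scored independently, the pairs can be split into disjoint chunks, one per node, with no communication between them.

```python
from itertools import combinations

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of insertions, deletions and
    substitutions (point mutations) needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution, or free match
            ))
        prev = curr
    return prev[-1]

def n_pairs(n: int) -> int:
    """Exact number of unordered pairs; the slide's n**2 * 0.5 approximates this."""
    return n * (n - 1) // 2

def pairs_for_worker(n: int, worker: int, n_workers: int):
    """Yield the (i, j) index pairs assigned to one worker, round-robin.
    Pairs are independent, so workers never need to communicate."""
    for k, pair in enumerate(combinations(range(n), 2)):
        if k % n_workers == worker:
            yield pair

# Example: the pairs worker 0 of 2 would score for three short sequences
seqs = ["ACGTACGT", "ACGAACGT", "ACGTCGT"]
for i, j in pairs_for_worker(len(seqs), worker=0, n_workers=2):
    print(i, j, edit_distance(seqs[i], seqs[j]))
```

`n_pairs(332_000)` returns 55,111,834,000, in line with the slide's 5.5 × 10¹⁰. The round-robin split keeps the load balanced but makes every worker enumerate all pairs; a production version would compute each worker's index range directly.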

454 sequencing analysis 2
The sequences from the previous step are clustered to find which species are present and in what quantities. This involves 102 GB of data. The code was distributed to reduce memory and processing requirements and showed linear scaling (memory and CPU) up to 200 nodes, although there were problems with disk access.
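The clustering step might look something like the sketch below, which assumes the previous step produced (i, j, distance) triples. The slides do not name the clustering method; single-linkage clustering with a distance threshold is used here as a stand-in, and the threshold value is a hypothetical choice. A union-find structure keeps only one integer per sequence in memory, and the triples can be streamed (for example, read from disk) rather than held in a full matrix, which is one way to keep memory requirements down.

```python
from collections import Counter
from collections.abc import Iterable

def cluster(n: int, triples: Iterable[tuple[int, int, int]], threshold: int) -> list[int]:
    """Single-linkage clustering with union-find: sequences i and j share a cluster
    if a chain of pairs with distance <= threshold links them."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i, j, d in triples:  # triples may be a generator streamed from disk
        if d <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri
    return [find(i) for i in range(n)]

def abundances(labels: list[int]) -> Counter:
    """Number of sequences in each cluster, a proxy for each species' quantity."""
    return Counter(labels)

# Example with hypothetical distances between four sequences
triples = [(0, 1, 1), (0, 2, 9), (1, 2, 8), (2, 3, 0)]
labels = cluster(4, triples, threshold=2)
print(labels, abundances(labels))
```

Each cluster stands in for one species (or operational taxonomic unit), and its size gives the quantity.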

Bayesian phylogenetic inference
Bayesian phylogenetic inference infers evolutionary histories (phylogenies) from molecular data and is widely used in all areas of biology. It is used to investigate how genes and proteins change and adapt to their environment, to study how viruses spread and mutate, and to reconstruct ancestral genes and proteins. In conservation studies, it is used to identify the species that are most at risk of extinction and most valuable to conserve.
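To give a feel for what Bayesian inference involves computationally, here is a deliberately tiny sketch: Metropolis-Hastings MCMC for the evolutionary distance between just two aligned sequences under the Jukes-Cantor (JC69) substitution model, with an exponential prior. Real Bayesian phylogenetics samples whole tree topologies, branch lengths and model parameters for many taxa, and each MCMC step needs a likelihood calculation over the sites of the alignment, which is largely why it is so computationally demanding. Nothing here is the method or software used in the talk; `mcmc_branch_length`, the prior rate and the step size are hypothetical choices.

```python
import math
import random

def log_likelihood(d: float, n_sites: int, n_diff: int) -> float:
    """JC69 log-likelihood, up to an additive constant, of seeing n_diff differing
    sites out of n_sites between two aligned sequences separated by branch
    length d (expected substitutions per site)."""
    p = -0.75 * math.expm1(-4.0 * d / 3.0)  # P(a site differs) = 3/4 (1 - e^(-4d/3))
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)

def log_prior(d: float, rate: float = 10.0) -> float:
    """Exponential prior on d; the rate (prior mean 0.1) is a hypothetical choice."""
    return math.log(rate) - rate * d

def mcmc_branch_length(n_sites: int, n_diff: int, n_iter: int = 20_000,
                       step: float = 0.05, seed: int = 1) -> list[float]:
    """Metropolis-Hastings sampler for the posterior distribution of d."""
    rng = random.Random(seed)
    d = 0.1  # starting value
    current = log_likelihood(d, n_sites, n_diff) + log_prior(d)
    samples = []
    for _ in range(n_iter):
        proposal = d + rng.uniform(-step, step)  # symmetric random-walk proposal
        if proposal > 0:  # the posterior is zero for d <= 0, so those moves are rejected
            cand = log_likelihood(proposal, n_sites, n_diff) + log_prior(proposal)
            # accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1]
            if math.log(1.0 - rng.random()) < cand - current:
                d, current = proposal, cand
        samples.append(d)
    return samples[n_iter // 10:]  # discard the first 10% as burn-in

samples = mcmc_branch_length(n_sites=100, n_diff=10)
print(f"posterior mean of d: {sum(samples) / len(samples):.3f}")
```

For 10 differences in 100 sites, the classic JC69 distance estimate is -3/4 ln(1 - 4/3 × 0.1) ≈ 0.107, and the posterior mean should land close to it, nudged by the prior.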

Mammal mitochondrial data set: 44 taxa, 13 protein-coding regions, nucleotide data.

Mammal mitochondrial scaling (run time against number of computers): about 70 days on 1 computer, falling to about 2 days on 60 computers.
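The scaling result can be summarised as speedup and parallel efficiency. The sketch below uses the approximate timings from the slide (about 70 days on 1 computer, about 2 days on 60), so the results are rough. The last step additionally assumes Amdahl's law, which models a run as a fixed serial fraction plus a perfectly parallel remainder; that model is an assumption here, not something the slides state.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """How many times faster the parallel run is."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Speedup as a fraction of the ideal (linear) speedup."""
    return speedup(t_serial, t_parallel) / n_procs

def amdahl_serial_fraction(s: float, n_procs: int) -> float:
    """Serial fraction f implied by speedup s on n_procs, assuming Amdahl's law
    s = 1 / (f + (1 - f) / n_procs)."""
    return (1 / s - 1 / n_procs) / (1 - 1 / n_procs)

s = speedup(70, 2)  # approx. 70 days on 1 computer vs approx. 2 days on 60
e = efficiency(70, 2, 60)
f = amdahl_serial_fraction(s, 60)
print(f"speedup {s:.0f}x, efficiency {e:.0%}, implied serial fraction {f:.1%}")
```

This prints `speedup 35x, efficiency 58%, implied serial fraction 1.2%`.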