Evolutionary Genome Biology Gabor T. Marth, D.Sc. Department of Biology, Boston College Medical Genomics Course – Debrecen, Hungary, May 2006.

Presentation transcript:

Evolutionary Genome Biology Gabor T. Marth, D.Sc. Department of Biology, Boston College Medical Genomics Course – Debrecen, Hungary, May 2006

Lecture overview 1. Inter-species evolution and comparative genomics 2. Intra-species evolution, population genomics, and human origins

1. Inter-species evolution and comparative genomics. Initial sequencing and comparative analysis of the mouse genome, Mouse Genome Sequencing Consortium, Nature 420,

Questions of Evolutionary Biology: What are the taxonomic relationships between living organisms (which organisms are more or less closely related to each other)? How do genes evolve? How do genomes evolve? How do comparisons with other organisms help us understand our own genome?

Mechanisms of molecular evolution

DNA sequence evolution: mutations

Phylogenetic relationships (1): Multiple alignment of mammalian mitochondrial small subunit rRNA sequences (Higgs and Attwood, Bioinformatics and Molecular Evolution, Blackwell Publishing)

Phylogenetic relationships (2): Jukes-Cantor distance matrix for mammalian mitochondrial small subunit rRNA sequences (Higgs and Attwood, Bioinformatics and Molecular Evolution, Blackwell Publishing)
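The Jukes-Cantor distance corrects the observed fraction of differing sites p for multiple substitutions at the same site: d = -(3/4) ln(1 - 4p/3). A minimal sketch of building such a distance matrix follows; the function names and the toy alignment are illustrative assumptions, not the rRNA data shown on the slide.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of ungapped aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3), in substitutions per site."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_distance_matrix(alignment):
    """alignment: dict name -> aligned sequence. Returns {(name_a, name_b): d}."""
    return {
        (a, b): jukes_cantor(p_distance(alignment[a], alignment[b]))
        for a, b in combinations(alignment, 2)
    }


# Hypothetical toy alignment (not the data from the slide).
toy = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAT",
    "seq3": "ACGAACGTTT",
}
matrix = jc_distance_matrix(toy)
```

For small p the corrected distance is close to p itself; the correction grows as sequences diverge and more substitutions are hidden by repeated changes at the same site.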

Phylogenetic relationships (3): Phylogenetic tree constructed from mammalian mitochondrial small subunit rRNA sequences (Higgs and Attwood, Bioinformatics and Molecular Evolution, Blackwell Publishing)
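The slide does not state which tree-building method produced the tree. As one simple illustration of going from a distance matrix to a tree, the sketch below uses UPGMA (average-linkage clustering), which is a swapped-in technique chosen for brevity. The species labels and distances are hypothetical toy values.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a tree by UPGMA.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Returns the tree as nested 2-tuples of labels.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves in it
    d = dict(dist)
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        for c in sizes:
            if c == a or c == b:
                continue
            # Distance to the new cluster is the size-weighted average.
            d[frozenset((merged, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes))


# Hypothetical toy distances (not values from the slide).
toy_names = ["human", "chimp", "mouse", "rat"]
toy_dist = {
    frozenset(("human", "chimp")): 0.02,
    frozenset(("mouse", "rat")): 0.10,
    frozenset(("human", "mouse")): 0.40,
    frozenset(("human", "rat")): 0.40,
    frozenset(("chimp", "mouse")): 0.40,
    frozenset(("chimp", "rat")): 0.40,
}
tree = upgma(toy_names, toy_dist)
```

UPGMA assumes a constant rate of evolution along all lineages, so other distance methods such as neighbor joining are often preferred on real data.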

Gene structure evolution: duplications

Gene duplication: paralogs (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

Evolution of chromosome organization

Synteny. Initial sequencing and comparative analysis of the mouse genome, Mouse Genome Sequencing Consortium, Nature 420,

Gene classes across organisms (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

Gene conservation across organisms (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

Comparative genomics helps gene annotations

2. Intra-species evolution, population genomics, and human origins

Questions about human evolution How do we discover / assess genetic variations? What is the level of diversity across humans? How can we model the ancestral and mutation processes? What do phylogenetic analyses of human mitochondrial sequences tell us about human origins and dispersal? Does mitochondrial DNA give us the full picture? What do we learn from model-fitting analysis of nuclear DNA? A single wave of out-of-Africa migration or multiple waves?

How do we discover SNPs? Look at multiple sequences from the same genome region. Use base quality values to decide whether mismatches are true polymorphisms or sequencing errors.
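Base quality values are commonly given on the Phred scale, where Q = -10 log10(P_error), so Q = 30 means a 1 in 1,000 chance that the base call is wrong. The sketch below is a deliberately simplified heuristic for the decision described above, not the method used in the lecture: it asks how likely it is that every mismatching base at a site is a sequencing error. The function names and the threshold are assumptions.

```python
from collections import Counter


def phred_to_error(quality):
    """Phred scale: Q = -10 * log10(P_error)."""
    return 10 ** (-quality / 10)


def all_errors_probability(column):
    """column: list of (base, phred_quality) from sequences covering one site.

    Returns (major_base, probability that every non-major base call is an
    error), treating base-call errors as independent.
    """
    counts = Counter(base for base, _ in column)
    major = counts.most_common(1)[0][0]
    prob = 1.0
    for base, quality in column:
        if base != major:
            prob *= phred_to_error(quality)
    return major, prob


def looks_polymorphic(column, threshold=1e-3):
    """Hypothetical rule: call a candidate SNP when the mismatches are
    unlikely (probability below threshold) to be explained by errors alone."""
    major, prob = all_errors_probability(column)
    has_mismatch = any(base != major for base, _ in column)
    return has_mismatch and prob < threshold


# Two high-quality C calls against three A calls: hard to explain as errors.
strong = [("A", 40), ("A", 40), ("A", 40), ("C", 40), ("C", 30)]
# A single low-quality C call: easily explained as a sequencing error.
weak = [("A", 40), ("A", 40), ("A", 40), ("A", 40), ("C", 10)]
```

Here `looks_polymorphic(strong)` is true and `looks_polymorphic(weak)` is false. A full Bayesian treatment would also include a prior polymorphism rate and the possibility that the sequences come from paralogous regions.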

SNP discovery procedure, starting from the genome reference sequence: 1. Fragment recruitment (database search) 2. Anchored alignment 3. Paralog identification 4. SNP detection

SNP discovery on the genome scale (Sachidanandam et al., Nature 2001). [Figure: EST, WGS, and BAC sequences compared against the genome reference; figure label: ~ 8 million]

Human genetic diversity: Polymorphism density along chromosomes varies widely. Average polymorphism rate between a pair of human chromosomes: 1 SNP in 1,300 bp of sequence.
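A rate of 1 SNP in 1,300 bp corresponds to about 7.7 x 10^-4 differences per site. As a rough sketch (function names are assumptions), the first helper scales that average to a region of any length, and the second computes the same kind of per-site pairwise diversity from a set of aligned sequences.

```python
from itertools import combinations

BP_PER_SNP = 1300  # average from the slide: 1 SNP per 1,300 bp between two chromosomes


def expected_pairwise_differences(length_bp):
    """Expected number of differences between two chromosomes over length_bp."""
    return length_bp / BP_PER_SNP


def pairwise_diversity(sequences):
    """Average number of differences per site over all pairs of aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


# Two copies of a ~3 Gb genome are expected to differ at roughly 2.3 million sites.
genome_wide = expected_pairwise_differences(3_000_000_000)
```

Because density varies widely along chromosomes, the genome-wide average is a poor guide to any particular short region.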

What explains heterogeneity? Candidate factors: G+C nucleotide content, CpG di-nucleotide content, recombination rate, functional constraints.
Values by functional category: 3’ UTR 5.00 x …; 5’ UTR 4.95 x …; Exon, overall 4.20 x …; Exon, coding 3.77 x … (synonymous 366 / 653, non-synonymous 287 / 653).
Variance is so high that these quantities are poor predictors of nucleotide diversity in local regions. Hence random processes are likely to govern the basic shape of the genome variation landscape: (random) genetic drift.

The origin of genetic variations: Sequence variations are the result of mutation events. [Figure: genealogy of the sequences TAAAAAT and TAACAAT, traced back to their MRCA] Mutations are propagated down through generations and determine present-day variation patterns.

Recombination messes up phylogenies. [Figure: example sequences acggttatgtaga and accgttatgtaga] Because of recombination, DNA sequences may not have a unique common ancestor, hence phylogenetic analysis may not apply.

What does mtDNA say about human origins? However, the mitochondrion is only a single locus (~16kb, short on the scale of the 3Gb human genome) Campbell and Heyer. Genomics, Proteomics, Bioinformatics. Cummings.

What does nuclear DNA say? Because of recombination, phylogenetic analysis is not feasible (there is no unique tree that can explain the ancestry of DNA sequences). Instead, one uses statistical “genetic analysis”, i.e. one examines the statistical properties of the possible ancestries that produced the nucleotide sequences observed in individuals.

Polymorphism data
1. Marker density (MD): distribution of the number of SNPs in pairs of sequences.
   Clone 1   Clone 2   # SNPs
   AL00675   AL
   AS81034   AK
   CB00341   AL
2. Allele frequency spectrum (AFS): distribution of SNPs according to allele frequency (from “rare” to “common”) in a set of samples.
   SNP   Minor allele   Allele count
   A/G   A              1
   C/T   T              9
   A/G   G              3
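Both summaries are simple tabulations of per-pair or per-SNP counts. A minimal sketch follows, assuming the counts have already been extracted. The minor-allele counts 1, 9, 3 are the ones listed on the slide, while the SNP counts per clone pair are hypothetical because the slide's values did not survive.

```python
from collections import Counter


def count_distribution(values):
    """Map each observed value to the number of times it occurs, sorted by value."""
    return dict(sorted(Counter(values).items()))


# Allele frequency spectrum: number of SNPs at each minor-allele count.
minor_allele_counts = [1, 9, 3]  # from the slide's SNP table
afs = count_distribution(minor_allele_counts)

# Marker density: number of sequence pairs containing k SNPs.
snps_per_clone_pair = [0, 2, 1, 0, 3, 1, 0]  # hypothetical values
md = count_distribution(snps_per_clone_pair)
```

Dividing each count by the total turns either tabulation into the empirical distribution that a population genetic model is fitted to.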

Population genetic models. [Figure: population size histories drawn from past to present (stationary, expansion, collapse, bottleneck), with the MD (simulation) and AFS (direct form) corresponding to each history]
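For the stationary (constant-size) history, standard neutral coalescent theory gives a closed form for the expected spectrum: in a sample of n chromosomes, the expected number of SNPs whose derived allele appears i times is proportional to 1/i. The sketch below computes that textbook baseline and its folded (minor-allele) version. It is an illustration only and is not claimed to be the formula used in the lecture for the other histories.

```python
def expected_afs_stationary(n):
    """Expected fraction of SNPs with derived-allele count i = 1..n-1
    in a sample of n chromosomes under a constant-size neutral model."""
    weights = [1 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def fold(spectrum):
    """Fold a derived-allele spectrum into a minor-allele spectrum.

    spectrum[i - 1] is the fraction at derived count i; the result is
    indexed by minor-allele count 1..n//2.
    """
    n = len(spectrum) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        value = spectrum[i - 1]
        if j != i:
            value += spectrum[j - 1]  # counts i and n - i share a minor-allele count
        folded.append(value)
    return folded


unfolded_4 = expected_afs_stationary(4)  # proportional to 1, 1/2, 1/3
folded_4 = fold(unfolded_4)
```

Departures from this baseline are what carry information about history: an excess of rare alleles is the usual signature of expansion, and a deficit points to a collapse or bottleneck.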

Data fitting: polymorphism density. The best model is a bottleneck-shaped population size history (from the present backwards): N1 = 6,000, T1 = 1,200 gen.; N2 = 5,000, T2 = 400 gen.; N3 = 11,000 (Marth et al. PNAS 2003). Our conclusions from the marker density data are confounded by the unknown ethnicity of the public genome sequence, so we looked at allele frequency data from ethnically defined samples.

Data fitting: allele frequency. Best model (from the present backwards): N1 = 20,000, T1 = 3,000 gen.; N2 = 2,000, T2 = 400 gen.; N3 = 10,000. Model consensus: bottleneck, ~ 3,000 generations (or 100,000 years) ago.
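The fitted histories are piecewise-constant: each epoch has a population size N and a duration T in generations. A small sketch of that representation follows, using the parameter values from the two data-fitting slides. It assumes that epochs are listed from the present backwards and that each T is the length of its epoch; the names are illustrative.

```python
# (population size, duration in generations), listed from the present backwards.
# The oldest epoch is open-ended (duration None).
MD_MODEL = [(6000, 1200), (5000, 400), (11000, None)]    # marker density fit
AFS_MODEL = [(20000, 3000), (2000, 400), (10000, None)]  # allele frequency fit


def size_at(model, generations_ago):
    """Population size at a given time before the present."""
    if generations_ago < 0:
        raise ValueError("generations_ago must be non-negative")
    elapsed = 0
    for size, duration in model:
        if duration is None or generations_ago < elapsed + duration:
            return size
        elapsed += duration
    raise ValueError("model must end with an open-ended epoch")


def bottleneck_window(model):
    """(start, end) of the middle epoch, in generations before the present."""
    start = model[0][1]
    return start, start + model[1][1]
```

Under these assumptions `bottleneck_window(AFS_MODEL)` gives (3000, 3400), matching the slide's bottleneck at roughly 3,000 generations ago.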

Data from other human populations. European data: bottleneck. African data: modest but uninterrupted expansion. (Marth et al. Genetics 2004)

What nuclear DNA tells us. [Figure: the Recent African Origin and Multiregional models, shown alongside our results]