Inferences on human demographic history using computational Population Genetic models Gabor T. Marth Department of Biology Boston College Chestnut Hill,

Inferences on human demographic history using computational Population Genetic models Gabor T. Marth Department of Biology Boston College Chestnut Hill, MA 02467

The current variation resource
The current public resource, dbSNP, contains over 10 million SNPs.
1. How are these SNPs structured within the genome?
2. What can we learn about the processes that shape human variability?
3. What is the utility of these data for medical applications?

Nucleotide diversity is heterogeneous
Nucleotide diversity varies among regions of a given length, at the scale of whole chromosomes.
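Nucleotide diversity (π) is the average number of differences per site between two sequences drawn from the sample. A minimal sketch of measuring it in consecutive windows, assuming gap-free aligned sequences of equal length; the function names, toy sequences, and window size are illustrative and not taken from the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi) for aligned sequences."""
    n, length = len(seqs), len(seqs[0])
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2))
    return diffs / (n * (n - 1) / 2) / length


def windowed_diversity(seqs, window):
    """Pi in consecutive, non-overlapping windows of `window` sites.
    A trailing partial window is dropped."""
    length = len(seqs[0])
    return [nucleotide_diversity([s[start:start + window] for s in seqs])
            for start in range(0, length - window + 1, window)]


seqs = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]
print(nucleotide_diversity(seqs))   # 4 differences / 3 pairs / 8 sites = 0.1666...
print(windowed_diversity(seqs, 4))  # [0.0, 0.333...]: all variation is in window 2
```

Even in this toy alignment the two windows differ sharply, which is the kind of regional heterogeneity the slide describes.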

Compositional and functional features
Features examined: G+C nucleotide content, CpG dinucleotide content, recombination rate, and functional constraints.
Nucleotide diversity by functional class: 3' UTR 5.00 x …; 5' UTR 4.95 x …; exon, overall 4.20 x …; exon, coding 3.77 x ….
Coding SNPs: 366 of 653 synonymous, 287 of 653 non-synonymous.
The variance is so high that these quantities are poor predictors of nucleotide diversity in local regions. Random processes, as described by neutral theory, are therefore likely to govern the basic shape of the genome variation landscape.

Strategy – study observable distributions
Marker density (MD): the distribution of the number of SNPs observed in pairs of sequences.
Allele frequency spectrum (AFS): the distribution of SNPs according to allele frequency in a set of samples, from "rare" to "common".
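Both distributions can be tabulated directly from sequence data. A minimal sketch, assuming haplotypes coded as equal-length strings of 0 (ancestral) and 1 (derived), so the spectrum here is unfolded; without a known ancestral state one would fold it by minor allele count instead. For simplicity the marker density is computed over all sequence pairs in a single region, whereas the study tabulated it over many regions of fixed length. Function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequency_spectrum(haplotypes):
    """Unfolded AFS as {derived-allele count: number of SNPs}.

    haplotypes: equal-length strings of '0' (ancestral) / '1' (derived),
    one per sampled chromosome; columns are sites.
    """
    n = len(haplotypes)
    spectrum = Counter()
    for site in zip(*haplotypes):
        k = site.count("1")
        if 0 < k < n:  # sites fixed for one allele are not SNPs in this sample
            spectrum[k] += 1
    return dict(sorted(spectrum.items()))


def marker_density(haplotypes):
    """{number of differing sites: number of sequence pairs} over all pairs."""
    counts = Counter(sum(a != b for a, b in zip(s, t))
                     for s, t in combinations(haplotypes, 2))
    return dict(sorted(counts.items()))


haps = ["0010", "0110", "1110", "0011"]
print(allele_frequency_spectrum(haps))  # {1: 2, 2: 1}
print(marker_density(haps))             # {1: 3, 2: 2, 3: 1}
```

The third site is derived in every sample, so it does not count as a SNP in this sample and drops out of the spectrum.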

Strategy – modeling approach
Build models of the fundamental forces (drift, the mutation process, demography, recombination, selection) that accurately describe these distributions. Then use the same models to improve our expectations of allelic association (linkage disequilibrium, LD) and of human haplotype structure: properties that are harder to measure directly but fundamental to medical association studies.
[Figure: a region of strong allelic association; a region of reduced haplotype diversity.]

Tool – the Coalescent process
Trace the genealogy of the samples at hand through significant events (e.g. coalescence, recombination) back into the past, until the Most Recent Common Ancestor (MRCA) of all samples is found. The shape of the genealogy is modulated by the underlying model structure and parameters. Add mutations according to a neutral mutation model, then tabulate the statistical properties of the resulting polymorphic structure.
[Figure: a simple but dynamic model of demography, with population sizes N1, N2, N3 and epoch times T1, T2, from the past to the present.]
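The procedure on this slide can be sketched as a small coalescent simulator. This is an illustration under stated assumptions, not the software used in the study: it ignores recombination, uses diploid effective sizes that are constant within each epoch (listed from the present backward, which is only one possible reading of the N1 to N3, T1 to T2 figure), and places neutral infinite-sites mutations at a hypothetical rate `mu` per generation for the whole region. Function names and parameter values are hypothetical.

```python
import random
from itertools import accumulate


def simulate_genealogy(n, sizes, durations, rng):
    """Coalescent genealogy of n sampled chromosomes, no recombination.

    sizes[i]: diploid effective size of epoch i, counting back from the present.
    durations[i]: length of epoch i in generations (one fewer entry than sizes;
    the oldest epoch extends indefinitely).
    Returns branches as (frozenset of samples below the branch, length).
    """
    ends = list(accumulate(durations)) + [float("inf")]
    lineages = [(frozenset([i]), 0.0) for i in range(n)]  # (samples below, birth time)
    branches, t, epoch = [], 0.0, 0
    while len(lineages) > 1:
        k = len(lineages)
        # Any pair of the k lineages coalesces: rate C(k, 2) / (2N) per generation.
        rate = k * (k - 1) / 2 / (2 * sizes[epoch])
        wait = rng.expovariate(rate)
        if t + wait > ends[epoch]:
            # No event before the size changes: move to the boundary and redraw,
            # which is valid because exponential waiting times are memoryless.
            t, epoch = ends[epoch], epoch + 1
            continue
        t += wait
        a, b = sorted(rng.sample(range(k), 2), reverse=True)
        (la, ta), (lb, tb) = lineages.pop(a), lineages.pop(b)  # larger index first
        branches += [(la, t - ta), (lb, t - tb)]
        lineages.append((la | lb, t))  # their common ancestor
    return branches


def neutral_spectrum(branches, n, mu, rng):
    """Drop neutral infinite-sites mutations (rate mu per generation) on each
    branch and return the unfolded AFS: afs[k] = SNPs carried by k samples."""
    afs = [0] * (n + 1)
    for leaves, length in branches:
        # Poisson number of mutations on this branch, via exponential gaps.
        pos = rng.expovariate(mu)
        while pos < length:
            afs[len(leaves)] += 1
            pos += rng.expovariate(mu)
    return afs


rng = random.Random(1)
# Illustrative three-epoch bottleneck, not fitted values.
tree = simulate_genealogy(20, sizes=[10_000, 2_000, 10_000], durations=[1_000, 400], rng=rng)
print(neutral_spectrum(tree, 20, mu=1e-4, rng=rng))
```

Averaging many such spectra (or marker densities) over replicate genealogies gives the model expectation that is compared with the data.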

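Model generation and model fitting
Model expectations are generated with computable formulations and simulation procedures, and models are fitted by comparing those expectations with the data across values of the model parameters.
[Figure: fit as a function of parameter i and parameter j; labels 3/5, 1/5, 2/5.]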
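One simple way to fit a model is to evaluate its expected spectrum on a grid of parameter values and keep the best-scoring combination. The sketch below scores with a multinomial log-likelihood; that statistic, the function names, and the one-parameter `toy_spectrum` family (a stand-in so the example runs, not a demographic model) are all assumptions rather than the fitting procedure of the study. In practice `model` would wrap a direct formula or a simulator.

```python
import math
from itertools import product


def log_likelihood(observed, expected):
    """Multinomial log-likelihood of observed SNP counts per frequency class,
    given unnormalized expected counts per class from a model."""
    total = sum(expected)
    return sum(o * math.log(e / total) for o, e in zip(observed, expected) if o > 0)


def grid_fit(observed, model, grid):
    """Evaluate model(**params) at every combination of grid values.

    grid: {parameter name: list of values}. Returns (best params, log-likelihood).
    """
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        ll = log_likelihood(observed, model(**params))
        if best is None or ll > best[1]:
            best = (params, ll)
    return best


def toy_spectrum(alpha, n=10):
    """Hypothetical stand-in model: class k (k = 1..n-1) has weight k**-alpha.
    alpha = 1 reproduces the constant-size neutral expectation."""
    return [k ** -alpha for k in range(1, n)]


observed = [round(1000 / k) for k in range(1, 10)]
params, ll = grid_fit(observed, toy_spectrum, {"alpha": [0.5, 0.75, 1.0, 1.25, 1.5]})
print(params)  # {'alpha': 1.0}
```

A two-parameter scan, as in the figure, only needs a second entry in `grid`; `product` then visits every (parameter i, parameter j) pair.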

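Model expectations – Demography
Population size histories (from the past to the present): stationary, expansion, collapse, and bottleneck. Expected marker density (MD) is obtained by simulation; the expected allele frequency spectrum (AFS) is computed in direct form.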
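For the simplest history, the stationary one, the AFS has a well-known direct form: under the standard neutral infinite-sites model with constant population size, the expected number of SNPs whose derived allele is carried by i of n sampled chromosomes is θ/i, where N is the diploid effective size and μ the mutation rate per generation for the region. The forms for the expansion, collapse, and bottleneck histories are not reproduced here. A worked case for n = 4 shows how strongly rare classes dominate.

```latex
\begin{aligned}
\mathbb{E}[\xi_i] &= \frac{\theta}{i}, \qquad i = 1, \dots, n-1, \qquad \theta = 4N\mu \\
\frac{\mathbb{E}[\xi_i]}{\sum_{j=1}^{n-1} \mathbb{E}[\xi_j]} &= \frac{1/i}{a_{n-1}}, \qquad a_{n-1} = \sum_{j=1}^{n-1} \frac{1}{j} \\
n = 4:\quad a_3 &= 1 + \tfrac{1}{2} + \tfrac{1}{3} = \tfrac{11}{6}, \qquad \text{proportions } \tfrac{6}{11},\ \tfrac{3}{11},\ \tfrac{2}{11}
\end{aligned}
```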

Model fitting in BAC marker density data (Marth et al., PNAS 2003)
The best model is a bottleneck-shaped population size history, and the fit to the data is very good at each sequence length examined (4–16 kb). Fitted parameters, as labeled on the figure starting from the present: N1 = 6,000, T1 = 1,200 generations; N2 = 5,000, T2 = 400 generations; N3 = 11,000.
However, our conclusions from the marker density data are confounded by the unknown ethnicity of the public genome sequence, so we looked at allele frequency data from ethnically defined samples.

The frequency spectrum in European samples
Model consensus: bottleneck. Fitted parameters, as labeled on the figure starting from the present: N1 = 20,000, T1 = 3,000 generations; N2 = 2,000, T2 = 400 generations; N3 = 10,000.
How general are these observations?

African spectra tell a different story (Marth et al., Genetics, in press)
European data: bottleneck. African data: modest but uninterrupted expansion.

Predictions – Age of polymorphisms
[Figure: for African and European data, the contribution of the past to alleles in various frequency classes, and the average age of polymorphisms.]
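As a baseline for these predictions, a classical result for the neutral, constant-size case (Kimura and Ohta, 1973) gives the expected age of an allele from its current frequency, which shows why common polymorphisms tend to be older than rare ones. The sketch assumes a constant population with a hypothetical effective size of 10,000; the bottleneck and expansion histories on the previous slides would shift these ages, so this is only a reference point, not the slide's prediction.

```python
import math


def expected_allele_age(p, n_e):
    """Expected age, in generations, of a neutral allele now at frequency p
    in a constant-size diploid population of effective size n_e:
    -4 * n_e * p * ln(p) / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return -4 * n_e * p * math.log(p) / (1 - p)


for p in (0.01, 0.1, 0.5):
    # prints roughly 1861, 10234 and 27726 generations
    print(p, round(expected_allele_age(p, n_e=10_000)))
```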

Predictions – Linkage disequilibrium
LD measures the strength of allelic association between two markers.
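The standard pairwise LD measures can be computed from phased two-marker haplotypes. A minimal sketch, assuming each chromosome is given as a pair of 0/1 alleles; D is the coefficient of linkage disequilibrium, D' scales it by its maximum possible magnitude given the allele frequencies (the sign is kept here, while |D'| is what is usually reported), and r² is the squared correlation between the two markers. The function name is hypothetical.

```python
def ld_stats(pairs):
    """Return (D, D', r^2) for two biallelic markers.

    pairs: one (allele at marker 1, allele at marker 2) tuple per phased
    chromosome, alleles coded 0/1.
    """
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n       # frequency of allele 1 at marker 1
    p_b = sum(b for _, b in pairs) / n       # frequency of allele 1 at marker 2
    p_ab = sum(a * b for a, b in pairs) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    var = p_a * (1 - p_a) * p_b * (1 - p_b)
    if var == 0:
        raise ValueError("both markers must be polymorphic")
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, d / d_max, d * d / var


chromosomes = [(1, 1)] * 3 + [(0, 0)] * 3 + [(1, 0), (0, 1)]
print(ld_stats(chromosomes))  # (0.125, 0.5, 0.25)
```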

Severity of a European bottleneck

African-American spectra – Admixture?
[Figure: African-American spectra shown against the African spectrum and the European spectrum.]

Haplotype structure – Haplotype blocks (Daly et al., Nature Genetics, 2001)
These predictions agree with experimental observations from other labs, most notably the presence of regions of strong allelic association, termed "haplotype blocks", evident primarily in European samples. Within a block, a few frequent haplotypes (e.g. with a minimum frequency of 10%) make up the majority of all observed haplotypes (e.g. more than 80%).
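The block description on this slide translates into a simple check. A minimal sketch, assuming phased haplotypes for one candidate region given as strings and using the slide's example thresholds (a haplotype is frequent at 10% or more, and the frequent ones must together exceed 80%); it illustrates that criterion only and is not the block-finding method of Daly et al. The function name and data are hypothetical.

```python
from collections import Counter


def block_summary(haplotypes, min_freq=0.10, min_coverage=0.80):
    """Return (is_block, {frequent haplotype: frequency}, combined frequency)."""
    n = len(haplotypes)
    frequent = {h: c / n for h, c in Counter(haplotypes).items() if c / n >= min_freq}
    coverage = sum(frequent.values())
    return coverage > min_coverage, frequent, coverage


region = ["0011"] * 10 + ["1100"] * 7 + ["0101", "1111", "1010"]
is_block, frequent, coverage = block_summary(region)
print(is_block, sorted(frequent), round(coverage, 2))  # True ['0011', '1100'] 0.85
```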

The HapMap initiative
The promise: 1. frequent haplotypes can be used as markers for functional variants; 2. significant marker reduction is possible.
HapMap Initiative: map haplotype blocks across the entire human genome.
Open questions of generality within and across human populations: do the patterns found in reference samples hold in clinical samples?
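Marker reduction is often illustrated with greedy pairwise tagging: pick markers until every marker is in strong LD (r² at or above a threshold) with at least one picked tag. A minimal sketch, assuming markers given as 0/1 lists over the same phased haplotypes and a hypothetical threshold of 0.8; this is one common approach, not necessarily the procedure used by the HapMap project, and all names are hypothetical.

```python
def r_squared(x, y):
    """Squared allelic correlation between two 0/1 marker vectors."""
    n = len(x)
    p, q = sum(x) / n, sum(y) / n
    d = sum(a * b for a, b in zip(x, y)) / n - p * q
    var = p * (1 - p) * q * (1 - q)
    return d * d / var if var > 0 else 0.0


def greedy_tags(markers, threshold=0.8):
    """Repeatedly pick the marker that covers the most untagged markers
    (itself, or r^2 >= threshold) until all are covered; ties go to the
    alphabetically first name."""
    untagged, tags = set(markers), []

    def covered(name):
        return {m for m in untagged
                if m == name or r_squared(markers[name], markers[m]) >= threshold}

    while untagged:
        best = max(sorted(markers), key=lambda name: len(covered(name)))
        tags.append(best)
        untagged -= covered(best)
    return tags


markers = {
    "s1": [1, 1, 0, 0, 1, 0],
    "s2": [1, 1, 0, 0, 1, 0],  # identical to s1
    "s3": [0, 0, 1, 1, 0, 1],  # complement of s1, so r^2 = 1
    "s4": [1, 0, 1, 0, 1, 0],  # weakly associated with the others
}
print(greedy_tags(markers))  # ['s1', 's4']
```

Here four markers reduce to two tags, a small-scale version of the marker reduction the slide promises.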

Predictions – Haplotype structure
Returning to our own studies, we predict that haplotype blocks under the African demographic history are roughly half the size of European blocks, consistent with observations. To what degree do blocks coincide across populations? Answering this requires analyzing the spatial relationships between the polymorphic structures of different populations. We examine the question from the standpoint of demographic history, an obvious candidate cause of population-specific differences.

Connecting ethnic demographies
The "true" histories of all human populations are interconnected. We study these relationships with models of population subdivision ("African history", "European history", "migration"). The genealogies of samples from different populations are connected through the shared part of our past, so polymorphic markers (some shared, some population-specific) and haplotypes are placed into a common frame of reference.

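Predictions – Joint allele frequency
Percentage of SNPs by frequency class in European samples (columns) and African samples (rows). Each cell gives two values as on the original slide, which compares observations in the UW PGA data with the subdivision model.

| African \ European | monomorphic | rare | common |
|---|---|---|---|
| monomorphic | 0.0 % | 19.9 % / 13.2 % | 2.3 % / 1.0 % |
| rare | 43.4 % / 43.7 % | 11.5 % / 11.0 % | 4.6 % / 7.4 % |
| common | 10.2 % / 4.2 % | 4.4 % / 6.0 % | 6.6 % / 13.4 % |

The table separates SNPs private to African samples (monomorphic in Europeans), SNPs private to European samples (monomorphic in Africans), and shared SNPs, including those common in both populations. Alleles often have different frequencies in different populations. Our simple model of subdivision captures the qualitative dynamics, and we now have the tools to start evaluating and guiding the design of variation resources that are general for all populations.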
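A joint table of this kind can be built by classifying each SNP in both populations. A minimal sketch, assuming per-population allele frequencies are already estimated and using a hypothetical 5% minor-allele-frequency cutoff between rare and common (the slide does not state its cutoff); cells are keyed as (African class, European class), and the function names and data are hypothetical.

```python
from collections import Counter


def frequency_class(freq, rare_cutoff=0.05):
    """Classify one SNP in one population from its allele frequency (0..1)."""
    maf = min(freq, 1 - freq)  # minor allele frequency
    if maf == 0:
        return "monomorphic"
    return "rare" if maf < rare_cutoff else "common"


def joint_table(freq_pairs, rare_cutoff=0.05):
    """Percentage of SNPs per (African class, European class) cell.

    freq_pairs: one (African frequency, European frequency) pair per SNP.
    """
    counts = Counter((frequency_class(af, rare_cutoff), frequency_class(eu, rare_cutoff))
                     for af, eu in freq_pairs)
    total = sum(counts.values())
    return {cell: 100 * c / total for cell, c in counts.items()}


snps = [(0.02, 0.0), (0.3, 0.25), (0.0, 0.1), (0.01, 0.0)]
print(joint_table(snps))
# {('rare', 'monomorphic'): 50.0, ('common', 'common'): 25.0, ('monomorphic', 'common'): 25.0}
```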