Recombination based population genomics
Jaume Bertranpetit, Marta Melé, Francesc Calafell, Asif Javed, Laxmi Parida


Recall: IRiS (Identification of Recombinations in Sequences)
IRiS is a computational method, developed with biological insight, that:
- detects evidence of historical recombinations
- minimizes the number of recombinations in the Ancestral Recombinational Graph (ARG)

Recotypes
Two chromosomes share a recombination if the junction is co-inherited.
Figure legend: recombination edge, mutation edge, extant sequence. The example shows recombinations r1 and r2 and extant sequences a, b, and c.
Resulting recotypes (1 = the chromosome carries the recombination):
     r1   r2   …
a     1    0
b     1    0
c     0    1
…
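The recotype table on this slide can be read as a binary matrix with one row per chromosome and one column per recombination. The following is a minimal sketch under that reading; the function name `recotype_matrix` and the input layout (a mapping from chromosome name to the set of recombinations it carries) are assumptions for illustration, not part of IRiS.

```python
from typing import Dict, List, Set


def recotype_matrix(
    carried: Dict[str, Set[str]], recombinations: List[str]
) -> Dict[str, List[int]]:
    """Return one 0/1 row per chromosome (hypothetical helper, not IRiS code).

    A 1 in column j means the chromosome co-inherited the junction of
    recombinations[j]; a 0 means it did not.
    """
    return {
        chrom: [1 if r in recs else 0 for r in recombinations]
        for chrom, recs in carried.items()
    }


# Toy input mirroring the slide: a and b co-inherit r1, c carries r2.
carried = {"a": {"r1"}, "b": {"r1"}, "c": {"r2"}}
recotypes = recotype_matrix(carried, ["r1", "r2"])
for chrom, row in recotypes.items():
    print(chrom, row)
```

Two chromosomes share a recombination exactly when both rows have a 1 in the same column, which is what the later distance and PCA slides build on.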

Validity of inferred recombinations
- Comparison with sperm typing
- Computer-simulated recombinations

in vitro
Chr 1 near the MS32 minisatellite (Jeffreys et al.)
- UK semen donor of North European origin: sperm typing; LDhat and Phase (200 SNPs)
- HapMap 2 CEU population, similar SNP density
Figure legend: IRiS, LDhat, Phase, sperm typing.

in silico
HapMap 3 X chromosome data. Procedure:
1. Select 2 chromosomes at random.
2. Pick a random breakpoint.
3. Create a new chromosome.
4. Check if it is unique; if so, add it to the dataset.
5. Run IRiS on the dataset to see if the breakpoint is detected.
Diagram: chromosomes, IRiS, recombination detected?
Results:
- 69% of recombinations detected
- All detected recombinations identify the correct sequence
- No false positives
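The simulation steps on this slide (all but the final IRiS run) can be sketched as follows. This is an illustration only: chromosomes are modeled as equal-length strings of alleles, the names `simulate_recombinant` and `dataset` are hypothetical, and the detection step is left out because it requires IRiS itself.

```python
import random
from typing import List, Optional, Tuple


def simulate_recombinant(
    chromosomes: List[str], rng: random.Random
) -> Optional[Tuple[str, int]]:
    """Splice two randomly chosen chromosomes at a random breakpoint.

    Returns (new_chromosome, breakpoint), or None when the spliced
    chromosome already exists in the dataset (i.e. it is not unique).
    Assumes all chromosomes have the same length (at least 2 sites).
    """
    left, right = rng.sample(chromosomes, 2)
    # Keep at least one site from each parent.
    breakpoint_ = rng.randint(1, len(left) - 1)
    child = left[:breakpoint_] + right[breakpoint_:]
    if child in chromosomes:
        return None
    return child, breakpoint_


rng = random.Random(0)  # fixed seed so the toy run is repeatable
dataset = ["0000000000", "1111111111", "0101010101"]

result = None
while result is None:  # retry until the recombinant is unique
    result = simulate_recombinant(dataset, rng)
child, bp = result
dataset.append(child)
print(child, bp)
```

In the experiment described on the slide, the augmented dataset would then be passed to IRiS and the inferred breakpoints compared with the known one.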

Recombinomics
Strong population structure
- Agreement with traditional methods: FST vs. recombinational distance
- More informative than SNPs: STRUCTURE, PCA

Regions
18 regions selected from the HapMap 3 X chromosome:
- in males (to avoid phasing errors)
- 50 kb away from known CNV and SD (to avoid genotyping errors)
- 50 kb away from genes (to avoid selection)
- at least 80 SNPs
Chromosomes: LWK (43), MKK (88), YRI (88), ASW (42), GIH (42), CHB (40), CHD (21), JPT (25), MEX (21), CEU (74), TSI (40)

Analysis
For each region, IRiS inferred recotypes for each chromosome:
- 5166 recombinations were inferred
- 3459 co-occurred in at least two chromosomes
Recotype matrix: one row per chromosome (labeled by population: LK, LK, …, LK, MK, …, TI) and one column per recombination (r1, r2, r3, r4, r5, r6, …, r3459). Each row is the recotype of that chromosome.
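The step from all inferred recombinations to those that co-occur in at least two chromosomes can be sketched as a simple count-and-filter. The helper name `shared_recombinations` and the toy data are hypothetical; the slides do not show how the filtering was implemented.

```python
from collections import Counter
from typing import Dict, List, Set


def shared_recombinations(
    carried: Dict[str, Set[str]], min_chromosomes: int = 2
) -> List[str]:
    """List recombinations carried by at least `min_chromosomes` chromosomes."""
    counts = Counter(r for recs in carried.values() for r in recs)
    return sorted(r for r, n in counts.items() if n >= min_chromosomes)


# Toy data: r1 is carried by two chromosomes, r2 and r3 by one each.
carried = {"chr_1": {"r1", "r2"}, "chr_2": {"r1"}, "chr_3": {"r3"}}
kept = shared_recombinations(carried)
print(kept)
```

Only the retained recombinations would become columns of the recotype matrix.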

Agreement with LDhat
Scatter plot axes: number of recombinations inferred by IRiS; recombination rate inferred by LDhat. Each point represents a short haplotype segment in the HapMap CEU population.
Spearman correlation = [value missing], p-value < [value missing]
Correlation in hotspots: [value missing], p-value < 6 × 10^-10
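For reference, a Spearman correlation like the one reported on this slide is the Pearson correlation of the ranks of the two variables. The sketch below is a plain standard-library version with average ranks for ties; the toy numbers are invented for illustration and are not the HapMap values, and the sketch does not compute a p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy values: recombination counts per segment vs. estimated rates.
iris_counts = [0, 1, 1, 3, 5]
ldhat_rates = [0.1, 0.4, 0.3, 0.9, 2.0]
rho = spearman(iris_counts, ldhat_rates)
print(round(rho, 3))
```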

Recombinational distance between populations
Two populations that are genetically closer will share a higher number of recombinations.
Recombinational distance:
$D_{AB} = 1 - \frac{R_{AB}}{R_A + R_B - R_{AB}}$
Correlation between FST distance and recombinational distance for the 18 regions: [0.35, 0.75], with p-values < [value missing]
MDS, all regions combined: stress = 6.1%
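A small worked example may help with the distance on this slide. The sketch assumes the garbled formula reads D_AB = 1 - R_AB / (R_A + R_B - R_AB), and further assumes that R_A and R_B count the recombinations observed in populations A and B and R_AB counts those observed in both; the slide itself does not define these symbols, so treat both readings as assumptions.

```python
from typing import Set


def recombinational_distance(recs_a: Set[str], recs_b: Set[str]) -> float:
    """Compute 1 - R_AB / (R_A + R_B - R_AB) from two sets of recombinations.

    Assumed reading: R_A = len(recs_a), R_B = len(recs_b),
    R_AB = number of recombinations present in both populations.
    """
    r_ab = len(recs_a & recs_b)
    denominator = len(recs_a) + len(recs_b) - r_ab
    if denominator == 0:
        raise ValueError("both populations have no recombinations")
    return 1.0 - r_ab / denominator


pop_a = {"r1", "r2", "r3"}
pop_b = {"r2", "r3", "r4"}
d_ab = recombinational_distance(pop_a, pop_b)
print(d_ab)  # R_AB = 2, R_A + R_B - R_AB = 4
```

Under this reading the distance is 0 when two populations carry exactly the same recombinations and 1 when they share none, consistent with closer populations sharing more recombinations.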

PCA of population data
Recall recotypes: the chromosome-level matrix (rows LK, LK, …, LK, MK, …, TI; columns r1, r2, r3, r4, r5, r6, …, r3459) is reduced to a population-level matrix with one row per population (LK, MK, …, TI) and the same columns.
The first two PCs capture 66.4% of the variance.
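The reduction from chromosome-level rows to one row per population can be sketched as below. The slides do not say how rows were combined, so averaging each column within a population (giving the frequency of each recombination) is an assumption, as are the name `population_profiles` and the toy rows; the PCA itself is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def population_profiles(
    rows: List[Tuple[str, List[int]]]
) -> Dict[str, List[float]]:
    """Average the 0/1 recotypes of all chromosomes in each population.

    Each input row is (population_label, recotype). The output holds, per
    population, the fraction of its chromosomes carrying each recombination.
    """
    sums: Dict[str, List[int]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for pop, recotype in rows:
        if pop not in sums:
            sums[pop] = [0] * len(recotype)
        sums[pop] = [s + v for s, v in zip(sums[pop], recotype)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}


# Toy chromosome-level matrix with three recombinations.
rows = [
    ("LK", [1, 0, 1]),
    ("LK", [1, 1, 0]),
    ("MK", [0, 1, 0]),
    ("TI", [0, 0, 1]),
]
profiles = population_profiles(rows)
for pop, profile in profiles.items():
    print(pop, profile)
```

A matrix of this shape, one row per population, is what a population-level PCA would take as input.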

PCA of recotypes (more on this later)

Recotypes vs. SNPs
Due to ascertainment bias, gene diversity does not reflect population structure.
Percentage of variance:
                      SNPs   Recotypes
Across groups           9%       6%
Within groups           4%       1%
Within populations     87%      93%
- Normalized comparison, linearly scaled to [0,1], using 21 samples per population
- In agreement with Lewontin 72
- Results similar to Conrad 07

from SNPs to haplotypes to recotypes (a STRUCTURE comparison)
Figure panels for K = 2, K = 3, K = 4, and K = 5, each comparing SNPs, haplotypes, and recotypes.

Africa within global genetic variation
- Out of Africa hypothesis
- Founder effect
- Minority African-specific component (STRUCTURE, K = 4)
Figure: average number of recombinations in 21 random chromosomes.

Genetic variation within Africa
- Maasai-specific minor component (STRUCTURE, K = 5)
- Sub-Saharan Maasai are distinct among Africans (Parra 98).
- African Americans exhibit stronger recombinational affinity with African populations than with European populations (Parra 98).

Genetic variation outside Africa
STRUCTURE, K = 5
- Outside Africa, Gujarati and Japanese exhibit the highest and lowest number of recombinations, respectively.
- Gujarati Indians occupy an intermediate position between Europeans and East Asians.
Figure: average number of recombinations in 21 random chromosomes.

Venturing outside the X-chromosome
Benefits
- The bigger picture
- More regions and hence more information
Challenges
- Higher number of recombinations makes the picture murkier
- Phasing errors

Regions
81 regions selected from HapMap 3:
- 50 kb away from known CNV and SD (to avoid genotyping errors)
- 50 kb away from genes (to avoid selection)
- at least 200 SNPs
- 25 samples per population (each sample has two chromosomes)

Analysis
For each region, IRiS inferred recotypes for each chromosome:
- [number missing] recombinations were inferred
- For each sample, the two recotypes were merged.
PCA plots: SNPs vs. recotypes.

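Quantifying population structure
PCA followed by k nearest neighbors is used to predict the population of every sample.
Population tree in the figure:
- Top level: Africans, Non-Africans
- Second level: MKK, LWK, YRI, ASW, GIH, E. Asian, MEX, European
- Third level: CHB+CHD and JPT (E. Asian); CEU and TSI (European)
Figure legend: perfectly classified; classified with errors.
Misclassification counts, given as (recotypes, SNPs): (4,3), (4,3), (0,7), (0,7), (3,13), (8,13).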
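The classification step on this slide can be illustrated with a small k-nearest-neighbors sketch on low-dimensional coordinates such as PCA scores. The leave-one-out evaluation, the value k = 3, the helper names, and the toy coordinates are all assumptions; the slide does not state which settings were used.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (coordinates, population label)


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_predict(train: List[Sample], point: Sequence[float], k: int = 3) -> str:
    """Predict the label of `point` by majority vote among its k nearest samples."""
    nearest = sorted(train, key=lambda s: squared_distance(s[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_errors(samples: List[Sample], k: int = 3) -> int:
    """Count samples whose label is mispredicted when they are held out."""
    errors = 0
    for i, (coords, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, coords, k) != label:
            errors += 1
    return errors


# Toy PCA-like coordinates for two well-separated hypothetical populations.
samples: List[Sample] = [
    ((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
    ((10.0, 10.0), "B"), ((10.0, 11.0), "B"), ((11.0, 10.0), "B"),
]
errors = leave_one_out_errors(samples, k=3)
print(errors)
```

Running the same procedure once on recotype-based coordinates and once on SNP-based coordinates would give a pair of misclassification counts comparable to the (recotypes, SNPs) pairs on the slide.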

East Asian population
Recotypes are more informative of the underlying population structure.
PCA plots: SNPs vs. recotypes.

In conclusion, recotypes:
- show strong agreement with in silico and in vitro recombination rate estimates
- are highly informative of the underlying population structure
- provide a novel approach to study recombinational dynamics