Population Genetics


Population Genetics: As we all have an interest in genomic epidemiology, most of us are either already sampling and analysing genetic data or will be in the near future, and we are going to want to relate that information to phenotypes. In statistics it is important to understand the process that generated the data; population genetics takes that question further back in time. Population genetics is the study of how genes evolve through time. There is a large and fascinating body of science on the topic. We can't go through all of it, so we will focus on the parts most relevant to genomic epidemiology.

Population Genetics ...or “what processes led to the data we’re analysing?”

Imagine we collect and sequence some samples (N samples):
ATAGAAAGACCAGACTCCATCGCTAGCAGCTACGCTAGAGTTA
ATTGAAAGACCATACTCCATCGCTAGCAGC-ACGCTAGAGTTA
ATAGAAAGACCAGACTCCATCGCAAGCAGC-ACCCTAGCGTTA
ATAGAAAGACCAGACTCCATCGCAAGCAGCTACGCTAGAGTTA
...

Imagine we collect and sequence some samples...
Reference sequence:
ATAGATAGACCATACTGCATCGCAAGCAGCTACGCTAGCGTTA
Samples:
ATAGAAAGACCAGACTCCATCGCTAGCAGCTACGCTAGAGTTA
ATTGAAAGACCATACTCCATCGCTAGCAGC-ACGCTAGAGTTA
ATAGAAAGACCAGACTCCATCGCAAGCAGC-ACCCTAGCGTTA
ATAGAAAGACCAGACTCCATCGCAAGCAGCTACGCTAGAGTTA

Imagine we collect and sequence some samples... Each sample is shown relative to the reference sequence, with a dot wherever it matches the reference:
ATAGATAGACCATACTGCATCGCAAGCAGCTACGCTAGCGTTA
.....A......G...C......T..............A....
..T.............C......T......-.......A....
.....A......G...C.............-..C.........
.....A......G...C.....................A....
The single-base differences are SNPs; the "-" marks an insertion/deletion polymorphism.
As discussed yesterday, there are many types of genetic variation. But to allow us to talk generally about the processes, we are going to simplify by assuming that at each polymorphism there are two alleles that segregate and that they result from a single ancestral mutation event.
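
The dot notation on this slide can be reproduced with a few lines of code. The following is a minimal sketch, not the presenter's method: the function names `diff_against_reference` and `variant_sites` are hypothetical, the sequences are assumed to be already aligned to the same length, and only three of the sample sequences from the slide are used.

```python
# Reference and three of the sample sequences shown on the slide.
REFERENCE = "ATAGATAGACCATACTGCATCGCAAGCAGCTACGCTAGCGTTA"

SAMPLES = [
    "ATAGAAAGACCAGACTCCATCGCTAGCAGCTACGCTAGAGTTA",
    "ATAGAAAGACCAGACTCCATCGCAAGCAGC-ACCCTAGCGTTA",
    "ATAGAAAGACCAGACTCCATCGCAAGCAGCTACGCTAGAGTTA",
]


def diff_against_reference(reference, sample):
    """Return the sample with '.' at every position that matches the reference.

    Hypothetical helper; assumes the two sequences are already aligned.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("." if s == r else s for r, s in zip(reference, sample))


def variant_sites(reference, samples):
    """Return the 1-based positions at which any sample differs from the reference."""
    return [
        i + 1
        for i, ref_base in enumerate(reference)
        if any(sample[i] != ref_base for sample in samples)
    ]


for sample in SAMPLES:
    print(diff_against_reference(REFERENCE, sample))
print("variant sites:", variant_sites(REFERENCE, SAMPLES))
```

Under the simplification in the notes (two alleles per polymorphism, one ancestral mutation each), every position returned by `variant_sites` would be treated as a single biallelic site.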

Outline: (1) Population genetic processes; (2) Measuring correlations between alleles; (3) Recombination; (4) Differences between populations. We are going to try to go through these four areas. I'll mostly be speaking with respect to human diversity, but the concepts are fundamental to surveys of genetic variation from all species. Please do ask questions and interject.

Genetic variation:
ATAGATAGACCATACTGCATCGCAAGCAGCTACGCTAGCGTTA
.....A......G...C......T..............A....
..T.............C......T......-.......A....
.....A......G...C.............-..C.........
.....A......G...C.....................A....
...or, in cartoon form: [figure]

Genetic variation: two chromosomes (= haplotypes) are carried by each individual.

[Figure: 24 haplotypes (12 individuals), 100 SNPs on chromosome 20. Panels: Utah residents, ancestrally Northern and Western European; Yoruba from Ibadan, Nigeria.] As an example, let's look at some real data from a small region of human chromosome 20 from the HapMap project. There are clearly differences in the patterns of diversity between these two groups, but what generated them?

Population genetic processes: genetic drift, mutation, recombination and natural selection. It is the combination of these fundamental processes that generates diversity within a population. To understand these processes we need to think about how genes evolve through time.

Genetic drift. [Figure: the population at generations N (current generation), N-1, N-2, ...] Let's start by thinking about what happens at a single locus.

Genetic drift. [Figure: current generation.]

Genetic drift. [Figure: current generation.] Dominic's example of populating a whole city: there is an important role of chance in the process.

Genetic drift creates correlations between alleles. [Figure: from N generations ago to the current generation.] As Dominic's genes spread through the population by chance, the haplotype that he passed on would remain intact and reduce genetic diversity. Through this chance process the whole population would ultimately consist of a single haplotype. Why doesn't this happen?
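
The claim that chance alone eventually leaves a single haplotype can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: a Wright-Fisher style model with a constant number of chromosomes, non-overlapping generations, no mutation, no recombination and no selection. The function name `simulate_drift` and its parameters are hypothetical.

```python
import random


def simulate_drift(n_chromosomes, start_freq, max_generations, seed=None):
    """Track one haplotype's frequency under pure drift (assumed Wright-Fisher model).

    Each generation, every chromosome picks its parent at random from the
    previous generation, so the number of copies is a binomial draw.
    Stops early once the haplotype is lost (0.0) or fixed (1.0).
    """
    rng = random.Random(seed)
    freq = start_freq
    trajectory = [freq]
    for _generation in range(max_generations):
        if freq in (0.0, 1.0):
            break
        copies = sum(1 for _chrom in range(n_chromosomes) if rng.random() < freq)
        freq = copies / n_chromosomes
        trajectory.append(freq)
    return trajectory


trajectory = simulate_drift(n_chromosomes=40, start_freq=0.5,
                            max_generations=100_000, seed=1)
print(len(trajectory) - 1, "generations simulated; final frequency", trajectory[-1])
```

With no mutation or recombination in the model, the frequency ends at 0 or 1 in a small population, which is the "single haplotype" outcome the notes describe.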

Recombination. [Figure: a picture of the mechanism of recombination, showing paternal (father) and maternal (mother) chromosomes with and without recombination.]

Recombination breaks down the correlation between alleles.

Thinking backwards in time: as chromosomes are passed from one generation to the next, patterns of diversity evolve. When we take data from the present we need to think about the past. What are the ancestral processes that generated the data? It is perhaps more natural to think backwards in time.

Ancestral history. [Figure: present day.]

Ancestry of the population. [Figure: present day.]

Ancestry of the sample. [Figure: present day.]

Ancestry of the sample: the probability that two chromosomes share a common ancestor in the previous generation is 1/(2N).
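
If two chromosomes coalesce with probability 1/(2N) in each generation going backwards, the waiting time until they find a common ancestor is geometric with mean 2N generations. The sketch below checks this by simulation. It assumes a constant population of N diploid individuals; the function names and parameter values are hypothetical.

```python
import random


def sample_coalescence_time(n_individuals, rng):
    """Number of generations back until two chromosomes share an ancestor.

    Assumes a constant population of n_individuals diploids, so the
    per-generation coalescence probability is 1 / (2 * n_individuals).
    """
    p_coalesce = 1.0 / (2 * n_individuals)
    generations = 1
    while rng.random() >= p_coalesce:
        generations += 1
    return generations


def mean_coalescence_time(n_individuals, replicates, seed=None):
    """Average coalescence time over independent simulated pairs."""
    rng = random.Random(seed)
    total = sum(sample_coalescence_time(n_individuals, rng) for _ in range(replicates))
    return total / replicates


# With N = 50 the average should be close to 2N = 100 generations.
print(mean_coalescence_time(n_individuals=50, replicates=5000, seed=42))
```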

Ancestral processes. [Figure: going back in time, a pair of chromosomes can mutate (rate 2μ), recombine (rate 2r) or coalesce (rate 1/(2N)).] If two chromosomes coalesce before they incur a mutation or recombination event, then they will be identical.

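Genetic diversity. The probability that two chromosomes share a common ancestor in the previous generation is 1/(2N), so the expected time for two chromosomes to coalesce is 2N generations. The probability that two chromosomes are identical (by descent) is the probability that they coalesce before either incurs a mutation or recombination event: approximately (1/(2N)) / (1/(2N) + 2μ + 2r) = 1 / (1 + 4Nμ + 4Nr), where μ and r are the per-generation mutation and recombination probabilities for the region. The important thing here is that the probability depends on the (effective) population size through 2N: a higher population size means longer coalescence times and therefore a lower probability of identity by descent (for a region of a given size).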
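
To see how strongly population size drives identity by descent, one can compare the three competing per-generation probabilities for a pair of chromosomes: coalescence 1/(2N), mutation 2μ and recombination 2r. Assuming these events are rare and the population size is constant, the chance that coalescence comes first is approximately (1/(2N)) / (1/(2N) + 2μ + 2r) = 1 / (1 + 4Nμ + 4Nr). The sketch below evaluates that expression. The function name and the numeric values are illustrative assumptions, not figures from the talk.

```python
def prob_identical(n_individuals, mu, r):
    """Approximate probability that two chromosomes are identical by descent.

    Assumes rare events and a constant population size, so that the first
    event back in time is a coalescence with probability
    (1/2N) / (1/2N + 2*mu + 2*r) = 1 / (1 + 4*N*mu + 4*N*r).
    mu and r are per-generation probabilities for the whole region.
    """
    scaled_mutation = 4 * n_individuals * mu
    scaled_recombination = 4 * n_individuals * r
    return 1.0 / (1.0 + scaled_mutation + scaled_recombination)


# Illustrative values only: the same region in a small and a large population.
for n in (1_000, 10_000, 100_000):
    print(n, round(prob_identical(n, mu=1e-5, r=1e-5), 4))
```

A tenfold larger population lowers the probability of identity, matching the reasoning that longer coalescence times leave more opportunity for mutation and recombination.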

Large and small ancestral populations: in large populations we have to go further back in time to find the common ancestor. Consequently there is more opportunity for mutation, which increases genetic diversity, and for recombination, which decreases the correlation between alleles.

Human population history: the recent migration of Europeans' ancestors out of Africa has led to small effective population sizes.

[Figure: 24 haplotypes (12 individuals), 100 SNPs on chromosome 20. Panels: Utah residents, ancestrally Northern and Western European; Yoruba from Ibadan, Nigeria.] We've now explained a good deal of this picture. (Probably a good time to pause.)

Natural selection: when a beneficial mutation arises it spreads quickly through the population, generating strong correlations between alleles.

Natural selection: big differences in the patterns of diversity between populations can be generated by natural selection.

Differences between populations: big differences in the patterns of diversity between populations can be generated by natural selection.

Population genetics: Genetic drift generates correlations between alleles. Recombination breaks them down. The ancestral population size and history determine the amount of diversity and how it is structured. Natural selection can generate strong differences between populations.

Measuring correlations: In genetics, correlation between alleles is called linkage disequilibrium (LD). There are several measures of LD. Understanding LD in natural populations is important for genomic epidemiology.

Linkage equilibrium. [Figure: two loci with alleles A/a and B/b, giving the haplotypes AB, Ab, aB and ab.] The two loci are independent: the expected frequency of the AB haplotype is just the product of the marginal allele frequencies. Haplotype frequencies are determined by the SNP allele frequencies (they are in equilibrium).

Linkage disequilibrium. [Figure: haplotypes AB, Ab, aB and ab.] Haplotype frequencies differ from those expected if the SNPs were independent (they are in disequilibrium).

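Measuring LD: D is the difference between the observed frequency of the AB haplotype and the product of the frequencies of alleles A and B. D ≈ 0 near linkage equilibrium; D ≠ 0 when there is linkage disequilibrium. Two measures are built on D: r² and D′.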

Haplotypes and LD. [Figure: the four possible haplotypes, labelled 1 to 4.] r² is less than one unless SNP A is a perfect surrogate for SNP B in the sample. The D′ statistic is less than one if and only if all four haplotypes are present in the sample, so D′ is 1 unless visible recombination has occurred.
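
The two statistics can be computed directly from the four haplotype counts. The sketch below uses the standard definitions D = f(AB) - f(A)f(B), r² = D² / (f(A)f(a)f(B)f(b)) and |D′| = |D| / D_max. The function name `ld_statistics` is hypothetical, and the absolute value of D′ is returned because sign conventions vary.

```python
def ld_statistics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-SNP haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    variance_product = p_A * (1 - p_A) * p_B * (1 - p_B)
    if variance_product == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")

    d = p_AB - p_A * p_B
    # D_max is the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    d_prime = abs(d) / d_max
    r_squared = d * d / variance_product
    return d, d_prime, r_squared


# Only two haplotypes present: each SNP is a perfect surrogate for the other.
print(ld_statistics(50, 0, 0, 50))
# Three haplotypes present: |D'| is still 1, but r^2 is below 1.
print(ld_statistics(50, 25, 0, 25))
# All four haplotypes at equal frequency: linkage equilibrium.
print(ld_statistics(25, 25, 25, 25))
```

The second call shows the distinction made on the slide: with one of the four haplotypes missing, |D′| stays at 1 even though the SNPs are no longer perfect surrogates for each other.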

Recombination and physical distance: correlations decay with distance (due to recombination).
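
One way to see why correlations decay with distance is the textbook result that, in a large randomly mating population, D between two loci shrinks by a factor (1 - c) each generation, where c is the recombination fraction between them. This model is an assumption brought in for illustration and is not stated in the slides. Loci that are further apart tend to have a larger c, so their LD decays faster. The function name and values below are hypothetical.

```python
def ld_after(d_initial, recombination_fraction, generations):
    """Expected D after a number of generations of random mating.

    Assumes a large population (no drift) and no selection, so that
    D_t = D_0 * (1 - c) ** t, with c the recombination fraction.
    """
    return d_initial * (1.0 - recombination_fraction) ** generations


# Illustrative values: a nearby pair of SNPs (small c) and a distant pair (larger c).
for c in (0.0001, 0.001, 0.01):
    print(c, round(ld_after(0.25, c, generations=500), 4))
```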

Looking at patterns of LD. [Figure: pairwise LD shaded from high r² to low r²; assume similar physical spacing.] LD patterns are complicated.

Recombination clusters along chromosomes: studies have shown that recombination is not uniform along chromosomes.

Recombination hotspots: recombination hotspots occur throughout the genome.

Hotspots and haplotypes: hotspots can break down correlations over short distances.

Hotspots and haplotypes: recombination hotspots lead to regions of strong correlation separated by regions of low LD.

LD and recombination: There are lots of ways to measure LD. Recombination is not uniform along chromosomes. Much of the recombination happens in hotspots, and these mark where correlations break down. Some correlations do persist across hotspots.

Differences between populations: the overall pattern of LD is conserved, but the different ancestral histories lead to different levels of LD.

Population structure in Africa: there is evidence for widespread population structure across Africa.

Population structure in Africa: there are also population differences between groups from the same region.

[Figure: 24 haplotypes (12 individuals), 100 SNPs on chromosome 20. Panels: Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya.]

Differences in patterns of LD. An experiment: (1) Take genome-wide SNP data collected from a European population (A). (2) Take each SNP and find the SNP that is most correlated with it (and remember how correlated it is). (3) Go to another European population (B) and compare the correlation between the same two SNPs in the new population. Correlation is measured as r².
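
The experiment can be sketched in a few functions. The code below is a toy illustration, not the analysis used for the talk: it assumes phased haplotypes coded as 0/1 alleles, with each SNP stored as a list over haplotypes and the same SNPs in the same order in both populations. All names and the tiny data set are hypothetical.

```python
def r_squared(snp_x, snp_y):
    """r^2 between two SNPs, each a list of 0/1 alleles over the same haplotypes."""
    n = len(snp_x)
    p_x = sum(snp_x) / n
    p_y = sum(snp_y) / n
    p_xy = sum(a * b for a, b in zip(snp_x, snp_y)) / n
    variance_product = p_x * (1 - p_x) * p_y * (1 - p_y)
    if variance_product == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    d = p_xy - p_x * p_y
    return d * d / variance_product


def best_partner(snps, index):
    """Find the other SNP most correlated with snps[index]; return (index, r^2)."""
    best_index, best_r2 = None, -1.0
    for other in range(len(snps)):
        if other == index:
            continue
        r2 = r_squared(snps[index], snps[other])
        if r2 > best_r2:
            best_index, best_r2 = other, r2
    return best_index, best_r2


def compare_populations(snps_a, snps_b):
    """For each SNP: its best partner in A, the r^2 in A, and that pair's r^2 in B."""
    results = []
    for index in range(len(snps_a)):
        partner, r2_a = best_partner(snps_a, index)
        r2_b = r_squared(snps_b[index], snps_b[partner])
        results.append((index, partner, r2_a, r2_b))
    return results


# Toy data: three SNPs typed on four haplotypes in each population.
population_a = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
population_b = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 0, 1]]
for row in compare_populations(population_a, population_b):
    print(row)
```

In the toy data the first two SNPs are perfectly correlated in population A but uncorrelated in population B, which is the kind of difference the experiment is designed to reveal.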

Differences in patterns of LD. [Figure panels: across Europe; within Kenya.] We will look at this in the practical.

Summary: Different ancestral histories have led to different patterns of diversity. Natural selection can generate strong differences in haplotype patterns. Population structure across Africa, and between groups in Africa, will lead to differences in the structure of LD.

Genetic drift: allele frequencies change by chance over time.

Genetic diversity. [Figure: 180 haplotypes (90 individuals) from Luhya in Webuye, Kenya, typed at 6856 SNPs in a 10 Mb region on chromosome 20.]