Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Overview Given a DNA sequence, how do we know when natural selection has occurred? Different methods of answering this question. How does having the entire genome available change this?

Natural Selection Introduction

Natural Selection What sort of artifacts would this leave within the genome? Introduction

Natural Selection Introduction The frequency of the long gene increases from one generation to the next. It eventually reaches 100%, or fixation.

Natural Selection Gene Perspective Introduction Same process at the gene level. Let the yellow dot represent the advantageous allele. It begins at a low frequency (0.125 in this case).

Natural Selection Gene Perspective Introduction During selection: the allele has risen in frequency! Because of linkage, the nearby alleles have also risen in frequency.

Natural Selection Gene Perspective Introduction The allele has reached fixation! As time goes on, the nearby genes will slowly begin to reach fixation as well. Diversity has been lost.
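
The rise of an advantageous allele from a low starting frequency toward fixation can be sketched with a standard one-locus recursion. This is a minimal sketch, not something taken from the slides: it assumes a haploid population large enough to ignore chance, and the function names and the selection coefficient `s = 0.1` are illustrative assumptions. Only the starting frequency of 0.125 comes from the slide.

```python
def next_frequency(p, s):
    """One generation of haploid selection; carriers have relative fitness 1 + s."""
    return p * (1 + s) / (1 + p * s)


def trajectory(p0, s, generations):
    """Allele frequency in each generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


# Assumed values: s = 0.1 and 200 generations are arbitrary choices.
freqs = trajectory(p0=0.125, s=0.1, generations=200)
print(f"start: {freqs[0]:.3f}, after 200 generations: {freqs[-1]:.6f}")
```

Under these assumptions the frequency only ever increases and approaches 1, which is the deterministic counterpart of the fixation shown on the slide.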

Natural Selection Gene Perspective Introduction Effect of Selection on the Genome. Next challenge: how does this effect differ from what we would see without selection?

Neutral Theory (N.T.) Problem: Need to distinguish natural selection from change that happens without it. Therefore: Need a null hypothesis. Solution: Create a model that approximates neutral evolution. Introduction Kimura, 1960s

N.T. & Genetic Drift Most variation is neutral with respect to selection. Therefore most changes in frequency are due to genetic drift. Introduction

N.T. & Genetic Drift A neutral gene has an equal probability of increasing or decreasing in frequency in the next generation Introduction
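
Drift of a neutral allele is commonly modeled with the Wright-Fisher model, in which each generation is a random sample of allele copies drawn from the previous one. The sketch below is illustrative rather than the presenter's own: it assumes a diploid population of constant size, and `wright_fisher`, `pop_size`, and `seed` are hypothetical names.

```python
import random


def wright_fisher(p0, pop_size, generations, seed=0):
    """Simulate neutral drift of one allele in a diploid population of fixed size."""
    rng = random.Random(seed)
    n_copies = 2 * pop_size              # diploid: two allele copies per individual
    count = round(p0 * n_copies)
    freqs = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Each copy in the next generation is drawn at random from the current pool.
        count = sum(rng.random() < p for _ in range(n_copies))
        freqs.append(count / n_copies)
    return freqs


freqs = wright_fisher(p0=0.5, pop_size=50, generations=100)
print(freqs[:5])
```

With no selection term, the expected frequency in the next generation equals the current one, so any change is due to sampling alone. Once the frequency hits 0 or 1 it stays there.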

N.T. & Mutation New alleles are introduced at a constant rate (at a particular point). To think about: How will this help us search for selection? Introduction

N.T. & Recombination Recombination occurs at a near-constant rate at a given position. Introduction

Testing the N. T. How would natural selection differ from these assumptions? Introduction

“Positive Natural Selection in the Human Lineage” P. C. Sabeti, S. F. Schaffner, B. Fry, J. Lohmueller, P. Varilly, O. Shamovsky, A. Palma, T. S. Mikkelsen, D. Altshuler, E. S. Lander

Testing for Selection Sabeti et al. Review of the current state of genome-wide selection studies. Five statistical tests, each using divergence from neutral theory to test for selection. Ideas? Functional Alteration, Decreased Diversity, High Derived Alleles, Population Differences, Long Haplotypes

Sabeti et al. I. Functional Alteration Take a section of the genome and compare synonymous vs. non-synonymous mutations between two species. Definition of a synonymous mutation: one that does not change the encoded amino acid.

I. Functional Alteration Sabeti et al. Silent/ Synonymous Non-Synonymous

I. Functional Alteration Sabeti et al. Long time scale, because it is an interspecies metric. Limited value: only finds ongoing or recurrent selection. Use a Ka/Ks statistical test, or McDonald-Kreitman.
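
Both tests named above reduce to simple ratios once the mutations have been counted. The sketch below assumes the counting is already done and applies no correction for multiple substitutions at the same site. The function names and all counts are illustrative assumptions, not data from the paper.

```python
def ka_ks(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of non-synonymous to synonymous substitution rates (uncorrected)."""
    ka = nonsyn_subs / nonsyn_sites   # non-synonymous substitutions per non-synonymous site
    ks = syn_subs / syn_sites         # synonymous substitutions per synonymous site
    return ka / ks


def neutrality_index(pn, ps, dn, ds):
    """McDonald-Kreitman neutrality index.

    pn, ps: non-synonymous and synonymous polymorphisms within a species
    dn, ds: non-synonymous and synonymous fixed differences between species
    """
    return (pn / ps) / (dn / ds)


# Made-up counts, for illustration only.
print(ka_ks(nonsyn_subs=10, nonsyn_sites=300, syn_subs=5, syn_sites=100))
print(neutrality_index(pn=2, ps=42, dn=7, ds=17))
```

A Ka/Ks ratio above 1, or a neutrality index below 1, points to an excess of non-synonymous differences between species, which is the pattern expected under positive selection. A real analysis would add a significance test on the 2x2 table.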

II. Decreased Diversity Sabeti et al. A way of detecting a selective sweep. Requires that you know the ancestral gene and the derived genes. A derived gene is one that is a descendant of the ancestral one; the ancestral state can be inferred by comparison to other species.

II. Decreased Diversity Sabeti et al. The two small bars represent mutations. They are derived genes of the blue ancestor gene.

II. Decreased Diversity Sabeti et al. After the selective sweep the frequency of the derived alleles has jumped vis-a-vis the ancestral gene

II. Decreased Diversity Sabeti et al. A real example: derived alleles in red

II. Decreased Diversity Sabeti et al. Key idea: need to have ancestral genes present. The genes must not have reached fixation! The pattern will be that of normal diversity of alleles but with a skewed distribution of variation. Statistical tests: Tajima’s D, Fu and Li’s D*
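
Tajima's D compares two estimates of diversity: the average number of pairwise differences and the number of segregating sites scaled by sample size. A minimal sketch using the standard formula follows. It assumes aligned, equal-length sequences with no missing data, and the function name and toy sequences are illustrative.

```python
from collections import Counter
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    n_pairs = n * (n - 1) / 2
    seg_sites = 0   # S: number of variable sites
    pi = 0.0        # average number of pairwise differences

    for column in zip(*sequences):
        counts = Counter(column).values()
        if len(counts) > 1:
            seg_sites += 1
            same_pairs = sum(c * (c - 1) / 2 for c in counts)
            pi += (n_pairs - same_pairs) / n_pairs

    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


rare = ["TAAA", "ATAA", "AATA", "AAAT"]      # every variant seen once
common = ["TTTT", "TTTT", "AAAA", "AAAA"]    # every variant at 50%
print(tajimas_d(rare), tajimas_d(common))
```

An excess of rare variants pushes D below zero, while variants at intermediate frequency push it above zero.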

III. New Alleles (AKA High Frequency of Derived Alleles) Another technique for detecting a selective sweep. Gene ‘hitch-hiking’. Limited diversity because of fixation. Key idea: low frequency of new genes, but high diversity of rare alleles. Sabeti et al.

III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. Gene has reached fixation Low diversity in this region compared to other regions

III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. Next, mutations slowly increase the diversity. Because they are all new, their frequency remains low.

III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. As more time passes, any pre-selective-sweep alleles die out, and diversity is replaced by many derived alleles.

III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. Real world example: Red dots indicate rare alleles

III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. Key idea: the genes will have reached fixation, with decreased diversity. The diversity will all be in the form of rare alleles (because they are new). Statistical test: Fay and Wu’s H
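
Fay and Wu's H is the difference between two diversity estimates, one of which weights each site by the square of its derived-allele count. The sketch below uses the unnormalized form and assumes the derived allele at each segregating site has already been identified, for example by comparison with an outgroup species. The function name and inputs are illustrative.

```python
def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H.

    derived_counts: for each segregating site, how many of the n sampled
                    chromosomes carry the derived allele (1 .. n-1)
    """
    if any(not 0 < i < n for i in derived_counts):
        raise ValueError("each count must be between 1 and n - 1")
    norm = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / norm
    theta_h = sum(2 * i * i for i in derived_counts) / norm
    return theta_pi - theta_h


# Toy inputs for a sample of 10 chromosomes.
print(fay_wu_h([9, 9, 8], n=10))   # derived alleles near fixation
print(fay_wu_h([1, 1, 2], n=10))   # derived alleles all rare
```

H turns negative when derived alleles sit at high frequency, and stays positive when they are all rare.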

Comparing Methods What is the difference between decreased diversity and increased frequency of new alleles? Sabeti et al.

IV. Population Differences Requires a population split. Disproportionate shift in gene frequencies. Limited utility. Sabeti et al.

IV. Population Differences Sabeti et al. Tall Tree Island

IV. Population Differences Sabeti et al. Two separated populations: a specific gene will show a disproportionate shift in frequency with respect to the other genes. Limited to cases where there are two populations. Statistical tests: F_ST, p_excess
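
F_ST measures how much of the total variation at a locus is explained by differences between populations. One common definition compares the average heterozygosity within populations to the heterozygosity of the pooled population. The sketch below assumes a single biallelic locus and two equally weighted populations; the function name is illustrative.

```python
def fst_two_populations(p1, p2):
    """F_ST at one biallelic locus from the allele frequency in each population."""
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population heterozygosity
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)                      # heterozygosity if pooled
    if h_total == 0:
        return 0.0   # same allele fixed in both populations: no differentiation
    return (h_total - h_within) / h_total


print(fst_two_populations(0.5, 0.5))   # identical frequencies
print(fst_two_populations(0.2, 0.8))   # strongly shifted frequencies
print(fst_two_populations(1.0, 0.0))   # different alleles fixed
```

A locus under selection in only one population would show an F_ST well above the values seen at most other loci.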

V. Long Haplotypes Based on linkage disequilibrium (LD). Signature: a long haplotype block at high frequency. Sabeti et al.

V. Long Haplotypes Under neutral conditions, a new allele has low frequency and high linkage disequilibrium Sabeti et al.

V. Long Haplotypes As time goes on and the neutral allele increases in frequency, recombination erodes the LD. Sabeti et al.
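
The erosion of LD by recombination follows a textbook relation: with recombination fraction r between two loci, the disequilibrium coefficient D shrinks by a factor of (1 - r) each generation. A minimal sketch, assuming random mating and no selection or drift; the function names are illustrative.

```python
from math import log


def ld_after(d0, r, generations):
    """Expected disequilibrium D after a number of generations of recombination."""
    return d0 * (1 - r) ** generations


def generations_to_halve(r):
    """Number of generations for D to drop to half its starting value."""
    return log(0.5) / log(1 - r)


print(ld_after(d0=0.25, r=0.01, generations=100))
print(generations_to_halve(0.01))
```

An allele that reaches high frequency quickly, as in a sweep, has had fewer generations for this decay to act, which is why it can still sit on a long haplotype.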

Genome-Wide Scanning Better estimation of the background rate. Helps to confirm previous studies. Suggests future areas of research. MORE POWER. Sabeti et al.

Genome-Wide Scanning SNP: Single Nucleotide Polymorphism, a single-base variant (excludes other types of mutations) that occurs at > 1% frequency. SNPs are the basis of many genome-wide analyses. Sabeti et al.

“Forces Shaping the Fastest Evolving Regions in the Human Genome” K. S. Pollard, S. R. Salama, B. King, A. D. Kern, T. Dreszer, S. Katzman, A. Siepel, J. S. Pedersen, G. Bejerano, R. Baertsch, K. R. Rosenbloom, J. Kent, D. Haussler

Background Exploits the very recent sequencing of the chimp and human genomes. Uses the rate of allele replacement as a test for selection. The assumption is that rapidly changing parts of the genome have been under selective pressure. Pollard et al.

Idea Take the chimp and mouse genomes and find common regions. Compare these regions to the human genome. Pollard et al.

Method Part I First half: Find conserved regions. Use sequence tests to look for regions of 100 bp with 96% similarity. Pollard et al.
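
The search for conserved regions can be pictured as a sliding window over two aligned sequences. The sketch below is only an illustration of that idea and not the pipeline used in the paper: it assumes the two sequences are already aligned without gaps, uses the slide's 100 bp and 96% thresholds as defaults, and `conserved_windows` is a hypothetical name.

```python
def conserved_windows(seq_a, seq_b, window=100, min_identity=0.96):
    """Start positions of windows where the two sequences are at least min_identity identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = [a == b for a, b in zip(seq_a, seq_b)]
    hits = []
    running = sum(matches[:window])          # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide by one: add the base entering the window, drop the one leaving it.
            running += matches[start + window - 1] - matches[start - 1]
        if running >= min_identity * window:
            hits.append(start)
    return hits


# Toy input: 120 aligned bases with 5 mismatches at the very start.
hits = conserved_windows("A" * 120, "C" * 5 + "A" * 115)
print(hits[0], hits[-1])
```

Overlapping hits would normally be merged into longer conserved regions before the comparison with the human genome.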

Results Part I

Conclusion: These areas represent genes with deep functionality

Method Part II Pollard et al. Search human genome for conserved regions

Method Part II Pollard et al. For every region that doesn’t match up, label it a Human Accelerated Region (HAR)

Formal Description Pollard et al.

Results Part II Found 202 Human Accelerated Regions in total. These were regions where there had been rapid evolution in the past 5 million years. But evolution doesn’t mean selection. Pollard et al.

Possible Explanations (1) Relaxation of negative selection: ruled out because the rate of neutral evolution is slower for 201/202 HARs; (2) natural selection; (3) a sudden change in mutation rate. Pollard et al.

But was it Selection? Pollard et al.

A Digression Biased Gene Conversion (BGC): Tendency to replace mismatched nucleotides with G or C. In all but two of the HARs there was no evidence of a selective sweep but significant evidence of GC-favored replacement. Pollard et al.
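
One simple way to look for GC-favored replacement is to count substitutions from weak bases (A, T) to strong bases (G, C) and compare them with substitutions in the opposite direction. The sketch below assumes an aligned ancestral and derived sequence of equal length; it is an illustration of the idea, not the test used in the paper, and the function name is hypothetical.

```python
WEAK = set("AT")     # bases that pair with two hydrogen bonds
STRONG = set("GC")   # bases that pair with three hydrogen bonds


def substitution_bias(ancestral, derived):
    """Count weak-to-strong and strong-to-weak substitutions between aligned sequences."""
    weak_to_strong = strong_to_weak = 0
    for a, d in zip(ancestral, derived):
        if a in WEAK and d in STRONG:
            weak_to_strong += 1
        elif a in STRONG and d in WEAK:
            strong_to_weak += 1
    return weak_to_strong, strong_to_weak


print(substitution_bias("AATTGC", "GCTTGA"))
```

A large excess of weak-to-strong changes in a region is the pattern biased gene conversion would produce even without any selective advantage.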

A Digression A new paper suggests that BGC hotspots change between species. Conserved areas may suddenly become a BGC hotspot, explaining the HARs’ high BGC rates. “Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution”, Galtier & Duret 2007 Pollard et al.

General Implications Illustrates the utility of genome-wide approaches: by using the full genome to establish a background rate, signals stand out from the noise. Weakness: the approach did not take into account failure to meet an assumption of neutral theory (constant mutation rate). Pollard et al.

“Global Landscape of Recent Inferred Darwinian Selection for Homo sapiens” E. Wang, G. Kodama, P. Baldi, and R. K. Moyzis

Background Ever-growing catalog of SNPs for human populations. SNP data can be used to construct haplotype maps. Can screen the whole genome for haplotype outliers. Wang et al.

Idea Take only homozygotes. Bin the alleles together. Calculate the LD for each allele. Wang et al.
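
LD between two sites can be computed directly once haplotypes are known, and homozygous individuals give haplotypes without any ambiguity about phase. The sketch below shows the standard D and r² statistics for two biallelic sites. It is a generic illustration rather than the statistic defined in the paper, and the 0/1 allele coding and function name are assumptions.

```python
def linkage_disequilibrium(haplotypes):
    """D and r^2 for two biallelic sites.

    haplotypes: list of (allele_at_site_1, allele_at_site_2) pairs,
                with alleles coded 0 or 1
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at site 2
    p_ab = sum(a * b for a, b in haplotypes) / n     # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d, d * d / denom


print(linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)]))   # complete LD
print(linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)]))   # no LD
```

Repeating this for sites at increasing distance from a focal allele gives the decay curve that a haplotype scan works with.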

Formalized Description Wang et al.

Description of the Formalized Description Wang et al. Expected decay of LD for an allele of a specific frequency

Description of the Formalized Description Wang et al. Selective sweep will be more resistant to decay

Description of the Formalized Description Wang et al. Normalize with respect to the sigmoidal curve

Advantages of Method By using the whole genome, the method can track not only LD but also the exponential decay of LD over distance. This helps to distinguish selective sweeps from demographic shifts such as bottlenecks. Wang et al.

Results Wang et al. “Darwin’s Fingerprint”: Using different datasets from different populations, certain areas show consistent evidence of selection

Discussion Wang et al. Compare regions to known gene functions. Six groups predominate. Test was well designed. Limited detection: genes can’t be at fixation.

Overall Conclusions It all comes down to statistics. What are the null assumptions? What are the alternative assumptions? Genome-wide scans improve on earlier work by allowing us to exploit this elegant statistical method in new ways: improved data for the null hypothesis, and an increased volume of potential candidates. Wang et al.

Thank You!