CSE 291: Advanced Topics in Computational Biology Vineet Bafna/Pavel Pevzner www.cse.ucsd.edu/classes/sp05/cse291.

Slides:



Advertisements
Similar presentations
The Evolution Of Populations
Advertisements

Single Nucleotide Polymorphism And Association Studies Stat 115 Dec 12, 2006.
Alleles = A, a Genotypes = AA, Aa, aa
Sampling distributions of alleles under models of neutral evolution.
Discovery of a rare arboreal forest-dwelling flying reptile (Pterosauria, Pterodactyloidea) from China Wang et al. PNAS Feb. 11, 2008.
THE EVOLUTION OF POPULATIONS
CS177 Lecture 9 SNPs and Human Genetic Variation Tom Madej
Sickle Cell Anemia.
14 Molecular Evolution and Population Genetics
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Session VII Basic Biological Principles Previous sessions covered fundamental concepts of biology, genetics and molecular biology of genes in individual.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Population Genetics: Populations change in genetic characteristics over time Ways to measure change: Allele frequency change (B and b) Genotype frequency.
CSE182-L18 Population Genetics. Perfect Phylogeny Assume an evolutionary model in which no recombination takes place, only mutation. The evolutionary.
Human Migrations Saeed Hassanpour Spring Introduction Population Genetics Co-evolution of genes with language and cultural. Human evolution: genetics,
CSE182-L17 Clustering Population Genetics: Basics.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Chapter 23 Population Genetics © John Wiley & Sons, Inc.
Population Genetics 101 CSE280Vineet Bafna. Personalized genomics April’08Bafna.
Population Genetics Learning Objectives
Broad-Sense Heritability Index
Linear Reduction for Haplotype Inference Alex Zelikovsky joint work with Jingwu He WABI 2004.
14 Population Genetics and Evolution. Population Genetics Population genetics involves the application of genetic principles to entire populations of.
Chapter 23 Notes The Evolution of Populations. Concept 23.1 Darwin and Mendel were contemporaries of the 19 th century - at the time both were unappreciated.
PowerPoint Slides for Chapter 16: Variation and Population Genetics Section 16.2: How can population genetic information be used to predict evolution?
E QUILIBRIA IN POPULATIONS CSE280Vineet Bafna Population data Recall that we often study a population in the form of a SNP matrix – Rows.
Genetic Linkage. Two pops may have the same allele frequencies but different chromosome frequencies.
CS177 Lecture 10 SNPs and Human Genetic Variation
CSE280Vineet Bafna CSE280a: Algorithmic topics in bioinformatics Vineet Bafna.
E QUILIBRIA IN POPULATIONS CSE280Vineet Bafna Population data Recall that we often study a population in the form of a SNP matrix – Rows.
National Taiwan University Department of Computer Science and Information Engineering Pattern Identification in a Haplotype Block * Kun-Mao Chao Department.
Copyright © 2008 Pearson Education Inc., publishing as Pearson Benjamin Cummings Chapter 23 The Evolution of Populations.
Great Dane x Mexican Chihuahua F1 Big (Great Danes) 3 Big : 1 Small.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Discovery of a rare arboreal forest-dwelling flying reptile (Pterosauria, Pterodactyloidea) from China Wang et al. PNAS Feb. 11, 2008.
Population and Evolutionary Genetics
Linear Reduction Method for Tag SNPs Selection Jingwu He Alex Zelikovsky.
Lecture 21: Quantitative Traits I Date: 11/05/02  Review: covariance, regression, etc  Introduction to quantitative genetics.
February 20, 2002 UD, Newark, DE SNPs, Haplotypes, Alleles.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Association mapping for mendelian, and complex disorders January 16Bafna, BfB.
Populations: defining and identifying. Two major paradigms for defining populations Ecological paradigm A group of individuals of the same species that.
Vineet Bafna CSE280A CSE280Vineet Bafna. We will cover topics from Population Genetics. The focus will be on the use of algorithms for analyzing genetic.
By Mireya Diaz Department of Epidemiology and Biostatistics for EECS 458.
Linkage Disequilibrium and Recent Studies of Haplotypes and SNPs
1.Stream A and Stream B are located on two isolated islands with similar characteristics. How do these two stream beds differ? 2.Suppose a fish that varies.
Objective: Chapter 23. Population geneticists measure polymorphisms in a population by determining the amount of heterozygosity at the gene and molecular.
CSE280Vineet Bafna In a ‘stable’ population, the distribution of alleles obeys certain laws – Not really, and the deviations are interesting HW Equilibrium.
Coalescent theory CSE280Vineet Bafna Expectation, and deviance Statements such as the ones below can be made only if we have an underlying model that.
Fixed Parameters: Population Structure, Mutation, Selection, Recombination,... Reproductive Structure Genealogies of non-sequenced data Genealogies of.
Evolution of Populations. Individual organisms do not evolve. This is a misconception. While natural selection acts on individuals, evolution is only.
Evolution of Populations
8 and 11 April, 2005 Chapter 17 Population Genetics Genes in natural populations.
1,3, ,
Population Genetics Measuring Evolutionary Change Over Time.
Lecture 3 - Concepts of Marine Ecology and Evolution II 3) Detecting evolution: HW Equilibrium Principle -Calculating allele frequencies, predicting genotypes.
Equilibria in populations
Common variation, GWAS & PLINK
MULTIPLE GENES AND QUANTITATIVE TRAITS
Evolution and Populations –Essential Questions p
Population Genetics.
L4: Counting Recombination events
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
MULTIPLE GENES AND QUANTITATIVE TRAITS
Basic concepts on population genetics
The Evolution of Populations
Vineet Bafna/Pavel Pevzner
Outline Cancer Progression Models
The Evolution of Populations
Presentation transcript:

CSE 291: Advanced Topics in Computational Biology Vineet Bafna/Pavel Pevzner

Topics  Population Genetics  Genome Duplication Problem  Molecular Evolution  Student Presentations  Critical overview of a field  Research Projects

Population Genetics  Individuals in a species (population) are phenotypically different.  Often these differences are inherited (genetic).  Studying these differences is important!  Q:How predictive are these differences?

Population Structure  377 locations (loci) were sampled in 1000 people from 52 populations.  6 genetic clusters were obtained, which corresponded to 5 geographic regions (Rosenberg et al. Science 2003)  Genetic differences can predict ethnicity. Africa EurasiaEast Asia America Oceania

Scope of these lectures  Basic terminology  Key principles  HW equilibrium  Sources of variation  Linkage  Coalescent theory  Recombination/Ancestral Recombination Graph  Haplotypes/Haplotype phasing  Population sub-structure  Medical genetics basis: Association mapping/pedigree analysis

Alleles  Genotype: genetic makeup of an individual  Allele: A specific variant at a location  The notion of alleles predates the concept of gene, and DNA.  Initially, alleles referred to variants that described a measurable phenotype (round/wrinkled seed)  Now, an allele might be a nucleotide on a chromosome, with no measurable phenotype.  Humans are diploid, they have 2 copies of each chromosome.  They may have heterozygosity/homozygosity at a location  Other organisms (plants) have higher forms of ploidy.  Additionally, some sites might have 2 allelic forms, or even many allelic forms.

Hardy Weinberg equilibrium  Consider a locus with 2 alleles, A, a  p (respectively, q) is the frequency of A (resp. a) in the population  3 Genotypes: AA, Aa, aa  Q: What is the frequency of each genotype If various assumptions are satisfied, (such as random mating, no natural selection), Then P AA =p 2 P Aa =2pq P aa =q 2

Hardy Weinberg: why?  Assumptions:  Diploid  Sexual reproduction  Random mating  Bi-allelic sites  Large population size, …  Why? Each individual randomly picks his two chromosomes. Therefore, Prob. (Aa) = pq+qp = 2pq, and so on.

Hardy Weinberg: Generalizations  Multiple alleles with frequencies  By HW,  Multiple loci?

Hardy Weinberg: Implications  The allele frequency does not change from generation to generation. Why?  It is observed that 1 in 10,000 caucasians have the disease phenylketonuria. The disease mutation(s) are all recessive. What fraction of the population carries the disease?  Males are 100 times more likely to have the “red’ type of color blindness than females. Why?  Conclusion: While the HW assumptions are rarely satisfied, the principle is still important as a baseline assumption, and significant deviations are interesting.

What causes variation in a population?  Mutations (may lead to SNPs)  Recombinations  Other genetic events (gene conversion)

Single Nucleotide Polymorphisms Infinite Sites Assumption: Each site mutates at most once

Short Tandem Repeats GCTAGATCATCATCATCATTGCTAG GCTAGATCATCATCATTGCTAGTTA GCTAGATCATCATCATCATCATTGC GCTAGATCATCATCATTGCTAGTTA GCTAGATCATCATCATCATCATTGC

STR can be used as a DNA fingerprint  Consider a collection of regions with variable length repeats.  Variable length repeats will lead to variable length DNA  Vector of lengths is a finger-print loci individuals

Recombination

What if there were no recombinations?  Life would be simpler  Each sequence would have a single parent  The relationship is expressed as a tree.

The Infinite Sites Assumption The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa. Some phenotypes could be linked to the polymorphisms Some of the linkage is “destroyed” by recombination

Infinite sites assumption and Perfect Phylogeny  Each site is mutated at most once in the history.  All descendants must carry the mutated value, and all others must carry the ancestral value i 1 in position i 0 in position i

Perfect Phylogeny  Assume an evolutionary model in which no recombination takes place, only mutation.  The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. All the species in one sub-tree contain a 0, and all species in the other contain a 1. Such a tree is called a perfect phylogeny.  How can one reconstruct such a tree?

The 4-gamete condition  A column i partitions the set of species into two sets i 0, and i 1  A column is homogeneous w.r.t a set of species, if it has the same value for all species. Otherwise, it is heterogenous.  EX: i is heterogenous w.r.t {A,D,E} i A 0 B 0 C 0 D 1 E 1 F 1 i0i0 i1i1

4 Gamete Condition  4 Gamete Condition  There exists a perfect phylogeny if and only if for all pair of columns (i,j), either j is not heterogenous w.r.t i 0, or i 1.  Equivalent to  There exists a perfect phylogeny if and only if for all pairs of columns (i,j), the following 4 rows do not exist (0,0), (0,1), (1,0), (1,1)

4-gamete condition: proof  Depending on which edge the mutation j occurs, either i 0, or i 1 should be homogenous.  (only if) Every perfect phylogeny satisfies the 4- gamete condition  (if) If the 4-gamete condition is satisfied, does a prefect phylogeny exist? i0i0 i1i1 i

Handling recombination  A tree is not sufficient as a sequence may have 2 parents  Recombination leads to loss of correlation between columns

Linkage (Dis)-equilibrium (LD)  Consider sites A &B  Case 1: No recombination  Pr[A,B=0,1] = 0.25  Linkage disequilibrium  Case 2: Extensive recombination  Pr[A,B=(0,1)=0.125  Linkage equilibrium AB AB

Measures of LD  Consider two bi-allelic sites with alleles marked with 0 and 1  Define  P 00 = Pr[Allele 0 in locus 1, and 0 in locus 2]  P 0* = Pr[Allele 0 in locus 1]  Linkage equilibrium if P 00 = P 0* P *0  D = abs(P 00 - P 0* P *0 ) = abs(P 01 - P 0* P *1 ) = …

LD over time  With random mating, and fixed recombination rate r between the sites, Linkage Disequilibrium will disappear  Let D (t) = LD at time t  P (t) 00 = (1-r) P (t-1) 00 + r P (t-1) 0* P (t-1) *0  D (t) = P (t) 00 - P (t) 0* P (t) *0 = P (t) 00 - P (t-1) 0* P (t-1) *0 (HW)  D (t) =(1-r) D (t-1) =(1-r) t D (0)

LD over distance  Assumption  Recombination rate increases linearly with distance  LD decays exponentially with distance.  The assumption is reasonable, but recombination rates vary from region to region, adding to complexity  This simple fact is the basis of disease association mapping.

LD and disease mapping  Consider a mutation that is causal for a disease.  The goal of disease gene mapping is to discover which gene (locus) carries the mutation.  Consider every polymorphism, and check:  There might be too many polymorphisms  Multiple mutations (even at a single locus) that lead to the same disease  Instead, consider a dense sample of polymorphisms that span the genome

LD can be used to map disease genes  LD decays with distance from the disease allele.  By plotting LD, one can short list the region containing the disease gene DNNDDNDNNDDN LD

LD and disease gene mapping problems  Marker density?  Complex diseases  Population sub-structure