Genomics in Tree Breeding and Forest Ecosystem Management ----- Module 3: Population Genetics. Our third module is about genes in populations. Some would argue that the father of population genetics was Wilhelm Weinberg, a German physician who was equally comfortable with mathematics. You might recognize the name from the “Hardy-Weinberg Principle”. Mendel actually made the first statement of a population’s genetic constitution under a system of mating (genotypic proportions following inbreeding). The Hardy-Weinberg “law” applies the same principle to cross-breeding populations that mate at random. We will discuss it at some length in this module. However, the real beginnings of the discipline might well be attributed to some early plant and animal breeders. In the 1700s, people like Fairchild (England), Mather (US), and Kölreuter (Germany) published on plant breeding (artificial hybridization, backcrosses, and F2s), and Bakewell (England) laid the groundwork for many modern breeds of livestock, revealing the value of inbreeding to fix traits. Darwin (1859) and Mendel (1865) were influenced by these pioneer breeders. Darwin reflected on the effectiveness of artificial selection and extended the concept to natural selection. Mendel concentrated on the nature of hereditary determination. From these foundations two schools of study formed: the quantitative or biometric school, led by Galton and Pearson, and the Mendelian school (Bateson and Castle), which supported Mendel's inbreeding/crossbreeding principles. The differences between the groups were largely over the size of effects that were evolutionarily important. The Darwinians felt that small changes over great lengths of time were the essence of evolution. The Mendelians felt that simple changes (allelic forms) could result in huge and immediate differences. Population genetics developed the means by which evolutionary forces could be modeled and described.
Nicholas Wheeler & David Harry – Oregon State University
Population genetics. Population genetics is the study of genetic differences within and among populations of individuals, and how these differences change across generations. In the classic view, it is the study of the amount and distribution of genetic variation in populations and species, and how it got that way. Population genetics describes the mechanics of how evolution takes place. In the narrow sense, population genetics is a simple extension of Mendelism. It is the study of simply inherited traits, and how allele and genotype frequencies vary. As the discipline has evolved, it has taken on a broader perspective. We cannot talk about complex traits in natural or artificial populations without introducing quantitative genetic concepts. For our treatment, we will try to keep them separate. The development of population genetic theory rested largely on the shoulders of three giants in the field: R. A. Fisher, Sewall Wright, and J.B.S. Haldane. Ronald Fisher published the first of 28 papers on population genetics in 1918, with a paper that introduced the concept of the ANOVA. It was titled: The Correlation Between Relatives on the Supposition of Mendelian Inheritance. Sewall Wright first published on guinea pigs in 1908, but produced generalized theories of inbreeding and crossbreeding, path coefficients and much more into the 60’s. In 1924 John Haldane worked out the theory of selection by considering a single gene subject to selection and mutation. These pioneers were closely followed by a cadre of brilliant scientists who drove the modern synthesis of genetics, population genetics and evolutionary theory, including the likes of Julian Huxley, Ernst Mayr, G. Ledyard Stebbins, George Gaylord Simpson, and perhaps most influential, Theodosius Dobzhansky. However, it wasn’t until the 1970’s, with the discovery of plentiful genetic markers, that the science of population genetics moved from being largely theoretical to emphatically empirical, a trend that continues today.
Photo Credit: http://www.eco-pros.com/biodiversity-genetic.htm
Why study genes in populations? In natural populations: Adaptation, the ability to survive and exploit an environmental niche, involves the response of populations, not individuals. In breeding populations: Genetic gain, improving the average performance of populations for desired breeding objectives, depends on selecting and breeding parents with the best genetic potential. The question that drove naturalists such as Charles Darwin, Alfred Russel Wallace, and Henry Bates in the mid-1800s was “How do species come into existence?” Population genetic theory provided the answer. The concept of divergence describes the accumulated genetic differences between populations that lead to isolation and speciation. For plant breeders, the mission is to increase the frequencies of favorable alleles at relevant loci in select populations.
Population genetics addresses many topics How genetically diverse is a species or population? Contrast diversity in populations that differ in life-history traits, pop size, breeding structure, etc Are different populations closely related to one another? Monitor diversity for conservation purposes What is the potential for inbreeding depression? What is the minimum viable population size from a genetic standpoint? How is genetic variation maintained? Which genes/alleles are responsible for phenotypic variation? How are species related (phylogenetics) and how did they acquire their current distribution (biogeography)? Population genetics provides empirical models to predict genetic behavior of organisms. For instance, with known allele frequencies we can predict what genotypes are present in a population and at what frequencies. We can ask questions such as: Are all genotypes equally likely to survive and reproduce? Are mating frequencies independent of genotype? Is the population stratified in some way, (e.g. by proximity, size, or the timing of natural events?) To what extent does mating occur with individuals outside the immediate area? 4
What do population geneticists typically measure? Populations are groups of individuals whose relatedness and population structure are usually unknown. For single genes we are typically interested in the number and frequency of alleles that exist in a population, and the related genotypic frequencies of individuals. From the latter, we can calculate observed heterozygosity, or the proportion of individuals in a population that have 2 different alleles at a given locus. Loci that have more than one allele are considered polymorphic. Populations may be described by the proportion of loci that are polymorphic. Of course, this measure is a bit of a moving target and depends on many things like 1) how you define a locus, 2) which loci you measure, 3) size of population sampled, 4) definition of what constitutes a polymorphic locus (recall earlier discussion about frequency of most common allele) and so on. Image Credit: Glenn Howe, Oregon State University
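The measures named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample of ten trees is made up, the function names are hypothetical, and the 0.95 cutoff for calling a locus polymorphic is one common convention rather than a value taken from this module.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from a list of diploid genotypes (allele pairs)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # two gene copies per diploid individual
    return {allele: n / total for allele, n in counts.items()}

def observed_heterozygosity(genotypes):
    """Proportion of individuals that carry two different alleles at the locus."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)

def is_polymorphic(freqs, max_common=0.95):
    """Assumed convention: polymorphic if the most common allele is at or below max_common."""
    return max(freqs.values()) <= max_common

# Hypothetical sample: 10 trees scored at a single two-allele locus.
sample = [("A1", "A1")] * 4 + [("A1", "A2")] * 4 + [("A2", "A2")] * 2
freqs = allele_frequencies(sample)
print("allele frequencies:", freqs)
print("observed heterozygosity:", observed_heterozygosity(sample))
print("polymorphic:", is_polymorphic(freqs))
```

Changing the cutoff or the sample changes which loci count as polymorphic, which is the "moving target" the notes describe.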
The Hardy-Weinberg Principle. The frequencies of alleles and genotypes in a population will remain constant over time (given certain assumptions which describe a static, or non-evolving, population). The frequencies of alleles and genotypes can be described mathematically as $(p + q)^2 = p^2 + 2pq + q^2 = 1$, where p and q are the frequencies of the alleles A1 and A2, and $p^2$, $2pq$, and $q^2$ are the frequencies of the genotypes A1A1, A1A2, and A2A2. The HW principle is an elegant little equation that tells us a great deal. For instance, in a two-allele system, given the frequency of one of those alleles we can predict the frequencies of all three genotypes (the two homozygotes and the heterozygote) in a population. We can use allele frequencies to provide a quantitative measure of variation among populations. We can calculate an expected heterozygosity (He) for a locus. The model is easily extended to 3 or more alleles by simply adding terms like those above for each allele and genotypic class. Expected heterozygosity in such a condition is simply the sum of frequencies for all heterozygous classes.
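As a rough sketch of the extension to three or more alleles, the following Python computes the expected HW genotype frequencies and expected heterozygosity for any set of allele frequencies. The function names and the example frequencies are illustrative assumptions, not part of the course material.

```python
from itertools import combinations_with_replacement

def hw_genotype_frequencies(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg for any number of alleles."""
    expected = {}
    pairs = combinations_with_replacement(allele_freqs.items(), 2)
    for (a, pa), (b, pb) in pairs:
        # Homozygotes occur at p_i^2, heterozygotes at 2 * p_i * p_j.
        expected[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return expected

def expected_heterozygosity(allele_freqs):
    """He = 1 - sum(p_i^2), which equals the sum of all heterozygote classes."""
    return 1.0 - sum(p * p for p in allele_freqs.values())

# Hypothetical three-allele locus.
freqs = {"A1": 0.5, "A2": 0.3, "A3": 0.2}
for genotype, f in hw_genotype_frequencies(freqs).items():
    print(genotype, round(f, 4))
print("He:", round(expected_heterozygosity(freqs), 4))
```

Computing He as one minus the sum of squared allele frequencies avoids listing every heterozygous class, which becomes tedious as the number of alleles grows.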
Random mating restores HW proportions each generation. You should convince yourself of the HW model by working through some of the algebra. It may help to remember that in a two-allele system (A1 and A2), the frequencies of the two alleles may be described as p and q, or p and 1-p. So the HW equilibrium model describes how the frequencies of alleles and genotypes will behave in populations, from generation to generation, in the absence of several evolutionary forces. Why is it important to understand gene/allele frequencies? As Hartl notes in his “Primer of Population Genetics” (2000), a very practical application of the HWP is that of DNA typing for forensic purposes. How can forensic scientists be so sure they have identified someone unambiguously based on DNA evidence? Let’s just do a little experiment. Take a piece of paper and follow along. (Pause). Human forensic scientists use genetic markers that have many alleles, often 20 or more, and they use a panel of 13 loci. Let’s try something less overwhelming. Let’s assume we have 5 loci, each with 5 alleles, all of which occur at the same frequency of 0.20. At any one locus, for any given allele, the probability of being a homozygote is 0.2 squared, or 0.04 (4 percent). The probability of any given heterozygote at any given locus is 2pq, or 2*0.2*0.2 = 0.08 (8 percent). You should convince yourself that there are 10 heterozygous genotypes and 5 homozygous genotypes at any given locus, whose frequencies should add up to 100 percent. Now, ask what the probability is that two individuals would have the same genotype at 5 different loci. If we assume both individuals are heterozygous at all 5 loci, then the answer should be 8% to the fifth power (that is, 8% multiplied by itself 5 times). The result is about 3.28 times 10 to the minus 6, so the two individuals would share the same 5-locus genotype about 3.3 times in a million. Rather unlikely. The probability declines even more if you throw in a homozygote or two.
With 13 loci, each with 20 or more alleles, the probabilities become vanishingly small. Image Credit: White et al. 2007, Forest Genetics Fig. 5.1 7
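If you would rather check the paper exercise by machine, here is a minimal Python sketch of the same arithmetic. It assumes, as the exercise does, that all alleles at a locus are equally frequent and that loci are independent. The 13-locus, 20-allele case at the end is a hypothetical simplification of a real forensic panel, where allele frequencies are unequal.

```python
from itertools import combinations

def genotype_frequencies(n_alleles, freq):
    """HW frequencies of every homozygote and heterozygote when all alleles share one frequency."""
    homozygotes = [freq ** 2] * n_alleles
    heterozygotes = [2 * freq * freq for _ in combinations(range(n_alleles), 2)]
    return homozygotes, heterozygotes

homo, het = genotype_frequencies(n_alleles=5, freq=0.20)
print("homozygous classes:", len(homo), "heterozygous classes:", len(het))
print("total frequency:", sum(homo) + sum(het))

# Chance that a second individual matches one specific all-heterozygous 5-locus profile.
match_5_loci = het[0] ** 5
print("5-locus match probability:", match_5_loci)

# Hypothetical panel: 13 loci, 20 equally frequent alleles each.
_, het20 = genotype_frequencies(n_alleles=20, freq=0.05)
match_13_loci = het20[0] ** 13
print("13-locus match probability:", match_13_loci)
```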
HW equilibrium conditions. For Hardy-Weinberg equilibrium to exist, a number of assumptions must be met. For instance, the population under consideration must: be random mating (translation = all possible pairings of mates are equally likely); be infinitely large (translation = sampling with replacement); have no selection (which biases genotype frequencies); have no migration (since all alleles must be sampled from the same pool); and have no mutation (which introduces new variants). Obviously, such “ideal” populations rarely (if ever) exist. Still, minor violations of assumptions generally have little impact. The conditions necessary to meet HW equilibrium are a reflection of those evolutionary forces that change allele frequencies: selection, migration, mutation, and drift. Equally important is the issue of random mating among individuals in a population. In the real world, the conditions for HW equilibrium seldom exist, but in most natural populations they are generally rather closely approximated. Consequently, observed genotype frequencies often show no significant departure from HW expectations. Such is not the case in breeding populations, where population sizes can be small, individuals chosen for breeding may represent a subset of relatives, and matings are typically non-random. In fact, the goal of breeding programs is to change allele frequencies, so HW should be violated if the program is successful.
HW: Non-random mating. When individual genotypes do not mate randomly, HW equilibrium proportions are not observed among the offspring. We’ll look at two kinds of non-random mating: population substructure/admixture, and inbreeding (mating among related individuals). For most of the remainder of this module we will be discussing the factors that cause deviation from HW equilibrium conditions. Having a clear understanding of these factors is essential to interpreting observed conditions in natural and artificial populations and in prescribing management strategies to meet targets, whether they be for genetic conservation planning or breeding for disease resistance. We begin by discussing violations of random mating requirements. This is particularly relevant to those seeking to conduct meaningful association genetics tests, for it has been found that population substructure can significantly bias findings.
HW : Population admixture Consider mixing individuals from non-interbreeding subpopulations (e.g. alligator lizards from Washington and Idaho) Even if each subpopulation is in HW, the admixed group is not (p1 ≠ p2) The admixed group will appear to have too many homozygotes This situation is called the Wahlund effect Image Credit: Hartl, 2000, Fig. 2.6 Admixture of two or more HWE populations with differing allele frequencies produces a mixed population that has a deficiency of heterozygous genotypes relative to the frequency expected with HWE for a single population with the average allele frequencies of the two independent populations. Conversely, there will be too many homozygous genotypes. This phenomenon is known as the Wahlund effect or principle. You can convince yourself by simply creating two populations with different allele frequencies for a single locus and calculating HWE values for each, and for a hypothetical population with the average gene frequencies of the two populations. 10
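The exercise suggested in the notes can be sketched as follows. The two allele frequencies (0.9 and 0.1) and the equal mixing proportions are made-up values chosen to make the effect obvious, and the function names are hypothetical.

```python
def hw(p):
    """HW genotype frequencies at a two-allele locus with A1 at frequency p."""
    q = 1.0 - p
    return {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}

def admix(p1, p2, w1=0.5):
    """Mix two HW subpopulations (a fraction w1 from the first) without interbreeding.

    Returns the genotype frequencies actually present in the mixture and the
    HW frequencies expected from the mixture's average allele frequency.
    """
    g1, g2 = hw(p1), hw(p2)
    observed = {g: w1 * g1[g] + (1 - w1) * g2[g] for g in g1}
    p_bar = w1 * p1 + (1 - w1) * p2
    return observed, hw(p_bar)

observed, expected = admix(0.9, 0.1)
for genotype in observed:
    print(genotype, "observed", round(observed[genotype], 3),
          "expected", round(expected[genotype], 3))
```

With these values the mixture holds far fewer heterozygotes than HW predicts from the average allele frequency, and correspondingly more of both homozygotes.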
Population structure: Wahlund’s effect Wahlund’s effect: As long as allele frequencies vary among subpopulations, even if each subpopulation exhibits HW proportions, then more homozygotes will be observed than would be expected based on the allele frequency of the metapopulation The relative increase in homozygosity is proportional to the variance in allele frequencies among subpopulations, as measured by F (where 0 ≤ F ≤ 1) F is commonly known as Wright’s fixation index and may be most simply interpreted as F = 1 – (Hobs / Hexp ), where the values represent observed and expected levels of heterozygosity As Hartl notes in his text, the Wahlund effect provides a rather interesting paradox in population genetics (page 73; third edition). “..Inbreeding exists in the metapopulation composed of the aggregate of subpopulations, even though each subpopulation itself is undergoing random mating and is in HWE. The reason for the paradox is that the population as a whole is not undergoing random mating. There is remote inbreeding because matings occur only within subpopulations, and because these subpopulations are finite in size, relatively speaking, the level of inbreeding as measured by Fst gradually builds up.” The smaller and more isolated the subpopulations, the higher the Fst value. The result is a hierarchical population structure. FST is simply the correlation of randomly chosen alleles within the same sub-population relative to that found in the entire population. It is often expressed as the proportion of genetic diversity, due to allele frequency differences, found among populations. In wide-ranging wind-pollinated conifer species, the fixation index is often quite small (less than 5%) while inbreeding species may sport FST values in excess of 50%. A value of 0 implies identical allele frequencies in subpopulations while the upper limit of 1 implies subpopulations share no common alleles for that locus (Hamrick and Godt, 1990).
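A small Python sketch may help connect the two descriptions of F given on this slide: as 1 – (Hobs / Hexp), and as something proportional to the variance in allele frequency among subpopulations. It assumes equal-sized subpopulations, each in HW proportions, a single two-allele locus, and an average allele frequency strictly between 0 and 1. The function names are illustrative only.

```python
def fixation_index(subpop_freqs):
    """F = 1 - H_obs / H_exp for equal-sized HW subpopulations at a two-allele locus."""
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    # Observed: average heterozygosity within subpopulations.
    h_obs = sum(2 * p * (1 - p) for p in subpop_freqs) / n
    # Expected: heterozygosity if the whole metapopulation mated at random.
    h_exp = 2 * p_bar * (1 - p_bar)
    return 1 - h_obs / h_exp

def variance_ratio(subpop_freqs):
    """Variance of allele frequency among subpopulations, scaled by p_bar * (1 - p_bar)."""
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    var_p = sum((p - p_bar) ** 2 for p in subpop_freqs) / n
    return var_p / (p_bar * (1 - p_bar))

# Hypothetical allele frequencies in four subpopulations.
freqs = [0.2, 0.4, 0.6, 0.8]
print("F from heterozygosity:", round(fixation_index(freqs), 4))
print("F from variance:      ", round(variance_ratio(freqs), 4))
```

Under these assumptions the two calculations agree, which is one way to see why F is 0 when all subpopulations share the same allele frequencies and 1 when they share no alleles.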
Inbreeding. Inbreeding (mating among relatives) increases homozygosity relative to HW. Rate is proportional to degree of relationship: distant cousin < first cousin < half-sib < full-sib < self. Recurrent inbreeding leads to a build-up of homozygosity, and a corresponding reduction in heterozygosity. Inbreeding affects genotype frequencies, but not allele frequencies. How does inbreeding affect deleterious recessive alleles? The Wahlund effect reflects the fact that within finite populations, individuals will share alleles that are identical by descent (called IBD) simply by chance, regardless of whether the individuals are “closely” related or not. This remote type of inbreeding is distinguished from close inbreeding that might be defined by matings among cousins or sibs, or even selfing. Close inbreeding drives populations toward homozygosity quite rapidly. The more closely related the mates, the faster the process, as is shown in the following slide. Interestingly, inbreeding does not alter allele frequencies in the absence of selection. All bets are off if there is strong selection against an allele, as is the case with deleterious recessive alleles. In that case, inbreeding exposes the alleles in homozygotes and they are quickly lost from the population.
Inbreeding and homozygosity. F reflects a proportional reduction in heterozygosity, and a build-up of genetic relatedness. HW implies F = 0. With recurrent selfing, F goes to 1. Figure Credit: White et al. 2007, Forest Genetics Fig. 5.6. The inbreeding coefficient, F, represents a proportional reduction in heterozygosity. The inbreeding coefficient F is not equivalent to Fst, which is a measure of genetic divergence between sub-populations. This figure, borrowed from the White et al. text entitled “Forest Genetics”, demonstrates the rate at which heterozygosity is lost, and homozygosity is gained, with successive generations of equivalent inbreeding. For instance, under the straight selfing model, as might be imposed in corn breeding, heterozygosity is halved each generation, so it takes about 6 generations to come close to complete homozygosity at any given locus (F of roughly 0.98).
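The build-up of F over generations can be sketched with two standard textbook recurrences, neither of which is printed on the slide: under selfing the remaining heterozygosity is halved each generation, and under repeated full-sib mating F in one generation depends on F in the two before it. Both functions assume a starting population with F = 0, and the names are hypothetical.

```python
def selfing_f(generations):
    """F after 0..n generations of selfing: F(t+1) = (1 + F(t)) / 2."""
    f = [0.0]
    for _ in range(generations):
        f.append(0.5 * (1.0 + f[-1]))
    return f

def full_sib_f(generations):
    """F after 0..n generations of full-sib mating: F(t) = (1 + 2F(t-1) + F(t-2)) / 4."""
    f = [0.0, 0.0]  # F two generations back and one generation back, both non-inbred
    for _ in range(generations):
        f.append(0.25 * (1.0 + 2.0 * f[-1] + f[-2]))
    return f[1:]

for t, (fs, fb) in enumerate(zip(selfing_f(6), full_sib_f(6))):
    print(f"generation {t}: selfing F = {fs:.3f}, full-sib F = {fb:.3f}")
```

Selfing reaches F of about 0.98 by generation 6, while full-sib mating lags well behind, in line with the ordering of relationships given on the previous slide.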
Inbreeding depression Inbreeding often leads to reduced vitality (growth, fitness) Deleterious recessive alleles are made homozygous Outcrossing species are more likely to suffer higher inbreeding depression Image Credit: White et al. 2007, Forest Genetics. Fig. 5.7 For outcrossing tree species, such as those typically found in forest conditions, inbreeding is generally a bad thing. It leads to something called inbreeding depression, which is typically manifest in reduced survival, growth, and adaptation to heterogeneous environments. In the photo shown above, again taken from the White et al text, the row of pitiful trees in the middle represent a group of progeny derived from a selfing of one individual while rows on either side are progeny of out-crossed parents. At age 33, the survival of this Douglas-fir selfed family was only 39% that of the outcrossed offspring, clearly a reflection that this selfed parent carried a number of deleterious or even lethal alleles that were protected in the heterozygous condition, but exposed in the homozygotes. A few attempts have been made in conifers to conduct several cycles of inbreeding followed by outcrossing in hopes of exhibiting hybrid vigor, such as that found in corn, but with little, if any, success demonstrated. The genetic load in most trees is simply too great for that to be effective. Conversely, there are a few tree species such as Torrey and Red Pines, that are naturally found to possess very little genetic variability of any kind. Presumably, they tolerate inbreeding quite well. The interpretation here is that these species survived a very restrictive bottleneck event which we shall describe shortly. 14
Evolutionary forces change allele frequencies. Mutation: a random heritable change in the genetic material (DNA); the ultimate source of all new alleles. Migration (gene flow): the introduction of new alleles into a population via seeds, pollen, or vegetative propagules. Random genetic drift: the random process whereby some alleles are not included in the next generation by chance alone. Natural selection: the differential, non-random reproductive success of individuals that differ in hereditary characteristics. We saw previously that HW conditions are met if there are no significant evolutionary forces at play. In the real world such a static condition would likely never be found, though it may be closely approximated. There are four major evolutionary forces that can change allele frequencies in populations: mutation, migration, drift and selection. We will discuss each of these individually in the following slides.
Mutation. Mutations are the ultimate source of genetic variation on which other evolutionary forces act (e.g., natural selection). Mutations at any one locus are rare, but with sufficient time, cumulative effects can be large. Heritable changes in DNA sequence alter allele frequencies as new alleles are formed. Effects on populations: Mutations promote differentiation (but effects are gradual in the absence of other evolutionary forces). Genetic variation begins with mutation. By definition, a mutation is a heritable change in the DNA sequence of an organism. Mutations may lead to changes in allele frequencies, though it is not a given they will do so. Let’s talk about mutations just a bit. Mutations may occur in somatic cells, in which case they are not passed on to the next generation but do affect new daughter cells, or they may occur in germ cells and be passed along to the progeny. Though somatic mutations are important, leading to some types of cancer for instance, they are of little evolutionary consequence. Mutations of any type are rare: for any given locus the rate of mutation may be 1 in a thousand to 1 in 100 million per generation, but with sufficient time and population size, the cumulative effects can be large. Mutations are often found to occur in hot spots in the genome. We know the accumulation of SNPs in loblolly pine varies greatly among genes; some have virtually no SNPs over the expressed sequence region while others may have one every 50 bases. Many mutations appear to have no effect at all, occurring in intergenic regions or at redundant codon sites (3rd position), for instance. Of those that do have an effect, most are likely to be deleterious, and are quickly lost from the population or are retained at very low frequencies, usually in the heterozygous condition. However, mutations that confer an advantage in some manner can, given luck, become established in a population, increase in frequency, and slowly spread throughout the population.
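To get a feel for how gradual mutation is when it acts alone, consider a deliberately simple model that is an assumption here, not something stated in the module: allele A1 mutates to A2 at a constant rate mu per generation, with no back mutation, selection, migration, or drift. The frequency of A1 then shrinks by a factor of (1 - mu) each generation. The rate used below is a hypothetical value from within the range quoted in the notes.

```python
def frequency_after_mutation(p0, mu, generations):
    """Frequency of allele A1 after one-way mutation A1 -> A2 at rate mu per generation."""
    return p0 * (1.0 - mu) ** generations

mu = 1e-5  # hypothetical per-locus mutation rate
for t in (1, 100, 10_000, 100_000):
    p = frequency_after_mutation(1.0, mu, t)
    print(f"after {t:>7} generations: frequency of A1 = {p:.5f}")
```

Under these assumptions it takes tens of thousands of generations for mutation alone to shift allele frequencies appreciably, which is why the slide describes its effects as gradual.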
Gene flow: Migration of alleles. Gene flow: the movement of alleles among populations. Movement may occur by individuals (via seed) or gametes (via pollen) between populations. Effects on populations: gene flow hinders differentiation. It is a cohesive force which tends to bind populations together. The movement of alleles among populations of organisms is referred to as gene flow. It is the process by which new mutations that have successfully established in a subpopulation are distributed more widely in the species. For most forest trees we typically think of gene flow as the movement of pollen or seed across the spatial landscape, though for some species that commonly propagate vegetatively, such as poplar or willow, gene flow may occur by broken branches floating down a stream and embedding in the bank to root and grow. High rates of gene flow among populations hinder differentiation and encourage homogeneity. Such is typically the case in most widespread, wind-pollinated conifers in the Northern Hemisphere, though for some species with few, disjunct populations, such as foxtail pine, gene flow may be severely hindered. Low migration encourages divergence among populations, a condition that ultimately leads to speciation. Image Credit: Glenn Howe, Oregon State University
Migration rates Modest migration rates will prevent divergence of populations The absolute number of migrants per generation affects Fst, the fixation index, independent of subpopulation size Surprisingly, little migration is required to keep populations from diverging. The absolute number of migrants (mN) per generation need be little more than two to keep variation among populations very low. Severely restricted gene flow, however, quickly leads to population divergence resulting from Wahlund’s effect and something called random genetic drift, our next topic. Figure Credit: Hartl, 2000, Figure 2.5
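One way to explore how the number of migrants relates to Fst is Wright's island-model approximation, Fst ≈ 1 / (4Nm + 1), where Nm is the absolute number of migrants per generation. Using this formula is an assumption on our part: the notes do not say which model lies behind the Hartl figure, and the approximation only holds for an idealized set of equal-sized subpopulations exchanging migrants at random.

```python
def island_model_fst(n_migrants):
    """Approximate equilibrium Fst under Wright's island model: 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * n_migrants + 1.0)

for nm in (0, 0.1, 0.5, 1, 2, 5, 10):
    print(f"migrants per generation = {nm:>4}: Fst ~ {island_model_fst(nm):.3f}")
```

Note that only the product Nm enters the formula, so the result does not depend on subpopulation size by itself, consistent with the point made on the slide.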
Genetic drift Drift reflects sampling in small populations Subgroups follow independent paths Allele frequencies vary among subgroups Frequencies in the metapopulation remain relatively stable How does F behave? Image Credit: Hartl & Jones, 2001, Fig. 17.29 Genetic drift refers to the changes in gene frequency that occur by chance alone due to sampling in small populations. The two figures shown here help illustrate the meaning of drift. In Figure 17.29a you can follow 12 lines, each of which represents a hypothetical population of 8 diploid individuals that start the experiment with equal numbers of A and a alleles at a single locus. By chance alone, these subpopulations tend to fixation for one allele or the other (4 subpopulations for each allele in this experiment). A peek at figure B suggests that the overall metapopulation has not changed significantly in allele frequency despite the huge fluctuations in individual subpopulations. Obviously, drift will lead to an increased fixation index in most subpopulations and divergence among them. 19
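A drift experiment like the one in the figure is easy to imitate. The sketch below is not the simulation behind Hartl and Jones's figure; it is a plain Wright-Fisher model, assumed here for illustration, with 12 subpopulations of 8 diploid individuals (16 gene copies) that all start at an allele frequency of 0.5. The number of generations and the random seed are arbitrary choices.

```python
import random

def simulate_drift(n_individuals=8, p0=0.5, generations=20, rng=None):
    """Track allele frequency in one population under pure drift (Wright-Fisher sampling)."""
    if rng is None:
        rng = random.Random()
    n_copies = 2 * n_individuals  # diploid: two gene copies per individual
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random from the current pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory

rng = random.Random(42)  # fixed seed so the run can be repeated
lines = [simulate_drift(generations=500, rng=rng) for _ in range(12)]

finals = [line[-1] for line in lines]
print("final frequencies:", finals)
print("metapopulation frequency at generation 10:",
      sum(line[10] for line in lines) / len(lines))
```

Run long enough, every subpopulation ends up fixed for one allele or the other, while the average across subpopulations wanders much less than any single line does.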
Random genetic drift: Bottlenecks. Bottleneck effect: A type of genetic drift that occurs when a population is severely reduced in size such that the surviving population is no longer genetically representative of the original population. Effects on populations: Drift promotes differentiation. S. Wright effect? Gulick! The bottleneck effect is an extreme example of drift, such that the surviving population no longer genetically represents the original population. Consider situations where relatively rare, endemic populations of some organism (let’s use the New Zealand black stork as an example) are caught in a cyclonic storm while migrating. Only 10 birds survive. Allele frequencies of some loci will be dramatically affected right off, and all loci will subsequently suffer from inbreeding even if the population rebounds. Another special case of drift is something called the Founder Effect. Imagine a tropical island, void of coconut palms until two coconuts are brought to the island by Polynesians in canoes. Those two seeds may have as many as 4 alleles at a given locus, but in all probability they have only one or two. Some alleles may simply not be present. The bottleneck condition is sometimes referred to as the Sewall Wright Effect, because of his treatment of the subject in his writings in the 30’s. It might well have been called the Gulick effect. Gulick (1832-1923) was an American missionary in the Pacific islands who studied land snails in different deep valleys that had seemingly random differentiation among races. Vernon Kellogg published a paper in 1907 detailing the Gulick work. Sewall Wright read the paper in 1910 and later developed the concept in an important 1931 publication. Bottleneck and Founder effects have resulted in many interesting stories in human genetics.
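How likely is it that an allele goes missing in a bottleneck or founding event? A rough answer, assuming the survivors or founders are a random sample of gene copies from the original population, is that an allele at frequency p is absent from n diploid individuals with probability (1 - p) raised to the power 2n. The allele frequency of 0.1 below is a made-up value.

```python
def prob_allele_absent(p, n_founders):
    """Chance that an allele at frequency p is missing from n diploid founders (2n gene copies)."""
    return (1.0 - p) ** (2 * n_founders)

p = 0.1  # hypothetical frequency of an allele in the original population
print("two coconuts:", round(prob_allele_absent(p, 2), 4))
print("ten surviving birds:", round(prob_allele_absent(p, 10), 4))
```

Under this assumption, an allele at a frequency of 0.1 is more likely than not to be missing from two founders, and still has a real chance of being lost among ten survivors.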
Natural selection. First proposed by Charles Darwin in the mid-1800s. The differential reproductive success of individuals that differ in hereditary characteristics. Not all offspring survive and reproduce. Some individuals produce more offspring than others (mortality, disease, bad luck, etc). Offspring differ in hereditary characteristics affecting their survival (genotype and reproduction are correlated). Individuals that reproduce pass along their hereditary characteristics to the next generation. Favorable characteristics become more frequent in successive generations. Effects on populations: Promotes differentiation between populations that inhabit dissimilar environments. Hinders differentiation between populations that inhabit similar environments. We conclude our discussion of evolutionary forces with several slides on natural selection. Charles Darwin is widely credited with developing the concept of natural selection, as put forth in his book entitled “On the Origin of Species” in 1859. This concept was formulated over the 20 years following his famous sea voyage around the globe. He was greatly influenced by the work of plant and animal breeders and their successes with artificial selection. He was also influenced by the theory of uniformitarianism in geology, which promoted the idea that simple, weak forces could act continuously over long periods of time to produce radical changes in the Earth's landscape. And of course, he was not alone in conceiving this process. Others with whom he was familiar, most notably Alfred Russel Wallace, were developing their own versions of the concept. The essential elements of the theory of natural selection are noted in this slide. Interestingly, and unlike the other forces, which tend to influence differentiation between or among populations in one direction only, selection may act bidirectionally.
Relative fitness: Key considerations. Which genotype has the largest relative fitness? Determines the direction in which allele frequencies will change. Are fitness differences large or small? Determines rate of change over generations (fast or slow). What is the fitness of the heterozygote compared to either homozygote? Reflects dominance: complete (heterozygote identical to one of the homozygotes), no dominance (additive, heterozygote is intermediate), or partial (heterozygote more closely resembles one homozygote). Dominance influences how selection “sees” heterozygotes. Affects rate of change across generations. A central concept in natural selection is that of “fitness”. It describes the capability of an individual of a certain genotype to reproduce, and usually is equal to the proportion of the individual's genes in all the genes of the next generation. If a specific genotype confers an advantage, either in survival or reproductive capacity, it is likely to increase in frequency in the population. Maynard Smith defined fitness in the following way: “Fitness is a property, not of an individual, but of a class of individuals”. A class would be defined as those individuals with a given genotype. Thus the phrase “expected number of offspring” means the average number of progeny produced by a given class, not the number produced by some one individual. Fitness may be discussed in terms of absolute or relative fitness. Absolute fitness (w_abs) of a genotype is defined as the ratio of the number of individuals with that genotype after selection to the number before selection. It is calculated for a single generation and may be calculated from absolute numbers or from frequencies. When the fitness is larger than 1.0, the genotype increases in frequency; a ratio smaller than 1.0 indicates a decrease in frequency.
Relative fitness is quantified as the average number of surviving progeny of a particular genotype compared with the average number of surviving progeny of competing genotypes after a single generation. The fitness of the heterozygote relative to that of either homozygote describes the concept of dominance (gene action). Thus, we define gene action as being additive, dominant, or partially dominant – important concepts in that dominance influences how selection sees genotypic effects. 22
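The two definitions can be tried out on a toy data set. The counts below are invented for illustration and are not the values in White et al. Table 5.3. Scaling relative fitness so that the fittest genotype equals 1 is a common convention that we assume here.

```python
def absolute_fitness(before, after):
    """Ratio of the number of individuals of each genotype after selection to the number before."""
    return {g: after[g] / before[g] for g in before}

def relative_fitness(w_abs):
    """Scale absolute fitness so the fittest genotype has a relative fitness of 1 (assumed convention)."""
    w_max = max(w_abs.values())
    return {g: w / w_max for g, w in w_abs.items()}

# Hypothetical counts of seedlings before selection and survivors after selection.
before = {"A1A1": 250, "A1A2": 500, "A2A2": 250}
after = {"A1A1": 200, "A1A2": 400, "A2A2": 100}

w_abs = absolute_fitness(before, after)
w_rel = relative_fitness(w_abs)
print("absolute fitness:", w_abs)
print("relative fitness:", w_rel)
```

In this made-up case the heterozygote is as fit as the A1A1 homozygote, so A1 behaves as completely dominant with respect to fitness.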
Gene action: Additive vs. dominance This slide gives a nice visual illustration of how gene action is related to fitness. Genotype fitness is defined using two parameters: the selection coefficient, noted here as (s), and the degree of dominance, noted here as h. In this illustration, allele 1 is assumed to have a relative fitness of 1 (except in the case of overdominance). For the purely additive gene action, the A2A2 homozygote has a selection coefficient of s against it, so its relative fitness is 1-s. The heterozygote falls halfway between the homozygotes. In the case of partial dominance, the heterozygote may slide anywhere along that line depending on h. For complete dominance, there is no selective advantage to being homozygous or heterozygous for allele 1. The overdominance situation is interesting in that the heterozygote is more fit than either homozygote, each of which may have different selection coefficients. Image Credit: Falconer and Mackay, 1996 Quantitative Genetics (Fig. 2.1)
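Since the figure is not reproduced in these notes, here is one way to write the same idea in Python. The parameterization is an assumption: relative fitnesses of 1, 1 - hs, and 1 - s for A1A1, A1A2, and A2A2, with separate selection coefficients s1 and s2 against the two homozygotes for overdominance. The figure itself may label things differently.

```python
def genotype_fitness(s, h):
    """Relative fitness of (A1A1, A1A2, A2A2) given selection coefficient s and dominance h.

    h = 0.5 is additive, h = 0 makes A1 completely dominant,
    and values in between give partial dominance.
    """
    return (1.0, 1.0 - h * s, 1.0 - s)

def overdominant_fitness(s1, s2):
    """Heterozygote is fittest; each homozygote has its own selection coefficient."""
    return (1.0 - s1, 1.0, 1.0 - s2)

s = 0.10  # hypothetical selection coefficient
print("additive:          ", genotype_fitness(s, h=0.5))
print("partial dominance: ", genotype_fitness(s, h=0.2))
print("complete dominance:", genotype_fitness(s, h=0.0))
print("overdominance:     ", overdominant_fitness(s1=0.05, s2=0.10))
```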
Dominance and rate of change
This illustration, borrowed from Hartl (2000), shows how quickly a new mutation with a selective advantage (relative fitness of 1 for the homozygote carrying the favorable allele vs. 1 - 0.05 = 0.95 for the other homozygote) will alter allele frequencies in a population under alternative modes of gene action. Note that the allele frequency nears, but does not attain, fixation under the dominant gene action model. Why? Although the recessive allele in such a model has a lower relative fitness, it is not selected against in the heterozygous condition and therefore persists. Conversely, if the recessive allele has the higher relative fitness, it seems to take forever to get traction and increase in frequency. This is because, while the allele is rare, homozygous recessives (the only genotype in which selection can favor it) remain vanishingly rare unless some other force, such as assortative mating, is imposed. Figure Credit: Hartl, 2000, Figure 2.11
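The qualitative pattern described above can be explored with a small deterministic simulation. This is a sketch under stated assumptions, not a reproduction of Hartl's figure: it uses the standard one-locus selection recursion with random mating and an infinite population, the selection coefficient of 0.05 from the text, and a hypothetical starting frequency of 0.01 for the favored allele.

```python
def next_frequency(p, s, h):
    """Frequency of the favored allele A1 after one generation of selection.

    Genotype fitnesses are assumed to be 1 (A1A1), 1 - h*s (A1A2), 1 - s (A2A2),
    with random mating and no drift, mutation, or migration.
    """
    q = 1.0 - p
    w11, w12, w22 = 1.0, 1.0 - h * s, 1.0 - s
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22  # mean fitness
    return p * (p * w11 + q * w12) / w_bar


def trajectory(p0, s, h, generations):
    """List of allele frequencies from generation 0 to `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s, h))
    return freqs


S = 0.05           # selection coefficient taken from the text
P0 = 0.01          # hypothetical starting frequency of the favored allele
GENERATIONS = 200  # hypothetical time span

favored_dominant = trajectory(P0, S, 0.0, GENERATIONS)
additive = trajectory(P0, S, 0.5, GENERATIONS)
favored_recessive = trajectory(P0, S, 1.0, GENERATIONS)

for name, traj in [("favored dominant", favored_dominant),
                   ("additive", additive),
                   ("favored recessive", favored_recessive)]:
    print(f"{name:18s} frequency after {GENERATIONS} generations: {traj[-1]:.3f}")
```

Increasing `GENERATIONS` shows the other half of the story: the favored dominant allele slows down as it approaches fixation, because the remaining copies of the recessive allele are hidden in heterozygotes.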
Selection: Numerical example
You may wish to work through this exercise, provided by White et al. (2007), to convince yourself of the concepts of absolute and relative fitness. From: White et al. 2007, Table 5.3
Natural selection: Fitness and selection
- Fitness: The relative contribution an individual (genotypic class) makes to the gene pool of the next generation
Selection is often described by its predominant effect on a population. Types of selection commonly discussed today include those illustrated here. Directional selection moves a population in one direction, as the name implies. Diversifying selection drives a population to be bimodal, or perhaps to have several fitness optima. The opposite of diversifying selection is stabilizing selection, which drives a population toward a single optimum, with decreasing population variability. Image Credits: Alan Harvey, Georgia Southern University; http://www.bio.georgiasouthern.edu/bio-home/harvey/
What if selection is weak or absent?
- We’ve already seen that mutation can supply new variation that selection may act upon
- Most mutations are deleterious and are lost, but rarely, advantageous mutations can occur
- What about mutations that cause no effect either way?
- The neutral theory of evolution pertains to alleles that confer no difference in relative fitness, as if selection is oblivious to them
The neutral theory of molecular evolution states that the vast majority of evolutionary change at the molecular level is caused by random drift of selectively neutral mutants. The theory was introduced by Motoo Kimura in the late 1960s and early 1970s, and although it was received by some as an argument against Darwin’s theory of evolution by natural selection, Kimura maintained (and most evolutionary biologists agree) that the two theories are compatible: "The theory does not deny the role of natural selection in determining the course of adaptive evolution" (Kimura, 1983). However, the theory attributes a large role to genetic drift (Wikipedia, 2010). Indeed, as genomics has provided new tools for simultaneously estimating selection effects of large numbers of SNPs, it appears that most of them behave in a neutral manner. This will be discussed more in future modules.
Population genetics: A final concept
- Linkage disequilibrium (LD, also called gametic phase disequilibrium)
- Conceptually: LD is a correlation in allelic state among loci
- Numerically
  - Expected haplotype (gamete) frequency is the product of the two allele frequencies, i.e. f(AB) = f(A) x f(B)
  - If f(AB) = f(A) x f(B), then LD = 0
  - If f(AB) ≠ f(A) x f(B), then LD ≠ 0
- LD may arise from factors such as
  - Recent mutations
  - Historical selection (hitchhiking effect)
  - Population admixture
- Recombination causes LD to decay over generations
- LD plays a major role in association genetics
As we did in the previous module, we close with a brief discussion of linkage disequilibrium. The concept is particularly relevant to our future discussions of association genetics. The following slide provides a more illustrative description of what LD looks like.
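The statement that recombination causes LD to decay can be made concrete with a small sketch. It assumes the standard result for a large, randomly mating population, in which the disequilibrium D shrinks by a factor of (1 - r) each generation, where r is the recombination fraction between the two loci. The starting value of D and the values of r are hypothetical.

```python
def ld_after(d0, r, generations):
    """Expected LD after some generations of random mating.

    d0: initial disequilibrium, D = f(AB) - f(A) * f(B)
    r:  recombination fraction between the loci (0.5 for unlinked loci)
    Assumes D is reduced by a factor of (1 - r) every generation.
    """
    return d0 * (1.0 - r) ** generations


D0 = 0.18  # hypothetical starting disequilibrium
for label, r in [("unlinked (r = 0.5)", 0.5), ("tightly linked (r = 0.01)", 0.01)]:
    values = [round(ld_after(D0, r, t), 4) for t in (0, 1, 5, 10)]
    print(label, "D at generations 0, 1, 5, 10:", values)
```

Even unlinked loci do not return to LD = 0 in a single generation; their disequilibrium is halved each generation, while tightly linked loci retain most of theirs for many generations.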
A numeric example of LD
- Determine allele frequencies
- Ask whether f(A) x f(B) = f(AB)
- Repeat for f(Ab), f(aB), and f(ab)
- Linkage disequilibrium (LD) reflects this difference
This example introduces the concept of linkage disequilibrium (LD), which we’ll explore more thoroughly in later modules. LD is a confusing term because it relates to observed and predicted gamete frequencies, rather than to linkage per se. This numerical example contrasts two different gene pairs. The linked pair on the left has genes A and B drawn with solid red or blue chromosome segments. The unlinked pair on the right also has two genes, C and D, drawn with solid or stippled brown or green. Note that all four genes have two alleles each, uppercase (blue) and lowercase (red). Also note that in this example, the uppercase alleles for the linked and unlinked pairs have the same frequencies, that is, f(A) = f(C) and f(B) = f(D). For each gene, the frequency of the lowercase (red) allele is simply 1 minus the frequency of the uppercase (blue) allele. In many instances, the frequencies of the various gamete types (four each for the linked or unlinked gene pairs) can be predicted by multiplying the frequencies of the alleles they carry. For the linked AB gametes, this is f(A) x f(B) = 0.7 x 0.6 = 0.42. LD is zero whenever observed and predicted gamete frequencies are equal. Because the linked gene pair (left) has the same allele frequencies as the unlinked gene pair (right) in our example, the “No LD” gamete frequencies are the same for both; they are shown in the left-hand column of the table, with cell values reflecting the product of the appropriate two-locus allele frequencies. Let’s also take a look at a couple of situations in which population gamete frequencies are measured in some direct way (e.g. by examining haploid tissues such as pollen).
The center column (Higher LD) shows a relatively larger discrepancy between observed gamete frequencies and those predicted from the product of allele frequencies. Note also that one of the four gamete types is missing. In the right-hand “Lower LD” column, we see a relatively smaller discrepancy between observed and predicted gamete frequencies. The actual calculations, which we will explore later, depend only on the extent to which the observed gamete frequencies differ from the two-locus expectation (“predicted”) derived from the product of the two allele frequencies. The resulting LD estimate is the same whether the gene pair is linked or unlinked. The concept of LD is particularly important in association genetics, so we will come back to it again. For now, it is sufficient to recognize LD’s existence and to understand that it does not depend on linkage per se.
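The comparison of observed and predicted gamete frequencies can be sketched as follows. The allele frequencies f(A) = 0.7 and f(B) = 0.6 come from the example above; the "observed" gamete frequencies with one gamete type missing are hypothetical, since the values in the slide's table are not reproduced here, and the coefficient D = f(AB) - f(A) x f(B) is assumed as the measure of LD.

```python
def allele_frequencies(gametes):
    """Frequencies of alleles A and B from the four gamete frequencies."""
    f_a = gametes["AB"] + gametes["Ab"]
    f_b = gametes["AB"] + gametes["aB"]
    return f_a, f_b


def ld_coefficient(gametes):
    """D = observed f(AB) minus the product f(A) * f(B)."""
    f_a, f_b = allele_frequencies(gametes)
    return gametes["AB"] - f_a * f_b


# Gamete frequencies predicted from f(A) = 0.7 and f(B) = 0.6 when LD = 0.
no_ld = {"AB": 0.42, "Ab": 0.28, "aB": 0.18, "ab": 0.12}

# Hypothetical observed frequencies with the same allele frequencies,
# but with the aB gamete type missing from the population.
with_ld = {"AB": 0.60, "Ab": 0.10, "aB": 0.00, "ab": 0.30}

for label, gametes in [("no LD", no_ld), ("one gamete type missing", with_ld)]:
    f_a, f_b = allele_frequencies(gametes)
    print(label, "f(A) =", round(f_a, 2), "f(B) =", round(f_b, 2),
          "D =", round(ld_coefficient(gametes), 4))
```

Nothing in the calculation uses the map position of the two genes, which is the sense in which LD does not depend on linkage per se.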
Some concluding remarks
- The central themes of population genetics remain
  - How much genetic diversity is there?
  - How is it distributed?
  - How did it get that way?
- The foundation of population genetics, identifying and quantifying genetic diversity, is no longer constrained by the lack of genetic markers. We can now measure diversity in literally thousands of genes simultaneously, and study how it is distributed
- Molecular population genetics
The development of new genomic tools, such as high-throughput sequencing and SNP detection and genotyping, has been paralleled by the development of new tools to fathom how the genetic diversity we are measuring got that way. These many analytical tools open the door to evaluating the selective footprints left on populations. Collectively, these new approaches make up the relatively new discipline of molecular population genetics. We will dedicate another module to this topic later in the course. Suffice it to say that population geneticists have never had so many tools and so much data to study. The same can be said for plant and animal breeders. We can now envision dissecting the genetic architecture of virtually any trait in any species of interest, hindered largely by financial concerns alone. In Module 4, we move our discussion to quantitative genetics and the study of complex traits. As should be apparent by now, the line between Mendelian, population, and quantitative genetics is becoming increasingly blurred.
Citations in this module
- Falconer, D. S., and T. F. C. Mackay. 1996. Introduction to quantitative genetics, 4th edition. Longman Group Ltd., Essex, England.
- Hamrick, J. L., and M. J. W. Godt. 1990. Allozyme diversity in plant species. p. 43-63. In Brown, A. H. D., Clegg, M. T., Kahler, A. L., and B. S. Weir (ed.) Plant population genetics, breeding, and genetic resources. Sinauer Associates, Sunderland, MA.
- Hartl, D. L. 2000. A primer of population genetics. Sinauer Associates, Sunderland, MA.
- Hartl, D. L., and E. W. Jones. 2001. Genetics: Analysis of genes and genomes, 5th edition. Jones and Bartlett, Sudbury, MA.
- Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, New York.
- White, T. L., Adams, W. T., and D. B. Neale. 2007. Forest genetics. CAB International, Oxfordshire, United Kingdom.
- Wikipedia. Available online at: http://en.wikipedia.org/wiki/Neutral_theory_of_molecular_evolution (verified 25 February 2011)
Thank You. Conifer Translational Genomics Network Coordinated Agricultural Project www.pinegenome.org/ctgn