Lecture 7: Human Gene Mapping & Disease Gene Identification
Overview Whether a disease is inherited in a recognizable mendelian pattern or just occurs at a higher frequency in relatives of affected individuals, the genetic contribution to disease must result from genotypic differences among family members.
The Human Genome Project has provided geneticists with a complete list of all human genes, knowledge of their location and structure, and a catalogue of some of the millions of variants in DNA sequence found among individuals in different populations. Some of these variants are common, others are rare, and still others differ in frequency among different ethnic groups. Whereas some variants clearly have functional consequences, others are neutral. For most, their significance for human health and disease is unknown.
Two fundamental approaches to disease gene identification are linkage analysis and association analysis. Linkage analysis is family-based. It takes explicit advantage of family pedigrees to follow the inheritance of a disease over a few generations by looking for consistent, repeated inheritance of a particular region of the genome whenever disease is passed on in a family.
Association analysis is population-based. It does not depend explicitly on pedigrees but instead looks for increased or decreased frequency of a particular allele or set of alleles in a sample of affected individuals taken from the population, compared with a control set of unaffected people.
How Does Gene Mapping Contribute to Medical Genetics? Disease gene mapping has immediate clinical application by providing information about a gene's location that can be used to develop indirect linkage methods for use in prenatal diagnosis, pre-symptomatic diagnosis, and carrier testing. Disease gene mapping is also a critical first step in identifying a disease gene. Mapping the gene focuses attention on a limited region of the genome in which to carry out a systematic analysis of all the genes, so that the mutations or variants that contribute to the disease can be found, a strategy known as positional cloning.
Positional cloning of a disease gene provides an opportunity to characterize the disorder as to the extent of locus heterogeneity, the spectrum of allelic heterogeneity, the frequency of various disease-causing or predisposing variants in various populations, the penetrance and positive predictive value of mutations, the fraction of the total genetic contribution to a disease attributable to the variant at any one locus, and the natural history of the disease in asymptomatic at-risk individuals. Positive predictive value (PPV): the probability that an individual has a particular disorder following a positive test, i.e., the proportion of true positive results relative to the total number of positive tests. As the prevalence of a disease increases, the PPV also increases, because each individual tested is more likely to have the disorder.
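The dependence of PPV on prevalence can be made concrete with Bayes' rule. The sketch below is illustrative only: the function name `positive_predictive_value` and the sensitivity, specificity, and prevalence figures are assumptions chosen for the example, not values from the lecture.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / all positives when a test is applied to a population."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 99% sensitive and 99% specific (assumed values).
for prevalence in (0.001, 0.01, 0.1):
    ppv = positive_predictive_value(0.99, 0.99, prevalence)
    print(f"prevalence={prevalence:<6} PPV={ppv:.3f}")
```

With the test characteristics held fixed, the printed PPV rises as prevalence rises, which is the behavior described in the definition.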
Characterization of a gene and the mutations in it furthers our understanding of disease pathogenesis and supports the development of specific and sensitive diagnosis by direct detection of mutations; population-based carrier screening to identify individuals at risk for disease in themselves or their offspring; the development of cell and animal models; drug therapy to prevent or ameliorate disease or to slow its progression; and treatment by gene replacement.
Independent Assortment and Homologous Recombination in Meiosis The effect of recombination on the origin of various portions of a chromosome. Because of crossing over in meiosis, the copy of the chromosome the boy (generation III) inherited from his mother is a mosaic of segments of all four of his grandparents' copies of that chromosome.
Since homologous chromosomes look identical under the microscope, we must be able to differentiate them in order to trace the grandparental origin of each segment and to determine if and where recombination has occurred. Genetic marker: any characteristic that is located at the same position on a pair of homologous chromosomes and allows the two homologues to be distinguished. Millions of genetic markers are now available that can be genotyped by PCR.
Alleles at Loci on Different Chromosomes Assort Independently Independent assortment of alleles at two loci, 1 and 2, when they are located on different chromosomes. Assume that alleles D and M were inherited from one parent and d and m from the other. Half (50%) of gametes will be parental (DM or dm) and half (50%) will be non-parental (dM or Dm).
Assume D and M are paternally derived and d and m are maternally derived. Gametes containing DM or dm are nonrecombinant.
Alleles at Loci on the Same Chromosome Assort Independently if at Least One Crossover Occurs Between Them in Every Meiosis Note: Genes that reside on the same chromosome are said to be syntenic.
If crossing over occurs at least once in the segment between the loci, the resulting chromatids may be either nonrecombinant (DM and dm) or Dm and dM, which are not the same as the parental chromosomes; such a nonparental chromosome is therefore a recombinant chromosome.
The ratio of recombinant to nonrecombinant genotypes will be, on average, 1 : 1, just as if the loci were on separate chromosomes and assorting independently.
Recombination Frequency and Map Distance Crossing over between homologous chromosomes in meiosis is shown in the quadrivalents on the left. Crossovers result in new combinations of maternally and paternally derived alleles on the recombinant chromosomes present in gametes. If no crossing over occurs in the interval between loci 1 and 2, only parental (nonrecombinant) allele combinations, DM and dm, occur in the offspring. If one or two crossovers occur in the interval between the loci, half the gametes will contain a nonrecombinant and half the recombinant combination. The same is true if more than two crossovers occur between the loci.
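The claim that one, two, or more crossovers all give half recombinant gametes can be checked with a short calculation. This is a minimal sketch under a stated assumption: each crossover involves any given chromatid with probability 1/2, independently of the other crossovers (no chromatid interference). A chromatid is then recombinant for the two flanking loci when it takes part in an odd number of crossovers in the interval. The function name `recombinant_fraction` is hypothetical.

```python
from math import comb

def recombinant_fraction(n_crossovers):
    """Expected fraction of gametes recombinant for two flanking loci.

    Assumes no chromatid interference: a given chromatid takes part in each
    crossover with probability 1/2, and it is recombinant when the number of
    crossovers it takes part in is odd.
    """
    odd_ways = sum(comb(n_crossovers, k) for k in range(1, n_crossovers + 1, 2))
    return odd_ways / 2 ** n_crossovers

for n in range(5):
    print(f"{n} crossover(s): recombinant fraction = {recombinant_fraction(n)}")
```

Under this assumption the fraction is 0 with no crossover and 0.5 for any number of crossovers from one upward, which is why θ cannot exceed 0.5.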
Assortment of alleles at two loci, 1 and 2, when they are located on the same chromosome. A, The loci are far apart and at least one crossover between them is likely to occur in every meiosis. B, The loci are so close together that crossing over between them is very unlikely. C, The loci are close together on the same chromosome but far enough apart that crossing over occurs in the interval between the two loci only in some meioses and not in others. The smaller the recombination frequency, the closer together two loci are.
A common notation for recombination frequency is θ (expressed as a proportion, not a percentage), where θ varies from 0 (no recombination at all) to 0.5 (independent assortment).
Detecting the recombination events between loci requires that (1) a parent be heterozygous (informative) at both loci and (2) we know which allele at locus 1 is on the same chromosome as which allele at locus 2. Alleles on the same homologue are in coupling (or cis), whereas alleles on different homologues are in repulsion (or trans).
Effect of Heterozygosity and Phase on Detecting Recombination Events Possible phases of alleles M and m at a marker locus with alleles D and d at a disease locus
Co-inheritance of the gene for an autosomal dominant form of retinitis pigmentosa, RP9, with marker locus 2 and not with marker locus 1. Only the mother's contribution to the children's genotypes is shown. The mother (I-1) is affected with this dominant disease and is heterozygous at the RP9 locus (Dd) as well as at loci 1 and 2. She carries the A and B alleles on the same chromosome as the mutant RP9 allele (D). The unaffected father is homozygous normal (dd) at the RP9 locus as well as at the two marker loci (AA and BB); his contributions to his offspring are not considered further.
All three affected offspring have inherited the B allele at locus 2 from their mother, whereas the three unaffected offspring have inherited the b allele. Thus, all six offspring are nonrecombinant for RP9 and marker locus 2. However, individuals II-1, II-3, and II-5 are recombinant for RP9 and marker locus 1, indicating that meiotic crossover has occurred between these two loci.
Linkage and Recombination Frequency Linkage is the term used to describe a departure from the independent assortment of two loci, or, in other words, the tendency for alleles at loci that are close together on the same chromosome to be transmitted together, as an intact unit, through meiosis. Analysis of linkage depends on determining the frequency of recombination as a measure of how close different loci are to each other on a chromosome. If two loci are so close together that θ = 0 between them, they are said to be completely linked; if they are so far apart that θ = 0.5, they are assorting independently and are unlinked.
Suppose that among the offspring of informative meioses (i.e., those in which a parent is heterozygous at both loci), 80% of the offspring are nonrecombinant and 20% are recombinant. At first glance, the recombination frequency is therefore 20% (θ = 0.2). However, the accuracy of this measure of θ depends on the size of the family used to make the measurement.
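To see why family size matters, the observed 20% can be treated as a simple binomial proportion. This is a rough sketch, not a formal linkage analysis: the function name `theta_estimate` and the sample sizes are assumptions, and the binomial standard error is only an approximation for small counts.

```python
from math import sqrt

def theta_estimate(recombinants, informative_meioses):
    """Return (theta_hat, standard_error), treating theta as a binomial proportion."""
    theta_hat = recombinants / informative_meioses
    se = sqrt(theta_hat * (1.0 - theta_hat) / informative_meioses)
    return theta_hat, se

# Same observed 20% recombinants, with hypothetical numbers of informative meioses.
for n in (10, 100, 1000):
    theta_hat, se = theta_estimate(round(0.2 * n), n)
    print(f"n={n:<5} theta={theta_hat:.2f} +/- {se:.3f}")
```

With only 10 informative meioses the uncertainty is more than half the size of the estimate itself, whereas with 1000 it is about a tenth of that.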
The map distance between two loci is a theoretical concept that is based on real data, the extent of observed recombination, θ, between the loci. Map distance is measured in units called centimorgans (cM), defined as the genetic length over which, on average, one crossover occurs in 1% of meioses. Therefore, a recombination fraction of 1% (θ = 0.01) translates approximately into a map distance of 1 cM. As the map distance between two loci increases, however, the frequency of recombination we observe between them does not increase proportionately (Fig. 10-7). This is because as the distance between two loci increases, the chance that the chromosome carrying these two markers could undergo more than one crossing over event between these loci also increases. As a rule of thumb, recombination frequency begins to underestimate true genetic distance significantly once θ rises above 0.1.
The relationship between map distance in centimorgans and recombination fraction, θ. Recombination fraction (solid line) and map distance (dotted line) are nearly equal, with 1 cM = 0.01 recombination, for values of genetic distance below 10 cM, but they begin to diverge because of double crossovers as the distance between the markers increases. The recombination fraction approaches a maximum of 0.5 no matter how far apart loci are; the genetic distance increases proportionally to the distance between loci.
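The shape of this relationship can be reproduced with a mapping function. The lecture does not say which function underlies the figure; the sketch below assumes Haldane's map function, which treats crossovers as occurring independently (no interference). The function names are hypothetical.

```python
from math import exp, log

def theta_from_cm(distance_cm):
    """Haldane map function (assumed): recombination fraction from distance in cM."""
    return 0.5 * (1.0 - exp(-2.0 * distance_cm / 100.0))

def cm_from_theta(theta):
    """Inverse Haldane map function; defined for 0 <= theta < 0.5."""
    return -50.0 * log(1.0 - 2.0 * theta)

for d in (1, 5, 10, 25, 50, 100, 200):
    print(f"{d:>4} cM -> theta = {theta_from_cm(d):.4f}")
```

At short distances θ is almost exactly the distance in cM divided by 100; at long distances it flattens toward 0.5 while the map distance keeps growing.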
Genetic maps and physical maps To measure true genetic map distance between two widely spaced loci accurately, therefore, one has to use markers spaced at short genetic distances in the interval between these two loci and add up the values of θ between the intervening markers, since the values of θ between pairs of closely neighboring markers will be good approximations of the genetic distances between them (Fig. 10-8). Genetic length and physical length do not correspond in a constant ratio. For example, human chromosome 1 is the largest human chromosome in physical length (283 Mb) and also has the greatest genetic length, 270 cM (0.95 cM/Mb); the q arm of the smallest chromosome, number 21, is 30 Mb in physical length and 62 cM in genetic length (∼2.1 cM/Mb). Overall, the human genome, which is estimated to contain about 3200 Mb, has a genetic length of 3615 cM, for an average of 1.13 cM/Mb. Furthermore, the ratio of genetic distance to physical length is not uniform along a chromosome as one looks with finer and finer resolution at recombination versus physical length.
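The cM/Mb ratios quoted in this paragraph follow directly from the stated lengths. The short check below uses only the numbers given in the text; the dictionary and function names are hypothetical.

```python
# (genetic length in cM, physical length in Mb), as quoted in the text
regions = {
    "chromosome 1": (270, 283),
    "chromosome 21q": (62, 30),
    "whole genome": (3615, 3200),
}

def cm_per_mb(genetic_cm, physical_mb):
    """Average recombination rate over a region, in cM per Mb."""
    return genetic_cm / physical_mb

for name, (cm, mb) in regions.items():
    print(f"{name}: {cm_per_mb(cm, mb):.2f} cM/Mb")
```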
Diagram showing how adding together short genetic distances, measured as recombination fraction, θ, between neighboring loci A, B, C, and so on allows accurate determination of genetic distance between two loci, A and H, located far apart. The value of θ between A and H is not an accurate measure of genetic distance.
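A small numerical sketch shows why summing short intervals works better than reading θ between A and H directly. Everything here is assumed for illustration: eight markers A to H spaced 5 cM apart, and Haldane's map function (independent crossovers) to convert true distance into the θ one would observe.

```python
from math import exp

def haldane_theta(distance_cm):
    """Assumed Haldane map function: expected theta for a true distance in cM."""
    return 0.5 * (1.0 - exp(-2.0 * distance_cm / 100.0))

# Hypothetical spacing: markers A..H with 5 cM between each pair of neighbors.
interval_cm = [5.0] * 7
true_cm = sum(interval_cm)  # 35 cM by construction

# Estimate 1: add up theta between neighboring markers, reading each as cM.
summed_cm = 100.0 * sum(haldane_theta(d) for d in interval_cm)

# Estimate 2: read theta between A and H directly as cM.
direct_cm = 100.0 * haldane_theta(true_cm)

print(f"true distance        : {true_cm:.1f} cM")
print(f"sum of short thetas  : {summed_cm:.1f} cM")
print(f"direct theta (A to H): {direct_cm:.1f} cM")
```

In this example the summed estimate falls within a couple of cM of the true distance, while the direct A to H value falls short by roughly 10 cM because multiple crossovers go undetected.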
Sex Differences in Map Distances Just as male and female gametogenesis shows sex differences in the types of mutations and their frequencies, there are also significant differences in recombination between males and females. Across all chromosomes, the genetic length in females, 4460 cM, is 72% greater than the genetic distance of 2590 cM in males, and it is consistently about 70% greater in females on each of the different autosomes. The reason for increased recombination in females compared with males is unknown, although one might speculate that it has to do with the many years that female gamete precursors remain in meiosis I before ovulation.
Linkage Equilibrium and Disequilibrium When a disease allele first enters the population (by mutation or a founder), the particular set of alleles at markers linked to the disease locus constitutes a disease-containing haplotype. The degree to which this haplotype will persist as such over time depends on the probability of recombination.
The speed with which recombination will move the disease allele onto a new haplotype depends on two main factors: (1) the number of generations, and therefore the number of opportunities for recombination, and (2) the frequency of recombination between the loci. A third factor, selection for or against a particular haplotype, could also play a role, but its effect has been difficult to prove in humans.
A, With each generation, meiotic recombination exchanges the alleles that were initially present at polymorphic loci on a chromosome on which a disease-associated mutation arose for other alleles present on the homologous chromosome. Over many generations, the only alleles that remain in coupling phase with the mutation are those at loci so close to the mutant locus that recombination between the loci is very rare. These alleles are in linkage disequilibrium with the mutation and constitute a disease-associated haplotype. B, Affected individuals in the current generation (arrows) carry the mutation in linkage disequilibrium with the disease-associated haplotype (filled-in solid blue symbols). Depending on the age of the mutation and other population genetic factors, a disease-associated haplotype ordinarily spans a region of DNA of a few kb to a few hundred kb.
The shorter the time since the disease allele appeared and the smaller the value of θ, the greater is the chance that the disease-containing haplotype will persist intact. With longer time periods and greater values of θ, shuffling will go to completion, and the frequencies of marker alleles in the haplotype that includes the disease allele will come to equal the frequencies of these marker alleles in all chromosomes in the population; that is, the alleles in the haplotype will have reached linkage equilibrium.
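The combined effect of time and θ can be sketched with the standard population-genetics approximation that disequilibrium shrinks by a factor of (1 − θ) each generation, D_t = D_0(1 − θ)^t. This formula is not stated in the lecture; it is brought in here as an assumption and presumes a large, randomly mating population with no selection. The function name `ld_remaining` is hypothetical.

```python
def ld_remaining(theta, generations, d0=1.0):
    """Disequilibrium left after t generations: D_t = D_0 * (1 - theta) ** t (assumed model)."""
    return d0 * (1.0 - theta) ** generations

for theta in (0.0001, 0.001, 0.01, 0.1):
    row = ", ".join(f"t={t}: {ld_remaining(theta, t):.3f}" for t in (10, 100, 1000))
    print(f"theta={theta:<7} {row}")
```

Tightly linked markers (small θ) keep most of their association with the disease allele for hundreds of generations, whereas loosely linked markers lose it within a few dozen.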
The Haplotype Map (HapMap) One of the biggest human genomics efforts to follow completion of the genome sequence is a project designed to create a haplotype map (HapMap) of the genome. The goal of the HapMap project is to make LD measurements between a dense collection of millions of single nucleotide polymorphisms (SNPs) throughout the genome. To accomplish this goal, geneticists collected and characterized millions of SNP loci, developed methods to genotype them rapidly and inexpensively, and used them, one pair at a time, to measure LD between neighboring markers throughout the genome. The measurements were made in samples that included both unrelated population samples and samples containing one child and both parents, obtained from four geographically distinct groups: a primarily European population, a West African population, a Han Chinese population, and a population from Japan.
The study showed that: 1) More than 90% of all SNPs are shared among such geographically disparate populations, with allele frequencies that are quite similar in the different populations. This finding indicates that most SNPs are old and predate the waves of emigration out of East Africa that populated the rest of the world. Differences in allele frequencies in a small fraction of SNPs may be the result of either genetic drift/founder effect or selection in localized geographical regions after migration out of Africa.
Such SNPs, termed ancestry informative markers, are used in studies of human origin, migration, and gene flow, and in forensic investigations to determine the likely ethnic background when the only available evidence is DNA.
Question: Are all SNPs located in non-coding DNA?
2) When pairwise measurements of LD were made for neighboring SNPs across the genome, contiguous SNPs could be grouped into clusters of varying size in which the SNPs in any one cluster show high levels of LD with each other. These clusters of SNPs in high LD, located across segments of a few kb to a few dozen kb, are termed LD blocks. The sizes of LD blocks are not identical in all populations. African populations have smaller blocks than other populations.
A 145-kb region of chromosome 4 containing 14 SNPs. In cluster 1, containing SNPs 1 through 9, five of the 2^9 = 512 theoretically possible haplotypes are responsible for 98% of all the haplotypes in the population, reflecting substantial linkage disequilibrium among these SNP loci. Similarly, in cluster 2, only three of the 2^4 = 16 theoretically possible haplotypes involving SNPs 11 to 14 represent 99% of all the haplotypes found. In contrast, alleles at SNP 10 are found in linkage equilibrium with the SNPs in cluster 1 and cluster 2.
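The haplotype counts in this caption come from the fact that each SNP has two alleles, so n SNPs can in principle combine in 2 to the power n ways. A minimal check, using only the SNP counts given in the caption (the function name is hypothetical):

```python
def possible_haplotypes(n_biallelic_snps):
    """Each biallelic SNP has two alleles, so n SNPs allow 2**n combinations."""
    return 2 ** n_biallelic_snps

# (label, number of SNPs, number of haplotypes that account for nearly all chromosomes)
for label, n_snps, n_common in (("cluster 1", 9, 5), ("cluster 2", 4, 3)):
    total = possible_haplotypes(n_snps)
    print(f"{label}: {n_common} of {total} possible haplotypes predominate")
```

If the SNPs in a cluster were in linkage equilibrium, many more of the possible combinations would be expected at appreciable frequency; seeing only a handful is the signature of strong LD.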
3) Pairwise measurements of recombination between closely neighboring SNPs revealed that the ratio of map distance to base pairs was not constant at ~1 cM/Mb but instead ranged from far below 0.01 cM/Mb to more than 60 cM/Mb. This indicates that the rate of recombination between polymorphic markers, which was thought to be uniform, is in fact the result of an averaging of “hotspots” of recombination interspersed among regions of little or no recombination.
B, A schematic diagram in which each box contains the pairwise measurement of the degree of linkage disequilibrium between two SNPs (e.g., the arrow points to the box, outlined in black, containing the value of D' for SNPs 2 and 7). The higher the degree of LD, the darker the color in the box, with maximum D' values of 1.0 occurring when there is complete LD. Two LD blocks are detectable, the first containing SNPs 1 through 9, and the second SNPs 11 through 14. In the first block, pairwise measurements of D' reveal LD. A similar level of LD is found in block 2. Between blocks, the 14-kb region containing SNP 10 shows no LD with neighboring SNPs 9 or 11 or with any of the other SNP loci. Below is a graph of the ratio of map distance to physical distance (cM/Mb) showing that a recombination hotspot is present in the region around SNP 10 between the two blocks, with values of recombination that are 50- to 60-fold above the average of approximately 1.13 cM/Mb for the genome.
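The D' value in each box of such a diagram can be computed from haplotype and allele frequencies. The sketch below uses the usual definition, D = p(AB) − p(A)p(B), scaled by its maximum possible magnitude given the allele frequencies, and reports the absolute value so that 1.0 means complete LD. The function name and the example frequencies are hypothetical, not HapMap data.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute normalized disequilibrium |D'| between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max

# Hypothetical frequencies for illustration only.
print(d_prime(p_ab=0.5, p_a=0.5, p_b=0.5))    # A always travels with B
print(d_prime(p_ab=0.25, p_a=0.5, p_b=0.5))   # A and B combine at random
print(d_prime(p_ab=0.4, p_a=0.5, p_b=0.5))    # intermediate association
```

The first case corresponds to a dark box within an LD block, and the second to the pale boxes seen between SNP 10 and its neighbors.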