Published by Iris Tucker. Modified over 8 years ago.
1
Genome Biology and Biotechnology 4. The variable human genome Prof. M. Zabeau Department of Plant Systems Biology Flanders Interuniversity Institute for Biotechnology (VIB) University of Gent International course 2005
2
Summary ¤Sequence Variations in the Human Genome ¤Haplotype structure of the sequence variations in the human genome –linkage disequilibrium in the human genome –haplotype blocks in the human genome ¤The haplotype map of the human genome –Map of all the genetic variations in the human population
3
Sequence Variations in the Human Genome ¤Most human sequence variation (>90%) results from –SNPs (single nucleotide polymorphisms) –SNPs are the result of very rare replication errors in which a wrong base remains incorporated in the newly synthesized strand ¤Human sequence variation is responsible for –Phenotypic variation between individuals –Influencing the risk of common human diseases
4
The International HapMap Consortium et al., Nature 437, 1299 (2005) Root causes of common human diseases ¤Causes of common human diseases are largely unknown –preventative measures are generally inadequate –available treatments are seldom curative ¤Family history is one of the strongest risk factors for nearly all diseases –cardiovascular disease, cancer, diabetes, autoimmunity, psychiatric illnesses and many others –inherited genetic variation has an important role in the pathogenesis of disease ¤Identifying the causal genes and variants represents an important step towards –improved prevention, diagnosis and treatment of disease
5
The International HapMap Consortium et al., Nature 437, 1299 (2005) Heritable human diseases ¤Rare, highly heritable 'mendelian' disorders –more than 1,000 genes have been identified –variation in a single gene causes disease ¤Common human diseases –are thought to be due to the combined effect of many different susceptibility DNA variants interacting with environmental factors –have proven much more challenging to study
6
The International HapMap Consortium et al., Nature 437, 1299 (2005) Common human diseases ¤Studies of common diseases fall into 2 broad classes –family-based linkage studies across the entire genome: linkage analysis has low power except when a single locus explains a substantial fraction of disease –population-based association studies of candidate genes: association studies examine only a small fraction of the 'universe' of sequence variation in each patient ¤Comprehensive search for genetic influences –requires examining all genetic differences in a large number of affected individuals and controls, either by complete genome resequencing or by systematically testing common genetic variants
7
The International HapMap Consortium et al., Nature 437, 1299 (2005) Common genetic variants ¤Common genetic variants –explain much of the genetic diversity in our species –a consequence of the historically small size and shared ancestry of the human population ¤Common variants with an important role in disease –HLA: autoimmunity and infection –APOE4: Alzheimer's disease, lipids –Factor V Leiden: deep vein thrombosis –PPARG (encoding PPARγ): type 2 diabetes –KCNJ11: type 2 diabetes –PTPN22: rheumatoid arthritis and type 1 diabetes –CTLA4: autoimmune thyroid disease, type 1 diabetes –NOD2: inflammatory bowel disease –complement factor H: age-related macular degeneration –RET: Hirschsprung disease
8
Sequence Variations in the Human Genome ¤Most human sequence variation (>90%) results from –SNPs (single nucleotide polymorphisms) –SNPs occur on average every 1,000 bases when the sequences of two human individuals are compared –The remainder of the human sequence variation is attributable to insertions or deletions of one or more bases, repeat length polymorphisms and rearrangements ¤SNPs are well suited to automated, high-throughput and low-cost genotyping –SNPs are binary and can thus easily be typed –SNPs have a low rate of recurrent mutation –SNPs are present at sufficient density for comprehensive genetic analysis
9
High-throughput SNP Genotyping Methods ¤Primer extension –Primer designed adjacent to the SNP, extended, and the extension product analyzed by fluorescence or mass spectrometry ¤Oligonucleotide ligation –Ligation requires perfect base pairing of the terminal nucleotides ¤Array-based hybridization –high-density Affymetrix microarrays: 25-mer oligonucleotides are perfectly suited to discriminate SNP alleles –Latest product: 500,000 SNP array
10
Haplotype structure of the sequence variations ¤Human genetic diversity appears to be –Limited: a small number of common polymorphisms explain the bulk of the observed variation, i.e. are found in most individuals in the population –Structured: specific combinations of alleles (haplotypes) are observed at closely linked sites [Figure: SNPs defining Haplotype 1, Haplotype 2 and Haplotype 3, related by recombination]
11
Haplotype Structure of the Sequence Variations ¤At a macroscopic scale (chromosome): linkage equilibrium –recurrent recombination results in complete linkage equilibrium, i.e. random combinations of SNPs [Figure: recombination from the 1st generation through N generations; N recombination events give a random assortment of SNPs]
12
Haplotype Structure of the Sequence Variations ¤At a microscopic scale (gene): linkage disequilibrium –Non-random recombination results in linkage disequilibrium, i.e. non-random combinations of SNPs: haplotype blocks [Figure: haplotype blocks persisting from the 1st generation through N generations]
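The contrast between the chromosome scale and the gene scale can be sketched numerically. The sketch below uses the standard population-genetics result, which is not stated on the slides, that under random mating the disequilibrium coefficient D between two sites shrinks by a factor (1 − c) per generation, where c is the recombination fraction between them. The function names and the example values of c are illustrative assumptions.

```python
import math


def ld_after(d0: float, c: float, generations: int) -> float:
    """Expected disequilibrium D after the given number of generations.

    d0 is the starting value of D and c is the recombination fraction
    between the two sites (0 <= c <= 0.5). Assumes random mating and
    no selection, mutation or drift.
    """
    return d0 * (1.0 - c) ** generations


def generations_to_halve(c: float) -> float:
    """Number of generations after which D has dropped to half (needs 0 < c < 1)."""
    return math.log(0.5) / math.log(1.0 - c)


if __name__ == "__main__":
    # Unlinked sites (c = 0.5) versus tightly linked sites (c = 0.0001, assumed).
    for c in (0.5, 0.0001):
        print(c, ld_after(0.25, c, 100), generations_to_halve(c))
```

With c = 0.5 the association halves every generation, so distant sites reach equilibrium quickly; with a very small c it takes thousands of generations, which is why closely linked SNPs can remain in haplotype blocks.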
13
Linkage disequilibrium in the human genome ¤Landmark paper presenting –a systematic analysis of the extent of linkage disequilibrium in the human genome –a large-scale experiment to measure linkage disequilibrium (LD) in 19 randomly selected genomic regions in two populations: a United States population of north-European descent and a Nigerian population Reich et al., Nature 411, 199 (2001)
14
Reprinted from: Reich et al., Nature 411, 199 (2001) Experimental Approach ¤Selected 19 high-frequency or common SNPs in genes as core SNPs –High-frequency SNPs tend to be common in all populations, facilitating cross-population comparisons –Linkage disequilibrium around common alleles can be measured with a modest sample size of 80–100 chromosomes –Linkage disequilibrium around common alleles represents a 'worst case' scenario: such alleles are generally old and there has been ample historical opportunity for recombination to break down ancestral haplotypes
15
Reprinted from: Reich et al., Nature 411, 199 (2001) Experimental Approach ¤High-frequency SNPs were identified at various distances from the core SNPs –Re-sequenced regions of ~2 kb at 0, 5, 10, 20, 40, 80 and 160 kb from the core SNP in 44 unrelated individuals from Utah –Identified a total of 272 'high frequency' polymorphisms –Measured linkage disequilibrium between two SNPs using the classical statistic D' –D' = observed linkage / maximal linkage: $D' = |D| / D_{\max}$, where $D = P_{ab} - P_a P_b$ and $D_{\max}$ is the largest value of $|D|$ possible for the allele frequencies $P_a$ and $P_b$ [Figure: resequenced regions at 10, 20, 40, 80 and 160 kb from the core SNP]
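As a minimal sketch of the D' statistic named on this slide, the function below computes |D'| for two biallelic SNPs from the frequency of the haplotype carrying allele a and allele b (p_ab) and the two allele frequencies (p_a, p_b). It follows the usual textbook definition of D and its maximum; the function name and the example frequencies are assumptions for illustration, not values from Reich et al.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Return |D'| for two biallelic sites.

    p_ab: frequency of haplotypes carrying allele a at site 1 and b at site 2
    p_a, p_b: frequencies of allele a and allele b
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic site carries no LD information
    return abs(d) / d_max


if __name__ == "__main__":
    print(d_prime(0.5, 0.5, 0.5))   # alleles always travel together
    print(d_prime(0.25, 0.5, 0.5))  # alleles combine at random
    print(d_prime(0.4, 0.5, 0.5))   # partial association
```

A value of 1 means at most three of the four possible haplotypes are present (no evidence of historical recombination between the sites), and a value of 0 means the alleles are combined at random.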
16
Reprinted from: Reich et al., Nature 411, 199 (2001) Observed Linkage Disequilibrium ¤Linkage disequilibrium has a half-length of ~60 kb –linkage disequilibrium extends much (10-fold) further than previously predicted
17
Reprinted from: Reich et al., Nature 411, 199 (2001) Why does linkage disequilibrium extend so far? ¤Long-range linkage disequilibrium can be explained by –an extreme founder effect or population bottleneck: a period when the population was so small that a few ancestral haplotypes gave rise to the present-day haplotypes ¤Linkage disequilibrium in different populations –short-range linkage disequilibrium is general in sub-Saharan African populations –long-range linkage disequilibrium is typical for northern Europeans: a severe bottleneck in the European population could have generated the linkage disequilibrium
18
Origin of linkage disequilibrium? ¤The bottleneck could be specific to northern Europe –Europe was substantially depopulated during the Last Glacial Maximum (30,000–15,000 years ago), and subsequently recolonized by a small number of founders –Long-range linkage disequilibrium would then be absent in other non-African populations ¤The bottleneck could be more global –Result of the dispersal of modern humans from Africa 50,000 years ago –Long-range linkage disequilibrium would then be present in a variety of non-African populations Reprinted from: Reich et al., Nature 411, 199 (2001)
19
High-resolution Haplotype Structure in the Human Genome ¤Landmark paper presenting –High-resolution analysis of the haplotype structure across a 500 kb region on chromosome 5q31 –Genotyped 103 common SNPs in 129 trios from a European-derived population –Low marker density of 1 SNP roughly every 5 kb –First high-resolution picture of the patterns of genetic variation across a large genomic region Daly et al., Nature Genet. 29, 229 (2001)
20
Block-like Haplotype Diversity at 5q31 Reprinted from: Daly et al., Nature Genet. 29, 229 (2001)
21
Block-like Haplotype Diversity at 5q31 ¤The common SNPs are arranged in haplotype blocks that –span up to 100 kb –contain multiple (five or more) common SNPs –have only a few (2–4) haplotypes, which account for the majority of chromosomes (>90%) in the sample and show no evidence of being derived from one another by recombination Reprinted from: Daly et al., Nature Genet. 29, 229 (2001)
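The statement that a few haplotypes account for most chromosomes in a block can be checked with a simple count. The sketch below assumes each phased chromosome is given as a string of alleles across the SNPs of one block; the function name and the 20 example chromosomes are made up for illustration and are not data from Daly et al.

```python
from collections import Counter


def common_haplotype_coverage(haplotypes: list[str], top_n: int) -> float:
    """Fraction of chromosomes carrying one of the top_n most frequent haplotypes."""
    counts = Counter(haplotypes)
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / len(haplotypes)


if __name__ == "__main__":
    # Hypothetical block of 4 SNPs typed on 20 chromosomes.
    block = ["AAGC"] * 9 + ["GTGC"] * 6 + ["ATAC"] * 4 + ["GAAT"] * 1
    print(common_haplotype_coverage(block, 3))
```

In this toy block the three most frequent haplotypes cover 19 of the 20 chromosomes, the kind of low diversity the slide describes.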
22
Block-like Haplotype Diversity at 5q31 ¤The haplotype blocks are separated by intervals –in which several independent historical recombination events seem to have occurred ¤The historical recombination events are clustered –multiple exchanges between most blocks –little or no recombination within blocks –The clustering of recombination events is suggestive of local hotspots of recombination [Figure: historical recombination events] Reprinted from: Daly et al., Nature Genet. 29, 229 (2001)
23
Implications of Haplotype blocks ¤Once the haplotype blocks are identified –they can be treated as alleles in genome-wide association studies to find medically relevant variation: the holy grail of pharmacogenetics –a subset of SNPs, the haplotype tag SNPs (htSNPs), can be used to uniquely distinguish the common haplotypes in each block –A subset of all the SNPs is therefore sufficient for whole-genome association analysis Reprinted from: Daly et al., Nature Genet. 29, 229 (2001)
24
Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21 ¤Landmark paper presenting –the haplotype structure of chromosome 21 –Used high-density oligonucleotide arrays, in combination with somatic cell genetics, to identify the common SNPs on human chromosome 21 and to directly observe the haplotype structure defined by these SNPs –This structure reveals blocks of limited haplotype diversity in which more than 80% of a global human sample can typically be characterized by only three common haplotypes Patil et al., Science, 294: 1719 (2001)
25
Experimental Approach ¤Discovered chr 21 SNPs and determined the haplotype structure using –ultra high-density oligonucleotide arrays –in combination with somatic cell genetics ¤SNP discovery –Using a public panel of 24 ethnically diverse individuals (African, Asian, and Caucasian) –Physically separated the two chr 21 copies from each individual using a rodent-human somatic cell hybrid technique –Analyzed 20 independent copies of chromosome 21 ¤Since SNPs are characterized on haploid copies –they directly reveal haplotypes –The SNPs of chromosome 21 reveal numerous haplotype blocks Reprinted from: Patil et al., Science, 294: 1719 (2001)
26
Haplotype Block Defined by 14 Common SNPs [Figure: block of consecutive common SNPs by nucleotide position on chromosome 21; individual chromosomes (15/20); major and minor alleles; haplotype blocks 1–6] Reprinted from: Patil et al., Science, 294: 1719 (2001)
27
Haplotype Block: selection of tag SNPs [Figure: haplotype patterns 1–4; SNPs selected for genotyping; 4 common haplotypes] Reprinted from: Patil et al., Science, 294: 1719 (2001)
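The idea of selecting tag SNPs for a block can be sketched as a search for the smallest set of SNP positions whose alleles still tell all common haplotypes apart. This is a brute-force illustration under stated assumptions, not the selection procedure of Patil et al.; the function name and the four example haplotypes are hypothetical.

```python
from itertools import combinations


def select_tag_snps(haplotypes: list[str]) -> tuple[int, ...]:
    """Return the first smallest set of SNP indices that distinguishes all haplotypes.

    Each haplotype is a string with one allele per SNP in the block.
    The search tries subsets in order of increasing size, so it is only
    practical for the small number of SNPs found in a single block.
    """
    n_sites = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            patterns = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(patterns) == n_distinct:
                return sites
    return tuple(range(n_sites))


if __name__ == "__main__":
    # Four hypothetical common haplotypes over a block of 6 SNPs.
    common = ["ACGTAC", "ACGTGT", "GTGTAC", "GTATGT"]
    print(select_tag_snps(common))  # indices are 0-based
```

In this example two of the six SNPs are enough to identify each of the four haplotypes, so only those two would need to be genotyped in the block.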
28
Inventories of human genome sequence variation ¤The first inventory of SNPs was made by –The public Human Genome Project (HGP): 971,077 candidate SNPs were identified as sequence differences in regions of sequence overlap between large-insert clones –The SNP Consortium (TSC), a public/private consortium: 1,023,950 candidate SNPs were discovered in a publicly available panel of 24 ethnically diverse individuals by shotgun sequencing of genomic fragments and aligning to the genome sequence ¤First inventory (2001) comprised 1.4 million SNPs –Average density of one SNP every 1.91 kb –SNPs primarily in regions surrounding genes –An estimated 60,000 exonic SNPs in the collection The International SNP Map Working Group, Nature 409, 928 (2001)
29
Human genome sequence variation ¤It is estimated that the world's human population has –about 10 million "common" SNPs, i.e. with a minor allele frequency of 1% or more: one variant per 300 bases on average –these 10 million common SNPs constitute 90% of the variation in the world population –The remaining 10% of the variation is due to a large number of SNPs that are rare in the population; these may represent another 30 million SNPs ¤Next frontier in the human genome –Complete inventory of the common SNPs –Complete map of the common SNPs: The HapMap project The International SNP Map Working Group, Nature 409, 928 (2001)
30
The International HapMap Project ¤The goal of the International HapMap Project is to –determine the common patterns of DNA sequence variation in the human genome and –make this information freely available in the public domain ¤The HapMap will –allow the discovery of sequence variants that affect common disease –facilitate development of diagnostic tools –enhance our ability to choose targets for therapeutic intervention The International HapMap Consortium, Nature 426, 789 - 796 (2003)
31
The International HapMap Project ¤Determine haplotype patterns across the genome –5 million common sequence variants genotyped in 270 DNA samples from populations of Africa, Asia and Europe –Common SNPs are found in all populations –Project includes several populations from different geographic locations: Yoruba, Japanese and Chinese individuals, and individuals with ancestry from Northern and Western Europe ¤Genotyping strategy –Phase I: initial round of genotyping of 1,000,000 SNPs in the 270 DNA samples (completed December 2004) –Phase II: genotyping of 5 million SNPs at ~1-kilobase intervals in the 270 individuals (completed November 2005)
32
The International HapMap Project ¤The extent of association between nearby markers –varies dramatically across the genome –the patterns of association must be empirically determined for efficient selection of tag SNPs. ¤On the basis of empirical studies it is estimated that –most of the information about genetic variation represented by the 10 million common SNPs in the population could be provided by genotyping 200,000 to 1,000,000 tag SNPs across the genome –Thus, a substantial reduction in the amount of genotyping can be obtained with little loss of information, by using knowledge of the LD present in the genome.
33
Perspectives ¤For the full potential of the HapMap to be realized –The genotyping technology must become more cost efficient, and the analysis methods must be improved –Pilot studies with other populations must be completed to confirm that the HapMap is generally applicable ¤Genome-wide association projects must establish –carefully phenotyped sets of affected and unaffected individuals for many common diseases, in a way that preserves confidentiality and retains detailed clinical and environmental exposure data ¤Careful attention must also be paid to the ethical issues that –will be raised by the HapMap and the studies that will use it –the challenge is to avoid misinterpretations or misuses of results from studies that use the HapMap
34
Whole-Genome Patterns of Common DNA Variation in Three Human Populations ¤Paper presents –Whole-genome patterns of common human DNA variation, obtained by genotyping 1,586,383 SNPs in 71 Americans of European, African, and Asian ancestry –A different approach to represent the structure of genetic variation: LD bins, i.e. clusters of tightly linked SNPs Hinds et al., Science. 307: 1072-1079 (2005)
35
Reprinted from: Hinds et al., Science. 307: 1072-1079 (2005) Extended LD bin and haplotype block structure around the CFTR gene
36
Conclusion ¤The 1.5 million SNPs capture –most common human genetic variation as a result of linkage disequilibrium: the strong correlation among common SNP alleles that define haplotypes ¤Strong correlation between –extended regions of linkage disequilibrium and –functional genomic elements ¤First generation haplotype map provides a tool for –exploring the causal role of common human DNA variation in complex human traits –investigating the nature of genetic variation within and between human populations Reprinted from: Hinds et al., Science. 307: 1072-1079 (2005)
37
A haplotype map of the human genome ¤Paper presents –A map of >1 million SNPs for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations –The data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium, low haplotype diversity, and substantial correlations of SNPs with many of their neighbours The International HapMap Consortium et al., Nature 437, 1299 (2005)
38
Number of SNPs in dbSNP over time ¤Public database dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) –October of 2005: 10.4 million RefSNP clusters –4.8 million validated SNPs
39
The International HapMap Consortium et al., Nature 437, 1299 (2005) Genealogical relationships among haplotypes
40
The International HapMap Consortium et al., Nature 437, 1299 (2005) Length of LD spans
41
The International HapMap Consortium et al., Nature 437, 1299 (2005) Conclusions ¤The phase I haplotype map documents the generality of –block-like structure of linkage disequilibrium –low haplotype diversity –recombination hotspots –substantial correlations of SNPs with many of their neighbours ¤Important applications of the HapMap data are to –make possible comprehensive, genome-wide association studies –identify the root causes of common diseases
42
Recommended reading ¤Human Haplotype Map –The Structure of Haplotype Blocks in the Human Genome: Daly et al., Nature Genet. 29, 229 (2001) –The human HapMap project: The International HapMap Consortium, Nature 426, 789 - 796 (2003) –Haplotype map of the human genome: The International HapMap Consortium et al., Nature 437, 1299 (2005)
43
Further reading ¤Sequence variations in the human genome –A map of human genome sequence variation: The International SNP Map Working Group, Nature 409, 928 (2001) ¤Haplotype structure of the sequence variations in the human genome –Linkage disequilibrium in the human genome: Reich et al., Nature 411, 199 (2001) –Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21: Patil et al., Science, 294: 1719 (2001) –First generation human haplotype map: Hinds et al., Science. 307: 1072-1079 (2005)