1
Workshop in Bioinformatics Eran Halperin
2
The Human Genome Project
“What we are announcing today is that we have reached a milestone…that is, covering the genome in…a working draft of the human sequence.”
“But our work previously has shown… that having one genetic code is important, but it's not all that useful.”
“I would be willing to make a prediction that within 10 years, we will have the potential of offering any of you the opportunity to find out what particular genetic conditions you may be at increased risk for…”
Washington, DC, June 26, 2000
3
The Vision of Personalized Medicine Genetic and epigenetic variants + measurable environmental/behavioral factors would be used for a personalized treatment and diagnosis
4
Example: Warfarin An anticoagulant drug, useful in the prevention of thrombosis.
5
Example: Warfarin
Warfarin was originally used as rat poison.
The optimal dose varies across the population.
Genetic variants (in VKORC1 and CYP2C9) affect the optimal dose for each individual.
6
Association Studies
7
[Figure: aligned DNA sequences for cases and controls, with the associated SNP marked.]
Where should we look?
SNP = Single Nucleotide Polymorphism. Usually SNPs are bi-allelic.
8
Published Genome-Wide Associations through 6/2009: 439 published GWA at p < 5 × 10^-8
NHGRI GWA Catalog, www.genome.gov/GWAStudies
10
Complex disease: influenced by environmental factors and genetic factors.
Multiple genes may affect the disease. Therefore, the effect of every single gene may be negligible.
11
How does it work?
For every SNP we can construct a contingency table of allele counts:

            A    G    Total
Cases       a    b    n
Controls    c    d    n
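A contingency table like this is commonly tested with a chi-square test of independence. The slides do not name the test, so the following is only a minimal sketch under that assumption. The function name `chi_square_2x2` is hypothetical, and the counts in the usage line are made up for illustration.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    a, b: counts of alleles A and G in cases
    c, d: counts of alleles A and G in controls
    Returns (statistic, p_value).
    """
    total = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must have a non-zero total")
    statistic = total * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Illustrative counts only (not from the slides).
stat, p = chi_square_2x2(60, 40, 40, 60)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

In a genome-wide scan this test would be repeated once per SNP, which is what produces the per-SNP p-values shown in a Manhattan plot.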
12
Results: Manhattan Plots
13
The curse of dimensionality – corrections of multiple testing In a typical Genome-Wide Association Study (GWAS), we test millions of SNPs. If we set the p-value threshold for each test to be 0.05, by chance we will “find” about 5% of the SNPs to be associated with the disease. This needs to be corrected.
14
Bonferroni Correction
If the number of tests is n, we set the threshold to be 0.05/n.
A very conservative test.
If the tests are independent then it is reasonable to use it.
If the tests are correlated this could be bad:
–Example: If all SNPs are identical, there is effectively a single test, yet the threshold is still divided by n; the false positive rate drops, but so does the power.
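The arithmetic behind the correction can be sketched in a few lines. The function names below are hypothetical, and the p-values in the usage example are invented for illustration.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of null SNPs passing an uncorrected threshold alpha."""
    return alpha * n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at most alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Indices of the tests whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]

# One million SNPs tested at 0.05 each:
print(expected_false_positives(0.05, 1_000_000))  # uncorrected false hits expected by chance
print(bonferroni_threshold(0.05, 1_000_000))      # corrected per-SNP threshold

# Invented p-values for four tests; the threshold is 0.05 / 4 = 0.0125.
print(significant_after_bonferroni([0.04, 0.001, 0.5, 0.012]))
```

With one million tests the corrected threshold is 0.05 / 1,000,000 = 5 × 10^-8.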
15
Data
16
HUJI 2006 The International HapMap Project: an international consortium that aims to genotype 270 individuals from four different populations.
17
- Launched in 2002.
- First phase (2005): ~1 million SNPs for 270 individuals from four populations
- Second phase (2007): ~3.1 million SNPs for 270 individuals from four populations
- Third phase (ongoing): > 1 million SNPs for 1115 individuals across 11 populations
18
Other Data Sources
Human Genome Diversity Project
–50 populations, 1000 individuals, 650k SNPs
POPRES
–6000 individuals (controls)
ENCODE Project
–Resequencing, discovery of new SNPs
1000 Genomes Project
dbGaP
19
Haplotypes
20
Haplotypes
Can 1,000,000 SNPs tell us everything? No, but they can still tell us a lot about the rest of the genome.
–SNPs in physical proximity are correlated.
–A sequence of alleles along a chromosome is called a haplotype.
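One common way to quantify the correlation between two nearby SNPs is the r² measure of linkage disequilibrium. The slides do not name a specific measure, so r² is an assumption here. The function name `ld_r_squared` and the toy haplotypes are hypothetical.

```python
def ld_r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between SNPs i and j.

    haplotypes: list of equal-length strings over '0'/'1', one per chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n    # frequency of allele 1 at SNP i
    p_b = sum(h[j] == "1" for h in haplotypes) / n    # frequency of allele 1 at SNP j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    return d * d / denominator

# Toy data: the two SNPs always travel together, so one predicts the other.
print(ld_r_squared(["00", "00", "11", "11"], 0, 1))
# Toy data: all four combinations are equally common, so the SNPs are uncorrelated.
print(ld_r_squared(["00", "01", "10", "11"], 0, 1))
```

When r² is high, genotyping one SNP is close to genotyping the other, which is why a limited SNP panel can still be informative about untyped variants.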
21
Haplotype Data in a Block (Daly et al., 2001) Block 6 from Chromosome 5q31
22
LD structure
23
Phasing - haplotype inference
Cost effective genotyping technology gives genotypes and not haplotypes.
Haplotypes (mother chromosome, father chromosome): ATCCGA, AGACGC
Genotype: A, {G,T}, {A,C}, C, G, {A,C}
Possible phases:
ATCCGA / AGACGC
ATACGA / AGCCGC
AGACGA / ATCCGC
….
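To see why phasing is ambiguous, the haplotype pairs consistent with a genotype can be enumerated directly. This is a minimal sketch; the function name `possible_phases` and the two-letter-per-site genotype encoding are assumptions made for the example.

```python
from itertools import product

def possible_phases(genotype):
    """List every unordered haplotype pair consistent with an unphased genotype.

    genotype: list of two-character strings, one per site, e.g. "AA" for a
    homozygous site and "GT" for a heterozygous one.
    """
    het_sites = [i for i, site in enumerate(genotype) if site[0] != site[1]]
    phases = []
    # Fix the orientation of the first heterozygous site so that each
    # unordered pair is produced exactly once.
    for flips in product([False, True], repeat=max(len(het_sites) - 1, 0)):
        flipped = dict(zip(het_sites[1:], flips))
        first, second = [], []
        for i, site in enumerate(genotype):
            a, b = site[0], site[1]
            if flipped.get(i, False):
                a, b = b, a
            first.append(a)
            second.append(b)
        phases.append(("".join(first), "".join(second)))
    return phases

# The genotype from the slide: three heterozygous sites give 2^(3-1) = 4 phases.
for pair in possible_phases(["AA", "GT", "AC", "CC", "GG", "AC"]):
    print(pair)
```

A genotype with k heterozygous sites has 2^(k-1) possible phases, so brute-force enumeration is only feasible for short regions.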
24
Haplotype Frequencies via Perfect Phylogeny
[Figure: perfect phylogeny tree over the haplotypes 00000, 01000, 01001, 11000, 11100, 11110; the edges correspond to SNPs 1-5, with allele frequencies p_1, ..., p_5.]
p_1, p_2, p_3, p_4, p_5 can be computed from the genotypes/pools by counting.
Haplotype frequencies are then given by, for example, $f_{01000} = p_2 - p_1 - p_5$.
[Kirkpatrick, Santos, Karp, H.]
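The frequency rule can be sketched as a tree traversal: a haplotype's frequency is the frequency of the SNP on the edge leading into it, minus the frequencies of the SNPs on the edges leaving it. The tree below is read off the six haplotypes listed on the slide, which is an assumption about the missing figure. The allele frequencies are invented, and `haplotype_frequencies` is a hypothetical name.

```python
from fractions import Fraction

# Each edge is (SNP index, child haplotype). Assumed tree: the child differs
# from its parent only at the SNP that labels the edge.
TREE = {
    "00000": [(2, "01000")],
    "01000": [(1, "11000"), (5, "01001")],
    "11000": [(3, "11100")],
    "11100": [(4, "11110")],
}

def haplotype_frequencies(tree, root, p):
    """Map each haplotype to its frequency, given SNP allele frequencies p."""
    freqs = {}
    stack = [(root, Fraction(1))]          # the root "carries" the whole population
    while stack:
        node, carried = stack.pop()
        children = tree.get(node, [])
        # Chromosomes below this node that did not pick up a further mutation.
        freqs[node] = carried - sum(p[snp] for snp, _ in children)
        for snp, child in children:
            stack.append((child, p[snp]))
    return freqs

# Invented allele frequencies, chosen to be consistent with the tree.
p = {1: Fraction(4, 10), 2: Fraction(7, 10), 3: Fraction(2, 10),
     4: Fraction(1, 10), 5: Fraction(1, 10)}
for haplotype, freq in sorted(haplotype_frequencies(TREE, "00000", p).items()):
    print(haplotype, freq)
```

Exact fractions are used so that the frequencies sum to exactly 1, which makes the bookkeeping easy to check by hand.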
25
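Inferring Haplotypes From Trios
Assumption: No recombination
Genotypes: Parent 1: 122112; Parent 2: 210022; Child: 120222
Step 1 (homozygous sites): Parent 1: 1??11?; Parent 2: ?100??; Child: 1?0???
Step 2: Parent 1: 10?11? / 11?11?; Parent 2: 1100?? / 0100??; Child: 100??? / 110???
Result: Parent 1: 10011? / 11111?; Parent 2: 11000? / 01001?; Child: 10011? / 11000?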
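The trio logic can be sketched site by site. The sketch assumes the genotype encoding that the example appears to use (0 and 1 for the two homozygous states, 2 for heterozygous), assumes no recombination as the slide states, and does not check for Mendelian errors. The function name `phase_trio` is hypothetical.

```python
def _flip(allele):
    return "1" if allele == "0" else "0"

def _untransmitted(genotype, transmitted):
    """Allele on the parental chromosome that was NOT passed to the child."""
    if genotype in "01":
        return genotype            # homozygous parent: both chromosomes agree
    if transmitted == "?":
        return "?"                 # heterozygous parent, transmission unknown
    return _flip(transmitted)

def phase_trio(parent1, parent2, child):
    """Phase a trio; returns (transmitted, untransmitted) strings per parent.

    The child's two haplotypes are the two transmitted strings.
    """
    t1, u1, t2, u2 = [], [], [], []
    for g1, g2, c in zip(parent1, parent2, child):
        if c in "01":              # homozygous child: both parents passed c
            a1 = a2 = c
        elif g1 in "01":           # parent 1 homozygous fixes its contribution
            a1, a2 = g1, _flip(g1)
        elif g2 in "01":           # parent 2 homozygous fixes its contribution
            a2, a1 = g2, _flip(g2)
        else:                      # all three heterozygous: unresolved
            a1 = a2 = "?"
        t1.append(a1)
        t2.append(a2)
        u1.append(_untransmitted(g1, a1))
        u2.append(_untransmitted(g2, a2))
    return {
        "parent1": ("".join(t1), "".join(u1)),
        "parent2": ("".join(t2), "".join(u2)),
    }

result = phase_trio("122112", "210022", "120222")
print(result["parent1"])
print(result["parent2"])
```

Only sites where all three individuals are heterozygous stay unresolved, which is why trio data phases most positions unambiguously.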
26
Population Substructure
Imagine that all the cases are collected from Africa, and all the controls are from Europe.
–Many association signals are going to be found
–The vast majority of them are false. Why?
Different evolutionary forces: drift, selection, mutation, migration, population bottleneck.
27
Natural Selection
Example: being lactose tolerant is advantageous in northern Europe, hence there is positive selection in the LCT gene, leading to different allele frequencies in LCT.
28
Genetic Drift Even without selection, the allele frequencies in the population are not fixed across time. Consider the following case: –We assume Hardy-Weinberg Equilibrium (HWE), that is, individuals are mating randomly in the population. –We assume a constant population size, no mutation, no selection
29
Genetic Drift: The Wright-Fisher Model Generation 1 Allele frequency 1/9
30
Genetic Drift: The Wright-Fisher Model Generation 2 Allele frequency 1/9
31
Genetic Drift: The Wright-Fisher Model Generation 3 Allele frequency 1/9
32
Genetic Drift: The Wright-Fisher Model Generation 4 Allele frequency 1/3
33
Genetic Drift: The Wright-Fisher Model
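Drift under the Wright-Fisher model can be simulated directly: each gene copy in the next generation picks its parent copy at random from the current generation. This is a minimal sketch under the assumptions listed earlier (constant population size, no mutation, no selection). The function name `wright_fisher` is hypothetical, and the population of 9 copies starting at frequency 1/9 mirrors the slide illustration.

```python
import random

def wright_fisher(n_copies, initial_count, generations, seed=None):
    """Simulate allele-frequency drift; returns one frequency per generation."""
    rng = random.Random(seed)
    count = initial_count
    frequencies = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Each of the n_copies offspring copies inherits the allele with
        # probability p (binomial sampling of the next generation).
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        frequencies.append(count / n_copies)
    return frequencies

# 9 gene copies, one of which carries the allele, followed for 20 generations.
print(wright_fisher(n_copies=9, initial_count=1, generations=20, seed=0))
```

Once the frequency reaches 0 or 1 it stays there, since no mutation reintroduces the lost allele; in small populations this happens quickly.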
35
Ancestral population
36
migration
37
Ancestral population Genetic drift different allele frequencies
38
Population Substructure Imagine that all the cases are collected from Africa, and all the controls are from Europe. –Many association signals are going to be found –The vast majority of them are false; What can we do about it?
39
Jakobsson et al., Nature 451: 998-1003 (2008)
40
Principal Component Analysis Dimensionality reduction Based on linear algebra Intuition: find the ‘most important’ features of the data
41
Principal Component Analysis Plotting the data on a one dimensional line for which the ‘spread’ is maximized.
42
Principal Component Analysis In our case, we want to look at two dimensions at a time. The original data has many dimensions – each SNP corresponds to one dimension.
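The "line of maximal spread" can be found with power iteration on the centered genotype matrix. This is a minimal pure-Python sketch, not the method used to produce the plots in these slides; real analyses would use a linear algebra library. The function name `top_principal_component` and the toy genotype matrix are hypothetical.

```python
def top_principal_component(rows, iterations=200):
    """First principal component of a genotype matrix.

    rows: one list per individual, one numeric genotype per SNP.
    Returns (direction, scores): the unit direction in SNP space and each
    individual's coordinate along it.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]

    direction = [1.0 / (j + 1) for j in range(m)]   # arbitrary non-zero start
    for _ in range(iterations):
        # Multiply by X^T X without forming it: project, then back-project.
        scores = [sum(x * w for x, w in zip(row, direction)) for row in centered]
        updated = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = sum(w * w for w in updated) ** 0.5
        if norm == 0:
            break
        direction = [w / norm for w in updated]

    scores = [sum(x * w for x, w in zip(row, direction)) for row in centered]
    return direction, scores

# Toy data: 6 individuals x 4 SNPs, with two groups that differ in allele counts.
genotypes = [
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1],
]
_, pc1_scores = top_principal_component(genotypes)
print([round(s, 2) for s in pc1_scores])
```

In the toy data the first three individuals land on one side of the axis and the last three on the other; the sign of the axis itself is arbitrary.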
43
HapMap Populations: CEU, ASW, CHB, CHD, GIH, JPT, LWK, MEX, MKK, TSI, YRI
44
HapMap PCA 1-2
45
HapMap PCA 1-3
46
HapMap PCA 1,2,4
47
Ancestry Inference: To what extent can population structure be detected from SNP data? What can we learn from these inferences? Novembre et al., 2008
48
Ancestry inference in recently admixed populations
[Figure: percent admixture (European, Native American, African), from 0 to 100%, for individual subjects 1-90. Puerto Rican population (GALA study, E. Burchard).]
49
Recombination Events
[Figure: Copy 1 and Copy 2 of a parental chromosome combine into the child chromosome.]
Probability r_i for recombination in position i.
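The child chromosome can be simulated as a mosaic of the two parental copies, switching copies at position i with probability r_i. This is a minimal sketch; the function name `recombine`, the uniform choice of the starting copy, and the toy sequences are assumptions made for illustration.

```python
import random

def recombine(copy1, copy2, r, seed=None):
    """Build a child chromosome from two parental copies.

    r[i] is the probability of a recombination event just before position i
    (r[0] is unused because there is nothing to switch from).
    """
    rng = random.Random(seed)
    copies = (copy1, copy2)
    current = rng.randrange(2)            # assumed: start on either copy with equal probability
    child = []
    for i in range(len(copy1)):
        if i > 0 and rng.random() < r[i]:
            current = 1 - current         # recombination: switch to the other copy
        child.append(copies[current][i])
    return "".join(child)

# Toy copies, labelled so the source of each position is visible.
print(recombine("AAAAAAAAAA", "BBBBBBBBBB", [0.2] * 10, seed=3))
```

Applying this step generation after generation breaks ancestral segments into shorter and shorter pieces, which is the pattern the following slides on admixed populations illustrate.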
50
Recently Admixed Populations After generation 1
51
Recently Admixed Populations After generation 2
52
Recently Admixed Populations After generation 10
53
Notation (chromosome model):
W      Recombination indicators
Z      Ancestral states
X      Alleles
g      Generations
r      Recombination rate
α      Admixture fraction
p, q   Allele frequencies
54
Overall Accuracy
55
Applications:
- Population genetics (admixture events, recombination events, selection forces, migration patterns)
- Potential applications in personalized medicine
- Finding new associations (through admixture mapping)
56
Admixture Mapping