Human Genetics, part I Liisa Kauppi (Keeney lab)

Mapping Mendelian and complex diseases
- Linkage mapping in pedigrees
- Association mapping in populations

Genes and environment
Only “natural” mutants are available in humans.
Heritability: first-degree relatives of a patient are at greater risk. For type I diabetes, the relative risk λ = 15 (6% risk in siblings / 0.4% in the general population).
Twin studies: adopted twins (separated in infancy); fraternal vs. identical twins; biological vs. non-biological siblings.

Genes and human disease: from easy (Mendelian) to hard (complex)
- Easy, Mendelian: single gene, high penetrance (e.g. cystic fibrosis, blood type)
- Hard, complex/multifactorial (“common disease”): polygenic, reduced penetrance (e.g. asthma, osteoporosis, schizophrenia, infectious disease, height, body weight)
- Pure environment: snakebite, language

Polymorphic markers are needed for disease mapping
- Microsatellites: tandem arrays of simple repeats, for example (CA)n with n = 15 to 27; multi-allelic
- Single nucleotide polymorphisms (SNPs), e.g. A/G: abundant, perhaps 1 every 300 bp; bi-allelic; mostly non-coding; some are detectable as RFLPs

Genotype frequencies: the Hardy-Weinberg equation
Allele B has frequency p and allele b has frequency q, with p + q = 1.
Combining gametes at random, B (p) or b (q) from each parent: BB has frequency p², each of Bb and bB has pq, and bb has q².
$$p^2\,(BB) + 2pq\,(Bb) + q^2\,(bb) = 1$$
A population whose genotype frequencies follow these proportions is in Hardy-Weinberg equilibrium (HWE).
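The Hardy-Weinberg proportions can be checked numerically. A minimal Python sketch, assuming a single bi-allelic locus with allele B at frequency p and b at q = 1 - p; the function name is illustrative, not from the slides:

```python
def hwe_genotype_freqs(p: float) -> dict[str, float]:
    """Expected genotype frequencies under Hardy-Weinberg equilibrium
    for a bi-allelic locus (B at frequency p, b at q = 1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {
        "BB": p * p,       # B gamete meets B gamete
        "Bb": 2 * p * q,   # Bb and bB are the same genotype, hence 2pq
        "bb": q * q,       # b gamete meets b gamete
    }


freqs = hwe_genotype_freqs(0.7)
print(freqs)
print(sum(freqs.values()))  # p^2 + 2pq + q^2 = (p + q)^2 = 1, up to rounding
```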

How are recessive traits maintained in a population?
HWE genotype frequencies: p² + 2pq + q² = 1
Hypothetical example: in Sardinia, 1 in 5 individuals have straight hair. This trait is determined by a single gene and is recessive: S allele = curly hair, s allele = straight hair.
- Frequency of s/s homozygotes: q² = 0.2
- Frequency of s allele: q = √0.2 ≈ 0.45
- Frequency of S allele: p = 1 - 0.45 = 0.55
Genotypes in the next generation, from random union of gametes: S/S = 0.55² ≈ 0.3; S/s = 2 × 0.55 × 0.45 ≈ 0.5; s/s = 0.45² ≈ 0.2.
Frequencies of genotypes and alleles remain unchanged from one generation to the next, so the recessive allele is not lost even though heterozygous carriers do not show the trait.
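The claim that frequencies stay constant can be illustrated with a short deterministic model of random mating, assuming an infinite population with no selection, mutation or migration. This Python sketch uses the Sardinia numbers; the second starting population (no heterozygotes at all) is hypothetical, chosen only to show that HWE proportions are reached after a single generation. Without rounding q to 0.45, the genotype frequencies come out at about 0.31, 0.49 and 0.2 rather than the rounded 0.3, 0.5 and 0.2.

```python
import math


def allele_freq_S(f_SS: float, f_Ss: float) -> float:
    """Frequency of allele S: homozygotes carry two copies, heterozygotes one."""
    return f_SS + f_Ss / 2


def random_mating(f_SS: float, f_Ss: float, f_ss: float) -> tuple[float, float, float]:
    """Genotype frequencies (SS, Ss, ss) after one round of random mating."""
    p = allele_freq_S(f_SS, f_Ss)
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


# Straight hair (s/s) is recessive and seen in 1 in 5 people: q^2 = 0.2
q = math.sqrt(0.2)   # frequency of s, about 0.447
p = 1.0 - q          # frequency of S, about 0.553
gen = (p * p, 2 * p * q, q * q)
for _ in range(3):
    gen = random_mating(*gen)
print([round(f, 2) for f in gen])  # [0.31, 0.49, 0.2], unchanged each generation

# Hypothetical population far from HWE: one generation restores HWE proportions
start = (0.5, 0.0, 0.5)
print(random_mating(*start))  # (0.25, 0.5, 0.25)
```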

HWE allows calculation of carrier frequencies for recessive traits (with caution)
Example: cystic fibrosis, alleles CF and cf; incidence 1/2000 births. Using p² + 2pq + q² = 1:
- Frequency of cf/cf homozygotes: q² = 0.0005
- Frequency of cf allele: q = √0.0005 ≈ 0.022
- Frequency of CF allele: p = 1 - 0.022 = 0.978
- Frequency of CF/cf heterozygotes (carriers): 2pq = 2 × 0.978 × 0.022 ≈ 0.043, about 1 in 23
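The carrier calculation can be wrapped in a small function. A minimal Python sketch, assuming HWE holds so that the incidence of affected births equals q²; the function name is illustrative. Without rounding q to 0.022 first, the carrier frequency comes out at about 0.044 (still roughly 1 in 23), slightly above the slide's rounded 0.043.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Heterozygote (carrier) frequency 2pq for a recessive disorder,
    assuming HWE so that the affected (homozygote) frequency equals q^2."""
    q = math.sqrt(incidence)  # frequency of the disease allele
    p = 1.0 - q
    return 2 * p * q


cf = carrier_frequency(1 / 2000)
print(f"carrier frequency = {cf:.3f} (about 1 in {1 / cf:.0f})")
```

The "with caution" on the slide matters here: the formula is only as good as the HWE assumption, which fails under admixture or selection.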

So what if genotypes at a locus are not in HWE?
This suggests that the assumptions of HWE are not met. Example: a heterozygote deficit can arise from recent admixture (p² + 2pq + q² = 1).
- Population 1 (n = 1000): B freq 0.9, b freq 0.1; genotypes 0.81 + 0.18 + 0.01, i.e. 810 BB + 180 Bb + 10 bb
- Population 2 (n = 1000): B freq 0.1, b freq 0.9; genotypes 0.01 + 0.18 + 0.81, i.e. 10 BB + 180 Bb + 810 bb
- Pooled (n = 2000): B freq 0.5, b freq 0.5; HWE expected 0.25 + 0.5 + 0.25, i.e. 500 BB + 1000 Bb + 500 bb; observed 820 BB + 360 Bb + 820 bb
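The admixture example (known as the Wahlund effect) can be checked with a Pearson chi-square goodness-of-fit test for HWE. A minimal Python sketch, assuming expected counts are computed from allele frequencies estimated from the same sample (1 degree of freedom for a bi-allelic locus; the 5% critical value is 3.84). Each population on its own fits HWE, but the pooled sample gives a statistic far above the critical value.

```python
def hwe_expected_counts(n_BB: int, n_Bb: int, n_bb: int) -> tuple[float, float, float]:
    """Expected genotype counts under HWE, using the allele frequency
    estimated from the observed counts themselves."""
    n = n_BB + n_Bb + n_bb
    p = (2 * n_BB + n_Bb) / (2 * n)  # frequency of B
    q = 1.0 - p
    return n * p * p, n * 2 * p * q, n * q * q


def chi_square(observed, expected) -> float:
    """Pearson statistic: sum of (O - E)^2 / E over genotype classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


pop1 = (810, 180, 10)   # B freq 0.9, in HWE on its own
pop2 = (10, 180, 810)   # B freq 0.1, in HWE on its own
pooled = tuple(a + b for a, b in zip(pop1, pop2))  # (820, 360, 820)

expected = hwe_expected_counts(*pooled)            # (500.0, 1000.0, 500.0)
print(pooled, expected)
print(chi_square(pooled, expected))                # far above 3.84
print(chi_square(pop1, hwe_expected_counts(*pop1)))  # essentially 0
```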

Departure from HWE (heterozygote excess): the prion protein gene and human disease
The PRNP gene is linked to prion diseases, e.g. CJD and kuru. A common polymorphism, M129V, influences the course of these diseases: the MV heterozygous genotype is protective.
Kuru, acquired through ritual cannibalism, was reported in the 1950s among the Fore people of Papua New Guinea, where it caused up to 1% annual mortality.
Departure from Hardy-Weinberg equilibrium for the M129V polymorphism is seen in Fore women over 50 (23/30 heterozygotes, P = 0.01).
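Why 23/30 heterozygotes counts as an excess can be seen without knowing the allele frequencies: under HWE the heterozygote frequency 2p(1 - p) of a bi-allelic locus can never exceed 0.5. This Python sketch compares the observed fraction with that ceiling and, as a rough illustration only, evaluates a one-sided binomial tail at the ceiling. That simple model treats the allele frequency as known rather than estimated from the same 30 women; it is not the test behind the slide's P = 0.01, so the two numbers need not agree.

```python
from math import comb


def max_hwe_heterozygosity() -> float:
    """Largest HWE heterozygote frequency for a bi-allelic locus:
    2p(1 - p) is maximal at p = 0.5, where it equals 0.5."""
    p = 0.5
    return 2 * p * (1 - p)


def binomial_upper_tail(k: int, n: int, prob: float) -> float:
    """P(X >= k) for X ~ Binomial(n, prob)."""
    return sum(comb(n, i) * prob**i * (1 - prob) ** (n - i) for i in range(k, n + 1))


heterozygotes, women = 23, 30
print(f"observed heterozygosity: {heterozygotes / women:.2f}")
print(f"maximum under HWE:       {max_hwe_heterozygosity():.2f}")
# P(X >= 23) grows with the heterozygote probability, so evaluating it at the
# HWE maximum of 0.5 bounds it for every (known) allele frequency.
print(f"binomial bound:          {binomial_upper_tail(heterozygotes, women, 0.5):.4f}")
```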

Linkage studies: recombination in a family
How often are two loci on the same chromosome separated by meiotic recombination?
[Pedigree figure: generations I to III, with informative and uninformative meioses; offspring in generation III labelled non-recombinant (NR) or recombinant (R).]
Recombination fraction θ = 2/6 = 0.33
Family-based designs test the sharing of alleles among all affected and unaffected individuals, comparing the probability of transmission under linkage to the affection status with that under no linkage.

Recognizing recombinants: does the disease segregate with this marker?
[Pedigree figure: generations I to III with marker genotypes; offspring labelled NR or R.]
Recombination fraction θ = 1/6 = 0.167

Recognizing recombinants: often samples are missing
[Pedigree figure: as before, but generation I is not genotyped, so offspring can be labelled NR or R under either phase.]
Without the grandparents, the phase of the marker and disease alleles in the parent is unknown, so the recombination fraction θ is either 1/6 = 0.167 or 5/6 = 0.833.
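Counting recombinants, and the effect of unknown phase, can be written out directly. A minimal Python sketch with illustrative function names; it simply restates the counting rule from the pedigrees (θ = recombinant / informative meioses) and the fact that swapping the assumed phase turns every recombinant into a non-recombinant and vice versa.

```python
def recombination_fraction(n_recombinant: int, n_informative: int) -> float:
    """Estimated recombination fraction: recombinant / informative meioses."""
    return n_recombinant / n_informative


def theta_both_phases(n_recombinant: int, n_informative: int) -> tuple[float, float]:
    """With parental phase unknown, a meiosis scored as recombinant under one
    phase is non-recombinant under the other, so r recombinants become n - r."""
    return (
        recombination_fraction(n_recombinant, n_informative),
        recombination_fraction(n_informative - n_recombinant, n_informative),
    )


print(recombination_fraction(2, 6))  # first pedigree: 2/6
print(theta_both_phases(1, 6))       # phase unknown: 1/6 or 5/6
```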

Recognizing recombinants: tracing additional family members can help
[Pedigree figure: additional relatives genotyped; offspring labelled NR or R.]
But are the shared alleles identical by descent (IBD)?

Which marker is the disease locus closest to? Lod scores
The logarithm of odds (lod) score Z compares the likelihood of the loci being linked (at recombination fraction θ) with the likelihood of their not being linked (θ = 0.5):
$$Z = \log_{10}\frac{L(\text{loci linked})}{L(\text{loci not linked})}$$
For the example pedigree with 1 recombinant in 6 meioses (θ = 0.167):
$$Z = \log_{10}\frac{(1 - 0.167)^5 \times 0.167}{(0.5)^6} = 0.632$$
Lod scores between -2 and +3 are inconclusive; below -2 means exclusion, above +3 means linkage.
The method requires a precise genetic model.
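The lod calculation for the example pedigree can be reproduced in a few lines. A minimal Python sketch, assuming phase-known meioses so that the likelihood is simply (1 - θ)^(non-recombinants) × θ^(recombinants); the function name is illustrative. It also scans θ on a grid, showing that the maximum falls near the observed fraction 1/6.

```python
import math


def lod_score(n_recombinant: int, n_meioses: int, theta: float) -> float:
    """Lod score Z = log10[L(theta) / L(0.5)] for phase-known meioses."""
    n_nonrecombinant = n_meioses - n_recombinant
    likelihood_linked = (1 - theta) ** n_nonrecombinant * theta ** n_recombinant
    likelihood_unlinked = 0.5 ** n_meioses
    return math.log10(likelihood_linked / likelihood_unlinked)


print(round(lod_score(1, 6, 0.167), 3))  # the example pedigree: 0.632

# Scan theta; the maximum lies near the observed fraction 1/6
grid = [i / 100 for i in range(1, 50)]
best = max(grid, key=lambda t: lod_score(1, 6, t))
print(best, round(lod_score(1, 6, best), 3))

# Six phase-known meioses with no recombinants: the largest possible score
print(round(lod_score(0, 6, 0.0), 2))
```

Even the best case, 6 × log10 2 ≈ 1.81, stays below +3, which is why lod scores from independent families are summed.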

Which marker is the disease locus closest to? Multi-point lod scores for Waardenburg syndrome type 2 on chr 3p12-14. After Hughes et al. (1994) Nature Genet 7, 509-512.

Multifactorial diseases (no simple Mendelian inheritance pattern): sib-pair analysis
[Figure: parents carrying alleles 1/2 and 3/4 and sib pairs sharing different numbers of parental alleles.]
Number of shared parental alleles: 2, 1 or 0, with probability 1/4, 1/2 and 1/4.
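The 1/4, 1/2, 1/4 distribution follows from enumerating which allele each parent passes to each sib. A minimal Python sketch, assuming the four parental alleles are distinguishable (as in the 1/2 × 3/4 labelling), so that sharing means identity by descent rather than by state:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ibd_distribution() -> dict[int, Fraction]:
    """Probability that two full sibs share 0, 1 or 2 parental alleles IBD.

    Each parent transmits one of its two alleles (index 0 or 1) to each child,
    independently and with equal probability."""
    counts = Counter()
    # (father -> sib 1, mother -> sib 1, father -> sib 2, mother -> sib 2)
    for f1, m1, f2, m2 in product((0, 1), repeat=4):
        shared = int(f1 == f2) + int(m1 == m2)
        counts[shared] += 1
    total = sum(counts.values())  # 16 equally likely transmission patterns
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


for shared, prob in ibd_distribution().items():
    print(shared, prob)  # 0 1/4, then 1 1/2, then 2 1/4
```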

Affected sib-pairs: which loci do the affected sibs share more often than expected by chance?
[Figure: affected sib pairs from parents 1/2 and 3/4, with the number of shared parental alleles at each locus.]
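One common way to ask whether affected sibs share more than expected is the mean IBD-sharing test. A minimal Python sketch, assuming IBD status is known for every pair (no IBS/IBD ambiguity) and using a normal approximation; the counts are invented for illustration. Under no linkage each pair shares a proportion 0, 1/2 or 1 of its alleles with probability 1/4, 1/2, 1/4, giving mean 1/2 and variance 1/8.

```python
import math


def mean_ibd_test(n0: int, n1: int, n2: int) -> tuple[float, float, float]:
    """Mean IBD-sharing test for affected sib pairs.

    n0, n1, n2: number of pairs sharing 0, 1 or 2 parental alleles IBD.
    Returns (mean sharing proportion, z statistic, one-sided P value)."""
    n_pairs = n0 + n1 + n2
    mean_share = (0.5 * n1 + 1.0 * n2) / n_pairs
    z = (mean_share - 0.5) / math.sqrt(0.125 / n_pairs)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return mean_share, z, p_one_sided


hypothetical = (15, 50, 35)  # invented counts of pairs sharing 0, 1, 2 alleles
share, z, p = mean_ibd_test(*hypothetical)
print(f"mean sharing {share:.2f}, z = {z:.2f}, one-sided P = {p:.4f}")
```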

Detecting linkage in pedigrees can be complicated…
One can attempt to identify recombination hotspots by analysing marker segregation in families:
- sample parents and offspring from a family
- the father and the mother each carry stretches of DNA on homologous chromosomes
- each child inherits one chromosome from the father and one from the mother
- occasionally, a child inherits a chromosome that has undergone meiotic recombination
The problem with this approach is that…

… and you need lots of meioses!
- recombination events, even in hotspots, are rare
- one would have to analyse a huge number of parent-to-offspring transmissions to find the one chromosome that has undergone a crossover
We use an alternative approach.

Association mapping in a population: cases vs. controls
HLA-DR4 allele (UK): 36% in the general population vs. 78% in rheumatoid arthritis patients.
Seek correlation between genotype and phenotype: allele B is associated with disease D if people who have D also have B more often than predicted from B’s frequency.
Testing every polymorphism is too expensive.
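The strength of a case-control association is often summarised as an odds ratio. A minimal Python sketch, assuming the 36% and 78% are the proportions of people carrying HLA-DR4 (the slide does not say whether they are allele or carrier frequencies) and treating the general population as the control group:

```python
def odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds ratio for carrying the allele in cases vs. controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


or_dr4 = odds_ratio(0.78, 0.36)
print(round(or_dr4, 1))  # roughly 6-fold higher odds of DR4 among patients
```

A confidence interval or significance test would additionally need the numbers of cases and controls, which the slide does not give.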

Linkage disequilibrium (LD) measures association between two alleles
[Figure: two homologous chromosomes; mutation creates new variants (A/G, then T); recombination (X) reshuffles them.]
- Mutation creates new variants: initially, the new allele is in LD with nearby alleles (LD value = 1)
- Recombination reshuffles existing variation, and LD diminishes
- If enough crossovers take place, the loci are in “free association”
Commonly used LD measures: D′ and r²
Notes: how recombination maintains genetic variability. First, mutation creates new variation. Shown are tracts of DNA on two homologous chromosomes that already differ at one base; this kind of variation is referred to as a SNP. When mutation creates another SNP site, the two alleles on the same chromosome are initially always found together. If this chromosome is successful and spreads in the population, we speak of LD between the two SNP sites. From one generation to the next, recombination reshuffles the combinations of alleles at different loci. If crossovers (XOs) occur in this interval, LD between the two loci is disrupted, and the population contains recombinant chromosomes as well as the original ones.
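The two LD measures named on the slide can be computed from two-locus haplotype frequencies. A minimal Python sketch, assuming two bi-allelic loci with placeholder alleles A/a and B/b and known haplotype frequencies (the example frequencies are hypothetical); D′ is reported as |D| / D_max. The first example mimics a new mutation on one background: only three haplotypes exist, so D′ = 1, yet r² is low because the allele frequencies differ.

```python
def ld_measures(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> tuple[float, float, float]:
    """Return D, |D'| and r^2 from the four two-locus haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab      # allele A at locus 1
    p_B = p_AB + p_aB      # allele B at locus 2
    D = p_AB - p_A * p_B   # deviation from independence
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    # Both loci must be polymorphic, otherwise d_max and r^2 are undefined
    d_prime = abs(D) / d_max
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r2


# Three haplotypes only (aB never seen): D' = 1, r^2 = 0.25
print(ld_measures(0.2, 0.3, 0.0, 0.5))
# Two haplotypes with equal allele frequencies: D' = 1 and r^2 = 1
print(ld_measures(0.5, 0.0, 0.0, 0.5))
# Alleles combined at random ("free association"): D = D' = r^2 = 0
print(ld_measures(0.25, 0.25, 0.25, 0.25))
```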

Haplotypes are sets of markers inherited as a “package”
- Meiotic recombination creates novel haplotypes
- Markers form haplotype blocks in the population
Notes: instead of detecting recombinants in children, we detect recombinant molecules directly in sperm DNA. This in principle gives an unlimited number of meioses to study: there are millions and millions of sperm carrying the progenitor haplotypes, and among them are the rare recombinants, which are fished out using allele-specific PCR methods that I’ll describe in a moment.

LD is a measure of allelic association in a population
Take two SNP loci on the same chromosome, C/G and A/T. The four possible haplotypes are C-A, C-T, G-A and G-T.
- Fewer than 4 combinations observed: LD
- Conversely, all 4 combinations observed: low or no LD
But LD also depends on population history, drift, selection…
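The “fewer than 4 combinations” rule is the idea behind the so-called four-gamete test. A minimal Python sketch, with haplotypes written as two-letter strings such as "CA" (allele C at the first SNP, A at the second); the sample haplotypes are hypothetical. Finding all four combinations indicates past recombination (or recurrent mutation) between the SNPs but does not by itself quantify how much LD remains.

```python
def four_gamete_test(haplotypes: list[str]) -> bool:
    """True if all four allele combinations of two bi-allelic SNPs are observed.

    Each haplotype is a two-letter string such as "CA" (allele C at the first
    SNP, allele A at the second)."""
    observed = set(haplotypes)
    if len({h[0] for h in observed}) > 2 or len({h[1] for h in observed}) > 2:
        raise ValueError("expected two bi-allelic SNPs")
    return len(observed) == 4


sample_1 = ["CA", "CA", "GT", "CT", "GT"]  # 3 combinations: LD
sample_2 = ["CA", "GT", "CT", "GA", "CA"]  # all 4 combinations
print(four_gamete_test(sample_1), four_gamete_test(sample_2))  # False True
```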

Disease haplotypes shorten from one generation to the next
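A standard textbook model makes the shortening quantitative: in an infinite, randomly mating population (no drift, selection or new mutation), LD between two loci decays as D_t = D_0 (1 - θ)^t after t generations. The same factor (1 - θ)^t is the chance that a chromosome descended from the original mutant has not recombined between the mutation and a marker at recombination fraction θ. A minimal Python sketch; the marker distance of about 1 cM (θ ≈ 0.01) is a hypothetical example.

```python
def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """LD remaining after t generations of random mating: D_t = D_0 (1 - theta)^t."""
    return d0 * (1 - theta) ** generations


# Hypothetical marker about 1 cM from a disease mutation (theta ~ 0.01);
# d0 = 1.0 so the output is the fraction of the initial LD that remains
for t in (10, 50, 100, 500):
    print(t, round(ld_after_generations(1.0, 0.01, t), 3))
```

Markers further from the mutation (larger θ) lose their association faster, so after many generations only a short stretch of the ancestral haplotype is still shared by carriers.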

Recombination hotspots are key in shaping haplotype blocks
Perhaps 90% or more of crossovers take place at highly localized hotspots.
[Figure: HLA class II region, showing recombination activity and haplotype blocks.]
Kauppi et al. (2004) Nat Rev Genet 5, 413-424

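How do you extract haplotypes from genotype data?
Blood DNA gives unphased genotypes, e.g. C/G at one SNP and A/T at another. These could be carried as haplotypes C-A and G-T, or as C-T and G-A.
The phase can be resolved using other family members or other individuals in the population.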
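The phase ambiguity can be made explicit by listing every haplotype pair compatible with an unphased genotype; with h heterozygous SNPs there are 2^(h-1) such pairs. A minimal Python sketch (the function name and input format are illustrative assumptions); choosing among the pairs is then left to family or population data, as the slide notes.

```python
from itertools import product


def compatible_haplotype_pairs(genotype: list[tuple[str, str]]) -> set[frozenset[str]]:
    """All unordered pairs of haplotypes consistent with an unphased genotype.

    genotype: one (allele, allele) tuple per SNP, in chromosome order.
    A homozygous individual yields a single-element frozenset (two identical
    haplotypes)."""
    pairs = set()
    # For each SNP choose which allele goes on haplotype 1; the other goes on haplotype 2
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(site[c] for site, c in zip(genotype, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(frozenset((hap1, hap2)))
    return pairs


# Double heterozygote from the slide: C/G at SNP 1, A/T at SNP 2
print(compatible_haplotype_pairs([("C", "G"), ("A", "T")]))  # {CA, GT} or {CT, GA}
```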

HapMap project
Examines haplotypes in four populations. Data just released: A haplotype map of the human genome, Nature 437, 1299-1320.
DNA samples: 270 people in total
- Yoruba (Nigeria): 30 parent-child trios
- Whites with North and West European ancestry (USA): 30 trios
- Japan: 45 unrelated individuals
- China: 45 unrelated individuals
Identify “haplotype tag SNPs” to minimize genotyping effort
More than 3,500,000 SNPs typed in total

Limited within-block diversity
Example: an 8.5-kb block on chr 2 with 36 SNPs typed. In principle these could give rise to 2^36 (about 6.9 × 10^10) different haplotypes, yet only seven different haplotypes were found among 120 European chromosomes.

Recombination hotspots are widespread and account for LD structure
[Figure: region 7q21. The International HapMap Consortium]

Pairwise tagging
[Figure: six SNPs typed on a set of haplotypes; three groups of SNPs with high pairwise r² are marked.]
Tags: SNP 1, SNP 3, SNP 6 (3 in total). Test for association using the tags.
Notes: an illustration of tagging based on pairwise r². The 6 SNPs fall into 3 groups with high r² within each group, so we can pick only 3 tags with little loss of information. These tags are also the tests in the association analysis.
After Carlson et al. (2004) AJHG 74:106
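Pairwise tagging can be sketched as a greedy set cover: repeatedly pick the SNP that covers the most still-untagged SNPs at r² ≥ a threshold. This Python version is a simplified variant in the spirit of Carlson et al. (2004), not their exact published algorithm. The r² matrix is hypothetical (the slide gives no values): three high-r² groups among six SNPs, chosen so the tags come out as SNP 1, 3 and 6; the 0.8 threshold is an assumed default.

```python
def greedy_tag_snps(r2: list[list[float]], threshold: float = 0.8) -> list[int]:
    """Greedy pairwise tagging (0-based SNP indices).

    r2[i][j] is the pairwise r^2 between SNPs i and j, with r2[i][i] = 1.
    Each round picks the SNP covering the most untagged SNPs at
    r^2 >= threshold; ties go to the lowest index."""
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        best = max(
            range(n),
            key=lambda i: sum(1 for j in untagged if r2[i][j] >= threshold),
        )
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags


# Hypothetical r^2 values: SNPs {1, 2}, {3, 4, 5} and {6} form high-r^2 groups
groups = [{0, 1}, {2, 3, 4}, {5}]
r2 = [[0.95 if any(i in g and j in g for g in groups) else 0.1 for j in range(6)]
      for i in range(6)]
for i in range(6):
    r2[i][i] = 1.0

tags = greedy_tag_snps(r2)
print(sorted(t + 1 for t in tags))  # [1, 3, 6]
```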

The Common-Disease Common-Variant Hypothesis
Says that disease-predisposing variants:
- exist at relatively high frequency (i.e. >1%) in the population
- are ancient alleles occurring on specific haplotypes
- are detectable in a case-control study using tagging SNPs
The alternative hypothesis says that disease-predisposing alleles are sporadic new mutations, perhaps in the same genes, on different haplotypes: families with a history of the same disease owe their condition to different mutation events.

Does the same phenotype mean the same genotype?
Coding SNPs (nonsynonymous or synonymous); “regulatory” SNPs

Common gene variation in complex disease
Case-control studies comparing the frequencies of common gene variants can identify susceptibility and protective alleles. Some phenotypes have multiple identified genes (*).
Phenotype: gene, variant
- IDDM*: HLA, DR3/DR4
- Alzheimer dementia: APOE, E4
- Deep venous thrombosis: F5, Leiden
- Colorectal cancer: APC, 3920A
- NIDDM: PPAR, 12A

Other types of variation may also have a role in complex disease
- common copy number polymorphisms
- large-scale rearrangements, deletions and insertions
- microsatellite expansions, small insertions/deletions, etc.