Lecture 19: Association Studies II
Date: 10/29/02
• Finish case-control
• TDT
• Relative Risk


REVIEW Case-Control – Derivation VIII

CORRECTION Case-Control – Hypothesis Testing
• Recall that the trait allele frequencies are held fixed so that the trait prevalence K can be calculated.
• Model 1 (HWE, no LE): There are 2n distinct haplotypes, thus there are 2n – 2 degrees of freedom.
• Restricted Model 0 (HWE, LE): There are n distinct alleles, thus there are n – 1 degrees of freedom.
• $2(\ln L_1 - \ln L_0)$ with (2n – 2) – (n – 1) = n – 1 degrees of freedom tests for LE under the assumption of HWE.
• Calculate the MLE for Model 1 with a modified EM algorithm.

Estimating Genetic Parameters
• $\gamma = (p_1, p_2, f_{11}, f_{12}, f_{22})$ are the genetic parameters underlying the theoretical distribution of genotypes in the case-control approach.
• When the genetic model, and thus $\gamma$, is unknown, one resorts to contingency tables.
• Can the data be used to estimate $\gamma$?

Estimating Genetic Parameters
• One could estimate the haplotype frequencies $h_{1i}, h_{2i}$ simultaneously with the genetic parameters $\gamma$.
• Then $2[\ln L(h_{1i}, h_{2i}, \gamma) - \ln L(q_i, \gamma)]$ is a statistic for testing linkage equilibrium without conditioning on known genetic parameters.
• However, the G statistic above has an unknown distribution: under linkage equilibrium the marker locus and disease locus are independent, so $L(q_i, \gamma)$ is actually independent of $\gamma$.

Spurious Associations (4.6.4)
• Population subdivision, or any of the other causes of linkage disequilibrium we discussed last time, can cause spurious associations, i.e. linkage disequilibrium not caused by tight linkage.
• Population subdivision is probably the most common source of spurious associations.
• Other sources of spurious association cannot be accommodated so easily. The main defense is to know your population and to know what level of association is greater than “normal” in that population.

Population Subdivision – Identifying Subpopulations
• Identify subpopulations where matings occur randomly. These are subpopulations which will differ in trait and marker allele frequencies. Sometimes, a priori information is available about subpopulations in which these allele frequencies differ.
• Often subdivide by ethnicity, location, religion, social class, and age.

Population Subdivision - Sampling Designs
• Sample only from one identified subdivision.
• Match case and control by subdivision.
• In complex traits, there may be multiple loci associated with a disease, and these loci may vary between subpopulations. Which sampling scheme do you recommend?

Hidden Population Stratification
• One cannot anticipate all sources of spurious association.
• Internal checks may indicate the presence of remaining spurious association.
    - Test HWE on individual markers.
    - Test markers on different chromosomes for spurious association.
    - Trait loci that associate tightly with multiple distant markers are a sign of trouble.
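The first internal check listed above, testing HWE on an individual marker, can be sketched as follows. This is a minimal illustration rather than the lecture's own procedure: it assumes a biallelic marker, takes genotype counts as input, and uses a Pearson chi-square with 1 degree of freedom. The function name `hwe_chi_square` and the counts in the usage line are hypothetical.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square test of HWE for one biallelic marker (1 df).

    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)          # estimated frequency of allele A
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts with a strong heterozygote deficit
chi2, p_value = hwe_chi_square(50, 0, 50)
```

A large statistic at many unlinked markers would be one hint that hidden subdivision remains in the sample.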

Using Families – Removing Spurious Association
• The effect of spurious association can be removed by comparing the chromosomes of affected children to those of their relatives.
• The most common relative to use? Parents.
• This does NOT mean that we are returning to family-based linkage analysis. As you will see, we still use information from multiple generations of recombination.

Moving to Biallelic Model
[Figure: linkage equilibrium vs. linkage disequilibrium]

TDT – Assumptions
• Depends on the presence of linkage disequilibrium at the population level.
• Assumes random mating.

TDT – Genetic Model
                          A (marker)           D (trait)
Allele frequencies        $P(A) = p_A$         $P(D) = p_D$
                          $P(a) = 1 - p_A$     $P(d) = 1 - p_D$
Linkage disequilibrium    $D_{AD} = h_{AD} - p_A p_D$

TDT – Haplotype Frequencies
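The content of this slide did not survive extraction. As a hedged reconstruction, the four haplotype frequencies implied by the preceding definitions (allele frequencies $p_A$, $p_D$ and a disequilibrium coefficient written here as $D_{AD}$) take the standard two-locus form:

```latex
\begin{aligned}
h_{AD} &= p_A\,p_D + D_{AD} \\
h_{Ad} &= p_A\,(1 - p_D) - D_{AD} \\
h_{aD} &= (1 - p_A)\,p_D - D_{AD} \\
h_{ad} &= (1 - p_A)(1 - p_D) + D_{AD}
\end{aligned}
```

The four frequencies sum to 1, and the first line is just the definition of the disequilibrium coefficient rearranged.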

TDT – The Test
• Assume we randomly sample affected individuals and then genotype each individual and his/her two parents for marker A.
• Take those families where the parents are heterozygous for the marker.
• Record the data as transmitted and nontransmitted alleles. A table as shown on the next slide is typically used.

TDT – The Table
                      Nontransmitted
                      A           a           Total
Transmitted   A       -           $t_{12}$    -
              a       $t_{21}$    -           -
Total                 -           -           2N
N is the number of affected children sampled.

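TDT – Filling the Table
[Pedigree figure; genotype labels shown: Aa, AA]
$n_{12}$ += _____   $n_{21}$ += _____

TDT – Filling the Table
[Pedigree figure; genotype label shown: Aa]
$n_{12}$ += _____   $n_{21}$ += _____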
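The bookkeeping behind these two exercise slides can be sketched in code. This is an illustrative helper under stated assumptions, not the lecture's own: genotypes are two-character strings such as "Aa", n_12 counts heterozygous parents who transmitted A (and did not transmit a), n_21 counts the reverse, and the function name `count_transmissions` and the trios in the usage lines are hypothetical.

```python
from itertools import product

def count_transmissions(father, mother, child, allele="A"):
    """Return the (n12, n21) increments contributed by one trio.

    n12: a heterozygous parent transmitted `allele`
    n21: a heterozygous parent transmitted the other allele
    Homozygous parents contribute nothing.
    """
    # Find one (allele from father, allele from mother) pair that yields the child.
    for from_f, from_m in product(father, mother):
        if sorted(from_f + from_m) == sorted(child):
            break
    else:
        raise ValueError("child genotype is incompatible with the parents")

    n12 = n21 = 0
    for parent, transmitted in ((father, from_f), (mother, from_m)):
        if parent[0] != parent[1]:        # only heterozygous parents are informative
            if transmitted == allele:
                n12 += 1
            else:
                n21 += 1
    return n12, n21

# Hypothetical trios
print(count_transmissions("Aa", "AA", "AA"))   # one informative parent
print(count_transmissions("Aa", "Aa", "Aa"))   # two informative parents
```

When both parents are heterozygous and the child is Aa, it is not known which parent transmitted which allele, but either assignment gives the same counts for a biallelic marker, so taking the first compatible assignment is enough here.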

TDT – Statistic
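The formula on this slide did not survive extraction. The form commonly given for the TDT (the McNemar-type statistic of Spielman et al. 1993, the study cited in the example below) is shown here as a hedged reconstruction, using $t_{12}$ and $t_{21}$ for the two off-diagonal counts of the transmitted/nontransmitted table:

```latex
\chi^2_{\mathrm{TDT}} = \frac{(t_{12} - t_{21})^2}{t_{12} + t_{21}},
\qquad
\chi^2_{\mathrm{TDT}} \;\overset{H_0}{\sim}\; \chi^2_{1} \text{ (approximately)}
```

Only transmissions from heterozygous parents enter the statistic, which matches the family selection step described earlier.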

TDT – Derivation
[Table of expected frequencies, transmitted by nontransmitted, not recovered]
Under $H_0$ the expected frequencies are equal.

TDT – Example
• Search for Insulin-Dependent Diabetes Mellitus (IDDM) (Spielman et al. 1993).
• 94 families included in study.
• 62 families had heterozygous parents at a marker on chromosome 11 with possible alleles “1” and “X”.
• 78 “1” alleles were transmitted to affected children.
• 46 “X” alleles were transmitted to affected children.

TDT – Example (cont)
                      Nontransmitted
                      1           X           Total
Transmitted   1       -           78          -
              X       46          -           -
Total                 -           -           124
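Plugging the example counts (78 transmissions of “1”, 46 of “X”) into the McNemar-type TDT statistic gives a quick numerical check. This is a minimal sketch: the function name `tdt_statistic` is hypothetical, and the p-value uses the 1-degree-of-freedom chi-square approximation.

```python
import math

def tdt_statistic(t12, t21):
    """McNemar-type TDT chi-square (1 df) from the two off-diagonal counts."""
    chi2 = (t12 - t21) ** 2 / (t12 + t21)
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

chi2, p_value = tdt_statistic(78, 46)
print(f"TDT chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

The statistic is (78 - 46)^2 / 124, about 8.26, which exceeds the 1-df critical value of 3.84 at the 5% level.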

TDT - Power
• How do we calculate the power of a TDT test?
    - Make assumptions

TDT – Power (cont)
• Statistical power is given by [equation not recovered]

TDT – Power (cont)
• Power increases with sample size (number of affected children).
• Power increases as the recombination fraction decreases.
• Power increases as linkage disequilibrium in the population increases.
• Power increases as trait allele frequency decreases (trait is rare).
• Power is only slightly affected by marker allele frequencies.

TDT – Power Compared
• TDT has lower power than a simple test for linkage disequilibrium in a random population sample.
• TDT loses power because it ignores some of the data: only heterozygous parents are considered, yet homozygous parents provide much information about linkage disequilibrium.
• Why is TDT used then?

TDT – Advantages
• TDT is a test for linkage and linkage disequilibrium, not just linkage disequilibrium.
• Linkage disequilibrium from non-linkage sources can only change the genotypes of the parents.
• TDT tests transmission from heterozygous parents, and only linkage can produce a significant result.
• TDT can also detect segregation distortion at the marker locus. This is another reason to check marker alleles for segregation distortion.

TDT – Advantages (cont)
[Figure: transmission of A/a and D/d haplotypes from parents when the marker and trait loci are unlinked vs. linked]

Relative Risk Method
• Analogous to the general disequilibrium test on a random population sample, for a dominant or recessive trait or marker (two genotype classes are indistinguishable).
• Observe two independent groups, defined by their marker genotype.
• Determine the risk of being affected conditional on group, P(affected | marker group).
• Then, the relative risk is the ratio of these two conditional risks.

Relative Risk – Data
                      Group
Status                AA or Aa    aa          Total
Affected              $n_{11}$    $n_{12}$    $n_{1+}$
Unaffected            $n_{21}$    $n_{22}$    $n_{2+}$
Total                 $n_{+1}$    $n_{+2}$    2N

Relative Risk – Statistic

Relative Risk – Conditional Probabilities

Relative Risk – Null Distribution

Relative Risk – Statistical Test
• Chi-squared test for independence on the table.
• Likelihood ratio test: 2 degrees of freedom
                      Group
Status                AA or Aa    aa          Total
Affected              $n_{11}$    $n_{12}$    $n_{1+}$
Unaffected            $n_{21}$    $n_{22}$    $n_{2+}$
Total                 $n_{+1}$    $n_{+2}$    2N
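The relative risk and the chi-squared test for independence on this table can be sketched together. This is an illustration only: it takes the relative risk to be P(affected | AA or Aa) / P(affected | aa), uses the Pearson statistic for a 2×2 table (which has 1 degree of freedom), and the function name `relative_risk_test` and the counts in the usage line are hypothetical, not data from the lecture.

```python
import math

def relative_risk_test(n11, n12, n21, n22):
    """Relative risk and Pearson chi-square (1 df) for a 2x2 table.

    Rows: affected / unaffected. Columns: marker group 'AA or Aa' / 'aa'.
    Assumes no empty row or column and at least one affected 'aa' individual.
    """
    col1, col2 = n11 + n21, n12 + n22      # n_{+1}, n_{+2}
    row1, row2 = n11 + n12, n21 + n22      # n_{1+}, n_{2+}
    total = row1 + row2
    rr = (n11 / col1) / (n12 / col2)       # P(aff | AA or Aa) / P(aff | aa)
    chi2 = total * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return rr, chi2, p_value

# Hypothetical counts
rr, chi2, p_value = relative_risk_test(30, 10, 20, 40)
```

A relative risk near 1 together with a small chi-square is what independence of marker group and affection status would look like.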

Haplotype Relative Risk
[Pedigree figure; genotype labels shown: AB, BC, BB]
case genotype: _____   control genotype: _____

Haplotype-Based HRR (HHRR)
• Focus on alleles rather than genotypes.
• There are two transmitted and two non-transmitted alleles in every pair of parents with one affected offspring.
• Treat the two allele samples as independent case-control samples.

HHRR – II
[Pedigree figure; genotype labels shown: AB, BC, BB]
case alleles: _____   control alleles: _____

HHRR – III
                      Untransmitted
                      1           2           Total
Transmitted   1       $t_{11}$    $t_{12}$    $t_{1+}$
              2       $t_{21}$    $t_{22}$    $t_{2+}$
Total                 $t_{+1}$    $t_{+2}$    t
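One way to turn this table into a test, following the earlier instruction to treat the transmitted and untransmitted alleles as independent case-control samples, is a Pearson chi-square on the margins. The sketch below is an assumption about how the statistic is formed, not the slide's own formula: the function name `hhrr_statistic` and the counts in the usage line are hypothetical.

```python
import math

def hhrr_statistic(t11, t12, t21, t22):
    """Chi-square (1 df) comparing transmitted vs. untransmitted allele counts.

    Each cell counts parents: rows are the transmitted allele, columns the
    untransmitted allele. Assumes both alleles occur in the pooled sample.
    """
    trans_1, trans_2 = t11 + t12, t21 + t22          # t_{1+}, t_{2+}
    untrans_1, untrans_2 = t11 + t21, t12 + t22      # t_{+1}, t_{+2}
    t = t11 + t12 + t21 + t22                        # number of parents
    # 2x2 table: rows = transmitted / untransmitted samples (each of size t),
    # columns = allele 1 / allele 2.
    cross = trans_1 * untrans_2 - trans_2 * untrans_1
    chi2 = (2 * t) * cross ** 2 / (
        t * t * (trans_1 + untrans_1) * (trans_2 + untrans_2)
    )
    p_value = math.erfc(math.sqrt(chi2 / 2.0))       # 1-df survival function
    return chi2, p_value

# Hypothetical counts
chi2, p_value = hhrr_statistic(20, 30, 10, 40)
```

Unlike the TDT, the diagonal cells (homozygous parents) enter through the margins, which is the extra information the HHRR uses.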

HRR & HHRR
• Most powerful when the recombination fraction is 0.
• Both assume random mating, since they treat the parents as providing an independent control genotype or control alleles.
• HHRR is more powerful than TDT because it uses information from homozygous parents.
• HHRR is a valid test statistic for $D_{AD} = 0$ and $\theta = 0$.