Association mapping for Mendelian and complex disorders. January 16, Bafna, BfB.


UG Bioinformatics specialization at UCSD

Abstraction of a causal mutation

Looking for the mutation in populations
A possible strategy is to collect cases (affected) and control individuals, and look for a mutation that consistently separates the two classes. Next, identify the gene.

Looking for the causal mutation in populations
Problem 1: There are many unrelated common mutations, around one every 1000 bp.

[Figure: cases and controls]

Looking for the causal mutation in populations
Problem 2: We may not sample the causal mutation.

How to hunt for disease genes
We are guided by two simple facts governing these mutations:
1. Nearby mutations are correlated.
2. Distal mutations are not.

This lecture
1. The bottom line: How do these facts help in finding disease genes?
2. The genetics: Why should this happen?
3. The computation.
4. The challenge of complex diseases.

The basics of association mapping
Sample a population of individuals at variant locations across the genome. Typically, these variants are single nucleotide polymorphisms (SNPs).
Create a new bi-allelic variant corresponding to cases and controls, and test each sampled variant for correlation with it.
By our assumptions, only the variants proximal to the causal mutation will be correlated.
Investigate genes near the correlated variants.
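The scan described on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's own procedure: the toy SNP matrix, the `status` vector, and the choice of squared correlation (r²) as the score are all assumptions made for the example.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 0/1 vectors."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # a monomorphic column carries no signal
    d = pxy - px * py
    return d * d / denom


def scan(snp_matrix, status):
    """Score every SNP column against the case/control variant."""
    columns = list(zip(*snp_matrix))
    return [r_squared(col, status) for col in columns]


# Hypothetical toy data: rows are chromosomes, columns are SNPs.
snps = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 1, 1],
]
status = [1, 1, 1, 0, 0, 0]  # 1 = case, 0 = control

scores = scan(snps, status)
best = max(range(len(scores)), key=scores.__getitem__)
print("most correlated SNP:", best, "scores:", scores)
```

In a real study the score would be replaced by one of the association tests discussed later in the lecture, and the genes near the top-scoring SNPs would be investigated.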

So, why should the proximal SNPs be correlated, and distal SNPs not?

A bit of evolution
Consider a population (of chromosomes) of fixed size evolving in time. Each individual arises from a unique, randomly chosen parent from the previous generation.

Genealogy of a chromosomal population
[Figure: genealogy over time, ending in the current (extant) population]
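The model above is easy to simulate. The sketch below is an assumption-laden illustration (the function names, population size, and seed are hypothetical): every chromosome picks one random parent per generation, and we then trace the extant population backwards to count how many ancestral chromosomes still have descendants.

```python
import random


def simulate_parents(pop_size, generations, seed=0):
    """parents[g][i] is the parent (in generation g) of individual i in generation g+1."""
    rng = random.Random(seed)
    return [[rng.randrange(pop_size) for _ in range(pop_size)]
            for _ in range(generations)]


def ancestor_counts(parents):
    """Number of chromosomes per generation (oldest first) with extant descendants."""
    pop_size = len(parents[-1])
    lineages = set(range(pop_size))      # start from the extant population
    counts = [pop_size]
    for parent_of in reversed(parents):  # walk back in time
        lineages = {parent_of[i] for i in lineages}
        counts.append(len(lineages))
    counts.reverse()
    return counts


parents = simulate_parents(pop_size=10, generations=50)
counts = ancestor_counts(parents)
print(counts[0], "ancestral chromosome(s) 50 generations back")
```

Going back in time the number of surviving lineages can only shrink, which is why the genealogy of the extant population looks like a tree.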

Adding mutations
Infinite sites assumption: A mutation occurs at most once at a site.

SNPs
The collection of acquired mutations in the extant population describes the SNPs.

Fixation and elimination
Not all mutations survive. Some mutations get fixed, and are no longer polymorphic.

Removing extinct genealogies

Removing fixed mutations

The coalescent

Disease mutation
We drop the ancestral chromosomes, and place the mutations on the internal branches.

Disease mutation
A causal mutation creates a clade of affected descendants.

Disease mutation
Note that the tree (genealogy) is hidden. However, the underlying tree topology introduces a correlation between each pair of SNPs.

What have we learnt?
The underlying genealogy creates a correlation between SNPs. By itself, this is not sufficient, because distal SNPs might also be correlated. Fortunately for us, the correlation between distal SNPs is quickly destroyed.

Recombination

Recombination
In our idealized model, we assume that each individual chromosome chooses two parental chromosomes from the previous generation.

Multiple recombinations change the local genealogy

A bit of evolution
Proximal SNPs are correlated, distal SNPs are not. The correlation (linkage disequilibrium) decays rapidly after 20-50 kb.
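One common way to make this decay precise, stated here as a sketch under the standard random-mating model rather than as the lecture's own derivation, uses the symbols below (all introduced for this example): p_A and p_B are allele frequencies at two SNPs, p_AB is the frequency of chromosomes carrying both alleles, and c is the per-generation recombination fraction between the two sites.

```latex
D = p_{AB} - p_A\, p_B, \qquad
r^2 = \frac{D^2}{p_A (1 - p_A)\, p_B (1 - p_B)}, \qquad
D_t = (1 - c)^t\, D_0
```

For proximal SNPs c is close to 0, so D persists for many generations. For distal SNPs c approaches 1/2, and D roughly halves every generation.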

BASIC STATISTICS

Testing for correlation
In the absence of correlation

Testing for correlation
When correlated

Assigning confidence
[Figure: expected vs. observed counts]

STATISTICAL TESTS OF ASSOCIATION

Tests for association: Pearson
Case-control phenotype:
– Build a 3x2 contingency table
– Pearson test (2 df): $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, where $E_i$ is the count expected in cell i when genotype and phenotype are independent

        Cases   Controls
mm      O1      O2
Mm      O3      O4
MM      O5      O6

The χ² test
Under the null hypothesis, the statistic approximately follows a χ² distribution (2 degrees of freedom for the 3x2 table). A p-value can be computed directly.
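As a worked example, the Pearson statistic for a 3x2 genotype table can be computed as below. The counts are made up for illustration, and the function names are hypothetical. The p-value uses the fact that a chi-square variable with exactly 2 degrees of freedom has tail probability exp(-x/2), so no statistics library is needed.

```python
import math


def pearson_chi2(table):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n  # under independence
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_2df(stat):
    """Upper tail of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Hypothetical counts; rows are mm, Mm, MM and columns are cases, controls.
table = [[30, 10],
         [40, 40],
         [30, 50]]
stat = pearson_chi2(table)
print(stat, p_value_2df(stat))
```

For tables of other shapes the degrees of freedom change, and the closed-form tail above no longer applies.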

χ² distribution properties
A related distribution is the F-distribution.

Likelihood ratio
Another way to check the extremeness of the observed distribution of counts is by computing a (log) likelihood ratio. We have two competing hypotheses. Let N be the total number of observations.

LLR
An LLR value close to 0 indicates that the data are consistent with the null hypothesis. Asymptotically, the LLR statistic also follows the chi-square distribution.
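The slide's formula is not preserved in this transcript. A common form of the log-likelihood ratio for a contingency table, used here as an assumption, is G = 2 Σ O ln(O/E), where O and E are the observed and expected cell counts. The sketch below computes it for a made-up 3x2 table.

```python
import math


def g_statistic(table):
    """G = 2 * sum of O * ln(O / E), with E computed under independence."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # an empty cell contributes nothing (limit of O ln O)
            expected = row_sums[i] * col_sums[j] / n
            total += observed * math.log(observed / expected)
    return 2 * total


# Hypothetical counts; rows are mm, Mm, MM and columns are cases, controls.
g = g_statistic([[30, 10], [40, 40], [30, 50]])
print(g)
```

With this scaling, G is compared against the same chi-square distribution as the Pearson statistic, and the two values are usually close when the counts are large.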

Exact test
The chi-square test does not work so well when the numbers are small. How can we compute an exact probability of seeing a specific distribution of values in the cells? Remember: we know the marginals (# cases, # controls,

Fisher exact test
        Cases   Controls
mm      a       b
Mm      c       d
MM      e       f
Num: number of ways of getting the configuration (a, b, c, d, e, f)
Den: number of ways of ensuring that the row sums and column sums are fixed

Fisher exact test
Remember that the probability of seeing any specific values in the cells is going to be small. To get a p-value, we must sum over all similarly extreme values. How?

Test for association: Fisher exact test
Here P is the probability of seeing the exact counts. The actual significance is computed by summing over all tables with the same marginals that are at least this extreme.
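A minimal sketch of the exact test for a 3x2 table follows. With the margins fixed, the probability of a table is a product of binomial coefficients divided by the number of ways to choose the cases. One assumption is made loudly here: "at least as extreme" is taken to mean "no more probable than the observed table", which is a common convention but not the only one. The function names and counts are hypothetical.

```python
from math import comb


def table_probability(case_counts, row_sums):
    """Probability of the case column given fixed row sums and number of cases."""
    n = sum(row_sums)
    n_cases = sum(case_counts)
    ways = 1
    for k, r in zip(case_counts, row_sums):
        ways *= comb(r, k)  # choose which of the r individuals are cases
    return ways / comb(n, n_cases)


def fisher_exact_3x2(table):
    """Sum the probabilities of all tables no more likely than the observed one."""
    row_sums = [sum(row) for row in table]
    observed = [row[0] for row in table]
    n_cases = sum(observed)
    p_obs = table_probability(observed, row_sums)
    p_value = 0.0
    for a in range(min(row_sums[0], n_cases) + 1):
        for c in range(min(row_sums[1], n_cases - a) + 1):
            e = n_cases - a - c
            if e > row_sums[2]:
                continue  # this table would violate the third row sum
            p = table_probability([a, c, e], row_sums)
            if p <= p_obs * (1 + 1e-9):  # tolerance for floating-point ties
                p_value += p
    return p_value


# Hypothetical counts; rows are mm, Mm, MM and columns are cases, controls.
p = fisher_exact_3x2([[3, 0], [1, 1], [0, 3]])
print(p)
```

The enumeration grows quickly with the counts, which is why the chi-square approximation is preferred when the cells are large.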

Continuous outcomes
Instead of discrete (case/control) data, we have real-valued phenotypes.
– Ex: Diastolic blood pressure
In this case, how do we test for association?

Continuous outcome ANOVA
Often, the phenotypes are not offered as case-controls but as a continuous variable.
– Ex: blood-pressure measurements
Question: Are the mean values of the two groups (MM and mm) significantly different?

Two-sided t-test
For two categories, ANOVA is also known as the t-test. Assume that the variables from the two sets are drawn from Normal distributions.
– Different means, equal variances
The null hypothesis is that they are both from the same distribution.

t-test continued

Two-sample t-test
As the variance is not known, we use a pooled estimate S, defined by $S^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$, where $S_1^2$ and $S_2^2$ are the sample variances of the two groups and $n_1$, $n_2$ their sizes. The T-statistic is given by $T = \frac{\bar{X}_1 - \bar{X}_2}{S \sqrt{1/n_1 + 1/n_2}}$. Significant deviations from 0 are used to reject the null hypothesis.
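A small sketch of the equal-variance test, assuming the standard pooled-variance form of the statistic: the two sample variances are averaged with weights n1 - 1 and n2 - 1, and the difference of means is divided by the resulting standard error. The phenotype values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate; returns (t, df)."""
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(s2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical phenotype values for the MM and mm groups.
t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```

The statistic is then compared against a t distribution with n1 + n2 - 2 degrees of freedom.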

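Two-sample t-test (unequal variances)
If the variances cannot be assumed to be equal, we use separate variance estimates $S_1^2$ and $S_2^2$ for the two groups. The T-statistic is given by $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$. Significant deviations from 0 are used to reject the null hypothesis.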
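The unequal-variance case is usually handled with Welch's form of the test, which is assumed in this sketch: each group contributes its own variance to the standard error, and the degrees of freedom come from the Welch-Satterthwaite approximation. The data are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom; returns (t, df)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # per-group squared standard errors
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical phenotype values with clearly different spreads.
t, df = welch_t([1, 2, 3, 4, 5], [0, 4, 8, 12])
print(t, df)
```

When the two groups have equal sizes and equal sample variances, this reduces to the pooled test.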

CONFOUNDING ASSOCIATION

Confounding association
Association tests can be confounded in many ways. We will explore a few of these, at a high level, and point to a few algorithmic problems.

Confounding association with population substructure
If the cases and controls are from different subpopulations, then sites with differing allele frequencies will confound association.
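A made-up numerical example shows the effect. In each of two hypothetical subpopulations the allele is independent of disease status, but the subpopulations differ both in allele frequency and in the fraction of cases. Pooling them produces a strong spurious association. The 2x2 allelic chi-square used here is an assumption chosen for brevity.

```python
def chi2_2x2(a, b, c, d):
    """Pearson statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Each table is (case m, case M, control m, control M), in chromosome counts.
pop_a = (64, 16, 16, 4)   # allele m at 80%, mostly cases
pop_b = (4, 16, 16, 64)   # allele m at 20%, mostly controls
pooled = tuple(x + y for x, y in zip(pop_a, pop_b))

print(chi2_2x2(*pop_a), chi2_2x2(*pop_b), chi2_2x2(*pooled))
```

Within each subpopulation the statistic is zero, while the pooled table gives a value that would be called highly significant.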

The algorithmic problem
Given a collection of individual genotypes, separate them into sub-populations.
Idea: Take markers that are very far apart, so that no LD is expected between them. Any LD observed between such markers indicates structure.
Problem: Partition individuals into sub-populations so that the correlation across pairs of distant markers within each sub-population is minimized. Should there be a penalty for increasing the number of sub-populations?

Confounding associations with genotypes
[Figure: a recombination event]
Distinct haplotypes can create identical genotypes, confounding association.
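The ambiguity is easy to see in code. In this sketch (the encoding is an assumption: haplotypes are 0/1 tuples, and a genotype is the per-site sum of two haplotypes, so 1 marks a heterozygous site), two different haplotype pairs collapse to the same genotype, and a genotype with k heterozygous sites is explained by 2^(k-1) distinct pairs.

```python
from itertools import product


def genotype(h1, h2):
    """Per-site count of the 1 allele across the two haplotypes."""
    return tuple(a + b for a, b in zip(h1, h2))


def haplotype_pairs(g):
    """All unordered haplotype pairs that are consistent with genotype g."""
    het = [i for i, x in enumerate(g) if x == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1 = [x // 2 for x in g]  # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # split each heterozygous site
        pairs.add(frozenset([tuple(h1), tuple(h2)]))
    return pairs


print(genotype((0, 0), (1, 1)), genotype((0, 1), (1, 0)))
print(len(haplotype_pairs((1, 2, 1, 0, 1))))
```

Resolving which pair is the true one is the haplotype inference (phasing) problem.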

Confounding association with interactions
Individually, the markers do not correlate. Together, they perfectly predict genes. Find interacting partners that associate with genes.
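A toy example of such an interaction, under the assumption that case status is the XOR of two markers (only one of many possible interaction models): each marker alone has zero correlation with the status, while the pair predicts it perfectly. The data and the pairwise scan are hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 0/1 vectors."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    return 0.0 if denom == 0 else (pxy - px * py) ** 2 / denom


# Hypothetical data: each row is one SNP across 8 individuals.
snps = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
status = [0, 1, 1, 0, 0, 1, 1, 0]  # equals SNP 0 XOR SNP 1

single = [r_squared(s, status) for s in snps]
pairwise = {
    (i, j): r_squared([a ^ b for a, b in zip(snps[i], snps[j])], status)
    for i, j in combinations(range(len(snps)), 2)
}
best_pair = max(pairwise, key=pairwise.get)
print(single, best_pair)
```

The number of pairs grows quadratically with the number of markers, which is what makes the search an algorithmic problem.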

Confounding association with rare variants
Not only can we have multiple interacting SNPs, but each SNP may individually occur with very low frequency (< 1%). Can you detect associations with rare variants?

Other problems
Can we reconstruct the phylogeny? Useful for computing recombination bounds.
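One standard entry point, offered here as a sketch rather than as the method the lecture has in mind, is the four-gamete test. Under the infinite sites assumption and without recombination, no pair of sites can show all four combinations 00, 01, 10, 11. Pairs that do show all four are incompatible with a single tree, and each such pair witnesses at least one recombination event under that model. The haplotypes below are invented.

```python
def four_gamete_violations(haplotypes):
    """Return the site pairs (i, j) at which all four gametes are observed."""
    num_sites = len(haplotypes[0])
    violations = []
    for i in range(num_sites):
        for j in range(i + 1, num_sites):
            gametes = {(h[i], h[j]) for h in haplotypes}
            if len(gametes) == 4:
                violations.append((i, j))
    return violations


# Hypothetical haplotypes (rows) over three SNPs (columns).
tree_like = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1)]
print(four_gamete_violations(tree_like))
print(four_gamete_violations(tree_like + [(0, 1, 0)]))
```

An empty result means the sites are pairwise compatible, so a perfect phylogeny exists for these haplotypes.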

Conclusion
As individual genomes are sequenced, the association of variations with phenotypes presents many confounding challenges. Some of these challenges can be modeled as algorithmic problems. Population genetics should be part of a bioinformatics undergraduate curriculum.

Thank you
Homework (due Monday, March 15)
– Describe an algorithm to detect associations of interacting rare variants with a complex disease phenotype, in the presence of population substructure in the case-control population.