
Genome-wide association studies BNFO 602 Roshan

Application of SNPs: association with disease

Experimental design to detect cancer-associated SNPs:
– Pick random humans with and without cancer (say, breast cancer)
– Perform SNP genotyping
– Look for associated SNPs
– This design is also called a genome-wide association study (GWAS)

Case-control example

Study of 100 people:
– Case: 50 subjects with cancer
– Control: 50 subjects without cancer

Count the number of each allele and form a contingency table. Each subject carries two copies of the SNP, so each row totals 100 alleles:

           #Allele1   #Allele2
Case          10         90
Control        2         98
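A minimal sketch of the counting step. It assumes each subject's genotype is coded as the number of copies of allele 1 they carry (0, 1 or 2). That encoding and the genotype lists are hypothetical, chosen only so the totals reproduce the table above.

```python
def allele_counts(genotypes):
    """Return (#allele1, #allele2) for a list of genotypes.

    Each genotype is assumed (hypothetical encoding) to be the number of
    copies of allele 1 carried by a diploid subject: 0, 1 or 2.
    """
    n_allele1 = sum(genotypes)
    n_allele2 = 2 * len(genotypes) - n_allele1  # two alleles per subject
    return n_allele1, n_allele2


# Hypothetical genotypes for 50 cases and 50 controls.
case_genotypes = [2] * 3 + [1] * 4 + [0] * 43
control_genotypes = [1] * 2 + [0] * 48

table = {
    "Case": allele_counts(case_genotypes),
    "Control": allele_counts(control_genotypes),
}
for group, (n1, n2) in table.items():
    print(f"{group:8s} {n1:4d} {n2:4d}")
```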

Odds ratio

           #Allele1   #Allele2
Cancer         a          b
Healthy        c          d

Odds of allele 1 in cancer = a/b = e
Odds of allele 1 in healthy = c/d = f
Odds ratio of allele 1 in cancer vs healthy = e/f
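Written out in terms of the table entries, the odds ratio reduces to the cross-product ratio of the table. A value above 1 means allele 1 has higher odds among cancer cases than among healthy subjects.

```latex
\mathrm{OR} = \frac{e}{f} = \frac{a/b}{c/d} = \frac{ad}{bc}
```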

Example

           #Allele1   #Allele2
Case          15         35
Control        2         48

Odds of allele 1 in case = 15/35
Odds of allele 1 in control = 2/48
Odds ratio of allele 1 in case vs control = (15/35)/(2/48) ≈ 10.3
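The same calculation as a small helper, sketched under the table layout used on these slides (a, b = case counts of allele 1 and 2; c, d = control counts). The function name is illustrative. It uses the cross-product form ad/(bc), which still works when d is zero.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of allele 1 in cases vs controls for the 2x2 table

                #Allele1  #Allele2
        Case        a         b
        Control     c         d
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)  # same value as (a/b) / (c/d)


print(round(odds_ratio(15, 35, 2, 48), 1))  # 10.3
```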

Statistical test of association (P-values)

P-value = probability of the observed data (or data at least as extreme) under the null hypothesis

Example:
– Suppose we are given a series of coin tosses
– We suspect that a biased coin produced the tosses
– We can ask: how likely is a fair coin to produce tosses at least as extreme as these?
– If this probability is very small, the tosses are hard to explain with a fair coin, so we reject the fair-coin hypothesis
– In this example the null hypothesis is the fair coin and the alternative hypothesis is the biased coin
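The definition can also be made concrete by simulation. This Monte Carlo sketch is a swapped-in technique, not the exact method the slides use. It generates many sequences of 20 fair-coin tosses and counts how often a fair coin produces at least 15 heads, the count in the toss sequence on the hypothesis-testing slide. The number of simulations and the seed are arbitrary choices.

```python
import random


def simulated_p_value(n_tosses, observed_heads, n_sims=100_000, seed=1):
    """Estimate P(#heads >= observed_heads) for a fair coin by simulation."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        # Each bit of a random n_tosses-bit integer is one fair toss (1 = heads).
        heads = bin(rng.getrandbits(n_tosses)).count("1")
        if heads >= observed_heads:
            extreme += 1
    return extreme / n_sims


print(simulated_p_value(20, 15))  # close to the exact binomial tail, about 0.021
```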

Binomial distribution

Bernoulli random variable:
– Two outcomes: success or failure
– Example: coin toss
Binomial random variable:
– Number of successes in a series of independent Bernoulli trials
Example:
– Probability of heads = 0.5
– Given four coin tosses, what is the probability of three heads?
– Possible outcomes: HHHT, HHTH, HTHH, THHH
– Each outcome has probability 0.5^4
– Total probability = 4 × 0.5^4 = 0.25
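A brute-force check of the four-toss example. It lists every sequence, keeps those with exactly three heads, and multiplies their count by the probability of any single such sequence.

```python
from itertools import product

p_heads = 0.5
n_tosses = 4

# All 2^4 = 16 sequences of four tosses.
outcomes = ["".join(seq) for seq in product("HT", repeat=n_tosses)]
three_heads = [o for o in outcomes if o.count("H") == 3]

# Each sequence with 3 heads and 1 tail has probability p^3 (1 - p).
prob = len(three_heads) * p_heads**3 * (1 - p_heads)

print(sorted(three_heads))  # ['HHHT', 'HHTH', 'HTHH', 'THHH']
print(prob)                 # 0.25
```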

Binomial distribution

Bernoulli trial: probability of success = p, probability of failure = 1 − p
Given n independent Bernoulli trials, what is the probability of k successes?
P(k successes) = C(n, k) p^k (1 − p)^(n − k), where C(n, k) = n! / (k! (n − k)!)
Binomial applet:
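A direct translation of the binomial probability formula, sketched with the standard library's `math.comb`. The function name is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(binomial_pmf(3, 4, 0.5))  # 0.25, the four-toss example
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))  # sums to 1 (up to rounding)
```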

Hypothesis testing under the binomial distribution

Null hypothesis: fair coin (probability of heads = probability of tails = 0.5)
Data: HHHHTHTHHHHHHHTHTHTH (15 heads in 20 tosses)
P-value under the null hypothesis = probability that #heads >= 15
This probability is about 0.0207
Since it is below 0.05, we can reject the null hypothesis
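The p-value on this slide is a one-sided binomial upper-tail probability. A minimal sketch that sums the binomial probabilities from 15 to 20 heads:

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided p-value."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


tosses = "HHHHTHTHHHHHHHTHTHTH"
heads = tosses.count("H")                    # 15 heads out of 20
p_value = binomial_upper_tail(heads, len(tosses))
print(heads, round(p_value, 4))              # 15 0.0207
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```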

Null hypothesis for case-control contingency table

           #allele1   #allele2
case           a          b
control        c          d

We have two random variables:
– X: disease status
– A: allele type
Null hypothesis: the two variables are independent of each other (unrelated)
Under independence:
– P(X=case and A=1) = P(X=case) P(A=1)
Expected number of cases with allele 1 is
– P(X=case) P(A=1) N
– where N is the total number of observations
P(X=case) = (a+b)/N
P(A=1) = (a+c)/N
What is the expected number of controls with allele 2?
Do the probabilities sum to 1?
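One way to check both questions numerically. This sketch builds the four joint probabilities as products of the marginals and uses exact fractions so the sum is exactly 1. The function name is illustrative, and the counts are the 15/35/2/48 example table.

```python
from fractions import Fraction


def joint_probabilities(a, b, c, d):
    """Joint cell probabilities of a 2x2 case/control table under independence.

    Layout as on the slide: row case = (a, b), row control = (c, d);
    columns are #allele1 and #allele2.
    """
    n = a + b + c + d
    p_status = {"case": Fraction(a + b, n), "control": Fraction(c + d, n)}
    p_allele = {1: Fraction(a + c, n), 2: Fraction(b + d, n)}
    # Independence: P(X = x and A = y) = P(X = x) * P(A = y).
    return {(x, y): px * py for x, px in p_status.items() for y, py in p_allele.items()}


a, b, c, d = 15, 35, 2, 48
joint = joint_probabilities(a, b, c, d)
n = a + b + c + d
print(sum(joint.values()))                # 1
print(float(joint[("control", 2)] * n))   # expected controls with allele 2: 41.5
```

The probabilities sum to 1 because the total factors as (P(case) + P(control)) × (P(A=1) + P(A=2)) = 1 × 1.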

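Chi-square statistic

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

O_i = observed frequency for the i-th outcome
E_i = expected frequency for the i-th outcome
n = total number of outcomes (categories)

For large samples, the distribution of this statistic is approximately the chi-square distribution with n − 1 degrees of freedom when the expected frequencies are fixed in advance (goodness of fit). When the expected counts of an r × c contingency table are estimated from its margins, as in the independence test here, the degrees of freedom are (r − 1)(c − 1), which is 1 for a 2 × 2 table. Proof can be found at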
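The formula is a one-line sum. As an illustration, this sketch applies it to the coin-toss data from the hypothesis-testing slide (15 heads, 5 tails), where a fair coin predicts 10 of each. Because the expected counts are fixed in advance, there is n − 1 = 1 degree of freedom.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum over outcomes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Goodness of fit of 15 heads / 5 tails to a fair coin (10 / 10 expected).
stat = chi_square_statistic([15, 5], [10, 10])
print(stat)  # 5.0
```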

Chi-square

Using chi-square, we can test how well the observed values fit the expected values computed under the independence hypothesis.
We could also test the data under a multinomial or multivariate normal distribution with probabilities given by the independence assumption. This would require the cumulative distribution functions of the multinomial and multivariate normal distributions, which are hard to compute. Chi-square p-values are easier to compute.

Case control

           #allele1   #allele2
case           a          b
control        c          d

E1: expected cases with allele 1
E2: expected cases with allele 2
E3: expected controls with allele 1
E4: expected controls with allele 2

N = a + b + c + d
E1 = ((a+b)/N) ((a+c)/N) N = (a+b)(a+c)/N
E2 = (a+b)(b+d)/N
E3 = (c+d)(a+c)/N
E4 = (c+d)(b+d)/N

Now compute the chi-square statistic.
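The four expected counts and the resulting statistic, sketched as two small helpers (names are illustrative). This is the plain Pearson statistic with no continuity correction. The usage line applies it to the 10/90/2/98 table from the case-control example.

```python
def expected_counts(a, b, c, d):
    """Expected counts E1..E4 for the 2x2 table under independence.

    E1: cases with allele 1      E2: cases with allele 2
    E3: controls with allele 1   E4: controls with allele 2
    """
    n = a + b + c + d
    e1 = (a + b) * (a + c) / n
    e2 = (a + b) * (b + d) / n
    e3 = (c + d) * (a + c) / n
    e4 = (c + d) * (b + d) / n
    return e1, e2, e3, e4


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table (no continuity correction)."""
    observed = (a, b, c, d)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected_counts(a, b, c, d)))


print(expected_counts(10, 90, 2, 98))           # (6.0, 94.0, 6.0, 94.0)
print(round(chi_square_2x2(10, 90, 2, 98), 2))  # 5.67
```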

Chi-square statistic

           #Allele1   #Allele2
Case          15         35
Control        2         48

Compute the expected values and the chi-square statistic.
Compute the chi-square p-value by referring to the chi-square distribution (1 degree of freedom for a 2 × 2 table).
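A sketch of the whole exercise in one function, using only the standard library. It relies on one fact about chi-square with 1 degree of freedom: it is the square of a standard normal, so its upper tail is P(χ² > x) = erfc(√(x/2)). That shortcut holds only for 1 degree of freedom. The function name is illustrative.

```python
from math import erfc, sqrt


def chi_square_2x2_test(a, b, c, d):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (expected counts, statistic, p-value). With 1 degree of freedom
    the upper tail is P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    expected = (
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    )
    stat = sum((o - e) ** 2 / e for o, e in zip((a, b, c, d), expected))
    p_value = erfc(sqrt(stat / 2))
    return expected, stat, p_value


expected, stat, p_value = chi_square_2x2_test(15, 35, 2, 48)
print(expected)        # (8.5, 41.5, 8.5, 41.5)
print(round(stat, 2))  # 11.98
print(p_value)         # about 0.0005
```

To cross-check against SciPy's `scipy.stats.chi2_contingency`, pass `correction=False`. By default it applies Yates' continuity correction to 2 × 2 tables and so reports a smaller statistic.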