Analysis of Differential Expression: T-test, ANOVA, Non-parametric methods, Correlation, Regression

Research Question
Do nicotine-exposed rats have different expression of gene X in the ventral tegmental area (VTA) than control rats?
- Design an experiment in which treatment rats (N > 2) are exposed to nicotine and control rats (N > 2) are exposed to saline.
- Collect RNA from the VTA and convert it to cDNA.
- Determine the amount of X transcript in each individual.
- Perform a test of means that takes into account the variability within each group.

Observed difference between groups
An observed difference may be due to:
- the treatment
- chance

Hypothesis Testing
- Null hypothesis: there is no difference between the means of the groups.
- Alternative hypothesis: the means of the groups are different.

Hypothesis testing
- You cannot accept the null hypothesis.
- You can reject it.
- Or you can fail to reject it (the data are consistent with it, but do not prove it).

P-value
The 'P' stands for probability. The P value is the probability of observing a difference between groups at least as large as the one actually observed if chance alone were at work (that is, if the null hypothesis were true).

P-value
- There is a statistically significant difference between groups if the P value is small enough (e.g., < 0.05).
- That threshold, the significance level, is the probability of a type I error: if we reject the null hypothesis whenever P < 0.05, we will wrongly reject a true null hypothesis 5% of the time.
- Type I error: wrongly concluding that there is a difference between groups (false positive).
- Type II error: wrongly concluding that there is no difference between groups (false negative).

Multiple tests on the same data
- Expression data are collected on many genes from the same individuals.
- Subsets of genes are co-regulated, so their measurements are not independent.
- Analyzing such data involves multiple tests.

Why not do multiple t-tests? (Or, if you do, adjust the P values.)
Because doing so inflates the type I error. In a study involving four treatments, there are six possible pairwise comparisons. If the chance of a type I error in one such comparison is 0.05, then the chance of not committing a type I error in it is 1 – 0.05 = 0.95. Assuming the comparisons are independent, the chance of not committing a type I error in any of the six is $0.95^6 \approx 0.735$, so the cumulative type I error is $1 - 0.95^6 \approx 0.26$.
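The cumulative type I error above can be reproduced with a few lines of Python. This is a minimal sketch; like the calculation on the slide, it assumes the comparisons are independent, and the function name `familywise_error` is hypothetical.

```python
from math import comb


def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one type I error across m independent comparisons."""
    # P(no type I error in one test) = 1 - alpha; across m independent tests
    # it is (1 - alpha) ** m, and the complement is the cumulative error.
    return 1 - (1 - alpha) ** m


n_treatments = 4
m = comb(n_treatments, 2)  # number of pairwise comparisons between treatments
print(m, round(familywise_error(0.05, m), 3))  # prints: 6 0.265
```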

Normal Distribution
A normal distribution is entirely defined by two quantities: its mean and its standard deviation (SD).
- The mean determines where the peak occurs.
- The SD determines the spread (width) of the curve.

[Figure: normal curves with the same mean and different SDs]

Rules of the normal distribution
- 68.3% of the distribution falls within 1 SD of the mean (i.e., between mean – SD and mean + SD);
- 95.4% of the distribution falls between mean – 2 SD and mean + 2 SD;
- 99.7% of the distribution falls between mean – 3 SD and mean + 3 SD.

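Most commonly used rule
95% of the distribution falls between mean – 1.96 SD and mean + 1.96 SD. If the data are normally distributed, this gives a range (a reference range) within which 95% of the individual values fall. This is different from a confidence interval, which describes the uncertainty in an estimate such as the mean.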
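The percentages above can be checked with Python's standard `statistics.NormalDist` class (Python 3.8+). The helper `fraction_within` is a hypothetical name used only for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # standard normal: mean 0, SD 1


def fraction_within(k: float) -> float:
    """Fraction of any normal distribution lying within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")

# The multiplier that leaves 2.5% in each tail (95% in the middle).
print(round(std_normal.inv_cdf(0.975), 2))  # prints: 1.96
```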

A sample
- Samples vary.
- Samples are collected in limited numbers.
- They are representative of a population.
- Example of a sample: RNA from nicotine-treated rats.

Sample means
Consider all possible samples of a fixed size (n) drawn from a population. Each of these samples has its own mean, and these means vary between samples. Each sample also has its own distribution, and therefore its own SD.

Population mean
The mean of all the sample means is equal to the population mean (μ). The SD of the sample means measures how far individual sample means deviate from the population mean (μ); it is called the standard error (SE) of the mean.
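A small simulation makes these two claims concrete. The population (mean 10, SD 2), the sample size of 25 and the number of repetitions are arbitrary assumptions for illustration; the mean of the simulated sample means should come out close to 10, and their SD close to SD/√n = 2/5 = 0.4.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
MU, SIGMA, N = 10.0, 2.0, 25  # hypothetical population mean, SD, and sample size
N_SAMPLES = 20_000            # number of repeated samples to draw

# Draw many samples of size N and record each sample's mean.
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to MU
sd_of_means = statistics.stdev(sample_means)    # should be close to SIGMA / sqrt(N)
print(round(mean_of_means, 3), round(sd_of_means, 3), SIGMA / N ** 0.5)
```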

Standard error
The SE is estimated as $SE = SD/\sqrt{n}$, so it reflects the effect of sample size: the larger the SE, the higher the variability or the smaller the sample size.

Confidence Intervals
A confidence interval gives a range of values within which the true population value is likely to lie. It is defined as follows:
- 95% confidence interval: (sample mean – 1.96 SE) to (sample mean + 1.96 SE);
- 99% confidence interval: sample mean ± 2.58 SE.
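A minimal sketch of the SE and the normal-approximation intervals above, using only the Python standard library. The expression values and function names are hypothetical. With a sample this small, a multiplier from the t distribution with n - 1 degrees of freedom would be more appropriate than 1.96; the z multipliers are used here only to mirror the formulas on the slide.

```python
import math
import statistics


def standard_error(data: list[float]) -> float:
    """SE of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(data) / math.sqrt(len(data))


def z_confidence_interval(data: list[float], z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation CI: mean +/- z * SE (z = 1.96 for 95%, 2.58 for 99%)."""
    m = statistics.fmean(data)
    se = standard_error(data)
    return (m - z * se, m + z * se)


# Hypothetical expression values (arbitrary units), for illustration only.
values = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.5, 4.9]
print(standard_error(values))
print(z_confidence_interval(values))          # 95% interval
print(z_confidence_interval(values, z=2.58))  # 99% interval
```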

T-distribution
The t-distribution is similar in shape to the normal distribution, being symmetrical and unimodal, but is generally more spread out, with longer tails. Its exact shape depends on a quantity known as the 'degrees of freedom', which in this context (a single sample) is equal to the sample size minus 1.

[Figure: t-distribution]

One-sample t-test
- Null hypothesis: the population mean does not differ from a hypothesized value, e.g., 0 (H0: μ = 0).
- A t statistic is calculated: $t = (\bar{x} - \mu_0)/SE$, i.e., the number of SEs that separate the sample mean from the hypothesized value $\mu_0$.
- The associated P value is obtained by comparison with the t distribution on n – 1 degrees of freedom.
- The larger the t statistic, the lower the probability of obtaining such a large value by chance, so the smaller (more significant) the P value.
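The t statistic for a one-sample test can be computed directly from the definition above. This sketch uses hypothetical log fold-changes and returns only t and its degrees of freedom; the P value would then be read from a t table or, if SciPy is available, from `scipy.stats.ttest_1samp`, which performs the same test.

```python
import math
import statistics


def one_sample_t(data: list[float], mu0: float = 0.0) -> tuple[float, int]:
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)  # SE of the sample mean
    t = (statistics.fmean(data) - mu0) / se     # number of SEs from mu0
    return t, n - 1


# Hypothetical log fold-changes for one gene (illustration only).
log_fc = [0.4, 0.9, 0.2, 0.7, 0.5, 0.8]
t, df = one_sample_t(log_fc, mu0=0.0)
print(f"t = {t:.2f} on {df} df")
```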

Paired t-test
Used with paired data. Paired data arise in a number of different situations, such as:
- a matched case–control study, in which individual cases and controls are matched to each other; or
- a repeated-measures study, in which some measurement is made on the same set of individuals on more than one occasion.
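A paired t-test is a one-sample t-test applied to the within-pair differences, testing whether their mean is 0. A minimal sketch with hypothetical measurements and a hypothetical function name (SciPy's `scipy.stats.ttest_rel` performs the same test, if available):

```python
import math
import statistics


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Paired t-test: one-sample t-test of the differences against 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # after - before, per subject
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / se, n - 1


# Hypothetical measurements on the same four subjects on two occasions.
t, df = paired_t([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 5.0])
print(f"t = {t:.2f} on {df} df")
```

Working on the differences is what removes the between-subject variability that an unpaired test would have to absorb.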

Paired t-test

Two-sample t-test
- Compares two groups with unpaired data, e.g., individuals in a treatment group versus those in a control group for a particular variable.
- There are now two independent populations, and thus two SDs.

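Calculation of pooled SD
The pooled SD for the difference in means is calculated as follows:
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
where $n_1, n_2$ are the sample sizes and $s_1, s_2$ the SDs of the two groups.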

Calculation of pooled SE
The pooled estimate gives more weight to the larger sample (if sample sizes are unequal), because it is likely to be more reliable. The SE of the difference in means is calculated as follows:
$$SE = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where $s_p$ is the pooled SD and $n_1, n_2$ are the sample sizes.

Two-sample t-test
Compares the means of two groups using a t statistic and Student's t distribution. The t statistic is obtained by dividing the difference between the sample means by the standard error of the difference:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$$

T-statistic A P value may be obtained by comparison with the t distribution on n1 + n2 – 2 degrees of freedom. Again, the larger the t statistic, the smaller the P value will be.
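Because the worked example below reports group means and SDs rather than raw values, this sketch computes the equal-variance (pooled) two-sample t statistic from summary statistics, following the pooled SD and SE formulas for the two-sample t-test. The function name and the numbers in the usage line are hypothetical. If SciPy is available, `scipy.stats.ttest_ind_from_stats` performs the same test and also returns the P value.

```python
import math


def pooled_t_from_summary(mean1: float, sd1: float, n1: int,
                          mean2: float, sd2: float, n2: int) -> tuple[float, float, int]:
    """Return (t, SE of the difference, df) for an equal-variance two-sample t-test."""
    # Pooled SD: each group's variance is weighted by its degrees of freedom,
    # so the larger sample contributes more.
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    # SE of the difference between the two means.
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, se, n1 + n2 - 2


# Hypothetical summary statistics for two independent groups.
t, se, df = pooled_t_from_summary(12.0, 3.0, 10, 9.0, 3.5, 12)
print(f"SE = {se:.2f}, t = {t:.2f}, df = {df}")
```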

Example: X gene expression

                  Tumor    Control
No. of samples    …        …
Mean              81       95
SD                18       19

Calculation of SD

Calculation of SE

T-statistic
t = (95 – 81) / 2.41 = 14 / 2.41 = 5.81, which corresponds to a very small P value (well below 0.05). We therefore reject the null hypothesis that the population means of the two groups do not differ.

Analysis of Variance (ANOVA)
A technique for analyzing how the mean of a variable is affected by different types and combinations of factors, e.g., the effect of three different diets on total serum cholesterol.

Sample Experiment Variance:

Sum of squares calculations: total, within groups, and between groups
$$SS_{total} = SS_{between} + SS_{within}$$

Degrees of freedom

Sources of variation
A P value of … means that at least two of the treatment group means are different (the ANOVA itself does not say which ones).
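The sums of squares, degrees of freedom and F ratio behind an ANOVA table can be sketched in plain Python. The cholesterol values for three diets are hypothetical, as is the function name; the P value would come from the F distribution with (k - 1, N - k) degrees of freedom, e.g. via `scipy.stats.f_oneway` if SciPy is available.

```python
import statistics


def one_way_anova(*groups: list[float]) -> tuple[float, int, int]:
    """Return the F statistic and its (between, within) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([x for g in groups for x in g])
    means = [statistics.fmean(g) for g in groups]
    # Between-group SS: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group SS: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # SS_total = ss_between + ss_within.
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical total serum cholesterol (mmol/L) under three diets.
diet_a = [5.2, 5.6, 5.1, 5.4]
diet_b = [4.8, 4.9, 5.0, 4.6]
diet_c = [5.9, 6.1, 5.7, 6.0]
print(one_way_anova(diet_a, diet_b, diet_c))
```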

Multiple Tests
- Post hoc comparisons are made between pairs of treatments.
- The overall type I error rate increases with the number of pairwise comparisons.
- The overall type I error rate must be kept at 0.05 across all of the comparisons.

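Bonferroni Adjustment
- Test each comparison at a significance level of 0.05 / (number of tests).
- Drawback: it can be too conservative, especially when there are many tests.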
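A minimal sketch of the Bonferroni adjustment, in two equivalent forms: comparing each P value with 0.05 divided by the number of tests, or multiplying each P value by the number of tests (capped at 1) and comparing with 0.05. The six P values and the function names are hypothetical.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag the tests that stay significant after a Bonferroni correction."""
    threshold = alpha / len(p_values)  # per-test significance level
    return [p < threshold for p in p_values]


def bonferroni_adjusted(p_values: list[float]) -> list[float]:
    """Equivalent adjusted P values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical P values from six pairwise comparisons of four treatments.
pvals = [0.001, 0.008, 0.012, 0.03, 0.2, 0.6]
print(bonferroni(pvals))
print(bonferroni_adjusted(pvals))
```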

Nonparametric Methods
- Many statistical methods rely on assumptions; e.g., the t-test assumes that the samples come from normally distributed populations.
- Data that violate these assumptions may require transformation.
- Nonparametric methods require few or no assumptions about the underlying distribution.

Wilcoxon signed rank test for paired data

Wilcoxon signed rank test

Central venous oxygen saturation on admission and after 6 h in the ICU
- Patients have SvO2 values on admission and after 6 hours.
- Take the difference between each pair of data points.

Central venous oxygen saturation on admission and after 6 h in the ICU
- Rank the differences by their absolute value, regardless of sign.
- Give each rank the sign of its difference.

Calculate:
- the sum of the positive ranks (R+);
- the sum of the negative ranks (R-).
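The ranking steps above can be sketched in Python. Two assumptions not stated on the slides: pairs with a zero difference are dropped (a common convention), and tied absolute differences receive the average of their ranks. The saturation values and function names are hypothetical. The test statistic is then usually the smaller of R+ and R-, compared with a table of critical values; `scipy.stats.wilcoxon` computes a P value directly if SciPy is available.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def signed_rank_sums(before: list[float], after: list[float]) -> tuple[float, float]:
    """Return (R+, R-) for the Wilcoxon signed rank test on paired data."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])           # rank ignoring sign
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus


# Hypothetical saturation values (%) on admission and after 6 h.
admission = [60, 65, 58, 70, 62]
six_hours = [63, 64, 62, 70, 64]
r_plus, r_minus = signed_rank_sums(admission, six_hours)
print(r_plus, r_minus, min(r_plus, r_minus))
```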

Sum of positive and negative ranks

Critical values for the Wilcoxon signed rank (WSR) test when n = 10

Wilcoxon rank-sum (Mann-Whitney) test
The Wilcoxon signed rank test is suitable for paired data. For unpaired data, the Wilcoxon rank-sum test is used.

Steps of Wilcoxon rank-sum test

Total drug doses in patients with a 3- to 5-day stay in the intensive care unit
- Rank all observations in increasing order, regardless of group.
- If values tie, use the average rank.
- Add up the ranks in each group.
- Select the smaller sum and calculate a P value for it.
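The rank-sum steps above can be sketched as follows: pool both groups, rank all observations with average ranks for ties, and sum the ranks within each group. The doses and function names are hypothetical; the smaller sum would then be compared with tabulated critical values, or `scipy.stats.mannwhitneyu` could supply a P value if SciPy is available.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sums(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Sum of ranks in each group after ranking all observations together."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)  # ranking ignores group membership
    r_a = sum(ranks[: len(group_a)])
    r_b = sum(ranks[len(group_a):])
    return r_a, r_b


# Hypothetical total drug doses in two groups of ICU patients.
group_1 = [1.2, 3.4, 2.2]
group_2 = [4.0, 3.4, 5.1, 6.0]
print(rank_sums(group_1, group_2))
```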

Critical values

Correlation and Regression
- Correlation quantifies the strength of the relationship between two paired samples.
- Regression expresses the relationship in the form of an equation.
- Example: whether two genes, X and Y, are co-regulated, or whether the expression level of gene X can be predicted from the expression level of gene Y.

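Product moment correlation coefficient (r)
r lies between -1 and +1: -1 indicates a perfect negative linear relationship, +1 a perfect positive linear relationship, and 0 no linear relationship.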
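A minimal sketch of the product moment correlation coefficient, computed from its definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squares. The expression values and the function name are hypothetical; on Python 3.10+, `statistics.correlation` gives the same value.

```python
import math
import statistics


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product moment correlation coefficient of paired samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # sum of squares of x
    syy = sum((b - my) ** 2 for b in y)                   # sum of squares of y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression levels of genes X and Y in six samples.
gene_x = [2.1, 2.9, 3.3, 4.0, 4.8, 5.5]
gene_y = [1.8, 2.5, 3.6, 3.7, 5.0, 5.1]
print(round(pearson_r(gene_x, gene_y), 3))
```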

Age and urea for 20 patients in an emergency unit

[Figure: scattergram of age and urea for the 20 patients, r = 0.62]

Confidence intervals around r

Confidence of r
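One common way to put a confidence interval around r is Fisher's z transformation: z = artanh(r) is approximately normal with SE 1/√(n - 3), so the interval is built on the z scale and transformed back with tanh. The slides do not say which method they used; this sketch simply applies Fisher's method to the r = 0.62 from 20 patients, and the function name is hypothetical.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate CI for Pearson's r via Fisher's z transformation."""
    z = math.atanh(r)          # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)  # SE of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale


lo, hi = fisher_ci(0.62, 20)
print(f"95% CI for r: {lo:.2f} to {hi:.2f}")
```

The interval is not symmetric around r, because the tanh back-transformation compresses values near ±1.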

Misuse of correlation
- A third variable may be related to both of the variables.
- Correlation does not imply causation.
- A nonlinear relationship may exist that r does not capture.

Regression

Method of least squares
The regression line is obtained using the method of least squares. Any line y = a + bx drawn through the points gives a predicted, or fitted, value of y for each value of x in the dataset. For a particular value of x, the vertical difference between the observed and the fitted value of y is known as the deviation, or residual. The method of least squares finds the values of a and b that minimize the sum of the squares of all the deviations.
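Minimizing the sum of squared residuals gives the closed-form estimates b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. A minimal sketch with made-up age and urea values (not the slide's data) and hypothetical function names; on Python 3.10+, `statistics.linear_regression` returns the same slope and intercept.

```python
import statistics


def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return intercept a and slope b of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx  # the line passes through (mean of x, mean of y)
    return a, b


def residuals(x: list[float], y: list[float], a: float, b: float) -> list[float]:
    """Observed minus fitted values of y."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Made-up ages (years) and urea levels (mmol/L), for illustration only.
age = [25, 40, 55, 70, 85]
urea = [4.0, 4.6, 5.9, 6.3, 7.8]
a, b = least_squares(age, urea)
print(f"urea = {a:.2f} + {b:.3f} * age")
print([round(e, 2) for e in residuals(age, urea, a, b)])
```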

Age and urea level

Residuals

Method of least squares