Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 7 – T-tests Marshall University Genomics Core Facility.

Presentation transcript:

T-tests
- T-tests are a family of statistical tests in which a test statistic, computed from a mean or from the difference between two means, is assumed to follow a t-distribution.
- Like any statistical test, a t-test computes a p-value: the probability of seeing a mean, or difference of means, at least this extreme, assuming the null hypothesis is true.
- Interpreting a t-test therefore requires knowing the null hypothesis.

Types of T-test
- One-class T-test: the null hypothesis is that the mean of a set of values is equal to some fixed value.
- Two-class T-tests compare two groups of values:
  - Unpaired T-test (the most common T-test): the null hypothesis is that the values in each group are sampled from distributions with equal means.
  - Paired T-test: each sample in the first group is paired with a sample in the second group. The null hypothesis is that the mean of the differences within pairs is zero.

Example: One-class T-test
- Recall our body-temperature data from earlier: n=130 samples of body temperature, with mean m=36.82 °C and SD s=0.41 °C.
- We wanted to use these data to test the hypothesis that mean body temperature is μ=37 °C.
- Under the assumptions of the one-class t-test, the value t=(μ-m)/(s/√n) follows a t-distribution with n-1 degrees of freedom.
- For this example, t=(37-36.82)/(0.41/√130)=5.006.
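
The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch using only the summary statistics quoted above; the function name `one_sample_t` is an illustrative choice, not something from the lecture.

```python
import math

def one_sample_t(sample_mean, sample_sd, n, null_mean):
    """Return (t, degrees of freedom) for a one-class t-test from summary statistics.

    Uses the sign convention on the slide: t = (mu - m) / (s / sqrt(n)).
    """
    standard_error = sample_sd / math.sqrt(n)
    t = (null_mean - sample_mean) / standard_error
    return t, n - 1

# Body-temperature summary statistics quoted in the lecture
t_stat, df = one_sample_t(sample_mean=36.82, sample_sd=0.41, n=130, null_mean=37.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

Many texts write the numerator as m-μ instead, which only flips the sign of t and leaves a two-tailed p-value unchanged.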

Computing a p-value for a given t
- To get a p-value for t=5.006, we use either software or a table.
- We need to know the degrees of freedom. In this case, d.f.=130-1=129.
- Tables give either the probability for a given t and d.f., or critical t-values for given probabilities and d.f.
- From tables or software, the probability that t<=5.006 is p= [value missing].
- We want the probability of seeing a result this extreme, assuming the mean is 37 °C, i.e. assuming t follows a t-distribution with 129 d.f.
- This is the probability that either t>5.006 or t<-5.006: a two-tailed, one-class t-test.
- P(t>5.006)=1-P(t<=5.006)= [value missing]. By symmetry, P(t<-5.006) is the same, so p=2 x P(t>5.006)= [value missing].
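
As a rough illustration of what the software does, the sketch below computes the two-tailed probability by numerically integrating the t-distribution density with Simpson's rule. The helper names `t_pdf` and `two_tailed_p` are hypothetical, and the integration scheme is an assumption made to keep the example dependency-free. In practice one would normally call a statistics library routine (for example `scipy.stats.t.sf`) instead.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value, P(T > |t|) + P(T < -|t|), using Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3   # P(0 < T < |t|)
    one_tail = 0.5 - central_half  # P(T > |t|), by symmetry about zero
    return 2 * one_tail

# t and degrees of freedom from the body-temperature example
p_value = two_tailed_p(5.006, 129)
print(f"two-tailed p = {p_value:.2e}")
```

The result is far below 0.05, which matches the slide's conclusion that a mean of 37 °C is not consistent with these data.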

Assumptions for a one-class t-test
The one-class t-test is accurate under the following assumptions:
- The samples are random (or representative)
- The observations are independent
- The data are accurate
- The data are sampled from a population that is normally distributed

One-class T-test and the Confidence Interval
- We saw earlier how to compute a confidence interval for the mean of this data set:
  - Calculate w = t* x s/√n
  - The confidence interval runs from m-w to m+w
  - t* is the value from the t-distribution for which P(t>t* or t<-t*)=1-confidence
- For example, for a 95% confidence interval we want P(t>t* or t<-t*)=0.05, so P(t>t*)=0.025.
- From tables or software, t*= [value missing].
- This gives a 95% confidence interval of [36.75, 36.89].
- Knowing that the 95% confidence interval does not contain the null hypothesis value of 37 is equivalent to knowing that the p-value is less than 0.05.
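
The interval on this slide can be reproduced from the summary statistics. The sketch below is illustrative only: the function name is hypothetical, and the critical value t* ≈ 1.979 is an assumption (the approximate two-sided 95% value for 129 degrees of freedom from standard t tables), since the value shown on the original slide did not survive in this transcript.

```python
import math

def mean_confidence_interval(sample_mean, sample_sd, n, t_star):
    """Return (lower, upper) = (m - w, m + w), where w = t* x s / sqrt(n)."""
    w = t_star * sample_sd / math.sqrt(n)
    return sample_mean - w, sample_mean + w

# ASSUMPTION: approximate two-sided 95% critical value for 129 d.f.,
# taken from standard t tables rather than from the lecture itself.
T_STAR_95_DF129 = 1.979

lower, upper = mean_confidence_interval(36.82, 0.41, 130, T_STAR_95_DF129)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
print("37 inside interval:", lower <= 37.0 <= upper)
```

Because 37 lies outside the interval, the corresponding two-tailed test rejects the null hypothesis at the 0.05 level.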

Two-class unpaired T-test: Example
- For an example of a two-class, unpaired T-test, consider the GRHL2 expression data we saw earlier from Cieply et al., Cancer Research.
- The study compared expression of GRHL2 in different breast cancer cell lines, classified as Basal A, Basal B, or Luminal.
- Here we compare the Basal A expression to the Basal B expression.

GRHL2 Expression Data
Table: log2 expression of GRHL2 by cell line, in two side-by-side column pairs (Basal-A: Cell line, Log2 Expression; Basal-B: Cell line, Log2 Expression). Most cell-line identifiers and values are missing. Surviving entries: MCF10a 0.36; MCF12a 1.16; SUM-190PT 1.48; SUM-225CWN 3.14; SUM-149PT 1; SUM-159PT 0.581 (the last two may be truncated).

GRHL2 Expression Data

How an unpaired t-test works
- An unpaired t-test works by computing the difference of the means of the two samples.
- Under the null hypothesis (that the difference of the two population means is zero), the difference of the sample means, divided by a pooled standard error of that difference, follows a t-distribution with n1+n2-2 degrees of freedom, where n1 and n2 are the two group sizes.
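
A minimal sketch of that calculation follows. The function name `pooled_t` and the example values are hypothetical (they are not the GRHL2 data), and the pooled variance shown is the standard equal-variance formula.

```python
import math
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Return (t, df) for the equal-variance unpaired t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two sample means
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se_diff, df

# Hypothetical values for illustration only
group_a = [2.1, 2.5, 1.9, 2.4, 2.2]
group_b = [0.9, 1.6, 0.4, 1.1, 1.3]
t_stat, df = pooled_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t and df are then converted to a p-value from the t-distribution, exactly as in the one-class case.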

Unpaired t-test for the GRHL2 data

Assumptions for the unpaired T-test
The unpaired T-test relies on essentially the same assumptions as the one-class T-test:
- The samples are random or representative
- The observations are independent
- The data are accurate
- The values in the populations are at least approximately normally distributed
Additionally, the t-test we used here assumes:
- The populations have the same standard deviation

Assumption of equal variances
- In the Basal A vs Basal B GRHL2 comparison, the Basal B samples have a higher SD (0.7859) than the Basal A samples (0.4463).
- The t-test we ran assumed the samples came from populations with equal variances (i.e. equal standard deviations).
- A test can be run to see whether the data are consistent with the assumption of equal variances: the distribution of the square of the ratio of the sample standard deviations (an F-distribution) is known under the assumption that the population variances are equal.
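
For the two SDs quoted on this slide, the test statistic is simply the squared ratio. The sketch below computes it; the function name is hypothetical. Turning the statistic into a p-value requires the group sizes (the F-distribution's degrees of freedom are each group's size minus one), which this transcript does not preserve, so that step is left out.

```python
def variance_ratio(sd_larger, sd_smaller):
    """F statistic for comparing two variances: the square of the ratio of the SDs."""
    return (sd_larger / sd_smaller) ** 2

# Sample SDs quoted in the lecture for Basal B and Basal A
f_stat = variance_ratio(0.7859, 0.4463)
print(f"F = {f_stat:.2f}")
```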

If the assumption of equal variances is violated
- A modified t-test, called the “Welch T-test”, can be used. It does not assume equal variances.
- It has less power than the standard unpaired t-test.
- As usual, testing your data set in order to decide which test to use can give misleading results. Typically it will give over-optimistic p-values.
- In an ideal world, we would run an experiment specifically to determine whether the assumption of equal variances holds, then use that to determine how to analyze our real experiment.
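
The slide does not spell out the Welch calculation. As a sketch of the standard formulation (an assumption here, not taken from the lecture), each group contributes its own variance to the standard error, and the degrees of freedom come from the Welch-Satterthwaite approximation. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t, approximate df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group's mean, kept separate (no pooling)
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not a whole number
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical values for illustration only: unequal sizes and unequal spread
group_a = [2.1, 2.5, 1.9, 2.4]
group_b = [0.2, 1.9, 0.4, 2.6, 1.3, 0.8, 2.2]
t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```

When the two groups have equal sizes and equal sample variances, this reduces to the same t and df as the pooled test.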

Rules of thumb for the assumption of equal variances
- Unequal variances will only badly affect the t-test if the number of samples in each group is small and unequal. In other cases the t-test is very robust to violations of this assumption.
- In practice, I do the following:
  - If the number of samples is equal, I use the regular t-test
  - If the number of samples in both groups is at least 5, whether or not they are equal, I use the regular t-test
  - If there is reason to believe the variances should be equal (e.g. if all the variance comes from technical replicates), I use the regular t-test
  - Otherwise, I use the Welch T-test
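
The decision procedure above can be written as a small function. This is only a restatement of the lecturer's rule of thumb; the function name, argument names, and return labels are hypothetical.

```python
def choose_unpaired_test(n_a, n_b, expect_equal_variances=False):
    """Apply the lecture's rule of thumb; returns "standard" or "welch"."""
    if n_a == n_b:
        return "standard"   # equal group sizes
    if min(n_a, n_b) >= 5:
        return "standard"   # both groups have at least 5 samples
    if expect_equal_variances:
        return "standard"   # e.g. all variance comes from technical replicates
    return "welch"          # small, unequal groups with no reason to expect equal variances

print(choose_unpaired_test(4, 4))
print(choose_unpaired_test(3, 8))
```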

95% Confidence Intervals and Unpaired t-tests

95% Confidence Intervals and unpaired T-tests
- The unpaired t-test results included the difference between the means and the 95% confidence interval for that difference. For this example the 95% confidence interval of the difference of the means was [-2.373, ].
- If this confidence interval doesn’t contain zero, this is equivalent to p<0.05.
- We can also compute the confidence interval for each mean independently:
  - If these confidence intervals do not overlap, then the p-value is definitely less than 0.05 (in fact, it must be a lot less).
  - If the confidence intervals do overlap, then p may or may not be less than 0.05. Nothing can be deduced from the error bars in this case.

Other error bars and statistical significance
- What if the bar chart uses SD or SEM for the error bars?
- SD tells us about the amount of scatter in the data, and nothing about the precision with which the mean is measured. Whether SD error bars overlap or not has nothing to do with statistical significance.
- SEM measures the precision with which we estimate the mean, but its interpretation depends on knowing the sample size. We can deduce the following:
  - If SEM error bars overlap, then the difference is definitely not statistically significant at p=0.05 (in fact, p is much bigger).
  - If SEM error bars do not overlap, then p may or may not be less than 0.05.

Error bars and statistical significance summary

Error bar type | Conclusion if overlapping | Conclusion if not overlapping
95% CI         | No conclusion             | p<<0.05
SD             | No conclusion             | No conclusion
SEM            | p>>0.05                   | No conclusion

Paired t-tests
Paired t-tests are used when the samples being compared are paired between the groups:
- Before and after treatments on a set of patients: pair the “before value” on patient A with the “after value” on patient A, the “before value” on patient B with the “after value” on patient B, etc.
- Studies in which subjects are recruited to two groups in a matched fashion: match a control patient with a treatment patient based on age, sex, weight, height, and so on. This is a difficult type of study to perform.
- Twin or sibling studies.
- Lab experiments in which treated and control samples are handled in parallel: plate cells, divide into two, treat one half and use the other as control. Repeat the next day with another plate, etc.

Example
- In a recent experiment, we performed expression profiling on a set of eight mice.
  - We should not use a t-test here without correcting for multiple hypotheses, but this is a good example for demonstration.
- Four litters of mice were bred, and two male mice were selected (at random if necessary) from each litter.
- One mouse from each pair was treated and one was used as a control.
- Analyzing these data with a paired test has the potential to eliminate litter-to-litter variation.

Example data
Table columns: Litter, Control, Treated, with one row for each of the four litters (the values are missing).
- The actual data are read counts for the gene of interest.
- After sequencing, all reads are aligned to the genome.
- The number of reads that align to each gene is counted for each sample.

Paired t-test
There are two distinct null hypotheses we can make about paired data. Both say “the data are no different between the two groups”:
- One is that the difference between the values in the groups is zero (“paired t-test”)
- The other is that the ratio of the values in the groups is 1 (“ratio paired t-test”)

Paired T-test results

How a paired t-test works
- A paired t-test is really just a one-class t-test!
  - It computes the difference for each pair
  - It then tests the null hypothesis that those differences are samples from a normal distribution with mean zero
- The confidence interval is just the confidence interval of the mean difference.
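
To make the reduction concrete, the sketch below takes the within-pair differences and runs a one-class test of their mean against zero. The function name and the control/treated values are hypothetical; they are not the read counts from the mouse experiment.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, df) for a paired t-test: a one-class test on the differences."""
    if len(first) != len(second):
        raise ValueError("paired data need the same number of values in each group")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    # Null hypothesis: the mean difference is zero
    return mean(diffs) / standard_error, n - 1

# Hypothetical values for illustration only: four pairs
control = [10, 12, 9, 11]
treated = [12, 15, 10, 14]
t_stat, df = paired_t(control, treated)
print(f"t = {t_stat:.3f}, df = {df}")
```

Note that the degrees of freedom are the number of pairs minus one, not the total number of values minus two.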

Ratio paired t-tests
- The ratio paired t-test tests the null hypothesis that the ratio of the paired values is 1.
- This is done by a simple mathematical trick: take the log of each ratio, then perform a regular paired t-test with the log ratios in place of the differences. Since log(1)=0, a ratio of 1 corresponds to a mean log ratio of zero.
- Software performing a ratio paired t-test takes care of computing the logs, performing the test, and then transforming the mean of the log ratios, the confidence interval of that mean, etc., back to ratio values.
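
A minimal sketch of the log-ratio trick, assuming all values are strictly positive. The function name and the data are hypothetical, and the natural log is an arbitrary choice (any base gives the same t). The back-transformed mean is the geometric mean of the ratios.

```python
import math
from statistics import mean, stdev

def ratio_paired_t(first, second):
    """Return (t, df, geometric mean ratio) for a ratio paired t-test."""
    # Log of each within-pair ratio; requires strictly positive values
    log_ratios = [math.log(b / a) for a, b in zip(first, second)]
    n = len(log_ratios)
    mean_log = mean(log_ratios)
    standard_error = stdev(log_ratios) / math.sqrt(n)
    # One-class test of the mean log ratio against log(1) = 0
    t = mean_log / standard_error
    # Transform the mean log ratio back to a ratio
    return t, n - 1, math.exp(mean_log)

# Hypothetical values for illustration only
control = [10, 20, 40, 80]
treated = [20, 30, 80, 120]
t_stat, df, gm_ratio = ratio_paired_t(control, treated)
print(f"t = {t_stat:.3f}, df = {df}, geometric mean ratio = {gm_ratio:.3f}")
```

A confidence interval for the ratio would be obtained the same way: compute it for the mean log ratio, then exponentiate both ends.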

Graphing paired data
- Plotting bar charts, or even column-scatter plots, of paired data does not show the pairing between the data values.
- A better presentation is a connected column scatter plot: a column scatter plot with lines connecting the paired data points.

Connected column scatter plot

t-test summary
- t-tests are a family of statistical hypothesis tests.
- They generate a p-value. Remember how to interpret it!
- Null hypotheses:
  - For a one-class t-test, the null hypothesis is that the samples are drawn from a normally-distributed population with a specified mean
  - For an unpaired t-test, the null hypothesis is that the samples are drawn from two normally-distributed populations with equal means
  - For a paired t-test, the null hypothesis is that the differences between matched values are samples from a normally-distributed population with mean zero. This is equivalent to a one-class t-test on the differences between matched values