
EHS 655 Lecture 11: Transformations, inferential statistics (t-test, ANOVA)

What we’ll cover today
Transformations
Inferential statistics: t-test, ANOVA
Review of midterm report requirements

TRANSFORMING VARIABLES
Many inferential statistical methods assume data are normally distributed: t-test, ANOVA, linear regression
However, many exposure distributions are positive and right-skewed

One solution: log-transform data
Yi = ln(xi), where Yi is the log-transformed data point, xi is the original data point, and ln is the natural logarithm
The natural log (ln) transform of a lognormally distributed variable has the properties of a normal distribution, i.e., bell-shaped and symmetric
Described by the geometric mean (GM) and geometric standard deviation (GSD)
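As a rough illustration of why this works (a sketch with simulated data, not part of the lecture), the natural log of a lognormally distributed variable is approximately normal, while the raw values are right-skewed:

```python
import math
import random
import statistics

# Simulated "exposures": exp(Normal(mu=1.0, sigma=0.8)) is lognormal
random.seed(42)
exposures = [math.exp(random.gauss(1.0, 0.8)) for _ in range(1000)]

# Yi = ln(xi): the transformed values should be roughly normal
logged = [math.log(x) for x in exposures]

# On the log scale, the mean and SD recover mu and sigma (approximately)
print(round(statistics.mean(logged), 2))   # close to 1.0
print(round(statistics.stdev(logged), 2))  # close to 0.8

# The raw data are right-skewed: the mean is pulled above the median
print(statistics.mean(exposures) > statistics.median(exposures))  # True
```

The log-scale mean and SD are what get back-transformed into the GM and GSD.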

Log transformation Exposure distributions – original and transformed Rappaport and Kupper, 2008

Evaluating a lognormal distribution
Quantile-quantile plots: untransformed vs. log-transformed
Stata: qnorm varname1

Interpreting log-transformed estimates
Arithmetic mean and arithmetic SD of the log-transformed exposures
Geometric mean: antilog of the mean of the log-transformed exposures
Geometric standard deviation: antilog of the SD of the log-transformed exposures
Stata: two function pairs can be used for the transformation and back-transformation: ln() (or log()) with exp(), or log10() with 10^value

Caution about transformation
Back-transformed mean ≠ original variable mean; the GM isn't easily interpreted
It is proper to run statistical tests on the transformed values, but means are often also reported in the units of the untransformed scale
“If it ain’t broke, don’t fix it.” Transformation is unnecessary if: the distribution is more or less symmetrical with few outliers, and the variances are reasonably homogeneous
Transformation may be useful for markedly skewed data or heterogeneous variances
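A small sketch (with made-up numbers) of the back-transformation caveat: the antilog of the mean of the logs is the GM, which always sits below the arithmetic mean of the original values:

```python
import math
import statistics

# Hypothetical right-skewed exposure data
exposures = [0.5, 1.0, 2.0, 4.0, 16.0]

logged = [math.log(x) for x in exposures]

# Geometric mean: antilog of the mean of the log-transformed values
gm = math.exp(statistics.mean(logged))
# Geometric standard deviation: antilog of the SD of the logs
gsd = math.exp(statistics.stdev(logged))

am = statistics.mean(exposures)
print(round(gm, 2), round(am, 2))  # GM about 2.3, AM 4.7
print(gm < am)  # True: back-transformed mean != original mean
```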

INFERENTIAL STATISTICS
Numerical summaries of populations are called parameters; the corresponding summaries computed from samples are statistics
Inferential statistics use sample statistics to draw conclusions about population parameters
We’ll focus on two inferential approaches today: t-test and ANOVA

t-test

t-test
Detects differences between the means of (normally distributed) samples; a significant t-statistic means the means differ
Student’s (unpaired) t-test: tests the hypothesis that the means of two samples are equal; the null is H0: μ1 = μ2
Stata: ttest varname1, by(groupvar)
Paired-sample t-test: tests whether two measurements on the same individuals are equal
Stata: ttest varname1 == varname2

Things we can do with a t-test
Single-sample t-test: identify differences between the mean of a group and a reference value
Unpaired t-test: identify differences in mean exposures between two groups
Paired-sample t-test: identify differences in exposure before and after an intervention in a group of subjects
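To make the mechanics concrete, here is a pure-Python sketch of all three tests (the lecture itself uses Stata's ttest command; the data and helper names below are hypothetical):

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """Single-sample t: t = (mean - mu0) / (s / sqrt(n))."""
    n = len(xs)
    return (statistics.mean(xs) - mu0) / (statistics.stdev(xs) / math.sqrt(n))

def unpaired_t(xs, ys):
    """Student's unpaired t with pooled variance (assumes equal variances)."""
    nx, ny = len(xs), len(ys)
    sp2 = ((nx - 1) * statistics.variance(xs) +
           (ny - 1) * statistics.variance(ys)) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (statistics.mean(xs) - statistics.mean(ys)) / se

def paired_t(before, after):
    """A paired t-test is a one-sample t-test on the differences."""
    diffs = [b - a for b, a in zip(before, after)]
    return one_sample_t(diffs, 0.0)

# Hypothetical noise exposures (dBA) before/after an intervention
before = [88.0, 91.0, 85.0, 90.0, 87.0]
after = [84.0, 87.0, 84.0, 85.0, 86.0]
print(round(paired_t(before, after), 2))  # positive t: exposures fell
```

Compare the resulting t against the t distribution with the appropriate degrees of freedom to get a P-value, which is what Stata reports.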

Interpreting a single-sample t-test in Stata

Interpreting a t-test between groups in Stata

Interpreting a paired t-test in Stata

ANOVA (ANalysis Of Variance)
Technique for assessing how categorical independent variables affect a continuous dependent variable
Like a t-test generalized to three or more means: tells us whether the means from k groups are the same or not
Null hypothesis: H0: μ1 = μ2 = … = μk

Things we can do with ANOVA
Identify differences in mean exposures between more than two groups
Evaluate the relationship of within-worker variance to between-worker variance within an exposure group
Within-worker > between-worker = good exposure grouping
Within-worker < between-worker = poor exposure grouping

ANOVA assumptions
Continuous dependent variable
Independent variable is 2+ categorical groups
Observations independent of each other
Errors normally distributed
Variances the same for all groups
ANOVA is fairly robust to violations of these assumptions, but the data should not depart from them too extremely

ANOVA illustrated

Generic ANOVA components

ANOVA – F-test
Compares the variability in exposure accounted for by the predictor variable vs. error variability
Error variability (mean squared error) measures the inherent randomness of the observations
Large differences between group means relative to error = significant F-test
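The F-statistic can be sketched directly from these pieces (hypothetical data; Stata's oneway performs the same arithmetic):

```python
import statistics

def one_way_f(groups):
    """One-way ANOVA: F = MS_between / MS_within for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group means vs the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: observations vs their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)  # df = k - 1
    ms_within = ss_within / (n - k)    # df = n - k (the mean squared error)
    return ms_between / ms_within

# Hypothetical exposures (dBA) for three groups of workers
groups = [[85.0, 88.0, 87.0], [90.0, 93.0, 92.0], [96.0, 95.0, 99.0]]
print(round(one_way_f(groups), 1))  # 25.0: group means differ a lot
```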

F-statistic

Stata ANOVA output
Stata: oneway responsevar groupvar
Larger F = stronger evidence against the null hypothesis (smaller P-value)

Stata ANOVA output
Stata: anova responsevar groupvar
Note the different output: we now get R², adjusted R², root MSE, etc. More in the regression lecture

Stata ANOVA output Stata: oneway responsevar groupvar, tabulate Tabulate gives results by group

Why use ANOVA instead of a t-test?
We could do t-tests for all pairs of predictor variable categories, but this is not a good idea
As the number of exposure groups grows, so does the number of pairwise comparisons needed, and each comparison adds another chance of a false-positive error
ANOVA summarizes all the data in one statistic (F) and gives one P-value for the null hypothesis
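A quick sketch of why the number of comparisons matters: under the simplifying assumption of independent tests at α = 0.05, the familywise error rate 1 − (1 − α)^m grows rapidly with the number of pairs m = k(k − 1)/2:

```python
from math import comb

# Chance of at least one false positive across all pairwise t-tests,
# assuming independent tests (a simplification for illustration)
alpha = 0.05
for k in (3, 5, 10):
    m = comb(k, 2)  # number of pairwise comparisons among k groups
    fwer = 1 - (1 - alpha) ** m
    print(k, m, round(fwer, 3))
```

With 10 groups there are 45 pairs, and the chance of at least one spurious "significant" result approaches 90%, which is why ANOVA's single F-test is preferred.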

What if I want to know which groups are different?
Multiple comparisons are possible
After you run the oneway command, use this second command
Stata: pwcompare groupvar, effects sort mcompare(tukey)

Multiple comparison ANOVA output

Measure of agreement between categorical and continuous variables
Stata: loneway responsevar groupvar
Intraclass correlation coefficient (ICC) = measure of agreement, on the same scale as Cohen’s kappa
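As an illustration (not Stata's exact output), the one-way ICC that loneway reports can be sketched from the ANOVA mean squares; the formula below is ICC(1) for balanced groups, and the data are hypothetical:

```python
import statistics

def icc1(groups):
    """ICC(1) = (MSB - MSW) / (MSB + (n0 - 1) * MSW), balanced groups."""
    k = len(groups)
    n0 = len(groups[0])  # assumes equal group sizes
    n = k * n0
    grand = sum(sum(g) for g in groups) / n
    # Between- and within-group mean squares, as in one-way ANOVA
    msb = sum(n0 * (statistics.mean(g) - grand) ** 2 for g in groups) / (k - 1)
    msw = sum((x - statistics.mean(g)) ** 2
              for g in groups for x in g) / (n - k)
    return (msb - msw) / (msb + (n0 - 1) * msw)

# Hypothetical exposures (dBA) for three groups of workers
groups = [[85.0, 88.0, 87.0], [90.0, 93.0, 92.0], [96.0, 95.0, 99.0]]
print(round(icc1(groups), 2))  # high ICC: groups are well separated
```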

ANOVA in action Enough with words already. Let’s see how ANOVA actually works http://web.utah.edu/stat/introstats/anovaflash.html Stata ANOVA commands: oneway responsevar groupvar Option (to get more detailed output by group) oneway responsevar groupvar, tabulate means standard

Resources
Choosing statistical tests: http://www.ats.ucla.edu/stat/spss/whatstat/default.htm
Stata annotated output from various tests: http://www.ats.ucla.edu/stat/AnnotatedOutput/

Review of midterm report

Example of a noise exposure calculation requiring transformation
Noise exposures (in dBA) can be described across individuals arithmetically: to estimate a group mean for individuals in, say, the same trade, compute the arithmetic mean
Estimating the average noise exposure within an individual (in dBA) is computing a dose, which requires a temporary transformation:
LEQi = 10 log10 [ (1/N) (10^(TWA1/10) + 10^(TWA2/10) + … + 10^(TWAn/10)) ]
where N is the total number of TWAs used to estimate the average LEQ for person i
How to operationalize this in Stata? Note the temporary transformation
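The slide's formula follows the same transform, average, back-transform pattern as the log transformation above; here is a sketch in Python (the TWA values are hypothetical):

```python
import math

def leq(twas):
    """LEQ = 10 * log10( (1/N) * sum(10^(TWA/10)) ) for a list of TWAs."""
    n = len(twas)
    # Temporary transformation: dBA -> relative intensity scale
    mean_intensity = sum(10 ** (t / 10) for t in twas) / n
    # Back-transformation: average intensity -> dBA
    return 10 * math.log10(mean_intensity)

# Hypothetical daily TWAs (dBA) for one person
twas = [85.0, 90.0, 95.0]
print(round(leq(twas), 1))  # 91.7

# Because the transformation is convex, LEQ exceeds the arithmetic mean
print(leq(twas) > sum(twas) / len(twas))  # True (91.7 > 90.0)
```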