Published by Jacob Payne. Modified over 6 years ago.
1
EHS 655 Lecture 11: Transformations, inferential statistics (t-test, ANOVA)
2
What we’ll cover today
- Transformations
- Inferential statistics: t-test, ANOVA
- Review of midterm report requirements
3
TRANSFORMING VARIABLES
- Many inferential statistical methods assume data are normally distributed
  - t-test
  - ANOVA
  - Linear regression
- However, many exposures are strictly positive and right-skewed
4
One solution: log-transform data
- $Y_i = \ln(x_i)$, where
  - $Y_i$ is the log-transformed data point
  - $x_i$ is the original data point
  - $\ln$ is the natural logarithm
- Natural log (ln) transform of a lognormally distributed variable has the properties of a normal distribution, i.e., bell-shaped and symmetric
- The lognormal variable is described by its geometric mean (GM) and geometric standard deviation (GSD)
5
Log transformation Exposure distributions – original and transformed
Rappaport and Kupper, 2008
6
Log transformation
7
Evaluating lognormal distribution
- Quantile-quantile plots
  - Untransformed
  - Log-transformed
- Stata: qnorm varname1
8
Interpreting log transformed estimates
- Arithmetic mean of log-transformed exposures
- Arithmetic SD of log-transformed exposures
- Geometric mean (GM): antilog of the mean of the log-transformed exposures
- Geometric standard deviation (GSD): antilog of the SD of the log-transformed exposures
- Stata: can use either of two combinations for the transformation and back-transformation
  - ln() or log() and exp()
  - … OR … log10() and 10^log10value
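The relationships on this slide can be checked by hand. The sketch below uses Python rather than the Stata commands the lecture relies on, purely to make the arithmetic visible. The exposure values and variable names are hypothetical and chosen so the result is easy to verify.

```python
import math
import statistics

# Hypothetical exposure measurements (illustrative values, not course data).
exposures = [1.0, 10.0, 100.0]

# Natural-log route: Y_i = ln(x_i), then antilog with exp().
logs = [math.log(x) for x in exposures]
gm = math.exp(statistics.mean(logs))      # geometric mean
gsd = math.exp(statistics.stdev(logs))    # geometric SD (sample SD, n - 1)

# Base-10 route: log10(), then antilog with 10 ** value.
logs10 = [math.log10(x) for x in exposures]
gm10 = 10 ** statistics.mean(logs10)
gsd10 = 10 ** statistics.stdev(logs10)

print(gm, gsd, gm10, gsd10)
```

Both routes give the same GM and GSD, as long as the antilog matches the logarithm used for the transformation.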
9
Caution about transformation
- Back-transformed mean ≠ original variable mean
- GM isn't easily interpreted
- Proper to run statistical tests on transformed values
  - But often report means in units of the untransformed scale as well
- “If it ain’t broke, don’t fix it.”
- Transformation is unnecessary (and can do harm) if:
  - Distribution is more or less symmetrical, with few outliers
  - Variances are reasonably homogeneous
- Transformation may be useful if:
  - Data are markedly skewed or variances are heterogeneous
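A small numerical illustration of the first caution, again sketched in Python rather than Stata and using made-up right-skewed values: the back-transformed mean of the logs (the GM) falls well below the arithmetic mean of the original data.

```python
import math
import statistics

# Hypothetical right-skewed exposures (illustrative values only).
exposures = [2.0, 4.0, 8.0, 16.0, 128.0]

arithmetic_mean = statistics.mean(exposures)

# Mean on the log scale, then back-transformed to the original scale.
mean_of_logs = statistics.mean([math.log(x) for x in exposures])
back_transformed_mean = math.exp(mean_of_logs)

print(arithmetic_mean, back_transformed_mean)
```

This is why the slide suggests testing on the transformed scale but also reporting means in the original units.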
10
INFERENTIAL STATISTICS
- Descriptive quantities computed for an entire population are called parameters
- Inferential statistics use samples to draw conclusions about population parameters
- We’ll focus on two inferential approaches today
  - t-test
  - ANOVA
11
t-test
12
t-test
- Detects differences between means of (normally distributed) samples
- Significant t-statistic = evidence that the means differ
- Student’s (unpaired) t-test
  - Tests the hypothesis that the means of two independent samples are equal; null is $H_0: \mu_1 = \mu_2$
  - Stata: ttest varname1, by(groupvar)
- Paired-sample t-test
  - Tests whether the mean difference between two measurements on the same individuals is zero
  - Stata: ttest varname1 == varname2
13
Things we can do with a t-test
- Single-sample t-test: identify a difference between the mean of a group and a reference value
- Unpaired t-test: identify differences in mean exposures between two groups
- Paired-sample t-test: identify differences in exposure before and after an intervention in a group of subjects
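The three uses listed above differ only in how the t-statistic is built. The sketch below computes each one by hand in Python (not Stata) and stops at the statistic, since p-values need the t distribution. The function names and the before/after values are hypothetical, and the unpaired version assumes equal variances (the pooled Student's test).

```python
import math
import statistics


def one_sample_t(values, reference):
    """t = (mean - reference) / (SD / sqrt(n))."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - reference) / standard_error


def unpaired_t(x, y):
    """Student's t for two independent groups, pooled variance."""
    nx, ny = len(x), len(y)
    pooled_variance = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    standard_error = math.sqrt(pooled_variance * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


def paired_t(before, after):
    """Paired t: a one-sample t-test on the within-subject differences."""
    differences = [b - a for b, a in zip(before, after)]
    return one_sample_t(differences, 0.0)


# Hypothetical noise exposures (dBA) for four workers, before and after
# an intervention; illustrative values only.
before = [88.0, 90.0, 92.0, 94.0]
after = [85.0, 86.0, 90.0, 91.0]

t_single = one_sample_t(before, 85.0)    # group mean vs. a reference value
t_unpaired = unpaired_t(before, after)   # treats the two lists as separate groups
t_paired = paired_t(before, after)       # uses the pairing within workers

print(t_single, t_unpaired, t_paired)
```

With these numbers the paired statistic is much larger than the unpaired one, because pairing removes between-worker variability from the error term.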
14
Interpreting a single-sample t-test in Stata
15
Interpreting a t-test between groups in Stata
16
Interpreting a paired t-test in Stata
17
ANOVA (ANalysis Of Variance)
- Technique for assessing how categorical independent variables affect a continuous dependent variable
- Like a t-test generalized to three or more means
- Tells us whether the means of k groups are the same or not
- Null hypothesis: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
18
Things we can do with ANOVA
- Identify differences in mean exposures between more than two groups
- Compare within-worker variance to between-worker variance within an exposure group
  - Within-worker > between-worker = good exposure grouping
  - Within-worker < between-worker = poor exposure grouping
19
ANOVA assumptions
- Continuous dependent variable
- Independent variable consists of 2+ categorical groups
- Observations are independent of each other
- Errors are normally distributed
- Variances are the same for all groups
- ANOVA is fairly robust to violations of these assumptions
  - But data should not be extremely far off
20
ANOVA illustrated
21
Generic ANOVA components
22
ANOVA – F-test
- Compares variability in exposure accounted for by the predictor variable with error variability
- Error variability (mean squared error) measures the inherent randomness of observations
- Large differences between groups, relative to error variability = significant F-test
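The F-test described above can be reproduced by hand for a one-way layout. This is a minimal sketch in Python (not Stata), with a hypothetical function name and made-up data for three groups. It returns the between-group mean square, the within-group (error) mean square, and their ratio F.

```python
import statistics


def one_way_anova(groups):
    """Return (MS_between, MS_within, F) for a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variability accounted for by group membership.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    # Error variability: spread of observations around their own group mean.
    ss_within = sum(
        (value - statistics.mean(group)) ** 2
        for group in groups
        for value in group
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)    # mean squared error
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical exposures for three groups (illustrative values only).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ms_between, ms_within, f_stat = one_way_anova(groups)
print(ms_between, ms_within, f_stat)
```

F grows when the group means spread apart (larger numerator) or when observations cluster tightly around their group means (smaller denominator).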
23
F-statistic
24
F-statistic
25
Stata ANOVA output
Stata: oneway responsevar groupvar
Larger F = stronger evidence against the null; significant when Prob > F is below the chosen α (e.g., 0.05)
26
Stata ANOVA output
27
Stata ANOVA output
Stata: anova responsevar groupvar
Note different output: now get R², adjusted R², root MSE, etc. More in regression lecture
28
Stata ANOVA output
Stata: oneway responsevar groupvar, tabulate
The tabulate option adds a table of summary statistics by group
29
Why use ANOVA instead of t-test?
- Could do t-tests for all pairs of predictor variable categories
  - Not a good idea
- As the number of exposure groups grows, so does the number of pairwise comparisons needed
- Each comparison introduces a risk of (type I) error
- ANOVA puts all data into one number (F) and gives one P for the null hypothesis
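To put rough numbers on this argument, the sketch below (Python, hypothetical function names) counts the pairwise comparisons for k groups and the chance of at least one false positive across them. It assumes the comparisons are independent and each is tested at α = 0.05. Neither assumption comes from the slides, and pairwise t-tests on shared groups are not truly independent, so treat the result as an approximation.

```python
from math import comb


def n_pairwise_comparisons(k):
    """Number of distinct pairs among k groups: k(k - 1) / 2."""
    return comb(k, 2)


def familywise_error(n_comparisons, alpha=0.05):
    """P(at least one false positive), assuming independent tests."""
    return 1 - (1 - alpha) ** n_comparisons


for k in (3, 5, 10):
    m = n_pairwise_comparisons(k)
    print(k, m, round(familywise_error(m), 3))
```

With five exposure groups there are already ten comparisons, and under these assumptions the chance of at least one spurious "significant" result is about 40%.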
30
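What if I want to know which groups are different?
- Multiple comparisons are possible
- pwcompare is a postestimation command, so run it after anova (oneway does not store the estimation results it needs)
  - Stata: anova responsevar groupvar
  - Stata: pwcompare groupvar, effects sort mcompare(tukey)
- oneway itself offers the bonferroni, scheffe, and sidak options for multiple comparisons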
31
Multiple comparison ANOVA output
32
Measure of agreement between categorical and continuous variables
- Stata: loneway responsevar groupvar
- Intraclass correlation coefficient (ICC) = measure of agreement, on the same scale as Cohen’s kappa
33
ANOVA in action
- Enough with words already. Let’s see how ANOVA actually works
- Stata ANOVA command: oneway responsevar groupvar
- Options (to get more detailed output by group): oneway responsevar groupvar, tabulate means standard
34
Resources Choosing statistical tests
Stata annotated output from various tests
35
Review of midterm report
36
Example of noise exposure calculation requiring transformation
- Can describe noise exposures (in dBA) across individuals arithmetically
  - In other words, to estimate a group mean for individuals in, say, the same trade, compute the arithmetic mean
- Estimating average noise exposure within an individual (in dBA) amounts to computing dose
  - Requires a temporary transformation
  - $LEQ_i = 10 \log_{10}\left[\frac{1}{N}\left(10^{TWA_1/10} + 10^{TWA_2/10} + \dots + 10^{TWA_N/10}\right)\right]$
  - where $N$ is the total number of TWAs used to estimate the average LEQ for person $i$
- How to operationalize in Stata? Note the temporary transformation
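The slide leaves the Stata implementation as an exercise. The sketch below shows only the arithmetic of the temporary transformation, in Python, with a hypothetical function name and made-up TWA values: convert each TWA from dBA to the energy scale, average there, and convert back to dBA.

```python
import math


def average_leq(twas_dba):
    """Energy-average a person's TWAs (in dBA) and return the result in dBA."""
    n = len(twas_dba)
    # Temporary transformation: decibels -> relative energy.
    mean_energy = sum(10 ** (twa / 10) for twa in twas_dba) / n
    # Back-transform the averaged energy to decibels.
    return 10 * math.log10(mean_energy)


# Hypothetical TWAs for one person (illustrative values only).
twas = [80.0, 90.0]
leq = average_leq(twas)
arithmetic_mean = sum(twas) / len(twas)
print(leq, arithmetic_mean)
```

Because decibels are logarithmic, the energy average is pulled toward the louder measurement and exceeds the arithmetic mean of the dBA values whenever the TWAs differ.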