Analysis of Chromium Emissions Data Nagaraj Neerchal and Justin Newcomer, UMBC and OIAA/OEI and Mohamed Seregeldin, Office of Air Quality Planning and Standards, EPA, RTP

Objective To develop a protocol (methodology) for obtaining confidence bounds for the “Mean Chromium Emissions” for each welding process and rod type combination. Incorporate all the data, including the averages, to the best of our ability.

About The Data
Three Welding Processes
– GMAW, SMAW, FCAW
Three Rod Types
– E308, E309, and E316
Multiple Sources of Data
– Some report individual measurements
– Some report only averages, without the original observations
– Units of reporting vary; all are converted to g/kg

Summary Statistics Note: Summary Statistics based only on observations with single measurement.

Combining Rod Types Combine E308+E316 because of the similar technology and small sample size Sample Sizes:

Summary Statistics After Combining Data for Rod Types Note: Summary statistics are based only on observations with a single measurement.

Traditional Approaches
Assume Normality?
– Normality is not a good assumption for this data set at all
– Sample sizes are very small for certain combinations
– Bounds obtained assuming normality give meaningless results (e.g. negative bounds) when the data are not normally distributed
95% Confidence Intervals for the Mean:
Note: Summary statistics are based only on observations with a single measurement.
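The negative-bound problem can be illustrated with a minimal sketch. The sample values below are hypothetical (they are not the chromium measurements from this study), and the t critical value for 4 degrees of freedom is hard-coded from standard tables.

```python
import math
import statistics

def normal_theory_ci(data, t_crit):
    """Two-sided interval for the mean: mean +/- t_crit * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    std_err = statistics.stdev(data) / math.sqrt(n)
    return mean - t_crit * std_err, mean + t_crit * std_err

# Hypothetical right-skewed emission values in g/kg (NOT the study data).
sample = [0.1, 0.2, 0.3, 0.5, 6.0]
T_CRIT_975_DF4 = 2.776  # 97.5th percentile of t with n - 1 = 4 d.f.

lower, upper = normal_theory_ci(sample, T_CRIT_975_DF4)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

With a small, skewed sample like this one, the lower bound falls below zero, which is impossible for an emission rate.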

Traditional Approaches
Transform the data to normality
– The optimal transformation for the Total Chromium data is different from the optimal transformation for the Chrom6 data.
– It is hard to transform the confidence bounds back to the original scale (the mean of the log is not the same as the log of the mean!)
Box-Cox Log-Likelihood Plots:
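The back-transformation issue can be seen in a short sketch, using hypothetical values rather than the study data. Exponentiating the mean of the logs gives the geometric mean, which is smaller than the arithmetic mean for skewed positive data, so a bound computed on the log scale does not map directly to a bound for the mean emission.

```python
import math
import statistics

# Hypothetical right-skewed emission values in g/kg (NOT the study data).
sample = [0.1, 0.2, 0.3, 0.5, 6.0]

mean_of_logs = statistics.mean([math.log(x) for x in sample])
back_transformed = math.exp(mean_of_logs)  # this is the geometric mean
arithmetic_mean = statistics.mean(sample)  # the quantity of interest

print(f"exp(mean of logs) = {back_transformed:.3f}")
print(f"arithmetic mean   = {arithmetic_mean:.3f}")
```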

Traditional Approaches Weighted regression to incorporate the averages

Traditional Approaches
Weighted Regression
– Estimates have good properties (such as BLUE) in general, not only for normal data
– But the confidence bounds are sensitive to the normality assumption, especially when the sample sizes are small, as in our case.
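The slides do not spell out the regression model, so the following is only a sketch of the simplest case, an intercept-only weighted fit. It assumes every individual test has the same variance, so an average of k tests has variance proportional to 1/k and receives weight k. The records and the function name `weighted_mean` are hypothetical.

```python
def weighted_mean(records):
    """Weighted least-squares estimate of a common mean.

    Each record is (reported_value, n_tests); the weight is n_tests,
    which assumes equal variance for every individual test.
    """
    total_weight = sum(n_tests for _, n_tests in records)
    return sum(value * n_tests for value, n_tests in records) / total_weight

# Hypothetical records in g/kg: single measurements (n_tests = 1)
# mixed with reported averages of 3 and 5 tests (NOT the study data).
records = [(0.40, 1), (0.55, 1), (0.30, 3), (0.62, 5)]
print(f"weighted mean = {weighted_mean(records):.3f}")
```

A single measurement and a reported average thus enter the same estimate, with the average counting in proportion to the number of tests behind it.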

Traditional Approaches
Nonparametric Approaches?
– Nonparametric approaches usually use ranks. When only averages are reported, we completely lose the information regarding ranks. Therefore, the reported means cannot be incorporated into nonparametric approaches.
Bootstrapping?
– Made popular by Bradley Efron in the 1980s
– Efron and Tibshirani (1993)
– Millard, S. P. and Neerchal, N. K. (2000)

Bootstrapping
What is Bootstrapping?
– Resampling the observed data
– It is a simulation-type method in which the observed data (not a mathematical model) are repeatedly sampled to generate representative data sets
– The only indispensable assumption is that “observations are a random sample from a single population”
– There are some fixes available when the single-population assumption is violated, as in our case.
– Can be implemented in quite a few software packages, e.g. S-Plus, SAS
– Millard and Neerchal (2000) give S-Plus code

Bootstrapping - The Details
Data: $\mathbf{X} = (X_1, X_2, X_3, \ldots, X_n)$
Statistic: $T = T(\mathbf{X})$
rep #1: $\mathbf{X}^*_1 = (X^*_1, X^*_2, X^*_3, \ldots, X^*_n)$, $T^*_1 = T(\mathbf{X}^*_1)$
rep #2: $\mathbf{X}^*_2 = (X^*_1, X^*_2, X^*_3, \ldots, X^*_n)$, $T^*_2 = T(\mathbf{X}^*_2)$
…
rep #B: $\mathbf{X}^*_B = (X^*_1, X^*_2, X^*_3, \ldots, X^*_n)$, $T^*_B = T(\mathbf{X}^*_B)$
Bootstrap inference is based on the distribution of the replicated values of the statistic: $T^*_1, T^*_2, \ldots, T^*_B$. For example, the bootstrap 95% upper confidence bound based on $T$ is given by the 95th percentile of the distribution of the $T^*$ values.
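The resampling scheme above can be sketched in a few lines of Python. This is an illustrative percentile bootstrap, not the S-Plus code from Millard and Neerchal (2000). The function name, the number of replicates, the seed, and the sample values are all assumptions.

```python
import math
import random
import statistics

def bootstrap_upper_bound(data, statistic=statistics.mean,
                          n_reps=2000, level=0.95, seed=0):
    """Percentile-bootstrap upper confidence bound for statistic(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each replicate: draw n values with replacement, recompute the statistic.
    replicates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_reps)
    )
    # The upper bound is the `level` percentile of the replicated statistics.
    index = min(n_reps - 1, math.ceil(level * n_reps) - 1)
    return replicates[index]

# Hypothetical right-skewed emission values in g/kg (NOT the study data).
sample = [0.1, 0.2, 0.3, 0.5, 6.0]
upper = bootstrap_upper_bound(sample)
print(f"bootstrap 95% upper bound for the mean: {upper:.2f}")
```

Because every replicate is built from observed values, the bound always lies within the range of the data and cannot be negative for positive measurements.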

Bootstrapping Single Tests Data Note: Columns in yellow represent the 95% upper confidence bound

Bootstrapping the Combined Data Group the data points according to the number of tests used in reporting the average, within each welding process and rod type combination. Then bootstrap within each such group, e.g. for GMAW and E316: Note: Each color represents a separate group
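A rough sketch of the grouped resampling, for a single welding process and rod type combination, might look as follows. The slides do not state how the resampled groups are combined into one statistic. The weighted mean used here (weight = number of tests) is an assumption, as are the records, the function names, and the seed.

```python
import math
import random
from collections import defaultdict

def weighted_mean(records):
    """Mean of (value, n_tests) records, weighting each by n_tests."""
    total_weight = sum(n_tests for _, n_tests in records)
    return sum(value * n_tests for value, n_tests in records) / total_weight

def grouped_bootstrap_upper_bound(records, n_reps=2000, level=0.95, seed=0):
    """Resample within groups sharing the same n_tests, then combine."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for value, n_tests in records:
        groups[n_tests].append((value, n_tests))
    replicates = []
    for _ in range(n_reps):
        resample = []
        for members in groups.values():
            # Draw with replacement inside the group, keeping its size fixed.
            resample.extend(rng.choices(members, k=len(members)))
        replicates.append(weighted_mean(resample))
    replicates.sort()
    return replicates[min(n_reps - 1, math.ceil(level * n_reps) - 1)]

# Hypothetical (value in g/kg, number of tests) records, NOT the study data.
records = [(0.40, 1), (0.55, 1), (0.90, 1), (0.30, 3), (0.45, 3), (0.62, 5)]
upper = grouped_bootstrap_upper_bound(records)
print(f"grouped bootstrap 95% upper bound: {upper:.3f}")
```

Resampling within groups keeps single measurements and averages of k tests from being exchanged with one another, since they do not share a common variance.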

Bootstrapping - Results Note: Columns in yellow represent the 95% upper confidence bound

Final Remarks
The normality assumption is not appropriate for either the Total Chromium or the Chromium6 data.
A weighted regression model can accommodate the averages in the estimates.
Bootstrapping the data seems to be a way to ensure that meaningful confidence bounds are obtained.
More work is needed to study the robustness of the bootstrapping results with respect to some extreme values in the data.