Model Comparison for Tree Resin Dose Effect On Termites Lianfen Qian Florida Atlantic University Co-author: Soyoung Ryu, University of Washington.



Outline
- Introduction
- Longitudinal Data: Termites Data Set
- Model Comparison
  - Partially Linear Model
  - Piecewise Linear Models
  - Nonparametric Smoothing Methods
- Conclusions

Introduction
Termite destruction in Florida is a serious problem.
- Each year, wood termites bore into thousands of homes and businesses, causing millions of dollars of damage.
- Current chemical pesticides used to control termites and protect against their damage are potentially harmful to Florida's delicate environment.

Goal of study
To determine the effectiveness of a natural tropical tree resin in controlling termites, thus providing protection from their destruction.

Longitudinal Data
- Definition: Longitudinal data are characterized by repeated measures over time on the same set of units.
- Incomplete data: one or more of the sequences of measurements from units are incomplete.
  - Unbalanced data: the measurement was NEVER INTENDED to be taken.
  - Missing data: the measurement was INTENDED to be taken.

Longitudinal Data, Cont.
- Benefits
  - Distinguish changes over time within units from differences among units.
  - Use units efficiently once they are enrolled in a study.
- Issue: repeated observations on the same subject tend to be correlated.
  - The statistical analysis must account for this correlation.

Termites Data Set
- The resin was derived from the bark of tropical trees, dissolved in a solvent, and placed on filter paper at one of two concentration levels: a 5mg or a 10mg dosage.
- There are eight dishes for each dose.
- Twenty-five live termites were placed in each dish.
- Each dish was observed on 13 specific days. No observation was made on day 3 or day 9.
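A minimal sketch of the long-format layout this design implies, with one record per dish per observation day. It assumes the observation days run from 1 to 15 with days 3 and 9 skipped, which is one reading consistent with the 13 observation days stated above. The field names are hypothetical, and no measured values are included.

```python
from itertools import product

# Assumption: observation days are 1..15 with days 3 and 9 skipped,
# which yields the 13 observation days mentioned on the slide.
DAYS = [d for d in range(1, 16) if d not in (3, 9)]
DOSES_MG = [5, 10]        # two concentration levels
DISHES_PER_DOSE = 8       # eight dishes per dose
TERMITES_PER_DISH = 25    # live termites placed in each dish at the start


def design_rows():
    """Return one record per (dose, dish, day); the response is left empty."""
    rows = []
    for dose, dish, day in product(DOSES_MG, range(1, DISHES_PER_DOSE + 1), DAYS):
        # "alive" is a hypothetical name for the recorded response.
        rows.append({"dose_mg": dose, "dish": dish, "day": day, "alive": None})
    return rows


rows = design_rows()
print(len(DAYS), "days,", len(rows), "records")
```

Keeping the dish identifier in every record is what later allows a dish-level random effect or within-dish correlation to be modeled.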

Termites Data, Cont.
[Table: observations by day for the 5mg and 10mg doses]

Scatter Plot

Longitudinal Plots (a) 5mg dose (b) 10mg dose

Partially Linear Model
[Flowchart of the model-building steps]
- Data set, then EDA: strange behavior of dishes 1 & 2 for the 10mg dose is found.
- Mistake? YES: remove dishes 1 & 2 for 10mg. NO: add an additional unknown level of dosage (*mg, 5mg, 10mg).
- Add a random effect of dish.
- Common time effect for the different doses, or different time effect for the different doses.
- Are the error terms correlated? YES: add an additional term to capture the correlation. NO: end.

Partially Linear Model
Benefits:
- It is more efficient than the standard linear regression model when the response variable depends linearly on some covariates but nonlinearly on others.
- It can provide a parsimonious description of the relationship between the response variable and the explanatory variables.
- It has the flexibility of the nonparametric model.

Partially Linear Model
$Y_{ij} = x_{ij}^T \beta + g(t_{ij}) + \varepsilon_{ij}$, $i = 1, \dots, m$, $j = 1, \dots, n_i$
- $m$ is the number of units.
- $n_i$ is the number of observations for unit $i$.
- $(x_{ij}, t_{ij})$ are either independent and identically distributed random design points or fixed design points.
- $g$ is an unknown nonparametric function.
- $\varepsilon_{ij}$ are a set of $N$ random variables, each with zero mean and finite variance.
- $N = n_1 + \dots + n_m$

Back-fitting Algorithm
1. Given the current estimate $\hat{\beta}$, calculate residuals $r_{ij} = Y_{ij} - x_{ij}^T \hat{\beta}$ and use these in place of $Y_{ij}$ to calculate a cubic spline estimate $\hat{g}(t)$.
2. Given $\hat{g}$, calculate residuals $r_{ij} = Y_{ij} - \hat{g}(t_{ij})$, and update the estimate $\hat{\beta}$ using generalized least squares, $\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} r$, where $X$ is the matrix with rows $x_{ij}^T$, $V$ is the assumed block-diagonal covariance matrix of the data, and $r$ is the vector of residuals.
3. Repeat steps 1 and 2 until convergence.
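The steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the smoother is a discrete second-difference penalty on the unique time points, used as a stand-in for a cubic smoothing spline; the covariance defaults to the identity (working independence) unless an inverse covariance is passed; and the function names, the synthetic data, and the value of `lam` are all hypothetical.

```python
import numpy as np


def smooth_on_grid(t, r, lam):
    """Stand-in smoother (assumption, not a true cubic smoothing spline):
    minimize sum (r - g(t))^2 + lam * sum (second differences of g)^2
    over the sorted unique time points, ignoring unequal spacing."""
    grid, idx = np.unique(t, return_inverse=True)
    counts = np.bincount(idx).astype(float)      # observations per time point
    sums = np.bincount(idx, weights=r)           # residual totals per time point
    D = np.diff(np.eye(len(grid)), n=2, axis=0)  # second-difference operator
    g_grid = np.linalg.solve(np.diag(counts) + lam * D.T @ D, sums)
    return g_grid[idx]                           # g evaluated at every t_ij


def backfit(y, X, t, V_inv=None, lam=1.0, tol=1e-8, max_iter=200):
    """Alternate between smoothing the partial residuals and a GLS update."""
    N, p = X.shape
    V_inv = np.eye(N) if V_inv is None else V_inv  # assumption: independence
    beta = np.zeros(p)
    g = np.zeros(N)
    for _ in range(max_iter):
        g = smooth_on_grid(t, y - X @ beta, lam)   # step 1: estimate g
        r = y - g                                  # step 2: residuals given g
        beta_new = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ r)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:                              # step 3: stop when stable
            break
    return beta, g


# Synthetic, noise-free data for illustration only (not the termite data).
days = np.arange(1, 11)
dose = np.repeat([0.0, 1.0], 4)                    # 4 dishes per dose
t = np.tile(days, dose.size).astype(float)         # every dish seen each day
X = np.repeat(dose, days.size)[:, None]            # dose indicator, no intercept
y = -2.0 * X[:, 0] + np.sin(t / 3.0)
beta_hat, g_hat = backfit(y, X, t, lam=0.5)
print(beta_hat)
```

The design matrix deliberately has no intercept column, because the smooth function absorbs the overall level; including both would make the two components unidentifiable.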

Spline Estimator of g
- Among all functions $g(t)$ with two continuous derivatives, find the one that minimizes the penalized residual sum of squares: $\sum_{i,j} \{r_{ij} - g(t_{ij})\}^2 + \lambda \int \{g''(t)\}^2 \, dt$
- $\lambda$ controls the smoothness of the fitted curve:
  - Larger $\lambda$ => Smaller variance => Smoother curve => Larger bias
  - Trade-off between bias and variance.

The Generalized Cross-Validation function (Rice & Silverman, 1991) is used to choose $\lambda$: Minimize
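A hedged sketch of choosing the smoothing parameter by generalized cross-validation. It uses the common textbook criterion GCV(λ) = (RSS(λ)/N) / (1 − tr(S_λ)/N)², which may differ in detail from the criterion the authors minimized. The smoother matrix S_λ comes from a discrete second-difference penalty on the unique time points, an assumed stand-in for a cubic smoothing spline; the data and the candidate values are synthetic.

```python
import numpy as np


def smoother_matrix(t, lam):
    """Linear smoother S with fitted values S @ r (assumed discrete penalty)."""
    grid, idx = np.unique(t, return_inverse=True)
    Z = np.zeros((len(t), len(grid)))
    Z[np.arange(len(t)), idx] = 1.0              # maps grid values to observations
    D = np.diff(np.eye(len(grid)), n=2, axis=0)  # second-difference operator
    return Z @ np.linalg.solve(Z.T @ Z + lam * D.T @ D, Z.T)


def gcv(t, r, lam):
    """Textbook GCV score: mean squared residual over squared residual d.f. share."""
    S = smoother_matrix(t, lam)
    n = len(r)
    rss = np.sum((r - S @ r) ** 2)
    return (rss / n) / (1.0 - np.trace(S) / n) ** 2


def choose_lambda(t, r, lambdas):
    """Return the candidate with the smallest GCV score, plus all scores."""
    scores = [gcv(t, r, lam) for lam in lambdas]
    return lambdas[int(np.argmin(scores))], scores


# Synthetic illustration: 4 replicate series on 13 time points.
rng = np.random.default_rng(0)
t = np.tile(np.arange(1.0, 14.0), 4)
r = np.sin(t / 2.0) + 0.1 * rng.normal(size=t.size)
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam, scores = choose_lambda(t, r, lambdas)
print(best_lam)
```

The trace of the smoother matrix plays the role of effective degrees of freedom: it shrinks as λ grows, which is the variance side of the bias-variance trade-off.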

Original Data Set with Common Time Effect

Removing Outliers (dishes 1 & 2)

Add Additional Dose

Different Time Effect for Dose

Piecewise Linear Regression Model
- For 5mg, the data do not show a change point.
- For 10mg, the data show a change point.
- Use the following piecewise linear model: $E(y \mid x) = \alpha_0 + \alpha_1 x$ if $x < \tau$, and $E(y \mid x) = \gamma_0 + \gamma_1 x$ otherwise.
- The change point $\tau$ is estimated using M-estimation (Koul, Qian & Surgailis, 2003).
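A minimal sketch of fitting a two-phase linear model by scanning candidate change points. Ordinary least squares is swapped in here for the M-estimation procedure cited above, purely for illustration; the function name, the candidate grid, and the synthetic data are assumptions.

```python
import numpy as np


def fit_two_phase(x, y, candidates):
    """Fit separate lines on x < tau and x >= tau for each candidate tau,
    and keep the candidate with the smallest total squared error."""
    best = None
    for tau in candidates:
        left = x < tau
        if left.sum() < 2 or (~left).sum() < 2:
            continue                              # need two points per segment
        sse, coefs = 0.0, []
        for mask in (left, ~left):
            A = np.column_stack([np.ones(mask.sum()), x[mask]])
            c, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
            sse += np.sum((y[mask] - A @ c) ** 2)
            coefs.append(c)                       # (intercept, slope)
        if best is None or sse < best[0]:
            best = (sse, tau, coefs[0], coefs[1])
    return best


# Synthetic illustration: a steep decline before day 7, nearly flat after.
x = np.arange(1.0, 16.0)
y = np.where(x < 7, 25.0 - 3.0 * x, 3.0 - 0.2 * (x - 7.0))
candidates = np.arange(3.0, 14.0)
sse, tau_hat, left_coef, right_coef = fit_two_phase(x, y, candidates)
print(tau_hat, left_coef, right_coef)
```

A grid search is used because the squared-error criterion is not smooth in the change point, so gradient-based optimization over it is unreliable.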

Two-Phase Linear Regression

Piecewise Linear Regression

Cubic Splines
Smoothing cubic splines: $E(y \mid x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - 7)_+^3$
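A short sketch of a cubic spline with a single knot at day 7, written as a truncated power basis and fitted by ordinary least squares. The function names, coefficient values, and data are synthetic assumptions for illustration; the slides do not say how the fit was computed.

```python
import numpy as np


def cubic_spline_design(x, knot=7.0):
    """Columns: 1, x, x^2, x^3, and the truncated cubic (x - knot)_+^3."""
    x = np.asarray(x, dtype=float)
    truncated = np.clip(x - knot, 0.0, None) ** 3  # zero to the left of the knot
    return np.column_stack([np.ones_like(x), x, x**2, x**3, truncated])


def fit_cubic_spline(x, y, knot=7.0):
    """Least-squares coefficients for the one-knot cubic spline."""
    B = cubic_spline_design(x, knot)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef


# Synthetic illustration: generate y from known coefficients and refit.
x_demo = np.arange(1.0, 16.0)
true_coef = np.array([25.0, -1.0, 0.1, -0.01, 0.02])
y_demo = cubic_spline_design(x_demo) @ true_coef
coef_hat = fit_cubic_spline(x_demo, y_demo)
print(np.round(coef_hat, 4))
```

The truncated term changes only the third derivative at the knot, so the fitted curve and its first two derivatives stay continuous there, and testing whether its coefficient is zero is one way to check if a knot is needed.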

Cubic Spline Method
(a) 5 mg dose (b) 10 mg dose (c) Unknown dose
No significant difference between the cubic smoothing and piecewise models.

Model Comparisons
- The partially linear model gives significant dose effects and a non-linear time trend.
- The 10 mg dose kills termites about 1.5 times faster than the 5 mg dose.
- The time trend levels off by the end of the experiment. Possibly few termites remain in the dishes by then, or the termites build up resistance to the tree resin.

Piecewise Linear Models
- They show a dramatic effect in the first seven days under the 10 mg dosage.
- There is a linear trend and a dose effect under the 5 mg dosage.
- For the two strange dishes under the 10 mg dosage, the effect in the first seven days is not significantly different from that of the 5 mg dose, while after seven days it is worse than that of the 5 mg dose. This suggests recording or operating mistakes in those two dishes' records.

Cubic Spline
It shows results similar to those of the piecewise linear models. One knot is identified at the seventh day for the 10 mg dosage, but none for the 5 mg dosage.

Conclusions
- Overall, the 10 mg dose is significantly more effective than the 5 mg dose.
- For the 10 mg dose, both the piecewise linear model and cubic spline smoothing show that termites are killed in about 7 days.

Conclusions, Cont.
- For the 5 mg dose, all methods (linear, partially linear, piecewise linear and cubic spline smoothing) show that the effect is linear. It takes more than twice as long to kill termites as with the 10 mg dose.
- Two dishes recorded under the 10 mg dose behave not significantly differently from the 5 mg dose for the first 7 days. After seven days, they show no significant effectiveness in killing termites.

Conclusions, Cont.
- The estimated treatment effect is time varying, with a change point at day 7.
- The final piecewise model fits the data with adjusted $R^2$ = 93.7%.
- On average, 10mg is 68.9% more efficient than 5mg in killing termites during the first week.

Thank you!
Please contact at PHONE:
Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL