Smoothing profiles
Sang-Hyop Lee & Comfort Sumida
3rd NTA Workshop, January 21, 2006
Lowess
Lowess carries out a locally weighted regression of a dependent variable on an independent variable, displays the graph, and optionally saves the smoothed variable. The commands in Stata are
. lowess yl age75 if age75>14, nograph gen(sm_yl1)
. lowess yl age75 if age75>14, nograph bwidth(0.1) gen(sm_yl2)
where yl is the variable to be smoothed and age75 is the independent variable; nograph suppresses the graph, bwidth(*) specifies the bandwidth (the first command leaves it at Stata's default of 0.8), and gen(*) saves the smoothed values in a new variable.
Methodology
Cleveland, William S. 1979. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74: 829-36.
Let $y_i$ and $x_i$ be the two variables, and assume that the data are ordered so that $x_i \le x_{i+1}$ for $i = 1, \ldots, N-1$. For each $y_i$, a smoothed (predicted) value $y_i^P$ is calculated. The subset used in calculating $y_i^P$ is the indices $i_- = \max(1, i-k)$ through $i_+ = \min(i+k, N)$, where
$$k = \left\lfloor \frac{N \cdot \text{bandwidth} - 0.5}{2} \right\rfloor \approx \frac{N \cdot \text{bandwidth}}{2}$$
Local weight
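The window and the local weighting can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine Stata runs: the tricube weight $w_j = \{1 - (|x_j - x_i|/\Delta)^3\}^3$ with $\Delta = 1.0001 \max(x_{i_+} - x_i,\, x_i - x_{i_-})$ is assumed to follow the form given in Stata's lowess documentation, the function names are hypothetical, and the robustness iterations of Cleveland's paper are left out.

```python
import math


def lowess_window(i, n, bandwidth):
    """Return the 1-based index range (i_minus, i_plus) used for observation i."""
    k = math.floor((n * bandwidth - 0.5) / 2)
    return max(1, i - k), min(i + k, n)


def tricube_weights(x, i, bandwidth):
    """Tricube weights for the points in the window around observation i.

    Assumes x is sorted ascending. The 1.0001 factor is an assumption taken
    from the Stata manual; it keeps the end points' weights just above zero.
    """
    lo, hi = lowess_window(i, len(x), bandwidth)
    xi = x[i - 1]
    delta = 1.0001 * max(x[hi - 1] - xi, xi - x[lo - 1])
    if delta == 0:  # every x in the window is identical
        return [1.0] * (hi - lo + 1)
    return [(1 - (abs(x[j - 1] - xi) / delta) ** 3) ** 3 for j in range(lo, hi + 1)]


def smooth_point(x, y, i, bandwidth):
    """Weighted least-squares line through the window, evaluated at x_i."""
    lo, hi = lowess_window(i, len(x), bandwidth)
    w = tricube_weights(x, i, bandwidth)
    xs, ys = x[lo - 1:hi], y[lo - 1:hi]
    sw = sum(w)
    xbar = sum(wj * xj for wj, xj in zip(w, xs)) / sw
    ybar = sum(wj * yj for wj, yj in zip(w, ys)) / sw
    sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, xs))
    if sxx == 0:  # no spread in x: fall back to the weighted mean
        return ybar
    slope = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, xs, ys)) / sxx
    return ybar + slope * (x[i - 1] - xbar)
```

With one observation per age from 15 to 75 (N = 61) and a bandwidth of 0.1, k = 2, so each interior point is smoothed from itself and two neighbours on either side.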
Bandwidth and Smoothing
The optimal bandwidth depends on the variable and the dataset, and is chosen by examining smoothed profiles plotted against the unsmoothed ones. A bandwidth that is too narrow leaves the smoothed estimates noisy. A bandwidth that is too wide smooths away real features, so the result no longer represents the unsmoothed data accurately.
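To get a feel for what a given bandwidth means, it can help to translate it into the number of observations that enter each local regression. The Python sketch below does this with the half-window formula k = floor((N * bandwidth - 0.5) / 2); the function name and the choice of N = 61 (one observation per age from 15 to 75) are hypothetical and chosen only for illustration.

```python
import math


def points_in_window(n, bandwidth):
    """Number of observations in the smoothing window of an interior point."""
    k = max(0, math.floor((n * bandwidth - 0.5) / 2))
    return min(n, 2 * k + 1)


# Hypothetical case: one observation per age from 15 to 75, so N = 61.
for bw in (0.05, 0.1, 0.3, 0.8):
    print(bw, points_in_window(61, bw))
```

In this hypothetical case the four bandwidths give windows of 3, 5, 17 and 49 points. In real micro data N is the number of records rather than the number of ages, so the same bandwidth covers far more observations.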
Too narrow bandwidth
Too wide bandwidth
Proper bandwidth
Warning
“lowess” can produce smoothed values that are consistently larger (or smaller) than the unsmoothed values. The high smoothed values are due to the frequency weight in the data: lowess is run without the weight, while the table of means applies it.
. table age75 [w=weight], contents(mean yl mean sm_yl1 mean sm_yl2)
Solution
In principle, the sample weight should be applied during smoothing, not afterwards. As written, lowess ignores the weight and only table applies it:
. lowess yl age75 if age75>14, nograph bwidth(0.1) gen(sm_yl2)
. table age75 [w=weight], contents(mean yl mean sm_yl1 mean sm_yl2)
Workaround: expand (duplicate) the data using the sample weight, smooth the expanded data, and tabulate the unsmoothed and smoothed values without the w=weight option. This takes a long time to execute.
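The idea behind the expansion can be illustrated with a small Python sketch. It assumes integer frequency weights, and its records, values and function names are hypothetical. In Stata itself the duplication would normally be done with the expand command. Once every record is repeated weight times, a plain mean over the expanded data equals the weighted mean over the original data, which is why the w=weight option can be dropped after expanding.

```python
def expand(records, weight_key="weight"):
    """Duplicate each record int(weight) times, mimicking a frequency-weight expansion."""
    expanded = []
    for rec in records:
        expanded.extend(dict(rec) for _ in range(int(rec[weight_key])))
    return expanded


def weighted_mean(records, value_key, weight_key="weight"):
    """Mean of value_key with each record counted weight times."""
    total_w = sum(r[weight_key] for r in records)
    return sum(r[value_key] * r[weight_key] for r in records) / total_w


def plain_mean(records, value_key):
    """Unweighted mean of value_key."""
    return sum(r[value_key] for r in records) / len(records)


# Hypothetical toy records; the values are made up for illustration only.
sample = [
    {"age75": 30, "yl": 10.0, "weight": 3},
    {"age75": 30, "yl": 20.0, "weight": 1},
]
big = expand(sample)
print(len(big), plain_mean(big, "yl"), weighted_mean(sample, "yl"))
```

The expanded dataset has as many rows as the sum of the weights, so with population-scale weights it becomes very large, which is consistent with the long run time noted above.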