Published by Erin Malone. Modified over 11 years ago.
1
Investigation of Treatment of Influential Values Mary H. Mulry Roxanne M. Feldpausch
2
Outline Current practices Methods investigated Results Next steps
3
Influential Observation An observation is considered influential if its weighted contribution has an excessive effect on the estimate of the total (Chambers et al. 2000)
4
The Data - U.S. Monthly Retail Trade Survey Collects sales and inventories Monthly survey of about 12,500 retail businesses with paid employees Sample selected every 5 years –Sample is stratified based on industry and sales –Quarterly sample of births –Deaths are removed
5
The Data Analysis done at published NAICS level Hidiroglou-Berthelot algorithm was run on the data before looking for influential values Horvitz-Thompson estimator
6
Causes of Influential Units One time or rare event Erroneous measure of size Change in the make-up of the unit Seasonal Businesses
7
Current Practices Analysts review an effect listing of micro-level data and investigate units that may be influential When an analyst determines that a correctly reporting unit may be influential, the case is referred to a statistician
8
Current Practices One time influential value –Imputation Recurring influential value –Weight adjustment based on the principles of representativeness –Moving the unit to a different industry when the nature of the business changes
9
Goals To improve upon current methodology by making it more objective and rigorous To find methodology that uses the observation but in a manner that assures its contribution does not have an excessive effect on the total
10
Assumptions Influential observations occur infrequently, but are problematic when they appear. The influential observation is true, although unusual. It is not the result of a reporting or coding error.
11
Strategy Identify candidate methodologies and test with real data from one industry (about 700 businesses) for a month that contains an influential value
12
Evaluation Criteria Number of influential observations detected, including the number of true and false detections made Estimate of bias Impact on month-to-month change
13
Notation $Y_i$ is the sales for the $i$-th business in a survey sample of size $n$ $w_i$ is the sample weight for the $i$-th unit $X_i$ is the previous month's sales for the $i$-th business
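The notation above can be tied to the Horvitz-Thompson estimator named on slide 5 with a minimal sketch. It assumes the standard form of the estimated total, the sum of $w_i Y_i$ over the sample; the function names and the four-unit sample are hypothetical and not taken from the survey.

```python
def horvitz_thompson_total(values, weights):
    """Estimated total: sum of w_i * Y_i over the sample."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for w, y in zip(weights, values))


def weighted_contributions(values, weights):
    """Share of the estimated total contributed by each unit (w_i * Y_i / total)."""
    total = horvitz_thompson_total(values, weights)
    return [w * y / total for w, y in zip(weights, values)]


# Hypothetical sample: the last unit has both a large value and a large weight.
sales = [120.0, 95.0, 110.0, 900.0]
weights = [10.0, 10.0, 10.0, 25.0]

total = horvitz_thompson_total(sales, weights)
shares = weighted_contributions(sales, weights)
print(total, [round(s, 3) for s in shares])
```

A unit whose share dominates the others, as the last one does here, is the kind of observation the deck calls influential.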
14
Methods Examined Weight trimming Reverse calibration Winsorization Generalized M-estimation
15
Weight Trimming Does not identify influential units Adjusts the weight of the observation
16
Weight Trimming Truncate the weight of the influential observation Adjust the weights of the non-influential observations to account for the remainder of the truncated weight Sum of the new weights is the same as the sum of the original weights (Potter 1990)
17
Weight Trimming Notes Calculations were done within sample stratum. Choice of correction factor could be investigated. We arbitrarily chose $c_i = w_i/3$.
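The trimming steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it reads the correction factor as the truncated weight (one third of the original), and it spreads the removed weight over the other units in the stratum in proportion to their original weights. The function name and the 19-unit stratum are hypothetical.

```python
def trim_weights(weights, influential, trimmed_weight):
    """Truncate one unit's weight and redistribute the excess within the stratum.

    Assumption: the excess is spread proportionally over the remaining units,
    so the sum of the new weights equals the sum of the original weights.
    """
    excess = weights[influential] - trimmed_weight
    if excess < 0:
        raise ValueError("trimmed weight must not exceed the original weight")
    rest = sum(w for i, w in enumerate(weights) if i != influential)
    if rest == 0:
        raise ValueError("no other units available to absorb the excess")
    factor = 1.0 + excess / rest
    return [
        trimmed_weight if i == influential else w * factor
        for i, w in enumerate(weights)
    ]


# Hypothetical stratum: one influential unit plus 18 others.
stratum_weights = [30.0] + [12.0] * 18
new_weights = trim_weights(stratum_weights, influential=0,
                           trimmed_weight=stratum_weights[0] / 3)
print(new_weights[0], round(new_weights[1], 4), sum(new_weights))
```

Other redistribution rules are possible; the only property taken from the slides is that the weight total is preserved.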
18
Reverse Calibration Does not identify influential units Adjusts the value of the observation
19
Reverse Calibration 1. Use a robust estimation method to estimate the total 2. Modify the influential observations to achieve that total (Chambers and Ren 2004)
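The two steps can be illustrated with a deliberately simplified sketch. It takes the robust total from step 1 as a given input and, as an assumption made here for illustration, rescales all flagged values by one common factor so that the weighted total matches it. The procedure in Chambers and Ren (2004) is more elaborate; the function name and the numbers are hypothetical.

```python
def reverse_calibrate(values, weights, influential, robust_total):
    """Rescale flagged values so the weighted total equals robust_total.

    Assumption: every flagged value is multiplied by the same factor.
    """
    flagged_idx = set(influential)
    kept = sum(w * y for i, (w, y) in enumerate(zip(weights, values))
               if i not in flagged_idx)
    flagged = sum(w * y for i, (w, y) in enumerate(zip(weights, values))
                  if i in flagged_idx)
    if flagged == 0:
        raise ValueError("flagged units contribute nothing to the total")
    scale = (robust_total - kept) / flagged
    return [y * scale if i in flagged_idx else y for i, y in enumerate(values)]


# Hypothetical sample and a hypothetical robust total from step 1.
sales = [120.0, 95.0, 110.0, 900.0]
weights = [10.0, 10.0, 10.0, 25.0]
modified = reverse_calibrate(sales, weights, influential=[3], robust_total=8000.0)
new_total = sum(w * y for w, y in zip(weights, modified))
print([round(y, 2) for y in modified], round(new_total, 2))
```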
20
Winsorization Identifies influential units Adjusts the value of the observation
21
Winsorization Type I Type II
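For reference, the two types can be sketched using the definitions commonly given in the survey literature, assumed here to match the slide: Type I replaces a value above the cutoff $K$ by $K$ itself, while Type II replaces it by $K + (Y_i - K)/w_i$, so the unit represents itself fully and the rest of its weight is applied to $K$. The function names and numbers are hypothetical.

```python
def winsorize_type1(y, cutoff):
    """Type I: values above the cutoff are replaced by the cutoff."""
    return min(y, cutoff)


def winsorize_type2(y, weight, cutoff):
    """Type II: values above the cutoff become cutoff + (y - cutoff) / weight."""
    if y <= cutoff:
        return y
    return cutoff + (y - cutoff) / weight


# Hypothetical unit: sales of 900, sample weight 25, cutoff K = 200.
y, w, K = 900.0, 25.0, 200.0
type1_value = winsorize_type1(y, K)
type2_value = winsorize_type2(y, w, K)
print(type1_value, type2_value)
```

Under Type II the weighted contribution $w_i Y_i^*$ equals $Y_i + (w_i - 1)K$, so the adjustment shrinks as the weight approaches 1.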
22
Winsorization – Defining K Define a separate $K_h$ for each stratum in a manner that minimizes the MSE (Kokic and Bell 1994) Define a separate $K_i$ for each observation in a manner that minimizes the MSE (Clarke 1995)
23
Winsorization – Defining K Use unweighted data to define $K_h$ for each stratum where $K_h = \bar{y}_h + 2s_h$ Use weighted data to define $K_h$ for each stratum where $K_h = \bar{y}_h + 2s_h$ where $\bar{y}_h$ and $s_h$ are based on the weighted data
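A sketch of the two simpler cutoffs follows. It rests on two assumptions that the slide does not confirm: that the location term in $K_h$ is the stratum mean, and that "weighted data" means a weighted mean and a weighted standard deviation computed with the sample weights. The function names and the ten-unit stratum are hypothetical.

```python
import math
import statistics


def cutoff_unweighted(values, multiplier=2.0):
    """K_h = mean + multiplier * sample standard deviation (unweighted)."""
    return statistics.mean(values) + multiplier * statistics.stdev(values)


def cutoff_weighted(values, weights, multiplier=2.0):
    """K_h from a weighted mean and weighted standard deviation.

    Assumption: variance is sum(w * (y - mean)^2) / sum(w).
    """
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_w
    var = sum(w * (y - mean) ** 2 for w, y in zip(weights, values)) / total_w
    return mean + multiplier * math.sqrt(var)


# Hypothetical stratum with one unusually large value.
stratum_sales = [100.0, 110.0, 90.0, 105.0, 95.0, 102.0, 98.0, 107.0, 93.0, 500.0]
stratum_weights = [10.0] * 9 + [2.0]

k_unweighted = cutoff_unweighted(stratum_sales)
k_weighted = cutoff_weighted(stratum_sales, stratum_weights)
flagged_unweighted = [y for y in stratum_sales if y > k_unweighted]
flagged_weighted = [y for y in stratum_sales if y > k_weighted]
print(round(k_unweighted, 1), round(k_weighted, 1))
```

Because the large value inflates both the mean and the standard deviation, a cutoff of this kind can mask outliers in small strata.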
24
Winsorization-Our Implementation Used a robust regression in SAS to estimate the parameters needed in the calculations
25
M-estimation M-estimators are robust estimators that come from a generalization of maximum likelihood estimation
26
M-estimation Identifies influential units Adjusts either the weight or the value of the influential observation
27
M-estimation Used a weighted M-estimation technique that is able to modify the weights or the values of the influential observations (Beaumont and Alavi 2004)
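To illustrate the general idea only, here is a minimal Huber M-estimator of location fitted by iteratively reweighted averaging. It is a textbook technique swapped in for illustration, not the weighted generalized M-estimation of Beaumont and Alavi (2004); the tuning constant 1.345, the MAD-based scale, and the data are assumptions.

```python
import statistics


def huber_location(values, tuning=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location; returns (estimate, robustness weights)."""
    center = statistics.median(values)
    mad = statistics.median([abs(y - center) for y in values])
    scale = 1.4826 * mad  # MAD rescaled to be consistent with a normal sd
    if scale == 0:
        return center, [1.0] * len(values)
    bound = tuning * scale
    robust_weights = [1.0] * len(values)
    for _ in range(max_iter):
        # Full weight inside the bound, down-weighted in proportion outside it.
        robust_weights = [
            1.0 if abs(y - center) <= bound else bound / abs(y - center)
            for y in values
        ]
        new_center = (sum(r * y for r, y in zip(robust_weights, values))
                      / sum(robust_weights))
        if abs(new_center - center) < tol:
            center = new_center
            break
        center = new_center
    return center, robust_weights


# Hypothetical data with one unusually large value.
sales = [100.0, 110.0, 90.0, 105.0, 95.0, 102.0, 98.0, 107.0, 93.0, 500.0]
estimate, robust_weights = huber_location(sales)
print(round(estimate, 2), [round(r, 3) for r in robust_weights])
```

A robustness weight below 1 identifies the unit as influential. The same factor could be applied either to the unit's sample weight or to its value, which mirrors the two adjustment options on the slide.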
28
Results
32
Number of Outliers Detected *Method does not detect outliers; one outlier was specified
33
Replacement Values (in Millions) *Weight trimming adjusts the other 18 weights in the stratum **Winsor wgt +2s identified 3 other values
34
Total Sales for the Industry
37
Chosen for Further Study Winsorization by each observation M-estimation by observation M-estimation by weight
38
Contact Information Mary.H.Mulry@census.gov Roxanne.Feldpausch@census.gov