Investigation of Treatment of Influential Values Mary H. Mulry Roxanne M. Feldpausch.


1 Investigation of Treatment of Influential Values Mary H. Mulry Roxanne M. Feldpausch

2 Outline Current practices Methods investigated Results Next steps

3 Influential Observation An observation is considered influential if its weighted contribution has an excessive effect on the estimate of the total (Chambers et al. 2000)

4 The Data - U.S. Monthly Retail Trade Survey Collects sales and inventories Monthly survey of about 12,500 retail businesses with paid employees Sample selected every 5 years –Sample is stratified based on industry and sales –Quarterly sample of births –Deaths are removed

5 The Data Analysis done at published NAICS level Hidiroglou-Berthelot algorithm run on the data before looking for influential values Horvitz-Thompson estimator

6 Causes of Influential Units One time or rare event Erroneous measure of size Change in the make-up of the unit Seasonal Businesses

7 Current Practices Analysts review an effect listing of micro-level data and investigate units that may be influential When an analyst determines that a correctly reporting unit may be influential, the case is referred to a statistician

8 Current Practices One time influential value –Imputation Recurring influential value –Weight adjustment based on the principles of representativeness –Moving the unit to a different industry when the nature of the business changes

9 Goals To improve upon current methodology by making it more objective and rigorous To find methodology that uses the observation but in a manner that assures its contribution does not have an excessive effect on the total

10 Assumptions Influential observations occur infrequently, but are problematic when they appear. The influential observation is true, although unusual. It is not the result of a reporting or coding error.

11 Strategy Identify candidate methodologies and test with real data from one industry (about 700 businesses) for a month that contains an influential value

12 Evaluation Criteria Number of influential observations detected, including the number of true and false detections made Estimate of bias Impact on month-to-month change

13 Notation $Y_i$ is the sales for the $i$-th business in a survey sample of size $n$; $w_i$ is the sample weight for the $i$-th unit; $X_i$ is the previous month's sales for the $i$-th business
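
To make the notation concrete, here is a minimal sketch of a weighted estimate of a total (the sum of each sample weight times its reported sales) and of each unit's share of that estimate. The function names and the toy numbers are illustrative assumptions, not survey data or the authors' code.

```python
def weighted_total(values, weights):
    """Weighted estimate of a total: sum of w_i * Y_i over the sample."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for w, y in zip(weights, values))


def contribution_shares(values, weights):
    """Share of the estimated total contributed by each weighted observation."""
    total = weighted_total(values, weights)
    return [w * y / total for w, y in zip(weights, values)]


# Toy data (hypothetical): the third unit has both a large value and a large weight.
sales = [10.0, 12.0, 200.0]
sample_weights = [5.0, 5.0, 20.0]

total = weighted_total(sales, sample_weights)
shares = contribution_shares(sales, sample_weights)
```

In this toy sample the third unit supplies almost all of the estimated total, which is the sense in which a weighted contribution can be excessive.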

14 Methods Examined Weight trimming Reverse calibration Winsorization Generalized M-estimation

15 Weight Trimming Does not identify influential units Adjusts the weight of the observation

16 Weight Trimming Truncate the weight of the influential observation Adjust the weights of the non-influential observations to account for the remainder of the truncated weight Sum of the new weights is the same as the sum of the original weights (Potter 1990)

17 Weight Trimming Notes Calculations were done within sample stratum. Choice of correction factor could be investigated. We arbitrarily chose $c_i = w_i/3$.
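
The weight-trimming steps can be sketched as follows, within a single stratum. The slides say only that the non-influential weights are adjusted so that the sum of the weights is preserved; spreading the removed weight proportionally over the other units is an assumption made here for illustration, and `trim_weight` is a hypothetical name.

```python
def trim_weight(weights, influential_index, divisor=3.0):
    """Trim one weight to w_i / divisor and rescale the others to keep the sum fixed.

    Assumption: the excess weight is redistributed proportionally across the
    remaining units in the stratum.
    """
    original = weights[influential_index]
    trimmed = original / divisor              # c_i = w_i / 3 by default
    excess = original - trimmed               # weight removed from the influential unit
    others_sum = sum(weights) - original
    if others_sum <= 0:
        raise ValueError("need at least one other unit with positive weight")
    factor = 1.0 + excess / others_sum        # common inflation factor for the others
    new_weights = [w * factor for w in weights]
    new_weights[influential_index] = trimmed
    return new_weights


# Hypothetical stratum of four units; the first one is treated as influential.
stratum_weights = [30.0, 10.0, 10.0, 10.0]
adjusted = trim_weight(stratum_weights, influential_index=0)
```

Because the unit has to be specified by the caller, the sketch mirrors the point that weight trimming does not itself identify influential units.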

18 Reverse Calibration Does not identify influential units Adjusts the value of the observation

19 Reverse Calibration 1.Use a robust estimation method to estimate the total 2.Modify the influential observations to achieve that total (Chambers and Ren 2004)
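
A highly simplified sketch of the two steps, assuming a single influential observation and a robust total that has already been computed by some other method. It only solves for the value that makes the weighted total match the robust total; it is not the full procedure of Chambers and Ren, and `reverse_calibrate` is a hypothetical name.

```python
def reverse_calibrate(values, weights, influential_index, robust_total):
    """Replace one value so that the weighted total equals a given robust total."""
    # Weighted contribution of every unit except the influential one.
    rest = sum(
        w * y
        for k, (w, y) in enumerate(zip(weights, values))
        if k != influential_index
    )
    # Solve w_i * y_new + rest = robust_total for y_new.
    new_value = (robust_total - rest) / weights[influential_index]
    modified = list(values)
    modified[influential_index] = new_value
    return modified


# Toy data (hypothetical), with an externally supplied robust total of 510.
sales = [10.0, 12.0, 200.0]
sample_weights = [5.0, 5.0, 20.0]
modified_sales = reverse_calibrate(sales, sample_weights, 2, robust_total=510.0)
```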

20 Winsorization Identifies influential units Adjusts the value of the observation

21 Winsorization Type I Type II
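
The formulas for the two types are not preserved in this transcript. As a sketch, the definitions commonly used in the survey literature are shown below: Type I replaces a value above the cutoff K with K, while Type II lets the unit represent itself at its reported value and the other units it stands for at K. Whether these match the slide exactly is an assumption.

```python
def winsorize_type1(y, cutoff):
    """Type I: values above the cutoff are replaced by the cutoff."""
    return min(y, cutoff)


def winsorize_type2(y, weight, cutoff):
    """Type II: y* = y / w + (1 - 1 / w) * K when y exceeds the cutoff K.

    Equivalently w * y* = y + (w - 1) * K, so the unit counts once at its
    reported value and (w - 1) times at the cutoff.
    """
    if y <= cutoff:
        return y
    return y / weight + (1.0 - 1.0 / weight) * cutoff


# Hypothetical unit: reported value 200, sample weight 20, cutoff 50.
type1_value = winsorize_type1(200.0, 50.0)
type2_value = winsorize_type2(200.0, 20.0, 50.0)
```

Type II pulls the value back less than Type I, and for a weight of 1 it leaves the value unchanged.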

22 Winsorization – Defining K Define a separate $K_h$ for each stratum in a manner that minimizes the MSE (Kokic and Bell 1994) Define a separate $K_i$ for each observation in a manner that minimizes the MSE (Clarke 1995)

23 Winsorization – Defining K Use unweighted data to define $K_h$ for each stratum, where $K_h = \bar{y}_h + 2s_h$ Use weighted data to define $K_h$ for each stratum, where $K_h = \bar{y}_h + 2s_h$ and $\bar{y}_h$ and $s_h$ are based on the weighted data
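
A sketch of the two simple cutoffs for one stratum, reading the symbol lost in the transcript as the stratum mean, so that the cutoff is the mean plus two standard deviations. The particular weighted mean and weighted variance used here are assumptions; the slides do not say how the weighted versions were computed.

```python
import math


def cutoff_unweighted(values):
    """K_h = mean + 2 * s, using the ordinary sample mean and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)
    return mean + 2.0 * math.sqrt(variance)


def cutoff_weighted(values, weights):
    """K_h = mean + 2 * s, using a weighted mean and weighted variance (assumed form)."""
    weight_sum = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / weight_sum
    variance = sum(w * (y - mean) ** 2 for w, y in zip(weights, values)) / weight_sum
    return mean + 2.0 * math.sqrt(variance)


# Hypothetical stratum values and weights.
stratum_values = [10.0, 12.0, 14.0]
stratum_weights = [1.0, 1.0, 2.0]
k_unweighted = cutoff_unweighted(stratum_values)
k_weighted = cutoff_weighted(stratum_values, stratum_weights)
```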

24 Winsorization-Our Implementation Used a robust regression in SAS to estimate the parameters needed in the calculations

25 M-estimation M-estimators are robust estimators that come from a generalization of maximum likelihood estimation

26 M-estimation Identifies influential units Adjusts either the weight or the value of the influential observation

27 M-estimation Used a weighted M-estimation technique that is able to modify the weights or the values of the influential observations (Beaumont and Alavi 2004)
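
To illustrate the general idea only, the sketch below computes a Huber M-estimate of location by iteratively reweighted averaging. This is a generic textbook example, not the weighted survey technique of Beaumont and Alavi; the tuning constant 1.345 and the MAD-based scale are conventional choices assumed here.

```python
import statistics


def huber_location(values, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location; returns the estimate and the final robustness weights."""
    mu = statistics.median(values)
    mad = statistics.median([abs(y - mu) for y in values])
    scale = mad / 0.6745                      # MAD rescaled to be comparable to a std. dev.
    if scale == 0:
        return mu, [1.0] * len(values)
    rob_weights = [1.0] * len(values)
    for _ in range(max_iter):
        # Observations with large standardized residuals get weights below 1.
        rob_weights = [
            1.0 if abs(y - mu) <= c * scale else c * scale / abs(y - mu)
            for y in values
        ]
        new_mu = sum(w * y for w, y in zip(rob_weights, values)) / sum(rob_weights)
        if abs(new_mu - mu) < tol:
            mu = new_mu
            break
        mu = new_mu
    return mu, rob_weights


# Toy data (hypothetical) with one extreme value.
estimate, rob_weights = huber_location([10.0, 11.0, 12.0, 13.0, 200.0])
```

The small robustness weight on the extreme value both flags it and limits its effect, which parallels the way the method can identify an influential unit and adjust its weight or value.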

28 Results


32 Number of Outliers Detected *Method does not detect outliers; one outlier was specified

33 Replacement Values (in Millions) *Weight trimming adjusts the other 18 weights in the stratum **Winsor wgt +2s identified 3 other values

34 Total Sales for the Industry


37 Chosen for Further Study Winsorization by each observation M-estimation by observation M-estimation by weight

38 Contact Information Mary.H.Mulry@census.gov Roxanne.Feldpausch@census.gov

