Methodology of Allocating Generic Field to its Details Jessica Andrews Nathalie Hamel François Brisebois ICESIII - June 19, 2007.

Outline
Background information on tax data; objective; current methodology; other methodologies considered; comparison of the methodologies; future work and conclusions.

Tax Data
Statistics Canada receives annual data from the Canada Revenue Agency (CRA) on incorporated (T2) businesses. The tax data consist of the Balance Sheet, the Income Statement and 88 different schedules.

Tax Data
There are about 700 different fields to report, but most companies provide only 30 to 40 of them. Only 8 fields, the section totals, are actually required by the CRA: non-farm revenue, non-farm expenses, farm revenue, farm expenses, assets, liabilities, shareholder equity, and net income/loss.

Objective
To impute the missing detail variables. Why? Tax data users need detailed data (tax replacement project, TRP). Concepts and definitions differ between tax and survey data, and a subset of the details linked to the same generic can be mapped to different survey variables (Chart of Accounts).

Challenges to meet
The methodology must work well for a large number of details; be capable of dealing both with details that are rarely reported and with those that are frequently reported; and give good micro results for tax replacement while also giving good macro results when examined at the NAICS or full-database level.

First attempt to complete tax data
Edit rules: outlier detection within a record; deterministic edits (to ensure the record balances within each section); review and manual corrections; overlap between fiscal periods; negative values; consistency edits between tax variables; outlier detection between records (Hidiroglou-Berthelot); CORTAX balancing edits.
Deterministic imputation of key variables: inventories; depreciation; salaries and wages.
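
The Hidiroglou-Berthelot (HB) edit flags units whose ratio between two related values is unusual once unit size is taken into account. The sketch below follows the commonly published form of the method, applied to two paired values per record (for example, the same field in consecutive years). How the production edit pairs values, and the parameter values U, A and C, are assumptions here, and `hb_outliers` is a hypothetical name.

```python
import statistics


def hb_outliers(current, previous, U=0.5, A=0.05, C=4.0):
    """Flag outliers with the Hidiroglou-Berthelot ratio method.

    current, previous: paired positive values for the same units.
    U, A and C are illustrative defaults, not production settings.
    Returns one boolean per unit, True where the unit is flagged.
    """
    ratios = [c / p for c, p in zip(current, previous)]
    r_med = statistics.median(ratios)
    # Centre the ratios so that deviations below and above the median
    # are on a comparable scale.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]
    # Effect: centred ratio weighted by unit size (U between 0 and 1).
    e = [si * max(c, p) ** U for si, c, p in zip(s, current, previous)]
    q1, e_med, q3 = statistics.quantiles(e, n=4)
    # A guards against a near-zero spread when most effects are equal.
    d_low = max(e_med - q1, abs(A * e_med))
    d_high = max(q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_low, e_med + C * d_high
    return [ei < lower or ei > upper for ei in e]
```

For example, `hb_outliers([100, 105, 98, 102, 99, 101, 1000], [100] * 7)` flags only the last unit, whose value jumped tenfold.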

GDA Concepts
A corporation can use either the generic field or the detail fields to report its results:

Field  Description                                   Case 1  Case 2  Case 3
8810   Office expenses amount (generic)                 100              30
8811   Office stationery and supply expense amount               20
8812   Office utilities expense amount                           30      10
8813   Data processing expense amount                            50      60
       Total                                            100     100     100

GDA Concepts
A block is defined by a generic field and its details. The generic field is not a total. The goal is to impute the most significant detail variables when a generic amount has been reported. GDA: generic-to-detail allocation.

Current method
Uses imputation classes based on industry code and company size: the first 2 digits of NAICS (about 25 industries) crossed with three revenue sizes (boundaries at 5 and 25 million). Calculates ratios within the imputation classes for each block, using all non-zero and non-missing details, but only details reported at least 10% of the time (5% for the General Farm Expense block). Assigns the ratios to businesses that reported a generic.
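
A minimal sketch of the current method follows. It assumes that the ratio for a detail is the pooled class total of that detail divided by the total of the retained details, and that the 10% reporting threshold is measured within each class; the source states neither choice. All function names are hypothetical.

```python
from collections import defaultdict


def size_class(revenue):
    # Three revenue sizes with boundaries at 5 and 25 million.
    if revenue < 5_000_000:
        return "small"
    if revenue < 25_000_000:
        return "medium"
    return "large"


def imputation_class(naics, revenue):
    # First 2 digits of the NAICS code crossed with the revenue size.
    return (str(naics)[:2], size_class(revenue))


def class_ratios(donors, min_rate=0.10):
    """donors: (naics, revenue, {detail: amount}) for businesses that
    reported details in the block. Returns {class: {detail: ratio}}."""
    by_class = defaultdict(list)
    for naics, revenue, details in donors:
        by_class[imputation_class(naics, revenue)].append(details)
    ratios = {}
    for cls, rows in by_class.items():
        counts, totals = defaultdict(int), defaultdict(float)
        for details in rows:
            for name, amount in details.items():
                if amount:  # skip zero and missing (None) details
                    counts[name] += 1
                    totals[name] += amount
        # Keep only details reported by at least min_rate of the class.
        kept = {d: t for d, t in totals.items() if counts[d] / len(rows) >= min_rate}
        grand = sum(kept.values())
        ratios[cls] = {d: t / grand for d, t in kept.items()} if grand else {}
    return ratios


def allocate(generic, naics, revenue, ratios):
    # Spread the generic over the details using the class ratios.
    shares = ratios.get(imputation_class(naics, revenue), {})
    return {d: generic * r for d, r in shares.items()}
```

A business with a generic of 300 in a class whose donors split 20/40/90 across three details receives roughly 40, 80 and 180. Small classes make these ratios unstable, which is one of the problems listed for this method.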

Current method
Originally proposed as a solution with good macro (aggregate) results, but good micro (business-level) results are now needed for the TRP. Problems: imputation classes are frequently not homogeneous in terms of distribution, and there is a large number of small imputation classes.

Other methods considered
Historic imputation method; scores method; cluster method.

Historic imputation method
Assumes that the distributions of details are the same from one year to the next. Problems: changes in business strategies or properties are not taken into account; most businesses that reported details in the previous year also report them in the current year, leaving few businesses that can be imputed with this method (about 5% on all blocks tested); another method is therefore required for the remaining businesses.
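
Historic imputation with a fallback can be sketched as below: a business's own year n-1 shares are reused when they exist, and ratios from another method are used otherwise. The dictionary layout and function names are hypothetical.

```python
def historic_ratios(prev_details):
    """Shares of each non-zero detail in the business's own year n-1 block,
    or None when nothing usable was reported."""
    if not prev_details:
        return None
    total = sum(v for v in prev_details.values() if v)
    if not total:
        return None
    return {d: v / total for d, v in prev_details.items() if v}


def impute_historic(generic, prev_details, fallback_ratios):
    """Allocate the generic with last year's own shares; otherwise use
    ratios from another method (e.g. class or cluster ratios)."""
    ratios = historic_ratios(prev_details)
    if ratios is None:
        ratios = fallback_ratios
    return {d: generic * r for d, r in ratios.items()}
```

For example, `impute_historic(200, {"8811": 25, "8812": 75}, {"8813": 1.0})` returns `{"8811": 50.0, "8812": 150.0}`, while a business with no usable history falls through to the fallback ratios.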

Scores method
Uses response/non-response models for each detail. Groups businesses into imputation classes on the basis of percentiles of the response probability. Calculates ratios within the imputation classes. Assigns the ratios to businesses with a generic.
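
The grouping and ratio steps can be sketched for a single detail, assuming the response probabilities have already been produced by some response model (the model itself is not specified in the source). Businesses are cut into classes at percentiles of the probability, and a pooled ratio of the detail to the block total is computed per class. Function names are hypothetical.

```python
import statistics
from collections import defaultdict


def score_classes(probs, n_classes=5):
    """Class index 0..n_classes-1 from percentile cut points of the
    predicted probability of reporting one detail."""
    cuts = statistics.quantiles(probs, n=n_classes)
    return [sum(p > c for c in cuts) for p in probs]


def class_detail_ratio(classes, detail, block_total):
    """Pooled ratio of one detail to the block total within each class,
    using businesses that reported the block (non-zero total)."""
    num, den = defaultdict(float), defaultdict(float)
    for k, d, t in zip(classes, detail, block_total):
        if t:
            num[k] += d or 0.0
            den[k] += t
    return {k: num[k] / den[k] for k in den}
```

Because this has to be repeated for every detail, each with its own model and its own classes, it becomes unwieldy for blocks with many frequently reported details.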

Scores method
Problems: a model must be created for each detail, and it is difficult to decide what to do for blocks with many details (5 or more) that are frequently reported. This method was excluded because of its difficulty in coping with blocks with a moderate to large number of details.

Cluster method
Divides businesses into imputation classes on the basis of their response patterns to details, using a clustering or dominant-detail method. Uses discriminatory models (parametric or nonparametric) to assign businesses with a generic to imputation classes. Calculates ratios within the imputation classes. Assigns the ratios to businesses with a generic.

Cluster method
Problems: for certain blocks it can be difficult to find good variables on which to discriminate, and it is an open issue how often the clustering method and models should be reviewed.

Comparing the methods
Estimate the distributions of known data for year n from ratios calculated for year n-1: create a benchmark file of businesses that reported details in both years n-1 and n; move all year-n details into the generic field; calculate ratios from the year n-1 businesses for all methods; assign the ratios to the businesses in year n; compare the results to the reported fields.
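
The benchmark procedure can be sketched as follows. A single pooled-ratio method stands in for whichever method is being evaluated; a real comparison would plug in the current, historic or cluster ratios instead. The data layout and function names are hypothetical.

```python
from collections import defaultdict


def pooled_ratios(detail_dicts):
    # Share of each detail in the pooled year n-1 totals.
    totals = defaultdict(float)
    for details in detail_dicts:
        for name, amount in details.items():
            totals[name] += amount
    grand = sum(totals.values())
    return {name: t / grand for name, t in totals.items()}


def run_benchmark(pairs):
    """pairs: (details_n_minus_1, details_n) for businesses that reported
    details in both years. Returns per-business (true, estimated) pairs
    and the true and estimated macro totals for each detail."""
    ratios = pooled_ratios([prev for prev, _ in pairs])
    micro = []
    true_tot, est_tot = defaultdict(float), defaultdict(float)
    for _, curr in pairs:
        generic = sum(curr.values())  # fold the year-n details into the generic
        est = {name: generic * r for name, r in ratios.items()}
        micro.append((curr, est))
        for name in set(curr) | set(est):
            true_tot[name] += curr.get(name, 0.0)
            est_tot[name] += est.get(name, 0.0)
    return micro, dict(true_tot), dict(est_tot)
```

The micro pairs feed business-level statistics, and the two totals feed aggregate comparisons.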

Comparing the methods
Compare the results at the micro (business) and macro (aggregate) levels. Compare the true and estimated distributions.

Comparing the methods
Macro statistics for the jth detail in the block

Comparing the methods
Micro statistics: median pseudo CV for the jth detail and ith business in the block

Comparing the methods
Micro statistics: median Pearson contingency coefficient for the jth detail and ith business in the block. The f values represent the marginal distributions, and d² represents the degree of dependency (it depends on n, r and c).
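
For reference, the textbook Pearson contingency coefficient for an r × c table of counts is C = sqrt(χ² / (χ² + n)), where χ² is the chi-square statistic computed from the marginal totals. The sketch below implements that textbook form. It does not reproduce the slide's exact formula for d², nor how the table is built for a given detail and business, and the function name is hypothetical.

```python
import math


def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))
    for an r x c table of non-negative counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence, from the marginals.
            exp = row_tot[i] * col_tot[j] / n
            if exp:
                chi2 += (obs - exp) ** 2 / exp
    return math.sqrt(chi2 / (chi2 + n))
```

An independent table such as `[[10, 20], [20, 40]]` gives 0, and `[[10, 0], [0, 10]]` gives sqrt(0.5). C never reaches 1; with k = min(r, c) its maximum is sqrt((k - 1) / k), which is why its value depends on the table dimensions.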

Comparing the methods
We show results for Block 8230: Other Revenue. This block has 20 details covering the distribution of revenue and is important for clients, as it is used in many surveys. The scores method is not shown, as it is difficult to implement with this many details.

Comparing the methods
Other revenue, fields 8230 to 8250:
8230 Other revenue
8231 Foreign exchange gains/losses
8232 Income/loss of subsidiaries/affiliates
8233 Income/loss of other divisions
8234 Income/loss of joint ventures
…
8248 Insurance recoveries
8249 Expense recoveries
8250 Bad debt recoveries

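Results for Block 8230
Micro statistics: median pseudo CV and median Pearson contingency coefficient (C), each with its interquartile range (IQR). Macro statistics: SSE and SSEP.

Method                Med. pseudo CV   IQR   Med. Pearson C   IQR      SSE   SSEP
Current method                  1.08  0.43             0.66  0.14   2.2e20    120
Cluster method                  0.34  1.39             0.36  0.63   2.8e20     12
Historic + cluster              0.51  0.99             0.10   0.7   9.9e19    4.5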

Cluster methodology
Most blocks use dominant-detail (attractor x) clusters to define the imputation classes. A business $i$ belongs to cluster $j$ of attractor $x$, where $x > 50$, if $100 \, y_{ij} / \sum_{k} y_{ik} > x$, where $y_{ij}$ is the total value reported by business $i$ in detail $j$. If this condition is not true for any detail, the business is assigned to an additional cluster, $J + 1$, where $J$ is the number of details in the block.
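
The dominant-detail rule can be sketched as follows, under the reading that the condition compares detail j's percentage share of the business's block total with the threshold x. The default threshold of 60 and the function name are hypothetical, and amounts are assumed non-negative; with negative amounts more than one detail could exceed 50%.

```python
def attractor_cluster(details, x=60.0):
    """Return the 1-based index j of the dominant detail, or J + 1 when
    no detail exceeds x percent of the block total.

    details: the J detail amounts of one business (assumed non-negative).
    x: attractor threshold in percent; it must exceed 50 so that at most
    one detail can qualify. The default of 60 is hypothetical.
    """
    if x <= 50:
        raise ValueError("x must be greater than 50")
    total = sum(details)
    if total > 0:
        for j, amount in enumerate(details, start=1):
            if 100 * amount / total > x:
                return j
    return len(details) + 1
```

With x = 60, a business reporting `[80, 10, 10]` falls in cluster 1, while `[40, 30, 30]` has no dominant detail and falls in cluster 4 (J + 1). Requiring x > 50 makes the assignment unique, since two details cannot both hold more than half of a non-negative total.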

Cluster methodology
Distribution ratios to the details are calculated for each cluster. Discriminatory models (nonparametric for most blocks) are then built to assign businesses with a generic to a cluster. They use variables on industry (NAICS), location (province), size (revenue, log revenue), details, and totals of details in other blocks.
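
The two steps can be sketched as below. k-nearest neighbours is used purely as one example of a nonparametric discriminator; the source does not say which nonparametric models were fitted. Only numeric features (for example, log revenue) are handled here, so categorical variables such as NAICS or province would need encoding first. Function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cluster_ratios(labels, detail_rows):
    """Pooled share of each detail within each cluster.

    labels: cluster label of each donor business.
    detail_rows: equal-length lists of detail amounts, one per donor.
    """
    n_details = len(detail_rows[0])
    sums = defaultdict(lambda: [0.0] * n_details)
    for label, row in zip(labels, detail_rows):
        for j, amount in enumerate(row):
            sums[label][j] += amount
    return {
        label: [v / sum(s) if sum(s) else 0.0 for v in s]
        for label, s in sums.items()
    }


def knn_assign(train_x, train_labels, x, k=3):
    """k-nearest-neighbour discriminator: assign x to the most common
    cluster among the k donors closest in Euclidean distance."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

A business that reported only a generic is first given a cluster by `knn_assign`, and that cluster's entry from `cluster_ratios` then supplies the distribution ratios.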

Cluster methodology
Generic amounts are assigned to details in the following three ways:
1. If a generic amount is reported and no details are reported, the ratios are assigned as calculated.
2. If a generic amount is reported and all details with a ratio greater than 0% are reported, the ratios are assigned as calculated.
3. If a generic amount is reported along with some, but not all, of the details, the ratios are pro-rated and the generic is assigned only to the details that were not reported.
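
The three assignment rules can be sketched as one function. Adding the allocated amounts to any amounts the business already reported (rule 2) is an assumption, as is the function name.

```python
def allocate_generic(generic, reported, ratios):
    """Allocate a generic amount to the details of a block.

    reported: {detail: amount} for details the business reported (may be empty).
    ratios: {detail: share} for the business's imputation class or cluster.
    """
    positive = {d: r for d, r in ratios.items() if r > 0}
    missing = {d: r for d, r in positive.items() if d not in reported}
    if not reported or not missing:
        # Rules 1 and 2: no details reported, or every detail with a
        # positive ratio reported: use the ratios as calculated.
        targets = positive
    else:
        # Rule 3: pro-rate the ratios over the unreported details only.
        total = sum(missing.values())
        targets = {d: r / total for d, r in missing.items()}
    # Assumption: allocated amounts are added to any reported amounts.
    result = dict(reported)
    for d, share in targets.items():
        result[d] = result.get(d, 0.0) + generic * share
    return result
```

With ratios of 0.5, 0.3 and 0.2 and a generic of 100, a business that reported only the first detail (10) ends up with 10, 60 and 40, since the 0.3 and 0.2 ratios are rescaled to 0.6 and 0.4.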

Cluster methodology
Gives better micro results, improving the data for tax replacement. Macro results remain similar to those of the current methodology. Micro results are consistent from year to year.

Future work and conclusions
The cluster methodology will be implemented for reference year 2006 for the Income Statement. Model fitting and implementation for the Balance Sheet will follow. Models and clustering methods will be reviewed as deemed appropriate.

For more information, please contact:
Visit our web site at www.statcan.ca
Contact information: Jessica.andrews@statcan.ca, Francois.brisebois@statcan.ca, Nathalie.hamel@statcan.ca