Using Business Taxation Data as Auxiliary Variables and as Substitution Variables in the Australian Bureau of Statistics Frank Yu, Robert Clark and Gabriele.

1 Using Business Taxation Data as Auxiliary Variables and as Substitution Variables in the Australian Bureau of Statistics Frank Yu, Robert Clark and Gabriele B. Durant

2 Outline of talk
- Use of tax data in ABS
- Using tax data as auxiliary variables
  - example: subannual surveys
- Using tax data as variables of interest
  - missing taxation data
  - example: annual surveys
- Dealing with missing tax data:
  - Missing at Random (MAR)
  - Common Measurement Error (CME) model
- Conclusion

3 Use of tax data
- to construct and maintain the population frame
- as auxiliary variables for estimation
- to substitute for survey data, reducing provider burden
- as a source for imputing missing or invalid survey data
- to provide independent estimates for validating outputs

4 Data supplied by the Australian Taxation Office (ATO)
- Australian Business Register (ABR) information
  - businesses identified by name and address
  - industry, payees
- Business Activity Statement (BAS) data: GST and PAYG data
  - available (90%) 6 months after the reference quarter
  - turnover, wages and salaries, capital and non-capital expenses
- Income tax data
  - available (70 to 80%) 18 months after the reference quarter
  - detailed expenses, revenue and balance sheet

5 Use of tax data for frame creation
[Diagram: the ABS Maintained Population (ABS MP) holds complex units; the ATO Maintained Population (ATO MP) holds simple units, for which ABN = statistical unit, taken from the Australian Business Register.]

6 Use of tax data for frame construction
- construction: units from the ABR
  - industry, sector
  - number of payees
  - multistate indicators
- maintenance:
  - births and cancellations
  - tax roles: e.g. employing vs non-employing units
  - long-term non-remitters excluded
- stratification: single/multiple states, industry

7 Frame auxiliary variables (x_i's)
- derived size benchmarks:
  - from BAS, based on wages and salaries data
  - used as stratification variables
- BAS turnover
- BAS wages
  - need imputation (derived from the average of quarterly data)
  - lag the reference quarter by 2 quarters

8 Survey data vs tax data
[Table: star ratings comparing sample survey data, BAS data and BIT data on concept, accuracy, timeliness, detailed domain and richness of data items.]

9 Use of tax data as auxiliary variables

Survey | Variables of interest | Auxiliary variables for estimation
Retail Trade | Sales | BAS turnover
Economic Activity Survey (EAS) | Financial variables | BIT variables
Annual Integrated Collection (AIC) | Same as EAS | BAS variables

10 Tax data as auxiliary variables
[Diagram: for sample units (s), both y_i and x_i are observed; for non-sample units (U\s), only x_i is observed.]

11 Generalised Regression Estimation
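The formula on this slide did not survive the transcript. For reference, the textbook GREG estimator of a total adjusts the Horvitz-Thompson (HT) estimate by the gap between known frame totals and their HT estimates, t̂_GREG = t̂_y + (t_x − t̂_x)'B̂, where B̂ is a design-weighted least squares coefficient. The minimal sketch below assumes one auxiliary variable (for example BAS turnover) plus an intercept and equal model variances; the function and argument names are hypothetical, and the presentation's exact specification is not shown.

```python
def greg_total(y, x, d, N, X):
    """GREG estimate of the population total of y using an intercept and a
    single auxiliary variable x (hypothetical sketch).

    y, x : sample values of the survey variable and the auxiliary variable
    d    : design weights d_i = 1 / pi_i
    N, X : known population count and population total of x (from the frame)
    """
    # Horvitz-Thompson estimates of the totals of 1, x and y
    n_hat = sum(d)
    x_hat = sum(di * xi for di, xi in zip(d, x))
    y_hat = sum(di * yi for di, yi in zip(d, y))

    # Design-weighted least squares fit of y on (1, x)
    x_bar = x_hat / n_hat
    y_bar = y_hat / n_hat
    sxx = sum(di * (xi - x_bar) ** 2 for di, xi in zip(d, x))
    sxy = sum(di * (xi - x_bar) * (yi - y_bar) for di, xi, yi in zip(d, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Regression adjustment of the HT estimate towards the known frame totals
    return y_hat + b0 * (N - n_hat) + b1 * (X - x_hat)
```

If the sample values lie exactly on a line y = b0 + b1·x, the estimate reproduces b0·N + b1·X. If the HT estimates already match the frame totals, it reduces to the HT estimate.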

12 Advantages and disadvantages
Advantages
- provides efficiency gains
- approximately unbiased
- does not require the X's to measure the right concepts
- does not require the X's to be current
Disadvantages
- does not model Y directly, e.g. zero units
- influential points
- efficiency in estimating levels is not the same as efficiency in estimating change

14 Issue: inactive/out-of-scope units
Solution: apply GREG to positive units only

15 efficiency for estimating level does not necessarily translate to efficiency for estimating change
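One way to see why is the standard variance identity for an estimate of change between periods t − 1 and t. GREG mainly targets the two variance terms. In subannual surveys with largely overlapping samples, the covariance term is typically large and positive, and regression adjustment may also shrink it. So the relative gain for change can be smaller than the gain for each level.

```latex
\operatorname{Var}\bigl(\hat{Y}_t - \hat{Y}_{t-1}\bigr)
  = \operatorname{Var}\bigl(\hat{Y}_t\bigr)
  + \operatorname{Var}\bigl(\hat{Y}_{t-1}\bigr)
  - 2\,\operatorname{Cov}\bigl(\hat{Y}_t, \hat{Y}_{t-1}\bigr)
```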

16 Data substitution approach: use tax data as the variable of interest
- Assumes tax data are better:
  - respondents are more serious about getting it right
  - more time to provide information
  - audited accounts (for BIT) for tax purposes
- Detailed breakdown
- Missing tax data:
  - requires matching to the frame
  - missingness is non-ignorable:
    - inactive units
    - late units have more expenses

17 Examples: Economic Activity Survey (annual), 1990s to 05/06
- Estimation of totals for broad items for microbusinesses
  - tax data as substitution variables
  - augmenting the sample for simple businesses
  - tax data replace broad-level income and expense items
- Estimation of detailed items
  - detailed items imputed by pro-rating broad tax data based on splits observed in surveys
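The pro-rating step can be sketched as follows. This assumes the "splits" are the survey shares of each detailed item within a cell (for example an industry-by-size class). The function name and item names are hypothetical.

```python
def prorate(broad_total, survey_detail):
    """Split a broad tax item (e.g. total expenses) into detailed items in
    proportion to the splits observed in survey data (hypothetical sketch).

    broad_total   : broad-level tax value for a unit or cell
    survey_detail : dict mapping detailed item name -> survey value
    """
    survey_sum = sum(survey_detail.values())
    if survey_sum == 0:
        raise ValueError("survey splits sum to zero; cannot pro-rate")
    # Each detailed item gets the broad tax total times its survey share
    return {item: broad_total * value / survey_sum
            for item, value in survey_detail.items()}
```

By construction, the detailed items add back to the broad tax total.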

18 Examples: Annual Integrated Collection (06/07 onwards)
- AIC core survey estimates: estimation of totals for survey variables for small and large businesses
  - tax data as auxiliary variables for generalised regression estimation
- AIC complementary estimates: estimation of totals for broad items for microbusinesses
  - tax data as substitution variables
- AIC complementary estimates: estimation of detailed state/industry classes
  - tax data as substitution variables
- AIC complementary estimates: estimation of detailed economic variables
  - tax data as substitution variables, disaggregated by model estimation of pro-rating factors

19 Notation
- Population U: tax data Y available (r_i = 1); Y not available (r_i = 0)

20 Use MAR model on frame only
- Population U: Y available (r_i = 1); Y not available (r_i = 0)
- X_i: frame variables; Y: tax data of interest
- Model: Y = f(x), fitted for r_i = 1

21 Use MAR model conditional on frame variables only
- Population U: Y available (r_i = 1); Y not available (r_i = 0); frame variables X_i
- Model: Y = f(x), fitted for r_i = 1
- Impute Ŷ = f(x) for r_i = 0 (MAR)
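A minimal sketch of this frame-only MAR imputation, assuming f is a simple linear regression on one frame variable (the slides do not give the form of f; names are hypothetical). Missing tax values are passed as None.

```python
def impute_mar_frame(y, x, r):
    """Regression imputation of tax data under MAR given a frame variable.

    y : tax values (None where r_i == 0)
    x : frame variable, known for every unit in U
    r : 1 if tax data reported, 0 otherwise
    """
    # Fit Y = b0 + b1 * x on tax reporters only (r_i = 1)
    xs = [xi for xi, ri in zip(x, r) if ri == 1]
    ys = [yi for yi, ri in zip(y, r) if ri == 1]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
          / sum((a - mx) ** 2 for a in xs))
    b0 = my - b1 * mx
    # Keep reported values; impute Y-hat = f(x) for nonreporters (r_i = 0)
    return [yi if ri == 1 else b0 + b1 * xi for yi, xi, ri in zip(y, x, r)]
```

This relies on the reporters' relationship between Y and x carrying over to nonreporters, which is the assumption questioned on the next slide.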

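22 But for non-ignorable missingness
- Population U: Y available (r_i = 1); Y not available (r_i = 0); frame variables X_i
- Model: Y = f(x), fitted for r_i = 1
- Impute Ŷ = f(x) for r_i = 0: this is no longer justified, since the nonreporters differ from the reporters even after conditioning on x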

23 Use a sample to inform about the nonreporters, based on their survey responses
- Notation: Y denotes tax variables and Y* survey variables (a surrogate for Y)
- Population U: Y available (r_i = 1); Y not available (r_i = 0); frame variables X_i
- Sample s: Y* available

24 Imputing tax data from survey data
- Population U: Y available (r_i = 1); Y not available (r_i = 0); frame variables X_i
- Sample s: Y* available
- Model: Y = f(Y*, x_i)

25 Imputing tax data from survey data
- Population U: Y available (r_i = 1); Y not available (r_i = 0); frame variables X_i
- Sample s: Y* available
- Model: Y = f(Y*), or more generally Y = f(Y*, x_i); impute Ŷ

26 Imputing tax data from survey data
- Population U: Y available (r_i = 1); Y not available (r_i = 0); frame variables X_i
- Sample s: Y* available
- Model: Y = f(Y*, x); impute Ŷ = f(Y*, x)

27 Models for Y
- Missing at Random (MAR): Y is independent of r given x and Y*
- Common Measurement Error (CME): given Y, the distribution of Y* is independent of r
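Written as conditional distributions, the two assumptions are as follows. The CME condition is stated here conditional on x as well, to match the model Y* = f(Y, x) used with it. Under MAR, a model for Y given (Y*, x) fitted on tax reporters can be applied directly to nonreporters. Under CME, a model for Y* given (Y, x) fitted on reporters also holds for nonreporters and must be inverted to predict Y.

```latex
\begin{aligned}
\text{MAR:}\quad & f\bigl(Y \mid x, Y^{*}, r = 1\bigr) = f\bigl(Y \mid x, Y^{*}, r = 0\bigr) \\
\text{CME:}\quad & f\bigl(Y^{*} \mid Y, x, r = 1\bigr) = f\bigl(Y^{*} \mid Y, x, r = 0\bigr)
\end{aligned}
```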

28 Use MAR model: missing at random given X and Y*
- Population U: Y available (r_i = 1); Y not available (r_i = 0); frame variables X_i
- Sample s: Y* available
- Model: Y = f(Y*, x), fitted for r_i = 1
- Impute Ŷ for r_i = 0 (MAR)

29 Imputation using MAR model
1. Using data on Y and Y* from the sample units where both survey and tax data are reported, model Y as a function of Y*.
2. Use this model to impute Y_i for tax nonreporters in the sample (Y* is known for them).
3. For units not in the sample, if their tax data is missing, impute using the distribution
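Steps 1 and 2 can be sketched as below. This assumes a linear model Y = b0 + b1·Y* + b2·x, following the model Y = f(Y*, x) on the previous slide. Step 3 is incomplete in the transcript and is not implemented. Function and argument names are hypothetical; missing tax values are passed as None.

```python
def impute_mar_sample(y_tax, y_survey, x, r):
    """MAR imputation within the sample s (steps 1-2, hypothetical sketch).

    y_tax    : tax value Y (None where r_i == 0)
    y_survey : survey value Y*, observed for every sample unit
    x        : frame variable
    r        : 1 if tax data reported, 0 otherwise
    """
    # Step 1: least squares fit of Y on (Y*, x) over units with r_i = 1
    rep = [i for i, ri in enumerate(r) if ri == 1]
    n = len(rep)
    m1 = sum(y_survey[i] for i in rep) / n
    m2 = sum(x[i] for i in rep) / n
    my = sum(y_tax[i] for i in rep) / n
    d1 = [y_survey[i] - m1 for i in rep]
    d2 = [x[i] - m2 for i in rep]
    dy = [y_tax[i] - my for i in rep]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero if Y* and x are collinear among reporters
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    # Step 2: predict Y for sample tax nonreporters from their Y* and x
    return [y_tax[i] if r[i] == 1 else b0 + b1 * y_survey[i] + b2 * x[i]
            for i in range(len(r))]
```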

30 Use CME model
- Population U: Y available (r_i = 1); Y not available (r_i = 0); frame variables X_i
- Sample s: Y* available
- Model: Y* = f(Y, x), fitted for r_i = 1 (CME)
- Invert to get Ŷ = g(Y*)
- For i in U\s with r_i = 0, impute Ŷ = h(X)

31 Imputation using CME model

32 Modelling survey data (Y*) as a function of tax data (Y); invert this to predict Y from Y*

33 Model: survey data Y* (EAS 05/06) as a function of frame variable X (tax_turn_0405) for tax nonrespondents (i.e. r =0)

34 EBLUP impute: Empirical Best Linear Unbiased Predictor (EBLUP) of Y_i

35 CME imputation process
- Use sample units where both tax and survey variables are observed to model the survey variable (Y*) as a function of tax and frame data (Y, X). Under CME, this model also applies to units with r = 0.
- Use sample units where survey data are observed (i in s) but tax data are not (r_i = 0) to model the survey variable (Y*) as a function of frame data (x).
- Combine the two models to give an impute for Y for tax nonrespondents (r = 0): the EBLUP.
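The slides do not give the functional forms. The sketch below assumes a linear CME model Y* = a + b·Y + c·x + e, fitted on sample units with r = 1, and a linear model Y* = δ + λ·x, fitted on sample units with r = 0. Under CME, E[Y* | Y, x, r = 0] = a + b·Y + c·x. Averaging over Y gives E[Y* | x, r = 0] = a + b·E[Y | x, r = 0] + c·x, which can be solved for E[Y | x, r = 0].

Sample nonreporters are imputed by inverting the first model at their observed Y*. Non-sample nonreporters are imputed by inverting it at the predicted Y* from the second model, which plays the role of h(X). This is a naive inversion: the EBLUP described in the slides combines the two models and is not reproduced here. All names are hypothetical.

```python
def _ols1(x, y):
    """Least squares fit y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def _ols2(x1, x2, y):
    """Least squares fit y = b0 + b1 * x1 + b2 * x2; returns (b0, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def cme_impute(reporters, sample_nonreporters, nonsample_x):
    """Naive CME imputation of tax data Y (hypothetical sketch).

    reporters           : list of (y_tax, y_survey, x) for sample units with r_i = 1
    sample_nonreporters : list of (y_survey, x) for sample units with r_i = 0
    nonsample_x         : list of x for non-sample units with r_i = 0
    Returns (imputes for sample nonreporters, imputes for non-sample nonreporters).
    """
    y, ys, x = zip(*reporters)
    a, b, c = _ols2(y, x, ys)        # CME model Y* = a + b*Y + c*x, fitted where r = 1
    ys0, x0 = zip(*sample_nonreporters)
    delta, lam = _ols1(x0, ys0)      # Y* = delta + lam*x, fitted on sample units with r = 0
    # Invert the CME model at the observed Y* for sample nonreporters
    in_sample = [(s - a - c * xi) / b for s, xi in zip(ys0, x0)]
    # Invert it at the predicted Y* for non-sample nonreporters
    out_sample = [(delta + lam * xi - a - c * xi) / b for xi in nonsample_x]
    return in_sample, out_sample
```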

36 Further work
- domain estimation for CME/MAR
- variance estimation
- discriminating between CME and MAR based on data

37 Conclusion
- GREG is useful for estimating survey variables, but the efficiency gain is limited.
- There is increasing interest in using tax data directly, on its own, to produce economic statistics.
- Non-ignorable missingness becomes a key issue with tax data.
- Survey data could help impute the missing tax data.
