Using Business Taxation Data as Auxiliary Variables and as Substitution Variables in the Australian Bureau of Statistics Frank Yu, Robert Clark and Gabriele.

Using Business Taxation Data as Auxiliary Variables and as Substitution Variables in the Australian Bureau of Statistics Frank Yu, Robert Clark and Gabriele B. Durant

Outline of talk
- Use of tax data in ABS
- Using tax data as auxiliary variables
  - example: subannual surveys
- Using tax data as variables of interest
  - missing taxation data
  - example: annual surveys
- Dealing with missing tax data:
  - Missing at Random
  - Common Measurement Error model
- Conclusion

Use of tax data
- construct and maintain population frame
- as auxiliary variables for estimation
- substitute survey data to reduce provider burden
- as source for imputing missing/invalid survey data
- provide independent estimates for validation of outputs

Data supplied by Australian Taxation Office
- Australian Business Register information
  - businesses identified by name, address
  - industry, payees
- Business Activity Statement data: GST and PAYG data
  - available (90%) 6 months after reference quarter
  - turnover, wages and salaries, capital and non-capital expenses
- Income Tax data
  - available (70 to 80%) 18 months after reference quarter
  - detailed expenses and revenue and balance sheet

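Use of tax data for frame creation
[Figure: the frame is made up of the ABS Maintained Population (ABS MP), which holds complex units, and the ATO Maintained Population (ATO MP), which holds simple units for which the ABN from the Australian Business Register is the statistical unit.]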

Use of tax data for frame construction
- construction: units from ABR
  - industry, sector
  - number of payees
  - multistate indicators
- maintenance:
  - births and cancellations
  - tax roles, e.g. employing vs non-employing units
  - long term non-remitters excluded
- stratification: single/multiple states, industry

Frame auxiliary variables (x_i's)
- derived size benchmarks:
  - from BAS, based on wages and salaries data
  - used as stratification variables
- BAS turnover
- BAS wages
- need imputation (derived from average of quarterly data)
- lag reference quarter by 2 quarters

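Survey data vs tax data
[Table comparing Sample Survey, BAS data and BIT data by star rating. The three rating columns are fused in this transcript, so only the combined stars per row are recoverable: concept ****; accuracy ******; timeliness ******; detailed domain ******; richness of data items ******.]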

Use of tax data as auxiliary variables
Survey                       | Variables of interest | Auxiliary variables for estimation
Retail Trade                 | Sales                 | BAS turnover
Economic Activity Survey     | financial variables   | BIT variables
Annual Integrated Collection | same as EAS           | BAS variables

Tax data as auxiliary variables
[Figure: y_i and x_i are observed for units in the sample s; only x_i is available for the non-sampled units U\s.]

Generalised Regression Estimation
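
For reference, a common textbook form of the generalised regression (GREG) estimator of a total, written in the notation of these slides (sample s, population U, survey variable y_i, tax auxiliary x_i). The design weights d_i are an assumption introduced here; the transcript does not show which variant of the estimator the ABS uses.

```latex
\hat{Y}_{\mathrm{GREG}}
  = \sum_{i \in s} d_i y_i
  + \Big( \sum_{i \in U} x_i - \sum_{i \in s} d_i x_i \Big)^{\top} \hat{B},
\qquad
\hat{B} = \Big( \sum_{i \in s} d_i x_i x_i^{\top} \Big)^{-1} \sum_{i \in s} d_i x_i y_i
```

The first term is the weighted sample estimate; the second corrects it by the gap between the known population total of the tax variable and its weighted sample estimate.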

Advantages and disadvantages
Advantages
- provides efficiency gains
- approximately unbiased
- does not require X's to measure the right concepts
- does not require X's to be current
Disadvantages
- does not model Y directly, e.g. zero units
- influential points
- efficiency in estimating levels is not the same as efficiency in estimating change
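
A minimal sketch of a GREG total with an intercept and a single tax auxiliary variable, assuming design weights `d` and known population values `N` and `X_total` taken from the frame. The function name and all argument names are hypothetical; this is an illustration of the general technique, not the ABS production estimator.

```python
def greg_total(y, x, d, N, X_total):
    """GREG estimate of the total of y using an intercept and one auxiliary x.

    y, x    : survey and auxiliary (e.g. BAS turnover) values for sampled units
    d       : design weights for the sampled units
    N       : known number of units in the population
    X_total : known population total of x
    """
    # Design-weighted sums needed for the weighted least squares fit.
    sw = sum(d)
    swx = sum(di * xi for di, xi in zip(d, x))
    swy = sum(di * yi for di, yi in zip(d, y))
    swxx = sum(di * xi * xi for di, xi in zip(d, x))
    swxy = sum(di * xi * yi for di, xi, yi in zip(d, x, y))

    # Weighted regression of y on (1, x).
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw

    # Weighted sample total plus the regression adjustment towards the
    # known population benchmarks N and X_total.
    return swy + intercept * (N - sw) + slope * (X_total - swx)


# Toy data in which y = 2 + 3x exactly, so the estimate equals 2*N + 3*X_total.
estimate = greg_total(y=[5, 8, 11, 14], x=[1, 2, 3, 4], d=[5, 5, 5, 5],
                      N=22, X_total=60)
print(estimate)
```

Because the adjustment only uses the fitted relationship between y and x, the auxiliary variable does not have to measure the same concept as y or refer to the same period, which matches the advantages listed above.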

Issue: inactive/out of scope units
Solution: apply GREG to positive units only

efficiency for estimating level does not necessarily translate to efficiency for estimating change
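
One way to see this is the standard variance identity for a difference of two estimates. The period subscripts t and t-1 are notation introduced here for illustration.

```latex
\operatorname{Var}\big(\hat{Y}_t - \hat{Y}_{t-1}\big)
  = \operatorname{Var}\big(\hat{Y}_t\big)
  + \operatorname{Var}\big(\hat{Y}_{t-1}\big)
  - 2\,\operatorname{Cov}\big(\hat{Y}_t, \hat{Y}_{t-1}\big)
```

An auxiliary variable that reduces both level variances can also reduce the covariance between successive estimates, so the variance of the estimated change need not fall in proportion.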

Data Substitution Approach: use tax data as the variable of interest
- Assumes tax data are better
  - respondents more serious about getting it right
  - more time to provide information
  - audited accounts (for BIT) for tax purposes
- Detailed breakdown
- Missing tax data
  - require matching to frame
  - missingness is non-ignorable
    - inactive units
    - late units have more expenses

Examples: Economic Activity Survey (annual), 1990s to 05/06
- estimation of totals for broad items for microbusinesses: tax data as substitution variables
- augmenting sample for simple businesses: tax data to replace broad level income and expenses items
- estimation of detailed items: detailed items imputed by pro-rating broad tax data based on splits observed in surveys
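
The pro-rating step in the last item can be sketched as follows. The function name, item names and figures are hypothetical and only illustrate the arithmetic of splitting one broad tax item by proportions observed in survey data.

```python
def prorate(broad_tax_value, survey_splits):
    """Split a broad tax item into detailed items using survey proportions.

    broad_tax_value : value of the broad item reported in the tax data
    survey_splits   : dict mapping each detailed item to its survey total
    """
    total = sum(survey_splits.values())
    if total <= 0:
        raise ValueError("survey splits must sum to a positive value")
    # Each detailed item receives its survey share of the broad tax value.
    return {item: broad_tax_value * value / total
            for item, value in survey_splits.items()}


# Toy example: a broad expenses item of 1000 split 60/30/10.
detail = prorate(1000.0, {"wages": 60, "rent": 30, "other": 10})
print(detail)
```

By construction the detailed items add back to the broad tax value, so the tax figure is preserved at the broad level.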

Examples: Annual Integrated Collection (06/07 onwards)
- AIC core survey estimates: estimation of totals for survey variables for small and large businesses; tax data as auxiliary variables for generalised regression estimation
- AIC complementary estimates: estimation of totals for broad items for microbusinesses; tax data as substitution variables
- AIC complementary estimates: estimation of detailed state/industry classes; tax data as substitution variables
- AIC complementary estimates: estimation of detailed economic variables; tax data as substitution variables, disaggregated by model estimation of pro-rating factors

Notation
[Figure: the population U is divided into units with Y available (r_i = 1) and units with Y not available (r_i = 0).]

Use MAR model on frame only
[Figure: population U with Y available (r_i = 1) or not available (r_i = 0); frame variables X_i are known for all units; Y is the tax data of interest.]
- model: Y = f(x) for r_i = 1

Use MAR model conditional on frame variables only
[Figure: population U with Y available (r_i = 1) or not available (r_i = 0); frame variables X_i are known for all units.]
- model: Y = f(x) for r_i = 1
- impute Ŷ = f(x) for r_i = 0 (MAR)
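
A minimal sketch of this frame-only imputation, assuming f is a ratio model Y = beta * x. The ratio form, the function name and the data layout are assumptions made for illustration; the slides do not specify f.

```python
def ratio_impute(units):
    """Impute missing tax values from a frame variable under MAR.

    units : list of dicts with 'x' (frame variable) and 'y'
            (tax value, or None when r_i = 0)
    """
    reporters = [u for u in units if u["y"] is not None]  # r_i = 1
    # Ratio model fitted on the reporters only.
    beta = sum(u["y"] for u in reporters) / sum(u["x"] for u in reporters)
    # Keep reported values; impute beta * x for the non-reporters.
    return [u["y"] if u["y"] is not None else beta * u["x"] for u in units]


frame = [{"x": 10, "y": 5}, {"x": 20, "y": 10}, {"x": 30, "y": None}]
print(ratio_impute(frame))
```

The imputation is only as good as the assumption that the relationship fitted on reporters also holds for non-reporters.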

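But for non-ignorable missingness
[Figure: population U with Y available (r_i = 1) or not available (r_i = 0); frame variables X_i are known for all units.]
- model: Y = f(x) for r_i = 1
- impute Ŷ = f(x) for r_i = 0
- under non-ignorable missingness, the model fitted where r_i = 1 need not hold where r_i = 0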

Use a sample to inform about the nonreporters based on their survey response.
Notation: use Y to represent tax variables and Y* for survey variables (a surrogate of Y).
[Figure: population U with Y available (r_i = 1) or not available (r_i = 0); Y* is available for the sample s; frame variables X_i are known for all units.]

Imputing tax data from survey data
[Figure: population U with Y available (r_i = 1) or not available (r_i = 0); Y* is available for the sample s; frame variables X_i are known for all units.]
- model: Y = f(Y*), or Y = f(Y*, x_i) when frame variables are included
- impute Ŷ = f(Y*, x)

Models for Y
- Missing at Random (MAR): Y is independent of r given x and Y*
- Common Measurement Error (CME): given Y, the distribution of Y* is independent of r
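
The two assumptions can be written as conditional-distribution statements. The density notation f(. | .) is introduced here for illustration; conditioning on x in the CME line could be added in the same way.

```latex
\text{MAR:}\quad f(Y \mid x, Y^{*}, r) = f(Y \mid x, Y^{*})
\qquad\qquad
\text{CME:}\quad f(Y^{*} \mid Y, r) = f(Y^{*} \mid Y)
```

Under MAR the model for the tax value given the survey value transfers from reporters to non-reporters; under CME it is the model for the survey value given the tax value that transfers.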

Use MAR model: missing at random given X and Y*
[Figure: population U with Y available (r_i = 1) or not available (r_i = 0); Y* is available for the sample s; frame variables X_i are known for all units.]
- model: Y = f(Y*, x) for r_i = 1
- impute Ŷ for r_i = 0

Imputation using MAR model
1. Using data on Y and Y* observed from the units in the sample where both survey and tax data are reported, model Y as a function of Y*.
2. Use this model to impute Y_i for tax non-reporters in the sample (assuming Y* is known for them).
3. For units not in the sample, if their tax data are missing, impute using the distribution
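
Steps 1 and 2 can be sketched with a simple linear model Y = a + b * Y*. The linear form, the function names and the data layout are assumptions for illustration; step 3 is not covered because the distribution it refers to is not given in the transcript.

```python
def fit_line(u, v):
    """Ordinary least squares fit of v = a + b * u; returns (a, b)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    sxy = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    b = sxy / sxx
    return mean_v - b * mean_u, b


def mar_impute(sample):
    """Impute tax values Y for sampled tax non-reporters from survey values Y*.

    sample : list of dicts with 'y_star' (survey value) and
             'y' (tax value, or None when r_i = 0)
    """
    # Step 1: model Y as a function of Y* on units reporting both.
    both = [u for u in sample if u["y"] is not None]
    a, b = fit_line([u["y_star"] for u in both], [u["y"] for u in both])
    # Step 2: apply the fitted model to the sampled non-reporters.
    return [u["y"] if u["y"] is not None else a + b * u["y_star"]
            for u in sample]


sample = [{"y_star": 1, "y": 3}, {"y_star": 2, "y": 5},
          {"y_star": 3, "y": 7}, {"y_star": 4, "y": None}]
print(mar_impute(sample))
```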

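Use CME model
[Figure: population U with Y available (r_i = 1) or not available (r_i = 0); Y* is available for the sample s; frame variables X_i are known for all units.]
- model: Y* = f(Y, x) for r_i = 1 (CME)
- invert to get Ŷ = g(Y*)
- impute Ŷ = h(X) for r_i = 0, for i in U\s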

Imputation using CME model

Modelling survey data (Y*) and tax data (Y) - invert this to predict Y from Y*

Model: survey data Y* (EAS 05/06) as a function of frame variable X (tax_turn_0405) for tax nonrespondents (i.e. r = 0)

BLUP impute and EBLUP impute: Empirical Best Linear Unbiased Predictor (EBLUP) of Y_i

CME imputation process
- Use units in the sample where both tax and survey variables are observed to model the survey variable (Y*) as a function of tax and frame data (Y, X).
- Under CME this model applies to r = 0 too.
- Use units in the sample where survey data are observed (i in s) but tax data are not (r_i = 0) to model the survey variable (Y*) as a function of frame data (x).
- Combine the two models to give an impute for Y for tax nonrespondents (r = 0): the EBLUP.
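
The process above can be sketched with two straight-line fits and a plug-in inversion. This is a deliberate simplification and not the EBLUP used in the talk: it drops x from the first model, ignores the error variances, and simply solves the fitted line for Y. All names and the toy data are hypothetical.

```python
def fit_line(u, v):
    """Ordinary least squares fit of v = a + b * u; returns (a, b)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    sxy = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    b = sxy / sxx
    return mean_v - b * mean_u, b


def cme_impute(reporters, nonreporters_in_sample, x_outside_sample):
    """Plug-in CME imputes for tax non-reporters (r = 0).

    reporters              : (y, y_star) pairs for sampled units with r = 1
    nonreporters_in_sample : (x, y_star) pairs for sampled units with r = 0
    x_outside_sample       : frame values x for r = 0 units not in the sample
    """
    # Measurement model Y* = a + b * Y, fitted where both are observed;
    # under CME it is assumed to hold for r = 0 as well.
    a, b = fit_line([y for y, _ in reporters], [ys for _, ys in reporters])
    # Model Y* = c + d * x among sampled tax non-reporters.
    c, d = fit_line([x for x, _ in nonreporters_in_sample],
                    [ys for _, ys in nonreporters_in_sample])
    # Sampled non-reporters: invert the measurement model at their observed Y*.
    in_sample = [(ys - a) / b for _, ys in nonreporters_in_sample]
    # Non-sampled non-reporters: predict Y* from x, then invert.
    outside = [(c + d * x - a) / b for x in x_outside_sample]
    return in_sample, outside


in_s, out_s = cme_impute(reporters=[(10, 12), (20, 22), (30, 32)],
                         nonreporters_in_sample=[(1, 7), (2, 12), (3, 17)],
                         x_outside_sample=[4])
print(in_s, out_s)
```

The key difference from the MAR sketch is the direction of the regression: here the survey value is the response and the tax value the predictor, and the fitted relationship is inverted to recover Y.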

Further work
- domain estimation for CME/MAR
- variance estimation
- discriminating between CME and MAR based on data

Conclusion
- GREG is useful for estimation of survey data but efficiency gain is limited.
- There is increasing interest in using tax data directly on its own to produce economic statistics.
- Non-ignorable missingness becomes a key issue with tax data.
- Survey data could be useful to help impute the tax data.