Survey of Electronic Commerce and Technology: Past, Present and Future Challenges. Jason Raymond. Third International Conference on Establishment Surveys, June 2007.
Outline: Description of the survey; Methodology; Improvements to the sample design; Weighted outliers; Future challenges.
Description of the survey: Annual survey in place since 1999. Cross-economy survey, with some exceptions at the sub-industry level. Domains of interest: NAICS, size (number of employees).
Description of the survey: Two-page questionnaire with questions on: use of information and communications technologies (Internet, intranet, web site, …); use of electronic commerce for the purchase and sale of goods and services; barriers to electronic commerce. Types of questions: mostly categorical; some numerical (total sales over the Internet, percentages).
Methodology: Sampling. Universe: Statistics Canada's Business Register; list of public units. Target population: fixed thresholds of exclusion of $100,000 or $250,000 in gross business income, depending on industry; covers approximately 95% of income in each industry; around 700,000 businesses.
Methodology: Sampling. Stratification: NAICS3, NAICS4; size: 0 to 19 employees, 20 to 99 employees, 100 to 499 employees (take-some strata), 500 employees and more (take-all stratum); public/private sector.
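The size classes on this slide can be written as a small lookup. This is only an illustrative sketch: the function names `size_stratum` and `is_take_all` and the string labels are assumptions, and the industry (NAICS) and public/private dimensions of the stratification are left out.

```python
def size_stratum(employees: int) -> str:
    """Return the size class for an enterprise, using the slide's boundaries."""
    if employees < 0:
        raise ValueError("number of employees cannot be negative")
    if employees <= 19:
        return "0-19"
    if employees <= 99:
        return "20-99"
    if employees <= 499:
        return "100-499"
    return "500+"


def is_take_all(employees: int) -> bool:
    """Enterprises with 500 employees or more fall in the take-all stratum."""
    return size_stratum(employees) == "500+"
```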
Methodology: Sampling. Neyman allocation. Sample selection: sample size of around 19,000 enterprises; maximum overlap between two consecutive years using the Kish and Scott method (1971); approximately 70% overlap.
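For reference, textbook Neyman allocation assigns the sample to strata in proportion to N_h × S_h, where N_h is the stratum population size and S_h the stratum standard deviation of the allocation variable. The sketch below shows that formula only; the function name and inputs are assumptions, and it does not reproduce the survey's take-all handling, rounding, or minimum stratum sample sizes.

```python
def neyman_allocation(n, sizes, stdevs):
    """Split a total sample of n across strata in proportion to N_h * S_h.

    sizes  -- population counts N_h, one per take-some stratum
    stdevs -- standard deviations S_h of the allocation variable
    Returns unrounded allocations (floats).
    """
    products = [N * S for N, S in zip(sizes, stdevs)]
    total = sum(products)
    if total == 0:
        raise ValueError("all strata have zero size or zero variability")
    return [n * p / total for p in products]
```

For example, two strata with (N, S) of (100, 1.0) and (50, 2.0) have equal products, so a sample of 20 is split evenly between them.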
Methodology: Outlier detection. Variables: sales over the Internet; year-over-year difference for sales over the Internet. Method: variant of sigma gap; distance measure between observations.
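The slide does not describe the variant used, so the following is a sketch of the basic sigma-gap idea only: sort the values, start from the middle of the distribution, and flag everything above the first gap between consecutive values that exceeds a multiple of the standard deviation. The starting point (the median position), the multiplier `k`, and the use of the population standard deviation are all assumptions made for illustration.

```python
import statistics


def sigma_gap_outliers(values, k=1.0):
    """Return the values lying above the first large gap in the sorted data.

    A gap is 'large' when it exceeds k times the standard deviation.
    The search starts at the median position and moves upward.
    """
    ordered = sorted(values)
    if len(ordered) < 3:
        return []
    sigma = statistics.pstdev(ordered)
    if sigma == 0:
        return []
    start = len(ordered) // 2
    for i in range(start, len(ordered) - 1):
        if ordered[i + 1] - ordered[i] > k * sigma:
            return ordered[i + 1:]
    return []
```

Applied to weighted values (weight × reported value) instead of raw values, the same routine would give a weighted detection step in the spirit of the later slides.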
Methodology: Partial nonresponse (8.3%): imputation: deductive (1%), historical (0.1%), administrative (0.02%), donor (7.2%). Total nonresponse (31%): reweighting.
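The slide says only that total nonresponse is handled by reweighting. One common form, shown here as an assumption rather than as the survey's actual procedure, is a weighting-class adjustment: within a class, respondent weights are inflated so that they sum to the weight of the whole sample in that class. The function name and arguments are hypothetical.

```python
def reweight_for_nonresponse(weights, responded):
    """Inflate respondent weights within one adjustment class.

    weights   -- design weights of all sampled units in the class
    responded -- parallel list of booleans (True = respondent)
    Nonrespondents receive a weight of 0.
    """
    total = sum(weights)
    respondent_total = sum(w for w, r in zip(weights, responded) if r)
    if respondent_total == 0:
        raise ValueError("no respondents in this adjustment class")
    factor = total / respondent_total
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]
```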
Estimation using Statistics Canada's Generalized Estimation System (GES). Types of estimates: means, totals, proportions, ratios. Data quality measures based on CVs and imputation rates.
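The four estimate types can be illustrated with plain weighted sums. This sketch is not the GES interface; the function names are hypothetical, and the standard error passed to `cv_percent` is assumed to come from a separate variance estimation step that is not shown.

```python
def weighted_total(weights, values):
    """Estimated total: sum of weight times value."""
    return sum(w * y for w, y in zip(weights, values))


def weighted_mean(weights, values):
    """Estimated mean; with 0/1 values this is an estimated proportion."""
    return weighted_total(weights, values) / sum(weights)


def weighted_ratio(weights, numerators, denominators):
    """Estimated ratio of two totals."""
    return weighted_total(weights, numerators) / weighted_total(weights, denominators)


def cv_percent(estimate, standard_error):
    """Coefficient of variation, expressed as a percentage of the estimate."""
    return 100.0 * standard_error / abs(estimate)
```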
Improvements to the sample design: When? Current sample design tested in 2004 in parallel with the original design and adopted in 2005. Why? Improve the comparability of estimates over time; need for estimates by size of enterprise.
Improvements to the sample design: Target population. Original sampling design: units accounting for 95% of the total income; drawback: unstable population over time. New sampling design: fixed thresholds of exclusion of $100,000 or $250,000, depending on the industry.
Improvements to the sample design: Stratification and allocation. Original sampling design: NAICS3, NAICS4; Lavallée-Hidiroglou: 2 take-some strata and 1 take-all stratum; auxiliary variable: gross business income; drawback: not efficient for estimates by size (number of employees).
Improvements to the sample design: New sampling design. Stratification: NAICS3, NAICS4; size: 0 to 19 employees, 20 to 99 employees, 100 to 499 employees (take-some strata), 500 employees and more (take-all stratum); public/private. Neyman allocation.
Weighted outliers: A small proportion of firms sell over the Internet (8% of the private sector and 16% of the public sector). Moderate values with large weights sometimes significantly influence estimates. Previously, outlier detection was applied only to unweighted values of sales over the Internet.
Weighted outliers: Weighted outlier detection and treatment implemented in 2006. Same detection method as for unweighted values (variant of the sigma gap method). Treatment methods studied: Hidiroglou/Srinath; Winsorization; Dalén and Tambay; promotion to own stratum.
Weighted outliers: Hidiroglou/Srinath (1981). Weight reduction method. Minimizes the MSE of the estimator for the total. Requires population characteristics that are unknown and that may not be reliably estimable.
Weighted outliers: Winsorization. Reduces values larger than a certain cutoff to the cutoff itself (the cutoff depends on the outlier detection method). Modified into a weight reduction method.
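A minimal sketch of the two views of Winsorization. The first function caps the value at the cutoff. The second is one possible reading of "modified into a weight reduction method": the reported value is kept and the weight is reduced so that weight × value equals the original weight × cutoff. That reading, the floor of 1 on the reduced weight, and the function names are assumptions, not the presenter's stated formulas.

```python
def winsorize(value, cutoff):
    """Cap a value at the cutoff."""
    return min(value, cutoff)


def winsorized_weight(value, weight, cutoff):
    """Reduced weight giving the same contribution as capping the value.

    Keeps the reported value and shrinks the weight instead. The weight is
    floored at 1 so that a unit always represents at least itself; when the
    floor applies, the contribution is larger than weight * cutoff.
    """
    if value <= cutoff:
        return float(weight)
    return max(1.0, weight * cutoff / value)
```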
Weighted outliers: Dalén (1987) and Tambay (1988). Cross between Winsorization and weight reduction. The cutoff for weighted outlier detection is determined for each stratum. The outlier value is split into two parts: the portion less than the cutoff, which receives the same new weight as the non-outliers; and the portion greater than the cutoff, which is allocated a weight of 1.
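The split described on this slide can be sketched as follows. The portion of an outlier's value up to the stratum cutoff is multiplied by the weight applied to non-outliers, and the excess over the cutoff is counted once. The function names are hypothetical, and how the "new weight" for non-outliers is derived is not specified on the slide, so it is passed in as a given here.

```python
def dalen_tambay_contribution(value, weight, cutoff):
    """Contribution of one unit to the estimated total.

    value  -- reported value
    weight -- weight applied to non-outliers in the stratum
    cutoff -- stratum-level outlier cutoff
    """
    if value <= cutoff:
        return weight * value
    # Portion up to the cutoff keeps the stratum weight; the excess gets weight 1.
    return weight * cutoff + 1.0 * (value - cutoff)


def equivalent_weight(value, weight, cutoff):
    """Single weight that reproduces the split contribution for an outlier."""
    if value <= cutoff:
        return float(weight)
    return dalen_tambay_contribution(value, weight, cutoff) / value
```

With a value of 200, a weight of 10 and a cutoff of 100, the contribution is 10 × 100 + 100 = 1,100 instead of 2,000, which is less drastic than giving the unit a weight of 1 (a contribution of 200).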
Promotion to own stratum: Outliers are assigned a weight of 1. Remaining units in the stratum have their weights adjusted. Each outlier represents only itself during estimation.
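A sketch of this treatment under simple random sampling within the stratum. The slide does not give the adjustment formula; the version below assumes the usual choice, in which the k outliers represent themselves and the remaining n − k sampled units share the remaining N − k population units. The function name is hypothetical.

```python
def promote_outliers(pop_size, is_outlier):
    """Return adjusted weights for one stratum's sampled units.

    pop_size   -- number of units N in the stratum population
    is_outlier -- list of booleans, one per sampled unit
    Outliers get weight 1; the others get (N - k) / (n - k).
    """
    n = len(is_outlier)
    k = sum(is_outlier)
    if k == n:
        return [1.0] * n
    rest = (pop_size - k) / (n - k)
    return [1.0 if flagged else rest for flagged in is_outlier]
```

The adjusted weights still sum to the stratum population size, so the stratum is fully represented after the promotion.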
Weighted outliers: Implemented method: Dalén and Tambay. Fewer assumptions. Nice compromise: impact on the estimates is reduced; not as drastic as promotion to own stratum. Method performed well using 2005 data. Additional empirical studies to confirm effectiveness of the method (simulations?).
Future challenges: Response burden. Maximising overlap = increased response burden? Minimal effect on response rates. Conditioning effect? Sample rotation: ease response burden; control sample overlap for longitudinal analysis.
Future challenges: Statistics Canada's Business Register redesign. Sampling elements based on operating structure vs. statistical structure. Certain modeled variables replaced by administrative data.
For more information, please contact Jason Raymond.