1
Data processing German foreign trade statistics
ADVANCED ISSUES IN INTERNATIONAL TRADE IN GOODS STATISTICS
ESTP training course, 2–4 April 2014
German foreign trade statistics
2
Data processing
3
German foreign trade statistics
- Up to 30 million records per month
- First results 40 days after the reference period
- Highly efficient data processing is necessary
- Unevenly distributed statistical value
- Most records have a limited effect on the results
4
The ASA System
5
ASA: Data submission monitoring
6
Data submission monitoring
- Monitoring the enterprises that are most important for German foreign trade (“Top 60”)
- Checking of the variance
- Investigation of large deviations from the previous year or month
- Identification of unusual deviations for all enterprises
- Acceptance factor: (current value - mean value) / std. dev.
- Fast correction or confirmation of unusual values
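The acceptance factor above is a standardized deviation (a z-score). A minimal Python sketch, assuming each enterprise's reference values are a plain list of earlier delivered totals; the names `acceptance_factor` and `flag_unusual` and the limit of 3 are hypothetical choices, not values taken from the ASA system.

```python
import math
from statistics import mean, stdev


def acceptance_factor(current: float, history: list[float]) -> float:
    """(current value - mean value) / standard deviation of the reference values."""
    m = mean(history)
    sd = stdev(history)  # sample standard deviation; needs at least 2 values
    if sd == 0:
        # No variation in the reference values: any change counts as extreme.
        return 0.0 if current == m else math.copysign(math.inf, current - m)
    return (current - m) / sd


def flag_unusual(deliveries: dict[str, float],
                 histories: dict[str, list[float]],
                 limit: float = 3.0) -> list[tuple[str, float]]:
    """Enterprises whose |acceptance factor| exceeds `limit` (hypothetical), most unusual first."""
    flagged = []
    for enterprise, value in deliveries.items():
        af = acceptance_factor(value, histories[enterprise])
        if abs(af) > limit:
            flagged.append((enterprise, af))
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


histories = {"A": [100, 110, 90, 105, 95], "B": [50, 52, 48, 51, 49]}
print(flag_unusual({"A": 102, "B": 80}, histories))
```

Here enterprise "B" is flagged because its current value lies far outside its usually narrow range, while "A" stays within normal variation; the flagged list is what would be passed on for fast correction or confirmation.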
7
Data submission monitoring
8
Data submission monitoring
- Checking of data delivery
  - Structural checks: data file format, field format, readability, statistics
  - Delivery-specific checks: declaration attributes (form, flow, specific number, doublet)
  - Declaration-specific checks: tax number, serial errors
- Processing of serial errors in the data declarations
  - A data delivery with more than 250 errors is generally rejected
- Approval of data declarations for the main (micro-)data processing
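The delivery-level rule (reject a delivery with more than 250 errors) can be sketched as follows. Only the 250-error limit comes from the slide; the field names (`flow`, `tax_number`, `specific_number`), the flow codes, and the doublet key are hypothetical, and the structural file checks are left out.

```python
from dataclasses import dataclass, field

MAX_ERRORS = 250  # from the slide: deliveries with more than 250 errors are generally rejected
VALID_FLOWS = {"arrival", "dispatch"}  # hypothetical flow codes


@dataclass
class DeliveryResult:
    accepted: bool
    errors: list[str] = field(default_factory=list)


def check_declaration(decl: dict, seen: set) -> list[str]:
    """Hypothetical delivery- and declaration-specific checks for one declaration."""
    errors = []
    if decl.get("flow") not in VALID_FLOWS:
        errors.append("invalid flow")
    if not str(decl.get("tax_number") or "").strip():
        errors.append("missing tax number")
    # Doublet: the same (tax number, specific number, flow) declared twice.
    key = (decl.get("tax_number"), decl.get("specific_number"), decl.get("flow"))
    if key in seen:
        errors.append("doublet")
    seen.add(key)
    return errors


def check_delivery(declarations: list[dict]) -> DeliveryResult:
    """Run the checks on every declaration; reject the delivery above MAX_ERRORS."""
    seen: set = set()
    errors: list[str] = []
    for i, decl in enumerate(declarations):
        errors.extend(f"declaration {i}: {e}" for e in check_declaration(decl, seen))
    return DeliveryResult(accepted=len(errors) <= MAX_ERRORS, errors=errors)
```

In practice the limit is applied "generally", so a rejected delivery may still be reviewed; this sketch rejects strictly.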
9
The ASA system: Selective editing
10
The selective editing process
- Limited capacity for manual correction
- Important data records are corrected manually
- The vast majority of data records have a limited impact on the results
- Less important data records are corrected by automated procedures:
  - Rule-based procedures
  - Hot-Deck procedure
  - Regression-based procedure
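The division of labour described above amounts to routing erroneous records into two queues. A minimal sketch, assuming each record already carries an error flag and an importance flag; the `impact` score used to order the manual queue is a hypothetical addition reflecting the limited manual capacity.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    has_error: bool   # found by the validation checks
    important: bool   # e.g. fictional value above the CN8 threshold
    impact: float     # hypothetical impact score used to order the manual queue


def route(records: list[Record]) -> tuple[list[str], list[str]]:
    """Split erroneous records into a manual queue and an automated queue."""
    manual = [r for r in records if r.has_error and r.important]
    automated = [r.record_id for r in records if r.has_error and not r.important]
    # Scarce manual capacity goes to the records with the largest impact first.
    manual.sort(key=lambda r: r.impact, reverse=True)
    return [r.record_id for r in manual], automated
```

Error-free records are not routed at all; only the small set of important erroneous records consumes manual effort.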
11
Selective editing: Threshold values
- Prioritization by CN8-specific threshold values
  - High-quality results for all commodity codes
  - Identification of the micro data that are important for the results
- The highest potential value of a record (derived from the statistical value, supplementary unit and net mass) is compared with the threshold value of the respective CN8 code
- Threshold values are calculated from the processed, error-free micro data of the previous 12 months
12
Selective editing: Threshold values
Figure: threshold value (75%) for a CN8 code, flow arrivals; ranges below 25% and above 75% highlighted; threshold = (…)/2 = 4150
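The figure suggests the threshold is a 75% quantile, with the "(…)/2" step averaging two neighbouring values. A sketch under that assumption: an empirical quantile that averages the two neighbours whenever n·p is a whole number, computed per CN8 code and flow from the record-level statistical values of the previous 12 months. Which value the quantile is taken over, and the function names, are assumptions.

```python
import math
from collections import defaultdict
from fractions import Fraction


def threshold(values: list[float], p: Fraction = Fraction(3, 4)) -> float:
    """Empirical p-quantile; averages the two neighbours when n*p is a whole number."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("no reference values")
    pos = n * Fraction(p)  # exact arithmetic avoids float rounding at the boundary
    if pos.denominator == 1:
        k = int(pos)
        if k == 0:
            return xs[0]
        if k == n:
            return xs[-1]
        return (xs[k - 1] + xs[k]) / 2  # the "(...)/2" case
    return xs[math.ceil(pos) - 1]


def thresholds_by_cn8(records):
    """records: (cn8, flow, statistical_value) tuples of error-free micro data, previous 12 months."""
    groups = defaultdict(list)
    for cn8, flow, value in records:
        groups[(cn8, flow)].append(value)
    return {key: threshold(vals) for key, vals in groups.items()}
```

With four reference values the 75% position is exactly 3, so the threshold is the mean of the third and fourth sorted values, which matches the "(…)/2" form shown in the figure.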
13
Classification by fictional value
- The statistical value can be erroneous
- The fictional value (highest potential value) is less vulnerable to errors
- The fictional value is the maximum of:
  - the statistical value
  - the average statistical value per supplementary unit multiplied by the supplementary unit
  - the average statistical value per net mass multiplied by the net mass
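The definition above translates directly into a maximum over up to three estimates. A minimal sketch, assuming the average values per supplementary unit and per kilogram are supplied from reference data for the record's CN8 code; the function names and the strict `>` comparison with the threshold are assumptions.

```python
from typing import Optional


def fictional_value(statistical_value: float,
                    supplementary_unit: Optional[float],
                    net_mass: Optional[float],
                    avg_value_per_unit: Optional[float],
                    avg_value_per_kg: Optional[float]) -> float:
    """Highest potential value of a record: the maximum of the available estimates."""
    candidates = [statistical_value]
    if supplementary_unit is not None and avg_value_per_unit is not None:
        candidates.append(avg_value_per_unit * supplementary_unit)
    if net_mass is not None and avg_value_per_kg is not None:
        candidates.append(avg_value_per_kg * net_mass)
    return max(candidates)


def is_important(fv: float, cn8_threshold: float) -> bool:
    """Records whose fictional value exceeds the CN8 threshold are treated as important."""
    return fv > cn8_threshold
```

For example, a record keyed with a statistical value of 1,000 but with 10 supplementary units at an average of 300 per unit gets a fictional value of 3,000, so a mistyped value cannot hide a potentially large record.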
14
Selective editing: Validation checks
- The data records are compared with reference data in order to find errors and to prioritize them
- The reference data and validation rules are managed with the tool “BASE PL-Editor”
- The validation rules and the structure of the reference data are implemented in the ASA system via an XML file
- (Definite) errors and possible errors
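Because the rules reach the ASA system as an XML file, the loading side can be sketched with the standard library parser. The XML layout below (a `<rule>` element with `severity`, `cn8`, `variable`, `min`, `max` attributes) is entirely hypothetical, since the slides do not describe the actual schema; only range rules on a numeric variable are shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical rule file layout; the real ASA XML schema is not described in the slides.
RULES_XML = """
<rules>
  <rule id="UP-E1" severity="error" cn8="11111111" variable="unit_price" min="1" max="100000"/>
  <rule id="UP-P1" severity="possible_error" cn8="11111111" variable="unit_price" min="50" max="5000"/>
</rules>
""".strip()


@dataclass(frozen=True)
class RangeRule:
    rule_id: str
    severity: str  # "error" or "possible_error"
    cn8: str
    variable: str
    low: float
    high: float


def load_rules(xml_text: str) -> list[RangeRule]:
    root = ET.fromstring(xml_text)
    return [
        RangeRule(el.get("id"), el.get("severity"), el.get("cn8"),
                  el.get("variable"), float(el.get("min")), float(el.get("max")))
        for el in root.iter("rule")
    ]


def apply_rules(record: dict, rules: list[RangeRule]) -> list[tuple[str, str]]:
    """Return (rule id, severity) for every range rule the record violates."""
    hits = []
    for rule in rules:
        if rule.cn8 == record["cn8"] and rule.variable in record:
            if not rule.low <= record[rule.variable] <= rule.high:
                hits.append((rule.rule_id, rule.severity))
    return hits
```

Keeping a wide "error" range and a narrower "possible error" range for the same variable is one way to express the two severity levels in a single rule file.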
15
Selective editing: Validation checks
Errors:
- Invalid codes
- Very unusual unit price
- Invalid combinations

Possible errors:
- Unlikely partner countries etc.
- Unlikely unit price, value
- Unlikely combinations
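The two categories can be illustrated with the categorical checks: invalid codes and invalid combinations are definite errors, an unlikely partner country is only a possible error. All code lists below are hypothetical placeholders, and unit-price checks are omitted here for brevity.

```python
# Hypothetical reference data; real code lists come from the reference data tool.
VALID_CN8 = {"11111111", "22222222"}
VALID_COUNTRIES = {"FR", "NL", "NO", "US"}
INVALID_COMBINATIONS = {("22222222", "5")}      # (CN8 code, mode of transport)
UNLIKELY_PARTNERS = {"11111111": {"NO"}}        # CN8 code -> unlikely partner countries


def validate(record: dict) -> tuple[str, list[str]]:
    """Classify a record as 'error', 'possible error' or 'ok' and list the findings."""
    errors, possible = [], []
    if record["cn8"] not in VALID_CN8:
        errors.append("invalid CN8 code")
    if record["country"] not in VALID_COUNTRIES:
        errors.append("invalid country code")
    if (record["cn8"], record["transport"]) in INVALID_COMBINATIONS:
        errors.append("invalid combination of CN8 code and mode of transport")
    if record["country"] in UNLIKELY_PARTNERS.get(record["cn8"], set()):
        possible.append("unlikely partner country")
    # Definite errors take priority over possible errors.
    status = "error" if errors else ("possible error" if possible else "ok")
    return status, errors + possible
```

Returning the status together with the findings lets the subsequent prioritization treat definite errors first while keeping possible errors visible.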
16
Selective editing: Validation checks
17
Selective editing: Validation checks
19
The ASA system: Selective editing
20
Selective editing: Automated correction
Deterministic error correction
- If-then correction rules
  - Effective method, provided there is a strong correlation between variables
  - For example: CN8 code and mode of transport
- Typical errors
  - For example: numerical code instead of ISO alpha code
- Numerical variables
  - The supplementary unit and net mass are corrected using the statistical value and the average ratio
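The three kinds of deterministic correction can be sketched as plain if-then logic. The ISO 3166-1 numeric-to-alpha-2 pairs are real but only a small excerpt; the CN8 code, the transport rule, and the field names are hypothetical, and the average ratios are assumed to come from reference data for the CN8 code.

```python
# ISO 3166-1 numeric -> alpha-2 (small excerpt)
NUMERIC_TO_ALPHA2 = {"250": "FR", "528": "NL", "840": "US"}
# Hypothetical if-then rule: goods of this CN8 code always travel by mode of transport "7".
CN8_TRANSPORT_RULES = {"22222222": "7"}


def correct(record: dict, invalid: set[str],
            avg_value_per_kg: float, avg_value_per_unit: float) -> dict:
    """Apply deterministic corrections; `invalid` names the fields flagged as erroneous."""
    fixed = dict(record)
    # Typical error: numerical country code instead of the ISO alpha code.
    if fixed["country"] in NUMERIC_TO_ALPHA2:
        fixed["country"] = NUMERIC_TO_ALPHA2[fixed["country"]]
    # If-then rule based on a strong CN8 / mode-of-transport correlation.
    if fixed["cn8"] in CN8_TRANSPORT_RULES:
        fixed["transport"] = CN8_TRANSPORT_RULES[fixed["cn8"]]
    # Numerical variables: derive them from the statistical value and the average ratio.
    if "net_mass" in invalid:
        fixed["net_mass"] = fixed["statistical_value"] / avg_value_per_kg
    if "supplementary_unit" in invalid:
        fixed["supplementary_unit"] = fixed["statistical_value"] / avg_value_per_unit
    return fixed
```

The input record is copied, not modified, so the original declaration remains available for comparison.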
21
Selective editing: Automated correction
Hot-Deck error correction
- Correcting erroneous micro data by imputing values from error-free micro data (donor records)
- Only categorical variables
- Nearest-neighbor approach for donor determination
  - Calculation of the distance between the records
  - Weighting of the variables
  - In most cases a donor with the same CN8 code
  - Avoiding outliers as donors
  - Considering the impact on the donor result
22
Selective editing: Automated correction
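Hot-Deck donor determination
Figure: an erroneous record (values A, B, C for variables 1 to 3) is compared with potential donors 1 to 3 using weighted variables ($w_1 = 1$; a weight of 2 is also shown); the closest donor supplies the values for the corrected record.
Distance between records X and Y: $d_{XY} = \sum_k w_k \, \lvert x_k - y_k \rvert$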
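A nearest-neighbor hot-deck along these lines can be sketched as follows, assuming that for categorical variables $\lvert x_k - y_k \rvert$ is 0 when the categories agree and 1 otherwise, and that only the non-erroneous variables are used for matching. The `max_uses` cap is a hypothetical stand-in for "considering the impact on the donor result"; the outlier flag is assumed to be set beforehand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    rid: str
    cn8: str
    values: dict[str, str]  # categorical variables only
    outlier: bool = False


def distance(x: dict[str, str], y: dict[str, str],
             weights: dict[str, float], variables: list[str]) -> float:
    """d_XY = sum_k w_k * |x_k - y_k|, with |x_k - y_k| = 0 if equal, else 1."""
    return sum(weights[k] * (x[k] != y[k]) for k in variables)


def hot_deck(target: Rec, donors: list[Rec], weights: dict[str, float],
             erroneous: set[str], usage: dict[str, int],
             max_uses: int = 5) -> Optional[tuple[str, dict[str, str]]]:
    """Impute the erroneous variables of `target` from its nearest donor."""
    match_vars = [k for k in weights if k not in erroneous]
    # Avoid outliers as donors and limit how often one donor is reused.
    pool = [d for d in donors if not d.outlier and usage.get(d.rid, 0) < max_uses]
    # Prefer donors with the same CN8 code; fall back to the whole pool.
    pool = [d for d in pool if d.cn8 == target.cn8] or pool
    if not pool:
        return None
    best = min(pool, key=lambda d: distance(target.values, d.values, weights, match_vars))
    usage[best.rid] = usage.get(best.rid, 0) + 1
    corrected = dict(target.values)
    for k in erroneous:
        corrected[k] = best.values[k]
    return best.rid, corrected
```

Ties are resolved by taking the first donor in list order; a production procedure would likely need a more deliberate tie-break.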
23
The ASA system: Outlier detection
24
Outlier detection
- Comparison of current results with the results of the previous 12 months
- Outliers are highlighted by the acceptance factor: (current value - mean value) / std. dev.
- Detailed results at CN8 level:
  - CN8 result
  - Partner country result
  - Statistical value, net mass, supplementary unit and their ratios
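At the results level the acceptance factor is applied to each measure and ratio separately. A sketch, assuming results are keyed by (CN8 code, partner country) and that a CN8 total could be represented by a key such as (cn8, "ALL"); the key layout, measure names, and limit of 3 are hypothetical. Series with no variation are skipped rather than flagged.

```python
from statistics import mean, stdev


def measures(sv: float, nm: float, su: float) -> dict[str, float]:
    """Statistical value, net mass, supplementary unit and their ratios."""
    m = {"statistical_value": sv, "net_mass": nm, "supplementary_unit": su}
    if nm:
        m["value_per_kg"] = sv / nm
    if su:
        m["value_per_unit"] = sv / su
    return m


def detect_outliers(current: dict, history: dict, limit: float = 3.0) -> list:
    """
    current: {(cn8, partner): (sv, nm, su)} for the reference month.
    history: {(cn8, partner): [(sv, nm, su), ...]} for the previous 12 months.
    Returns (key, measure, acceptance factor) for every |AF| above `limit`.
    """
    outliers = []
    for key, cur in current.items():
        past = [measures(*h) for h in history.get(key, [])]
        for name, value in measures(*cur).items():
            series = [p[name] for p in past if name in p]
            if len(series) < 2 or stdev(series) == 0:
                continue  # not enough variation to standardize in this sketch
            af = (value - mean(series)) / stdev(series)
            if abs(af) > limit:
                outliers.append((key, name, round(af, 2)))
    return outliers
```

Checking the ratios alongside the levels helps separate a genuine rise in trade from a unit-price error, where the value jumps while net mass and supplementary unit stay unchanged.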
25
Outlier detection