Training Course on EDIT For Users
Outline of the module
  A - Introduction
  B - Using EDIT - integration with other tools
  C - Objects in EDIT for Users
  D - EDIT Graphical User Interface
  E - Future developments
A - Introduction
EDIT is a tool for data validation and data editing/imputation.
  What is data validation? An activity aimed at verifying whether the value of a data item comes from the given set of acceptable values.
  What is data editing? The activity aimed at identifying erroneous entries and correcting them if necessary. Example: the response is missing or incorrect.
How does EDIT work, in short?
  Define a format: a format contains a description of the data in a dataset; a dataset is a set of data according to a specific format.
  Define a program containing rules and file operations to be executed on the dataset(s).
  For users:
    Upload dataset(s) from external files;
    Execute the job;
    Get the report containing errors (if any).
EDIT user types
  'User' - Executes programs on datasets and accesses the reports.
  'Programmer' - Manages the metadata needed by the user to execute programs: implements 'formats'; implements 'validation rules' by means of 'programs'; defines other operations on files by means of 'programs'; sets up the unattended mode configuration.
  'Administrator' - Manages users and permissions.
'User' type functionalities
  'Change Password' - Allows users to change their password;
  'Dataset Import/Export' - Allows users to import and export data to and from EDIT, and to monitor any ongoing import/export processes;
  'Job Execution' - Allows users to execute programs on imported datasets and to view/export the results of the execution.
The 'User' Workflow: Data Import -> Job Execution -> Job Results -> Data Export
The link between 'User workflow' and 'User interface'
What can we do by means of a 'program'?
  Run programs containing mainly validation rules / computations:
    A1 - Single column: only one column is involved;
    A2 - Multiple columns: two or more columns within a single record are involved;
    B - Vertical: multiple records are involved;
    C - Hierarchical: multiple datasets are involved.
  Perform dataset operations: Copy, Merge, Alter, Aggregate, etc.
  Use specialised functions such as outlier detection: Terror, Hidiroglou-Berthelot, σ-Gap.
  Accepted formats: SDMX-ML, GESMES, CSV, FLR.
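The four rule categories can be illustrated with a short Python sketch. This is plain Python, not EDIT's own rule language; the field names are borrowed from the AES error messages shown later in this module, while the sample records, the `valid_regions` code list and the helper function names are assumptions made for illustration.

```python
# Illustrative only: plain Python, not EDIT's rule syntax.
# Sample records and the code list below are hypothetical.
records = [
    {"ID": 1, "REGION": "FR", "MAINSTAT": 20, "JOBISCO": -2},
    {"ID": 2, "REGION": "JP", "MAINSTAT": 20, "JOBISCO": 2},
    {"ID": 2, "REGION": "LT", "MAINSTAT": 10, "JOBISCO": 5},
]
valid_regions = {"FR", "LT"}  # stands in for a second dataset (a code list)


def single_column(rec):
    """A1: only one column is involved."""
    return isinstance(rec["MAINSTAT"], int)


def multiple_columns(rec):
    """A2: if MAINSTAT is in the listed values, JOBISCO should be -2."""
    if rec["MAINSTAT"] in (20, 31, 32, 33, 34, 35, 36, -1):
        return rec["JOBISCO"] == -2
    return True


def vertical_duplicates(recs, key="ID"):
    """B: a check across records; returns key values that occur more than once."""
    seen, duplicates = set(), set()
    for rec in recs:
        if rec[key] in seen:
            duplicates.add(rec[key])
        seen.add(rec[key])
    return duplicates


def hierarchical(recs, codes):
    """C: a check against another dataset; returns failing row numbers."""
    return [n for n, rec in enumerate(recs, 1) if rec["REGION"] not in codes]


a1_failures = [n for n, r in enumerate(records, 1) if not single_column(r)]
a2_failures = [n for n, r in enumerate(records, 1) if not multiple_columns(r)]
b_failures = vertical_duplicates(records)
c_failures = hierarchical(records, valid_regions)
```

In this sketch the second record fails the A2 and C checks, and the ID value 2 is reported by the B check because it appears twice.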
Accepted data formats

GESMES (BOP ITS, BOP FDI), multi-year observations for 2007, 2008, 2009
  UNA:+.? '
  UNB+UNOC:3+FR2+4D0+100929:1637+IREF000243++GESMES/TS'
  UNH+MREF000001+GESMES:2:1:E6'
  BGM+74'
  NAD+Z02+ECB'
  NAD+MR+4D0'
  NAD+MS+FR2'
  IDE+10+EUROSTAT_BOP_01 reporting'
  DSI+BOP_FDI_A'
  STS+3+7'
  DTM+242:201009291637:203'
  DTM+Z02:20072009:702'
  IDE+5+EUROSTAT_BOP_01'
  GIS+AR3'
  GIS+1:::-'
  ARR++A:FR:N:2:330:N:4A:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'
  ARR++A:FR:N:2:330:N:4F:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'
  ARR++A:FR:N:2:330:N:7Z:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'
  ARR++A:FR:N:2:330:N:A1:E:1100:9999:20072009:702:5824:A:F+5930:A:F+4204:A:F'
  ARR++A:FR:N:2:330:N:A1:E:1495:9999:20072009:702:5828:A:F+5932:A:F+4206:A:F'

CSV, with or without header (SBS, CVTS, TOURISM)
  9H; 2008; LT; 2; B-N_X_K642; 11930; 16236; ; ; ; ; UNIT; ; ; ; ; ; TT0; ; ; ; ; D08
  9H; 2008; LT; 3; B-N_X_K642; 11930; 1001; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
  9H; 2008; LT; 4; B-N_X_K642; 11930; 529; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
  9H; 2008; LT; 30; B-N_X_K642; 11930; 17766; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
  9H; 2008; LT; 2; B-E; 11930; 1138; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
  9H; 2008; LT; 3; B-E; 11930; 104; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
  9H; 2008; LT; 4; B-E; 11930; 61; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08

FLR example 1
  001E20100121814 00 804.822
  001E20100121816 93 5295.54
  001E20100121814 99 6166.24
  001E20100125290334 581.371

FLR example 2
  2010010011 010252000405595911005909580E 01ZZZZZ 2691.966 2734482.0 0.0
  2010010011 010252000405595911004009600E 01ZZZZZ 237.543 341202.0 0.0
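To make the CSV layout above concrete, the following minimal Python sketch splits two of the sample records into fields. It only assumes that fields are separated by semicolons and padded with spaces, as in the sample; the function name `parse_csv` is hypothetical, and the meaning of each column is not given in this module, so fields are addressed by position only.

```python
# Two records copied from the CSV sample above (no header line).
sample = """\
9H; 2008; LT; 2; B-N_X_K642; 11930; 16236; ; ; ; ; UNIT; ; ; ; ; ; TT0; ; ; ; ; D08
9H; 2008; LT; 3; B-N_X_K642; 11930; 1001; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
"""


def parse_csv(text, delimiter=";"):
    """Split each non-empty line into fields and strip the padding spaces."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append([field.strip() for field in line.split(delimiter)])
    return rows


rows = parse_csv(sample)
field_count = len(rows[0])  # number of fields in the first record
```

A plain split is enough for this sample because no field contains a quoted semicolon; for files with quoted fields, Python's standard `csv` module would be the safer choice.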
B - Using EDIT - integration with other tools
Ways of using EDIT As a web-based application – called by other applications; Standalone – running on a PC; Client – server – running in a Data Centre.
EDIT as Web-based application Web-based Interface Unified interface for both the standalone version and the server deployment; EUROSTAT Look & Feel; Light interface, simplified workflows. ECAS account is needed.
EDIT running standalone Downloadable package; Standalone installation supported on Windows XP and Windows 7; Simple installation wizard; Full functionality; Standard authentication is required.
Client - server mode for EDIT EDIT runs on a UNIX machine; In the current setup, EDIT is installed at Eurostat and other DGs; Contains all registered domains (= user-specific workspaces), embedded by default; ECAS credentials are needed for external users.
EDAMIS integration EDAMIS allows transmitting data files through a single entry point; EDAMIS can send data to EDIT by placing the files in a configurable location; EDIT detects metadata based on the EDAMIS naming convention; EDIT performs the processing in unattended mode.
SDMX integration Statistical Data and Metadata Exchange (SDMX) initiative is sponsored by seven institutions (the BIS, the ECB, Eurostat, the IMF, the OECD, the UN and the World Bank); SDMX describes and universalises the way to exchange statistical data and metadata; EDIT can import SDMX-ML datasets.
C - Objects in EDIT for Users Dataset instantiations, lookups; Programs, jobs
1 - Dataset instantiations Dataset Instance (Dataset) – a collection of data rows according to the structure of a format; A two-dimensional table composed of rows and columns: Columns correspond to the fields defined in the format; Records – no limit on size or number.
Dataset example – Table AES (Adult Education Survey)
The description of the table AES
Example: 'Format' – 'Dataset instantiation'
The same format – different datasets
Lookup tables – code lists Lookup – An auxiliary dataset containing a list of values to be used for validating codes; Code lists – usually lookup tables refer to code lists; One can use several code lists inside the same program – as many as needed for the given data sets – 'Country', NACE, NUTS; Several versions of the same code list can be used from within the same program, if needed.
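The idea of validating codes against a lookup, including several versions of the same code list, can be sketched in Python as follows. The code list contents, the version labels `v1`/`v2` and the function name `invalid_codes` are assumptions for illustration and do not reflect how EDIT stores lookups.

```python
# Hypothetical code lists, keyed by (name, version).
code_lists = {
    ("COUNTRY", "v1"): {"FR", "LT"},
    ("COUNTRY", "v2"): {"FR", "LT", "HR"},
}


def invalid_codes(values, list_name, version):
    """Return (row number, value) pairs whose value is not in the code list."""
    allowed = code_lists[(list_name, version)]
    return [(row, value) for row, value in enumerate(values, 1)
            if value not in allowed]


country_column = ["FR", "HR", "JP"]  # hypothetical column of a dataset
errors_v1 = invalid_codes(country_column, "COUNTRY", "v1")
errors_v2 = invalid_codes(country_column, "COUNTRY", "v2")
```

The same column yields different errors depending on the version of the code list chosen, which is why a program may need to reference a specific version.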
2 - Programs, jobs Program – a set of operations to be performed on a specified dataset definition (format); No specific dataset is associated with a program; only formats (dataset definitions) are specified; Job – the association between a 'Program' and concrete 'Dataset Instances'; Possible types of rules/checks: Single and Multiple column(s), Vertical and Hierarchical.
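The relationship between format, dataset, program and job can be modelled with a few Python classes. This is a conceptual sketch only: the class layout, the `run` method and the sample data are assumptions, not EDIT's internal design. It shows that a program is bound to a format, and that the same program can be run as separate jobs on different datasets of that format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Format:
    name: str
    fields: tuple  # field names described by the format


@dataclass
class Dataset:
    name: str
    format: Format
    rows: list  # list of dicts, one per record


@dataclass
class Program:
    name: str
    format: Format  # a program refers to a format, not to a dataset
    rules: dict     # rule name -> (message, check function)


@dataclass
class Job:
    program: Program
    dataset: Dataset

    def run(self):
        """Apply every rule to every row; return (row, rule, message) errors."""
        if self.dataset.format != self.program.format:
            raise ValueError("dataset format does not match program format")
        errors = []
        for row_number, row in enumerate(self.dataset.rows, 1):
            for rule_name, (message, check) in self.program.rules.items():
                if not check(row):
                    errors.append((row_number, rule_name, message))
        return errors


aes = Format("AES", ("MAINSTAT", "JOBISCO"))
program = Program("AES checks", aes, {
    "SC04": ("Invalid value",
             lambda r: r["MAINSTAT"] != 20 or r["JOBISCO"] == -2),
})
data_a = Dataset("AES_A", aes, [{"MAINSTAT": 20, "JOBISCO": -2},
                                {"MAINSTAT": 20, "JOBISCO": 2}])
data_b = Dataset("AES_B", aes, [{"MAINSTAT": 10, "JOBISCO": 5}])

errors_a = Job(program, data_a).run()
errors_b = Job(program, data_b).run()
```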
Validation report It contains: Job results – information about the job; Error statistics – summary of the errors; Error report – detailed list of errors.
Error statistics The error statistics are displayed as a table with the following columns: Rule Name: the name of the program rule that failed; No of Failures: the number of rows on which the rule failed during job execution; Rule Message: the rule's error message as defined in the program.
Error statistics

Rule Name | No of Failures | Rule Message
RC07 | 10 | Error : This region’s code is not valid
RC185 | 1 | Error : IntPrv does not contain the expected values
SC04 | 8 | Error : Invalid value (if MAINSTAT in(20, 31, 32, 33, 34, 35, 36, -1) then JOBISCO should be -2)
SC05 | | Error : Invalid value (if MAINSTAT in(20, 31, 32, 33, 34, 35, 36, -1) then LOCNACE should be -2)
SC32 | | Error : Invalid value (if SpkPrv04 <> 1 AND pskPrv05 <> 1 then SpkEquip should be -2)
SC33 | | Error : Invalid value (if SpkPrv04 <> 1 AND pskPrv05 <> 1 then SpkPHelp should be -2)
CC04 | | Error : Invalid value ()
Detailed error report

No | MESSAGE | SEVERITY | EXP NAME | PARTITION | AUXILIARY DATA
1 | This region code is not valid | Error | RC07 | ROW_NUMBER=1 | REGION="JP"
2 | Invalid value (if MAINSTAT in(20,31,32,33,34,35,36,-1) then JOBISCO should be -2) | Warning | SC04 | ROW_NUMBER=3 | MAINSTAT=20 JOBISCO=2
3 | Invalid value (if MAINSTAT in(20,31,32,33,34,35,36,-1) then LOCNACE should be -2) | | SC05 | | MAINSTAT=20 LOCNACE=7
4 | | | | ROW_NUMBER=4 | REGION=EG
5 | | | | | MAINSTAT=31 LOCNACE=6
6 | | | | ROW_NUMBER=6 |
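The link between the detailed error report and the error statistics is a simple count of errors per rule. The Python sketch below shows one way to derive such a summary; the dictionary keys, the sample error rows and the function name `error_statistics` are hypothetical and only loosely modelled on the report columns.

```python
from collections import Counter

# Hypothetical detailed error report: one entry per failing row and rule.
error_report = [
    {"rule": "RC07", "row": 1, "message": "This region code is not valid"},
    {"rule": "SC04", "row": 3, "message": "Invalid value (JOBISCO)"},
    {"rule": "RC07", "row": 4, "message": "This region code is not valid"},
    {"rule": "SC05", "row": 5, "message": "Invalid value (LOCNACE)"},
]


def error_statistics(report):
    """Summarise a detailed report as (rule name, failures, message) rows."""
    counts = Counter(entry["rule"] for entry in report)
    messages = {}
    for entry in report:
        messages.setdefault(entry["rule"], entry["message"])
    return [(rule, counts[rule], messages[rule]) for rule in sorted(counts)]


statistics = error_statistics(error_report)
```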
D - EDIT GRAPHICAL USER INTERFACE
EDIT - Log in
EDIT Home page Menu options User profile information The password can be changed here
Defining dataset: import dataset Go to Dataset > Import dataset Screen part I Select a file on your hard drive Select a file type (CSV / GESMES / FLR / SDMX) Reuse saved parameters Starting line Save properties for further use
Defining dataset: import dataset Screen part II Select a format Reuse saved configuration Select columns to import Use the arrows to add or remove fields Provide a name for the new dataset Save configuration for further use Click to import
Defining dataset: import dataset Unsuccessful import Click to download the importing report in text format Status is FAILED
Defining dataset: import dataset Successful import with warnings In the report, two records were skipped (lines 2 and 5) Click to download the importing report in text format Status is COMPLETED
Defining dataset: import dataset Successful import After importing, EDIT redirects you to the search dataset screen Click to look at the content imported Delete a selected dataset Status is COMPLETED
Defining dataset: import dataset Click to hide fields Select fields to be hidden in the display Hidden fields EDIT hides the selected fields
Defining dataset: import dataset Unfold the Basic filtering options Select a field in the dataset (e.g. WEIGHT) Select a logical operator Enter a value The corresponding records are filtered
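Conceptually, a basic filter combines a field, an operator and a value. The Python sketch below mimics that behaviour on an in-memory list of records; the operator symbols, the sample records and the function name `basic_filter` are assumptions and may differ from what the EDIT screen offers.

```python
import operator

# Assumed operator symbols; the list offered by the EDIT screen may differ.
OPERATORS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def basic_filter(records, field, op, value):
    """Keep the records for which 'field op value' holds."""
    compare = OPERATORS[op]
    return [rec for rec in records if compare(rec[field], value)]


records = [  # hypothetical dataset content
    {"ID": 1, "WEIGHT": 0.5},
    {"ID": 2, "WEIGHT": 1.5},
    {"ID": 3, "WEIGHT": 2.0},
]
heavy = basic_filter(records, "WEIGHT", ">", 1.0)
not_mid = basic_filter(records, "WEIGHT", "<>", 1.5)
```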
Defining dataset: import dataset Unfold the Advanced filtering options Create an expression aided by the lists of fields, operators and functions Click to apply the search criteria The corresponding records are filtered
Defining dataset: import dataset Customize your view Export in CSV format
Defining dataset: search dataset Search criteria Restore an archived dataset Export the dataset in CSV format List of already imported datasets View details of the dataset with filtering options Delete the dataset Archive the dataset
Defining dataset: Import/Export dataset Import/Export history search Search criteria View details of the dataset with filtering options List of Import/export history Delete the dataset
Defining jobs: Create a job Menu option Search criteria Click to create a job for this program List of existing programs to be executed
Defining jobs: Create a job Enter a name and a description Choose the dataset to validate (if several) Execute the job
Defining jobs: Create a job When the validation is finished the date is displayed During the validation process, only cancellation is possible Validation is RUNNING
Defining jobs: Create a job Delete the job Copy the job When the validation is finished the date is displayed Click to view the results Validation is COMPLETED
Defining jobs: Create a job VIEW RESULTS OF A JOB Click to view the Error table
Defining jobs: Create a job VIEW ERROR TABLE OF A JOB Filtering by Error fields Unfold Basic filtering Unfold Advanced filtering Error message number Export the error table (CSV)
Defining jobs: Create a job Message defined in the program Severity defined in the program Name of the rule in the program Row number in the dataset where the error occurred Click to view details of the error Variable values defined in the program
Defining jobs: Create a job DETAILED VIEW OF ERROR Select the dataset fields to display Error information Dataset record (fields selected)
Defining jobs: Create a job EXPORT ERROR REPORT OF A JOB Click to Export the error table in CSV format
Defining jobs: Create a job EXPORT ERROR REPORT OF A JOB Choose CSV or FLR format CSV parameters Error fields selected Optionally, select Ascending or Descending order for any error field Export table
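The export options described on this screen (selected error fields, optional ascending or descending order, CSV parameters) can be approximated in Python as shown below. The field names, the semicolon delimiter and the function name `export_errors` are assumptions for illustration; the actual export is performed by EDIT itself.

```python
import csv
import io


def export_errors(errors, fields, sort_by=None, descending=False, delimiter=";"):
    """Write the selected error fields as CSV text, optionally sorted."""
    rows = list(errors)
    if sort_by is not None:
        rows.sort(key=lambda entry: entry[sort_by], reverse=descending)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore",
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


errors = [  # hypothetical error table
    {"EXP_NAME": "SC04", "ROW_NUMBER": 3, "SEVERITY": "Warning"},
    {"EXP_NAME": "RC07", "ROW_NUMBER": 1, "SEVERITY": "Error"},
]
ascending_csv = export_errors(errors, ["EXP_NAME", "ROW_NUMBER"],
                              sort_by="ROW_NUMBER")
descending_csv = export_errors(errors, ["EXP_NAME", "ROW_NUMBER"],
                               sort_by="ROW_NUMBER", descending=True)
```

Fields that are not selected (here SEVERITY) are simply left out of the output.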
Defining jobs: Create a job VIEW PROGRAM DETAILS Content of the program
Defining jobs: job statistics Menu option Job statistics
Defining jobs: search job Enter the search criteria Delete the job Copy the job The corresponding jobs are displayed (all jobs if no criteria are selected) Click to view the results
E - Future developments Internationalisation – translation of the menus into other languages; Full GESMES integration (registry); SDMX 2.1 formats.
Useful links
  EDIT page: http://ec.europa.eu/eurostat/edit
  VIPv page: CIRCAbc -> Eurostat -> VIP Validation Project
  Generic data validation and editing service: mailto: ESTAT-VALIDATION@ec.europa.eu
  EDIT as web client: https://webgate.ec.europa.eu/eurostat/edit
  CIRCAbc for:
    EHSIS: https://circabc.europa.eu/w/browse/0b5ab24d-68a0-419f-a6bd-e41eb84f33fb
    BoP: https://circabc.europa.eu/w/browse/01940df9-91ec-407b-9ba4-0f5c47086e0c
    BoP: https://circabc.europa.eu/w/browse/ef8b542b-35a8-401c-9dd4-37f61e49f34d
Thank you for your attention! Questions?