6. EGR Identification Service. ESTP Course on the EGR, 3-4-5 December 2014. ESTAT
What is this presentation about?
1. General overview of EGR IS
2. Two modes of operation:
   a. Online search
   b. Batch processing
3. EGR IS challenges
4. Questions and Answers
EGR IS in the context of the EGR system
Access to EGR IS
EGR IS uses ECAS two-factor authentication. The user must:
1. Manage the ECAS account (provide a GSM number)
2. Provide the UID to the EGR Team
ECAS account management
EGR IS workflow
From the NSI perspective, the following steps have to be done:
1. Select the data to be sent out.
2. Convert the data into SDMX format (remove the " characters from the csv file).
3. Send the data via eDamis using EGR_ISRLE_N for resident LEUs or EGR_ISNORLE_N for non-resident LEUs.
4. After validation with EDIT, if errors are found, an error log is sent via e-mail; the MS has to correct the file and resubmit it.
5. Receive the result of the identification: EGROUT_ISLEU_N.
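The quote-removal and dataset-selection steps above can be sketched in Python. This is a minimal sketch, not the actual NSI tooling: the function names and the sample columns are assumptions made for illustration, and only the dataset names EGR_ISRLE_N and EGR_ISNORLE_N come from the slide.

```python
def strip_double_quotes(csv_text: str) -> str:
    """Remove every double-quote character from the CSV text.

    Assumption: no field relies on quoting to protect an embedded
    delimiter; such fields would need separate handling.
    """
    return csv_text.replace('"', "")


def dataset_for(resident: bool) -> str:
    """Return the eDamis dataset name for resident or non-resident LEUs."""
    return "EGR_ISRLE_N" if resident else "EGR_ISNORLE_N"


# Hypothetical two-column sample; the real file layout is not shown on the slide.
sample = '"LEU_NAT_ID","LEU_NAME"\n"123","ACME Ltd"\n'
cleaned = strip_double_quotes(sample)
print(cleaned)
print(dataset_for(resident=True))
```

Stripping quotes with a plain string replacement is the simplest reading of the instruction on the slide; a file whose values contain the delimiter itself would need to be checked before sending.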
EDIT Tool
The EDIT tool is used for validation of the incoming files. In the context of EGR IS there are 2 groups of validation rules.
ISRLE file, 31 rules at present, including:
  LEU_NAT_ID is not null
  LEU_NAT_ID_NIS_CODE is not null
  LEU_NAT_ID_NIS_CODE values as in the code list
  If LEU_STA_CODE='L', LEU_DATE_LIQ is not null
ISNORLE file, 6 rules at present, including:
  LEU_NAME is not null
  LEU_COUNTRY_CODE is not null
  LEU_COUNTRY_CODE values as in the code list
The error reports from EDIT are sent back to the users via e-mail in text format. In future we foresee automating this using eDamis.
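An NSI could pre-check a file against the rules listed above before sending it. The following Python sketch covers only the rules named on the slide; it is not the EDIT tool itself, and the function names, error wording and code-list contents are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set


def is_blank(value: Optional[str]) -> bool:
    """Treat None and empty or whitespace-only strings as null."""
    return value is None or value.strip() == ""


def validate_isrle(record: Dict[str, str], nis_codes: Set[str]) -> List[str]:
    """Check one ISRLE record against the four rules shown on the slide."""
    errors = []
    if is_blank(record.get("LEU_NAT_ID")):
        errors.append("LEU_NAT_ID is null")
    nis_code = record.get("LEU_NAT_ID_NIS_CODE")
    if is_blank(nis_code):
        errors.append("LEU_NAT_ID_NIS_CODE is null")
    elif nis_code not in nis_codes:
        errors.append("LEU_NAT_ID_NIS_CODE is not in the code list")
    if record.get("LEU_STA_CODE") == "L" and is_blank(record.get("LEU_DATE_LIQ")):
        errors.append("LEU_DATE_LIQ is null although LEU_STA_CODE is 'L'")
    return errors


def validate_isnorle(record: Dict[str, str], country_codes: Set[str]) -> List[str]:
    """Check one ISNORLE record against the three rules shown on the slide."""
    errors = []
    if is_blank(record.get("LEU_NAME")):
        errors.append("LEU_NAME is null")
    country = record.get("LEU_COUNTRY_CODE")
    if is_blank(country):
        errors.append("LEU_COUNTRY_CODE is null")
    elif country not in country_codes:
        errors.append("LEU_COUNTRY_CODE is not in the code list")
    return errors


# Hypothetical code lists; the real ones are maintained centrally.
print(validate_isrle({"LEU_NAT_ID": "1", "LEU_NAT_ID_NIS_CODE": "X1",
                      "LEU_STA_CODE": "L"}, {"X1"}))
print(validate_isnorle({"LEU_NAME": "ACME", "LEU_COUNTRY_CODE": "ZZ"}, {"US", "CH"}))
```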
EDIT error report
Examples of error reports from EDIT:
1. A file was sent with characters that are not allowed
2. A file was sent with country codes that are not allowed
Online search
Batch processing I
Start Identification
For resident legal units, the EGR Team will identify them and send the results to the requestor. The result of identifying resident LEUs is always 1:1.
Batch processing II
These fields can be used for filtering
Magnifying glass: to see details on the search and the proposed result
Automatic select as best match for string search
Resubmit to the next available source all those that are still open for identification
Close the searching process
Batch processing III 1. This functionality gives the opportunity to insert the legal unit that we are looking for into the DB. In short: this is not to be used by the NSIs for the time being.
Searching algorithm
Similarity between the search data and a matched record is computed only when a full text search is performed; it is not needed when the result is found by a unique identifier. Similarity in the EGR IS application is based on the Levenshtein distance algorithm. For each of the provided criteria, the distance between the criterion value and the matched record's field is computed. The weighted average of all partial similarities is the final similarity, which is shown to the user on the search results panel.
Full text search criteria have the following weights:
  Country code: 1 (the country code must be matched exactly for the record to appear in the result list, so there is no need to increase its weight above 1)
  Name: 4
  Address: 1
  Postal code: 1
  City: 1
The similarity computation is also used when the best matching result has to be chosen for a subsequent update operation.
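The weighted similarity described above can be illustrated with a short Python sketch. The weights are those given on the slide. The slide does not say how a Levenshtein distance is turned into a partial similarity, so the normalisation used here (1 minus distance divided by the longer string's length), the upper-casing of values, and all function and field names are assumptions.

```python
from typing import Dict

# Weights from the slide; the dictionary keys are hypothetical field names.
WEIGHTS = {"country_code": 1, "name": 4, "address": 1, "postal_code": 1, "city": 1}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def field_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for fully different."""
    a, b = a.strip().upper(), b.strip().upper()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def overall_similarity(criteria: Dict[str, str], record: Dict[str, str]) -> float:
    """Weighted average of partial similarities over the provided criteria only."""
    total, weight_sum = 0.0, 0
    for field, value in criteria.items():
        weight = WEIGHTS[field]
        total += weight * field_similarity(value, record.get(field, ""))
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


search = {"country_code": "DE", "name": "ACME"}
candidate = {"country_code": "DE", "name": "ACMA", "city": "Berlin"}
print(overall_similarity(search, candidate))
```

Because the name carries weight 4 out of a possible total of 8, a one-character difference in a short name moves the final figure far more than the same difference in the city or the address.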
LEID search criteria
EGR IS uses a configuration of search criteria stored centrally in the database, in EGR2ISLEID.LEID_SEARCH_CRITERIA, to determine how LEID records are searched. This is not configurable by the end user. Each record is specified by four parameters:
  Request descriptor: determines the set of criteria used to search in the LEID database
  Criteria descriptor: description of the field or set of fields to be searched
  Priority: the priority of the criteria in the request descriptor
  Full text: specifies whether the algorithm should use the full text search mechanism for this criteria
LEID search criteria
Specification of request descriptor:
  NSA_EUROPEAN_RESIDENT
  NSA_NON_RESIDENT 1
  NON_NSA_EUROPEAN 2
  NON_NSA_NON_EUROPEAN 3
Specification of field or set of fields to be searched:
  COUNTRY_CODE_NATIONAL_ID_REGISTER
  DUNS_ID 1
  BVD_ID 2
  ADDRESS_NAME_CITY_POSTAL_COUNTRY 3
  CDP_ID 4
  COUNTRY_CODE_NATIONAL_ID 5
  COUNTRY_CODE_NAME 6
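One way to picture a LEID_SEARCH_CRITERIA record and its use is the Python sketch below. The four attributes mirror the parameters named on the slides. Everything else is an assumption for illustration: the priority values and full text flags in the sample rows, the rule that a lower number is tried first, the rule that the search stops at the first criterion returning a hit, and the `lookup` callback standing in for the database query.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SearchCriterion:
    request_descriptor: str   # which set of criteria this row belongs to
    criteria_descriptor: str  # field or set of fields to be searched
    priority: int             # assumed: lower value is tried first
    full_text: bool           # whether full text search is used


def ordered_criteria(rows: List[SearchCriterion],
                     request_descriptor: str) -> List[SearchCriterion]:
    """Select the rows of one request descriptor, sorted by priority."""
    selected = [r for r in rows if r.request_descriptor == request_descriptor]
    return sorted(selected, key=lambda r: r.priority)


def first_match(rows: List[SearchCriterion], request_descriptor: str,
                lookup: Callable[[SearchCriterion], List[str]]
                ) -> Optional[Tuple[SearchCriterion, List[str]]]:
    """Try criteria in priority order; return the first one that yields hits."""
    for criterion in ordered_criteria(rows, request_descriptor):
        hits = lookup(criterion)
        if hits:
            return criterion, hits
    return None


# Illustrative rows only; the real table content is not shown on the slides.
ROWS = [
    SearchCriterion("NON_NSA_EUROPEAN", "ADDRESS_NAME_CITY_POSTAL_COUNTRY", 3, True),
    SearchCriterion("NON_NSA_EUROPEAN", "DUNS_ID", 1, False),
    SearchCriterion("NON_NSA_EUROPEAN", "BVD_ID", 2, False),
]


def fake_lookup(criterion: SearchCriterion) -> List[str]:
    """Stand-in for the database query: only the BVD_ID search finds a record."""
    return ["LEID-EXAMPLE"] if criterion.criteria_descriptor == "BVD_ID" else []


print(first_match(ROWS, "NON_NSA_EUROPEAN", fake_lookup))
```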
DB settings
MAXIMUM_NUMBER_OF_RESULTS: integer parameter which specifies the maximum number of legal units that can be displayed as the result of a single search.
PERCENTAGE_THRESHOLD: integer parameter which specifies the minimum similarity that the full text search algorithm has to achieve for a legal unit to be considered a valid search result.
PERCENTAGE_THRESHOLD_IN_MATCHING: integer parameter which specifies the minimum similarity that a LEID record matched by the full text search algorithm has to achieve to be used for update.
BATCH_AUTO_BEST_MATCH_SIMILARITY_THRESHOLD: minimum similarity value for the auto best match feature. Results above this value will be best matched automatically.
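How these settings might interact on a list of candidates is sketched below in Python. The parameter names come from the slide; the numeric values are invented for the example, as are the function names, the (LEID, similarity percentage) tuple layout, and the choice to keep the most similar candidates when the result list is truncated.

```python
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[str, int]  # (hypothetical LEID, similarity in percent)

# Hypothetical values; the real ones are stored in the EGR IS database.
SETTINGS: Dict[str, int] = {
    "MAXIMUM_NUMBER_OF_RESULTS": 2,
    "PERCENTAGE_THRESHOLD": 60,
    "BATCH_AUTO_BEST_MATCH_SIMILARITY_THRESHOLD": 95,
}


def filter_results(candidates: List[Candidate],
                   settings: Dict[str, int] = SETTINGS) -> List[Candidate]:
    """Keep candidates reaching PERCENTAGE_THRESHOLD, most similar first,
    truncated to MAXIMUM_NUMBER_OF_RESULTS."""
    kept = [c for c in candidates if c[1] >= settings["PERCENTAGE_THRESHOLD"]]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[: settings["MAXIMUM_NUMBER_OF_RESULTS"]]


def auto_best_match(results: List[Candidate],
                    settings: Dict[str, int] = SETTINGS) -> Optional[Candidate]:
    """Return the top result if it lies above the auto best match threshold."""
    if not results:
        return None
    best = max(results, key=lambda c: c[1])
    threshold = settings["BATCH_AUTO_BEST_MATCH_SIMILARITY_THRESHOLD"]
    return best if best[1] > threshold else None


candidates = [("LEID-B", 70), ("LEID-A", 97), ("LEID-C", 40), ("LEID-D", 65)]
results = filter_results(candidates)
print(results)
print(auto_best_match(results))
```

The sketch reads "minimum similarity" as an inclusive bound and "above this value" as a strict one; the slide does not settle either point.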
EGR IS figures
Number of EU Legal Units provided by the NSAs
Total: 15 976 275
EGR IS figures Number of EU Legal Units including CDP Total: 16 010 398
EGR IS figures
Total number of LEUs in EGR IS = 16 338 811
Partial Authentic Store: Austria, Cyprus, Germany, Greece, Ireland, Iceland, Liechtenstein, Luxembourg, Malta, Netherlands
EGR IS challenges
Improvement of the search algorithm (quality and performance)
Integration of the overall process and its automation in the context of the ESTAT infrastructure
Integration of EGR IS in the overall process of the EGR system
Future maintenance of the system and introduction of change requests from the MS
Standardization of the data exchange (Legal Forms)
Duplicates based on name
Volume of data
Questions & Answers
Address of test application: http://s-clonaz.eurostat.cec:7003/egr-is
Thank you