© Federal Statistical Office Germany, IV A2: Application of Regular Expressions in the German Business Register

Presentation transcript:

Slide 1: Application of Regular Expressions in the German Business Register. Session 5: Projects on Improvements for Business Registers. Wiesbaden Group on Business Registers, Paris, November 26th 2007. Patrizia Moedinger, Federal Statistical Office Germany, IV A2

Slide 2 (Example 1): Improving legal form coding by using regular expressions

Slide 3: Background
- information on legal forms comes mainly from VAT records
- not all administrative sources provide information on legal forms
- sources use different, incompatible legal form codings or different aggregation levels
- other purposes, such as the coding of institutional sectors, have special requirements

Slide 4: Background
- enterprises (legal units) with certain legal forms are legally obliged to carry their legal form in the enterprise name:
  - incorporated firms
  - non-incorporated firms
  - cooperatives
  - merchants registered in the German Commercial Register
→ enterprise names can be used for legal form coding

Slide 5: Definition of search patterns
- patterns from the nomenclature, abbreviations and notations used by the tax authorities: GmbH, AG & Co.KG, Limited, Ltd.
- patterns from real BR data: spelling mistakes, missing blanks, ...
→ construction of the regular expression
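The pattern construction described on slide 5 can be sketched in Python as follows. This is a minimal illustration, not the Federal Statistical Office's actual pattern set: the pattern table, the output codes and the sample names are hypothetical, and only the forms cited on the slide plus plain "GmbH" and "AG" are covered. It shows how `\s*` and `\.?` absorb the missing blanks and dots found in real BR data, and why compound forms must be tested before their components.

```python
import re
from typing import Optional

# Hypothetical pattern table for illustration only.
# Compound forms come first so "GmbH & Co. KG" is not coded as plain "GmbH".
LEGAL_FORM_PATTERNS = [
    # "\s*" tolerates missing or extra blanks, "\.?" tolerates missing dots
    ("GmbH & Co. KG", r"GmbH\s*&\s*Co\.?\s*KG\b"),
    ("AG & Co. KG", r"\bAG\s*&\s*Co\.?\s*KG\b"),
    # No leading \b: also catches names written without a blank, e.g. "MuellerGmbH"
    ("GmbH", r"G\.?\s*m\.?\s*b\.?\s*H\b"),
    ("AG", r"\bAG\b"),
    ("Ltd.", r"\b(?:Limited|Ltd)\b"),
]
COMPILED = [(code, re.compile(pattern)) for code, pattern in LEGAL_FORM_PATTERNS]


def code_legal_form(enterprise_name: str) -> Optional[str]:
    """Return the code of the first pattern found in the name, else None."""
    for code, pattern in COMPILED:
        if pattern.search(enterprise_name):
            return code
    return None


for name in ["Mueller Bau GmbH", "Schmidt GmbH&Co.KG", "Meier AG & Co.KG",
             "Example Trading Ltd.", "MuellerGmbH", "Baeckerei Hans Weber"]:
    print(f"{name!r:28} -> {code_legal_form(name)}")
```

Matching is case-sensitive on purpose in this sketch, so that a short code such as "AG" is not found in ordinary words; a real pattern set would need to balance this against lower-case spellings seen in the data.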

Slide 6: Evaluation of search patterns
- completeness of coding: because of the legal obligation, a high share of legal forms should be found in enterprise names
- degree of reliance: evaluation of the coding results
  - drawing a sample after legal form coding
  - classification of the coding results
  - calculation of sensitivity, specificity, positive predictive value and negative predictive value

Slide 7: Completeness of coding

Slide 8: Evaluation of Type I and II errors (N = 4,000)

                                       Enterprise name contains
Regular expression detects             legal form     no or wrong legal form
legal form                                  1,009                          4
no or wrong legal form                         26                      2,961

PPV (positive predictive value) = 1,009 / (1,009 + 4) = 99.6 %
NPV (negative predictive value) = 2,961 / (2,961 + 26) = 99.1 %
Sensitivity = 1,009 / (1,009 + 26) = 97.5 %
Specificity = 2,961 / (4 + 2,961) = 99.9 %
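The four measures on slide 8 follow directly from the 2 × 2 table of the evaluation sample. A small Python sketch makes the formulas explicit; the class and method names are illustrative, while the cell counts are those of the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # regex detects legal form, name contains that legal form
    fp: int  # regex detects legal form, name contains no or a wrong legal form
    fn: int  # regex detects no or wrong legal form, name contains a legal form
    tn: int  # regex detects no or wrong legal form, name contains none either

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def ppv(self) -> float:          # positive predictive value
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:          # negative predictive value
        return self.tn / (self.tn + self.fn)

    def sensitivity(self) -> float:  # share of actual legal forms that are found
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:  # share of names without legal form left uncoded
        return self.tn / (self.tn + self.fp)


# Cell counts of the evaluation sample on slide 8
sample = ConfusionMatrix(tp=1009, fp=4, fn=26, tn=2961)
for label, value in [("PPV", sample.ppv()), ("NPV", sample.npv()),
                     ("Sensitivity", sample.sensitivity()),
                     ("Specificity", sample.specificity())]:
    print(f"{label:12} {value:.1%}")
```

Run as is, it prints 99.6%, 99.1%, 97.5% and 99.9%; specificity is 2,961 / 2,965 ≈ 0.99865, which rounds to 99.9 % at one decimal place.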

Slide 9 (Example 2): Data pre-processing as a preliminary step for record linkage

Slide 10: Background
- no common unique identifiers are available
- data from different sources are initially linked by names and addresses
- address standards differ or are missing
- notations differ: "BMW" or "Bayerische Motorenwerke" or "Bay. Motorenwerke"
- the German BR is technically limited in storing several addresses (only dispatch and domicile addresses)

Slide 11: Problem of non-standardized notations
- matching by administrative identifiers:
  - dependent variable: match, i.e. same administrative identifier and no change in the postal code
  - independent variable: differences between enterprise names, street names and town names (Levenshtein edit distance)
- comparison within the same (administrative) source
- comparison between different sources (administrative source and BR)
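The Levenshtein edit distance used as the independent variable on slide 11 counts the minimum number of single-character insertions, deletions and substitutions that turn one string into another. A standard two-row dynamic-programming sketch in Python is shown below; the `similarity` helper, which scales the distance by the longer string's length, is an added illustration of one common normalization, not something the slides specify.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, 0.0 for maximal distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("Hauptstrasse 1", "Hauptstr. 1"))
```

The street-name pair differs by four edits (one substitution and three deletions), even though both notations denote the same address.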

Slide 12: Matching probability against string similarity within an administrative source (Employment Agency) (model: logistic regression)

Slide 13: Matching probability against string similarity between an administrative source (Employment Agency) and the BR (model: logistic regression)
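Slides 12 and 13 model the probability of a match as a logistic function of string distance. The sketch below fits such a model with a single predictor by plain batch gradient descent on the log-loss. Several assumptions apply: the slides use distances for enterprise names, street names and town names, whereas this sketch uses one distance only; the learning rate and epoch count are arbitrary; and the data points are invented purely to make the example runnable. They are not results from the Employment Agency or the BR.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit P(match) = sigmoid(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # d(log-loss)/d(linear term)
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


# Invented toy data for illustration only: Levenshtein distance between the
# enterprise names of two records, and whether the pair is a match (1) in the
# sense of slide 11 (same administrative identifier, unchanged postal code).
distances = [0, 0, 1, 1, 2, 3, 4, 5, 6, 8, 10, 12]
matched = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

b0, b1 = fit_logistic(distances, matched)
for d in (0, 3, 6, 12):
    print(f"distance {d:2}: P(match) = {sigmoid(b0 + b1 * d):.2f}")
```

A negative slope `b1` expresses the relationship shown on the slides: the larger the difference between two strings, the lower the estimated matching probability. In practice a statistics package would be used instead of hand-written gradient descent.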

Slide 14: Pre-processing of administrative data for record linkage
- high level of similarity between two strings → identical units
- high level of disparity between two strings → different units
[Figure: differences in name or address, ranging from low (identical unit) to high (different unit)]
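The principle on slide 14, that high similarity indicates the same unit and high disparity indicates different units, amounts to a cut-off rule on a similarity score. In the minimal sketch below, Python's standard `difflib.SequenceMatcher.ratio()` is swapped in for the Levenshtein distance for brevity, and the threshold of 0.85 is a hypothetical value, not one taken from the presentation.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off; in practice it would be chosen from evaluated samples.
THRESHOLD = 0.85


def classify_pair(a: str, b: str, threshold: float = THRESHOLD) -> str:
    """High similarity -> same unit, high disparity -> different units."""
    ratio = SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
    return "identical unit" if ratio >= threshold else "different unit"


print(classify_pair("Bayerische Motorenwerke", "bayerische motorenwerke"))
print(classify_pair("BMW AG", "BMW AG Branch Munich Mr Mueller"))
```

The second pair refers to the same enterprise but scores only about 0.32, because the branch and contact-person elements inflate the difference. This is the kind of misclassification that pre-processing the names into separate variables is meant to reduce.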

Slide 15: Pre-processing of administrative data for record linkage
- conversion into specific variables for string matching, e.g. "BMW AG Branch Munich Mr Mueller":
  - enterprise name: BMW
  - legal form: AG
  - other elements: Branch Munich Mr Mueller
  - enterprise address
- simplify comparison strings
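The conversion on slide 15 can be sketched with two regular expressions: one that cuts off "other elements" such as branch or contact-person information, and one that extracts the legal form; the remaining name is then simplified for comparison. The marker words, the legal-form list and the abbreviation table are all hypothetical examples; as the synopsis notes, a real pattern set depends on the data source and its later use.

```python
import re
from typing import NamedTuple

# Hypothetical pattern lists for illustration only.
LEGAL_FORM = re.compile(r"(?<!\S)(GmbH\s*&\s*Co\.?\s*KG|GmbH|AG|KG|OHG|Ltd\.?)(?!\S)")
# Everything from the first marker word onwards is treated as "other elements".
OTHER_ELEMENTS = re.compile(r"\b(?:Branch|Zweigniederlassung|Mr|Mrs|Herr|Frau|c/o)\b.*$")
ABBREVIATIONS = {"Bay.": "Bayerische"}  # hypothetical notation table


class SplitName(NamedTuple):
    enterprise_name: str
    legal_form: str
    other_elements: str


def simplify(text: str) -> str:
    """Expand known abbreviations, upper-case, drop punctuation, collapse blanks."""
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    text = re.sub(r"[^\w\s]", "", " ".join(tokens).upper())
    return " ".join(text.split())


def split_enterprise_name(raw: str) -> SplitName:
    text = " ".join(raw.split())  # normalize whitespace first
    other = ""
    match = OTHER_ELEMENTS.search(text)
    if match:
        other, text = match.group(0), text[:match.start()].strip()
    legal_form = ""
    match = LEGAL_FORM.search(text)
    if match:
        legal_form = match.group(1)
        text = (text[:match.start()] + " " + text[match.end():]).strip()
    return SplitName(simplify(text), legal_form, other)


print(split_enterprise_name("BMW AG Branch Munich Mr Mueller"))
print(split_enterprise_name("Bay. Motorenwerke AG"))
```

After splitting, only the simplified `enterprise_name` values need to be compared with the string distance, so contact persons, branch labels and legal-form spellings no longer count as differences.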

Slide 16: Methods for evaluation
- evaluate the link between string similarity and match before and after pre-processing the data
- evaluation of matching results:
  - drawing a sample after the matching process
  - classification of the matching results
  - calculation of sensitivity, specificity, positive predictive value and negative predictive value
  - controlling for effects caused by the matching program used

Slide 17: Synopsis
- BR text data needs special treatment in data processing
- applications for regular expressions:
  - simple application: legal form coding (limited set of search patterns)
  - more complex application: pre-processing (the set of patterns depends on the data source and its later use)
- the application of regular expressions should always be evaluated

Slide 18: Thank you for your attention.