provision of statistics in Germany


1 provision of statistics in Germany
Tobias Link, KOSIS-Gemeinschaft Urban Audit
Progress Meeting on sub-national statistics, Luxembourg, November 7, 2018

2 Overview
- Finding the data
- Getting the data into shape
- Processing the data
- Data validation and transmission

3 Finding the data
Data Sources
- Regionaldatenbank Deutschland
- DESTATIS (NSI)
- Statistische Landesämter (statistical offices of the Länder)
- Estimates on the basis of Mikrozensus (microcensus) data
- Estimates on the basis of data by the Federal Employment Agency
- Urban Audit Cities
- Federal Office for Motor Traffic
- Police Crime Statistics (BKA, LKAs)
- Eurostat
- Filmförderungsanstalt (public film funding)
- German statistics on libraries (DBS)
- Institute for Museum Research
- Deutscher Bühnenverein
- German Real Estate Association (IVD)
- Association of German Transportation Companies e. V. (VDV)
- taxi-rechner.de
- OpenStreetMap
[Pie chart: shares of data sources, labelled 38%, 40%, 9%, 13%]

4 Getting the data into shape
Data comes in different formats
- Excel files (good)
- Web database (good)
- PDF files (it depends …)
- Print publications (lots of work)

5 Getting the data into shape
Data in Excel files
Getting the data you need out of an Excel file is usually easy, provided there is a useful key column such as the "Amtlicher Gemeindeschlüssel" (AGS, the official municipality key).
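A minimal sketch of such a key-based lookup, assuming the sheet has been exported to CSV and carries the full 8-digit AGS in a column named AGS. The function names, the column names and the sample values are illustrative and not taken from the presentation. Because spreadsheets often drop the leading zero of an AGS, the key is normalized before matching.

```python
import csv
import io


def normalize_ags(raw):
    """Return the AGS as an 8-character string, restoring leading zeros."""
    text = str(raw).strip()
    # A numeric spreadsheet cell may arrive as a float such as 1001000.0
    if text.endswith(".0"):
        text = text[:-2]
    if not text.isdigit() or len(text) > 8:
        raise ValueError(f"not a valid AGS: {raw!r}")
    return text.zfill(8)


def pick_rows(csv_text, wanted_keys, key_column="AGS"):
    """Return {AGS: row} for the rows whose key is in wanted_keys."""
    wanted = {normalize_ags(key) for key in wanted_keys}
    picked = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ags = normalize_ags(row[key_column])
        if ags in wanted:
            picked[ags] = row
    return picked


# Made-up sample export: the leading zeros of the first two keys were lost.
SAMPLE = (
    "AGS,name,value\n"
    "1001000,Flensburg,10\n"
    "9162000,München,20\n"
    "11000000,Berlin,30\n"
)

rows = pick_rows(SAMPLE, ["01001000", "09162000"])
```

Keeping the key as a string throughout avoids the same leading-zero loss on the way back out.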

6 Getting the data into shape
Data in Web Databases
Data in web databases can be extracted either through a web interface or by querying the database directly. In Germany, all GENESIS-Online-based data portals (e.g. Regionaldatenbank Deutschland) support direct queries/deep links via the URL bar of your web browser.

7 Getting the data into shape
Data in Web Databases: Regionaldatenbank Deutschland

8 Getting the data into shape
Data in Web Databases: Urban Audit Cities Survey

9 Getting the data into shape
Data in PDF files
Data in PDF files can be more challenging to extract. If the PDF contains data tables in text format, there are several ways to extract the data:
- In R, using packages like pdftools or tm (text mining)
- In Python, using similar libraries such as PyPDF2 or pdfminer
- Apps like Tabula
- Command-line tools like poppler/pdftotext
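As one way to script the command-line route, the sketch below wraps poppler's pdftotext from Python. It assumes pdftotext is installed and on the PATH; the helper names and the file name are illustrative. The -layout option keeps the physical column layout, which usually helps with tables, and "-" sends the text to standard output.

```python
import subprocess


def build_pdftotext_cmd(pdf_path, first_page=None, last_page=None):
    """Build the pdftotext argument list for a PDF and an optional page range."""
    cmd = ["pdftotext", "-layout"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    # "-" as the output file makes pdftotext write to standard output
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, first_page=None, last_page=None):
    """Run pdftotext and return the extracted text (requires poppler)."""
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, first_page, last_page),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Command for pages 3 to 4 of a hypothetical file named report.pdf
example_cmd = build_pdftotext_cmd("report.pdf", first_page=3, last_page=4)
```

Restricting the page range to the pages that hold the table keeps the amount of text to clean up afterwards small.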

10 Getting the data into shape
Data in PDF files
Extracting tabular data from PDF files rarely works the way you want on the first try. Normally you need to get rid of artifacts such as unneeded text blocks, stray spacing and wrong separators. To accomplish this, you can use regular-expression-based replace operations in R, Python or a text editor.
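A minimal sketch of this kind of cleanup in Python, assuming layout-preserving text in which columns are separated by at least two spaces and numbers use German formatting (space or dot as thousands separator, decimal comma). The sample text and its figures are made up for illustration.

```python
import re

# Made-up extraction result with a title line, a page footer and a blank line
RAW = """Tabelle 3: Einwohner

Flensburg        89 504     1,5
München       1 456 039    12,3
Seite 4
"""


def clean_table_text(raw, expected_columns=3):
    """Turn raw extracted text into a list of rows with normalized numbers."""
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        # Drop blank lines and page footers such as "Seite 4" or "Page 4"
        if not line or re.fullmatch(r"(Seite|Page)\s+\d+", line):
            continue
        # Remove thousands separators: one space or dot before a 3-digit group
        line = re.sub(r"(?<=\d)[ .](?=\d{3}(?!\d))", "", line)
        # Turn the decimal comma into a decimal point
        line = re.sub(r"(?<=\d),(?=\d)", ".", line)
        # Two or more spaces mark a column boundary
        cells = re.split(r"\s{2,}", line)
        # Keep only lines that look like data rows; titles have fewer cells
        if len(cells) == expected_columns:
            rows.append(cells)
    return rows


table = clean_table_text(RAW)
```

The thousands-separator rule is ambiguous when two integer columns are separated by a single space, so the result is worth checking against the PDF for a few rows.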

11 Getting the data into shape
Data in print publications
- Lots of manual, tedious work
- Hard to automate (OCR scanning?)
- Motivates looking for alternative data sources

12 Processing the data Having a standardized way to process the data
The German Urban Audit data catalogue is organized as one Excel file per variable and grant period (2 years). Each year with existing data is structured in 4 tabs:
- Tab 1: Complete source data table with a reference to the source file and, if needed, derived tables with calculated values
- Tab 2: Table with a fixed structure containing only the data needed for calculating the city and FUA values (extracted from the first tab via VLOOKUP)
- Tab 3: Table containing the city values from the 2nd tab (pivot table)
- Tab 4: Table containing the aggregated/calculated FUA values from the 2nd tab (pivot table)
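The four-tab flow can be sketched outside Excel as follows. This is an illustration only, not the catalogue's actual workbook logic: the codes and numbers are placeholders, each city is assumed to be a single municipality, and FUA values are assumed to be plain sums over the member municipalities, which will not hold for every variable.

```python
from collections import defaultdict

# Tab 1 stand-in: full source table, AGS -> value (all numbers are made up)
SOURCE = {"00000001": 120, "00000002": 30, "00000003": 50}

# Fixed structure of tab 2: one row per municipality with its city and FUA
# code. The codes are placeholders, not real Urban Audit codes.
STRUCTURE = [
    {"ags": "00000001", "city": "CITY_A", "fua": "FUA_A"},
    {"ags": "00000002", "city": None, "fua": "FUA_A"},
    {"ags": "00000003", "city": "CITY_B", "fua": "FUA_B"},
    {"ags": "00000004", "city": None, "fua": "FUA_B"},
]


def extract(source, structure):
    """Tab 2: look up each municipality's value (None if the source lacks it)."""
    return [dict(row, value=source.get(row["ags"])) for row in structure]


def city_values(rows):
    """Tab 3: values of the rows that are cities themselves."""
    return {row["city"]: row["value"] for row in rows if row["city"] is not None}


def fua_values(rows):
    """Tab 4: sum per FUA; None if any member municipality has no value."""
    members = defaultdict(list)
    for row in rows:
        members[row["fua"]].append(row["value"])
    return {
        fua: (None if None in values else sum(values))
        for fua, values in members.items()
    }


extracted = extract(SOURCE, STRUCTURE)
cities = city_values(extracted)
fuas = fua_values(extracted)
```

Returning None for an FUA with a missing member, rather than a partial sum, avoids reporting a silently understated aggregate.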

13 Processing the data Having a standardized way to process the data: Source Data

14 Processing the data Having a standardized way to process the data: Extracted Data

15 Processing the data Having a standardized way to process the data: Cities Pivot Table

16 Processing the data Having a standardized way to process the data: FUAs Pivot Table

17 Data validation and transmission
Data validation using the EDIT Tool

18 Data validation and transmission
Data validation using the EDIT Tool: Checking the error log file

19 Data validation and transmission
Transmitting the data using EDAMIS: CSV File Structure URBANREG_EC_A3_DE_2016_V1.csv

20 Data validation and transmission
Transmitting the data using EDAMIS

21 Thank you for your attention!

