1
provision of statistics in Germany
Tobias Link, KOSIS-Gemeinschaft Urban Audit
Progress Meeting on sub-national statistics
Luxembourg, November 7, 2018
2
Overview
- Finding the data
- Getting the data into shape
- Processing the data
- Data validation and transmission
3
Finding the data
Data Sources
[Chart: segments of 38%, 40%, 9% and 13%; segment labels not recoverable]
- Regionaldatenbank Deutschland
- DESTATIS (NSI)
- Statistische Landesämter (state statistical offices)
- Estimates on the basis of Mikrozensus data
- Estimates on the basis of data from the Federal Employment Agency
- Urban Audit Cities
- Federal Office for Motor Traffic
- Police Crime Statistics (BKA, LKAs)
- Eurostat
- Filmförderungsanstalt (public film funding)
- German statistics on libraries (DBS)
- Institute for Museum Research
- Deutscher Bühnenverein
- German Real Estate Association (IVD)
- Association of German Transportation Companies e. V. (VDV)
- taxi-rechner.de
- OpenStreetMap
4
Getting the data into shape
Data comes in different formats:
- Excel files (good)
- Web databases (good)
- PDF files (it depends …)
- Print publications (lots of work)
5
Getting the data into shape
Data in Excel files
Getting the data you need out of an Excel file is usually easy, provided there is a useful key column such as the Amtlicher Gemeindeschlüssel (AGS, official municipality key).
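A minimal Python sketch of a key-column lookup, assuming the sheet has already been read into rows (for example with pandas' `read_excel(..., dtype={"AGS": str})` followed by `DataFrame.to_dict("records")`) and that the key column is named `AGS` (a hypothetical name). The main pitfall it handles is that the AGS has eight digits and loses its leading zero when a spreadsheet stores it as a number.

```python
from typing import Dict, Iterable, Mapping


def normalize_ags(raw: object) -> str:
    """Return the AGS as an 8-digit string.

    Numeric spreadsheet cells drop the leading zero (02000000 -> 2000000)
    and may come back as floats ("2000000.0"); both cases are repaired here.
    """
    text = str(raw).strip()
    if text.endswith(".0"):  # float artefact from a numeric cell
        text = text[:-2]
    if not text.isdigit() or len(text) > 8:
        raise ValueError(f"not a valid AGS: {raw!r}")
    return text.zfill(8)


def index_by_ags(rows: Iterable[Mapping[str, object]],
                 key: str = "AGS") -> Dict[str, Mapping[str, object]]:
    """Build an AGS -> row lookup; `key` is the (hypothetical) key column name."""
    index: Dict[str, Mapping[str, object]] = {}
    for row in rows:
        ags = normalize_ags(row[key])
        if ags in index:  # a duplicate key would make every lookup ambiguous
            raise ValueError(f"duplicate AGS {ags}")
        index[ags] = row
    return index


# Hamburg (02000000) and Berlin (11000000); "value" entries are placeholders
rows = [{"AGS": 2000000, "value": 1}, {"AGS": "11000000", "value": 2}]
lookup = index_by_ags(rows)
print(lookup["02000000"]["value"])  # 1
```

Raising on duplicates is a deliberate choice: a silent overwrite would mimic a VLOOKUP that quietly returns the first match.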
6
Getting the data into shape
Data in Web Databases
Data in web databases can be extracted either through a web interface or by querying the database directly. In Germany, all data portals based on GENESIS Online (e.g. Regionaldatenbank Deutschland) support direct queries (deep links) entered in the URL bar of your web browser.
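Because a deep link is just a URL with query parameters, it can also be generated by a script. A minimal Python sketch, in which the base URL, the table code and all parameter names are hypothetical placeholders; the real ones have to be taken from the documentation of the GENESIS-based portal in question.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit

# HYPOTHETICAL endpoint: replace with the one documented by the portal
BASE_URL = "https://example.org/genesis/query"


def build_query_url(base_url: str, table: str, regions: List[str], year: int) -> str:
    """Assemble a query/deep-link URL instead of clicking through the web interface."""
    params = {
        "table": table,                # hypothetical parameter: table code
        "regions": ",".join(regions),  # hypothetical parameter: list of AGS
        "year": str(year),             # hypothetical parameter: reference year
        "format": "csv",               # hypothetical parameter: output format
    }
    return f"{base_url}?{urlencode(params)}"  # urlencode escapes "," as %2C


url = build_query_url(BASE_URL, "EXAMPLE-TABLE", ["02000000", "11000000"], 2016)
print(parse_qs(urlsplit(url).query)["regions"])  # ['02000000,11000000']
```

The resulting URL can be opened in a browser or fetched with `urllib.request.urlopen(url)`; portals may additionally require login credentials, which are not shown here.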
7
Getting the data into shape
Data in Web Databases: Regionaldatenbank Deutschland
8
Getting the data into shape
Data in Web Databases: Urban Audit Cities Survey
9
Getting the data into shape
Data in PDF files
Data in PDF files can be more challenging to extract. If the PDF contains data tables in text format (rather than as images), there are several ways to extract the data:
- In R, using packages such as pdftools or tm (text mining)
- In Python, using similar libraries such as PyPDF2 or pdfminer
- Apps such as Tabula
- Command-line tools such as poppler/pdftotext
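To illustrate the command-line route, a minimal Python sketch that calls poppler's `pdftotext` with its `-layout` option (which tries to keep the physical layout, so table columns stay aligned) and then splits each line into cells. Splitting on runs of two or more spaces is an assumption that works for many layout-preserved tables but not all; the sample line is invented.

```python
import re
import subprocess
from typing import List, Optional


def pdftotext_command(pdf_path: str, first_page: Optional[int] = None,
                      last_page: Optional[int] = None) -> List[str]:
    """Build a poppler pdftotext call; -layout keeps table columns aligned."""
    cmd = ["pdftotext", "-layout"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    return cmd + [pdf_path, "-"]        # "-" sends the text to stdout


def extract_text(pdf_path: str, first_page: Optional[int] = None,
                 last_page: Optional[int] = None) -> str:
    """Run pdftotext (poppler must be installed) and return the extracted text."""
    result = subprocess.run(pdftotext_command(pdf_path, first_page, last_page),
                            capture_output=True, text=True, check=True)
    return result.stdout


def split_layout_line(line: str) -> List[str]:
    """Split one line of -layout output into cells at runs of 2+ spaces,
    so names containing single spaces ("Frankfurt am Main") stay in one cell."""
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


print(split_layout_line("  Frankfurt am Main     1.234   56,7"))
# ['Frankfurt am Main', '1.234', '56,7']
```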
10
Getting the data into shape
Data in PDF files
Extracting tabular data from PDF files rarely works as intended on the first try. Usually you need to remove artifacts such as unneeded text blocks, extra spacing or wrong separators. This can be done with search-and-replace operations based on regular expressions, in R, in Python or in a text editor.
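A minimal Python sketch of such regular-expression cleanup, assuming German number formatting ("." as thousands separator, "," as decimal mark), trailing footnote markers such as " 1)" and repeated page headers. All three patterns are assumptions to be adapted to the PDF at hand.

```python
import re
from typing import List, Optional

# Example artifact patterns (assumptions, adapt to the actual PDF)
PAGE_HEADER = re.compile(r"^\s*(?:Seite|Page)\s+\d+", re.IGNORECASE)  # page headers
FOOTNOTE = re.compile(r"\s+\d+\)$")   # trailing footnote marker such as " 1)"
MISSING = {"", "-", "–", ".", "x"}    # placeholder symbols in the table body


def clean_lines(lines: List[str]) -> List[str]:
    """Drop empty lines and repeated page headers."""
    return [line for line in lines if line.strip() and not PAGE_HEADER.match(line)]


def parse_german_number(cell: str) -> Optional[float]:
    """Convert a German-formatted cell ("1.234,5") to a float, or None if missing."""
    cell = FOOTNOTE.sub("", cell.strip())
    if cell in MISSING:
        return None
    # remove thousands separators first, then turn the decimal comma into a dot
    return float(cell.replace(".", "").replace(",", "."))


print(parse_german_number("1.234,5"))  # 1234.5
print(parse_german_number("56,7 1)"))  # 56.7
```

Mapping every placeholder to `None` is a simplification: in publications of German statistical offices a dash commonly means "nothing" (zero) while a dot means "unknown or confidential", so check the legend of the source before deciding.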
11
Getting the data into shape
Data in print publications
- Lots of manual, tedious work
- Hard to automate (OCR scanning?)
- A strong motivation to look for alternative data sources
12
Processing the data
Having a standardized way to process the data
The German Urban Audit data catalogue is organized as one Excel file per variable and grant period (2 years).
Each year with existing data is structured in 4 tabs:
1. Complete source data table with a reference to the source file and, if needed, derived tables with calculated values
2. Table with a fixed structure containing only the data needed to calculate the city and FUA values (extracted from the first tab via VLOOKUP)
3. Table containing the city values from the 2nd tab (pivot table)
4. Table containing the aggregated/calculated FUA values from the 2nd tab (pivot table)
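The logic of tabs 2 to 4 can be mirrored in a short Python sketch. It assumes the source tab has been read into a mapping from municipality key to value and that a hypothetical membership table assigns each municipality to its FUA; the keys `M1`, `M2`, `M9` and `FUA_1` and all values are invented placeholders. It also shows why FUA rates have to be recomputed from aggregated totals rather than by averaging municipal rates.

```python
from collections import defaultdict
from typing import Dict, List


def extract(source: Dict[str, float], needed: List[str]) -> Dict[str, float]:
    """Tab 2: pull only the needed keys from tab 1, like VLOOKUP.
    A missing key raises KeyError instead of silently yielding #N/A."""
    return {key: source[key] for key in needed}


def pivot_sum(values: Dict[str, float], membership: Dict[str, str]) -> Dict[str, float]:
    """Tabs 3/4: sum municipality values per city or FUA code, like a pivot table.
    Only valid for additive variables such as counts or areas."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in values.items():
        totals[membership[key]] += value
    return dict(totals)


def rate(numerator: Dict[str, float], denominator: Dict[str, float]) -> Dict[str, float]:
    """Rates are computed from aggregated numerators and denominators."""
    return {code: numerator[code] / denominator[code] for code in numerator}


# Hypothetical municipality keys and placeholder values, for illustration only
labour_force = extract({"M1": 1000.0, "M2": 3000.0, "M9": 500.0}, ["M1", "M2"])
unemployed = extract({"M1": 100.0, "M2": 60.0, "M9": 50.0}, ["M1", "M2"])
fua_of = {"M1": "FUA_1", "M2": "FUA_1"}  # which FUA each municipality belongs to
print(rate(pivot_sum(unemployed, fua_of), pivot_sum(labour_force, fua_of)))
# {'FUA_1': 0.04}
```

Averaging the two municipal rates (0.10 and 0.02) would give 0.06 instead of 0.04, because it ignores that the second municipality has three times the labour force.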
13
Processing the data
Having a standardized way to process the data: Source Data
14
Processing the data
Having a standardized way to process the data: Extracted Data
15
Processing the data
Having a standardized way to process the data: Cities Pivot Table
16
Processing the data
Having a standardized way to process the data: FUAs Pivot Table
17
Data validation and transmission
Data validation using the EDIT Tool
18
Data validation and transmission
Data validation using the EDIT Tool: Checking the error log file
19
Data validation and transmission
Transmitting the data using EDAMIS: CSV file structure
URBANREG_EC_A3_DE_2016_V1.csv
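When several years or versions are transmitted, generating and checking file names in code helps avoid typos. A minimal Python sketch, assuming the example name URBANREG_EC_A3_DE_2016_V1.csv follows the pattern dataset code, country code, reference year and version. This reading is inferred from the single example, not from the EDAMIS documentation, so check it against the official transmission instructions.

```python
import re
from typing import Dict

# ASSUMED pattern: <dataset>_<country>_<reference year>_V<version>.csv
FILENAME = re.compile(
    r"^(?P<dataset>.+)_(?P<country>[A-Z]{2})_(?P<year>\d{4})_V(?P<version>\d+)\.csv$"
)


def build_filename(dataset: str, country: str, year: int, version: int) -> str:
    """Compose a transmission file name following the assumed pattern."""
    return f"{dataset}_{country}_{year}_V{version}.csv"


def parse_filename(name: str) -> Dict[str, str]:
    """Split a file name into its parts; raise ValueError if it does not fit."""
    match = FILENAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match.groupdict()


print(parse_filename("URBANREG_EC_A3_DE_2016_V1.csv"))
# {'dataset': 'URBANREG_EC_A3', 'country': 'DE', 'year': '2016', 'version': '1'}
```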
20
Data validation and transmission
Transmitting the data using EDAMIS
21
Thank you for your attention!