Enabling direct data access to social science research data


1 Enabling direct data access to social science research data within the GESIS Data Catalogue DBK
Wolfgang Zenk-Möltgen
DI4R Digital Infrastructures for Research 2016, 28-30 September, Krakow, Poland

2 The Data Catalogue DBK
Social science research data:
  Surveys and survey programmes
  Time series
  Comparative studies
  Broad range of topics covered
DBK at the GESIS Data Archive:
  Contains study descriptions of primary data from survey research and historical time series
  Has more than 5100 studies, e.g. ALLBUS, ISSP, Eurobarometer, Politbarometer, DeutschlandTrend, European Values Study, German Longitudinal Election Study, Comparative Study of Electoral Systems, Youth Studies and many more
  Uses a generic metadata schema for the description of social science data
  Has been developed since the 1960s, together with international archives and the DDI Alliance
  Provides metadata management to publish metadata on different retrieval and distribution platforms
  Includes a version history for the datasets, including errata documentation and persistent identifiers (DOIs by da|ra)

3 Metadata access possibilities
Web UI
Metadata for discovery
OAI-PMH provides Dublin Core, DDI-C/L, DCAT
API via OAI-PMH available, but no data API
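
The OAI-PMH access listed on this slide can be illustrated with a minimal sketch that builds a harvesting request. The `ListRecords` verb and the `oai_dc` (Dublin Core) metadata prefix are defined by the OAI-PMH protocol; the base URL and the function name are placeholder assumptions, not the actual DBK endpoint.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set, as defined by the protocol.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL)
print(url)
```

A request like this returns study descriptions only, which matches the point of the slide: the metadata can be harvested, but the data files themselves cannot be reached this way.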

4 Current data access possibilities
Users download or order data
Needs for:
  more diverse data formats
  interdisciplinary analysis methods
  linking data from different sources
  data which are rapidly changing
Currently an internal project converts data automatically:
  Based on ASCII, CSV data + definitions
  Rectangular data files from 100 kB to 221 MB, mean 12 MB (43.4 GB in total)
  Columns = variables, rows = cases
  (datorium: single Twitter dataset 7.8 GB)
But this still requires a download by the users

5 The Data Tank
To provide an API for the users:
  Established standards should be used
  RESTful services
  No data duplication should be needed
The Data Tank, by Open Knowledge Belgium:
  Open source data API
  Diverse input formats
  Diverse output formats via API: XML, HTML, JSON, or CSV
  Supports semantic technologies
  DCAT-AP metadata
  Apache, MySQL, PHP stack
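
The idea of serving one source in several output formats without duplicating the data can be sketched as follows. This is a generic illustration in Python, not The Data Tank's own code (which runs on a PHP stack); the function name `render` and the sample rows are hypothetical.

```python
import csv
import io
import json


def render(rows, fmt):
    """Serialize a list of dict rows (one dict per case) as JSON or CSV text."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        if not rows:
            return ""
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical sample: two cases with two variables each.
rows = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
print(render(rows, "csv"))
print(render(rows, "json"))
```

Both outputs are produced on request from the same in-memory rows, so no second copy of the dataset has to be stored per format.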

6 Installation and integration
WAMP Server
Using data from DBK:
  Existing CSV datasets
  Define Collection & URI
  Map some metadata
Installation issues due to Windows (WAMP):
  Different ports had to be used, since IIS and MySQL were already running on the server
  PHP extension ext-pcntl is not supported under Windows
  dBase extensions not working
  Caching module (memcached) not available, file cache used instead
Data issues due to CSV format:
  Tab delimiter not supported
  Double-quote text qualifier not supported
Design with "Collection" and "URI" for datasets has to be defined
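
One possible workaround for the two CSV issues named on this slide is to convert files before import. The sketch below assumes the input is tab-delimited with double-quote text qualifiers and that the target must be comma-delimited without any qualifier; the function name and the sample data are hypothetical, and the presentation does not say how the issues were actually handled.

```python
import csv
import io


def to_plain_comma_csv(text):
    """Convert tab-delimited, double-quoted text to unquoted comma-delimited text.

    Raises ValueError if a field contains a character that would need a
    text qualifier in the output (comma, double quote, or line break).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    out_lines = []
    for row in reader:
        for field in row:
            if any(ch in field for ch in ',"\n\r'):
                raise ValueError(f"field cannot be written unquoted: {field!r}")
        out_lines.append(",".join(row))
    return "".join(line + "\n" for line in out_lines)


# Hypothetical sample with one header row and one case.
sample = '"id"\t"party"\n"1"\t"SPD"\n'
print(to_plain_comma_csv(sample))
```

Failing loudly on fields that contain commas or quotes is a deliberate choice here: silently dropping the qualifier would shift columns and corrupt the rectangular structure.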

7 Challenges and problems
Research workflow: research question, find data, analyze data
Access restrictions by dataset: social science data raise privacy concerns even after anonymization
Versioning: the archive creates updates and versions for the dataset (API structure)
Usage stats needed: the archive needs statistics about usage
Performance: the data API must be reliable and quick
Selecting variables (paging by columns): a rectangular dataset is never used completely, so variable selection seems necessary
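
Variable selection and paging by columns could look roughly like the sketch below, which treats the first row of a rectangular CSV file as the variable names. The function names, the zero-based page numbering, and the sample data are assumptions for illustration; the presentation does not specify how an API would expose this.

```python
import csv
import io


def column_page(csv_text, page, page_size):
    """Return all rows, restricted to one page of columns (variables)."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    rows = list(csv.reader(io.StringIO(csv_text)))
    start = page * page_size
    return [row[start:start + page_size] for row in rows]


def select_variables(csv_text, names):
    """Return all rows, restricted to the named variables in the given order."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    indexes = [header.index(name) for name in names]  # ValueError if unknown
    return [[row[i] for i in indexes] for row in rows]


# Hypothetical dataset: four variables, two cases.
data = "id,age,sex,vote\n1,34,f,A\n2,51,m,B\n"
print(column_page(data, page=1, page_size=2))
print(select_variables(data, ["vote", "id"]))
```

Either approach returns only part of the width of the file, which addresses the observation that users rarely need every variable of a dataset.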

8 Future
After solving all the issues:
  Researchers may have direct access to data via API
  Organized by a "personal space" at DBK
  Statistics can also be shown to users
User has a personal space at DBK:
  Personal data access possibilities
  Personal statistics
  Could also be used by users for uploading resulting datasets, to enable preservation, citation, and sharing
Thank you very much!
Contact:

