Research Data Management towards Data Integration
Roman Gerlach, Birgitta König-Ries, Javad Chamanara, David Blaa, Sven Thiel, Martin Hohmuth, Nafiseh Navabpour
Friedrich-Schiller-University, Jena (Germany)
Endowed Chair for Distributed Information Systems (Research Data Management Helpdesk)
Intro
BEXIS 2 is:
- a Data Management Platform (i.e. software)
- designed for large research projects with central data management (incl. a data manager)
- focused on active data (i.e. during the project lifetime)
- focused on tabular data, but not limited to it
- focused on data integration and re-use
- generic, scalable, modular, free and open source
roman.gerlach@uni-jena.de
BExIS++ Project (DFG)
BEXIS 2: Software, Sustainability, Development, Outreach, Support, Training
BExIS Community
BEXIS 2, BExIS++, BExIS, AquaDiva, iDiv, TerraSensE, GFBio, UFZ Halle, BExIS++, Biodiversity Exploratories, Kilimanjaro, GRK 1086, Jena Experiment, EFForTS, MPI-BGC Research Database, BEFmate, GRK 1666, BExIS
What do we do to facilitate data integration and re-use?
No Data in Black Boxes
Let's take a look inside!
Image: Carl Zeiss Jena Biotar 2.0/58mm f. Exakta (http://www.klassik-cameras.de/Biotar.html)
Heterogeneity
For example:
- 18,200 different variables in 856 datasets, mapped into ~80 Data Attributes
- Download of templates
Example: Tabular data headers
Data Structure: Soil Sampling
Data Attributes: Timestamp, Temperature, Ratio
Variables: Rec. Time, Air Temp., Soil Temp., Humidity
Labels shown in the diagram: Data Type: DateTime; Unit: None; Unit: Time; Unit: Celsius; Data Type: Float
- Sharing data attributes among variables
- Sharing units and data types among data attributes
- Good for automatic data conversion, cross-dataset search, and data integration
Dataset:
Rec. Time   Air Temp.   Soil Temp.   Hu.
1           22          18           46
2           23          17           45
3           21          16           30
5 15 25
6 14 11
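The sharing idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the BEXIS 2 data model: the class names `DataAttribute` and `Variable` and the helper `variables_for` are hypothetical, and the exact unit and data type of each attribute (beyond Temperature being a Float in Celsius) as well as the mapping of Humidity to Ratio are assumptions read off the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAttribute:
    """A reusable definition that carries the unit and data type (hypothetical class)."""
    name: str
    data_type: str
    unit: str


@dataclass(frozen=True)
class Variable:
    """A column header in one data structure, pointing at a shared attribute."""
    label: str
    attribute: DataAttribute


# Attributes are defined once. Units and types other than Temperature's are assumed.
TIMESTAMP = DataAttribute("Timestamp", "DateTime", "None")
TEMPERATURE = DataAttribute("Temperature", "Float", "Celsius")
RATIO = DataAttribute("Ratio", "Float", "None")

# The "Soil Sampling" data structure: four variables, three shared attributes.
SOIL_SAMPLING = [
    Variable("Rec. Time", TIMESTAMP),
    Variable("Air Temp.", TEMPERATURE),
    Variable("Soil Temp.", TEMPERATURE),
    Variable("Humidity", RATIO),  # assumed mapping
]


def variables_for(structure, attribute):
    """Return the labels of all variables in a structure that share one attribute."""
    return [v.label for v in structure if v.attribute == attribute]
```

Because both temperature columns point at the same attribute, a search for "Temperature" finds them regardless of how each dataset labels its columns, and a unit conversion would only need to be defined once per attribute.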
Data structure creation
Providing support at dataset design time
Data Package
Red classes come from other packages
Views
- Subset of a dataset obtained by selection or projection
- Purpose
  - Further processing, sharing or sampling
  - Security / digital rights management
- Spanning view
  - View across multiple datasets using the same Data Structure
  - Only data structure? How about same attributes? Does not apply!
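As a rough illustration of the terms on this slide, selection keeps a subset of rows, projection keeps a subset of columns, and a spanning view combines datasets that share one data structure. The sketch below uses plain Python dictionaries; the function names and the dataset layout (`columns` plus `rows`) are hypothetical and not taken from BEXIS 2.

```python
def select(rows, predicate):
    """Selection: keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Projection: keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in rows]


def spanning_view(datasets):
    """Combine the rows of several datasets that use the same data structure.

    Here "same data structure" is modelled as an identical column list,
    which is an assumption made for this sketch.
    """
    structures = {tuple(ds["columns"]) for ds in datasets}
    if len(structures) != 1:
        raise ValueError("spanning views require a shared data structure")
    return [row for ds in datasets for row in ds["rows"]]


# Illustrative dataset using the first three rows of the tabular example.
COLUMNS = ["rec_time", "air_temp", "soil_temp", "humidity"]
plot_a = {
    "columns": COLUMNS,
    "rows": [
        {"rec_time": 1, "air_temp": 22, "soil_temp": 18, "humidity": 46},
        {"rec_time": 2, "air_temp": 23, "soil_temp": 17, "humidity": 45},
        {"rec_time": 3, "air_temp": 21, "soil_temp": 16, "humidity": 30},
    ],
}

# A view: warm readings only, reduced to two columns.
warm = project(
    select(plot_a["rows"], lambda r: r["air_temp"] > 21),
    ["rec_time", "air_temp"],
)
```

A view built this way never copies more than it exposes, which is what makes it usable for sharing a restricted subset of a dataset.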
Metadata level
- Import/export of multiple schemas/standards
  - mapping between different schemas
- User-friendly tools to create metadata
  - re-use (e.g. enter once, copy, import)
  - guidance (e.g. terminologies, autocomplete)
  - custom structure (standard compliant)
System level
Interaction with external systems:
- Persistent Identifier Providers
- Authentication Providers (e.g. LDAP)
- Annotation Providers (GFBio terminology services)
- Geographic Information Systems
Web API
Data Access
Sample REST API calls: Data
- http://www.name.com/api/data/6
- /api/data/6?header=id,name
- /api/data/6?filter=(Grade>50 AND Grade <90)
- /api/data/6?header=id,name&filter=(Grade>50)
Sample REST API calls: Metadata
- http://www.name.com/api/metadata/6
- http://www.name.com/api/metadata/6?ConvertTo=EML
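A client could assemble such calls as sketched below. The host `http://www.name.com` is the placeholder from the slide, and the parameter names `header`, `filter` and `ConvertTo` are taken from the sample calls; the helper functions themselves are hypothetical. The slide shows the filter expression unencoded, so percent-encoding it, as done here, is an assumption about what the server accepts.

```python
from urllib.parse import urlencode


def build_data_url(base, dataset_id, header=None, filter_expr=None):
    """Build a data URL of the form <base>/api/data/<id>?header=...&filter=..."""
    params = {}
    if header:
        # Column names are sent as one comma-separated value.
        params["header"] = ",".join(header)
    if filter_expr:
        params["filter"] = filter_expr
    url = f"{base}/api/data/{dataset_id}"
    if params:
        # Keep commas and parentheses readable; everything else is encoded.
        url += "?" + urlencode(params, safe=",()")
    return url


def build_metadata_url(base, dataset_id, convert_to=None):
    """Build a metadata URL, optionally asking for conversion (e.g. to EML)."""
    url = f"{base}/api/metadata/{dataset_id}"
    if convert_to:
        url += "?" + urlencode({"ConvertTo": convert_to})
    return url


data_url = build_data_url(
    "http://www.name.com", 6, header=["id", "name"], filter_expr="(Grade>50)"
)
metadata_url = build_metadata_url("http://www.name.com", 6, convert_to="EML")
```

The resulting strings can be passed to any HTTP client; authentication and response formats are not described on the slide and are left out here.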
Conclusion
- Facilitating data integration is one of the big challenges in data life cycle management
- Data integration starts with data design
- The system should provide support (e.g. data structure design)
Further Reading
- A conceptual model for data management in the field of ecology, Javad Chamanara, Birgitta König-Ries, Journal of Ecological Informatics, volume 24, November 2014, pages 261–272, doi:10.1016/j.ecoinf.2013.12.003
- An Extensible Conceptual Model for Tabular Scientific Datasets, Javad Chamanara, Michael Owonibi, Alsayed Algergawy, Roman Gerlach, The International Symposium on Challenges for Designing and Using Datasets (DATASETS 2015), June 21 - 26, 2015, Brussels, Belgium, http://www.thinkmind.org/index.php?view=article&articleid=immm_2015_5_20_98008
- BEXIS 2 Tech Talk Series: https://youtu.be/ANGAVoZHTII
- Conceptual Model: http://fusion.cs.uni-jena.de/bppCM/index.htm
Thanks! Questions? Contact: roman.gerlach@uni-jena.de http://bexis2.uni-jena.de/