Data Intensive Research in South Africa


1 Data Intensive Research in South Africa
Anwar Vahed, March 2017. © CSIR, 2017

2 Outline
- SA investment in research data
- NICI and DIRISA
- South African National Data Infrastructure & Services (SANDIS)
- Current status
- Next steps

3 National Data Investment
- SAEON: Environmental
- HSRC: Human Sciences and Humanities
- DataFirst: Survey and Administrative
- Agincourt: Social and Health demographics
- SANSA: Earth Observation
- SADA: Survey & related
- SAAO: Astronomy
- Govt departments: DWA...
- Academia & research councils
- Meteorology, EO, Climate Change, Water, Energy, Health…
- SKA projected budget: € 2 billion to 2020; € 650 million (Phase 1); SA so far: R2 billion

4 National Integrated Cyberinfrastructure (NICI)
Amalgamated, physically distributed cyber platform for e-research: Data, Networking, HPC
- Overarching coordination & national strategy
- Tiers: National (Tier 1), Regional (Tier 2), Institutional (Tier 3)
- Priority & cross-cutting domains: Phy Sci & Eng.; Health, Bio & Food; Energy; Earth & Environment; Humans & Society; Materials & Manuf.
- Data intensive research environments
- Services: (Cloud) Computing Services (CHPC +); Networking Services (SANReN); Data Services (DIRISA)
- Core services; Networked resources; Skills & expertise

5 DIRISA Objectives
- Research Ecosystems: cross & multi disciplinary research
- Data Services: harmonised data management
- Federated Data Infrastructure: observations (models and measurements)
- Robust infrastructure & services
  - Sustained, federated and trusted nodes
  - Enabling environments (VREs/Gateways)
- Sound data management
  - Policies, practices and standards
  - Internationally benchmarked (Certification)
- Capacity & expertise
  - Data intensive research & management
  - Data Science “engineers”
- Advocacy & outreach
  - Data sharing
  - Stakeholder engagement
- Coordination & strategy
  - National data intensive research activities
  - Inform strategic agenda

6 South African National Data Infrastructure and Services (SANDIS)
Collaborators: RDA, CODATA, WDS, DCC, EUDAT, ANDS, UK D_A, Data.gov

7 Phase2: Collaborative Research Environments
SANDIS Services:
- DSubscribe: Register as DIRISA user
- DataDrop: Deposit and store data reliably
- FindGet: Discover, download data sets
- SafeShare: Safely share data with users
- DataStage: Prepare data for processing
- User documentation
- Help & support
- Core services (DMP, DOI)
Phase 2 (community driven):
- My data management plans
- My workflows
- My data sets and outputs
- My communities

8 Automated Data Lifecycle Management
- PID Registries (People, Objects, Activities): DONA, DataCite; ORCID
- Data must be FAIR
- Tiered harvesting with Trusted Repository Certification: WDS DSA
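
Note on the PID registries above: ORCID iDs (the "People" identifiers) carry a built-in check character, so a deposit service can reject mistyped iDs before contacting any registry. The following is a minimal sketch of that check, assuming the standard hyphenated 16-character form and the ISO 7064 MOD 11-2 checksum that ORCID documents. The function names are hypothetical and nothing here is taken from the DIRISA or SANDIS implementation.

```python
import re

# Hypothetical helper names; an illustration only, not DIRISA's implementation.
# Hyphenated form: three groups of four digits, then three digits and a
# final check character that is a digit or "X".
ORCID_FORM = r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]"


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if re.fullmatch(ORCID_FORM, orcid) is None:
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's documentation.
    print(is_valid_orcid("0000-0002-1825-0097"))
    print(is_valid_orcid("0000-0002-1825-0098"))
```

A passing checksum only shows the iD is syntactically plausible; whether it is actually registered still has to be confirmed with the registry.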

9 So far... (Current status)
- Robust infrastructure and services
  - Regional Tier 2 data node (Western Cape): Astronomy & Bioinformatics
  - SANDI services: Server cluster, DCC DMP tool, DONA and DOI registry
- Data management
  - Policies (Open and not Open): Regulatory (POPI Act), Ethical
  - Guidelines for EULA, SA DMP, subscription
- Capacity & expertise
  - National MSc in eScience; NRF call for Data Intensive Research projects
  - Collaborations on data fundamentals courses (Data & Software Carpentry; IBM)
- Advocacy & outreach
  - Workshops, DMP roadshow, SARDA, USAf, ASSAF, NRF (OA), DST, DHET
  - SciDataCon, RDA, eResearch conference; RDA, CODATA, WDS, SKA
- Coordination & strategy
  - National data intensive research strategy
  - SADC Cyberinfrastructure framework

10 Next... (Roadmap)
- Infrastructure
  - Active & Passive data: 8 PB with cloud services & 40 PB tape
  - Domain specific scientific workflow systems (VREs)
- Data management
  - Across entire data lifecycle
  - Domain specific and computer actionable policies (DMP)
- Capacity development
  - Cross disciplinary PhD & post-doc in data sciences; engineers for the data sciences
- Outreach
  - National Data conference; SARDA; IDW2018? Africa take leading role
- Strategy
  - Shared cyberinfrastructure for African solutions

11 In conclusion (Why Open Science?)
- Data Driven Research: The Fourth Paradigm
- Focus on data-intensive systems and rapid, trustworthy scientific communication
- Increase quality: “Those who share data, do better science”
- Increase depth and speed: Reused data allow us “to see further ... and to do it all more quickly and easily”
- Maximise RoI: Much effort (and cost) in data collection but not rewarded. Trend is changing
- Provenance: Data represents our heritage

12 Thank you

13 Issues
- Regulations & bureaucracy
- Resources
- Aging infrastructure
- Attitude
- Plan A, Plan B, Plan C...
- Islandora for data deposit
- Server cluster (March)
- RFP for 8 PB upgrade + OpenStack cloud services


