TSD: a Secure and Scalable Service for Sensitive Data and eBiobanks
Gard Thomassen, PhD
Head of Research Support Services Group
University Center for Information Technology (USIT), University of Oslo
What is sensitive data?
Norway: Personal Data Act §2, point 8 – race/ethnic data, political opinion, philosophical and religious beliefs, the fact that a person has been suspected of, charged with, indicted for or convicted of a criminal act, health, sex life and trade-union membership
System requirements
Security, isolation and access control as given by law
Large storage capacity
Multi-tenant (multiple users)
High performance computing (HPC) resource
High bandwidth
Easy to maintain and operate
Easy to use and “practical” (also for audio and video)
Some freedom within confined user space
Accessible from anywhere through proper mechanisms
A variety of software and public data sources must be available
Windows and Linux support (server/host-side)
Data collection services
Data sharing services
System outline
[Diagram. Components: Internet, Gateway, VM-server, HPC (Colossus), Storage, and a secure network to special high-volume data production sites. Cardinality labels: 1 (project), 1 (storage area), n, 1.]
Using TSD
[Diagram. Labels: User 1 Study 1; User 2 Study 1; GW; VM U1 S1; VM U2 S1; S1; TSD disk; TSD; S1 DB; Colossus front end; Colossus; Colossus disk.]
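One way to read the labels on this slide (VM U1 S1, VM U2 S1, one TSD disk and one database for S1) is that each user gets a separate VM per study, while all VMs of a study share that study's storage. The Python sketch below models that reading only. The class, the function, the VM naming scheme and the `/tsd/...` path are hypothetical and are not taken from TSD.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """A study (project) with one shared storage area. Names are illustrative."""
    name: str
    members: set = field(default_factory=set)

    @property
    def storage_area(self) -> str:
        # Assumption: exactly one storage area per study; the path is made up.
        return f"/tsd/{self.name}"


def vm_name(user: str, study: Study) -> str:
    """Return the per-user, per-study VM name, or refuse non-members."""
    if user not in study.members:
        raise PermissionError(f"{user} is not a member of {study.name}")
    return f"vm-{user}-{study.name}"


s1 = Study("s1", members={"u1", "u2"})
vms = [vm_name(u, s1) for u in sorted(s1.members)]
```

Here both users of study `s1` end up with their own VM but the same `storage_area`, and a user outside the study is rejected before any VM name is produced.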
Data import and export using TSD
[Diagram. Labels: File lock server; Virtual file lock server; Virtual project-server; File lock HD; Project HD; NFS mount; TSD. Numbered steps 1 to 4. Annotation: data copied here by SFTP (2-factor authentication); encrypted data if sensitive.]
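The slide shows data arriving on a file lock disk via SFTP before it reaches the project disk. As a rough illustration of such a two-stage hand-over, the Python sketch below copies a file from a staging directory into a project directory and removes the staged copy only after a checksum match. This is not TSD's actual mechanism: the checksum step, the function names and the directory names are assumptions, and temporary directories stand in for the two disks.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_file(staged: Path, project_dir: Path) -> Path:
    """Copy a staged file into the project area and verify the copy."""
    target = project_dir / staged.name
    shutil.copy2(staged, target)
    if sha256_of(staged) != sha256_of(target):
        target.unlink()
        raise OSError(f"checksum mismatch for {staged.name}")
    staged.unlink()  # clear the staging area once the copy is verified
    return target


# Demo with temporary directories standing in for the two disks.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp) / "file_lock"  # hypothetical stand-in for the file lock disk
    project = Path(tmp) / "project"    # hypothetical stand-in for the project disk
    staging.mkdir()
    project.mkdir()
    (staging / "data.csv").write_text("id,value\n1,42\n")
    imported = import_file(staging / "data.csv", project)
    imported_name = imported.name
    imported_text = imported.read_text()
    staging_left = [p.name for p in staging.iterdir()]
```

Keeping the staged file until the copy is verified means a failed transfer can simply be retried from the staging area.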
Data collection using TSD
[Diagram. Labels: Nettskjema homepage; minID; “Nettskjema-minID”; Encrypted XML (PGP); File lock; Project VM; Project disk; TSD.]
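According to the slide, form submissions reach the project as PGP-encrypted XML. What happens after decryption is not shown; the Python sketch below illustrates one plausible next step, turning a single decrypted submission into a flat record. The XML layout (`submission`, `answer`, `question`) and the function name are hypothetical, since the real Nettskjema format does not appear on the slide, and decryption is assumed to have happened already.

```python
import xml.etree.ElementTree as ET

# Hypothetical decrypted submission; the real XML layout is not shown on the slide.
SAMPLE = """<submission id="17">
  <answer question="age">34</answer>
  <answer question="smoker">no</answer>
</submission>"""


def parse_submission(xml_text: str) -> dict:
    """Turn one decrypted form submission into a flat dict of answers."""
    root = ET.fromstring(xml_text)
    answers = {
        a.get("question"): (a.text or "").strip()
        for a in root.iter("answer")
    }
    return {"id": root.get("id"), "answers": answers}


record = parse_submission(SAMPLE)
```

All values stay strings here; converting them to numbers or categories would depend on the form definition of the individual study.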
TSD status
> 80 research projects
> 350 users
Secure storage (> 1 PiB on disk)
Secure data analysis
Linux or Windows hosts (> 250 VMs)
Secure import and export
Web-based data harvesting
HPC cluster (> 1500 cores)
Postgres DBs
Video and sound display
Future of TSD - main topics
How to handle video and sound
  – harvesting
  – management
  – metadata
  – analysis
Journal system for psychologists (University of Umeå collaboration)
Biobanks
PCoIP & ThinLinc
VMware and VDI infrastructure
Galaxy inside TSD
ELIXIR helpdesk connected to TSD
Hosting Docker containers
People involved
Project group / developers
IT-dir Lars Oftedal
Hans A. Eide
Märtha Felton
Administration / associated