Dr Tim Smith, CERN/IT
For the visit of the Alliance of German Science Organizations

[Oct 2013] - 2
As Designed: Worldwide LHC Computing Grid
Distributed Data Management
– Limited network resources
– Optimize / minimize movement
– File placement logic
– Deterministic / static
Site Data Management
– HSMs (hierarchical storage managers)
– Transparent file access and movement
– Disk-tape migration/recall
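One way to read the "deterministic / static" placement bullet is that a file's location is a fixed function of its name, so every client computes the same answer. The Python sketch below works under that assumption only; the site names and the hash-based mapping are hypothetical illustrations, not the grid's actual placement logic.

```python
import hashlib

# Hypothetical site names, for illustration only.
SITES = ["site-a", "site-b", "site-c"]


def place(file_name: str, sites=SITES) -> str:
    """Return the site for a file as a pure function of its name."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer index into the site list.
    index = int.from_bytes(digest[:8], "big") % len(sites)
    return sites[index]


if __name__ == "__main__":
    for name in ["run001/raw.dat", "run001/reco.dat", "run002/raw.dat"]:
        print(name, "->", place(name))
```

The mapping never changes unless the site list does, which is what makes it static: it takes no account of current demand or network load.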

Research Data Infrastructure of today
Distributed Data Management
– Network: a resource to schedule
– Dynamic data placement
– Data transfer services
– Experiment replica management rules
Site Data Management
– Independent technology choices
– Decoupled tiers
– Disk caches managed by owners
– Bulk 3rd-party migration to tertiary storage by owners
AAA: any data, any time, anywhere
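Dynamic placement driven by replica management rules might look like the sketch below: the number of replicas kept for a dataset grows with recent demand, within fixed bounds. The rule fields, thresholds and function name are hypothetical, chosen only to illustrate the idea; no experiment's real rule syntax is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaRule:
    """Hypothetical rule: a floor, a ceiling, and a demand step."""
    min_copies: int = 2          # always keep at least this many replicas
    max_copies: int = 5          # never exceed this many replicas
    accesses_per_copy: int = 100  # one extra replica per this many recent accesses


def target_replicas(recent_accesses: int, rule: ReplicaRule = ReplicaRule()) -> int:
    """Return how many replicas a dataset should have under the rule."""
    extra = recent_accesses // rule.accesses_per_copy
    return min(rule.max_copies, rule.min_copies + extra)


if __name__ == "__main__":
    for accesses in (0, 250, 10_000):
        print(accesses, "accesses ->", target_replicas(accesses), "replicas")
```

A transfer service would then add or remove copies until each dataset reaches its target, treating the network as a resource to be scheduled.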

CERN Infrastructure of tomorrow
Connectivity (100 Gbps)
2015: 15k servers, 300k VMs

Big Data … in small pieces
[Figure labels: Data Size; Big facilities, x (a small number), Dedicated Big Data Stores; Long tail of science, x (a large number)]

Naming
Zenodotus of Ephesus
– First librarian of the Ancient Library of Alexandria
– First recorded use of metadata

Features

Communities

Deposit

HEP: Data Reduction / Analysis
[Figure: data levels Publication, Reduced, Reconstructed, Raw; axes: File Size, # Files]
Researchers: T2s, T1s
Analysis Coordinators: T1s
Production Managers: T0, T1s

HEP: More than Data
– Papers
– Tabular Data
– Correlation Matrices
– Internal Notes
– Wikis
– Presentations
– Quality monitoring data
– Filter / selection algorithms
– Formatters
– Calibration Data
– Conditions Data
– Log Books
Researchers: T2s, T1s
Analysis Coordinators: T1s
Production Managers: T0, T1s
Workflows
Contextual metadata
Software: 10M lines of code

Deposit

Differentiating Features
Easy to use and attractive
– Dropbox integration
– Drag-and-drop deposition
Low barriers
– Little fixed metadata
Open on input as well as output
– No restrictions on type of data
– No restrictions on format of data
– No restrictions on licences
Distributed community curation

Retro/Per-spective
OpenAIRE
– FP7 Open Access pilot for peer-reviewed articles
OpenAIREplus
– FP7 Open Access pilot for publications and research data
CERN
– Cloud Service

Interested Communities
Workshops
– Proceedings and presentations
Projects
– Research output and project artifacts
Research Groups
– Datasets
– Snapshots of a live store
Universities
– Datasets and articles
Libraries
– Newsletters
– Data not fitting in traditional repositories
Publishers
– Publication/subsidiary datasets and software
– Scanned and annotated logbooks
Young Radiation Oncologists’ Conference

Perceived Attraction
Trust / Security / Know-how
– LHC data is thought safe there
– Bit Preservation & Media Migration
Longevity
– An institute with a clear future
– A memory institution for HEP
Not a company
Not a profit enterprise
– No tricks and changes