CERN and Computing … and Storage
Alberto Pace, Head, Data and Storage Services (DSS) group, CERN IT Department, Geneva, Switzerland
Big Data processing and analysis challenges in mega-science experiments, January 29-31, 2015, Moscow / Dubna, Russia

Slide 2: Scientific Computing
Scientific computing for large experiments is typically based on a distributed infrastructure, and storage is one of its main pillars. A key parameter in distributed scientific computing is efficiency:
–High efficiency requires the CPUs to be co-located with the data they analyse, connected by the network (DATA – CPU – NET).

Slide 3: Dataflows for Science
Storage in scientific computing is distributed across multiple data centres. Data flows from the experiments (Tier-0) to all data centres where CPU is available to process it (Tier-1, Tier-2).

Slide 4: Computing Services for Science
Whenever a site has …
–idle CPUs (because no data is available to process)
–or an excess of data (because there is no CPU left for analysis)
… the efficiency drops.

Slide 5: Why is storage complex?
High-efficiency analysis requires the data to be pre-placed where the CPUs are available (Tier-0, Tier-1, Tier-2).

Slide 6: Why is storage complex? (continued)
High-efficiency analysis requires the data to be pre-placed where the CPUs are available, or peer-to-peer data transfer must be allowed:
–This lets sites with an excess of CPU schedule the pre-fetching of data that is missing locally, or access it remotely if the analysis application has been designed to cope with high latency.

Slide 7: Why is storage complex? (continued)
Both approaches coexist in WLCG:
Data is pre-placed
–This is the role of the experiments, which plan the analysis.
Data is globally accessible and federated in a global namespace
–The middleware always attempts to use the local data, and relies on an access protocol that redirects to the nearest remote copy when the local data is not available.
–All middleware and jobs are designed to minimise the impact of the additional latency that the redirection introduces.
Using access protocols that allow global data federation is essential: http, xroot. A minimal sketch of this local-first, redirect-on-miss logic is shown below.
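The following is a toy sketch of the idea, not the actual WLCG middleware; the local mount point, the redirector hostname and the helper name are illustrative assumptions.

```python
# Try the local replica first, then fall back to a federation redirector that
# points to the nearest remote copy (hypothetical names throughout).
import os

LOCAL_PREFIX = "/data/local"                        # hypothetical local storage mount
REDIRECTOR = "root://global-redirector.example/"    # hypothetical xroot federation entry point

def resolve_replica(lfn: str) -> str:
    """Return a URL/path for the logical file name, preferring the local copy."""
    local_path = os.path.join(LOCAL_PREFIX, lfn.lstrip("/"))
    if os.path.exists(local_path):
        return local_path                # fast path: the data is already on site
    # Miss: let the federation redirector locate the nearest remote copy.
    # The job must tolerate the extra latency of remote (WAN) access.
    return REDIRECTOR + lfn

if __name__ == "__main__":
    print(resolve_replica("/atlas/derived/sample-001.root"))
```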

Slide 8: Within a storage site
A simple storage model: all data in the same storage (typical Tier-2 or Tier-3 storage)
–Uniform, simple, easy to manage, no need to move data
–Can provide a sufficient level of performance and reliability

Slide 9: Limitations
Different storage solutions offer different qualities of service
–Three parameters: performance, reliability, cost
–You can have two but not three
To deliver both performance and reliability you must deploy expensive solutions
–OK for small sites (you save because you have a simple infrastructure)
–Difficult to justify for large sites
(Diagram: a triangle with corners "slow", "expensive", "unreliable"; tapes, disks, mirrored disks and flash/solid-state disks sit at different points of it.)

Slide 10: What is data management?
Examples from LHC experiment data models. Two building blocks empower data analysis:
–Data pools with different qualities of service
–Tools for data transfer between pools

Slide 11: Storage services
Storage needs to be able to adapt to the changing requirements of the experiments
–Cover the whole area of the (slow / expensive / unreliable) triangle and beyond

Slide 12: Examples
Why are multiple pools with different qualities of service needed?
Derived data used for analysis and accessed by thousands of nodes
–Needs high performance, low cost, minimal reliability (derived data can be recalculated)
Raw data that needs to be analysed
–Needs high performance and high reliability; can be expensive (small volumes)
Raw data that has been analysed and archived
–Must be low cost (huge volumes) and highly reliable (must be preserved); performance is not necessary
A small sketch expressing these data classes as pool requirements follows below.
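Illustrative sketch only: the three data classes above expressed as per-pool quality-of-service requirements. The class names and values are assumptions for the example, not an actual CERN configuration.

```python
from dataclasses import dataclass

@dataclass
class PoolQoS:
    performance: str   # "high" / "low"
    reliability: str   # "high" / "minimal"
    cost: str          # "low" / "can be expensive"

# Hypothetical mapping of data class -> required quality of service
REQUIREMENTS = {
    "derived-analysis": PoolQoS("high", "minimal", "low"),            # recalculable
    "raw-hot":          PoolQoS("high", "high", "can be expensive"),  # small volume
    "raw-archive":      PoolQoS("low",  "high", "low"),               # huge volume, preserved
}

def pool_for(data_class: str) -> PoolQoS:
    return REQUIREMENTS[data_class]

print(pool_for("raw-archive"))
```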

Slide 13: But the balance is not simple
There are many ways to split (performance, reliability, cost), and each parameter has many sub-parameters:
–Performance: latency / throughput, scalability
–Cost: electrical consumption, hardware cost, operations cost (manpower)
–Reliability: consistency, …

Slide 14: And reality is complicated
Key requirements: simple, scalable, consistent, reliable, available, manageable, flexible, performing, cheap, secure.
Aiming for "à la carte" services (storage pools) with on-demand "quality of service". And where is scalability?

Slide 15: Tools needed
Needed:
–Tools to transfer data effectively across pools of different quality
–Storage elements that can be reconfigured "on the fly" to meet new requirements without moving the data
Examples:
–Moving petabytes of data from a multi-user disk pool into a reliable tape back-end
–Increasing the replica factor to 5 on a pool containing condition data accessed by thousands of simultaneous users
–Deploying petabytes of additional storage in a few days
A toy sketch of a pool-to-pool transfer tool is shown after this list.
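The following is a toy sketch of such a transfer tool, not FTS or any real CERN tool: it drains files from a hypothetical "disk" pool directory into a "tape" pool directory, verifying each copy with a checksum before removing the source. All paths are assumptions.

```python
import hashlib
import shutil
from pathlib import Path

DISK_POOL = Path("/pools/disk-analysis")   # hypothetical source pool
TAPE_POOL = Path("/pools/tape-archive")    # hypothetical destination pool

def checksum(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def drain(src_pool: Path, dst_pool: Path) -> None:
    """Copy every file from src_pool to dst_pool, deleting the source only after verification."""
    for src in src_pool.rglob("*"):
        if not src.is_file():
            continue
        dst = dst_pool / src.relative_to(src_pool)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        if checksum(src) == checksum(dst):   # only delete once the copy is verified
            src.unlink()

if __name__ == "__main__":
    drain(DISK_POOL, TAPE_POOL)
```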

Slide 16: Roles of storage services
Three main roles:
–Storage (store the data)
–Distribution (ensure that the data is accessible)
–Preservation (ensure that the data is not lost)
(Diagram: tier pyramid from Tier-2/Tier-3 up to Tier-0 (LHC experiments); the roles range from storage and data distribution at the lower tiers to data production, storage, distribution and preservation at Tier-0; size in PB, availability and reliability increase towards the top.)

Slide 17: Reliability …
You can achieve high reliability using standard disk pools:
–Multiple replicas
–Erasure codes
Here is also where tapes can play a role. Tapes?? (A toy erasure-coding example follows below.)
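A toy illustration of the erasure-coding idea: a single XOR parity block (RAID-4/5-style k+1 coding), not the codes actually used in production. It shows that any one lost chunk can be rebuilt from the survivors.

```python
def encode(data: bytes, k: int = 4):
    """Split data into k equal chunks and append one XOR parity chunk."""
    size = -(-len(data) // k)                       # chunk size, rounded up
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = bytearray(size)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return chunks + [bytes(parity)]

def recover(surviving_chunks):
    """Rebuild the single missing chunk by XOR-ing all surviving chunks."""
    rebuilt = bytearray(len(surviving_chunks[0]))
    for chunk in surviving_chunks:
        for j, byte in enumerate(chunk):
            rebuilt[j] ^= byte
    return bytes(rebuilt)

if __name__ == "__main__":
    blocks = encode(b"raw event data " * 16, k=4)
    survivors = blocks[:2] + blocks[3:]          # pretend block 2 (one disk/node) was lost
    assert recover(survivors) == blocks[2]       # the lost block is fully reconstructed
```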

Slide 18: Do we need tapes?
Tapes have a bad reputation in some use cases:
–Slow in random access mode: high latency in the mounting process and when seeking data (fast-forward, rewind)
–Inefficient for small files (in some cases)
–Comparable cost per (peta)byte to hard disks

Slide 19: Do we need tapes? (continued)
In addition to the drawbacks above, tapes also have some advantages:
–Fast in sequential access mode: > 2x faster than disk, with physical read-after-write verification (4x faster)
–Several orders of magnitude more reliable than disks: a few hundred GB lost per year on an 80 PB raw tape repository, versus a few hundred TB lost per year on a 50 PB raw disk repository
–No power required to preserve the data
–Less physical volume required per (peta)byte
–The inefficiency for small files has been resolved by recent developments
–Nobody can delete hundreds of PB in minutes
Bottom line: if not used for random access, tapes have a clear role in the architecture. (The rough arithmetic behind the loss figures is sketched below.)
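Rough arithmetic behind "several orders of magnitude more reliable" (my own back-of-envelope, taking 300 GB and 300 TB as "a few hundred"; the slide only gives order-of-magnitude figures):

```python
tape_loss_fraction = 300e9 / 80e15    # ~3.8e-6 of the tape repository lost per year
disk_loss_fraction = 300e12 / 50e15   # ~6.0e-3 of the disk repository lost per year

print(f"tape: {tape_loss_fraction:.1e}/yr, disk: {disk_loss_fraction:.1e}/yr, "
      f"ratio ~ {disk_loss_fraction / tape_loss_fraction:.0f}x")
# -> ratio ~ 1600x, i.e. roughly three orders of magnitude
```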

Slide 20: Verifying more than 1 PB in 24 h (source: G. Cancio / CERN). A back-of-envelope check of what such a verification rate implies follows below.
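This is my own arithmetic, not from the slide: the aggregate bandwidth implied by verifying 1 PB in 24 hours, and the number of tape drives that would need, assuming roughly 250 MB/s of streaming throughput per drive (the per-drive speed is an assumption).

```python
PB = 1e15                       # bytes
DAY = 24 * 3600                 # seconds

aggregate = PB / DAY            # required aggregate throughput, bytes/s
drive_speed = 250e6             # assumed per-drive streaming speed, bytes/s
drives = aggregate / drive_speed

print(f"aggregate ~ {aggregate / 1e9:.1f} GB/s, i.e. ~ {drives:.0f} drives at 250 MB/s")
# -> roughly 11.6 GB/s, i.e. on the order of 50 drives streaming in parallel
```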

Slide 21: Practically … which solution?

Slide 22: … not a single answer
There are fixed costs for the skilled manpower required to manage the high-efficiency but complex solutions
–Disk technology, database technology, tape technology, networks, …
(Diagram: cost versus storage size for a Tier-2 with a partial infrastructure and a Tier-1 with a complete infrastructure.)

Slide 23: At CERN
Many options are available; interoperability is a key requirement for WLCG (e.g. DPM, dCache).
At CERN:
–Separation between HOT (disk) and COLD (tape) storage
–CASTOR is used for the tape storage back-end
–EOS is used for the disk storage front-end: guaranteed low-latency file access for analysis; a collaborative storage platform with space and access regulations; expand/exchange/shrink during operation; semi-automatic disk ejection and healing (reduced maintenance)

Slide 24: Why this diversity?
Example:
–DPM is a consolidated product fully supported by CERN. It meets all the requirements of a small site and integrates well into both xroot and http data federations.
–EOS is the newer design for a scalable infrastructure intended to minimise management cost. It is ready to meet the requirements of larger sites and multi-site environments, with a larger set of functionalities.

Slide 25: Summary
Storage for High Energy Physics or scientific computing can be:
–Too simplistic
–Simple (typical for a small Tier-2)
–Scalable and reliable (typical for a Tier-1 / large Tier-2)
–Too complex
Behind the software you use, it is always the people that make the difference.

Reserve slides: Areas of development in storage at CERN, for EOS (source: A.J. Peters / CERN)

Slide 27: EOS design
–No need for a central relational database
–Designed for infinite scalability and arbitrary reliability
–Expand/exchange/shrink during operation
–Semi-automatic disk ejection and healing
–Multi-site federation support (Wigner / Geneva, > 1000 km apart)

Slide 28: EOS arbitrary reliability – RAIN (Redundant Array of Independent Nodes)

Slide 29: EOS in production at CERN

Slide 30: EOS in production
–Five multi-PB installations (a few thousand disks per instance)
–Simplified life-cycle management: workflows for ongoing replacement/repair of hardware
–JBOD disks (no RAID controller) using software RAIN
–Low-latency file access with an in-memory namespace (174M files)
–Fine-grained access control and quota management
–Grid storage element

Slide 31: EOS in production (continued)
–Accessed from thousands of CERN-local and remote batch nodes, with Kerberos and GSI authentication
–Full support for the XRootD, GridFTP, HTTP(S) and WebDAV protocols, and mountable with FUSE as a remote file system
A hedged sketch of reading a file over HTTP(S), one of the protocols listed above, is shown below.
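Minimal sketch only: the endpoint URL and the bearer token are placeholders/assumptions; real access at CERN would typically go through Kerberos/GSI or an xroot client such as xrdcp rather than a plain token.

```python
import urllib.request

URL = "https://eos.example.cern.ch/eos/project/demo/sample.root"   # hypothetical HTTP(S)/WebDAV endpoint

request = urllib.request.Request(URL, headers={"Authorization": "Bearer <token>"})
with urllib.request.urlopen(request) as response:    # plain HTTP GET of the file
    first_bytes = response.read(1024)                # stream only the first kilobyte
print(len(first_bytes), "bytes read")
```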