Storage for Run 3
Rainer Schwemmer, LHCb Computing Workshop 2015

Current Situation

daqarea
– Mix of 2 and 4 TB disks
– 340 TB total
– Max sustained read or write: ~7 GB/s
– Mixed read and write: ~3 GB/s

farm
– Mix of 2 and 4 TB disks
– HLT1 reduces the data rate by a factor of approximately 4 (250 kHz)
– Writing at ~7 MB/s: no problem for current-generation disk drives
– Min write and read: ~60 MB/s (1)
– Concurrent read/write: ~20 MB/s

(1) I know that disks can do 150 MB/s, but this is only true on the very outside of the platters.
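To make the mixed-workload issue concrete, here is a minimal Python sketch using only the figures quoted above; the variable names and the "penalty" framing are illustrative, not taken from the slides.

```python
# Throughput figures quoted on the slide (bytes per second).
GB = 1e9
MB = 1e6

daqarea = {"sequential": 7 * GB, "mixed": 3 * GB}      # whole array
farm_disk = {"sequential": 60 * MB, "mixed": 20 * MB}  # single drive, inner tracks

def mixed_penalty(dev):
    """Factor by which concurrent read/write reduces usable bandwidth."""
    return dev["sequential"] / dev["mixed"]

print(f"daqarea mixed-workload penalty:   x{mixed_penalty(daqarea):.1f}")
print(f"farm disk mixed-workload penalty: x{mixed_penalty(farm_disk):.1f}")
# -> roughly x2.3 for the array and x3 for an individual drive
```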

Problems with the current system

daqarea
– File fragmentation due to an excessive number of parallel streams O( )
– Throughput drops below what is needed once the system is more than 70% full
– Have to write all data once (1 GB/s)
– Need to read all data twice (+2 GB/s) for verification/checksums and the CASTOR copy
– System needs to be severely overdesigned to cope with the mixed read/write workload

Farm
– Every node has its own, individual file system
  → Every farm node processes runs at its own pace
  → Which leads to the fragmentation problem in the daqarea
– Failing drives/farm nodes: every failing node delays processing of all the runs it holds files of
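A small back-of-the-envelope sketch (Python, using only the rates quoted on this slide) of why the array must be overdesigned: the steady-state write-once/read-twice load already saturates the measured mixed read/write limit.

```python
GB = 1e9

write_stream = 1 * GB        # one copy of the data written (slide figure)
read_streams = 2 * GB        # verification/checksums + CASTOR copy (slide figure)
required_mixed = write_stream + read_streams   # reads and writes always overlap

array_mixed_limit = 3 * GB   # measured mixed read/write limit of the current daqarea

headroom = array_mixed_limit / required_mixed
print(f"steady-state load: {required_mixed/GB:.0f} GB/s of mixed I/O")
print(f"headroom over the mixed limit: x{headroom:.1f}")
# -> x1.0: the array runs at its mixed-workload limit, so any degradation
#    (fragmentation, >70% fill level, a failed drive) immediately backs up the DAQ.
```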

Future

Farm

HLT1 output rate is estimated at 1 MHz → 100 GB/s instead of the current 13 GB/s
Need to scale everything up by about a factor of 10!

If the size of the farm is comparable:
– 4 TB disks → 40 TB disks
– 7 MB/s → 70 MB/s, plus 70 MB/s of concurrent reading
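The same scaling spelled out as a hedged sketch, assuming the farm size stays roughly constant as stated above; the shortfall figure compares the required per-node I/O against the ~20 MB/s concurrent read/write measured on today's farm disks (earlier slide).

```python
MB, GB, TB = 1e6, 1e9, 1e12

# Run 3 estimate from the slide: HLT1 output at 1 MHz -> 100 GB/s (vs. 13 GB/s today).
scale = (100 * GB) / (13 * GB)            # ~7.7, i.e. "about a factor of 10"

# Per-node requirements if the farm stays roughly the same size (slide figures):
capacity_needed = 40 * TB                 # 4 TB -> 40 TB per node
write_needed = 70 * MB                    # 7 MB/s -> 70 MB/s
read_needed = 70 * MB                     # concurrent reading at the same rate

# Measured today on a single farm drive (Current Situation slide):
concurrent_today = 20 * MB

shortfall = (write_needed + read_needed) / concurrent_today
print(f"rate scale factor:            x{scale:.1f}")
print(f"per-node capacity required:   {capacity_needed/TB:.0f} TB")
print(f"per-node mixed I/O required:  {(write_needed + read_needed)/MB:.0f} MB/s")
print(f"shortfall vs. a single drive: x{shortfall:.0f}")
# -> a single local drive is roughly a factor of 7 short of the Run 3 requirement
```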

Kryder’s law

Farm cont.

Biggest drives on the market today: 8 TB
– Will most certainly not increase by 5x over the next 3-4 years
Disk throughput is already not increasing at the rate of capacity
– 70 MB/s mixed read/write is certainly not possible
→ Depending on the farm size (< ~8000 nodes), we will not be able to store the HLT1 output locally anymore

The current model of individual, local file systems per node is, in my opinion, already not sustainable:
– Too much manual intervention
– If we do continue with local storage in the farm nodes, we need a better system for securing data against node failure
– Processing of runs needs to become more synchronized
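A rough sketch of both sides of the argument. The ~16% per year capacity growth is my assumption (a typical post-Kryder-slowdown figure), not a number from the slides, and the one-data-drive-per-node model is likewise an assumption used only to reproduce the order of magnitude of the ~8000-node threshold.

```python
MB, GB, TB = 1e6, 1e9, 1e12

# Capacity side: will the biggest drive reach 40 TB in 3-4 years?
largest_today = 8 * TB
growth_per_year = 1.16       # assumed ~16%/yr capacity growth (my assumption)
for years in (3, 4):
    projected = largest_today * growth_per_year ** years
    print(f"largest drive after {years} years: ~{projected/TB:.0f} TB (need 40 TB)")

# Throughput side: minimum farm size for purely local HLT1 output storage,
# assuming one data drive per node sustaining ~20-25 MB/s of concurrent read/write.
hlt1_output = 100 * GB
for per_node_mixed in (20 * MB, 25 * MB):
    nodes_needed = 2 * hlt1_output / per_node_mixed   # write + later read-back
    print(f"nodes needed at {per_node_mixed/MB:.0f} MB/s mixed: ~{nodes_needed:.0f}")
# -> of order 8,000-10,000 nodes, consistent with the "< ~8000 nodes" remark above
```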

daqarea

Projected output rate (at 100 kHz): 10 GB/s
– Need to read the data at least twice and write it once
  → 30 GB/s minimum
  → Need O( ) GB/s of aggregated disk performance

An individual disk might get to MB/s
– Need O(3000) drives for throughput reasons alone
– Currently: 140
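The per-drive throughput figure is lost in the transcript, so the sketch below simply scans a plausible 10-20 MB/s mixed read/write range to show how the O(3000)-drive estimate comes about; the exact value assumed is mine.

```python
MB, GB = 1e6, 1e9

output_rate = 10 * GB                # projected daqarea input at 100 kHz (slide figure)
aggregate_needed = 3 * output_rate   # write once + read at least twice = 30 GB/s

# Assume a per-drive mixed read/write throughput somewhere in the 10-20 MB/s range.
for per_drive in (10 * MB, 20 * MB):
    drives = aggregate_needed / per_drive
    print(f"at {per_drive/MB:.0f} MB/s per drive: ~{drives:.0f} drives")
# -> O(1500-3000) drives, compared with the ~140 in the current daqarea
```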

Discussion

We need to overcome the storage gap created by the slowdown in disk capacity growth.

It might be worth looking into common storage for deferred data and output data:
– Requirements are similar
– Everything is on the surface in the future anyway
– It seems very unlikely that we can stuff all the deferred data into the farm nodes

Possibly have individual “small” storage clusters at the sub-farm level for deferred and output data.

Look into rotated reading/writing à la ALICE to cut down on the overdesign needed for mixed read/write rates.
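A back-of-the-envelope sketch of what "rotated" reading/writing buys: instead of interleaving reads and writes (which drops the current array to its 3 GB/s mixed rate), the storage alternates between write-only and read-only windows that each run at the 7 GB/s sequential rate. The 1/3-write, 2/3-read split mirrors the write-once/read-twice pattern from the earlier slides; the schedule itself is an illustration, not the actual ALICE scheme.

```python
GB = 1e9

sequential_rate = 7 * GB   # what the current array sustains for pure reads or writes
mixed_rate = 3 * GB        # what it sustains when reads and writes are interleaved

# Mixed mode: 1 part write + 2 parts read share the 3 GB/s mixed budget.
mixed_write = mixed_rate * 1 / 3
mixed_read = mixed_rate * 2 / 3

# Rotated mode: dedicate 1/3 of the time to write-only windows and 2/3 to
# read-only windows, each running at the full sequential rate.
rotated_write = sequential_rate * 1 / 3
rotated_read = sequential_rate * 2 / 3

print(f"mixed:   write {mixed_write/GB:.1f} GB/s, read {mixed_read/GB:.1f} GB/s")
print(f"rotated: write {rotated_write/GB:.1f} GB/s, read {rotated_read/GB:.1f} GB/s")
# -> the same hardware delivers ~2.3x more useful bandwidth when reads and
#    writes are separated in time, at the cost of buffering during each window.
```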