PetaCache: Data Access Unleashed
Tofigh Azemoon, Jacek Becla, Chuck Boeheim, Andy Hanushevsky, David Leith, Randy Melen, Richard P. Mount, Teela Pulliam, William Weeks
Stanford Linear Accelerator Center
September 3, 2007

Outline
Motivation – Is there a Problem?
Economics of Solutions
Practical Steps – Hardware/Software
Some Performance Measurements

Motivation

Storage in Research: Financial and Technical Observations
Storage costs often dominate in research
– CPU per $ has fallen faster than disk space per $ for most of the last 25 years
Accessing data on disks is increasingly difficult
– Transfer rates and access times (per $) are improving more slowly than CPU capacity, storage capacity or network capacity
The following slides are based on equipment and services that I* have bought for data-intensive science
* The WAN services from 1998 onwards were bought by Harvey Newman of Caltech

Price/Performance Evolution: My Experience
[Chart: price/performance trends for CPU, Disk Capacity, WAN, Disk Random Access and Disk Streaming Access]

Price/Performance Evolution: My Experience
[Chart]

Another View
In 1997, $1M bought me:
– a ~200-core CPU farm (~few × 10^8 ops/sec/core), or
– a ~1000-disk storage system (~2 × 10^3 ops/sec/disk)
Today, $1M buys me (you):
– a ~2500-core CPU farm (~few × 10^9 ops/sec/core), or
– a ~2500-disk storage system (~2 × 10^3 ops/sec/disk)
In 5–10 years?
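To make the arithmetic behind this comparison concrete, here is a minimal Python sketch totalling the operations per second each $1M farm delivers; the "few × 10^n" figures are taken as 3 × 10^n purely as an assumption for the calculation.

```python
# Aggregate ops/sec per $1M, using the counts quoted above.
# "few x 10^n" is assumed to be 3 x 10^n here -- an illustrative choice, not a slide value.

def farm_ops(units, ops_per_unit_per_s):
    """Total operations per second delivered by a farm of identical units."""
    return units * ops_per_unit_per_s

farms = {
    "1997 CPU farm  (200 cores  x ~3e8 ops/s)": farm_ops(200, 3e8),
    "1997 disk farm (1000 disks x ~2e3 ops/s)": farm_ops(1000, 2e3),
    "2007 CPU farm  (2500 cores x ~3e9 ops/s)": farm_ops(2500, 3e9),
    "2007 disk farm (2500 disks x ~2e3 ops/s)": farm_ops(2500, 2e3),
}

for name, total in farms.items():
    print(f"{name}: {total:9.1e} ops/s per $1M")

# The CPU-to-disk gap grows from roughly 3e4 (1997) to roughly 1.5e6 (2007):
# random access to disk falls ever further behind the CPUs it must feed.
```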

Impact on Science
Sparse or random access must be derandomized:
– Define, in advance, the interesting subsets of the data
– Filter (skim, stream) the data to instantiate interest-rich subsets
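A hedged sketch of the "filter into interest-rich subsets" step, with a hypothetical event format and selection cut (neither is from the talk):

```python
# Hypothetical skim: copy only pre-defined "interesting" events into a smaller
# file, so later analysis reads a dense subset sequentially instead of making
# sparse, random accesses into the full dataset.
import json

def is_interesting(event):
    # Placeholder selection -- in practice a physics cut agreed on in advance.
    return event.get("missing_energy", 0.0) > 50.0

with open("all_events.jsonl") as full, open("interesting_skim.jsonl", "w") as skim:
    for line in full:
        if is_interesting(json.loads(line)):
            skim.write(line)
```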

Economics of Solutions

Economics of LHC Computing
Difficult to get $10M of additional funding to improve analysis productivity
Easy to re-purpose $10M of computing funds if it would improve analysis productivity

Cost-Effectiveness
DRAM memory:
– $100/gigabyte
– SLAC spends ~12% of its hardware budget on DRAM
Disks (including servers):
– $1/gigabyte
– SLAC spends about 40% of its hardware budget on disk
Flash-based storage (SLAC design):
– $10/gigabyte
– If SLAC had been spending 20% of its hardware budget on Flash, we would have over 100 TB today
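A small sketch of the capacity-per-budget arithmetic, using the $/gigabyte figures above; the annual hardware budget in the sketch is a hypothetical placeholder, not a SLAC number:

```python
# Terabytes purchased per year for each technology, from the $/GB figures above.
BUDGET = 5_000_000  # hypothetical annual hardware budget in dollars (assumption)

media = {
    # name: (dollars per gigabyte, fraction of budget spent on it)
    "DRAM":                  (100.0, 0.12),
    "Disk (incl. servers)":  (1.0,   0.40),
    "Flash (SLAC design)":   (10.0,  0.20),
}

for name, (dollars_per_gb, share) in media.items():
    terabytes = BUDGET * share / dollars_per_gb / 1000
    print(f"{name:22s}: {terabytes:8.1f} TB per year")
```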

Practical Steps
The PetaCache Project

PetaCache Goals
1. Demonstrate a revolutionary but cost-effective new architecture for science data analysis
2. Build and operate a machine that will be well matched to the challenges of SLAC/Stanford science

The PetaCache Story So Far
We (BaBar, HEP) had data-access problems
We thought and investigated:
– Underlying technical issues
– Broader data-access problems in science
We devised a hardware solution:
– We built a DRAM-based prototype
– We validated the efficiency and scalability of our low-level data-access software, xrootd
– We set up a collaboration with SLAC's electronics wizards (Mike Huffer and Gunther Haller) to develop a more cost-effective Flash-based prototype
We saw early on that new strategies and software for data access would also be needed
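For illustration, a minimal read through xrootd using the XRootD Python bindings; the server host and file path below are hypothetical placeholders.

```python
# Minimal xrootd read using the XRootD Python bindings (pyxrootd).
# Host and path below are hypothetical placeholders.
from XRootD import client
from XRootD.client.flags import OpenFlags

f = client.File()
status, _ = f.open("root://dataserver.example.org//store/sample.dat", OpenFlags.READ)
if not status.ok:
    raise RuntimeError(status.message)

# Read 64 kB starting at byte offset 0; xrootd serves the same byte-range
# request whether the data behind the server sits on disk, in DRAM or in Flash.
status, data = f.read(offset=0, size=64 * 1024)
print(len(data), "bytes read")
f.close()
```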

DRAM-Based Prototype Machine (operational early 2005)
[Architecture diagram]
Data servers (PetaCache, MICS + HEP-BaBar funding): 64 nodes, each a Sun V20z with 2 Opteron CPUs and 16 GB memory; 1 TB total memory; Solaris or Linux (mix and match); connected through a Cisco switch
Clients (existing HEP-funded BaBar systems): up to 2000 nodes, each with 2 CPUs and 2 GB memory, running Linux; connected through Cisco switches

DRAM-Based Prototype
[Image of the prototype]

FLASH-Based Prototype
Operational Real Soon Now
– 5 TB of Flash memory
– Fine-grained, high-bandwidth access

[Three slides from Mike Huffer]

Commercial Product
Violin Technologies:
– 100s of GB of DRAM per box (available now)
– TB of Flash per box (available real soon now)
– PCIe hardware interface
– Simple block-level device interface
– DRAM prototype tested at SLAC

Some Performance Measurements

Latency (1): Ideal
Client Application → Memory

Latency (2): Current Reality
Client Application → xrootd data-server client → OS TCP stack → NIC → network switches → NIC → OS TCP stack → xrootd data server → OS file system → Disk

Latency (3): Immediately Practical Goal
[Diagram: the data path of Latency (2), with the xrootd data server serving data from Memory rather than through the OS file system and disk]

DRAM-Based Prototype
[Plot: latency (microseconds) versus data retrieved (bytes)]
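A measurement of this kind can be sketched with a simple timing loop over remote reads of increasing size (same assumed XRootD Python bindings and hypothetical server as above):

```python
# Sketch of a latency-vs-bytes-retrieved measurement against an xrootd server.
# The URL is a hypothetical placeholder; absolute numbers depend on the setup.
import time
from XRootD import client
from XRootD.client.flags import OpenFlags

f = client.File()
f.open("root://dataserver.example.org//store/sample.dat", OpenFlags.READ)

for size in (128, 1024, 8192, 65536, 524288):   # bytes per request
    t0 = time.perf_counter()
    status, data = f.read(offset=0, size=size)
    microseconds = (time.perf_counter() - t0) * 1e6
    print(f"{size:7d} bytes retrieved in {microseconds:8.1f} microseconds")

f.close()
```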

DRAM-Based Prototype: Throughput Measurements
22 processor microseconds per transaction
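The headline number has a direct throughput implication; a one-line check:

```python
# Implication of "22 processor microseconds per transaction":
cpu_us_per_transaction = 22
print(f"~{1e6 / cpu_us_per_transaction:,.0f} transactions/s per fully busy core")
# -> roughly 45,000 transactions per second before the CPU itself becomes the limit.
```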

Throughput Tests
ATLAS AOD Analysis:
– 1 GB file size (a disk can hold 500–1000 of these)
– 59 xrootd client machines (up to 118 cores) performing a top analysis, getting data from one server
– The individual analysis jobs perform sequential access
– Compare the time to completion when the server serves data from its disk with the time taken when it serves data from its memory
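A scaled-down sketch of this comparison: several processes read the same file sequentially, once from a disk-backed path and once from a memory-backed path such as tmpfs. Paths and worker count are placeholders; for a fair disk measurement the page cache should be dropped between runs.

```python
# Scaled-down disk-vs-memory throughput comparison: N workers each read the
# same file sequentially; the time reported is until the last reader finishes.
import time
from multiprocessing import Pool

CHUNK = 1 << 20  # 1 MB sequential reads

def read_sequentially(path):
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass

def time_to_completion(path, workers=8):
    t0 = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(read_sequentially, [path] * workers)
    return time.perf_counter() - t0

if __name__ == "__main__":
    # Placeholder paths: a file on rotating disk and a copy on tmpfs.
    for label, path in (("disk", "/data/aod_1gb.dat"),
                        ("memory", "/dev/shm/aod_1gb.dat")):
        print(f"{label:6s}: {time_to_completion(path):6.1f} s")
```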

DRAM-Based Prototype: ATLAS AOD Analysis

Comments and Outlook
Significant, but not revolutionary, benefits for high-load sequential data analysis – as expected.
Revolutionary benefits expected for pointer-based data analysis – but not yet tested.
The need to access storage in serial mode has become part of the culture of data-intensive science – why design a pointer-based analysis when its performance is certain to be abysmal?
TAG-database-driven analysis?
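As an illustration of what pointer-based access means in practice, a minimal sketch contrasting a full sequential scan with reads driven by a pre-built index of event pointers (the fixed-size records and the index itself are illustrative assumptions):

```python
# Sequential scan vs pointer-based (TAG-driven) access over a file of
# fixed-size event records. Record size and the index are assumptions.
RECORD_SIZE = 4096  # hypothetical bytes per event record

def sequential_scan(path, selector):
    """Read every record and keep those passing the selection."""
    selected = []
    with open(path, "rb") as f:
        while (record := f.read(RECORD_SIZE)):
            if selector(record):
                selected.append(record)
    return selected

def pointer_based(path, record_indices):
    """Read only the records a TAG-like index already flagged as interesting."""
    selected = []
    with open(path, "rb") as f:
        for i in record_indices:
            f.seek(i * RECORD_SIZE)
            selected.append(f.read(RECORD_SIZE))
    return selected

# On spinning disk the seek-heavy pointer-based path is indeed "certain to be
# abysmal"; on a DRAM- or Flash-backed store the random reads cost little more
# than the sequential ones, which is where the revolutionary benefit is expected.
```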