Lessons Learned from Managing a Petabyte
Jacek Becla, Stanford Linear Accelerator Center (SLAC)
Daniel Wang, now University of California, Irvine; formerly SLAC

Slide 2 of 18 (CIDR'05, Asilomar, CA): Roadmap
- Who we are
- Simplified data processing
- Core architecture and migration
- Challenges/surprises/problems
- Summary
Don't miss the "lessons" - just look for the yellow stickers.

Slide 3 of 18 (CIDR'05, Asilomar, CA): Who We Are
- Stanford Linear Accelerator Center
  - DoE National Lab, operated by Stanford University
- BaBar
  - one of the largest High Energy Physics (HEP) experiments online
  - in production since 1999
  - over a petabyte of production data
- HEP
  - data-intensive science
  - statistical studies
  - needle-in-a-haystack searches

Slide 4 of 18 (CIDR'05, Asilomar, CA): Simplified Data Processing

Slide 5 of 18 (CIDR'05, Asilomar, CA): A Typical Day in Life (SLAC only)
- ~8 TB accessed in ~100K files (see the back-of-envelope averages below)
- ~7 TB in/out of tertiary storage
- 2-5 TB in/out of SLAC
- ~35K jobs completed
  - 2,500 running at any given time
  - many long-running jobs (up to a few days)
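
To put these daily numbers in perspective, a quick back-of-envelope calculation helps; the inputs are the approximate figures from the slide and the derived averages are illustrative only (a sketch in C++):

    #include <cstdio>

    // Back-of-envelope averages derived from the approximate daily numbers above.
    // All inputs are rough; results are order-of-magnitude only.
    int main() {
        const double bytesAccessed = 8e12;     // ~8 TB read per day at SLAC
        const double filesAccessed = 1e5;      // ~100K files touched per day
        const double secondsPerDay = 86400.0;
        const double jobsPerDay    = 35000.0;

        std::printf("avg file size      : %.0f MB\n", bytesAccessed / filesAccessed / 1e6);
        std::printf("sustained read rate: %.0f MB/s (averaged over 24h)\n",
                    bytesAccessed / secondsPerDay / 1e6);
        std::printf("avg data per job   : %.0f MB\n", bytesAccessed / jobsPerDay / 1e6);
        return 0;
    }

In round numbers that is a sustained ~90 MB/s of reads around the clock, an average accessed file size of ~80 MB, and a couple of hundred MB read per job.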

Slide 6 of 18 (CIDR'05, Asilomar, CA): Some of the Data-related Challenges
- Finding the perfect snowflake(s) in an avalanche
- Volume -> organizing data
- Dealing with I/O (a clustering sketch follows below)
  - sparse reads
  - random access
  - small object size: on the order of 100 bytes
- Providing data for many tens of sites
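
A common mitigation for objects of roughly 100 bytes combined with sparse, random reads is to cluster many small objects into large, sequentially laid-out containers, so that each disk seek amortizes over tens of thousands of objects. The following is a minimal illustrative sketch of that idea; the class names and container size are hypothetical and do not describe BaBar's actual persistency layout.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical sketch: pack many ~100-byte event objects into a fixed-size
    // container so that one sequential read fetches tens of thousands of objects,
    // instead of paying one random seek per object.
    struct SmallObject {              // ~100 bytes of payload, illustrative only
        std::uint64_t id;
        char payload[96];
    };

    class Container {                 // one unit of disk I/O, e.g. 8 MB
    public:
        static constexpr std::size_t kCapacityBytes = 8 * 1024 * 1024;

        bool append(const SmallObject& obj) {
            if ((objects_.size() + 1) * sizeof(SmallObject) > kCapacityBytes)
                return false;         // container full: flush it and start a new one
            objects_.push_back(obj);
            return true;
        }
        std::size_t count() const { return objects_.size(); }

    private:
        std::vector<SmallObject> objects_;
    };

    int main() {
        Container c;
        SmallObject o{};
        while (c.append(o)) { ++o.id; }
        // One 8 MB sequential read now delivers roughly 80K objects per seek.
        return c.count() > 0 ? 0 : 1;
    }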

Slide 7 of 18 (CIDR'05, Asilomar, CA): More Challenges... Data Distribution
- ~25 sites worldwide produce data
  - many more use it
- Distribution pros/cons
  (+) keeps data close to users
  (-) makes administration tougher
  (+) works as a backup
Lesson: Kill two birds with one stone - replicate for availability as well as backup.

Slide 8 of 18 (CIDR'05, Asilomar, CA): Core Architecture
- Mass storage (HPSS)
  - tapes are cost-effective and more reliable than disks
- 160 TB disk cache, 40+ data servers
- Database engine: ODBMS (Objectivity/DB)
  - scalable thin-dataserver / thick-client architecture
  - gives full control over data placement and clustering
  - the ODBMS was later replaced by a system built within HEP
- DB-related code hidden behind a transient-persistent wrapper (see the sketch below)
Lesson: Consider all factors when choosing software and hardware.
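
The transient-persistent wrapper mentioned above is what made the later migration away from the ODBMS tractable: application code only ever sees transient C++ objects plus an abstract store interface, so the storage backend can be swapped underneath it. The following is a minimal sketch of the idea; the interface and class names are hypothetical and not BaBar's actual API.

    #include <cstdint>
    #include <memory>
    #include <string>
    #include <unordered_map>

    // Transient object: what the physics code manipulates. No persistency details.
    struct Event {
        std::uint64_t id;
        std::string   payload;
    };

    // Abstract persistency interface: the only thing application code links against.
    class EventStore {
    public:
        virtual ~EventStore() = default;
        virtual void write(const Event& e) = 0;
        virtual bool read(std::uint64_t id, Event& out) = 0;
    };

    // One concrete backend (a trivial in-memory stand-in). In the real system this
    // slot was first filled by an Objectivity/DB-based backend and later by the
    // HEP-built replacement, without touching the code above the interface.
    class InMemoryStore : public EventStore {
    public:
        void write(const Event& e) override { data_[e.id] = e; }
        bool read(std::uint64_t id, Event& out) override {
            auto it = data_.find(id);
            if (it == data_.end()) return false;
            out = it->second;
            return true;
        }
    private:
        std::unordered_map<std::uint64_t, Event> data_;
    };

    int main() {
        std::unique_ptr<EventStore> store = std::make_unique<InMemoryStore>();
        store->write({42, "raw detector data"});
        Event e;
        return store->read(42, e) ? 0 : 1;
    }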

Slide 9 of 18 (CIDR'05, Asilomar, CA): Reasons to Migrate
- ODBMS is not mainstream
  - true for HEP and elsewhere
  - uncertain long-term future
- Locked into certain OSes/compilers
- Unnecessary DB overhead
  - e.g. transactions for immutable data
- Maintenance burden at small institutes
- Monetary cost
Lesson: Build a flexible system and be prepared for non-trivial changes. Bet on simplicity.

Slide 10 of 18 (CIDR'05, Asilomar, CA): xrootd Data Server
- Developed in-house
  - now becoming the de facto HEP standard
- Numerous must-have features, some hard to add to the commercial server
  - deferral
  - redirection (sketched below)
  - fault tolerance
  - scalability
  - automatic load balancing
  - proxy server
Lesson: Larger systems depend more heavily on automation.
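
Redirection with automatic load balancing is the key pattern: a client never talks to a fixed data server; it asks a redirector which server currently holds (and can best serve) a file, and fails over when a server goes down. The sketch below is an illustrative in-memory model of that pattern only; it is not the actual xrootd protocol or API.

    #include <map>
    #include <optional>
    #include <set>
    #include <string>
    #include <vector>

    // Illustrative redirector: maps files to candidate data servers and picks the
    // least-loaded live one. The real xrootd server is far richer (deferral,
    // proxying, clustering); this models only redirection, load balancing, failover.
    class Redirector {
    public:
        void addReplica(const std::string& file, const std::string& server) {
            replicas_[file].push_back(server);
        }
        void markDown(const std::string& server) { down_.insert(server); }
        void reportLoad(const std::string& server, int load) { load_[server] = load; }

        // Returns the server the client should contact, or nothing if no replica
        // is currently reachable (a real system could stall the client instead).
        std::optional<std::string> redirect(const std::string& file) const {
            auto it = replicas_.find(file);
            if (it == replicas_.end()) return std::nullopt;
            std::optional<std::string> best;
            int bestLoad = 0;
            for (const auto& s : it->second) {
                if (down_.count(s)) continue;              // skip failed servers
                int l = load_.count(s) ? load_.at(s) : 0;
                if (!best || l < bestLoad) { best = s; bestLoad = l; }
            }
            return best;
        }

    private:
        std::map<std::string, std::vector<std::string>> replicas_;
        std::set<std::string> down_;
        std::map<std::string, int> load_;
    };

    int main() {
        Redirector r;
        r.addReplica("/store/run1234.root", "serverA");
        r.addReplica("/store/run1234.root", "serverB");
        r.reportLoad("serverA", 10);
        r.reportLoad("serverB", 2);
        auto s = r.redirect("/store/run1234.root");        // -> serverB (least loaded)
        r.markDown("serverB");
        s = r.redirect("/store/run1234.root");             // -> serverA (failover)
        return s ? 0 : 1;
    }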

Slide 11 of 18 (CIDR'05, Asilomar, CA): More Lessons... Challenges, Surprises, Problems
- Organizing and managing data
  - Divide into mutable and immutable, and separate queryable data (see the sketch below)
    -> immutable data is easier to optimize, replicate, and scale
  - Decentralize metadata updates
    -> contention happens in unexpected places
    -> decentralization makes data management harder
    -> some centralization is still needed
- Fault tolerance
  - large system -> likely to use commodity hardware -> fault tolerance essential
Lesson: A single technology is likely not enough to efficiently manage petabytes.
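
The mutable/immutable split pays off because the immutable bulk data can be replicated, cached, and read concurrently with no coordination at all, while locking is confined to a small mutable catalog. A toy illustration of the split follows; the types and names are hypothetical, not the actual BaBar bookkeeping.

    #include <mutex>
    #include <stdexcept>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Immutable bulk data: written once, then sealed. Once sealed it can be
    // replicated and read concurrently without any locking.
    class ImmutableCollection {
    public:
        void append(std::string record) {
            if (sealed_) throw std::logic_error("collection is sealed");
            records_.push_back(std::move(record));
        }
        void seal() { sealed_ = true; }
        bool sealed() const { return sealed_; }
        const std::vector<std::string>& records() const { return records_; }

    private:
        std::vector<std::string> records_;
        bool sealed_ = false;
    };

    // Mutable, queryable metadata: small, and the only place where update
    // contention can occur, so it is the part that needs locking and care.
    class Catalog {
    public:
        void registerCollection(const std::string& name, const std::string& site) {
            std::lock_guard<std::mutex> lock(mutex_);
            location_[name] = site;
        }
        std::string lookup(const std::string& name) {
            std::lock_guard<std::mutex> lock(mutex_);
            return location_.at(name);
        }

    private:
        std::mutex mutex_;
        std::unordered_map<std::string, std::string> location_;
    };

    int main() {
        ImmutableCollection run;
        run.append("event 1");
        run.seal();                            // from now on: replicate freely
        Catalog catalog;
        catalog.registerCollection("run-0001", "slac");
        return catalog.lookup("run-0001") == "slac" ? 0 : 1;
    }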

Slide 12 of 18 (CIDR'05, Asilomar, CA): Challenges, Surprises, Problems (cont.)
- Main bottleneck: disk I/O
  - the underlying persistency layer matters less than one would expect
  - access patterns matter more
    -> must understand them to derandomize I/O
- Job management/bookkeeping
  - better to stall jobs than to kill them (see the sketch below)
- Power, cooling, floor weight
- Administration
Lesson: Hide disruptive events by stalling data flow.
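
Stalling rather than failing means that when a file is temporarily unavailable (server restart, tape stage-in, maintenance) the client library waits and retries instead of surfacing an error that would kill a multi-day job. A minimal client-side sketch of that behaviour follows; openOnServer is a hypothetical stand-in for a real data-access call, here simulated to succeed on the third attempt.

    #include <chrono>
    #include <optional>
    #include <string>
    #include <thread>

    // Result of a single open attempt against a data server.
    struct OpenResult {
        bool ok;
        int  stallSeconds;   // server says: "try again in N seconds"
        int  handle;
    };

    // Hypothetical stand-in for the real data-server call; succeeds on the third
    // attempt to simulate a file being staged in from tape.
    OpenResult openOnServer(const std::string& /*path*/) {
        static int attempts = 0;
        if (++attempts < 3) return {false, 1, -1};
        return {true, 0, 7};
    }

    // Keep the job alive by honouring stall requests instead of failing. A cap on
    // the total wait keeps a truly lost file from stalling a job forever.
    std::optional<int> openWithStall(const std::string& path, int maxWaitSeconds) {
        int waited = 0;
        while (true) {
            OpenResult r = openOnServer(path);
            if (r.ok) return r.handle;
            if (waited + r.stallSeconds > maxWaitSeconds) return std::nullopt;
            std::this_thread::sleep_for(std::chrono::seconds(r.stallSeconds));
            waited += r.stallSeconds;
        }
    }

    int main() {
        auto handle = openWithStall("/store/run1234.root", 3600);
        return handle ? 0 : 1;
    }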

Slide 13 of 18 (CIDR'05, Asilomar, CA): On the Bleeding Edge Since Day 1
- A huge collection of interesting challenges...
  - increasing address space
  - improving server code
  - tuning and scaling the whole system
  - reducing lock collisions
  - improving I/O
  - ...many others
- In summary
  - we made it work (a big success), but...
  - continuous improvements were needed for the first several years to keep up
Lesson: When you push limits, expect many problems everywhere. Normal maxima are too small. Observe -> refine -> repeat.

Slide 14 of 18 (CIDR'05, Asilomar, CA): Uniqueness of... the Scientific Community
- Hard to convince the scientific community to use commercial products
  - BaBar: 5+ million lines of home-grown, complex C++
- Continuously looking for better approaches
  - the system has to be very flexible
- Most data is immutable
- Many smart people who can build almost anything
Lesson: The specific needs of your community can impact everything, including the system architecture.

Slide 15 of 18 (CIDR'05, Asilomar, CA): DB-related Effort
- ~4-5 core DB developers since 1996
  - effort augmented by many physicists, students, and visitors
- 3 DBAs from the start of production until recently
  - fewer than 3 now -> the system is finally automated and fault tolerant
Lesson: Automation is the key to a low-maintenance, fault-tolerant system.

Slide 16 of 18 (CIDR'05, Asilomar, CA): Lessons Summary
- Kill two birds with one stone: replicate for availability as well as backup.
- Consider all factors when choosing software and hardware.
- When you push limits, expect many problems everywhere. Normal maxima are too small. Observe -> refine -> repeat.
- The specific needs of your community can impact everything, including the system architecture.
- Automation is the key to a low-maintenance, fault-tolerant system.
- Larger systems depend more heavily on automation.
- Hide disruptive events by stalling data flow.
- A single technology is likely not enough to efficiently manage petabytes.
- Organize data (mutable, immutable, queryable, ...).
- Build a flexible system and be prepared for non-trivial changes. Bet on simplicity.

Slide 17 of 18 (CIDR'05, Asilomar, CA): Petabyte Frontier - just a few highlights...
- How to cost-effectively back up a PB?
- How to provide fault tolerance with 1000s of disks?
  - RAID 5 is not good enough (see the back-of-envelope calculation below)
- How to build a low-maintenance system?
  - "1 full-time person per 1 TB" does not scale
- How to store the data? (tape, anyone?)
  - consider all factors: cost, power, cooling, robustness
- ...YES, there are "new" problems beyond "known problems scaled up"
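
Why RAID 5 stops being good enough at this scale can be seen with a back-of-envelope calculation: during a rebuild every surviving disk must be read end to end, and a single unrecoverable read error (URE) during that read loses the array. The sketch below uses illustrative assumptions (disk size and URE rate), not measurements from the BaBar system.

    #include <cmath>
    #include <cstdio>

    // Back-of-envelope: probability that a RAID-5 rebuild hits at least one
    // unrecoverable read error. All inputs are illustrative assumptions.
    int main() {
        const double diskBytes  = 400e9;    // assumed 400 GB disks (mid-2000s era)
        const int    disksInSet = 8;        // assumed 7+1 RAID-5 set
        const double urePerBit  = 1e-14;    // commonly quoted spec: 1 error per 1e14 bits

        // A rebuild must read every surviving disk in full.
        const double bitsRead      = diskBytes * 8.0 * (disksInSet - 1);
        const double pRebuildFails = 1.0 - std::exp(-urePerBit * bitsRead);

        std::printf("bits read during rebuild: %.2e\n", bitsRead);
        std::printf("P(rebuild hits a URE)   : %.0f%%\n", 100.0 * pRebuildFails);
        return 0;
    }

With these assumptions roughly one rebuild in five fails outright, and a system with thousands of disks is rebuilding somewhere most of the time, which is why stronger redundancy (and cross-site replication) is needed.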

Slide 18 of 18 (CIDR'05, Asilomar, CA): The Summary
- A great success
  - the ODBMS-based system, the migration, and the 2nd-generation system
  - some DoD projects are being built on ODBMS
- Lots of useful experience with managing (very) large datasets
  - would not be able to achieve all that with any RDBMS (today)
  - the thin-server / thick-client architecture works well
  - starting to help astronomers (LSST) manage their petabytes