Optimization of Large Scale HEP Data Analysis
A file staging approach for analysis jobs on Stoomboot
Daniela Remenska
B-physics Workshop - June 14, 2010
Approach: day 1
Q: What is the underlying problem?
A: “If you can build a file stager in three weeks from now, that would be perfect!” ?!
Q: What is the underlying problem?
A: We have a perception that the analysis jobs running on Stoomboot are inefficient.
The “Why”?
Why Stoomboot for analysis jobs, and not the Grid?
1. The Grid is not so intuitive for users
2. To test the correctness of algorithms
3. Because you can!
From the archive:
“Hi Dox,
As I understood it, running on stoomboot has been slow because of I/O issues: running is limited by reading speed of the files instead of by processing speed. It was rather busy on stoomboot these last couple of days. As for the tmpdir problem, I guess you should ask the admins to clean out the tmpdirs...?”
Bonnie++: benchmarking file systems
Problem analysis: Profiling Stoomboot
Two basic metrics collected:
- CPU time
- Wallclock time
(CPU efficiency = CPU time / wallclock time)
Number of files: 3
Total data volume: 6 GB (3 x 2 GB)
Number of events:
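A minimal sketch of how both metrics can be collected around a job; run_event_loop() is a hypothetical stand-in for the real DaVinci event loop:

    import os, time

    def run_event_loop():
        # Stand-in for the real analysis; just burns some CPU.
        sum(i * i for i in range(10 ** 7))

    wall0 = time.time()
    cpu0 = sum(os.times()[:2])          # user + system CPU used so far
    run_event_loop()
    cpu = sum(os.times()[:2]) - cpu0    # CPU time of the loop
    wall = time.time() - wall0          # wallclock time of the loop
    print("CPU efficiency: %.1f%%" % (100.0 * cpu / wall))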
Results: sequential access with DaVinci

File location     | CPU time [min] | Main EventLoop time [min] | Wallclock time [min] | CPU efficiency
grid2.fe.infn.it  |                |                           |                      | %
tbn18.nikhef.nl   |                |                           |                      | %
nfs partition     |                |                           |                      | %
local storage     |                |                           |                      | %
Latency?
-bash-3.2$ ping tbn18.nikhef.nl
PING tbn18.nikhef.nl ( ) 56(84) bytes of data.
64 bytes from tbn18.nikhef.nl ( ): icmp_seq=1 ttl=61 time=0.478 ms
--- tbn18.nikhef.nl ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.189/0.285/0.478/0.137 ms

-bash-3.2$ ping grid2.fe.infn.it
PING grid2.fe.infn.it ( ) 56(84) bytes of data.
64 bytes from grid2.fe.infn.it ( ): icmp_seq=1 ttl=53 time=26.1 ms
64 bytes from grid2.fe.infn.it ( ): icmp_seq=2 ttl=53 time=26.0 ms
--- grid2.fe.infn.it ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = /26.047/26.100/0.070 ms
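A back-of-the-envelope look at what this ~100x round-trip difference can cost a job that reads synchronously; the RTTs come from the ping output above, the number of reads is an illustrative assumption:

    rtt_local = 0.0003    # s, ~0.3 ms to tbn18.nikhef.nl (ping above)
    rtt_remote = 0.026    # s, ~26 ms to grid2.fe.infn.it (ping above)
    n_reads = 10000       # assumed synchronous read round trips per job

    print("local stall:  ~%.0f s" % (n_reads * rtt_local))    # ~3 s
    print("remote stall: ~%.0f s" % (n_reads * rtt_remote))   # ~260 s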
The stager approach: staging all files before the job starts
Advantage: the service (file access) is closer to the user
Drawback: storage on Stoomboot is not sufficient to keep all data for analysis jobs
The stager approach: staging and removing files one after another
Advantage: smaller storage demands
Drawback: the application blocks on I/O, so wallclock time is not reduced
The stager approach: prefetching data and overlapping CPU and I/O
Advantage: wallclock time significantly reduced
Drawback: the job is blocked only at the beginning, while the first file is staged
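A minimal sketch of this third variant, assuming hypothetical copy_remote() and process_file() helpers standing in for the real transfer tool (e.g. rfcp/lcg-cp) and the DaVinci step; a bounded queue keeps the stager a fixed number of files ahead:

    import os, queue, threading

    STAGE_DIR = "/tmp/stage"      # assumed local scratch area on the worker node

    def copy_remote(lfn, dest):
        # Placeholder for the real transfer command; here it only
        # creates an empty file so the sketch runs end to end.
        open(dest, "wb").close()

    def process_file(path):
        # Placeholder for the analysis step reading one staged file.
        pass

    def stager(lfns, staged):
        # Producer: downloads the next file(s) while the consumer computes.
        for lfn in lfns:
            dest = os.path.join(STAGE_DIR, os.path.basename(lfn))
            copy_remote(lfn, dest)
            staged.put(dest)      # blocks once the queue is full
        staged.put(None)          # sentinel: no more files

    def run(lfns):
        os.makedirs(STAGE_DIR, exist_ok=True)
        staged = queue.Queue(maxsize=2)   # stager runs at most 2 files ahead
        threading.Thread(target=stager, args=(lfns, staged), daemon=True).start()
        while True:
            path = staged.get()
            if path is None:
                break
            process_file(path)    # CPU work here overlaps the next transfer
            os.remove(path)       # free local disk immediately after use

    if __name__ == "__main__":
        run(["lfn:/grid/lhcb/data/file_%d.dst" % i for i in range(3)])

With the queue bound set to 2, only a couple of staged files occupy local disk at any moment, which is what keeps the storage demands small while still hiding the transfer time.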
Demo: performance evaluation
- “Feels” like running over local files
- The only “extra” time is due to the staging of the first file
- ~60% overhead of data transfer with rfio (see the worked check below)
- The file stager is insensitive to data locality

Run 1:
                  Stager demo | No stager used (rfio access)
Wallclock time    12 min      | 150 min
CPU time          8.51 min    | 8.57 min
CPU efficiency    70.9%       | 5.7%
Total transfer    8 GB        | 13.1 GB

Run 2:
                  Stager demo | No stager used (rfio access)
Wallclock time    11 min      | 18 min
CPU time          8.35 min    | 11.8 min
CPU efficiency    75.9%       | 65.5%
Total transfer    8 GB        | 13 GB
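A quick check of the quoted overhead figure against the demo numbers:

    useful = 8.0    # GB the job actually needs (staged case)
    rfio = 13.1     # GB moved when reading directly over rfio (run 1)
    print("overhead: %.0f%%" % (100.0 * (rfio / useful - 1)))   # ~64%, quoted as ~60%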
Design of the solution
Open questions for users
- Back-of-the-envelope calculations: what is the processing time of an event?
- Expectations: when is the “optimization” sufficient?
- User friendliness: frustrations?
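As an illustration of the first question, a sketch with an assumed event count (the slides do not give one); only the CPU time comes from the demo results above:

    cpu_min = 8.5        # CPU time of one demo job, from the results above
    n_events = 100000    # assumed event count; NOT taken from the slides
    print("~%.2f ms CPU per event" % (cpu_min * 60.0 * 1000.0 / n_events))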
Stoomboot HW/SW
- 32 worker nodes, each dual quad-core Intel Xeon 2 GHz; 2/3 GB memory/core
- Local disk space ~100 GB
- Scientific Linux CERN 5
- 1 Gbps/10 Gbps network
- Outside users have no access to Stoomboot, but grid files need to be accessed from it