I/O aspects for parallel event processing frameworks Workshop on Concurrency in the many-Cores Era Peter van Gemmeren (Argonne/ATLAS)

Slides:



Advertisements
Similar presentations
D u k e S y s t e m s Time, clocks, and consistency and the JMM Jeff Chase Duke University.
Advertisements

WHAT IS AN OPERATING SYSTEM? An interface between users and hardware - an environment "architecture ” Allows convenient usage; hides the tedious stuff.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
External Sorting “There it was, hidden in alphabetical order.” Rita Holt R&G Chapter 13.
1: Operating Systems Overview
External Sorting R & G Chapter 13 One of the advantages of being
External Sorting R & G Chapter 11 One of the advantages of being disorderly is that one is constantly making exciting discoveries. A. A. Milne.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
External Sorting 198:541. Why Sort?  A classic problem in computer science!  Data requested in sorted order e.g., find students in increasing gpa order.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 Input/Output Chapter 3 TOPICS Principles of I/O hardware Principles of I/O software I/O software layers Disks Clocks Reference: Operating Systems Design.
MultiJob PanDA Pilot Oleynik Danila 28/05/2015. Overview Initial PanDA pilot concept & HPC Motivation PanDA Pilot workflow at nutshell MultiJob Pilot.
External Sorting Chapter 13.. Why Sort? A classic problem in computer science! Data requested in sorted order  e.g., find students in increasing gpa.
PROOF: the Parallel ROOT Facility Scheduling and Load-balancing ACAT 2007 Jan Iwaszkiewicz ¹ ² Gerardo Ganis ¹ Fons Rademakers ¹ ¹ CERN PH/SFT ² University.
XAOD persistence considerations: size, versioning, schema evolution Joint US ATLAS Software and Computing / Physics Support Meeting Peter van Gemmeren.
Wahid Bhimji University of Edinburgh J. Cranshaw, P. van Gemmeren, D. Malon, R. D. Schaffer, and I. Vukotic On behalf of the ATLAS collaboration CHEP 2012.
LOGO OPERATING SYSTEM Dalia AL-Dabbagh
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Principles of I/0 hardware.
ATLAS Peter van Gemmeren (ANL) for many. xAOD  Replaces AOD and DPD –Single, EDM – no T/P conversion when writing, but versioning Transient EDM is typedef.
 Optimization and usage of D3PD Ilija Vukotic CAF - PAF 19 April 2011 Lyon.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Analysis of the ROOT Persistence I/O Memory Footprint in LHCb Ivan Valenčík Supervisor Markus Frank 19 th September 2012.
Introduction Advantages/ disadvantages Code examples Speed Summary Running on the AOD Analysis Platforms 1/11/2007 Andrew Mehta.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13.
ROOT and Federated Data Stores What Features We Would Like Fons Rademakers CERN CC-IN2P3, Nov, 2011, Lyon, France.
Optimizing CMS Data Formats for Analysis Peerut Boonchokchuay August 11 th,
1: Operating Systems Overview 1 Jerry Breecher Fall, 2004 CLARK UNIVERSITY CS215 OPERATING SYSTEMS OVERVIEW.
Processor Architecture
1 Computer Systems II Introduction to Processes. 2 First Two Major Computer System Evolution Steps Led to the idea of multiprogramming (multiple concurrent.
Chapter 5 Input/Output 5.1 Principles of I/O hardware
Predrag Buncic Future IT challenges for ALICE Technical Workshop November 6, 2015.
Slide 1/29 Informed Prefetching in ROOT Leandro Franco 23 June 2006 ROOT Team Meeting CERN.
I/O Strategies for Multicore Processing in ATLAS P van Gemmeren 1, S Binet 2, P Calafiura 3, W Lavrijsen 3, D Malon 1 and V Tsulaia 3 on behalf of the.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapters 13: 13.1—13.5.
CPSC Why do we need Sorting? 2.Complexities of few sorting algorithms ? 3.2-Way Sort 1.2-way external merge sort 2.Cost associated with external.
ATLAS Distributed Computing perspectives for Run-2 Simone Campana CERN-IT/SDC on behalf of ADC.
Handling of T1D0 in CCRC’08 Tier-0 data handling Tier-1 data handling Experiment data handling Reprocessing Recalling files from tape Tier-0 data handling,
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
Multimedia Retrieval Architecture Electrical Communication Engineering, Indian Institute of Science, Bangalore – , India Multimedia Retrieval Architecture.
External Sorting. Why Sort? A classic problem in computer science! Data requested in sorted order –e.g., find students in increasing gpa order Sorting.
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
Improving Performance using the LINUX IO Scheduler Shaun de Witt STFC ISGC2016.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
I/O and Metadata Jack Cranshaw Argonne National Laboratory November 9, ATLAS Core Software TIM.
 IO performance of ATLAS data formats Ilija Vukotic for ATLAS collaboration CHEP October 2010 Taipei.
Buffering Techniques Greg Stitt ECE Department University of Florida.
Mini-Workshop on multi-core joint project Peter van Gemmeren (ANL) I/O challenges for HEP applications on multi-core processors An ATLAS Perspective.
Multi Process I/O Peter Van Gemmeren (Argonne National Laboratory (US))
ANALYSIS TRAIN ON THE GRID Mihaela Gheata. AOD production train ◦ AOD production will be organized in a ‘train’ of tasks ◦ To maximize efficiency of full.
Peter van Gemmeren (ANL) Persistent Layout Studies Updates.
Athena I/O Component Refactorization - Overview Random notes about todays agenda. Peter Van Gemmeren (Argonne National Laboratory (US))
Atlas IO improvements and Future prospects
Introduction to Distributed Platforms
Applying Control Theory to Stream Processing Systems
Multicore Computing in ATLAS
External Sorting Chapter 13
Chapter 2: System Structures
湖南大学-信息科学与工程学院-计算机与科学系
External Sorting Chapter 13
CS703 - Advanced Operating Systems
Selected Topics: External Sorting, Join Algorithms, …
External Sorting.
Database Systems (資料庫系統)
External Sorting Chapter 13
New I/O navigation scheme
Presentation transcript:

I/O aspects for parallel event processing frameworks Workshop on Concurrency in the many-Cores Era Peter van Gemmeren (Argonne/ATLAS)

Outline  AthenMP  Input / Output considerations: –What’s bad now, waste wall, CPU, memory and disk  Developments: –Scatter –Input format 11/21/ Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks

AthenaMP: The ATLAS multi-core Control framework  In initial AthenaMP implementation, I/O is somewhat of an afterthought –Each worker node produces its own output file, which need to be merged after all worker are done. –Done in serial, can take significant amount of wall clock time. 11/21/2011 Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks 3 initfork Collect & merge bootstrap parallel event processing

This is not a yet a multi-core I/O framework!  Read data: A process (initialization, event execute,…) reads part of the input file (e.g., to retrieve one event). –All worker use the same input file. Multiple access may mean poor read performance, especially if events are not consecutive.  Uncompress / Stream ROOT baskets: Each worker will retrieve its own event data, which means reading multiple ROOT baskets, uncompressing them and streaming them into persistent objects. –ROOT baskets contain object member of several events, so multiple worker may use the same baskets and each of them will uncompress them independently: Wastes CPU time (multiple uncompress of the same data) Wastes memory (multiple copies of the same Basket, not shared)  Write data: Each process writes its own output file, which need to be merged.  Compress / Stream to ROOT baskets: Writers compress data separately. –Suboptimal compression factor (costs storage and CPU time at subsequent reads) Wastes memory (each worker needs its own set of output buffer) 11/21/2011 Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks 4

The Result: Control framework is only one side of the coin…  Each process writes its own output file, which need to be merged.  Which is done in serial and can kill you.  Previous Tier0 tests of athenaMP found 50% event throughput reduction mainly caused by serial merge 11/21/2011 Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks 5 Plot stolen from V. Tsulaia

Need to develop multi-core I/O framework  Input: –Event/Object Scatter by dedicated Reader Process: – Possible approaches: Event retrieve: –The single reader, reads all data for many events, and assembles ‘persistent events’. –The reader provides these events to the worker via shared memory. Object level retrieve: –A single reader, reads all DataHeader and provides these to the worker via a queue. –Using AthenaPOOL/StoreGate object retrieval mechanism, a request is send to a reader. –The reader reads the persistent data object and sends it to the worker.  The simplest implementation is for RAW data –No real gain expected, but good exercise 11/21/2011 Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks 6

Current AthenaMP Architecture for reading ByteStream (RAW) in event queue mode  Event scheduling is managed by the mother process using a python MP queue with event numbers for workers to iterate over.  I/O is done by worker, which first seek and than read the scheduled event  File access is almost sorted –only when one worker passes another in the seek to next step, things get out of order. Can happen, but not often 11/21/ Input File Event# Queue Mother Process Worker Process seek Worker Process seek Worker Process seek 1,2,3,4,… 1,7,9, …2,4,8, …3,5,6, … Input Files iter 1,7,9, …2,4,8, …3,5,6, … next Almost 1,2,3,4,… Individual ByteStream I/O components Shared event process scheduling (python MP) Individual disk read On File Transition, each Worker fires and handles incidents for metadata. Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks

Alternative: Architecture for sharing ByteStream events in queue mode, no metadata yet 11/21/ Event# Queue Mother Process Worker Process lock Worker Process lock Worker Process lock 1,2,3,4,… 1,7,9, …2,4,8, …3,5,6, … iter 1,7,9, …2,4,8, …3,5,6, … next Almost 1,2,3,4,… Individual ByteStream O components Shared event process scheduling (python MP) Reader Process locknext Shared Memory Persistency event/object Store read unlock Input File Input Files Only the Reader sees File Transition, and needs to inform Workers to process metadata. Copy Metadata to store and inform Workers. Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks evtSeqNumber; evtStatus; evtSize; evtOffset; evtTerm1; evtTerm2; Detect File Transition and read Metadata from store.

Sequence Diagram for RAW sharing Worker Process Reader Process IEventShare EventSelector share(evtnum) SharedMemory Tool lockEvent(evtnum) readEvent() putEvent(num) evtNum == num getLockedEvent() next() putEvent(num) evtNum == num getLockedEvent() next() unlockEvent() loop [events] This is faster, the worker does not read the event from disk, but copies it from shared memory store. Worker processing events (> 10 s). Reader reading events (~ 0.1 s). Reader waiting. That’s OK. Worker waiting. Not good. 11/21/ Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks

Some Testing  Some initial performance testing: –No differences seen (or really expected) for up to 8 Workers. But no penalty was observed either, which may be a surprise for a ‘proof of concept’ implementation that is not optimized –Small (~5%) slowdown with 16 Workers and single Reader. Single reader does get congested as it is currently only pre-reading 1 event  Could easily extend framework to allow multiple dedicated Reader Processes assigned to provide events to groups of Workers  And of course, metadata handling requires special attention… 11/21/2011 Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks 10

Second leg: Parallel ‘chunked’ access to ROOT input files.  ROOT baskets contain object member of several events, so multiple worker may use the same baskets and each of them will uncompress them independently: –Wastes CPU time (multiple uncompress of the same data) –Wastes memory (multiple copies of the same Basket, not shared)  Small test showed that reading (disk, decompress, streaming, P->T) ESD via athenaMP on 4 cores takes 40% longer than in a serial job.  In rel. 17, ATLAS changed ROOT data layout (see –No splitting (fewer baskets) –AutoFlush every 5/10 events (depending on data product)  Allows athenaMP to schedule small ‘chunks’ of events and avoid double decompression. –Test reading ESD on 4 cores showed penalty reduced to less than 20%. 11/21/2011 Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks 11

Outlook  Output of course is just as important –Bought some time by moving to fast/hybrid merging  ROOT team is developing TMemFile which may help –But ATLAS needs externalized references to TTree entries.  Shared Writer will have more difficult scheduling –Doesn’t know when it is done  Plan to exercise with ByteStream events first.  Longer term, ATLAS may want to scatter objects (rather than events) to match retrieval granularity on derived data products 11/21/2011 Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks 12

Summary  Don’t just plan for the next Control Framework, consider the next I/O Framework to support it as well. 11/21/2011 Peter van Gemmeren (Argonne/ATLAS): I/O aspects for parallel event processing frameworks 13