Optimizing I/O Performance for ESD Analysis Misha Zynovyev, GSI (Darmstadt) ALICE Offline Week, October 28, 2009.

Slides:



Advertisements
Similar presentations
File Systems.
Advertisements

Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting CS634 Lecture 10, Mar 5, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 11 External Sorting.
Shimin Chen Big Data Reading Group.  Energy efficiency of: ◦ Single-machine instance of DBMS ◦ Standard server-grade hardware components ◦ A wide spectrum.
1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW ], [Sanders03, ], and [MaheshwariZeh03, ])
FALL 2004CENG 351 Data Management and File Structures1 External Sorting Reference: Chapter 8.
G Robert Grimm New York University SGI’s XFS or Cool Pet Tricks with B+ Trees.
External Sorting R & G Chapter 13 One of the advantages of being
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
External Sorting R & G Chapter 11 One of the advantages of being disorderly is that one is constantly making exciting discoveries. A. A. Milne.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting 198:541. Why Sort?  A classic problem in computer science!  Data requested in sorted order e.g., find students in increasing gpa order.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
OS and Hardware Tuning. Tuning Considerations Hardware  Storage subsystem Configuring the disk array Using the controller cache  Components upgrades.
Gordon: Using Flash Memory to Build Fast, Power-efficient Clusters for Data-intensive Applications A. Caulfield, L. Grupp, S. Swanson, UCSD, ASPLOS’09.
External Sorting Chapter 13.. Why Sort? A classic problem in computer science! Data requested in sorted order  e.g., find students in increasing gpa.
Chapter 8 File Processing and External Sorting. Primary vs. Secondary Storage Primary storage: Main memory (RAM) Secondary Storage: Peripheral devices.
A Dynamic MapReduce Scheduler for Heterogeneous Workloads Chao Tian, Haojie Zhou, Yongqiang He,Li Zha 簡報人:碩資工一甲 董耀文.
Disk Access. DISK STRUCTURE Sector: Smallest unit of data transfer from/to disk; 512B 2/4/8 adjacent sectors transferred together: Blocks Read/write heads.
11 INSTALLING AND MANAGING STORAGE DEVICES IN WINDOWS XP Chapter 8.
 Optimization and usage of D3PD Ilija Vukotic CAF - PAF 19 April 2011 Lyon.
Sorting.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
CS Operating System & Database Performance Tuning Xiaofang Zhou School of Computing, NUS Office: S URL:
UNIX Files File organization and a few primitives.
1 External Sorting. 2 Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing gpa order.
Monte Carlo Data Production and Analysis at Bologna LHCb Bologna.
Improving Disk Throughput in Data-Intensive Servers Enrique V. Carrera and Ricardo Bianchini Department of Computer Science Rutgers University.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition File System Implementation.
4P13 Week 12 Talking Points Device Drivers 1.Auto-configuration and initialization routines 2.Routines for servicing I/O requests (the top half)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.
The Sort Benchmark AlgorithmsSolid State Disks External Memory Multiway Mergesort  Phase 1: Run Formation  Phase 2: Merge Runs  Careful parameter selection.
PROOF tests at BNL Sergey Panitkin, Robert Petkus, Ofer Rind BNL May 28, 2008 Ann Arbor, MI.
PROOF Benchmark on Different Hardware Configurations 1 11/29/2007 Neng Xu, University of Wisconsin-Madison Mengmeng Chen, Annabelle Leung, Bruce Mellado,
1 Chapter 9 Tuning Table Access. 2 Overview Improve performance of access to single table Explain access methods – Full Table Scan – Index – Partition-level.
DBMS 2001Notes 2: Hardware1 Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
ROOT I/O: The Fast and Furious CHEP 2010: Taipei, October 19. Philippe Canal/FNAL, Brian Bockelman/Nebraska, René Brun/CERN,
Introduction to Database Systems1 External Sorting Query Processing: Topic 0.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapters 13: 13.1—13.5.
CPSC Why do we need Sorting? 2.Complexities of few sorting algorithms ? 3.2-Way Sort 1.2-way external merge sort 2.Cost associated with external.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
External Sorting. Why Sort? A classic problem in computer science! Data requested in sorted order –e.g., find students in increasing gpa order Sorting.
External Sorting Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Lecture Topics: 11/22 HW 7 File systems –block allocation Unix and NT –disk scheduling –file caches –RAID.
Analysis efficiency Andrei Gheata ALICE offline week 03 October 2012.
1 Query Processing Exercise Session 1. 2 The system (OS or DBMS) manages the buffer Disk B1B2B3 Bn … … Program’s private memory An application program.
Select Operation Strategies And Indexing (Chapter 8)
 IO performance of ATLAS data formats Ilija Vukotic for ATLAS collaboration CHEP October 2010 Taipei.
The Sort Benchmark AlgorithmsSolid State Disks External Memory Multiway Mergesort  Phase 1: Run Formation  Phase 2: Merge Runs  Careful parameter selection.
ROOT : Outlook and Developments WLCG Jamboree Amsterdam June 2010 René Brun/CERN.
29/04/2008ALICE-FAIR Computing Meeting1 Resulting Figures of Performance Tests on I/O Intensive ALICE Analysis Jobs.
Data Formats and Impact on Federated Access
Atlas IO improvements and Future prospects
ALICE experience with ROOT I/O
Solid State Disks Testing with PROOF
Analysis trains – Status & experience from operation
ALICE Computing Upgrade Predrag Buncic
Join Processing in Database Systems with Large Main Memories (part 2)
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures
Selected Topics: External Sorting, Join Algorithms, …
External Sorting.
Large Object Datatypes
CENG 351 Data Management and File Structures
Virtual Memory 1 1.
Presentation transcript:

Optimizing I/O Performance for ESD Analysis Misha Zynovyev, GSI (Darmstadt) ALICE Offline Week, October 28, 2009

Preview Research subjects and goals General conditions Logical layout of ALISE ESD tree Physical layout of ALICE ESD file Evaluation of disk access performance Ways to optimize disk access performance Summary

Research Subjects and Goals Subjects: Read performance of AliROOT analysis tasks and trains Read performance for ALICE ESD data-files Goals: To understand what is limiting the performance To find ways to improve the performance

The Path of I/O Request...trying to figure out what happens on every level

General Conditions ROOT AliROOT v4-16-Rev-08 pp collisions at 10 TeV Number of files in data set: Events per file: 100 File size: from 2.3 MB up to 8 MB Each analysis job runs on dedicated 1000 files

General Conditions 8 core Intel Xeon E GHz RAID controller: HP Smart Array P400 array A: RAID 1+0 (2 disks) - Debian 4.0 array B: RAID 5 (5 disks x 250 GB + 1 spare disk) – xfs with analysis data 64 KB stripe size block device readahead: 4 MB

Logical Layout of ALICE ESD Tree 499 branches in a tree: 484 branches are resident in the AliESDs.root 6 branches in AliESDfriends.root 9 are resident in memory

Physical Layout of AliESDs.root 675 baskets  70% hold 100 events per basket  15 % hold 1 event per basket  7% hold 2 events per basket  8% - else the physical layout of tree contents

Properties of some selected branch baskets of a tree with 100 events in a file on disk

Train and Task Properties 23 tasks in the train 8 require MC data (otherwise crash) If MC data is queried additional read calls are issued to: galice.root, Kinematics.root, TrackRefs.root AliESDfriends.root is not used => 484 branches are read; *FMD* and CaloClusters* branches are disabled => =440 branches are read

Read calls to AliESDs.root file for each event processed. Analysis task processing 100 events. All branches are activated. I/O Access to AliESDs.root

Read calls to AliESDs.root file for each event processed. Analysis task processing 100 events. 44 branches are deactivated. Among them those with 100 and 50 baskets

I/O Access per each ESD File Read calls and bytes read statistics for different ROOT files for each event processed in AliESDs.root file. Analysis task processing 100 events. Additional Monte Carlo truth information queried.

I/O Access Share of MC Data Queries read call = POSIX read() call

Ways to Optimize Disk Access Performance Tuning RAID controller and I/O subsystem of OS Using TTreeCache (ROOT I/O optimization) Prefetching files to memory (avoid ROOT disk access) Merging files reduce number of files increase file size Changing hardware technology hard disk -> solid state disk

TTreeCache galice.root, Kinematics.root, TrackRefs.root are not affected With the Learning Phase of 1 entry: 1)For the first entry all first baskets of every activated branch are requested as usual. A single request per basket. 2)For a second entry all remaining baskets of every activated branch in the file are requested. A single request per group of baskets adjacent on disk.

Ways to Optimize Disk Access Performance Tuning RAID controller and I/O subsystem of OS Using TTreeCache (ROOT I/O optimization) Prefetching files to memory (avoid ROOT disk access) Merging files reduce number of files increase file size Changing hardware technology hard disk -> solid state disk

Prefetching ESD file is prefetched in Notify() in ANALYSIS/AliAnalysisManager.cxx Kinematics, galice, TrackRefs files are prefetched in Init() in STEER/AliMCEventHandler.cxx Advantage: sequential disk access

#include using namespace std; int read_cache(const char *a) { ifstream is; is.open (a, ios::binary ); // get length of file: is.seekg (0, ios::end); const size_t length = is.tellg(); is.seekg (0, ios::beg); // allocate memory: char *buffer = new char [length]; // read data as a block: is.read (buffer,length); delete [] buffer; is.close(); return length; }

Comparison of throughput rates for parallel jobs running a single analysis task. Only AliESDs.root is accessed.

Comparison of throughput rates for parallel jobs running a single analysis task querying Monte Carlo data. AliESDs.root, Kinematics.root, galice.root, TrackRefs.root are read.

Comparison of throughput rates for parallel jobs running an analysis train querying Monte Carlo data. AliESDs.root, Kinematics.root, galice.root, TrackRefs.root are read.

CPU Utilization

Impact of Parallel Jobs

CPU bound train I/O bound task

CPU Utilization Statistics from sar for a train of 23 tasks 1 job8 parallel jobs

Read-ahead Effect

Ways to Optimize Disk Access Performance Tuning RAID controller and I/O subsystem of OS Using TTreeCache (ROOT I/O optimization) Prefetching files to memory (avoid ROOT disk access) Merging files reduce number of files increase file size Changing hardware technology hard disk -> solid state disk

Merging ROOT Files Plain merge Fast merge For unzipped baskets there are currently 3 supported sorting orders: SortBasketsByOffset (the default) SortBasketsByBranch SortBasketsByEntry

Statistics for Merged Files

Ways to Optimize Disk Access Performance Tuning RAID controller and I/O subsystem of OS Using TTreeCache (ROOT I/O optimization) Prefetching files to memory (avoid ROOT disk access) Merging files reduce number of files increase file size Changing hardware technology hard disk -> solid state disk

Solid State Disk hdparm results for read performance: 50 MB/s hard disk 130 MB/s solid state disk

Summary Analysis train is CPU bound Analysis task is I/O bound Factors contributing to low I/O performance: Large number of relatively small ESD files Large number of branches -> large number of baskets inside ESD files Numerous queries to Kinematics.root, TrackRefs.root

Summary (2) I/O subsystem tuning: default values mostly optimal. Block device read-ahead value: assign empirically TTreeCache -> reduces number of read calls -> can improve analysis speed Merging files: improves data throughput rate (factor of 3 for parallel jobs) The best option: 'fast SortBasketsByEntry'