Analysis Performance and I/O Optimization Jack Cranshaw, Argonne National Lab October 11, 2011.


Data for Analysis (September 2011 Survey Results)
Jack Cranshaw, USATLAS Facilities Workshop, 11 October 2011

Survey results were presented at the September PAT Workshop (see the workshop agenda). Roughly 2/3 of analysis is done on centrally produced ntuples.

POOL Optimizations

Much of the non-D3PD analysis is done on POOL data products (DESD, AOD, DAOD, …). Optimization of this area is already underway (Argonne, Orsay, …).

For example:
–By looking at the details of how data are retrieved and used, the splitting levels of the stored data were optimized to give significant performance increases.
–The amount of resources used by each process was also reduced.

Results:
                                   Read all events     Read 1% of events
  AOD 1 (split, 30 MB TTree)       55 ms/ev.           270 ms/ev.
  AOD 2 (un-split, flush 10 evt.)  35 ms/ev.           60 ms/ev.
  Difference                       ~30% faster read    4–5 times faster read
  Memory                           reduced by 50–100 MB

Close collaboration with the ROOT team for feedback and requests.
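As a quick cross-check, the quoted speedups follow directly from the per-event times in the table (a sketch; the numbers are taken from the slide):

```python
# Per-event read times from the slide, in milliseconds per event.
aod1 = {"all": 55.0, "sparse": 270.0}  # AOD 1: split, 30 MB TTree
aod2 = {"all": 35.0, "sparse": 60.0}   # AOD 2: un-split, flush every 10 events

# Full-read improvement: (55 - 35) / 55 ~ 36%, quoted as "~30% faster".
full_read_gain = (aod1["all"] - aod2["all"]) / aod1["all"]

# Sparse-read (1% of events) speedup: 270 / 60 = 4.5x, quoted as "4-5 times faster".
sparse_speedup = aod1["sparse"] / aod2["sparse"]

print(f"full read: {full_read_gain:.0%} faster")  # → 36% faster
print(f"1% read:   {sparse_speedup:.1f}x faster") # → 4.5x faster
```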

Organizational Meeting

An Analysis Performance Priorities Meeting was held at Argonne in mid-September.
–Attendees: Rik Yoshida, Doug Benjamin, Nils Krumnack, Jack Cranshaw, Peter van Gemmeren, David Malon

Broad mandate: "Optimize the time from data taken to physics results presented."
Specific mandate: address the broad mandate by looking at improvements in I/O.

Team building
–Build on expertise and previous efforts for POOL.
–Leverage the resources available at Argonne (Tier3, local Tier2).
–Leverage connections to other parts of the project: David Malon (Event Store), Rik Yoshida (Analysis Support), Doug Benjamin (Tier 3 Technical Coordination), Peter van Gemmeren (ROOT Integration).
–Developers: Jack Cranshaw (coordinator), Nils Krumnack, Doug Benjamin, Peter van Gemmeren.

Current Work Plan: Status

Establish connections to stakeholders
–Physics Analysis Tools (workshop, week of 9/26).
–ROOT team (met with Philippe, Fons, and Rene on 10/3).

Survey current usage patterns
–Start collecting examples (e.g., Shuwei's Top TSelector example, …).
–Plan to make these into standard tests once instrumented.

Instrument code
–Look at ROOT-provided tools like TTreePerfStats.
–Work on understanding what more is needed.

Look at lessons from previous optimization
–TTreeCache is currently not used much for analysis. TTreeCache makes reading more efficient by learning which branches are used most frequently and caching them.
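TTreeCache itself is a ROOT C++ class; as a language-neutral sketch of the idea (hypothetical names, not the ROOT API), a learning cache records which branches are touched during a training window, then prefetches only that set in bulk:

```python
class LearningBranchCache:
    """Toy model of TTreeCache-style training (illustrative, not the ROOT API).

    During the first `learn_entries` entries it only records which branches
    the analysis touches; after that, it prefetches exactly that set instead
    of issuing many small per-branch reads.
    """

    def __init__(self, learn_entries=100):
        self.learn_entries = learn_entries
        self.entries_seen = 0
        self.hot_branches = set()
        self.training = True

    def on_branch_read(self, branch_name):
        # Training phase: remember every branch the analysis touches.
        if self.training:
            self.hot_branches.add(branch_name)

    def next_entry(self):
        self.entries_seen += 1
        if self.entries_seen >= self.learn_entries:
            self.training = False  # freeze the branch set

    def branches_to_prefetch(self):
        # After training, these branches can be fetched in one bulk request.
        return set() if self.training else self.hot_branches


# Toy usage: an analysis that only ever reads two branches.
cache = LearningBranchCache(learn_entries=2)
for entry in range(4):
    for br in ("pt", "eta"):
        cache.on_branch_read(br)
    cache.next_entry()

print(sorted(cache.branches_to_prefetch()))  # → ['eta', 'pt']
```

This also illustrates the caveat on the slide: a training window tuned for sequential reconstruction reads may mislearn under the sparse, random-access pattern typical of analysis.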

Current Work Plan: Next Steps

Work with PAT to make sure documentation on how to write performant ntuple analyses is available, and develop tools.
–The D3PDReader project gives us a place to both implement improvements and install instrumentation.

Using the examples and relevant variables, start compiling performance measurements.

Start using TTreeCache in analysis examples.
–Understand issues when used with PROOF.
–Training of the cache for the random-access mode of analysis may differ from the read-all mode seen in reconstruction; the learning mechanism may need to be customized.
–Compare TTreeCache performance on LAN and WAN.

Look at new ROOT features being prepared for release.
–Pre-fetching in a background thread.
–Disk-resident TTreeCache.

Provide Philippe/Rene with examples of D3PDs and code for comment.

Begin looking at improving the efficiency of disk/memory usage.

Use locally available resources as a sandbox, then expand to larger scale as the project matures.
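The "pre-fetching in a background thread" item can be sketched as a simple bounded prefetcher (hypothetical `fetch` callable standing in for the expensive I/O, e.g. reading a basket over the WAN): a worker thread stays a few chunks ahead while the main thread processes the current one.

```python
import threading
import queue


def prefetching_reader(chunks, fetch, depth=2):
    """Yield fetched chunks while a background thread stays `depth` ahead.

    `fetch` is any callable taking a chunk id; the bounded queue makes the
    worker block once `depth` chunks are buffered, overlapping I/O with
    processing without unbounded memory growth.
    """
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def worker():
        for c in chunks:
            q.put(fetch(c))  # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            break
        yield item


# Toy usage: "fetching" just doubles the chunk id.
result = list(prefetching_reader(range(5), fetch=lambda c: 2 * c))
print(result)  # → [0, 2, 4, 6, 8]
```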

Analysis Model Evolution 2012

The LHC has passed into the phase of luminosity ramp-up in its current configuration.
–Initial threshold discovery opportunities have mostly finished.
–Pileup in the detectors is already large and increasing, so event data are getting more complex. Data volumes are growing and are limited only by the acceptable trigger rate and Tier0 storage.

Expected developments
–Analysis will still be most efficient/fun when done 'interactively' on local resources. These users will continue in this mode as long as they can, but will have to become more and more selective as data accumulate (see the DPD train discussion, PAT Workshop).
–As more advanced analysis techniques are developed, analyses done on D(3+)PD will naturally get more complex.

Summary

An effort has been started to optimize post-athena data access.
–It naturally focuses on D3PD access, as most of the analysis user community uses this format.
–It builds on previous successful efforts with POOL data.

We are working with the PAT team to develop tools.

Other efforts in this area are being monitored, and we will work with them when appropriate.
–For example, Sebastien's work on benchmarking in mana and SFrame.

This effort currently uses ANL resources as a test bed, but eventually these developments will be deployed more broadly.
–Any feedback from facilities on concerns or interesting measurements is welcome.