Argonne LCF Status

Tom LeCompte, High Energy Physics Division, Argonne National Laboratory
Tom Uram and Venkat Vishwanath, Argonne Leadership Computing Facility, Argonne National Laboratory
Doug Benjamin, Department of Physics, Duke University

Slide 2: Strategic Thoughts

"Why do you rob banks?" "Because that's where the money is." – attributed to Willie Sutton

• On the ATLAS side, simulation is "where the money is"
  – It accounts for roughly 60% of our grid use
  – Event generation is well suited to HPC, but makes up less than 10-15% or so of our grid use
    · Offloading it will help, of course, but it is a smaller piece
• On the HPC side, large (>500K-core) computers are "where the money is"
  – 1/3 of the Top 500 computational capacity is in four computers (Titan, Mira, Sequoia, K)
  – 1/2 of the Top 500 computational capacity is in 21 computers
• We want a "run-anywhere" design
  – More total resources
  – More flexibility in utilizing them
  – Protection against future changes

If we offload a cycle from the grid, we can use the recovered cycle any way we want.

Slide 3: Consequences

• We need to adapt to run on a rather spartan architecture (typical of these giant computers)
  – No/little/slow TCP/IP (which means no/little/slow database access)
  – Relatively little memory per core
  – Some use non-x86 instruction sets
• We need to connect this to the grid – otherwise it is worthless
• We are solving both problems with an x86-based front end, which
  – Lives on the grid – it looks like a fast Tier-2
  – Receives jobs
  – Adapts them to the supercomputer
  – Sends the output to a grid SE

This looks a lot like a many-core architecture for future PCs.

Slide 4: How It Works (Alpgen example)

• The user sends a prun job to the front end with an Alpgen input file and a number of events
• The front end then
  – Runs a "warmup" job to create the .grid1/.grid2 files (logically similar to fetching database files for simulation)
  – Creates a "purely computational" job for the HPC
  – Notifies the HPC that there is a job waiting
  – Polls the HPC to see when it is done
  – Collects the HPC output
  – Runs an "unweighting" job (logically similar to a final format fix-up for simulation)
  – Places the output on the grid
• Output can be found here:
  – user.lecompte _00003.alpout.unw in dataset user.lecompte.ANALY_ANLASC.001/ (BG/P)
  – user.lecompte _00006.alpout.unw in dataset user.lecompte.ANALY_ANLASC.001/ (BG/Q)
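
As a rough illustration of the sequence above, the front-end control flow might look like the Python sketch below. This is hypothetical, not the actual ALCF/ATLAS code: the helper scripts (alpgen_warmup.sh, submit_to_hpc.sh, collect_hpc_output.sh, alpgen_unweight.sh, register_on_grid.sh) and the "done-marker" polling convention are assumptions made for the example.

```python
import subprocess
import time
from pathlib import Path


def run_front_end_job(alpgen_input: str, n_events: int) -> None:
    # 1. Warmup on the x86 front end: produces the .grid1/.grid2 files
    #    (logically similar to fetching database files for simulation).
    subprocess.run(["./alpgen_warmup.sh", alpgen_input], check=True)

    # 2. Hand the purely computational piece to the supercomputer and
    #    notify it that work is waiting (hypothetical helper script).
    subprocess.run(["./submit_to_hpc.sh", alpgen_input, str(n_events)], check=True)

    # 3. Poll until the HPC side drops a "done" marker file; latencies of
    #    minutes are acceptable for this workflow.
    done_marker = Path(alpgen_input).with_suffix(".done")
    while not done_marker.exists():
        time.sleep(600)

    # 4. Collect the HPC output, run the unweighting step on the front end
    #    (logically similar to a final format fix-up), and register the
    #    result on a grid storage element.
    subprocess.run(["./collect_hpc_output.sh", alpgen_input], check=True)
    subprocess.run(["./alpgen_unweight.sh", alpgen_input], check=True)
    subprocess.run(["./register_on_grid.sh", alpgen_input], check=True)
```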

Slide 5: ALCF Resources

• Intrepid
  – BlueGene/P based (PowerPC 450 cores) – 2006-vintage technology
  – 163,840 cores (40,960 nodes × 4 cores per node)
  – Each core is 1/3 to 1/5 the speed of a typical x86
  – 470 MB of memory per core
  – Ideal job sizes: … nodes, 6 hours or less (about … x86-days)
  – Two small development systems (Challenger and Surveyor)
• Mira
  – BlueGene/Q based (PowerPC A2 cores) – Intrepid's successor
  – 786,432 cores (49,152 nodes × 16 cores per node)
  – Each core is comparable in speed to a typical x86
  – 1 GB of memory per core
  – Ideal job sizes: … nodes, 6 hours or less (about … x86-days)
  – Two "small" development systems (Cetus and Vesta) – these are Top500 #139 and #140

Slide 6: Other Possibilities

• We have 1M hours allocated on Mira and 0.5M on Intrepid
• I have a handshake agreement that we can go beyond this, plus
  – Access to 1M hours on Hopper (at NERSC, LBNL): an x86 machine with 153,216 cores
• There are additional US supercomputers outside the national labs, at institutes with some affiliation to ATLAS
  – Examples: Stampede at Texas, Blue Waters at Illinois
• There are some computers worldwide with similar architectures
  – SAKURA, for example, is a BG/Q system like Mira

Slide 7: Code Issues

• Everything we have seriously tried runs: Geant4, ROOT, Alpgen, Sherpa
  – We haven't tried Athena, just the foundational code above
• Code changes are minor
  – There is a gcc bug that has to be programmed around in Alpgen (the bug exists on x86 too, but is not triggered there)
  – We use MPI to ensure unique random number seeds (see the sketch after this list)
  – Geant4 and ROOT require no changes
  – Rolf warns us that there is an "is this an x86?" check in one spot in the ATLAS repository
• Build-environment issues are more substantial
  – Geant4's build environment makes some x86-based assumptions in paths
  – Freetype2 (a ROOT dependency) is complicated
• Optimization
  – Effects are large and not simple: enabling one FPU on the BG/P almost doubles the speed; enabling the second one slows it down by ~8%
  – We decided not to spend a lot of time on this: the strategy is to get going first and improve things second
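
As a simplified illustration of the "MPI for unique seeds" point: each MPI rank can derive its own seed from the job's base seed and its rank, then write its own input card. The mpi4py usage below is one way to do this; the seed arithmetic and the card keyword are placeholders, not the scheme used in the production Alpgen runs.

```python
from mpi4py import MPI  # assumes an MPI-enabled Python build on the machine

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

base_seed = 12345            # seed requested by the original grid job (illustrative)
my_seed = base_seed + rank   # every rank gets a distinct seed

# Each rank writes its own input card so that no two nodes generate the
# same event sequence.  The "seed" keyword below is a placeholder; the
# real Alpgen card syntax is not reproduced here.
with open(f"input.{rank:05d}", "w") as card:
    card.write(f"seed {my_seed}\n")
```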

Slide 8: Status

• We have accepted an Alpgen grid job (4-vectors), run it on BG/P (and also BG/Q), and placed the output on the grid
  – Technically, we could go into production immediately
    · One person no longer has to sit there entering passwords from her cryptocard non-stop
    · Politically, this needs some discussion
  – The communication between systems that lets us do this automatically works, but could use some more development
    · In particular, better error checking and robustness should be in place before entering a production mode
    · Developed at ALCF, usable anywhere
• Sherpa-MPI is an obvious next target
• After that – maybe showering of Alpgen 4-vectors?
• We will be evolving the negotiation between the (multiple) HPCs and the x86 front end
  – Include allocation quotas and expected time to complete, and allow failed jobs to restart…

Slide 9: The Future

• Today, a job specifies a machine and a number of cores at submission time. The hooks are in place to make this dynamic.
• We want to be able to select computers at run time, based on
  – Estimated completion time
  – Quota available
  – Upcoming jobs (we don't want to start a job that will block a later one)
• We want to dynamically alter the "shape" of a job
  – e.g. recast a 1000-core 2-hour job as a 2000-core 1-hour job (sketched below)
• The vision is to be able to run opportunistically, in the "nooks and crannies"
  – There is a lot (by our standards) of CPU available this way
  – It tends to be available on certain days and at certain times (Sunday nights) – accepting longer latencies will enable us to exploit this
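
To make the "reshaping" idea concrete, here is a toy sketch: hold the job's core-hours fixed and pick the smallest partition whose wall time fits the available window or quota. The candidate partition sizes and the function name are invented for the example; no actual scheduler API is implied.

```python
def reshape_job(core_hours: float, max_walltime_h: float, max_cores: int):
    """Return a (cores, walltime_h) pair that fits the window, or None."""
    for cores in (512, 1024, 2048, 4096, 8192):   # candidate partition sizes (invented)
        if cores > max_cores:
            break
        walltime_h = core_hours / cores
        if walltime_h <= max_walltime_h:
            return cores, walltime_h
    return None                                    # the job cannot fit this window


# Example: a 1000-core, 2-hour job (2000 core-hours) recast to fit a
# 1-hour opportunistic window -> roughly the "2000-core 1-hour job" above.
print(reshape_job(core_hours=2000, max_walltime_h=1.0, max_cores=16384))
# (2048, 0.9765625)
```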

Slide 10: Thoughts in Lieu of Conclusions

• We are using prun now – should we continue this in the future?
  – Maybe there should be a "phpc" mode in addition to prun and pathena
• Accepting longer (10 days?) job latencies would help
• ATLAS tries very hard in several places to make jobs x86-sized
  – This is ~1000x smaller than we would like
  – Random number seed starvation is a concern: Alpgen, for example, asks for two five-digit seeds; the second seed is made unique across nodes, so we burn through seeds tens of thousands of times faster
• Validation of Alpgen 4-vectors is taking longer than we would like (delivered 21 Feb)
  – Single-core runs can be compared bit-for-bit with x86 (OK with Alpgen)
  – Multi-core runs can't (different seeds)
  – We need to think about this systematically
• A well-designed single event server may make running ATLAS simulation easier
  – Especially if it can send events via MPI as well as TCP/IP (a sketch of the idea follows below)
• An integrated simulation framework may make running ATLAS simulation harder
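
For the "single event server over MPI" idea, a minimal master/worker pattern in mpi4py might look like the following. This only illustrates the pattern (rank 0 hands out event ranges on request); it is not the ATLAS event service, and the tags, chunk size, and event count are arbitrary choices for the sketch.

```python
from mpi4py import MPI

TAG_REQUEST, TAG_WORK, TAG_STOP = 1, 2, 3
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    # "Event server": rank 0 hands out ranges of 100 events on request.
    status = MPI.Status()
    next_event, total_events, stopped_workers = 0, 10_000, 0
    while stopped_workers < size - 1:
        comm.recv(source=MPI.ANY_SOURCE, tag=TAG_REQUEST, status=status)
        worker = status.Get_source()
        if next_event < total_events:
            comm.send((next_event, min(next_event + 100, total_events)),
                      dest=worker, tag=TAG_WORK)
            next_event += 100
        else:
            comm.send(None, dest=worker, tag=TAG_STOP)
            stopped_workers += 1
else:
    # Worker: keep asking for event ranges until told to stop.
    status = MPI.Status()
    while True:
        comm.send(None, dest=0, tag=TAG_REQUEST)
        chunk = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        first, last = chunk
        # ... simulate events [first, last) here ...
```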