Experiment Requirements from Evolving Architectures
RWL Jones, Lancaster University
Ambleside, 26 August 2010

Technology Challenge

It is well known that the simulation and reconstruction of events poses real challenges:
–Long time to process each event
–High memory profile (presently more than the total real memory on an 8-core processor)
These pose direct challenges to performance, which people are trying to address. However, the challenges are much greater. Three components:
–Memory profile per process
–I/O through the processor
–Data volume
Growing resource needs can only be met if we can use the technology. We are *obliged* to work smarter, and the Grid must support this.

Parallelism: Initial Solutions (AthenaMP etc.)

Generally, work smarter! Event-level parallelism:
–E.g. AthenaMP: share common memory between parent and daughter processes, to allow many processes on a single node
–Some speed-up using event-loop parallelism
–Also share common pages between processes with KSM: real gains in memory use, but some slow-down
–Cache as much as you can (e.g. pile-up events)
–Also Non-Uniform Memory Access, simultaneous multi-threading
Issues: hard to monitor performance in parallel jobs.
Other approaches:
–Job-level parallelism (e.g. parallel Gaudi) and hyperthreading
–Pinning of processes to cores or hyperthreads with affinity

AthenaMP on a 32-core NERSC machine

Specs:
–8 x Quad-Core AMD Opteron™ Processor 8384 = 32 cores
–250 GB of memory! L2: 512 KB, L3: 6 MB
–Core speed: 2700 MHz
AthenaMP performs well, but the limits of the approach are evident: more parallelism is needed.

Related Developments

–I/O challenges are being (partly) addressed by fast merging
–A re-write of Gaudi with a stronger memory model is planned
–Down the line, we need to parallelise the code: this could target many-core processors or Graphics Processing Units, but the development might address both
–GPUs are having big successes and cost savings in other fields
–We need the R&D to know which path to take, and must explore these options: if they work for us, use them; if not, have an answer for the FAs
–Developments require O(3 years) to implement; this includes Geant4 (architectural review this year)

Convergence

–Intel and AMD see a convergent future (the APU)
–OpenCL provides a development platform for both, avoiding technology lock-in
–CUDA is Nvidia-specific, but not hard to port to OpenCL
(Slide timeline: AMD Fusion unveiled; Single Chip Cloud unveiled 3/12/09)

Approach

Most suitable problems:
–Compute intensity: a large number of arithmetic operations per I/O or global memory reference
–Data parallelism: the same function applied to all records of an input stream, with no need to wait for previous records
–Data locality: data produced, read once or twice later in the application, and never read again
ATLAS-specific demonstrators:
–Tracking code (examples exist from other experiments)
–Magnetic field service (could save memory; well suited to a GPU service): run the service on the GPU as a co-processor
CEs must support:
–GPUs as primary compute node (many issues are the same as for multicore)
–GPUs as co-processor
Work is starting in the UK on this; the level of effort is uncertain!

Multi-Core Workshops

Second multi-core workshop, 21/22 June, under the auspices of two OpenLab R&D projects:
–WP8: Parallelization of Software Frameworks to Exploit Multi-core Processors
–WP9: Portable Analysis Environment using Virtualization Technology
The experiments were represented and gave a view of their requirements in these two areas; not complete, but important!

Summarizing Multi-core

–Experiments need whole nodes, not just cores: experiments are responsible for utilization, using pilot-job mechanisms (multiple processes and/or multiple threads, job-mix scheduling)
–End-to-end changes are needed in Grid frameworks, from the user submission mechanism to the local batch system
–Accounting needs to change
–Larger files will result from parallel jobs

Summarizing Virtualization

Virtualization of services is accepted and already happens:
–VO boxes will be virtualized shortly
Virtualization of worker nodes is for flexibility and efficiency:
–The performance hit must be known and monitored (e.g. I/O to disk)
–Cluster virtualization must support more than one image (e.g. PROOF cluster, production, …)
–A HEPiX document is to specify the obligations on image authors, separating the base OS from experiment code
–Experiment support for the CernVM File System; support for adding to an image after instantiation
–Some call for 24*7 CernVM support
Three discussion groups spun out for next steps: performance, end-to-end roadmap, accounting.

Other Issues: Storage Developments

Interaction with storage is also challenging:
–Copy-on-write is being investigated
–Solid-state devices look promising for database access (TAG DB etc.) and possibly for analysis worker nodes
–Need to change DB technologies for scaling
–Grid impacts minimal?
I/O capacity does not scale with the number of cores, and we face issues in high-throughput access to storage (e.g. for analysis):
–A common issue with the HPC community
–Emerging solutions: data localization, copy-on-write based technologies
–Needs changes to SE, CE and data management systems?

Conclusion

ATLAS Offline now has an Upgrade Computing activity area:
–A focus for development, the LoI, etc.
–A suitable forum for trigger/offline constraint planning?
Workshop meetings for regular technical exchange are planned.