16.07.2015 HSM Meeting - HPC - FS: High Performance Computing (HPC) Support from IT


Outline
- Historical Overview of IT Computing Support & Present State
- Present Situation for the LIU SC Studies
- Proposal for HPC Support from IT (Primary Goal)
- Main Secondary Goals
- Conclusion

Historical Overview of IT Computing Support (1/3)
- The IT support for the systematic dynamic aperture studies for the LHC can be considered ideal: both hardware and software support have been state of the art at every moment in time.
- This support, however, was in large part due to personal initiatives between members of IT & ABP (and their predecessors); at best, IT tolerated this successful collaboration.
- IT did provide help with installation & maintenance of the hardware.

Historical Overview of IT Computing Support (2/3)
Hardware:
- CRAY supercomputer
- DEC workstations
- PC cluster
- Screen-saver distributed computing (BOINC)
- (Play-console chips, GPUs)
Financing:
- The LHC project leader (Lyn Evans) paid a total of about 800'000 CHF for several options.
- BE had ~150'000 CHF for the last PC farm upgrade.

Historical Overview of IT Computing Support (3/3)
Software:
- Decade-long collaboration with Eric McIntosh and, to a lesser but still significant extent, Harry Renshall, making SixTrack a reliable high-speed tracker with a worldwide reputation.
- Speed and other optimizations of the code plus continuous maintenance, e.g. code quality, vectorization, speed optimization of critical loops, etc.
- For BB studies: an approximate but very fast complex error function, developed in collaboration with George Erskine (a sketch follows below).
- Guaranteed hardware-independent bit-by-bit reproducibility of the results ➔ BOINC
- Check-point/restart ➔ BOINC
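The Erskine routine itself is not reproduced in the slides. As an illustration only of what such a function computes, here is a minimal Python sketch using SciPy's Faddeeva function w(z) = exp(-z^2)·erfc(-iz) in a Bassetti-Erskine-style field evaluation; the normalisation is omitted, sigma_x > sigma_y is assumed, and this is a reference implementation, not the fast approximation used in SixTrack.

```python
# Illustrative only: the complex error (Faddeeva) function as used for
# beam-beam kicks; scipy.special.wofz is a reference implementation,
# NOT the fast approximate CERNLIB/Erskine routine from the slide.
import numpy as np
from scipy.special import wofz  # w(z) = exp(-z**2) * erfc(-1j*z)

def gaussian_beam_field(x, y, sigma_x, sigma_y):
    """Bassetti-Erskine-style field of a Gaussian beam, up to an overall
    normalisation constant; assumes sigma_x > sigma_y."""
    s = np.sqrt(2.0 * (sigma_x**2 - sigma_y**2))
    z1 = (x + 1j * y) / s
    z2 = (x * sigma_y / sigma_x + 1j * y * sigma_x / sigma_y) / s
    gauss = np.exp(-x**2 / (2 * sigma_x**2) - y**2 / (2 * sigma_y**2))
    return wofz(z1) - gauss * wofz(z2)

print(gaussian_beam_field(1e-3, 0.5e-3, 2e-3, 1e-3))
```

The point of the original fast approximation was exactly this trade-off: w(z) sits in the innermost tracking loop, so a cheaper evaluation directly multiplies the achievable turn rate.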

Present Support State (1/2)
- IT no longer supports general package libraries like CERNLIB (done by PH until 2006) ➔ no more mathematicians like Erskine.
- IT also no longer provides any program support ➔ experts like McIntosh or Renshall can no longer be found in IT. ➔ In principle they simply cannot help us with our code development!
- IT now has a mandate to support CERN-critical computation efforts, e.g. for quite some time they have maintained & updated our ~400-PC cluster for LHC simulations without charging us.
- As I understand it, they will agree to a "reasonable" enlargement of our cluster following a justified request (this may take several months).
- They offer a fair-share procedure to avoid an idling system, and one may boost the number of available boxes by a factor of 2 when urgent simulation campaigns require more computing capacity.

Present Support State (2/2)
Due to the severe lack of IT support, and after discussion with Oliver and Paul, I was asked to present our BE/ABP situation at the IT Service Review Meeting (ITSRM) at the end of 2009, resulting in:
- Keep AFS tools
- No software support possible!
- Keep NAG tools
- Get BOINC back
- BE-IT Forum ➔ Accelerator Sector treated like an LHC experiment

Present Situation for the LIU SC Studies (1/3)
During the last couple of years we have worked hard to prepare for the systematic studies on the LHC pre-accelerators PSB, PS & SPS:
- Preparing several space-charge (SC) PIC and frozen-SC codes
- Benchmarking the various codes against each other
- Benchmarking the codes against experiments
- Optimization of the non-linear models of the machines
- Even advancements in our theoretical understanding
In the fall we will summarize our results and prepare for systematic SC studies for LIU.

Present Situation for the LIU SC Studies (2/3)
- For our studies IT has provided 40 boxes with 48 cores each, i.e. ~2'000 cores in total.
- These boxes are reasonably powerful but no longer top notch! They are old and no longer in production, so there cannot be any upgrades; at best we might get more of the same.
- This system was just enough to satisfy the base needs of the PSB, while being too slow for long-term studies for both the PSB & PS.
- A long-term simulation with the frozen SC model in MAD-X over 800'000 turns takes 10 days of sequential computation, while for the PIC simulations a few 1'000 turns take weeks on the 48-core machines (see the rate estimate below).
- Since the SPS has not yet fully started SC simulations, we have to wait for the requirements for that machine.
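To put these figures on a common footing, a rough estimate derived only from the numbers quoted above (reading "a few 1'000 turns in weeks" as roughly 2'000 turns in two weeks, which is an assumption):

\[
R_{\text{frozen}} \approx \frac{8\times10^{5}\ \text{turns}}{10\ \text{d}\times 86\,400\ \text{s/d}} \approx 0.9\ \text{turns/s},
\qquad
R_{\text{PIC}} \approx \frac{2\times10^{3}\ \text{turns}}{14\ \text{d}\times 86\,400\ \text{s/d}} \approx 1.7\times10^{-3}\ \text{turns/s}.
\]

At the PIC rate, the ~10'000-turn target of the next slide would take on the order of two months per case, which is what drives the HPC request.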

Present Situation for the LIU SC Studies (3/3)
In essence, to reach about 10'000 turns, covering the initial phase where the self-consistent effects are most crucial, we would need a system with:
- Better scalar speed
- More cores per box, to take advantage of the scaling with the number of cores (see the illustration below)
This can be improved by making use of clusters that are geared for HPC, like CNAF or EPFL. We are in contact with them; however, we would be required to pay for their services. If we could convince IT to provide at least the hardware of a sufficiently large system, this would of course be advantageous!
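The slides do not state a scaling law; purely as an illustration of why both items matter, assume Amdahl-type strong scaling with parallelisable fraction p (the value p = 0.95 below is an assumption, not a measured number):

\[
S(n) = \frac{1}{(1-p) + p/n},
\qquad
S(48)\big|_{p=0.95} \approx 14,
\qquad
S(200)\big|_{p=0.95} \approx 18.
\]

Adding cores alone saturates quickly; the remaining gain has to come from the serial (scalar) speed of each core, which is exactly the combination requested above.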

[Figure: benchmark comparison of turn rates, CERN in red vs. CNAF-Bologna in blue; please ignore the green curve.]

Proposal for HPC Support from IT (Primary Goal) (1/2)
- Bernd Panzer from IT has been in charge of providing computing resources at CERN for quite some time; e.g. he has provided us with the 48-core systems, including maintenance, without charging BE (but again with close to zero help on the code side!).
- During the last few months I have been discussing with him a potential system of 16-core machines linked with InfiniBand networking, creating roughly 200 cores per system, which fits with our scaling tests (see the sketch below).
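As an illustration of how such a ~200-core, InfiniBand-coupled unit would be used (this sketch is not from the talk; the particle count and the trivial one-turn map are placeholders): a PIC-style loop distributes macro-particles over MPI ranks and exchanges beam moments every turn, and it is this per-turn collective communication that makes low-latency networking matter.

```python
# Hypothetical sketch: turn-by-turn tracking with macro-particles split
# across MPI ranks on an InfiniBand-linked multi-box system.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_TOTAL = 100_000                   # macro-particles (illustrative, far below production)
n_local = N_TOTAL // size           # share owned by this rank

rng = np.random.default_rng(seed=rank)
x = rng.normal(0.0, 1e-3, n_local)  # local transverse coordinates

for turn in range(10_000):          # the ~10'000-turn target from the talk
    x = np.cos(0.1) * x             # placeholder for one-turn map + SC kick
    # The SC solver needs global beam information each turn; this
    # Allreduce crosses the InfiniBand fabric, so network latency
    # directly limits the achievable turn rate.
    mean_x = comm.allreduce(float(np.sum(x)), op=MPI.SUM) / N_TOTAL

if rank == 0:
    print(f"tracked {N_TOTAL} macro-particles on {size} cores")
```

Launched e.g. with `mpiexec -n 200 python sketch.py` across the boxes of one such unit.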

Proposal for HPC Support from IT (Primary Goal) (2/2)
- The idea is to provide, as an initial installation, 10 of those systems, i.e. another 2'000 cores, but 4 times faster (conservative estimate) than our 48-core system.
- The hope was that IT would decide to go for HPC, but due to financial considerations they have put it on ice.
- There is hope, however, that a request from our BE department head will be sufficient to convince IT to look into this more seriously.

Main Secondary Goals
Once we have convinced IT to provide an HPC system, there will be a long list of secondary requests:
A. How long will it take? ➔ It might be 9 months, but that is still okay for the LIU studies.
B. Fair-share ➔ It depends on whether there will be other users needing multi-core systems; in fact, it appears there is another request from theory. NO COSTS!
C. Hardware upgrades ➔ This remains to be seen since, due to the parallel structure, all machines must run at equal speed!
D. Growth of the system ➔ Depending on our usage, we might ask for 50% growth per year of active work.

Conclusions
1. For the systematic LIU studies a substantial speed-up would be highly desirable ➔ this can only be achieved with HPC facilities.
2. IT is on the verge of adopting HPC into its mandate, but presently the plan is on ice.
3. IT's mandate includes support of CERN's critical computing needs ➔ our proposal would be covered.
4. The primary goal is to get HPC facilities from IT ➔ this will require an inter-departmental request from the BE department head.
5. There are several secondary goals that will have to be addressed once IT agrees to cover HPC.