Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute). Anne Weill-Zrahia, Technion Computer Center, October 2008.


Resources needed for applications arising from nanotechnology
- Large memory: Tbytes
- High floating-point computing speed: Tflops
- High data throughput: state of the art ...

SMP architecture
[diagram: several processors (P) attached to a single shared memory]

Cluster architecture
[diagram: nodes, each with its own processors and memory, connected by an interconnection network]

Why not a cluster?
- A single SMP system is easier to purchase and maintain
- Ease of programming in SMP systems (see the OpenMP sketch below)
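The ease-of-programming point is the usual argument for SMP: on shared memory a loop can often be parallelized with a single directive, while on a cluster the data has to be partitioned and exchanged explicitly. A minimal sketch (not from the slides; file name and problem size are made up) using OpenMP:

```c
/* Illustrative only: parallel vector sum on one shared-memory (SMP) node.
 * Compile, e.g.:  gcc -std=c99 -fopenmp -O2 vecsum.c -o vecsum
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    const int n = 10000000;            /* made-up problem size */
    double *a = malloc(n * sizeof(double));
    double sum = 0.0;

    for (int i = 0; i < n; i++)
        a[i] = 1.0 / (i + 1.0);

    /* One directive spreads the loop over all cores of the node. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    free(a);
    return 0;
}
```

The equivalent distributed-memory version would need explicit domain decomposition plus a reduction across processes, which is exactly the extra programming effort the slide alludes to.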

Why a cluster?
- Scalability
- Total available physical RAM
- Reduced cost
- But ...

But ... it requires:
- Having an application which exploits the parallel capabilities
- Studying the application or applications which will run on the cluster

Things to include in design

Property of code                    | Essential component
------------------------------------|---------------------------
CPU bound                           | Fast computing unit
Memory bound                        | Large memory, fast access
Global flow of data in parallel app | Fast interconnect

Our choices

Property of code               | Essential component       | Choice
-------------------------------|---------------------------|--------------------------------------
Computationally intensive, FP  | Fast computing unit       | 64-bit dual-core Opteron, Rev. F
Large matrices                 | Large memory, fast access | 8 GB/node
Finite element, spectral codes | Fast interconnect         | InfiniBand DDR (20 Gb/s, low latency)

Other requirements
- Space, power, cooling constraints, strength of floors
- Software configuration:
  1. Operating system
  2. Compilers and application development tools
  3. Load balancing and job scheduling
  4. System management tools

Configuration
[diagram: compute nodes (processors P with memory M) connected through an InfiniBand switch]

Before finalizing our choice ... one should check, on a similar system:
- Single-processor peak performance
- InfiniBand interconnect performance (see the ping-pong sketch below)
- SMP behaviour
- Behaviour of non-commercial parallel applications
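A common way to check the interconnect item on a candidate system is a two-process MPI ping-pong. The sketch below assumes only standard MPI and the usual mpicc/mpirun wrappers; the message size and repetition count are arbitrary.

```c
/* Two-process MPI ping-pong: rough point-to-point latency and bandwidth.
 * Build/run (adjust to the local MPI installation):
 *   mpicc -std=c99 -O2 pingpong.c -o pingpong
 *   mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int nbytes = 1 << 20;        /* 1 MiB message, arbitrary */
    const int reps = 100;
    char *buf = malloc(nbytes);
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double one_way = (t1 - t0) / (2.0 * reps);   /* seconds per one-way transfer */
        printf("one-way time %.1f us, bandwidth %.1f MB/s\n",
               one_way * 1e6, nbytes / one_way / 1e6);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```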

Parallel applications issues
- Execution time
- Parallel speedup: Sp = T1 / Tp (worked example below)
- Scalability
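For reference, speedup on p processors is Sp = T1/Tp, and parallel efficiency is Ep = Sp/p. A tiny illustration with invented timings:

```c
/* Speedup and efficiency from measured run times (the timings here are invented). */
#include <stdio.h>

int main(void)
{
    double t1 = 1200.0;   /* serial (1-process) run time, seconds -- hypothetical */
    double tp = 180.0;    /* run time on p processes, seconds -- hypothetical     */
    int p = 8;

    double sp = t1 / tp;      /* Sp = T1 / Tp */
    double ep = sp / p;       /* Ep = Sp / p  */

    printf("Sp = %.2f, Ep = %.2f\n", sp, ep);   /* prints Sp = 6.67, Ep = 0.83 */
    return 0;
}
```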

Benchmark design
- Must give a good estimate of the performance of your application
- Acceptance test: should match all its components

Comparison of performance: LAPACK program, N = 9000
- Carmel: 487 Mflops
- Nanco: a ratio of 7.8 over Carmel!

Execution time of Monte Carlo parallel code (MPI)
- Nanco: 4389 s (~1 hr)
- Carmel: ~6 hrs!

What did work
- Running MPI code interactively
- Running a serial job through the queue
- Compiling C code with MPI (smoke test below)
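For the two C/MPI items, the usual smoke test is a hello-world compiled with the MPI wrapper and launched interactively on a few cores. The wrapper names below (mpicc, mpirun) are the common defaults and may differ on Nanco:

```c
/* hello_mpi.c: checks that the MPI compiler wrapper and interactive launch work.
 * Typical commands (names may differ on the actual cluster):
 *   mpicc -O2 hello_mpi.c -o hello_mpi
 *   mpirun -np 4 ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    printf("rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```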

What did not work
- Compiling F90 or C++ code with MPI
- Running MPI code through the queue
- Queues do not do accounting per CPU

Parallel performance results
- Theoretical peak: 2.1 Tflops
- Nanco performance on HPL: 0.58 Tflops (efficiency worked out below)
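Taken together, the two figures on this slide correspond to an HPL efficiency (achieved Rmax divided by theoretical Rpeak) of roughly 28%; the arithmetic, spelled out:

```c
/* HPL efficiency = achieved Rmax / theoretical Rpeak, using the slide's figures. */
#include <stdio.h>

int main(void)
{
    double rpeak = 2.1;   /* theoretical peak, Tflops (from the slide) */
    double rmax  = 0.58;  /* measured HPL result, Tflops (from the slide) */

    printf("HPL efficiency = %.1f%%\n", 100.0 * rmax / rpeak);   /* about 27.6% */
    return 0;
}
```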

Comparison with Sun Benchmark

Execution time: comparison of compilers

Performance with different optimizations

Conclusions from acceptance tests
- The new gcc (gcc 4) is faster than PathScale for some applications
- MPI collective communication functions are implemented differently in the various MPI versions (timing sketch below)
- Disk access times are crucial: use attached storage when possible
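The observation about collectives can be quantified by timing the same call under each MPI version. A sketch that times MPI_Allreduce (message size and repetition count are arbitrary; only standard MPI calls are used):

```c
/* Times MPI_Allreduce; rebuild against different MPI libraries to compare them.
 * Build/run:  mpicc -std=c99 -O2 allreduce_time.c -o allreduce_time
 *             mpirun -np 16 ./allreduce_time
 */
#include <mpi.h>
#include <stdio.h>

#define N    1024     /* doubles per reduction, arbitrary */
#define REPS 1000

int main(int argc, char **argv)
{
    double in[N], out[N];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < N; i++)
        in[i] = (double)i;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++)
        MPI_Allreduce(in, out, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("MPI_Allreduce of %d doubles: %.1f us per call\n",
               N, (t1 - t0) / REPS * 1e6);

    MPI_Finalize();
    return 0;
}
```

Rebuilding or relinking the same source against each installed MPI library gives a direct comparison of their collective implementations.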

Scheduling decisions
- Assessing priorities between user groups
- Assessing the parallel efficiency of different job types (MPI, serial, OpenMP) and of commercial software, and designing special queues for them
- Avoiding starvation by giving weight to the urgency parameter

Observations during production mode
- Assessing users' understanding of the machine: support in writing scripts and in efficient parallelization
- Lack of visualization tools: writing of a script to show the current usage of the cluster

Utilization of cluster

Utilization of Nanco, September 2008

Nanco jobs by type

Conclusion
- Correct benchmark design is crucial to test the capabilities of the proposed architecture
- Acceptance tests make it possible to negotiate with vendors and give insights on future choices
- Only after several weeks of running the cluster at full capacity can we make informed decisions on the management of the cluster