
COCOA(1/19) Real Time Systems LAB. COCOA MAY 31, 2001 김경임, 박성호

COCOA(2/19) Real Time Systems LAB. Contents Background COCOA Overview System Architecture Key Technologies Application Area Evaluation Conclusion References

COCOA(3/19) Real Time Systems LAB. Background A thesis in aerospace engineering, Pennsylvania State Univ., by Anirudh Modi, 1999 –"Unsteady separated flow simulations using a cluster of workstations" Needed a suitable platform for efficient and accurate runs of PUMA (a parallel flow solver) –Resolving several steady solutions –A fully three-dimensional unsteady separated flow around a sphere PUMA: Parallel Unstructured Maritime Aerodynamics Financial support: the Rotorcraft Center of Excellence (RCOE) at Penn State

COCOA(4/19) Real Time Systems LAB. COCOA Overview The COst effective COmputing Array (COCOA) A Beowulf cluster with 50 processors Brings low-cost parallel computing –The whole system cost approximately $100,000 (1998 US dollars) Performance –Benchmarks show it was almost twice as fast as the Penn State IBM SP (older RS/6000 nodes) supercomputer for these applications

COCOA(5/19) Real Time Systems LAB. System Architecture Computing nodes (26 Dell WS-410 workstations) –Dual 400MHz Intel Pentium II processors w/ 512KB L2 cache –512MB SDRAM –4GB UW-SCSI2 disk –3Com 3c905B 100Mbits/sec Fast Ethernet card –32x SCSI CD-ROM drive –1.44MB FDD –Cables In addition, –One Bay Networks 450T 24-way 100Mbits/sec switch –Two 16-way monitor/keyboard/mouse switches –Four 500 kVA APC UPSs –For the server: one monitor, keyboard, mouse, and an extra 54GB UW-SCSI2 HDD

COCOA(6/19) Real Time Systems LAB. System Architecture cont. Setting up H/W

COCOA(7/19) Real Time Systems LAB. System Architecture cont. Operating system –RedHat Linux 5.1 Software –Base packages from RedHat Linux 5.1, Kernel # –Freeware GNU C/C++ compilers (gcc, pgcc) –Fortran77/90 compiler & debugger from the Portland Group –Freeware MPI libraries for parallel programming in C/C++/Fortran77/90 –ssh for secure access –DQS v3.0, a queueing system –Scientific visualization software TECPLOT from Amtec Corp.

COCOA(8/19) Real Time Systems LAB. Key Technologies Beowulf cluster –A system that usually consists of one server node and one or more client nodes connected together via Ethernet or some other fast network –Developed for large-scale computing in fields such as aerodynamics, atmospheric science, and physics –First developed in 1994 at NASA –Makes low-price supercomputing possible High-performance/low-price processors High-speed network devices available –Numerous Beowulf clusters have been developed Used in various computational science fields

COCOA(9/19) Real Time Systems LAB. Key Technologies cont. DQS (Distributed Queueing System) –Developed as an experimental batch queueing system at the Supercomputer Computations Research Institute, Florida State Univ. –Provides a single coherent view of resource allocation and management MPI (Message Passing Interface) –The standard for message-passing parallel programming (a minimal sketch follows) SSH (Secure Shell) –Program for logging into and executing commands on a remote machine –Provides secure, encrypted communication between untrusted hosts over an insecure network
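
To make the MPI model concrete, here is a minimal hypothetical sketch (not from the slides; it uses only basic MPI-1 calls of the kind the freeware MPI libraries of that era provided) in which every client rank sends one value to the server rank:

    /* Hypothetical sketch: the basic MPI message-passing model.  Every
     * client rank computes one value and sends it to the server rank
     * (rank 0), which receives the values one by one. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, n;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            MPI_Status st;
            for (n = 1; n < size; n++) {
                int val;
                MPI_Recv(&val, 1, MPI_INT, n, 0, MPI_COMM_WORLD, &st);
                printf("rank 0 received %d from rank %d\n", val, n);
            }
        } else {
            int val = rank * rank;   /* stand-in for a locally computed result */
            MPI_Send(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Built with mpicc and launched with mpirun -np <N>, this runs one process per allocated processor; on COCOA such jobs were submitted through DQS rather than started by hand.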

COCOA(10/19) Real Time Systems LAB. Application Area Analysis of maritime aerodynamics –Analyzing flows over complex configurations (like ships and helicopter fuselages) –Uses PUMA –Details of the problem: a helicopter can safely land on a frigate in the North Sea only 10 percent of the time in winter

COCOA(11/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics) Program for the analysis of internal and external non-reacting compressible flows over arbitrarily complex 3D geometries Written entirely in ANSI C using the MPI library for message passing, and hence highly portable while giving good performance

COCOA(12/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics) cont. Uses domain decomposition –Domain decomposition Distributes data across processes, with each process performing approximately the same operation on its own data Problem-level parallelism, not loop-level (SIMD) parallelism Minimizes communication cost –Functional decomposition Divides a problem into several distinct tasks that may be executed in parallel Parallelization in PUMA (see the sketch below) –Each compute node reads its own portion of the grid file at startup –Each compute node generates the flow solution over its portion of the grid in parallel
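
The slides describe the pattern in words only; the following hypothetical 1D sketch (the grid size and the averaging stencil are assumptions, not PUMA's actual solver) shows each rank holding its own slice of the domain, exchanging only the boundary cells with its neighbours, and then applying the same operation to its local data:

    /* Hypothetical 1D domain-decomposition sketch.  Each rank owns one
     * slice of the domain, swaps boundary ("ghost") cells with its
     * neighbours, and applies the same update to its local cells. */
    #include <mpi.h>

    #define NLOCAL 1000                /* assumed cells per rank */

    int main(int argc, char **argv)
    {
        double u[NLOCAL + 2], unew[NLOCAL + 2];   /* +2 ghost cells */
        int rank, size, i, step;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank initializes (in PUMA: reads) only its own portion. */
        for (i = 0; i <= NLOCAL + 1; i++)
            u[i] = (double)(rank * NLOCAL + i);

        for (step = 0; step < 10; step++) {
            int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
            int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

            /* Exchange ghost cells; MPI_Sendrecv avoids send/send deadlock. */
            MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                         &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, &st);
            MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                         &u[0], 1, MPI_DOUBLE, left, 1,
                         MPI_COMM_WORLD, &st);

            /* The same operation on every rank's slice: a smoothing stencil. */
            for (i = 1; i <= NLOCAL; i++)
                unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
            for (i = 1; i <= NLOCAL; i++)
                u[i] = unew[i];
        }

        MPI_Finalize();
        return 0;
    }

Only two cells per neighbour pair cross the network each step, which is why decomposing by subdomain (rather than by loop iteration) keeps the communication-to-computation ratio low.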

COCOA(13/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics) cont.

COCOA(14/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics) cont. Modifications to PUMA –Modified PUMA to read several hundred lines at a time and broadcast the combined data to every processor using a reasonably sized buffer –Modified PUMA's MPI communication to combine several small messages into one before starting communication (sketched below) [Figure: Mbits/sec vs. packet size on COCOA for an MPI_Send/Recv test]
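
The actual modification lives inside PUMA's I/O and communication layers; the following minimal hypothetical sketch of the combining idea (the message counts and sizes are assumptions) packs many small payloads into one buffer so that a single send replaces many:

    /* Hypothetical sketch of the message-combining optimization: small
     * payloads are copied into one buffer so a single MPI_Send replaces
     * many, amortizing the per-message latency seen in the plot above. */
    #include <string.h>
    #include <mpi.h>

    #define NMSG   64              /* assumed small messages to combine */
    #define MSGLEN 16              /* assumed doubles per small message */

    /* Sender: pack NMSG small payloads into one buffer, send once. */
    void send_combined(double payload[NMSG][MSGLEN], int dest)
    {
        double buf[NMSG * MSGLEN];
        int i;
        for (i = 0; i < NMSG; i++)
            memcpy(&buf[i * MSGLEN], payload[i], MSGLEN * sizeof(double));
        MPI_Send(buf, NMSG * MSGLEN, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    }

    /* Receiver: one receive, then unpack the individual payloads. */
    void recv_combined(double payload[NMSG][MSGLEN], int src)
    {
        double buf[NMSG * MSGLEN];
        MPI_Status st;
        int i;
        MPI_Recv(buf, NMSG * MSGLEN, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &st);
        for (i = 0; i < NMSG; i++)
            memcpy(payload[i], &buf[i * MSGLEN], MSGLEN * sizeof(double));
    }

    int main(int argc, char **argv)
    {
        double data[NMSG][MSGLEN] = {{0.0}};
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size >= 2) {           /* needs at least two processes */
            if (rank == 0)      send_combined(data, 1);
            else if (rank == 1) recv_combined(data, 0);
        }
        MPI_Finalize();
        return 0;
    }

With these assumed sizes, one 8KB send replaces 64 sends of 128 bytes each; on Fast Ethernet, where small packets are latency-bound (as the plot above shows), that difference dominates.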

COCOA(15/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics) cont. [Figure: Improvement in PUMA performance after combining several small MPI messages into one]

COCOA(16/19) Real Time Systems LAB. Evaluation [Figure: Total Mflops vs. number of processors on COCOA for the PUMA test case] [Figure: Speed-up vs. number of processors on COCOA for the PUMA test case]

COCOA(17/19) Real Time Systems LAB. Evaluation cont. [Figure: NAS Parallel Benchmark on COCOA: comparison with other machines for the Class "C" LU test]

COCOA(18/19) Real Time Systems LAB. Conclusion Beowulf-class supercomputer (PCs, Linux, MPI, DQS, SSH) Cost-effective supercomputer for numerical simulations –Almost twice as fast as the Penn State IBM SP supercomputer for our production codes, including PUMA, given the same number of processors, while being built at a fraction of the cost ($100,000 in 1998 US dollars) Suitable only for numerical simulations (weather, fluids, ...) that do not have high communication-to-computation ratios, because of the cluster's high communication latency Good scalability with most of the MPI applications used The objective, to build a cost-effective supercomputer for the numerical simulations dealt with at Penn State, has been fulfilled

COCOA(19/19) Real Time Systems LAB. References COCOA : NAS Parallel Benchmarks : Beowulf : RedHat : MPI : DQS : Tons of references…