Parallel Computers Past and Present. Yenchi Lin, Apr 17, 2003.



Outline
- Concepts/Background on Parallel Computers
- Connection Machines
- Earth Simulator
- Conclusion

Quick architecture overview
- SIMD, MIMD
- Shared memory, distributed memory
- MPP, PVP, SMP
- NOW: Network of Workstations (clusters)

SIMD, MIMD
- SIMD – Single Instruction, Multiple Data
  - All processors perform the same instruction on different pieces of data
  - Some processors can be masked out from executing certain instructions
- MIMD – Multiple Instruction, Multiple Data
  - Each processor executes its own instruction stream on its own data
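The difference between the two models can be sketched in a few lines of Python. This is only an illustration of the definitions on this slide: the function names `simd_step` and `mimd_step` are hypothetical, and a Python list stands in for the processor array.

```python
from typing import Callable, List, Sequence


def simd_step(op: Callable[[int], int],
              data: Sequence[int],
              mask: Sequence[bool]) -> List[int]:
    """Apply ONE instruction to every lane; masked-out lanes keep their value."""
    return [op(x) if active else x for x, active in zip(data, mask)]


def mimd_step(programs: Sequence[Callable[[int], int]],
              data: Sequence[int]) -> List[int]:
    """Each processor applies its OWN instruction to its own datum."""
    return [prog(x) for prog, x in zip(programs, data)]


data = [1, 2, 3, 4]

# SIMD: every lane doubles its value, except lane 2, which is masked out.
doubled = simd_step(lambda x: x * 2, data, [True, True, False, True])

# MIMD: four processors, four different instructions.
mixed = mimd_step(
    [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x ** 2],
    data,
)
```

In the SIMD call the mask plays the role of the "masked out" processors mentioned above; in the MIMD call there is no shared instruction at all.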

Memory
- Shared Memory
  - Single, unified address space across all processors
- Distributed Memory
  - Each processor has its own address space
- Hybrid
  - Multiple processors within a computing node share the same address space, while the whole system has many different address spaces

Processors
- PVP – parallel vector processors
  - Cray, NEC, Hitachi
- MPP – massively parallel processors
  - Connection Machines
- SMP – symmetric multiprocessors
  - Sun Fire, DEC (Compaq/HP) AlphaServer

D.E. Culler, J.P. Singh, A. Gupta “Parallel Computer Architecture – A Hardware/Software Approach”

Trends (cont.)
D.E. Culler, J.P. Singh, A. Gupta “Parallel Computer Architecture – A Hardware/Software Approach”
- The trend of MPPs overtaking SMPs has continued, as the number of NOWs (clusters) in the TOP500 list grows.

Connection Machines
- Invented by Danny Hillis of Thinking Machines Corp. while he was at MIT
- Originally designed to run artificial intelligence applications
  - First working application on the CM-1: Game of Life
- CM-1 (1985), CM-2 (1986) and CM-5 (1992)
- Richard Feynman helped build the first CM-1s
- At its peak, 70 machines were installed around the world, all of them in the TOP500 list
- Thinking Machines Corp. filed for bankruptcy in 1994, became a pure software company in 1996, and was bought by Oracle in 1999

CM-2 – 1986
- SIMD
- Hypercube connection
- 1-bit processors in groups of 16
- 9 dimensions for the 8192-processor configuration, 12 dimensions for the 65536-processor configuration
- Programming languages – C*, *Lisp, CM Fortran
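The hypercube dimension follows from the processor count once the group size is fixed. The sketch below assumes 16 processors share one hypercube vertex, as the Sprint node slide states ("16 in a group"); the function name `hypercube_dimension` is hypothetical.

```python
def hypercube_dimension(num_processors: int, processors_per_vertex: int = 16) -> int:
    """Return d such that 2**d vertices hold all processors.

    Assumes each hypercube vertex (router) serves `processors_per_vertex`
    processors; 16 is taken from the Sprint node slide.
    """
    vertices, remainder = divmod(num_processors, processors_per_vertex)
    # The vertex count must be an exact power of two for a full hypercube.
    if remainder or vertices < 1 or vertices & (vertices - 1):
        raise ValueError("vertex count must be a power of two")
    return vertices.bit_length() - 1


small = hypercube_dimension(8192)    # 8192 / 16 = 512 = 2**9 vertices
large = hypercube_dimension(65536)   # 65536 / 16 = 4096 = 2**12 vertices
```

Under that assumption the 8192-processor machine is a 9-dimensional cube, and a 12-dimensional cube corresponds to 65536 processors.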

Sprint Node in CM-2
- 1-bit serial processors, 16 in a group, two groups on the board
- The two groups share the same memory and floating point unit
- Router has limited processing power
- 12-degree connectivity!

Hypercube Connection in CM-2
- Maximum hop count in a hypercube = dimension of the hypercube
- The router randomly picks the next hop
- High wire count
- [Figure: four-dimensional hypercube]
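The hop-count claim can be checked with a small model in which node addresses are bit strings and each hop flips one bit. Reading "randomly pick the next hop" as "pick at random among the dimensions in which the current and destination addresses still differ" is an assumption made for this sketch, not a description of the actual CM-2 router; `hop_count` and `random_route` are hypothetical names.

```python
import random
from typing import List


def hop_count(src: int, dst: int) -> int:
    """Minimum hops between two hypercube nodes = number of differing address bits."""
    return bin(src ^ dst).count("1")


def random_route(src: int, dst: int, rng: random.Random) -> List[int]:
    """Walk from src to dst, fixing one randomly chosen differing bit per hop."""
    path = [src]
    current = src
    while current != dst:
        differing = current ^ dst
        # Dimensions (bit positions) along which we still need to travel.
        candidates = [bit for bit in range(differing.bit_length())
                      if (differing >> bit) & 1]
        current ^= 1 << rng.choice(candidates)
        path.append(current)
    return path


# In a four-dimensional hypercube the farthest pair is 0000 -> 1111.
path = random_route(0b0000, 0b1111, random.Random(0))
worst_case = max(hop_count(a, b) for a in range(16) for b in range(16))
```

Whatever order the dimensions are chosen in, the route length equals the number of differing bits, so the worst case over all pairs equals the dimension.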

CM-5 – 1992
- Distributed memory multiprocessor
- SPARC + custom vector units
- Fat Tree structure
- Programming languages – C*, *Lisp, CM Fortran, HPF, C++, etc.
- Supports partitioning, multi-user

Processing Element in CM-5
- 33 MHz SPARC
- Vector processor
- Network interface
- 32 MB memory
- Connected using Sun MBus
- Network access is treated the same as memory access, which is expensive for larger messages

Fat-Tree of CM-5
- Three networks (data, control and diagnostic), synchronized on a 40 MHz clock
- 4-ary fat tree, with each processor as a leaf
  - Two parents per child for the first two levels
  - Four parents per child for higher levels
- [Figure: data network of CM-5]
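A rough model of the 4-ary fat tree shows how far a message must climb and how many alternative upward routes exist. The sketch assumes leaves are numbered 0, 1, 2, ... left to right, that "the first two levels" means the two link levels nearest the leaves, and that upward choices multiply level by level; these are interpretations of the slide, and all function names are hypothetical.

```python
def lca_level(leaf_a: int, leaf_b: int, arity: int = 4) -> int:
    """Levels a message must climb before two leaves share a subtree (0 = same leaf)."""
    level = 0
    while leaf_a != leaf_b:
        leaf_a //= arity
        leaf_b //= arity
        level += 1
    return level


def parents_at(level: int) -> int:
    """Parents per child when climbing into `level` (1 = first level above the leaves)."""
    return 2 if level <= 2 else 4


def upward_paths(level: int) -> int:
    """Number of distinct upward routes from a leaf to `level` under this model."""
    paths = 1
    for current in range(1, level + 1):
        paths *= parents_at(current)
    return paths


# Leaves 0 and 16 sit in different 16-leaf subtrees, so the message climbs 3 levels
# (and descends 3 more), with 2 * 2 * 4 possible ways up.
climb = lca_level(0, 16)
choices = upward_paths(climb)
```

The growing number of upward choices is what makes the tree "fat": traffic that has to cross higher levels has more links to spread over.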

Transition from CM-2 to CM-5
- 1-bit serial processors -> 32-bit SPARCs with 64-bit vector units
- SIMD -> MIMD
  - Use SPMD to emulate SIMD behavior
- Hypercube -> Fat-Tree
  - Randomness preserved by random routing
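SPMD (single program, multiple data) means every node runs the same program on its own share of the data and synchronizes at agreed points, which gives SIMD-like lock-step behavior on MIMD hardware. The following is a minimal sketch using Python threads as stand-ins for nodes; it is not CM-5 code, and `spmd_sum_of_squares` is a hypothetical name.

```python
import threading
from typing import List


def spmd_sum_of_squares(data: List[int], num_nodes: int) -> List[int]:
    """Run one program on `num_nodes` threads; each returns the global sum of squares."""
    partial = [0] * num_nodes   # one slot per node for its local result
    results = [0] * num_nodes
    barrier = threading.Barrier(num_nodes)

    def program(rank: int) -> None:
        local = data[rank::num_nodes]               # this node's share of the data
        partial[rank] = sum(x * x for x in local)   # phase 1: purely local work
        barrier.wait()                              # all nodes finish phase 1 first
        results[rank] = sum(partial)                # phase 2: combine partial sums

    threads = [threading.Thread(target=program, args=(rank,))
               for rank in range(num_nodes)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


totals = spmd_sum_of_squares(list(range(10)), num_nodes=4)
```

The barrier is the only point where nodes wait for each other; between barriers each node runs independently, which is the MIMD part of the picture.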

Earth Simulator – 2002
- Collection of modified NEC SX nodes, 8-way each
- 12.3 GB/s x 2 network
- Theoretical throughput: 40 TFlops
- Max throughput: 36 TFlops running Linpack
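The two throughput figures give the Linpack efficiency directly, and together with the node count they imply a per-processor peak. The sketch uses only numbers from these slides (the 640-node count comes from the organization slide); because the 40 TFlops figure is rounded, the per-processor value is approximate.

```python
PEAK_TFLOPS = 40.0          # theoretical throughput from the slide
LINPACK_TFLOPS = 36.0       # measured Linpack throughput from the slide
NODES = 640                 # node count from the organization slide
PROCESSORS_PER_NODE = 8     # "8 way each"

# Fraction of the theoretical peak actually reached on Linpack.
efficiency = LINPACK_TFLOPS / PEAK_TFLOPS

# Peak per arithmetic processor implied by the rounded system peak.
total_processors = NODES * PROCESSORS_PER_NODE
per_processor_gflops = PEAK_TFLOPS * 1000 / total_processors
```

A Linpack efficiency of about 90% is consistent with the recent TeraFlops-class design the conclusion contrasts with earlier machines.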

Programming Models of ES
- MPI/HPF on node level and process level
- OpenMP, threads
- Automatic vectorization

Organization of ES
- 320 processor node (PN) cabinets, 2 nodes each
- 65 interconnect (IN) cabinets
- Crossbar of 640 nodes
  - 12.3 GB/s x 2 (bidirectional) node-to-node, 8 TB/s aggregate
- 900 TB disk space, 1.6 PB tape storage
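The node count and the aggregate bandwidth on this slide can be cross-checked with simple arithmetic. The sketch assumes the aggregate figure counts one direction of each node's crossbar link; that reading is an assumption, chosen because it reproduces the quoted 8 TB/s.

```python
PN_CABINETS = 320
NODES_PER_CABINET = 2
LINK_GB_PER_S = 12.3        # per node, per direction

# 320 cabinets with 2 nodes each should give the 640-node crossbar.
nodes = PN_CABINETS * NODES_PER_CABINET

# Aggregate bandwidth if every node drives its link in one direction.
aggregate_tb_per_s = nodes * LINK_GB_PER_S / 1000
```

The result is a little under 7.9 TB/s, which rounds to the 8 TB/s quoted above.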

PN of ES Arithmetic Processor (SX-6) Memory (512MB)

Arithmetic Processor
- Total of 640 x 8 = 5120 arithmetic processors

Remarks
- Initial cost:
  - Development: 40 billion yen (USD $400M)
  - Physical building: 7 billion yen (USD $70M)
- Operating cost:
  - Maintenance: 8 billion yen/year (USD $80M), i.e. USD $2.54/sec
  - Electricity: 800 million yen/year (USD $8M)
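The per-second maintenance figure can be reproduced from the annual cost. The sketch assumes a 365-day year and the exchange rate of 100 yen per US dollar that the slide's own conversions imply.

```python
MAINTENANCE_YEN_PER_YEAR = 8_000_000_000
MAINTENANCE_USD_PER_YEAR = 80_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumes a non-leap year

# Exchange rate implied by the slide's yen and dollar figures.
yen_per_usd = MAINTENANCE_YEN_PER_YEAR / MAINTENANCE_USD_PER_YEAR

# Maintenance cost for every second the machine is kept running.
usd_per_second = MAINTENANCE_USD_PER_YEAR / SECONDS_PER_YEAR
```

Rounded to cents, this matches the USD $2.54/sec quoted for maintenance.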

Eye Candies
- [Photo: PN cabinet, 9 APs in one]
- [Photo: back of a PN cabinet]
- [Photo: 1 AP, 9 in one cabinet]
- [Photo: SX-6i]

Conclusion
- Connection Machines were interesting
- The Earth Simulator is also interesting
- Early designs versus recent designs
  - GigaFlops vs. TeraFlops
- When will Americans take back the crown in supercomputing?

References
- TOP500.org
- Earth Simulator
- D.E. Culler, J.P. Singh and A. Gupta. “Parallel Computer Architecture – A Hardware/Software Approach” 1999
- Hennessy, Patterson. “Computer Architecture – A Quantitative Approach, 2nd Ed.” 2002
- D. J. Kerbyson, A. Hoisie, H. Wasserman. “A Comparison Between the Earth Simulator and AlphaServer Systems using Predictive Application Performance Models” 2002
- Thinking Machines Corp. “The Network Architecture of the Connection Machine CM-5” 1992
- G. E. Blelloch et al. “A Comparison of Sorting Algorithms for the Connection Machine CM-2” 1991