The Architecture of Earth Simulator


The Architecture of Earth Simulator Gülfem IŞIKLAR 09.12.2004

Outline: Introduction, Supercomputers, Vector Processing, The Earth Simulator, Conclusion

Introduction. Applications that call for more powerful computers: image processing, testing car crashes, medical diagnosis, creating new chemical substances, climate changes, gene technology, space exploration, global warming effects.

Supercomputers. Supercomputer: a computer that operates at or near the top speed of currently produced computers. The first supercomputers were introduced in the 1960s, designed primarily by Seymour Cray at Control Data Corporation (CDC).

Supercomputers. In November 2004, according to the TOP500 list, the top five supercomputers were:
1. BlueGene/L, DOE/IBM, USA: 70.72 TFLOPS
2. Columbia, NASA/Ames, USA: 51.87 TFLOPS
3. Earth Simulator, Earth Simulator Center, Yokohama, Japan: 35.86 TFLOPS
4. MareNostrum, Barcelona Supercomputing Center, Spain: 20.53 TFLOPS
5. Thunder, Lawrence Livermore National Laboratory, USA: 19.94 TFLOPS

Vector Processing. Vector processing suits programs that apply the same computation to a large block of data. Vector computers have instructions that operate on strings of numbers arranged as one-dimensional arrays (vectors). A single instruction can specify one operation on all elements of a vector. In essence, vector processing is a form of the Single Instruction Multiple Data (SIMD) parallel processing technique.

The main point: A single vector instruction represents a lot of basic scalar operations.

Vector Processing Properties. The computation of each result in a vector processor is independent of the computation of previous results. A single vector instruction specifies a great deal of work: it is equivalent to executing an entire loop. Vector instructions that access memory have a known access pattern. If the vector's elements are all adjacent, then fetching the vector from a set of heavily interleaved memory banks works very well.
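The independence property can be illustrated with a small sketch. The function names below are hypothetical and plain Python lists stand in for vectors. The element-wise add gives the same answer in any evaluation order, whereas the running sum carries a dependence from one element to the next and so does not map directly onto a single vector instruction.

```python
def elementwise_add(b, c):
    # a[i] depends only on b[i] and c[i]: every result is independent.
    return [x + y for x, y in zip(b, c)]


def elementwise_add_reversed(b, c):
    # Same computation evaluated back to front; independence means the
    # order of evaluation cannot change the result.
    a = [0] * len(b)
    for i in reversed(range(len(b))):
        a[i] = b[i] + c[i]
    return a


def running_sum(b):
    # a[i] depends on a[i-1]: a loop-carried dependence, so the elements
    # cannot all be computed independently of one another.
    a = []
    total = 0
    for x in b:
        total += x
        a.append(total)
    return a
```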

Vector Programming: Conventional Computer
Initialize I = 1
20 Read B(I)
Read C(I)
Store A(I) = B(I) + C(I)
Increment I = I + 1
If I ≤ 100 goto 20
1. B(1) will be fetched from memory. 2. C(1) will be fetched from memory. 3. A scalar add instruction will operate on B(1) and C(1). 4. A(1) will be stored back to memory. 5. Steps (1) to (4) will be repeated 100 times.

Vector Programming: Vector Computer
A(1:100) = B(1:100) + C(1:100)
1. A vector of values in B(I) will be fetched from memory. 2. A vector of values in C(I) will be fetched from memory. 3. A vector add instruction will operate on pairs of B(I) and C(I) values. 4. A stream of A(I) values will be stored back to memory, one value every clock cycle.
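The contrast between the scalar loop and the single vector statement can be sketched with a toy instruction count. This is an illustrative model, not a measurement of any real machine. It assumes six scalar instructions per loop iteration (two loads, an add, a store, an increment, and a branch) and four vector instructions for the whole array (two vector loads, a vector add, and a vector store). The function names are hypothetical.

```python
def scalar_add(b, c):
    """Element-by-element loop, counting instructions in a toy model."""
    a = [0] * len(b)
    issued = 0
    for i in range(len(b)):
        a[i] = b[i] + c[i]
        # Assumed per-iteration cost: load B(I), load C(I), add,
        # store A(I), increment I, conditional branch.
        issued += 6
    return a, issued


def vector_add(b, c):
    """Whole-array operation, counting instructions in the same toy model."""
    a = [x + y for x, y in zip(b, c)]
    # Assumed cost: vector load B, vector load C, vector add, vector store A.
    issued = 4
    return a, issued


b = list(range(1, 101))
c = list(range(101, 201))
a_scalar, n_scalar = scalar_add(b, c)
a_vector, n_vector = vector_add(b, c)
```

For 100 elements this toy model counts 600 scalar instructions against 4 vector instructions, while both versions produce the same array A.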

Vector Processing. The machine has to fetch and decode far fewer instructions, so the control unit overhead is greatly reduced, and the memory bandwidth needed to perform this sequence of operations is reduced by a corresponding amount. The instruction also provides the processor with a regular source of data. When the vector instruction is initiated, the machine knows it will have to fetch n pairs of operands that are arranged in a regular pattern in memory. With an interleaved memory, the pairs will arrive at a rate of one per cycle, at which point they can be routed directly to a pipelined data unit for processing.
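A rough way to see why interleaving delivers one operand per cycle is to simulate the banks. The sketch below is a simplified model under stated assumptions: word k of the vector lives at address k * stride, addresses map to banks by address modulo the bank count, a bank stays busy for `bank_busy` cycles per access, and at most one request is issued per cycle. The function name and the default of 8 banks are illustrative and are not taken from any real machine.

```python
def cycles_to_fetch(n_words, stride, n_banks=8, bank_busy=8):
    """Cycle at which the last of n_words arrives from interleaved memory."""
    bank_free_at = [0] * n_banks  # earliest cycle each bank accepts a request
    next_issue = 0                # earliest cycle the next request can issue
    done = 0
    for k in range(n_words):
        bank = (k * stride) % n_banks
        # A request waits until its bank has finished the previous access.
        start = max(next_issue, bank_free_at[bank])
        bank_free_at[bank] = start + bank_busy
        done = start + bank_busy
        next_issue = start + 1    # at most one request issued per cycle
    return done


adjacent = cycles_to_fetch(64, stride=1)   # consecutive words, rotating banks
same_bank = cycles_to_fetch(64, stride=8)  # every word lands in one bank
```

With adjacent elements the requests rotate through the banks, so after the initial latency one word arrives per cycle (71 cycles for 64 words in this model). With a stride equal to the bank count every access hits the same bank and the fetch takes 512 cycles.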

The Earth Simulator: Milestones of Development
Initiation: In 1997, the Earth Simulator Research and Development Center was established.
Conceptual design: The design proposed by NEC Corporation was selected by bidding.
2002: The ES achieved a performance of 26.58 TFLOPS running the atmospheric general circulation model AFES, which was the highest performance on record.
November 2004: The ES was ranked third in the TOP500 list.

Design Concepts. A vector architecture should be employed, since it is an efficient architecture for large-scale scientific simulations. The system design should be as compact as possible in order to limit floor space and electric power; as a result, the vector processor should be realized as a one-chip LSI. The memory bandwidth should be 128 TB/s in order to sustain a peak performance of more than 32 TFLOPS, so a distributed main memory system should be used. A single-stage crossbar network should be adopted in order to make the system completely homogeneous. A multiple-job environment should be provided for operation of the ES.
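Taking the two design targets at face value, the implied balance between memory traffic and arithmetic can be worked out directly. The variable names are illustrative and decimal units (1 T = 10^12) are assumed.

```python
TERA = 10**12

memory_bandwidth_bytes_per_s = 128 * TERA  # 128 TB/s design target
peak_flops = 32 * TERA                     # 32 TFLOPS design target

# Bytes of memory traffic available per floating-point operation.
bytes_per_flop = memory_bandwidth_bytes_per_s / peak_flops
```

Under these assumptions the targets correspond to 4 bytes of memory bandwidth per floating-point operation.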

The Earth Simulator

The Earth Simulator Building

The Earth Simulator

The Arithmetic Processor (AP). Each AP consists of a 4-way superscalar unit (SU), a vector unit (VU), and a main memory control unit on a single LSI chip. Each SU is a superscalar processor with a 64 KB instruction cache, a 64 KB data cache, and 128 general-purpose scalar registers. Each VU has 72 vector registers, each of which holds 256 vector elements, along with 8 sets of six different types of vector pipelines.
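Because a vector register holds 256 elements, a longer array has to be processed in register-sized pieces. One common way to do this is strip-mining, sketched below. This is a general illustration of the technique, not a description of how the ES compiler actually schedules loops. The function names are hypothetical and a list slice stands in for one vector instruction.

```python
VECTOR_LENGTH = 256  # elements per vector register, as stated on the slide


def strip_mine(n, vl=VECTOR_LENGTH):
    """Yield (start, length) chunks covering range(n), each at most vl long."""
    for start in range(0, n, vl):
        yield start, min(vl, n - start)


def vector_add_strip_mined(b, c):
    """Add two equal-length lists one register-sized chunk at a time."""
    a = [0] * len(b)
    for start, length in strip_mine(len(b)):
        end = start + length
        # One slice operation models one vector instruction on the chunk.
        a[start:end] = [x + y for x, y in zip(b[start:end], c[start:end])]
    return a


chunks = list(strip_mine(1000))
```

For a 1000-element array the chunks are three full strips of 256 elements followed by a final strip of 232.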

The Processor Nodes (PN). The 640 processor nodes are connected via a 640 x 640 single-stage crossbar switch.
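A single-stage crossbar can be pictured as a grid with one crosspoint per (source, destination) pair. In this idealized model, which ignores how the real switch is physically built, two transfers conflict only when they target the same destination port, so any permutation of nodes can communicate at once. The helper names below are hypothetical.

```python
from collections import Counter

N_NODES = 640  # processor nodes, as stated on the slide


def crosspoints(n=N_NODES):
    """Number of crosspoints in an idealized n x n single-stage crossbar."""
    return n * n


def output_conflicts(transfers):
    """Return destination ports requested by more than one source.

    transfers is a list of (source, destination) node pairs.
    """
    counts = Counter(dst for _, dst in transfers)
    return sorted(dst for dst, count in counts.items() if count > 1)


# Every node sends to its right-hand neighbour: a permutation, so no conflicts.
ring_shift = [(i, (i + 1) % N_NODES) for i in range(N_NODES)]
```

Here `output_conflicts(ring_shift)` is empty, while `output_conflicts([(0, 5), (1, 5), (2, 7)])` reports port 5 as contended.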

The Memory System. The memory system in a PN is shared equally by 8 APs and is configured with 32 main memory units, each of which has one memory port and is interconnected with a crossbar switch. Each processor within a node can access all 32 memory ports when vector load/store instructions are issued. Each processor has a data transfer rate of 32 GB/s to the memory devices, which gives an aggregate throughput of 256 GB/s per node.
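The per-node figure follows from simple multiplication, and the same arithmetic can be extended to the whole machine. The sketch below uses the 640-node count from the processor-node slide and assumes decimal units (1 TB = 1000 GB). The variable names are illustrative.

```python
APS_PER_NODE = 8        # arithmetic processors sharing one node's memory
AP_BANDWIDTH_GB_S = 32  # per-processor transfer rate in GB/s
N_NODES = 640           # processor nodes in the system

node_bandwidth_gb_s = APS_PER_NODE * AP_BANDWIDTH_GB_S
system_bandwidth_gb_s = node_bandwidth_gb_s * N_NODES
system_bandwidth_tb_s = system_bandwidth_gb_s / 1000
```

This reproduces the 256 GB/s per node stated above and, under the stated assumptions, gives 163.84 TB/s summed over all nodes.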

The Interconnection System

Conclusion. There are two major application groups in the ES project:
1. High-resolution atmospheric and oceanographic models: global models to predict global warming and El Niño events, regional models to predict the Asian monsoon and typhoons, and local models to predict weather disasters such as torrential rainfall and downbursts.
2. Applications in solid earth science: global models to describe long-range crustal movements, regional models to understand the mechanisms of seismicity and seismic wave propagation, and local models to understand the migration of underground water and the transfer of materials in strata.
