Parallel Programming on the SGI Origin2000
Moshe Goldberg, Taub Computer Center, Technion
Mar 2004 (v1.2)
With thanks to Igor Zacharov / Benoit Marchand, SGI

Parallel Programming on the SGI Origin2000
1) Parallelization Concepts
2) SGI Computer Design
3) Efficient Scalar Design
4) Parallel Programming - OpenMP
5) Parallel Programming - MPI

2) SGI Computer Design

Origin2000/3000 architecture features
Important hardware and software components:
* node board: processors + memory
* node interconnect topology and configurations
* scalability of the architecture
* directory-based cache coherency
* single system image components

Origin2000 node board

Origin node board
HUB crossbar ASIC:
- Single chip integrates all four functions:
  * processor interface: two R1xK (R10000/R12000) processors on the same bus
  * memory interface, integrating the memory controller and (directory) cache coherency
  * interface to the CrayLink Interconnect to other nodes in the system
  * interface to I/O devices with XIO-to-PCI bridges
- Memory access characteristics:
  * read bandwidth: 460 MB/s sustained for a single processor
  * average access latency: 315 ns to restart the processor pipeline
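The bandwidth and latency figures above are the kind of numbers a simple streaming-read microbenchmark can approximate. The sketch below is a generic illustration in C, not taken from the course material; the array size and timing method are assumptions.

/* Minimal streaming-read bandwidth sketch (illustrative only).
 * Sums a large array and reports MB/s; the 256 MB working set is an
 * assumption chosen to be much larger than the L2 cache. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (32 * 1024 * 1024)    /* 32M doubles = 256 MB */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double sum = 0.0, sec;
    struct timeval t0, t1;
    long i;

    for (i = 0; i < N; i++)     /* touch pages so they are allocated */
        a[i] = 1.0;

    gettimeofday(&t0, NULL);
    for (i = 0; i < N; i++)     /* streaming read */
        sum += a[i];
    gettimeofday(&t1, NULL);

    sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) * 1e-6;
    printf("read bandwidth: %.1f MB/s (sum=%g)\n",
           N * sizeof(double) / (sec * 1e6), sum);
    free(a);
    return 0;
}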

Origin2000 node components

Origin router interconnect
- The router chip has 6 CrayLink interfaces: 2 for connections to nodes (HUBs) and 4 for connections to other routers in the network
  * 4-dimensional interconnect
- The interconnect topology is determined by the size of the computer (number of nodes):
  * direct (back-to-back) connection for 2 nodes (4 cpu)
  * strongly connected cube up to 32 cpu
  * hypercube for up to 64 cpu
  * hypercube of hypercubes for up to 256 cpu

Origin2000 – two nodes

Origin2000 module connections

Origin2000 interconnect (32 processors)

Origin2000 interconnect (64 processors)

Directory-based cache coherence
Cache line use is recorded in a directory, which resides in memory

Origin cache coherence
- Memory pages are divided into data blocks of 32 words (128 bytes) each, the L2 cache line size
- Each data request transfers one data block (128 bytes)
- Each data block has an associated directory entry holding presence and state information: a 64-bit presence vector plus a 3-bit state field per cache line, stored in memory alongside the data block
- If a node (HUB) requests a data block, the corresponding presence bit is set and the state of that cache line is recorded
- The HUB runs the cache coherence protocol, updating the state of the data block and notifying the nodes for which the presence bit is set
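As a rough illustration of the "presence + state" directory entry described above, here is a toy model in C. The structure layout, state names, and helper function are invented for clarity and do not reflect SGI's actual directory format.

/* Toy model of a directory entry for one 128-byte cache line.
 * Mirrors only the "64 presence bits + 3 state bits" idea from the slide. */
#include <stdint.h>
#include <stdio.h>

enum line_state { UNOWNED = 0, SHARED = 1, EXCLUSIVE = 2 };  /* simplified states */

struct dir_entry {
    uint64_t presence;   /* bit n set => node n holds a copy of the line */
    uint8_t  state;      /* 3-bit state field in the real hardware */
};

/* Record a read request from a node: set its presence bit, mark the line shared. */
static void record_read(struct dir_entry *e, int node_id)
{
    e->presence |= (uint64_t)1 << node_id;
    e->state = SHARED;
}

int main(void)
{
    struct dir_entry e = { 0, UNOWNED };
    record_read(&e, 5);
    record_read(&e, 17);
    printf("presence=0x%016llx state=%u\n",
           (unsigned long long)e.presence, (unsigned)e.state);
    return 0;
}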

Origin address space
- Physically the memory is distributed and not contiguous
- Node id is assigned at boot time
- Logically memory is a single shared contiguous address space; the virtual address space is 44 bits (16 TB)
- A program (compiler) uses the virtual address space
- The CPU translates virtual addresses to physical addresses through the TLB (Translation Look-aside Buffer); a physical address consists of a node id (8 bits) plus a node offset (32 bits, i.e. 4 GB per node)
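Given the 8-bit node id / 32-bit node offset split described above, extracting the two fields from a physical address is simple bit arithmetic. A minimal sketch in C, assuming exactly that layout (the macro and function names are mine):

/* Split a physical address into node id and node offset,
 * assuming the 8-bit node id / 32-bit offset layout from the slide. */
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 32
#define OFFSET_MASK (((uint64_t)1 << OFFSET_BITS) - 1)

static unsigned node_id(uint64_t paddr)     { return (unsigned)(paddr >> OFFSET_BITS); }
static uint64_t node_offset(uint64_t paddr) { return paddr & OFFSET_MASK; }

int main(void)
{
    uint64_t paddr = ((uint64_t)3 << OFFSET_BITS) | 0x12345678;  /* an offset on node 3 */
    printf("node=%u offset=0x%llx\n",
           node_id(paddr), (unsigned long long)node_offset(paddr));
    return 0;
}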

Summary: Origin2000 properties
- Single machine image
  * behaves like a large workstation
  * same compilers
  * time sharing
  * all old SGI code (binaries) will run
  * the OS schedules the hardware resources of the machine
- Processor scalability
- I/O scalability
- All memory and I/O devices are directly addressable
  * no limitation on the size of a single program; it can use all available memory
  * no limitation on the location of the data; all disks can be used in a single file system
- 64-bit operating system and file system
  * HPC features: checkpoint/restart, queueing system
- Machine stability

Origin2000/3000 architecture goal
The hardware design is distributed memory, but to a programmer it looks like shared memory.
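In practice this means programs are written in the shared-memory style (for example with OpenMP) even though the pages of an array may be physically placed on different nodes. A minimal OpenMP sketch in C, purely illustrative and not taken from the original slides:

/* Shared-memory style parallel loop: every thread updates part of one
 * logically shared array, even though on the Origin its pages may be
 * physically distributed across nodes (ccNUMA). */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N];
    int i;

    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = 2.0 * i;          /* all threads see the same array */

    printf("a[N-1] = %f, max threads = %d\n", a[N - 1], omp_get_max_threads());
    return 0;
}

With the MIPSpro compilers this style of code was typically enabled with the -mp switch; on other platforms a flag such as GCC's -fopenmp plays the same role.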

Example: Simple Memory Access

Parix run limits
(1) NQS queues on parix
(2) Interactive: maximum cputime = 15 minutes

Two ways to run a batch job
(1) Parameters in the command line
(2) Parameters in a script file

QSUB options

Output of the command "qstat -a"

Exercise 1 – login and submit a job