OGO 2.1 SGI Origin 2000 Robert van Liere CWI, Amsterdam TU/e, Eindhoven 11 September 2001.

unite.sara.nl – SGI Origin 2000
Located at SARA in Amsterdam
Hardware configuration:
–128 MIPS R… CPUs (… MHz)
–64 Gbyte main memory
–1 Tbyte disk storage
–… Mbit/s network interfaces
–1 × 1 Gbit/s interface

Contents
Architecture
–Overview
–Module interconnect
–Memory hierarchies
Programming
–Parallel models
–Data placement
Pros and cons

Overview - Features
–64-bit RISC microprocessors
–Large main memory
–“Scalable” in CPU, memory and I/O
–Shared memory programming model

Overview - Applications
Worldwide: +/- … systems
–~50 with >128 CPUs
–~100 with … CPUs
–~500 with … CPUs
Compute serving: many CPUs and much memory
Database serving: many disks
Web serving: much I/O

System architecture – 1 CPU
–CPU + cache
–One system bus
–Memory
–I/O (network + disk)
–Cached data

System architecture – N CPU
Symmetric multi-processing (SMP)
–Multiple CPUs + caches
–One shared bus
–Memory
–I/O

N CPU – cache coherency
Problem:
–Inconsistent cached data
Solution:
–Snooping
–Broadcasting
Not scalable
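The write-invalidate snooping idea can be sketched as a toy model in C (a deliberately simplified illustration, not SGI's actual protocol): every write broadcasts an invalidate to all other caches, and it is precisely this per-write broadcast traffic that makes bus snooping unscalable as the CPU count grows.

```c
#include <assert.h>

#define NCACHE 4

/* Toy model: one memory line replicated in several per-CPU caches. */
enum state { INVALID, VALID };

struct cache { enum state st; int data; };

static struct cache caches[NCACHE];
static int memory;   /* backing store for the line */

/* Read: on a miss (invalidated copy), refetch the line from memory. */
int cpu_read(int cpu) {
    if (caches[cpu].st == INVALID) {
        caches[cpu].data = memory;
        caches[cpu].st = VALID;
    }
    return caches[cpu].data;
}

/* Write: update the local copy and memory, then broadcast an
 * invalidate that every other cache must snoop.  The broadcast
 * on every write is the scalability bottleneck. */
void cpu_write(int cpu, int value) {
    caches[cpu].data = value;
    caches[cpu].st = VALID;
    memory = value;
    for (int i = 0; i < NCACHE; i++)
        if (i != cpu)
            caches[i].st = INVALID;
}
```

The Origin 2000 avoids this broadcast by keeping a directory per node (next slide), so invalidations go only to the caches that actually hold a copy.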

Architecture – Origin 2000
Node board:
–2 CPUs + caches
–Memory
–Directory
–HUB
–I/O

Origin 2000 Interconnect
–Node boards
–Routers (six ports each)

Interconnect Topology

Sample Topologies

128-CPU Topology

Virtual Memory
–One CPU, multiple programs
–Pages
–Paging disk
–Page replacement

O2000 Virtual Memory
–Multiple CPUs, multiple programs
–Non-Uniform Memory Access (NUMA)
Efficient programs:
–Minimize data movement
–Keep data “close” to the CPU
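On a NUMA machine like the Origin 2000, IRIX places a page on the node of the CPU that first touches it ("first touch"), so keeping data close to a CPU amounts to initializing it with the same loop partitioning the compute phase will use. A minimal sketch; the OpenMP-style pragmas stand in for the MIPSpro compiler pragmas, and the code runs (serially) even when a compiler ignores them.

```c
#include <assert.h>
#include <stdlib.h>

/* First-touch placement sketch: initialize and compute with the same
 * loop partitioning, so each CPU's share of the array sits in memory
 * on its own node.  The pragmas are OpenMP-style stand-ins for the
 * IRIX compiler pragmas; without them the code is simply serial. */
double sum_first_touch(size_t n) {
    double *a = malloc(n * sizeof *a);
    double sum = 0.0;
    if (!a)
        return 0.0;

    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)   /* first touch: pages land near the touching CPU */
        a[i] = (double)i;

    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++)   /* same partitioning: accesses stay node-local */
        sum += a[i];

    free(a);
    return sum;
}
```

If the initialization were done by a single CPU, all pages would land on that CPU's node and every other CPU would pay remote-memory latency for the whole computation.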

Latencies and Bandwidth

Application performance
Scientific computing:
–LU, ocean, barnes, radiosity
Linear speedup:
–More CPUs -> more performance
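Linear speedup is the ideal case; in practice Amdahl's law (a standard result, not stated on the slide) bounds speedup by the program's serial fraction, which is why the benchmarks above only approach linear scaling when almost all of their work parallelizes.

```c
#include <assert.h>

/* Amdahl's law: with parallel fraction p of the work and n CPUs,
 *   S(n) = 1 / ((1 - p) + p / n)
 * As n grows, S(n) approaches 1 / (1 - p): the serial fraction
 * caps the achievable speedup no matter how many CPUs are added. */
double amdahl_speedup(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}
```

For example, a program that is only 50% parallel can never run more than twice as fast, even on all 128 CPUs.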

Programming support
IRIX operating system
Parallel programming:
–C source level with compiler pragmas
–POSIX Threads
–UNIX processes
Data placement:
–dplace, dlock, dperf
Profiling:
–timex, ssrun
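The POSIX Threads route can be sketched as follows: workers share the process address space, so they read a common array directly instead of exchanging messages (a minimal illustrative example; names like `parallel_sum` are made up here).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define N 1000

static int data[N];              /* shared: visible to all threads */
static long partial[NTHREADS];   /* one slot per worker, no locking needed */

/* Each worker sums a contiguous block of the shared array. */
static void *worker(void *arg) {
    long tid = (long)arg;
    size_t chunk = N / NTHREADS;
    size_t lo = (size_t)tid * chunk;
    size_t hi = (tid == NTHREADS - 1) ? N : lo + chunk;
    long s = 0;
    for (size_t i = lo; i < hi; i++)
        s += data[i];
    partial[tid] = s;
    return NULL;
}

long parallel_sum(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < N; i++)
        data[i] = i;
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);   /* join also orders partial[] writes */
        total += partial[i];
    }
    return total;
}
```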

Parallel Programs
Functional decomposition:
–Decompose the problem into different tasks
Domain decomposition:
–Partition the problem’s data structure
Consider:
–Mapping tasks/parts onto CPUs
–Coordinating the work and communication of the CPUs
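Domain decomposition can be made concrete with a block-partition helper (a hypothetical function written for this sketch, not any SGI API): it splits n data items as evenly as possible over a number of workers, which is the mapping step the slide asks you to consider.

```c
#include <assert.h>
#include <stddef.h>

/* Block domain decomposition: split n items as evenly as possible
 * over nparts workers; worker `part` gets the half-open range
 * [*lo, *hi).  The first n % nparts workers each get one extra
 * item, so the load differs by at most one item per worker. */
void block_range(size_t n, size_t nparts, size_t part,
                 size_t *lo, size_t *hi) {
    size_t base = n / nparts;
    size_t extra = n % nparts;
    *lo = part * base + (part < extra ? part : extra);
    *hi = *lo + base + (part < extra ? 1 : 0);
}
```

On a NUMA machine the same ranges would also be used to initialize the data, so each worker's block lives in its node's local memory.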

Task Decomposition
–Decompose the problem into tasks
–Determine dependencies between tasks

Task Decomposition
–Map tasks onto threads
Compare:
–Sequential case
–Parallel case

Efficient programs
Use many CPUs:
–Measure speedups
Avoid:
–Excessive data dependencies
–Excessive cache misses
–Excessive inter-node communication
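One common source of excessive cache misses is a traversal order that fights the memory layout. C arrays are row-major, so the inner loop should walk the rightmost index; both functions below compute the same sum, but the column-order version strides a full row per access and touches a new cache line almost every time.

```c
#include <assert.h>

#define ROWS 64
#define COLS 64

static double m[ROWS][COLS];

void fill(void) {
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            m[i][j] = 1.0;
}

/* Row-major traversal: consecutive accesses hit consecutive
 * addresses, so every cache line fetched is fully used. */
double sum_row_major(void) {
    double s = 0.0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += m[i][j];
    return s;
}

/* Column-major traversal: each access strides COLS doubles,
 * so cache lines are fetched but barely reused. */
double sum_col_major(void) {
    double s = 0.0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += m[i][j];
    return s;
}
```

On a NUMA machine like the Origin 2000 the same reasoning extends a level up: a miss that goes to another node's memory costs far more than a local one, which is why inter-node communication appears on the same "avoid" list.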

Pros vs Cons
Pros:
–Multi-processor (128 CPUs)
–Large memory (64 Gbyte)
–Shared memory programming
Cons:
–Slow integer CPU
–Performance penalty from:
 –Data dependencies
 –Off-board memory access