1
OGO 2.1
SGI Origin 2000
Robert van Liere
CWI, Amsterdam
TU/e, Eindhoven
11 September 2001
2
unite.sara.nl
SGI Origin 2000, located at SARA in Amsterdam
Hardware configuration:
  – 128 MIPS R10000 CPUs @ 250 MHz
  – 64 Gbyte main memory
  – 1 Tbyte disk storage
  – 11 Ethernet interfaces @ 100 Mbit/s
  – 1 Ethernet interface @ 1 Gbit/s
3
Contents
Architecture
  – Overview
  – Module interconnect
  – Memory hierarchies
Programming
  – Parallel models
  – Data placement
Pros and cons
4
Overview - Features
64-bit RISC microprocessors
Large main memory
"Scalable" in CPU, memory and I/O
Shared-memory programming model
5
Overview - Applications
Worldwide: about 30,000 systems
  – ~50 with more than 128 CPUs
  – ~100 with 64–128 CPUs
  – ~500 with 32–64 CPUs
Compute serving: many CPUs and much memory
Database serving: many disks
Web serving: much I/O
6
System architecture – 1 CPU
CPU + cache
One system bus
Memory
I/O (network + disk)
Cached data
7
System architecture – N CPUs
Symmetric multiprocessing (SMP)
Multiple CPUs, each with its own cache
One shared bus
Memory
I/O
8
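N CPUs – cache coherency
Problem:
  – Caches can hold inconsistent copies of the same data
Solution:
  – Snooping: every cache monitors the shared bus
  – Broadcasting: writes (or invalidations) are sent to all caches
Not scalable: bus traffic and snooping work grow with the number of CPUs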
9
Architecture – Origin 2000
Node board:
  – 2 CPUs + caches
  – Memory
  – Directory
  – HUB
  – I/O
10
Origin 2000 Interconnect
Node boards
Routers
  – Six ports
11
Interconnect Topology
12
Sample Topologies
13
128 Topology
14
Virtual Memory
One CPU, multiple programs
Pages
Paging disk
Page replacement
15
O2000 Virtual Memory
Multiple CPUs, multiple programs
Non-Uniform Memory Access (NUMA)
Efficient programs:
  – Minimize data movement
  – Keep data "close" to the CPU that uses it
16
Latencies and Bandwidth
17
Application performance
Scientific computing
  – LU, Ocean, Barnes, Radiosity
Linear speedup
  – More CPUs → proportionally more performance
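"Linear speedup" can be stated precisely with the usual definitions, which are standard and not taken from the slides. Here $T(p)$ is the runtime on $p$ CPUs, $S(p)$ the speedup and $E(p)$ the parallel efficiency. The numbers in the example are hypothetical.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}

\text{Linear speedup: } S(p) \approx p \iff E(p) \approx 1

\text{Example: } T(1) = 100\,\mathrm{s},\; T(64) = 1.6\,\mathrm{s}
\;\Rightarrow\; S(64) = 62.5,\quad E(64) = \tfrac{62.5}{64} \approx 0.98
```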
18
Programming support
IRIX operating system
Parallel programming
  – C source level with compiler pragmas
  – POSIX Threads
  – UNIX processes
Data placement
  – dplace, dlock, dperf
Profiling
  – timex, ssrun
19
Parallel Programs
Functional decomposition
  – Decompose the problem into different tasks
Domain decomposition
  – Partition the problem's data structure
Consider
  – Mapping tasks/parts onto CPUs
  – Coordinating the work and communication of the CPUs
20
Task Decomposition
Decompose the problem
Determine dependencies
21
Task Decomposition
Map tasks onto threads
Compare:
  – Sequential case
  – Parallel case
22
Efficient programs
Use many CPUs
  – Measure speedups
Avoid:
  – Excessive data dependencies
  – Excessive cache misses
  – Excessive inter-node communication
23
Pros vs Cons
Pros:
  – Multiprocessor (128 CPUs)
  – Large memory (64 Gbyte)
  – Shared-memory programming
Cons:
  – Slow integer CPU
  – Performance penalty:
    – Data dependencies
    – Off-board memory