Published by Gloria Shaw. Modified over 9 years ago.
1
BSP on the Origin2000
Lab for the course: Seminar in Scientific Computing with BSP
Dr. Anne Weill – anne@tx.technion.ac.il, ph: 4997
2
Origin2000 (SGI) 32 processors
3
Origin2000/3000 architecture features
Important hardware and software components:
* node board: processors + memory
* node interconnect topology and configurations
* scalability of the architecture
* directory-based cache coherency
* single system image components
4
Origin2000 node board
5
Origin2000 – two nodes
6
Origin2000 interconnect
7
[Figure: interconnect configurations for 32 processors and for 64 processors]
8
Origin router interconnect
- The router chip has 6 CrayLink interfaces: 2 for connections to nodes (HUBs) and 4 for connections to other routers in the network
  * 4-dimensional interconnect
- Router links are point-to-point connections: 17+7 wires @ 400 MHz (that is, wire speed 800 MB/s)
- Wormhole routing with a static routing table loaded at boot
- Router delay is 50 ns in one direction
- The interconnect topology is determined by the size of the computer (number of nodes):
  * direct (back-to-back) connection for 2 nodes (4 CPUs)
  * strongly connected cube for up to 32 CPUs
  * hypercube for up to 64 CPUs
  * hypercube of hypercubes for up to 256 CPUs
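The hop count in a hypercube and the 50 ns per-router delay combine into a simple latency estimate. The sketch below assumes a 64-CPU machine numbered as 2 CPUs per node and 2 nodes per router (so 16 routers form the 4-dimensional hypercube); that numbering, the function names, and the rule "one router delay per router traversed" are illustrative assumptions, and wire time and HUB latency are ignored.

```c
#include <assert.h>

/* Assumed configuration (illustrative only, not from SGI documentation):
   64 CPUs, 2 CPUs per node, 2 nodes per router -> 16 routers = 4-D hypercube. */
#define MAX_CPUS 64
#define CPUS_PER_NODE 2
#define NODES_PER_ROUTER 2
#define ROUTER_DELAY_NS 50 /* one-way delay per router, from the slide */

static unsigned node_of_cpu(unsigned cpu) {
    assert(cpu < MAX_CPUS);
    return cpu / CPUS_PER_NODE;
}

static unsigned router_of_cpu(unsigned cpu) {
    return node_of_cpu(cpu) / NODES_PER_ROUTER;
}

/* In a hypercube, the number of router-to-router links on a shortest path
   equals the number of bits in which the two router ids differ. */
static unsigned hypercube_hops(unsigned r1, unsigned r2) {
    unsigned diff = r1 ^ r2, hops = 0;
    while (diff != 0) {
        hops += diff & 1u;
        diff >>= 1;
    }
    return hops;
}

/* Hypothetical one-way router delay between two CPUs: 0 on the same node
   (no router involved), otherwise 50 ns for each router on the path,
   i.e. (links + 1) routers. */
static unsigned router_delay_ns(unsigned cpu_a, unsigned cpu_b) {
    if (node_of_cpu(cpu_a) == node_of_cpu(cpu_b))
        return 0;
    unsigned hops = hypercube_hops(router_of_cpu(cpu_a), router_of_cpu(cpu_b));
    return (hops + 1) * ROUTER_DELAY_NS;
}
```

Under these assumptions the worst case is 4 router-to-router links (router 0 to router 15), that is 5 routers or 250 ns of router delay.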
9
Origin address space
- Physically, the memory is distributed and not contiguous
- The node id is assigned at boot time
- Logically, memory is a single shared contiguous address space; the virtual address space is 44 bits (16 TB)
- A program (compiler) uses the virtual address space
- The CPU translates from the virtual to the physical address space
[Figure: physical address = node id (8 bits, bits 39–32) + node offset (32 bits, 4 GB, bits 31–0); virtual pages are mapped through the TLB (Translation Look-aside Buffer) to physical pages on nodes 0, 1, 2, 3, ..., with empty slots where no memory is present]
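The physical address layout in the figure (8-bit node id above a 32-bit node offset, 40 bits in total) can be expressed with shifts and masks. This is a minimal sketch of that bit layout only; the helper names are hypothetical, and the virtual-to-physical translation done by the TLB is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Layout from the slide: bits 39..32 = node id (8 bits),
   bits 31..0 = offset inside that node's memory (32 bits, 4 GB). */
#define NODE_SHIFT 32
#define NODE_MASK 0xFFu
#define OFFSET_MASK 0xFFFFFFFFu

/* Hypothetical helper: build a physical address from node id and offset. */
static uint64_t make_phys_addr(unsigned node, uint32_t offset) {
    assert(node <= NODE_MASK);
    return ((uint64_t)node << NODE_SHIFT) | offset;
}

/* Hypothetical helper: which node's memory holds this physical address. */
static unsigned node_of(uint64_t paddr) {
    return (unsigned)((paddr >> NODE_SHIFT) & NODE_MASK);
}

/* Hypothetical helper: offset within that node's memory. */
static uint32_t offset_of(uint64_t paddr) {
    return (uint32_t)(paddr & OFFSET_MASK);
}
```

For example, offset 0x1000 on node 3 is physical address 0x300001000, and the highest address (node 255, offset 0xFFFFFFFF) is 2^40 - 1.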
10
Login to carmel
1. Open an ssh window to: carmel.technion.ac.il
2. Username: course01 to course20; password: bsp2006
Contact: Dr. Anne Weill – anne@tx.technion.ac.il, phone: 4997
11
Compiling and running codes
1. Setting the path:
   set path=($path /u/tcc/anne/BSP/bin)
2. Compiling:
   % bspcc prog1.c -o prog1
   % bspcc -flibrary-level 1 prog1.c -o prog1   (for a non-dedicated machine)
3. Running:
   % bsprun -npes 4 prog1
12
Running on carmel
1. Interactive mode: % ./prog.exe
2. NQE queues: % qsub -q qcourse script.bat
13
BSP functions
bsp_begin(maxpr) – Start of program with at most maxpr processes
bsp_end() – End of program
bsp_nprocs() – Number of processes currently running
bsp_pid() – Returns the process id
bsp_time() – Returns the elapsed wall-clock time
14
Sample program
15
Output of hello program
16
How it works
[Figure: bsprun starts processes P0, P1, P2, P3, each running the same Prog.exe]
17
SPMD – single program multiple data
Each processor views only its local memory. The contents of variable X are different on different processors. Transfer of data can, in principle, occur through one-sided or two-sided communication.
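The SPMD idea (one program, started P times by bsprun, each copy with its own pid and its own private X) can be mimicked sequentially in plain C. This is only a teaching model under stated assumptions, not BSPlib: `run_spmd` is a hypothetical stand-in for `bsprun -npes nprocs`, the copies run one after another instead of in parallel, and each copy can touch only the `LocalMemory` slot it is handed.

```c
#include <assert.h>

#define MAX_PROCS 16

/* Each simulated process owns one of these: its private "local memory". */
typedef struct {
    int x;
} LocalMemory;

static LocalMemory local_mem[MAX_PROCS];

/* The single program: identical code for every process,
   but the data it produces depends on pid. */
static void spmd_program(int pid, int nprocs, LocalMemory *my) {
    my->x = 10 * pid + nprocs;
}

/* Hypothetical stand-in for "bsprun -npes nprocs":
   runs the copies sequentially here, not in parallel. */
static void run_spmd(int nprocs) {
    assert(nprocs > 0 && nprocs <= MAX_PROCS);
    for (int pid = 0; pid < nprocs; pid++)
        spmd_program(pid, nprocs, &local_mem[pid]);
}
```

After `run_spmd(4)`, the same statement has left x = 4, 14, 24, 34 in the four private copies, which is the "contents of X differ per processor" point of the slide.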
18
DRMA – direct remote memory access
* All processors must register the space into which remote “read” and “write” will happen
* Calls to bsp_put
* Calls to bsp_get
* Call to bsp_sync: all processors synchronize, and all communication is completed after the call
19
BSP functions for communication
bsp_push_reg(var, nbytes) – Registration of variable
bsp_put(pid, source, dest, offset, nbytes) – pid is the destination processor
bsp_get(pid, source, offset, dest, nbytes) – pid is the source processor
bsp_pop_reg(var) – Removal of the registration of variable
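The key DRMA rule, that a put only reaches the destination when bsp_sync ends the superstep, can be shown with a small sequential model in plain C. Everything here is hypothetical (`model_put`, `model_sync`, one fixed-size registered buffer per process standing in for bsp_push_reg); it mirrors the bsp_put(pid, source, dest, offset, nbytes) argument order only loosely and is not the BSPlib implementation. As with the buffered bsp_put, the source bytes are copied when the put is issued, so the caller may overwrite them immediately.

```c
#include <assert.h>
#include <string.h>

#define NPROCS 4
#define MAX_PUTS 64
#define REG_BYTES 64

/* Hypothetical model: one registered buffer per process
   (stands in for a variable registered with bsp_push_reg). */
static char registered[NPROCS][REG_BYTES];

typedef struct {
    int dest_pid;
    size_t offset, nbytes;
    char data[REG_BYTES]; /* copy of the source taken at put time */
} PutRequest;

static PutRequest queue[MAX_PUTS];
static int nqueued = 0;

/* Like bsp_put: copy the source now, but leave the destination untouched. */
static void model_put(int dest_pid, const void *src, size_t offset, size_t nbytes) {
    assert(dest_pid >= 0 && dest_pid < NPROCS);
    assert(offset + nbytes <= REG_BYTES && nqueued < MAX_PUTS);
    PutRequest *r = &queue[nqueued++];
    r->dest_pid = dest_pid;
    r->offset = offset;
    r->nbytes = nbytes;
    memcpy(r->data, src, nbytes);
}

/* Like bsp_sync: end of superstep, every queued transfer is delivered. */
static void model_sync(void) {
    for (int i = 0; i < nqueued; i++)
        memcpy(&registered[queue[i].dest_pid][queue[i].offset],
               queue[i].data, queue[i].nbytes);
    nqueued = 0;
}
```

Reading the destination between `model_put` and `model_sync` still shows the old contents; only after `model_sync` does the new value appear.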
21
Script file for batch
22
Output of command: “qstat -a”
23
Another example
* What does the following program do? What will the program print?
25
Output of program
26
Another example
* Is there a problem with the following example? What will the program print?
28
Answer
As it is written, the program will not print any output: the data is actually transferred only after the bsp_sync statement.
Additional question: what will the program print if bsp_sync is placed right after the put statement?
NB: the programs are in the directory /u/tcc/anne/BSPcourse, under prog2.c and prog2wrong.c. Try them.
29
Exercise 1 (due Nov. 26, 2006)
1. Copy the directory /u/tcc/anne/BSPcourse over to your own directory. Take a look at the bspedupack.h file.
2. Write a C program in which each processor writes its pid into an array PIDS(0:p-1) on P0 (PIDS(i) = i).
3. Run the program for p = 1, 2, 4, 8, 16 processors and print PIDS. You can run it interactively.
4. Do the same with a get instruction.