1
Special Course on Computer Architecture
#7 Simulation of Multi-Processors
Hiroki Matsutani and Hideharu Amano
June 3rd, 2011 Special Course on Computer Architecture
2
Outline: Simulation of Multi-Processors
Background
  Recent multi-core and many-core processors
  Network-on-Chip
Shared-memory chip multi-processors
  Architecture
  Coherence protocols
Simulation environment: GEMS/Simics
Exercises [50min]
  Performance evaluation of parallel applications
  Performance evaluation of coherence protocols
3
Multi- and many-core architectures
[Figure: number of PEs per chip (caches are not included; axis ticks 2, 4, 8, 16, 32, 64, 128, 256) plotted against year (2004 to 2011). Chips shown: MIT RAW, STI Cell BE, Sun T1, Sun T2, TILERA TILE64, Intel Core, IBM Power7, AMD Opteron, Intel 80-core, ClearSpeed CSX600, ClearSpeed CSX700, picoChip PC102, picoChip PC205, UT TRIPS (OPN), Fujitsu SPARC64, Intel SCC]
4
Network-on-Chip (NoC)
Interconnection network that connects many cores
[Figure: 16-core tile architecture, with labels Core and Router]
5
On-chip router architecture
1) Selecting an output channel
2) Arbitration for the selected output channel
3) Sending the packet to the output channel
Routing, arbitration, and switch traversal are performed in a pipelined manner
[Figure: on-chip router with five input ports (X+, X-, Y+, Y-, CORE), each with a FIFO, an ARBITER that issues GRANT signals, and a 5x5 CROSSBAR feeding five output ports (X+, X-, Y+, Y-, CORE)]
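The first two router steps can be sketched in C. The slides do not name the routing algorithm or the arbitration policy, so this sketch assumes dimension-order (XY) routing on a 2D mesh and a round-robin arbiter; all identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Port order mirrors the 5x5 crossbar: four mesh directions plus the local core. */
enum port { PORT_XPLUS, PORT_XMINUS, PORT_YPLUS, PORT_YMINUS, PORT_CORE, NUM_PORTS };

/* Step 1: select an output channel.
 * Assumption: dimension-order (XY) routing; X is corrected first, then Y. */
enum port route_xy(int cur_x, int cur_y, int dst_x, int dst_y)
{
    if (dst_x > cur_x) return PORT_XPLUS;
    if (dst_x < cur_x) return PORT_XMINUS;
    if (dst_y > cur_y) return PORT_YPLUS;
    if (dst_y < cur_y) return PORT_YMINUS;
    return PORT_CORE;  /* packet has arrived: eject to the local core */
}

/* Step 2: arbitrate among input ports that request the same output channel.
 * Assumption: round-robin priority, starting just after the last winner.
 * Returns the granted input port, or -1 if no port is requesting. */
int arbitrate_rr(const bool request[NUM_PORTS], int last_grant)
{
    assert(last_grant >= 0 && last_grant < NUM_PORTS);
    for (int i = 1; i <= NUM_PORTS; i++) {
        int candidate = (last_grant + i) % NUM_PORTS;
        if (request[candidate]) return candidate;
    }
    return -1;
}
```

Step 3 (switch traversal) would then move the packet from the granted input FIFO through the crossbar to the selected output port.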
7
Today’s target architecture
Chip multi-processors (CMPs)
  Multiple processors (each has a private L1 cache)
  Shared L2 cache divided into multiple banks (SNUCA)
[Figure: processor tile = UltraSPARC + L1 cache (I & D); cache tile = L2 cache bank]
8
Today’s target architecture
Chip multi-processors (CMPs)
  Multiple processors (each has a private L1 cache)
  Shared L2 cache divided into multiple banks (SNUCA)
  Processors and L2 cache banks are connected via NoC
[Figure: processor tile = UltraSPARC + L1 cache (I & D); cache tile = L2 cache bank; each tile is attached to an on-chip router]
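In a static NUCA (SNUCA) organization, the L2 bank that holds a block is a fixed function of its address. A minimal sketch, assuming 64-byte blocks and block-interleaved mapping on the low-order block-address bits (neither is stated on the slides; `home_bank` is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_OFFSET_BITS 6u   /* assumption: 64-byte cache blocks */

/* Static NUCA: the home L2 bank never changes for a given block address.
 * Assumption: consecutive blocks are interleaved across the banks. */
unsigned home_bank(uint64_t phys_addr, unsigned num_banks)
{
    assert(num_banks > 0);
    uint64_t block_addr = phys_addr >> BLOCK_OFFSET_BITS;
    return (unsigned)(block_addr % num_banks);
}
```

A processor tile would use such a function to decide which cache tile an L1 miss is sent to over the NoC.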
9
Cache coherence is maintained
Write-back policy
  A cache write updates main memory only when the block is evicted
Write-invalidate policy
  A cache write invalidates all copies held by the other sharers
[Figure: processor tiles, cache tiles, and main memory]
10
Cache coherence is maintained
A CPU wants to read a block cached at another processor tile
  1. The CPU sends a read request to the memory controller
  2. The controller forwards the request to the current owner
  3. The owner sends the block to the requestor
[Figure: processor tiles, cache tiles, and main memory]
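The forwarding decision made by the controller can be sketched as follows. The directory layout (`struct dir_entry` with an owner id and a sharer bit vector) is an assumption for illustration and is not taken from GEMS.

```c
#include <assert.h>

#define NO_OWNER (-1)

/* Hypothetical directory entry kept by the memory controller for one block. */
struct dir_entry {
    int owner;         /* CPU id of the current owner, or NO_OWNER */
    unsigned sharers;  /* bit i set: CPU i holds a copy */
};

/* Who supplies the data for a read request? */
enum supplier { FROM_MEMORY, FROM_OWNER };

/* Handle a read request from `requestor`.
 * If another cache owns the block, the controller forwards the request and
 * the owner sends the block to the requestor; otherwise main memory replies.
 * `*forward_to` receives the owner id only when the request is forwarded. */
enum supplier handle_read(struct dir_entry *e, int requestor, int *forward_to)
{
    assert(requestor >= 0 && requestor < 32);
    enum supplier s = FROM_MEMORY;
    if (e->owner != NO_OWNER && e->owner != requestor) {
        *forward_to = e->owner;
        s = FROM_OWNER;
    }
    e->sharers |= 1u << requestor;  /* the requestor now holds a copy */
    return s;
}
```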
11
Cache coherence: MOESI protocol class
Status of each cache block is represented with M/O/E/S/I
  Modified (M)
    Modified (i.e., dirty)
    Valid in one cache
  Shared (S)
    Shared by multiple CPUs
  Exclusive (E)
    Clean
    Exists in one cache
  Invalid (I)
  Owned (O)
    May or may not be clean
    Exists in multiple caches
    Owned by one cache
Owner
  Responsible for responding to any requests
MOESI protocols
  MSI, MOSI, MESI, MOESI, …
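The state properties listed above can be written down as small predicates. This is only a summary sketch; the function names are hypothetical, and `can_write_silently` assumes the write-invalidate policy, under which a sole copy can be written without notifying other caches.

```c
#include <assert.h>
#include <stdbool.h>

enum moesi { ST_M, ST_O, ST_E, ST_S, ST_I };

static void check_state(enum moesi s) { assert((int)s >= ST_M && (int)s <= ST_I); }

/* Every state except Invalid holds readable data. */
bool can_read(enum moesi s)     { check_state(s); return s != ST_I; }

/* M and E are the states in which the block exists in exactly one cache. */
bool is_sole_copy(enum moesi s) { check_state(s); return s == ST_M || s == ST_E; }

/* M is dirty; O may or may not be clean; E and S are treated as clean here. */
bool may_be_dirty(enum moesi s) { check_state(s); return s == ST_M || s == ST_O; }

/* Assumption: a sole copy can be written without invalidating anyone. */
bool can_write_silently(enum moesi s) { return is_sole_copy(s); }
```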
12
Cache coherence protocols
MSI/MOSI directory protocol
  E state is not implemented
  M-to-S transition always updates the main memory
MESI directory protocol
  O state is not implemented; dirty sharing is not allowed
  M-to-S transition always updates the main memory
MOESI directory protocol
MOESI token protocol [Martin ISCA03]
  There are as many tokens as there are CPUs
  A CPU that has one or more tokens can read the block
  A CPU that has all the tokens can modify (write) the block
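The token-counting rule can be sketched in a few lines of C. The struct and function names are hypothetical, and the sketch covers only the read/write permission check and token conservation, not the full protocol of [Martin ISCA03].

```c
#include <assert.h>
#include <stdbool.h>

/* One CPU's view of one block under token coherence. */
struct token_block {
    int held;   /* tokens this CPU currently holds for the block */
    int total;  /* tokens in the whole system, equal to the number of CPUs */
};

/* At least one token: the CPU can read the block. */
bool token_can_read(const struct token_block *b)  { return b->held >= 1; }

/* All tokens: the CPU can modify (write) the block. */
bool token_can_write(const struct token_block *b) { return b->held == b->total; }

/* Move `n` tokens between two CPUs; the system-wide total is conserved. */
void token_transfer(struct token_block *from, struct token_block *to, int n)
{
    assert(n >= 0 && n <= from->held);
    assert(from->total == to->total);
    from->held -= n;
    to->held   += n;
}
```

Because a writer must hold every token, no other CPU can hold a readable copy at the same time.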
13
MSI Protocol: State transition
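[Figure: two MSI state-transition diagrams over the states M, S, and I, one for processor requests (CpuRd, CpuWr) and one for snooped bus requests (BusRd, BusWr). Edges are labeled request/action: CpuRd/---, CpuWr/---, CpuWr/BusWr, CpuRd/BusRd, BusRd/Flush, BusWr/Flush, BusRd/---, BusWr/---]
M-to-S transitions flush (update) the main memory
Y. Solihin, "Fundamentals of Parallel Computer Architecture" (2009).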
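A sketch of the MSI state machine as a C transition function, using the request and action names from the diagram. The edge set follows the textbook MSI protocol and is a reconstruction, not a transcription of the figure; type and function names are hypothetical.

```c
#include <assert.h>

enum msi_state  { MSI_M, MSI_S, MSI_I };
enum msi_event  { CPU_RD, CPU_WR, BUS_RD, BUS_WR };      /* CpuRd, CpuWr, BusRd, BusWr */
enum msi_action { ACT_NONE, ACT_BUS_RD, ACT_BUS_WR, ACT_FLUSH };

struct msi_step { enum msi_state next; enum msi_action action; };

/* Returns the next state and the action for one request.
 * Any (state, event) pair not listed keeps the state with no action ("---"). */
struct msi_step msi_transition(enum msi_state s, enum msi_event ev)
{
    struct msi_step r = { s, ACT_NONE };
    switch (s) {
    case MSI_M:  /* dirty copy: a snooped request forces a flush */
        if (ev == BUS_RD) { r.next = MSI_S; r.action = ACT_FLUSH; }
        if (ev == BUS_WR) { r.next = MSI_I; r.action = ACT_FLUSH; }
        break;
    case MSI_S:
        if (ev == CPU_WR) { r.next = MSI_M; r.action = ACT_BUS_WR; }
        if (ev == BUS_WR) { r.next = MSI_I; }
        break;
    case MSI_I:
        if (ev == CPU_RD) { r.next = MSI_S; r.action = ACT_BUS_RD; }
        if (ev == CPU_WR) { r.next = MSI_M; r.action = ACT_BUS_WR; }
        break;
    default:
        assert(!"unknown MSI state");
    }
    return r;
}
```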
14
MESI Protocol: State transition
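[Figure: two MESI state-transition diagrams over the states M, E, S, and I, one for processor requests (CpuRd, CpuWr) and one for snooped bus requests (BusRd, BusWr, BusUpgr). Edges are labeled request/action: CpuRd/---, CpuWr/---, CpuWr/BusWr, CpuWr/BusUpgr, CpuRd/BusRd(!C), CpuRd/BusRd(C), BusRd/Flush, BusRd/FlushOpt, BusWr/Flush, BusWr/FlushOpt, BusRd/---, BusWr/---, BusUpgr/---]
M-to-S transitions flush (update) the main memory
Y. Solihin, "Fundamentals of Parallel Computer Architecture" (2009).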
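The processor-side half of MESI differs from MSI mainly in how a read miss is resolved and in the silent E-to-M upgrade. A sketch under the assumption that the C signal of BusRd(C)/BusRd(!C) reports whether another cache already holds the block; the names are hypothetical and the bus-side transitions are omitted.

```c
#include <assert.h>
#include <stdbool.h>

enum mesi_state { MESI_M, MESI_E, MESI_S, MESI_I };
enum mesi_bus   { BUS_NONE, BUS_READ, BUS_WRITE, BUS_UPGRADE };  /* ---, BusRd, BusWr, BusUpgr */

struct mesi_step { enum mesi_state next; enum mesi_bus request; };

/* Processor-side transitions only.
 * `other_copy` models the C signal: true for BusRd(C), false for BusRd(!C). */
struct mesi_step mesi_cpu_access(enum mesi_state s, bool is_write, bool other_copy)
{
    struct mesi_step r = { s, BUS_NONE };
    switch (s) {
    case MESI_M:  /* reads and writes hit without bus traffic */
        break;
    case MESI_E:  /* CpuWr/---: silent upgrade, no other cache has a copy */
        if (is_write) r.next = MESI_M;
        break;
    case MESI_S:  /* CpuWr/BusUpgr: other sharers must be invalidated */
        if (is_write) { r.next = MESI_M; r.request = BUS_UPGRADE; }
        break;
    case MESI_I:
        if (is_write) { r.next = MESI_M; r.request = BUS_WRITE; }
        else { r.next = other_copy ? MESI_S : MESI_E; r.request = BUS_READ; }
        break;
    default:
        assert(!"unknown MESI state");
    }
    return r;
}
```

The E state saves the BusUpgr that MSI would need when a block is read and then written by the same CPU with no other sharers.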
15
MOESI Protocol: State transition (1/2)
[Figure: MOESI state-transition diagram for processor requests (CpuRd, CpuWr) over the states M, O, E, S, and I. Edges are labeled request/action: CpuRd/---, CpuWr/---, CpuWr/BusUpgr, CpuWr/BusWr, CpuRd/BusRd(!C), CpuRd/BusRd(C)]
Y. Solihin, "Fundamentals of Parallel Computer Architecture" (2009).
16
MOESI Protocol: State transition (2/2)
[Figure: MOESI state-transition diagram for snooped bus requests (BusRd, BusWr, BusUpgr). Edges are labeled request/action: BusRd/Flush, BusRd/FlushOpt, BusWr/Flush, BusWr/FlushOpt, BusRd/---, BusWr/---, BusUpgr/---]
Y. Solihin, "Fundamentals of Parallel Computer Architecture" (2009).
18
Full-system simulation: GEMS/Simics
Wind River’s Simics
  Commercial detailed processor simulator
Univ. of Wisconsin’s GEMS
  Cache, memory, and network module for Simics
[Figure: processor tiles (UltraSPARC, L1 cache (I & D)), cache tiles (L2 cache bank), on-chip routers, and main memory]
19
Full-system simulation: GEMS/Simics
Today’s simulation target
  Solaris 9 OS on eight UltraSPARC processors
  Parallel application examples: Pi and Integer Sort
  Various coherence protocols are supported
[Figure: processor tiles (UltraSPARC, L1 cache (I & D)), cache tiles (L2 cache bank), on-chip routers, and main memory]
20
Full-system simulation: GEMS/Simics
Simulation target
  Solaris 9 OS on eight UltraSPARC processors
  Parallel application example: Integer Sort (IS)
[Figure: Solaris 9 running on the 8-core UltraSPARC target (processor tiles, cache tiles, on-chip routers, main memory); a parallel program is compiled and then executed on the 8 cores]
21
Parallel application example: OpenMP
Hello from all threads:

#include <stdio.h>
#include <omp.h>

int main()
{
    /* Every thread in the parallel region prints its own ID. */
    #pragma omp parallel
    printf("hello world from %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
22
Parallel application example: OpenMP
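#include <stdio.h>
#include <omp.h>

#define N 1000000        /* array size; not given on the slide (assumed) */
static double A[N];      /* shared array; its declaration is not shown on the slide */
static int num = 8;      /* thread count; not given on the slide (assumed) */

int main()
{
    int i;
    double start_time, end_time;

    start_time = omp_get_wtime();
    omp_set_num_threads(num);
    #pragma omp parallel shared(A) private(i)
    {
        /* The loop iterations are divided among the threads. */
        #pragma omp for
        for (i = 0; i < N; i++)
            A[i] = A[i] * A[i] - 3.0;
    }
    end_time = omp_get_wtime();
    printf("Elapsed time: %f sec\n", end_time - start_time);
    return 0;
}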
23
Parallel application example: OpenMP
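#include <stdio.h>
#include <omp.h>

#define N 100000000      /* number of series terms; not given on the slide (assumed) */

int main()
{
    int i;
    double s = 0.0;
    double start_time, end_time;

    start_time = omp_get_wtime();
    /* Each thread accumulates a private partial sum; reduction(+:s) adds them up. */
    #pragma omp parallel private(i) reduction(+:s)
    {
        #pragma omp for
        for (i = 0; i < N; i++)
            s += (4.0 / (4 * i + 1) - 4.0 / (4 * i + 3));
    }
    printf("pi = %lf\n", s);
    end_time = omp_get_wtime();
    printf("Elapsed time: %f sec\n", end_time - start_time);
    return 0;
}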
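The loop sums the Leibniz series for pi with consecutive terms paired. Writing s_N for the partial sum after N loop iterations (notation introduced here, not on the slide), the alternating-series bound gives a rough idea of how many iterations a given accuracy needs:

```latex
\pi \;=\; 4\sum_{k=0}^{\infty}\frac{(-1)^k}{2k+1}
    \;=\; \sum_{i=0}^{\infty}\left(\frac{4}{4i+1}-\frac{4}{4i+3}\right),
\qquad
s_N \;=\; \sum_{i=0}^{N-1}\left(\frac{4}{4i+1}-\frac{4}{4i+3}\right),
\qquad
0 \;<\; \pi - s_N \;<\; \frac{4}{4N+1}.
```

The error shrinks only about as fast as 1/N, so the loop needs many iterations, which is what makes it a convenient parallel workload.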
25
The first step: How to use the simulator
Please pick up your account information
Log in to one of the ICS cluster machines (id = 01…15)
  ssh -X
Copy the sample scripts and configuration files
  cp -r ~matutani/comparch2011/files work
  cd work
26
The first step: How to use the simulator
Start Simics
  ./start_ideal_memory.sh
You can use the gray window as a console of the target system (i.e., Solaris 9 on 8-core UltraSPARCs).
27
The first step: How to use the simulator
On the target machine, for example, you can check the number of processors as follows.
  bash-2.05# /usr/sbin/psrinfo -v
You will see that there are eight processors.
28
Parallel application: “pi” calculation
You can execute the "pi" calculation program using eight, four, and one threads.
  bash-2.05# export OMP_NUM_THREADS=8
  bash-2.05# ./pi
  bash-2.05# export OMP_NUM_THREADS=4
  bash-2.05# ./pi
  bash-2.05# export OMP_NUM_THREADS=1
  bash-2.05# ./pi
29
Parallel application: Integer Sort (IS)
You can execute the Integer Sort (IS) program using eight, four, and one threads.
  bash-2.05# export OMP_NUM_THREADS=8
  bash-2.05# ./IS
  bash-2.05# export OMP_NUM_THREADS=4
  bash-2.05# ./IS
  bash-2.05# export OMP_NUM_THREADS=1
  bash-2.05# ./IS
30
Exercise 1
Report the execution time of “pi” using 1, 4, 8, and 16 threads.
Does the execution time decrease linearly as the number of threads increases? Discuss the results.
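One way to judge whether the scaling is linear is to convert the measured times into speedup and parallel efficiency. The helpers below are a hypothetical sketch for post-processing your own measurements; `amdahl_bound` applies Amdahl's law, which the slides do not mention, as an optional reference point.

```c
#include <assert.h>

/* Speedup of a run that takes tp_sec relative to the 1-thread time t1_sec. */
double speedup(double t1_sec, double tp_sec)
{
    assert(t1_sec > 0.0 && tp_sec > 0.0);
    return t1_sec / tp_sec;
}

/* Parallel efficiency: 1.0 means perfectly linear scaling. */
double efficiency(double t1_sec, double tp_sec, int threads)
{
    assert(threads > 0);
    return speedup(t1_sec, tp_sec) / threads;
}

/* Amdahl's law: best possible speedup when a fraction `serial` of the
 * work cannot be parallelized. */
double amdahl_bound(double serial, int threads)
{
    assert(serial >= 0.0 && serial <= 1.0 && threads > 0);
    return 1.0 / (serial + (1.0 - serial) / threads);
}
```

Keep in mind that the target system has eight processors, so the 16-thread run places two threads on each processor.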
31
Coherence protocols: Integer Sort (IS)
The following scripts automatically run the IS program with different cache coherence protocols.
  ./start_moesi_directory.sh
  ./start_mesi_directory.sh
  ./start_msi_mosi_directory.sh
  ./start_moesi_token.sh
Each simulation takes five to ten minutes.
Do not run more than one script at the same time!
32
Exercise 2
Report the execution time under the MSI/MOSI directory, MESI directory, MOESI directory, and MOESI token protocols. Discuss the results.
For more detail about the protocols, see pages 14-19.