Module 1: Parallel Programming and Threads

Presentation transcript:

1 Module 1: Parallel Programming and Threads

2 Parallelism and Concurrency: System and Environment
 Parallelism: exploit system resources to speed up computation
 Concurrency: respond quickly/properly to events
  from the environment
  from other parts of the system
[Figure: the system exchanges events with its environment.]
Practical Parallel and Concurrent Programming DRAFT: comments to msrpcpcp@microsoft.com

3 Components and Parallelism
 A component can use parallelism internally to improve performance
 Usually, clients need not be aware of internal parallelism
  Why would the interface change because of internal parallelism?
[Figure: a client's call into component C fans out internally to m(0), m(1), …, m(N-1) before the return.]

4 Encapsulating Parallelism
 A component can have a parallel implementation
  it is an "implementation" detail whether or not there is internal parallelism
 Behavior of a parallel implementation
  should be the "same" as sequential, where
  the component specification defines "same"

5 Examples
 Parallel parsing of HTML
 Parallel XML query processing
 Use of commands in Linux
  Applying the same command to multiple files
 Searching different Internet sites

6 A simple microprocessor model, ~1985
 Single h/w thread
 Instructions execute one after the other
 Memory access time ~ clock cycle time
(ALU: arithmetic logic unit)
[Figure: the processor core's ALU executes instruction stream 1, 2, 3, 4, 5, … against main memory, one instruction after another; completion times fall at clocks 2, 4, 6, 9, and 12.]

7 Fast-forward two decades (circa 2005): power-hungry superscalar with caches
 Dynamic out-of-order instruction execution
 Pipelined memory accesses
 Speculation: instructions execute before a branch is resolved
 Multiple levels of cache: 2 cycles for L1, 20 cycles for L2, 200 cycles for main memory
[Figure: the ALU now sits behind an L1 cache (64KB) and an L2 cache (4MB) in front of main memory; completion times range from a few cycles on a cache hit to ~200+ cycles on a main-memory access.]

8 [Figure-only slide.]

9 
 Power wall
  we can't clock processors faster
 Memory wall
  many workloads' performance is dominated by memory access times
 Instruction-level parallelism (ILP) wall
  we can't find extra work to keep functional units busy while waiting for memory accesses

10 Multi-core h/w - common L2
[Figure: two cores, each with its own ALU and L1 cache, share a single L2 cache in front of main memory; each core runs its own instruction stream 1, 2, 3, 4, 5, …]

11 Multi-core h/w - additional L3
[Figure: two single-threaded cores with private L1 caches, an L2 cache, and an additional L3 cache between the cores and main memory.]

12 SMP multiprocessor
[Figure: single-threaded cores, each with a private L1 cache, with an L2 cache and a shared main memory; each core runs its own instruction stream.]

13 NUMA multiprocessor (non-uniform memory access)
[Figure: four nodes connected by an interconnect; each node has two single-threaded cores with private L1 caches, a shared L2 cache, and its own memory & directory. A core reaches its own node's memory faster than a remote node's memory.]

14 Three kinds of parallel hardware
 Multi-threaded cores
  Increase utilization of a core or memory bandwidth
  Peak ops/cycle fixed
 Multiple cores
  Increase ops/cycle
  Don't necessarily scale caches and off-chip resources proportionately
 Multi-processor machines
  Increase ops/cycle
  Often scale cache & memory capacities and bandwidth proportionately

15 Sequential Program

int sum = 5;
for (int i = 0; i < 5; i++)
    sum += i;

16 Sequential Program
 Determinism
  Given a current program state and a code fragment, determine the next program state
 Termination
  Prove that a program terminates
  Usually depends on loop or procedure recursion termination conditions

17 Parallel Programs
 Concurrent
  Non-deterministic: given a state and code, the next state is not uniquely determined
  May be non-terminating
 Distributed
  Concurrent, but can survive partial failure of a thread or process
 Byzantine
  Distributed, but can survive partial failure at the worst time and in the worst way

18 Program Representation

#include <stdio.h>                     /* preprocessor */
static int X = 5;                      /* space allocated and initialized at compile time */
int main(int argc, char *argv[]) {     /* argc, argv: local variables */
    printf("%d %s \n", argc, argv[0]); /* "%d %s \n": string constant */
    return 0;
}

19 Program Representation (after compilation)
 Object Module (.o / .obj)
  Code
  Uninitialized static data
  Initialized static data
   X, 32 bits, 0x00000005
   String, 64 bits, "%d %s \n"
  Symbol Table
   Defined: main
   Referenced: printf

20 Linking and Loading
 Linker (can also create libraries)
  Combines multiple object modules into one
  Satisfies any symbol references among the combined modules
 Loader
  Combines object modules and libraries into an executable file (a.out or .exe)
  All symbol references must be satisfied
  Symbol table used by debuggers
 Dynamic linking
  Stops the program on a reference to an undefined symbol, finds the object in the file system, links it, and continues
 Demand loading
  Symbol references are satisfied before execution, but loading is delayed

21 Program Representation at Runtime
 Same as in the object modules
  Code
  Static data
 Created at runtime
  Procedure call frame stack
  Heap to support new/delete on dynamic variables
   & (reference) -- implicit pointer variable to data in the heap
   * -- explicit pointer variable to data in the heap

22 Coroutine, State Vector
 State vector - the hardware registers that must be saved when losing control of a physical processor and restored when gaining control of a physical processor
 Coroutine - the data structure that holds the saved state vector

23 Intel x86 State Vector

AX=0000  BX=0000  CX=0000  DX=0000  SI=0000  DI=0000
SP=FFEE  top-of-stack pointer
BP=0000  procedure call frame pointer
DS=0AD5  data segment pointer
SS=0AD5  stack segment pointer
CS=0AD5  code segment pointer
ES=0AD5
IP=0100  instruction pointer (next instruction to execute)
NV UP EI PL NZ NA PO NC  (processor status bits)

CS:IP      Code Bytes  Instruction
0AD5:0100  E8 FD 00    CALL 0200

24 C Procedure Call Frame

25 Command Line

./a.out apple do* pear

The shell expands command-line arguments:

./a.out apple donut doright pear

argc = 5
argv[0] = "/Users/bobcook/home/bin/a.out"
argv[1] = "apple"
argv[2] = "donut"
argv[3] = "doright"
argv[4] = "pear"

26 Apple Xcode Debugger

27
Macintosh-5:fact bobcook$ gcc -g main.c
Macintosh-5:fact bobcook$ gdb a.out
(gdb) list
1   #include <stdio.h>
2
3   int factorial(int n) {
4       if (n < 2)
5           return 1;
6       return n * factorial(n - 1);
7   }
8
9   int main(int argc, char *argv[]) {
10      printf("%d\n", factorial(atoi(argv[1])));

28
(gdb) set args 5
(gdb) b 5
Breakpoint 1 at 0x1f0a: file main.c, line 5.
(gdb) r
Starting program: /Users/bobcook/Desktop/fact/a.out 5
Reading symbols for shared libraries +. done
Breakpoint 1, factorial (n=1) at main.c:5
5           return 1;
(gdb) bt
#0  factorial (n=1) at main.c:5
#1  0x00001f1f in factorial (n=2) at main.c:6
#2  0x00001f1f in factorial (n=3) at main.c:6
#3  0x00001f1f in factorial (n=4) at main.c:6
#4  0x00001f1f in factorial (n=5) at main.c:6
#5  0x00001f52 in main (argc=2, argv=0xbffff840) at main.c:10

29 Context Block (user struct in UNIX)
 Operating system information to define its virtual processor
  Coroutine
  Code, data, stack, heap segments
  User id, group id, process id, parent id
  Resource usage information
  Scheduling information (priority)

30 Process
 A program in execution
 Thread - an entity within a process that can be scheduled for execution
  Coroutine, thread id, thread priority, thread-local storage, a unique call stack
 All threads in a process share code, data, and heap

31

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <assert.h>

void *p(void *arg) {
    int i;
    for (i = 0; i < 5; i++) {
        printf("X\n");
        sleep(1);
    }
    pthread_exit((void *)99);
}

int main() {
    // X Y interleaving is unpredictable
    pthread_t x;
    void *r;
    int i;
    assert(pthread_create(&x, NULL, p, (void *)34) == 0);
    for (i = 0; i < 5; i++) {
        printf("Y\n");
        sleep(1);
    }
    assert(pthread_join(x, &r) == 0);
    return 0;
}

32 Thread State Transitions

33 Multi-Thread Debugging
[Figure: the debugger lists each thread id; selecting a thread shows its call stack (1st frame through Nth frame) and the local variables of the selected frame.]

