1
CISC 879 : Software Support for Multicore Architectures
John Cavazos
Dept of Computer & Information Sciences
University of Delaware
www.cis.udel.edu/~cavazos/cisc879
Lecture 10
Patterns for Parallel Programming III
2
Lecture 10: Overview
Cell B.E. Clarification
Design Patterns for Parallel Programs
  Finding Concurrency
  Algorithmic Structure
    Organize by Tasks
    Organize by Data
  Supporting Structures
3
LS-LS DMA transfer (PPU)

int main()
{
    pthread_t pts[N];
    spe_context_ptr_t spe[N];
    struct thread_args t_args[N];
    int i;
    spe_program_handle_t *program;

    /* Load the SPU program and start one thread per SPE context. */
    program = spe_image_open("../spu/hello");
    for (i = 0; i < N; i++) {
        spe[i] = spe_context_create(0, NULL);
        spe_program_load(spe[i], program);
        t_args[i].spe = spe[i];
        t_args[i].spuid = i;
        pthread_create(&pts[i], NULL, &my_spe_thread, &t_args[i]);
    }

    /* Address at which SPE 1's local store is mapped into the PPU address space. */
    void *ls = spe_ls_area_get(spe[1]);
    unsigned int mbox_data = (unsigned int)ls;
    printf("mbox_data %x\n", mbox_data);

    int rc;
    /* Send that address to SPE 0, then block until SPE 0 reports that its DMA is done. */
    rc = spe_in_mbox_write(spe[0], &mbox_data, 1, SPE_MBOX_ALL_BLOCKING);
    rc = spe_out_intr_mbox_read(spe[0], &mbox_data, 1, SPE_MBOX_ALL_BLOCKING);

    /* Release every SPE so that it prints its (possibly overwritten) tv. */
    for (i = 0; i < N; i++) {
        rc = spe_in_mbox_write(spe[i], &mbox_data, 1, SPE_MBOX_ALL_BLOCKING);
    }

    for (i = 0; i < N; i++) {
        pthread_join(pts[i], NULL);
    }
    spe_image_close(program);
    for (i = 0; i < N; i++) {
        spe_context_destroy(spe[i]);
    }
    return 0;
}
6
LS-LS DMA transfer (SPU)

int main()
{
    gettimeofday(&tv, NULL);
    printf("spu %lld; t.tv_usec = %ld\n", spuid, tv.tv_usec);

    if (spuid == 0) {
        unsigned int ea;
        unsigned int tag = 0;
        unsigned int mask = 1;

        /* Base address of SPE 1's local store, sent by the PPU. */
        ea = spu_read_in_mbox();
        printf("ea = %p\n", (void*)ea);

        /* DMA tv from this local store to the same offset in SPE 1's local store. */
        mfc_put(&tv, ea + (unsigned int)&tv, sizeof(tv), tag, 1, 0);

        /* Wait for the tagged transfer to complete, then notify the PPU. */
        mfc_write_tag_mask(mask);
        mfc_read_tag_status_all();
        spu_write_out_intr_mbox(0);
    }

    /* Every SPU blocks here until the PPU releases it. */
    spu_read_in_mbox();
    printf("spu %lld; tv.tv_usec = %ld\n", spuid, tv.tv_usec);
    return 0;
}
7
LS-LS Output

-bash-3.2$ ./a.out
spu 0; t.tv_usec = 875360
spu 1; t.tv_usec = 876446
spu 2; t.tv_usec = 877443
spu 3; t.tv_usec = 878459
mbox_data f7764000
ea = 0xf7764000
spu 0; tv.tv_usec = 875360
spu 1; tv.tv_usec = 875360
spu 2; tv.tv_usec = 877443
spu 3; tv.tv_usec = 878459
8
Organize by Data
Operations on a core data structure
  Geometric Decomposition
  Recursive Data
9
Geometric Decomposition
Arrays and other linear structures
Divide into contiguous substructures
Example: Matrix multiply
A data-centric algorithm on a linear data structure (array) implies geometric decomposition
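As a minimal sketch of the matrix multiply example, the rows of the result can be split into contiguous blocks, each of which can be computed independently. The function names (`block_range`, `matmul_rows`, `matmul_blocked`) and the row-major layout are assumptions made for illustration; the blocks are processed in a plain loop here, where a real program would hand each block to its own core.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open row range [*lo, *hi) owned by block p when n rows are split
 * into `parts` contiguous blocks of near-equal size. */
static void block_range(size_t n, size_t parts, size_t p,
                        size_t *lo, size_t *hi)
{
    assert(parts > 0 && p < parts);
    size_t base = n / parts, extra = n % parts;
    /* The first `extra` blocks each receive one additional row. */
    *lo = p * base + (p < extra ? p : extra);
    *hi = *lo + base + (p < extra ? 1 : 0);
}

/* Compute rows [lo, hi) of C = A * B for n x n row-major matrices. */
static void matmul_rows(const double *A, const double *B, double *C,
                        size_t n, size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Each block writes a disjoint set of rows of C, so the iterations of
 * this loop are independent and could run on separate cores. */
void matmul_blocked(const double *A, const double *B, double *C,
                    size_t n, size_t parts)
{
    for (size_t p = 0; p < parts; p++) {
        size_t lo, hi;
        block_range(n, parts, p, &lo, &hi);
        matmul_rows(A, B, C, n, lo, hi);
    }
}
```

Calling `matmul_blocked(A, B, C, n, 4)` would split the work into four row blocks; because every block reads all of `B`, the decomposition is geometric in `A` and `C` only.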
10
Recursive Data
Lists, trees, and graphs
Structures where you would use divide-and-conquer
It may seem that you can only move sequentially through the data structure
But there are ways to expose concurrency
11
Recursive Data Example
Find the Root: Given a forest of directed trees, find the root of the tree containing each node
Parallel approach:
  For each node, find its successor's successor
  Repeat until no changes
  O(log n) parallel steps vs O(n) sequential steps
Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
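The parallel approach above (often called pointer jumping) can be sketched as follows. The array representation, in which `succ[i]` is the successor of node `i` and a root points to itself, and the name `find_roots` are assumptions made for illustration. The inner loop runs sequentially here, but each iteration reads only the old array and writes only its own slot, so all nodes could be updated at once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* succ[i] is the successor (parent) of node i; a root satisfies
 * succ[i] == i. On return, succ[i] is the root of the tree containing i.
 * Returns the number of rounds that changed something, or -1 if the
 * scratch buffer cannot be allocated. */
int find_roots(int *succ, int n)
{
    assert(succ != NULL && n > 0);
    int *next = malloc((size_t)n * sizeof *next);
    if (next == NULL)
        return -1;

    int rounds = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        /* One round: every node jumps to its successor's successor.
         * Iterations are independent because they read succ and write
         * only next[i]. */
        for (int i = 0; i < n; i++) {
            next[i] = succ[succ[i]];
            if (next[i] != succ[i])
                changed = 1;
        }
        memcpy(succ, next, (size_t)n * sizeof *next);
        if (changed)
            rounds++;
    }
    free(next);
    return rounds;
}
```

Each round halves the remaining distance from a node to its root, which is where the logarithmic number of steps comes from.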
12
Organize by Flow of Data
Regular: Pipeline
Irregular: Event-Based Coordination
13
Organize by Flow of Data
Computation can be viewed as a flow of data going through a sequence of stages
Pipeline: one-way predictable communication
Event-based Coordination: unrestricted unpredictable communication
14
Pipeline performance
Concurrency limited by pipeline depth
Balance computation and communication (architecture dependent)
Stages should be equally computationally intensive
  Slowest stage creates bottleneck
  Combine lightly loaded stages or decompose heavily loaded stages
Time to fill and drain the pipe should be small
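To see why the slowest stage dominates, here is a small timing model. It is only a sketch under stated assumptions: each stage takes a fixed time per item, a stage handles one item at a time, buffers between stages are unbounded, and communication is free. The name `pipeline_time` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Time at which the last of `items` items leaves a pipeline whose stage s
 * needs stage_time[s] time units per item. Returns -1.0 if the scratch
 * buffer cannot be allocated. */
double pipeline_time(const double *stage_time, int stages, int items)
{
    assert(stage_time != NULL && stages > 0 && items > 0);
    /* done[s] = time at which stage s finished its most recent item. */
    double *done = calloc((size_t)stages, sizeof *done);
    if (done == NULL)
        return -1.0;

    for (int i = 0; i < items; i++) {
        double prev = 0.0; /* when item i left the previous stage */
        for (int s = 0; s < stages; s++) {
            /* A stage starts once the item has arrived and the stage
             * is free again. */
            double start = prev > done[s] ? prev : done[s];
            done[s] = start + stage_time[s];
            prev = done[s];
        }
    }
    double finish = done[stages - 1];
    free(done);
    return finish;
}
```

Under these assumptions the result equals the sum of the stage times (fill and drain) plus `items - 1` times the slowest stage time, so stage times of 1, 3, and 2 finish three items later than three balanced stages of 2 each, although both pipelines do the same total work per item.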
15
Supporting Structures
Single Program Multiple Data (SPMD)
Loop Parallelism
Master/Worker
Fork/Join
16
SPMD Pattern
Create a single program that runs on each processor
  Initialize
  Obtain a unique identifier
  Run the same program on each processor
    Identifier and input data can differentiate behavior
  Distribute data (if any)
  Finalize
Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
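The steps above can be sketched with a single function that every processor would run, using its identifier to pick its share of the data. The names `spmd_program` and `spmd_run`, the cyclic data distribution, and the summation task are all assumptions made for illustration; `spmd_run` merely calls the program once per rank in a loop, standing in for a real launcher that would start one copy per processor.

```c
#include <assert.h>
#include <stddef.h>

/* The single program. Every processor runs this same code; only the
 * rank (unique identifier) differs, and it selects the data handled. */
long spmd_program(int rank, int nprocs, const int *data, size_t n)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    long partial = 0;
    /* Cyclic distribution: rank r handles elements r, r + nprocs, ... */
    for (size_t i = (size_t)rank; i < n; i += (size_t)nprocs)
        partial += data[i];
    return partial;
}

/* Stand-in launcher: runs each rank in turn and combines the partial
 * results, which corresponds to the finalize step. */
long spmd_run(int nprocs, const int *data, size_t n)
{
    long total = 0;
    for (int rank = 0; rank < nprocs; rank++)
        total += spmd_program(rank, nprocs, data, n);
    return total;
}
```

The combined total does not depend on the number of ranks, which is a quick check that the data was split and recombined correctly.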
17
SPMD Challenges
Split data correctly
Correctly combine results
Achieve even work distribution
If programs require dynamic load balancing, another pattern may be more suitable (Job Queue)
Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
18
Loop Parallelism Pattern
Many programs are expressed as iterative constructs
Programming models like OpenMP provide pragmas to automatically assign loop iterations to processors
Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
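A minimal sketch of such a pragma, using a dot product as an assumed example (the function name `dot` is hypothetical). The `reduction(+:sum)` clause gives each thread a private partial sum and adds the partial sums together at the end of the loop. When compiled without OpenMP support the pragma is ignored and the loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of x and y, each of length n. */
double dot(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    /* OpenMP divides the iterations among the threads of the team. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag.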
19
Master/Worker Pattern
Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
20
Master/Worker Pattern
Relevant where tasks have no dependencies
  Embarrassingly parallel
The challenge is determining when the entire problem is complete
Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
21
Fork/Join Pattern
Parent creates new tasks (fork), then waits until they complete (join)
Tasks created dynamically
  Tasks can create more tasks
Tasks managed according to relationships
Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
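A minimal fork/join sketch, assuming OpenMP tasks as the task mechanism and a recursive array sum as the example problem (the names `sum_range`, `fork_join_sum`, and `CUTOFF` are hypothetical). Each call forks two child tasks, which may fork further tasks themselves, and the `taskwait` is the join. Without OpenMP support the pragmas are ignored and the recursion runs sequentially with the same result.

```c
#include <assert.h>
#include <stddef.h>

#define CUTOFF 4 /* ranges this small are summed directly */

/* Sum a[lo .. hi-1] by recursively forking two child tasks. */
static long sum_range(const int *a, size_t lo, size_t hi)
{
    if (hi - lo <= CUTOFF) {
        long s = 0;
        for (size_t i = lo; i < hi; i++)
            s += a[i];
        return s;
    }
    size_t mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;

    /* Fork: each half becomes a task that may create more tasks. */
    #pragma omp task shared(left)
    left = sum_range(a, lo, mid);
    #pragma omp task shared(right)
    right = sum_range(a, mid, hi);

    /* Join: the parent waits for both children before combining. */
    #pragma omp taskwait
    return left + right;
}

/* Entry point: one thread creates the root of the task tree while the
 * rest of the team executes tasks as they appear. */
long fork_join_sum(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = sum_range(a, 0, n);
    return total;
}
```

The cutoff keeps tasks from becoming so small that the cost of creating them outweighs the work they do.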