Published by Percival Stafford. Modified over 9 years ago.

Intel® Concurrent Collections for C++ - a model for parallel programming

Nikolay Kurtov
email: nikolay.kurtov@intel.com
Software and Services Group
SEC(R) 2008, October 23, 2008

Agenda
- Existing parallel programming models
- Key concepts, Blackscholes example
- Performance results

Parallel programming is important
- Number of multi-core machines is growing
- Developers want to fully exploit architecture capabilities

But parallel programming is hard:
- Users must reason about parallelism
- Thread synchronization
- Embedded in serial languages
  - Data overwriting
  - Arbitrary serialization
- Tuning
- Performance depends on the platform

Parallel Programming Models
- Improve productivity of programming
- Hide low-level details
- Provide high-level abstractions

The following models are very popular:
- OpenMP
- Cilk
- Intel® Threading Building Blocks

OpenMP
- Perfect for data-parallel algorithms
- Basics are easy to apply:

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        doSomething(i);

- Advanced usage is complicated and error-prone
- Requires compiler support
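
A minimal, self-contained sketch of the loop above. The function name `squares` and the per-iteration work are assumptions made for illustration; the slide only shows `doSomething(i)`. If the compiler is not run with OpenMP enabled, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the slide's doSomething(i): each iteration
// writes only its own element, so the iterations are independent.
std::vector<long> squares(int n) {
    assert(n >= 0);
    std::vector<long> out(n);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = static_cast<long>(i) * i;
    }
    return out;
}
```

Calling `squares(5)` fills the vector with 0, 1, 4, 9, 16 regardless of how the iterations are distributed among threads, because no iteration reads or writes another iteration's element.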

Cilk
- The programmer identifies elements that can safely be executed in parallel

    int fib(int n) {
        if (n < 2) return n;
        int x = cilk_spawn fib(n-1);
        int y = cilk_spawn fib(n-2);
        cilk_sync;
        return (x+y);
    }

- Explicit spawning of tasks and synchronization with barriers

Intel® Threading Building Blocks
- Implemented as a C++ library
- Requires an excellent knowledge of C++
- Provides excellent high-level abstractions
- Provides basic parallel algorithms:
  - parallel_for
  - parallel_sort
  - parallel_while
  - parallel_reduce
  - parallel_do
  - parallel_scan

Existing models - summary
- The programmer explicitly expresses parallelism
- Provide an imperative algorithm description
- Many low-level questions are left to the programmer
- Good control over performance

Agenda
- Existing parallel programming models
- Key concepts, Blackscholes example
- Performance results

Ideal parallel programming model
- The application problem
  - Domain Expert (person): only domain knowledge, no tuning knowledge
  - Serial code, semantic correctness
- Intel® Concurrent Collections
- Architecture
  - Tuning Expert (person, runtime, static analysis): no domain knowledge, only tuning knowledge
  - Actual parallelism, load balancing, distribution among processors

How people think about their application
- What are the high-level operations?
- What are the chunks of data?
- What are the producer/consumer relationships?
- What are the inputs and outputs?

Blackscholes
- A data-parallel application
- Solves an equation independently for each parameter set
- Data flow: Parameters -> Solve -> Result
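
The slides do not show the equation that is solved for each parameter set. As a sketch, assuming the standard closed-form Black-Scholes price of a European call option, the per-item computation could look like the following. The struct `OptionParams` and the functions `normal_cdf` and `call_price` are hypothetical names chosen for this example; the slides only name the type `OptionData` and the function `solveEquation`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameter set; field names are assumptions of this sketch.
struct OptionParams {
    double spot;        // current price of the underlying, S
    double strike;      // strike price, K
    double rate;        // risk-free interest rate per year, r
    double volatility;  // annualized volatility, sigma
    double years;       // time to expiry in years, T
};

// Standard normal cumulative distribution function.
double normal_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Closed-form Black-Scholes price of a European call option.
double call_price(const OptionParams& p) {
    assert(p.volatility > 0.0 && p.years > 0.0);
    double sigma_sqrt_t = p.volatility * std::sqrt(p.years);
    double d1 = (std::log(p.spot / p.strike)
                 + (p.rate + 0.5 * p.volatility * p.volatility) * p.years)
                / sigma_sqrt_t;
    double d2 = d1 - sigma_sqrt_t;
    return p.spot * normal_cdf(d1)
           - p.strike * std::exp(-p.rate * p.years) * normal_cdf(d2);
}
```

Each call depends only on its own parameter set, which is what makes the application data-parallel: no call needs the result of another.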

Intel® Concurrent Collections: Key Concepts
- Step: a single high-level operation
- Item: a single data element
- Tag: an identifier of a step or an item
- Inputs/Outputs: items or tags produced or consumed by the environment

Textual Graph Representation

    // Declarations
    <SolveTags: int n>;
    [OptionData* Parameters: int n];
    [float Result: int n];

    // Step prescription
    <SolveTags> :: (Solve);

    // Step execution
    [Parameters] -> (Solve) -> [Result];

    // Input from the environment:
    // initialize all tags and data
    env -> <SolveTags>, [Parameters];

    // Output to the environment
    [Result] -> env;

Graph definition Translator
- Translates a graph definition into a declaration of a class
- A generated class contains properly named item collections, tag collections and step collections
- Generates a coding hints file: a template for step definitions
- Checks correctness of a graph

    class Blackscholes_graph_t : public Graph_t {
    public:
        ItemCollection_t Parameters;
        ItemCollection_t Result;
        TagCollection_t SolveTags;
        StepCollection_t SolveStepCollection;
        ...
    };

Tags

Item identifiers
- Items are stored in a graph in an item collection
- Put stores an item and associates it with a tag
- Get accesses an item by its tag
- Items are immutable

Step identifiers
- Steps are prescribed by tags
- Put stores a tag and instantiates the prescribed steps
- The same tag is passed to each instantiated step
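
The Put/Get behaviour of items and tags described on this slide can be modelled with a few lines of standard C++. This is a toy model for illustration only, not the Intel library: the class names `ToyItemCollection` and `ToyTagCollection`, the `Prescribe` method, and the use of `int` tags are all assumptions of the sketch, and prescribed steps simply run immediately in the calling thread.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy item collection: a single-assignment map from tag to item.
template <typename Item>
class ToyItemCollection {
public:
    // Put stores an item under a tag. A second Put for the same tag is an
    // error, because items are immutable once written.
    void Put(int tag, const Item& item) {
        if (!items_.emplace(tag, item).second)
            throw std::logic_error("item already exists for this tag");
    }
    // Get returns the item stored under a tag; throws std::out_of_range
    // if no item has been put for that tag yet.
    const Item& Get(int tag) const { return items_.at(tag); }

private:
    std::map<int, Item> items_;
};

// Toy tag collection: putting a tag instantiates every prescribed step
// and passes that same tag to each of them.
class ToyTagCollection {
public:
    void Prescribe(std::function<void(int)> step) {
        steps_.push_back(std::move(step));
    }
    void Put(int tag) {
        for (auto& step : steps_) step(tag);
    }

private:
    std::vector<std::function<void(int)>> steps_;
};
```

In this model a step is just a callable that receives its tag, reads its inputs with `Get`, and writes its outputs with `Put` under the same tag.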

Specifying Computation

    // Step body: instantiated once for each tag put into SolveTags
    StepReturnValue_t Solve(
        Blackscholes_graph_t& graph,
        const Tag_t& step_tag)
    {
        // Get the input item stored under this step's tag
        OptionData* data =
            graph.Parameters.Get(step_tag);
        float result = solveEquation(data);
        // Put the output item under the same tag
        graph.Result.Put(step_tag, result);
        return CNC_Success;
    }

Using the graph in your C++ application

    Blackscholes_graph_t my_graph;
    // Input: one tag and one parameter set per option
    for (int i = 0; i < N; i++) {
        my_graph.SolveTags.Put(Tag_t(i));
        my_graph.Parameters.Put(Tag_t(i), data[i]);
    }
    my_graph.run();
    // Output: read the results back by tag
    for (int i = 0; i < N; i++) {
        float result = my_graph.Result.Get(Tag_t(i));
        std::cout << result << std::endl;
    }

Steps Rescheduling
- A step may begin execution before its input items are available
- It will be rescheduled and started again from the beginning when the corresponding item is added to the collection

[Figure: example graph with item collections Image (tag k), Block (tag i, j) and Result (tag i, j); tag collections Image Tag (k) and Block Tag (i, j); steps Split (k) and Process (i, j)]
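
The rescheduling rule can be illustrated with a single-threaded toy scheduler. This is a sketch of the idea only, not the Intel runtime: the struct `ToyRuntime`, its members, and the doubling "computation" are assumptions of the example. A step instance whose input is missing abandons its attempt, is parked, and is run again from the beginning once the item arrives.

```cpp
#include <cassert>
#include <deque>
#include <map>

struct ToyRuntime {
    std::map<int, double> input;     // item collection read by the step
    std::map<int, double> output;    // item collection written by the step
    std::deque<int> ready;           // tags of step instances ready to run
    std::multimap<int, int> parked;  // missing input tag -> waiting step tag
    int executions = 0;              // counts every attempt, retries included

    void put_tag(int tag) { ready.push_back(tag); }

    void put_input(int tag, double value) {
        input[tag] = value;
        // Wake every step instance that was waiting for this item.
        auto range = parked.equal_range(tag);
        for (auto it = range.first; it != range.second; ++it)
            ready.push_back(it->second);
        parked.erase(range.first, range.second);
    }

    // Step body: Get first, then compute, then Put.
    bool step(int tag) {
        ++executions;
        auto it = input.find(tag);
        if (it == input.end()) return false;  // Get failed: abandon attempt
        output[tag] = it->second * 2.0;
        return true;
    }

    // Runs ready step instances until none are left.
    void run() {
        while (!ready.empty()) {
            int tag = ready.front();
            ready.pop_front();
            if (!step(tag)) parked.emplace(tag, tag);  // wait for input[tag]
        }
    }
};
```

If a tag is put before its input item, the first `run()` executes the step once without producing output; after `put_input` the step is executed a second time from the start and produces its result.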

Constraints required of the application
1. Steps have no side effects
2. Steps call Gets before any Puts
3. Steps call Gets before allocating any memory
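
One way to see why Gets must come before Puts: a step that is restarted from the beginning repeats everything it did before the failed Get. The sketch below uses a toy single-assignment store (`ToyItems`, a hypothetical name, as are both step functions) to contrast a step that follows the constraint with one that violates it.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>

// Toy single-assignment item store shared by both step variants.
struct ToyItems {
    std::map<int, int> data;
    bool has(int tag) const { return data.count(tag) != 0; }
    int get(int tag) const { return data.at(tag); }
    void put(int tag, int value) {
        if (!data.emplace(tag, value).second)
            throw std::logic_error("second Put for the same tag");
    }
};

// Follows the constraint: every Get happens before the first Put, so an
// attempt that finds an input missing has not changed anything yet.
bool step_gets_first(ToyItems& in, ToyItems& out, int tag) {
    if (!in.has(tag) || !in.has(tag + 1)) return false;  // retry later
    out.put(tag, in.get(tag) + in.get(tag + 1));
    return true;
}

// Violates the constraint: it Puts a partial result before its last Get.
// If the second input is missing, the retry repeats the first Put.
bool step_put_too_early(ToyItems& in, ToyItems& partial, ToyItems& out, int tag) {
    if (!in.has(tag)) return false;
    partial.put(tag, in.get(tag));       // Put before all Gets are done
    if (!in.has(tag + 1)) return false;  // retry later, but the Put remains
    out.put(tag, partial.get(tag) + in.get(tag + 1));
    return true;
}
```

With only the first input available, both variants report that they must be retried. On the retry, `step_gets_first` succeeds, while `step_put_too_early` throws because its earlier Put is repeated.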

Benefits from using Intel® Concurrent Collections
- Improves programming productivity
  - Only serial code
  - No knowledge of parallel technologies required
  - Determinism
  - Race-free
- Portability
- Scalability
- Expert-tuning system

Summary: How to write an application using Intel® Concurrent Collections?
1. Draw the algorithm on a chalkboard
2. Define data structures
3. Represent the algorithm in the textual notation
4. Implement high-level operations in C++
5. Instantiate a graph and run it

Agenda
- Existing parallel programming models
- Key concepts, Blackscholes example
- Performance results

Blackscholes benchmark
- Calculations for a single set of parameters take fewer than 500 CPU instructions
- Steps should be grouped to reduce the overhead and improve cache locality
- Automatic grain selection is an area for future research
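
Grouping can be sketched as follows: instead of one tag per parameter set, one tag identifies a block of consecutive parameter sets, and a single step instance processes the whole block. The function `make_blocks` and the `grain` parameter are hypothetical names for this example; the slides do not show how grouping was implemented in the benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits the index range [0, n) into consecutive blocks of at most `grain`
// indices. Each pair is a half-open range [begin, end) that one step
// instance would process, so one tag covers a whole block.
std::vector<std::pair<std::size_t, std::size_t>>
make_blocks(std::size_t n, std::size_t grain) {
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    if (grain == 0) return blocks;  // a zero grain gives no valid blocking
    for (std::size_t begin = 0; begin < n; begin += grain)
        blocks.emplace_back(begin, std::min(begin + grain, n));
    return blocks;
}
```

A larger grain means fewer step instances and less scheduling overhead per option, at the cost of fewer units of work to balance across cores; picking that value is the grain selection problem the slide mentions.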

Dedup benchmark
- Algorithm is a pipeline
- The last pipeline stage is serial
- The "Steps Priorities" feature makes Dedup run 1.4 times faster
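
The slide does not describe how step priorities are assigned or scheduled. As a generic sketch of priority-based scheduling, assuming that a larger number means "run sooner", a ready queue can be kept as a max-heap. `ToyTask`, `ByPriority`, and `run_order` are hypothetical names for this example.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct ToyTask {
    int priority;  // larger value runs first (an assumption of this sketch)
    int id;
};

// Orders the heap so that the task with the largest priority is on top.
struct ByPriority {
    bool operator()(const ToyTask& a, const ToyTask& b) const {
        return a.priority < b.priority;
    }
};

// Returns task ids in the order a single worker would run them.
// The relative order of tasks with equal priority is unspecified.
std::vector<int> run_order(const std::vector<ToyTask>& tasks) {
    std::priority_queue<ToyTask, std::vector<ToyTask>, ByPriority> ready(
        ByPriority{}, tasks);
    std::vector<int> order;
    while (!ready.empty()) {
        order.push_back(ready.top().id);
        ready.pop();
    }
    return order;
}
```

In a pipeline with a serial final stage, giving that stage's steps a higher priority is one plausible use of such a queue, since it keeps the bottleneck stage supplied with work as soon as its inputs exist.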

Possible model improvements
- Memory management
- Garbage collection
- Automatic grain selection
- Streaming data input

Getting More Information
- Intel® Concurrent Collections for C/C++ on WhatIf.intel.com:
  http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc

Questions & Answers

Thank you!