Published by Chad Benjamin Stafford. Modified over 9 years ago.
1
Evaluation of a DAG with Intel® CnC
Mark Hampton
Software and Services Group
CnC Tutorial @ MIT
July 27, 2010
2
Software & Services Group, Developer Products Division
Copyright © 2010, Intel Corporation. All rights reserved.
*Other brands and names are the property of their respective owners.

Applications are frequently modeled as directed acyclic graph (DAG) evaluations

Each node represents a task to be executed
Edges represent the flow of data, e.g. node 11’s inputs are the results produced by nodes 3 and 8
We want to minimize the total graph evaluation time by exploiting available parallelism

[Figure: example DAG with nodes numbered 1 to 24]
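
As a rough illustration of the parallelism a DAG exposes, the sketch below groups nodes into "waves": a node can run one wave after its latest predecessor, and all nodes in the same wave are independent. This is a generic dependence-counting pass written against the C++ standard library only; the names `Graph`, `earliest_wave`, and `critical_path_length` are hypothetical and are not part of CnC or the customer application.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// A DAG stored as successor lists: succ[u] lists every node that consumes u's result.
using Graph = std::vector<std::vector<std::size_t>>;

// For each node, compute the earliest wave in which it can run.
// Source nodes are in wave 0; any other node runs one wave after its
// latest predecessor. Nodes that share a wave do not depend on each other.
std::vector<std::size_t> earliest_wave(const Graph& succ) {
    const std::size_t n = succ.size();

    // Count the unsatisfied inputs of every node.
    std::vector<std::size_t> pending(n, 0);
    for (const auto& outs : succ)
        for (std::size_t v : outs) ++pending[v];

    std::vector<std::size_t> wave(n, 0);
    std::queue<std::size_t> ready;
    for (std::size_t u = 0; u < n; ++u)
        if (pending[u] == 0) ready.push(u);

    std::size_t visited = 0;
    while (!ready.empty()) {
        const std::size_t u = ready.front();
        ready.pop();
        ++visited;
        for (std::size_t v : succ[u]) {
            wave[v] = std::max(wave[v], wave[u] + 1);
            if (--pending[v] == 0) ready.push(v);  // last input satisfied
        }
    }
    assert(visited == n && "graph must be acyclic");
    return wave;
}

// Number of nodes on the longest dependence chain. No schedule can finish
// in fewer sequential node evaluations than this, however many cores exist.
std::size_t critical_path_length(const Graph& succ) {
    std::size_t longest = 0;
    for (std::size_t w : earliest_wave(succ)) longest = std::max(longest, w + 1);
    return longest;
}
```

For a four-node graph in which nodes 0 and 1 both feed node 2, and node 2 feeds node 3, the first two nodes share wave 0 and can run in parallel, while the critical path is three nodes long.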
3
We are currently using CnC to speed up the evaluation of a DAG for a customer application

The DAG can be broken down into two main components:
−The first component is essentially a linear chain of heavyweight nodes, each of which contains a for-loop that can be parallelized
−The second component contains a large number of very lightweight nodes (i.e. sub-microsecond runtime) with many dependences between them
The main focus of our work thus far has been how to improve performance for the second component
−The problem with the second component is that the short per-node evaluation time magnifies the overhead of the CnC runtime system
−The DAG is evaluated repeatedly over a long period of time (e.g. days or weeks), which permits some heavyweight analyses, although we have been working to improve performance without any extra analysis
Throughout the various performance experiments, the application-level CnC graph has been essentially unchanged, as only the semantic ordering constraints need to be specified
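
A toy model can show why sub-microsecond nodes magnify runtime overhead. The sketch below assumes a fixed scheduling cost per step and perfectly balanced workers. Both the model and any numbers fed to it are illustrative assumptions, not measurements from this work or properties of the CnC runtime, and the function names are hypothetical.

```cpp
#include <cassert>

// Fraction of each step's wall time spent on useful node evaluation,
// assuming every step pays the same fixed runtime overhead.
double useful_fraction(double node_ns, double overhead_ns) {
    assert(node_ns > 0.0 && overhead_ns >= 0.0);
    return node_ns / (node_ns + overhead_ns);
}

// Upper bound on speedup over an overhead-free serial evaluation when
// `workers` cores each pay the per-step overhead and stay perfectly busy.
double speedup_bound(double node_ns, double overhead_ns, unsigned workers) {
    return workers * useful_fraction(node_ns, overhead_ns);
}
```

Under these assumptions, a node whose evaluation time equals the per-step overhead spends only half of its step doing useful work, so 12 workers can deliver at most a 6x speedup. A node that runs a thousand times longer than the overhead loses almost nothing to it.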
4
The DAG application structure has a straightforward mapping to a CnC graph (static graph is shown)

[Figure: static CnC graph with item collection [dataReady] and step collection (nodeEvaluate), annotated as follows]
−The environment initializes tags for the nodes to be evaluated (currently tags are pointers to the actual nodes)
−An available dataReady item from each of a node’s predecessors indicates that the dependence has been satisfied
−The step invokes node evaluation
−Once a node has been evaluated, it puts a dataReady item to indicate to its successors that its real data is available

The full topology for the dynamic instance graph (with each node mapped to a single tag/step/item instance) is known before CnC begins evaluation
We use pointers to nodes and pass around dataReady flags to indicate node completion, since we don’t construct the actual node objects or handle memory transfers; the only purpose of CnC is to act as a scheduling engine (so it’s minimally intrusive)
Each node maintains its own copy of input data; even though “real” item collections aren’t used, overwriting is avoided
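
The tag/step/item pattern described above can be mimicked in a few lines of standard C++. This is a sketch under stated assumptions and does not use the Intel CnC API: `Node`, `DataReady`, `node_evaluate`, and `run` are hypothetical names, the "runtime" is a single-threaded loop that re-queues a step whose inputs are not ready, and node evaluation is reduced to summing integers.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <unordered_set>
#include <vector>

// Application-owned node. The scheduler only ever sees pointers to it.
struct Node {
    std::vector<const Node*> preds;  // nodes whose results this node consumes
    std::vector<int> inputs;         // this node's own copy of its input data
    int own = 0;                     // stand-in for the node's local work
    int result = 0;                  // valid once the node's dataReady flag is put
};

// Stand-in for the [dataReady] item collection: a flag per finished node.
using DataReady = std::unordered_set<const Node*>;

// Stand-in for the (nodeEvaluate) step. The tag is a pointer to the node.
// Returns false if some predecessor's dataReady flag is not yet available.
bool node_evaluate(Node* tag, DataReady& data_ready) {
    assert(tag != nullptr);
    for (const Node* p : tag->preds)
        if (data_ready.count(p) == 0) return false;  // a "get" would fail

    tag->inputs.clear();
    tag->result = tag->own;
    for (const Node* p : tag->preds) {
        tag->inputs.push_back(p->result);  // copy, so nothing is overwritten
        tag->result += p->result;
    }
    data_ready.insert(tag);  // "put": tell successors the real data is ready
    return true;
}

// Toy scheduler: run every prescribed tag, re-queuing steps that are not
// ready yet. Returns the number of step attempts, including retries.
std::size_t run(std::deque<Node*> tags, DataReady& data_ready) {
    std::size_t attempts = 0;
    std::size_t stalled = 0;  // consecutive failed attempts
    while (!tags.empty()) {
        Node* tag = tags.front();
        tags.pop_front();
        ++attempts;
        if (node_evaluate(tag, data_ready)) {
            stalled = 0;
        } else {
            tags.push_back(tag);
            // Every remaining tag failed in a row: no progress is possible.
            if (++stalled >= tags.size())
                throw std::logic_error("unsatisfiable dependence");
        }
    }
    return attempts;
}
```

If the environment prescribes a node before its two predecessors, the first attempt fails and is retried after both predecessors have put their flags, giving four attempts for three nodes. Retries of this kind are one way that per-step overhead can accumulate when steps are very short.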
5
Here are recent performance results from some experimental optimizations

Results are shown for a 12-core Westmere system with hyper-threading disabled
This workload is very lightweight, with many of the nodes evaluating in less than a microsecond
The above results rely on optimizations that we have been experimenting with to reduce the overhead of the CnC runtime system
We expect greater speedups from more heavyweight workloads; we’re also continuing to pursue other performance optimizations such as graph partitioning
6
We are exploring how to generalize this work beyond the specific application

There may frequently be cases where programmers want to use CnC for scheduling task execution without transferring all data through item collections
Our goal is to provide the programmer with a robust set of performance hooks that can guide scheduling
Additionally, the problem of runtime system overhead with very lightweight steps has come up in multiple applications, so we continue to investigate ways to reduce overhead
7
Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2010. Intel Corporation.
http://intel.com/software/products