Published by Parker Yarnall. Modified over 9 years ago.
1
Optimizations for a Simulator Construction System Supporting Reusable Components
David A. Penry and David I. August
The Liberty Architecture Research Group
Princeton University
2
Architectural Exploration
- Architectural options are studied using simulators
- More iterations = better decisions
- Need fast path to simulator
- Need fast simulator
[Figure labels: Architecture Options, Architectural Simulator]
3
Simulator Construction Systems
- Reuse simulator infrastructure
- But still must be able to reuse descriptions:
  - Structural composition
  - Medium-grained components
  - Standard communication contracts
  - High parameterizability
  - Separation of concerns
[Figure labels: Architecture Description, Simulator Builder, Architectural Simulator Instance]
4
The Reuse Penalty
- Reusability leads to a speed penalty:
  - more component instances
  - more signals
  - more general code
- Therefore: reusable systems are often slower
- How can we mitigate the reuse penalty?
5
Liberty Simulation Environment
- Simulator construction system for high reuse
- Two-tiered specifications
  - Leaf module templates in C
  - Netlisting language for instantiation and customization
- Three-signal standard communications contract (Enable, Data, Ack) with overrides (control functions)
- Code is generated
6
Contrast: SystemC
- Simulator construction libraries (C++)
- Partially supports reuse:
  (+) Structural composition
  (+) Module granularity varies
  (?) Communications contracts by convention
  (-) Low parameterizability
  (-) Separation of concerns
- Description is a C++ program
7
Models of Computation
- SystemC uses Discrete Event (DE)
- LSE uses Heterogeneous Synchronous Reactive (HSR), Edwards (1997)
  - Unparsed code blocks (black boxes)
  - Values begin unresolved and resolve monotonically
  - Chaotic scheduling
[Figure: blocks A, B, C, D]
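The HSR bullets above can be illustrated with a small sketch. It assumes `None` stands for the unresolved value and uses hypothetical gate functions and signal names (`sel`, `x`, `y`); none of this is LSE's actual API. Blocks are re-run in arbitrary order until no signal resolves, and a resolved signal is never changed again.

```python
UNRESOLVED = None  # stand-in for the "unresolved" value (an assumption of this sketch)

def and_gate(a, b):
    # Non-strict AND: a single False input resolves the output early.
    if a is False or b is False:
        return False
    if a is UNRESOLVED or b is UNRESOLVED:
        return UNRESOLVED
    return True

def not_gate(a):
    return UNRESOLVED if a is UNRESOLVED else (not a)

def run_timestep(blocks):
    """blocks: list of (function, input signal names, output signal name)."""
    values = {}
    for _, inputs, output in blocks:
        for name in list(inputs) + [output]:
            values[name] = UNRESOLVED
    changed = True
    while changed:
        changed = False
        for func, inputs, output in blocks:
            if values[output] is not UNRESOLVED:
                continue  # monotonic: a resolved signal never changes again
            result = func(*(values[name] for name in inputs))
            if result is not UNRESOLVED:
                values[output] = result
                changed = True
    return values

# Hypothetical netlist with a cycle (x depends on y, y depends on x).
blocks = [
    (not_gate, ["x"], "y"),
    (and_gate, ["sel", "y"], "x"),
    (lambda: False, [], "sel"),
]
print(run_timestep(blocks))
```

The cycle still resolves because `and_gate` can produce an output while one input is unresolved. Reordering `blocks` changes how many passes are needed, not the final values.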
8
Potential HSR Benefits vs. DE
- Static schedules possible
- Lower per-signal overhead
- Use of unresolved value to avoid redundant computation
[Figure: blocks A, B, C, D]
9
Experimental methodology
- Three models of a 4-way out-of-order microprocessor:
  - SystemC using custom speed-optimized components
  - LSE model using custom speed-optimized components
  - LSE model using standard reusable components
- 9 benchmarks (CPU 2000/MediaBench)
- See paper for compiler, etc.
[Table columns: Model, Instances, Signals, Non-edge signals. Cell values are fused in the source: Custom SystemC 32714; Reusable LSE 42348911; Custom LSE 481383]
10
Custom LSE vs. SystemC
- Custom LSE outperforms custom SystemC
  - Reduction in overhead
  - Use of unresolved signal value
  - Static instantiation and code specialization
- Dynamic schedule for both

Model            Cycles/sec   Speedup
Custom SystemC   53722        -
Custom LSE       155111       2.88
11
Reuse Penalty
- Reusable model suffers a large reuse penalty (0.26x the speed of custom LSE)
  - Many more signals
  - Many more non-edge signals
  - More components
- All dynamic schedules

Model            Cycles/sec   Speedup
Custom SystemC   53722        -
Custom LSE       155111       2.88
Reusable LSE     40649        0.76
12
Creating Static Schedules
Edwards' algorithm (1997):
- Construct a signal dependency graph
- Break into strongly-connected components (SCCs)
- Schedule in topological order
- Partition each SCC into a head and tail
- Schedule tail recursively, then repeat head (any order) and tail’s schedule
- Coalesce
[Figure: blocks A, B, C, D]
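The first step, constructing the signal dependency graph, might look like the sketch below. The four-block netlist is hypothetical (loosely modeled on the A, B, C, D figure), and the rule that every output of a black-box block depends on every one of its inputs is an assumption of this sketch.

```python
def build_dependency_graph(blocks):
    """blocks maps a block name to (input_signals, output_signals).
    Returns {signal: sorted list of signals that depend on it}."""
    graph = {}
    for inputs, outputs in blocks.values():
        for sig in list(inputs) + list(outputs):
            graph.setdefault(sig, set())
        for src in inputs:
            # Black box: assume every output may depend on every input.
            graph[src].update(outputs)
    return {sig: sorted(deps) for sig, deps in graph.items()}

# Hypothetical netlist: block name -> (inputs, outputs), signals numbered 1 to 4.
blocks = {
    "A": ([], [1]),
    "B": ([1, 3], [2]),
    "C": ([2], [3]),
    "D": ([3], [4]),
}
graph = build_dependency_graph(blocks)
print(graph)
```

Signals 2 and 3 depend on each other here, which is what later forces a repeated evaluation in the static schedule.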
13
Creating Static Schedules (continued)
[Figure: blocks A, B, C, D connected by signals 1 to 4, with the signal dependency graph over nodes 1, 2, 3, 4]
14
Creating Static Schedules (continued)
- SCCs of the dependency graph: a, b, c
- Schedule: a b c
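The SCC and topological-order steps can be sketched with Tarjan's algorithm, a standard technique that the slides do not name. The example graph is an assumed reading of the figure (signal 1 feeds 2, signals 2 and 3 form a cycle, 3 feeds 4).

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps a signal to the signals that depend on it.
    Returns SCCs ordered so that producers come before consumers."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph[node]:
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of an SCC: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.remove(member)
                component.append(member)
                if member == node:
                    break
            sccs.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    sccs.reverse()  # Tarjan emits sink components first
    return sccs

graph = {1: [2], 2: [3], 3: [2, 4], 4: []}  # assumed example graph
print(strongly_connected_components(graph))
```

For this graph the three components correspond to the slide's a, b, c, with the two-signal cycle in the middle.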
15
Creating Static Schedules (continued)
- SCC b is partitioned into a head (H) and a tail (T)
- Schedule: 1 b 4
16
Creating Static Schedules (continued)
- SCC b expanded using its head (H) and tail (T)
- Schedule: 1 2 3 2 4
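A minimal sketch of the head/tail recursion follows. Several things are assumptions: the example graph, the placeholder `choose_head` heuristic (it simply picks the last signal of each SCC), and the repetition count of one head-plus-tail pass per head signal, which the slide does not state. The SCC helper uses plain reachability for brevity rather than an efficient algorithm.

```python
def reach(graph, start):
    """Signals reachable from start, including start itself."""
    seen, todo = {start}, [start]
    while todo:
        for succ in graph[todo.pop()]:
            if succ not in seen:
                seen.add(succ)
                todo.append(succ)
    return seen

def sccs_in_topological_order(graph):
    closure = {n: reach(graph, n) for n in graph}
    components, assigned = [], set()
    for n in graph:
        if n not in assigned:
            comp = sorted(m for m in closure[n] if n in closure[m])
            assigned.update(comp)
            components.append(comp)
    # A component that feeds another reaches strictly more signals, so sorting
    # by closure size (largest first) puts producers before consumers.
    components.sort(key=lambda comp: -len(closure[comp[0]]))
    return components

def schedule(graph, choose_head=lambda comp: [comp[-1]]):
    result = []
    for comp in sccs_in_topological_order(graph):
        node = comp[0]
        if len(comp) == 1 and node not in graph[node]:
            result.append(node)  # not on a cycle: evaluate once
            continue
        head = choose_head(comp)
        tail = [n for n in comp if n not in head]
        # Removing the head breaks cycles; schedule what is left recursively.
        tail_graph = {n: [s for s in graph[n] if s in tail] for n in tail}
        tail_schedule = schedule(tail_graph, choose_head)
        result += tail_schedule + (head + tail_schedule) * len(head)
    return result

graph = {1: [2], 2: [3], 3: [2, 4], 4: []}  # assumed example graph
print(schedule(graph))  # [1, 2, 3, 2, 4]
```

With signal 3 as the head and signal 2 as the tail of the cyclic component, the result matches the schedule shown on the slide. A different head choice gives a different but equally valid schedule.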
17
Creating Static Schedules (continued)
- Choosing an optimal partition is exponential
- Schedule (signals): 1 2 3 2 4
- Schedule (blocks): A B C B (D)
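The final coalescing step can be sketched in its simplest form: map each scheduled signal to the block that produces it and merge adjacent invocations of the same block. The `producer` mapping is an assumption read off the slide's example, and the coalescing described in the paper may be more elaborate.

```python
def coalesce(signal_schedule, producer):
    """Turn a per-signal schedule into a list of block invocations,
    merging neighbours that would call the same block twice in a row."""
    calls = []
    for sig in signal_schedule:
        block = producer[sig]
        if not calls or calls[-1] != block:
            calls.append(block)
    return calls

# Assumed mapping from signal number to producing block.
producer = {1: "A", 2: "B", 3: "C", 4: "D"}
print(coalesce([1, 2, 3, 2, 4], producer))  # ['A', 'B', 'C', 'B', 'D']

# A block that drives two adjacent signals is invoked only once.
print(coalesce([1, 2, 3], {1: "A", 2: "A", 3: "B"}))  # ['A', 'B']
```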
18
Dynamic sub-schedule embedding
- SCCs arise due to incomplete information
- “Optimal” schedules are optimal only w.r.t. the available information
- An “optimal” schedule may be worse than a dynamic one
- When an SCC is “too big”, just schedule that section dynamically
[Figure: blocks A, B, C]
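One way to picture the embedding is a plan that mixes static entries with dynamic sections. In this sketch the size threshold `max_static_size`, the precomputed `static_schedules` table, and the `evaluate` callback are all hypothetical; the slides do not say how "too big" is decided.

```python
def build_plan(components, static_schedules, max_static_size=2):
    """components: SCCs in topological order.
    static_schedules: precomputed static schedule per small SCC (hypothetical input)."""
    plan = []
    for comp in components:
        key = tuple(comp)
        if len(comp) > max_static_size:
            plan.append(("dynamic", key))  # too big: defer to run time
        else:
            plan.extend(("static", sig) for sig in static_schedules[key])
    return plan

def run(plan, evaluate):
    """evaluate(sig) recomputes one signal and returns True if its value changed."""
    for kind, payload in plan:
        if kind == "static":
            evaluate(payload)
            continue
        # Dynamic section: re-evaluate its members until nothing changes.
        changed = True
        while changed:
            changed = False
            for sig in payload:
                if evaluate(sig):
                    changed = True

components = [[1], [2, 3], [4, 5, 6]]
static_schedules = {(1,): [1], (2, 3): [2, 3, 2]}
plan = build_plan(components, static_schedules)
print(plan)
```

Only the three-signal SCC pays the run-time cost of convergence checking; everything before it runs from the fixed list.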
19
Dependency information enhancement
- In practice, we see big SCCs
- Peek in the black box:
  - Simple parsing of communication overrides (control functions)
  - Can ask user to tell about internal dependencies
    - Not too painful because it is reused
[Figure: blocks A, B, C]
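The effect of extra dependency information can be shown with a small sketch. The blocks, signal names, and the `internal` declaration format are hypothetical. Without declarations, every output is assumed to depend on every input, which creates a cycle; once the user declares which inputs each output really uses, the cycle disappears.

```python
def dependency_graph(blocks, internal=None):
    """blocks: {name: (inputs, outputs)}.
    internal: optional {name: {output: [inputs it really depends on]}}."""
    internal = internal or {}
    graph = {}
    for name, (inputs, outputs) in blocks.items():
        for sig in list(inputs) + list(outputs):
            graph.setdefault(sig, set())
        for out in outputs:
            # Use declared dependencies when available, else assume all inputs.
            sources = internal.get(name, {}).get(out, inputs)
            for src in sources:
                graph[src].add(out)
    return graph

def signals_on_cycles(graph):
    def reaches_itself(start):
        seen, todo = set(), [start]
        while todo:
            for succ in graph[todo.pop()]:
                if succ == start:
                    return True
                if succ not in seen:
                    seen.add(succ)
                    todo.append(succ)
        return False
    return sorted(sig for sig in graph if reaches_itself(sig))

blocks = {"B": (["x", "y"], ["p", "q"]), "C": (["p"], ["y"])}
declared = {"B": {"p": ["x"], "q": ["y"]}}  # hypothetical user annotation

print(signals_on_cycles(dependency_graph(blocks)))            # ['p', 'y']
print(signals_on_cycles(dependency_graph(blocks, declared)))  # []
```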
20
Evaluation of Information Enhancement
- Control function parsing more useful alone
  - Not principally through scheduling
- It is important to have both kinds of enhancement

Optimization                    Cycles/sec   Speedup
No static scheduling            40649        -
With control function parsing   47850        1.18
With internal dependencies      41306        1.02
With both                       57046        1.40
21
Reuse Penalty Revisited
- Reuse penalty mitigated in part
- Reusable LSE model 6% faster than custom SystemC

Model                            Cycles/sec   Speedup   Build time (s)
Custom SystemC                   53722        -         49.1
Custom LSE                       155111       2.88      15.4
Reusable LSE w/o optimization    40649        0.76      33.9
Reusable LSE with optimization   57046        1.06      34.4
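As a quick check of "mitigated in part", the cycles/sec figures reported above can be compared directly. The sketch uses only those figures; the variable names are illustrative.

```python
cycles_per_sec = {
    "Custom SystemC": 53722,
    "Custom LSE": 155111,
    "Reusable LSE w/o optimization": 40649,
    "Reusable LSE with optimization": 57046,
}

custom_lse = cycles_per_sec["Custom LSE"]
systemc = cycles_per_sec["Custom SystemC"]

relative = {}
for label in ("Reusable LSE w/o optimization", "Reusable LSE with optimization"):
    rate = cycles_per_sec[label]
    # (fraction of custom LSE speed, fraction of custom SystemC speed)
    relative[label] = (round(rate / custom_lse, 2), round(rate / systemc, 2))
    print(label, relative[label])
```

The reusable model goes from 0.26x to 0.37x of custom LSE speed, and from 0.76x to 1.06x of custom SystemC speed, so the optimizations recover part of the penalty but not all of it.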
22
Conclusions
- A tradeoff exists between speed and reuse
- The simulator construction system can help
  - Higher base speed makes reuse penalty less painful
  - Optimizations are possible with HSR model
    - Ability of the scheduler to adapt to the information available is powerful
    - This adaptation is not possible with DE
- You can have high reuse at reasonable speeds
23
Future Work
- Release of LSE: Fall 2003, http://liberty.princeton.edu
- Hybrid model of computation
  - Embed HSR in DE, DE in HSR
  - Automatic extraction of HSR portions from DE
24
Other optimizations
- Improved block coalescing
  - See paper
- Code specialization
  - Implementation of APIs depends upon environment