Download presentation
Presentation is loading. Please wait.
1
An Application-Specific Design Methodology for STbus Crossbar Generation Author: Srinivasan Murali, Giovanni De Micheli Proceedings of the DATE’05,pp.1176-1181,2005 Presenter : Ching-Yuan Lin Date : 2007/1/22 Seminar book :P120
2
2 Abstract As the communication requirements of current and future Multiprocessor Systems on Chips (MPSoCs) continue to increase, scalable communication architectures are needed to support the heavy communication demands of the system. This is reflected in the recent trend that many of the standard bus products such as STbus, have now introduced the capability of designing a crossbar with multiple buses operating in parallel. The crossbar configuration should be designed to closely match the application traffic characteristics and performance requirements. In this work we address this issue of application-specific design of optimal crossbar (using STbus crossbar architecture), satisfying the performance requirements of the application and optimal binding of cores onto the crossbar resources. We present a simulation based design approach that is based on analysis of actual traffic trace of the application, considering local variations in traffic rates, temporal overlap among traffic streams and criticality of traffic streams. Our methodology is applied to several MPSoC designs and the resulting crossbar platforms are validated for performance by cycle-accurate SystemC simulation of the designs. The experimental case studies show large reduction in packet latencies (up to 7×) and large crossbar component savings (up to 3.5×) compared to traditional design approaches.
3
3 STbus Crossbar Architecture Low-latency, high bandwidth infrastructure Interface components: arbiters and frequency/data width adapters I1 I2 I3 A1 A2 A3 Bus 1 Bus 2 Bus 3 T1 T2 T3 Initiators TargetsArbitersBuses I1 I2 I3 A1 A2 A3 Bus 1 Bus 2 Bus 3 T1 T2 T3 Initiators Targets Arbiters Buses Initiator-Target crossbar Target-Initiator crossbar I1 I2 I3 A1 A2 Bus 1 Bus 2 T1 T2 T3 InitiatorsTargetsArbitersBuses Initiator-Target crossbar ※ Full crossbar architecture ※ Partial crossbar architecture
4
4 What’s the Problem Full crossbar is expensive Lot of wires and gates Partial crossbar is a compromise solution Optimum partial crossbar Latency close to full crossbar Fewer component and area How to design best partial crossbar for applications?
5
5 Application Traffic analysis Example traffic trace from 3 Targets ※ Traffic trace Merge t1&t2 T1: T2 T3: overlap Simulation period: Overlap is increase average and peak latency T1: T2: Simulation period:
6
6 Consider criticality of streams Targets with overlapping real-time stream should not share the same bus ※ Traffic trace T1: T2 T3: Simulation period: Real time constraint
7
7 Crossbar design approach Simulation time window for analysis Split to fixed sized windows In each simulation window Satisfy bandwidth requirement The total receives data of every core (place on same bus) must less or equal than window size Minimize overlaps among streams Consider criticality of streams T1: T2 T3: overlap Simulation period: Windows 1Windows 2
8
8 Design flow for partial crossbar design
9
9 Phase1 Full crossbar traffic in perfect communication Data collection hardware add to arbiters Traffic collection on each window Data rate for each core Overlap among streams Criticality of streams
10
10 Phase 2: Pre-processing Core that should be different buses Cores with large overlap (above threshold) (1) Cores with overlapping criticality streams (2) Non-satisfy bandwidth requirements (4) Maximum number of cores on bus To bound maximum latency (8) Worst case: packets to all the target onto a bus can arrive in the same cycle T1: T2: Simulation period: T3: One packet (burst)
11
11 Phase 3: Crossbar Design Start with a single bus Check for feasible solution Satisfy window bandwidth constraints (4) Place forbidden core on different buses (1)(2) Fewer than maximum number of cores on each bus(8) Repeat step2, incrementing the number of buses by 1 Optimal binding: Minimize overlap on each bus (11)
12
12 Experiment result Application benchmark Matrix suite-1 (25 cores) Matrix suite-2 (21 cores) FFT suite (29 cores) Quick sort suite (15 cores) DES encryption system (19 cores) ※ Matrix multiplication benchmark-2 (21 cores) Initiator-target full crossbar Need 12 bus Target-initiator full crossbar Need 9 bus FC bus count = 21
13
13 The average and maximum packet latencies Win : optimal partial crossbar Avg: crossbar base communication traffic flow,by relaxing overlap constraints and using a single window Latencies of crossbar (avg) are 4x to 7x higher than crossbar designed using our scheme
14
14 Effect of window size variations a) Small window size: Finer control of the performance parameters and crossbars have lower latencies Disadvantage: over-design of the network component b) the acceptable window size for various burst size
15
15 Overlap threshold setting From experiments, threshold value can be set: 30%-40% of window size for conservative design 10% of window size for conservative designs
16
16 Conclusion Presented methodology for STbus crossbar design local variations in traffic Overlap of streams Criticality of traffic streams Large saving in components, good performance Approach can be extended to other bus designs
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.