Exploiting Detachability Hashem H. Najaf-abadi Eric Rotenberg
Different jobs, different tools: Different applications have different characteristics and therefore different resource needs. A single fixed architecture thus compromises the performance of each individual application for the sake of the performance of all.
Architectural changeability (transformation): In silicon-based technology, a changeable (polymorphic) design operating in a given fixed configuration performs worse than a non-changeable implementation of that same configuration.
Changeability at any level of the design hierarchy: Changeability in a subcomponent; changeability in the interconnect. Interacting subcomponents may need to change too.
Changeability at the logic-circuit level: In an adder, for instance [Figure: full adders (F.A.) and the carry], or in the bypasses [Figure: functional units (F.U.)].
Changeability at the pipeline level: In the execution, for instance. [Figure: fetch, decode, dispatch, issue, execute, write-back; execute]
Changeability at the processor level: [Figure: L2 cache; Core A, Core B, Core C] At least there’s no higher level for changeability to spread to.
Heterogeneity: Pros: no low-level changeability. Cons: poor scalability (die area is consumed, burdening access to system resources); inflexible (once configurations are placed in the system they are permanent, while the need for them is user dependent).
Spread heterogeneity to numerous chips: Pros: increases the overall die area, thus ameliorating the poor scalability. Cons: exacerbates the burden on access to system resources; remains inflexible in the forms of architectural diversity that are made available.
Exploiting detachability: Detachability is a property that already exists (due to marketing and packaging issues). Pros: no suboptimality due to limited die area or burdened access to system resources; flexible in the forms of architectural diversity.
Exploiting detachability, other advantages: a substrate for the gradual employment of alternate technologies (which tend to be application dependent); a paradigm in which architects can focus on innovations that enhance architectures for specific applications, rather than tweaking the same old design.
Changeability in real-world applications: Rough automatic design-space exploration for the integer SPEC2000 benchmarks. The L1 and L2 cache sizes, the processor width, the issue queue size, and the clock period were varied randomly.
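The random exploration described on this slide can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' tool: the parameter ranges in `PARAM_CHOICES`, the `explore` helper, and the `toy_exec_time` cost function are all hypothetical, and a real study would replace the cost function with a simulation of each benchmark.

```python
import random

# Hypothetical parameter ranges; the slides do not state the ranges actually used.
PARAM_CHOICES = {
    "l1_lines": [128, 256, 512, 1024],
    "l2_lines": [1024, 2048, 4096, 8192],
    "width": [1, 2, 3, 4, 5, 6, 7, 8],
    "issue_queue": [16, 32, 64, 128],
    "clock_period_ns": [0.3, 0.4, 0.5, 0.6],
}


def random_config(rng):
    """Draw one configuration by picking each parameter independently at random."""
    return {name: rng.choice(values) for name, values in PARAM_CHOICES.items()}


def explore(evaluate, samples, seed=0):
    """Return (best_config, best_time) over randomly drawn configurations.

    `evaluate` maps a configuration to an execution time; lower is better.
    """
    rng = random.Random(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(samples):
        cfg = random_config(rng)
        t = evaluate(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time


def toy_exec_time(cfg):
    """Toy stand-in for simulating one benchmark (an assumption, not a real model)."""
    cycles = (
        1e6 / cfg["width"]
        + 2e5 * (1024 / cfg["l1_lines"])
        + 1e5 * (8192 / cfg["l2_lines"])
        + 1e5 * (64 / cfg["issue_queue"])
    )
    return cycles * cfg["clock_period_ns"]
```

Calling `explore(toy_exec_time, samples=200)` returns the best configuration found and its estimated time. Running one such search per benchmark, each with its own cost function, would give one customized architecture per benchmark.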
Customization results: [Table; columns: bzip, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr, crafty; rows: No. mem. access cycles; No. front-end cycles; Processor width; Issue queue size; Back-to-back lat. of dep. inst.; Clock period; No. L1 access cycles; No. L1-cache lines; L1-cache line-size; L1-cache associativity; No. L2 access cycles; No. L2-cache lines; L2-cache line-size; L2-cache associativity]
On each other’s: [Table; columns: bzip, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr, crafty; rows: bzip *, gap *, gcc, gzip *, mcf, parser, perl *, twolf *, vortex, vpr *, crafty *, average] Rows indicate benchmarks, and columns indicate their customized architectures.
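A cross-configuration table of this shape could be processed as in the sketch below, which normalizes each benchmark's row by its performance on its own customized architecture and then averages each column. The helper names and the numbers in `EXAMPLE_PERF` are invented for illustration and are not results from the slides; the metric is assumed to be one where higher is better.

```python
# Invented illustrative values (higher is better); NOT data from the slides.
# EXAMPLE_PERF[b][a] = performance of benchmark b on the architecture customized for a.
EXAMPLE_PERF = {
    "app_a": {"app_a": 2.0, "app_b": 1.0, "app_c": 1.6},
    "app_b": {"app_a": 0.4, "app_b": 0.5, "app_c": 0.3},
    "app_c": {"app_a": 1.5, "app_b": 0.9, "app_c": 3.0},
}


def normalize_rows(perf):
    """Express each row relative to the benchmark's own customized architecture."""
    return {
        bench: {arch: value / row[bench] for arch, value in row.items()}
        for bench, row in perf.items()
    }


def column_averages(rel):
    """Average relative performance of all benchmarks on each architecture."""
    archs = list(next(iter(rel.values())))
    return {arch: sum(rel[bench][arch] for bench in rel) / len(rel) for arch in archs}


if __name__ == "__main__":
    rel = normalize_rows(EXAMPLE_PERF)
    for bench, row in rel.items():
        print(bench, {arch: round(v, 2) for arch, v in row.items()})
    print("average", {a: round(v, 2) for a, v in column_averages(rel).items()})
```

After normalization the diagonal is 1.0 by construction, so every off-diagonal entry reads directly as the fraction of customized performance a benchmark retains on another benchmark's architecture.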
Representative architectures: Assigning surrogates. [Figure: gcc, gzip, parser, perl, twolf, vortex, vpr, crafty, gap, 7, 6, bzip, 5, mcf]
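One way surrogates might be assigned is sketched below. The slides do not state the selection method, so the greedy heuristic here is an assumption: it repeatedly adds the architecture that most improves the total relative performance, then maps each benchmark to the chosen architecture on which it runs best. The function name and the values in `EXAMPLE_REL` are hypothetical.

```python
# Invented illustrative values; NOT data from the slides.
# EXAMPLE_REL[b][a] = performance of benchmark b on the architecture customized
# for a, relative to b's performance on its own customized architecture.
EXAMPLE_REL = {
    "app_a": {"app_a": 1.0, "app_b": 0.5, "app_c": 0.8},
    "app_b": {"app_a": 0.7, "app_b": 1.0, "app_c": 0.6},
    "app_c": {"app_a": 0.5, "app_b": 0.3, "app_c": 1.0},
}


def assign_surrogates(rel, k):
    """Greedily choose k representative architectures and a surrogate per benchmark.

    This is an assumed heuristic, not the method used in the slides.
    """
    benchmarks = list(rel)
    candidates = list(next(iter(rel.values())))

    def total(archs):
        # Each benchmark is served by the best architecture among `archs`.
        return sum(max(rel[b][a] for a in archs) for b in benchmarks)

    chosen = []
    for _ in range(min(k, len(candidates))):
        remaining = [a for a in candidates if a not in chosen]
        chosen.append(max(remaining, key=lambda a: total(chosen + [a])))

    surrogate = {b: max(chosen, key=lambda a: rel[b][a]) for b in benchmarks}
    return chosen, surrogate


if __name__ == "__main__":
    chosen, surrogate = assign_surrogates(EXAMPLE_REL, k=2)
    print("representatives:", chosen)
    print("surrogates:", surrogate)
```

Greedy selection is not guaranteed to find the optimal set; for a benchmark suite this small, exhaustively scoring every k-subset with `itertools.combinations` would also be feasible.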
Customization results: [Table; columns: gcc, mcf, parser, vortex, crafty; rows: No. mem. access cycles; No. front-end cycles; Processor width: 3, 5, 5, 2, 6; Issue queue size: 64; Back-to-back lat. of dep. inst.; Clock period; No. L1 access cycles; No. L1-cache lines; L1-cache line-size; L1-cache associativity: 1, 1, 1, 2, 8; No. L2 access cycles; No. L2-cache lines; L2-cache line-size; L2-cache associativity: 1641 8]