CISC Machine Learning for Solving Systems Problems
ArchExplorer, Lecture 5
John Cavazos, Dept of Computer & Information Sciences, University of Delaware
Motivation
  [MICRO 2004, Gracia-Pérez et al.]
  Need for systematic quantitative comparison
Computer Arch Research
Design space exploration
  Multi-objectives: time-to-market, area, power, execution time
  Need more than intuition and experience!
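With several conflicting objectives, one common way to compare design points without collapsing them into a single score is Pareto dominance. The following is a minimal sketch, assuming lower is better for every objective. The design names and numbers are made up for illustration and do not come from the lecture.

```python
from typing import Dict, Tuple

# (area, power, execution time); lower is better for each (assumption).
Design = Tuple[float, float, float]

def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs: Dict[str, Design]) -> Dict[str, Design]:
    """Keep only the designs that no other design dominates."""
    return {
        name: d
        for name, d in designs.items()
        if not any(dominates(other, d) for other in designs.values())
    }

# Hypothetical design points.
designs = {
    "A": (1.0, 2.0, 10.0),
    "B": (2.0, 1.5, 8.0),
    "C": (2.5, 2.5, 12.0),  # worse than A on every objective
    "D": (1.0, 2.0, 11.0),  # same area and power as A, but slower
}
front = pareto_front(designs)
print(sorted(front))
```

A and B survive because each beats the other on at least one objective; choosing between them still requires a stated priority among objectives.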
ArchExplorer
  Website (archexplorer.org): upload, test, daily update
  Server-side infrastructure: database, simulation cluster; pick design points, add results
  Fully automatic
How to compare?
  1. Custom simulator
  2. Hardware compatibility
  3. Software compatibility
  4. Upload
  Result: wrapped simulator & parameter ranges
  [Diagram: custom simulator and its wrapped version; labeled blocks include CPU, IL1, DL1, BP, TLB, L2, MEM]
Hardware compatibility
  Instruction caches
  Data caches
  Branch predictors
  Interconnects
  Main memory
  Accelerators
  ...
Software compatibility
  Isolate the hardware block, possibly by moving from centralized control to distributed control
Software compatibility
  Self-configuration and parameter legality
  Models of computation
  Wrapping in SystemC, based on the UNISIM communication layer
Case study
  Memory sub-system for embedded processor PowerPC405
  8 different cache modules available:
    Victim cache
    Timekeeping victim cache
    Stride prefetcher
    Content-directed prefetcher
    Stride + content-directed prefetcher
    Tag prefetcher
    Global history prefetcher
    Skewed associative cache
  Complex hierarchies automatically explored
  Ranking designs for performance, power, energy, area, ...
Accurate comparison needs compiler tuning as well
  [Figure: P1 vs. P2. Baseline compiler: P1 < P2. Compiler tuned to P1 and tuned to P2: P1 > P2]
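The point of this slide can be illustrated with a small sketch: the ranking of two design points may flip once each is evaluated with the compiler setting that suits it best. The timing numbers below are invented for illustration, and the flag sets are just example GCC options, not the ones used in the study.

```python
# Hypothetical execution times in seconds (lower is better),
# one entry per compiler flag setting, for two design points.
times = {
    "P1": {"baseline": 10.0, "-O3": 9.0, "-O3 -funroll-loops": 7.5},
    "P2": {"baseline": 9.5, "-O3": 9.2, "-O3 -funroll-loops": 9.0},
}

def best_time(design: str, tuned: bool) -> float:
    """Time with the baseline setting, or the best time over all settings."""
    runs = times[design]
    return min(runs.values()) if tuned else runs["baseline"]

def winner(tuned: bool) -> str:
    """Design point with the lowest execution time."""
    return min(times, key=lambda design: best_time(design, tuned))

print("baseline compiler:", winner(tuned=False))
print("tuned per design: ", winner(tuned=True))
```

With these made-up numbers, P2 wins under the baseline setting while P1 wins once each design gets its own best flags, which is why a comparison on a single compiler setting can mislead.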
Best data cache mechanisms per area
  Conclusions:
  1. Contrast to Gracia-Pérez et al. [MICRO 2004]
  2. No clear winner
  3. Close to tuned parametric cache
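Comparing mechanisms "per area" can be read as grouping designs into area buckets and keeping the best one in each. A minimal sketch under that assumption follows; the bucket width, the mechanism labels, and all numbers are hypothetical and are not results from the study.

```python
def best_per_area_bucket(results, bucket_width):
    """Map each area bucket to the (name, speedup) of its best design.

    results: iterable of (name, area, speedup) tuples; higher speedup is better.
    """
    best = {}
    for name, area, speedup in results:
        bucket = int(area // bucket_width)
        if bucket not in best or speedup > best[bucket][1]:
            best[bucket] = (name, speedup)
    return best

# Hypothetical (name, area in mm^2, speedup) measurements.
results = [
    ("victim", 1.2, 1.05),
    ("stride", 1.4, 1.10),
    ("tagp", 2.1, 1.08),
    ("ghb", 2.3, 1.12),
]
winners = best_per_area_bucket(results, bucket_width=1.0)
print(winners)
```

Bucketing keeps a large, fast design from being compared directly against a small, slower one, so a different mechanism can win in each bucket.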
Composing cache hierarchies
Speedup and Energy Improvement
Check out this website: ARCHEXPLORER.ORG
Conclusion
  Permanent open competition(s)
  Future: superscalar processor, branch predictor repository, multi-cores
  Open for your ideas! NoC, compiler extensions, ...
Statistical Exploration: Genetic Search Algorithm
  Permanently ranks all designs (per area bucket; speedup or power), assigning higher probability to better points
  Picking a point according to distribution
  Mutations & crossover
  Natural selection
  Convergence
  [Diagram: CPU, BP, three caches ($), MEM]
  Veerle Desmet, Sylvain Girbal, Olivier Temam. 6th HiPEAC Industrial Workshop, Thales, Nov 26th, 2008
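The selection steps listed on this slide can be sketched as rank-based sampling followed by crossover and mutation. This is an illustrative sketch for a single area bucket, not ArchExplorer's actual implementation; the parameter names, ranges, mutation rate, and speedups are all hypothetical.

```python
import random

def rank_weights(scores):
    """Rank-based weights: the worst score gets 1, the best gets len(scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # worst first
    weights = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return weights

def pick(designs, scores, rng):
    """Pick one design at random, favoring better-ranked ones."""
    return rng.choices(designs, weights=rank_weights(scores), k=1)[0]

def crossover(a, b, rng):
    """Take each parameter from one of the two parents."""
    return {k: rng.choice([a[k], b[k]]) for k in a}

def mutate(cfg, ranges, rng, rate=0.1):
    """With probability `rate`, redraw a parameter from its legal range."""
    return {k: rng.choice(ranges[k]) if rng.random() < rate else v
            for k, v in cfg.items()}

def next_design(designs, scores, ranges, rng):
    """Propose the next design point to simulate."""
    child = crossover(pick(designs, scores, rng), pick(designs, scores, rng), rng)
    return mutate(child, ranges, rng)

# Hypothetical parameter ranges and already-simulated designs in one area bucket.
ranges = {"size_kb": [8, 16, 32], "assoc": [1, 2, 4]}
designs = [
    {"size_kb": 8, "assoc": 1},
    {"size_kb": 16, "assoc": 2},
    {"size_kb": 32, "assoc": 4},
]
speedups = [1.2, 0.9, 1.5]  # higher is better
rng = random.Random(0)
child = next_design(designs, speedups, ranges, rng)
print(child)
```

Using ranks rather than raw scores keeps one outlier from dominating the distribution. The sketch stops at proposing a child; simulating it and adding the result back to the ranked pool is left out.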
Features for Systematic DSE
  Standardized interfaces; module repository
  Design space exploration: module parameter tuning, module exploration, compiler exploration
  Compatibility database: module category, module interfaces, known models
  Parameter introspection: probing neighbors' parameters
  Parameter check: configuration validity, ranges, parameter relationships (e.g. DRAM nBanks {2;4;8}, tRAS+tCD<tRCD)
  Focused search algorithm: selection probability, fast convergence
  Compiler flag database: predictive modeling, compiler flags, machine description
  Benchmarks and datasets; targets PPC, ARM
  [Diagram labels: WB$, VC$, SP$, NB WB$, TVC$, CDP$, CDPSP$, TagP$, GHB$, BUS, DRAM]
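The parameter check can be sketched as a set of allowed values per parameter plus a relationship between parameters. In this minimal sketch, the nBanks values and the timing relation are copied as they appear on the slide; the timing value ranges and all function names are hypothetical.

```python
from itertools import product

# Allowed values per parameter. nBanks {2, 4, 8} is from the slide;
# the timing ranges are hypothetical placeholders.
RANGES = {
    "nBanks": [2, 4, 8],
    "tRAS": [2, 4],
    "tCD": [1, 2],
    "tRCD": [4, 6],
}

def relationship_ok(cfg):
    """Relationship between parameters, copied as written on the slide."""
    return cfg["tRAS"] + cfg["tCD"] < cfg["tRCD"]

def is_valid(cfg):
    """A configuration is legal if every value is in range and the relation holds."""
    in_range = all(cfg.get(name) in allowed for name, allowed in RANGES.items())
    return in_range and relationship_ok(cfg)

def valid_configs():
    """Enumerate every in-range combination that satisfies the relation."""
    names = list(RANGES)
    for values in product(*(RANGES[name] for name in names)):
        cfg = dict(zip(names, values))
        if relationship_ok(cfg):
            yield cfg

legal = list(valid_configs())
print(len(legal), "legal configurations out of",
      len(list(product(*RANGES.values()))))
```

Filtering illegal points before simulation keeps the search from spending cluster time on configurations the module would reject anyway.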