
NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning

Gagandeep Singh, Juan Gomez-Luna, Giovanni Mariani, Geraldo F. Oliveira, Stefano Corda, Sander Stuijk, Onur Mutlu, Henk Corporaal
Funded by the Horizon 2020 Framework Programme of the European Union, MSCA-ITN-EID

Motivation
- Exorbitant amounts of data and the high energy cost of moving that data
- A paradigm shift toward processing close to the data, i.e., near-memory computing (NMC)
- However, at the early design stage, architectural simulation is extremely slow, imposing long run-times

NAPEL: Performance Prediction via Ensemble Machine Learning
- Fast and accurate performance and energy prediction for a previously-unseen application
- Combines microarchitecture-independent characterization with architectural simulation responses to train an ensemble learning algorithm
- Uses intelligent statistical techniques to extract meaningful data with a minimum number of experimental runs

Phase 1: LLVM Kernel Analysis
- Microarchitecture-independent kernel analysis generates an application profile that is independent of the NMC architecture

Phase 2: Central Composite Design
- Design-of-experiments techniques [1] reduce the number of experiments needed to train NAPEL
- Central composite design (CCD) minimizes the uncertainty of a nonlinear polynomial model that accounts for parameter interactions
- In CCD, each input parameter can take five levels: min, low, central, high, max
- 256 DoE configurations for the 12 evaluated applications

[Figure: central composite design, with the five coded levels (min, low, central, high, max) on each axis]

Phase 3: Ensemble Machine Learning
- We employ random forest (RF) as our ML algorithm, which embeds procedures to screen input features
- Hyper-parameter tuning optimizes the accuracy of the ML algorithm

NAPEL Prediction
- Cross-platform prediction of a completely unseen application using only microarchitecture-independent application features
- 220x faster, on average, than our NMC simulator (min. 33x, max. 1039x)

NMC Architecture Evaluation
- Evaluated on an IBM POWER9 host with the Ramulator-PIM simulator [2]
- MRE of 8.5% and 11.6% for performance and energy prediction, respectively
- NAPEL is 1.7x (1.4x) and 3.2x (3.5x) better in terms of performance (energy) estimation than ANN and decision-tree models

[Figure: (a) Performance prediction, (b) Energy prediction]

NMC Suitability Analysis
- NAPEL provides an accurate prediction of NMC suitability
- MRE between 1.3% and 26.3% (average 14.1%) for EDP prediction
- Workloads with EDP < 1 are not suitable for NMC and can leverage the host cache hierarchy

Application Features
- Instruction mix: the fraction of instruction types (integer, floating point, memory, etc.)
- ILP: instruction-level parallelism on an ideal machine
- Data/instruction reuse distance: for a given distance δ, the probability of reusing a data element/instruction (in a certain memory location) before accessing δ other unique data elements/instructions (in different memory locations)
- Memory traffic: the percentage of memory reads/writes that need to access main memory, assuming a cache of size equal to the maximum reuse distance
- Register traffic: the average number of registers per instruction
- Memory footprint: the total memory size used by the application

References
[1] D. C. Montgomery, Design and Analysis of Experiments, 2017.
[2] https://github.com/CMU-SAFARI/ramulator-pim/
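The reuse-distance feature used in Phase 1 can be illustrated with a small sketch. This is not NAPEL's LLVM-based profiler; it is a minimal pure-Python LRU stack-distance computation over an address trace, where the distance of an access is the number of unique addresses touched since the previous access to the same address (first-time accesses get -1, i.e., infinite distance).

```python
from collections import OrderedDict

def reuse_distances(trace):
    """LRU stack distance for each access in `trace`: the number of
    unique addresses accessed since the last access to the same
    address, or -1 for a first-time access."""
    stack = OrderedDict()  # address -> None, most recently used last
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # unique addresses touched after the previous access to `addr`
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.pop(addr)  # move `addr` back to the most-recent position
        else:
            distances.append(-1)
        stack[addr] = None
    return distances

trace = ["A", "B", "C", "A", "B", "B"]
print(reuse_distances(trace))  # → [-1, -1, -1, 2, 2, 0]
```

A histogram of these distances gives the reuse-distance probability described in the feature table, and the maximum finite distance is the cache size assumed for the memory-traffic feature.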
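The central composite design of Phase 2 can be sketched in coded (unit-less) form. The sketch below generates the standard CCD point set for k factors: 2^k factorial corners at ±1, 2k axial points at ±α, and a center point, so each factor takes the five levels (min, low, central, high, max) mentioned above. Mapping coded levels to actual simulator parameter values, and the replication of center runs, are omitted.

```python
from itertools import product

def central_composite_design(k):
    """Coded CCD for k factors: 2**k factorial corners (+/-1),
    2*k axial points (+/-alpha), and one center point.
    alpha = (2**k) ** 0.25 gives a rotatable design; each factor
    then takes five coded levels: -alpha, -1, 0, +1, +alpha."""
    alpha = (2 ** k) ** 0.25
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for a in (-alpha, alpha):
            point = [0.0] * k
            point[i] = a
            axial.append(point)
    center = [[0.0] * k]
    return corners + axial + center

design = central_composite_design(2)
print(len(design))  # → 9 (4 corners + 4 axial + 1 center)
```

The point of the design is that these few runs suffice to fit a second-order model with interaction terms, instead of simulating the full parameter grid.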
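NAPEL's Phase 3 uses random forests with hyper-parameter tuning; as an illustration of the bagging idea behind such ensembles, here is a minimal pure-Python sketch of bagged regression stumps, together with the mean relative error (MRE) metric reported in the evaluation. All function names are illustrative, not from NAPEL, and a real RF would grow deeper trees and tune hyper-parameters such as the number of trees.

```python
import random

def fit_stump(X, y):
    """Depth-1 regression tree: split on a random feature at a random
    threshold, predict the mean of y on each side."""
    f = random.randrange(len(X[0]))
    t = random.choice([row[f] for row in X])
    left = [yi for row, yi in zip(X, y) if row[f] <= t]
    right = [yi for row, yi in zip(X, y) if row[f] > t]
    lmean = sum(left) / len(left) if left else sum(y) / len(y)
    rmean = sum(right) / len(right) if right else sum(y) / len(y)
    return lambda row: lmean if row[f] <= t else rmean

def fit_forest(X, y, n_trees=100):
    """Bagging: each stump is trained on a bootstrap resample."""
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict(forest, row):
    """Ensemble prediction: average over all trees."""
    return sum(tree(row) for tree in forest) / len(forest)

def mre(y_true, y_pred):
    """Mean relative error, the accuracy metric used above."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Averaging many high-variance learners trained on resampled data is what lets the ensemble generalize to a previously-unseen application better than a single decision tree.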