Project Proposal Presented by Michael Kazecki

Outline
- Background
  - Algorithms
- Goals
- Ideas
- Proposal
  - Introduction
  - Motivation
  - Implementation

Background

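Hill-Climbing
- Start at the mid-point number of processors (P).
- Reduce the DVFS setting step by step until the performance target is missed.
- Choose the optimum DVFS setting for that P: the one just above the miss.
- Continue with another processor count P′. If it is better than the previous one, keep going; if not, disregard it and switch sides.
- The search ends at the optimum DVFS operating point.
- Cost: L lg N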
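The hill-climbing steps above can be sketched as follows. This is one possible reading of the flowchart, not the presenter's code. The search over processor counts is written as a binary-search-style hill climb: compare P with its neighbour P′ = P + 1 and discard the worse half. That assumes the best achievable power is unimodal in P and visits on the order of lg N processor counts. For each P, DVFS is stepped down from the fastest level until the target is missed, which costs up to L trials, consistent with the stated L lg N cost. The model (`FREQS`, `TARGET`, `toy_meets_target`, `toy_power`) is a hypothetical toy, not measured data.

```python
# Hypothetical toy model: Amdahl's-law speedup with a 90% parallel fraction,
# throughput = frequency * speedup, per-core power = 0.5 W static + 0.1 * f^3.
FREQS = [1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0]  # GHz; index = DVFS level
TARGET = 5.9  # required throughput (arbitrary units)


def toy_meets_target(p, level):
    speedup = 1.0 / (0.1 + 0.9 / p)
    return FREQS[level] * speedup >= TARGET


def toy_power(p, level):
    return p * (0.5 + 0.1 * FREQS[level] ** 3)


def best_level_for(p, levels, meets_target):
    """Lower DVFS from the fastest level until the target is missed and return
    the level just above the miss (None if even the fastest level misses)."""
    chosen = None
    for level in range(levels - 1, -1, -1):
        if not meets_target(p, level):
            break            # target missed: stop reducing
        chosen = level       # still on target: remember it and keep reducing
    return chosen


def hill_climb(n_procs, levels, meets_target, power):
    """Binary-search-style hill climb over P in [1, n_procs]; assumes the best
    achievable power is unimodal in P."""
    cache = {}

    def cost(p):
        if p not in cache:
            level = best_level_for(p, levels, meets_target)
            cache[p] = (float("inf"), None) if level is None else (power(p, level), level)
        return cache[p]

    lo, hi = 1, n_procs
    while lo < hi:
        mid = (lo + hi) // 2                  # mid-point processor count
        if cost(mid + 1)[0] < cost(mid)[0]:
            lo = mid + 1                      # neighbour P' is better: keep climbing
        else:
            hi = mid                          # not better: disregard and switch sides
    best_power, level = cost(lo)
    return lo, level, best_power


p, level, watts = hill_climb(8, len(FREQS), toy_meets_target, toy_power)
print(p, FREQS[level], round(watts, 3))  # 6 1.5 5.025
```

With these toy numbers the search settles on six processors at 1.5 GHz after evaluating only P = 4 through 7. The memoizing `cost` helper keeps each processor count from being searched twice.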

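Hill-Climbing Optimization
- Start at the mid-point number of processors (P).
- Apply the DVFS formula for that P.
- Continue with another processor count P′. If it is better than the previous one, keep going; if not, disregard it and switch sides.
- The search ends at the optimum DVFS operating point.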
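The slides do not spell out the DVFS formula. The sketch below assumes a simple one: throughput scales linearly with frequency. Under that assumption, one measurement at a reference frequency is enough to solve for the slowest level that meets the target, instead of stepping down level by level. `measure_throughput`, `F_REF`, and the toy Amdahl model are hypothetical stand-ins for real performance-counter readings. On memory-bound code, throughput is not linear in frequency, so a formula of this kind would only be an approximation.

```python
import bisect

# Hypothetical toy setup; index into FREQS = DVFS level.
FREQS = [1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0]  # GHz, ascending
F_REF = FREQS[-1]   # reference frequency for the single measurement
TARGET = 5.9        # required throughput (arbitrary units)


def measure_throughput(p, freq):
    """Stand-in for a hardware counter reading (toy Amdahl model, linear in freq)."""
    return freq * (1.0 / (0.1 + 0.9 / p))


def level_by_formula(p):
    """One measurement at F_REF, then solve TARGET = f * perf_ref / F_REF for f,
    assuming throughput is proportional to frequency. Returns (level, measurements)."""
    perf_ref = measure_throughput(p, F_REF)
    f_needed = F_REF * TARGET / perf_ref
    level = bisect.bisect_left(FREQS, f_needed)   # slowest level with f >= f_needed
    return (level if level < len(FREQS) else None), 1


def level_by_scan(p):
    """Iterative reduction from the fastest level until the target is missed."""
    chosen, trials = None, 0
    for level in range(len(FREQS) - 1, -1, -1):
        trials += 1
        if measure_throughput(p, FREQS[level]) < TARGET:
            break
        chosen = level
    return chosen, trials


print(level_by_formula(6), level_by_scan(6))  # (2, 1) (2, 8)
```

Both methods pick the same level here only because the toy model is exactly linear in frequency. The point of the comparison is the measurement count: one per P for the formula versus up to L for the scan.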

Final Algorithm
- Hill-climbing + DVFS optimization
- Cost: α lg N, where α grows more slowly than L
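The two cost expressions can be compared with illustrative numbers. This assumes that N is the number of processors and L is the number of DVFS levels stepped through per processor count, since the slides do not define either symbol. For N = 16 and L = 8:

```latex
C_{\text{hill}}  = L \log_2 N      = 8 \cdot \log_2 16 = 8 \cdot 4 = 32 \\
C_{\text{final}} = \alpha \log_2 N = 4\alpha, \qquad
C_{\text{final}} < C_{\text{hill}} \iff \alpha < L
```

Under this reading, the final algorithm needs fewer search steps whenever α < L. Because α grows more slowly than L, the gap widens as more DVFS levels become available.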

Goal  Dynamically optimizing power consumption of a parallel application  Low-overhead algorithms for dynamic optimization that produces optimal power savings.

Ideas  Improve on existing algorithm developed to overcome local optima situations, such as the use of “jitter” or momentum.  Expand research in another dimension by varying each individual processor core DVFS  Propose a prediction based algorithm using past results to calculate DVFS in real-time

Proposal Introduction  Develop a history based algorithm that is optimized for highly variable applications  Build a history table to track patterns in application’s behavior  Use a lookup registry to match current pattern to a suitable DVFS level

Proposal Motivation  Target run-time power-performance adaptation of future processors that run non- repetitive parallel applications.  Maximizing power savings while delivering a specified level of performance in real-time with low overhead.

Proposal Implementation  Hardware support  Requires cache and a register to support the history table and recent lookup  Requires real-time hardware power/performance monitoring  Software support  After search phase, mechanism needed to control DVFS and number of processors

End

Questions?