Implications of Emerging Hardware Tom Wenisch (University of Michigan) Nikos Hardavellas (Northwestern University) Sangyeun Cho (University of Pittsburgh)


Implications of Emerging Hardware
Tom Wenisch (University of Michigan)
Nikos Hardavellas (Northwestern University)
Sangyeun Cho (University of Pittsburgh)
Kirk Pruhs (University of Pittsburgh)
Phillip Gibbons (Intel Labs)
Stavros Harizopoulos (HP Labs)
Spiros Papadimitriou (Google Research)
Ashwin Kumar Kayyoor (University of Maryland)
Xiaorui Wang (University of Tennessee)

Emerging Memory Technologies
Observation
– Existing buffer pool and storage mgmt optimized for slow disk / volatile DRAM
– Emerging memories change power/energy/reliability trade-offs
– e.g., access latency, energy per access, access granularity, wearout, non-volatility
Action: Rethink data management to exploit these devices
– e.g., new index structures (access granularity)
– e.g., new recovery mechanisms (non-volatility)
– e.g., jointly optimize for energy & performance (energy/access)
– e.g., new query processing strategies (access latency)

Data Placement
Observation: “Not all memory addresses are created equal”
– Memory/storage is major/growing piece of power breakdown
– Devices and power modes create heterogeneity
– Power management is exposed to software
– However, current practice is oblivious to hardware power knobs
– Where data is placed within and across nodes impacts efficiency
Action: Proactively place data to optimize for energy
– e.g., consider moving computation to the data
– e.g., cluster data with similar locality to enable power down
– e.g., consider trading compute for storage (compression)
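The "cluster data with similar locality to enable power down" idea can be sketched as a simple packing heuristic. This is only an illustration, not a method from the slides: the function names, the per-block access counts, the fixed per-device capacity, and the idle threshold are all assumptions made for the example.

```python
from typing import Dict, List


def place_by_heat(access_counts: Dict[str, int],
                  num_devices: int,
                  capacity: int) -> List[List[str]]:
    """Pack the hottest blocks onto the lowest-numbered devices.

    Hypothetical model: every block has the same size and each device
    holds at most `capacity` blocks.
    """
    if num_devices * capacity < len(access_counts):
        raise ValueError("not enough capacity for all blocks")
    # Hottest first; ties broken by name so the result is deterministic.
    ordered = sorted(access_counts, key=lambda b: (-access_counts[b], b))
    devices: List[List[str]] = [[] for _ in range(num_devices)]
    for i, block in enumerate(ordered):
        devices[i // capacity].append(block)
    return devices


def idle_candidates(devices: List[List[str]],
                    access_counts: Dict[str, int],
                    threshold: int) -> List[int]:
    """Return indices of devices whose total access count is <= threshold.

    These are the devices a power manager might consider placing in a
    low-power mode (the threshold is an assumed tuning parameter).
    """
    return [i for i, blocks in enumerate(devices)
            if sum(access_counts[b] for b in blocks) <= threshold]


counts = {"a": 100, "b": 90, "c": 2, "d": 1, "e": 0, "f": 0}
layout = place_by_heat(counts, num_devices=3, capacity=2)
cold = idle_candidates(layout, counts, threshold=5)
```

Grouping blocks by access frequency concentrates traffic on a few devices, so the remaining devices see long idle periods. A real system would also have to weigh migration cost and the latency of waking a device.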

Specialization
Observation
– Hardware specialization provides greatest leverage for efficiency
– Hardware is moving towards specialization
    In the chip (dark silicon)
    In the system (e.g., GPUs)
    In the cluster (e.g., wimpy nodes)
– More examples: mobile CPUs, embedded cores, GPUs, SIMD engines, vector units, reconfigurable hardware, wimpy nodes, etc.
Action: Software should influence how hardware specializes
– Identify important specializations (in particular for energy)
Action: Software should embrace specialized hardware
– Devise techniques to map/migrate/schedule tasks at correct grain

QoS Resiliency
Observation
– Data processing tasks have variable QoS demands
    with respect to latency / throughput
    with respect to data quality
– Hardware knobs can trade QoS for energy efficiency
– However, these knobs are susceptible to QoS and efficiency cliffs
Action: Carefully trade QoS for energy
– Need to design interfaces to express QoS objectives
– Optimizers must tune HW knobs to lowest power that meets QoS
– However, algorithms must provide robust QoS in the face of unexpectedly & rapidly changing demand
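The action "tune HW knobs to lowest power that meets QoS" can be sketched as a small selection routine. This is a sketch under assumed inputs, not a real power-management API: the `Setting` record, its predicted latencies, and the `headroom` safety margin (a crude way to stay away from QoS cliffs when demand shifts) are all hypothetical.

```python
from typing import List, NamedTuple


class Setting(NamedTuple):
    """One hypothetical hardware operating point."""
    name: str
    power_watts: float
    latency_ms: float  # predicted latency under the current load


def pick_setting(settings: List[Setting],
                 target_ms: float,
                 headroom: float = 0.0) -> Setting:
    """Pick the lowest-power setting whose latency meets the target.

    `headroom` (0.0 to 1.0) shrinks the latency budget to leave slack
    for unexpected demand. If nothing meets the budget, fall back to
    the fastest setting rather than failing.
    """
    budget = target_ms * (1.0 - headroom)
    feasible = [s for s in settings if s.latency_ms <= budget]
    if not feasible:
        return min(settings, key=lambda s: s.latency_ms)
    return min(feasible, key=lambda s: s.power_watts)


knobs = [
    Setting("low", power_watts=10.0, latency_ms=90.0),
    Setting("mid", power_watts=20.0, latency_ms=50.0),
    Setting("high", power_watts=40.0, latency_ms=20.0),
]
relaxed = pick_setting(knobs, target_ms=100.0)
cautious = pick_setting(knobs, target_ms=100.0, headroom=0.2)
```

With a 100 ms target the lowest-power point qualifies, but a 20% headroom shrinks the budget to 80 ms and pushes the choice one step up. The QoS target here is a single latency number, while the slide's point about interfaces is that real objectives (throughput, data quality) need a richer way to be expressed.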

Energy-Constrained Data Mgmt
Observation
– Proliferation of high-capability energy-constrained devices (smart phones)
– Local computation & communication cost energy
Action: Find energy-minimal client + cloud solutions
– Partition data storage & processing between “client” and “cloud”
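Because both local computation and communication cost energy, the client/cloud partitioning decision can be framed as comparing two energy estimates from the device's point of view. The model below is a deliberately simple sketch. The linear cost terms and every parameter (joules per cycle, joules per byte, idle power while waiting) are assumptions for illustration and are not taken from the slides.

```python
def local_energy_j(cycles: float, joules_per_cycle: float) -> float:
    """Energy the device spends running the task itself."""
    return cycles * joules_per_cycle


def offload_energy_j(bytes_sent: float, joules_per_byte: float,
                     wait_s: float, idle_watts: float) -> float:
    """Energy the device spends shipping data and idling for the reply."""
    return bytes_sent * joules_per_byte + wait_s * idle_watts


def choose_site(cycles: float, joules_per_cycle: float,
                bytes_sent: float, joules_per_byte: float,
                wait_s: float, idle_watts: float) -> str:
    """Return "client" or "cloud", whichever costs the device less energy."""
    local = local_energy_j(cycles, joules_per_cycle)
    remote = offload_energy_j(bytes_sent, joules_per_byte, wait_s, idle_watts)
    return "client" if local <= remote else "cloud"


# Compute-heavy task with a small input: offloading is cheaper for the device.
small_input = choose_site(cycles=1e9, joules_per_cycle=1e-9,
                          bytes_sent=1e6, joules_per_byte=1e-7,
                          wait_s=0.5, idle_watts=0.2)

# Same computation over a large input: transmission dominates.
large_input = choose_site(cycles=1e9, joules_per_cycle=1e-9,
                          bytes_sent=1e8, joules_per_byte=1e-7,
                          wait_s=0.5, idle_watts=0.2)
```

The comparison accounts only for the device's energy. An energy-minimal client + cloud solution would also have to include the energy spent in the cloud and the network.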