Hierarchies, Clouds, and Specialization Phillip B. Gibbons Intel Labs Pittsburgh June 28, 2012 NSF Workshop on Research Directions in the Principles of.

Presentation transcript:

Hierarchies, Clouds, and Specialization
Phillip B. Gibbons, Intel Labs Pittsburgh
June 28, 2012
NSF Workshop on Research Directions in the Principles of Parallel Computation
Hi-Spade

2 Research Direction #1: Hierarchies
Xeon 7500 Series (Nehalem)
[Figure: up to 1 TB main memory shared by 4 sockets; each socket has a 24MB shared L3 cache and 8 cores; each core has 2 HW threads, a 32KB cache, and a 256KB cache. Attach: magnetic disks & flash devices.]

3 Hierarchy Trends
Good performance [energy] increasingly requires effective use of hierarchy
Hierarchy getting richer
– More levels of cache
– More cores
– New memory/storage technologies: Flash/SSDs, emerging PCM
Bridge gaps in hierarchies: can’t just look at last level of hierarchy

4 How to Model the Hierarchy?
General Abstraction: Tree of Caches
Specific Example: Xeon 7500
PMH model [ACF93]
[Figure: tree-of-caches diagrams]

5 How to Design Algorithms?
Design to Tree-of-Caches abstraction:
Multi-BSP Model [L.G. Valiant, ESA’08]
– 4 parameters/level: cache size, fanout, latency/sync cost, transfer bandwidth
– Bulk-Synchronous
Our Goal: ~ Simplicity of Cache-Oblivious Model
– Handles dynamic, irregular parallelism
– Co-design with smart thread schedulers
Fundamentals: Spatial locality, Temporal locality, Constructive sharing, Minimize communication
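The four per-level parameters named on this slide can be written down as a small record, one per level of the tree of caches. What follows is a minimal sketch, not Valiant's formal definition. The names `Level`, `XEON_7500`, `instances_at`, and `hw_threads` are hypothetical. The sizes and fanouts are one reading of the Xeon 7500 diagram on slide 2, and the labels "L1" and "L2" for the 32KB and 256KB caches are an assumption. Latency and bandwidth are left unset because the slides give no values.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional

KB, MB, TB = 1024, 1024**2, 1024**4


@dataclass(frozen=True)
class Level:
    """One level of a tree of caches, with the four parameters from the slide."""
    name: str
    size_bytes: int                      # cache size
    fanout: int                          # number of children below each instance
    latency: Optional[float] = None      # latency/sync cost (not given in slides)
    bandwidth: Optional[float] = None    # transfer bandwidth (not given in slides)


# Root first. Fanouts assume 4 sockets, 8 cores per socket, 2 HW threads per core.
XEON_7500 = [
    Level("main memory", 1 * TB, fanout=4),
    Level("L3", 24 * MB, fanout=8),
    Level("L2", 256 * KB, fanout=1),
    Level("L1", 32 * KB, fanout=2),
]


def instances_at(levels, depth):
    """Number of caches at a given depth: product of the fanouts above it."""
    return prod(level.fanout for level in levels[:depth])


def hw_threads(levels):
    """Number of leaves of the tree: product of all fanouts."""
    return prod(level.fanout for level in levels)
```

Under these assumptions, `hw_threads(XEON_7500)` multiplies 4, 8, 1, and 2, and `instances_at(XEON_7500, 1)` counts one L3 cache per socket.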

6 What is Algorithm Designer’s Model?
Parallel Cache-Oblivious Model
[Figure labels: Memory; M, B; P]
Differs from CO model in how cache state is carried forward
– CO model: carry forward cache state according to some sequential order
– Assuming this task fits in cache: all three subtasks start with same state; merge state and carry forward
– If task does not fit, subtasks start with empty cache

7 What is the Right Scheduler?
Following [CSBR10], we study “Space-Bounded Schedulers”
– Pin tasks to cache where fits
– Allocate work to cores sharing that cache
Can prove good cache bounds, but overheads in current implementation too high:
– Faster to use Work-Stealing Scheduler
What algorithm model + thread scheduler can rule the world?
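The "pin tasks to cache where fits" rule can be illustrated in a few lines. This is a sketch of that one rule only, not the scheduler of [CSBR10]: it ignores how work is then allocated to the cores sharing the chosen cache, and it assumes each task's memory footprint is known in advance. The function name `pin_level` is hypothetical, and the cache sizes are taken from the Xeon 7500 diagram on slide 2, with "L1" and "L2" as assumed labels for the 32KB and 256KB caches.

```python
KB, MB = 1024, 1024**2

# Cache levels from smallest to largest (sizes from slide 2; labels assumed).
CACHES = [("L1", 32 * KB), ("L2", 256 * KB), ("L3", 24 * MB)]


def pin_level(footprint_bytes, caches=CACHES):
    """Return the smallest cache level whose capacity holds the task's footprint.

    A task that fits in no cache is anchored at main memory.
    """
    for name, size in caches:
        if footprint_bytes <= size:
            return name
    return "main memory"


if __name__ == "__main__":
    for footprint in (16 * KB, 100 * KB, 1 * MB, 32 * MB):
        print(footprint, "->", pin_level(footprint))
```

Choosing the smallest level that fits keeps a task's working set as close to the cores as possible, which is the locality argument behind the cache bounds mentioned on the slide.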

8 Research Direction #2: Clouds
Trend: Most computing is moving to the Cloud
Trend: Large-scale Parallel Computing done using MapReduce/Hadoop or follow-ons
– Bulk-synchrony is more popular than ever
Trend: Computations span client to cloud
– Moving towards tiered clouds
Trend: Multi-tenancy using virtual machines
– Impact on alg. design / schedulers?

9 Dealing with Multi-tenancy & VMs?
Algs must be suited for dynamic resources
– Avoid HPC-think: don’t know network topology
Schedulers must provide throughput + fairness
– Failed steal attempts are not useful work
– Yielding at failed steal attempts leads to unfairness
– BWS [EuroSys’12] decreases average unfairness from 124% to 20% and increases throughput by 12%
How is VM an obstacle to high-level performance abstractions?
– What hooks are needed to remove the obstacles?
– Algs/PL community has vision of goal; can inform
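To make "failed steal attempts are not useful work" concrete, the toy simulation below counts them. It is a deliberately simplified sketch, not BWS and not any real work-stealing runtime. All names are hypothetical, tasks are unit-cost and spawn no children, all work starts on worker 0, victims are picked by a fixed rotation instead of at random, and a thief runs a stolen task immediately.

```python
from collections import deque


def simulate(num_tasks, num_workers):
    """Run a toy work-stealing schedule.

    Returns (executed, failed_steals), each a per-worker list of counts.
    """
    if num_workers < 2:
        raise ValueError("need at least two workers to steal")
    deques = [deque() for _ in range(num_workers)]
    deques[0].extend(range(num_tasks))        # all work starts on worker 0
    executed = [0] * num_workers
    failed_steals = [0] * num_workers
    step = 0
    while any(deques):
        for w in range(num_workers):
            if deques[w]:
                deques[w].pop()               # run own work from the bottom
                executed[w] += 1
                continue
            # Idle worker: pick a victim other than itself (fixed rotation).
            offset = 1 + step % (num_workers - 1)
            victim = (w + offset) % num_workers
            if deques[victim]:
                deques[victim].popleft()      # steal from the top and run it
                executed[w] += 1
            else:
                failed_steals[w] += 1         # wasted attempt: no useful work
        step += 1
    return executed, failed_steals


if __name__ == "__main__":
    executed, failed = simulate(num_tasks=8, num_workers=4)
    print("executed per worker:     ", executed)
    print("failed steals per worker:", failed)
```

In a multi-tenant setting, the cycles behind each entry of `failed_steals` could have gone to another tenant, which is the tension between throughput and fairness that the slide points at.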

10 Research Direction #3: Specialization
Specialization is key to cloud efficiency
– Data center with mix of server architectures
– Heterogeneity within each processor
Algorithm designers should study specialization
– For algorithms to be run in the cloud
– For algs that support cloud infrastructure

11 BACK-UP SLIDES: REFERENCES

12 References - I
© Phillip B. Gibbons
Smart thread schedulers can enable simple, hierarchy-savvy abstractions
– PDF scheduler for shared caches [SPAA’04]
– Scheduling for constructive sharing [SPAA’07]
– Controlled-PDF scheduler [SODA’08]
– Work stealing overheads beyond fork-join [SPAA’09]
– Hierarchy-savvy parallel algorithms [SPAA’10]
– Parallel cache-oblivious model & scheduler [SPAA’11]
Tools, Hooks, Determinism simplify programming
– Memory-block transactions [SPAA’08]
– Semantic space profiling/visualization [ICFP’08, JFP 2010]
– Efficient internal determinism [PPoPP’12]

13 References - II
© Phillip B. Gibbons
Flash/PCM-savvy (database) systems maximize benefits of Flash/PCM
– Flash-savvy algorithms [VLDB’08, PVLDB 2010]
– Flash-based OLTP transactional logging [SIGMOD’09]
– Non-blocking joins for Data Warehouses [SIGMOD’10]
– Efficient online updates for Data Warehouses [SIGMOD’11]
– PCM-savvy database algorithms [CIDR’11]
Cloud Computing / Specialization
– Scheduling for multi-tenancy [EuroSys’12]
– Cloud ISTC whitepaper [cc.cmu.edu/publications/papers/2011/ISTC-Cloud-Whitepaper.pdf]
See www.pittsburgh.intel-research.net/people/gibbons/gibbons-cv.pdf for authors & paper titles of each reference