The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors Austin T. Clements, M. Frans Kaashoek, Nickolai Zeldovich, Robert.

Slides:

Advertisements

Similar presentations

Northwestern University 2007 Winter – EECS 443 Advanced Operating Systems Exokernel: An Operating System Architecture for Application-Level Resource Management.

Advertisements

Improving Integer Security for Systems with KINT Xi Wang, Haogang Chen, Zhihao Jia, Nickolai Zeldovich, Frans Kaashoek MIT CSAIL Tsinghua IIIS.

Technology Drivers Traditional HPC application drivers – OS noise, resource monitoring and management, memory footprint – Complexity of resources to be.

Raphael Eidenbenz Roger Wattenhofer Roger Wattenhofer Good Programming in Transactional Memory Game Theory Meets Multicore Architecture.

Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.

Phase Reconciliation for Contended In-Memory Transactions Neha Narula, Cody Cutler, Eddie Kohler, Robert Morris MIT CSAIL and Harvard 1.

Effects of Virtual Cache Aliasing on the Performance of the NetBSD Operating System Rafal Boni CS 535 Project Presentation.

Accurately Approximating Superscalar Processor Performance from Traces Kiyeon Lee, Shayne Evans, and Sangyeun Cho Dept. of Computer Science University.

Performance Evaluation of Open Virtual Routers M.Siraj Rathore

Steven Pelley, Peter M. Chen, Thomas F. Wenisch University of Michigan

Parallelized variational EM for Latent Dirichlet Allocation: An experimental evaluation of speed and scalability Ramesh Nallapati, William Cohen and John.

Components for high performance grid programming in the GRID.it project 1 Workshop on Component Models and Systems for Grid Applications - St.Malo 26 june.

Reliability Week 11 - Lecture 2. What do we mean by reliability? Correctness – system/application does what it has to do correctly. Availability – Be.

CS533 Concepts of Operating Systems Class 14 Virtualization.

Phase Reconciliation for Contended In-Memory Transactions Neha Narula, Cody Cutler, Eddie Kohler, Robert Morris MIT CSAIL and Harvard.

6/10/2005 FastOS PI Meeting/Workshop K42 Internals Dilma da Silva for K42 group IBM TJ Watson Research.

Leveling the Field for Multicore Open Systems Architectures Markus Levy President, EEMBC President, Multicore Association.

Optimizing RAM-latency Dominated Applications

To run the program: To run the program: You need the OS: You need the OS:

IETF 90: VNF PERFORMANCE BENCHMARKING METHODOLOGY Contributors: Sarah Muhammad Durrani: Mike Chen:

Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.

The Impact of Performance Asymmetry in Multicore Architectures Saisanthosh Ravi Michael Konrad Balakrishnan Rajwar Upton Lai UW-Madison and, Intel Corp.

Wind River VxWorks Presentation

KUAS.EE Parallel Computing at a Glance. KUAS.EE History Parallel Computing.

Computer System Architectures Computer System Software

LiNK: An Operating System Architecture for Network Processors Steve Muir, Jonathan Smith Princeton University, University of Pennsylvania

Atlanta, Georgia TiNy Threads on BlueGene/P: Exploring Many-Core Parallelisms Beyond The Traditional OS Handong Ye, Robert Pavel, Aaron Landwehr, Guang.

A Unified, Low-overhead Framework to Support Continuous Profiling and Optimization Xubin (Ben) He Storage Technology & Architecture Research(STAR)

Silo: Speedy Transactions in Multicore In-Memory Databases

Implementation of Parallel Processing Techniques on Graphical Processing Units Brad Baker, Wayne Haney, Dr. Charles Choi.

Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 3: Operating Systems Computer Science: An Overview Tenth Edition.

1 LabVIEW DSP Test Integration Toolkit. 2 Agenda LabVIEW Fundamentals Integrating LabVIEW and Code Composer Studio TM (CCS) Example Use Case Additional.

Pursuing Faster I/O in COSMO POMPA Workshop May 3rd 2010.

Exploiting Data Parallelism in SELinux Using a Multicore Processor Bodhisatta Barman Roy National University of Singapore, Singapore Arun Kalyanasundaram,

Uncovering the Multicore Processor Bottlenecks Server Design Summit Shay Gal-On Director of Technology, EEMBC.

HARDWARE SUPPORT FOR ENFORCING INFORMATION FLOW CONTROL ON MANYCORE SYSTEMS Sarah Bird David McGrogan.

张俊 BTLab Embedded Virtualization Group Outline  Introduction  Performance Analysis  PerformanceTuning Methods.

Hadi Salimi Distributed Systems Lab, School of Computer Engineering, Iran University of Science and Technology, Fall 2010 Performance.

Outline Basic VM Concepts Formal Definitions Virtualization Theorems

April 26, CSE8380 Parallel and Distributed Processing Presentation Hong Yue Department of Computer Science & Engineering Southern Methodist University.

Comparison of Distributed Operating Systems. Systems Discussed ◦Plan 9 ◦AgentOS ◦Clouds ◦E1 ◦MOSIX.

PARALLEL COMPUTING overview What is Parallel Computing? Traditionally, software has been written for serial computation: To be run on a single computer.

Scalable Multi-core Sonar Beamforming with Computational Process Networks Motivation Sonar beamforming requires significant computation and input/output.

Linux Kernel Management. Module 9 – Kernel Administration ♦ Overview The innermost layer of Linux operating system is the kernel, which is a thin layer.

On the Performance of Window-Based Contention Managers for Transactional Memory Gokarna Sharma and Costas Busch Louisiana State University.

Authors – Jeahyuk huh, Doug Burger, and Stephen W.Keckler Presenter – Sushma Myneni Exploring the Design Space of Future CMPs.

Improving Xen Security through Disaggregation Derek MurrayGrzegorz MilosSteven Hand.

Full and Para Virtualization

By Chi-Chang Chen.  Cluster computing is a technique of linking two or more computers into a network (usually through a local area network) in order.

Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture.

Object-Oriented Software Engineering Practical Software Development using UML and Java Chapter 1: Software and Software Engineering.

MSC.Linux Dr. Stefan Mayer European Technical Sales Manager, Linux Division MSC.Software GmbH, Munich.

Presented by: Xianghan Pei

Background Computer System Architectures Computer System Software.

CS203 – Advanced Computer Architecture Performance Evaluation.

Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,

Linux Optimization Kit Many developers need to get a performance increase from their Linux OS Linux OK allows users to achieve higher performance.

4- Performance Analysis of Parallel Programs

Scaling a file system to many cores using an operation log

Many-core Software Development Platforms

Presented by Remzi Can Aksoy

Presenter (Group 40) Asheen Richards / 鄧芷喬 S

Department of Computer Science University of California, Santa Barbara

AN ANALYSIS OF LINUX SCALABILITY TO MANY CORES

Architecture & System Performance

Operating Systems (COL 331)

Lecture 2 The Art of Concurrency

Department of Computer Science University of California, Santa Barbara

A. T. Clements, M. F. Kaashoek, N. Zeldovich, R. T. Morris, and E

Presentation transcript:

The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors Austin T. Clements, M. Frans Kaashoek, Nickolai Zeldovich, Robert T. Morris, and Eddie Kohler MIT CSAIL and Harvard University EECS 582 – W1611/27/16

Outline Introduce the problem Contributions Define the rule Commuter Evaluation Conclusion EECS 582 – W162

What is the usual way to evaluate the scalability of multicore software? Workload Plot Scalability Fix bottleneck Differential profile Any drawbacks?

Drawbacks New workloads expose new bottlenecks More cores expose new bottlenecks The real bottlenecks may be in the interface design

Question: Can scalability opportunities be identified even before any implementation exists, simply by considering interface specifications? Yes!

The Scalable Commutativity Rule Whenever interface operations commute, they can be implemented in a way that scales.

Outline Introduce the problem Contributions Define the rule Commuter Evaluation Conclusion EECS 582 – W167

Contributions The scalable commutativity rule Formalization of the rule and proof of its correctness State-dependent, interface-based commutativity Commuter: An automated scalability testing tool Sv6: A scalable POSIX-like kernel

SIM Commutativity The formalism of the rule relies on SIM commutativity Consider a history H = X || Y (where || concatenates action sequences) * Y SI-commutes in H when given any reordering Y’ of Y, and any action sequence Z, X || Y || Z is a well-formed histories iff X || Y’ || Z is also well formed * Y SIM-commutes in H when for any prefix P of some reordering of Y, P SI-commutes in X || P

Outline Introduce the problem Contributions Define the rule Commuter Evaluation Conclusion EECS 582 – W1610

What scales on today’s multicore? We say two or more operations are scalable if they are conflict-free

The intuition behind the rule Whenever interface operations commute, they can be implemented in a way that scales. Operations commute => results independent of order => communication is unnecessary => without communication, no conflicts

Outline Introduce the problem Contributions Define the rule Commuter Evaluation Conclusion EECS 582 – W1613

Commuter Interface specification (e.g., POSIX) All scalability bottlenecks Implementation (e.g., Linux)

Components of Commuter

Outline Introduce the problem Contributions Define the rule Commuter Evaluation Conclusion EECS 582 – W1616

Commuter finds non-scalable cases in Linux

Sv6: A scalable OS POSIX-like operating system File system and virtual memory system follow commutativity rule Implementing using standard parallel programming techniques, but guided by Commuter

Commutative operations can be made to scale

Benchmark throughput in operations per second per core with varying core counts on sv6

Outline Introduce the problem Contributions Define the rule Commuter Evaluation Conclusion EECS 582 – W1621

Conclusion Whenever interface operations commute, they can be implemented in a way that scales. This rule helps developers in building more scalable software starting from interface design and carrying on through implementation, testing, and evaluation.

Q&A