Timothy Zhu and Huapeng Zhou

Critical Section Characterization and Acceleration in Real World Applications

Motivation
The performance of multithreaded applications is limited by critical sections.
We provide an analytical model to analyze the impact of critical sections on performance.

Model
Each of the N threads runs a loop: it executes a non-critical section, then waits for and executes critical sections.
N = number of threads (e.g. 4)
R = time per iteration spent waiting on and executing critical sections
Z = time per iteration spent executing the non-critical section
X = throughput = number of iterations around the loop per second
D = sum of the durations spent executing critical sections per iteration
Dmax = maximum duration of executing a critical section
Theoretical bounds:
$$E[R] \ge \max\left(D,\; N \cdot D_{\max} - E[Z]\right)$$
$$X \le \min\left(\frac{N}{D + E[Z]},\; \frac{1}{D_{\max}}\right)$$
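These bounds have the form of the standard asymptotic bounds for a closed system with think time: Little's law gives N = X(E[R] + E[Z]), and the bottleneck critical section caps X at 1/Dmax. The minimal Python sketch below just evaluates them; asymptotic_bounds and saturation_point are hypothetical helper names, and the sample numbers are made up for illustration.

```python
def asymptotic_bounds(n, d, d_max, mean_z):
    """Return (lower bound on E[R], upper bound on X) for n looping threads.

    d      -- D, total critical-section time per iteration
    d_max  -- Dmax, time in the bottleneck critical section per iteration
    mean_z -- E[Z], mean non-critical time per iteration
    """
    # Little's law for the closed loop: n = X * (E[R] + E[Z]).
    # The bottleneck lock is held d_max per iteration, so X <= 1 / d_max,
    # which forces E[R] >= n * d_max - E[Z]; E[R] is also never below d.
    r_lower = max(d, n * d_max - mean_z)
    # With no waiting at all, an iteration takes d + E[Z], so X <= n / (d + E[Z]).
    x_upper = min(n / (d + mean_z), 1 / d_max)
    return r_lower, x_upper


def saturation_point(d, d_max, mean_z):
    """Thread count N* = (D + E[Z]) / Dmax at which both bounds switch terms."""
    return (d + mean_z) / d_max


# Hypothetical numbers: D = 2, Dmax = 1.5, E[Z] = 6 (any consistent time unit).
print(asymptotic_bounds(4, 2, 1.5, 6))  # below N*: D and N/(D+E[Z]) bind
print(asymptotic_bounds(8, 2, 1.5, 6))  # above N*: the bottleneck terms bind
print(saturation_point(2, 1.5, 6))
```

Both bounds change regime at the same N*. For example, the memcached N = 7 measurements (D = 1792, E[Z] = 7462, Dmax = 1439) give N* of about 6.4, so at N = 7 the bottleneck terms already dominate.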

Methodology
We implemented a hooking library around pthreads that interposes common mutex and condition variable calls.
We also experiment with an alternative spinlock implementation to lower latency.
Raw measurements: mutex address, thread id, return address, start time, lock time, unlock time.
Benchmarks: memcached; MySQL (oltp-simple, oltp-complex, oltp-nontrx).
Platform: dual-socket machine with two 6-core Xeon processors (with Hyper-Threading, the OS sees 24 cores) and 48 GB of RAM.
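The slides do not show the hooking library itself, which wraps pthread calls. As a language-level analogue only, the Python sketch below wraps a threading.Lock and records the same kind of raw fields for every critical section. LoggedLock, CSRecord, and run_demo are hypothetical names; id() of the wrapper stands in for the mutex address, and a call site obtained through CPython's sys._getframe stands in for the return address.

```python
import sys
import threading
import time
from dataclasses import dataclass


@dataclass
class CSRecord:
    """One critical-section execution, mirroring the slide's raw fields."""
    lock_id: int    # stands in for the mutex address
    thread_id: int
    call_site: str  # stands in for the return address
    start: int      # ns: lock requested
    locked: int     # ns: lock granted
    unlocked: int   # ns: lock about to be released


class LoggedLock:
    """Hypothetical lock wrapper that times every acquire/release pair."""

    def __init__(self, log):
        self._lock = threading.Lock()
        self._log = log                  # shared list of CSRecord
        self._local = threading.local()  # per-thread pending timestamps

    def __enter__(self):
        caller = sys._getframe(1)        # CPython-specific: frame running `with`
        start = time.perf_counter_ns()
        self._lock.acquire()
        locked = time.perf_counter_ns()
        site = f"{caller.f_code.co_name}:{caller.f_lineno}"
        self._local.pending = (site, start, locked)
        return self

    def __exit__(self, exc_type, exc, tb):
        site, start, locked = self._local.pending
        unlocked = time.perf_counter_ns()
        self._lock.release()
        self._log.append(CSRecord(id(self), threading.get_ident(), site,
                                  start, locked, unlocked))
        return False  # never swallow exceptions


def worker(lock, counter, iters):
    for _ in range(iters):
        with lock:
            counter[0] += 1  # the critical section


def run_demo(n_threads=4, iters=1000):
    log, counter = [], [0]
    lock = LoggedLock(log)
    threads = [threading.Thread(target=worker, args=(lock, counter, iters))
               for _ in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return log, counter[0]


log, total = run_demo()
waits = [r.locked - r.start for r in log]     # waiting part of R
holds = [r.unlocked - r.locked for r in log]  # per-section demand
```

From such records, wait plus hold time per iteration contributes to R, and hold time alone is the demand that sums to D; grouping holds by lock_id points at the lock with the largest demand, i.e. the Dmax candidate.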

Experimental Results I
Memcached has one bottleneck critical section, with demand Dmax.
E[R] increases as N increases, since more threads are waiting.
Overall throughput is also affected by other shared resources.

N    E[R]    E[Z]    X         D      Dmax
2    1020    8739    0.000205  867    767
3    2174    7697    0.000304  1373   1197
5    4485    7117    0.000431  1668   1359
7    7710    7462    0.000461  1792   1439
11   21668   9878    0.000349  2385   1876
13   28092   12118   0.000323  2527   2005
21   61033   19086   0.000262  3311   2570

Experimental Results II
The operating point falls within the theoretical bounds.
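The claim can be checked numerically against the memcached table. The Python sketch below transcribes the table and tests each row against Little's law (X = N / (E[R] + E[Z])) and both bounds; check_row is a hypothetical helper, and the 1% tolerance is an assumption meant to absorb the three-digit rounding of X. The slide does not state the time unit, so X is read as iterations per that unit.

```python
# Memcached measurements transcribed from the Experimental Results I slide.
# Columns: N, E[R], E[Z], X, D, Dmax (time unit not stated on the slide).
ROWS = [
    (2, 1020, 8739, 0.000205, 867, 767),
    (3, 2174, 7697, 0.000304, 1373, 1197),
    (5, 4485, 7117, 0.000431, 1668, 1359),
    (7, 7710, 7462, 0.000461, 1792, 1439),
    (11, 21668, 9878, 0.000349, 2385, 1876),
    (13, 28092, 12118, 0.000323, 2527, 2005),
    (21, 61033, 19086, 0.000262, 3311, 2570),
]


def check_row(n, r, z, x, d, d_max, rel_tol=0.01):
    """Return (little_ok, r_ok, x_ok) for one measured operating point."""
    little_ok = abs(x - n / (r + z)) <= rel_tol * x  # X = N / (E[R] + E[Z])
    r_ok = r >= max(d, n * d_max - z)                # E[R] lower bound
    x_ok = x <= min(n / (d + z), 1 / d_max)          # X upper bound
    return little_ok, r_ok, x_ok


results = {row[0]: check_row(*row) for row in ROWS}
for n, flags in results.items():
    print(n, flags)
```

Every row satisfies Little's law and both bounds. From N = 7 upward, the bottleneck terms (N*Dmax - E[Z] and 1/Dmax) are the binding ones, while for N of 5 or fewer the D and N/(D+E[Z]) terms bind.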

Next Steps
We have a method of identifying a bottleneck critical section.
Develop a data-driven simulator that manipulates logged critical section data and provides insights on what can be improved.
Evaluate architecture ideas: Accelerated Critical Sections (ACS) and scheduling algorithms.
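The slides do not describe the simulator's design. As one possible starting point, here is a minimal trace-driven sketch in Python. It assumes a single FIFO lock (the bottleneck), per-thread traces of (z, s) pairs, i.e. non-critical time and hold time per iteration, which could be derived from the logged start, lock, and unlock times, and a hypothetical cs_scale knob that shrinks hold times to mimic a faster critical section, the effect ACS targets. Time in other critical sections is assumed to be folded into z.

```python
import heapq


def simulate(traces, cs_scale=1.0):
    """Replay per-thread (z, s) traces through one FIFO lock.

    traces   -- list of per-thread lists of (z, s): non-critical time, then
                critical-section hold time, for each loop iteration
    cs_scale -- hypothetical factor applied to every hold time
                (0.5 models a critical section that runs 2x faster)
    Returns throughput: completed iterations per time unit.
    """
    # Heap entries are (request_time, thread_index, iteration_index);
    # popping the earliest request first yields FIFO lock grants.
    heap = [(trace[0][0], t, 0) for t, trace in enumerate(traces) if trace]
    heapq.heapify(heap)
    lock_free_at = 0.0
    finished = 0
    end = 0.0
    while heap:
        request, t, i = heapq.heappop(heap)
        start = max(request, lock_free_at)  # wait while the lock is held
        end = start + traces[t][i][1] * cs_scale
        lock_free_at = end
        finished += 1
        if i + 1 < len(traces[t]):
            # The thread runs its next non-critical section, then asks again.
            heapq.heappush(heap, (end + traces[t][i + 1][0], t, i + 1))
    # lock_free_at never decreases, so the last `end` is the makespan.
    return finished / end if end > 0 else 0.0


light = [[(9, 1)] * 100 for _ in range(2)]  # threads mostly run non-critical code
heavy = [[(1, 1)] * 100 for _ in range(4)]  # the lock is saturated
print(simulate(light), simulate(heavy), simulate(heavy, cs_scale=0.5))
```

In the saturated case, halving hold times moves throughput from about 1.0 to about 2.0 iterations per time unit, tracking the 1/Dmax bound; in the lightly loaded case the same change barely moves X, because N/(D+E[Z]) is the binding term there.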