Thread-Specific Storage (TSS)
Chris Gill and Venkita Subramonian
E81 CSE 532S: Advanced Multi-Paradigm Software Development


Thread Local Storage in C++11
A variable can be declared thread_local as of C++11
–Its lifetime is the lifetime of the thread
Useful for data that are logically global to the thread
–Good for avoiding passing references to it up and down the call stack
–E.g., if the data are made extern, or static, or put in a namespace, etc.
Good fences make good neighbors
–Not visible to other threads (unless a pointer/reference is given away)
What if there are many different kinds of thread-specific data?
–If all threads use instances of all the same types all the time, they can be put in a struct and an instance of the struct made thread_local, as in the sketch below
–Otherwise, the Thread-Specific Storage (TSS) pattern can help
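As a minimal sketch (PerThreadData and worker are hypothetical names, not from the slides), a thread_local struct gives each thread its own copy of several logically global values without any locking:

```cpp
#include <iostream>
#include <string>
#include <thread>

struct PerThreadData {              // related thread-specific values gathered in one struct
    int error_code = 0;
    std::string last_message;
};

thread_local PerThreadData tls;     // one instance per thread; lives as long as that thread

void worker(int id) {
    tls.error_code = id;            // touches only the calling thread's copy, no locking
    tls.last_message = "set by worker";
}

int main() {
    std::thread t1(worker, 1);
    std::thread t2(worker, 2);
    t1.join();
    t2.join();
    // main's copy was never touched by the workers
    std::cout << "main's error_code is still " << tls.error_code << '\n';
    return 0;
}
```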

A More Complete and General Solution: Thread-Specific Storage (TSS) Pattern
Logically thread-global access point
–Maps an index to an object
–Index is a 2-tuple, e.g., an STL pair of <thread id, key>
Avoids lock overhead
–Separate copy per thread
Logically an m x n table
–Sparse/dense, small/large
–Implement accordingly
[Figure: a TSS table with rows key1, key2, key3 and columns tid1..tid4, whose cells point to different kinds of thread-specific objects such as errno values and connections]
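A minimal C++ sketch of such an access point follows. TssMap is an illustrative name only, and the single mutex guarding the map structure is a simplification to keep the sketch short; the point of the pattern is that the per-thread objects it hands out need no locking, since each is used by only one thread.

```cpp
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Logically thread-global access point: the index is the 2-tuple <thread id, key>.
template <typename T>
class TssMap {
public:
    // Returns the calling thread's object for 'key', default-constructing it on first use.
    T& get(int key) {
        std::pair<std::thread::id, int> index{std::this_thread::get_id(), key};
        std::lock_guard<std::mutex> guard(lock_);  // guards only the map structure
        return table_[index];                      // std::map references stay valid after later insertions
    }
private:
    std::mutex lock_;
    std::map<std::pair<std::thread::id, int>, T> table_;
};

// Usage: a logically global table of per-thread error codes.
// TssMap<int> errno_table;
// int& my_errno = errno_table.get(0);   // row 'key 0', column 'this thread'
```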

Alternative Table Implementations
A 2-D array is good for many use cases
–Small numbers of threads and keys
–And/or densely populated
–May avoid data races
A hash map, skip list, etc. may be better for others
–Large row/column sizes
–Sparsely populated
–But adds some overhead
–Data races may occur
[Figure: a hash map holding only the populated entries, e.g., <key1, tid2>, <key1, tid4>, <key3, tid1>, <key3, tid3>, <key3, tid4>]
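A rough sketch of the dense 2-D array variant is shown below; the sizes, the Slot type, and the idea of handing each thread a small integer column index are all assumptions for illustration. When thread and key counts are small and fixed, lookup is a plain array access and the table never changes shape, which is how data races on the table structure itself can be avoided.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kMaxKeys    = 4;   // assumed small, fixed bounds
constexpr std::size_t kMaxThreads = 8;

struct Slot { void* object = nullptr; };

// Row per key, column per (small integer) thread index assigned at thread start.
std::array<std::array<Slot, kMaxThreads>, kMaxKeys> tss_table;

void* tss_get(std::size_t key, std::size_t thread_index) {
    return tss_table[key][thread_index].object;     // O(1), no locking
}

void tss_set(std::size_t key, std::size_t thread_index, void* object) {
    tss_table[key][thread_index].object = object;   // each thread writes only its own column
}
```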

TSS and Resource Indexing
Multiple object lookup keys
–Each key in a thread is for a different object
Explicit tid indexing
–Used when a thread needs to cross-reference another thread's TSS
–Watch out for race conditions
–Avoid locking if at all possible
Benefit of thread id indexing
–Threads remain mostly unaware of each other's TSS resources
–As if each were the only thread in the process that uses TSS
–Unless a thread compares a thread id it is given with its own via std::this_thread::get_id()
[Figure: thread-specific objects in the table, distinguished by thread ids in one dimension and by keys in the other]
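A tiny sketch of that comparison (accessing_own_tss is a hypothetical helper, not part of the pattern): when a thread is handed a thread id for explicit tid indexing, checking it against std::this_thread::get_id() tells it whether it is reading its own TSS entries or cross-referencing another thread's, which is where race conditions become a concern.

```cpp
#include <thread>

bool accessing_own_tss(std::thread::id given_id) {
    // Equal ids mean the thread is only looking at its own slots; no races possible there.
    return given_id == std::this_thread::get_id();
}
```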

Distributable Thread (DT) TSS Variant
[Figure: a distributable thread's remote calls and returns span Host 1 and Host 2, each running an RTCORBA 2.0 Scheduler]
Binding of a single DT to different local OS threads
Remote call carries the DT's parameters with it
Key issues
–Identity of the distributable thread abstraction (GUID)
–Mapping and remapping the DT to different local threads
  E.g., when the DT makes a remote call, release the local thread to the reactor
  E.g., when the DT makes a nested call back onto the same host

Distributable Thread (DT) TSS Variant
A distributable thread can use thread-specific storage
–Avoids locking of global data
Context: OS-provided TSS is efficient and uses the OS thread id
Problem: a distributable thread may span OS threads
–Difficult to access prior storage
Solution: TSS emulation, sketched below
–Based on a <GUID, key> pair
–Also a useful idea on platforms that don't provide native TSS
Key question to answer
–What is the cost of TSS emulation compared to the OS-provided version of TSS?
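A hedged sketch of such emulation (Guid and GuidTssMap are hypothetical names; the GUID representation is an assumption): the index is a <GUID, key> pair rather than an OS thread id, so the same storage stays reachable after the DT is remapped to a different local OS thread, or on platforms with no native TSS at all.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <utility>

using Guid = std::string;   // assumed representation of a DT's global identity

template <typename T>
class GuidTssMap {
public:
    // Returns the DT's object for 'key'; the <GUID, key> index survives rebinding
    // of the DT to different local OS threads.
    T& get(const Guid& dt_guid, int key) {
        std::lock_guard<std::mutex> guard(lock_);   // guards the map structure only
        return table_[{dt_guid, key}];
    }
private:
    std::mutex lock_;
    std::map<std::pair<Guid, int>, T> table_;
};
```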

TSS Emulation Costs (Mgeta, RTAS04)
Pentium tick timestamps
–Nanosecond resolution on a 2.8 GHz P4 with 512KB cache and 512MB memory
–RedHat 7.3, real-time scheduling class
–Called create repeatedly
–Then called write/read repeatedly on one key
Upper graph shows scalability of key creation
–Cost scales linearly with the number of keys in OS and ACE TSS
–Emulation costs ~2 usec more per key creation
Lower graph shows that emulated writes cost ~1.5 usec more and reads ~0.5 usec more
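A rough analogue of that measurement loop, shown with std::chrono rather than the study's raw Pentium tick timestamps (the iteration count and key_create stand-in are placeholders, not the original benchmark code):

```cpp
#include <chrono>
#include <cstdio>

void key_create() { /* stand-in for the TSS or emulation key-creation call being measured */ }

void measure_key_creation(int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        key_create();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
    // Report the mean cost per creation in nanoseconds.
    std::printf("mean cost per key creation: %lld ns\n",
                static_cast<long long>(elapsed.count() / iterations));
}
```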

Conclusions
Benefits of using the Thread-Specific Storage pattern
–Efficiency of access (no locking)
–Reusability (via Wrapper Façade)
–Ease of use (hides complexity)
Liabilities of the pattern
–Potential cluttering of the TSS map
  Objects not used by multiple threads don't belong in the map
  Putting them there wastes space and adds program complexity
–"Yet another" factor obscuring system structure/behavior
  E.g., have to understand the map during multi-threaded debugging
–Language-specific implementation options
  May reduce portability
  E.g., templates and operator overloading