Optimizing dynamic dispatch with fine-grained state tracking Salikh Zakirov, Shigeru Chiba and Etsuya Shibayama Tokyo Institute of Technology Dept. of.

Presentation transcript:

Optimizing dynamic dispatch with fine-grained state tracking Salikh Zakirov, Shigeru Chiba and Etsuya Shibayama Tokyo Institute of Technology Dept. of Mathematical and Computing Sciences

Mixin
Code composition technique
[Diagram: "Mixin use declaration" and "Mixin semantics", each showing Server, BaseServer and Additional Security]

Dynamic mixin
Temporary change in class hierarchy
Available in Ruby, Python, JavaScript
[Diagram: Server and BaseServer, shown with and without Additional Security]

Dynamic mixin (2)
Powerful technique of dynamic languages
Enables
  ▫ dynamic patching
  ▫ dynamic monitoring
Can be used to implement
  ▫ Aspect-oriented programming
  ▫ Context-oriented programming
Widely used in Ruby, Python
  ▫ e.g. Object-Relational Mapping

Dynamic mixin in Ruby
Ruby has dynamic mixin
  ▫ but only "install", no "remove" operation
"remove" can be implemented easily
  ▫ 23 lines

Target application
Mixin is installed and removed frequently
Application server with dynamic features

    class BaseServer
      def process() … end
    end

    class Server < BaseServer
      def process()
        if request.isSensitive()
          Server.class_eval { include AdditionalSecurity }
        end
        super   # delegate to superclass
        …       # remove mixin
      end
    end

    module AdditionalSecurity
      def process()
        …       # security check
        super   # delegate to superclass
      end
    end
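
The install half of the example above can be tried in stock Ruby. What follows is a minimal sketch, not the authors' benchmark code: the string return values are assumptions added so the effect is observable, and the removal step is left out because standard Ruby has no "remove" operation.

```ruby
class BaseServer
  def process
    "base"
  end
end

module AdditionalSecurity
  def process
    # Security check would go here, then delegate to the next class in the chain.
    "checked " + super
  end
end

class Server < BaseServer
  def process
    super # delegate to superclass (or to the mixin, once installed)
  end
end

server = Server.new
before = server.process

# Install the mixin at run time, as in the slide.
Server.class_eval { include AdditionalSecurity }

after = server.process
chain = Server.ancestors.take(3)
```

After the install, the module sits between Server and BaseServer in the ancestor chain, so the same call site reaches a different target.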

Overhead is high
Reasons
Invalidation granularity
  ▫ clearing whole method cache
  ▫ invalidating all inline caches
    ▪ next calls require full method lookup
Inline caching saves just 1 target
  ▫ which changes with mixin operations
  ▫ even though mixin operations are mostly repeated

Our research problem
Improve performance of applications which frequently use dynamic mixin
  ▫ Make invalidation granularity smaller
  ▫ Make dynamic dispatch target cacheable in presence of dynamic mixin operations

Proposal
Reduce granularity of inline cache invalidation
  ▫ Fine-grained state tracking
Cache multiple dispatch targets
  ▫ Polymorphic inline caching
Enable cache reuse on repeated mixin installation and removal
  ▫ Alternate caching

Basics: Inline caching
Consider a call site cat.speak() (executable code)

Dynamic dispatch implementation:

    method = lookup(cat, "speak")
    method(cat)

Expensive! But the result is mostly the same

Inline caching:

    if (cat has type ic.class) {
        ic.method(cat)
    } else {
        ic.method = lookup(cat, "speak")
        ic.class = cat.class
        ic.method(cat)
    }

[Diagram: cat instance of class Cat, a subclass of Animal; method implementation speak() { … }; inline cache ic with fields class = Cat and method = speak]
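
The inline caching pseudocode above can be modelled in plain Ruby. This is a toy sketch under assumptions: InlineCache, full_lookup and the LOOKUPS counter are hypothetical names for illustration, and a real interpreter keeps the cache inside the call-site instruction rather than in an object.

```ruby
# Counts how many full lookups each method name needed.
LOOKUPS = Hash.new(0)

# Stand-in for the expensive lookup along the ancestor chain.
def full_lookup(obj, name)
  LOOKUPS[name] += 1
  obj.class.instance_method(name)
end

# One cache per call site: remembers the last class and its target method.
class InlineCache
  def initialize(name)
    @name = name
    @klass = nil
    @method = nil
  end

  def call(obj)
    unless obj.class.equal?(@klass)
      # Miss: do the full lookup and remember the result.
      @method = full_lookup(obj, @name)
      @klass = obj.class
    end
    @method.bind(obj).call
  end
end

class Animal
  def speak
    "..."
  end
end

class Cat < Animal
  def speak
    "meow"
  end
end

ic = InlineCache.new(:speak)
results = 3.times.map { ic.call(Cat.new) }
```

Only the first call pays for the lookup; the next two hit the cache because the receiver class is unchanged.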

Inline caching: problem
What if the method has been overridden?

Inline caching:

    if (cat has type ic.class) {
        ic.method(cat)
    } else {
        ic.method = lookup(cat, "speak")
        ic.class = cat.class
        ic.method(cat)
    }

[Diagram: cat instance, Cat, Animal and Training; speak() { … }; inline cache ic with fields class = Cat and method = speak]

Inline caching: invalidation

    if (cat has type ic.class && state == ic.state) {
        ic.method(cat)
    } else {
        ic.method = lookup(cat, "speak")
        ic.class = cat.class; ic.state = state
        ic.method(cat)
    }

Single global state object
  ▫ too coarse invalidation granularity

[Diagram: cat instance, Cat, Animal and Training; speak() { … }; global state changing from 1 to 2; inline cache ic with fields class = Cat, method = speak and state]
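
To see why a single global state is too coarse, the guard above can be sketched in Ruby. The names VM, invalidate_all! and GuardedInlineCache are hypothetical, and bumping the counter by hand after the include is an assumption standing in for what the interpreter would do itself.

```ruby
module VM
  @state = 1
  @lookups = 0

  class << self
    attr_reader :state, :lookups

    # Called on any method redefinition or mixin operation.
    def invalidate_all!
      @state += 1
    end

    def lookup(obj, name)
      @lookups += 1
      obj.class.instance_method(name)
    end
  end
end

class GuardedInlineCache
  def initialize(name)
    @name = name
    @klass = @method = @state = nil
  end

  def call(obj)
    unless obj.class.equal?(@klass) && @state == VM.state
      @method = VM.lookup(obj, @name)
      @klass = obj.class
      @state = VM.state
    end
    @method.bind(obj).call
  end
end

class Animal
  def speak
    "generic"
  end
end

class Cat < Animal; end

class Dog < Animal
  def bark
    "woof"
  end
end

module Training
  def speak
    "trained"
  end
end

speak_site = GuardedInlineCache.new(:speak)
bark_site = GuardedInlineCache.new(:bark)
cat = Cat.new
dog = Dog.new

first = speak_site.call(cat)
bark_site.call(dog)

Cat.class_eval { include Training } # overrides speak for Cat only
VM.invalidate_all!

second = speak_site.call(cat)
bark_site.call(dog) # unrelated call site, yet it must look up again
```

The mixin touches only Cat#speak, but the Dog#bark call site is invalidated as well, which is the granularity problem the slide points at.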

Fine-grained state tracking
Many state objects
  ▫ small invalidation extent
  ▫ share as much as possible
One state object for each family of methods called from the same call site
State objects associated with lookup path
  ▫ links updated during method lookups
Invariant
  ▫ Any change that may affect method dispatch must also trigger change of associated state object

State object allocation

Inline caching code:

    if (cat has type ic.class && ic.pstate.state == ic.state) {
        ic.method(cat)
    } else {
        ic.method, ic.pstate = lookup(cat, "speak", ic.pstate)
        ic.class = cat.class; ic.state = ic.pstate.state
        ic.method(cat)
    }

[Diagram: call site cat.speak(); Cat ("No implementation here") and Animal with speak() { *1* }; state object with state 1 linked to the speak method entry; inline cache ic with fields method = speak *1*, class = Cat, pstate and state = 1]
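
The per-family state check can be sketched in Ruby as follows. This is a simplification under stated assumptions: state objects are keyed by method name in a table, whereas the slides attach them to method entries along the lookup path, and FineVM, install and FineInlineCache are hypothetical names.

```ruby
class StateObject
  attr_reader :state

  def initialize
    @state = 1
  end

  def bump!
    @state += 1
  end
end

module FineVM
  # Assumption: one state object per method name.
  @states = Hash.new { |table, name| table[name] = StateObject.new }
  @lookups = 0

  class << self
    attr_reader :lookups

    # Returns the target method together with its state object.
    def lookup(obj, name)
      @lookups += 1
      [obj.class.instance_method(name), @states[name]]
    end

    # Install a mixin and bump only the state objects of the methods it defines.
    def install(klass, mixin)
      klass.class_eval { include mixin }
      mixin.instance_methods(false).each { |name| @states[name].bump! }
    end
  end
end

class FineInlineCache
  def initialize(name)
    @name = name
    @klass = @method = @pstate = @state = nil
  end

  def call(obj)
    unless obj.class.equal?(@klass) && @pstate.state == @state
      @method, @pstate = FineVM.lookup(obj, @name)
      @klass = obj.class
      @state = @pstate.state
    end
    @method.bind(obj).call
  end
end

class Animal
  def speak
    "generic"
  end
end

class Cat < Animal; end

class Dog < Animal
  def bark
    "woof"
  end
end

module Training
  def speak
    "trained"
  end
end

speak_site = FineInlineCache.new(:speak)
bark_site = FineInlineCache.new(:bark)
cat = Cat.new
dog = Dog.new

first = speak_site.call(cat)
bark_site.call(dog)

FineVM.install(Cat, Training)

second = speak_site.call(cat) # state of the speak family changed: miss
bark_site.call(dog)           # bark family untouched: hit
```

Compared with a single global counter, only the speak call site repeats its lookup after the mixin is installed.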

Mixin installation

Inline caching code:

    if (cat has type ic.class && ic.pstate.state == ic.state) {
        ic.method(cat)
    } else {
        ic.method, ic.pstate = lookup(cat, "speak", ic.pstate)
        ic.class = cat.class; ic.state = ic.pstate.state
        ic.method(cat)
    }

[Diagram: call site cat.speak(); Cat, Training with speak() { *2* } and Animal with speak() { *1* }; state object value changing from 1 to 2; inline cache ic with fields method (speak *1*, then speak *2*), class = Cat, pstate and state]

Mixin removal

Inline caching code:

    if (cat has type ic.class && ic.pstate.state == ic.state) {
        ic.method(cat)
    } else {
        ic.method, ic.pstate = lookup(cat, "speak", ic.pstate)
        ic.class = cat.class; ic.state = ic.pstate.state
        ic.method(cat)
    }

[Diagram: call site cat.speak(); Cat, Training with speak() { *2* } and Animal with speak() { *1* }; state object value changing from 2 to 3; inline cache ic with fields method (speak *2*, then speak *1*), class = Cat, pstate and state]

Alternate caching
Detect repetition
Conflicts detected by state check
Inline cache contents oscillate

[Diagram: call site cat.speak(); Cat, Training with speak() { *2* } and Animal with speak() { *1* }; state values 3 and 4; alternate cache with entries for super (Animal, Training) and speak; inline cache ic with fields method (speak *1* / speak *2*), class = Cat, pstate and state]

Polymorphic caching
Use multiple entries in inline cache

[Diagram: call site cat.speak(); Cat, Training with speak() { *2* } and Animal with speak() { *1* }; alternate cache with entries for super (Animal, Training) and speak; inline cache ic with two entries, *1* with state 3 and *2* with state 4, plus class = Cat and pstate]
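
The interplay of alternate caching and polymorphic caching can be sketched on a toy object model. This is one reading of the slides rather than the authors' implementation: ToyClass, ToyVM and set_parent are hypothetical, the whole VM has a single state object (one method family), and reusing a state value assumes no method was redefined in between.

```ruby
# Toy object model: a class is a method table plus a parent link.
class ToyClass
  attr_reader :name, :table
  attr_accessor :parent

  def initialize(name, table = {}, parent = nil)
    @name = name
    @table = table
    @parent = parent
  end
end

class StateObject
  attr_accessor :state

  def initialize
    @state = 1
  end
end

class ToyVM
  attr_reader :lookups, :pstate

  def initialize
    @lookups = 0
    @pstate = StateObject.new
    @next_state = 2
    @alternate = {} # [class, parent] => state value seen with that link
  end

  # Full lookup: walk parent links until a class defines the method.
  def lookup(klass, name)
    @lookups += 1
    klass = klass.parent until klass.nil? || klass.table.key?(name)
    klass && klass.table[name]
  end

  # Install or remove a mixin by changing the parent link.
  def set_parent(klass, new_parent)
    @alternate[[klass, klass.parent]] = @pstate.state
    klass.parent = new_parent
    # Repetition detected: restore the state value used with this link before.
    @pstate.state = @alternate.fetch([klass, new_parent]) do
      value = @next_state
      @next_state += 1
      value
    end
  end
end

# Call-site cache with one entry per (class, state value) pair.
class PolymorphicInlineCache
  def initialize(vm, name)
    @vm = vm
    @name = name
    @entries = {}
  end

  def call(klass)
    key = [klass, @vm.pstate.state]
    target = @entries[key] ||= @vm.lookup(klass, @name)
    target.call
  end
end

animal = ToyClass.new("Animal", { speak: -> { "plain" } })
training = ToyClass.new("Training", { speak: -> { "trained" } }, animal)
cat = ToyClass.new("Cat", {}, animal)

vm = ToyVM.new
site = PolymorphicInlineCache.new(vm, :speak)
outputs = []
3.times do
  outputs << site.call(cat)
  vm.set_parent(cat, training) # install mixin
  outputs << site.call(cat)
  vm.set_parent(cat, animal)   # remove mixin
end
```

Because the state value oscillates between two known values, the call site needs one full lookup per configuration and then hits on every later install and removal.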

State object merge
Overridden by
One-time invalidation

    while(true) { remove mixin }

[Diagram: executable code with call sites cat.speak() and animal.speak(); cat instance and animal instance; Cat, Training with speak() { *2* } and Animal with speak() { *1* }; state objects labelled S and Q]

Overheads of proposed scheme
Increased memory use
  ▫ 1 state object per polymorphic method family
  ▫ additional method entries
  ▫ alternate cache
  ▫ polymorphic inline cache entries
Some operations become slower
  ▫ Lookup needs to track and update state objects
  ▫ Explicit state object checks on method dispatch

Generalizations (beyond Ruby)
Delegation object model
  ▫ track arbitrary delegation pointer change
Thread-local delegation
  ▫ allow for thread-local modification of delegation pointer
  ▫ by having thread-local state object values
Details in the article…

Evaluation
Implementation based on Ruby
Hardware
  ▫ Intel Core i GHz

Evaluation: microbenchmarks
Single method call overhead
  ▫ Inline cache hit
    ▪ state checks 1%
    ▪ polymorphic inline caching 49% overhead
  ▫ Full lookup
    ▪ 2x slowdown

Dynamic mixin-heavy microbenchmark
[Chart: smaller is better]

Evaluation: application
Application server with dynamic mixin on each request
[Chart: smaller is better]

Evaluation
Fine-grained state tracking considerably reduces overhead
Alternate caching brings only small improvement
  ▫ Number of call sites affected by mixin is low
  ▫ Lookup cost / inline cache hit cost is low
    ▪ about 1.6x on Ruby

Related work
Dependency tracking in Self
  ▫ focused on reducing recompilation, rather than reducing method lookups
Inline caching for Objective-C
  ▫ state object associated with method, no dynamic mixin support

Conclusion
We proposed a combination of techniques
  ▫ Fine-grained state tracking
  ▫ Alternate caching
  ▫ Polymorphic inline caching
To increase efficiency of inline caching
  ▫ with frequent dynamic mixin installation and removal

Thank you for your attention

Method caching in Ruby
Global hashtable
  ▫ indexed by method name and class
On method lookup
  ▫ gives answer in 1 hash lookup
On miss
  ▫ answer obtained by recursive lookup
  ▫ result stored in method cache
On method redefinition or mixin operation
  ▫ method cache cleared completely
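
The behaviour described on this slide can be modelled with an ordinary Hash. This is an illustrative sketch, not the interpreter's actual data structure: GlobalMethodCache, slow_lookup and clear! are hypothetical names, and the slow path walks Module#ancestors in place of the interpreter's recursive lookup.

```ruby
class GlobalMethodCache
  attr_reader :misses

  def initialize
    @table = {} # [class, method name] => method
    @misses = 0
  end

  # One hash lookup on a hit; the slow path runs only on a miss.
  def lookup(klass, name)
    key = [klass, name]
    return @table[key] if @table.key?(key)

    @misses += 1
    @table[key] = slow_lookup(klass, name)
  end

  # On method redefinition or mixin operation the whole cache is dropped.
  def clear!
    @table.clear
  end

  private

  # Find the first class or module in the chain that defines the method.
  def slow_lookup(klass, name)
    owner = klass.ancestors.find { |k| k.instance_methods(false).include?(name) }
    owner && owner.instance_method(name)
  end
end

class Animal
  def speak
    "generic"
  end
end

class Cat < Animal; end

module Training
  def speak
    "trained"
  end
end

cache = GlobalMethodCache.new
m1 = cache.lookup(Cat, :speak) # miss
cache.lookup(Cat, :speak)      # hit
misses_before = cache.misses

Cat.class_eval { include Training }
cache.clear!                   # every cached entry is lost, related or not

m2 = cache.lookup(Cat, :speak) # miss again, now resolves to Training
```

Clearing the table keeps results correct after the include, at the cost of a miss for every entry on its next use.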