An Efficient Compilation Framework for Languages Based on a Concurrent Process Calculus. Yoshihiro Oyama, Kenjiro Taura, Akinori Yonezawa. Yonezawa Laboratory, University of Tokyo.


An Efficient Compilation Framework for Languages Based on a Concurrent Process Calculus
Yoshihiro Oyama, Kenjiro Taura, Akinori Yonezawa
Yonezawa Laboratory, Department of Information Science, University of Tokyo

Compiling Programming Languages
• Surface language: many primitives → implementation is difficult, analysis is not general
• Intermediate language: several essential primitives → implementation becomes easy, analysis becomes general
• Both are eventually compiled down to machine code

Intermediate Languages
• Sequential languages
  – Quadruples
  – Lambda calculus
  – Continuation Passing Style (CPS) [Appel 92]
• Concurrent languages: process calculi
  – Pi-calculus [Milner 93]
  – HACL [Kobayashi et al. 94]

Concurrent Process Calculus
• Key components of computation
  – Asynchronous processes
  – Communication channels
• Advantages
  – Clear syntax and clear semantics
  – Many theoretical results for optimization

Goal
• Efficient implementation of a process calculus
• Compilation pipeline: Schematic code → HACL code → ML-like code → machine code
  – Schematic code → HACL code: straightforward translation
  – HACL code → ML-like code: the focus of our research
  – ML-like code → machine code: traditional techniques

Motivation
• A process calculus has overheads that do not exist in sequential languages:
  – Dynamic process scheduling
  – Channel communication
• A naïve implementation yields low efficiency
• There is a demand for a sophisticated implementation that reduces these overheads

Contribution
• A framework for compiling a process calculus efficiently
• Optimization techniques applicable to software multithreading

Overview of This Presentation
• Target language HACL
• A basic execution model
• Our enhanced execution model
  – To reduce scheduling overhead
  – To reduce communication overhead
• Experimental results

Target Language HACL

expression
  e ::= x | c | op(e1, ..., en)

process expression
  P ::= P1 | P2                 parallel execution
        $x. P                   channel creation
        e(x) => P               receive from e
        e1 <= e2                send e2 to e1
        e0(e1, ..., en)         process instantiation
        if e then P1 else P2    conditional

process definition
  x0(x1, ..., xn) = P

Basic Execution Model (1/2): Process Scheduling
• P1 | P2 dynamically creates a process: both P1 and P2 become schedulable
• Schedulable processes wait in a scheduling pool; execution continues with one process while the rest stay in the pool
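The dynamic scheduling described above can be sketched in C. This is a minimal illustration, not the paper's runtime: all names (pool_push, scheduler_run, demo_pool) are hypothetical, and the pool is a fixed-size stack of closures.

```c
/* Sketch of the basic model's dynamic scheduling: evaluating "P1 | P2"
   pushes schedulable closures onto a scheduling pool, and a scheduler
   loop pops and runs them.  All names are hypothetical. */

typedef struct { void (*code)(void *env); void *env; } closure_t;

#define POOL_MAX 64
static closure_t pool[POOL_MAX];
static int pool_top = 0;

static void pool_push(void (*code)(void *), void *env) {
    pool[pool_top].code = code;      /* every process creation touches the pool */
    pool[pool_top].env = env;
    pool_top++;
}

/* Pop schedulable processes and run them until the pool is empty. */
static void scheduler_run(void) {
    while (pool_top > 0) {
        closure_t c = pool[--pool_top];
        c.code(c.env);
    }
}

static int executed;
static void proc(void *env) { executed += *(int *)env; }

int demo_pool(void) {
    int a = 1, b = 2;
    executed = 0;
    pool_push(proc, &a);    /* P1 | P2: both become schedulable */
    pool_push(proc, &b);
    scheduler_run();
    return executed;        /* 3: both processes ran */
}
```

The per-creation pool manipulation in pool_push is exactly the overhead the deck's static scheduling later removes.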

Basic Execution Model (2/2): Channels
• A channel is a pointer to memory space
• A channel holds a value queue and a process queue
• Sends with no waiting receiver (r <= 8, r <= 12) enqueue their values in r's value queue
• Receives with no available value (r(x) => P, r(y) => Q) enqueue the receiving processes in r's process queue
• When a value meets a receiver, the process runs with the value substituted: {12/x}P, {8/y}Q
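The value-queue / process-queue behavior can be sketched as follows. This is a hedged illustration with hypothetical names (chan_send, chan_recv) and fixed-size LIFO queues; the real runtime's queue discipline may differ.

```c
/* Sketch of a heap-allocated channel with a value queue and a process
   queue: a send wakes a waiting receiver if one exists, otherwise it
   queues the value; a receive takes a queued value if one exists,
   otherwise it queues itself as a waiting process. */

typedef struct {
    int values[8];           /* value queue */
    int nvals;
    void (*waiters[8])(int); /* process queue: waiting receive closures */
    int nwait;
} chan_t;

static void chan_send(chan_t *c, int v) {
    if (c->nwait > 0)
        c->waiters[--c->nwait](v);      /* wake a waiting process with v */
    else
        c->values[c->nvals++] = v;      /* no receiver: queue the value */
}

static void chan_recv(chan_t *c, void (*k)(int)) {
    if (c->nvals > 0)
        k(c->values[--c->nvals]);       /* a value is present: run now */
    else
        c->waiters[c->nwait++] = k;     /* block: queue the receiver */
}

static int got;
static void bind_x(int v) { got = v; } /* stands for "{v/x} P" */

int demo_chan(void) {
    chan_t c = {0};
    chan_send(&c, 12);      /* r <= 12 with no receiver: value queued */
    chan_recv(&c, bind_x);  /* r(x) => P: takes 12 immediately */
    return got;             /* 12 */
}
```

Note that even the fast case here goes through memory (the chan_t object), which is the communication overhead the unboxed-channel scheme targets.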

Inefficiencies of the Basic Model
• Scheduling overhead
  – The scheduling pool is manipulated every time a process is created
• Communication overhead
  – Channel communication is always performed through memory

Our Enhanced Execution Model
• Static process scheduling
  – reduces the frequency of runtime scheduling-pool manipulation
  – lines up code fragments for multiple process expressions
• Unboxed channel framework
  – enables communication of values without memory accesses
  – initially creates a channel on a register
  – later allocates the channel on the heap as necessary

Compilation Overview
• A translation rule maps a HACL program to an ML-like program
• In the HACL program, the execution flow and the scheduling pool are implicit; in the ML-like program, they are explicit
• F{ P1, P2, ..., Pn } = a sequential ML-like program which schedules the set of schedulable process expressions P1, P2, ..., Pn

Compilation with Static Scheduling (1/2)
F{ P1, P2, P3 } = code fragment for P1, followed by F{ P2, P3 }
F{ P2, P3 } = code fragment for P2, followed by F{ P3 }
F{ P3 } = code fragment for P3

Compilation with Static Scheduling (2/2)
F{ r(x) => P1', P2 } =
  if (r has a value)
  then (* get the value from r *) F{ P1', P2 }
  else (* allocate a closure for P1' on the heap *) F{ P2 }
F{ P1', P2 } = code fragment for P1', followed by F{ P2 }
F{ P2 } = code fragment for P2

Compilation Example
F{ ( $r.( r <= 5 | r(x) => P ) ) } =
  r = new_channel();
  F{ r <= 5, r(x) => P }

F{ r <= 5, r(x) => P } =
  if (r has a waiting process)
  then (* wake up the process and ... *)
  else put_value( r, 5 );
  F{ r(x) => P }

F{ r(x) => P } =
  if (r has a value)
  then x = get_value( r ); F{ P }
  else (* allocate a closure for P in heap and ... *) F{ }
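A C rendering of what the compiled straight-line code for $r.( r <= 5 | r(x) => P ) might look like. With static scheduling, the send and the receive become consecutive fragments that test the channel state instead of going through the scheduling pool. All names (chan_rep, P, compiled_fragment) are illustrative, not the compiler's actual output.

```c
/* Hedged sketch of the output of the translation F for the example
   process.  The channel state test replaces pool manipulation on the
   fast path. */

enum { EMPTY, HAS_VALUE, HAS_PROC };

typedef struct { int state; int value; } chan_rep;

static int result;

/* The body of P, with x bound to the received value. */
static void P(int x) { result = x + 1; }

int compiled_fragment(void) {
    chan_rep r = { EMPTY, 0 };            /* r = new_channel() */

    /* Code fragment for "r <= 5". */
    if (r.state == HAS_PROC) {
        /* wake up the waiting process and ... (not reached in this run) */
    } else {
        r.state = HAS_VALUE;              /* put_value(r, 5) */
        r.value = 5;
    }

    /* Code fragment for "r(x) => P", lined up statically after the send. */
    if (r.state == HAS_VALUE) {
        P(r.value);                       /* x = get_value(r); F{ P } */
    } else {
        /* allocate a closure for P on the heap and ... */
    }
    return result;                        /* P ran with x = 5 */
}
```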

Unboxed Channel Scheme
• Unboxed channel = a channel allocated on a register
  – No memory access is needed to manipulate an unboxed channel
• All channels are created as unboxed channels
• An unboxed channel is promoted to a heap-allocated one as necessary

Example: channel allocation and communication on a register

ML-like code:
  r = new_channel();
  if (r has a process) then ... else put_value( r, 5 );
  if (r has a value) then x = get_value( r ); F{ P } else ...

Corresponding C code:
  r = EMPTY_CHANNEL;
  if (...) { ... } else { r = 5 + VAL_TAG; }
  if (...) { x = r - VAL_TAG; ... } else { ... }
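The tagged-word encoding in the C column above can be made concrete in a small sketch. The shift-plus-low-bit tagging below is an assumed encoding; the slide's own code (r = 5 + VAL_TAG) may use a different convention.

```c
/* Sketch of the unboxed-channel idea: the channel lives in one machine
   word (ideally kept in a register), and a sent value is encoded into
   that word with a tag, so send and receive touch no memory. */

#define EMPTY_CHANNEL 0L
#define VAL_TAG 1L               /* low bit set = word holds a value */

int demo_unboxed(void) {
    long r = EMPTY_CHANNEL;      /* r = new_channel(), on a register */

    /* r <= 5: no waiting process, so tag the value into the word. */
    if (r == EMPTY_CHANNEL)
        r = (5L << 1) | VAL_TAG;

    /* r(x) => P: a value is present, so decode it without heap access. */
    long x = -1;
    if (r & VAL_TAG)
        x = r >> 1;              /* x = get_value(r) */
    return (int)x;               /* 5 */
}
```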

When to allocate space on the heap?
• When two values are sent to the same channel
• When an unboxed channel is duplicated
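The first promotion trigger (a second value sent to the channel) can be sketched as follows; heap_chan_t, chan_send_word, and the tag scheme are all illustrative assumptions, not the paper's data structures.

```c
/* Sketch of promoting an unboxed channel when a second value arrives:
   one word can hold at most one value, so the runtime must move the
   channel to a heap-allocated value queue. */

#include <stdlib.h>

#define EMPTY_CHANNEL 0L
#define VAL_TAG 1L

typedef struct { int values[8]; int nvals; } heap_chan_t;

/* Send v on the word-sized channel *r; on the second send, promote the
   channel to the heap and return the heap channel. */
static heap_chan_t *chan_send_word(long *r, heap_chan_t *hc, int v) {
    if (hc == NULL && *r == EMPTY_CHANNEL) {
        *r = ((long)v << 1) | VAL_TAG;   /* first value fits in the word */
        return NULL;
    }
    if (hc == NULL) {                    /* second value: promote */
        hc = malloc(sizeof *hc);
        hc->nvals = 0;
        hc->values[hc->nvals++] = (int)(*r >> 1); /* move the first value */
    }
    hc->values[hc->nvals++] = v;
    return hc;
}

int demo_promotion(void) {
    long r = EMPTY_CHANNEL;
    heap_chan_t *hc = NULL;

    hc = chan_send_word(&r, hc, 6);   /* stays unboxed */
    int still_unboxed = (hc == NULL);

    hc = chan_send_word(&r, hc, 8);   /* promoted: both values on the heap */
    int total = hc->values[0] + hc->values[1];
    free(hc);
    return still_unboxed * 100 + total;   /* 114 */
}
```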

Experimental Results (1/2)
• HACL is used as an intermediate language of Schematic [Taura et al. 96]
• The ML-like program is compiled to assembly-like C
• Native code is generated by the GNU C compiler
• Runtime type checks are omitted
• Machine: SS20 (HyperSPARC, 150 MHz)
• Now implementing on a large-scale SMP

Experimental Results (2/2)

Related Work (1/2)
• Id [Schauser et al. 95], Fleng [Araki et al. 97]
  – Static scheduling ≒ sequentialization
  – A does not depend on B ⇒ A | B → A ; B
• Pict [Turner 96]
  – All channel communication needs memory operations
  – A receive (input) expression always allocates a closure, whether a value is present or not

Related Work (2/2)
• StackThreads [Taura et al. 94, 97]
  – The original proposal of the unboxed channel
• Linear channels [Kobayashi et al. 96, Igarashi 97]
  – Linear channel = a channel used only once
  – Some communications on linear channels are statically eliminated
• CPS [Appel 92]
  – A compilation framework for sequential languages

Conclusion
• We proposed a framework for compiling a process calculus efficiently
  – Static scheduling
  – Unboxed channels
• A language based on a process calculus runs only a few times slower than C

Surface Languages for Process Calculi
• Surface language → intermediate language → machine code
  – Surface languages: concurrent functional languages, concurrent OO languages, etc.
  – Intermediate languages: process calculi
• Examples: Schematic [Taura et al. 96] uses HACL; Pict [Turner 96] uses the Pi-calculus

A Schedulable Closure and the Scheduling Stack
• Scheduling stack ≒ a set of schedulable closures
• F{ f(r, x), P }: schedule f(r, x) first
• Can we schedule f and P statically? Difficult. The scheduling pool is still necessary.