Implementing Remote Procedure Call
Landon Cox
February 3, 2017
Modularity so far: procedures as modules
What is private and what is shared between procedures? Local variables are private; the stack, heap, and global variables are shared.
[Figure: Module 1 and Module 2, each with its own code and private state, connected through shared state.]
How is control transferred between procedures? The caller pushes the arguments and the return address (RA) onto the stack and jumps into the callee's code. The callee sets up its local variables, runs its code, and jumps back to the RA.
Is isolation between procedures enforced? No. Either module can corrupt the other, and there is no guarantee that the callee will return to the caller.
Modularity so far: MULTICS processes as modules
What is private and what is shared between MULTICS processes? Address spaces are private; segments can be shared.
How is control transferred between MULTICS processes? Through synchronization primitives provided by the supervisor: lock/unlock and wait/notify.
Is isolation between MULTICS processes enforced? Yes. Neither module can corrupt the other's private state, and shared state is isolated inside common segments.
Modularity so far: UNIX processes as modules
What is private and what is shared between UNIX processes? Address spaces are private; the file system and pipes are shared.
How is control transferred between UNIX processes? Through synchronization primitives provided by the supervisor: a process blocks by reading from a pipe and notifies by writing to a pipe.
Is isolation between UNIX processes enforced? Yes. Neither module can corrupt the other's private state, and shared state is protected by the pipe buffer and by file-system access control.
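A minimal POSIX-only sketch of pipe-based control transfer between two UNIX processes, using Python's os module. The parent's os.read() blocks until the child writes (or closes its end), which is how one process waits and the other notifies. The helper names run_pair and read_all are illustrative, not from the lecture.

```python
import os


def read_all(read_fd: int) -> bytes:
    """Block on the pipe until every write end is closed, collecting all data."""
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)  # blocks until data arrives or EOF
        if not chunk:                   # EOF: no writers remain
            break
        chunks.append(chunk)
    os.close(read_fd)
    return b"".join(chunks)


def run_pair(message: bytes) -> bytes:
    """Fork a writer process and read its message in the parent (POSIX only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child ("module 2"): it has a private copy of the address space,
        # so the pipe is the only state it shares with the parent.
        os.close(read_fd)
        view = memoryview(message)
        while view:
            written = os.write(write_fd, view)  # writing wakes the blocked reader
            view = view[written:]
        os.close(write_fd)
        os._exit(0)
    # Parent ("module 1"): close its own write end, or read() never sees EOF.
    os.close(write_fd)
    data = read_all(read_fd)
    os.waitpid(pid, 0)
    return data


if __name__ == "__main__":
    print(run_pair(b"hello from the child"))
```

Note that the parent must close its copy of the write end: as long as any write end is open, the reader keeps blocking instead of seeing end-of-file.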
Network programming
Now suppose the two modules are on different machines. What is the standard abstraction for communication? Sockets. Each end of a socket is bound to an <address, port> pair.
Sockets should look familiar: they are very similar to pipes, using read/write primitives for synchronized access to a buffer.
Downsides of socket programming? It adds complexity to a program: blocking conditions depend on the data received, and data structures must be copied into and out of messages or streams. All of this work can be tedious and error-prone.
Idea: programmers are used to local procedure calls, so try to make network programming as easy as a procedure call.
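To make the "tedious and error-prone" point concrete, here is a small Python sketch that copies a two-integer point into and out of a stream socket by hand. The length-prefix framing and the helper names are assumptions for illustration; socket.socketpair() stands in for two machines so the example runs locally.

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Streams have no message boundaries, so prefix each message with its length.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested; loop until we have n.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def pack_point(x: int, y: int) -> bytes:
    # Copy the data structure into a message: two signed 32-bit ints, network order.
    return struct.pack("!ii", x, y)


def unpack_point(data: bytes) -> tuple:
    # Copy it back out on the receiving side.
    return struct.unpack("!ii", data)


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_msg(a, pack_point(3, -4))
    print(unpack_point(recv_msg(b)))  # (3, -4)
    a.close()
    b.close()
```

Every new data structure needs its own pack/unpack pair, and every receiver must handle short reads and closed connections; this is the boilerplate RPC stubs are meant to hide.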
Remote procedure call (RPC)
RPC makes request/response look local by providing the illusion of a function call. RPC isn't really a function call: in a normal call, the PC jumps to the function, and the function then jumps back to the caller. It is similar to request/response, though: the stream of control goes from the client to the server and then returns to the client.
The RPC illusion
How do we make send/recv look like a function call? The client wants sending to the server to look like calling a function, and the reply from the server to look like the function returning. The server wants receiving from the client to look like a function being called, and sending the response to look like returning from that function.
Implementing RPC
Primary challenges: How do we name and locate the remote code to invoke? How do we handle arguments containing pointers? How do we handle failures?
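RPC architecture
[Figure: on the client, client code sits above the client stub and the RPC runtime; on the server, server code sits above the server stub and the RPC runtime; the two runtimes communicate over the network. Both sides share one interface: the client code imports it and the client stub exports it, while the server stub imports it and the server code exports it.]
Who imports and who exports the interface?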
Who defines the interface? The programmer.
Who writes the client and server code? The programmer.
Who writes the stub code? An automated stub generator (e.g., rmic in Java).
Why can stub code be generated automatically? The interface precisely defines the behavior: what data comes in and what is returned.
Where else have we seen automated control transfer? Compilers, which generate the code for procedure calls.
RPC stub functions
[Figure: the call flows from the client stub (send) to the server stub (recv); the return flows back from the server stub (send) to the client stub.]
Client stub:
1) Builds a request message with the server function name and parameters.
2) Sends the request message to the server stub.
Control transfers to the server stub; the client-side code is paused.
8) Receives the response message from the server stub.
9) Returns the response value to the client.
Server stub:
3) Receives the request message.
4) Calls the right server function with the specified parameters.
5) Waits for the server function to return.
6) Builds a response message with the return value.
7) Sends the response message to the client stub.
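The nine steps above can be sketched in Python with JSON messages. This is a minimal illustration, not Cedar's or Java RMI's wire format: the message fields (name, args, result), the dispatch table, and the stub names are assumptions, and a socket pair plus a thread stand in for two machines.

```python
import json
import socket
import struct
import threading


def _send(sock, obj):
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(data)) + data)


def _recv_exact(sock, n):
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def _recv(sock):
    (n,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, n).decode("utf-8"))


# Server code written by the programmer.
def add(a, b):
    return a + b


# Hypothetical dispatch table of the kind a stub generator might emit.
SERVER_FUNCTIONS = {"add": add}


def server_stub(sock):
    """Handle exactly one request (steps 3 through 7)."""
    request = _recv(sock)                     # 3) receive request message
    func = SERVER_FUNCTIONS[request["name"]]
    result = func(*request["args"])           # 4, 5) call function, wait for return
    _send(sock, {"result": result})           # 6, 7) build and send response


def client_stub_add(sock, a, b):
    """Looks like add(a, b) to the caller (steps 1, 2, 8, 9)."""
    _send(sock, {"name": "add", "args": [a, b]})  # 1, 2) build and send request
    response = _recv(sock)                        # caller is paused until 8) reply
    return response["result"]                     # 9) return value to the caller


def demo(a, b):
    client_sock, server_sock = socket.socketpair()
    server = threading.Thread(target=server_stub, args=(server_sock,))
    server.start()
    try:
        return client_stub_add(client_sock, a, b)
    finally:
        server.join()
        client_sock.close()
        server_sock.close()


if __name__ == "__main__":
    print(demo(2, 3))  # 5
```

Only add() and the call site are "programmer code"; everything else is the sort of boilerplate a stub generator derives from the interface.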
Binding
What is binding? Establishing a map from a symbolic name to an object. In an RPC system, what needs to be bound? Client code uses the interface as a symbolic name, and the RPC system must bind those names to real instances of the code. In Cedar, what managed this mapping? The Grapevine distributed database: types are listed as symbolic names, and instances are listed as machine addresses.
Binding: Grapevine
Is anyone allowed to export any interface? No; this is regulated through Grapevine access controls. The users allowed to export an interface are listed explicitly in a group, and only the group owner can allow someone to export. Is anyone allowed to import an interface? Yes; clients are authorized at a higher level. What other distributed database is Grapevine like? The Domain Name System (DNS), which contains a mapping from names to IP addresses.
[Figure: Grapevine's group map takes interfaces to user ids; its individual map takes a user id to a network address.]
Are Grapevine's permissions the same as or different from those of DNS (for reads and writes)? Basically the same: DNS updates are controlled, but DNS retrievals are not.
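A toy Python registry can illustrate the two maps and the asymmetric permissions described above: exports (updates) are access-controlled by the group owner, while lookups (retrievals) are open, as in DNS. The class and method names are hypothetical; this is not Grapevine's actual interface.

```python
class Registry:
    """Toy name service modeled loosely on the Grapevine maps in the slides."""

    def __init__(self):
        self.owner = {}      # interface name -> user id that manages its group
        self.exporters = {}  # group map: interface name -> set of user ids
        self.address = {}    # individual map: user id -> network address

    def create_interface(self, interface, owner):
        self.owner[interface] = owner
        self.exporters[interface] = set()

    def allow_export(self, interface, requester, user):
        # Only the group owner may add someone to the exporter group.
        if self.owner.get(interface) != requester:
            raise PermissionError("only the group owner can grant export rights")
        self.exporters[interface].add(user)

    def export(self, interface, user, address):
        # Updates are access-controlled, like DNS updates.
        if user not in self.exporters.get(interface, set()):
            raise PermissionError(f"{user} may not export {interface}")
        self.address[user] = address

    def lookup(self, interface):
        # Retrievals are open to anyone, like DNS lookups.
        return [self.address[u]
                for u in sorted(self.exporters.get(interface, set()))
                if u in self.address]
```

A client binds by calling lookup() with the interface name and then contacting one of the returned addresses.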
Shared state
What is the shared state of the RPC abstraction? The arguments passed through the function call. What is the actual shared state in RPC? The underlying messages between client and server.
[Figure: Client and Server, each with its own code and private state, connected through shared state.]
Why is translating arguments into messages tricky? Data structures contain pointers, and the client and server run in different address spaces. We need to ensure that each pointer on the server refers to the same data as the corresponding pointer on the client.
How do we ensure that a data structure is transferred safely? We must know the semantics of the data structure (typed object references) and then replace pointers on the client with valid pointers on the server. Getting this right requires the programmer's explicit help; we cannot just pass arbitrary C-style structs and hope they work correctly.
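One common way to replace client pointers with valid server pointers is pointer swizzling: the sender numbers each reachable object and ships indices instead of addresses, and the receiver rebuilds real references in its own address space. A minimal Python sketch for a singly linked list (cycles included); Node, marshal, and unmarshal are illustrative names, and the records could be sent as JSON.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, like pointers
class Node:
    value: int
    next: Optional["Node"] = None


def marshal(head: Optional[Node]) -> List[dict]:
    """Turn a linked list into records whose 'next' fields are indices, not pointers."""
    index = {}   # id(node) -> position in the message
    order = []
    node = head
    while node is not None and id(node) not in index:  # stop on a cycle
        index[id(node)] = len(order)
        order.append(node)
        node = node.next
    return [{"value": n.value,
             "next": None if n.next is None else index[id(n.next)]}
            for n in order]


def unmarshal(records: List[dict]) -> Optional[Node]:
    """Rebuild the list in the receiver's address space with fresh references."""
    nodes = [Node(r["value"]) for r in records]
    for node, r in zip(nodes, records):
        if r["next"] is not None:
            node.next = nodes[r["next"]]
    return nodes[0] if nodes else None
```

The code works only because it knows the type's layout (which field is a pointer), which is exactly why the slide says the programmer's help, or a typed language runtime, is required.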
What about after the server code completes? Updates to arguments must be synchronized: changes made by the server must be reflected in the client before the call returns.
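Copy-restore (also called call-by-value-result) is one way to meet this requirement: the server works on its own copy of an argument, and the client stub writes the final value back into the caller's object before returning. A minimal in-process Python sketch, with deepcopy standing in for marshaling; the function names are hypothetical.

```python
import copy
from typing import Callable, List


def server_append_twice(items: List[int], x: int) -> None:
    # Hypothetical remote procedure that mutates its argument.
    items.append(x)
    items.append(x)


def call_copy_restore(func: Callable, arg: List[int], *extra) -> None:
    """Simulate an RPC with copy-restore semantics for one list argument."""
    server_copy = copy.deepcopy(arg)  # marshaling gives the server its own copy
    func(server_copy, *extra)         # the server updates only its copy
    arg[:] = server_copy              # the client stub copies the result back before returning


def call_copy_only(func: Callable, arg: List[int], *extra) -> None:
    """Same call without the restore step: the server's updates are lost."""
    func(copy.deepcopy(arg), *extra)
```

Copy-restore matches local call-by-reference only when the procedure does not also reach the same object through another alias; with aliasing, the two semantics can diverge.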
Faults
With procedures, what happens if a module faults? There is no isolation, so the program crashes; this is a result of sharing the same address space. With pipes, what happens if a module faults? The faulting module (process) crashes, and the OS makes the pipe unreadable and unwritable.
With RPC, we cannot just return an error code through the client stub. Overloading error codes is a bad idea, because we want to distinguish network failures from errors in the program itself.
How are RPC faults handled in practice? Usually through a software exception, which is often supported by the language. So how "pure" is the RPC abstraction? Not totally pure: the programmer still knows which calls are local and which are remote, and still has to write code to handle failures. So is RPC a good abstraction? In some cases, yes; it hides a lot of complexity. However, it often comes at a steep performance penalty. What part of RPC is slowest? Argument packing and unpacking; Java class introspection for shipping data structures is particularly painful.
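A minimal Python sketch of exception-based fault reporting in a client stub, using two distinct exception types so callers can tell a lost network exchange apart from an error raised by the remote procedure. The exception hierarchy and the transport callable are hypothetical stand-ins for an RPC runtime.

```python
class RemoteCallError(Exception):
    """Base class for failures of a remote call (hypothetical hierarchy)."""


class NetworkFailure(RemoteCallError):
    """The request or reply was lost; the server may or may not have run the call."""


class RemoteException(RemoteCallError):
    """The server ran the call, and the remote procedure itself raised an error."""


def client_stub(transport, name, *args):
    """transport(request) -> response dict; it stands in for the RPC runtime."""
    try:
        response = transport({"name": name, "args": list(args)})
    except (ConnectionError, TimeoutError) as exc:
        raise NetworkFailure(f"call to {name} did not complete") from exc
    if "error" in response:
        raise RemoteException(response["error"])
    return response["result"]
```

The caller still has to write the except clauses, which is one reason the abstraction is "not totally pure."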
Structuring a concurrent system
We have talked about two ways to build a system.
Alternative structure
We can also give cooperating threads their own address spaces, so that each thread is basically a separate process, and use messages instead of shared data to communicate. Why would you want to do this? Protection: each module runs in its own address space. This is the reasoning behind microkernels, where each service runs as a separate process. Examples: Mach from CMU (which influenced parts of Mac OS X), and the handling of device drivers in Windows Vista/Windows 7.
Augmenting the mobile experience through code offload
Eduardo Cuervo (Duke), Aruna Balasubramanian (U Washington), Dae-ki Cho (UCLA), Alec Wolman, Stefan Saroiu, Ranveer Chandra, Paramvir Bahl (Microsoft Research)
Battery is a scarce resource
[Figure: battery technology improved just 2X in 15 years, while CPU performance improved 246X over the same period.]
A solution to the battery problem seems unlikely.
Mobile apps can't reach their full potential
They are not on par with their desktop counterparts: slow, limited, or inaccurate, and power-intensive. Examples include speech recognition and synthesis, interactive games, and augmented reality, which are too CPU-intensive, or too limited, on current phones.
One solution: remote execution
Remote execution can reduce energy consumption. Challenges: What should be offloaded? How do we decide dynamically when to offload? How do we minimize the required programmer effort?
MAUI: Mobile Assistance Using Infrastructure
MAUI contributions: combine extensive profiling of the device, the network, and the application with an ILP solver to make dynamic offload decisions that optimize for energy reduction; and leverage a modern language runtime (.NET CLR), with its reflection, serialization, and strong typing, to simplify program partitioning.
Roadmap: Motivation; MAUI system design (MAUI proxy, MAUI profiler, MAUI solver); Evaluation; Summary; Beyond MAUI
MAUI architecture
[Figure: the smartphone runs the application on top of a client proxy, profiler, solver, and MAUI runtime; the MAUI server runs the application on top of a server proxy, profiler, solver, and MAUI runtime, together with the MAUI controller. The two sides communicate via RPC.]
How does a programmer use MAUI?
Goal: make it dead-simple to MAUI-ify apps. Build the app as a standalone phone app, add .NET attributes to indicate which methods are "remoteable", and follow a simple set of rules.
Language run-time support for partitioning
Portability: mobile (ARM) vs. server (x86); the .NET Framework's Common Intermediate Language runs on both. Type safety and serialization: automate state extraction. Reflection: identifies methods with the [Remoteable] tag and automates generation of RPC stubs.
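As a rough Python analogue of the reflection step (MAUI itself inspects .NET attributes, not Python), a decorator can tag methods and inspect.getmembers can discover them and generate forwarding stubs. The remoteable decorator, FaceApp, and make_stubs are hypothetical names for illustration.

```python
import inspect


def remoteable(func):
    """Hypothetical marker, playing the role of MAUI's [Remoteable] .NET attribute."""
    func.remoteable = True
    return func


class FaceApp:
    @remoteable
    def find_match(self, features):
        return max(features)  # placeholder for expensive work

    def draw_ui(self):
        return "drawing"      # must stay on the phone


def remoteable_methods(cls):
    """Use reflection to list the methods tagged as remoteable."""
    return sorted(name for name, member in inspect.getmembers(cls, inspect.isfunction)
                  if getattr(member, "remoteable", False))


def make_stubs(obj, remote_call):
    """Generate one stub per remoteable method that forwards to remote_call(name, args)."""
    stubs = {}
    for name in remoteable_methods(type(obj)):
        def stub(*args, _name=name):  # default argument binds this method's name
            return remote_call(_name, args)
        stubs[name] = stub
    return stubs
```

Because the tag lives on the method, untagged code such as draw_ui never gets a stub and so can never be shipped off the phone.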
MAUI proxy
[Figure: the MAUI architecture with the proxies highlighted.]
The proxy intercepts application calls, chooses local or remote execution, synchronizes state, handles errors, and provides runtime information.
MAUI profiler
[Figure: the profiler's inputs are the callgraph, CPU cycles, state size, execution time, the device profile, and network latency and bandwidth; its output is an annotated callgraph with computational power cost, computational delay, network power cost, and network delay.]
MAUI solver
[Figure: a sample callgraph with methods including B, C, and D; each method is annotated with the computation energy and delay for executing it, and each edge with the energy and delay for transferring state.]
Is global program analysis needed?
Yes! This simple example from the face recognition app shows why local analysis fails.
[Figure: face recognition callgraph with User Interface, InitializeFaceRecognizer (5000 mJ, 10000 mJ), FindMatch (900 mJ, 1000 mJ), and DetectAndExtractFaces (25000 mJ, 15000 mJ). Judged in isolation, FindMatch is cheaper to run locally.]
Judged method by method, both InitializeFaceRecognizer and FindMatch are cheaper to run locally.
Considered together, however, FindMatch and DetectAndExtractFaces cost 25900 mJ to run locally but only 1000 mJ to offload, so it is cheaper to offload them.
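The point of the face recognition example can be reproduced with a tiny partitioner. MAUI formulates the choice as an integer linear program; for a callgraph this small, exhaustive search reaches the same optimum, so the sketch below swaps in brute force. All energy numbers are made up (they are not the measurements in the figures), and it assumes remote execution costs the phone nothing beyond state transfer and ignores MAUI's latency constraints.

```python
from itertools import product

# Hypothetical profile (made-up units): energy to run each method on the phone.
LOCAL_ENERGY = {"Init": 5, "Detect": 25, "Match": 9}
# Hypothetical energy to ship state across a call edge whose endpoints run on
# different sides. "UI" is pinned to the phone.
TRANSFER_ENERGY = {("UI", "Init"): 10, ("UI", "Detect"): 15, ("Detect", "Match"): 10}


def phone_energy(remote: set) -> int:
    """Phone energy for a partition; assumes remote execution is free for the phone."""
    cost = sum(e for m, e in LOCAL_ENERGY.items() if m not in remote)
    for (caller, callee), e in TRANSFER_ENERGY.items():
        if (caller in remote) != (callee in remote):  # edge crosses the network
            cost += e
    return cost


def best_partition():
    """Global analysis: try every local/remote assignment and keep the cheapest."""
    methods = sorted(LOCAL_ENERGY)
    best = None
    for choice in product([False, True], repeat=len(methods)):
        remote = {m for m, r in zip(methods, choice) if r}
        cost = phone_energy(remote)
        if best is None or cost < best[0]:
            best = (cost, remote)
    return best


def greedy_partition():
    """Local analysis: offload a method only if its own transfer beats its own compute."""
    remote = {callee for (caller, callee), e in TRANSFER_ENERGY.items()
              if e < LOCAL_ENERGY[callee]}
    return phone_energy(remote), remote
```

Here the greedy rule keeps Match local (9 < 10) and pays for the Detect-to-Match crossing, ending at 39 units, while the global search offloads Detect and Match together for 20 units.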
Can MAUI adapt to changing conditions?
Changing conditions: network bandwidth and latency, and variability in a method's computational requirements. Experiment: a modified off-the-shelf arcade game application with physics modeling (homing missiles), evaluated under different latency settings.
[Figure: arcade game callgraph with DoLevel, DoFrame, HandleEnemies, HandleMissiles, and HandleBonuses. Most state transfers are about 11 KB plus the missiles; HandleMissiles needs only the missiles, so its required state is smaller, but its computational complexity increases with the number of missiles.]
*Missiles take around 60 bytes each.
Case 1: zero missiles, low latency (RTT < 10 ms)
MAUI offloads starting at DoLevel; the computation cost of HandleMissiles is close to zero.
Case 2: 5 missiles, some latency (RTT = 50 ms)
Offloading everything is very expensive, so MAUI offloads only HandleMissiles: it has little state to transfer and accounts for most of the computation cost.
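The arithmetic behind these two cases can be sketched with the slides' state sizes (about 11 KB of base state, about 60 bytes per missile) and a deliberately simple delay model. The model (one round trip plus sending the state out and back) and every RTT, bandwidth, and local-time value passed to it are hypothetical illustrations, not MAUI's measurements or its solver.

```python
KB = 1024                   # assumption: the slide's "11KB" means 11 * 1024 bytes
MISSILE_BYTES = 60          # from the slide: each missile is about 60 bytes
BASE_STATE_BYTES = 11 * KB  # from the slide: the other game state is about 11 KB


def state_bytes(missiles: int, include_base_state: bool = True) -> int:
    """Approximate state to ship; HandleMissiles would pass include_base_state=False."""
    base = BASE_STATE_BYTES if include_base_state else 0
    return base + missiles * MISSILE_BYTES


def offload_delay_ms(state: int, rtt_ms: float, bytes_per_ms: float) -> float:
    """Hypothetical model: one round trip plus sending the state there and back."""
    return rtt_ms + 2 * state / bytes_per_ms


def should_offload(state: int, rtt_ms: float, bytes_per_ms: float,
                   local_ms: float) -> bool:
    """Offload only if the remote path beats local execution (local_ms is made up)."""
    return offload_delay_ms(state, rtt_ms, bytes_per_ms) < local_ms
```

With 5 missiles, a 50 ms RTT, and 100 bytes/ms, shipping only the missiles (300 bytes) adds 6 ms on top of the round trip, while shipping the full state (11564 bytes) adds about 231 ms; this is why offloading just HandleMissiles can win when offloading everything does not.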
MAUI implementation
Platform: Windows Mobile 6.5, .NET Framework 3.5, HTC Fuze smartphone, Monsoon power monitor. Applications: chess, face recognition, arcade game, voice-based translator.
Questions
How much can MAUI reduce energy consumption? How much can MAUI improve performance? Can MAUI run resource-intensive applications?
How much can MAUI reduce energy consumption?
Face recognizer: an order-of-magnitude improvement on Wi-Fi, and big savings even on 3G.
How much can MAUI improve performance?
Face recognizer: an improvement of around an order of magnitude.
Latency to the server affects the opportunities for fine-grained offload
Arcade game: up to 40% energy savings on Wi-Fi; at high latency, the solver would decide not to offload. This points to opportunities for MAUI nodes collocated with access points (APs) or cell towers.
Summary
MAUI enables developers to bypass the resource limitations of handheld devices, with a low barrier to entry: simple program annotations. For a resource-intensive application, MAUI reduced energy consumption by an order of magnitude and improved application performance similarly. MAUI adapts to changing network conditions and to changing application CPU demands.
Beyond MAUI
For a method to be worth offloading, the cost of the network transfer must be small and the computation cost must be high. What do we do when both costs are high? Examples: video processing, computer graphics, and games.
MAUI for games
Cloud gaming is already happening (OnLive, Gaikai, etc.), using a thin-client model with steep bandwidth requirements (6 Mbps). How can MAUI help? Let the phone do as much as possible, reduce bandwidth consumption, and allow disconnected gaming.
Preliminary results
Two promising game offload mechanisms require 30-70% less bandwidth while still providing the same level of quality as OnLive, and enable high-end gaming with as little as 400 kbps, with a small reduction in video quality.
Fidelity-aware MAUI
Current failure model: on disconnection, run locally. This is simple and effective but slow. Redefined model: give the best results when connected, and give results fast at the expense of accuracy. How? Allow applications to adapt, and make this as simple as possible for application developers.
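A minimal sketch of what letting an application adapt might look like: try the full-fidelity remote version, and on a network failure fall back to a fast, possibly less accurate, local version. The function names are hypothetical; the talk does not specify the API a fidelity-aware MAUI would expose.

```python
from typing import Callable, Tuple


def call_with_fallback(remote: Callable, local_fast: Callable, *args) -> Tuple[object, str]:
    """Prefer the full-fidelity remote result; degrade to a fast local one on failure."""
    try:
        return remote(*args), "remote"
    except (ConnectionError, TimeoutError):
        # Disconnected or too slow: answer quickly, possibly less accurately.
        return local_fast(*args), "local"
```

Returning which path produced the result lets the application tell the user, or itself, that it is running at reduced fidelity.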
Questions?
ecuervo@cs.duke.edu
http://research.microsoft.com/en-us/projects/maui/