Text Amol Khanapurkar, TCS Chetan Phalak, TCS Measuring Wait and Service Time in Java Using Bytecode Instrumentation.

Text Amol Khanapurkar, TCS Chetan Phalak, TCS Measuring Wait and Service Time in Java Using Bytecode Instrumentation

Agenda Problem Statement – Measuring Response Time components Available Solutions Our solution Validating solution Q & A 2

Problem Statement Code comprises of Serial and Parallel portions. Serial portions limit scalability. Larger serial portion manifests in form of loss of throughput / lack of scalability 3

Problem Statement For purpose of this discussion, lets assume that code is inherently multi-threaded and serial portion of the code is implemented using Critical Section. A Critical Section in Java looks as below synchronized(_lock) { try { sharedVal++; }catch(Exception e) { e.printStackTrace(); } Acquire a lock Thread-safe operation 4

Problem Statement - Response Time Components Response Time measurements in multi-threaded environment involving Critical Sections 1 2 3 4 6 5 7 ThreadsCritical Section 11 1 2 3 2 1 7 6 5 4 3 2 Service Time Wait Time 1 2 3 4 6 5 7 5

Problem Statement – Service and Wait Time Java provides 3 mechanisms to manage concurrency –Synchronized Block –Synchronized Object –Synchronized Method 6 public void inc(){ long t1 = System.currentTimeMillis(); synchronized(_lockA1){ long t2 = System.currentTimeMillis(); try{ sharedVal++; }catch(InterruptedException e){ e.printStackTrace(); } long t3 = System.currentTimeMillis(); } long t4 = System.currentTimeMillis(); } Where, (t4 - t1) is the Response Time, (t3 - t2) is the Service Time and (t2 - t1) is the Wait Time Also, in practice t3 = t4 Java provides 3 mechanisms to manage concurrency –Synchronized Block –Synchronized Object –Synchronized Method Out-of-Scope

Alternate Solutions Logging - Insert probes in source code using logging techniques –Logging has its own overheads –After the logs have been written to, these logs will need to be crunched programmatically to get the desired information –Even for simple programs, the crunching program can get complex because it has to tag the appropriate timestamps to appropriate threads –For complex programs involving nested synchronized blocks (e.g. code that implements 2-phase commits), the crunching program can quickly become more complex and may require significant development and testing time –This method will fail in cases where source code is not available 7 Buy tools - Very few commercial tools provide this data.

Our Solution Our objective in building a state machine is to get the following details about a Synchronized Block. –Location of the block i.e. which class and which method is synchronized() –Variable name on which this block is synchronized() –Breakup of synchronized() block response time into wait and service time components 8 We take same approach as discussed in Logging section in previous slide to take our measurements except that the probes are inserted using Bytecode Instrumentation techniques. We call the Bytecode instrumentation infrastructure, the State Machine. Lets see how the state machine works.

Introduction to Bytecode Instrumentation Java bytecode is the instruction set of the Java virtual machine. Each bytecode is composed by one, or in some cases two, bytes that represent the instruction (opcode), along with zero or more bytes for passing parameters. Of the 256 possible byte-long opcodes, 198 are currently in use, 51 are reserved for future use. : Source Wikipediainstruction setJava virtual machinebytecodeopcodeopcodes 9 Bytecode instrumentation is the process of reading the compiled code and adding / transforming the existing code into new code with additional behavior(s). The input and output of the instrumentation process, both is the bytecode itself. We use ObjectWeb ASM library to do the instrumentation. ASM library has 2 APIs – The Streaming API and the Tree API. We use the Streaming API to do the instrumentation. Load-time instrumentation is convenient, however in our prototype implementation, due to time constraints, we did compile-time instrumentation. Conversion is trivial and is just a matter of implementing the right API calls which provides hooks for load-time instrumentation.

10 Introduction to Bytecode Instrumentation public void inc(){ long t1 = System.currentTimeMillis(); synchronized(_lockA1){ long t2 = System.currentTimeMillis(); try{ sharedVal++; Thread.sleep(10); }catch(InterruptedException e){ e.printStackTrace(); } long t3 = System.currentTimeMillis(); } long t4 = System.currentTimeMillis(); } public void inc(); Code: 0: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J 3: lstore_1 4: aload_0 5: getfield #3; //Field _lockA1:Ljava/lang/Object; 8: dup 9: astore_3 10: monitorenter 11: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J 14: lstore 4 16: aload_0 17: dup 18: getfield #5; //Field sharedVal:I 21: iconst_1 22: iadd 23: putfield #5; //Field sharedVal:I 26: ldc2_w #6; //long 10l 29: invokestatic #8; //Method java/lang/Thread.sleep:(J)V 32: goto 42 35: astore 6 37: aload 6 39: invokevirtual #10; //Method java/lang/InterruptedException.printStackT race:()V 42: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J 45: lstore 6 47: aload_3 48: monitorexit 49: goto 59 52: astore 8 54: aload_3 55: monitorexit 56: aload 8 58: athrow 59: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J 62: lstore_3 63: return Exception table: from to target type 16 32 35 Class java/lang/InterruptedException 11 49 52 any 52 56 52 any Fig 1. Source Code Fig 2. Javap depiction of bytecode

Bytecode instrumentation challenges The bytecode will change when source code changes. How to write instrumentation logic such that measurements become source code agnostic? 11 Java Source code and individual Java opcodes are well described in the literature. But mapping between source code and opcode is not well described. It’s a standard practice to use javap utility of the JDK to figure out what bytecode gets generated for a given source code. The JVM specification is prescriptive i.e. it talks only about what should be the behavior of a JVM. It does not talk about How a particular behavior should be implemented. Hence ensuring portability of the solutions across JDK vendors is non-trivial. Steep learning curve (Java opcode to API call mapping), hence easy to make mistakes. Trial and error method was the order of the day to build an empirical model of how the Sync Block in Java works.

State Machine – The Empirical Model 12 public void inc(); Code: 0: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J 3: lstore_1 4: aload_0 5: getfield #3; //Field _lockA1:Ljava/lang/Object; 8: dup 9: astore_3 10: monitorenter 11: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J 14: lstore 4 16: aload_0 17: dup 18: getfield #5; //Field sharedVal:I 21: iconst_1 22: iadd 23: putfield #5; //Field sharedVal:I 26: ldc2_w #6; //long 10l 29: invokestatic #8; //Method java/lang/Thread.sleep:(J)V 32: goto 42 35: astore 6 37: aload 6 39: invokevirtual #10; //Method java/lang/InterruptedException.printStackT race:()V 42: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J 45: lstore 6 47: aload_3 48: monitorexit 49: goto 59 52: astore 8 54: aload_3 55: monitorexit 56: aload 8 58: athrow 59: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J 62: lstore_3 63: return invokestatic lstore aload getfield dup astore monitorenter …. monitorexit Call StackEvent of interest invokestatic lstore aload getfield dup astore monitorenter public void inc(){ long t1 = System.currentTimeMillis(); synchronized(_lockA1){ long t2 = System.currentTimeMillis(); try{ sharedVal++; Thread.sleep(10); }catch(InterruptedException e){ e.printStackTrace(); } long t3 = System.currentTimeMillis(); } long t4 = System.currentTimeMillis(); } public void inc(){ long t1 = System.currentTimeMillis(); synchronized(_lockA1){ long t2 = System.currentTimeMillis(); try{ sharedVal++; Thread.sleep(10); }catch(InterruptedException e){ e.printStackTrace(); } long t3 = System.currentTimeMillis(); } long t4 = System.currentTimeMillis(); } …. monitorexit

State Machine – The Empirical Model 13 getfield astore monitorenter monitorexit Implementation of the State Machine revolves largely around these 4 Opcodes The implementation uses 2 instrumentation adapters 1.ResponseTime Adapter 2.ServiceTime Adapter ResponseTime Adapter getfield astore monitorenter Sets a flag to true Takes timestamp (T1) monitorexit Takes timestamp (T3) getfield astore monitorenter ServiceTime Adapter Checks if flag is true If true, takes timestamp (T2) sets flag to false 1.Instrumentation Adapters change the state of the bytecode by adding / inserting their own probes. 2.When more than one adapters are involved, it is the responsibility of the developer to ensure proper order and coordination between two adapters. 3.In our solution, we do one instrumentation and write out the bytecode. This bytecode becomes input for the second adapter.

State Machine – Implementation nuts & bolts 14 The ASM API provides the following hooks for events (of interest to us) to be generated visitInsn() :- For monitorenter and monitorexit visitVarInsn() :- For astore visitFieldInsn() :- For getstatic and getfield visitMethodInsn() :- For class and method names The adapters provide implementations of these methods to achieve the desired behaviour. Once all timestamps are embedded in the code, at runtime the code logs the following tuple to in-memory data structure - where tid - Thread Identifier time – Timestamp locationID - a combination of class, method and exact location (say line number or variable on which synchronization happened). Tuples are then crunched to get Service Time and Wait Time components of Response Time

Testing and Validating Solution 16 1.Theory-based programs which demonstrate computer science principles or concepts Producer-Consumer problem Dining Philosopher problem Cigarette Smokers problem M/M/1 Queues 2.Custom programs which are comparable to code that gets written in IT industry today Database connection pooling code Update account balance for money transfer transaction (2-phase commit) These programs check the qualitative and quantitative correctness of the code that passes through the state machine.

Testing - Producer–Consumer Problem Common Buffer Producer 1 Producer 2 Producer n Consumer 1 Consumer 2 Consumer n Buffer Empty Buffer Full 17

Previous Bytecode Instrumentation Benchmarks (~2011) Y-axis : Logarithmic scale from 1 microsecond to 10K seconds X-axis : Samples in increasing order of payload Fig. 4 Jensor overheads on response times Overheads are constant in the range of 10 – 30 microseconds - 18 - 18

Takeaways It is possible to decompose Response Time at critical section into its Service Time and Wait Time components using BCI 19 Having visibility into Wait times can enable tuning / optimization / redesign of the critical section of the code Having visibility into Service time can enable better modeling of Java applications Our State machine based approach is capable of getting values for above metrics Other researchers have built utilities on top of this implementation to predict performance in presence of software and hardware resource bottlenecks

20 Q & A

Text Amol Khanapurkar, TCS Chetan Phalak, TCS Measuring Wait and Service Time in Java Using Bytecode Instrumentation.

Similar presentations

Presentation on theme: "Text Amol Khanapurkar, TCS Chetan Phalak, TCS Measuring Wait and Service Time in Java Using Bytecode Instrumentation."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Text Amol Khanapurkar, TCS Chetan Phalak, TCS Measuring Wait and Service Time in Java Using Bytecode Instrumentation.

Similar presentations

Presentation on theme: "Text Amol Khanapurkar, TCS Chetan Phalak, TCS Measuring Wait and Service Time in Java Using Bytecode Instrumentation."— Presentation transcript:

Similar presentations

About project

Feedback