University of Michigan Electrical Engineering and Computer Science Dynamic Parallelization of JavaScript Applications Using an Ultra-lightweight Speculation.

Presentation transcript:

University of Michigan Electrical Engineering and Computer Science
Dynamic Parallelization of JavaScript Applications Using an Ultra-lightweight Speculation Mechanism
Mojtaba Mehrara, Po-Chun Hsu, Mehrzad Samadi, Scott Mahlke
Advanced Computer Architecture Laboratory, University of Michigan

Web 2.0 is Here
More computation is moved to the client side
–More responsive browsing
–Avoid unnecessary network traffic
[Figure: Web 1.0 vs. Web 2.0. Labels: JavaScript + DHTML; client-side computation; server-side computation; client-side rendering; static HTML. Source: Vikram et al., CCS’09]

Client-side Computation in JavaScript
Flexibility, ease of prototyping, and portability
Poor performance is one of the main challenges

Client-side Applications
Interaction-intensive:
–Largely composed of event handlers triggered by the user
–Examples are Gmail, Facebook, etc.
Compute-intensive:
–Dominated by loops and hot functions
–Online image editing such as Adobe’s Photoshop.com, Google’s Picnik
–A lot more potential: online games, video editing, sound editing and voice recognition

Improving JavaScript Performance
Many efforts are underway by browser developers to improve JavaScript sequential performance
Sequential performance hits a wall as multi-cores become dominant
Parallelism must be exploited to make use of multi-core clients
JavaScript is inherently sequential
Language and run-time system provide little/no concurrency support
Our proposal: Low-cost dynamic & speculative parallelization of JavaScript applications

JavaScript Parallelization
A typical static parallelization flow
[Figure labels: Source code; Memory dependence analysis; Memory profiling; Data flow analysis; Parallel code generation; Parallel execution; Compile time; Runtime; Dynamic parallelization; Speculation engine (software transactional memory); "$" markers]

Our Approach: ParaScript
Light-weight dynamic analysis & code generation for speculative DOALL loops
Low-cost customized SW speculation with a single checkpoint
[Figure labels: Hot loop detection; Initial parallelizability assessment; Parallel code generation; Parallel execution; Sequential execution; Loop selection; Runtime; Customized speculation; Finish; Abort]

Dependence Analysis
JIT-compilation-time data flow analysis
Runtime initial tests + range-based monitoring
Runtime reference-counting-based monitoring

Scalar Array Conflict Detection
Initial assessment catches trivial conflicts
Keep track of max and min accessed element indices
Cross-check RD/WR sets after thread execution
Example:
Thread 1: A[0] = …; A[5] = …
Thread 2: B[7] = …; A[6] = A[5]+1
Set                        ptr  min  max
Thread 1 array write-set   &A   0    5
Thread 2 array write-set   &A   6    6
                           &B   7    7
Thread 2 array read-set    &A   5    5
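The range-based check on this slide can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not ParaScript's implementation: the names `recordAccess`, `rangesOverlap`, `setsConflict`, and `threadsConflict` are hypothetical, and each read or write set is modeled as a `Map` from an array to the min and max index touched. The driver at the end replays the slide's example, in which Thread 2 reads A[5] after Thread 1 wrote it.

```javascript
// Hypothetical helpers for illustration; not ParaScript's actual code.

// Record an access to array[index]. Only the min and max index per array
// are kept, so the set stays small no matter how many elements are touched.
function recordAccess(set, array, index) {
  const range = set.get(array);
  if (range === undefined) {
    set.set(array, { min: index, max: index });
  } else {
    range.min = Math.min(range.min, index);
    range.max = Math.max(range.max, index);
  }
}

// Two [min, max] ranges overlap if neither lies entirely before the other.
function rangesOverlap(a, b) {
  return a.min <= b.max && b.min <= a.max;
}

// True if some array appears in both sets with overlapping index ranges.
function setsConflict(setX, setY) {
  for (const [array, rangeX] of setX) {
    const rangeY = setY.get(array);
    if (rangeY !== undefined && rangesOverlap(rangeX, rangeY)) return true;
  }
  return false;
}

// Cross-check two threads: write/write, write/read, and read/write.
function threadsConflict(t1, t2) {
  return setsConflict(t1.writes, t2.writes) ||
         setsConflict(t1.writes, t2.reads) ||
         setsConflict(t1.reads, t2.writes);
}

// Replay of the slide's example.
const A = [];
const B = [];
const thread1 = { reads: new Map(), writes: new Map() };
const thread2 = { reads: new Map(), writes: new Map() };
recordAccess(thread1.writes, A, 0); // A[0] = ...
recordAccess(thread1.writes, A, 5); // A[5] = ...
recordAccess(thread2.writes, B, 7); // B[7] = ...
recordAccess(thread2.reads, A, 5);  // ... = A[5] + 1
recordAccess(thread2.writes, A, 6); // A[6] = ...
const conflict = threadsConflict(thread1, thread2);
```

Keeping only a min and max per array makes the check conservative: writes to A[0] and A[5] cover the whole range 0..5, so an access to A[3] by another thread would be flagged even though no element was actually shared.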

Object Array Conflict Detection
More involved than scalar arrays
Different indices of the same array may point to the same object
[Figure labels: arrays A (&A) and B (&B) pointing to myObj0 and myObj1; each object header holds ptr and RefCnt; values 1 1 and 1 2; annotation: "If dependent based on data-flow analysis"]
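The aliasing concern on this slide can be made concrete with a small sketch. The slide does not spell out how the reference counts in the object headers are used, and JavaScript code cannot read engine-level reference counts, so the following is an assumption-laden stand-in: it counts, in a `Map`, how many array slots point at each object and treats any object referenced from more than one slot as a possible cross-iteration dependence. The function names are hypothetical.

```javascript
// Hypothetical illustration; the real mechanism lives inside the JS engine.

// Count how many slots across the given arrays point at each object.
function countSlotReferences(arrays) {
  const refCount = new Map(); // object -> number of referencing slots
  for (const array of arrays) {
    for (const element of array) {
      if (element !== null && typeof element === "object") {
        refCount.set(element, (refCount.get(element) || 0) + 1);
      }
    }
  }
  return refCount;
}

// An object reachable from more than one slot could be touched by two
// different iterations, so it is treated as a potential conflict.
function hasAliasedElements(arrays) {
  for (const count of countSlotReferences(arrays).values()) {
    if (count > 1) return true;
  }
  return false;
}

const myObj0 = { v: 0 };
const myObj1 = { v: 1 };
const A = [myObj0, myObj1];
const B = [myObj1];
const aliasedWithinA = hasAliasedElements([A]);     // every object referenced once
const aliasedAcrossAB = hasAliasedElements([A, B]); // myObj1 referenced twice
```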

Loop Selection
Focus on DOALL-counted loops (e.g. for loops)
Avoid parallelizing loops with:
–Browser interactions (requires locks on browser internals)
–HTTP request functions (requires server-side speculation)
–Runtime code insertion
Runtime code insertion examples:
var addFunction = new Function("a", "b", "return a+b;");
is equivalent to
function addFunction(a, b){ return a+b; }
eval("a = 7; b = 13; document.write(a+b);");
is equivalent to
a = 7; b = 13; document.write(a+b);

Checkpointing Mechanism
Go through all references, clone them, and ask GC not to touch the clones
Cloning process:
–Strings, numbers and Booleans: copy the values
–Custom objects: deep-copy all properties, avoid recursion
–Functions: same as custom objects; no need to clone source code
–Arrays: clone all properties and elements
Monitor overhead, back out if more expensive than a threshold
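A rough sketch of the cloning rules above, written at the JavaScript level rather than inside the engine. The function name `cloneForCheckpoint` is hypothetical, and two simplifications are assumptions of this sketch: functions are shared by reference instead of having their properties cloned, and "avoid recursion" is read as "do not recurse forever on shared or cyclic references", which is handled here with a map from originals to clones.

```javascript
// Hypothetical sketch of checkpoint cloning; not the engine-level mechanism.
function cloneForCheckpoint(value, seen = new Map()) {
  // Strings, numbers, Booleans, null and undefined: copy the value.
  // Simplification: functions are also returned as-is (source is not cloned).
  if (value === null || typeof value !== "object") return value;

  // Shared or cyclic reference: reuse the clone made earlier.
  if (seen.has(value)) return seen.get(value);

  // Arrays become arrays; custom objects keep their prototype.
  const copy = Array.isArray(value)
    ? []
    : Object.create(Object.getPrototypeOf(value));
  seen.set(value, copy);

  // Object.keys covers array elements and any extra enumerable properties.
  for (const key of Object.keys(value)) {
    copy[key] = cloneForCheckpoint(value[key], seen);
  }
  if (Array.isArray(value)) copy.length = value.length; // keep trailing holes
  return copy;
}

// Take a checkpoint, then let "speculative" code mutate the live state.
const state = { count: 1, pixels: [1, 2, 3], self: null };
state.self = state; // cyclic reference
const checkpoint = cloneForCheckpoint(state);
state.count = 99;
state.pixels[0] = -1;
```

After the mutations, `checkpoint` still holds the original values, and `checkpoint.self` points at the clone rather than at the live object, so recovery could restart from it.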

Checkpointing Optimizations
Selective variable cloning
–Only clone a variable if it is touched during speculative execution
Array clone elimination
–Large arrays holding results of browser functions
–Instead of cloning the array, just call the function again for recovery
–E.g. getImageData in the canvas HTML5 element
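Selective variable cloning can be illustrated with a standard `Proxy` that saves a property's old value only on its first write. This is a sketch of the idea, not how ParaScript does it (the slides place the mechanism inside the browser's JavaScript engine); `withLazyCheckpoint`, `rollback`, and `saved` are hypothetical names, and old values are saved by reference here, where a fuller version would deep-copy them.

```javascript
// Hypothetical sketch: save a property's old value only when it is first written.
function withLazyCheckpoint(target) {
  const saved = new Map(); // property -> { existed, value } before first write

  const remember = (obj, prop) => {
    if (!saved.has(prop)) {
      saved.set(prop, {
        existed: Object.prototype.hasOwnProperty.call(obj, prop),
        value: obj[prop],
      });
    }
  };

  const proxy = new Proxy(target, {
    set(obj, prop, value) {
      remember(obj, prop);
      obj[prop] = value;
      return true;
    },
    deleteProperty(obj, prop) {
      remember(obj, prop);
      delete obj[prop];
      return true;
    },
  });

  // Undo every recorded write; untouched properties were never copied.
  const rollback = () => {
    for (const [prop, entry] of saved) {
      if (entry.existed) target[prop] = entry.value;
      else delete target[prop];
    }
    saved.clear();
  };

  return { proxy, rollback, saved };
}

const vars = { sum: 0, width: 640, height: 480 };
const { proxy, rollback, saved } = withLazyCheckpoint(vars);
proxy.sum = 42;     // first write to an existing variable: old value saved
proxy.extra = true; // new variable: remembered as "did not exist"
const touched = [...saved.keys()];
rollback();
```

Only `sum` and `extra` are recorded; `width` and `height` are never copied because the speculative code did not write them.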

Parallel Code Generation
–Take checkpoint
–Spawn threads
Parallel loop:
IE = min(IS+CS*SS,n);
for (i=IS;i<IE;i+=SS)
  // original loop code
conflictcheck();
chunkbarrier();
IS+=CS * TC * SS;
Loop barrier
Reduction variable & conditional live-out aggregation
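To see how the generated loop distributes iterations, the sketch below replays the chunking arithmetic sequentially. The slide does not define its abbreviations, so the readings used here are assumptions: `IS` and `IE` as a thread's chunk start and end, `CS` as the chunk size in iterations, `SS` as the loop step, `TC` as the thread count, and `n` as the loop bound. The starting offset `t * CS * SS` for thread `t` is also assumed, and no real threads are spawned; `scheduleChunks` is a hypothetical name.

```javascript
// Hypothetical simulation of the chunked schedule; runs on a single thread.
// Returns, for each simulated thread, the loop indices it would execute.
function scheduleChunks(n, CS, SS, TC) {
  const perThread = Array.from({ length: TC }, () => []);
  // Assumed initial placement: thread t starts at the t-th chunk.
  const IS = Array.from({ length: TC }, (_, t) => t * CS * SS);

  // Each pass of this loop models one round between chunk barriers.
  while (IS.some((start) => start < n)) {
    for (let t = 0; t < TC; t++) {
      const IE = Math.min(IS[t] + CS * SS, n);
      for (let i = IS[t]; i < IE; i += SS) {
        perThread[t].push(i); // stands in for "original loop code"
      }
      // conflictcheck() and chunkbarrier() would run here in the real flow.
      IS[t] += CS * TC * SS; // skip over the chunks owned by other threads
    }
  }
  return perThread;
}

const schedule = scheduleChunks(10, 2, 1, 2);
// thread 0 gets [0, 1, 4, 5, 8, 9]; thread 1 gets [2, 3, 6, 7]
```

Chunks are dealt out round-robin, so every thread checks for conflicts after the same number of chunks, and a misspeculation is detected at the next chunk barrier rather than at the end of the whole loop.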

Experimental Setup
Implemented in Firefox 3.7a1pre
Subset of SunSpider benchmark suite
–Others identified as not parallelizable early on, causing 2% slow-down due to the initial analysis
A set of Pixastic Image Processing filters
8-processor system: 2 Intel Xeon quad-cores, running Ubuntu 9.10
Ran each benchmark 10 times and took the average

Parallelism Coverage
High fraction of sequential execution in the getImageData() browser function
DOALL loop that extracts pixel RGB & alpha values

SunSpider
A long iteration dominates execution

Pixastic Image Processing
[Figure legend: 1 thread, 2 threads, 4 threads, 8 threads]
High memory op to computation ratio

Conclusion
The dominance of web applications is pushing JavaScript to the forefront of computing
Dynamic environment and performance constraints make parallelization challenging
We introduce efficient solutions for exploiting parallelism
17% speculation overhead across all benchmarks
ParaScript achieved average speedups of 2.55x and 1.82x on 8 processors for SunSpider and Pixastic, respectively

Thank You! Questions?