EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications. Gaurav Chadha, Scott Mahlke, Satish Narayanasamy, University of Michigan, August 2014


1 EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications
Gaurav Chadha, Scott Mahlke, Satish Narayanasamy
University of Michigan, Electrical Engineering and Computer Science
August 2014

2 Evolution of the Web
Web 1.0: static web pages. The server publishes content; the client passively views it.
Web 2.0: dynamic web pages. Alongside published content, users collaborate and generate their own content.

3 Evolution of the Web
In Web 2.0, compute runs on the client as well as the server, to deliver a rich user experience.

4 Evolution of the Web
yahoo.com in 2014 executes 30x more instructions than yahoo.com in 1996.
A rich user experience and browser responsiveness both depend on good client-side performance.

5 Core Specialization
Multi-core processor: Cores 1 to 4, each with private caches.

6 Web Core
Multi-core processor: one of the four cores is augmented with WebBoost, making it a Web Core.

7 WebBoost 1.0
Web browser computational components: web client-side script and other components.
Web client-side script performance drives browser responsiveness, and script performance suffers from high L1-I cache misses.
Goal: a specialized instruction prefetcher for web client-side script.

8 Poor I-Cache Performance
Web pages tend to support numerous functionalities (graphics effects, image editing, online forms, document editing, web personalization, games, audio & video):
– Large instruction footprint
– Lack of hot code
Web client-side script inefficiencies: code bloat
– JIT compiled by the JS engine (V8, IonMonkey, Nitro, Chakra)
– Dynamic typing

9 Lack of Hot Code
[Figure not recoverable; surviving labels: 95%, 860, 20,400]

10 Poor I-Cache Performance
Compared with conventional programs, JS code incurs many more L1-I misses.
A perfect I-cache would yield a 53% speedup.

11 Problem Statement
Problem: poor I-cache performance for web client-side scripts.
Opportunity: web client-side scripts execute in an event-driven model.
Solution:
– A specialized prefetcher customized for the event-driven execution model
– It identifies distinct events in the instruction stream

12 Outline
– Event-driven Web Applications
– EFetch
– Facets of Instruction Prefetching
– Design and Architecture
– Methodology
– Results
– Conclusion

13 Web Browser Events
– External input events, e.g., a mouse click
– Internal browser events, e.g., onload

14 Event-driven Web Applications
Events are inserted into an event queue:
– External input events: mouse click, keyboard key press, GPS events
– Internal events: timer event, DOMContentLoaded
The renderer thread pops the event at the head of the queue (E1, then E2, E3, ...) and executes it on the JS engine. Events can generate other events. When the event queue is empty, the program waits.
Why I-cache performance is poor:
– Different events tend to execute different code
– Events typically execute for a very short duration
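The event-driven model above can be sketched as a single-threaded loop over a FIFO queue. This is a minimal Python illustration, not browser code: the `Event` tuple, `run_renderer`, and `on_click` are hypothetical names, and the loop stops when the queue is empty where a real renderer thread would wait.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

# Hypothetical event record: (event type, handler). A handler may return new
# events, modelling "events generate other events".
Event = Tuple[str, Callable[[], Optional[list]]]


def run_renderer(queue: Deque[Event]) -> List[str]:
    """Pop events from the head of the queue and run each handler to completion.

    Returns the order in which event types were executed."""
    executed: List[str] = []
    while queue:  # a real renderer thread would block here when the queue is empty
        event_type, handler = queue.popleft()  # pop the event at the head
        executed.append(event_type)
        queue.extend(handler() or [])  # events generated by this event join the tail
    return executed


def on_click():
    # A click handler that schedules a follow-up timer event.
    return [("timer", lambda: None)]


events = deque([("DOMContentLoaded", lambda: None), ("click", on_click)])
print(run_renderer(events))  # ['DOMContentLoaded', 'click', 'timer']
```

Each handler runs briefly and usually touches different code from the previous one, which is why the I-cache sees little reuse across consecutive events.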

15 EFetch
Event Fetch: an instruction prefetcher for event-driven web applications.
Technique:
– Uses an event ID to identify distinct events in the instruction stream
– Augments the event ID to create an event signature that predicts control flow well

16 Event Signature
Event ID = event type + event handler
– Formed by the browser
– Uniquely identifies an event
Function call context: the context-depth (3) ancestor functions in the call stack
Event signature = event ID + function call context
– Formed in hardware
– Correlates well with program control flow
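The slides state what goes into an event signature but not how the pieces are combined. The Python sketch below assumes a CRC-32 for the browser-side event ID and a rotate-and-XOR fold over the 3 innermost call-stack entries; `make_event_id` and `make_event_signature` are hypothetical names, and both functions are stand-ins, not EFetch's hardware.

```python
import zlib
from typing import List

CONTEXT_DEPTH = 3  # number of ancestor functions, as on the slide


def make_event_id(event_type: str, handler_addr: int) -> int:
    """Hypothetical browser-side event ID: a CRC-32 of event type and handler address."""
    return zlib.crc32(f"{event_type}:{handler_addr:x}".encode())


def make_event_signature(event_id: int, call_stack: List[int]) -> int:
    """Fold the event ID with the top CONTEXT_DEPTH call-stack entries.

    call_stack lists function addresses, outermost first. Rotate-and-XOR is an
    assumed combining function; every step is a bijection on 32-bit values, so
    signatures that differ only in the event ID stay distinct."""
    sig = event_id & 0xFFFFFFFF
    for addr in call_stack[-CONTEXT_DEPTH:]:
        sig = ((sig << 5) | (sig >> 27)) & 0xFFFFFFFF  # 32-bit rotate left by 5
        sig ^= addr & 0xFFFFFFFF
    return sig
```

Only the innermost three frames contribute, so two executions of the same event that reach the same function through the same recent callers map to the same signature even if deeper frames differ.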

17 Instruction Prefetcher: Facets
– What to prefetch?
– When to prefetch?

18 What to Prefetch?
Naïve solution: on a function call, prefetch the function body.
– But this is too late
Our approach: on a function call, predict its callees and prefetch their function body addresses.
The event signature indexes a list of predicted callees c1, c2, c3, ... (ci: callee).
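The callee-prediction idea can be sketched as a table from event signature to the callees observed under it. A minimal Python illustration, assuming signatures and callees are plain integers (e.g., addresses); `CalleePredictor`, `record_call`, and `predict` are hypothetical names, and a dict stands in for a finite hardware table with no size limit or replacement policy modelled.

```python
from typing import Dict, List


class CalleePredictor:
    """Sketch: learn which callees run under each event signature, then predict them."""

    def __init__(self) -> None:
        self.callees: Dict[int, List[int]] = {}

    def record_call(self, signature: int, callee: int) -> None:
        """Training: remember that `callee` was called under `signature`, in first-seen order."""
        seen = self.callees.setdefault(signature, [])
        if callee not in seen:
            seen.append(callee)

    def predict(self, signature: int) -> List[int]:
        """On a later call with the same signature, return the callees whose bodies to prefetch."""
        return list(self.callees.get(signature, []))
```

Predicting callees rather than the called function itself buys lead time: the callee bodies are requested while the caller is still executing.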

19 Duplication of Addresses
A function can appear in two distinct event signatures (e.g., event → f → h and event → g → h), so its body addresses might be recorded twice.

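20 Compacting I-Cache Addresses
Instead of duplicating the I-cache addresses of callees f, h, and g for every context, each event signature stores a callee bit vector marking which blocks were accessed, e.g., (1, 1, 1, 0) in one context and (1, 0, 1, 1) in another.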
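A callee bit vector can be read as "base address plus one bit per cache block". The Python sketch below assumes 64-byte blocks, a block-aligned base address, and at most 32 blocks per vector; none of these sizes come from the slides, and `encode_blocks` and `decode_blocks` are hypothetical names.

```python
from typing import Iterable, List

BLOCK_SIZE = 64  # bytes per I-cache block (assumed; not stated on the slides)


def encode_blocks(base: int, addrs: Iterable[int], max_blocks: int = 32) -> int:
    """Encode the blocks touched inside a function body as a bit vector relative
    to the (block-aligned) base address: bit i = block at base + i * BLOCK_SIZE."""
    bv = 0
    for a in addrs:
        idx = (a - base) // BLOCK_SIZE
        if 0 <= idx < max_blocks:  # addresses outside the tracked window are dropped
            bv |= 1 << idx
    return bv


def decode_blocks(base: int, bv: int) -> List[int]:
    """Expand a bit vector back into block-aligned prefetch addresses."""
    out: List[int] = []
    i = 0
    while bv:
        if bv & 1:
            out.append(base + i * BLOCK_SIZE)
        bv >>= 1
        i += 1
    return out


bv = encode_blocks(0x1000, [0x1000, 0x1010, 0x1080, 0x10C4])
print(bin(bv))                                      # 0b1101
print([hex(a) for a in decode_blocks(0x1000, bv)])  # ['0x1000', '0x1080', '0x10c0']
```

Two contexts that touch different parts of the same callee share the base address and differ only in a few bits, instead of each storing a full list of block addresses.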

21 Recording Callees and Function Bodies
The Context Table, indexed by event signature, records the callees (c1, c2, ...) and their callee bit vectors; the Function Table records the callees' function bodies.
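The split between the two tables can be sketched as follows: the function body's base address is stored once per callee, while the bit vector is stored per (event signature, callee) pair. This Python sketch simplifies to a single base address per callee and assumes 64-byte blocks; `EFetchTables`, `record`, and `prefetch_addresses` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BLOCK_SIZE = 64  # assumed cache-block size in bytes


@dataclass
class EFetchTables:
    """Sketch of the two tables.

    function_table: callee -> base address, shared by all event signatures.
    context_table:  signature -> [(callee, bit vector)], signature-specific."""

    function_table: Dict[int, int] = field(default_factory=dict)
    context_table: Dict[int, List[Tuple[int, int]]] = field(default_factory=dict)

    def record(self, signature: int, callee: int, base: int, bv: int) -> None:
        self.function_table[callee] = base  # stored once per callee
        entries = self.context_table.setdefault(signature, [])
        for i, (c, old) in enumerate(entries):
            if c == callee:
                entries[i] = (c, old | bv)  # merge newly touched blocks
                return
        entries.append((callee, bv))

    def prefetch_addresses(self, signature: int) -> List[int]:
        """Combine each predicted callee's base address with its per-signature bit vector."""
        addrs: List[int] = []
        for callee, bv in self.context_table.get(signature, []):
            base = self.function_table[callee]
            addrs.extend(base + i * BLOCK_SIZE for i in range(bv.bit_length()) if bv >> i & 1)
        return addrs


t = EFetchTables()
t.record(signature=1, callee=0xA, base=0x1000, bv=0b101)
t.record(signature=2, callee=0xA, base=0x1000, bv=0b010)  # same callee, other context
print([hex(a) for a in t.prefetch_addresses(1)])  # ['0x1000', '0x1080']
print([hex(a) for a in t.prefetch_addresses(2)])  # ['0x1040']
```

Callee 0xA appears under two signatures but its base address occupies a single Function Table entry; only the small bit vectors are duplicated.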

22 Instruction Prefetcher: Facets
– What to prefetch?
– When to prefetch?

23 When to Prefetch?
Prefetch sufficiently in advance, but not too early.
Goal: prefetch the next predicted function.
– This is enough to hide the LLC hit latency
– That is typically sufficient, because the instruction miss rate in the LLC is low
Our design: keep track of a speculative call stack, the Predictor Stack.

24 Predictor Stack
– Maintains the call stack as predicted by the prefetcher
– Helps prefetch the next function predicted to be called
Figure: the Call Stack (f, h, i) and the Predictor Stack side by side over a sequence of calls and returns, showing the function prefetched at each step.
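A minimal Python sketch of a predictor stack that stays one function ahead, assuming that predicted callees are listed in call order and, for brevity, that predictions are keyed by function name rather than by full event signature. The `predicted` mapping stands in for the Context Table lookup; `PredictorStack`, `on_call`, and `on_return` are hypothetical names. The stack resynchronizes with the real call stack on every call and return.

```python
from typing import Dict, List, Optional


class PredictorStack:
    """Speculative call stack that stays one function ahead of execution."""

    def __init__(self, predicted: Dict[str, List[str]], root: str) -> None:
        self.predicted = predicted  # hypothetical stand-in for the context-table lookup
        # Each frame is [function, index of its next predicted callee].
        self.frames: List[list] = [[root, 0]]

    def _next_target(self) -> Optional[str]:
        func, idx = self.frames[-1]
        callees = self.predicted.get(func, [])
        return callees[idx] if idx < len(callees) else None

    def start(self) -> Optional[str]:
        """Function to prefetch while the root function runs."""
        return self._next_target()

    def on_call(self, callee: str) -> Optional[str]:
        """Sync after a real call: advance the caller past `callee`, push a frame."""
        caller = self.frames[-1]
        callees = self.predicted.get(caller[0], [])
        if callee in callees:
            caller[1] = callees.index(callee) + 1
        self.frames.append([callee, 0])
        return self._next_target()

    def on_return(self) -> Optional[str]:
        """Sync after a real return: pop, then predict the caller's next callee."""
        self.frames.pop()
        return self._next_target()


ps = PredictorStack({"f": ["h", "g"], "h": ["i"]}, root="f")
print(ps.start())       # h
print(ps.on_call("h"))  # i
print(ps.on_call("i"))  # None
print(ps.on_return())   # None (h has no further predicted callees)
print(ps.on_return())   # g
```

Keeping a per-frame index is what lets the prefetcher resume on the caller's next predicted callee after a return, rather than restarting prediction from scratch.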

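25 Architecture
The event ID is combined with the function call context from the Call Stack to form the event signature. The signature indexes the Context Table, which yields the predicted callees (ci) and their bit vectors (bv); the Function Table supplies each callee's base addresses (b1, b2). The predicted callees and addresses feed the Predictor Stack, which fills the Prefetch Queue.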

26 Methodology
– Instrumented an open-source browser, Chromium, which uses the V8 JS engine shared with Google Chrome
– Studied browsing sessions of popular websites; their instruction traces were simulated with Sniper
– Focused on JS code execution, which was the part simulated

27 Architectural Details
Modeled after the Samsung Exynos 5250:
– Core: 4-wide out-of-order, 1.66 GHz
– L1 I- and D-caches: 32 KB, 2-way
– L2 cache: 2 MB, 16-way
– Energy modeling: Vdd = 1.2 V, 45 nm

28 Related Work
We compare EFetch with the following designs:
– L1I-64KB: EFetch's hardware overhead spent on extra L1-I cache capacity instead, giving a 64 KB L1-I
– N2L: next-2-line prefetcher
– CGP: Call Graph Prefetching (Annavaram et al., HPCA '01)
– PIF: Proactive Instruction Fetch (Ferdman et al., MICRO '11)
– RDIP: Return-address-stack Directed Instruction Prefetching (Kolli et al., MICRO '13)

29 Prefetcher Efficacy

30 Performance

31 Energy Consumption
Design | CGP | PIF | RDIP | EFetch
Overhead (KB) | 32 | 204 | 63 | 39
– Prefetching hardware structures consume little energy, ranging from 0.01% of total energy for EFetch to 1.06% for PIF
– Erroneous prefetches consume a significant fraction of energy

32 Energy, Performance, Area
[Figure: EFetch, PIF, CGP, RDIP, and N2L compared by performance and energy]

33 Conclusion
– Web 2.0 places greater demands on client-side computing
– I-cache performance is poor for web client-side script execution
– EFetch exploits the event-driven nature of web client-side script execution
– It achieves a 29% performance improvement over no prefetching


35 Performance Potential
A perfect I-cache would yield a 53% speedup.

36 Web Core
A core equipped with simple microarchitectural enhancements that accelerate the browser, much as MMX extensions accelerate multimedia.

37 Duplication of Addresses
Example: event e1 with functions g1, h1, and g2; event signature 1 = e1 g1, event signature 2 = e1 g2; recorded I-cache addresses: A B C, W X Y Z, A D E C.
A function can appear in two distinct event signatures, so its body addresses might be duplicated.

38 What to Prefetch?
Naïve solution: keep track of all I-cache blocks accessed for each event signature (e.g., I-cache addresses A B C, W X Y Z) and prefetch them when the signature recurs.

39 Duplication of Addresses
Callees and their body addresses: c1: A, B, C, D, E; c2: L, M, N, O; c3: W, X, Y, Z (aggregate: A, B, C, D, E, L, M, N, O, W, X, Y, Z).
– Aggregating addresses over different contexts loses context information
– A function can appear in two distinct event signatures, so its body addresses might be duplicated

40 Preserving Context Information
Each callee's I-cache block addresses are encoded as base addresses (b) plus bit vectors (bv): c1 (A, B, C, D, E) → (b11, bv11), (b12, bv12); c2 (L, M, N, O) → (b21, bv21), (b22, bv22); c3 (W, X, Y, Z) → (b31, bv31), (b32, bv32).
– Base addresses are constant for a callee across all event signatures
– Bit vectors are specific to an event signature and are stored together

41 Recording Callees and Function Bodies
Context Table (indexed by event signature): callee addresses c1, c2, c3 with bit vectors bv11, bv12, bv21, bv22, bv31, bv32.
Function Table: base addresses (b1, b2) for each callee.

42 Predictor Stack
Speculative call stack:
– Maintains the call stack as predicted by the prefetcher
– Helps prefetch one function ahead of program execution
– Synchronized with the Call Stack after every function call and return
Input: predicted callees and addresses. Output: prefetch addresses.

43 Evolution of the Web
Web 1.0: static web pages. The server publishes content; the client passively views it.
Web 2.0: dynamic web pages. Users collaborate and generate content.

44 Evolution of the Web
Web 2.0: compute runs on both the server and the client to deliver a rich user experience, so client-side performance matters.

