Facade: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications UC Irvine USA Khanh Nguyen Khanh Nguyen, Kai Wang, Yingyi Bu, Lu Fang,

Slides:



Advertisements
Similar presentations
CHP-5 LinkedList.
Advertisements

Object-Orientation Meets Big Data Language Techniques towards Highly- Efficient Data-Intensive Computing Harry Xu UC Irvine.
Introduction to Memory Management. 2 General Structure of Run-Time Memory.
Automatic Memory Management Noam Rinetzky Schreiber 123A /seminar/seminar1415a.html.
5. Memory Management From: Chapter 5, Modern Compiler Design, by Dick Grunt et al.
Memory allocation CSE 2451 Matt Boggus. sizeof The sizeof unary operator will return the number of bytes reserved for a variable or data type. Determine:
Guoquing Xu, Atanas Rountev Ohio State University Oct 9 th, 2008 Presented by Eun Jung Park.
APACHE GIRAPH ON YARN Chuan Lei and Mohammad Islam.
Chapter 8 Runtime Support. How program structures are implemented in a computer memory? The evolution of programming language design has led to the creation.
Vertically Integrated Analysis and Transformation for Embedded Software John Regehr University of Utah.
Runtime Environments Source language issues Storage organization
Connectivity-Based Garbage Collection Presenter Feng Xian Author Martin Hirzel, et.al Published in OOPSLA’2003.
Run-Time Storage Organization
JVM-1 Introduction to Java Virtual Machine. JVM-2 Outline Java Language, Java Virtual Machine and Java Platform Organization of Java Virtual Machine Garbage.
Big Kernel: High Performance CPU-GPU Communication Pipelining for Big Data style Applications Sajitha Naduvil-Vadukootu CSC 8530 (Parallel Algorithms)
1 Memory Model of A Program, Methods Overview l Memory Model of JVM »Method Area »Heap »Stack.
1 Contents. 2 Run-Time Storage Organization 3 Static Allocation In many early languages, notably assembly and FORTRAN, all storage allocation is static.
CACHETOR Detecting Cacheable Data to Remove Bloat Khanh Nguyen Guoqing Xu UC Irvine USA.
Pregel: A System for Large-Scale Graph Processing
Tutorial 6 Memory Management
1 Chapter 5: Names, Bindings and Scopes Lionel Williams Jr. and Victoria Yan CSci 210, Advanced Software Paradigms September 26, 2010.
Hyracks: A new partitioned-parallel platform for data-intensive computation Vinayak Borkar UC Irvine (joint work with M. Carey, R. Grover, N. Onose, and.
X-Stream: Edge-Centric Graph Processing using Streaming Partitions
Lecture 10 : Introduction to Java Virtual Machine
Runtime Environments Compiler Construction Chapter 7.
1 Time & Cost Sensitive Data-Intensive Computing on Hybrid Clouds Tekin Bicer David ChiuGagan Agrawal Department of Compute Science and Engineering The.
Java Virtual Machine Case Study on the Design of JikesRVM.
Chapter 4 Memory Management.
Computer Science and Software Engineering University of Wisconsin - Platteville 2. Pointer Yan Shi CS/SE2630 Lecture Notes.
Storage Bindings Allocation is the process by which the memory cell or collection of memory cells is assigned to a variable. These cells are taken from.
Performance Prediction for Random Write Reductions: A Case Study in Modelling Shared Memory Programs Ruoming Jin Gagan Agrawal Department of Computer and.
Basic Semantics Associating meaning with language entities.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. C H A P T E R F I V E Memory Management.
Dynamic Memory Allocation. Domain A subset of the total domain name space. A domain represents a level of the hierarchy in the Domain Name Space, and.
PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki
1 Recursive Data Structure Profiling Easwaran Raman David I. August Princeton University.
Speculative Region-based Memory Management for Big Data Systems Khanh Nguyen, Lu Fang, Harry Xu, Brian Demsky Donald Bren School of Information and Computer.
Standard Template Library The Standard Template Library was recently added to standard C++. –The STL contains generic template classes. –The STL permits.
Optimizing MapReduce for GPUs with Effective Shared Memory Usage Department of Computer Science and Engineering The Ohio State University Linchuan Chen.
Transparent Pointer Compression for Linked Data Structures June 12, 2005 MSP Chris Lattner Vikram Adve.
Programming Languages and Paradigms Activation Records in Java.
Thread basics. A computer process Every time a program is executed a process is created It is managed via a data structure that keeps all things memory.
1 Lecture07: Memory Model 5/2/2012 Slides modified from Yin Lou, Cornell CS2022: Introduction to C.
High-level Interfaces for Scalable Data Mining Ruoming Jin Gagan Agrawal Department of Computer and Information Sciences Ohio State University.
Factorbird: a Parameter Server Approach to Distributed Matrix Factorization Sebastian Schelter, Venu Satuluri, Reza Zadeh Distributed Machine Learning.
Parameter Reduction for Density-based Clustering on Large Data Sets Elizabeth Wang.
ECE 750 Topic 8 Meta-programming languages, systems, and applications Automatic Program Specialization for J ava – U. P. Schultz, J. L. Lawall, C. Consel.
Sung-Dong Kim, Dept. of Computer Engineering, Hansung University Java - Introduction.
Optimizing Distributed Actor Systems for Dynamic Interactive Services
Processes and threads.
Spark Presentation.
Compositional Pointer and Escape Analysis for Java Programs
Seminar in automatic tools for analyzing programs with dynamic memory
Harry Xu University of California, Irvine & Microsoft Research
Yak: A High-Performance Big-Data-Friendly Garbage Collector
Speculative Region-based Memory Management for Big Data Systems
CACHETOR Detecting Cacheable Data to Remove Bloat
Skyway: Connecting Managed Heaps in Distributed Big Data Systems
Yak: A High-Performance Big-Data-Friendly Garbage Collector
Department of Computer Science University of California, Santa Barbara
Pregelix: Big(ger) Graph Analytics on A Dataflow Engine
understanding memory usage by a c++ program
Optimizing MapReduce for GPUs with Effective Shared Memory Usage
Memory Management Kathryn McKinley.
Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights Feng Zhang †⋄, Jidong Zhai ⋄, Xipeng Shen #, Onur Mutlu ⋆, Wenguang.
Pregelix: Think Like a Vertex, Scale Like Spandex
Department of Computer Science University of California, Santa Barbara
RUN-TIME STORAGE Chuen-Liang Chen Department of Computer Science
Run-time environments
CMPE 152: Compiler Design May 2 Class Meeting
Presentation transcript:

Facade: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications UC Irvine USA Khanh Nguyen Khanh Nguyen, Kai Wang, Yingyi Bu, Lu Fang, Jianfei Hu, Harry Xu

Scalability ◦ JVM crashes due to OutOfMemory error at early stage Management cost ◦ GC time accounts for up to 50% of the execution time [Bu et al. ISMM ’13]

Golden rule for scalability The number of heap objects and references must not grow proportionally with the cardinality of the dataset Non-intrusive techniqueOperate at compiler levelGeneral and practicalRequire minimal user’s effort Statically bound the number of data objects in the heap

Facade execution model Use the off-heap, native memory to store unbounded data items Create heap objects only for control purposes ◦ Bounded object pooling

Benefits from Facade Significantly reduced GC time Reduced memory consumption Reduced memory access costs Reduced execution time Improved scalability

Static bound of data objects s : cardinality of the data set t : number of threads n : number of data types p : number of pages

Data representation Memory address is used as object reference (pointer)= pageRef Native memory id students name

Data manipulation Object references are substituted by pageRef Objects are created as facades for control purposes Professor p = f;long pRef = fRef;

p.addStudent(s); ProfessorFacade pf = professorPool[0]; StudentFacade sf = studentPool[0]; pf.pageRef = pRef; sf.pageRef = sRef; pf.addStudent(sf); Have only pRef and sRef void addStudent(StudentFacade sf) { long thisRef = this.pageRef; long sRef = sf.pageRef; //... }

p.addStudents (s1,s2,s3,s4,s5) pf = professorPool[0]; sf1 = studentPool[0]; sf2 = studentPool[1]; sf3 = studentPool[2]; sf4 = studentPool[3]; sf5 = studentPool[4]; pf.addStudents (sf1,sf2,sf3,sf4,sf5)

Challenge 1 Dynamic dispatch ◦ Use type ID in the record’s header ◦ Parameter facade pool ◦ Separated receiver facade pool p.addStudent(s); ProfessorFacade pf = FacadeRuntime.resolve(pRef);

Challenge 2 Concurrency: ◦ Thread-local facade pooling ◦ Global lock pool to support object locks enterMonitor(o); … exitMonitor(o); Get a free lock l from the lock pool; Write l into the lock field of oRef l.compareAndInc(); enterMonitor(l); … exitMonitor(l); if(l.compareAndDec() == 0){ Write 0 into the lock field of oRef return l to the pool;} Object locks

Memory management Allocation ◦ High-performance parallel allocator  Thread-local managers  Uses different size classes Insights: ◦ Data-processing functions are iteration-based ◦ Each iteration processes distinct data partition ◦ Data objects in each iteration have disjoint lifetime Deallocation ◦ Use a user-provided pair of calls to recycle pages:  iteration_start() && iteration_end() ◦ Iterations are well-defined --- it took us only a few minutes to find iterations and insert callbacks in GraphChi

Other supports Optimizations: ◦ Object inlining for records whose size is known statically ◦ Oversized pages for large arrays ◦ Type specialization ◦…◦… Support most of Java 7 features Details can be found in the paper

Experiments GraphChi [ Kyrola et al. OSDI’12] ◦ High-performance graph analytical framework for a single machine Hyracks [ Borkar et al. ICDE’11] ◦ Data parallel platform to run data-intensive jobs on a cluster of shared-nothing machines GPS [ Salihoglu and Widom SSDBM’13] ◦ Distributed graph processing system for large graphs 3 frameworks, 7 applications

GraphChi Connected Component Page Rank

GraphChi - Throughput X 10 5 Facade

Hyracks External Sort Word Count The original program crashed in all of these sets thus no figure

GPS Page Rank, KMeans & Random Walk ◦ Reduction in largest graph: 120M vertices, 1.7B edges

GPS Page Rank, KMeans & Random Walk ◦ Average cumulative reduction

Results GraphChi (Page Rank & Connected Components) ◦ Up to 6.4x reduction in GC time ◦ Up to 28% reduction in memory usage (6+GB datasets) ◦ Up to 48% reduction in execution time ◦ 1.4x improvement in throughput Hyracks (Word Count & External Sort) ◦ 3.8x improvement in scalability ◦ Up to 88x reduction in GC time ◦ Up to 32% reduction in memory usage ◦ Up to 10% reduction in execution time GPS (Page Rank, KMeans & Random Walk) ◦ Up to 40% reduction in GC time ◦ Up to 15% reduction in execution time

Conclusion Facade is a complete package: ◦ Compiler: automatically transform existing programs ◦ Runtime system: run on top of JVM, i.e., no modification of JVM Thank you!