Skyway: Connecting Managed Heaps in Distributed Big Data Systems


Skyway: Connecting Managed Heaps in Distributed Big Data Systems
Khanh Nguyen, Lu Fang, Christian Navasca, Harry Xu, Brian Demsky (University of California, Irvine)
Shan Lu (University of Chicago)

BIG DATA

The managed runtime is costly
Data shuffling: MR and Spark apps send and receive objects (JVM)
Skyway: the managed runtime is less costly

Serialization (sender, from outDataset):

    OutputStream out = Shuffler.GetOutputStream(receiver_id);
    for (Object o : outDataset) {
        out.writeObject(o);
    }

Deserialization (receiver, into inDataset):

    InputStream in = Shuffler.GetInputStream(sender_id);
    while (in.hasData()) {
        Object o = in.readObject();
        inDataset.store(o);
    }
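The two loops on this slide can be tried out with the JDK's built-in serializer. The sketch below is only an illustration: `Shuffler` from the slide is not available, so an in-memory byte array stands in for the network, and the class and method names (`ShuffleRoundTrip`, `send`, `receive`) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the slide's shuffle loops; a byte array replaces the network.
class ShuffleRoundTrip {
    // Sender side: serialize every object of the outgoing dataset.
    static byte[] send(List<Integer> outDataset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(outDataset.size()); // element count, so the receiver knows when to stop
            for (Integer o : outDataset) {
                out.writeObject(o);
            }
        }
        return bytes.toByteArray();
    }

    // Receiver side: rebuild each object from the byte stream.
    static List<Integer> receive(byte[] wire) throws IOException, ClassNotFoundException {
        List<Integer> inDataset = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                inDataset.add((Integer) in.readObject());
            }
        }
        return inDataset;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receive(send(Arrays.asList(10, 20, 30))));
    }
}
```

Every object crosses the wire as a byte sequence and is re-created on the other side, which is the per-object work the following slides attribute to serialization and deserialization.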

Data transfer costs
TriangleCounting over LiveJournal on Spark 2.1.0 with 3 slaves
[Chart: labeled values 17%, 14%, 16%, 18%]

Data transfer
Sender: Object -> Serialization (Reflection.getField) -> Binary -> Network
Receiver: Network -> Binary -> Deserialization (Reflection.allocate, Reflection.setField) -> Object
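The three reflective steps named on this slide (allocate, getField, setField) correspond to calls in `java.lang.reflect`. The sketch below copies an object field by field using those calls. It is an assumption-laden illustration, not how any particular serializer is implemented: `ReflectiveCopy` and `Point` are invented names, and a real serializer would write the values to a byte stream rather than straight into a second object.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical example of the per-object reflective work a generic serializer performs.
class ReflectiveCopy {
    static class Point {
        int x;
        int y;
        Point() {}
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static Point copy(Point src) throws ReflectiveOperationException {
        // "Reflection.allocate": create the receiving object reflectively.
        Point dst = Point.class.getDeclaredConstructor().newInstance();
        for (Field f : Point.class.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // only instance state is transferred
            }
            f.setAccessible(true);
            Object value = f.get(src); // "Reflection.getField" on the sender side
            f.set(dst, value);         // "Reflection.setField" on the receiver side
        }
        return dst;
    }
}
```

Calling `ReflectiveCopy.copy(new ReflectiveCopy.Point(3, 4))` yields a new `Point` with the same field values; the field lookup and the boxed get/set are repeated for every field of every object transferred.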

WANTED: a system-level solution
Data transfer
Sender: Object -> Serialization (Reflection.getField) -> Binary -> Network
Receiver: Network -> Binary -> Deserialization (Reflection.allocate, Reflection.setField) -> Object

an analogy

Our solution
Sender: Object -> Skyway -> Receiver: Object
[Diagram: Skyway shown in place of the Reflection.allocate, Reflection.getField, and Reflection.setField steps]
Overlapping computation and data transfer

Skyway Overview
Implemented in OpenJDK 8
  Modified the class loader, the object/heap layout, and the Parallel Scavenge GC
Efficiently handles data transfer:
  Outperforms 90 serializers
  Improves Spark by 36% (vs. the Java serializer) and 16% (vs. Kryo)
  Improves Flink by 19%

Challenges
Type representation: automated global type numbering
Pointer representation: use relative offsets
Local JVM adaptation: visible for garbage collection
Work pipelining: buffering

Type registries

Output & Input buffer
Input buffer
  Segregated by senders
  Multiple for each sender
  In managed heap
Output buffer
  Segregated by receivers
  One for each receiver
  In native, off-the-heap memory
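The buffer arrangement on this slide can be mimicked in plain Java with `java.nio.ByteBuffer`: `allocateDirect` gives off-heap memory and `allocate` gives an on-heap buffer. This is a rough sketch under stated assumptions, not Skyway's code (Skyway manages its buffers inside the JVM itself). The class name, method names, and `CHUNK_BYTES` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping that mirrors the slide: one off-heap output buffer
// per receiver, and any number of on-heap input buffers per sender.
class ShuffleBuffers {
    static final int CHUNK_BYTES = 4096; // arbitrary size chosen for the example

    private final Map<Integer, ByteBuffer> outputByReceiver = new HashMap<>();
    private final Map<Integer, List<ByteBuffer>> inputBySender = new HashMap<>();

    // One output buffer per receiver, in native (direct) memory.
    ByteBuffer outputFor(int receiverId) {
        return outputByReceiver.computeIfAbsent(
                receiverId, id -> ByteBuffer.allocateDirect(CHUNK_BYTES));
    }

    // A sender may fill several input buffers; each one lives in the managed heap.
    ByteBuffer newInputChunk(int senderId) {
        ByteBuffer chunk = ByteBuffer.allocate(CHUNK_BYTES);
        inputBySender.computeIfAbsent(senderId, id -> new ArrayList<>()).add(chunk);
        return chunk;
    }

    int inputChunkCount(int senderId) {
        List<ByteBuffer> chunks = inputBySender.get(senderId);
        return chunks == null ? 0 : chunks.size();
    }
}
```

Repeated calls to `outputFor` with the same receiver ID return the same direct buffer, while each call to `newInputChunk` adds another heap buffer for that sender.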

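Example: Serialization
writeObject() on an Integer[] of length 3 whose elements hold 10, 20, and 30 (heap addresses 0xaa, 0xbb, 0xcc in the diagram)

Type Registry
  TypeString              ID
  "java.lang.Integer"     6
  "[java.lang.Integer"    7

Output buffer in native memory
  [Diagram: the array is written with type ID 7 and length 3, and each Integer with type ID 6 and its value (10, 20, 30); the heap addresses are replaced by the relative offsets 7, 11, and 15]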
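A minimal sketch of the two ideas in this example, type IDs in place of type strings and relative offsets in place of heap addresses, is given below. It is not Skyway's implementation: the layout is invented for illustration (4-byte slots, no object headers), so the offsets come out as 5, 7, 9 rather than the slide's 7, 11, 15. `OffsetSerializer` and its methods are hypothetical names, the registry is local rather than global, and array elements are assumed to be non-null.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sender-side sketch: a type registry plus relative offsets.
class OffsetSerializer {
    private final Map<String, Integer> typeIds = new LinkedHashMap<>();
    private int nextId = 6; // starts at 6 only to mirror the IDs on the slide

    // Assigns a numeric ID to a type name the first time it is seen.
    int idOf(Class<?> type) {
        return typeIds.computeIfAbsent(type.getName(), name -> nextId++);
    }

    // Assumed layout, in 4-byte slots:
    //   [arrayTypeId, length, offset_0 .. offset_{n-1}, (elementTypeId, value) * n]
    ByteBuffer write(Integer[] array) {
        int elementId = idOf(Integer.class);
        int arrayId = idOf(Integer[].class);
        int headerSlots = 2 + array.length;
        ByteBuffer out = ByteBuffer.allocateDirect(
                (headerSlots + 2 * array.length) * Integer.BYTES); // off-heap, as on the slide
        out.putInt(arrayId).putInt(array.length);
        for (int i = 0; i < array.length; i++) {
            out.putInt(headerSlots + 2 * i); // slot offset inside the buffer, not a heap address
        }
        for (Integer element : array) {
            out.putInt(elementId).putInt(element);
        }
        out.flip(); // make the written region readable
        return out;
    }
}
```

For `new Integer[] {10, 20, 30}` the buffer holds the slots 7, 3, 5, 7, 9, 6, 10, 6, 20, 6, 30. Because the offsets are relative to the start of the buffer, the bytes stay meaningful wherever the receiver places them.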

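Example: Deserialization
readObject() on the input buffer in the heap

Type Registry
  Metadata object         ID
  java.lang.Integer       6
  java.lang.Integer[]     7

Input buffer in heap
  [Diagram: the buffer holds an Integer[] (type ID 7, length 3) and three Integer objects (type ID 6; values 10, 20, 30) at offsets 7, 11, and 15. The relative offsets 7, 11, 15 become the absolute addresses 0xf7, 0xfb, 0xff, and the type IDs are replaced by the java.lang.Integer[] and java.lang.Integer metadata objects.]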
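The address arithmetic in this example is consistent with adding each relative offset to a buffer base of 0xf0 (0xf0 + 7 = 0xf7, 0xf0 + 11 = 0xfb, 0xf0 + 15 = 0xff); the base value is inferred from the numbers on the slide, not stated there. The sketch below shows that step and a receiver-side lookup from type ID to class. It makes several assumptions: plain Java cannot adopt raw buffer memory as objects, so `read` rebuilds an `Integer[]` from an `int[]` of slots; the slot layout ([arrayTypeId, length, offsets..., (elementTypeId, value) pairs]) is invented for the example; and `OffsetDeserializer` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical receiver-side sketch: resolve type IDs and relative offsets.
class OffsetDeserializer {
    // Receiver-side registry, type ID -> class, using the IDs from the slide.
    static final Map<Integer, Class<?>> TYPES = new HashMap<>();
    static {
        TYPES.put(6, Integer.class);
        TYPES.put(7, Integer[].class);
    }

    // A relative offset becomes an absolute address once the buffer's base is known.
    static long toAbsolute(long bufferBase, int relativeOffset) {
        return bufferBase + relativeOffset;
    }

    // Rebuilds the array from the assumed slot layout described above.
    static Integer[] read(int[] slots) {
        if (TYPES.get(slots[0]) != Integer[].class) {
            throw new IllegalArgumentException("unexpected root type ID " + slots[0]);
        }
        int length = slots[1];
        Integer[] result = new Integer[length];
        for (int i = 0; i < length; i++) {
            int at = slots[2 + i]; // relative offset (slot index) of element i
            if (TYPES.get(slots[at]) != Integer.class) {
                throw new IllegalArgumentException("unexpected element type ID " + slots[at]);
            }
            result[i] = slots[at + 1];
        }
        return result;
    }
}
```

With the slots 7, 3, 5, 7, 9, 6, 10, 6, 20, 6, 30, `read` returns an array holding 10, 20, and 30.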

In the paper
  Cyclic references
  Shared objects
  Support for threads
  Interaction with GC
  Integrating Skyway in real systems

Evaluations - Microbenchmark
Java Serializer Benchmark Set
Extensive performance evaluation against 90 existing serializers

[Chart: Skyway compared with Google's Protobuf and Kryo (recommended by Spark); labeled values: 1.8x, 2.2x]

Evaluations – Real Systems
Flink 1.3.2
  5 query answering applications
  TPC-H datasets
On average, reduces end-to-end time by 19%

Improvement Summary: Flink
[Chart: performance normalized to the built-in serializer, for Ser. Time, Deser. Time, and Execution Time; labeled values: -19%, -23%, -25%]

Evaluations – Real Systems
Spark 2.1.0
  4 applications: WordCount, PageRank, ConnectedComponents, and TriangleCounting
  4 datasets: LiveJournal, Orkut, UK-2005, and Twitter
On average, reduces end-to-end time
  by 16% (w.r.t. Kryo)
  by 36% (w.r.t. Java serializer)

Improvement Summary: Spark vs. Java vs. Kryo
[Chart: performance normalized to the Java serializer, for Ser. Time, Deser. Time, and Execution Time; labeled values: 0.05%, -38%, -16%, -36%, -84%, -38%]

Conclusion
Goal: Reduce data transfer costs in Big Data systems
Solution: Skyway, the first JVM-based serializer
  Efficiently transfers data
  Easy to integrate

Thank You!