Download presentation
Presentation is loading. Please wait.
Published by嬉 包 Modified over 6 years ago
1
Skyway: Connecting Managed Heaps in Distributed Big Data Systems
Khanh Nguyen, Lu Fang, Christian Navasca, Harry Xu, Brian Demsky Shan Lu University of Chicago University of California, Irvine
2
BIG DATA
3
The managed runtime is costly
Data Shuffling MR, Spark Apps Send & Receive Objects JVM Skyway less The managed runtime is costly
4
deserialization serialization InputStream in =
Shuffler.GetInputStream(sender_id); while (in.hasData()) { Object o = in.readObject(); inDataset.store(o) } inDataset deserialization outDataset serialization OutputStream out = Shuffler.GetOutputStream(receiver_id); for (Object o: outDataset) { out.writeObject(o); }
5
TriangleCounting over LiveJournal on Spark 2.1.0 with 3 slaves
Data transfer costs 17% 14% 16% 18% TriangleCounting over LiveJournal on Spark with 3 slaves
6
Data transfer Network Serialization Deserialization Sender Receiver
Object Reflection.allocate Serialization Deserialization Reflection.getField Reflection.setField Network Binary
7
WANTED: a system-level solution Data transfer Network Serialization
Sender Receiver Network Serialization Deserialization Object Binary WANTED: a system-level solution Reflection.allocate Reflection.getField Reflection.setField
8
an analogy
9
Overlapping computation and data transfer
Our solution Sender Receiver Skyway Object Reflection. allocate Reflection. getField Reflection. setField Overlapping computation and data transfer
10
Skyway Overview Implemented in OpenJDK 8
Modified the class loader, the object/heap layout, the Parallel Scavenge GC Efficiently handle data transfer: Outperforms 90 serializers Improves Spark by 36% (Java) - 16% (Kryo) Improves Flink by 19%
11
Challenges Type representation Pointer representation
Automated global type numbering Pointer representation Use relative offsets Local JVM adaptation Visible for garbage collection Work pipelining Buffering
12
Type registries
13
Output & Input buffer Input buffer Output buffer Segregated by senders
Multiple for each sender In managed heap Segregated by receivers One for each receiver In native, off-the-heap memory
14
Example: Serialization
writeObject() Type Registry Integer[] 0xbb 0xcc 0xaa 3 Integer 20 30 10 “java.lang.Integer” 6 “[java.lang.Integer” 7 TypeString ID Output buffer in native memory 0xbb 0xcc 0xaa 3 7 10 6 20 6 30 6 7 11 15 Offset 7 7 11 15
15
Example: Deserialization
Offset 7 11 15 11 15 7 3 10 20 30 6 readObject() Input buffer in heap Integer[] 7 Integer 6 Integer 6 Integer 6 3 0xf7 7 0xfb 11 0xff 15 10 20 30 java.lang.Integer 6 java.lang.Integer[] 7 MetadataObject ID Type Registry java.lang.Integer java.lang.Integer java.lang.Integer java.lang.Integer[]
16
In the paper Cyclic references Shared objects Support for threads
Interaction with GC Integrating Skyway in real systems
17
Evaluations - Microbenchmark
Java Serializer Benchmark Set Extensive performance evaluation with existing 90 serializers
18
SKYWAY GOOGLE’s Protobuf Kryo (rec. by Spark) 1.8x 2.2x
19
Evaluations – Real Systems
Flink 1.3.2 5 query answering applications TPC-H datasets On average, reduces end-to-end time by 19%
20
Improvement Summary: Flink
Normalized Performance to built-in serializer -19% -23% -25% Ser. Time Deser. Time Execution Time
21
Evaluations – Real Systems
Spark 2.1.0 4 applications: WordCount, PageRank, ConnectedComponents, and TriangleCounting 4 datasets: LiveJournal, Orkut, UK-2005, and Twitter On average, reduces end-to-end time by 16% (w.r.t. Kryo) by 36% (w.r.t. Java serializer)
22
Improvement Summary: Spark
vs. Java vs. Kryo Normalized Performance to Java Serializer 0.05% -38% -16% -36% -84% -38% Ser. Time Deser. Time Execution Time
23
Conclusion Goal: Reduce data transfer costs in Big Data systems
Solution: Skyway, the first JVM-based serializer Efficiently transfer data Easy to integrate
24
Thank You!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.