Skyway: Connecting Managed Heaps in Distributed Big Data Systems


1 Skyway: Connecting Managed Heaps in Distributed Big Data Systems
Khanh Nguyen, Lu Fang, Christian Navasca, Harry Xu, Brian Demsky (University of California, Irvine); Shan Lu (University of Chicago)

2 BIG DATA

3 The managed runtime is costly
Data Shuffling (MR, Spark apps): Send & Receive Objects
Diagram labels: JVM, Skyway, less

4 Serialization and deserialization
Sender (serialization of outDataset):
    OutputStream out = Shuffler.GetOutputStream(receiver_id);
    for (Object o : outDataset) {
        out.writeObject(o);
    }
Receiver (deserialization into inDataset):
    InputStream in = Shuffler.GetInputStream(sender_id);
    while (in.hasData()) {
        Object o = in.readObject();
        inDataset.store(o);
    }
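The two loops on this slide can be tried out with the JDK's built-in serializer. The sketch below is only an illustration under assumptions: the slide's Shuffler is replaced by an in-memory byte array, and the class name ShuffleRoundTrip, the methods send and receive, and the leading object count are all made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the slide's Shuffler: the "network" is a byte array.
final class ShuffleRoundTrip {
    // Sender side: serialize every object of the outgoing dataset.
    static byte[] send(List<?> outDataset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(outDataset.size()); // assumption: a count tells the receiver when to stop
            for (Object o : outDataset) {
                out.writeObject(o);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Receiver side: deserialize each object and store it in the incoming dataset.
    static List<Object> receive(byte[] wire) {
        List<Object> inDataset = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                inDataset.add(in.readObject());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return inDataset;
    }

    public static void main(String[] args) {
        System.out.println(receive(send(Arrays.asList(10, 20, 30))));
    }
}
```

Every object crosses the boundary twice in converted form: once when written to bytes and once when rebuilt from them, which is the cost the following slides measure.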

5 Data transfer costs
TriangleCounting over LiveJournal on Spark 2.1.0 with 3 slaves
Chart labels: 17%, 14%, 16%, 18%

6 Data transfer
Sender: Object -> Serialization (Reflection.getField) -> Binary
Network
Receiver: Binary -> Deserialization (Reflection.allocate, Reflection.setField) -> Object
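The reflective steps named on this slide can be sketched with java.lang.reflect. This is an assumption-laden illustration, not how any particular serializer is implemented: the class ReflectiveCopy, the sample type Point, and the use of a field map as the "binary" form are all hypothetical, and a no-argument constructor stands in for the slide's Reflection.allocate.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

final class ReflectiveCopy {
    // Hypothetical sample type used only for this sketch.
    static final class Point {
        int x;
        int y;
        Point() {}
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Sender side: read each instance field reflectively (the slide's getField step).
    static Map<String, Object> toFields(Object o) {
        Map<String, Object> fields = new LinkedHashMap<>();
        try {
            for (Field f : o.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                f.setAccessible(true);
                fields.put(f.getName(), f.get(o));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return fields;
    }

    // Receiver side: allocate an instance, then write each field reflectively
    // (the slide's allocate and setField steps).
    static <T> T fromFields(Class<T> type, Map<String, Object> fields) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T o = ctor.newInstance();
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                Field f = type.getDeclaredField(e.getKey());
                f.setAccessible(true);
                f.set(o, e.getValue());
            }
            return o;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = fromFields(Point.class, toFields(new Point(3, 4)));
        System.out.println(copy.x + ", " + copy.y);
    }
}
```

Each field costs a reflective lookup and access on both sides, which is why the slide singles these calls out.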

7 WANTED: a system-level solution
Data transfer: Sender -> Network -> Receiver
Sender: Object -> Serialization (Reflection.getField) -> Binary
Receiver: Binary -> Deserialization (Reflection.allocate, Reflection.setField) -> Object

8 an analogy

9 Our solution
Sender -> Skyway -> Receiver: objects are transferred as objects
Diagram labels: Reflection.allocate, Reflection.getField, Reflection.setField
Overlapping computation and data transfer

10 Skyway Overview
Implemented in OpenJDK 8
    Modified the class loader, the object/heap layout, and the Parallel Scavenge GC
Efficiently handles data transfer:
    Outperforms 90 serializers
    Improves Spark by 36% (vs. Java serializer) and 16% (vs. Kryo)
    Improves Flink by 19%

11 Challenges
Type representation: automated global type numbering
Pointer representation: use relative offsets
Local JVM adaptation: visible for garbage collection
Work pipelining: buffering

12 Type registries

13 Output & Input buffer
Input buffer:
    Segregated by senders
    Multiple for each sender
    In managed heap
Output buffer:
    Segregated by receivers
    One for each receiver
    In native, off-the-heap memory
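The output-buffer side of this slide (one buffer per receiver, outside the managed heap) can be approximated in plain Java with direct byte buffers. This is a minimal sketch under assumptions: the class OutputBuffers, the method forReceiver, and the fixed capacity are hypothetical, and Skyway itself manages its buffers inside the JVM rather than through java.nio.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder of one off-heap output buffer per receiver.
final class OutputBuffers {
    private final Map<Integer, ByteBuffer> byReceiver = new HashMap<>();
    private final int capacity;

    OutputBuffers(int capacity) {
        this.capacity = capacity;
    }

    // Returns the receiver's buffer, allocating it in native memory on first use.
    ByteBuffer forReceiver(int receiverId) {
        return byReceiver.computeIfAbsent(receiverId, id -> ByteBuffer.allocateDirect(capacity));
    }

    public static void main(String[] args) {
        OutputBuffers buffers = new OutputBuffers(1024);
        ByteBuffer toReceiver1 = buffers.forReceiver(1);
        toReceiver1.putInt(42);
        System.out.println(toReceiver1.isDirect() + " " + toReceiver1.position());
    }
}
```

Keeping outgoing data off the heap means the garbage collector never has to scan or move it, while incoming data lives in the heap because it must become ordinary objects.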

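14 Example: Serialization
writeObject() copies an Integer[] of length 3 and its three Integer elements (values 10, 20, 30; heap addresses 0xaa, 0xbb, 0xcc) into the output buffer in native memory.
Type Registry (TypeString -> ID): "java.lang.Integer" -> 6, "[java.lang.Integer" -> 7
In the buffer, each object's type field holds its registry ID (7 for the array, 6 for each Integer), and the array's element pointers are replaced by the buffer offsets 7, 11, 15.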
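The two substitutions in this example (type IDs in place of class pointers, relative offsets in place of addresses) can be sketched over a flat int array. The layout here is a simplification chosen for the sketch, so the offsets it produces differ from the slide's 7, 11, 15; the class OffsetSerializer and its record format are assumptions, and null elements are not handled.

```java
import java.util.HashMap;
import java.util.Map;

final class OffsetSerializer {
    // Hypothetical registry: type string -> ID (the IDs 6 and 7 echo the slide).
    static final Map<String, Integer> TYPE_IDS = new HashMap<>();
    static {
        TYPE_IDS.put("java.lang.Integer", 6);
        TYPE_IDS.put("[java.lang.Integer", 7);
    }

    // Assumed layout, one int per slot:
    //   array record:   [type ID, length, one offset per element]
    //   Integer record: [type ID, value]
    static int[] write(Integer[] array) {
        int arrayWords = 2 + array.length;
        int[] buf = new int[arrayWords + 2 * array.length];
        buf[0] = TYPE_IDS.get("[java.lang.Integer");
        buf[1] = array.length;
        int next = arrayWords; // first free slot after the array record
        for (int i = 0; i < array.length; i++) {
            buf[2 + i] = next;                         // relative offset, not a heap address
            buf[next] = TYPE_IDS.get("java.lang.Integer");
            buf[next + 1] = array[i];
            next += 2;
        }
        return buf;
    }

    public static void main(String[] args) {
        // prints [7, 3, 5, 7, 9, 6, 10, 6, 20, 6, 30]
        System.out.println(java.util.Arrays.toString(write(new Integer[] {10, 20, 30})));
    }
}
```

Because nothing in the buffer depends on where the sender's objects lived, the same bytes are meaningful on any receiver that shares the registry.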

15 Example: Deserialization
readObject() works on the input buffer in the heap, which holds the Integer[] (type ID 7, length 3) and three Integers (type ID 6; values 10, 20, 30).
Type Registry (ID -> Metadata Object): 6 -> java.lang.Integer, 7 -> java.lang.Integer[]
Each type ID is replaced by the corresponding class metadata, and the relative offsets 7, 11, 15 are replaced by the absolute addresses 0xf7, 0xfb, 0xff.
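The receiving side can be sketched in the same spirit: resolve each type ID through a registry and follow each relative offset. Plain Java cannot patch raw heap memory in place, so this sketch rebuilds the objects instead; it illustrates only the ID lookup and offset resolution. The class OffsetDeserializer and the buffer layout ([type ID, length, offsets...] for the array, [type ID, value] for each Integer) are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class OffsetDeserializer {
    // Hypothetical registry: ID -> class metadata (the IDs 6 and 7 echo the slide).
    static final Map<Integer, Class<?>> TYPES = new HashMap<>();
    static {
        TYPES.put(6, Integer.class);
        TYPES.put(7, Integer[].class);
    }

    // Reads an Integer[] from the assumed layout, following relative offsets.
    static Integer[] read(int[] buf) {
        expectType(buf[0], Integer[].class);
        int length = buf[1];
        Integer[] result = new Integer[length];
        for (int i = 0; i < length; i++) {
            int offset = buf[2 + i];            // relative offset of the element record
            expectType(buf[offset], Integer.class);
            result[i] = buf[offset + 1];
        }
        return result;
    }

    private static void expectType(int typeId, Class<?> wanted) {
        if (TYPES.get(typeId) != wanted) {
            throw new IllegalArgumentException("unexpected type ID " + typeId);
        }
    }

    public static void main(String[] args) {
        int[] buf = {7, 3, 5, 7, 9, 6, 10, 6, 20, 6, 30};
        System.out.println(Arrays.toString(read(buf))); // prints [10, 20, 30]
    }
}
```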

16 In the paper
Cyclic references
Shared objects
Support for threads
Interaction with GC
Integrating Skyway in real systems

17 Evaluations - Microbenchmark
Java Serializer Benchmark Set
Extensive performance evaluation against 90 existing serializers

18 Chart: Skyway vs. Google's Protobuf and Kryo (recommended by Spark); labels: 1.8x, 2.2x

19 Evaluations - Real Systems
Flink 1.3.2
5 query answering applications
TPC-H datasets
On average, reduces end-to-end time by 19%

20 Improvement Summary: Flink
Chart: performance normalized to the built-in serializer; bars: Ser. Time, Deser. Time, Execution Time; labels: -19%, -23%, -25%

21 Evaluations - Real Systems
Spark 2.1.0
4 applications: WordCount, PageRank, ConnectedComponents, and TriangleCounting
4 datasets: LiveJournal, Orkut, UK-2005, and Twitter
On average, reduces end-to-end time by 16% (w.r.t. Kryo) and by 36% (w.r.t. Java serializer)

22 Improvement Summary: Spark
Chart: performance normalized to the Java serializer; series: vs. Java, vs. Kryo; bars: Ser. Time, Deser. Time, Execution Time; labels: 0.05%, -38%, -16%, -36%, -84%, -38%

23 Conclusion
Goal: Reduce data transfer costs in Big Data systems
Solution: Skyway, the first JVM-based serializer
    Efficiently transfers data
    Easy to integrate

24 Thank You!

