Tachyon: Reliable File Sharing at Memory-Speed Across Cluster Frameworks
Haoyuan Li, UC Berkeley
Outline
- Motivation
- System Design
- Evaluation Results
- Release Status
- Future Directions
Memory is King
Memory Trend
RAM throughput is increasing exponentially.
Disk Trend
Disk throughput is increasing slowly.
Consequence
Memory locality is key to achieving:
- Interactive queries
- Fast query response
Current Big Data Ecosystem
- Many frameworks already leverage memory, e.g. Spark, Shark, and other projects.
- File sharing among jobs is replicated to disk; replication enables fault tolerance.
- Problems:
  - Disk scans make reads slow.
  - Synchronous disk replication makes writes even slower.
Tachyon Project
Reliable file sharing at memory speed across cluster frameworks and jobs.
Challenge: how to achieve reliable file sharing without replication?
Idea
Re-computation (lineage) based storage, using memory aggressively:
- One copy of data in memory (fast).
- Upon failure, re-compute data using lineage (fault tolerant).
Stack
System Architecture
Lineage
Lineage Information
- Binary program
- Configuration
- Input files list
- Output files list
- Dependency type
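To make this metadata concrete, here is a minimal Scala sketch of what a lineage record with these fields could look like, together with the recovery idea from the earlier slides (re-run the recorded program to regenerate lost output). The names LineageRecord, DependencyType, and recompute are illustrative assumptions, not Tachyon's actual classes.

    // Illustrative dependency kinds; a hedged guess at what "dependency type"
    // covers (e.g. narrow vs. wide, as in Spark).
    object DependencyType extends Enumeration { val Narrow, Wide = Value }

    // One lineage record: everything needed to re-create its output files.
    case class LineageRecord(
      binaryProgram: String,              // e.g. URI of the job's binary/jar
      configuration: Map[String, String], // job configuration used at run time
      inputFiles: Seq[String],            // input file list
      outputFiles: Seq[String],           // output file list
      dependencyType: DependencyType.Value)

    // Fault recovery sketch: if an output file is lost, find the record that
    // produced it and re-run that program on its inputs (recursing when the
    // inputs are also missing is omitted here).
    def recompute(lostFile: String,
                  lineage: Seq[LineageRecord],
                  rerun: LineageRecord => Unit): Unit =
      lineage.find(_.outputFiles.contains(lostFile)).foreach(rerun)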
Fault Recovery Time
What does re-computation cost?
Example
Asynchronous Checkpoint
- Better than existing solutions, even under failure.
- Bounded recovery time (naïve and snapshot-based asynchronous checkpointing).
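As a rough illustration only: an asynchronous checkpointer can be a background task that periodically copies not-yet-persisted in-memory files to the under-layer store, so writers never block on disk while recovery stays bounded by the checkpoint interval. The sketch below assumes hypothetical InMemoryFile and persistToUnderFS hooks; it is not Tachyon's checkpointing algorithm.

    import java.util.concurrent.{Executors, TimeUnit}

    // Hypothetical handle to a file held in worker memory (illustration only).
    case class InMemoryFile(path: String, var checkpointed: Boolean = false)

    object AsyncCheckpointer {
      // Hypothetical hook that copies a file to the under-layer FS (HDFS, S3, ...).
      def persistToUnderFS(f: InMemoryFile): Unit = { /* write bytes */ f.checkpointed = true }

      // Periodically persist whatever is not yet checkpointed; writers only touch
      // memory, so they are never blocked by this background work.
      def start(listFiles: () => Seq[InMemoryFile], intervalSec: Long): Unit = {
        val pool = Executors.newSingleThreadScheduledExecutor()
        pool.scheduleAtFixedRate(new Runnable {
          def run(): Unit = listFiles().filterNot(_.checkpointed).foreach(persistToUnderFS)
        }, 0, intervalSec, TimeUnit.SECONDS)
      }
    }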
Master Fault Tolerance
- Multiple masters; ZooKeeper elects a leader.
- After a crash, workers contact the new leader.
- Workers update the new leader's state with the contents of their caches.
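One common way to implement this election on the JVM is Apache Curator's LeaderLatch recipe on top of ZooKeeper; the sketch below shows that pattern under the assumption that Curator is acceptable, with a made-up ensemble address and latch path. Tachyon's actual election code may differ.

    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.framework.recipes.leader.LeaderLatch
    import org.apache.curator.retry.ExponentialBackoffRetry

    object MasterElection {
      def main(args: Array[String]): Unit = {
        // Connect to the ZooKeeper ensemble shared by all masters (address is a placeholder).
        val client = CuratorFrameworkFactory.newClient(
          "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3))
        client.start()

        // Every master competes on the same latch path; ZooKeeper picks one leader.
        val latch = new LeaderLatch(client, "/tachyon/leader")
        latch.start()
        latch.await() // blocks until this process becomes the leader

        // From here, workers reconnect to the new leader and report their cached
        // blocks so it can rebuild the cluster state.
        println("this master is now the leader")
      }
    }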
Implementation Details
- 15,000+ lines of Java
- Thrift for data transport
- Under-layer file system support: HDFS, S3, local FS, GlusterFS
- Built with Maven, continuous integration with Jenkins
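The under-layer file-system list suggests a thin plugin interface selected by URI scheme. A minimal sketch of that idea follows; the trait and registry are assumptions for illustration, not Tachyon's real interfaces.

    import java.net.URI

    // Minimal view of what an under-layer storage plugin would need to provide.
    trait UnderFileSystem {
      def exists(path: String): Boolean
      def open(path: String): java.io.InputStream
      def create(path: String): java.io.OutputStream
    }

    object UnderFileSystem {
      // Plugins registered by URI scheme: "hdfs", "s3", "file", "glusterfs", ...
      private val plugins = scala.collection.mutable.Map[String, UnderFileSystem]()
      def register(scheme: String, fs: UnderFileSystem): Unit = plugins(scheme) = fs

      // Pick the plugin for a path such as "hdfs://namenode:9000/data".
      def get(path: String): UnderFileSystem = {
        val scheme = Option(new URI(path).getScheme).getOrElse("file")
        plugins.getOrElse(scheme, sys.error("unsupported under-layer FS: " + scheme))
      }
    }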
Sequential Read using Spark
(Chart: sequential read throughput compared against the theoretical maximum disk throughput and Flat Datacenter Storage.)
Sequential Write using Spark
(Chart: sequential write throughput compared against the theoretical maximum disk throughput and Flat Datacenter Storage.)
Realistic Workflow using Spark
Realistic Workflow Under Failure
Conviva Spark Query (I/O intensive)
- Tachyon outperforms the Spark cache because of Java GC overhead on the in-heap cache.
- More than 75x speedup.
Conviva Spark Query (less I/O intensive)
- 12x speedup.
- GC kicks in earlier for the Spark cache.
Alpha Status
Releases: Developer Preview V0.2.1 (4/25/2013)
Contributions from:
Alpha Status
- First read of a file caches it in memory.
- Writes go synchronously to HDFS (no lineage information in the Developer Preview release).
- MapReduce and Spark can run without any code change (ser/de becomes the new bottleneck).
Current Features
- Java-like file API
- Compatible with Hadoop
- Master fault tolerance
- Native support for raw tables
- WhiteList, PinList
- Command-line interaction
- Web user interface
Spark without Tachyon
val file = sc.textFile("hdfs://ip:port/path")
Spark with Tachyon
val file = sc.textFile("tachyon://ip:port/path")
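As the two snippets above suggest, switching a Spark job between HDFS and Tachyon is only a change of URI scheme; the rest of the program is untouched. A minimal sketch under that assumption, using the current Spark package name and assuming the Tachyon Hadoop-compatible client is on the classpath (host names, ports, and paths are placeholders):

    import org.apache.spark.SparkContext

    object SchemeSwapExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "SchemeSwapExample")

        // Identical code path for both storage layers; only the scheme differs.
        val fromHdfs    = sc.textFile("hdfs://namenode:9000/logs/input.txt")
        val fromTachyon = sc.textFile("tachyon://master:19998/logs/input.txt")

        println("HDFS lines: " + fromHdfs.count() + ", Tachyon lines: " + fromTachyon.count())
        sc.stop()
      }
    }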
Shark without Tachyon
CREATE TABLE orders_cached AS SELECT * FROM orders;
Shark with Tachyon
CREATE TABLE orders_tachyon AS SELECT * FROM orders;
Experiments on Shark
Shark (from version 0.7) can store tables in Tachyon with a fast columnar Ser/De.
20 GB data / 5 machines:

                                   Spark Cache    Tachyon
    Table Full Scan                1.4 sec        1.5 sec
    GroupBys (10 GB Shark Memory)  50 – 90 sec    45 – 50 sec
    GroupBys (15 GB Shark Memory)  44 – 48 sec    37 – 45 sec
Experiments on Shark (continued)
In addition to the 20 GB / 5 machine results above, on 4 * 100 GB TPC-H data / 17 machines:

                  Spark Cache    Tachyon
    TPC-H Q1      65.68 sec      24.75 sec
    TPC-H Q2
    TPC-H Q3                     55.99 sec
    TPC-H Q4
Future Directions
Next release is coming soon:
- Efficient Ser/De support
- Fair sharing for memory
- Full support for lineage
Acknowledgment
Research Team: Haoyuan Li, Ali Ghodsi, Matei Zaharia, Eric Baldeschwieler, Scott Shenker, Ion Stoica
Code Contributors: Haoyuan Li, Calvin Jia, Bill Zhao, Mark Hamstra, Rong Gu, Hobin Yoon, Vamsi Chitters, Reynold Xin, Srinivas Parayya, Dilip Joseph
Questions?