Published by Silvester Dean. Modified over 6 years ago.
Dr. LinkedIn
  July 2016
  Fangshi Li, Carl Steinbach
Agenda
  Hadoop at LinkedIn
  Brief history of Dr. Elephant
  Dr. Elephant overview and architecture
  Features
  Demo
  Dr. Elephant as a JIRA bot (automated workflow review service)
  Open source status report
  Roadmap
Hadoop at LinkedIn: circa 2008
  1 cluster
  20 nodes
  10 users
  10 production workflows
  MR, Pig
Hadoop at LinkedIn: Now
  > 10 clusters
  > 10,000 nodes
  > 1,000 users
  Hundreds of production workflows, thousands of development flows, ad hoc queries
  MR, Pig, Hive, Gobblin, Cubert, Scalding, Spark, ...
What we learned along the way...
Scaling Hadoop Infrastructure is Hard
What we learned along the way...
Scaling User Productivity is Much Harder!
Agenda
  Hadoop at LinkedIn
  A doctor’s history
  Dr. Elephant overview and architecture
  Demo
  Dr. Elephant as a JIRA bot (automated production review service)
  Roadmap
Dr. Elephant motivation
  Efficiency matters: 10% saved on each job -> 10% saved across the cluster
  People have different levels of Hadoop skill
  Most people don’t even know they are writing jobs with performance problems
  People make common mistakes that are solved by similar tuning tips:
    Split sizes
    Parallelism
    Memory settings
  We should build a tool to automate the tuning for every job
Dr. Elephant overview
  Automated job performance monitoring and tuning tool for Hadoop/Spark
  Web app that can run on any host within the cluster
  Single host, single process
  Job based: analyzes an MR/Spark job right after it finishes
  Rule-based, configurable heuristics
  Analyzes 100k jobs per day at LinkedIn
  Hundreds of users
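To make "rule-based, configurable heuristics" concrete, here is a minimal sketch of one rule that scores a finished job. It is not Dr. Elephant's actual code: the class name TaskSkewRule, the function analyze, the severity labels, and the threshold values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median

# Severity labels from mildest to worst; the names are assumptions for this sketch.
SEVERITIES = ("NONE", "LOW", "MODERATE", "SEVERE", "CRITICAL")


@dataclass
class TaskSkewRule:
    """Hypothetical rule: compare the slowest task with the median task."""

    # Configurable ratio thresholds for LOW, MODERATE, SEVERE, CRITICAL.
    thresholds: tuple = (2.0, 4.0, 8.0, 16.0)

    def apply(self, task_runtimes_sec):
        """Return a severity label for one finished job."""
        if not task_runtimes_sec:
            return SEVERITIES[0]
        typical = median(task_runtimes_sec)
        if typical <= 0:
            return SEVERITIES[0]
        ratio = max(task_runtimes_sec) / typical
        # Each threshold the ratio reaches raises the severity by one level.
        level = sum(1 for t in self.thresholds if ratio >= t)
        return SEVERITIES[level]


def analyze(task_runtimes_sec, rules):
    """Run every configured rule against a finished job's task runtimes."""
    return {type(rule).__name__: rule.apply(task_runtimes_sec) for rule in rules}
```

Keeping the thresholds as data rather than code is what makes a rule "configurable": an operator could tighten or loosen them per cluster without touching the rule logic.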
Architecture
Demo
One MR example
Configurable rules
Job/flow performance history on Dr. Elephant
  Problem to solve:
    I want to understand how my job/workflow performance is changing over time
    Compare performance between runs while developing the job/flow
    Why is my job/flow so slow today?
    Identify/alert on problems before they happen
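The "why is my flow so slow today?" question can be sketched as a comparison of the latest run against a baseline built from earlier runs. This is only an illustration under stated assumptions: the function name detect_regression, the use of the median as the baseline, and the 1.5x tolerance are hypothetical and not taken from Dr. Elephant.

```python
from statistics import median


def detect_regression(history_sec, latest_sec, tolerance=1.5):
    """Return True when the latest run is slower than tolerance x the median of earlier runs.

    history_sec: runtimes (seconds) of previous runs of the same job/flow.
    latest_sec: runtime (seconds) of the run being checked.
    tolerance: assumed slowdown factor that counts as a regression.
    """
    if not history_sec:
        # No earlier runs, so there is nothing to compare against.
        return False
    baseline = median(history_sec)
    return latest_sec > tolerance * baseline
```

The median is used here because a single unusually slow earlier run would distort a mean-based baseline; other baselines are equally plausible.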
Dr. Elephant flow history feature
Dr. Elephant as a JIRA bot
  Production review at LinkedIn
  Before 2014, Hadoop production reviews were a nightmare
  Painful process for flow developers and the Hadoop team:
    An hour-long kickoff meeting
    Manual inspection of code and workflows (recursively!)
    The whole process usually took up to one week (many took much longer)
    10+ workflows per week!
JIRA Bot
Other services on top of Dr. Elephant
  Daily report
    Every day, Dr. Elephant reports/alerts on the jobs/flows with the worst performance on the cluster, and on workflows that need to be reviewed again
  Cost-to-serve analysis
    Dr. Elephant’s data is used to analyze cluster utilization and resource consumption per organization
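Both services amount to simple aggregations over per-job records. The sketch below assumes each record is a dict with hypothetical fields org, memory_gb, runtime_hours, and a numeric severity; the field names and the GB-hours cost measure are illustrative assumptions, not Dr. Elephant's data model.

```python
from collections import defaultdict


def usage_by_org(jobs):
    """Sum an assumed cost measure (memory GB x runtime hours) per organization."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job["org"]] += job["memory_gb"] * job["runtime_hours"]
    return dict(totals)


def worst_jobs(jobs, n=3):
    """Return the n jobs with the highest severity score, worst first."""
    return sorted(jobs, key=lambda job: job["severity"], reverse=True)[:n]


# Hypothetical sample records for one day.
sample = [
    {"name": "etl", "org": "data", "memory_gb": 2.0, "runtime_hours": 1.5, "severity": 4},
    {"name": "ads", "org": "ads", "memory_gb": 4.0, "runtime_hours": 0.5, "severity": 1},
    {"name": "agg", "org": "data", "memory_gb": 4.0, "runtime_hours": 0.5, "severity": 2},
]
report = worst_jobs(sample, n=2)
costs = usage_by_org(sample)
```

A daily report would then list the names in report, and the cost-to-serve view would present costs grouped by organization.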
Dr. Elephant Open Source Status
  Open sourced April 2016
  Since then: 10+ contributors, 40+ commits
  50+ topics in the user group
  Patches by external companies: Airbnb (10), Foursquare (3), EverString (3), PayPal (2), Flipkart (2), InMobi (1), DatumKako (1), Vinted (1), ...
Dr. Elephant Roadmap
  Beyond MR-level analytics: understand higher-level knowledge such as Pig/Hive/Cascading
  Support for other schedulers (Oozie, Airflow) and frameworks (Tez)
  Beyond batch analytics (real-time analytics)
  Analytics for failed jobs (exception fingerprinting, …)
  JVM-level insights (task profiling)
Resources
  Dr. Elephant on GitHub
  Dr. Elephant Google Group: dr-elephant-users
  Dr. Elephant engineering blog post: performance-tuning-hadoop-spark
  Dr. Elephant presentation from Hadoop Summit