Dr. Elephant at LinkedIn, July 2016, Fangshi Li and Carl Steinbach


4 Agenda
- Hadoop at LinkedIn
- Brief history of Dr. Elephant
- Dr. Elephant overview and architecture
- Features
- Demo
- Dr. Elephant as a JIRA bot (automated workflow review service)
- Open source status report
- Roadmap

5 Hadoop at LinkedIn: circa 2008
- 1 cluster
- 20 nodes
- 10 users
- 10 production workflows
- MR, Pig

6 Hadoop at LinkedIn: now
- More than 10 clusters
- More than 10,000 nodes
- More than 1,000 users
- Hundreds of production workflows, thousands of development flows, and ad-hoc queries
- MR, Pig, Hive, Gobblin, Cubert, Scalding, Spark, ...

7 What we learned along the way...
Scaling Hadoop Infrastructure is Hard

8 What we learned along the way...
Scaling User Productivity is Much Harder!

9 Agenda
- Hadoop at LinkedIn
- A doctor's history
- Dr. Elephant overview and architecture
- Demo
- Dr. Elephant as a JIRA bot (automated production review service)
- Roadmap

10 Dr. Elephant motivation
- Efficiency matters: a 10% improvement in every job is a 10% improvement for the whole cluster.
- People have different levels of Hadoop skill.
- Most people don't even know they are writing jobs with performance problems.
- People make common mistakes that are solved by similar tuning tips:
  - Split sizes
  - Parallelism
  - Memory settings
- We should build a tool that automates tuning for every job.
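The 10% claim holds because cluster consumption is simply the sum of per-job consumption, so a uniform per-job saving carries over unchanged to the total. A minimal Python sketch, using made-up per-job GB-hour figures (illustrative only, not LinkedIn data):

```python
# Hypothetical per-job resource usage in GB-hours (illustrative numbers only).
job_usage = {"job_a": 120.0, "job_b": 40.0, "job_c": 840.0}


def cluster_usage(usage: dict) -> float:
    """Total cluster consumption is the sum of per-job consumption."""
    return sum(usage.values())


def apply_savings(usage: dict, fraction: float) -> dict:
    """Return per-job usage after every job is tuned to use `fraction` less."""
    return {job: gbh * (1.0 - fraction) for job, gbh in usage.items()}


before = cluster_usage(job_usage)
after = cluster_usage(apply_savings(job_usage, 0.10))
print(f"before={before:.1f} after={after:.1f} saved={(before - after) / before:.0%}")
```

Because the total is a plain sum, scaling every term by 0.9 scales the sum by 0.9; the same reasoning shows why tuning a few very large jobs can move the cluster total more than tuning many small ones.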

11 Dr. Elephant overview
- Automated job performance monitoring and tuning tool for Hadoop and Spark
- Web app that can run on any host within the cluster
- Single host, single process
- Job based: analyzes an MR or Spark job right after it finishes
- Rule-based, configurable heuristics
- Analyzes 100k jobs per day at LinkedIn, for hundreds of users
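One way to picture the "analyze a job right after it finishes" model in a single process is a poll step that asks a job-history source for jobs completed since the last check and runs every heuristic on each. The sketch below is hypothetical: `FinishedJob`, `Analyzer`, `fake_fetch` and `spill_rule` are illustrative names, and the in-memory `HISTORY` list stands in for whatever client a real deployment would use to read finished MR/Spark jobs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FinishedJob:
    job_id: str
    finish_time: int  # epoch seconds when the job completed
    counters: Dict[str, float] = field(default_factory=dict)


Heuristic = Callable[[FinishedJob], str]  # hypothetical: returns a severity label


@dataclass
class Analyzer:
    """Single-process analyzer: each poll handles jobs finished since the previous poll."""
    fetch_finished: Callable[[int, int], List[FinishedJob]]  # hypothetical job-history client
    heuristics: Dict[str, Heuristic]
    last_checked: int = 0
    results: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def poll_once(self, now: int) -> int:
        """Fetch jobs finished in (last_checked, now], run every heuristic, store results."""
        jobs = self.fetch_finished(self.last_checked, now)
        for job in jobs:
            self.results[job.job_id] = {name: rule(job) for name, rule in self.heuristics.items()}
        self.last_checked = now
        return len(jobs)


# Usage with an in-memory stand-in for the cluster's job history.
HISTORY = [
    FinishedJob("job_1", 100, {"spilled_records": 0}),
    FinishedJob("job_2", 250, {"spilled_records": 5_000_000}),
]


def fake_fetch(start: int, end: int) -> List[FinishedJob]:
    return [j for j in HISTORY if start < j.finish_time <= end]


def spill_rule(job: FinishedJob) -> str:
    return "SEVERE" if job.counters.get("spilled_records", 0) > 1_000_000 else "NONE"


analyzer = Analyzer(fake_fetch, {"spill": spill_rule})
analyzer.poll_once(200)  # picks up job_1 only
analyzer.poll_once(300)  # picks up job_2 only
print(analyzer.results)
```

A long-running service would call `poll_once` repeatedly with a sleep between calls; remembering `last_checked` is what keeps each job from being analyzed twice.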

12 Architecture

13 Demo

14 Demo

15 An MR example

16 Configurable rules
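A configurable rule can be read as a metric plus a list of thresholds that map the metric to a severity, with the thresholds kept in configuration so they can be retuned without code changes. A minimal sketch, assuming a five-level severity scale (NONE to CRITICAL) and a hypothetical spill-ratio rule (spilled records divided by output records); the threshold values and the `DEFAULT_CONF` dictionary are illustrative, not Dr. Elephant's shipped settings:

```python
from enum import IntEnum
from typing import Mapping, Sequence


class Severity(IntEnum):
    NONE = 0
    LOW = 1
    MODERATE = 2
    SEVERE = 3
    CRITICAL = 4


# Hypothetical configuration; a real deployment would load this from a config file.
# Each entry lists ascending thresholds for LOW, MODERATE, SEVERE, CRITICAL.
DEFAULT_CONF: Mapping[str, Sequence[float]] = {
    "spill_ratio": (0.5, 1.0, 2.0, 3.0),
}

LEVELS = (Severity.LOW, Severity.MODERATE, Severity.SEVERE, Severity.CRITICAL)


def severity_ascending(value: float, limits: Sequence[float]) -> Severity:
    """Higher values are worse: return the highest level whose threshold is reached."""
    level = Severity.NONE
    for sev, limit in zip(LEVELS, limits):
        if value >= limit:
            level = sev
    return level


def spill_heuristic(spilled: float, output: float,
                    conf: Mapping[str, Sequence[float]] = DEFAULT_CONF) -> Severity:
    """Hypothetical rule: how many records were spilled per record written."""
    ratio = spilled / output if output else 0.0
    return severity_ascending(ratio, conf["spill_ratio"])
```

Passing a different `conf` mapping is how a cluster with different hardware or workloads could tighten or loosen the same rule without touching the heuristic code.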

19 Job/flow performance history on Dr. Elephant
Problems to solve:
- I want to understand how my job/workflow performance is changing over time.
- Compare performance between runs while developing the job/flow.
- Why is my job/flow so slow today?
- Identify problems and raise alerts before they happen.
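For "why is my job/flow so slow today?", one simple approach is to compare the latest run against a baseline built from earlier runs. A sketch, assuming run times in minutes and a mean-plus-k-standard-deviations rule; the rule, `k` and `min_runs` are illustrative choices, not Dr. Elephant's documented method:

```python
from statistics import mean, stdev
from typing import Sequence


def is_regression(history: Sequence[float], latest: float,
                  k: float = 2.0, min_runs: int = 5) -> bool:
    """Flag `latest` if it exceeds the historical mean by more than k sample std devs."""
    if len(history) < min_runs:
        return False  # not enough earlier runs to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    return latest > mu + k * sigma


past_runs = [60.0, 62.0, 58.0, 61.0, 59.0]  # hypothetical run times in minutes
print(is_regression(past_runs, 90.0), is_regression(past_runs, 62.0))
```

A median-based baseline would be less sensitive to a few past outliers; the mean is used here only to keep the example short.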

20 Dr. Elephant flow history feature

21 Dr. Elephant as a JIRA bot
Production review at LinkedIn
- Before 2014, Hadoop production reviews were a nightmare.
- A painful process for both the flow developer and the Hadoop team:
  - An hour-long initial meeting
  - Manual inspection of code and workflows (recursively!)
- The whole process usually took up to one week (many took much longer).
- 10+ workflows per week!
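An automated review can replace the manual inspection with a policy applied to the per-job heuristic results of a flow. A hypothetical sketch: the severity labels, the `block_at` cutoff and the findings format are assumptions, and posting the findings to a JIRA ticket is left out because it depends on the JIRA client in use.

```python
from typing import Dict, List, Tuple

SEVERITY_ORDER = ["NONE", "LOW", "MODERATE", "SEVERE", "CRITICAL"]


def review_flow(job_results: Dict[str, Dict[str, str]],
                block_at: str = "SEVERE") -> Tuple[bool, List[str]]:
    """Return (approved, findings); findings list every heuristic at or above `block_at`."""
    cutoff = SEVERITY_ORDER.index(block_at)
    findings: List[str] = []
    for job_id in sorted(job_results):  # stable order for a readable ticket comment
        for heuristic, severity in sorted(job_results[job_id].items()):
            if SEVERITY_ORDER.index(severity) >= cutoff:
                findings.append(f"{job_id}: {heuristic} is {severity}")
    return (not findings, findings)


# Hypothetical heuristic results for the jobs in one workflow.
flow = {"job_2": {"spill": "SEVERE", "gc": "LOW"}, "job_1": {"spill": "NONE"}}
approved, findings = review_flow(flow)
print(approved, findings)
```

Making `block_at` a parameter lets the review be strict for critical production flows and more lenient elsewhere.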

22 JIRA Bot

23 Other services on top of Dr. Elephant
- Daily report: every day, Dr. Elephant reports and alerts on the jobs/flows running on the cluster with the worst performance, and on workflows that need to be reviewed again.
- Cost-to-serve analysis: Dr. Elephant's data is used to analyze cluster utilization and resource consumption per organization.
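Both services are aggregations over per-job data that Dr. Elephant already collects. A sketch, assuming each job record carries a hypothetical owning organization and resource figures in GB-hours; the `JobUsage` fields and the use of wasted GB-hours as the "worst performance" ranking key are illustrative assumptions:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobUsage:
    job_id: str
    org: str                # hypothetical owning organization
    gb_hours: float         # resources consumed (memory x time)
    wasted_gb_hours: float  # hypothetical: allocated but unused resources


def worst_jobs(jobs: List[JobUsage], n: int = 3) -> List[str]:
    """Daily-report style ranking: the jobs that wasted the most resources first."""
    ranked = sorted(jobs, key=lambda j: j.wasted_gb_hours, reverse=True)
    return [j.job_id for j in ranked[:n]]


def usage_by_org(jobs: List[JobUsage]) -> Dict[str, float]:
    """Cost-to-serve style rollup: total consumption per organization."""
    totals: Dict[str, float] = defaultdict(float)
    for j in jobs:
        totals[j.org] += j.gb_hours
    return dict(totals)


jobs = [
    JobUsage("a", "ads", 100.0, 10.0),
    JobUsage("b", "feed", 50.0, 40.0),
    JobUsage("c", "ads", 30.0, 25.0),
    JobUsage("d", "search", 5.0, 1.0),
]
print(worst_jobs(jobs, 2), usage_by_org(jobs))
```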

24 Dr. Elephant open source status
- Open sourced in April 2016
- Since then: 10+ contributors, 40+ commits, 50+ topics in the user group
- Patches from external companies: Airbnb (10), Foursquare (3), EverString (3), PayPal (2), Flipkart (2), InMobi (1), DatumKako (1), Vinted (1), ...

25 Dr. Elephant roadmap
- Beyond MR-level analytics: understand higher-level knowledge such as Pig/Hive/Cascading
- Support for other schedulers (Oozie, Airflow) and frameworks (Tez)
- Beyond batch analytics (real-time analytics)
- Analytics for failed jobs (exception fingerprinting, …)
- JVM-level insights (task profiling)
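Exception fingerprinting generally means reducing a stack trace to a stable key so that failures with the same cause group together even when messages, line numbers or ids differ. A minimal sketch of one such scheme (exception class plus the top few frames with digits masked, then hashed); this is a generic technique shown under those assumptions, not a description of any Dr. Elephant implementation:

```python
import hashlib
import re

# Volatile details to mask: hex addresses first, then any remaining digit runs.
_NOISE = re.compile(r"0x[0-9a-fA-F]+|\d+")


def fingerprint(stack_trace: str, top_frames: int = 5) -> str:
    """Group failures whose Java stack traces differ only in volatile details."""
    lines = [ln.strip() for ln in stack_trace.strip().splitlines()]
    if not lines:
        return ""
    exc_type = lines[0].split(":", 1)[0]  # keep the exception class, drop its message
    frames = [ln for ln in lines[1:] if ln.startswith("at ")][:top_frames]
    normalized = [_NOISE.sub("#", f) for f in frames]  # mask line numbers, ids, addresses
    key = "\n".join([exc_type] + normalized)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


oom_a = ("java.lang.OutOfMemoryError: Java heap space\n"
         "\tat org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)\n"
         "\tat org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)")
oom_b = ("java.lang.OutOfMemoryError: GC overhead limit exceeded\n"
         "\tat org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)\n"
         "\tat org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)")
print(fingerprint(oom_a) == fingerprint(oom_b))
```

Keeping only the top frames trades precision for stability: deeper frames vary more between runs, while the top of the trace usually identifies where the failure originates.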

26 Resources
- Dr. Elephant on GitHub
- Dr. Elephant Google Group: dr-elephant-users
- Dr. Elephant engineering blog post: performance-tuning-hadoop-spark
- Dr. Elephant presentation from Hadoop Summit
