1 Community 1.3.0 (Optimize both Yarn & Non Yarn Hadoop clusters)

2 Agenda
– Big Data Trends
– What is Jumbune?
– Description of Components

3 Big Data Trends
– Resource sharing/isolation frameworks: Yarn, Mesos, etc.
– Shared cluster workers (resources)
– Multiple execution engines: MapReduce, Spark, Hama, Storm, Giraph, etc.
– ETL of data from all possible sources into the data lake

4 Hadoop based solution life stages (as on ground) – Cyclic execution
[Diagram labels: Business User, Data Analyst, MapReduce Dev, Logic & Data Test, Devops, Staging Data Production. Questions raised: Bad Logic? Resource Utilization? Bad Data? Monitoring Needs]

5 5 Challenges in Analytical Solutions
1. No common platform across actors to detect root causes
2. Incremental imports may ingest bad data
3. Cluster resources are shared, so optimal utilization is key
4. Getting a model right in custom MapReduce on the first attempts is like hitting a bull’s eye
5. Failures may stem from bad logic or from bad data

6 Intersecting solution Lifecycle Stages
[Diagram labels: Solution Development, Quality Test, Devops, Bulk & Incremental Data]

7 Jumbune
– Flow Analyzer
– Data Validation
– Cluster Monitor
– Job Profiler
“A catalyst to accelerate realization of analytical solutions”

8 Niche offerings
– In-depth, code-level analysis of cluster-wide flow
– Record-level data violation reports
– No deployment on workers: ultra-light agent installation on the Hadoop master only
– Ability to turn cluster monitoring on/off at will, which lessens resource load
– Customizable rack-aware monitoring
– Correlated profiling analysis of phases, throughput and resource consumption
– Ability to work across all Hadoop distributions

9 Components - Recommended Environments
– Dev: Flow Debugger, Data Validation, MR Job Profiler
– QA: Data Validation
– Stage + Perf: MR Job Profiler
– Prod: Cluster Monitoring, Data Validation

10 Supported Deployments (Jumbune)
– Azure, EC2
– All major distributions
– On Premise

11 MapReduce Flow Debugger
– Verifies the flow of input records through the user’s MapReduce implementation
– Drill-down visualization helps developers quickly identify the problem
– Only tool to assist developers in figuring out MapReduce implementation faults without any extra coding
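
The idea of verifying record flow can be illustrated with a toy pipeline. The sketch below is not Jumbune's mechanism or API; it is a minimal, hand-written example (all names such as `mapper`, `run_job` and the counter keys are hypothetical) that counts how many records enter and leave each stage, so a developer can see where records are dropped.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str, counters: Counter) -> Iterator[Tuple[str, int]]:
    """Toy mapper: parses 'key,number' lines and counts every exit path."""
    counters["map.input"] += 1
    parts = line.split(",")
    if len(parts) != 2:
        counters["map.dropped.malformed"] += 1
        return
    key, value = parts
    if not value.isdigit():
        counters["map.dropped.non_numeric"] += 1
        return
    counters["map.output"] += 1
    yield key, int(value)


def run_job(lines: Iterable[str]) -> Tuple[Dict[str, int], Counter]:
    """Runs map, group-by-key and a summing reduce, recording record counts."""
    counters: Counter = Counter()
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line, counters):
            grouped[key].append(value)
    result: Dict[str, int] = {}
    for key, values in grouped.items():
        counters["reduce.input"] += len(values)
        result[key] = sum(values)
        counters["reduce.output"] += 1
    return result, counters


sample_lines = ["a,1", "b,2", "a,x", "broken", "a,3"]
totals, flow = run_job(sample_lines)
for name in sorted(flow):
    print(f"{name}: {flow[name]}")
```

Comparing `map.input` against `map.output` and the two `map.dropped.*` counters shows which branch of the mapper discarded records, which is the kind of question a flow view is meant to answer.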

12 Data Validator
– Detects inconsistencies in data through:
  – Null checks
  – Data type checks
  – Regular expression checks
– Generic way of specifying validation rules
– Provides a record-level report of the anomalies found
– Currently supports HDFS as the lake file system
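
The three check types named on this slide can be sketched as follows. This is an illustration only, not Jumbune's rule format: the `RULES` mapping, the helper names and the comma-delimited sample records are all assumptions made for the example. It applies null, data type and regular expression checks per field and produces a record-level list of violations.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Check = Callable[[Optional[str]], bool]


def not_null(value: Optional[str]) -> bool:
    """Null check: the field must exist and not be blank."""
    return value is not None and value.strip() != ""


def is_int(value: Optional[str]) -> bool:
    """Data type check: the field must parse as an integer."""
    try:
        int(value)  # raises TypeError for None, ValueError for bad text
        return True
    except (TypeError, ValueError):
        return False


def matches(pattern: str) -> Check:
    """Regular expression check: the whole field must match the pattern."""
    compiled = re.compile(pattern)
    return lambda value: value is not None and compiled.fullmatch(value) is not None


# Hypothetical rule set: field index -> list of (rule name, predicate).
RULES: Dict[int, List[Tuple[str, Check]]] = {
    0: [("null", not_null), ("type:int", is_int)],
    1: [("null", not_null)],
    2: [("regex:email", matches(r"[^@\s]+@[^@\s]+\.[^@\s]+"))],
}


def validate(lines: Iterable[str],
             rules: Dict[int, List[Tuple[str, Check]]],
             delimiter: str = ",") -> List[Tuple[int, int, str]]:
    """Returns (line number, field index, violated rule) for every failure."""
    violations = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        for index, checks in rules.items():
            value = fields[index] if index < len(fields) else None
            for name, check in checks:
                if not check(value):
                    violations.append((line_no, index, name))
    return violations


sample = ["1,alice,alice@example.com", "x,,bob@example", "3,carol"]
report = validate(sample, RULES)
for line_no, field, rule in report:
    print(f"line {line_no}, field {field}: failed {rule}")
```

Keeping the rules in a plain mapping from field index to named predicates is one way to make rule specification generic: adding a check means adding an entry, not changing the validation loop.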

13 MR Job Profiling
– Per job, phase-wise:
  – Performance for each JVM
  – Data flow rate
  – Resource usage
– Per job heap sites for Mapper & Reducer
– Per job CPU cycles for Mapper & Reducer
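
As a rough illustration of a phase-wise data flow rate, the sketch below divides the bytes handled in each phase by the time the phase took. The `PhaseSample` type, the phase names and the sample figures are made up for the example; they are not output from Jumbune or from a real job.

```python
from dataclasses import dataclass
from typing import Dict, List

MIB = 1024 * 1024


@dataclass
class PhaseSample:
    """Hypothetical measurement for one phase of a single job."""
    phase: str
    bytes_processed: int
    elapsed_seconds: float


def data_flow_rate_mib_s(sample: PhaseSample) -> float:
    """Throughput of a phase in MiB per second."""
    if sample.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sample.bytes_processed / MIB / sample.elapsed_seconds


samples: List[PhaseSample] = [
    PhaseSample("map", 512 * MIB, 64.0),
    PhaseSample("shuffle", 128 * MIB, 32.0),
    PhaseSample("reduce", 64 * MIB, 8.0),
]

rates: Dict[str, float] = {s.phase: data_flow_rate_mib_s(s) for s in samples}
slowest_phase = min(rates, key=rates.get)

for phase, rate in rates.items():
    print(f"{phase}: {rate:.1f} MiB/s")
print(f"slowest phase: {slowest_phase}")
```

Putting the per-phase rates side by side is what makes a bottleneck visible: here the shuffle phase moves data at half the rate of the other two.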

14 Hadoop Cluster Monitoring
– Data centre and rack aware node view of Yarn and non-Yarn daemons
– Dynamic, interval-based monitoring
– Hadoop JMX and node resource statistics
– Per-file, node-wise replica placement (which nodes hold replicas of a given file?)
– HDFS data placement view (is HDFS balanced?)
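
The two placement questions on this slide can be sketched with plain data structures. The example assumes the block-to-node mapping and the per-node usage figures have already been collected (for instance from the output of `hdfs fsck <path> -files -blocks -locations`); the function names, node names and numbers are hypothetical. The balance test follows the usual HDFS balancer idea of comparing each node's utilization with the cluster average against a percentage threshold.

```python
from collections import Counter
from typing import Dict, List


def replicas_per_node(block_locations: Dict[str, List[str]]) -> Dict[str, int]:
    """Counts how many block replicas of a file each node holds."""
    counts: Counter = Counter()
    for nodes in block_locations.values():
        counts.update(nodes)
    return dict(counts)


def nodes_holding_file(block_locations: Dict[str, List[str]]) -> List[str]:
    """Answers 'which nodes have replicas of this file?'."""
    return sorted({node for nodes in block_locations.values() for node in nodes})


def is_balanced(used: Dict[str, float],
                capacity: Dict[str, float],
                threshold_pct: float = 10.0) -> bool:
    """True if every node's utilization is within threshold_pct of the cluster average."""
    cluster_pct = 100.0 * sum(used.values()) / sum(capacity.values())
    return all(
        abs(100.0 * used[node] / capacity[node] - cluster_pct) <= threshold_pct
        for node in used
    )


# Hypothetical placement of one file's two blocks, replication factor 3.
file_blocks = {
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node1", "node3", "node4"],
}
per_node = replicas_per_node(file_blocks)
holders = nodes_holding_file(file_blocks)

capacity_gb = {"node1": 100.0, "node2": 100.0, "node3": 100.0, "node4": 100.0}
skewed_gb = {"node1": 60.0, "node2": 50.0, "node3": 55.0, "node4": 20.0}
even_gb = {"node1": 50.0, "node2": 45.0, "node3": 55.0, "node4": 50.0}

print(per_node)
print(holders)
print(is_balanced(skewed_gb, capacity_gb), is_balanced(even_gb, capacity_gb))
```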

15 How are we building Jumbune?

16 Let’s Collaborate
– Website: http://jumbune.org
– Contribute: http://github.com/impetus-opensource/jumbune, http://jumbune.org/jira/JUM
– Social: follow @jumbune, use #jumbune, Jumbune Group: http://linkd.in/1mUmcYm
– Forums:
  – Users: users-subscribe@collaborate.jumbune.org
  – Dev: dev-subscribe@collaborate.jumbune.org
  – Issues: issues-subscribe@collaborate.jumbune.org
– Downloads: http://jumbune.org, https://bintray.com/jumbune/downloads/jumbune

17 Thanks
