Published by Meryl Lee. Modified over 9 years ago.
1 Jumbune Data Analyzer
2 Agenda
Enterprise Data Lake
Data Analyzer
Data Analysis Challenges
?
3 One Unified System: An Enterprise Data Lake
Data is extracted, transformed, and loaded (ETL) from all possible sources into the Enterprise Data Lake through real-time, micro-batch, and batch ingestion.
A unified hub makes analysis, management, and access of data easier.
An enterprise data lake enables ecosystem tools to manage data collaboratively.
It is a place to store all data in its original fidelity, with the flexibility to run a variety of enterprise workloads.
4 Key elements of an Enterprise Data Lake (Big Data)
Data Quality: data values conform to business KPIs
Data Profiling: statistical assessment of data
Data Governance: management of data
Data Lineage: definition of the data lifecycle
Data Security: protection of data from unauthorized users
5 Major challenges in Data Analysis
Incremental imports may ingest bad data
Analyzing anomalies in HDFS data
Tracking data quality over time
Tracing bad data among billions of rows
Displaying concise, meaningful results
6 Jumbune’s Data Analyzer
7 Gain better control over Data Analysis
(Diagram: Control, Analyse, Profile, Quality Timelines, Violations, Business Rules, Anomalies)
Provides a centralized dashboard for profiling data quality, giving better control.
Leverages Jumbune's infrastructure to provide remote profiling capabilities.
No data movement is required to perform data profiling.
No specialized MapReduce or coding skills are required to validate data.
8 Offering Data Quality and Data Profiling to the Enterprise Data Lake
Data Quality Timeline: traces how well data quality is preserved over time, even in environments that offload massive volumes of data. Data quality is monitored in real time and tracked against customizable KPIs.
Data Profiling: statistical assessment of data values within a data set for consistency, uniqueness, and logic, and evaluation of the data profiles against business rules.
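To make the data quality timeline concrete, here is a minimal Python sketch, not Jumbune's implementation. It assumes each quality run over an ingested batch reports a total record count and the number of records with at least one violation, and it flags the runs whose clean-record percentage falls below a KPI threshold. The names QualityRun and breaches, and the percentage-based KPI, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class QualityRun:
    """One data quality run over an ingested batch (hypothetical record shape)."""
    run_date: date
    total_records: int
    violating_records: int

    @property
    def quality_pct(self) -> float:
        # Share of records with no violations, as a percentage.
        # An empty batch is treated as fully clean.
        if self.total_records == 0:
            return 100.0
        clean = self.total_records - self.violating_records
        return 100.0 * clean / self.total_records


def breaches(timeline: List[QualityRun], kpi_pct: float) -> List[date]:
    """Return the dates of runs whose quality fell below the KPI threshold."""
    return [run.run_date for run in timeline if run.quality_pct < kpi_pct]


# Illustrative numbers only.
timeline = [
    QualityRun(date(2016, 1, 1), 1000, 5),
    QualityRun(date(2016, 1, 2), 1000, 40),
    QualityRun(date(2016, 1, 3), 2000, 10),
]
print(breaches(timeline, kpi_pct=98.0))
```

With these illustrative runs the quality percentages are 99.5, 96.0, and 99.5, so against a 98% KPI only the second run is flagged. Tracking the percentage rather than the raw violation count keeps runs of different batch sizes comparable on one timeline.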
9 Data Analysis Process (Jumbune Data Analysis Component)
HDFS/NFS → Records Analysis → Data Profiling & Quality Reports
10 Data Quality: A generic way of testing for anomalies
Validates data for inconsistencies through null checks, data type checks, and regular expressions.
Provides in-depth, record-level data violation reports that can be drilled down to line and field level.
Lets users specify data quality requirements generically, according to their own data lake.
Makes seemingly impossible quality checks on a Big Data lake possible.
Does not require data to be moved out of Hadoop to test for anomalies.
Currently, Jumbune supports HDFS and NFS as data lakes.
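The three kinds of checks listed above can be illustrated with a small Python sketch. It is not Jumbune's API or configuration format; it assumes comma-delimited records and a hypothetical FieldRule per field (mandatory or nullable, an expected type, an optional regex), and it reports each violation with its line and field position, in the spirit of the drill-down reports described above. Jumbune validates data without moving it out of Hadoop; this sketch simply reads lines in memory.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical per-field rule combining a null, a type, and a regex check."""
    nullable: bool = True
    dtype: type = str               # must raise ValueError on bad input (e.g. int, float)
    pattern: Optional[str] = None   # regex the whole value must match


@dataclass(frozen=True)
class Violation:
    line: int    # 1-based line number in the input
    field: int   # 0-based field index within the record
    check: str   # "null", "type" or "regex"
    value: str   # offending raw value


def _parses_as(value: str, dtype: type) -> bool:
    try:
        dtype(value)
        return True
    except ValueError:
        return False


def validate(lines: Iterable[str], rules: List[FieldRule], sep: str = ",") -> List[Violation]:
    """Apply the rules to each delimited record and report line/field-level violations."""
    violations: List[Violation] = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        for idx, rule in enumerate(rules):
            # A missing trailing field is treated the same as an empty one.
            value = fields[idx].strip() if idx < len(fields) else ""
            if value == "":
                if not rule.nullable:
                    violations.append(Violation(line_no, idx, "null", value))
                continue  # type and regex checks do not apply to empty values
            if not _parses_as(value, rule.dtype):
                violations.append(Violation(line_no, idx, "type", value))
            elif rule.pattern is not None and re.fullmatch(rule.pattern, value) is None:
                violations.append(Violation(line_no, idx, "regex", value))
    return violations


rules = [
    FieldRule(nullable=False, dtype=int),    # id: mandatory integer
    FieldRule(pattern=r"[^@\s]+@[^@\s]+"),   # email: optional, simple pattern
    FieldRule(dtype=float),                  # score: optional decimal
]
records = ["1,a@x.com,9.5", ",b@x.com,3", "3,not-an-email,abc"]
for v in validate(records, rules):
    print(v)
```

On the sample records this reports a null violation at line 2, field 0, a regex violation at line 3, field 1, and a type violation at line 3, field 2. The regex check runs only when the type check passes, so each field yields at most one violation.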
11 Data Profiling: Provides lake insights
(Diagram: Remote, Centralized, Integrate, Generic)
Statistical analysis of data values present in the enterprise data lake.
Computes various profiles that help you become familiar with the data.
Evaluates the structure of the data set in the enterprise data lake against a set of business rules.
Helps determine whether existing data can be used for further analytics.
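The kind of profiling described above can be sketched in a few lines of Python. This is a hypothetical, single-machine illustration rather than Jumbune's implementation (which works against data in HDFS or NFS). It assumes a column arrives as a list of raw strings, computes null, distinct, and uniqueness counts plus numeric range and mean, and scores the column against a business rule expressed as a predicate. The function names and the age rule are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List


def profile_column(values: List[str]) -> Dict[str, object]:
    """Basic statistical profile of one column of raw string values."""
    non_empty = [v for v in values if v.strip() != ""]
    counts = Counter(non_empty)
    numeric = []
    for v in non_empty:
        try:
            numeric.append(float(v))
        except ValueError:
            pass  # non-numeric values are left out of min/max/mean
    profile: Dict[str, object] = {
        "count": len(values),
        "nulls": len(values) - len(non_empty),
        "distinct": len(counts),
        # Uniqueness: 1.0 means every non-empty value occurs exactly once.
        "unique_ratio": len(counts) / len(non_empty) if non_empty else 0.0,
        "most_common": counts.most_common(1)[0] if counts else None,
    }
    if numeric:
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
        profile["mean"] = sum(numeric) / len(numeric)
    return profile


def rule_pass_rate(values: List[str], rule: Callable[[str], bool]) -> float:
    """Fraction of non-empty values that satisfy a business rule (a predicate)."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return 1.0
    return sum(1 for v in non_empty if rule(v)) / len(non_empty)


ages = ["34", "", "29", "34", "150"]
print(profile_column(ages))
# Hypothetical business rule: an age must lie between 0 and 120.
print(rule_pass_rate(ages, lambda v: 0 <= float(v) <= 120))
```

For the sample ages, the profile reports 5 values, 1 null, 3 distinct values (uniqueness 0.75), "34" as the most common value, and a range of 29 to 150. The age rule passes for 75% of the non-empty values, which points at the outlier 150 as a candidate for review before the data is used for further analytics.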
12 Let’s provision a clean Enterprise Data Lake
Website: http://jumbune.org
Contribute: http://github.com/impetus-opensource/jumbune, http://jumbune.org/jira/JUM
Social: Follow @jumbune, use #jumbune, Jumbune Group: http://linkd.in/1mUmcYm
Forums: Users: users-subscribe@collaborate.jumbune.org; Dev: dev-subscribe@collaborate.jumbune.org; Issues: issues-subscribe@collaborate.jumbune.org
Downloads: http://jumbune.org, https://bintray.com/jumbune/downloads/jumbune