Pentaho 7.1
Current State Today
[Diagram] Roles: Data Engineering, Data Prep, Analytics. Pipeline stages: Ingestion, Processing, Blending, Data Delivery, Data Discovery / Analysis, Analysis & Dashboards. Cross-cutting capabilities: Administration, Security, Lifecycle Management, Data Provenance, Dynamic Data Pipeline, Monitoring, Automation.
Future Vision: A Single Flow
[Diagram: the same roles, pipeline stages, and cross-cutting capabilities as on the Current State slide, unified into a single flow]
Goal: build a single platform where data engineers, data analysts, business analysts, and data scientists can enter at any point and make data useful for the business.
Industry Challenges: An Evolving Big Data Landscape
People who work with data operate in a turbulent, rapidly changing environment. They face three main challenges:
- Growth in the volume and variety of data
- Rapidly evolving big data technologies and landscape
- Disjointed tools
Pentaho 7.1 is built to ease these challenges.
Recap: Pentaho 7.0: Analyze Data Anywhere in the Data Pipeline
[Diagram: Data Engineering, Data Prep, and Analytics across the pipeline, from Ingestion to Analysis & Dashboards, with Administration, Security, Lifecycle Management, Data Provenance, Dynamic Data Pipeline, Monitoring, and Automation]
Pentaho 7.0 bridged the gap between data preparation and analytics with a visual data experience available anywhere in the data pipeline:
- Bringing analytics into data prep
- Sharing analytics during data prep
It also added governance, security, and big data ecosystem support for a blended data world:
- New Spark capabilities
- Enhanced metadata injection
- Hadoop security
- Simplified configuration, deployment, and administration
Introducing Pentaho 7.1
Adaptive execution and improved visualizations make users more productive and improve big data job performance across the entire data pipeline.
[Diagram: Data Engineering, Data Prep, and Analytics across the pipeline, from Ingestion to Analysis & Dashboards, with Administration, Security, Lifecycle Management, Data Provenance, Dynamic Data Pipeline, Monitoring, and Automation]
Increased productivity and job performance for big data:
- Adaptive execution on any engine, starting with Spark
- Increased cloud support with HDInsight
More enterprise-level security for Hortonworks:
- Kerberos impersonation support
- Ranger support
Improved analytics at every stage of the data pipeline:
- Visual Data Exploration enhancements
- Support for third-party visualizations
CHALLENGE: Productivity and Job Performance for Big Data Jobs
Increased User Productivity and Job Performance for Big Data Jobs
- Adaptive execution on any engine, starting with Spark
- Adaptive execution for Spark
- Increased cloud support with HDInsight
Adaptive Execution on Any Engine, Starting with Spark
Build Once, Execute on Any Engine
Challenge: With big data technology changing rapidly, coding for each execution engine can be time-consuming, or impossible with existing resources.
Solution: Future-proof data integration and analytics development in a drag-and-drop visual development environment that eliminates the need for specialized coding and API knowledge, and switch seamlessly between execution engines to fit data volume and transformation complexity.
- Enables anyone, not just Java developers, to work with processing engines such as Spark
- Build once, execute on any engine: switch engines based on data complexity without rewriting transformation logic
Competitors such as Talend also support multiple execution engines, but the number of transformation steps they support shrinks when moving from their native engine to MapReduce, and shrinks further on Spark. Pentaho, by contrast, covers virtually all available steps across execution engines.
Adaptive Execution for Spark
Process Big Data Faster on Spark Without Manual Coding
Challenge: Finding the talent and time to work with Spark and newer big data technologies.
Solution: Develop Spark applications more easily in the drag-and-drop visual development environment of PDI (Pentaho Data Integration), using adaptive execution to ingest, process, and blend data from a range of big data sources and scale out on Spark clusters. This lets more people be productive with Spark applications. Pentaho is the only vendor to support Spark with all data integration steps in a visual drag-and-drop environment.
Increased Cloud Support with HDInsight
Store and Process Big Data in the Cloud with HDInsight
Challenge: Choosing big data storage and processing options for cloud, on-premises, or hybrid deployments.
Solution: Customers who already use Microsoft Azure HDInsight can now use Pentaho's capabilities seamlessly, with support for Microsoft Azure HDInsight, Azure SQL, and Azure SQL Server. This gives more options to store, and more importantly to process, big data in hybrid, on-premises, and public cloud environments.
CHALLENGE: Enterprise-Level Security for Hadoop Deployments
Additional Enterprise-Level Security for Hortonworks
- Kerberos impersonation support
- Ranger support
Increased Kerberos Impersonation Support
Increase Security with Hortonworks Deployments
Challenge: Authentication security vulnerabilities in Hortonworks deployments.
Solution: Increased Kerberos impersonation support reduces risk and provides more secure multi-user Hadoop data integration, better big data governance, and protection of the cluster from intruders.
Ranger Support: Manage Role-Based Permissions on Hortonworks
Challenge: Governance, security, and risk around authorization and role-based permissions in Hortonworks deployments.
Solution: Pentaho 7.1 adds enterprise-grade compatibility with Ranger for authorization and role-based access to specific data sets on Hortonworks. This ensures business access rules are enforced across Hadoop data and components, promoting governance, protecting resources, and reducing risk.
CHALLENGE: Multiple Tools and Siloed Processes in Data Prep
Improved Analytics at Every Stage of the Data Pipeline
- Visual Data Exploration enhancements
- Integration with third-party visualizations
Recap: Visual Data Exploration
Access visualizations during data prep for inspection or prototyping
Challenge: Users cannot view visualizations without switching in and out of tools.
Solution: Visual Data Exploration provides access to analytics during data preparation, so users can spot-check data issues immediately, without switching tools or waiting until the very end to discover data quality problems. In addition, IT and the business can collaborate and iterate faster, shortening the cycle from raw data to meaningful analytics.
Visual Data Exploration Enhancements
Drill-Down Exploration
Access visualizations during data prep for data inspection or prototyping.
Enhancement: Users can now drill down further into visualizations within Visual Data Explorer, clicking to drill into various hierarchies.
Visual Data Exploration Enhancements
New Visualizations: Heat Grid, Geo Map, Sunburst
Access visualizations during data prep for data inspection or prototyping.
Enhancement: New visualizations for expanded prototyping: heat grid, geo map, and sunburst.
- Heat grid: shows two dimensions and two measures at once. Most useful for relative comparisons at the intersection of two dimensions; for example, sales metrics for each combination of month and region (as shown).
- Sunburst: shows how a measure is distributed across several categories or attributes, and is especially useful for showing multiple levels of a hierarchy at once; for example, a breakdown of sales by state (inner ring) and city (outer ring).
- Geo map: measures are represented by dot size and color, with pan and zoom actions. Users can now explicitly define latitude and longitude fields when creating a location attribute for a model in Annotate Stream, with results shown in Data Explorer (DE) and Analyzer.
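To make the heat grid and sunburst data shapes concrete, here is a minimal Python sketch over hypothetical sales rows. The field names, values, and the `heat_grid` and `sunburst` helpers are invented for illustration; this is not how Pentaho computes its visualizations. The heat grid aggregates two measures per (month, region) cell, while the sunburst rolls one measure up a state and city hierarchy.

```python
from collections import defaultdict

# Hypothetical sales rows (illustrative only): (month, region, state, city, sales, units)
ROWS = [
    ("Jan", "East", "NY", "New York", 120.0, 10),
    ("Jan", "West", "CA", "Los Angeles", 80.0, 6),
    ("Feb", "East", "NY", "Buffalo", 50.0, 4),
    ("Feb", "East", "NY", "New York", 70.0, 5),
    ("Feb", "West", "CA", "San Diego", 40.0, 3),
]


def heat_grid(rows):
    """Two dimensions (month, region) -> two measures (sales, units) per cell."""
    cells = defaultdict(lambda: [0.0, 0])
    for month, region, _state, _city, sales, units in rows:
        cell = cells[(month, region)]
        cell[0] += sales  # first measure, e.g. mapped to cell color
        cell[1] += units  # second measure, e.g. mapped to cell size
    return {key: tuple(val) for key, val in cells.items()}


def sunburst(rows):
    """One measure (sales) rolled up by state (inner ring) and city (outer ring)."""
    tree = defaultdict(lambda: defaultdict(float))
    for _month, _region, state, city, sales, _units in rows:
        tree[state][city] += sales
    # Each inner-ring segment carries its total plus its outer-ring children.
    return {
        state: {"total": sum(cities.values()), "children": dict(cities)}
        for state, cities in tree.items()
    }


if __name__ == "__main__":
    print(heat_grid(ROWS))
    print(sunburst(ROWS))
```

Keeping both measures in one cell tuple mirrors the heat grid's "two dimensions, two measures" layout, while the nested dictionary mirrors the sunburst's ring structure, where each parent total equals the sum of its children.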
Integration with Third-Party Visualizations
More Easily Integrate Third-Party Visualizations
Challenge: Integrating visualizations that are not available out of the box.
Solution: Integrate visualizations from third-party libraries (D3, FusionCharts, Highcharts, etc.) through an easier-to-use, more flexible API and a more robust developer framework, including samples and documentation:
- Easier integration of new visualizations into Pentaho
- Better developer APIs
- Reusability of visualizations in Data Explorer and Pentaho Analyzer
- Better documentation
Demonstration
Questions?
Thank You