#1 Modern Platform to Turn Data into a Strategic Asset ©2016 RapidMiner, Inc. All rights reserved. May 24, 2016 Featuring Howard Dresner Predictive Analytics: Extracting Big Value from Big Data
Speakers
Howard Dresner, Chief Research Officer, Dresner Advisory Services
Lars Bauerle, Chief Product Officer, RapidMiner
Housekeeping
– The recording will be available within 1-2 business days; a link will be emailed to you.
– You may type your questions in the Questions panel on the screen at any time.
– We will leave time at the end for a Q&A session.
Dresner Advisory Services Advanced and Predictive Analytics and Big Data Copyright 2016 Dresner Advisory Services, LLC
Definitions
Advanced and Predictive Analytics: Includes statistics, modeling, machine learning, and data mining to analyze facts to make predictions about future, or otherwise unknown, events.
Big Data Analytics: We define big data analytics as systems that enable end-user access to and analysis of data contained and managed within the broader Hadoop ecosystem.
RapidMiner for Advanced/Predictive Analytics and Big Data
Lars Bauerle, Chief Product Officer, RapidMiner
RapidMiner is #1 OPEN SOURCE
– Leader (2016, 2015 & 2014): Gartner Magic Quadrant for Advanced Analytics Platforms
– Strong Performer (2015): Forrester Wave on Big Data Predictive Analytics
– Innovation Winner (2015): Wisdom of Crowds for Advanced & Predictive Analytics, Big Data Analytics & End-User Data Preparation
– #1 Open-Source Platform (2015, 2014, 2013): Data Mining & Analytics Software Poll
RapidMiner is UNIQUE
– Open-Source Innovation: Cutting-edge data science platform designed for the Big Data era
– Frictionless Operationalization: Prescriptive analytics closes the loop between insight & action
– Lightning-Fast Data Science: Seamless orchestration accelerates the predictive analytics lifecycle
– Self-Service Predictive Analytics: Effortless & guided design democratizes data science
ACCELERATES Time-to-Value
– DATA PREP: Speed & optimize ALL data exploration, blending & cleansing tasks
– MODEL & VALIDATE: Rapidly prototype and confidently validate predictive models
– OPERATIONALIZE: Easily deploy & maintain models and embed analytic results
CONNECT TO ANY DATA SOURCE, ANY FORMAT, AT ANY SCALE
SUPPORT FOR ALL MAJOR BI, DATA VISUALIZATION & BUSINESS APPLICATIONS
STREAMLINED Data Preparation
Speed & optimize ALL data exploration, blending & cleansing tasks
– A powerful chart engine offers statistical overviews, graphs & charts for data exploration
– Rapidly import, combine and transform structured & unstructured data for deeper predictive insights
– Accelerate advanced data blending tasks with powerful feature weighting, selection & generation
– Expertly cleanse data with anomaly & outlier detection, missing value handling and normalization
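The cleansing steps named on this slide (missing value handling, normalization, outlier detection) can be illustrated with a minimal plain-Python sketch. This is not RapidMiner code: the function names, the mean-imputation strategy and the z-score threshold are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def impute_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # raises StatisticsError if every value is missing
    return [fill if v is None else v for v in values]


def z_scores(values):
    """Normalize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def flag_outliers(values, threshold=3.0):
    """Return the indices whose absolute z-score exceeds the (hypothetical) threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]
```

A typical order would be to impute first, then flag outliers, then normalize whatever remains, although the right order depends on the data.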
POWERFUL Modeling & Validation
Rapidly prototype and confidently validate predictive models
– A breadth of machine learning functions enhances supervised & unsupervised learning
– Automatic techniques for model building, selection & optimization simplify each step in the process
– Prescriptive algorithms, optimization loops & guided recommendations reveal optimal actions
– Modular cross-validation & honest performance calculations ensure that results will deliver the expected outcome
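As a rough illustration of what cross-validation with an honest performance estimate means, the sketch below scores a model only on rows it never saw during training. It is a generic k-fold loop in plain Python, not RapidMiner's implementation; the majority-class learner and all function names are assumptions made for the example.

```python
import random
from collections import Counter


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k disjoint folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, k, fit, predict, seed=0):
    """Return the mean accuracy over k folds, scoring only held-out rows."""
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Toy learner (hypothetical): always predicts the most common training label.
def fit_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model
```

Any learner exposing a `fit` and `predict` pair with these shapes could be dropped in; the point is only that the held-out fold never influences the model it is scored against.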
FRICTIONLESS Operationalization: the details
Easily deploy & maintain models and embed analytic results
– Scheduled or event-driven model execution supports human decisions & automated actions
– Embed results into data visualizations, business applications & web services
– Dynamically manage models to ensure continued updates and accuracy, including tuning, versioning & alerting
– Support for cloud, big data/Hadoop & server-based infrastructure for separation of design and execution
Demo
Big Data - Hadoop
Big Data – Hadoop Challenges
How to EXTRACT VALUE from Hadoop
– What to actually do with all the data being collected
– There is opportunity to improve the business in there
– But, how do we do it?
SKILLS GAP is a major adoption inhibitor
– Lots of technology
– Rapidly changing
– Very technical: programming
Different Approaches to Big Data Analytics
Sampling, Grid Computing, Native Distributed Algorithms
Approach 1: Sampling
Approach 1: Sampling
Data Movement & Processing
– Pulls sample data from HDFS/Hive/Impala
– In the analytics tool (DV, PA, programming)
When to use it
+ Only data exploration / data understanding
+ Early prototyping on prepared and clean data
+ Machine Learning modeling with very few and basic patterns (e.g. only a handful of columns and binary prediction target)
When NOT to use it
− Large number of columns in the data
− Need to blend large data sets (e.g. large-scale joins)
− Complex Machine Learning models
Diagram: pieces of data are pulled out of Hadoop; the analytics tool performs the calculations.
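The slide does not say how the sample is drawn. One common way to pull a fixed-size uniform sample from a table too large to hold in memory is reservoir sampling (Algorithm R), sketched here in plain Python. The `rows` iterable stands in for records streamed out of HDFS/Hive/Impala; that, the function name and the seed parameter are assumptions for illustration.

```python
import random


def reservoir_sample(rows, sample_size, seed=0):
    """Draw a uniform random sample of up to sample_size rows in a single pass."""
    rng = random.Random(seed)
    sample = []
    for seen, row in enumerate(rows):
        if seen < sample_size:
            # Fill the reservoir with the first sample_size rows.
            sample.append(row)
        else:
            # Keep the new row with probability sample_size / (seen + 1).
            j = rng.randint(0, seen)  # randint is inclusive on both ends
            if j < sample_size:
                sample[j] = row
    return sample
```

Only the sampled rows ever reach the analytics tool, which is why this approach suits exploration and early prototyping but not large-scale joins.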
Approach 2: Grid Computing
Approach 2: Grid Computing
Data Movement and Processing
– Only results are moved, data remains in Hadoop
– Custom single-node application running on multiple Hadoop nodes
When to use it
+ Task can be performed on smaller, independent data subsets
+ Complex data pre-processing
When NOT to use it
− Complex Machine Learning models
− Lots of interdependencies between data subsets
Diagram labels: Analytics Tool, Application, App (one per node), Calculations, Results.
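The grid pattern can be sketched in plain Python: the same single-node function runs on each independent subset, and only the small per-subset results travel back. A local thread pool stands in for the Hadoop nodes here; the partition layout and function names are assumptions, and nothing below is a Hadoop or RapidMiner API.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_partition(partition):
    """Single-node logic: sees one independent subset, returns a small result."""
    name, values = partition
    return name, {"rows": len(values), "total": sum(values)}


def run_on_grid(partitions, task, max_workers=4):
    """Apply the same task to every partition in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The task never looks across partitions, so no coordination is needed.
        return dict(pool.map(task, partitions))
```

For example, `run_on_grid([("north", [1, 2, 3]), ("south", [10])], summarize_partition)` yields one summary per partition. The sketch also shows the limit named on the slide: a task that needs to compare rows across partitions does not fit this shape.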
Approach 3: Native Distributed Algorithms
Approach 3: Native Distributed Algorithms
Data Movement and Processing
– Only results are moved, data remains in Hadoop
– Executed by native Hadoop tools: Hive, Spark, H2O, Pig, MapReduce, etc.
When to use it
+ Complex Machine Learning models needed
+ Lots of interdependencies inside the data (e.g. graph analytics)
+ Need to blend and cleanse large data sets (e.g. large-scale joins)
When NOT to use it
− Data is not that large
− Sample would reveal all interesting patterns
− You don’t want to do a lot of programming in multiple languages
Diagram labels: Analytics Tool, Instructions pushed to Hadoop, Calculations, Results.
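What sets a distributed algorithm apart from the grid pattern is that partial results from every node must be merged into one global answer. The sketch below emulates that map-and-reduce shape in plain Python for a global mean and variance. It is not Hive, Spark or MapReduce code; the function names and the choice of statistic are assumptions for illustration.

```python
from functools import reduce


def map_partition(values):
    """Map step, run where the data lives: emit (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def merge(a, b):
    """Reduce step: partial aggregates combine by element-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def global_mean_variance(partitions):
    """Combine per-partition aggregates into the global mean and population variance."""
    partials = map(map_partition, partitions)
    count, total, total_sq = reduce(merge, partials, (0, 0.0, 0.0))
    if count == 0:
        raise ValueError("no rows in any partition")
    mean = total / count
    # E[x^2] - E[x]^2; simple, but can lose precision on very large values.
    return mean, total_sq / count - mean * mean
```

Only three numbers per partition leave each node, which matches the slide's point that results move while the data stays in Hadoop.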
Different Approaches to Big Data Analytics
Sampling, Grid Computing, Native Distributed Algorithms
Which one to use for a given use case?
Typical projects need all three to succeed
Sampling, Grid Computing, Native Distributed Algorithms
RapidMiner Predictive Analytics Platform
Single Analytics Platform to support all three
– Sampling: Pull data from Hive/Impala
– Grid Computing: Use operators SparkRM, PySpark, SparkR
– Native Distributed Algorithms: Spark, Hive, Impala, custom UDFs, Mahout, Pig
RapidMiner capabilities for all use cases, in a GUI environment, in a single platform
What it looks like
RapidMiner capabilities for all use cases, in a GUI environment, in a single platform
RapidMiner for Big Data (Hadoop)
RapidMiner Radoop extends predictive analytics to Hadoop and Spark
– We speak Hadoop so you don’t have to: Translates predictive analytics into native Hadoop, so you concentrate on creating analytics, not Hadoop programming
– COMPLETE insights into your Big Data: Pushes analytic instructions into Hadoop for computation, so you can analyze the full breadth and variety of your Big Data
– Use your favorite Hadoop scripts, too! Incorporates SparkR, PySpark, Pig and HiveQL
– Safe and sound: Integrates with Kerberos authentication and supports data access authorization; seamless for users, easy administration for IT
TRANSFORMATIONAL Business Impact
– Build Better Predictive Models Faster: Accelerate the creation of high-value predictive analytics while streamlining low-value tasks
– Easily Use Predictive Analytics: Confidently extract the hidden value from your data using intuitive predictive analytics
– Operationalize Competitive Advantage: Leverage prescriptive analytics in all your decisions to achieve better outcomes
– Bridge the Data Science Skills Gap: Empower data scientists and citizen data scientists to feed the insatiable demand for predictive insights
Roles shown: CHIEF ANALYTICS OFFICER, CHIEF EXECUTIVE OFFICER, DATA SCIENTIST, BUSINESS ANALYST
Q & A
#1 Agile Predictive Analytics Platform for Today’s Modern Analysts
Download RapidMiner