Data Summit 2016 H104: Building Hadoop Applications Abhik Roy Database Technologies - Experian LinkedIn Profile:

Presentation transcript:


Building Scalable Machine Learning Applications on Apache Spark
- Introduction to machine learning
- Apache Spark as a data processing platform for predictive analytics
- Apache Spark reference architecture and basic building blocks
- Case study of linear regression
- The future of machine learning and how it can help build cognitive apps

Machine Learning
- Machine learning is the study and construction of algorithms that can learn from data and make predictions on it.
- It offers a deep capability for recognizing patterns in data.
- It uses principles of mathematics, computational statistics, and computer processing to develop predictive data models.

Types of Machine Learning
Supervised learning
- The computer is given training data that teaches it the relationships between predictor and target variables.
- It is then presented with a test data set containing only the predictor variables and asked to predict the values of the target variables.
Unsupervised learning
- No training sets are provided; the computer is left on its own to find structures and patterns in the data.
- Examples include clustering data based on similar attributes.
Reinforcement learning
- A computer program interacts with a dynamic environment to achieve a certain goal and must be able to judge how well it is progressing toward that goal.
- Examples include driving a car automatically.

Technology Platform for Machine Learning
- Machine learning involves very complex computations and CPU- and memory-intensive number crunching.
- Developing machine learning algorithms is a very expensive process, so the processing platform must be able to scale horizontally as the volume of data grows.
- The platform needs fault-tolerant, redundant systems to minimize the impact of hardware component failures.
- Ideally it supports open source computational and statistical processing languages such as R and Python.

Apache Spark, with its real-time in-memory processing capability and the built-in redundancy and fault tolerance of its executor nodes, forms an ideal platform for building predictive data analytics applications.

Apache Spark Reference Architecture
[Figure: reference architecture diagram, not included in the transcript]

Apache Spark Building Blocks
- Resilient Distributed Datasets (RDDs) and the Spark Context
- Sparse and dense vectors, and labeled points
- The Spark SQL Context and Data Frames for Spark ML

Vectors and Labeled Points
- Dense vector: (4.2, 3.1, 6.0)
- Sparse vector:
  - Original: (2.6, 0.0, 0.0, 3.1, 0.0)
  - Representation: (5, (0, 3), (2.6, 3.1)), that is, the vector size, the indices of the non-zero entries, and their values
- Labeled point: contains a "label" (the target variable) and a list of "features" (the predictors), for example LabeledPoint(1.0, Vectors.dense(2.6, 0.0, 0.0, 3.1, 0.0))
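The sparse representation above can be reproduced in a few lines of plain Python. This is only an illustrative sketch: the helper names to_sparse and to_dense and the LabeledPoint tuple are hypothetical stand-ins, not the Spark classes themselves.

```python
from collections import namedtuple

# Hypothetical stand-in for a labeled point: a target value plus its predictors.
LabeledPoint = namedtuple("LabeledPoint", ["label", "features"])


def to_sparse(dense):
    """Return (size, indices, values), keeping only the non-zero entries."""
    indices = [i for i, value in enumerate(dense) if value != 0.0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values


def to_dense(size, indices, values):
    """Rebuild the full vector from its sparse (size, indices, values) form."""
    dense = [0.0] * size
    for i, value in zip(indices, values):
        dense[i] = value
    return dense


original = [2.6, 0.0, 0.0, 3.1, 0.0]
sparse = to_sparse(original)
print(sparse)  # (5, [0, 3], [2.6, 3.1])

point = LabeledPoint(label=1.0, features=original)
print(point.label, point.features)
```

The sparse form stores two numbers instead of five here; the saving grows with the share of zeros in the vector.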

Linear Regression (Multivariate)
What is linear regression?
- It is a method of investigating the functional relationship between variables.
- It estimates the value of a dependent variable from the values of independent variables using a linear equation.
- Regression analysis is typically used when the dependent and independent variables are continuous and correlated.

Example of a Simple Linear Equation
Y = αX + β
The plot on this slide shows a simple linear equation with a single variable X, which is used to find the value of Y.
- α is the slope, the change in Y per unit change in X (ΔY/ΔX).
- β is the intercept, the value of Y when X = 0.
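As a small worked example of slope and intercept, the sketch below recovers both from two points on a line. The function name line_through and the two points are made up for illustration.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # change in Y divided by change in X
    intercept = y1 - slope * x1     # value of Y when X = 0
    return slope, intercept


slope, intercept = line_through((1.0, 5.0), (3.0, 9.0))
print(slope, intercept)  # 2.0 3.0
```

With these two points the line is Y = 2X + 3, so Y rises by 2 for every unit of X and equals 3 at X = 0.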

Multivariate Linear Regression
There are multiple independent variables x₁, x₂, …, xₙ, which are used to calculate the value of the variable Y. The relationship can be expressed in the form
Y = α₁x₁ + α₂x₂ + α₃x₃ + … + αₙxₙ + β
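Evaluating a multivariate linear equation amounts to a weighted sum of the predictors plus the intercept. A minimal sketch, with made-up coefficient values and a hypothetical predict helper:

```python
def predict(coefficients, intercept, features):
    """Weighted sum of the features plus the intercept."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return sum(a * x for a, x in zip(coefficients, features)) + intercept


# Illustrative numbers only: three predictors and their coefficients.
y = predict(coefficients=[2.0, -1.0, 0.5], intercept=4.0, features=[1.0, 3.0, 2.0])
print(y)  # 2.0 - 3.0 + 1.0 + 4.0 = 4.0
```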

Model Accuracy of a Linear Regression
Fitting a line in linear regression
- A linear regression algorithm tries to fit the line that gives the smallest residuals.
- The residual sum of squares is the sum of the squared vertical distances between the data points and the fitted line.
Goodness of fit
- R-squared is a measure of how close the data are to the fitted line.
- It ranges from 0 to 1; the higher the value, the better the fit.
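Both quantities can be computed directly from the observed and predicted values. The sketch below uses the common definition of R-squared as one minus the ratio of the residual sum of squares to the total sum of squares; the function names and sample numbers are illustrative.

```python
def residual_sum_of_squares(actual, predicted):
    """Sum of squared vertical distances between observations and predictions."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """1 - RSS / TSS, where TSS is the squared spread of the data around its mean."""
    mean_y = sum(actual) / len(actual)
    total = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - residual_sum_of_squares(actual, predicted) / total


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]   # only the last prediction is off, by 1

rss = residual_sum_of_squares(actual, predicted)
r2 = r_squared(actual, predicted)
print(rss, round(r2, 3))
```

Here the residual sum of squares is 1 and the total sum of squares is 5, so R-squared comes out at about 0.8. A perfect fit would give exactly 1.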

The Linear Regression Machine Learning Process
- Training data (predictors + target variables) -> Apache Spark and ML -> development of the linear equation
- Test data (predictors) -> linear equation -> predicted target variables
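The train-then-predict flow can be illustrated with ordinary least squares on a single predictor. This is a plain-Python sketch with invented numbers, not the Spark ML implementation; fit_simple is a hypothetical helper name.

```python
def fit_simple(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training data: predictors and target values (generated from y = 2x + 1).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple(train_x, train_y)

# Test data: predictors only; the fitted equation supplies the targets.
test_x = [5.0, 6.0]
predictions = [slope * x + intercept for x in test_x]
print(slope, intercept, predictions)
```

Because the training points lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1, and the predictions for the test predictors are 11 and 13.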

Apache Spark Machine Learning Process Flow
1. Load the data into a Spark RDD.
2. Transform the RDD: filtering, data type conversions, centering and scaling, etc.
3. Convert to labeled points for Spark ML to work with.
4. Create a Data Frame using the Spark SQL Context.
5. Split the data into training and testing sets.
6. Build the model.
7. Perform predictions and collect model performance statistics.
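The data preparation steps of this flow (filter, convert, center and scale, build labeled points, split) can be sketched without a cluster. The code below is a single-machine illustration in plain Python, not Spark code: the rows are invented, the column layout "mpg,cylinders,displacement" is an assumption, and every function name is hypothetical.

```python
import random
from statistics import mean, pstdev

# Invented rows in "mpg,cylinders,displacement" form; "?" marks a missing value.
RAW_ROWS = [
    "18.0,8,307.0",
    "15.0,8,350.0",
    "?,4,97.0",
    "24.0,4,113.0",
    "22.0,6,198.0",
    "27.0,4,97.0",
]


def parse(row):
    """Convert one text row to floats, or return None if a field is not numeric."""
    try:
        return [float(field) for field in row.split(",")]
    except ValueError:
        return None


def center_and_scale(column):
    """Subtract the mean and divide by the standard deviation (assumed non-zero)."""
    mu, sigma = mean(column), pstdev(column)
    return [(value - mu) / sigma for value in column]


def build_labeled_points(rows):
    """Filter unparsable rows, scale each feature column, pair labels with features."""
    parsed = [p for p in map(parse, rows) if p is not None]
    labels = [p[0] for p in parsed]
    columns = [center_and_scale([p[j] for p in parsed]) for j in range(1, len(parsed[0]))]
    return list(zip(labels, zip(*columns)))


def split(points, train_fraction=0.8, seed=42):
    """Shuffle reproducibly, then cut into training and testing sets."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


points = build_labeled_points(RAW_ROWS)
train, test = split(points)
print(len(points), len(train), len(test))
```

One row is dropped because its target cannot be parsed, leaving five labeled points, of which four go to training and one to testing.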

Example of Linear Regression with Apache Spark ML
Problem statement: The input data set contains details of various car models. Based on this information, the goal is to build a model that predicts the miles per gallon (MPG) of a given car model.

Example of Linear Regression with Apache Spark ML
Techniques used:
- Multivariate linear regression
- Data imputation
- Variable reduction
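The slides do not say which imputation method was used. As one common possibility, the sketch below fills missing entries with the mean of the known values in the same column; this choice, the function name, and the sample numbers are all assumptions for illustration.

```python
def impute_with_mean(column):
    """Replace None entries with the mean of the known values in the column."""
    known = [value for value in column if value is not None]
    fill = sum(known) / len(known)
    return [fill if value is None else value for value in column]


# Invented column with one missing reading.
horsepower = [130.0, None, 150.0, 140.0]
completed = impute_with_mean(horsepower)
print(completed)  # [130.0, 140.0, 150.0, 140.0]
```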


Example of Linear Regression with Apache Spark ML
Using the R psych package to find Pearson's correlation coefficients:
    library(psych)
    pairs.panels(reg_df_2)
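For readers who want to see what a single Pearson coefficient measures, here is a minimal plain-Python sketch of the standard formula (covariance divided by the product of the standard deviations). It is not a substitute for the R call; the function name and sample data are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


weights = [1.0, 2.0, 3.0, 4.0]
rising = [2.0, 4.0, 6.0, 8.0]     # moves exactly with weights
falling = [8.0, 6.0, 4.0, 2.0]    # moves exactly against weights

r_up = pearson(weights, rising)
r_down = pearson(weights, falling)
print(round(r_up, 6), round(r_down, 6))
```

A coefficient near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship between the two variables.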


Example of Linear Regression with Apache Spark ML
Conclusions:
- The Apache Spark ML linear regression algorithm provides model accuracy comparable to that of popular R packages.
- We were able to use Spark RDDs and the Spark SQL Context to develop a truly scalable, distributed machine learning process.

Where Can Machine Learning Take Us?

Abhik Roy, Database Technologies
Josh Ivanoff, Internal Communications