Predictive Analytics with Big Data

Predictive Analytics with Big Data
Agenda
- About me
- Use case
- Predictive analytics
- Amazon Machine Learning (ML)
  - Key concepts
  - Datasources
  - Binary model
  - Model evaluation
  - Other types & pricing
- Google Machine Learning

About Me
Naveen VK
- Principal Architect at NVISIA, a regional software development company
- Has worked for NVISIA for over 17 years
- Designed and built custom multi-tier applications using the Java Enterprise stack for various companies
- Involved in the entire application development lifecycle, including requirements gathering, architecture, design, implementation, integration, testing and deployment
- Some clients: ETF (State of WI), American Family, Harley Davidson, Cumulus Media
- Currently working at ETF (Employee Trust Fund), which manages pensions, insurance and other benefits for state and local employees
  - Involved in multiple projects (5) and currently supporting multiple applications (7)
- Deep expertise in databases such as Oracle (since 1994) and DB2 (since 1999), and in SQL queries and PL/SQL stored procedures
- Codecinella.org
- 3 fun facts about myself
NVISIA® Confidential 2016

Use Case
Sample use case

Use Case – Titanic Survivors
Did this person survive the sinking of the Titanic?
- Titanic survivors dataset from kaggle.com (CSV file)
- Hard to code by hand: survival depends on various parameters such as age, cabin/class, gender, siblings/spouse aboard, etc.

Predictive Analytics
What is predictive analytics? Some use cases/examples

Predictive Analytics
What is it?
- Mining data, using statistical algorithms and machine learning, to predict trends or probabilities
- Uses historical data, and the patterns in that data, to predict the future
- Creates models based on patterns in data to predict the probability of something happening in the future
- The better the model and the training data, the better the prediction
Steps:
1. Create model
2. Train model
3. Evaluate model
4. Test model
5. Predict
Examples:
- Is this email spam?
- Will this product sell?
- How many units of this product will sell?
- Is this product a piece of clothing, a book or a movie?
- What price will this house sell for?
- What will the temperature be here tomorrow?
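The create/train/evaluate/predict steps can be made concrete with a small local sketch. The data below is synthetic and made up. The "model" is a deliberately simple one-attribute cut-off rule, not the algorithm Amazon ML uses. All function names, the 30% hold-out fraction and the random seed are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]          # (attribute values, known target 0/1)
Model = Callable[[Sequence[float]], int]   # maps attribute values to a predicted target

def train_test_split(rows: List[Row], test_fraction: float = 0.3,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the data and hold out a fraction for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_threshold_model(rows: List[Row]) -> Model:
    """Create + train: pick the cut-off on attribute 0 that best fits the training rows."""
    values = sorted({x[0] for x, _ in rows})
    # Candidate cut-offs sit halfway between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values

    def fit(t: float) -> float:
        return sum((1 if x[0] >= t else 0) == y for x, y in rows) / len(rows)

    best = max(candidates, key=fit)
    return lambda x: 1 if x[0] >= best else 0

def accuracy(model: Model, rows: List[Row]) -> float:
    """Evaluate: fraction of held-out rows whose known target the model predicts."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# Synthetic (made-up) data: the target is 1 when the single attribute is 60 or more.
data: List[Row] = ([((float(v),), 0) for v in range(50)]
                   + [((float(v),), 1) for v in range(60, 110)])
train, test = train_test_split(data)
model = train_threshold_model(train)
print(f"held-out accuracy: {accuracy(model, test):.2f}")
print("prediction for a new observation (75.0):", model((75.0,)))
```

The key design point is that evaluation uses rows the model never saw during training. That separation is what makes the accuracy figure a fair estimate of performance on new data.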

Amazon Machine Learning (ML)
What is it? When to use it?

Amazon Machine Learning (ML)
- AWS (Amazon Web Services) cloud-based service for predictive analytics
- Use tools and wizards to create machine learning models
- Use simple APIs to obtain predictions for your application
- No need to write custom code or manage supporting infrastructure
- Finds patterns in your existing data
- Uses models to process new data and generate predictions
When to use ML?
- ML is not a solution for every type of problem: if a target value can be determined by coding simple rules, computations and steps, no data-driven learning is needed
- Use ML when the rules cannot be programmed easily:
  - Too many factors
  - Too many overlapping rules
  - Too much fine-tuning of rules
- Use ML when a manual solution cannot be scaled: 100s of millions of cases vs. 100s (example: manual vs. automated spam filter)

Amazon ML – Key Concepts
Terms and concepts

Amazon ML – Key Concepts
- Datasources: contain metadata associated with the data inputs to Amazon ML (spreadsheets, CSV files, streaming data, relational databases)
- ML models: use patterns in data to generate predictions
- Evaluations: measure the quality of ML models
- Predictions:
  - Batch predictions: multiple data inputs (aka batch data), asynchronous
  - Real-time predictions: individual data inputs, synchronous
- No coding or any kind of infrastructure needed
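The difference between synchronous real-time predictions and asynchronous batch predictions is mainly a difference in calling pattern. The local sketch below imitates both with Python's `concurrent.futures`. In the real service, batch input and output live in storage such as S3, so this only illustrates the idea. The `score` rule and the lowercase `fare` field are made-up stand-ins for a trained model and its input.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List

Record = Dict[str, float]

def score(record: Record) -> float:
    # Hypothetical stand-in for a trained binary model (the rule is made up).
    return 0.9 if record.get("fare", 0.0) > 50 else 0.2

def predict_realtime(record: Record) -> float:
    """Synchronous: one observation in, one prediction out before the call returns."""
    return score(record)

def submit_batch(executor: ThreadPoolExecutor, records: List[Record]) -> "Future[List[float]]":
    """Asynchronous: returns a job handle at once; scores are computed in the background."""
    return executor.submit(lambda: [score(r) for r in records])

print(predict_realtime({"fare": 72.5}))   # the caller waits for this single result

with ThreadPoolExecutor(max_workers=1) as executor:
    job = submit_batch(executor, [{"fare": 7.25}, {"fare": 80.0}, {"fare": 13.0}])
    # ... the caller is free to do other work here while the batch runs ...
    batch_scores = job.result(timeout=5)  # collect the batch output when it is ready
print(batch_scores)
```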

Amazon ML – Datasources
Details of datasources in Amazon ML

Amazon ML – Datasources
- In Amazon ML, a datasource contains only the metadata about the actual input data
- The actual data may be stored in the AWS cloud:
  - Amazon S3 buckets
  - Amazon Redshift databases
  - MySQL databases in Amazon Relational Database Service (RDS)
  - Amazon Kinesis
- Attributes: column headings represent attributes; names must be unique; required
- Target attribute: the data being predicted. Training data must include the target attribute with its value already known
- Observation: a single row of data
- Row ID: an attribute with unique values, flagged to be included in the prediction output; helps cross-reference each prediction with its observation
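To make attributes, target, observation and row ID concrete, the sketch below splits a tiny CSV into those parts. The column names come from the Kaggle Titanic training file, but the three rows are made up. Excluding the row ID from the model inputs is a choice made for this sketch, and `split_observations` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, Tuple

# A few made-up rows using column names from the Kaggle Titanic training file.
RAW = """PassengerId,Survived,Pclass,Sex,Age
101,0,3,male,28
102,1,1,female,35
103,1,2,female,
"""

ROW_ID = "PassengerId"   # unique per observation; echoed back with each prediction
TARGET = "Survived"      # the attribute being predicted (known in training data)

def split_observations(text: str) -> List[Tuple[str, Dict[str, str], str]]:
    """Return (row_id, input_attributes, target) for every observation (row)."""
    reader = csv.DictReader(io.StringIO(text))
    names = reader.fieldnames or []
    if len(set(names)) != len(names):
        raise ValueError("attribute names must be unique")
    observations = []
    for row in reader:
        # In this sketch the row ID and target are kept out of the model inputs.
        inputs = {k: v for k, v in row.items() if k not in (ROW_ID, TARGET)}
        observations.append((row[ROW_ID], inputs, row[TARGET]))
    return observations

for row_id, inputs, target in split_observations(RAW):
    print(row_id, inputs, "->", target)
```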

Amazon ML – Datasources (continued)
- Schema: all attributes and the corresponding data types of the input data
- Location: where the input data is stored, e.g. an Amazon S3 bucket
- Datasource name: a human-readable name for the datasource
- Input data: all observations, i.e. the rows in a spreadsheet/CSV file or database table
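A schema ties attribute names to data types and marks the target and row ID. The sketch below builds one for a subset of the Titanic columns and prints it as JSON. The field names (`version`, `rowId`, `targetAttributeName`, `dataFormat`, `dataFileContainsHeader`, `attributes`) follow the Amazon ML schema format as we recall it, so verify them against the Amazon ML Developer Guide before use. The type assigned to each column is our own choice, and `build_schema` is a hypothetical helper.

```python
import json
from typing import Dict

# Hypothetical type choices for a subset of the Kaggle Titanic columns.
# Pclass is numeric in the file but treated here as a category (1st/2nd/3rd class).
ATTRIBUTE_TYPES: Dict[str, str] = {
    "PassengerId": "CATEGORICAL",
    "Survived": "BINARY",
    "Pclass": "CATEGORICAL",
    "Sex": "CATEGORICAL",
    "Age": "NUMERIC",
    "Fare": "NUMERIC",
}

def build_schema(types: Dict[str, str], row_id: str, target: str) -> dict:
    """Assemble a schema dict (field names assumed from the Amazon ML schema format)."""
    if row_id not in types or target not in types:
        raise ValueError("row ID and target must be declared attributes")
    return {
        "version": "1.0",
        "rowId": row_id,
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": [{"attributeName": n, "attributeType": t} for n, t in types.items()],
    }

schema = build_schema(ATTRIBUTE_TYPES, row_id="PassengerId", target="Survived")
print(json.dumps(schema, indent=2))
```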

Amazon ML – Binary Model
Details of the binary model in Amazon ML

Amazon ML – Binary Model
- In Amazon ML, a model finds patterns in data and generates predictions
- Binary model: predicts values that have 1 of 2 states: true/false, 1/0, win/lose, alive/dead, pass/fail, healthy/sick
  - Uses the industry-standard learning algorithm called binary logistic regression
  - A statistical model used to predict the probability of a binary response based on one or more input variables
  - Examples: Is this email spam? Will this product sell? Will this person survive the sinking of the Titanic?
- Recipe: the attributes and attribute transformations made available to train the model
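Binary logistic regression turns a weighted sum of the inputs into a probability with the sigmoid function, p = 1 / (1 + e^-(w·x + b)). The sketch below fits w and b by plain batch gradient descent on the log loss. Amazon ML's own training procedure is not shown here and may differ. The two 0/1 attributes and the ten observations are made up, and the learning rate and epoch count are arbitrary choices.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """P(target = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_logistic(xs: List[Sequence[float]], ys: List[int],
                   lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y   # log-loss gradient w.r.t. the linear score
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Made-up observations: [is_female, is_first_class] -> survived (1) or not (0).
xs = [[1, 1], [1, 1], [1, 1], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 0]]
ys = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
w, b = train_logistic(xs, ys)
for x in ([1, 1], [0, 0]):
    p = predict_proba(w, b, x)
    print(x, f"score={p:.3f}", "predicted", 1 if p >= 0.5 else 0)
```

The printed score is the model's probability for target = 1. Turning it into a 0/1 prediction requires a cut-off, which the evaluation slides discuss.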

Amazon ML – Binary Model – Steps
1. Create an AWS account
2. Upload data to the AWS cloud (S3 bucket)
3. Create and train the model
4. Test the model
5. Evaluate the model
6. Create predictions
Demo

Amazon ML – Model Evaluation
Evaluating the model in Amazon ML

Amazon ML – Evaluations
- In Amazon ML, an evaluation measures the quality of an ML model
- A model must be evaluated to determine whether it will do a good job predicting the target on new/future data
- Training and evaluation data must have the target value already known
- Max size of training data: 100 GB (each observation is limited to 100 KB)
- Model insights: Amazon ML provides metrics and insights to review the accuracy of the model
  - Overall success metric of the model
  - Visualizations to explore the accuracy of the model
  - Alerts to check the validity of the evaluation

Amazon ML – Evaluations – Binary Insights
Prediction score:
- The actual output of a binary prediction
- Indicates the system's certainty that the given observation has target value 1
- Output scores of observations are between 0 and 1
- The default threshold score (aka cut-off) is 0.5; this can be changed
- Any observation that scores above the cut-off is predicted as target = 1; one that scores below it is predicted as 0
Correct predictions:
- True Positive (TP): predicted target = 1, true target = 1
- True Negative (TN): predicted target = 0, true target = 0
Incorrect predictions:
- False Positive (FP): predicted target = 1, true target = 0
- False Negative (FN): predicted target = 0, true target = 1
Area Under the Curve (AUC):
- Measures the ability of the model to predict a higher score for positive observations than for negative ones
- AUC near 1 indicates a highly accurate model; AUC of 0.5 is no better than random guessing; AUC near 0 means the model ranks observations the wrong way round, which usually points to a problem with the data
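The cut-off and the four outcome types fit together in a few lines. The scores and true targets below are made-up examples, and `confusion_counts` is a hypothetical helper. The slide does not say what happens when a score exactly equals the cut-off. This sketch assumes such a score counts as 1.

```python
from typing import Dict, Sequence

def confusion_counts(scores: Sequence[float], actual: Sequence[int],
                     cutoff: float = 0.5) -> Dict[str, int]:
    """Count TP/TN/FP/FN after turning scores into 0/1 predictions at the cut-off."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for s, y in zip(scores, actual):
        predicted = 1 if s >= cutoff else 0   # assumption: score == cut-off counts as 1
        if predicted == 1:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return counts

scores = [0.95, 0.80, 0.60, 0.45, 0.30, 0.10]   # made-up prediction scores
actual = [1, 1, 0, 1, 0, 0]                     # made-up true targets
print(confusion_counts(scores, actual))              # default cut-off 0.5
print(confusion_counts(scores, actual, cutoff=0.7))  # stricter cut-off
```

Raising the cut-off trades false positives for false negatives. That trade-off is why the adjustable threshold matters when errors of one kind are costlier than the other.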

Amazon ML – Evaluations – Binary Insights – AUC
(Figure from the AWS tutorial)
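AUC has an equivalent pairwise reading. It is the probability that a randomly chosen positive observation gets a higher score than a randomly chosen negative one, with ties counting as half. The sketch below computes it that way on made-up scores. It is the textbook definition, not necessarily how Amazon ML computes the value internally.

```python
from itertools import product
from typing import Sequence

def auc(scores: Sequence[float], actual: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, actual) if y == 1]
    neg = [s for s, y in zip(scores, actual) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative observation")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

scores = [0.95, 0.80, 0.60, 0.45, 0.30, 0.10]   # made-up prediction scores
actual = [1, 1, 0, 1, 0, 0]                     # made-up true targets
print(f"AUC = {auc(scores, actual):.3f}")
```

A perfect ranking gives 1.0, a completely reversed ranking gives 0.0, and identical scores for everything give 0.5. These values match the interpretation on the previous slide.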

Amazon ML – Other Types of Models & Pricing
Other types of models and pricing in Amazon ML

Amazon ML – Other Types of Models
Multiclass model:
- Predicts values that belong to a predefined, limited set of states (1 of 3 or more states)
- Uses the industry-standard learning algorithm called multinomial logistic regression
- Examples: Is this product a book, a movie or apparel? Is this movie a thriller, a documentary or a comedy?
Regression model:
- Predicts a numeric value (for regression problems)
- Uses the industry-standard learning algorithm called linear regression: a statistical model that predicts the value of y based on a number of variables x1, x2, x3, etc.
- Examples: What will the temperature be tomorrow? How many units of this product will sell? How much will this house sell for?
The type of model is chosen based on the type of target to predict.
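The two other model types differ mainly in their output. A multiclass model turns one score per class into probabilities with the softmax function and picks the most likely class. A regression model outputs a number from a fitted line. The sketch below shows both. The class scores and data points are made up. The regression uses a single variable x to stay short, while the slide's model allows several variables x1, x2, x3.

```python
import math
from typing import Dict, Sequence, Tuple

def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logistic regression output: one probability per class, summing to 1."""
    m = max(scores.values())                       # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares with one variable: y ~ b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Multiclass: made-up linear scores for one product, one per class.
probs = softmax({"book": 2.0, "movie": 1.0, "apparel": 0.1})
print(max(probs, key=probs.get), probs)

# Regression: made-up points lying exactly on y = 1 + 2x.
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {b0:.1f} + {b1:.1f} * x; prediction at x = 5: {b0 + b1 * 5:.1f}")
```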

Amazon ML – Pricing
https://aws.amazon.com/machine-learning/pricing/
- Data analysis and model building: $0.42/hr
- Batch predictions: $0.10 per 1,000 predictions (rounded up to the next 1,000)
- Real-time predictions: $0.0001 per prediction (rounded to the nearest penny)
- S3 Standard storage: $0.03/GB/month
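The rounding rules matter for small jobs. For example, 1,500 batch predictions are billed as 2,000. The sketch below turns the slide's 2016 rates into a cost estimate using `Decimal` to avoid floating-point money errors. It makes two assumptions: fractional build hours are billed proportionally, and "nearest penny" means round half up. Check the current AWS pricing page before reusing any of these numbers.

```python
from decimal import ROUND_HALF_UP, Decimal

# Rates as listed on the slide (2016); not current AWS prices.
BUILD_PER_HOUR = Decimal("0.42")
BATCH_PER_1000 = Decimal("0.10")
REALTIME_EACH = Decimal("0.0001")
CENT = Decimal("0.01")

def build_cost(hours: Decimal) -> Decimal:
    # Assumption: partial hours are billed proportionally.
    return (BUILD_PER_HOUR * hours).quantize(CENT, rounding=ROUND_HALF_UP)

def batch_cost(n_predictions: int) -> Decimal:
    # Billed per 1,000 predictions; the count is rounded up to the next 1,000.
    blocks = -(-n_predictions // 1000)   # integer ceiling division
    return BATCH_PER_1000 * blocks

def realtime_cost(n_predictions: int) -> Decimal:
    # Assumption: "nearest penny" means round half up.
    return (REALTIME_EACH * n_predictions).quantize(CENT, rounding=ROUND_HALF_UP)

print("model building, 2.5 h:", build_cost(Decimal("2.5")))
print("batch, 1,500 predictions:", batch_cost(1500))
print("real-time, 12,345 predictions:", realtime_cost(12345))
```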

Google Machine Learning
What is it? When to use it?

Google Machine Learning – Thoughts
- Uses Google Cloud Platform and TensorFlow™ (http://www.tensorflow.org)
- Beta version
- Unsuccessful in creating a working demo: not very intuitive and not easy to use
Steps:
1. Create a project on Google Cloud first and enable the APIs for the project
2. Upload data to Google Cloud Storage (similar to AWS S3), which stores data/objects in buckets. Was successful in creating a bucket for my data
3. Create a model (unique name). Was successful in creating a model
4. Create a version. Errors out
5. Activate Cloud Shell

Checklist
- Use case
- Predictive analytics
- Amazon Machine Learning (ML)
  - Key concepts
  - Datasources
  - Binary model
  - Model evaluation
  - Other model types & pricing

Thank You for Coming
Links:
- http://docs.aws.amazon.com/machine-learning/latest/dg/what-is-amazon-machine-learning.html
- https://cloud.google.com/ml/docs/how-tos/
- https://www.kaggle.com/
Contact info:
- LinkedIn: Naveen VK
- Email: naveen@nvisia.com (work), naveenvkm@gmail.com (personal)
- GitHub: https://github.com/navnoon23/