Predictive Analytics with Big Data

1 Predictive Analytics with Big Data
AGENDA
- About me
- Use case
- Predictive Analytics
- Amazon Machine Learning (ML)
  - Key Concepts
  - Datasources
  - Binary Model
  - Model Evaluation
  - Other Types & Pricing
- Google Machine Learning

2 About Me
Naveen VK
- Principal Architect at NVISIA, a regional software development company
- Has worked for NVISIA for over 17 years
- Designed and built custom multi-tier applications using the Java Enterprise stack for various companies
- Involved in the entire application development lifecycle, including requirements gathering, architecture, design, implementation, integration, testing and deployment
- Some clients: ETF - State of WI, American Family, Harley Davidson, Cumulus Media
- Currently working at ETF (Employee Trust Fund)
  - ETF manages pensions, insurance and other benefits for state and local employees
  - Involved in multiple projects (5) and currently supporting multiple applications (7)
- Deep expertise in databases such as Oracle (since 1994) and DB2 (since 1999), and in SQL queries and PL/SQL stored procedures
- Codecinella.org
- 3 fun facts about myself
NVISIA® Confidential 2016

3 Use Case
Sample use case

4 Use Case – Titanic Survivors
Did this person survive the sinking of the Titanic?
- Titanic survivors dataset from kaggle.com (CSV file)
- Hard to code by hand: survival depends on various parameters such as age, cabin/class, gender, siblings/spouse, etc.
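Why is survival hard to code by hand? A quick look at the data helps. The Python sketch below uses only the standard library and computes the survival rate grouped by one attribute. The sample rows are made up. The column names assume the layout of Kaggle's Titanic `train.csv`. `survival_rate_by` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; column names assume Kaggle's Titanic train.csv.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,1,1,female,35
5,0,3,male,35
6,0,3,male,
"""


def survival_rate_by(rows, attribute):
    """Return {attribute value: fraction of passengers who survived}."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        key = row[attribute]
        totals[key] += 1
        survived[key] += int(row["Survived"])
    return {key: survived[key] / totals[key] for key in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(survival_rate_by(rows, "Sex"))     # {'male': 0.0, 'female': 1.0}
print(survival_rate_by(rows, "Pclass"))  # {'3': 0.25, '1': 1.0}
```

One attribute gives only a rough signal. Weighing many attributes against each other is the part that is left to the model.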

5 Predictive Analytics
What is predictive analytics? Some use cases/examples

6 Predictive Analytics – What is it?
- Mining data and using statistical algorithms and machine learning to predict trends or probabilities
- Uses historical data, and the patterns in it, to predict the future
- Creates models based on patterns in data to predict the probability of something happening in the future
- The better the model and the training data, the better the prediction
Steps:
1. Create model
2. Train model
3. Evaluate model
4. Test model
5. Predict
Examples:
- Is this spam?
- Will this product sell?
- How many units of this product will sell?
- Is this product a piece of clothing, a book or a movie?
- What price will this house sell for?
- What will the temperature be here tomorrow?
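Below is a minimal sketch of the create/train/evaluate/predict steps. It uses a one-variable least-squares line for the house-price example.

Several things are assumptions:
- The data are made up.
- `train`, `predict` and `mean_absolute_error` are hypothetical helpers.
- The evaluate and test steps are folded into a single check on held-out rows.

Real services use richer models and metrics.

```python
# Hypothetical history of (house size in sq ft, sale price in dollars).
history = [(1000, 200_000), (1500, 290_000), (2000, 410_000),
           (2500, 500_000), (1200, 250_000), (1800, 350_000)]


def train(pairs):
    """Create and train: fit price = slope * size + intercept by least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Predict: apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, pairs):
    """Evaluate/test: average absolute error on rows not used for training."""
    return sum(abs(predict(model, x) - y) for x, y in pairs) / len(pairs)


training, held_out = history[:4], history[4:]
model = train(training)
print(mean_absolute_error(model, held_out))  # 11200.0
print(predict(model, 1600))                  # 319400.0
```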

7 Amazon Machine Learning (ML)
What is it? When to use it?

8 Amazon Machine Learning (ML)
- AWS (Amazon Web Services) cloud-based service for predictive analytics
- Use tools and wizards to create machine learning models
- Use simple APIs to obtain predictions for your application
- No need to write custom code or manage supporting infrastructure
- Finds patterns in your existing data
- Uses models to process new data and generate predictions
When to use ML?
- ML is not a solution for every type of problem: if a target value can be determined by coding simple rules, computations and steps, no data-driven learning is needed
- Use ML when the rules cannot be programmed easily:
  - Too many factors
  - Too many overlapping rules
  - Too much fine-tuning of rules
- Use ML when the solution cannot be scaled manually: 100s of millions vs. 100s (example: manual vs. automated spam filter)
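The "simple APIs" for predictions can be sketched with the AWS SDK for Python (boto3). Its `machinelearning` client exposes a `predict` call for real-time predictions.

This is a sketch under several assumptions:
- The model must already have a real-time endpoint.
- The model ID and attribute names are placeholders.
- `predict_realtime` is a hypothetical wrapper.

AWS no longer accepts new users for Amazon ML. The client is passed in as a parameter, so the function can be tried without AWS access.

```python
def predict_realtime(client, model_id, endpoint_url, observation):
    """Request one synchronous prediction from an Amazon ML real-time endpoint.

    `client` is assumed to be a boto3 "machinelearning" client. The Record
    parameter takes string values only, so every attribute is converted
    with str() first.
    """
    record = {name: str(value) for name, value in observation.items()}
    response = client.predict(
        MLModelId=model_id,
        Record=record,
        PredictEndpoint=endpoint_url,
    )
    prediction = response["Prediction"]
    # predictedLabel is the label after the cut-off is applied;
    # predictedScores holds the raw score(s) keyed by label.
    return prediction["predictedLabel"], prediction.get("predictedScores", {})


# Usage (requires AWS credentials and an existing model; the ID is a placeholder):
# import boto3
# client = boto3.client("machinelearning", region_name="us-east-1")
# model_id = "ml-EXAMPLE"
# endpoint = client.get_ml_model(MLModelId=model_id)["EndpointInfo"]["EndpointUrl"]
# print(predict_realtime(client, model_id, endpoint, {"Pclass": 3, "Sex": "male", "Age": 22}))
```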

9 Amazon ML – Key Concepts
Terms and concepts

10 Amazon ML – Key Concepts
- Datasources
  - Contain metadata associated with the data inputs to Amazon ML
  - Spreadsheets, CSV files, streaming data, relational databases
- ML models
  - Use patterns in data to generate predictions
- Evaluations
  - Measure the quality of ML models
- Predictions
  - Batch predictions: multiple data inputs (aka batch data), asynchronous
  - Real-time predictions: individual data inputs, synchronous
- No coding or any kind of infrastructure needed
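Batch predictions run asynchronously: you start a job and check back later. The sketch below uses boto3's `machinelearning` client, which provides `create_batch_prediction` and `get_batch_prediction`.

The following are hypothetical choices:
- `run_batch_prediction` itself.
- The IDs.
- The S3 output location.
- The 30-second polling interval.

The client is passed in, so the function can be tried without AWS access.

```python
import time

TERMINAL_STATUSES = ("COMPLETED", "FAILED", "DELETED")


def run_batch_prediction(client, batch_id, model_id, datasource_id, output_uri,
                         poll_seconds=30):
    """Start an asynchronous batch prediction and wait for a final status.

    `client` is assumed to be a boto3 "machinelearning" client; results are
    written as files under `output_uri` (an S3 location).
    """
    client.create_batch_prediction(
        BatchPredictionId=batch_id,
        BatchPredictionName=batch_id,
        MLModelId=model_id,
        BatchPredictionDataSourceId=datasource_id,
        OutputUri=output_uri,
    )
    while True:
        status = client.get_batch_prediction(BatchPredictionId=batch_id)["Status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)  # still PENDING or INPROGRESS


# Usage (placeholders; requires AWS credentials and existing resources):
# import boto3
# client = boto3.client("machinelearning", region_name="us-east-1")
# run_batch_prediction(client, "bp-titanic-1", "ml-EXAMPLE", "ds-EXAMPLE",
#                      "s3://example-bucket/predictions/")
```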

11 Amazon ML – Datasources
Details of datasources in Amazon ML

12 Amazon ML – Datasources
- In Amazon ML, a datasource contains only the metadata about the actual input data
- The actual data may be stored in the AWS cloud, for example in:
  - Amazon S3 buckets
  - Amazon Redshift databases
  - MySQL databases in Amazon Relational Database Service (RDS)
  - Amazon Kinesis
- Attributes
  - Column headings represent attributes
  - Names must be unique
  - Required
- Target attribute
  - The attribute being predicted
  - Required in training data, where its value is already known for each observation
- Observation
  - A single row of data
- Row ID
  - An attribute with unique values, flagged to be included in the prediction output
  - Helps cross-reference each prediction with its observation
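The attribute, target and row ID rules on this slide can be checked locally before a CSV is uploaded. In the sketch below, `check_datasource_csv` is a hypothetical helper that applies only those rules. It is not the validation Amazon ML itself performs.

```python
import csv
import io


def check_datasource_csv(text, target, row_id=None):
    """Return a list of problems found in CSV text meant as Amazon ML input.

    Only the rules from this slide are checked: attribute names are present
    and unique, the target attribute exists and has a value in every
    observation, and the optional row ID attribute has unique values.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    if not header:
        return ["no header row"]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("empty attribute name")
    if len(set(header)) != len(header):
        problems.append("duplicate attribute names")
    rows = [dict(zip(header, values)) for values in reader]
    if target not in header:
        problems.append(f"missing target attribute {target!r}")
    elif any(not row.get(target) for row in rows):
        problems.append(f"observation without a {target!r} value")
    if row_id is not None:
        ids = [row.get(row_id) for row in rows]
        if len(set(ids)) != len(ids):
            problems.append(f"row ID {row_id!r} is not unique")
    return problems


good = "PassengerId,Survived,Sex\n1,0,male\n2,1,female\n"
bad = "PassengerId,Survived,Sex\n1,0,male\n1,,female\n"
print(check_datasource_csv(good, "Survived", row_id="PassengerId"))  # []
print(check_datasource_csv(bad, "Survived", row_id="PassengerId"))
# ["observation without a 'Survived' value", "row ID 'PassengerId' is not unique"]
```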

13 Amazon ML – Datasources (continued)
- Schema: all attributes of the input data and their corresponding data types
- Location: where the input data is stored, e.g. an Amazon S3 bucket
- Datasource name: human-readable name of the datasource
- Input data: all observations, i.e. the rows in a spreadsheet, CSV file or database
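A datasource's schema can also be supplied as a JSON file. The sketch below builds one for a hypothetical Titanic CSV with a header row and the columns `PassengerId`, `Survived`, `Pclass`, `Sex` and `Age`.

The key names and the attribute types (`BINARY`, `CATEGORICAL`, `NUMERIC`) assume the schema format described in the Amazon ML Developer Guide. Check them against the guide before use.

```python
import json

# Hypothetical input: a CSV with a header row and the columns
# PassengerId, Survived, Pclass, Sex and Age. Key names assume the JSON
# schema format described in the Amazon ML Developer Guide.
schema = {
    "version": "1.0",
    "rowId": "PassengerId",
    "targetAttributeName": "Survived",
    "dataFormat": "CSV",
    "dataFileContainsHeader": True,
    "attributes": [
        {"attributeName": "PassengerId", "attributeType": "CATEGORICAL"},
        {"attributeName": "Survived", "attributeType": "BINARY"},
        {"attributeName": "Pclass", "attributeType": "CATEGORICAL"},
        {"attributeName": "Sex", "attributeType": "CATEGORICAL"},
        {"attributeName": "Age", "attributeType": "NUMERIC"},
    ],
    "excludedAttributeNames": [],
}

# The target and the row ID must both be declared attributes.
declared = {a["attributeName"] for a in schema["attributes"]}
assert schema["targetAttributeName"] in declared and schema["rowId"] in declared

# Print the schema; in practice it would be uploaded to S3 alongside the data.
print(json.dumps(schema, indent=2))
```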

14 Amazon ML – Binary Model
Details of the binary model in Amazon ML

15 Amazon ML – Binary Model
- In Amazon ML, a model finds patterns in data and generates predictions
- Binary model
  - Predicts values that have one of two states: true/false, 1/0, win/lose, alive/dead, pass/fail, healthy/sick
  - Uses the industry-standard learning algorithm binary logistic regression
    - A statistical model used to predict the probability of a binary response based on one or more variables
  - Examples: Is this spam? Will this product sell? Will this person survive the sinking of the Titanic?
- Recipe
  - The attributes and attribute transformations available to train the model
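Binary logistic regression passes a weighted sum of the attributes through the logistic (sigmoid) function to get a probability between 0 and 1. The sketch below trains one from scratch with stochastic gradient descent on the log loss, to show what the algorithm does.

It is not Amazon ML's implementation. The features, data and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, x):
    """P(target = 1 | x); the last weight is the bias term."""
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return sigmoid(z)


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=500, seed=0):
    """Fit weights with stochastic gradient descent on the log loss."""
    rng = random.Random(seed)
    weights = [0.0] * (len(xs[0]) + 1)
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            error = predict_probability(weights, x) - y  # d(log loss)/dz
            for i, xi in enumerate(x):
                weights[i] -= learning_rate * error * xi
            weights[-1] -= learning_rate * error
    return weights


# Hypothetical features: [is_female, is_first_class]; target: survived (1/0).
xs = [[1, 1], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 0]]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
weights = train_logistic_regression(xs, ys)
score = predict_probability(weights, [1, 0])  # woman travelling third class
print(round(score, 2), 1 if score > 0.5 else 0)
```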

16 Amazon ML – Binary Model – Steps
1. Create an AWS account
2. Upload data to the AWS cloud (S3 bucket)
3. Create and train the model
4. Test the model
5. Evaluate the model
6. Create predictions
Demo
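The demo uses the console wizards, but steps 2 to 5 map onto API calls as well. The sketch below assumes that the CSV and its schema are already uploaded to S3, and that `client` is a boto3 `machinelearning` client.

It does three things:
- Creates a training and an evaluation datasource from the same file, using a sequential 70/30 split via the `DataRearrangement` JSON string.
- Creates a `BINARY` model.
- Creates an evaluation.

`create_binary_model` and the ID naming scheme are hypothetical. Each call only queues work, and the resources become available later.

```python
import json


def create_binary_model(client, s3_data_uri, s3_schema_uri, prefix):
    """Create training/evaluation datasources, a binary model and an evaluation.

    `client` is assumed to be a boto3 "machinelearning" client. All calls are
    asynchronous on the Amazon ML side; this function does not wait.
    """
    train_ds, eval_ds = f"{prefix}-ds-train", f"{prefix}-ds-eval"
    model_id, eval_id = f"{prefix}-model", f"{prefix}-eval"
    # First 70% of the file for training, the remaining 30% for evaluation.
    splits = {
        train_ds: {"splitting": {"percentBegin": 0, "percentEnd": 70}},
        eval_ds: {"splitting": {"percentBegin": 70, "percentEnd": 100}},
    }
    for ds_id, rearrangement in splits.items():
        client.create_data_source_from_s3(
            DataSourceId=ds_id,
            DataSpec={
                "DataLocationS3": s3_data_uri,
                "DataSchemaLocationS3": s3_schema_uri,
                "DataRearrangement": json.dumps(rearrangement),
            },
            ComputeStatistics=True,  # needed for a datasource used in training
        )
    client.create_ml_model(
        MLModelId=model_id,
        MLModelType="BINARY",
        TrainingDataSourceId=train_ds,
    )
    client.create_evaluation(
        EvaluationId=eval_id,
        MLModelId=model_id,
        EvaluationDataSourceId=eval_ds,
    )
    return model_id, eval_id


# Usage (placeholders; requires AWS credentials):
# import boto3
# client = boto3.client("machinelearning", region_name="us-east-1")
# create_binary_model(client, "s3://example-bucket/titanic.csv",
#                     "s3://example-bucket/titanic.csv.schema", "titanic")
```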

17 Amazon ML – Model Evaluation
Evaluating the model in Amazon ML

18 Amazon ML – Evaluations
- In Amazon ML, an evaluation measures the quality of an ML model
- A model must be evaluated to determine whether it will do a good job of predicting the target on new/future data
- Training and evaluating a model requires data in which the target value is already known
- Max size of training data: 100 GB (max size of a single observation: 100 KB)
- Model insights: Amazon ML provides metrics and insights to review the accuracy of the model
  - Overall success metric of the model
  - Visualizations to explore the accuracy of the model
  - Alerts to check the validity of the evaluation
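Evaluation data must have a known target but must not have been used for training. Otherwise the metrics look better than the model really is.

Below is a minimal sketch of a random hold-out split. `split_for_evaluation`, the 70/30 ratio and the fixed seed are illustrative assumptions, not Amazon ML settings.

```python
import random


def split_for_evaluation(observations, train_fraction=0.7, seed=42):
    """Shuffle observations, then split them into (training, evaluation).

    Every observation must already have a known target value; the evaluation
    part is never shown to the model during training.
    """
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


training, evaluation = split_for_evaluation(range(10))
print(len(training), len(evaluation))  # 7 3
```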

19 Amazon ML – Evaluations – Binary Insights
- Prediction score
  - The actual output of a binary prediction
  - Indicates the system's certainty that the given observation has a target value of 1
  - Scores range from 0 to 1
  - The default threshold score, aka cut-off, is 0.5; this can be changed
  - Any observation that scores above the cut-off is predicted as target = 1, and any observation that scores below it is predicted as target = 0
- Correct predictions
  - True Positive (TP): predicted value of target = 1, true value of target = 1
  - True Negative (TN): predicted value of target = 0, true value of target = 0
- Incorrect predictions
  - False Positive (FP): predicted value of target = 1, true value of target = 0
  - False Negative (FN): predicted value of target = 0, true value of target = 1
- Area Under the Curve (AUC)
  - Measures the ability of the model to give higher scores to observations with target = 1 than to observations with target = 0
  - An AUC near 1 indicates a highly accurate model
  - An AUC near 0.5 indicates the model is no better than random guessing
  - An AUC near 0 is unusual and typically points to a problem with the data
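The cut-off and the four outcome counts can be made concrete with a few lines of Python. The scores and true targets below are hypothetical. `confusion_counts` is an illustrative helper, and treating a score exactly equal to the cut-off as 0 is an assumption.

```python
def confusion_counts(scores, actuals, cutoff=0.5):
    """Count TP, TN, FP and FN for binary scores at the given cut-off."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, actual in zip(scores, actuals):
        predicted = 1 if score > cutoff else 0  # assumption: score == cutoff -> 0
        if predicted == 1:
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            counts["TN" if actual == 0 else "FN"] += 1
    return counts


scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
actuals = [1, 0, 1, 1, 0, 0]
print(confusion_counts(scores, actuals))       # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
print(confusion_counts(scores, actuals, 0.7))  # {'TP': 1, 'TN': 2, 'FP': 1, 'FN': 2}
```

Raising the cut-off trades false positives for false negatives. That is why the right value depends on which mistake is more costly.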

20 Amazon ML – Evaluations – Binary Insights – AUC (AWS Tutorial)
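AUC has an equivalent pairwise reading. It is the probability that a randomly chosen observation with target = 1 gets a higher score than a randomly chosen observation with target = 0, with ties counting half.

Below is a sketch of that definition with hypothetical data. `auc` is an illustrative helper, not Amazon ML's computation, and its nested loop only suits small examples.

```python
def auc(scores, actuals):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N), which is fine
    for illustration but too slow for large evaluation sets.
    """
    positives = [s for s, a in zip(scores, actuals) if a == 1]
    negatives = [s for s, a in zip(scores, actuals) if a == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
actuals = [1, 0, 1, 1, 0, 0]
print(round(auc(scores, actuals), 3))  # 0.778 (7 of 9 pairs ranked correctly)
```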

21 Amazon ML – Other Types of Models & Pricing
Other types of models and pricing in Amazon ML

22 Amazon ML – Other Types of Models
- Multiclass model
  - Predicts values that belong to a predefined, limited set of states (one of three or more)
  - Uses the industry-standard learning algorithm multinomial logistic regression
  - Examples: Is this product a book, a movie or apparel? Is this movie a thriller, a documentary or a comedy?
- Regression model
  - Predicts a numeric value, for regression problems
  - Uses the industry-standard learning algorithm linear regression
    - A statistical model to predict the value of y based on a number of variables x1, x2, x3, etc.
  - Examples: What will the temperature be tomorrow? How many units of this product will sell? How much will this house sell for?
- The type of model is chosen based on the type of target to predict
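Multinomial logistic regression computes one linear score per class and turns the scores into probabilities with the softmax function. The predicted class is the most probable one.

The sketch below shows only the prediction side. Its feature names are hypothetical, and its hand-picked weights stand in for a trained model.

```python
import math


def softmax(scores):
    """Turn one raw score per class into probabilities that sum to 1."""
    top = max(scores.values())  # subtracting the max avoids overflow in exp
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def class_scores(weights, features):
    """Linear score per class: bias + sum of weight * feature value."""
    return {
        label: w.get("bias", 0.0)
        + sum(w.get(name, 0.0) * value for name, value in features.items())
        for label, w in weights.items()
    }


# Hypothetical trained weights for "book / movie / apparel" products.
weights = {
    "book": {"bias": -1.0, "page_count": 0.01},
    "movie": {"bias": -1.0, "runtime_minutes": 0.02},
    "apparel": {"bias": -1.0, "has_size": 3.0},
}
product = {"page_count": 300, "runtime_minutes": 0, "has_size": 0}
probabilities = softmax(class_scores(weights, product))
print(max(probabilities, key=probabilities.get))  # book
```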

23 Amazon ML – Pricing
https://aws.amazon.com/machine-learning/pricing/
- Data analysis and model building
- Batch predictions: $0.10 per 1,000 predictions, rounded up to the next 1,000
- Real-time predictions: $0.0001 per prediction, rounded up to the nearest penny
- S3 Standard storage: $0.03/GB/month
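The per-prediction charges on this slide reduce to simple arithmetic. The sketch below uses `Decimal` to avoid floating-point rounding.

The rates are the 2016 figures listed above and may have changed. Model-building and storage charges are left out, and the function names are hypothetical.

```python
from decimal import Decimal


def batch_prediction_cost(num_predictions):
    """$0.10 per 1,000 batch predictions, rounded up to the next 1,000."""
    thousands = -(-num_predictions // 1000)  # ceiling division
    return Decimal("0.10") * thousands


def realtime_prediction_cost(num_predictions):
    """$0.0001 per real-time prediction, rounded up to the nearest penny."""
    cents = -(-num_predictions // 100)  # 100 predictions cost exactly one cent
    return Decimal(cents) / 100


print(batch_prediction_cost(1500))    # 0.20
print(realtime_prediction_cost(150))  # 0.02
```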

24 Google Machine Learning
What is it? When to use it?

25 Google Machine Learning – Thoughts
- Uses Google Cloud Platform and TensorFlow™ (
- Beta version
- Unsuccessful in creating a working demo
- Not very intuitive and not easy to use
Steps:
1. Create a project on Google Cloud first and enable the APIs for the project
2. Upload data to Google Cloud Storage (similar to AWS S3), which stores data/objects in buckets: was successful in creating a bucket for my data
3. Create a model (unique name): was successful in creating a model
4. Create a version: errors out
5. Activate Cloud Shell

26 Checklist
- Use case
- Predictive Analytics
- Amazon Machine Learning (ML)
  - Key Concepts
  - Datasources
  - Binary Model
  - Model Evaluation
  - Other Model Types & Pricing

27 Thank You for Coming
Links:
Contact Info:
- LinkedIn: Naveen VK
- (work)
- (personal)
- GitHub: