Published by Coral Dorsey. Modified over 9 years ago.
1
Real-Time Big Data Analytics: From Deployment to Production
David Smith, Revolution Analytics, @revodavid
2
3
Buzzword Bingo! REAL TIME / BIG DATA / PREDICTIVE ANALYTICS
4
Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0
5
Predictive Analytics
Factors -> Model -> Scores
Factors: User ID; Browser; Time/Date / Location; Previous purchases; Friend data; Any known information
Predictive Model: Decision Tree; Logistic Regression; Neural Network; K-means clustering; Ensemble Model; Scoring Rules
Scores (Prediction or Selection): Product of most interest; Offer of most likely sale; Most relevant link; Forecast sale value; Optimal Bid
Image: “IO VAPOURA” by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
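The factors-to-scores flow on this slide can be sketched with one of the listed model types, logistic regression. This is a minimal illustration, not the presenter's model: the factor names, intercept, and weights are all hypothetical, and a real model would be estimated from data.

```python
import math

# Hypothetical coefficients, for illustration only.
INTERCEPT = -2.0
WEIGHTS = {
    "previous_purchases": 0.8,
    "is_returning_visitor": 0.5,
    "hour_is_evening": 0.3,
}


def score(factors):
    """Prediction: probability of a sale for one dict of factors.

    Factors missing from the dict are treated as 0.0.
    """
    z = INTERCEPT + sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def best_offer(candidates):
    """Selection: name of the candidate offer with the highest score.

    `candidates` maps an offer name to the factors that apply to it.
    """
    return max(candidates, key=lambda name: score(candidates[name]))
```

Here `score` covers the "prediction" use and `best_offer` the "selection" use: the same model is applied to each candidate and the highest-scoring one is chosen.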
6
Real-time Deployment
1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh
Image: "CLOCK" by Heiko Klingele, flickr.com/photos/divdax/3458668053/, CC-BY 2.0
7
1. Data Distillation in Hadoop
Unstructured Data: Log Files; Sensor Streams; Language Text
Load -> HDFS -> Map-Reduce (rmr)
Structured Data: Analytics Data Mart
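As a rough sketch of what distillation by map-reduce means, the example below turns raw log lines into one structured row per user. It is plain in-memory Python, not Hadoop and not the rmr package named on the slide, and the three-field log format (`user_id url bytes`) is an assumption made for illustration.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Map: emit (user_id, (1, bytes)) for each well-formed log line.

    Assumed line format: "<user_id> <url> <bytes>". Malformed lines are skipped.
    """
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        user_id, _url, nbytes = parts
        yield user_id, (1, int(nbytes))


def reduce_phase(pairs):
    """Reduce: sum the counts and bytes per user into structured rows."""
    totals = defaultdict(lambda: [0, 0])
    for user_id, (count, nbytes) in pairs:
        totals[user_id][0] += count
        totals[user_id][1] += nbytes
    return {
        user_id: {"page_views": views, "total_bytes": size}
        for user_id, (views, size) in totals.items()
    }
```

Calling `reduce_phase(map_phase(lines))` yields a small table keyed by user, the kind of structured input a modeling step can consume.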
8
2. The Model Development Cycle
Structured Data -> R -> Predictive Model
Cycle: Feature Selection; Sampling; Aggregation; Variable Transformation; Model Estimation; Model Refinement; Model Comparison / Benchmarking
R White Paper: bit.ly/r-is-hot
9
3. Deployment Options
Factors known in advance: Batch Lookup Tables
Unknown factors: SQL / Rules Engine; Code (C++, Java, R, Hadoop); PMML Engine
Factors -> Scores
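The split between factors known in advance and unknown factors can be sketched as follows: scores for known customers are precomputed in a batch step and served from a lookup table, while anything else is scored by code at request time. The scoring rule, customer IDs, and function names are hypothetical.

```python
def model(previous_purchases):
    """Hypothetical scoring rule: forecast sale value from purchase count."""
    return 10.0 + 2.5 * previous_purchases


# Batch step: factors known in advance, so scores are computed ahead of time.
KNOWN_CUSTOMERS = {"alice": 4, "bob": 0}
LOOKUP_TABLE = {cid: model(n) for cid, n in KNOWN_CUSTOMERS.items()}


def get_score(customer_id, previous_purchases=None):
    """Serve a precomputed score if one exists, else score on demand."""
    if customer_id in LOOKUP_TABLE:
        return LOOKUP_TABLE[customer_id]
    if previous_purchases is None:
        raise KeyError(f"no precomputed score and no factors for {customer_id!r}")
    return model(previous_purchases)
```

The lookup path does no model computation at request time, which is why it only works when the factors are known beforehand.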
10
Why did I buy that blender?
Just browsing in the mall
TV ad / magazine ad
Coupon in the mail
“Just moved” promo email
Webstore recommendation
Browsing catalog
11
UpStream: Attribution Modeling
12
4. Model Scoring
ETL: Marketing channel data; Behavioral variables; Promotional data; Overlay data
Exploratory data analysis
Time-to-event models; GAM survival models
Scoring for inference; Scoring for prediction
5 billion scores per day per retailer
UPSTREAM DATA FORMAT; CUSTOM VARIABLES (PMML)
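To put "5 billion scores per day per retailer" in real-time terms, the figure can be converted to a per-second rate. The sketch below assumes load is spread evenly over 24 hours, which the slide does not state; peak rates would be higher.

```python
SCORES_PER_DAY = 5_000_000_000      # figure from the slide
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400

# Average rate, assuming a perfectly uniform load (an assumption).
scores_per_second = SCORES_PER_DAY / SECONDS_PER_DAY

# Time available per score if a single worker handled them one at a time.
microseconds_per_score = 1_000_000 / scores_per_second

print(f"{scores_per_second:,.0f} scores/second")
print(f"{microseconds_per_score:.2f} microseconds per score")
```

Under that assumption the average works out to roughly 57,870 scores per second, or about 17.28 microseconds per score for a single sequential scorer.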
13
5. Model refresh
Factors; Scores; Actual Outcomes
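One way to picture the refresh step is to compare past scores against actual outcomes and flag the model once the error grows too large. The sketch below uses the Brier score (mean squared error of predicted probabilities) as the measure. The slide does not name a metric, so both the metric and the threshold are assumptions.

```python
def brier_score(scores, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and equal in length")
    return sum((s - o) ** 2 for s, o in zip(scores, outcomes)) / len(scores)


def needs_refresh(scores, outcomes, threshold=0.25):
    """Flag the model for refresh when its Brier score exceeds the threshold.

    The default of 0.25 is what always predicting 0.5 would score,
    chosen here only as an illustrative cutoff.
    """
    return brier_score(scores, outcomes) > threshold
```

In practice the comparison would run over a recent window of scored events, so that drift in the relationship between factors and outcomes shows up as a rising error.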
14
Big Data / Real Time
Kilobytes/Sec, Megabytes/Sec
Gigabytes, Terabytes, Petabytes, Exabytes
Seconds, Milliseconds, Minutes, Minutes, Hours
15
PREDICTIVE ANALYTICS / BIG DATA / REAL TIME
16
Real-Time Big Data Predictive Analytics: From Deployment to Production
David Smith, @revodavid
Booth 618 / Office Hours Weds 1:30PM
The leading enterprise provider of software and services for Open Source R
www.revolutionanalytics.com, +1 650 646 9545, Twitter: @RevolutionR