Intelligent Services: Serving Machine Learning. Joseph E. Gonzalez: jegonzal@cs.berkeley.edu, Assistant Professor @ UC Berkeley; joseph@dato.com, Co-Founder @ Dato Inc.
Contemporary Learning Systems: Big Training Data → Models.
Contemporary Learning Systems: Dato Create, MLlib, BIDMach, Oryx 2, VW, LIBSVM, MLC.
Data → Training → Model: what happens after we train a model? Conference papers? Dashboards and reports? Or drive actions.
Suggesting items at checkout, fraud detection, cognitive assistance, the Internet of Things: low-latency, personalized, rapidly changing.
Data → Train → Model.
Data → Train → Model → Serve → Actions → Adapt → back to Data.
Machine Learning Intelligent Services
The Life of a Query in an Intelligent Service. A content request hits the web serving tier, which asks the intelligent service for items like x. The service performs a feature lookup (user data; product info: images, …), a model lookup (model info: μ, σ, ρ, Σ, ∫, α, β, …), and a top-K query, then returns the top items used to render the new page. Feedback (e.g., the preferred item) flows back to the service.
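Roughly, the serving tier handles such a request by looking up features, looking up the model, scoring candidates, and returning the top-K items while logging feedback. A minimal sketch of this path (the store and model interfaces below are assumptions for illustration, not a real serving API):

```python
# Life-of-a-query sketch: feature lookup -> model lookup -> top-K -> feedback.
def handle_request(user_id, query, feature_store, model_store, k=10):
    user_features = feature_store.lookup_user(user_id)      # user data
    model = model_store.lookup(query.model_name)             # model info
    candidates = feature_store.candidate_items(query)        # e.g., "items like x"
    scored = [(item, model.score(user_features, item)) for item in candidates]
    top_k = sorted(scored, key=lambda s: s[1], reverse=True)[:k]
    return [item for item, _ in top_k]                        # rendered into the new page

def record_feedback(user_id, item_id, feedback_log):
    # Feedback (e.g., the preferred item) flows back to adapt models later.
    feedback_log.append((user_id, item_id))
```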
Essential Attributes of Intelligent Services.
Responsive: intelligent applications are interactive.
Adaptive: ML models are out-of-date the moment learning is done.
Manageable: many models created by multiple people.
Responsive: Now and Always. Compute predictions in < 20 ms for complex models, queries (SELECT * FROM users JOIN items, click_logs, pages WHERE …), top-K retrieval, and features, under heavy query load and with system failures.
Experiment: End-to-end Latency in Spark MLlib. Request path: HTTP request → feature transformation → evaluate model → encode prediction to JSON → HTTP response.
End-to-end Latency for Digits Classification (784-dimension input, served using MLlib and Dato; counts out of 1000 queries; latency measured in milliseconds):
  NOP:                        Avg 5.5,   P99 20.6
  Single Logistic Regression: Avg 21.8,  P99 38.6
  Decision Tree:              Avg 22.4,  P99 63.8
  100-Tree Random Forest:     Avg 50.5,  P99 73.4
  One-vs-all LR (10-class):   Avg 137.7, P99 217.7
  500-Tree Random Forest:     Avg 172.6, P99 268.7
  AlexNet CNN:                Avg 418.7, P99 549.8
Adaptive to Change at All Scales. Axes: granularity of data (population to session) vs. rate of change (months to minutes). Customer interests evolve within sessions (shopping for Mom vs. shopping for me); we need to collect the right data to make the best predictions.
Adaptive to Change at All Scales: the population end. Law of large numbers; change is slow (months): new users and products, evolving interests and trends. Rely on efficient offline retraining with high-throughput systems.
Adaptive to Change at All Scales: the session end. Small data, rapidly changing (minutes), low latency: customer interests evolve within sessions (shopping for Mom vs. shopping for me). Calls for online learning, which is sensitive to feedback bias.
The Feedback Loop: I once looked at cameras on Amazon, and now my Amazon homepage is full of similar cameras and accessories. An opportunity for bandit algorithms, though bandits present new challenges: computation overhead, and they complicate caching and indexing.
Exploration / Exploitation Tradeoff. Systems that take actions can adversely bias the data they collect in the future. An opportunity for bandits! But bandits present new challenges: they complicate caching and indexing, tuning, and counterfactual reasoning.
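One standard bandit strategy that makes the tradeoff concrete is epsilon-greedy: serve the best-known item most of the time, but explore occasionally so the served results do not fully determine the data collected. A generic sketch, not a claim about what Velox or Dato implement:

```python
# Epsilon-greedy bandit sketch (illustrative only).
import random

class EpsilonGreedy:
    def __init__(self, items, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {item: 0 for item in items}
        self.values = {item: 0.0 for item in items}     # running mean reward per item

    def choose(self):
        if random.random() < self.epsilon:               # explore
            return random.choice(list(self.counts))
        return max(self.values, key=self.values.get)     # exploit

    def update(self, item, reward):
        self.counts[item] += 1
        self.values[item] += (reward - self.values[item]) / self.counts[item]
```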
Management: Collaborative Development. Teams of data scientists working on similar tasks produce “competing” features and models, with complex model dependencies: e.g., a cat photo feeds a Cat Classifier (isCat) and an Animal Classifier (isAnimal), which in turn feed a Cuteness Predictor (Cute!).
Predictive Services. UC Berkeley AMPLab. An active research project. Daniel Crankshaw, Xin Wang, Joseph Gonzalez, Peter Bailis, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan.
Velox Model Serving System [CIDR’15, LearningSys’15]. Focuses on the multi-task learning (MTL) domain: spam classification, content recommendation scoring, and localized anomaly detection, with separate tasks per user or session (Session 1, Session 2).
Velox Model Serving System [CIDR’15, LearningSys’15]. Personalized Models (Multi-task Learning): input → output, with a “separate” model for each user/context.
Velox Model Serving System [CIDR’15, LearningSys’15]. Personalized Models (Multi-task Learning): split into a shared feature model and a per-user personalization model.
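One common way to write such a split is a shared feature model composed with a per-user personalization vector (the symbols below are mine, chosen to match the description above):

```latex
% Shared feature model f(x; \theta), trained over all users,
% composed with a per-user personalization vector w_u:
\hat{y}(u, x) = w_u^{\top} \, f(x;\, \theta)
```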
Hybrid Offline + Online Learning (split feature model + personalization model).
Update the feature functions offline using batch solvers: leverage high-throughput systems (Apache Spark) and exploit the slow change in population statistics.
Update the user weights online: simple to train and a more robust model; addresses rapidly changing user statistics. A minimal code sketch of this update pattern follows.
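A minimal sketch of the hybrid pattern, assuming a shared feature function f(x; θ) retrained offline in batch and per-user weights w_u updated with online gradient steps (the class and method names are illustrative, not the Velox API):

```python
# Hybrid offline + online learning sketch (illustrative only).
import numpy as np

class HybridPersonalizedModel:
    def __init__(self, feature_fn, n_features, step_size=0.05):
        self.feature_fn = feature_fn      # shared f(x; theta), trained offline in batch
        self.user_weights = {}            # per-user personalization weights w_u
        self.n_features = n_features
        self.step_size = step_size

    def _weights(self, user_id):
        return self.user_weights.setdefault(user_id, np.zeros(self.n_features))

    def predict(self, user_id, x):
        # prediction = w_u . f(x; theta)
        return float(self._weights(user_id) @ self.feature_fn(x))

    def online_update(self, user_id, x, y):
        # One SGD step on the squared loss, touching only this user's weights.
        w, f = self._weights(user_id), self.feature_fn(x)
        w -= self.step_size * (w @ f - y) * f

    def offline_retrain(self, new_feature_fn):
        # Swap in feature functions retrained offline (e.g., nightly on Spark).
        self.feature_fn = new_feature_fn
```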
Hybrid Online + Offline Learning Results: similar test error to fully offline training, with substantially faster training; the hybrid model also tracks user preference changes (comparison: Full Offline vs. Hybrid).
Evaluating the Model: cache the feature evaluation on the input side of the split. Techniques: feature caching across users, approximate feature hashing, and anytime feature evaluation.
Feature Caching: for a new input, hash the input, compute the feature, and store the result in a feature hash table; repeated inputs then hit the cache.
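A minimal sketch of the exact feature cache (illustrative; the hashing and serialization choices here are assumptions, not the Velox implementation):

```python
# Feature caching sketch: memoize the expensive feature computation by input hash.
import hashlib, pickle

class FeatureCache:
    def __init__(self, feature_fn):
        self.feature_fn = feature_fn
        self.table = {}                                  # hash(input) -> f(input)

    def _key(self, x):
        return hashlib.sha1(pickle.dumps(x)).hexdigest()

    def features(self, x):
        key = self._key(x)
        if key not in self.table:                        # miss: compute and store
            self.table[key] = self.feature_fn(x)
        return self.table[key]                           # hit: reuse cached result
```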
LSH Cache Coarsening: hashing new inputs with a locality-sensitive hash function causes false cache collisions in the feature hash table, so a lookup can return the “wrong” value. Locality-Sensitive Caching: because LSH only collides similar inputs, we use the cached value anyway as an approximation of the true features.
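A minimal sketch of locality-sensitive caching, assuming dense vector inputs and a random-hyperplane LSH (an assumption for illustration, not the Velox code):

```python
# Locality-sensitive caching sketch: nearby inputs share an LSH bucket,
# so a "false" collision returns the cached features of a similar input.
import numpy as np

class LSHFeatureCache:
    def __init__(self, feature_fn, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))     # random hyperplanes
        self.feature_fn = feature_fn
        self.table = {}                                  # LSH bucket -> cached features

    def _bucket(self, x):
        # Sign pattern of the projections is the cache key; fewer bits = coarser cache.
        return tuple(bool(b) for b in (self.planes @ x) > 0)

    def features(self, x):
        key = self._bucket(x)
        if key not in self.table:
            self.table[key] = self.feature_fn(x)         # miss: compute exactly
        return self.table[key]                           # hit: approximate reuse
```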
Anytime Predictions: compute features asynchronously; if a particular feature does not arrive by the latency deadline, use an estimator of it instead. Always able to render a prediction by the latency deadline (cf. Emma Strubell and Andrew McCallum, ACL 2015).
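A minimal sketch of anytime feature evaluation, assuming feature blocks computed on a thread pool and precomputed expected values as the fallback (function and parameter names are illustrative):

```python
# Anytime prediction sketch: any feature block that misses the deadline is
# replaced by its precomputed expectation, so a prediction is always returned.
import time
import numpy as np
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_pool = ThreadPoolExecutor(max_workers=8)

def anytime_predict(x, feature_fns, expected_features, user_weights, deadline_s=0.020):
    start = time.monotonic()
    futures = [_pool.submit(fn, x) for fn in feature_fns]
    feats = []
    for future, fallback in zip(futures, expected_features):
        remaining = max(0.0, deadline_s - (time.monotonic() - start))
        try:
            feats.append(future.result(timeout=remaining))   # feature arrived in time
        except TimeoutError:
            feats.append(fallback)                            # substitute E[f_i(x)]
    return float(np.asarray(user_weights) @ np.asarray(feats))
```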
Coarsening + Anytime Predictions: prediction quality improves as the hash is made less coarse (from overly coarsened toward no coarsening) and as more features are evaluated rather than approximated by their expectation. Check out our poster!
Part of the Berkeley Data Analytics Stack. Training: Spark Streaming, Spark SQL, BlinkDB, GraphX, MLbase / ML library, running on Spark, Mesos, Tachyon, and HDFS, S3, …. Management + Serving: Velox (Model Manager, Prediction Service).
Predictive Services. UC Berkeley AMPLab. Daniel Crankshaw, Xin Wang, Peter Bailis, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan.
Dato Predictive Services: a production-ready model serving and management system.
Elastic scaling and load balancing of docker.io containers.
AWS CloudWatch metrics and reporting.
Serves Dato Create models, scikit-learn, and custom Python.
Distributed shared caching: scale out to address latency.
REST management API. Demo?
Recap: intelligent services must be Responsive, Adaptive, and Manageable. Key insights: caching, bandits, and management; online/offline learning; latency vs. accuracy. Predictive Services, UC Berkeley AMPLab: Daniel Crankshaw, Xin Wang, Joseph Gonzalez, Peter Bailis, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan.
Future of Learning Systems: Data → Train → Model → Serve → Actions → Adapt → back to Data.
Thank You. Joseph E. Gonzalez: jegonzal@cs.berkeley.edu, Assistant Professor @ UC Berkeley; joseph@dato.com, Co-Founder @ Dato.