Ad Click Prediction: a View from the Trenches


Ad Click Prediction: a View from the Trenches (Google paper, 2013). Presenter: 윤철환

Google Ad

System Overview

FTRL-Proximal Algorithm
Combines Online Gradient Descent (OGD), which gives good prediction accuracy, with Regularized Dual Averaging (RDA), which produces sparse models. The update solves

w_{t+1} = argmin_w ( g_{1:t} . w + (1/2) sum_{s=1..t} sigma_s ||w - w_s||^2 + lambda_1 ||w||_1 )

where g_{1:t} is the sum of the loss gradients seen so far and the sigma_s terms define the learning-rate schedule (sigma_{1:t} = 1 / eta_t).

FTRL-Proximal Algorithm

FTRL-Proximal Algorithm

Per-Coordinate Learning Rates
For a given feature, let N be the number of negative events and P the number of positive events it has been seen with; the empirical click probability is then p = P / (N + P). Because features occur at very different rates, each coordinate i gets its own learning rate, eta_{t,i} = alpha / (beta + sqrt(sum_{s=1..t} g_{s,i}^2)), rather than a single global schedule.

FTRL-Proximal Algorithm
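The per-coordinate FTRL-Proximal update for logistic regression (Algorithm 1 in the paper) can be sketched as below; the class interface, the default hyperparameter values, and the sparse-dict feature representation are assumptions of this sketch, not the paper's production code:

```python
import math

class FTRLProximal:
    """Sketch of per-coordinate FTRL-Proximal for logistic regression.

    alpha/beta set the per-coordinate learning-rate schedule; l1/l2 are
    the regularization strengths. Examples are sparse dicts {index: value}.
    """

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-coordinate accumulated (adjusted) gradients
        self.n = {}  # per-coordinate accumulated squared gradients

    def _weight(self, i):
        # Closed-form solution of the per-coordinate argmin.
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps this coordinate exactly sparse
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        s = sum(self._weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """One online step on example x with label y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self._weight(i)
            self.n[i] = n + g * g
        return p
```

Note that weights are lazily reconstructed from (z, n), so coordinates whose |z| stays below lambda_1 cost nothing beyond the two counters.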

Memory-Saving Techniques
Probabilistic feature inclusion, subsampling training data, and encoding values with fewer bits.

Probabilistic Feature Inclusion
Poisson inclusion: a new (not yet stored) feature is inserted into the model with probability p each time it is seen.
Bloom filter inclusion: once a feature has occurred more than n times (according to a counting Bloom filter), it is added to the model. The filter can over-count, so some features may be admitted slightly early.
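The Bloom-filter variant can be sketched as a counting Bloom filter that gates feature creation; the table size, hash scheme, and threshold here are illustrative choices, not the paper's:

```python
import hashlib

class BloomFeatureGate:
    """Sketch of Bloom-filter feature inclusion: a feature is admitted to
    the model only once the counting Bloom filter estimates it has
    appeared at least `threshold` times. False positives mean a feature
    may occasionally be admitted early, never late."""

    def __init__(self, size=2 ** 16, num_hashes=3, threshold=5):
        self.counts = [0] * size
        self.size = size
        self.num_hashes = num_hashes
        self.threshold = threshold

    def _indices(self, feature):
        # Derive num_hashes independent slots from one string feature.
        for k in range(self.num_hashes):
            h = hashlib.sha1(f"{k}:{feature}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def observe(self, feature):
        """Record one occurrence; return True once the feature should be
        added to the model."""
        idxs = list(self._indices(feature))
        for i in idxs:
            self.counts[i] += 1
        # The filter's count estimate is the minimum over the slots.
        return min(self.counts[i] for i in idxs) >= self.threshold
```

Poisson inclusion is simpler still: on seeing an unstored feature, admit it if `random.random() < p`, which admits a feature after 1/p sightings in expectation.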

Subsampling Training Data
The training stream keeps (1) any query for which at least one of the ads was clicked, and (2) a fraction r ∈ (0, 1] of the queries where none of the ads were clicked. To correct the resulting bias, each kept non-clicked example receives an importance weight of 1/r, so that the expected contribution of a randomly chosen event t in the unsampled data to the sub-sampled objective function matches its contribution to the original objective.
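A minimal sketch of this subsampling scheme; the generator interface and the injectable `rng` (used only to make the sketch testable) are assumptions, not the paper's pipeline:

```python
import random

def subsample_and_weight(events, r, rng=random.random):
    """Keep every clicked query; keep unclicked queries with probability
    r and up-weight them by 1/r. In expectation a negative contributes
    r * (1/r) = 1 times its loss, so the sub-sampled objective is an
    unbiased estimate of the unsampled one.

    `events` is an iterable of (features, clicked) pairs; yields
    (features, clicked, importance_weight) triples.
    """
    for features, clicked in events:
        if clicked:
            yield features, clicked, 1.0       # always kept, weight 1
        elif rng() < r:
            yield features, clicked, 1.0 / r   # kept negative, up-weighted
```

The learner then multiplies each example's loss (and hence its gradient) by the importance weight.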

Encoding Values with Fewer Bits
Naive implementations of Online Gradient Descent use 32- or 64-bit floating-point encodings for coefficient values. For these regularized logistic regression models, whose coefficients nearly all lie in (−2, +2), such encodings waste memory. A fixed-point q2.13 encoding is used instead: 16 bits in total, with one sign bit, two bits left of the binary point, and thirteen bits right of it, combined with randomized rounding to keep the quantization error unbiased. This gives no measurable loss in precision and a 75% RAM saving.
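The q2.13 rounding step can be sketched as below; the function returns the decoded value for clarity (a real implementation would store the underlying 16-bit integer), and the injectable `rng` exists only so the sketch is testable:

```python
import math
import random

def q213_encode(v, rng=random.random):
    """Sketch of q2.13 fixed-point encoding (16 bits: 1 sign bit,
    2 integer bits, 13 fractional bits) with randomized rounding:

        round(v) = 2**-13 * floor(2**13 * v + R),  R ~ Uniform[0, 1)

    Since E[floor(x + R)] = x, the rounding error is unbiased, which
    matters when many tiny gradient steps are accumulated.
    """
    scale = 2 ** 13
    q = math.floor(v * scale + rng())
    # Clamp to the representable 16-bit range.
    q = max(-(2 ** 15 - 1), min(2 ** 15 - 1, q))
    return q / scale
```

Each stored coefficient therefore costs 2 bytes instead of 8, the claimed 75% saving, at a resolution of 2**-13 ≈ 0.00012.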

GridViz