Ad Click Prediction: a View from the Trenches (Google paper, 2013), presented by 윤철환
Google Ad
System Overview
FTRL-Proximal Algorithm
Combines the accuracy of Online Gradient Descent (OGD) with the sparsity of Regularized Dual Averaging (RDA); the gradient history and learning rate are maintained per coordinate.
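The per-coordinate closed-form update from the paper can be sketched as below. This is a minimal illustration, not the production system: the class and method names are my own, and the hyperparameter defaults are illustrative.

```python
import math

class FTRLProximal:
    """Sketch of per-coordinate FTRL-Proximal for logistic regression.
    Follows the paper's closed-form update; names and defaults are illustrative."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-coordinate accumulated (adjusted) gradients
        self.n = {}  # per-coordinate accumulated squared gradients

    def weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 threshold keeps the coordinate exactly zero (sparsity)
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n.get(i, 0.0))) / self.alpha + self.l2)

    def predict(self, x):
        # x is a sparse feature vector given as {index: value}
        s = sum(self.weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the logistic loss
            n_old = self.n.get(i, 0.0)
            # sigma implements the per-coordinate learning-rate schedule
            sigma = (math.sqrt(n_old + g * g) - math.sqrt(n_old)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self.weight(i)
            self.n[i] = n_old + g * g
```

Note how the L1 term acts at prediction time: a coordinate whose accumulated signal |z_i| stays below l1 never materializes a nonzero weight, which is what makes the learned model sparse.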
Per-Coordinate Learning Rates
N: number of negative events, P: number of positive events
If a coordinate's click probability is p = P / (N + P), its gradients are p for each negative event and p - 1 for each positive event, which motivates scaling each coordinate's learning rate by its own gradient history.
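A small sketch of the idea, assuming the paper's schedule eta = alpha / (beta + sqrt(sum of squared gradients)); the function names and alpha/beta values are my own illustrative choices:

```python
import math

def per_coordinate_eta(grads, alpha=0.05, beta=1.0):
    """Learning rate for one coordinate given its observed gradients.
    Follows eta_{t,i} = alpha / (beta + sqrt(sum g^2)); alpha, beta illustrative."""
    return alpha / (beta + math.sqrt(sum(g * g for g in grads)))

def grad_sq_sum(P, N):
    """Sum of squared gradients for a coordinate with P positives and
    N negatives when the model predicts p = P / (N + P):
    positives contribute (p - 1)^2 each, negatives p^2 each.
    Algebraically this simplifies to P * N / (N + P)."""
    p = P / (N + P)
    return P * (1 - p) ** 2 + N * p ** 2
```

A frequently seen coordinate accumulates a large squared-gradient sum and so cools down quickly, while a rare coordinate keeps a large learning rate.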
Memory-Saving Techniques
Probabilistic feature inclusion
Subsampling training data
Encoding values with fewer bits
Probabilistic Feature Inclusion
Poisson Inclusion: new features are inserted into the model with probability p; once a feature is in, it is updated as usual.
Bloom Filter Inclusion: once a feature has occurred more than n times (according to a counting Bloom filter), it is added to the model; the filter can over-count, so some features may enter early.
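Both policies can be sketched in a few lines. This is only an illustration of the inclusion decisions: a real system would use a rolling counting Bloom filter, whereas the dict-based counter here is exact, and all names and the defaults p and n are my own.

```python
import random

def poisson_include(p=0.03, rng=random.random):
    """Poisson inclusion: an unseen feature enters the model with
    probability p each time it occurs (p = 0.03 is illustrative)."""
    return rng() < p

class BloomStyleInclusion:
    """Bloom-filter-style inclusion: admit a feature once its counter
    exceeds n. A dict stands in for the paper's counting Bloom filter,
    so this version has no false positives; the policy is the same."""

    def __init__(self, n=2):
        self.n = n
        self.counts = {}

    def observe(self, feature):
        c = self.counts.get(feature, 0) + 1
        self.counts[feature] = c
        return c > self.n  # True -> add the feature to the model
```

Either way, features that occur only once or twice (the bulk of a heavy-tailed vocabulary) tend never to be materialized, which is where the memory savings come from.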
Subsampling Training Data
Train on: any query for which at least one of the ads was clicked, plus a fraction r ∈ (0, 1] of the queries where none of the ads were clicked.
To keep training unbiased, each sampled event t is given an importance weight ω_t (1 for clicked queries, 1/r for unclicked ones), so the expected contribution of a randomly chosen event in the unsampled data to the sub-sampled objective function is unchanged.
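The sampling-plus-reweighting step can be sketched as follows; the function name and the (features, clicked) tuple format are assumptions for the illustration.

```python
import random

def subsample(queries, r, rng=random.random):
    """Keep every clicked query with weight 1; keep a non-clicked query
    with probability r and weight 1 / r, so each event's expected
    contribution to the objective matches the unsampled data.
    queries: iterable of (features, clicked) pairs (assumed format)."""
    out = []
    for q, clicked in queries:
        if clicked:
            out.append((q, clicked, 1.0))      # omega_t = 1
        elif rng() < r:
            out.append((q, clicked, 1.0 / r))  # omega_t = 1 / r
    return out
```

During training, each example's loss (and hence its gradient) is simply multiplied by its weight.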
Encoding Values with Fewer Bits
Naive implementations of Online Gradient Descent use 32- or 64-bit floating-point encodings. For these regularized logistic-regression models, nearly all coefficients lie in (-2, +2), so such wide encodings waste memory. Use a q2.13 fixed-point encoding instead: 2 integer bits, 13 fractional bits, and a sign bit, 16 bits in total, with randomized rounding to keep the quantization unbiased. Result: no measurable loss in precision and 75% RAM savings for coefficient storage.
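A sketch of the q2.13 encoding with the paper's randomized rounding, v_rounded = 2^-13 * floor(2^13 * v + R) with R uniform on [0, 1); the function names and the clamping to the 16-bit signed range are my own choices.

```python
import math
import random

def encode_q213(v, rng=random.random):
    """Encode v as q2.13 fixed point (2 integer bits, 13 fractional
    bits, 1 sign bit = 16 bits). Adding R ~ Uniform[0, 1) before the
    floor makes the rounding unbiased: E[floor(x + R)] = x."""
    scaled = math.floor(v * (1 << 13) + rng())
    limit = (1 << 15) - 1  # clamp into the signed 16-bit range
    return max(-limit - 1, min(limit, scaled))

def decode_q213(q):
    """Recover the (quantized) coefficient value."""
    return q / (1 << 13)
```

Stored this way, each coefficient costs 16 bits instead of 64, which is where the 75% saving comes from; the randomized rounding prevents quantization error from accumulating a systematic bias across the many updates each coefficient receives.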
GridViz: visualization tool for exploring model performance metrics across many slicings of the data