1 Estimating Rates of Rare Events at Multiple Resolutions
Deepak Agarwal, Andrei Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, Mayssam Sayyadian

2 Estimation in the "tail"
Contextual Advertising:
- Show an ad on a webpage ("impression")
- Revenue is generated if a user clicks
- Problem: Estimate the click-through rate (CTR) of an ad on a page
Most (ad, page) pairs have very few impressions, if any, and even fewer clicks: severe data sparsity.
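For reference, the rate being estimated is the plain ratio:

    CTR(ad, page) = #clicks(ad, page) / #impressions(ad, page)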

3 Estimation in the "tail"
Use an existing, well-understood hierarchy:
- Categorize ads and webpages to leaves of the hierarchy
- CTR estimates of siblings are correlated
- The hierarchy allows us to aggregate data
Coarser resolutions provide reliable estimates for rare events, which then influence estimation at finer resolutions.

4 System overview
Retrospective data [URL, ad, isClicked] → crawl a sample of URLs → classify pages and ads → impute impressions, fix sampling bias → rare event estimation using hierarchy

5 Sampling of webpages
Naïve strategy: sample at random from the set of URLs → sampling errors in impression volume AND click volume.
Instead, we propose:
- crawling all URLs with at least one click, and
- a sample of the remaining URLs
Variability is then only in impression volume.
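A minimal sketch of this sampling scheme (hypothetical names; K, the inverse sampling rate for non-clicked URLs, is an assumption, not fixed by the slides):

    import random

    def split_and_sample(click_counts, K=10, seed=0):
        """Keep every URL with at least one click; keep a 1-in-K random
        sample of the URLs with no clicks. click_counts maps URL -> clicks."""
        rng = random.Random(seed)
        clicked = [u for u, c in click_counts.items() if c > 0]    # crawl all of these
        unclicked = [u for u, c in click_counts.items() if c == 0]
        sampled = [u for u in unclicked if rng.random() < 1.0 / K]
        # Every click is retained, so click volume is exact; only the
        # impression volume of the non-clicked pool is subsampled, and it
        # is scaled back up by K during imputation.
        return clicked, sampled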

6 Imputation of impression volume
For each cell (page class i, ad class j): #impressions = n_ij + m_ij + x_ij, where
- n_ij = impressions in the clicked pool,
- m_ij = impressions in the sampled non-clicked pool,
- x_ij = excess impressions (to be imputed).
Constraints on the table of page classes × ad classes:
- each column sums to the #impressions on ads of that ad class [column constraint],
- each row sums to ∑n_ij + K·∑m_ij [row constraint],
- the grand total sums to the total impressions (known).
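Written out, with A_j the known impression total for ad class j and K the inverse sampling rate of the non-clicked pool (both readings are assumptions from slides 5 and 6):

    Column constraint (each ad class j):  ∑_i (n_ij + m_ij + x_ij) = A_j
    Row constraint (each page class i):   ∑_j (n_ij + m_ij + x_ij) = ∑_j n_ij + K·∑_j m_ij
    Grand total:                          ∑_i ∑_j (n_ij + m_ij + x_ij) = total impressions (known)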

7 Imputation of impression volume
Region = (page node, ad node).
Region hierarchy: a cross-product of the page hierarchy and the ad hierarchy. Level i of the region hierarchy pairs the level-i page classes with the level-i ad classes, starting from (root, root) at level 0.

8 Imputation of impression volume
Block constraint: within each block of sibling regions at level i+1, the x_ij sum to the value of their parent region at level i.

9 Imputing x_ij
Iterative Proportional Fitting [Darroch+/1972]:
- Initialize x_ij = n_ij + m_ij
- Iteratively scale the x_ij values to match each row/column/block constraint
- Ordering of constraints: top-down, then bottom-up, and repeat
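A minimal single-level IPF sketch in Python (hypothetical interface: following the slide, x starts at n + m; row_targets and col_targets are assumed to be the totals that x itself must reach once the known n_ij + m_ij contributions are moved to the right-hand side; the cross-level block-scaling step is noted in the docstring but omitted):

    import numpy as np

    def ipf_level(x, row_targets, col_targets, tol=1e-6, max_iter=1000):
        """Iterative Proportional Fitting on one level of the region
        hierarchy: alternately rescale rows and columns of the
        excess-impression table x (page classes x ad classes) until its
        margins match the targets. The block constraint tying this level
        to its parent level would add one more scaling step per block."""
        x = np.asarray(x, dtype=float).copy()
        row_targets = np.asarray(row_targets, dtype=float)
        col_targets = np.asarray(col_targets, dtype=float)
        for _ in range(max_iter):
            # Row scaling: each page class hits its target total.
            r = x.sum(axis=1, keepdims=True)
            x *= np.divide(row_targets.reshape(-1, 1), r,
                           out=np.ones_like(r), where=r > 0)
            # Column scaling: each ad class hits its target total.
            c = x.sum(axis=0, keepdims=True)
            x *= np.divide(col_targets.reshape(1, -1), c,
                           out=np.ones_like(c), where=c > 0)
            if (np.abs(x.sum(axis=1) - row_targets).max() < tol and
                    np.abs(x.sum(axis=0) - col_targets).max() < tol):
                break
        return x

Because every update is multiplicative, cells that start at zero stay zero; initializing at x_ij = n_ij + m_ij gives IPF a positive starting pattern to rescale.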

10 Imputation: Summary
Given:
- n_ij (impressions in the clicked pool)
- m_ij (impressions in the sampled non-clicked pool)
- #impressions on ads of each ad class in the ad hierarchy
We get:
- Estimated impression volume Ñ_ij = n_ij + m_ij + x_ij in each region ij of every level

11 System overview
Retrospective data [page, ad, isClicked] → crawl a sample of pages → classify pages and ads → impute impressions, fix sampling bias → rare event estimation using hierarchy

12 Rare rate modeling
1. Freeman-Tukey transform:
- y_ij = F-T(clicks and impressions at ij) ≈ transformed CTR
- Variance-stabilizing transformation: Var(y) is independent of E[y], which is needed in further modeling
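Concretely, for region ij with c_ij clicks and imputed impression volume Ñ_ij, the standard Freeman-Tukey form for a rate (assumed here to be the variant used) is:

    y_ij = √(c_ij / Ñ_ij) + √((c_ij + 1) / Ñ_ij),   with Var(y_ij) ≈ 1/Ñ_ij

so the variance depends only on the impression volume, not on the unknown CTR, and two zero-click regions with different Ñ_ij get different y_ij.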

13 Rare rate modeling
2. Generative Model (Tree-structured Markov Model):
- Unobserved "state" S_ij evolves from its parent's state S_parent(ij) with variance W_ij
- Observed y_ij is generated from S_ij with covariate coefficients β_ij and observation variance V_ij
- Likewise, y_parent(ij) is generated from S_parent(ij) with β_parent(ij), V_parent(ij), and W_parent(ij)
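Read as a state-space model down the tree, one consistent set of equations for the slide's diagram is (scalar form; the exact covariate term is an assumption, and the covariates are written z_ij here to avoid clashing with the imputed counts x_ij):

    S_ij = S_parent(ij) + w_ij,        w_ij ~ N(0, W_ij)    (state inherits from its parent)
    y_ij = β_ij·z_ij + S_ij + v_ij,    v_ij ~ N(0, V_ij)    (noisy observation with covariates)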

14 Rare rate modeling
Model fitting with a 2-pass Kalman filter:
- Filtering: leaf to root
- Smoothing: root to leaf
Linear in the number of regions.
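The two passes can be sketched as scalar Gaussian belief propagation on the tree, which yields the same posteriors as the leaf-to-root filter plus root-to-leaf smoother (a sketch under simplifying assumptions: scalar states, a single known W, per-node observation variances V, covariate effects already subtracted from y; all names are hypothetical):

    def tree_smooth(children, y, V, W, root=0):
        """Posterior (mean, variance) of every node's state S under
             S_child = S_parent + Normal(0, W)
             y_node  = S_node  + Normal(0, V[node])   (where observed)
        children: node -> list of child nodes; y, V: observed node -> value/variance."""
        up_prec, up_mean = {}, {}   # upward messages, child -> parent
        own = {}                    # belief at a node from its data + its subtree

        def through_edge(prec, mean):
            # Push a Gaussian belief across a parent-child edge (adds noise W).
            if prec == 0.0:
                return 0.0, 0.0     # uninformative subtree
            return 1.0 / (W + 1.0 / prec), mean

        def upward(node):           # filtering: leaves to root
            prec = 1.0 / V[node] if node in y else 0.0
            num = prec * y.get(node, 0.0)
            for c in children.get(node, []):
                upward(c)
                p, m = through_edge(*own[c])
                up_prec[c], up_mean[c] = p, m
                prec += p
                num += p * m
            own[node] = (prec, num / prec if prec > 0 else 0.0)

        def downward(node, d_prec, d_mean):   # smoothing: root to leaves
            prec, mean = own[node]
            tot = prec + d_prec
            mu = (prec * mean + d_prec * d_mean) / tot if tot > 0 else 0.0
            post[node] = (mu, 1.0 / tot if tot > 0 else float("inf"))
            for c in children.get(node, []):
                # Evidence about this node from everywhere except c's subtree.
                ex_prec = max(tot - up_prec[c], 0.0)
                ex_mean = ((tot * mu - up_prec[c] * up_mean[c]) / ex_prec
                           if ex_prec > 0 else 0.0)
                downward(c, *through_edge(ex_prec, ex_mean))

        post = {}
        upward(root)
        downward(root, 0.0, 0.0)
        return post

Each node is visited once per pass, so time and space are linear in the number of regions, matching the slide. On a toy tree, tree_smooth({0: [1, 2]}, y={1: 0.3, 2: 0.5}, V={1: 1.0, 2: 1.0}, W=0.5) pulls the two leaf estimates toward each other through their shared parent, the "borrowing from siblings" of slide 19.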

15 Experiments
- 503M impressions
- 7-level hierarchy, of which the top 3 levels were used
- Zero clicks in 76% of regions in level 2 and 95% of regions in level 3
- Full dataset DFULL, and a 2/3 sample DSAMPLE

16 Experiments
Estimate CTRs for all regions R in level 3 with zero clicks in DSAMPLE.
Some of these regions, denoted R>0, get clicks in DFULL.
A good model should predict higher CTRs for the regions in R>0 than for the other regions in R.

17 Experiments
We compared 4 models:
- TS: our tree-structured model
- LM (level-mean): each level smoothed independently
- NS (no smoothing): CTR proportional to 1/Ñ
- Random: assuming |R>0| is given, randomly predict the membership of R>0 out of R

18 Experiments
[Figure: comparison of the four models: TS, LM, NS, and Random]

19 Experiments
- Enough impressions → little "borrowing" from siblings
- Few impressions → estimates depend more on siblings

20 Related Work
- Multi-resolution modeling: studied in time series modeling and spatial statistics [Openshaw+/79, Cressie/90, Chou+/94]
- Imputation: studied in statistics [Darroch+/1972]
Applying such models to the estimation of extremely rare events (rates of ~10⁻³) is novel.

21 Conclusions
We presented a method to estimate rates of extremely rare events at multiple resolutions under severe sparsity constraints.
Our method has two parts:
- Imputation: incorporates the hierarchy, fixes sampling bias
- Tree-structured generative model: extremely fast parameter fitting

22 Rare rate modeling
1. Freeman-Tukey transform:
- Distinguishes between regions with zero clicks based on the number of impressions
- Variance-stabilizing transformation: Var(y) is independent of E[y], which is needed in further modeling
- y_r ≈ √(c_r/N_r) + √((c_r+1)/N_r), where c_r = #clicks in region r and N_r = #impressions in region r

23 Rare rate modeling
Generative Model:
- S_ij values can be quickly estimated using a Kalman filtering algorithm (filtering up the tree, smoothing down)
- The Kalman filter requires knowledge of β, V, and W
- EM wrapped around the Kalman filter estimates these parameters

24 Rare rate modeling
Fitting using a Kalman filtering algorithm:
- Filtering: recursively aggregate data from leaves to root
- Smoothing: propagate information from root to leaves
Complexity: linear in the number of regions, in both time and space.

25 Rare rate modeling
Fitting using a Kalman filtering algorithm:
- Filtering: recursively aggregate data from leaves to root
- Smoothing: propagate information from root to leaves
The Kalman filter requires knowledge of β, V, and W; EM wrapped around the Kalman filter estimates them.

26 Imputing x_ij
Iterative Proportional Fitting [Darroch+/1972], over levels Z(i) of the region hierarchy:
- Initialize x_ij = n_ij + m_ij
- Top-down, repeating for every level Z(i):
  - Scale all x_ij in every block of Z(i+1) to sum to its parent in Z(i)
  - Scale all x_ij in Z(i+1) to sum to the row totals
  - Scale all x_ij in Z(i+1) to sum to the column totals
- Bottom-up: similar