Deep Interest Network for Click-Through Rate Prediction 5/26/2019 Advisor: Jia-Ling Koh Presenter: You-Xiang Chen Source: KDD 2018 Date: 2019/03/25
Outline Introduction Method Experiment Conclusion
Introduction The click-through rate (CTR) of an advertisement is the number of times the ad is clicked divided by the number of times the ad is served: $CTR = \frac{\text{Number of click-throughs}}{\text{Number of impressions}} \times 100\%$
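As a quick sanity check, the formula is just a ratio expressed as a percentage (a minimal sketch; the function name is illustrative):

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate as a percentage."""
    return clicks / impressions * 100

# e.g. 12 clicks out of 1000 impressions is a CTR of about 1.2%
print(ctr(12, 1000))
```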
Motivation Deep learning based methods have been proposed for the CTR prediction task, but they have difficulty capturing users' diverse interests effectively from rich historical behaviors. Training industrial deep networks with large-scale sparse features is also a great challenge.
Goal Propose a novel model, Deep Interest Network (DIN), which designs a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. To make the computation acceptable, we develop a novel mini-batch aware regularization.
Outline Introduction Method Experiment Conclusion
Base Model (Embedding & MLP)
Feature sets in Alibaba
Feature Representation Encoding vector of feature group $i$: $t_i \in \mathbb{R}^{K_i}$, where $K_i$ denotes the dimensionality of feature group $i$. Each entry $t_i[j] \in \{0,1\}$ and $\sum_{j=1}^{K_i} t_i[j] = k$ ($k=1$: one-hot; $k>1$: multi-hot). The whole input is $x = [t_1^T, t_2^T, \dots, t_M^T]^T$ with $\sum_{i=1}^{M} K_i = K$, where $K$ is the dimensionality of the entire feature space.
Embedding layer Embedding dictionary for group $i$: $W^i = [w^i_1, w^i_2, \dots, w^i_j, \dots, w^i_{K_i}] \in \mathbb{R}^{D \times K_i}$. If $t_i$ is one-hot with its $j$-th element equal to 1, the embedding is the single vector $e_i = w^i_j$. If $t_i$ is multi-hot with $t_i[j] = 1$ for $j \in \{i_1, i_2, \dots, i_k\}$, the embedding is the list of vectors $\{e_{i_1}, e_{i_2}, \dots, e_{i_k}\} = \{w^i_{i_1}, w^i_{i_2}, \dots, w^i_{i_k}\}$.
Pooling & Concat layer $e_i = \text{pooling}(e_{i_1}, e_{i_2}, \dots, e_{i_k})$
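The embedding lookup and pooling steps above can be sketched with NumPy. All shapes and the random embedding table are illustrative assumptions, and sum pooling stands in for whichever pooling the model uses:

```python
import numpy as np

rng = np.random.default_rng(0)

K_i = 6   # vocabulary size of feature group i
D = 4     # embedding dimension

# Embedding dictionary W^i in R^{D x K_i}; columns are embedding vectors.
W = rng.normal(size=(D, K_i))

# Multi-hot encoding t_i: the user interacted with items 1 and 3.
t = np.zeros(K_i)
t[[1, 3]] = 1

# Lookup: gather the columns whose entries in t_i are 1.
gathered = W[:, t == 1]          # shape (D, k), here k = 2

# Sum pooling turns the variable-length list into a fixed-length vector.
e_pooled = gathered.sum(axis=1)  # shape (D,)
assert e_pooled.shape == (D,)
```

Pooling is what lets user behavior sequences of different lengths map to a fixed-size input for the MLP.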
Objective function Negative log-likelihood function
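Written out, the negative log-likelihood over a training set $S$ of size $N$, with label $y \in \{0,1\}$ and predicted click probability $p(x)$, is:

```latex
L = -\frac{1}{N} \sum_{(x,y) \in S} \Big( y \log p(x) + (1-y)\log\big(1 - p(x)\big) \Big)
```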
Deep Interest Network
Activation Unit
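A minimal NumPy sketch of the local activation unit: each behavior embedding is scored against the candidate ad, and the behaviors are combined by weighted sum pooling. The hidden width, random parameters, element-wise product interaction, and ReLU are illustrative assumptions (the paper feeds the out product through a small MLP with PReLU/Dice):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4          # embedding dimension
T = 3          # number of user behaviors
H = 8          # hidden width of the activation unit MLP (assumed)

behaviors = rng.normal(size=(T, D))   # e_1 .. e_T
ad = rng.normal(size=(D,))            # candidate ad embedding v_A

# Assumed MLP parameters of the activation unit.
W1 = rng.normal(size=(3 * D, H))
b1 = np.zeros(H)
W2 = rng.normal(size=(H, 1))

def activation_weight(e, v):
    """Relevance weight f(v_A, e_j): concat(e, e*v, v) -> MLP -> scalar."""
    x = np.concatenate([e, e * v, v])   # element-wise product as interaction
    h = np.maximum(x @ W1 + b1, 0.0)    # ReLU hidden layer (assumed)
    return float(h @ W2)

weights = np.array([activation_weight(e, ad) for e in behaviors])

# DIN does NOT softmax-normalize the weights, so the intensity of the
# user's interest is preserved, not just its relative distribution.
v_user = (weights[:, None] * behaviors).sum(axis=0)
assert v_user.shape == (D,)
```

Because the weights depend on the candidate ad, the pooled user representation adapts to each ad instead of being a single fixed vector.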
L2 Regularization $y = a + bx + cx^2 + dx^3 \;\rightarrow\; y = a + bx + 0 \cdot x^2 + 0 \cdot x^3$ $J(\theta) = [y_\theta(x) - y]^2 + \lambda(\theta_1^2 + \theta_2^2 + \dots) = [y_\theta(x) - y]^2 + \lambda \sum_i \theta_i^2$
Mini-batch Aware Regularization Expand the L2 regularization on the embedding matrix W over all samples, then transform it into a mini-batch aware form so that only the parameters of features appearing in each mini-batch are penalized.
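The idea can be sketched as follows: only embedding rows of features that actually occur in the current mini-batch receive an L2 gradient, scaled by the inverse of the feature's occurrence count $n_j$. The vocabulary size, counts, and λ value below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 10, 4                 # vocabulary size, embedding dimension
W = rng.normal(size=(V, D))  # embedding table (rows are w_j)

# n_j: number of samples in which feature j occurs (assumed precomputed).
n = rng.integers(1, 100, size=V).astype(float)

def mba_l2_grad(W, batch_feature_ids, n, lam=0.01):
    """Gradient of the mini-batch aware L2 term.

    Only rows that occur in the current mini-batch are penalized, each
    scaled by 1 / n_j; rows absent from the batch cost nothing, which
    keeps the update as sparse as the forward pass.
    """
    grad = np.zeros_like(W)
    for j in set(batch_feature_ids):
        grad[j] = lam * W[j] / n[j]
    return grad

batch_ids = [2, 5, 5, 7]          # feature ids seen in this mini-batch
g = mba_l2_grad(W, batch_ids, n)
assert np.all(g[0] == 0)          # untouched rows get no penalty
```

This is what makes L2 regularization affordable with hundreds of millions of sparse embedding parameters: full L2 would touch every row on every step.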
Data Adaptive Activation Function PReLU activation function Dice
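A NumPy sketch contrasting the two: PReLU switches hard at zero, while Dice replaces the hard switch with a sigmoid gate centered on the data's mean. Using batch statistics directly (rather than the moving averages used at inference) is a simplifying assumption:

```python
import numpy as np

def prelu(s, alpha=0.25):
    """PReLU: hard switch at 0, f(s) = s if s > 0 else alpha * s."""
    return np.where(s > 0, s, alpha * s)

def dice(s, alpha=0.25, eps=1e-8):
    """Dice: soft, data-adaptive switch centered on the batch mean.

    p(s) = sigmoid((s - E[s]) / sqrt(Var[s] + eps)), then
    f(s) = p(s) * s + (1 - p(s)) * alpha * s.
    """
    norm = (s - s.mean()) / np.sqrt(s.var() + eps)
    p = 1.0 / (1.0 + np.exp(-norm))
    return p * s + (1.0 - p) * alpha * s

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(prelu(x))  # hard rectification at 0
print(dice(x))   # smooth rectification adapted to the batch distribution
```

Because the rectified point of Dice follows the distribution of its inputs, the activation adapts as the layer's input statistics shift during training.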
Outline Introduction Method Experiment Conclusion
Datasets
Train & Test Loss Dashed lines: overfitting. Dark green: without regularization. Training with goods_id is better than without (weighting).
Test AUC
Performance
Best AUCs of Base Model with different regularizations (compared with the first line), with and without goods_ids and regularization. Dropout: randomly drop 50% of the features. Filter: take the most frequently occurring features out; here the top 20 million most frequent features are excluded, and regularization is applied to the remaining ones. Difacto: a regularization method from another paper, whose idea is to regularize frequently occurring features less. MBA: our mini-batch aware regularization.
Model Comparison on Alibaba Dataset with full feature sets
Illustration of adaptive activation in DIN
Outline Introduction Method Experiment Conclusion
Conclusion A novel approach named DIN is designed to activate related user behaviors and obtain an adaptive representation vector for user interests, which varies across different ads. Two novel techniques are introduced to help train industrial deep networks and further improve the performance of DIN.