Deep Interest Network for Click-Through Rate Prediction
5/26/2019 Advisor: Jia-Ling Koh Presenter: You-Xiang Chen Source: KDD 2018 Date: 2019/03/25
Outline Introduction Method Experiment Conclusion
Introduction The click-through rate of an advertisement is the number of times a click is made on the ad, divided by the number of times the ad is served.
CTR = (number of click-throughs / number of impressions) × 100%
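The ratio above is simple arithmetic; as a minimal sketch (the function name `ctr` is ours, not from the slides):

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate as a percentage: clicks / impressions * 100."""
    return clicks / impressions * 100

# e.g. 5 clicks out of 1000 impressions is a CTR of 0.5%
rate = ctr(5, 1000)
```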
Motivation Deep-learning-based methods have been proposed for the CTR prediction task, but they have difficulty capturing users' diverse interests effectively from rich historical behaviors. Training industrial deep networks with large-scale sparse features is a great challenge.
Goal Propose a novel model, Deep Interest Network (DIN), which designs a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. To make the computation acceptable, develop a novel mini-batch aware regularization.
Outline Introduction Method Experiment Conclusion
Base Model (Embedding & MLP)
Feature sets in Alibaba
Feature Representation
Encoding vector of the i-th feature group: t_i ∈ R^{K_i}, where K_i denotes the dimensionality of feature group i.
t_i[j] ∈ {0,1}, and Σ_{j=1}^{K_i} t_i[j] = k (k = 1 for one-hot, k > 1 for multi-hot).
x = [t_1^T, t_2^T, …, t_M^T]^T, with Σ_{i=1}^{M} K_i = K, where K is the dimensionality of the entire feature space.
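The one-hot / multi-hot encoding described above can be sketched as follows (a toy example; the group sizes and the helper `encode` are ours for illustration):

```python
import numpy as np

def encode(indices, dim):
    """Multi-hot encode a list of active indices into a {0,1} vector of length dim."""
    t = np.zeros(dim, dtype=np.int64)
    t[list(indices)] = 1
    return t

# one-hot group, K_i = 2 (e.g. a binary categorical feature), so k = 1
t_a = encode([0], 2)
# multi-hot group, K_i = 5 (e.g. visited item ids, toy vocabulary), here k = 2
t_b = encode([1, 3], 5)
# the overall input x concatenates all groups; the K_i sum to K = 7
x = np.concatenate([t_a, t_b])
```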
Embedding layer Embedding dictionary
W^i = [w_1^i, w_2^i, …, w_j^i, …, w_{K_i}^i] ∈ R^{D×K_i}
Embedding vector: e_i = w_j^i if t_i is one-hot with t_i[j] = 1;
{e_{i_1}, e_{i_2}, …, e_{i_k}} = {w_{i_1}^i, w_{i_2}^i, …, w_{i_k}^i} if t_i is multi-hot with t_i[j] = 1 for j ∈ {i_1, i_2, …, i_k}
Pooling & Concat layer: e_i = pooling(e_{i_1}, e_{i_2}, …, e_{i_k})
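A minimal sketch of embedding lookup followed by sum pooling, assuming a toy embedding dictionary W with one column per feature id (dimensions D and K_i are made up here):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K_i = 4, 5                      # embedding size, vocabulary size of one feature group
W = rng.normal(size=(D, K_i))      # embedding dictionary: column j is w_j

def embed(active_ids):
    """Look up the embedding column for each active id of a multi-hot group."""
    return [W[:, j] for j in active_ids]

def sum_pool(vectors):
    """Sum pooling maps a variable-length list of embeddings to one fixed-size vector."""
    return np.sum(vectors, axis=0)

# pooled representation of a group with active ids {1, 3}; the concat layer
# would then join the pooled vectors of all feature groups into one input
e_group = sum_pool(embed([1, 3]))
```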
Objective function: the negative log-likelihood over the training set S of size N,
L = −(1/N) Σ_{(x,y)∈S} [y log p(x) + (1 − y) log(1 − p(x))]
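The negative log-likelihood for binary labels can be sketched directly (plain Python, no framework assumed):

```python
import math

def neg_log_likelihood(probs, labels):
    """Average negative log-likelihood for binary labels y in {0, 1},
    given predicted click probabilities p(x) in (0, 1)."""
    n = len(probs)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probs, labels)) / n

loss = neg_log_likelihood([0.9, 0.2], [1, 0])
```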
Deep Interest Network
Activation Unit
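A sketch of DIN's local activation unit: each behavior embedding is scored against the candidate ad's embedding by a small feed-forward net, and the user representation is the weighted sum pooling of behaviors. The weights and layer sizes are made up, and we use an element-wise product as the interaction term (the paper feeds an out product); DIN keeps the weights unnormalized rather than applying a softmax:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
# hypothetical activation-unit MLP: one hidden layer, scalar output
W1 = rng.normal(size=(3 * D, 8))
b1 = np.zeros(8)
W2 = rng.normal(size=8)

def activation_unit(e_behavior, v_ad):
    """Relevance weight of one historical behavior w.r.t. the candidate ad."""
    # concat behavior embedding, interaction term, and ad embedding
    z = np.concatenate([e_behavior, e_behavior * v_ad, v_ad])
    h = np.maximum(z @ W1 + b1, 0.0)   # ReLU hidden layer
    return float(h @ W2)               # unnormalized weight (no softmax in DIN)

def user_interest(behaviors, v_ad):
    """Adaptive user representation: weighted sum pooling over behaviors."""
    weights = np.array([activation_unit(e, v_ad) for e in behaviors])
    return (weights[:, None] * np.stack(behaviors)).sum(axis=0)

behaviors = [rng.normal(size=D) for _ in range(3)]
v_ad = rng.normal(size=D)
v_user = user_interest(behaviors, v_ad)
```

Because the weights depend on the ad, the pooled user vector varies across candidate ads, which is the point of the local activation unit.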
L2 Regularization (illustrated with a polynomial regression example)
Model: y = b + w_1 x + w_2 x^2 + w_3 x^3 + …
Loss with an L2 penalty:
L = Σ_n (ŷ^n − y^n)^2 + (w_1^2 + w_2^2 + …) = Σ_n (ŷ^n − y^n)^2 + λ Σ_j w_j^2
Mini-batch Aware Regularization
Expand the regularization on W over all samples:
L2(W) = Σ_{j=1}^{K} Σ_{(x,y)∈S} (I(x_j ≠ 0) / n_j) ‖w_j‖_2^2
where n_j is the number of samples in which feature j occurs. Transformed into the mini-batch aware manner:
L2(W) ≈ Σ_{j=1}^{K} Σ_{m=1}^{B} (α_{mj} / n_j) ‖w_j‖_2^2, with α_{mj} = max_{(x,y)∈B_m} I(x_j ≠ 0)
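A minimal sketch of the mini-batch aware idea: only embedding rows whose feature ids actually appear in the current mini-batch contribute to the penalty, each scaled by the inverse of its global occurrence count. The table shape, counts, and the helper `mba_l2` are toy assumptions, not the paper's implementation:

```python
import numpy as np

def mba_l2(W, batch_feature_ids, n_occurrence, lam=1e-3):
    """Mini-batch aware L2 penalty: sum ||w_j||^2 / n_j over the distinct
    feature ids j that occur in this mini-batch only."""
    ids = sorted(set(batch_feature_ids))
    return lam * sum(np.dot(W[j], W[j]) / n_occurrence[j] for j in ids)

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 4))               # toy embedding table, one row per feature id
n_occ = {j: 1 + j for j in range(10)}      # hypothetical global occurrence counts n_j
reg = mba_l2(W, batch_feature_ids=[2, 2, 5], n_occurrence=n_occ)
```

This keeps the regularization cost proportional to the features present in the batch, which is what makes it tractable for hundreds of millions of sparse parameters.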
Data Adaptive Activation Function
PReLU activation function: f(s) = s if s > 0, else αs, with the rectify point fixed at 0.
Dice: f(s) = p(s)·s + (1 − p(s))·α·s, where p(s) = sigmoid((s − E[s]) / √(Var[s] + ε)); the rectify point adapts to the data distribution.
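The two activations can be sketched side by side; this toy version computes the mean and variance over the given batch directly (a real layer would track running statistics):

```python
import numpy as np

def prelu(s, alpha=0.25):
    """PReLU: identity for s > 0, alpha * s otherwise (rectify point at 0)."""
    return np.where(s > 0, s, alpha * s)

def dice(s, alpha=0.25, eps=1e-8):
    """Dice: a PReLU whose rectify point adapts to the batch statistics of s."""
    p = 1.0 / (1.0 + np.exp(-(s - s.mean()) / np.sqrt(s.var() + eps)))
    return p * s + (1.0 - p) * alpha * s

s = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
out = dice(s)
```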
Outline Introduction Method Experiment Conclusion
Datasets
Train & Test Loss. The blue curve shows overfitting; the dark green curve is trained without regularization.
Training with good_ids is better than training without them (weighting).
Test AUC
Performance
Best AUCs of the Base Model with different regularizations (compared with the first line)
The comparison is with/without good_ids plus regularization.
Dropout: randomly discard 50% of the features.
Filter: remove rarely occurring features, keeping the 20 million most frequent ones; regularization is applied to the remainder.
Difacto: a regularization method proposed in another paper; the idea is that frequently occurring features receive less regularization.
MBA: our method, mini-batch aware regularization.
Model Comparison on Alibaba Dataset with full feature sets
Illustration of adaptive activation in DIN
Outline Introduction Method Experiment Conclusion
Conclusion A novel approach named DIN is designed to activate related user behaviors and obtain an adaptive representation vector for user interests, which varies across different ads. Two novel techniques are introduced to help train industrial deep networks and further improve the performance of DIN.