Personalized Celebrity Video Search Based on Cross-space Mining Zhengyu Deng, Jitao Sang, Changsheng Xu 1 Institute of Automation, Chinese Academy of Sciences 2 Chinese-Singapore Institute of Digital Media
Outline Motivation Framework Approach Experiment Conclusions
Motivation
Celebrities are often popular in multiple fields, and user interests are diverse.
[Figure: Users 1, 2, and 3 like different kinds of Beckham videos (sports, entertainment, interview).]
Motivation
Celebrities are often popular in multiple fields, and user interests are diverse.
[Figure: one user, three video types (sports, entertainment, music), and three celebrities (Beckham, Bieber, Lady Gaga), linked by "like".]
Motivation: Non-personalized search
[Figure: non-personalized results for the query "David Beckham" mix daily-life, sports, and interview videos.]
Motivation: Problem and solution
Problem: Users have different interest distributions, and celebrities have different popularity distributions. How can user interest be matched with celebrity popularity?
Solution: Learn the interest space of users and the popularity space of celebrities, then correlate the two spaces.
Framework
[Figure: framework diagram. Elements: query, user, celebrity, search engine, associated tags, topic modeling, interest space, popularity space, map, re-rank.]
2019/2/19
Approach
[Figure: approach diagram. Node layers: users U1 … Um, interest-space topics Z1 … Zp, vocabulary words W1 … Wx, popularity-space topics X1 … Xq, celebrities C1 … Cn. Operations: random walk, LDA (on the user side and on the celebrity side), KL-Divergence. Edge labels: P(Zi|Ui), P(Wi|Zi), P(Wi|Ti), P(Ti|Ci), P(Zi|Xi).]
Approach: Random walk
$V_j$ is the initial probabilistic score, $p_{ij}$ is an entry of the transition matrix, and $r_k(j)$ denotes the relevance score of node $j$ at iteration $k$.
(1)
Rewrite as (2)
The unique solution (3)
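Equations (1) to (3) are not shown in the text of this slide. As a minimal sketch, assuming the commonly used damped update $r_k(j) = \alpha \sum_i r_{k-1}(i)\, p_{ij} + (1-\alpha)\, V_j$, the iteration could look like the following. The damping weight `alpha`, the function name `random_walk`, and the toy graph are assumptions for illustration and are not taken from the slides.

```python
def random_walk(p, v, alpha=0.8, iters=100, tol=1e-10):
    """Iterate r_k(j) = alpha * sum_i r_{k-1}(i) * p[i][j] + (1 - alpha) * v[j].

    p:     row-stochastic transition matrix as a list of lists (p[i][j] = i -> j)
    v:     initial probabilistic scores, one per node
    alpha: assumed damping weight in [0, 1); not given in the slides
    """
    n = len(v)
    r = list(v)
    for _ in range(iters):
        nxt = [
            alpha * sum(r[i] * p[i][j] for i in range(n)) + (1 - alpha) * v[j]
            for j in range(n)
        ]
        # Stop once the scores no longer change noticeably.
        if max(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


# Toy 3-node graph with hypothetical values.
p = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
v = [0.6, 0.3, 0.1]
scores = random_walk(p, v)
```

Under the same assumption, the iteration converges for any `alpha` below 1 when `p` is row-stochastic, and the scores keep summing to 1 when `v` does.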
Approach: KL-Divergence
Topic $z$ ($x$) is from the interest (popularity) space. The KL-Divergence between them is
$$D_{KL}(z \,\|\, x) = \frac{1}{2}\left(\sum_i z(i)\ln\frac{z(i)}{x(i)} + \sum_i x(i)\ln\frac{x(i)}{z(i)}\right) \quad (4)$$
where $z(i)$ ($x(i)$) denote the distribution score of topic $z$ ($x$) on word $i$. The similarity $s_{zx}$ of topics $z$ and $x$ is defined as the inverse of the KL-Divergence:
$$s_{zx} = 1 / D_{KL}(z \,\|\, x) \quad (5)$$
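Equations (4) and (5) can be sketched in a few lines of Python. This is an illustration only: the smoothing constant `eps`, the function names, and the toy distributions are assumptions, and the slides do not say how zero probabilities or identical topics are handled.

```python
import math


def symmetric_kl(z, x, eps=1e-12):
    """Eq. (4): average of KL(z || x) and KL(x || z) over a shared vocabulary."""
    # eps is an assumed smoothing constant that avoids log(0).
    z = [a + eps for a in z]
    x = [b + eps for b in x]
    forward = sum(a * math.log(a / b) for a, b in zip(z, x))
    backward = sum(b * math.log(b / a) for a, b in zip(z, x))
    return 0.5 * (forward + backward)


def topic_similarity(z, x):
    """Eq. (5): inverse of the divergence; identical topics map to infinity here."""
    d = symmetric_kl(z, x)
    return float("inf") if d == 0 else 1.0 / d


# Hypothetical topic-word distributions over a 3-word vocabulary.
z = [0.7, 0.2, 0.1]
x_near = [0.6, 0.3, 0.1]
x_far = [0.1, 0.2, 0.7]
sim_near = topic_similarity(z, x_near)
sim_far = topic_similarity(z, x_far)
```

Because the two directions are averaged, the measure is symmetric in `z` and `x`, so the same function serves both spaces.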
Approach: Video Projection
Given a celebrity video $v_{M \times 1}$, project it to the interest space $\Phi_{K \times M}$:
$$v'_{K \times 1} = \Phi_{K \times M}\, v_{M \times 1} \quad (6)$$
where $K$ is the number of topics in the interest space and $M$ is the dimension of the vocabulary.
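Equation (6) is a plain matrix-vector product. The sketch below assumes that each row of $\Phi$ holds one interest topic's weights over the vocabulary and that $v$ is a bag-of-words tag vector for the video; both readings, the function name `project_video`, and the toy numbers are assumptions.

```python
def project_video(phi, v):
    """Eq. (6): v' = Phi v, with Phi of shape K x M and v of length M."""
    if any(len(row) != len(v) for row in phi):
        raise ValueError("each row of phi must have M entries")
    return [sum(w * t for w, t in zip(row, v)) for row in phi]


# Hypothetical toy values: K = 2 topics, M = 4 vocabulary words.
phi = [[0.5, 0.3, 0.1, 0.1],   # topic 0 weights over the vocabulary
       [0.1, 0.1, 0.4, 0.4]]   # topic 1 weights over the vocabulary
v = [1, 1, 0, 0]               # the video carries the first two tags only
v_proj = project_video(phi, v)
```

The result has one entry per interest topic, so videos, users, and celebrities can then be compared in the same K-dimensional space.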
Approach: Video re-ranking
Given a user $u$ and celebrity $c$, the score of $v$ is
$$p_{score}(v, u, c) = \sum_{i=1}^{K} P(z_i \mid v)\, p(z_i \mid u)\, p(z_i \mid c) = \sum_{i=1}^{K} P(z_i \mid v)\, p(z_i \mid u) \sum_{j=1}^{L} P(x_j \mid c)\, p(z_i \mid x_j) \quad (7)$$
where $K$ ($L$) is the number of topics in the interest (popularity) space, $z_i$ ($x_j$) is the $i$-th ($j$-th) topic of the interest (popularity) space, and $p(z_i \mid x_j)$ is approximated by the inverse of the KL-Divergence.
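A minimal sketch of Equation (7) follows. It takes the four probability tables as given; in particular, `p_z_x` is assumed to be already derived from the inverse KL-Divergence and normalized, a step the slide does not detail. The function name and all numbers are hypothetical.

```python
def rerank_score(p_z_v, p_z_u, p_x_c, p_z_x):
    """Eq. (7): sum_i P(z_i|v) p(z_i|u) sum_j P(x_j|c) p(z_i|x_j).

    p_z_v: length-K list, P(z_i | v) for the video
    p_z_u: length-K list, p(z_i | u) for the user
    p_x_c: length-L list, P(x_j | c) for the celebrity
    p_z_x: K x L table, p(z_i | x_j) linking the two topic spaces
    """
    total = 0.0
    for i in range(len(p_z_v)):
        # Map the celebrity's popularity topics into interest topic i.
        p_zi_c = sum(p_x_c[j] * p_z_x[i][j] for j in range(len(p_x_c)))
        total += p_z_v[i] * p_z_u[i] * p_zi_c
    return total


# Hypothetical toy setup with K = 2 interest topics and L = 2 popularity topics.
p_z_x = [[0.9, 0.2],
         [0.1, 0.8]]
p_x_c = [0.7, 0.3]
p_z_u = [0.8, 0.2]
videos = {"video_a": [0.9, 0.1], "video_b": [0.1, 0.9]}

ranked = sorted(
    videos,
    key=lambda name: rerank_score(videos[name], p_z_u, p_x_c, p_z_x),
    reverse=True,
)
```

Sorting by this score in descending order gives the personalized ranking for one user and one celebrity query.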
Experiments: Data Preparation
Celebrity list
The World's Most Powerful 100 Celebrities List: http://www.forbes.com/wealth/celebrities/list
The 30 Most Generous Celebrities: http://www.forbes.com/sites/andersonantunes/2012/01/11/the-30-most-generous-celebrities/3/
Top 200 Sexiest Actor: http://www.imdb.com/list/Uun6vT7hWeM/
For each celebrity, 200 videos are downloaded from YouTube.
Experiments: User and Celebrity Profiling
User: registration info, favorite and uploaded videos → raw tags → stop-word filtering → WordNet → noun tags.
Celebrity: Wikipedia entry → WordNet → noun tags.

              Celebrity   User    Total
Size          286         200     486
Tag number    11424       5833    12073
Experiments: Experimental Setting
Experiment data: 143 users, 106 celebrities.
Experiment setup: Each user has some videos related to a specific celebrity. Leave these videos out and learn the topics, then rank this celebrity's videos for the user.
Evaluation: F-measure.
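The slides name the F-measure but not how it is computed. As a sketch, assuming the standard balanced F-measure over the top-k ranked videos, with the user's held-out videos as the relevant set, the evaluation for one user could look like this. The cutoff `k`, the function name, and the video IDs are hypothetical.

```python
def f_measure(ranked_ids, relevant_ids, k):
    """Balanced F-measure of the top-k ranked videos against the held-out set."""
    top_k = ranked_ids[:k]
    hits = sum(1 for vid in top_k if vid in relevant_ids)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(relevant_ids)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranking for one user and one celebrity.
ranked_ids = ["v3", "v1", "v7", "v2", "v9"]
held_out = {"v1", "v2", "v5"}          # videos left out during topic learning
score_at_4 = f_measure(ranked_ids, held_out, k=4)
```

Averaging this value over all users would give a single number per approach for comparison.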
Experiments: Topic samples
Experiments: Doc-Topics distribution
E.g. celebrity "Beckham"

Topic   Probability of appearance
7       0.6229086229086229
4       0.1956241956241956
0       0.04967824967824968
8       0.03963963963963964
3       0.022393822393822392
1       0.018532818532818532
6       0.017245817245817245
9       0.014414414414414415
5       0.014414414414414415
2       0.005148005148005148
Experiments: Topic-terms distribution
E.g.
<topic id="7">
<word weight="0.018062955825114312" count="478">jay</word>
<word weight="0.01726939500434569" count="457">messi</word>
<word weight="0.016891508899217776" count="447">real</word>
<word weight="0.016551411404602652" count="438">ronaldo</word>
<word weight="0.01640025696255149" count="434">kanye</word>
<word weight="0.015644484752295656" count="414">west</word>
<word weight="0.015606696141782866" count="413">wayne</word>
<word weight="0.014964289763065412" count="396">lil</word>
<word weight="0.013414956732040963" count="355">hop</word>
<word weight="0.013226013679477006" count="350">lionel</word>
<word weight="0.01311264784793863" count="347">beckham</word>
<word weight="0.01231908702717001" count="326">beyonce</word>
<word weight="0.012054566753580472" count="319">cristiano</word>
<word weight="0.011941200922042096" count="316">soccer</word>
<word weight="0.011941200922042096" count="316">football</word>
… …
Experiments: Topic-terms distribution
E.g.
<topic id="4">
<word weight="0.026509629402286503" count="1382">show</word>
<word weight="0.014904473260185682" count="777">david</word>
<word weight="0.014444103429755236" count="753">ellen</word>
<word weight="0.01430982889587969" count="746">tv</word>
<word weight="0.012027161819995396" count="627">comedy</word>
<word weight="0.01112560423540244" count="580">jennifer</word>
<word weight="0.010857055167651347" count="566">interview</word>
<word weight="0.010550141947364382" count="550">degeneres</word>
<word weight="0.010166500422005677" count="530">funny</word>
<word weight="0.009245760761144787" count="482">letterman</word>
<word weight="0.008689480549374665" count="453">hollywood</word>
<word weight="0.008497659786695312" count="443">late</word>
<word weight="0.007979743727461061" count="416">talk</word>
<word weight="0.007615284278370291" count="397">celebrity</word>
<word weight="0.006943911608992557" count="362">television</word>
… …
Experiments Different approaches
Experiments Impact of random walk
Conclusions
We presented a cross-space mining method to exploit the correlation between user preferences and celebrity popularities.
Future work
Instead of returning a ranking list, we will try to visualize the search results as semantically consistent groups.
Investigate the issue of personalized query understanding in more general personalized search applications.
Thank you! Q&A?