Personalized Celebrity Video Search Based on Cross-space Mining

Zhengyu Deng, Jitao Sang, Changsheng Xu
1 Institute of Automation, Chinese Academy of Sciences
2 Chinese-Singapore Institute of Digital Media

Outline: Motivation, Framework, Approach, Experiment, Conclusions

Motivation
Celebrities are often popular in multiple fields, and user interests are diverse.
[Figure: three users (User 1, User 2, User 3) like different kinds of Beckham videos: sports, entertainment and interview.]

Motivation
[Figure: a single user likes different kinds of videos (sports, entertainment, music) for different celebrities (Beckham, Bieber, Lady Gaga).]

Motivation: Non-personalized search
[Figure: non-personalized search results for "David Beckham", mixing daily life, sports and interview videos.]

Motivation: Problem and solution
Problem: Users have different interest distributions and celebrities have different popularity distributions. How can user interest be matched with celebrity popularity?
Solution: Learn the interest space of users and the popularity space of celebrities, then correlate the two spaces.

Framework
[Figure: framework diagram. Labelled components: query, user, celebrity, associated tags, topic modeling, interest space, popularity space, map, search engine, re-rank.]

Approach
[Figure: users U1 ... Um, interest-space topics Z1 ... Zp, vocabulary words W1 ... Wx, popularity-space topics X1 ... Xq, and celebrities C1 ... Cn. Labelled components: random walk, LDA (on the user side and on the celebrity side), KL-Divergence. Labelled distributions: P(Zi|Ui), P(Wi|Zi), P(Wi|Ti), P(Ti|Ci), P(Zi|Xi).]

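Approach: Random walk
v_j is the initial probabilistic score, p_ij is an entry of the transition matrix, and r_k(j) denotes the relevance score of node j at iteration k.
The update rule is Eq. (1), it is rewritten as Eq. (2), and its unique solution is Eq. (3).
[Equations (1)-(3) are not preserved in this transcript.]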
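
A minimal sketch of a random walk of this kind. The equations on this slide are not reproduced in the transcript, so the sketch assumes the common restart form r_k(j) = alpha * sum_i r_{k-1}(i) p_ij + (1 - alpha) v_j. The weight alpha, the function name `random_walk` and the toy graph are assumptions for illustration, not the authors' exact formulation.

```python
def random_walk(p, v, alpha=0.85, iters=100, tol=1e-10):
    """Iterate r_k(j) = alpha * sum_i r_{k-1}(i) * p[i][j] + (1 - alpha) * v[j].

    p     -- row-stochastic transition matrix (list of lists)
    v     -- initial probabilistic scores, one per node
    alpha -- assumed mixing weight; not given in the slides
    """
    n = len(v)
    r = list(v)
    for _ in range(iters):
        nxt = [
            alpha * sum(r[i] * p[i][j] for i in range(n)) + (1 - alpha) * v[j]
            for j in range(n)
        ]
        # Stop once the scores no longer change between iterations.
        if max(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


# Toy 3-node graph; every row of p sums to 1 (hypothetical values).
p = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
v = [0.6, 0.3, 0.1]
scores = random_walk(p, v)
```

With a row-stochastic p and a v that sums to 1, the scores keep summing to 1 at every iteration, which makes them easy to compare across nodes.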

Approach: KL-Divergence
Topic z (x) is from the interest (popularity) space. The KL-Divergence between them is
$$D_{KL}(z \,\|\, x) = \frac{1}{2}\left(\sum_i z(i)\ln\frac{z(i)}{x(i)} + \sum_i x(i)\ln\frac{x(i)}{z(i)}\right) \qquad (4)$$
where z(i) (x(i)) denotes the distribution score of topic z (x) on word i. The similarity s_{zx} of topics z and x is defined as the inverse of the KL-Divergence:
$$s_{zx} = 1 / D_{KL}(z \,\|\, x) \qquad (5)$$
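
Eq. (4) and Eq. (5) can be sketched as follows. This is a minimal illustration: the smoothing constant `eps`, the handling of identical topics, and the toy distributions are assumptions, since the slides do not say how zero probabilities are treated.

```python
import math


def symmetric_kl(z, x, eps=1e-12):
    """Eq. (4): average of KL(z || x) and KL(x || z) over a shared vocabulary."""
    # eps guards against log(0) and division by zero (an assumption).
    forward = sum(zi * math.log((zi + eps) / (xi + eps)) for zi, xi in zip(z, x))
    backward = sum(xi * math.log((xi + eps) / (zi + eps)) for zi, xi in zip(z, x))
    return 0.5 * (forward + backward)


def similarity(z, x):
    """Eq. (5): inverse of the symmetric KL-Divergence."""
    d = symmetric_kl(z, x)
    # Identical topics have zero divergence; report infinite similarity.
    return float("inf") if d == 0 else 1.0 / d


# Hypothetical topic-word distributions over a 3-word vocabulary.
topic_z = [0.7, 0.2, 0.1]
topic_x_near = [0.6, 0.3, 0.1]
topic_x_far = [0.1, 0.2, 0.7]
```

In this toy data `topic_x_near` is more similar to `topic_z` than `topic_x_far` is, so it receives the larger similarity score.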

Approach: Video projection
Given a celebrity video $v_{M\times 1}$, project it to the interest space with $\Phi_{K\times M}$:
$$v'_{K\times 1} = \Phi_{K\times M}\, v_{M\times 1} \qquad (6)$$
where K is the number of topics in the interest space and M is the dimension of the vocabulary.
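
Eq. (6) is a plain matrix-vector product. A minimal sketch, assuming the video is represented as a length-M vector of word counts and the K x M matrix holds the topic-word weights of the interest space; the function name `project` and all values are hypothetical.

```python
def project(phi, v):
    """Eq. (6): v' = Phi v, with Phi of shape K x M and v of length M."""
    if any(len(row) != len(v) for row in phi):
        raise ValueError("each row of phi must have len(v) entries")
    # One dot product per interest topic.
    return [sum(w * t for w, t in zip(row, v)) for row in phi]


# K = 2 interest topics, M = 4 vocabulary words (hypothetical values).
phi = [[0.5, 0.3, 0.1, 0.1],
       [0.1, 0.1, 0.4, 0.4]]
video = [2, 1, 0, 0]  # word counts of the video's tags
projected = project(phi, video)
```

The result has one entry per interest topic, so a video can be compared with a user's interest distribution in the same K-dimensional space.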

Approach: Video re-ranking
Given a user u and a celebrity c, the score of video v is
$$p(\mathrm{score} \mid v, u, c) = \sum_{i=1}^{K} P(z_i \mid v)\, p(z_i \mid u)\, p(z_i \mid c) = \sum_{i=1}^{K} P(z_i \mid v)\, p(z_i \mid u) \sum_{j=1}^{L} P(x_j \mid c)\, p(z_i \mid x_j) \qquad (7)$$
where K (L) is the number of topics in the interest (popularity) space, $z_i$ ($x_j$) is the i-th (j-th) topic of the interest (popularity) space, and $p(z_i \mid x_j)$ is approximated by the inverse of the KL-Divergence.
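
A minimal sketch of the scoring in Eq. (7), assuming all distributions are already available as plain lists. The function names, the layout `p_z_x[j][i]` for p(z_i | x_j), and the toy numbers are assumptions for illustration; in particular, how the inverse KL-Divergence values are normalized into p(z_i | x_j) is not stated in the slides.

```python
def topic_given_celebrity(p_x_c, p_z_x):
    """p(z_i | c) = sum_j P(x_j | c) * p(z_i | x_j); p_z_x[j][i] holds p(z_i | x_j)."""
    k = len(p_z_x[0])
    return [sum(p_x_c[j] * p_z_x[j][i] for j in range(len(p_x_c))) for i in range(k)]


def score(p_z_v, p_z_u, p_x_c, p_z_x):
    """Eq. (7): sum_i P(z_i | v) * p(z_i | u) * p(z_i | c)."""
    p_z_c = topic_given_celebrity(p_x_c, p_z_x)
    return sum(a * b * c for a, b, c in zip(p_z_v, p_z_u, p_z_c))


def rerank(videos, p_z_u, p_x_c, p_z_x):
    """Return video ids sorted by descending score; videos maps id -> P(z | v)."""
    return sorted(
        videos,
        key=lambda vid: score(videos[vid], p_z_u, p_x_c, p_z_x),
        reverse=True,
    )


# Toy setting with K = 2 interest topics and L = 2 popularity topics.
p_z_x = [[0.9, 0.1], [0.2, 0.8]]   # rows: popularity topics x_j
p_x_c = [0.7, 0.3]                 # celebrity's popularity distribution
p_z_u = [0.8, 0.2]                 # user's interest distribution
videos = {"sports_clip": [0.9, 0.1], "interview_clip": [0.1, 0.9]}
ranking = rerank(videos, p_z_u, p_x_c, p_z_x)
```

Here the user leans toward interest topic 0 and the celebrity is mostly popular in a field that maps to that topic, so the video dominated by topic 0 is ranked first.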

Experiments: Data preparation
Celebrity list:
The World's Most Powerful 100 Celebrities List: http://www.forbes.com/wealth/celebrities/list
The 30 Most Generous Celebrities: http://www.forbes.com/sites/andersonantunes/2012/01/11/the-30-most-generous-celebrities/3/
Top 200 Sexiest Actor: http://www.imdb.com/list/Uun6vT7hWeM/
For each celebrity, 200 videos are downloaded from YouTube.

Experiments: User and celebrity profiling
User: registration info., favorite and uploaded videos → raw tags → stop-word removal → WordNet → noun tags
Celebrity: Wikipedia entry → WordNet → noun tags

              Celebrity   User    Total
Size          286         200     486
Tags number   11424       5833    12073

Experiments: Experimental setting
Experiment data: 143 users, 106 celebrities.
Experiment setup: Each user has some videos related to a specific celebrity. Leave these videos out and learn the topics, then rank this celebrity's videos for the user.
Evaluation: F-measure.
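
A minimal sketch of how an F-measure could be computed for one user in this leave-out setup. The slides only name the metric, so the top-k cutoff, the balanced weighting (beta = 1), and the function name `f_measure` are assumptions.

```python
def f_measure(ranked, relevant, k, beta=1.0):
    """F-measure of the top-k ranked videos against the held-out relevant set."""
    top = ranked[:k]
    hits = sum(1 for vid in top if vid in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical ranking for one user and the videos that were left out.
ranked = ["a", "b", "c", "d"]
held_out = {"a", "c", "e"}
f_at_2 = f_measure(ranked, held_out, k=2)
```

With this toy data one of the top two videos is relevant, giving precision 1/2, recall 1/3 and an F-measure of 0.4.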

Experiments: Topic samples

Experiments: Doc-topics distribution
E.g. celebrity "Beckham"

Topic   Probability of appearance
7       0.6229086229086229
4       0.1956241956241956
0       0.04967824967824968
8       0.03963963963963964
3       0.022393822393822392
1       0.018532818532818532
6       0.017245817245817245
9       0.014414414414414415
5       0.014414414414414415
2       0.005148005148005148

Experiments: Topic-terms distribution
E.g.
<topic id="7">
<word weight="0.018062955825114312" count="478">jay</word>
<word weight="0.01726939500434569" count="457">messi</word>
<word weight="0.016891508899217776" count="447">real</word>
<word weight="0.016551411404602652" count="438">ronaldo</word>
<word weight="0.01640025696255149" count="434">kanye</word>
<word weight="0.015644484752295656" count="414">west</word>
<word weight="0.015606696141782866" count="413">wayne</word>
<word weight="0.014964289763065412" count="396">lil</word>
<word weight="0.013414956732040963" count="355">hop</word>
<word weight="0.013226013679477006" count="350">lionel</word>
<word weight="0.01311264784793863" count="347">beckham</word>
<word weight="0.01231908702717001" count="326">beyonce</word>
<word weight="0.012054566753580472" count="319">cristiano</word>
<word weight="0.011941200922042096" count="316">soccer</word>
<word weight="0.011941200922042096" count="316">football</word>
… …

Experiments: Topic-terms distribution
E.g.
<topic id="4">
<word weight="0.026509629402286503" count="1382">show</word>
<word weight="0.014904473260185682" count="777">david</word>
<word weight="0.014444103429755236" count="753">ellen</word>
<word weight="0.01430982889587969" count="746">tv</word>
<word weight="0.012027161819995396" count="627">comedy</word>
<word weight="0.01112560423540244" count="580">jennifer</word>
<word weight="0.010857055167651347" count="566">interview</word>
<word weight="0.010550141947364382" count="550">degeneres</word>
<word weight="0.010166500422005677" count="530">funny</word>
<word weight="0.009245760761144787" count="482">letterman</word>
<word weight="0.008689480549374665" count="453">hollywood</word>
<word weight="0.008497659786695312" count="443">late</word>
<word weight="0.007979743727461061" count="416">talk</word>
<word weight="0.007615284278370291" count="397">celebrity</word>
<word weight="0.006943911608992557" count="362">television</word>
… …

Experiments: Different approaches

Experiments: Impact of random walk

Conclusions
We presented a cross-space mining method to exploit the correlation between user preferences and celebrity popularities.
Future work:
Instead of returning a ranked list, we will try to organize and visualize the search results as semantically consistent groups.
We will also investigate personalized query understanding in more general personalized search applications.

Thank you! Q&A?