On Frequent Chatters Mining. Claudio Lucchese. 1st HPC Lab Workshop, 6/15/12.

Presentation transcript:

Frequent Patterns Mining
How many patterns do you see in the following dataset?
[Figure: binary dataset with columns A to M]
Claudio Lucchese, Salvatore Orlando, Raffaele Perego: Mining Top-K Patterns from Binary Datasets in Presence of Noise. SDM 2010.

Frequent Patterns Mining
[Figure: dataset with columns A to M]

Frequent Patterns Mining
Usually, rows and columns are not in a “good-looking” order.

State of the art
Most recent approaches try to discover the top-k patterns that optimize different cost functions:
- Minimize noise (“holes”)
- Minimize the MDL encoding: encoding(Patterns) + encoding(Data|Patterns)
- Maximize the information ratio: the number of bits of information with respect to the Maximum Entropy Model built on the row and column marginal distributions
- Minimize the length of the patterns and the amount of noise (our approach)
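
The last cost function in the list can be illustrated with a small sketch. It rests on assumptions that are not stated on the slide: a pattern is a pair (rows, columns), its length is the number of rows plus the number of columns, and noise counts every cell where the union of the patterns disagrees with the data. The exact definition in the SDM 2010 paper may differ; all names here are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Cell = Tuple[int, str]                            # (row id, column id) of a 1-cell
Pattern = Tuple[FrozenSet[int], FrozenSet[str]]   # (rows, columns)


def covered_cells(patterns: List[Pattern]) -> Set[Cell]:
    """All cells that the patterns claim to be 1."""
    cells: Set[Cell] = set()
    for rows, cols in patterns:
        cells.update((r, c) for r in rows for c in cols)
    return cells


def cost(data: Set[Cell], patterns: List[Pattern]) -> int:
    """Assumed cost: total pattern length plus number of noisy cells."""
    length = sum(len(rows) + len(cols) for rows, cols in patterns)
    # Noise = cells covered but absent from the data ("holes")
    #       + cells present in the data but left uncovered.
    noise = len(data ^ covered_cells(patterns))
    return length + noise


# A 4x3 block with one hole at (2, "B"), plus one isolated cell (4, "D").
data = {(r, c) for r in range(4) for c in "ABC"} - {(2, "B")} | {(4, "D")}
block = (frozenset(range(4)), frozenset("ABC"))
```

Under these assumptions, describing the data with `block` costs less than listing every cell as noise, which is the trade-off the cost function is meant to capture.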

Evaluation
- Unsupervised: measure how well the proposed algorithm optimizes the proposed cost function
- What is the best cost function?
- We are investigating supervised measures:
  - Unsupervised extraction: extract patterns from a classification/clustering dataset without class/cluster label information
  - Supervised evaluation: measure how well the patterns can predict/match classes/clusters
- Preliminary result: fancy cost functions might not be the best ones
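
The slide does not name a specific supervised measure. As one possible illustration only, the sketch below uses purity: patterns are extracted without labels, and afterwards each pattern is scored by the share of its rows that belong to its majority class. The function names and the choice of purity are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set


def pattern_purity(rows: Set[int], labels: Dict[int, Hashable]) -> float:
    """Fraction of the pattern's rows that carry its most frequent label."""
    if not rows:
        return 0.0
    counts = Counter(labels[r] for r in rows)
    return counts.most_common(1)[0][1] / len(rows)


def weighted_purity(patterns: List[Set[int]], labels: Dict[int, Hashable]) -> float:
    """Average purity over all patterns, weighted by pattern size."""
    total = sum(len(rows) for rows in patterns)
    if total == 0:
        return 0.0
    return sum(len(rows) * pattern_purity(rows, labels) for rows in patterns) / total


# Labels are used only here, at evaluation time, never during extraction.
labels = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y"}
patterns = [{0, 1, 2}, {2, 3, 4}]
```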

Information Overload in News
Gianmarco De Francisci Morales, Aristides Gionis, Claudio Lucchese: From chatter to headlines: harnessing the real-time web for personalized news recommendation. WSDM 2012.

Can we exploit Twitter?
✓ Timeliness
✓ Personalization
[Figure: number of mentions of “Osama Bin Laden”]

News Get Old Soon
- 90% of the clicks happen within 2 days of publication
- Only a few occur early!

T.Rex (Twitter-based news recommendation system)
- Builds a user model from Twitter:
  - Signals from user-generated content, social neighbors, and popularity across Twitter and news
  - Entity-based representation (overcomes vocabulary mismatch)
- Learns a personalized news ranking function:
  - Picks candidates from a pool of related or popular fresh news, ranks them, and presents the top-k to the user

Recommendation Model
- The ranking function is user and time dependent
- Social model + Content model + Popularity model
- The popularity model tracks entity popularity by the number of mentions in Twitter and news (with exponential forgetting)
- The content model measures the relatedness of a bag-of-entities representation of a user's tweet stream and of a news article
- The social model weights the content model of every social neighbor by a truncated PageRank on the Twitter network
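
A minimal sketch of how the three components could be combined is given below. It is not the authors' implementation: cosine similarity stands in for the unspecified relatedness measure, the neighbor weights are passed in directly instead of being computed by a truncated PageRank, and the half-life parameter, the linear combination, and all names are assumptions.

```python
import math
from typing import Dict, List, Tuple

Bag = Dict[str, float]  # bag of entities: entity -> weight


class DecayedCounter:
    """Per-entity mention counter with exponential forgetting."""

    def __init__(self, half_life: float) -> None:
        self.rate = math.log(2.0) / half_life
        self.value: Dict[str, float] = {}
        self.last_seen: Dict[str, float] = {}

    def get(self, entity: str, now: float) -> float:
        """Current count, decayed since the last update."""
        if entity not in self.value:
            return 0.0
        elapsed = now - self.last_seen[entity]
        return self.value[entity] * math.exp(-self.rate * elapsed)

    def add(self, entity: str, now: float) -> None:
        """Record one mention at time `now` (streaming, just counting)."""
        self.value[entity] = self.get(entity, now) + 1.0
        self.last_seen[entity] = now


def cosine(a: Bag, b: Bag) -> float:
    """Assumed relatedness measure between two bags of entities."""
    dot = sum(w * b.get(e, 0.0) for e, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def score(user: Bag, neighbors: List[Tuple[Bag, float]], article: Bag,
          popularity: DecayedCounter, now: float,
          weights: Tuple[float, float, float]) -> float:
    """Weighted sum of social, content, and popularity scores."""
    social = sum(w * cosine(tweets, article) for tweets, w in neighbors)
    content = cosine(user, article)
    popular = sum(popularity.get(e, now) for e in article)
    w_social, w_content, w_popular = weights
    return w_social * social + w_content * content + w_popular * popular
```

The counter only stores one value and one timestamp per entity, which matches the streaming, counting-only design described for the system.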

System Overview
✓ Designed to be streaming and lightweight (just counting)
✓ User model is updated continuously

Learning the Weights
- Learning-to-rank approach with SVM
- Each time the user clicks on a news article, we learn a set of preferences (clicked_news > non_clicked_news)
- Prune the number of constraints for scalability:
  - only news published in the last 2 days
  - only the top-k news for each ranking component
- Can optionally include additional features for news articles: click count, age, etc. (T.Rex+)
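
The preference generation and pruning steps can be sketched as follows. This is an illustration under assumptions, not the system's code: the `Article` type, the component names, and the use of score-difference vectors (the usual pairwise reduction for a linear ranking SVM) are all hypothetical choices. Each difference vector would then be fed to an SVM as a positive example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

COMPONENTS = ("social", "content", "popularity")


@dataclass
class Article:
    id: str
    age_days: float
    scores: Dict[str, float]  # one score per ranking component


def candidates(articles: List[Article], k: int,
               max_age_days: float = 2.0) -> List[Article]:
    """Pruning: keep fresh articles that are top-k for some component."""
    fresh = [a for a in articles if a.age_days <= max_age_days]
    kept: Dict[str, Article] = {}
    for comp in COMPONENTS:
        top = sorted(fresh, key=lambda a: a.scores[comp], reverse=True)[:k]
        for a in top:
            kept.setdefault(a.id, a)
    return list(kept.values())


def preference_pairs(clicked: Article, articles: List[Article],
                     k: int) -> List[Tuple[str, List[float]]]:
    """One constraint clicked > other per pruned non-clicked article."""
    pairs = []
    for other in candidates(articles, k):
        if other.id == clicked.id:
            continue
        diff = [clicked.scores[c] - other.scores[c] for c in COMPONENTS]
        pairs.append((other.id, diff))
    return pairs


articles = [
    Article("a", 0.5, {"social": 0.9, "content": 0.1, "popularity": 0.2}),
    Article("b", 1.0, {"social": 0.1, "content": 0.8, "popularity": 0.3}),
    Article("c", 3.0, {"social": 0.9, "content": 0.9, "popularity": 0.9}),
    Article("d", 1.5, {"social": 0.2, "content": 0.2, "popularity": 0.1}),
]
```

Additional article features such as click count or age, as in T.Rex+, would simply extend `COMPONENTS` and the `scores` dictionary in this sketch.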

Predicting Clicked News
✓ User-generated content is a very good predictor, albeit very sparse
✓ Click count is a strong baseline but does not help T.Rex+

Predicting Clicked Entities

Future works (?)
Explain a set of news showing how the main topics interacted with each other over time.

Future works (?)
Explain a set of news showing how the main topics interacted with each other over time.
Example: European sovereign-debt crisis
[Figure: topics and their interactions along a time axis; labels: Merkel, Monti, France, Berlusconi, Greece, EU, New Italian government, Fiscal Compact, EuroBond, Obama, Loan]

Future works (?)
Explain a set of news showing how the main topics interacted with each other over time.
Applications:
- Given the news the user is currently reading, provide an explanation of the related facts that precede that news
- Given a query, provide an explanation of the documents related to that query
- Given a set of topics, explain their relations over time
- Browse a collection of news by changing the topics of interest, the time window, and the granularity

Future works (?)
Explain a set of news showing how the main topics interacted with each other over time.
- A topic is a named entity relevant over time
- An interaction is a cluster of news related to some event and relevant in a small time window
- It might be important to cover the given time window, but recent events might be more interesting

Future works (?)
Explain a set of news showing how the main topics interacted with each other over time.
Given a maximum number of main topics and interactions, maximize:
- Topic coverage and diversity
- Events time coverage
- Cluster similarity
- Main topics connectivity

Future works (?)
Explain a set of news showing how the main topics interacted with each other over time.
It is different from news clustering:
- Even with a good clustering, it might not be trivial to select which events and which topics to show in order to maximize the amount of information delivered to the user
- There is some interesting related work aimed at finding chains of news; we are more interested in topic evolution

Thank you!