BuzzTrack: Topic Detection and Tracking in Email. IUI – Intelligent User Interfaces, January 2007. Keno Albrecht (ETH Zurich), Roger Wattenhofer.

Presentation transcript:

BuzzTrack: Topic Detection and Tracking in Email. IUI – Intelligent User Interfaces, January 2007. Keno Albrecht (ETH Zurich), Roger Wattenhofer (ETH Zurich), Gabor Cselle (Google).

2 Email Overload. Email clients were not designed to handle the volume and variety of messages users deal with today: large volumes of email, task management, personal archiving or filing, and keeping context [Whittaker and Sidner, 1996].

3 Search vs. Inbox Browsing. Fast full-text search is today's solution to finding past emails. But the flat inbox view of newly incoming emails hasn’t changed. In our work, we focus on the problem of sensibly structuring emails in the inbox.

4 Today's Email Clients: The Three-Pane View. No sense of context: unrelated messages are shown together. Important emails may drop off the “first screen”. “Thread-based” tree views are unsophisticated and may not pull in all relevant messages.

5 BuzzTrack. Email client extension for Mozilla Thunderbird that displays emails grouped by topic.

6 Related Work

7 Visualizations: Conversations. Gmail (Google): common conversation title; one entry per email, which folds out on click.

8 Automatic Foldering. Using machine learning techniques to automatically move emails into folders upon arrival. Low accuracy rates [Bekkerman et al., 2005] and conceptual problems: users need to manually create folders and seed them with data.

9 People-Centered Clients. Bifrost [Bälter and Sidner, 2002]; ContactMap [Whittaker et al., 2004].

10 Task-based Email. Example: TaskMaster [Bellotti et al., 2003]. Figure labels: thrasks, thrask contents, item contents (emails, documents, etc.).

11 BuzzTrack

12 BuzzTrack. Mozilla Thunderbird extension that automatically groups related emails into topics. Will be distributed through the project website. Provides a view on the user’s inbox.

13 What’s a Topic? Topics are groups of emails that relate to the same idea, action, event, task, or question. Examples: a conversation about buying a digital camera; referring a candidate for a job; all emails belonging to the same newsgroup.

14 Clustering Process. For every new incoming email: Preprocessing → Clustering → Label generation → Cluster store → BuzzTrack view in Thunderbird.

15 Preprocessing. Tokenization (remove HTML tags, style sheets, punctuation, and numbers); language detection; stemming. For topic labelling: identify parts of speech and remember popular original word forms.
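
The tokenization step on this slide can be illustrated with a minimal sketch. It is not BuzzTrack's own code: the function name `tokenize` and the regular expressions are assumptions chosen for illustration, and language detection and stemming are left out.

```python
import re


def tokenize(raw: str) -> list[str]:
    """Return lowercase word tokens from an email body (illustrative only)."""
    # Drop <style>...</style> blocks first so CSS rules do not leak into the text.
    text = re.sub(r"(?is)<style.*?>.*?</style>", " ", raw)
    # Remove any remaining HTML tags.
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Keep runs of letters only, which discards punctuation and numbers.
    return re.findall(r"[^\W\d_]+", text.lower())
```

A stemmer and a language detector would normally run on the resulting token list before any similarity is computed.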

16 Clustering. Single-link clustering: each newly incoming email is compared to every email in the existing topics. Similarity value > threshold: the email is assigned to the topic. Similarity value <= threshold: the email starts a new topic.
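
The threshold rule on this slide can be sketched as follows. This is a hedged illustration, not the authors' implementation: `assign_to_topic` is a hypothetical name, the similarity function is supplied by the caller, and picking the single best-scoring topic when several exceed the threshold is an assumption the slide does not state.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def assign_to_topic(
    email: T,
    topics: list[list[T]],
    similarity: Callable[[T, T], float],
    threshold: float,
) -> int:
    """Add `email` to the best matching topic, or start a new one; return its index."""
    best_index, best_score = -1, float("-inf")
    for index, topic in enumerate(topics):
        # Single link: a topic is as similar as its most similar member.
        score = max(similarity(email, member) for member in topic)
        if score > best_score:
            best_index, best_score = index, score
    if best_score > threshold:
        topics[best_index].append(email)
        return best_index
    # Similarity <= threshold for every topic: the email starts a new topic.
    topics.append([email])
    return len(topics) - 1
```

Because every incoming email is compared against all stored emails, the cost grows with inbox size, which is one reason to cluster only recent mail.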

17 Features - 1. How do we generate similarity values between emails? Via a linear combination of several similarity features. Examples: text similarity (TF-IDF values, cosine similarity metric); people similarity (comparing the sets of people in the From / To / Cc lines of email headers); thread membership.
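
Two of these features can be sketched in a few lines. The TF-IDF weighting below (raw term count times log(N / document frequency)) is one common variant and is assumed here, since the slide does not give the exact formula. Measuring people similarity as Jaccard overlap is likewise an assumption; all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def people_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of the address sets taken from From / To / Cc."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Each function returns a value between 0 and 1 for non-negative inputs, so the features are on a comparable scale before they are weighted and combined.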

18 Features - 2. Other features for deriving similarities: subject similarity; sender domain overlaps; sender rank and sender answer percentage (% of email from the sender that is answered); time passed since the last email in the topic; people and reference count for the email; known people and reference %; cluster size; has attachment.

19 Decision Score. Similarities are combined into a decision score for each email / cluster pair through a linear combination of feature values: $\mathrm{dec}_{i,j} = w_a \cdot \mathrm{sim}_a(m_i, C_j) + w_b \cdot \mathrm{sim}_b(m_i, C_j) + \dots$ We tested two sets of weights $w_x$, both trained on a development set of emails: empirical weights and linear SVM weights.
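
As a small worked illustration of the linear combination, the sketch below sums weighted feature values for one email / cluster pair. The feature names and weight values in the usage note are made up for the example; they are not the empirical or SVM-trained weights from the talk.

```python
def decision_score(similarities: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of the feature similarities for one email / cluster pair."""
    # Every feature present in `similarities` must have a weight.
    return sum(weights[name] * value for name, value in similarities.items())
```

For example, with hypothetical weights `{"text": 0.5, "people": 0.3, "thread": 0.2}` and similarities `{"text": 0.4, "people": 0.5, "thread": 1.0}`, the score is 0.5 * 0.4 + 0.3 * 0.5 + 0.2 * 1.0 = 0.55, which is then compared against the clustering threshold.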

20 Evaluation. How do we evaluate clustering quality? Topic Detection and Tracking (TDT) competitions by NIST, aimed at clustering news articles. Corpus:

21 Clustering Tasks. The clustering task is split into subtasks. New Topic Detection (NTD): given a stream of emails, which ones start new topics? Topic Tracking (TT): given a fixed topic, which newly incoming emails belong to it? DET curves plot the miss rate vs. the false alarm rate for each possible threshold on the decision scores.
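
The points of a DET curve can be computed by sweeping the threshold over the observed decision scores. The sketch below assumes a tracking-style setup in which a score above the threshold means "belongs to the topic"; the function name `det_points` and the input layout are hypothetical, and the inputs must contain at least one on-topic and one off-topic example.

```python
def det_points(scores: list[float], labels: list[bool]) -> list[tuple[float, float, float]]:
    """Return (threshold, miss rate, false alarm rate) for each observed score."""
    positives = sum(labels)              # emails that truly belong to the topic
    negatives = len(labels) - positives  # emails that do not
    points = []
    for threshold in sorted(set(scores)):
        # Miss: an on-topic email whose score does not exceed the threshold.
        misses = sum(1 for s, y in zip(scores, labels) if y and s <= threshold)
        # False alarm: an off-topic email whose score exceeds the threshold.
        false_alarms = sum(1 for s, y in zip(scores, labels) if not y and s > threshold)
        points.append((threshold, misses / positives, false_alarms / negatives))
    return points
```

Raising the threshold trades false alarms for misses, so each threshold yields one point on the curve.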

22 Results: NTD. TDT New Topic Detection task. Miss: 3%; false alarm: 30%. [Figure: DET curve, with the “better” direction indicated.]

23 Results: TT. TDT Topic Tracking task. Miss: 8%; false alarm: 2%. [Figure: DET curve, with the “better” direction indicated.]

24 Comparison. Comparable quality to TDT for news articles [NIST 2004]. News has less metadata; email has worse text quality. A wide body of work exists on improving clustering performance on news; we haven’t tapped into that yet.

25 BuzzTrack View. Mozilla Thunderbird plugin that provides a useful view on inbox data “for free”. Topics contain email from the last 60 days: we’re interested in current email only, and this reduces initial clustering time. Each email is shown in one topic.

26

27 Demo 1: BuzzTrack

28 BuzzTrack Panes. Topic pane: provides additional info. Starred topics pane: topics sorted by last incoming email.

29 Future Work. Distribute the plugin to Thunderbird users to gather input on possible UI improvements and on clustering quality. Different clustering styles: people-based, thread-based. We hope BuzzTrack will be a valuable tool for real-world users.

30 Questions? Contact: Gabor Cselle, Website: