BuzzTrack: Topic Detection and Tracking in Email
IUI (Intelligent User Interfaces), January 2007
Keno Albrecht, ETH Zurich, kenoa@tik.ee.ethz.ch
Roger Wattenhofer, ETH Zurich, wattenhofer@tik.ee.ethz.ch
Gabor Cselle, Google, gabor@google.com
2 Email Overload
Email clients were not designed to handle the volume and variety of messages users deal with today: large volumes of email, task management, personal archiving or filing, and keeping context [Whittaker and Sidner, 1996].
3 Search vs. Inbox Browsing
Fast full-text search is today's solution to finding past emails, but the flat inbox view of newly incoming emails hasn’t changed.
In our work, we focus on the problem of sensibly structuring emails in the inbox.
4 Today's Email Clients: The Three-Pane View
No sense of context: unrelated messages are shown together.
Important emails may drop off the “first screen”.
“Thread-based” tree views are unsophisticated and may not pull in all relevant messages.
5 BuzzTrack
An email client extension for Mozilla Thunderbird that displays email grouped by topic.
6 Related Work
7 Visualizations: Conversations
Gmail (Google): a common conversation title, with one entry per email that folds out on click.
8 Automatic Foldering
Machine learning techniques are used to automatically move emails into folders upon arrival.
Low accuracy rates [Bekkerman et al., 2005].
Conceptual problems: users need to manually create folders and seed them with data.
9 People-Centered Email Clients
Bifrost [Bälter and Sidner, 2002] and ContactMap [Whittaker et al., 2004].
10 Task-based Email
Example: TaskMaster [Bellotti et al., 2003].
Screenshot labels: thrasks; thrask contents; item contents (emails, documents, etc.).
11 BuzzTrack
12 BuzzTrack
A Mozilla Thunderbird extension that automatically groups related emails into topics.
It will be distributed through the website www.buzztrack.net.
It provides a view on the user’s inbox.
13 What’s a Topic?
Topics are groups of emails that relate to the same idea, action, event, task, or question.
Examples: a conversation about buying a digital camera; referring a candidate for a job; all emails belonging to the same newsgroup.
14 Clustering Process
For every new incoming email: preprocessing, clustering, and label generation.
Other diagram elements: cluster store; BuzzTrack view in Thunderbird.
15 Preprocessing
Tokenization (remove HTML tags, style sheets, punctuation, and numbers).
Language detection.
Stemming.
For topic labelling: identify parts of speech and remember popular original word forms.
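The tokenization step above can be illustrated with a minimal sketch. It covers only the removals named on the slide (HTML tags, style sheets, punctuation, numbers); language detection, stemming, and part-of-speech tagging are left out. The function name `tokenize` and the regular expressions are assumptions made for illustration, not BuzzTrack's actual code.

```python
import re


def tokenize(raw: str) -> list[str]:
    """Return lowercase word tokens from an email body (illustrative only)."""
    # Remove <style>...</style> blocks first so CSS rules do not leak into tokens.
    text = re.sub(r"<style\b.*?</style>", " ", raw, flags=re.IGNORECASE | re.DOTALL)
    # Remove any remaining HTML tags.
    text = re.sub(r"<[^>]+>", " ", text)
    # Keep runs of letters only, which discards punctuation and numbers.
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]
```

A stemmer would then be applied to each token; which stemmer the authors used is not stated on the slide.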
16 Clustering
Single-link clustering: each newly incoming email is compared to every email in the existing topics.
Similarity value > threshold: the email is assigned to the topic.
Similarity value <= threshold: the email starts a new topic.
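The assignment rule above could be sketched as follows. This is an incremental single-link variant under stated assumptions: a topic's score is the similarity of its best-matching member, and when several topics exceed the threshold the highest-scoring one wins (the slide does not say how ties between topics are resolved). `assign_to_topic`, `similarity`, and `threshold` are hypothetical names.

```python
from typing import Callable, TypeVar

Email = TypeVar("Email")


def assign_to_topic(
    email: Email,
    topics: list[list[Email]],
    similarity: Callable[[Email, Email], float],
    threshold: float,
) -> int:
    """Add `email` to the best-matching topic or start a new one; return its index."""
    best_index, best_score = None, float("-inf")
    for index, topic in enumerate(topics):
        # Single-link: compare against every member and keep the closest one.
        score = max(similarity(email, member) for member in topic)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is not None and best_score > threshold:
        topics[best_index].append(email)
        return best_index
    # Similarity <= threshold for every topic (or no topics yet): new topic.
    topics.append([email])
    return len(topics) - 1
```

Calling it once per incoming email, in arrival order, builds the topic list incrementally.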
17 Features - 1
How do we generate similarity values between emails? Via a linear combination of several similarity features.
Examples: text similarity (TF-IDF values, cosine similarity metric); people similarity (comparing the sets of people in the From / To / Cc lines of email headers); thread membership.
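Two of these features can be sketched in a few lines. The TF-IDF weighting below (raw term count times log(N / document frequency)) is one common variant, and the Jaccard overlap for people sets is an assumption: the slide says only that the sets are compared, not how. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def people_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of the address sets from From / To / Cc (assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

With this weighting, a term that occurs in every document gets weight zero and so contributes nothing to the text similarity.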
18 Features - 2
Other features for deriving similarities: subject similarity; sender domain overlaps; sender rank and percentage; % of email from sender that is answered; time passed since last email in topic; people and reference count for email; known people and reference %; cluster size; has attachment.
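As an illustration of the first two features in this list, here is a minimal sketch. The slide does not define either measure, so everything here is an assumption: subjects are compared by word overlap after stripping reply and forward prefixes, and domains are compared by Jaccard overlap. All names are hypothetical.

```python
import re

# Leading "Re:", "Fw:", "Fwd:" markers, possibly repeated.
_REPLY_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes and lowercase the remainder."""
    return _REPLY_PREFIX.sub("", subject).strip().lower()


def subject_similarity(a: str, b: str) -> float:
    """Word-level Jaccard overlap of two normalized subject lines."""
    words_a = set(normalize_subject(a).split())
    words_b = set(normalize_subject(b).split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def domain_overlap(senders_a: list[str], senders_b: list[str]) -> float:
    """Jaccard overlap of the sender domains in two groups of addresses."""
    domains_a = {s.rsplit("@", 1)[1].lower() for s in senders_a if "@" in s}
    domains_b = {s.rsplit("@", 1)[1].lower() for s in senders_b if "@" in s}
    if not domains_a or not domains_b:
        return 0.0
    return len(domains_a & domains_b) / len(domains_a | domains_b)
```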
19 Decision Score
Similarities are combined into a decision score for each email / cluster pair through a linear combination of feature values:
$$\mathrm{dec}_{i,j} = w_a \cdot \mathrm{sim}_a(m_i, C_j) + w_b \cdot \mathrm{sim}_b(m_i, C_j) + \dots$$
We tested two sets of weights $w_x$, both trained on a development set of emails: empirical and linear SVM.
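The linear combination is straightforward to express in code. In this sketch the feature names and weight values are placeholders chosen for illustration; the trained weights are not given on the slides.

```python
def decision_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of similarity features for one email / cluster pair.

    Raises KeyError if a feature has no corresponding weight.
    """
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical feature values and weights, for illustration only.
example_features = {"text": 0.5, "people": 1.0, "thread": 0.0}
example_weights = {"text": 2.0, "people": 1.0, "thread": 3.0}
example_score = decision_score(example_features, example_weights)
```

The resulting score is what gets compared against the clustering threshold.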
20 Evaluation
How do we evaluate clustering quality?
Topic Detection and Tracking (TDT) competitions by NIST, aimed at clustering news articles.
Corpus:
21 Clustering Tasks
The clustering task is split into subtasks:
New Topic Detection (NTD): given a stream of emails, which ones start new topics?
Topic Tracking (TT): given a fixed topic, which newly incoming emails belong to it?
DET curves plot miss rate vs. false alarm rate for each possible threshold on the decision scores.
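The points of such a curve can be computed as in the sketch below. It assumes a "yes" decision whenever the score is strictly greater than the threshold, matching the clustering rule earlier in the talk. A miss is a true target scored at or below the threshold, and a false alarm is a non-target scored above it. Function and parameter names are hypothetical.

```python
def miss_and_false_alarm(
    scores: list[float], is_target: list[bool], threshold: float
) -> tuple[float, float]:
    """Return (miss rate, false alarm rate) at one decision threshold."""
    pairs = list(zip(scores, is_target))
    n_target = sum(1 for _, target in pairs if target)
    n_non_target = len(pairs) - n_target
    misses = sum(1 for s, target in pairs if target and s <= threshold)
    false_alarms = sum(1 for s, target in pairs if not target and s > threshold)
    miss_rate = misses / n_target if n_target else 0.0
    false_alarm_rate = false_alarms / n_non_target if n_non_target else 0.0
    return miss_rate, false_alarm_rate


def det_points(
    scores: list[float], is_target: list[bool], thresholds: list[float]
) -> list[tuple[float, float, float]]:
    """One (threshold, miss rate, false alarm rate) triple per threshold."""
    return [(t, *miss_and_false_alarm(scores, is_target, t)) for t in thresholds]
```

Sweeping the threshold over the observed score range traces the trade-off: raising it lowers the false alarm rate and raises the miss rate.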
22 Results NTD
Plot: TDT New Topic Detection task. Annotations: Miss: 3%, False alarm: 30%; arrow labelled “better”.
23 Results TT
Plot: TDT Topic Tracking task. Annotations: Miss: 8%, False alarm: 2%; arrow labelled “better”.
24 Comparison
Comparable quality to TDT for news articles [NIST 2004].
News has less metadata; email has worse text quality.
A wide body of work exists on improving clustering performance on news; we haven’t tapped into that yet.
25 BuzzTrack View
A Mozilla Thunderbird plugin that provides a useful view on inbox data “for free”.
Topics contain email from the last 60 days: we’re interested in current email only, and this reduces initial clustering time.
Each email is shown in one topic.
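The 60-day window amounts to a date filter applied before clustering. A minimal sketch, assuming each email is represented as a hypothetical `(message_id, received)` pair; the real plugin works on Thunderbird's own message objects, which are not shown in the slides:

```python
from datetime import datetime, timedelta


def recent_emails(
    emails: list[tuple[str, datetime]], now: datetime, window_days: int = 60
) -> list[tuple[str, datetime]]:
    """Keep only emails received within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    return [email for email in emails if email[1] >= cutoff]
```

Only the emails that pass this filter need to be clustered on first start, which is where the reduction in initial clustering time comes from.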
27 Demo 1: BuzzTrack
28 BuzzTrack Panes
Topic pane: provides additional info; starred topics.
Email pane: topics sorted by last incoming email.
29 Future Work
Distribute the plugin to Thunderbird users to gather input on possible UI improvements and on clustering quality.
Different clustering styles: people-based, thread-based.
We hope BuzzTrack will be a valuable tool for real-world users.
30 Questions?
Contact: Gabor Cselle, mail@gaborcselle.com
Website: www.buzztrack.net