BuzzTrack: Topic Detection and Tracking in Email. IUI – Intelligent User Interfaces, January 2007. Keno Albrecht (ETH Zurich), Roger Wattenhofer (ETH Zurich), Gabor Cselle (Google).

1 BuzzTrack Topic Detection and Tracking in Email IUI – Intelligent User Interfaces January 2007 Keno Albrecht ETH Zurich kenoa@tik.ee.ethz.ch Roger Wattenhofer ETH Zurich wattenhofer@tik.ee.ethz.ch Gabor Cselle Google gabor@google.com

2 2 Email Overload Email clients were not designed to handle the volume and variety of messages users deal with today: large volumes of email; task management; personal archiving or filing; keeping context [Whittaker and Sidner, 1996]

3 3 Search vs. Inbox Browsing Fast full-text search is today's solution to finding past emails. But the flat inbox view of newly incoming emails hasn’t changed. In our work, we focus on the problem of sensibly structuring emails in the inbox.

4 4 Today's Email Clients: The Three-Pane View No sense of context: unrelated messages are shown together. Important emails may drop off the “first screen”. “Thread-based” tree views are unsophisticated and may not pull in all relevant messages.

5 5 BuzzTrack Email client extension for Mozilla Thunderbird for displaying email grouped by topic.

6 6 Related Work

7 7 Visualizations: Conversations Gmail (Google) common conversation title one entry per email, folds out on click

8 8 Automatic Foldering Using machine learning techniques to automatically move emails into folders upon arrival. Low accuracy rates [Bekkerman et al., 2005] and conceptual problems: users need to manually create folders and seed them with data.

9 9 People-Centered Email Clients Bifrost [Bälter and Sidner, 2002], ContactMap [Whittaker et al., 2004]

10 10 Task-based Email Example: TaskMaster [Bellotti et al., 2003]: thrasks; thrask contents; item contents (emails, documents, etc.)

11 11 BuzzTrack

12 12 BuzzTrack Mozilla Thunderbird extension to automatically group related emails into topics. Will be distributed through website: www.buzztrack.net Provides a view on the user’s inbox.

13 13 What’s a Topic? Topics are groups of emails that relate to the same idea, action, event, task, or question. Examples: A conversation about buying a digital camera. Referring a candidate for a job. All emails belonging to the same newsgroup.

14 14 Clustering Process For every new incoming email: Preprocessing, Clustering, Label generation, Cluster store, BuzzTrack View in Thunderbird

15 15 Preprocessing Tokenization (remove HTML tags, style sheets, punctuation, and numbers) Language detection Stemming For topic labelling: Identify Parts-of-speech Remember popular original word forms
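The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the BuzzTrack implementation: the function names are hypothetical, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and language detection and part-of-speech tagging are left out.

```python
import re
from collections import Counter, defaultdict

# Assumed patterns: drop <style> blocks first, then any remaining tags.
STYLE_RE = re.compile(r"<style\b.*?</style>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
# Runs of letters only, so punctuation and numbers are discarded.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(body):
    """Strip style sheets and HTML tags, then return lowercase word tokens."""
    text = STYLE_RE.sub(" ", body)
    text = TAG_RE.sub(" ", text)
    return [word.lower() for word in WORD_RE.findall(text)]


def naive_stem(token):
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(body):
    """Return the stems plus, per stem, a count of the original word forms."""
    stems = []
    original_forms = defaultdict(Counter)
    for token in tokenize(body):
        stem = naive_stem(token)
        stems.append(stem)
        original_forms[stem][token] += 1
    return stems, original_forms
```

Keeping the per-stem counter of original forms is one way to "remember popular original word forms": a topic label can then show the most frequent surface form instead of the bare stem.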

16 16 Clustering Single-link clustering: each newly incoming email is compared to every email in existing topics. Similarity value > threshold: email is assigned to the topic. Similarity value <= threshold: email starts a new topic.
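A minimal sketch of this incremental single-link step, under stated assumptions: `assign_to_topic` and `jaccard` are hypothetical names, the Jaccard word overlap is only a toy similarity, and picking the best-scoring topic when several exceed the threshold is an assumption the slide does not spell out.

```python
def jaccard(a, b):
    """Toy similarity for illustration: word-set overlap of two strings."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def assign_to_topic(new_email, topics, similarity, threshold):
    """Add new_email to the closest topic or start a new one.

    Single-link: a topic's score is the maximum similarity between the
    new email and any single email already in that topic.
    Returns the index of the topic the email ended up in.
    """
    best_index, best_score = None, float("-inf")
    for index, topic in enumerate(topics):
        score = max(similarity(new_email, member) for member in topic)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is not None and best_score > threshold:
        topics[best_index].append(new_email)
        return best_index
    # Similarity <= threshold for every topic: the email starts a new topic.
    topics.append([new_email])
    return len(topics) - 1


topics = []
stream = ["digital camera prices", "camera prices compared", "job candidate referral"]
assignments = [assign_to_topic(e, topics, jaccard, threshold=0.3) for e in stream]
```

With this toy input the first two emails share two of four distinct words (similarity 0.5) and land in one topic, while the third shares none and opens a second topic.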

17 17 Features - 1 How do we generate similarity values between emails? Via a linear combination of several similarity features. Examples: Text similarity (TF-IDF values, cosine similarity metric); People similarities (comparing sets of people in the From / To / Cc lines of email headers); Thread membership
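Two of these features can be sketched as follows. The exact weighting used in BuzzTrack is not given in the slides, so this assumes raw term frequency times `log(N / df)` for TF-IDF and a Jaccard overlap for the people sets; all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc.

    Assumed weighting: term count * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append(
            {t: count * math.log(n_docs / doc_freq[t]) for t, count in term_freq.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def people_similarity(people_a, people_b):
    """Overlap of the address sets taken from the From / To / Cc headers."""
    a, b = set(people_a), set(people_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


docs = [["camera", "price"], ["camera", "lens"], ["job", "candidate"]]
vecs = tfidf_vectors(docs)
```

Here the two camera emails get a small positive text similarity, while the camera email and the job email share no terms and score zero.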

18 18 Features - 2 Other features for deriving similarities: Subject similarity; Sender domain overlaps; Sender rank and percentage of email from sender that is answered; Time passed since last email in topic; People and reference count for email; Known people and reference percentage; Cluster size; Has attachment

19 19 Decision Score Similarities are combined into a decision score for each email / cluster pair through a linear combination of feature values: $\mathrm{dec}_{i,j} = w_a \cdot \mathrm{sim}_a(m_i, C_j) + w_b \cdot \mathrm{sim}_b(m_i, C_j) + \dots$ We tested two sets of weights $w_x$, both trained on a development set of emails: Empirical; Linear SVM
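The linear combination above amounts to a weighted sum over named features. In this sketch the feature names and weight values are invented for illustration; they are not the empirical or SVM-trained weights from the talk.

```python
def decision_score(feature_values, weights):
    """Weighted sum of similarity features for one email / cluster pair."""
    return sum(weights[name] * value for name, value in feature_values.items())


# Hypothetical weights and feature values, for illustration only.
example_weights = {"text": 0.6, "people": 0.3, "thread": 0.1}
example_features = {"text": 0.5, "people": 1.0, "thread": 0.0}
score = decision_score(example_features, example_weights)
```

The resulting score is then compared against a threshold to decide whether the email joins the cluster.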

20 20 Evaluation How do we evaluate clustering quality? Topic Detection and Tracking competitions by NIST. Aimed at clustering news articles. Corpus:

21 21 Clustering Tasks The clustering task is split into subtasks: New Topic Detection (NTD): Given a stream of emails, which ones start new topics? Topic Tracking (TT): Given a fixed topic, which newly incoming emails belong to it? DET curves plot miss rate vs. false alarm rate for each possible threshold on the decision scores
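The points of such a curve can be computed by sweeping the threshold over the observed decision scores. This is a sketch assuming that a score above the threshold counts as a "yes" decision; `det_points` is a hypothetical helper, and plotting is omitted.

```python
def det_points(scores, is_target):
    """Return (threshold, miss_rate, false_alarm_rate) for each threshold.

    scores: decision scores, one per trial.
    is_target: True where the correct answer for that trial is "yes".
    A trial is answered "yes" when its score is strictly above the threshold.
    """
    n_target = sum(1 for t in is_target if t)
    n_nontarget = len(is_target) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")
    points = []
    for threshold in sorted(set(scores)):
        misses = sum(1 for s, t in zip(scores, is_target) if t and s <= threshold)
        false_alarms = sum(1 for s, t in zip(scores, is_target) if not t and s > threshold)
        points.append((threshold, misses / n_target, false_alarms / n_nontarget))
    return points


points = det_points([0.9, 0.8, 0.4, 0.2], [True, True, False, False])
```

Raising the threshold trades false alarms for misses, which is the trade-off the curve visualizes.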

22 22 Results NTD TDT New Topic Detection Task. Miss: 3%, False alarm: 30%. [DET curve plot; arrow labeled “better”]

23 23 Results TT TDT Topic Tracking Task. Miss: 8%, False alarm: 2%. [DET curve plot; arrow labeled “better”]

24 24 Comparison Comparable quality to TDT for news articles [NIST 2004]. News has less metadata; email has worse text quality. A wide body of work exists on improving clustering performance on news; we haven’t tapped into that yet.

25 25 BuzzTrack View Mozilla Thunderbird plugin that provides a useful view on inbox data “for free”. Topics contain email from the last 60 days: we’re interested in current email only, and this reduces initial clustering time. Each email is shown in one topic.
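The 60-day window can be illustrated with a simple filter applied before clustering. The email representation here (a dict with a `received` timestamp) and the function name are assumptions made for this sketch only.

```python
from datetime import datetime, timedelta


def recent_emails(emails, now, window_days=60):
    """Keep only emails received within the last window_days days."""
    cutoff = now - timedelta(days=window_days)
    return [email for email in emails if email["received"] >= cutoff]


now = datetime(2007, 1, 30)
inbox = [
    {"subject": "Camera offer", "received": datetime(2007, 1, 20)},
    {"subject": "Old newsletter", "received": datetime(2006, 10, 1)},
]
current = recent_emails(inbox, now)
```

Only the emails that pass this filter would need to be clustered when the view is first built.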

27 27 Demo 1: BuzzTrack

28 28 BuzzTrack Panes Topic pane: Provides additional info Starred topics Email pane: Topics sorted by last incoming email

29 29 Future Work Distribute plugin to Thunderbird users: input on possible UI improvements; input on clustering quality. Different clustering styles: people-based; thread-based. We hope BuzzTrack will be a valuable tool for real-world users.

30 30 Questions? Contact: Gabor Cselle, mail@gaborcselle.com Website: www.buzztrack.net

