Effective Access Over Public Email Conversations
William Lee, Hui Fang and Yifan Li
University of Illinois at Urbana-Champaign

Motivation
Information in newsgroups and mailing lists has been largely underutilized. At present, access to this data is limited to traditional searching and browsing.
[Figure: existing technologies (search, browse) applied to threads; each thread consists of a first message (subject, authors, date) followed by its first, second and third replies]
How can this information be accessed more effectively?
– Clustering
– Summarization

Clustering
Goal:
– Find commonly discussed topics in a set of conversations (threads)
Method:
– Use agglomerative clustering with complete link
– Learn similarity functions from different "perspectives" of a thread: authors, date, subject, contents, contents without quotes, first message, replies, and replies without quotes
– Use linear and logistic regression to learn the combined similarity function
Experiment Design:
– Data: 3 CS class newsgroups from UIUC
– Judgment file: manually created by three different human taggers
– Methodology: 3-way cross validation, using one group's judgment file as the training set and testing on the other two
– Evaluation: overall entropy as the comparison metric
Experiment Result: [figure]

Visualization: Conversation Map
– Derived from Treemap
– Clusters sorted along two time dimensions
– Lets the user adjust the similarity threshold to "zoom" in on the more similar threads

Summarization
Goal:
– Find the gist of a conversation (i.e., a thread)
Observation: different types of conversation need different summarization methods
– Question-driven conversations
– Announcement-driven conversations
Solution to question-driven summarization
– Key observation: the question plays an important role
– Method:
  – Identify the question
  – Detect topic shifts during the conversation
  – Divide the conversation into segments at the topic shifts
  – Keep all segments containing the question
  – Remove segments with a redundant question
  – Return the remaining segments
Solution to announcement-driven summarization
– Key observations: the subject plays an important role, and threads with similar subjects may share common pivot words in their summaries
– Method, training stage:
  – Cluster the threads by subject similarity
  – Find frequent words (pivot words) in the summaries
  – Extend the subjects with the corresponding pivot words
– Method, testing stage:
  – Given a thread, find the similar training subject
  – Use the pivot words associated with that subject to extend the current subject
  – Select the sentences most similar to the extended subject as the summary
Experiment Design:
– Data: 3 CS class newsgroups from UIUC
– Judgment file: manually created by two different human taggers
– Evaluation: precision and recall, or a user study

Examples of Experiment Results
Question-driven:
Subject: MP7: Viewing Array in debugger
From: Scott Stephens on Sat, 04 Dec :49:
I've been debugging things using DDD with dbx, but I'm running into a weird problem. My PatriciaTree class is basically a wrapper around a root pointer, so I observe that pointer, and dereference it to give me my first PatriciaNode, and then look at all that's inside that, and one of those things is a pointer to "data" within the Array object that's embedded in my PatriciaNode. An address shows up fine, but I'd like to dereference it to take a look at the actual array, so I can continue to examine the structure of my tree. But when I dereference it in DDD, i just shows up as "(nil)". I know for a fact that there's valid data in that array somewhere, because I can access it in my program, I just can't look at it in the debugger. Anybody have any ideas? It'd be really nice to look at that info for debugging purposes.
-Scott

Announcement-driven:
SUBJECT: Final Exam - PLEASE READ
TIME: 1:30-4:30pm
PLACE: Regular classroom (SC 1404)
TOPICS: The exam is cummulative but with emphasis on the material after the midterm. Most important from earlier material are general techniques: divide-and-conquer, greedy, dynamic programming, randomization. Also topics like MST and shortest paths that have reappeared after the midterm.
Student having more than two consecutive examinations: No student should be required to take more than two consecutive final examinations.
In a semester, this means that a student taking a final examination at 8:00 a.m. and another at 1:30 p.m. on the same day cannot be required to take an examination that same evening.
However, the student could be required to take an examination beginning at 8:00 a.m. the next day...

Conclusion
– Proposed two ways to access public archives: clustering and summarization
– Used a conversation map to visualize the clustering result
Future Work
– Derive better algorithms for learning the similarity function
– Faster clustering algorithms based on mining patterns in conversations
– A language modeling (LM) approach to summarization
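The Clustering method above (complete-link agglomerative clustering over a combined similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Thread` fields, the two perspectives used (subject words and authors, each compared with Jaccard overlap), the fixed `WEIGHTS`, and the stopping `threshold` are all hypothetical. In the poster the weights are learned by regression, and the threshold plays the role of the value the conversation map lets the user adjust.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Thread:
    # Hypothetical thread representation covering only two of the poster's perspectives.
    subject: str
    authors: set[str]


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical fixed weights; the poster learns the combination by regression.
WEIGHTS = {"subject": 0.7, "authors": 0.3}


def combined_similarity(x: Thread, y: Thread) -> float:
    """Weighted sum of per-perspective similarities."""
    subj = jaccard(set(x.subject.lower().split()), set(y.subject.lower().split()))
    auth = jaccard(x.authors, y.authors)
    return WEIGHTS["subject"] * subj + WEIGHTS["authors"] * auth


def complete_link_clusters(threads: list[Thread], threshold: float) -> list[list[int]]:
    """Agglomerative clustering with complete link.

    The similarity of two clusters is the minimum pairwise similarity of their
    members. The most similar pair of clusters is merged repeatedly until no
    pair reaches `threshold`. Returns clusters as lists of thread indices.
    """
    sim = {(i, j): combined_similarity(threads[i], threads[j])
           for i, j in combinations(range(len(threads)), 2)}
    clusters = [[i] for i in range(len(threads))]

    def link(a: list[int], b: list[int]) -> float:
        # Complete link: the weakest pairwise similarity between the two clusters.
        return min(sim[(min(i, j), max(i, j))] for i in a for j in b)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for p, q in combinations(range(len(clusters)), 2):
            s = link(clusters[p], clusters[q])
            if s > best:
                best, pair = s, (p, q)
        if best < threshold:
            break
        p, q = pair  # p < q, so deleting index q leaves index p valid
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


if __name__ == "__main__":
    threads = [
        Thread("MP7 debugger array", {"scott"}),
        Thread("MP7 debugger question", {"scott", "ann"}),
        Thread("Final exam time", {"prof"}),
    ]
    print(complete_link_clusters(threads, threshold=0.3))  # [[0, 1], [2]]
```

Raising the threshold yields more, tighter clusters, which is how the conversation map's "zoom" can be read. A naive loop like this one costs roughly cubic time in the number of threads, which is consistent with the poster listing faster clustering as future work.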
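The Clustering method says the combined similarity function is learned with linear and logistic regression. One way to read the logistic variant is sketched below, under assumptions that are not in the poster. Each training example is a pair of threads. Its features are the per-perspective similarities, and its label is 1 if the judgment file puts both threads in the same topic. The learned model's probability is then used as the combined similarity. The plain batch gradient descent and the `lr`/`epochs` values are illustrative choices.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features: list[list[float]], same_topic: list[int],
                   lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights w and bias b so that sigmoid(w . x + b) estimates
    P(two threads share a topic | per-perspective similarities x).

    Batch gradient descent on the log-loss, without regularization.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, same_topic):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def learned_similarity(x: list[float], w: list[float], b: float) -> float:
    """Combined similarity of a thread pair, in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical pairs: [subject similarity, author similarity] -> same topic?
    X = [[0.9, 0.8], [0.8, 0.2], [0.7, 0.6], [0.1, 0.3], [0.2, 0.0], [0.0, 0.5]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic(X, y)
    print(learned_similarity([0.85, 0.5], w, b) > 0.5)
```

In the 3-way cross validation described above, the pairs would come from one newsgroup's judgment file and the learned function would be evaluated by clustering the other two.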
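The clustering experiment uses "overall entropy" as its metric but does not define it. The sketch below assumes the common definition from document-clustering evaluation. For each cluster j, take the entropy of the reference (human-judged) topics among its members, E_j = -Σ_i p_ij log2 p_ij. Then average these entropies weighted by cluster size: E = Σ_j (n_j / n) E_j. Lower is better. The input format (thread ids and a label dictionary) is hypothetical.

```python
import math
from collections import Counter


def overall_entropy(clusters: list[list[str]], labels: dict[str, str]) -> float:
    """Size-weighted average entropy of clusters w.r.t. reference topics.

    clusters: each cluster is a list of thread ids.
    labels:   thread id -> topic assigned in the human judgment file.
    Returns 0.0 when every cluster contains a single reference topic.
    """
    n = sum(len(c) for c in clusters)
    if n == 0:
        return 0.0
    total = 0.0
    for cluster in clusters:
        size = len(cluster)
        if size == 0:
            continue
        counts = Counter(labels[t] for t in cluster)
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


if __name__ == "__main__":
    labels = {"a": "mp7", "b": "mp7", "c": "mp7", "d": "exam"}
    print(overall_entropy([["a", "b"], ["c", "d"]], labels))  # 0.5
```

In the example, the first cluster is pure (entropy 0) and the second splits evenly between two topics (entropy 1 bit). Each cluster holds half the threads, so the overall entropy is 0.5.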
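The announcement-driven method (pivot words learned per subject, then used to extend a new thread's subject before selecting sentences) can be sketched as below. Several details are assumptions rather than the poster's method. The subject clusters are taken as already given. A "pivot word" is read as a word occurring in at least `min_count` summaries of one cluster. Similarity is Jaccard overlap of lower-cased word sets. The summary is the `k` best-matching sentences kept in their original order. All names are hypothetical.

```python
import re
from collections import Counter


def words(text: str) -> set[str]:
    """Lower-cased word set; a stand-in for real tokenization and stemming."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pivot_words(summaries: list[str], min_count: int = 2) -> set[str]:
    """Training stage: words that recur in the summaries of one subject cluster."""
    counts = Counter(w for s in summaries for w in words(s))
    return {w for w, c in counts.items() if c >= min_count}


def summarize(subject: str, sentences: list[str],
              pivots_by_subject: dict[str, set[str]], k: int = 2) -> list[str]:
    """Testing stage.

    1. Find the training subject most similar to the current subject.
    2. Extend the current subject with that subject's pivot words.
    3. Return the k sentences most similar to the extended subject,
       in their original order.
    """
    query = words(subject)
    best = max(pivots_by_subject, key=lambda s: jaccard(query, words(s)))
    extended = query | pivots_by_subject[best]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: jaccard(extended, words(sentences[i])),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    pivots = {
        "Final exam": pivot_words(["Final exam time and place announced",
                                   "Midterm exam time and room"]),
        "Homework 3 due": {"homework", "due", "submit"},
    }
    sentences = ["TIME: 1:30-4:30pm", "PLACE: Regular classroom (SC 1404)",
                 "The exam is cumulative", "Good luck everyone"]
    print(summarize("Final Exam - PLEASE READ", sentences, pivots))
```

Here the pivot word "time" (learned from earlier exam announcements) pulls the "TIME:" line into the summary even though the new subject never mentions it. This illustrates the stated observation that similar subjects share pivot words.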