Tomek Strzalkowski & Sharon G. Small ILS Institute, SUNY Albany LAANCOR May 22, 2010 (Tacitly) Collaborative Question Answering Utilizing Web Trails 5/22/10.

Presentation transcript:

Tomek Strzalkowski & Sharon G. Small ILS Institute, SUNY Albany LAANCOR May 22, 2010 (Tacitly) Collaborative Question Answering Utilizing Web Trails 5/22/10 1 LREC QA workshop

Collaboration
- Working together
  - Efficiency, sharing vs. groupthink
- Tacit collaboration
  - Professional analysts → COLLANE system
- Information sharing
  - Why and when
  - Collaborative filtering
  - Sharing insight and experience

Outline
- Introduction
- Collaborative Knowledge Layer
  - Web Trails
  - Exploratory Episodes
- Experiments
  - Data Collection
  - Results
- Collaborative Sharing
- Conclusions
- Future Research

Sharing on the Internet?
- Internet users leave behind trails of their work
  - What they asked
  - What links they tried
  - What worked and what didn’t
- Capture this Exploratory Knowledge
- Utilize this Knowledge for subsequent users
  - Tacitly enables Collaborative Question Answering
  - Improved Efficiency and Accuracy

Collaborative Knowledge Layer
- Captures exploration paths (Web Trails)
- Supplies meaning to the underlying data
  - May clarify/alter originally intended meaning
- Hypothesis: CKL may be utilized to
  - Improve interactive QA
  - Support tacit collaboration
- Current Experiments
  - Capturing web exploration trails
  - Computing degree of trail overlap

Collaborative Space

Web Trails
- Consist of individual exploratory moves
  - Entering a search query
  - Typing text into an input box
  - Responses from the browser
  - Offers accepted or ignored
  - Files saved
  - Items viewed
  - Links clicked through, etc.
  - Returns to the search box
- Contain optimal paths leading to specific outcomes
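
The move types listed on this slide could be recorded as a simple ordered log. The sketch below is only one possible representation, not the structure used in HITIQA or COLLANE; the names `MoveKind`, `Move`, `WebTrail`, and the `target` field are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MoveKind(Enum):
    """Exploratory move types named on the slide (enum names are assumptions)."""
    QUERY = "query"                    # entering a search query
    TEXT_INPUT = "text_input"          # typing text into an input box
    RESPONSE = "response"              # response from the browser
    OFFER_ACCEPTED = "offer_accepted"
    OFFER_IGNORED = "offer_ignored"
    FILE_SAVED = "file_saved"
    ITEM_VIEWED = "item_viewed"
    LINK_CLICKED = "link_clicked"
    RETURN_TO_SEARCH = "return_to_search"


@dataclass(frozen=True)
class Move:
    kind: MoveKind
    target: str  # hypothetical: query text, URL, or item identifier


@dataclass
class WebTrail:
    user: str
    moves: List[Move] = field(default_factory=list)

    def record(self, kind: MoveKind, target: str) -> None:
        """Append one move to the end of the trail, preserving order."""
        self.moves.append(Move(kind, target))

    def targets(self, kind: MoveKind) -> List[str]:
        """Return the targets of all moves of the given kind, in trail order."""
        return [m.target for m in self.moves if m.kind is kind]


# Illustrative trail; the user name and document id are made up.
trail = WebTrail(user="analyst-1")
trail.record(MoveKind.QUERY, "artificial reefs florida tires")
trail.record(MoveKind.LINK_CLICKED, "doc-17")
trail.record(MoveKind.FILE_SAVED, "doc-17")
trail.record(MoveKind.RETURN_TO_SEARCH, "")
```

Keeping the moves in order matters here, since the later slides compare trails by the sequences they share.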

Exploratory Episodes
- Discovered overlapping subsequences of web trails
- Common portions of exploratory web trails from multiple network users
- May begin with a single user web trail
- Shared with new users who appear to be pursuing a compatible task
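
One way to picture "overlapping subsequences" is as the longest run of steps that two trails share. The sketch below assumes the shared portion is contiguous and that trail steps can be compared by a simple label; the slides do not say how episodes were actually discovered, and the function name `longest_shared_run` and both example trails are hypothetical.

```python
from typing import List, Sequence


def longest_shared_run(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest contiguous run of steps that occurs in both trails."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous step of both trails.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return list(a[best_end - best_len:best_end])


# Made-up trails: two users who enter at different points but converge.
trail_1 = ["A", "B", "Q", "D", "G"]
trail_2 = ["M", "B", "Q", "D", "G"]
episode = longest_shared_run(trail_1, trail_2)
```

With these two trails the shared run is B, Q, D, G. A real system would probably need to tolerate gaps and reordering, which this sketch does not.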

[Figure: graph of exploration nodes A, B, Q, D, T, G, K, M, F, E, C]
Exploratory Episode A-B-Q-D-G helps new user from M to G

Experiment
- Evaluate degree of web trail overlap
- 11 Research Problems Defined
- Generated 100 short queries for each research problem description
- Used Google to retrieve the top 500 results from each query
  - ~500MB per topic
- Filtered for duplicates, commercial, offensive topics, etc.
- 2GB Corpus of web-mined text
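
The slide mentions filtering the retrieved pages for duplicates but gives no detail. As a rough illustration only, exact duplicates (after case and whitespace normalization) could be dropped by hashing each page's text. The function name `drop_exact_duplicates` and the sample pages are hypothetical, and the filters for commercial and offensive content are not covered.

```python
import hashlib
from typing import Iterable, List


def drop_exact_duplicates(pages: Iterable[str]) -> List[str]:
    """Keep the first copy of each page, comparing case- and whitespace-normalized text."""
    seen = set()
    unique = []
    for text in pages:
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


# Made-up page texts; the first two differ only in case and spacing.
pages = [
    "Artificial reefs  foster sea life.",
    "artificial reefs foster sea life.",
    "Tire reefs failed in Florida.",
]
kept = drop_exact_duplicates(pages)
```

Near-duplicate pages with small textual differences would pass through this filter unchanged.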

Experiment setup
- Analysts per Research Topic
- 2.5 hours per topic
- Utilized two fully functional QA Systems
  - HITIQA: Analytical QA system developed under the AQUAINT program at SUNY Albany
  - COLLANE: Collaborative extension of the HITIQA system developed under the CASE program at SUNY Albany
- Analyst’s Objective
  - Find sufficient information for a 3-page report for the assigned topic

Example topic: artificial reefs
Many countries are creating artificial reefs near their shores to foster sea life. In Florida a reef made of old tires caused a serious environmental problem. Please write a report on artificial reefs and their effects. Give some reasons as to why artificial reefs are created. Identify those built in the United States and around the world. Describe the types of artificial reefs created, the materials used and the sizes of the structures. Identify the types of man-made reefs that have been successful (success defined as an increase in sea life without negative environmental consequences). Identify those types that have been disasters. Explain the impact an artificial reef has on the environment and ecology. Discuss the EPA’s (Environmental Protection Agency) policy on artificial reefs. Include in your report any additional related information about this topic.

What is COLLANE?
- An Analytic Tool
- Exploits the strength of collaborative work
- Collaborative environment
  - Analysts work in teams
  - Synchronously and asynchronously
  - Information sharing on an as-needed basis

Collaborating via COLLANE
- A team of users works on a task
  - Each user has their own working space
- A Combined Answer Space is created
  - Made out of individual contributions
- Users interact with the system
  - Via question answering and visual interfaces
- The system observes and facilitates
  - Shares relevant information found by others → tacit collaboration
- Users interact with each other
  - Exchange tips and data items via a chat facility → open collaboration

COLLANE/HITIQA user interface

Key Tracked Events
- Questions Asked
- Data Items Copied
- Data Items Ignored
- System offers accepted/rejected
- Displaying Text
- Words searched in user interface
- All dialogue between user and system
- Bringing up full document source
- Passages viewed
- Time spent

Experimental Results
- Aligned Episodes on common data items
  - Only considered user copy as indicator
- Used document level overlap
  - Ignored potential content overlap between different documents
  - Lower bound on Episode overlap
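
Document-level overlap on copied items can be sketched as a set comparison. The slides do not give the exact formula, so the normalization below (shared documents divided by the size of the smaller copy set) is an assumption, as are the function name `document_overlap` and the document ids.

```python
from typing import Iterable


def document_overlap(copied_a: Iterable[str], copied_b: Iterable[str]) -> float:
    """Fraction of the smaller copy set that also appears in the other set.

    Documents are matched by identifier only, so two different documents
    with the same content count as distinct. That makes the score a lower
    bound on the true overlap, as the slide notes.
    """
    a, b = set(copied_a), set(copied_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Made-up copy sets for two analysts working on the same topic.
user_1 = ["doc-1", "doc-2", "doc-3", "doc-4"]
user_2 = ["doc-2", "doc-3", "doc-4", "doc-9", "doc-12"]
score = document_overlap(user_1, user_2)
```

Here three of the four documents copied by the first analyst were also copied by the second, giving 0.75. Other normalizations, such as Jaccard similarity, would give lower figures for the same data.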

Artificial Reefs Example: A-G & E-H, 60-75% Overlap

Experimental Results
- Exploratory Episodes
- EE grouped by the degree of overlap
  - 60% or higher → may be shared? OR
  - 40% or lower → divergent?
- Find an overlap threshold
  - Maximize information sharing
  - Minimize rejection
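
The two cut-offs on this slide suggest a three-way grouping of episode pairs. The sketch below applies the 60% and 40% figures as default parameters. The "undecided" middle band, the function name `group_by_overlap`, and the overlap scores are illustrative assumptions, not results from the experiment.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def group_by_overlap(
    overlaps: Dict[Pair, float],
    share_at: float = 0.60,
    diverge_at: float = 0.40,
) -> Dict[str, List[Pair]]:
    """Sort episode pairs into sharing candidates, divergent pairs, and the rest."""
    groups: Dict[str, List[Pair]] = {"share": [], "undecided": [], "divergent": []}
    for pair, score in overlaps.items():
        if score >= share_at:
            groups["share"].append(pair)
        elif score <= diverge_at:
            groups["divergent"].append(pair)
        else:
            groups["undecided"].append(pair)
    return groups


# Made-up overlap scores between pairs of hypothetical users.
example = {
    ("u1", "u2"): 0.75,
    ("u3", "u4"): 0.60,
    ("u1", "u3"): 0.50,
    ("u2", "u4"): 0.20,
}
groups = group_by_overlap(example)
```

Moving `share_at` down widens the pool of sharing candidates at the risk of more rejected offers, which is the trade-off the slide describes.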

Some topics appear more suitable for information sharing and tacit collaboration

At a 50% episode overlap threshold, more than half of all episodes are candidates for sharing

Collaborative Sharing Objective
- Leverage Exploratory Knowledge
  - Use experience and judgment of users who faced the same or similar problem
  - Provide superior accuracy and responsiveness to subsequent users
- Similar to Relevance Feedback in IR
  - Community based rather than single user judgment

- Utilize User B trail
- Offer D4-D7 to User D after D3 copy
  - Avoids 2 fruitless questions, Q2 & Q4
  - Finds extra potentially relevant data point D7
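
The offer step in this example could work roughly as follows: once enough of the current user's copied items match an earlier user's trail, the remaining items from that trail are offered. The sketch assumes a 50% threshold and measures overlap as the matched fraction of the current user's copies. The function name `items_to_offer` is hypothetical, and the trails are reconstructed from the item labels on the slide rather than from the original figure.

```python
from typing import List, Sequence


def items_to_offer(
    prior_copied: Sequence[str],
    current_copied: Sequence[str],
    threshold: float = 0.5,
) -> List[str]:
    """Return the earlier user's copied items that the current user lacks.

    Nothing is offered unless at least `threshold` of the current user's
    copied items also appear in the earlier user's trail.
    """
    current = set(current_copied)
    if not current:
        return []
    shared = current & set(prior_copied)
    if len(shared) / len(current) < threshold:
        return []
    # Preserve the order in which the earlier user copied the items.
    return [item for item in prior_copied if item not in current]


# Assumed trails: User B copied D1-D7; User D has copied D1-D3 so far.
user_b = ["D1", "D2", "D3", "D4", "D5", "D6", "D7"]
user_d = ["D1", "D2", "D3"]
offer = items_to_offer(user_b, user_d)
```

In this reading, User D receives D4 through D7 right after copying D3, without having to ask the questions that led User B to them.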

Conclusion
- Users searching for information in a networked environment leave behind exploratory trails that can be captured
- Exploratory Episodes can be compared for overlap by data items copied
- Many users searching for the same or highly related information are likely to follow similar routes through the data
- When a user overlaps an EE above a threshold, they may benefit from tacit information sharing

Future Research
- Evaluate overlap utilizing semantic equivalence of data items copied
- Distill Exploratory Episodes into shareable knowledge elements
- Expand overlap metrics
  - Question similarity
  - Items Ignored, etc.
- Evaluate frequency of acceptance of offered material
  - Varying thresholds