(Tacitly) Collaborative Question Answering Utilizing Web Trails
  Tomek Strzalkowski & Sharon G. Small
  ILS Institute, SUNY Albany
  LAANCOR
  May 22, 2010
  5/22/10 LREC QA workshop
Collaboration
  Working together
    – Efficiency, sharing vs. groupthink
  Tacit collaboration
    – Professional analysts → COLLANE system
  Information sharing
    – Why and when
  Collaborative filtering
    – Sharing insight and experience
Outline
  Introduction
  Collaborative Knowledge Layer
  Web Trails
  Exploratory Episodes
  Experiments
  Data Collection
  Results
  Collaborative Sharing
  Conclusions
  Future Research
Sharing on the Internet?
  Internet users leave behind trails of their work
    – What they asked
    – What links they tried
    – What worked and what didn’t
  Capture this Exploratory Knowledge
  Utilize this Knowledge for subsequent users
    – Tacitly enables Collaborative Question Answering
    – Improved Efficiency and Accuracy
Collaborative Knowledge Layer (CKL)
  Captures exploration paths (Web Trails)
  Supplies meaning to the underlying data
    – May clarify/alter originally intended meaning
  Hypothesis: CKL may be utilized to
    – Improve interactive QA
    – Support tacit collaboration
  Current Experiments
    – Capturing web exploration trails
    – Computing degree of trail overlap
[Figure: Collaborative Space]
Web Trails
  Consist of individual exploratory moves
    – Entering a search query
    – Typing text into an input box
    – Responses from the browser
    – Offers accepted or ignored
    – Files saved
    – Items viewed
    – Links clicked through, etc.
    – Returns to the search box
  Contain optimal paths leading to specific outcomes
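One way to picture a web trail as data is an ordered list of typed moves per user. The sketch below is only an illustration under stated assumptions: the names `MoveType`, `Move`, and `WebTrail`, and the sample document id, are hypothetical and do not come from COLLANE or HITIQA.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MoveType(Enum):
    """Kinds of exploratory moves (illustrative names, not from the slides)."""
    QUERY = "query"
    OFFER_ACCEPTED = "offer_accepted"
    OFFER_IGNORED = "offer_ignored"
    FILE_SAVED = "file_saved"
    ITEM_VIEWED = "item_viewed"
    LINK_CLICKED = "link_clicked"


@dataclass(frozen=True)
class Move:
    kind: MoveType
    target: str  # query text, URL, or document id


@dataclass
class WebTrail:
    user: str
    moves: List[Move] = field(default_factory=list)

    def record(self, kind: MoveType, target: str) -> None:
        """Append one move, preserving the order in which moves occurred."""
        self.moves.append(Move(kind, target))

    def targets(self, kind: MoveType) -> List[str]:
        """Return the targets of all moves of one kind, in trail order."""
        return [m.target for m in self.moves if m.kind is kind]


# Hypothetical trail for one user.
trail = WebTrail(user="A")
trail.record(MoveType.QUERY, "artificial reefs")
trail.record(MoveType.LINK_CLICKED, "doc-17")
trail.record(MoveType.FILE_SAVED, "doc-17")
print(trail.targets(MoveType.FILE_SAVED))
```

Keeping moves in a single ordered list preserves the path through the data, which is what later comparison between trails would rely on.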
Exploratory Episodes
  Discovered overlapping subsequences of web trails
  Common portions of exploratory web trails from multiple network users
  May begin with a single user web trail
  Shared with new users who appear to be pursuing a compatible task
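The slides do not say how overlapping subsequences are discovered. As a minimal sketch, assuming an episode is the longest order-preserving (not necessarily contiguous) subsequence shared by two trails, a standard longest-common-subsequence computation would look like this. The function name and the two sample trails are hypothetical.

```python
from typing import List, Sequence


def longest_common_subsequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest subsequence that appears, in order, in both trails."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover the subsequence itself.
    i = j = 0
    shared: List[str] = []
    while i < m and j < n:
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical trails written as sequences of visited items.
trail_1 = ["A", "B", "Q", "D", "G"]
trail_2 = ["M", "B", "T", "Q", "D", "K", "G"]
print(longest_common_subsequence(trail_1, trail_2))
```

A contiguous-match variant would be stricter; which notion of overlap fits best is left open here.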
[Figure: web trail graph with nodes A, B, Q, D, T, G, K, M, F, E, C]
  The A-B-Q-D-G Exploratory Episode helps a new user get from M to G
Experiment
  Evaluate degree of web trail overlap
  11 Research Problems Defined
  Generated 100 short queries for each research problem description
  Used Google to retrieve the top 500 results for each query
    – ~500MB per topic
  Filtered for duplicates, commercial, offensive topics, etc.
    – 2GB Corpus of web-mined text
Experiment setup
  Analysts per Research Topic
  2.5 hours per topic
  Utilized two fully functional QA Systems
    – HITIQA: Analytical QA system developed under the AQUAINT program at SUNY Albany
    – COLLANE: Collaborative extension of the HITIQA system developed under the CASE program at SUNY Albany
  Analyst’s Objective
    – Find sufficient information for a 3-page report for the assigned topic
Example topic: artificial reefs
  Many countries are creating artificial reefs near their shores to foster sea life. In Florida a reef made of old tires caused a serious environmental problem. Please write a report on artificial reefs and their effects. Give some reasons as to why artificial reefs are created. Identify those built in the United States and around the world. Describe the types of artificial reefs created, the materials used and the sizes of the structures. Identify the types of man-made reefs that have been successful (success defined as an increase in sea life without negative environmental consequences). Identify those types that have been disasters. Explain the impact an artificial reef has on the environment and ecology. Discuss the EPA’s (Environmental Protection Agency) policy on artificial reefs. Include in your report any additional related information about this topic.
What is COLLANE?
  An Analytic Tool
  Exploits the strength of collaborative work
  Collaborative environment
    – Analysts work in teams
    – Synchronously and asynchronously
    – Information sharing on an as-needed basis
Collaborating via COLLANE
  A team of users works on a task
    – Each user has their own working space
  A Combined Answer Space is created
    – Made out of individual contributions
  Users interact with the system
    – Via question answering and visual interfaces
  The system observes and facilitates
    – Shares relevant information found by others (tacit collaboration)
  Users interact with each other
    – Exchange tips and data items via a chat facility (open collaboration)
[Figure: COLLANE/HITIQA user interface]
Key Tracked Events
  Questions Asked
  Data Items Copied
  Data Items Ignored
  System offers accepted/rejected
  Displaying Text
  Words searched in user interface
  All dialogue between user and system
  Bringing up full document source
  Passages viewed
  Time spent
Experimental Results
  Aligned Episodes on common data items
    – Only considered user copy as indicator
  Used document level overlap
    – Ignored potential content overlap between different documents
    – Lower bound on Episode overlap
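The overlap formula is not given on the slide. A minimal sketch, assuming document-level overlap is measured as the Jaccard ratio (shared copied documents divided by all documents copied in either episode), might look like the following. The function name and document ids are hypothetical.

```python
from typing import Iterable


def episode_overlap(copied_a: Iterable[str], copied_b: Iterable[str]) -> float:
    """Jaccard overlap between the sets of documents copied in two episodes.

    Assumption: only copy events count, and documents are compared by id,
    so two different documents with the same content do not match.
    """
    docs_a, docs_b = set(copied_a), set(copied_b)
    union = docs_a | docs_b
    if not union:
        return 0.0  # neither episode copied anything
    return len(docs_a & docs_b) / len(union)


# Hypothetical copied-document ids for two users.
user_1 = ["d1", "d2", "d3", "d4"]
user_2 = ["d2", "d3", "d4", "d5"]
print(episode_overlap(user_1, user_2))
```

Because matching is by document id only, a score computed this way can only underestimate the true content overlap, which is consistent with the lower-bound remark above.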
[Figure: Artificial Reefs Example. A-G & E-H: 60-75% Overlap]
Experimental Results
  Exploratory Episodes (EE) grouped by the degree of overlap
    – 60% or higher → may be shared?
    – OR 40% or lower → divergent?
  Find an overlap threshold
    – Maximize information sharing
    – Minimize rejection
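The grouping above can be sketched as a simple banding of overlap scores. This is an illustration only: the 0.60 and 0.40 cut-offs come from the slide, while the "undecided" middle band, the function names, and the sample scores are assumptions.

```python
from typing import Dict, List


def classify_overlap(overlap: float, share_at: float = 0.60,
                     diverge_at: float = 0.40) -> str:
    """Place one episode pair into a band by its overlap score (0.0 to 1.0)."""
    if overlap >= share_at:
        return "share"
    if overlap <= diverge_at:
        return "divergent"
    return "undecided"  # assumed label for scores between the two cut-offs


def group_by_band(overlaps: List[float]) -> Dict[str, List[float]]:
    """Group overlap scores into the three bands."""
    bands: Dict[str, List[float]] = {"share": [], "undecided": [], "divergent": []}
    for score in overlaps:
        bands[classify_overlap(score)].append(score)
    return bands


def share_fraction(overlaps: List[float], threshold: float) -> float:
    """Fraction of episode pairs at or above a single sharing threshold."""
    if not overlaps:
        return 0.0
    return sum(1 for s in overlaps if s >= threshold) / len(overlaps)


# Hypothetical overlap scores for five episode pairs.
scores = [0.75, 0.62, 0.50, 0.35, 0.10]
print(group_by_band(scores))
print(share_fraction(scores, threshold=0.50))
```

Sweeping `threshold` over a range and comparing `share_fraction` against observed rejection of offered material is one way the trade-off between sharing and rejection could be explored.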
Some topics appear more suitable for information sharing and tacit collaboration
At a 50% episode overlap threshold, more than half of all episodes are candidates for sharing
Collaborative Sharing Objective
  Leverage Exploratory Knowledge
    – Use experience and judgment of users who faced the same or similar problem
    – Provide superior accuracy and responsiveness to subsequent users
  Similar to Relevance Feedback in IR
    – Community based rather than single user judgment
Utilize User B trail
  Offer D4-D7 to User D after D3 copy
    – Avoids 2 fruitless questions, Q2 & Q4
    – Finds extra potentially relevant data point D7
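The offer step can be sketched as follows, assuming each trail is reduced to its ordered list of copied data items and that the system offers whatever the earlier user copied after the last item both users share. The function name and the contents of both lists are hypothetical; the actual trails from the figure are not reproduced here.

```python
from typing import List


def items_to_offer(prior_copies: List[str], current_copies: List[str]) -> List[str]:
    """Return items from an earlier trail to offer to the current user.

    Items are taken from the earlier trail after the last item that both
    users copied, skipping anything the current user already has.
    """
    already_copied = set(current_copies)
    last_shared = -1
    for index, item in enumerate(prior_copies):
        if item in already_copied:
            last_shared = index
    if last_shared < 0:
        return []  # no common item, so nothing is offered
    return [item for item in prior_copies[last_shared + 1:]
            if item not in already_copied]


# Hypothetical copy sequences: an earlier user and a current user.
earlier_user = ["D1", "D2", "D3", "D4", "D5", "D6", "D7"]
current_user = ["D1", "D2", "D3"]
print(items_to_offer(earlier_user, current_user))
```

In practice the offer would presumably be gated by an overlap threshold so that items are suggested only when the two episodes look compatible.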
Conclusion
  Users searching for information in a networked environment leave behind exploratory trails that can be captured
  Exploratory Episodes can be compared for overlap by data items copied
  Many users searching for the same or highly related information are likely to follow similar routes through the data
  When a user overlaps an EE above a threshold, they may benefit from tacit information sharing
Future Research
  Evaluate overlap utilizing semantic equivalence of data items copied
  Distill Exploratory Episodes into shareable knowledge elements
  Expand overlap metrics
    – Question similarity
    – Items Ignored, etc.
  Evaluate frequency of acceptance of offered material
    – Varying thresholds