Web Intelligence and Intelligent Agent Technology 2008.

Slides:



Advertisements
Similar presentations
Chapter 5: Introduction to Information Retrieval
Advertisements

Group 3 Chad Mills Esad Suskic Wee Teck Tan. Outline  System and Data  Document Retrieval  Passage Retrieval  Results  Conclusion.
Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen SIMS, UC Berkeley Susan Dumais Adaptive Systems & Interactions Microsoft.
Supervised Learning Techniques over Twitter Data Kleisarchaki Sofia.
What makes an image memorable?
1 Asking What No One Has Asked Before : Using Phrase Similarities To Generate Synthetic Web Search Queries CIKM’11 Advisor : Jia Ling, Koh Speaker : SHENG.
Lecture 11 Search, Corpora Characteristics, & Lucene Introduction.
WSCD INTRODUCTION  Query suggestion has often been described as the process of making a user query resemble more closely the documents it is expected.
Exercising these ideas  You have a description of each item in a small collection. (30 web sites)  Assume we are looking for information about boxers,
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Information Retrieval in Practice
Using Web Queries for Learner Error Detection Michael Gamon, Microsoft Research Claudia Leacock, Butler-Hill Group.
The College of Saint Rose CIS 460 – Search and Information Retrieval David Goldschmidt, Ph.D. from Search Engines: Information Retrieval in Practice, 1st.
1 Statistical correlation analysis in image retrieval Reporter : Erica Li 2004/9/30.
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
Predicting the Semantic Orientation of Adjective Vasileios Hatzivassiloglou and Kathleen R. McKeown Presented By Yash Satsangi.
FACT: A Learning Based Web Query Processing System Hongjun Lu, Yanlei Diao Hong Kong U. of Science & Technology Songting Chen, Zengping Tian Fudan University.
Introduction to Machine Learning Approach Lecture 5.
Overview of Search Engines
Mining and Summarizing Customer Reviews
Search Query Log Analysis Kristina Lerman. What can we learn from web search queries? Characteristics – Length has steadily grown over the years 1990’s:
 Clustering of Web Documents Jinfeng Chen. Zhong Su, Qiang Yang, HongHiang Zhang, Xiaowei Xu and Yuhen Hu, Correlation- based Document Clustering using.
1 Context-Aware Search Personalization with Concept Preference CIKM’11 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Automatically Identifying Localizable Queries Center for E-Business Technology Seoul National University Seoul, Korea Nam, Kwang-hyun Intelligent Database.
Short Text Understanding Through Lexical-Semantic Analysis
Distributional Part-of-Speech Tagging Hinrich Schütze CSLI, Ventura Hall Stanford, CA , USA NLP Applications.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
A search-based Chinese Word Segmentation Method ——WWW 2007 Xin-Jing Wang: IBM China Wen Liu: Huazhong Univ. China Yong Qin: IBM China.
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
Fine-Grained Location Extraction from Tweets with Temporal Awareness Date:2015/03/19 Author:Chenliang Li, Aixin Sun Source:SIGIR '14 Advisor:Jia-ling Koh.
Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop Nizar Habash and Owen Rambow Center for Computational Learning.
Recognizing Names in Biomedical Texts: a Machine Learning Approach GuoDong Zhou 1,*, Jie Zhang 1,2, Jian Su 1, Dan Shen 1,2 and ChewLim Tan 2 1 Institute.
Improving Classification Accuracy Using Automatically Extracted Training Data Ariel Fuxman A. Kannan, A. Goldberg, R. Agrawal, P. Tsaparas, J. Shafer Search.
Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.
PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL Seo Seok Jun.
Understanding User’s Query Intent with Wikipedia G 여 승 후.
Prototype-Driven Learning for Sequence Models Aria Haghighi and Dan Klein University of California Berkeley Slides prepared by Andrew Carlson for the Semi-
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
CSKGOI'08 Commonsense Knowledge and Goal Oriented Interfaces.
Basic Implementation and Evaluations Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Authors: Marius Pasca and Benjamin Van Durme Presented by Bonan Min Weakly-Supervised Acquisition of Open- Domain Classes and Class Attributes from Web.
Social Tag Prediction Paul Heymann, Daniel Ramage, and Hector Garcia- Molina Stanford University SIGIR 2008.
Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab)‏ Sabine Buchholz (Toshiba CRL)‏
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Psychiatric document retrieval using a discourse-aware model Presenter : Wu, Jia-Hao Authors : Liang-Chih.
Supertagging CMSC Natural Language Processing January 31, 2006.
Machine Learning Tutorial-2. Recall, Precision, F-measure, Accuracy Ch. 5.
Exploiting Reducibility in Unsupervised Dependency Parsing David Mareček and Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University.
Post-Ranking query suggestion by diversifying search Chao Wang.
Virtual Examples for Text Classification with Support Vector Machines Manabu Sassano Proceedings of the 2003 Conference on Emprical Methods in Natural.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Word classes and part of speech tagging. Slide 1 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches.
Date: 2013/9/25 Author: Mikhail Ageev, Dmitry Lagun, Eugene Agichtein Source: SIGIR’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Improving Search Result.
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Semi-Supervised Recognition of Sarcastic Sentences in Twitter and Amazon -Smit Shilu.
Twitter as a Corpus for Sentiment Analysis and Opinion Mining
Opinion spam and Analysis 소프트웨어공학 연구실 G 최효린 1 / 35.
An Effective Statistical Approach to Blog Post Opinion Retrieval Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (CIKM 2008)
Language Identification and Part-of-Speech Tagging
Erasmus University Rotterdam
Word AdHoc Network: Using Google Core Distance to extract the most relevant information Presenter : Wei-Hao Huang   Authors : Ping-I Chen, Shi-Jen.
Detecting Online Commercial Intention (OCI)
Modern Information Retrieval
Classification of Tests Chapter # 2
Web Mining Research: A Survey
Information Retrieval and Web Design
WSExpress: A QoS-Aware Search Engine for Web Services
Presentation transcript:

Web Intelligence and Intelligent Agent Technology 2008

Intention  An agent aiming to “plan a trip to Vienna” would need to have some means to understand that “plan a trip” is likely to involve a set of other goals or services. “contact a travel agency” “book a hotel”.

Problem  1) the goal acquisition problem which refers to the costs associated with knowledge acquisition.  2) the goal coverage problem which refers to the difficulty of capturing the tremendous variety and range in the set of human goals.

Human Subject Study  we first conducted a human subject study aiming to 1) define the notion of explicit user goals more rigorously and 2) to learn about its principal agreeability.

Explicit user goal  1) contains at least one verb  2) describes a plausible state of affairs that the user may want to achieve or avoid in  3) a recognizable way

Questionnaire Design  To explore the utility of this definition, we have conducted a questionnaire in which four human subjects.  Manually label 3000 queries randomly obtained from the AOL search query log.

Question Schema  Given a query X  Do you think that Y (with Y being the first verb in X, plus the remainder of X) is a plausible goal of a searcher who is performing the query X?

Question Schema

 “Yes” resulted in classifying the query as a “query containing an explicit goal”, each answer  “No” resulted in classifying the query as a “query not containing an explicit goal”.

Agreeability of Constructs

 243 queries out of 3000 have been labeled as containing an explicit goal by all 4 subjects.  134 queries as containing an explicit goal by 3 out of 4 subjects.  The majority of queries has been labeled as not containing an explicit goal unanimously by all subjects.

Agreeability of Constructs The κ values in our human subject study range from 0.65 to 0.76

Training Set  The negative examples were defined by the two bars on the left hand side of Figure 1 (2525 total).  Positive examples were defined by the two bars on the right hand side (377).

POS Tagging  We used the Penn Treebank tag set containing 36 word classes which provides a simple yet adequately rich set of tag classes for our purpose.

Part-of-Speech Trigrams  Each query can be translated from a sequence of tokens into a sequence of POS tags.  The sequence boundaries were expanded by introducing a single marker ($) at the beginning and at the end allowing for length two POS features.  The query “buying/VBG a/DT car/NN” would yield the following trigrams:  $ VBG DT; VBG DT NN; DT NN $

Stemmed unigrams  Queries can be represented as binary word vectors or ‘Set of Words’ (SoW).  The Porter stemming algorithm was used for word conflation and stopwords were removed.

Confusion matrix Our approach achieves a precision of 0.77, a recall of 0.63 and an F1 score of 0.69.

20 most frequent goals

Rank-frequency plot of goals Goals are plotted according to their rank and the set of different users who share them.

10 most frequent goals containing

Conclusions  (a) We present an automatic method for the acquisition of user goals from search query logs with useful precision/recall scores.  (b) We provide insights into the nature and some characteristics of these goals.  (c) We show that the goals acquired from query logs exhibit traits of a long tail distribution.