PAIRS Forming a ranked list using mined, pairwise comparisons Reed A. Coke, David C. Anastasiu, Byron J. Gao.


PAIRS: Pairwise Automatic Inferential Ranking System
dmlab.cs.txstate.edu/pairs

The Problem
– Given a list of items and an optional attribute, how can an online system best generate a ranked list?
– What is the fastest way to get an accurate result?
– What is the most accurate way to get a fast result?

Previous approaches
– NLP techniques are likely the most accurate
– They can be very costly in time, especially given the nonstandard grammar of internet text
– PAIRS is an attempt to find a balance between speed and accuracy

Overall Architecture
1. Query Parsing (fast)
2. Comparison Location (slow)
3. Comparison Evaluation (fast)
4. Ranking (fast)

Query Parsing
– Separates the list into pairs, e.g. (A, B, C) -> (A, B), (A, C), (B, C)
– This leads to a rapid explosion of searches: n items yield n(n-1)/2 pairs
– Each pair is then expanded into 4 queries, e.g. “A vs. B”, “A, B”, etc.
– Finally, the queries are sent alternately through Yahoo and Google, using AbstractSearch2
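
The query-parsing step above could be sketched roughly as follows. This is a minimal illustration, not the PAIRS implementation: only the first two query templates appear on the slide, so the other two are placeholders, and the function names, the way the optional attribute is appended, and the engine labels are all assumptions. AbstractSearch2 itself is not modeled.

```python
from itertools import combinations

# The first two templates come from the slide; the last two are
# hypothetical stand-ins for the unspecified "etc.".
QUERY_TEMPLATES = [
    '"{a} vs. {b}"',
    '"{a}, {b}"',
    '"{a} or {b}"',           # assumed
    '"{a} compared to {b}"',  # assumed
]


def make_pairs(items):
    """Return every unordered pair, e.g. (A, B, C) -> (A, B), (A, C), (B, C)."""
    return list(combinations(items, 2))


def expand_queries(pair, attribute=None):
    """Expand one pair into one query string per template.

    Appending the optional attribute to each query is an assumption.
    """
    a, b = pair
    queries = [template.format(a=a, b=b) for template in QUERY_TEMPLATES]
    if attribute:
        queries = [f"{query} {attribute}" for query in queries]
    return queries


def assign_engines(queries, engines=("yahoo", "google")):
    """Alternate queries between the search engines in round-robin order."""
    return [(engines[i % len(engines)], query) for i, query in enumerate(queries)]


if __name__ == "__main__":
    for pair in make_pairs(["A", "B", "C"]):
        for engine, query in assign_engines(expand_queries(pair)):
            print(engine, query)
```

With four queries per pair, a list of 10 items already produces 45 pairs and 180 searches, which illustrates the "rapid explosion" the slide mentions.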

Comparison Location
– Text is retrieved from each unique URL in the search results.
– The text is then sent to a Java program that tags each word with its part of speech.
– Line by line, the program determines whether each sentence is comparative.
– Experimental results for Comparison Location:
  – PAIRS keyword list: 50% recall, 80% precision
  – Ganapathibhotla & Liu list: 97.7% recall, 32% precision
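
To compare the two keyword lists on a single number, one common option is the F1 score (the harmonic mean of precision and recall). The slides do not report F1; the short calculation below only combines the precision and recall figures stated above.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures taken from the slide above.
pairs_f1 = f1(precision=0.80, recall=0.50)   # PAIRS keyword list
gl_f1 = f1(precision=0.32, recall=0.977)     # Ganapathibhotla & Liu list

if __name__ == "__main__":
    print(f"PAIRS keyword list F1: {pairs_f1:.3f}")
    print(f"Ganapathibhotla & Liu list F1: {gl_f1:.3f}")
```

Under this measure the PAIRS list scores roughly 0.62 and the Ganapathibhotla & Liu list roughly 0.48. F1 weights precision and recall equally, which may not match the priorities of a given application.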

Location (continued)
A comparative sentence is one that meets the following criteria:
– Contains a comparative word
– Contains both nouns (stemmed) in the pair
Special cases:
– Pronouns and ellipsis: keep track of the “relevancy” of past nouns
– Phrases
Any comparison found is then evaluated immediately.
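
The two criteria above can be sketched as a simple filter. This is an illustration under several assumptions: the keyword set is a tiny made-up sample rather than the PAIRS keyword list, `crude_stem` is a stand-in for a real stemmer, and part-of-speech tags (which PAIRS obtains from a tagger) are not used. The special cases (pronouns, ellipsis, multi-word phrases) are not handled.

```python
import re

# Tiny illustrative sample; not the actual PAIRS keyword list.
COMPARATIVE_WORDS = {"better", "worse", "bigger", "smaller", "more", "less", "than"}


def crude_stem(word):
    """Rough stand-in for a stemmer: lowercase and drop one trailing 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s"):
        return w[:-1]
    return w


def is_comparative(sentence, noun_a, noun_b):
    """True if the sentence has a comparative word and both (stemmed) nouns."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    words = {t.lower() for t in tokens}
    stems = {crude_stem(t) for t in tokens}
    has_comparative_word = bool(words & COMPARATIVE_WORDS)
    has_both_nouns = crude_stem(noun_a) in stems and crude_stem(noun_b) in stems
    return has_comparative_word and has_both_nouns


if __name__ == "__main__":
    print(is_comparative("Jaguars are bigger than wolves.", "jaguar", "wolves"))
    print(is_comparative("Jaguars are big.", "jaguar", "wolves"))
```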

Special Case: Pronouns
– Jaguars are big. They are bigger than wolves.
– John loves computers. In fact, he loves them more than Sally.
– John loves computers. Sally does too. However, he loves them more.
– John likes Michael Jordan. He is a much more loyal fan than Sally.
– John likes Michael Jordan. He dunks more impressively than Sally.
– (on a discussion board) I respectfully disagree with you.

Special Case: Ellipsis
– Wolves are big. However, jaguars are bigger.
– Wolves are big. Jaguars are bigger.
– Wolves are annoying, but don't get me started on coyotes.
– Wolves are annoying, but turtles aren’t.
– Wolves are annoying and turtles aren’t.

Relevance Dictionary
– Keeps track of all nouns
– Each noun's score is affected by recency and frequency
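
One way to combine recency and frequency into a single score is an exponentially decaying counter: every mention adds to a noun's score, and all scores shrink at each sentence boundary. The slides do not give the actual scoring formula, so the class below, its method names, and the decay factor of 0.5 are assumptions made for illustration.

```python
from collections import defaultdict


class RelevanceDictionary:
    """Tracks nouns with a score that rewards frequent and recent mentions."""

    def __init__(self, decay=0.5):
        self.decay = decay                # assumed value, not from the slides
        self.scores = defaultdict(float)

    def mention(self, noun):
        """Record one mention of a noun (frequency component)."""
        self.scores[noun.lower()] += 1.0

    def next_sentence(self):
        """Age every score at a sentence boundary (recency component)."""
        for noun in self.scores:
            self.scores[noun] *= self.decay

    def most_relevant(self, exclude=()):
        """Return the highest-scoring noun not in `exclude`, or None."""
        candidates = {n: s for n, s in self.scores.items() if n not in exclude}
        return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    rd = RelevanceDictionary()
    rd.mention("jaguars")       # "Jaguars are big."
    rd.next_sentence()
    # "They are bigger than wolves." -> resolve "They" before adding "wolves"
    print(rd.most_relevant())
```

A resolver built on this would replace a pronoun, or fill in an elided noun, with the result of `most_relevant`, excluding nouns already present in the current sentence.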

Comparison Evaluation
– 86% of the time, people mention the noun that they prefer first.
  – e.g. “n1 is better than n2”, not “n2 is worse than n1”
  – More accurate methods have been found, but not quicker ones
– Ultimately, a list of + and – comparisons will be needed
  – This will have to be done by domain:
    Rocky has fought more than Drago. (+)
    My son has fought more than your son. (-)
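
The first-mentioned heuristic above is simple enough to sketch directly. The function name and the plain substring matching are assumptions for illustration; the domain-specific + and – polarity list mentioned on the slide is not modeled, so this sketch would misjudge sentences such as the "(-)" example.

```python
def preferred_first(sentence, noun_a, noun_b):
    """Guess (preferred, other) by assuming the first-mentioned noun wins.

    Returns None if either noun is missing from the sentence.
    Uses naive case-insensitive substring matching.
    """
    text = sentence.lower()
    pos_a = text.find(noun_a.lower())
    pos_b = text.find(noun_b.lower())
    if pos_a == -1 or pos_b == -1 or pos_a == pos_b:
        return None
    return (noun_a, noun_b) if pos_a < pos_b else (noun_b, noun_a)


if __name__ == "__main__":
    print(preferred_first("Rocky has fought more than Drago.", "Drago", "Rocky"))
```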

Creating the Ranking
– Create a graph with weighted edges
– Brute-force the score of the path from each node to every other node within its connected component
– This results in a ranked list for each component
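
The slide does not say how path scores are defined or combined, so the sketch below is only one possible reading. It assumes a directed edge (winner, loser) whose weight is the number of comparisons favoring the winner, scores each ordered pair by the heaviest simple path between them (found by exhaustive search), and ranks each node by the sum of its scores to the other nodes in its component. All names are hypothetical.

```python
from collections import defaultdict


def components(edges):
    """Connected components, treating edges as undirected."""
    neighbors = defaultdict(set)
    for winner, loser in edges:
        neighbors[winner].add(loser)
        neighbors[loser].add(winner)
    seen, result = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(neighbors[node] - comp)
        seen |= comp
        result.append(comp)
    return result


def best_path_score(edges_out, source, target, visited=frozenset()):
    """Largest total weight over simple directed paths, or None if unreachable."""
    visited = visited | {source}
    best = None
    for nxt, weight in edges_out.get(source, {}).items():
        if nxt in visited:
            continue
        if nxt == target:
            candidate = weight
        else:
            rest = best_path_score(edges_out, nxt, target, visited)
            if rest is None:
                continue
            candidate = weight + rest
        if best is None or candidate > best:
            best = candidate
    return best


def rank(edges):
    """Return one ranked list (best first) per connected component."""
    edges_out = defaultdict(dict)
    for (winner, loser), weight in edges.items():
        edges_out[winner][loser] = weight
    rankings = []
    for comp in components(edges):
        totals = {
            u: sum(best_path_score(edges_out, u, v) or 0 for v in comp if v != u)
            for u in comp
        }
        # Ties are broken alphabetically so the output is deterministic.
        rankings.append(sorted(comp, key=lambda n: (-totals[n], n)))
    return rankings


if __name__ == "__main__":
    print(rank({("A", "B"): 3, ("B", "C"): 2, ("X", "Y"): 1}))
```

Enumerating all simple paths takes exponential time in the worst case, so this only suits small components.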

Problems
– Still slow
– Query parsing needs experiments to determine just how many queries are needed per pair
– The system is untested as a whole; it must be tested on a closed set of documents to determine total precision and recall
– Comparison evaluation could be more graceful
– The graph traversal algorithm could be better

Applications
PAIRS has several interesting applications:
– College decisions
– Product comparison
– Any sort of popularity contest
– Taking a majority vote

Future Research
– Polishing each component of PAIRS
– Testing PAIRS on a closed system
– BridgeFinder

Conclusion
PAIRS was built from the ground up. The only pre-programmed component of PAIRS was the Stanford POS tagger.
Things I learned about research:
– How to formulate a research topic
– How to research previous work in a topic
– Experimentation
– How to write a technical report
– How to give a presentation

References
[1] X. Ding and B. Liu. The utility of linguistic rules in opinion mining. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '07, pages 811–812, New York, NY, USA. ACM.
[2] X. Ding, B. Liu, and P. S. Yu. A holistic lexicon-based approach to opinion mining. In Proceedings of the international conference on Web search and web data mining, WSDM '08, pages 231–240, New York, NY, USA. ACM.
[3] M. Ganapathibhotla and B. Liu. Mining opinions in comparative sentences. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, COLING '08, pages 241–248, Stroudsburg, PA, USA. Association for Computational Linguistics.
[4] A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using distant supervision. Technical report, Stanford University.
[5] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '04, pages 168–177, New York, NY, USA. ACM.
[6] N. Jindal and B. Liu. Identifying comparative sentences in text documents. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '06, pages 244–251, New York, NY, USA. ACM.
[7] N. Jindal and B. Liu. Mining comparative sentences and relations. In AAAI '06.
[8] B. Liu. Web Data Mining. Springer.
[9] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL '03, pages 173–180, Stroudsburg, PA, USA. Association for Computational Linguistics.
[10] K. Toutanova and C. D. Manning. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13, EMNLP '00, pages 63–70, Stroudsburg, PA, USA. Association for Computational Linguistics.

Questions?