Meeting Presentation, Sept. 12

Things to do since last meeting:
(1) Find out the number of drug names on the FDA website (done: the number is 6,244, which is small enough for us to crawl search results on Twitter).
(2) Read papers to find new ideas about query cost estimation:
**Predicting Query Performance
**What Makes a Query Difficult?, by David Carmel
**Learning to Estimate Query Difficulty, SIGIR 2005 best paper
**Publications of Junghoo "John" Cho

Paper Review

Predicting Query Performance
This is a great paper: it introduces a new concept, the clarity score, which measures the divergence between the query language model and the collection language model. It helps us view query difficulty from a new perspective: when the query terms are weak at distinguishing documents, the query tends to be difficult.

What Makes a Query Difficult?, by David Carmel
This is a good development of the previous paper. It generalizes the clarity score into a higher-level notion of a "distance model": distance applies not only between the query and the collection, but also between the query and the relevant documents, between the relevant documents and the collection, and so on. What is more, the paper adopts a more suitable divergence function, the Jensen-Shannon divergence (JSD).
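To make the clarity score concrete, here is a minimal sketch of how it could be computed. It estimates the query model as a uniform mixture of the top-ranked documents' language models, a simplification of the paper's relevance-weighted estimate; the names clarity_score, topk_docs, collection_tf, and collection_len are illustrative, not from the paper.

```python
import math
from collections import Counter

def clarity_score(topk_docs, collection_tf, collection_len):
    """Sketch of the clarity score: KL divergence between an estimated
    query language model and the collection language model (in bits)."""
    # Query model: uniform mixture of maximum-likelihood document models
    # over the top-ranked documents retrieved for the query.
    query_model = Counter()
    for doc in topk_docs:  # each doc is a list of tokens
        tf, dl = Counter(doc), len(doc)
        for w, c in tf.items():
            query_model[w] += (c / dl) / len(topk_docs)

    # Collection model, with a small floor for terms missing from the stats.
    def p_coll(w):
        return max(collection_tf.get(w, 0), 0.5) / collection_len

    # High divergence = the query's top documents use distinctive language,
    # i.e. the query is "clear"; low divergence suggests a difficult query.
    return sum(p * math.log2(p / p_coll(w)) for w, p in query_model.items())
```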

Paper Review

Learning to Estimate Query Difficulty
The paper offers a new view: sub-query coverage may also strongly affect query difficulty. To support this view, the authors train two machine-learning estimators, a histogram-based method and a modified decision tree, on features describing how each sub-query's results overlap with the full query's results. The results show that a difficult query is likely to be dominated by a single sub-query.
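As a rough illustration of the sub-query idea, the sketch below computes, for each single-term sub-query, the overlap between its top results and the full query's top results; counts of this kind are the sort of feature the histogram and decision-tree estimators consume. The search(terms, k) interface is an assumption for the sketch, not an API from the paper.

```python
def subquery_overlap_features(query_terms, search, k=10):
    """For each single-term sub-query, count how many of its top-k
    results also appear in the full query's top-k results.
    `search(terms, k)` is an assumed function returning document ids."""
    full_topk = set(search(query_terms, k))
    return [len(full_topk & set(search([t], k))) for t in query_terms]

# A difficult query tends to have one sub-query with high overlap
# dominating the rest, so the spread of these counts is informative.
```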

Some Ideas

A straightforward idea from Carmel's paper is query term deletion: delete terms so as to maximize the distance between the query and the collection. The idea is not hard to implement, but I wonder how much improvement we can get this way.
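A minimal greedy sketch of this idea, assuming a distance(terms) callable such as the clarity score above; the function name and the stopping rule are illustrative choices, not a method from the paper.

```python
def greedy_term_deletion(query_terms, distance, min_len=2):
    """Greedily drop the term whose removal most increases the
    query-collection distance; stop when no deletion helps or the
    query would get too short."""
    terms = list(query_terms)
    best = distance(terms)
    while len(terms) > min_len:
        score, i = max((distance(terms[:j] + terms[j + 1:]), j)
                       for j in range(len(terms)))
        if score <= best:
            break
        best, terms = score, terms[:i] + terms[i + 1:]
    return terms
```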

Some Ideas

A more advanced idea is to connect term deletion with retrieval cost. The traditional cost of evaluating a query is roughly:

    sum over the n query terms of c * DF(i)

where DF(i) is the document frequency of term i and c is the per-posting complexity of the scoring function. This cost is therefore easy to precompute. It is also interesting to consider deleting low-IDF, low-clarity terms: this would greatly reduce the computing cost while only slightly decreasing, or possibly even increasing, retrieval performance.
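A sketch of how the precomputed cost and the pruning idea might look; the per-posting cost c and the IDF threshold are assumed parameters, not values from any of the papers.

```python
import math

def query_cost(query_terms, df, c=1.0):
    # The scoring function runs once per posting, so the evaluation
    # cost is roughly c * sum of DF(i) over the query terms.
    return c * sum(df.get(t, 0) for t in query_terms)

def prune_low_idf(query_terms, df, n_docs, min_idf=1.0):
    # Drop high-DF (low-IDF) terms: they dominate the cost above while
    # contributing little to distinguishing documents.
    return [t for t in query_terms
            if math.log(n_docs / max(df.get(t, 1), 1)) >= min_idf]
```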

Some Ideas

It is also interesting to discuss term proximity and query expansion here. In my opinion, term proximity and external query term expansion may both help improve query clarity.

The additional cost of term proximity is roughly:

    n*(n-1)/2 * (DF1 + DF2 + avgTF1 * avgTF2 * comDoc)

i.e., for each of the n*(n-1)/2 term pairs, walking both postings lists plus comparing positions in the documents the pair has in common.

The additional cost of external query term expansion is roughly:

    sum over the n query terms of c * DF(i)        (the initial retrieval)
    + k * avgDocLength                             (scanning the top documents)
    + sum over the N expansion terms of c * DF(i)  (the second retrieval)

where n is the number of query terms, k is the number of top documents used for expansion, and N is the number of expansion terms. It will be interesting to discuss how much clarity term proximity and external query term expansion can add.
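Both back-of-the-envelope costs could be computed ahead of time from index statistics. A sketch follows, where co_df (co-occurrence document counts) and avg_tf (average term frequencies) are assumed precomputed statistics.

```python
from itertools import combinations

def proximity_cost(terms, df, avg_tf, co_df):
    # For each term pair: walk both postings lists (DF1 + DF2), then
    # compare positions in their common documents
    # (avgTF1 * avgTF2 * comDoc).
    return sum(df.get(a, 0) + df.get(b, 0)
               + avg_tf.get(a, 0) * avg_tf.get(b, 0) * co_df.get((a, b), 0)
               for a, b in combinations(terms, 2))

def expansion_cost(terms, expansion_terms, df, k, avg_doc_len, c=1.0):
    # Initial retrieval + scanning the k top documents + a second
    # retrieval with the N expansion terms.
    return (c * sum(df.get(t, 0) for t in terms)
            + k * avg_doc_len
            + c * sum(df.get(t, 0) for t in expansion_terms))
```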