Alexey Kolosoff, Michael Bogatyrev 1 Tula State University Faculty of Cybernetics Laboratory of Information Systems.

Slides:



Advertisements
Similar presentations
Lukas Blunschi Claudio Jossen Donald Kossmann Magdalini Mori Kurt Stockinger.
Advertisements

Chapter 5: Introduction to Information Retrieval
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
© Paradigm Publishing, Inc Word 2010 Level 2 Unit 1Formatting and Customizing Documents Chapter 2Proofing Documents.
Introduction Information Management systems are designed to retrieve information efficiently. Such systems typically provide an interface in which users.
Evaluating Search Engine
Search Engines and Information Retrieval
Personalizing Search via Automated Analysis of Interests and Activities Jaime Teevan Susan T.Dumains Eric Horvitz MIT,CSAILMicrosoft Researcher Microsoft.
Query Operations: Automatic Local Analysis. Introduction Difficulty of formulating user queries –Insufficient knowledge of the collection –Insufficient.
Information Retrieval in Practice
INFO 624 Week 3 Retrieval System Evaluation
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
1 Using Scopus for Literature Research. 2 Why Scopus?  A comprehensive abstract and citation database of peer- reviewed literature and quality web sources.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo.
Query Operations: Automatic Global Analysis. Motivation Methods of local analysis extract information from local set of documents retrieved to expand.
PubMed/How to Search, Display, Download & (module 4.1)
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification on Reviews Peter D. Turney Institute for Information Technology National.
Search Engines and Information Retrieval Chapter 1.
Web Search Created by Ejaj Ahamed. What is web?  The World Wide Web began in 1989 at the CERN Particle Physics Lab in Switzerland. The Web did not gain.
Multi-agent Research Tool (MART) A proposal for MSE project Madhukar Kumar.
©2008 Srikanth Kallurkar, Quantum Leap Innovations, Inc. All rights reserved. Apollo – Automated Content Management System Srikanth Kallurkar Quantum Leap.
H. Lundbeck A/S3-Oct-151 Assessing the effectiveness of your current search and retrieval function Anna G. Eslau, Information Specialist, H. Lundbeck A/S.
AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.
1 ScopusScopus Empowering Your Research. 2 As a Comprehensive Abstracts Database ~18,000 sources (90% peer-reviewed journals) from 5,000 publishers Comprehensive.
PubMed Overview From the HINARI Content page, we can access PubMed by clicking on Search inside HINARI full-text using PubMed. Note: If you do not properly.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
University of North Texas Libraries Building Search Systems for Digital Library Collections Mark E. Phillips Texas Conference on Digital Libraries May.
NCSU Libraries Kristin Antelman NCSU Libraries June 24, 2006.
© 2001 Business & Information Systems 2/e1 Chapter 8 Personal Productivity and Problem Solving.
Lead Black Slide Powered by DeSiaMore1. 2 Chapter 8 Personal Productivity and Problem Solving.
GoogleDictionary Paul Nepywoda Alla Rozovskaya. Goal Develop a tool for English that, given a word, will illustrate its usage.
GEORGIOS FAKAS Department of Computing and Mathematics, Manchester Metropolitan University Manchester, UK. Automated Generation of Object.
TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal.
Chapter 6: Information Retrieval and Web Search
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
LATENT SEMANTIC INDEXING Hande Zırtıloğlu Levent Altunyurt.
Course grading Project: 75% Broken into several incremental deliverables Paper appraisal/evaluation/project tool evaluation in earlier May: 25%
2007. Software Engineering Laboratory, School of Computer Science S E Web-Harvest Web-Harvest: Open Source Web Data Extraction tool 이재정 Software Engineering.
Curtis Spencer Ezra Burgoyne An Internet Forum Index.
Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session Summarized.
Search Engines Reyhaneh Salkhi Outline What is a search engine? How do search engines work? Which search engines are most useful and efficient? How can.
 Examine two basic sources for implicit relevance feedback on the segment level for search personalization. Eye tracking Display time.
IT-522: Web Databases And Information Retrieval By Dr. Syed Noman Hasany.
Using Surface Syntactic Parser & Deviation from Randomness Jean-Pierre Chevallet IPAL I2R Gilles Sérasset CLIPS IMAG.
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
Iterative Translation Disambiguation for Cross Language Information Retrieval Christof Monz and Bonnie J. Dorr Institute for Advanced Computer Studies.
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
Basic Implementation and Evaluations Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Web- and Multimedia-based Information Systems Lecture 2.
Vector Space Models.
Information Retrieval
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Final Year Project – I Smart Recruiter Group Members: Uzair Siddiqui [05363] Rehma Ather [05625] Meeran Khan [05364] Syed Maaz Alam [05284] Supervisor.
A Patent Document Retrieval System Addressing Both Semantic and Syntactic Properties Liang Chen*,Naoyuki Tokuda+, Hisahiro Adachi+ *University of Northern.
26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
The Development of a search engine & Comparison according to algorithms Sung-soo Kim The final report.
1 CS 8803 AIAD (Spring 2008) Project Group#22 Ajay Choudhari, Avik Sinharoy, Min Zhang, Mohit Jain Smart Seek.
Automated Question Answering Suggestion Using User Expert and Semantic Information การแนะนำการตอบคำถามอัตโนมัติ โดยใช้ข้อมูลผู้เชี่ยวชาญ และข้อมูลเชิง.
Information Retrieval in Practice
Databases- presentation and training
GC101 Introduction to computers and programs
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Information Retrieval (in Practice)
CS 430: Information Discovery
Agenda What is SEO ? How Do Search Engines Work? Measuring SEO success ? On Page SEO – Basic Practices? Technical SEO - Source Code. Off Page SEO – Social.
Introduction to Programming
CoXML: A Cooperative XML Query Answering System
Presentation transcript:

Alexey Kolosoff, Michael Bogatyrev 1 Tula State University Faculty of Cybernetics Laboratory of Information Systems

Table of Contents Problem statement Suggested algorithm Queries processing Documents ranking Experimental results 2

Problem statement Problem statement Suggested algorithm Queries processing Documents ranking Experimental results 3

Environment A question-answer portal is considered. Answers are produced by technical support persons. Several existing databases may contain the needed answer for a question. The task is to decrease the workload of the support team. 4

Workflow 5 CustomerWeb formSupport CustomerWeb formSearchSupport Search doesn’t help Before: After: Search helps

Data Processing 6 Q&A database Forums Web form Documents database (help, FAQ, etc.) Support team Search system Customers’ questions (natural language text) Links to documents Input Output

7

Using Message Subject for Search - Results 8

Why it’s not a Typical Web Search Queries consist of multiple sentences, instead of several keywords The number of documents is not very big (tens of thousands) Indexed documents consider a single subject or several related subjects 9

10 Problem statement Suggested algorithm Suggested algorithm Queries processing Documents ranking Experimental results

The Suggested Algorithm 11 Build CG for input text, filter out unrelated words Get concepts mentioned in the text (context matrix) Get documents with the same concepts (filter out irrelevant documents) Rank documents

Advantages of the Algorithm Words and sentences filtering allows excluding words and phrases which possibly do not affect the meaning of the text. The task of text search decreases to phrases search. Using concepts for articles filtering decreases the impact of polysemy on search results. Getting articles with specific concepts is expected to be faster than searching for articles with specific keywords in the entire corpus. 12

Problem statement Suggested algorithm Queries processing Queries processing Documents ranking Experimental results 13

Queries Processing Noise words filtering Phrases detection Word forms expansion (with lesser weight) Synonyms expansion (with lesser weight) 14

Phrases Detection – Punctuation Marks (During Indexing) Examples: issue-tracking tools => [N, N ]...the issue, but{stop word} tracking changes... => [N, N + 3] Object.Method() => Object[N], Method[N ] …some object. Method A shows… => object[N], Method[N + 15] 15

Phrases Detection - Semantics Despite possible errors, users tend to use correct word combinations for technical details description. A conceptual graph build from a question’s text allows filtering out unrelated words and word combinations which are not grammatically correct. 16

Morphological analysis word formation paradigms from Russian & English languages using dictionaries Semantic analysis semantic role labelling using templates How Conceptual Graphs are Built

Sample Query “Hi there! i have a script test with a bunch of checkpoints, but when it hits a checkpoint cannot be verified, the execution of the script stops and any tests after the failed checkpoint do not get executed. Thank's in advance. Randy” 18

A Conceptual Graph Fragment 19

Filtered out Sentences 20

Parsing results Phrases used as an input for full-text search. Each phrase has its own weight. As a result, the task of searching for a given text can be reduced to searching for a number of phrases. This task can be solved via the suggested algorithm. 21

Problem statement Suggested algorithm Queries processing Documents ranking Documents ranking Experimental results 22

Document Model Vector model is used to represent an indexed document or a query: The native methods of the control => [the (0.333), native(0.166), methods(0.166), of (0.166), control(0.166)] [0, 5] [1] [2] [3] [4] 23

The Sought-for Phrase is Present in an Indexed Document if…...(distance between each words pair) < M, where M is the artificial word position increment value for sentence breaks. Sample query: AJAX applications testing AJAX web applications are, indeed, difficult for testing. => total words distance: 7 No AJAX applications. Testing desktop applications is another task. => total words distance: 16 24

Phrases Relevance 25 Arithmetical mean for each phrase (p i ) detected in a query (q). Where w pi – the weight of the phrase in the query, R p – document’s relevance for p i calculated via the following formula: where – the number of words in p i, – total words distance in a document (d j ), calculated for each occurrence of p i.

Resulting Relevance 26 where R phrase – phrases relevance, W field – indexed field’s weight

Problem statement Suggested algorithm Queries processing Documents ranking Experimental results Experimental results 27

Performance Measurement Formula 28 where rel i – assessor-defined relevance [0..2], i – the result’s order number, p = 10 – the number of considered results

Experimental Results 29 The average quality of «top 10» search results (discounted cumulative gain), max=10,51 Number of queries

Conclusions It is necessary to perform phrase search when finding an answer in an automated way. Conceptual graphs allow detecting phrases in natural language queries. Storing conceptual graphs instead of document vectors as indexes and using the graphs directly for relevance calculation can be an interesting approach, which will be examined in a future work. 30

Thank you! Any questions? 31