Going further together Information Search & Retrieval: Problems, solutions, trends… Tony Rose, PhD MBCS CEng Vice-Chair, BCS IRSG.

1 going further together Information Search & Retrieval: Problems, solutions, trends… Tony Rose, PhD MBCS CEng Vice-Chair, BCS IRSG

2 Contents
The BCS Information Retrieval SG
What is IR anyway?
How search engines work
Why search is hard
Where’s it all going?

3 Information Retrieval SG
Growing rapidly
–750+ members
Annual conference (ECIR)
–FDIA
Various 1-day events
–Search Solutions
Informer
Discounts for various events, e.g. SIGIR
… is free to join!

4 Information Retrieval SG
Traditional focus on search (text retrieval)
–Knowledge management, Multimedia retrieval, User experience, Information visualisation, extraction, summarisation, etc.
Latest issue of Informer:
–“Searching for the Music You Like”
–“Exploring Maps through Geo-referenced Images and RDF Shared Metadata”
–“Using Semantic Relations to improve Question Answering”
–“Modeling & Annotation of Dance Media Semantics”

5 What is IR?
“Science of searching for:
–information in documents
–documents themselves
–metadata which describe documents
–within databases
…whether relational stand-alone databases or hypertextually-networked databases such as the World Wide Web”

6 The Need for IR
In a word … Infoglut
800 MB of recorded information is produced per person per year [Computing magazine]
Up to 80% of corporate information is unstructured
–Documents, emails, images, voicemail, etc.
So … can’t we just use Google?

7 How do Search Engines Work?
On the surface:
1. Understand what the user wants
2. Find documents about that topic
In reality:
1. Count words
2. Apply a simple equation
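The "count words, apply a simple equation" steps can be sketched roughly as follows. This is only an illustration: the slide does not say which equation is meant, so TF-IDF weighting is assumed here, and the function names, tokenizer, and sample documents are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()

def tf_idf_scores(query, docs):
    """Score each document by summing tf * idf over the query terms.

    TF-IDF is an assumed choice of "simple equation", not the slide's.
    """
    doc_tokens = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)  # step 1: count words
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # step 2: apply a simple equation
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores

# Made-up sample collection.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock markets fell sharply",
]
scores = tf_idf_scores("cat mat", docs)
# Document indices ordered from best to worst match.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Nothing here "understands" the query: the first document ranks highest only because it contains both query terms, and "mat" is rarer in the collection than "cat".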

8 How do Search Engines Work?
1. Measure the conceptual distance between your query and each document in the DB
2. Return the best matches
[Source: Maristella Agosti, University of Padova]

9 The Central Problem in IR
[Diagram labels: Information Seeker, Author, Concepts, Query Terms, Document Terms]
Do these represent the same concepts?
[Source: Jimmy Lin, University of Maryland]

10 The Central Problem in IR
How do you represent the concepts?
–Documents and queries = “bag of words”: unordered set of terms + numeric weights
How do you calculate similarity?
–Set theory (e.g. Boolean)
–Algebraic (e.g. vector space)
–Probabilistic
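To make the first two similarity families concrete, here is a minimal sketch that treats the query and each document as a bag of words, then compares them with a Boolean AND and with cosine similarity, one common vector-space measure. The function names and sample texts are illustrative assumptions, and raw term counts stand in for the "numeric weights".

```python
import math
from collections import Counter

def bag_of_words(text):
    """Unordered terms with counts as weights (word order is discarded)."""
    return Counter(text.lower().split())

def boolean_and(query, doc):
    """Set-theoretic match: every query term must occur in the document."""
    return set(bag_of_words(query)) <= set(bag_of_words(doc))

def cosine(query, doc):
    """Vector-space match: cosine of the angle between the term vectors."""
    q, d = bag_of_words(query), bag_of_words(doc)
    dot = sum(q[t] * d[t] for t in q)  # Counter returns 0 for absent terms
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

# Made-up sample data.
query = "search engines"
docs = ["search engines count words", "engines power cars"]

exact = [boolean_and(query, d) for d in docs]  # all-or-nothing
graded = [cosine(query, d) for d in docs]      # partial credit
```

The Boolean model rejects the second document outright, while the vector-space model still gives it a non-zero score for sharing one term.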

11 IR models [Source: Wikipedia]

12 How do we Evaluate Search?
Assume that results are either relevant or non-relevant
Precision:
–Proportion of retrieved documents that are relevant
Recall:
–Proportion of known-relevant documents that were actually retrieved
[Diagram labels: relevant, retrieved]
But what about: indexing / retrieval speed, query language, user experience, etc.?
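Both measures reduce to set arithmetic on the retrieved and relevant document sets. A small sketch, with hypothetical document IDs and a function name chosen for illustration:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query under binary relevance."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: 4 documents returned, 5 known to be relevant, 2 in common.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7", "d8", "d9"],
)
# p == 0.5 (2 of 4 retrieved are relevant)
# r == 0.4 (2 of 5 relevant were retrieved)
```

Returning every document in the collection would push recall to 1.0 while precision collapses, which is why the two are reported together.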

13 Why Search is Hard
Document representation
–Keywords are not enough: Blind Venetian = Venetian Blind
–Terms are not independent: structural & discourse dependencies, co-references, etc.
Imperfect “stop lists”
–the, and, of…
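The "Blind Venetian = Venetian Blind" problem falls straight out of an unordered keyword index, as this small sketch shows. The stop list contains only the three words from the slide, and the function name and example phrases are assumptions for illustration.

```python
from collections import Counter

# Only the three stop words named on the slide; real lists are much longer.
STOP_WORDS = {"the", "and", "of"}

def index_terms(text):
    """Bag-of-words index entry: lowercase, drop stop words, ignore order."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)

# Word order is lost, so these two phrases index identically.
same = index_terms("blind Venetian") == index_terms("Venetian blind")

# Stop-word removal also discards the structure that linked the terms.
title = index_terms("King of the Hill")
```

After indexing, "King of the Hill" is indistinguishable from any other text that happens to contain "king" and "hill" once each.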

14 Why Search is Hard
Morphological relationships
–Computer, computing, compute, computed…
Index documents using word stems
–False positives:
–organization, organ → organ
–police, policy → polic
–arm, army → arm
–False negatives:
–cylinder, cylindrical
–create, creation
–Europe, European
–Prefixes are particularly difficult
–Un*, dis*
–Delegate = de-leg-ate
–Ratify = rat-ify
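A toy suffix stripper is enough to reproduce both kinds of error. The sketch below is not the Porter stemmer or any other published algorithm: the suffix list, the minimum stem length, and the function name are assumptions picked to keep the example short.

```python
# Hypothetical suffix list, longest first; a real stemmer has many more rules.
SUFFIXES = ("ization", "ation", "ical", "ing", "ion", "ed", "er", "e", "y", "s")
MIN_STEM = 3  # never strip a word down to fewer than 3 letters

def crude_stem(word):
    """Strip the first matching suffix, if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word

# The intended behaviour: related forms collapse to one stem.
family = {crude_stem(w) for w in ["computer", "computing", "compute", "computed"]}

# False positives: unrelated words collapse to the same stem.
false_positives = [
    (crude_stem("organization"), crude_stem("organ")),
    (crude_stem("police"), crude_stem("policy")),
    (crude_stem("arm"), crude_stem("army")),
]

# False negatives: related words end up with different stems.
false_negatives = [
    (crude_stem("cylinder"), crude_stem("cylindrical")),
    (crude_stem("create"), crude_stem("creation")),
    (crude_stem("Europe"), crude_stem("European")),
]
```

Each pair in `false_positives` holds two identical stems and each pair in `false_negatives` holds two different ones. Adding rules to repair one list tends to introduce new errors in the other.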

15 Why Search is Hard
Named entity recognition
–Companies in New York
–New companies in York
NEs are highly discriminatory
–People
–Places
–Organisations
Many vertical applications
–e.g. bioscience

16 Why Search is Hard
Semantic relationships
–Car = automobile
–Buy = purchase
–Sick = ill
Synonym rings
–Car, automobile, truck, bus, taxi...
–Appropriate level of abstraction depends on user & task
Development of subject-specific taxonomies
–“concept matching”
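One simple way to use synonym rings is query expansion: each query term is replaced by its whole ring, and a document matches if it contains at least one member of every ring. The sketch below assumes that approach. The rings reuse the pairs from the slide, while the function names and sample documents are hypothetical.

```python
# Synonym rings taken from the pairs on the slide.
SYNONYM_RINGS = [
    {"car", "automobile"},
    {"buy", "purchase"},
    {"sick", "ill"},
]

def expand_query(query):
    """Map each query term to its synonym ring (or to itself if it has none)."""
    expanded = []
    for term in query.lower().split():
        ring = next((r for r in SYNONYM_RINGS if term in r), {term})
        expanded.append(ring)
    return expanded

def matches(query, doc):
    """True if the document contains a member of every expanded ring."""
    doc_terms = set(doc.lower().split())
    return all(ring & doc_terms for ring in expand_query(query))

found = matches("buy car", "where to purchase an automobile")
missed = matches("buy car", "sick of my car")
```

Widening the first ring to include truck, bus, and taxi would raise recall for a user interested in any vehicle and lower precision for a user who wants cars only, which is the abstraction-level trade-off noted on the slide.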

17 Why Search is Hard
Word sense disambiguation
–“Bank”: Financial institution? Part of a river? An aerial manoeuvre?
Active research area
–Categorisation & clustering of results

18 Google’s Insight
Exploit the link structure inherent in the web
–Calculate a measure of each document’s value, independent of any query
–“PageRank”
Overall relevance based on 100+ parameters
–Constant battle with SEOs
Enterprise search is a different proposition…
–As is desktop search
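The query-independent link score can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration, not a description of Google's production system: the damping factor of 0.85, the iteration count, the handling of pages without outlinks, and the four-page link graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank equally among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

No query appears anywhere in the computation: the score depends only on who links to whom, so it can be calculated ahead of time and combined with query-dependent signals at search time.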

19 Where’s it all going?
Vertical search
–Jobs, travel, health, people, etc.
Rich media search
–Audio, video, TV, images
Specialised content search
–blogs, news, classifieds
Social search
Personalisation

20 Where’s it all going?
Mobile search
Answer engines
–Active research community in Question Answering
Multi / cross-lingual search
Search agents
Human UI

21 Further Information
www.irsg.bcs.org
Informer
ECIR (March 2008, Glasgow)
Search Solutions 2008 (Sept 2008, London)

