Information Retrieval Liam Quin, Barefoot Computing, Toronto.

Agenda
Overview of Information Retrieval
What people want, and how to give it to them
Things people don't know they want, and how to do them

Chapter One: The Problem gooseberry

Gooseberry Picking Hurts
Gooseberries have thorns.
Gooseberry pickers in Botswana might not wear shirts (or shoes).
When you pick one gooseberry, others fall to the ground.
The harvest would be improved if we could retrieve the fallen fruit safely.
There are texts on this on the Internet.

Searching for an Answer
Search for information on texts about gooseberry retrieval on the web.

The Result
Pages on text retrieval, on information retrieval...
...and, at the top of the list...
Cycling in Cape Gooseberry, Labrador

Lessons
Indexes on words alone aren't enough
Word order can be important
Relevance ranking is often bogus
Sometimes you have to wear a shirt and shoes.
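
The first lesson can be reproduced with a minimal sketch of a words-only index. The page texts, page names and the "count the matching query words" score are all assumptions made up for illustration; they are not the search engine used in the talk.

```python
from collections import defaultdict

# Hypothetical page texts, invented for this sketch.
PAGES = {
    "text-retrieval": "text retrieval systems index words",
    "info-retrieval": "information retrieval ranks documents",
    "cape-gooseberry": "cycling in cape gooseberry labrador with information on bike retrieval",
}


def build_index(pages):
    """Map each word to the set of pages containing it (word order is discarded)."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Rank pages by how many distinct query words they contain."""
    scores = defaultdict(int)
    for word in set(query.lower().split()):
        for page_id in index.get(word, ()):
            scores[page_id] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda page_id: (-scores[page_id], page_id))


INDEX = build_index(PAGES)
RESULTS = search(INDEX, "information gooseberry retrieval")
```

With these made-up pages the cycling page matches every query word and so ranks first, even though it has nothing to do with retrieving gooseberries.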

How can we Improve?
Better textual analysis
Word order
Context
Metadata
External categorisation (RDF, Topic Maps)
Grow thornless gooseberries

Better Textual Analysis
Part-of-speech information during indexing
Stemming (boy/boys, foot/feet, run/running, me/mine)
Record more in the index (caps, separation)
Co-location analysis ("mine" next to "gold")
Ask the user what she means in the query ("mine" as in "of me", or as in "quarry"?)
Thesaurus expansion of queries
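
Stemming and thesaurus expansion can be sketched as below. This is a toy, not a real stemming algorithm: the suffix rules, the IRREGULAR table and the THESAURUS entries are hypothetical and cover only the slide's examples plus a gooseberry query.

```python
# Hypothetical lookup tables, invented for this sketch.
IRREGULAR = {"feet": "foot", "mine": "me"}
THESAURUS = {"pick": ["harvest", "gather"], "gooseberry": ["berry", "fruit"]}


def stem(word):
    """Reduce a word to a crude root form using a few hand-written rules."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        root = word[:-3]
        # Undo consonant doubling: "runn" -> "run".
        if len(root) > 2 and root[-1] == root[-2]:
            root = root[:-1]
        return root
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def expand_query(query):
    """Stem each query word, then add any thesaurus entries for the stem."""
    terms = []
    for word in query.split():
        root = stem(word)
        for term in [root, *THESAURUS.get(root, [])]:
            if term not in terms:
                terms.append(term)
    return terms


EXPANDED = expand_query("picking gooseberries")
```

The same stem function would be applied to document words at index time, so that a query for "boys" finds documents containing "boy".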

Word Order
Give added weight to word order:
  – information retrieval vs. retrieval of information
  – Times Square vs. square times
Include all words (What If Inc., The Times)
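
One common way to make word order available to the search is a positional index, which records where each word occurs as well as which document it is in. The sketch below assumes whitespace tokenisation and invented sample documents. Every word is indexed, including "the" and "of", so that phrases such as "The Times" remain findable.

```python
from collections import defaultdict

# Hypothetical documents, invented for this sketch.
DOCS = {
    "a": "information retrieval for beginners",
    "b": "retrieval of information from databases",
    "c": "the times reported from times square",
    "d": "four square times four",
}


def build_positional_index(docs):
    """Map word -> document -> list of positions where the word occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index


def phrase_match(index, phrase):
    """Return the documents that contain the words of phrase consecutively, in order."""
    words = phrase.lower().split()
    if not words or any(word not in index for word in words):
        return set()
    hits = set()
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            # Word i of the phrase must sit exactly i places after the first word.
            if all(start + i in index[word].get(doc_id, ())
                   for i, word in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


INDEX = build_positional_index(DOCS)
```

A ranking function could then add a bonus to documents returned by phrase_match on top of a plain word-match score; how large that bonus should be is a tuning decision the slides do not specify.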

Context
Co-location of words helps disambiguate
The XML containing element
Feedback from nearby documents (e.g. on the same website, or in the same chapter or publication)
Domain-specific information at index time
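
As a rough illustration of co-location helping to disambiguate, the sketch below guesses the sense of "mine" from the words around it. The sense names, cue words and window size are hypothetical; a real system would derive such cues from a corpus rather than a hand-written table.

```python
# Hypothetical cue words for two senses of "mine".
SENSE_CUES = {
    "quarry": {"gold", "coal", "shaft", "ore"},
    "possessive": {"is", "yours", "not"},
}


def disambiguate(tokens, position, window=2):
    """Guess the sense of tokens[position] from words within `window` places of it."""
    before = tokens[max(0, position - window):position]
    after = tokens[position + 1:position + 1 + window]
    neighbours = set(before + after)
    scores = {sense: len(cues & neighbours) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # No cue word nearby means no evidence either way.
    return best if scores[best] > 0 else None


GOLD = disambiguate("the old gold mine closed".split(), 3)
OWNED = disambiguate("that book is mine".split(), 3)
```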

Metadata
Add information to documents
Dublin Core (e.g. Warwick Framework)
The HTML meta element
The HTML rel/rev attributes in links
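
An indexer can pick up this metadata while reading a page. The sketch below uses Python's standard html.parser module. The sample page, its "DC."-prefixed meta names and the glossary.html link target are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page head, invented for this sketch.
SAMPLE = """<html><head>
<meta name="DC.title" content="Information Retrieval">
<meta name="DC.creator" content="Liam Quin">
<link rel="glossary" href="glossary.html">
</head><body><p>Gooseberries have thorns.</p></body></html>"""


class MetaCollector(HTMLParser):
    """Collect named meta elements and any links carrying rel or rev attributes."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and "name" in attributes:
            self.meta[attributes["name"]] = attributes.get("content", "")
        elif tag in ("a", "link") and ("rel" in attributes or "rev" in attributes):
            self.links.append({
                "rel": attributes.get("rel"),
                "rev": attributes.get("rev"),
                "href": attributes.get("href"),
            })


collector = MetaCollector()
collector.feed(SAMPLE)
```

The collected values could then be stored in separate index fields, so that a query can be restricted to, say, the creator.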

External Categorisation
Use XML schemas to add context information
Document or site-wide information
Resource Description Framework
Topic Maps (ISO 13250)
Categorise the result set [see picture]
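
Categorising a result set from external descriptions might look like the sketch below. It stores statements as plain (subject, predicate, object) tuples in the spirit of RDF; it does not use an RDF library, and the page identifiers, the "dc:subject" predicate label and the category names are hypothetical.

```python
from collections import defaultdict

# Hypothetical statements about pages, kept outside the pages themselves.
TRIPLES = [
    ("page:cycling", "dc:subject", "Travel"),
    ("page:harvest", "dc:subject", "Agriculture"),
    ("page:ir-intro", "dc:subject", "Computing"),
    ("page:text-ir", "dc:subject", "Computing"),
]


def categorise(results, triples, predicate="dc:subject"):
    """Group result pages by the categories asserted about them."""
    categories_of = defaultdict(list)
    for subject, pred, obj in triples:
        if pred == predicate:
            categories_of[subject].append(obj)
    groups = defaultdict(list)
    for page in results:
        for category in categories_of.get(page, ["Uncategorised"]):
            groups[category].append(page)
    return dict(groups)


GROUPS = categorise(
    ["page:cycling", "page:ir-intro", "page:text-ir", "page:unknown"], TRIPLES
)
```

Presented this way, a searcher interested in fruit picking can skip the "Travel" and "Computing" groups at a glance.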

Grow thornless gooseberries
Sometimes it's easier to change the problem than to solve it as stated.
Sometimes people don't describe the problem that they need solved.
Sometimes it's easier to solve a more general problem (thornless fruit? Or padded shirts?)

What Most People Want
"Find this string or phrase in this element."
That's all most people ask for.
It's all they want.
But it's hardly ever all they need.
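
The basic request, a string or phrase inside a named element, can be sketched with Python's standard xml.etree.ElementTree module. The sample document and its element names are invented for illustration, and the match is a simple case-insensitive substring test.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this sketch.
SAMPLE = """<book>
  <title>Gooseberry Retrieval</title>
  <para>Information retrieval is hard.</para>
  <para>Gooseberries have <emph>thorns</emph>.</para>
</book>"""


def find_in_element(root, element_name, phrase):
    """Return the text of every element_name element that contains phrase."""
    needle = phrase.lower()
    matches = []
    for element in root.iter(element_name):
        # itertext() includes text inside child elements such as <emph>.
        text = "".join(element.itertext())
        if needle in text.lower():
            matches.append(text)
    return matches


ROOT = ET.fromstring(SAMPLE)
IN_PARAS = find_in_element(ROOT, "para", "retrieval")
IN_TITLES = find_in_element(ROOT, "title", "retrieval")
```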

The real needs
Needs of other staff
Executives who understand the problem
Indirect needs
  – internal use by software
  – other departments
  – private uses by sneaky employees
  – enabling technologies change perspectives

I didn't know I could...
Quality control
  – check for known errors
  – find unusual words or phrases
  – phrases not marked up
Analysis
  – look for unusual markup
  – co-location (phrase summary)
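
The first two quality-control checks can be sketched from word counts alone. The KNOWN_ERRORS table, the sample text and the "occurs only once" threshold below are assumptions for illustration. On a real corpus the threshold would need tuning, since many legitimate words are rare.

```python
import re
from collections import Counter

# Hypothetical table of known misspellings and their corrections.
KNOWN_ERRORS = {"teh": "the", "recieve": "receive"}


def quality_report(text, max_count=1):
    """List known misspellings and words that occur at most max_count times."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    known = sorted(word for word in counts if word in KNOWN_ERRORS)
    unusual = sorted(
        word for word, count in counts.items()
        if count <= max_count and word not in KNOWN_ERRORS
    )
    return {"known_errors": known, "unusual": unusual}


REPORT = quality_report("the fruit fell the fruit fell teh gooseberry fell")
```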

Oooh, can you really?
Automatic linking
  – Glossary
  – Glossary Index page
  – Dictionary Samples
Add markup automatically
  – based on phrases in context
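
Automatic glossary linking can be sketched as a search-and-wrap pass over the text. The glossary terms and link targets below are hypothetical, and the function assumes plain-text input with no existing markup. Text that already contains tags would need a parser rather than a regular expression.

```python
import re

# Hypothetical glossary: term -> link target.
GLOSSARY = {
    "stemming": "glossary.html#stemming",
    "topic maps": "glossary.html#topic-maps",
}


def link_glossary_terms(text, glossary):
    """Wrap each glossary term found in plain text in a link to its entry."""
    if not glossary:
        return text
    # Try longer terms first so a phrase wins over any shorter term inside it.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def add_link(match):
        found = match.group(1)
        return f'<a href="{glossary[found.lower()]}">{found}</a>'

    return pattern.sub(add_link, text)


LINKED = link_glossary_terms("Stemming helps; Topic Maps too.", GLOSSARY)
```

The same pattern-and-wrap approach extends to adding other markup automatically, with the wrapping element chosen by which phrase matched.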

Summery Summary
You may need more than you thought…
…but it might do more than you expected…
…but...

There is no gooseberry pie for lunch.

Liam Quin Barefoot Computing Toronto