
1 How Do We Find Information?

2 Key Questions
- What are we looking for?
- How do we find it?
- Why is it difficult?
“A prudent question is one-half of wisdom” (Francis Bacon)
Search Engines

3 What are we looking for?
We are:
- Looking for X.
  - Q&A: population of China
  - Known-item search: “Catcher in the Rye”
- Looking for something like/about X.
  - General/background info: Taliban
  - Collection development: IR literature
  - Similar to (known) X: like “Catcher in the Rye”
  - WhatyoumacallX: “the rye-boy story”
- Looking for something
  - Problem resolution: how can we fight terrorism?
  - Knowledge development: what is IR?
- Looking
  - Need something, but don’t know what: what’s it all about?
  - Serendipity: Web surfing

4 How do we find it?
- Brute force search
  - Easy to build, maintain, and use
  - Searcher does all the work; hard to get satisfaction
- Organize/structure the data (Information Organization)
  - Intuitive to use
  - Hard to build and maintain
  - Knowledge of builder’s language & organization structure is crucial
- Use a search tool (Information Retrieval)
  - Easier to build and maintain: less manipulation of data
  - Sometimes works, sometimes not (helps to know the language of the data)
- Ask the experts (Expert System)
  - Easy and satisfying to use (by definition)
  - “Expert” knowledge is transitory, hard to encapsulate
- Go with the crowd (User Ratings > Recommender System > PageRank)
  - Relatively easy to build and maintain
  - Limited utility: doesn’t work with “unpopular” X
- Zen-Fusion search
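To make the first option concrete, here is a minimal sketch of a brute force search, assuming a tiny in-memory collection. The function name `brute_force_search` and the sample documents are hypothetical and not from the slides. Nothing is built ahead of time, so the tool is trivial to maintain, but every query scans every document and the searcher still has to guess the exact wording.

```python
def brute_force_search(docs, query):
    """Return ids of documents whose text contains the query string.

    Every document is scanned on every query: no index, no ranking.
    """
    needle = query.lower()
    return [doc_id for doc_id, text in docs.items() if needle in text.lower()]


# Hypothetical sample collection, for illustration only.
docs = {
    "a": "The Catcher in the Rye",
    "b": "Introduction to Information Retrieval",
    "c": "Rye bread recipes",
}

hits = brute_force_search(docs, "rye")
print(hits)
```

The query "rye" matches both the novel and the bread recipes, which illustrates the "hard to get satisfaction" point: the scan cannot tell which match the searcher wanted.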

5 Information Seeking Process: Dynamic, Interactive, Iterative
User
- What am I looking for?
  - Identification of info. need
- How do I find it?
  - Query formulation
Intermediary
- What are we looking for?
  - Discovery of user’s information need
  - Query representation
- Where is it?
  - Query-document matching
Information
- What is it?
  - Collection
  - Classification
- How is it found?
  - Data structure
  - Representation

6 IR vs. IO
Information Organization
- Add structure & annotation
Information Retrieval
- Create a searchable index
Information Access
- Retrieve information
Data Mining
- Discover knowledge

7 Information Retrieval
Representation (indexing, term weighting): Raw Data -> Searchable Index
Query Formulation: “What is IR?”
Search Results: (ranked) document list

Raw data:
D1: wd1 wd2 wd3
D2: wd2 wd4 wd2 wd3
D3: wd1 wd4

Searchable index (term frequency per document):
      D1  D2  D3
wd1    1   0   1
wd2    1   2   0
wd3    1   1   0
wd4    0   1   1

Search results:
1. D2
2. D1
3. D3
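The indexing and ranking steps on this slide can be sketched in a few lines, using the three toy documents D1, D2, and D3 from the slide. Two things here are assumptions rather than the deck's method: the slide does not say which query terms or scoring rule produce its ranking, so this sketch uses a hypothetical query of `wd2 wd3` and scores each document by the summed term frequency of the query terms. The helper names `build_index` and `rank` are also made up for illustration.

```python
from collections import Counter

# Toy collection copied from the slide.
docs = {
    "D1": ["wd1", "wd2", "wd3"],
    "D2": ["wd2", "wd4", "wd2", "wd3"],
    "D3": ["wd1", "wd4"],
}


def build_index(docs):
    """Map each term to {doc_id: term frequency} (a small inverted index)."""
    index = {}
    for doc_id, words in docs.items():
        for term, tf in Counter(words).items():
            index.setdefault(term, {})[doc_id] = tf
    return index


def rank(index, query_terms, doc_ids):
    """Order documents by summed term frequency of the query terms.

    Assumed scoring rule: the slide does not specify one.
    """
    scores = {doc_id: 0 for doc_id in doc_ids}
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(doc_ids, key=lambda d: scores[d], reverse=True)


index = build_index(docs)
ranking = rank(index, ["wd2", "wd3"], list(docs))  # hypothetical query
print(index["wd2"])
print(ranking)
```

Under these assumptions D2 scores 3, D1 scores 2, and D3 scores 0, which happens to match the order shown on the slide. A real system would typically weight terms (for example by rarity) rather than use raw counts.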

8 Information Organization
Representation (NLP & Machine Learning): Raw Data -> Organized Data
Query Formulation: “What is IR?”
Search Results: document groups

9 Natural Language Processing (NLP)
- Research area, technique, tool for
  - Knowledge Discovery, Data Mining
- Lexical analysis using
  - Part-of-Speech (POS) tagging
  - Sentence parsing

10 Machine Learning
- Research area, technique, tool for
  - Information Organization, Knowledge Discovery, Data Mining
- Information Organization via
  - Supervised Learning (Automatic Classification)
  - Unsupervised Learning (Clustering)
[Figure: two panels, Classification and Clustering, each showing Class 1 and Class 2]

11 IO for IR
- Clustering
  - Document Clustering
    - Cluster Hypothesis: documents having similar contents tend to be relevant to the same query
    - Rank clusters by Query-Cluster Similarity: cluster documents based on vector similarity
    - Post-retrieval clustering: Scatter-Gather
  - Keyword Clustering
    - Automatic Thesaurus Construction: Query Expansion
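A rough sketch of clustering documents by vector similarity and then ranking the clusters against a query. The slide names the idea but not an algorithm, so everything specific here is an assumption: cosine similarity as the vector similarity, a simple single-pass clustering with a 0.7 threshold, cluster centroids as the cluster representation, and four made-up term-frequency vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def single_pass_cluster(vectors, threshold):
    """Assign each vector to the most similar existing cluster, or start a new one."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, centroid([vectors[j] for j in members]))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


def rank_clusters(vectors, clusters, query):
    """Order clusters by cosine similarity between the query and each centroid."""
    def score(members):
        return cosine(query, centroid([vectors[j] for j in members]))
    return sorted(clusters, key=score, reverse=True)


# Hypothetical term-frequency vectors over three terms.
vectors = [[2, 1, 0], [1, 1, 0], [0, 0, 3], [0, 1, 2]]
clusters = single_pass_cluster(vectors, threshold=0.7)
ranked = rank_clusters(vectors, clusters, query=[0, 0, 1])
print(clusters)
print(ranked)
```

With these vectors the first two documents group together, as do the last two, and a query on the third term ranks the second group first. That is the cluster hypothesis in miniature: documents with similar content surface together for the same query.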

12 IO for IR
- Classification
  - Document Categorization
    - Classify documents into manually defined categories: supports hierarchical browsing, query expansion via relevance feedback
  - Document Indexing
    - Assign keywords to documents: automatic indexing with controlled vocabulary, metadata generation
  - Document Filtering
    - e.g. news delivery, email spam filtering
  - Query Classification
    - Collection selection
    - Algorithm selection
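As one illustration of document filtering, the spam example can be sketched with a tiny supervised classifier. The slides do not name a classification algorithm. This sketch swaps in multinomial Naive Bayes with add-one smoothing as an assumption, and the training messages, labels, and function names (`train`, `classify`) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Collect per-class document counts, per-class term counts, and the vocabulary."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        class_counts[label] += 1
        term_counts[label].update(words)
        vocab.update(words)
    return class_counts, term_counts, vocab


def classify(model, words):
    """Return the class with the highest log-probability under Naive Bayes."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        total_terms = sum(term_counts[label].values())
        for word in words:
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (term_counts[label][word] + 1) / (total_terms + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training messages.
training = [
    ("win cash prize now".split(), "spam"),
    ("cheap prize win".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]

model = train(training)
print(classify(model, "win prize".split()))
print(classify(model, "meeting tomorrow".split()))
```

The same pattern applies to the other tasks on the slide by changing the labels: subject categories for document categorization, or collection names for query classification.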


