UCLA : GSE&IS : Department of Information Studies : JF : 276lec1.ppt : 5/2/2015 : 1 INFS 276 INFORMATION RETRIEVAL SYSTEMS Week.

1 UCLA : GSE&IS : Department of Information Studies : JF : 276lec1.ppt : 5/2/2015 : 1
INFS 276 INFORMATION RETRIEVAL SYSTEMS
Week 1: Vocabulary, relevance, and evaluation
Jonathan Furner

2 COURSE THEME
- user-oriented focus...
- ... on system design

3 TODAY’S THEMES
- what is the function of an IR system?
- what are the components of an IR system?
- what kinds of IR system are there?
- what kinds of problem need to be overcome when designing an IR system?

4 FUNCTION: 1
[Diagram; labels as extracted from the slide]
THE SITUATION
- domain, tasks, needs, goals
THE INFO-SEEKER
1 knowledge structure = K(S)
  - about World 1
    - domain, situation
    - task, need, goal
  - about World 2
  - about World 3
    - info sources
    - info services
2 cognitive abilities
3 cognitive styles
INFO-SOURCES (DOCUMENTS)
- info-as-thing
INFO SERVICES
- info workers
- ref/bib sources
- IR systems
info-as-process
INFO-as-knowledge = ΔI
K(S) + ΔI = K(S + ΔS)
World 2, World 3

5 FUNCTION: 2
- to identify all and only those documents in a collection that (individually or collectively) satisfy the needs of the information seeker
- i.e., to identify relevant documents
- needs are key

6 COMPONENTS: 1
- people
  - seekers
  - authors
  - intermediaries
  - catalogers
  - indexers
  - designers
  - funders
- things
  - needs
  - documents & collections
  - queries
  - records & databases
  - terms
  - systems
  - money

7 COMPONENTS: 2
[Diagram; labels as extracted from the slide]
Entities: SEEKER, NEED, REQUEST, QUERY, QUERY REP, AUTHORS, INFORMATION, DOCUMENTS, RECORDS, DOCUMENT REPS
Processes: input, database creation, document analysis, query formulation, query analysis, matching / ranking, output, display, (feedback)
Annotation: relevance? obj. subj.

8 DIMENSIONS OF VARIETY : 1
- document collection
  - data type: text vs. image vs. multimedia
  - coverage: subject matter, language, currency, etc.
  - size: hundreds vs. billions of records
  - location: congregated vs. distributed

9 DIMENSIONS OF VARIETY : 2
- database
  - storage location: remote vs. local
  - storage medium: optical vs. magnetic
  - record type: full-text vs. bibliographic vs. numeric
  - field structure: unstructured vs. semi-structured vs. highly structured
  - representation of inter-document relationships: implicit vs. explicit

10 DIMENSIONS OF VARIETY : 3
- document analysis (indexing) mechanism
  - agent: automatic vs. manual
  - unit: word vs. phrase
  - origin: derived vs. assigned
  - vocabulary: controlled vs. natural language
  - coverage: full-text vs. field limitation
  - normalization: term vs. document vs. collection weighting
  - syntagmatic co-ordination: at index time vs. at search time
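As one illustration of a point in this space (automatic agent, word unit, derived terms, natural-language vocabulary, full-text coverage), the following is a minimal sketch of an inverted index that records raw term frequencies. The function name `build_index`, the whitespace tokenizer, and the sample documents are assumptions made for the example; they do not come from the lecture.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each word to {doc_id: term frequency}.

    Hypothetical helper: terms are derived automatically from the full
    text by lowercasing and splitting on whitespace.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return dict(index)


# Invented sample collection.
index = build_index({
    "d1": "retrieval of retrieval systems",
    "d2": "systems design",
})
```

The stored frequencies are the raw material for term-level weighting; document-level and collection-level normalization would need further statistics, such as document length or the number of documents containing each term.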

11 DIMENSIONS OF VARIETY : 4
- query analysis mechanism
  - query type: Boolean vs. unstructured vs. natural language
  - support for user profiling

12 DIMENSIONS OF VARIETY : 5
- matching / ranking (retrieval) mechanism
  - status: operational vs. experimental
  - model: exact-match (Boolean search) vs. best-match (similarity or ranked-output search)
  - type of similarity measure
  - type of ranking algorithm
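The contrast between the two retrieval models can be sketched on a toy collection. The exact-match function below implements a Boolean AND; the best-match function ranks documents by how many query terms they contain, a deliberately simple similarity measure chosen for illustration. The documents, function names, and tokenizer are all assumptions, not part of the lecture.

```python
# Invented sample collection.
docs = {
    "d1": "information retrieval systems design",
    "d2": "evaluation of retrieval systems",
    "d3": "user oriented design",
}


def tokens(text):
    """Hypothetical tokenizer: lowercase, split on whitespace."""
    return set(text.lower().split())


def exact_match(query_terms, docs):
    """Boolean AND: ids of documents that contain every query term."""
    q = set(query_terms)
    return sorted(d for d, text in docs.items() if q <= tokens(text))


def best_match(query_terms, docs):
    """Rank documents by the number of query terms they contain."""
    q = set(query_terms)
    scored = [(len(q & tokens(text)), d) for d, text in docs.items()]
    scored = [(score, d) for score, d in scored if score > 0]
    # Highest score first; ties broken by document id.
    return [d for score, d in sorted(scored, key=lambda p: (-p[0], p[1]))]


query = ["retrieval", "systems", "design"]
boolean_hits = exact_match(query, docs)
ranked_hits = best_match(query, docs)
```

Here the Boolean search returns only `d1`, while the ranked search returns all three documents in decreasing order of overlap with the query, so partially matching documents remain visible to the seeker.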

13 DIMENSIONS OF VARIETY : 6
- user interface
  - degree and type of support for query formulation...
    – ... and for query re-formulation / expansion
  - using a thesaurus, or user feedback, or “blind”
  - automatic or interactive
  - mode of presentation of search results
    – supportive of relevance judgment?

14 DIMENSIONS OF VARIETY : 7
- refinements
  - natural language processing (NLP) techniques
    – e.g., for phrase identification
  - passage retrieval
  - relevance feedback and query expansion techniques
  - query-by-example / “more like this” / “related records”
  - social feedback
    – e.g., recommender systems
  - hypertext / bibliometric techniques
    – link / citation analysis, of document relationships
  - data fusion: multiple sources of evidence

15 PROBLEMS: 1
- the vocabulary (objective relevance) problem
  - different people use different terms to refer to the same things
- solution? vocabulary control and thesauri
  - encoding semantic knowledge (i) about terms, and (ii) about paradigmatic relationships between terms
  - providing user and indexer access to this knowledge base
  - providing interactive/automatic support for thesaurus-based query expansion
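Automatic thesaurus-based query expansion can be sketched as a lookup that adds equivalent terms to the query. The `THESAURUS` entries and the `expand_query` helper are invented for illustration; a real thesaurus would also encode broader, narrower, and related terms rather than equivalents alone.

```python
# Hypothetical toy thesaurus: each term maps to terms treated as equivalent.
THESAURUS = {
    "car": {"automobile", "motorcar"},
    "film": {"movie", "motion picture"},
}


def expand_query(terms, thesaurus):
    """Return the query terms, each followed by its thesaurus equivalents."""
    expanded = []
    for term in terms:
        # Equivalents are sorted so the output order is deterministic.
        for candidate in [term, *sorted(thesaurus.get(term, ()))]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


expanded = expand_query(["car", "rental"], THESAURUS)
```

With this thesaurus, a seeker who types "car" also matches documents whose authors or indexers used "automobile" or "motorcar", which is the vocabulary problem in miniature. An interactive variant would show the candidate terms and let the seeker choose among them.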

16 PROBLEMS: 2
- the (subjective) relevance problem
  - subjective relevance doesn’t depend simply on semantic content, but on other characteristics...
    – ... of the document: e.g., perceived ‘quality’
    – ... of contexts of need and use
    – ... of search history, due to dynamic nature of information need
- solutions?
  - use of descriptive (non-topical) metadata
  - providing interactive/automatic support for feedback-based query expansion
  - automatic clustering (classification): clustered docs are assumed to be equally relevant
  - plus all the refinements listed a couple of slides ago
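Feedback-based query expansion can be sketched naively: take the documents the seeker has judged relevant and add their most frequent terms that are not already in the query. This is an illustrative simplification under stated assumptions, not a standard weighting scheme and not a method described in the lecture; the helper name, tokenizer, and sample texts are invented.

```python
from collections import Counter


def expand_from_feedback(query_terms, relevant_texts, n_new=2):
    """Append the n_new most frequent unseen terms from judged-relevant texts.

    Hypothetical helper: terms are lowercased whitespace tokens and are
    counted across all of the relevant texts.
    """
    counts = Counter()
    for text in relevant_texts:
        counts.update(t for t in text.lower().split() if t not in query_terms)
    new_terms = [term for term, _ in counts.most_common(n_new)]
    return list(query_terms) + new_terms


# Invented example: the seeker marked two documents as relevant.
expanded = expand_from_feedback(
    ["jaguar"],
    ["jaguar habitat rainforest", "jaguar rainforest conservation"],
    n_new=1,
)
```

Because the added terms come from the seeker's own judgments, the reformulated query can follow an information need that shifts during the search, which a fixed thesaurus cannot do.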

17 PROBLEMS: 3
- the evaluation problem
  - finding alternatives to...
    – precision: ratio of number of relevant records retrieved to total number of records retrieved
    – recall: ratio of number of relevant records retrieved to total number of relevant records
  - outside laboratory setting, recall can only be estimated
  - different people use different criteria of ‘goodness’
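The two ratios defined above can be computed directly from a set of retrieved record ids and a set of relevant record ids. The function name and the sample ids below are assumptions made for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of record ids.

    precision = relevant retrieved / total retrieved
    recall    = relevant retrieved / total relevant
    An empty denominator is reported as 0.0 (a convention chosen here).
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 records retrieved, 3 relevant records exist,
# and 2 of the retrieved records are relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

In this example precision is 2/4 and recall is 2/3. Computing recall requires knowing every relevant record in the collection (here including `d7`, which was never retrieved), which is why recall can only be estimated outside a laboratory setting.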

