
1 Discussion Class 1: Inverted Files

2 Discussion Classes
Format:
  Question
  Ask a member of the class to answer
  Provide opportunity for others to comment
When answering:
  Give your name. Make sure that the TA hears it.
  Stand up
  Speak clearly so that all the class can hear

3 Question 1: Terminology
(a) What is a keyword? How is it used?
(b) What is a controlled vocabulary? How might it be used?

4 Question 2: Files
The book shows an inverted file implemented as three files:
  Index file
  Postings file
  Documents file
(a) What is each used for?
(b) Why are they kept separate?
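
As a starting point for discussion, the three-file layout can be sketched with in-memory stand-ins. This is a minimal sketch under stated assumptions, not the book's implementation: the function names `build_three_files` and `lookup` are hypothetical, a "word" is taken to be any whitespace-separated token, and each index entry is assumed to hold an offset and a count into the postings list.

```python
from collections import defaultdict


def build_three_files(documents):
    """Return (index, postings, documents) as in-memory stand-ins for the three files.

    Assumption: each index entry maps a term to (offset, count) in the postings list,
    and each posting is a document number.
    """
    term_docs = defaultdict(list)
    for doc_id, text in enumerate(documents):
        # Record each term once per document (assumed tokenization: split on whitespace).
        for term in set(text.lower().split()):
            term_docs[term].append(doc_id)

    index = {}
    postings = []
    for term in sorted(term_docs):
        # The index entry points at the slice of the postings list for this term.
        index[term] = (len(postings), len(term_docs[term]))
        postings.extend(term_docs[term])
    return index, postings, documents


def lookup(term, index, postings, documents):
    """Follow index -> postings -> documents for a single term."""
    start, count = index.get(term.lower(), (0, 0))
    return [documents[doc_id] for doc_id in postings[start:start + count]]
```

In this sketch a lookup touches the small index first, then only the relevant slice of the postings, and finally the documents themselves, which may be a useful way to frame part (b).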

5 Question 3: Lexicographic Indexes
(a) What is a "lexicographic index"?
(b) Why are lexicographic indexes useful in information retrieval?
(c) Give an example of an indexing system that is not lexicographic.

6 Question 4: Building an Inverted File
The first stage in building an inverted file is to create a list of words and their locations in the text.
(a) Before this list can be built, what decisions must be made?
(b) What steps are involved in creating this list?
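
The first stage described above can be sketched as follows. This is only an illustration: the choices it makes (lowercasing, treating runs of letters and digits as words, a tiny stoplist, and recording a location as a document number plus word position) are assumptions made for the example, and the names `STOPWORDS` and `word_locations` are hypothetical.

```python
import re

# Hypothetical stoplist; a real system would decide this list deliberately.
STOPWORDS = {"the", "a", "of"}


def word_locations(documents):
    """Return an unsorted list of (word, doc_id, position) triples.

    Assumptions: words are runs of letters or digits, case is folded to
    lowercase, and position counts every token, including stopwords.
    """
    pairs = []
    for doc_id, text in enumerate(documents):
        tokens = re.finditer(r"[a-z0-9]+", text.lower())
        for position, match in enumerate(tokens):
            word = match.group()
            if word not in STOPWORDS:
                pairs.append((word, doc_id, position))
    return pairs
```

Each assumption in the docstring corresponds to a decision of the kind part (a) asks about; changing any of them changes the list that is produced.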

7 Question 5: Sorting an Inverted Index
The second stage in building an inverted file is to sort the list of words and their locations in the text. The book describes a two-step algorithm by Harman and Candela for this purpose.
(a) For what circumstances is this algorithm intended?
(b) What are the two steps?
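
For contrast with the algorithm the question refers to, the simplest form of the second stage is an ordinary in-memory sort followed by grouping. The sketch below is not the Harman and Candela algorithm; it assumes the whole list fits in memory, and the function name `sort_and_group` and the (word, doc_id, position) triple format are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_group(pairs):
    """Sort (word, doc_id, position) triples and group the locations by word.

    Assumption: the full list fits in memory, so a plain sort is enough.
    """
    ordered = sorted(pairs)  # orders by word, then doc_id, then position
    return {
        word: [(doc_id, position) for _, doc_id, position in group]
        for word, group in groupby(ordered, key=itemgetter(0))
    }
```

Considering what happens to this approach when the list is far larger than main memory may help with part (a).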

8 Question 6: Sorting an Inverted Index
In the first step of the algorithm developed by Harman and Candela:
(a) What data structure is used for the index file? Why is this appropriate?
(b) What data structure is used for the postings file? Why is this appropriate?
(c) Which files would be stored in memory and which on disk?

9 Question 7: Footnote
The first sentence of Section 3.4.2 reads, "The second technique to produce a sorted array inverted file is a fast inversion algorithm called FAST-INV (Copyright © Edward A. Fox, Whay C. Lee, Virginia Tech)."
What is surprising about this sentence?

