Retrieval languages: some general concepts. LIS 677.

1 Retrieval languages some general concepts LIS 677

2 O.E.D. MEANING OF "INDEX": the fore-finger.
The first usage reported by the OED is from 1398; this example is from Mrs. Browning, 1844: “and the left hand's index droppeth from the lips upon the cheek”

3 O.E.D. MEANING OF "INDEX": a piece of wood, metal, or the like, which serves as a pointer, esp. in scientific instruments.
The first reported usage is from 1594; this example is from 1613: "so that his broad index may be set to point to the degrees of the altitude of the pole"

4 O.E.D. MEANING OF "INDEX": that which serves to direct or point to a particular fact or conclusion; a guiding principle.
First reported in 1598: "Lest when my lisping guiltie Tongue should hault, My Lookes might prove the Index to my Fault". In 1887, Stevenson wrote: "His son's empty guffaws struck him with pain as the indices of a weak mind."

5 O.E.D. MEANING OF "INDEX": an alphabetical list, placed (usually) at the end of a book, of the names, subjects, etc. occurring in it, with indication of the places in which they occur (first reported 1580).

6 SUBJECT INDEXES
What is a subject index?
What do index terms refer to?
What is the general form of an indexing language?
What are some of the evaluative measures for subject indexes?

7 What is a subject index?
A well-organized index is a structured set of retrieval elements for a specific target area.
The retrieval elements are index terms.
The retrieval elements of subject indexes are a subset of all possible retrieval elements.
We are interested in the target area's topics, subjects, and content.
We want to find those parts of the text that are significantly about a specific subject or topic.

8 What do the index terms of a subject index refer to?
Are subjects indexed? Problem: the complexity of subjects. Example: the manufacture of multiwall kraft paper sacks for the manufacture of cement.
Is knowledge indexed? Problem: misinformation, non-information, and falsehoods.
Is “aboutness” indexed? Problem: access is enhanced by concepts not derived from the document.

9 EVALUATIVE PARAMETERS
How good is the retrieval system? Two basic questions:
What proportion of the items retrieved are on topic (i.e. relevant)?
What proportion of the total number of relevant items are retrieved?

10 RECALL RATIO
The ratio of relevant documents retrieved to the total number of relevant documents in the collection.
If R = the number of relevant documents retrieved and C = the number of relevant documents in the collection, then Recall ratio = R/C.
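A minimal Python sketch of the recall ratio, assuming documents are identified by hypothetical integer IDs and that the full set of relevant documents in the collection (C) is already known, which in practice requires relevance judgments. The function name `recall_ratio` is illustrative only.

```python
def recall_ratio(retrieved: set[int], relevant: set[int]) -> float:
    """Recall ratio = R / C, where R = relevant documents retrieved
    and C = all relevant documents in the collection."""
    if not relevant:
        # With no relevant documents, C = 0 and the ratio is undefined.
        raise ValueError("recall is undefined when no documents are relevant")
    r = len(retrieved & relevant)  # R: retrieved documents that are relevant
    c = len(relevant)              # C: relevant documents in the collection
    return r / c


# Hypothetical IDs: documents 1-4 are relevant; the search returned 2, 3, 7 and 9.
print(recall_ratio({2, 3, 7, 9}, {1, 2, 3, 4}))  # 0.5
```

Using sets means R is simply the intersection of "what was retrieved" and "what is relevant", and duplicate hits are counted once.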

11 RELEVANCE RATIO (now usually called precision)
The proportion of relevant documents among all documents retrieved.
If R = the number of relevant documents retrieved and L = the total number of documents retrieved, then Relevance ratio = R/L.
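A companion sketch for the relevance ratio, again assuming hypothetical integer document IDs; the function name `relevance_ratio` is illustrative. Note that only the retrieved set and the relevance judgments for it are needed here, whereas recall needs to know every relevant document in the collection.

```python
def relevance_ratio(retrieved: set[int], relevant: set[int]) -> float:
    """Relevance ratio = R / L, where R = relevant documents retrieved
    and L = all documents retrieved."""
    if not retrieved:
        # With nothing retrieved, L = 0 and the ratio is undefined.
        raise ValueError("relevance ratio is undefined when nothing is retrieved")
    r = len(retrieved & relevant)     # R: retrieved documents that are relevant
    total_retrieved = len(retrieved)  # L: everything the search returned
    return r / total_retrieved


# Hypothetical IDs: documents 1-4 are relevant; the search returned 2, 3, 7, 9 and 10.
print(relevance_ratio({2, 3, 7, 9, 10}, {1, 2, 3, 4}))  # 0.4
```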

12 EXAMPLE
Assume 100 relevant documents in the collection (C), 80 relevant documents retrieved (R), and 200 documents retrieved in all (L).
Recall ratio: 80/100 = .80
Relevance ratio: 80/200 = .40
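The same arithmetic as a short Python check, using only the counts given in the example; the two extra quantities (relevant documents missed and non-relevant documents retrieved) follow directly from those counts.

```python
# Counts from the example.
C = 100  # relevant documents in the collection
R = 80   # relevant documents retrieved
L = 200  # documents retrieved in all

recall = R / C     # share of the relevant documents that were found
relevance = R / L  # share of the retrieved documents that are relevant
missed = C - R     # relevant documents the search failed to retrieve
noise = L - R      # retrieved documents that are not relevant

print(f"Recall ratio:    {recall:.2f}")     # Recall ratio:    0.80
print(f"Relevance ratio: {relevance:.2f}")  # Relevance ratio: 0.40
print(missed, noise)                        # 20 120
```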

13 Two questions
How can you guarantee 100% recall? Retrieve every document in the collection.
If a second search retrieves more documents than the first, has recall been improved? Not necessarily, because:
improving recall depends upon increasing the ratio of relevant documents retrieved to the total number of relevant documents indexed in the collection;
the increase in numbers may be due to the retrieval of irrelevant documents.
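Both answers can be illustrated with a toy collection. This is a sketch under stated assumptions: the ten-document collection, the relevant set, and the two searches are all hypothetical.

```python
# Hypothetical collection of 10 documents (IDs 0-9); documents 0, 1 and 2 are relevant.
collection = set(range(10))
relevant = {0, 1, 2}


def recall(retrieved: set[int]) -> float:
    """R / C for this toy collection."""
    return len(retrieved & relevant) / len(relevant)


def relevance(retrieved: set[int]) -> float:
    """R / L for this toy collection."""
    return len(retrieved & relevant) / len(retrieved)


# Question 1: retrieving the whole collection guarantees 100% recall,
# at the cost of a low relevance ratio.
print(recall(collection), relevance(collection))  # 1.0 0.3

# Question 2: a larger second search does not improve recall
# when the extra documents it returns are not relevant.
first = {0, 1, 5}
second = {0, 1, 5, 6, 7, 8}
print(len(second) > len(first))         # True
print(recall(first) == recall(second))  # True: both find 2 of the 3 relevant documents
```

In the second case the relevance ratio actually falls (from 2/3 to 1/3), since every added document is irrelevant.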

14 THE FORM OF RETRIEVAL LANGUAGES
$L = O_{p,q,r}(C_1, C_2, \ldots, C_n)$, where:
$L$ is the language;
$O_{p,q,r}$ is a set of operations, in which $p$ denotes lexical operations, $q$ denotes semantic operations, and $r$ denotes syntactic operations;
$C_i$ is a set of concepts.

15 SPECIFICITY
"the extent to which the system permits us to be precise when specifying the subject of a document we are processing" (Foskett).
It is a function of the system, i.e. of the retrieval language.
It is the degree to which a lexical item (index term) reflects the precise generic level of the concept it stands for.
The ideal: "our specification should in every case be coextensive with the subject of the document" (Foskett).

16 EXHAUSTIVITY
The "extent to which we analyze any given document to establish exactly what subject content we have to specify" (Foskett).
It is a management decision about the level of document importance: should we index what is worth noting?
The most common measure of exhaustivity is the number of different index terms assigned per document in a system.
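A minimal sketch of measuring exhaustivity, assuming the measure is taken as the number of distinct index terms assigned to each document and then averaged over the index. The documents and terms below are hypothetical, loosely echoing the kraft-paper example; a more exhaustively indexed system would show higher counts.

```python
# Hypothetical index: document ID -> index terms assigned to it.
index: dict[str, list[str]] = {
    "doc1": ["paper sacks", "kraft paper", "cement", "manufacture"],
    "doc2": ["cement"],
    "doc3": ["kraft paper", "manufacture", "kraft paper"],  # duplicate assignment
}


def terms_per_document(idx: dict[str, list[str]]) -> dict[str, int]:
    """Count the distinct index terms assigned to each document."""
    return {doc: len(set(terms)) for doc, terms in idx.items()}


counts = terms_per_document(index)
average = sum(counts.values()) / len(counts)  # mean terms per document
print(counts)             # {'doc1': 4, 'doc2': 1, 'doc3': 2}
print(round(average, 2))  # 2.33
```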

