Improving precision and recall in study retrieval

1 Improving precision and recall in study retrieval
A concept for thesaurus-based syntactic indexing
Pascal Siegers and Tanja Friedrich, Data Archive for the Social Sciences at GESIS

2 What’s inside?
Subject indexing at GESIS
Ambiguity treatment in indexing
A concept for syntactic indexing
Conclusion and outlook

4 The GESIS data catalogue
Archived studies are documented for retrieval and access (download/delivery)
Contains detailed study descriptions for approx. … studies

13 Subject indexing at GESIS
Currently, GESIS does not use a thesaurus for study indexing
Free keywording at variable level is employed
Examples: Satisfaction with life (happiness); Government should provide only basic health care services

17 Subject indexing at GESIS
The good: Indexing according to users’ needs: indexing at question or variable level allows retrieval of constructs for secondary analysis
The bad: No controlled vocabulary (thesaurus): no control of semantic ambiguity in retrieval

18 What’s inside?
Subject indexing at GESIS
Ambiguity treatment in indexing
A concept for syntactic indexing
Conclusion and outlook

19 Examples of semantic ambiguity
Problem with synonyms: users search for … guest or visitor; … enterprise or company; … organic farming or biological farming
→ Users will not obtain all relevant items
Problem with homonyms: users want to find one of … association (political, legal) or association (psychological); … content (adjective) or content (noun)
→ Users will obtain irrelevant items

20 Results of semantic ambiguity
False associations and a tendency towards low recall in retrieval

21 Solution: Employ a thesaurus
To tackle semantic ambiguity while retaining specificity (in-depth indexing at question or variable level)
But be careful not to introduce syntactic ambiguity
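
The following minimal Python sketch makes the recall argument concrete. A thesaurus maps synonymous free keywords onto one preferred descriptor before matching. It covers only the synonym case. The mini-thesaurus, its preferred terms, and the functions `normalize` and `search` are hypothetical illustrations. They are not entries from TheSoz or part of the GESIS system.

```python
# Hypothetical mini-thesaurus: entry term -> preferred descriptor (synonym control only).
USE = {
    "visitor": "GUEST",
    "guest": "GUEST",
    "company": "ENTERPRISE",
    "enterprise": "ENTERPRISE",
    "biological farming": "ORGANIC FARMING",
    "organic farming": "ORGANIC FARMING",
}

def normalize(term: str) -> str:
    """Map a keyword to its descriptor; unknown terms are only upper-cased."""
    return USE.get(term.lower(), term.upper())

def search(index: dict[str, set[str]], query: str, use_thesaurus: bool) -> set[str]:
    """Return IDs of items whose keywords match the query term."""
    key = normalize(query) if use_thesaurus else query.lower()
    hits = set()
    for item_id, keywords in index.items():
        if use_thesaurus:
            terms = {normalize(k) for k in keywords}
        else:
            terms = {k.lower() for k in keywords}
        if key in terms:
            hits.add(item_id)
    return hits

# Free keywords as different indexers might have assigned them (hypothetical data)
free_keywords = {
    "Q1": {"visitor"},
    "Q2": {"guest"},
    "Q3": {"company"},
}

print(sorted(search(free_keywords, "guest", use_thesaurus=False)))  # ['Q2']
print(sorted(search(free_keywords, "guest", use_thesaurus=True)))   # ['Q1', 'Q2']
```

Homonyms would additionally need qualified descriptors, such as separate entries for the political and the psychological sense of "association". A bare free keyword cannot be disambiguated by lookup alone.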

24 Example of syntactic ambiguity
Nation-wide public opinion survey of U.S. attitudes on the Middle East
Index terms: PUBLIC OPINION, TELEPHONE SURVEYS, UNITED STATES, ATTITUDES, MIDDLE EAST
Lancaster 1998, Indexing and Abstracting in Theory and Practice, 2nd ed. London: Library Association Publishing, p. 7.

27 Example of syntactic ambiguity
Attitudes towards the Middle East in the United States?
OR
Attitudes towards the United States in the Middle East?

28 Results of syntactic ambiguity
False associations and a tendency towards low precision in retrieval
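
A small Python sketch can illustrate the precision loss under post-coordinate (flat, Boolean AND) retrieval. Both studies are hypothetical. Each receives the same flat term set from the Lancaster example, but only one of them is about U.S. attitudes towards the Middle East.

```python
# Flat (post-coordinate) index: each study is just a set of terms (hypothetical studies).
flat_index = {
    # U.S. attitudes towards the Middle East
    "study_A": {"PUBLIC OPINION", "UNITED STATES", "ATTITUDES", "MIDDLE EAST"},
    # Middle Eastern attitudes towards the U.S.
    "study_B": {"PUBLIC OPINION", "UNITED STATES", "ATTITUDES", "MIDDLE EAST"},
}

def and_search(index: dict[str, set[str]], query: set[str]) -> set[str]:
    """Boolean AND: return every study whose term set contains all query terms."""
    return {sid for sid, terms in index.items() if query <= terms}

def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of retrieved items that are relevant (0.0 if nothing is retrieved)."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

# A user interested in attitudes towards the Middle East *in the United States*
retrieved = and_search(flat_index, {"ATTITUDES", "UNITED STATES", "MIDDLE EAST"})
relevant = {"study_A"}
print(sorted(retrieved), precision(retrieved, relevant))  # ['study_A', 'study_B'] 0.5
```

The flat term set records which concepts occur but not how they relate. As a result, no Boolean query over these terms can separate the two readings.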

33 Summing up ambiguity treatment
The good: Use of a thesaurus reduces semantic ambiguity and improves recall
The bad: Abandoning free keywording increases syntactic ambiguity and lowers precision

34 Solution: Employ a syntax
Tackle the bad in the present indexing: employ a thesaurus
Keep the good in the present indexing: in-depth indexing at question or variable level
→ Thesaurus-based syntactic indexing

35 Solution: Employ a syntax
Our syntax will be composed of term linking and role operators
→ Both are common syntactic rules in indexing

36 What’s inside?
Subject indexing at GESIS
Ambiguity treatment in indexing
A concept for syntactic indexing
Conclusion and outlook

37 Syntax: Term linking

41 Term linking
Not just one flat term list:
PUBLIC OPINION, TELEPHONE SURVEYS, UNITED STATES, ATTITUDES, MIDDLE EAST, ISRAEL, EGYPT, ARAB NATIONS, PALESTINE LIBERATION ORGANISATION, PEACE CONFERENCES, PEACE, PALESTINIAN STATE, FOREIGN AID, POLITICAL LEADERS

42 Term linking
Terms that refer to one question or variable are grouped (linked), e.g.:
PALESTINIAN STATE; ATTITUDES
MIDDLE EAST; PEACE; ESTIMATION
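
Term linking can be sketched as storing one term group per question or variable. A match then requires all query terms to co-occur within a single group. In this Python sketch the study, the question IDs, and the functions `flat_match` and `linked_search` are hypothetical. The two term groups are the ones shown on the slide.

```python
# One study, indexed per question: each question carries its own linked term group.
# The study and the question IDs are hypothetical.
study_questions = {
    "q1": {"PALESTINIAN STATE", "ATTITUDES"},
    "q2": {"MIDDLE EAST", "PEACE", "ESTIMATION"},
}

def flat_match(questions: dict[str, set[str]], query: set[str]) -> bool:
    """Flat term list: pool all terms of the study, then test the query."""
    pooled = set().union(*questions.values())
    return query <= pooled

def linked_search(questions: dict[str, set[str]], query: set[str]) -> list[str]:
    """Term linking: return the questions whose own group contains every query term."""
    return sorted(qid for qid, group in questions.items() if query <= group)

query = {"ATTITUDES", "MIDDLE EAST"}          # "attitudes towards the Middle East"
print(flat_match(study_questions, query))     # True (false drop: terms from different questions)
print(linked_search(study_questions, query))  # [] (no single question measures this)
print(linked_search(study_questions, {"PEACE", "ESTIMATION"}))  # ['q2']
```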

43 Syntax: Role operators

45 Role operators
Terms are classified as directive terms and subject terms
This allows measurable constructs to be identified, such as ‘attitudes towards an independent Palestinian state’ or ‘estimation of peace in the Middle East’

46 Social Science Construct
Role operators
Social Science Construct = Measurable unit
Contents/topics (subject): any subject area relevant in social science, e.g. work, family, religion, education
Attributes (direction): Cognition, Evaluation, Affection, Action, [objective characteristics]
The goal of indexing is to represent the concept using a controlled vocabulary. With regard to research data, a construct has to be a measurable unit (otherwise no data would exist). Measurable units can be defined both in terms of contents or topics and in terms of the attributes related to those contents. In most cases, simple contents or topics cannot be measured because the range of objects is not clearly defined. For instance, it is difficult to measure euthanasia as such: scholars have to specify which attribute of euthanasia they wish to measure, i.e. whether the construct refers to an attitude, an experience or something else. Similarly, it is not possible to measure attributes like attitudes or trust without specifying which objects are addressed. Attitudes do not exist without an object they refer to, and the same is true for trust: the trusted object or the trust relation (trustee-truster) has to be specified for measurement, although trust in the abstract is a concept in the social sciences.
The attribute of the construct defines the nature of the data. In social science, four broad classes of attributes can be distinguished (an adaptation of Talcott Parsons’ General Theory of Action): (1) Cognition, (2) Evaluation, (3) Affection/Emotion and (4) Action (which is the most important category for theory but, paradoxically, not for measurement). A fifth class would contain ascribed characteristics (or, as one might prefer, “objective characteristics”) of the units under study. For individuals these might be gender, age, income or blood pressure; for organizations, number of employees, branch or turnover. For objective characteristics measured in survey research, single descriptors from TheSoz are most often available for indexing. In most cases, topic and attribute together form a measurable unit. Our concept for indexing uses this feature of constructs for semantic indexing.
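
The notion of a measurable unit can be expressed as a small data model in which a construct needs both a topic and an attribute class. This Python sketch is a hypothetical illustration of that rule. The class names, field names and enum values are assumptions, not part of the GESIS concept. Objective characteristics are modelled here as a fifth attribute class.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class AttributeClass(Enum):
    """Attribute classes from the slide (adapted from Parsons' General Theory of Action)."""
    COGNITION = "cognition"
    EVALUATION = "evaluation"
    AFFECTION = "affection"
    ACTION = "action"
    OBJECTIVE = "objective characteristic"  # fifth class: ascribed/objective characteristics

@dataclass(frozen=True)
class Construct:
    topic: Optional[str]                 # content/topic (subject), e.g. "euthanasia"
    attribute: Optional[AttributeClass]  # attribute (direction)

    def is_measurable(self) -> bool:
        """Neither a bare topic nor a bare attribute is measurable; both are required."""
        return self.topic is not None and self.attribute is not None

bare_topic = Construct("euthanasia", None)
bare_attribute = Construct(None, AttributeClass.EVALUATION)  # an attitude without an object
attitude_to_euthanasia = Construct("euthanasia", AttributeClass.EVALUATION)
age = Construct("age", AttributeClass.OBJECTIVE)

print(bare_topic.is_measurable())              # False
print(bare_attribute.is_measurable())          # False
print(attitude_to_euthanasia.is_measurable())  # True
print(age.is_measurable())                     # True
```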

49 Subject and directive terms
Subject terms: specify the contents of the measurement; as specific as possible; combinations of terms, if necessary
Directive terms: specify the attributes of the measurement; limited heterogeneity in directive terms to facilitate faceted retrieval

50 Examples of directive terms
Cognition: PERCEPTION, KNOWLEDGE, AWARENESS, INTEREST, BELIEF, ORIENTATION
Evaluation: ATTITUDE, PREFERENCE, JUDGMENT, PREJUDICE, SATISFACTION, ACCEPTANCE/APPROVAL, REJECTION/REFUSAL
Affection: MOOD, FEAR, ANGER/ANNOYANCE, HAPPINESS, HATE, LOVE
Action: BEHAVIOR, USE/UTILIZATION, CHOICE, EXPERIENCE, INTERACTION, ACTIVITY, CONSTRUCTION/DESTRUCTION
The more the heterogeneity of directive terms is limited, the more efficient faceted search becomes.
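
A closed list is what lets directive terms work as a facet. In this hypothetical Python sketch the vocabulary copies the examples above. The function `attribute_class` (an assumed name) rejects any directive term outside the list, so indexers cannot introduce free variants.

```python
# Directive terms from the slide, grouped by attribute class (closed list).
DIRECTIVE_TERMS = {
    "Cognition": ["PERCEPTION", "KNOWLEDGE", "AWARENESS", "INTEREST", "BELIEF", "ORIENTATION"],
    "Evaluation": ["ATTITUDE", "PREFERENCE", "JUDGMENT", "PREJUDICE", "SATISFACTION",
                   "ACCEPTANCE/APPROVAL", "REJECTION/REFUSAL"],
    "Affection": ["MOOD", "FEAR", "ANGER/ANNOYANCE", "HAPPINESS", "HATE", "LOVE"],
    "Action": ["BEHAVIOR", "USE/UTILIZATION", "CHOICE", "EXPERIENCE", "INTERACTION",
               "ACTIVITY", "CONSTRUCTION/DESTRUCTION"],
}

# Reverse lookup: directive term -> attribute class
TERM_TO_CLASS = {term: cls for cls, terms in DIRECTIVE_TERMS.items() for term in terms}

def attribute_class(directive_term: str) -> str:
    """Return the attribute class of a directive term; reject terms outside the closed list."""
    try:
        return TERM_TO_CLASS[directive_term.upper()]
    except KeyError:
        raise ValueError(f"not an approved directive term: {directive_term!r}") from None

print(attribute_class("perception"))  # Cognition
print(attribute_class("EXPERIENCE"))  # Action
```

Rejecting unknown terms, rather than silently accepting them, keeps the facet small. Any new directive term then becomes an explicit vocabulary decision.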

54 The complete syntax
Measurable unit (e.g. survey question) → subject term(s) (ST) + directive terms (DT)
Only exception: single descriptors from the thesaurus that include both the topic and the attribute (e.g. life satisfaction and/or happiness)
Precoordination/syntactic indexing = linked terms that are specified by role operators
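
The complete syntax can be modelled as one record per measurable unit. Each record holds a directive term and one or more linked subject terms, serialized in the "DT; ST; ST" notation used in the examples. The `IndexEntry` class, its fields and the unit IDs are hypothetical. The single-descriptor exception is represented by leaving the directive term empty.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IndexEntry:
    """One measurable unit (e.g. a survey question) indexed with role operators."""
    unit_id: str
    subject_terms: list[str] = field(default_factory=list)  # role ST: the linked term group
    directive_term: Optional[str] = None  # role DT; None for the single-descriptor exception

    def __post_init__(self) -> None:
        if not self.subject_terms:
            raise ValueError("at least one subject term (or single descriptor) is required")

    def syntactic_string(self) -> str:
        """Serialize as 'DT; ST1; ST2', the notation used in the examples."""
        parts = ([self.directive_term] if self.directive_term else []) + self.subject_terms
        return "; ".join(parts)

entry = IndexEntry("q1", ["PALESTINIAN STATE"], "ATTITUDE")
print(entry.syntactic_string())   # ATTITUDE; PALESTINIAN STATE

# Exception: one thesaurus descriptor already covers topic and attribute
single = IndexEntry("q2", ["LIFE SATISFACTION"])
print(single.syntactic_string())  # LIFE SATISFACTION
```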

55 Examples: corruption
“There is corruption in the national public institutions in Germany.” (Eurobarometer 76.1; ZA5565)
Directive term: PERCEPTION
Subject term(s): CORRUPTION, PUBLIC INSTITUTIONS
Syntactic indexing: PERCEPTION; CORRUPTION; PUBLIC INSTITUTIONS
“Are you personally affected by corruption in your daily activities?” (Eurobarometer 76.1; ZA5565)
Directive term: EXPERIENCE
Subject term(s): CORRUPTION
Syntactic indexing: EXPERIENCE; CORRUPTION
This example results in the following syntactic indexing: 1.1 = DT: perception & TT: (TT_1: corruption & TT_2: political institutions); 1.2 = DT: experience & TT: corruption.
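
Because each term carries a role, a query can ask for a directive term and a subject term together. This Python sketch indexes the two Eurobarometer 76.1 questions as shown above. The question IDs and the function `role_search` are hypothetical.

```python
# The two indexed questions from the slide (IDs are hypothetical labels).
index = [
    {"id": "corruption_perceived", "DT": "PERCEPTION",
     "ST": {"CORRUPTION", "PUBLIC INSTITUTIONS"}},
    {"id": "corruption_experienced", "DT": "EXPERIENCE",
     "ST": {"CORRUPTION"}},
]

def role_search(entries, dt=None, st=frozenset()):
    """Return IDs of entries with the given directive term (if any) and all given subject terms."""
    return [e["id"] for e in entries
            if (dt is None or e["DT"] == dt) and set(st) <= e["ST"]]

print(role_search(index, st={"CORRUPTION"}))
# ['corruption_perceived', 'corruption_experienced']
print(role_search(index, dt="EXPERIENCE", st={"CORRUPTION"}))
# ['corruption_experienced']
```

A subject-only query returns both questions. Adding the directive term narrows the result to the construct the user actually wants.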

56 What’s inside?
Subject indexing at GESIS
Ambiguity treatment in indexing
A concept for syntactic indexing
Conclusion and outlook

57 Conclusion
Thesaurus-based syntactic indexing helps us reduce semantic ambiguity while retaining our level of specificity and depth in indexing

58 Outlook
Our concept improves our study descriptions and enables new retrieval techniques

59 Outlook
Syntactic indexing can enhance faceted retrieval
Refine search results by subject or directive terms

60 Refine search / narrow results
Refine by topic of study: International politics (19); Conflict, security and peace (18); Society, culture (10)
Refine by questions
Refine by subject: Middle East (20); Conflict (19); Israel (19); Peace (19); Palestinian State (19); USA (18); Egypt (17)
Refine by intention: Attitude (15); Behaviour (12); Knowledge (9)
Refine by country: USA (15); Israel (10); France (8); Australia (3)
Refine by time period: 2003 (10); 2012 (5); 2008 (5); 2010 (3)
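
Facet counts like those in the mock-up can be derived by counting, for each directive or subject term, the distinct studies with at least one question indexed with that term. This is a hypothetical Python sketch with toy data. The counts it produces are not the figures shown on the slide.

```python
from collections import Counter

# Toy index: (study ID, directive term, subject terms) per question; all values hypothetical.
questions = [
    ("S1", "ATTITUDE", {"PALESTINIAN STATE"}),
    ("S1", "JUDGMENT", {"MIDDLE EAST", "PEACE"}),
    ("S2", "ATTITUDE", {"MIDDLE EAST"}),
    ("S2", "ATTITUDE", {"ISRAEL"}),
    ("S3", "KNOWLEDGE", {"MIDDLE EAST"}),
]

def facet_counts(rows):
    """Count distinct studies per directive term and per subject term."""
    dt_studies, st_studies = {}, {}
    for study, dt, sts in rows:
        dt_studies.setdefault(dt, set()).add(study)
        for st in sts:
            st_studies.setdefault(st, set()).add(study)
    return (Counter({t: len(ids) for t, ids in dt_studies.items()}),
            Counter({t: len(ids) for t, ids in st_studies.items()}))

by_directive, by_subject = facet_counts(questions)
print(by_directive.most_common(1))  # [('ATTITUDE', 2)]
print(by_subject["MIDDLE EAST"])    # 3
```

Counting distinct studies rather than questions keeps a study with several ATTITUDE questions (S2) from inflating that facet.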

61 Thank you for your attention!
Dr. Pascal Siegers, Tanja Friedrich

