Published by Calvin Thornton, modified over 6 years ago
Improving precision and recall in study retrieval
A concept for thesaurus-based syntactic indexing
Pascal Siegers and Tanja Friedrich, Data Archive for the Social Sciences at GESIS
What's inside?
Subject indexing at GESIS
Ambiguity treatment in indexing
A concept for syntactic indexing
Conclusion and outlook
The GESIS data catalogue
Archived studies are documented for retrieval and access (download/delivery). The catalogue contains detailed study descriptions for approx. … studies.
Subject indexing at GESIS
Currently, GESIS does not use a thesaurus for study indexing. Free keywording on variable level is employed. Examples: "satisfaction with life (happiness)"; "government should provide only basic health care services".
Subject indexing at GESIS
The good: indexing according to users' needs. Question- or variable-level indexing allows retrieval of constructs for secondary analysis.
The bad: no controlled vocabulary (thesaurus), hence no control of semantic ambiguity in retrieval.
Examples of semantic ambiguity
Problem with synonyms: users search for guest or visitor, enterprise or company, organic farming or biological farming. Users will not obtain all relevant items.
Problem with homonyms: users want to find one of association (political, legal) or association (psychological), content (adjective) or content (noun). Users will obtain irrelevant items.
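The synonym problem, and how a thesaurus addresses it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the term pairs come from the slide, but the mapping table `PREFERRED`, the helper names and the choice of which synonym is the preferred descriptor are all hypothetical. Homonyms are only shown as separate, qualified descriptors; picking the right sense remains the indexer's job.

```python
# Hypothetical mini-thesaurus: each entry term points to one preferred descriptor.
# The term pairs come from the slide; which term is preferred is an assumption.
PREFERRED = {
    "guest": "visitor",
    "visitor": "visitor",
    "enterprise": "company",
    "company": "company",
    "biological farming": "organic farming",
    "organic farming": "organic farming",
    # Homonyms: one string, two meanings, so each sense gets its own qualified descriptor.
    "association (political, legal)": "association (political, legal)",
    "association (psychological)": "association (psychological)",
}


def normalize(term: str) -> str:
    """Map a keyword to its descriptor; unknown keywords pass through unchanged."""
    key = term.lower()
    return PREFERRED.get(key, key)


def search_free(index: dict[str, set[str]], query: str) -> set[str]:
    """Free keywording: plain string match, so synonyms are missed."""
    return {item for item, terms in index.items() if query.lower() in terms}


def search_controlled(index: dict[str, set[str]], query: str) -> set[str]:
    """Thesaurus indexing: query and index terms are both mapped to descriptors."""
    target = normalize(query)
    return {item for item, terms in index.items() if target in terms}


# Three questions keyworded freely by different indexers (made-up example).
free_index = {"Q1": {"guest"}, "Q2": {"visitor"}, "Q3": {"biological farming"}}
# The same questions indexed with descriptors.
controlled_index = {item: {normalize(t) for t in terms} for item, terms in free_index.items()}

print(sorted(search_free(free_index, "visitor")))                      # ['Q2']
print(sorted(search_controlled(controlled_index, "guest")))            # ['Q1', 'Q2']
print(sorted(search_controlled(controlled_index, "organic farming")))  # ['Q3']
```

With free keywords, the search for "visitor" misses the question keyworded "guest" (low recall); once both are mapped to one descriptor, either query finds both.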
Results of semantic ambiguity
False associations and a tendency towards low recall in retrieval
Solution: Employ a thesaurus
A thesaurus tackles semantic ambiguity while retaining specificity (in-depth indexing on question or variable level). But be careful not to introduce syntactic ambiguity.
Example of syntactic ambiguity
Nation-wide public opinion survey of U.S. attitudes on the Middle East
Index terms: PUBLIC OPINION, TELEPHONE SURVEYS, UNITED STATES, ATTITUDES, MIDDLE EAST
Source: Lancaster 1998, Indexing and Abstracting in Theory and Practice, 2nd ed. London: Library Association Publishing, p. 7.
From the index terms alone, a searcher cannot tell what the study is about:
Attitudes towards the Middle East in the United States? OR Attitudes towards the United States in the Middle East?
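Why the two readings collapse can be shown with ordinary Boolean AND retrieval over an unordered term set. This is a minimal sketch, assuming postcoordinate indexing where a study is just a set of terms; the variable and function names are illustrative only.

```python
# The five index terms from Lancaster's example, stored as an unordered set.
study_terms = {"PUBLIC OPINION", "TELEPHONE SURVEYS", "UNITED STATES", "ATTITUDES", "MIDDLE EAST"}

# Reading A: attitudes in the United States towards the Middle East.
query_a = {"ATTITUDES", "UNITED STATES", "MIDDLE EAST"}
# Reading B: attitudes in the Middle East towards the United States.
query_b = {"ATTITUDES", "MIDDLE EAST", "UNITED STATES"}


def matches(doc_terms: set[str], query_terms: set[str]) -> bool:
    """Boolean AND: a document matches if it carries every query term."""
    return query_terms <= doc_terms


print(query_a == query_b)  # True: a set cannot say who holds the attitude towards whom
print(matches(study_terms, query_a), matches(study_terms, query_b))  # True True
```

Both information needs turn into the same query, so the study is retrieved for both, including the one it does not answer; this is the loss of precision the next slide refers to.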
Results of syntactic ambiguity
False associations and a tendency towards low precision in retrieval
Summing-up ambiguity treatment
The good: use of a thesaurus reduces semantic ambiguity and improves recall.
The bad: abandoning free keywording increases syntactic ambiguity and lowers precision.
Solution: Employ a syntax
Tackle the bad in the present indexing: employ a thesaurus.
Keep the good in the present indexing: in-depth indexing on question or variable level.
Result: thesaurus-based syntactic indexing
Our syntax is composed of term linking and role operators, which are both common syntactic devices in indexing.
Syntax: Term linking
Term linking
Not just one flat term list such as: PUBLIC OPINION, TELEPHONE SURVEYS, UNITED STATES, ATTITUDES, MIDDLE EAST, ISRAEL, EGYPT, ARAB NATIONS, PALESTINE LIBERATION ORGANISATION, PEACE CONFERENCES, PEACE, PALESTINIAN STATE, FOREIGN AID, POLITICAL LEADERS
Terms that refer to the same question or variable are grouped (linked), e.g. PALESTINIAN STATE; ATTITUDES and MIDDLE EAST; PEACE; ESTIMATION.
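The effect of linking can be sketched by comparing two matching rules on the same study. This is an illustration under assumptions: the two groups are taken from the slide, but storing a study as a list of sets and requiring all query terms to co-occur in one group are our hypothetical choices, not a description of the GESIS system.

```python
# A study indexed as a list of term groups, one group per question or variable.
study_groups = [
    {"PALESTINIAN STATE", "ATTITUDES"},
    {"MIDDLE EAST", "PEACE", "ESTIMATION"},
]


def flat_match(groups: list[set[str]], query: set[str]) -> bool:
    """Unlinked indexing: pool all terms of the study and test the query against the pool."""
    pooled = set().union(*groups)
    return query <= pooled


def linked_match(groups: list[set[str]], query: set[str]) -> bool:
    """Linked indexing: all query terms must co-occur within the same group."""
    return any(query <= group for group in groups)


query = {"ATTITUDES", "MIDDLE EAST"}
print(flat_match(study_groups, query))    # True: terms pooled from two different questions
print(linked_match(study_groups, query))  # False
print(linked_match(study_groups, {"ESTIMATION", "PEACE"}))  # True
```

The flat list produces a hit by combining terms from unrelated questions; the linked groups only match when the terms describe the same question, which is how linking counters syntactic ambiguity.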
Syntax: Role operators
Role operators
Terms are classified as directive terms and subject terms. This makes it possible to identify measurable constructs, such as 'attitudes towards an independent Palestinian state' or 'estimation of peace in the Middle East'.
Role operators: Social science construct
Contents/topics (subject): any subject area relevant in social science, e.g. work, family, religion, education
Attributes (direction): cognition, evaluation, affection, action, [objective characteristics]
Contents/topic + attribute = measurable unit
The goal of indexing is to represent the concept using a controlled vocabulary. With regard to research data, a construct has to be a measurable unit (otherwise no data would exist). Measurable units can be defined both in terms of contents or topics and in terms of attributes related to the contents. In most cases, simple contents or topics cannot be measured because the range of objects is not clearly defined. For instance, it is difficult to measure euthanasia as such: scholars have to specify which attribute of euthanasia they wish to measure, i.e. whether the construct refers to an attitude, an experience, or something else. Similarly, it is not possible to measure attributes like attitudes or trust without specifying which objects are addressed. Attitudes do not exist without an object they refer to. The same is true for trust: the trusted object or the trust relation (trustee-truster) has to be specified for measurement, even though trust in the abstract is a concept in the social sciences. The attribute of the construct defines the nature of the data. In the social sciences, four broad classes of attributes can be distinguished (an adaptation of Talcott Parsons' General Theory of Action): (1) cognition, (2) evaluation, (3) affection/emotion, and (4) action (which is the most important category for theory but, paradoxically, not for measurement). A fifth class would contain more ascribed characteristics (or, as one might prefer to call them, "objective characteristics") of the units under study. For individuals, these might be gender, age, income or blood pressure; for organizations, number of employees, branch or turnover.
For objective characteristics measured in survey research, single descriptors from TheSoz are most often available for indexing. In most cases, topic and attribute form a measurable unit. Our indexing concept uses this feature of constructs for semantic indexing.
Subject and directive terms
Subject terms: specify the contents of the measurement; as specific as possible; combinations of terms, if necessary.
Directive terms: specify the attributes of the measurement; limited heterogeneity in directive terms to facilitate faceted retrieval.
Examples of directive terms
Cognition: PERCEPTION, KNOWLEDGE, AWARENESS, INTEREST, BELIEF, ORIENTATION
Evaluation: ATTITUDE, PREFERENCE, JUDGMENT, PREJUDICE, SATISFACTION, ACCEPTANCE/APPROVAL, REJECTION/REFUSAL
Affection: MOOD, FEAR, ANGER/ANNOYANCE, HAPPINESS, HATE, LOVE
Action: BEHAVIOR, USE/UTILIZATION, CHOICE, EXPERIENCE, INTERACTION, ACTIVITY, CONSTRUCTION/DESTRUCTION
The more the heterogeneity of directive terms is limited, the more efficient faceted search becomes.
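A closed list of directive terms can be turned into a lookup table that both validates terms and maps them to their attribute class. This is a sketch under assumptions: the class lists are copied from the slide, while treating each slash pair (e.g. ACCEPTANCE/APPROVAL) as two accepted spellings and rejecting anything else are our illustrative choices.

```python
# Directive terms grouped by attribute class, copied from the slide.
DIRECTIVE_CLASSES = {
    "Cognition": ["PERCEPTION", "KNOWLEDGE", "AWARENESS", "INTEREST", "BELIEF", "ORIENTATION"],
    "Evaluation": ["ATTITUDE", "PREFERENCE", "JUDGMENT", "PREJUDICE", "SATISFACTION",
                   "ACCEPTANCE/APPROVAL", "REJECTION/REFUSAL"],
    "Affection": ["MOOD", "FEAR", "ANGER/ANNOYANCE", "HAPPINESS", "HATE", "LOVE"],
    "Action": ["BEHAVIOR", "USE/UTILIZATION", "CHOICE", "EXPERIENCE", "INTERACTION",
               "ACTIVITY", "CONSTRUCTION/DESTRUCTION"],
}

# Invert to a lookup table: each directive term (and each slash variant) -> its class.
CLASS_OF: dict[str, str] = {}
for cls, entries in DIRECTIVE_CLASSES.items():
    for entry in entries:
        for variant in entry.split("/"):
            CLASS_OF[variant] = cls


def attribute_class(directive_term: str) -> str:
    """Return the attribute class; reject terms outside the closed list."""
    try:
        return CLASS_OF[directive_term.upper()]
    except KeyError:
        raise ValueError(f"{directive_term!r} is not an approved directive term") from None


print(attribute_class("perception"))  # Cognition
print(attribute_class("APPROVAL"))    # Evaluation
```

Rejecting unlisted terms keeps the directive vocabulary small, so a "refine by attribute" facet stays short and its counts stay meaningful.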
The complete syntax
Measurable unit (e.g. survey question): subject term(s) (ST) + directive terms (DT)
Precoordination/syntactic indexing = linked terms that are specified by role operators
Only exception: single descriptors from the thesaurus that include both the topic and the attribute (e.g. life satisfaction and/or happiness).
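The complete syntax can be read as a small record type. The following is a minimal sketch, not an existing GESIS schema: the class `MeasurableUnit` and its field names are hypothetical, and rendering DT before ST follows the order used in the corruption examples.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MeasurableUnit:
    """One survey question or variable, indexed with role-tagged, linked terms.

    Hypothetical class: the field names are illustrative only.
    """
    directive_terms: list[str] = field(default_factory=list)  # DT: the attribute (e.g. ATTITUDE)
    subject_terms: list[str] = field(default_factory=list)    # ST: the contents/topic
    # The exception: one thesaurus descriptor that already holds topic and attribute.
    combined_descriptor: Optional[str] = None

    def __post_init__(self) -> None:
        # Without a combined descriptor, a unit needs both roles to be measurable.
        if self.combined_descriptor is None and not (self.directive_terms and self.subject_terms):
            raise ValueError("need at least one DT and one ST, or a combined descriptor")

    def index_string(self) -> str:
        """Render the precoordinated string: DT first, then ST, joined by '; '."""
        if self.combined_descriptor is not None:
            return self.combined_descriptor.upper()
        return "; ".join(t.upper() for t in self.directive_terms + self.subject_terms)


print(MeasurableUnit(["attitude"], ["Palestinian state"]).index_string())
# ATTITUDE; PALESTINIAN STATE
print(MeasurableUnit(combined_descriptor="life satisfaction").index_string())
# LIFE SATISFACTION
```

The validation in `__post_init__` encodes the rule that a topic alone or an attribute alone is not a measurable unit, unless a single descriptor already combines both.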
Examples: corruption
"There is corruption in the national public institutions in Germany." (Eurobarometer 76.1; ZA5565)
Directive term: PERCEPTION
Subject term(s): CORRUPTION, PUBLIC INSTITUTIONS
Syntactic indexing: PERCEPTION; CORRUPTION; PUBLIC INSTITUTIONS
"Are you personally affected by corruption in your daily activities?" (Eurobarometer 76.1; ZA5565)
Directive term: EXPERIENCE
Subject term(s): CORRUPTION
Syntactic indexing: EXPERIENCE; CORRUPTION
In notation form, these examples result in the following syntactic indexing: 1.1 = DT: PERCEPTION & ST: (ST_1: CORRUPTION & ST_2: PUBLIC INSTITUTIONS); 1.2 = DT: EXPERIENCE & ST: CORRUPTION.
Conclusion
Thesaurus-based syntactic indexing helps us to reduce semantic ambiguity while retaining our level of specificity and depth in indexing.
Outlook
Our concept improves our study descriptions and enables new retrieval techniques.
Syntactic indexing can enhance faceted retrieval: search results can be refined by subject or directive terms.
Refine search / narrow results
Refine by topic of study: International politics (19); Conflict, security and peace (18); Society, culture (10)
Refine by questions
Refine by subject: Middle East (20); Conflict (19); Israel (19); Peace (19); Palestinian State (19); USA (18); Egypt (17)
Refine by intention: Attitude (15); Behaviour (12); Knowledge (9)
Refine by country: USA (15); Israel (10); France (8); Australia (3)
Refine by time period: 2003 (10); 2012 (5); 2008 (5); 2010 (3)
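Facet counts like those in the mock-up can be derived directly from role-tagged index records. This is a hedged sketch: the records below are made up and are not the data behind the counts shown, and reading "Refine by subject" and "Refine by intention" as ST and DT facets is our assumption.

```python
from collections import Counter

# Hypothetical records: made-up studies, NOT the data behind the mock-up counts.
# Each record stores its directive terms (DT) and subject terms (ST) as sets.
studies = [
    {"DT": {"ATTITUDE", "KNOWLEDGE"}, "ST": {"MIDDLE EAST", "PEACE"}},
    {"DT": {"ATTITUDE"}, "ST": {"MIDDLE EAST", "ISRAEL"}},
    {"DT": {"BEHAVIOR"}, "ST": {"ISRAEL"}},
]


def facet_counts(records: list[dict], role: str) -> Counter:
    """For one role (DT or ST), count how many records carry each term."""
    counts = Counter()
    for rec in records:
        counts.update(rec[role])  # a set, so each term counts once per record
    return counts


def refine(records: list[dict], role: str, term: str) -> list[dict]:
    """Narrow the result list to records that carry `term` in the given role."""
    return [rec for rec in records if term in rec[role]]


print(sorted(facet_counts(studies, "DT").items()))
# [('ATTITUDE', 2), ('BEHAVIOR', 1), ('KNOWLEDGE', 1)]
narrowed = refine(studies, "DT", "ATTITUDE")
print(sorted(facet_counts(narrowed, "ST").items()))
# [('ISRAEL', 1), ('MIDDLE EAST', 2), ('PEACE', 1)]
```

After each refinement, the facet counts are recomputed on the narrowed list, which is what lets a user drill down from "Attitude" to "Middle East" step by step.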
Thank you for your attention!
Dr. Pascal Siegers Tanja Friedrich