Module 7b: Extracting/Controlling Terms and Semantic Relationships IMT530: Organization of Information Resources Winter 2007 Michael Crandall.



Steps in Constructing CVs
Define your domain
Gather concepts
  – From user interviews, search logs, content analysis, preexisting vocabularies
Select your approach
Extract terminology
Control your terms
Organize your terms
Maintain, maintain, maintain

Elements of Building CVs
Select your approach
  – Pre- or post-coordinated (“sixteenth century lute music” vs. “sixteenth century” AND “lutes” AND “music”)
  – Open or closed (whether or not indexers can add terms)
  – Enumeration vs. synthesis (facets)
Extract terms
  – Warrant (from users, the domain, or both)
Control terms
  – Specificity (cats or Siamese cats?)
  – Control of homographs (qualifications)
  – Term consistency and word form (plurals, etc.)
  – Multiword/phrase sequence and form (inverted or normal form?)
  – Term definitions (scope notes)
  – Syntax (citation order)
  – Semantic factoring
Organize terms
  – Semantic relationships
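
The pre- vs. post-coordination choice above can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the document IDs, both index dictionaries, and the function names are hypothetical. In the post-coordinated index the searcher combines single-concept terms at search time; in the pre-coordinated index the indexer has already combined them into one heading.

```python
# Hypothetical indexes: document ID -> set of assigned index terms.
POST_COORDINATED = {
    "doc1": {"sixteenth century", "lutes", "music"},
    "doc2": {"sixteenth century", "music"},
    "doc3": {"lutes", "music"},
}
PRE_COORDINATED = {
    "doc1": {"sixteenth century lute music"},
    "doc2": {"sixteenth century music"},
    "doc3": {"lute music"},
}


def search_post(index, *terms):
    """Return documents indexed with ALL of the given terms (Boolean AND)."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in index.items() if wanted <= assigned)


def search_pre(index, heading):
    """Return documents that carry the exact pre-coordinated heading."""
    return sorted(doc for doc, assigned in index.items() if heading in assigned)


print(search_post(POST_COORDINATED, "sixteenth century", "lutes", "music"))
print(search_pre(PRE_COORDINATED, "sixteenth century lute music"))
```

Both searches find the same document, but the post-coordinated index can also answer "lutes AND music" without anyone having created a heading for that combination in advance.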

Extracting Terminology

Sources and Origins of Terminology
Where do you get terms for a controlled vocabulary?
The sources and origins of terminology may be set out in explicit statements of warrant
Making a conscious decision about warrant demonstrates that, as a CV designer, you are aware of the different possibilities and have made considered choices

Warrant
Warrant is “the authority that is used to justify decisions about what is included in a system” (Clare Beghtol)
Types of warrant (Beghtol, 2002):
  – Literary warrant
  – User warrant
  – Scholarly warrant
  – Cultural warrant

Literary & User Warrant
Literary warrant
  – Terms or organization reflect, or are taken directly from, the resources themselves; this includes dictionaries, encyclopedias, etc. on a topic
User (aka use, enquiry) warrant
  – Terms or organization reflect use; user terminology may (or may not) be taken directly from logs of system use or from personal interactions with users

Scholarly & Cultural Warrant
Scholarly warrant
  – Terms or organization reflect the opinions of a panel of human experts
Cultural warrant
  – Terms or organization are derived from cultural practice or understanding; for example, Dewey and LCSH reflect American/Western cultural bias, while Colon Classification reflects Indian/Eastern cultural bias (this can also be partly a function of literary warrant)

Term Control

Term control
  – Specificity (cats or Siamese cats?)
  – Control of homographs (qualifications)
  – Term consistency and word form (plurals, etc.)
  – Multiword/phrase sequence and form (inverted or normal form?)
  – Term definitions (scope notes)
  – Syntax (citation order)
  – Semantic factoring

Specificity
Depends on user needs and time available
Should be consistent throughout the CV to avoid user confusion
May be influenced by choice of approach
  – If faceted, some facets may be more specific than others
  – If hierarchical, you should be consistent throughout

Homographs
Sometimes a single word or phrase has multiple meanings: e.g., “power”, “drum”, “Java”, “Jupiter”
Controlled vocabularies “disambiguate” these terms so that each term has a single meaning
  – In thesauri & subject heading lists, parenthetical qualifiers are added, e.g. these LCSH terms: “Power (Mechanics)”; “Power (Christian theology)”; “Power (Social Sciences)”; “Power (Philosophy)”
  – In taxonomies and classifications, the meaning of a homograph is contextualized by its placement in a particular hierarchy (following the example above, Power will appear in the Philosophy, Christianity, Social Sciences, and Mechanics hierarchies, and the terms themselves are disambiguated by virtue of their location and thus their different notation)
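
As a rough illustration of parenthetical qualifiers, the Python sketch below splits a qualified term into its word and qualifier and then lists the words that occur under more than one qualifier. The regular expression, the function names, and the assumption that every qualifier is the final parenthesized part of the term are choices made for this example, not rules taken from LCSH.

```python
import re

# Assumed shape for this sketch: "Word (Qualifier)" with one trailing qualifier.
QUALIFIED = re.compile(r"^(?P<word>.+?) \((?P<qualifier>[^()]+)\)$")


def split_qualifier(term):
    """Return (word, qualifier); qualifier is None for unqualified terms."""
    match = QUALIFIED.match(term)
    if match:
        return match.group("word"), match.group("qualifier")
    return term, None


def homographs(terms):
    """Map each word that has two or more qualifiers to its qualifier list."""
    groups = {}
    for term in terms:
        word, qualifier = split_qualifier(term)
        if qualifier is not None:
            groups.setdefault(word, []).append(qualifier)
    return {word: quals for word, quals in groups.items() if len(quals) > 1}


TERMS = [
    "Power (Mechanics)",
    "Power (Christian theology)",
    "Power (Social Sciences)",
    "Power (Philosophy)",
    "Creativity",
]
print(homographs(TERMS))
```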

Word Form
Single-word form should be consistent
  – Choose verbs or nouns
  – Singular or plural
  – Standard form
Phrases should be in a standard form
  – Either direct (Constitutional government)
  – Or inverted (government, constitutional)
The inverted form allows closer grouping of like terms in an alphabetic display; it is not used much anymore
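
The effect of inverted phrase form on alphabetic display can be shown with a small Python sketch. It assumes, purely for illustration, that a phrase is a modifier followed by a single-word head noun and that the inverted form is "head, modifier"; the function names and the sample phrases other than "constitutional government" are hypothetical.

```python
def to_inverted(term):
    """Turn 'modifier head' into 'head, modifier' (last word taken as head)."""
    modifier, sep, head = term.rpartition(" ")
    if not sep:  # single word: nothing to invert
        return term
    return f"{head}, {modifier}"


def to_direct(term):
    """Turn 'head, modifier' back into 'modifier head'."""
    head, sep, modifier = term.partition(", ")
    if not sep:  # no comma: already in direct form
        return term
    return f"{modifier} {head}"


phrases = ["constitutional government", "local government", "municipal bonds"]
for entry in sorted(to_inverted(p) for p in phrases):
    print(entry)
```

Sorted in direct form, the two "government" phrases are separated by "municipal bonds"; in inverted form they file next to each other under "government".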

Scope Notes
Scope notes are term definitions in a thesaurus or controlled vocabulary
Scope notes tell indexers the precise meaning of a term, and help users know whether they are searching on the correct term

Syntax
Syntax describes how terms are built (especially how multiple concepts may be combined) and citation order (the order of facets)
  – Syntax is an issue when concepts are pre-coordinated in an indexing term (whether the syntax is consistent or not)
  – Syntax is an issue for CVs that use synthesis with facets, in that the rules for synthesis (also called citation order in classification schemes) determine term syntax

Semantic Factoring
“The process of analyzing some or all of the categories of an ontology into a collection of primitives” Sowa, J. F. (2003). Ontology. Glossary.
Essentially, you are trying to decompose terms into their elemental concepts, to minimize duplication and maximize reuse
  – For example: ship = vehicle + water transport
  – Not always possible, especially with non-concrete concepts
“Creating a thesaurus without doing semantic factoring is like trying to put together furniture from Ikea without following the instructions. You will get interesting configurations, but you will not save time.” Ezzo, J. (2005) Bella and Yakov and Tillie's Panties: What I Learned in “Construction and Maintenance of Indexing Languages and Thesauri” Bulletin of the American Society for Information Science and Technology 31(4) April/May
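
A minimal Python sketch of semantic factoring follows, assuming each compound term can be represented as a set of primitive concepts. Only the "ship" entry comes from the slide; the other two entries and the function names are hypothetical and exist only to show how factored primitives get reused.

```python
# Compound term -> set of primitive concepts it factors into.
FACTORED = {
    "ship": {"vehicle", "water transport"},           # from the slide
    "aircraft": {"vehicle", "air transport"},         # hypothetical
    "ferry service": {"service", "water transport"},  # hypothetical
}


def primitives(factored):
    """Return every primitive concept used by at least one term."""
    return set().union(*factored.values())


def terms_using(factored, primitive):
    """Return the compound terms that reuse the given primitive."""
    return sorted(term for term, parts in factored.items() if primitive in parts)


print(sorted(primitives(FACTORED)))
print(terms_using(FACTORED, "water transport"))
```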

Relationships in CVs

Relationships in Controlled Vocabularies
There are three major types of relationships between subject concepts
  – Equivalence relationships
  – Hierarchical relationships
  – Associative relationships

Equivalence Relationships
In natural language, one word or phrase can refer to more than one concept, and multiple terms can refer to a single concept
In other words, there is no one-to-one correspondence between words/phrases and concepts

Preferred Terms and Cross References (Synonyms)
Controlled vocabularies create a one-to-one relationship between terms and concepts by controlling synonyms (multiple words or phrases that share similar meaning)
To do this we:
  – Select a preferred term (descriptor, subject heading)
  – Create cross references from non-preferred terms (entry vocabulary, lead-in terms)

Example Equivalence Display
Sample display for the descriptor (preferred term) “Creativity” from the ERIC Thesaurus:
  Creativity
    UF Creative ability
       Originality
If you searched on “Originality” or “Creative ability” in the ERIC database, you would see these references:
  – “Creative ability” see “Creativity” OR
  – “Originality” use “Creativity”
In other words, you would be led from the unused (lead-in) terms to the used (preferred) term.

Equivalence Relationships - Summary
Exist between words or phrases that share the same (or similar) meaning
Equivalent terms are treated as synonymous (whether they actually are or not)
When controlling vocabulary, one equivalent term is selected as the preferred term (e.g., descriptor); the other equivalent terms are treated as “lead-in” terms or cross references
References used in the CV to show equivalence relationships include “UF” (use for), “Use”, “See”, and “Search under”
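
The equivalence mechanism summarized above amounts to a lookup from every entry term to one preferred term. The Python sketch below builds that lookup from UF (use for) lists, using the ERIC “Creativity” example from the previous slide. The data structure, the function names, and the case-insensitive matching are assumptions of this sketch, not a description of how the ERIC database works.

```python
# Preferred term -> its UF (use for) lead-in terms.
USE_FOR = {
    "Creativity": ["Creative ability", "Originality"],
}


def build_entry_vocabulary(use_for):
    """Map every preferred and lead-in term (case-folded) to a preferred term."""
    lookup = {}
    for preferred, lead_ins in use_for.items():
        for term in [preferred, *lead_ins]:
            key = term.casefold()
            # One-to-one control: a term may lead to only one preferred term.
            if lookup.get(key, preferred) != preferred:
                raise ValueError(f"{term!r} points to two preferred terms")
            lookup[key] = preferred
    return lookup


def resolve(lookup, query):
    """Return the preferred term for a query, or None if it is not in the CV."""
    return lookup.get(query.strip().casefold())


ENTRY_VOCABULARY = build_entry_vocabulary(USE_FOR)
print(resolve(ENTRY_VOCABULARY, "Originality"))
```

Raising an error on a conflicting lead-in term is one possible design choice; it enforces the rule that each entry term leads to exactly one preferred term.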

Hierarchical Relationships
May be strictly defined as:
  – Genus-species (also called class inclusion or “is-a”) relationships
  – Whole-part relationships (sometimes these are treated as associative relationships)

Hierarchical Relationships
May be illustrated by set notation: Set G (green) is a subset of Set B (blue)
  – All Gs are also Bs (in other words, a G is a B)
  – Using a real-world analogy, if Gs are gorillas and Bs are animals, all gorillas are animals

Ideal CV Hierarchical Relationships
Ideally, all hierarchical relationships indicated in a controlled vocabulary are also controlled and defined as genus-species (and sometimes also whole-part) relationships
ALL other relationships between terms are associative relationships
In real-life CVs, this is not always the case!

References for Hierarchical Relationships
Hierarchically related terms are shown by the BT (broader term), NT (narrower term), and sometimes See also/Search also references.
Examples of two entries in the ERIC thesaurus:
  Creativity
    BT Psychological characteristics
  Psychological characteristics
    NT Creativity
       Intelligence
       Cognitive style

BTs & NTs
In the previous slide, both Creativity and Psychological characteristics are preferred terms
Each has its own display; the Creativity display (Creativity as a preferred term) shows the reference to the broader preferred term “Psychological characteristics”
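
Because BT and NT are two views of the same link, one way to keep the two displays consistent is to store only the BT reference and derive the NT lists from it. The Python sketch below does this for the ERIC entries shown above. Storing a single BT per term, and treating Intelligence and Cognitive style as having the same BT as Creativity (inferred from the NT display), are assumptions of this example.

```python
from collections import defaultdict

# Narrower term -> its broader term (BT).
BROADER = {
    "Creativity": "Psychological characteristics",
    "Intelligence": "Psychological characteristics",
    "Cognitive style": "Psychological characteristics",
}


def narrower_index(broader):
    """Derive NT lists by inverting the BT references."""
    index = defaultdict(list)
    for term, broader_term in broader.items():
        index[broader_term].append(term)
    return {term: sorted(children) for term, children in index.items()}


NARROWER = narrower_index(BROADER)
print(NARROWER["Psychological characteristics"])
```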

Testing for Hierarchical Relationships
To test for a hierarchical relationship between terms, use the “is-a” test.
The relationship between Robin and Bird? (A robin is a (type of) bird, so the relationship is hierarchical; Bird is the broader term, Robin is the narrower)
The relationship between Water and Hydronomy? (Water is not a hydronomy or a type of hydronomy; Hydronomy is not a water or a type of water; so the relationship here is associative)

Examples of Hierarchical Relationships
What is the relationship between these sets of terms?
  – Books and library materials
  – Water and floods
  – Buildings and chimneys
  – Painting and acrylic paints
  – Water and groundwater

Answers
Books and library materials: hierarchical
Water and floods: associative, because a flood is not the same type of thing as water (one way you can tell is that one is a count noun and the other is not); hierarchical may be acceptable depending on context
Buildings and chimneys: hierarchical if you include whole-part relationships; associative if you don’t
Painting and acrylic paints: associative
Water and groundwater: hierarchical

More on Hierarchical Relationships
Hierarchical force is a characteristic of the relationship between terms that are strictly hierarchically related (genus-species only, not whole-part)
When a narrower term (NT) is hierarchically related to a broader term, the narrower term inherits all of the characteristics of the terms above it in the hierarchy
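
Hierarchical force means that the “is-a” test holds all the way up a genus-species chain, not just for the immediate BT. The Python sketch below walks the BT references to collect every broader term. The sample hierarchy extends the robin/bird and gorilla/animal examples from the earlier slides with an assumed Bird-to-Animal link, and the function names are hypothetical.

```python
# Narrower term -> broader term, genus-species links only.
BROADER = {
    "Robin": "Bird",
    "Bird": "Animal",    # assumed link, added for this illustration
    "Gorilla": "Animal",
}


def ancestors(term, broader):
    """Return all broader terms above `term`, nearest first."""
    chain = []
    seen = {term}
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against a circular hierarchy
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def is_a(narrower, candidate, broader):
    """The 'is-a' test: is `candidate` anywhere above `narrower`?"""
    return candidate in ancestors(narrower, broader)


print(ancestors("Robin", BROADER))
print(is_a("Robin", "Animal", BROADER))
```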

Associative Relationships
Include all relationships not encompassed by equivalence and hierarchical relationships
In controlled vocabularies, these relationships are shown by the following references:
  – Related Term (RT), See Also (SA)
Examples of types of associative relationships (there are many of these!):
  – Thing and property (rubber, elasticity)
  – Complementary activities (teaching, learning)
  – Agent and activity (artist, painting)

Associative Relationships
Many of these are semantic relationships
Some of these are syntactic relationships too:
  – Children see related term Games
Problems: when to stop? How close in meaning or syntactic relation do two terms have to be to show them in a CV?
Note: associative relationships are rarely shown in classifications & taxonomies

Example Associative Relationship Display
From the ERIC thesaurus:
  Comprehension
    RT Concept formation
       Misconceptions
       Scientific literacy
       Thinking skills
Again, remember that both Comprehension and all of the RTs are preferred terms; however, this is the display for the preferred term Comprehension
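
Since every RT is itself a preferred term with its own display, a thesaurus tool has to record the association from both sides. The Python sketch below stores RT links symmetrically and prints a display in the layout used above. Treating RT as reciprocal is an assumption of this sketch (it is common thesaurus practice, but the slide does not state it), and the function names are hypothetical.

```python
def add_related(rt, term_a, term_b):
    """Record an RT link in both directions."""
    rt.setdefault(term_a, set()).add(term_b)
    rt.setdefault(term_b, set()).add(term_a)


def display(rt, term):
    """Format the RT display for one preferred term."""
    lines = [term]
    for position, related in enumerate(sorted(rt.get(term, ()))):
        prefix = "  RT " if position == 0 else "     "
        lines.append(prefix + related)
    return "\n".join(lines)


RELATED = {}
for other in ("Concept formation", "Misconceptions",
              "Scientific literacy", "Thinking skills"):
    add_related(RELATED, "Comprehension", other)

print(display(RELATED, "Comprehension"))
print(display(RELATED, "Thinking skills"))
```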

Some Guidelines
Does the taxonomy cover the domain appropriately? Is it within scope?
Do draft definitions for concepts express them clearly?
Are duplicate concepts removed?
Are basic-level concepts represented? Does extracted terminology express them?
Is the structure useful and sensible?

Questions?
If not, take a break!

Exercise 7b
Take your term lists from last week, and use those in Exercise 7b to begin building a controlled vocabulary
Turn in your initial controlled vocabularies before Tuesday via