Published by Blaise Blake. Modified over 9 years ago.
1
IMT530 - Organization of Information Resources
Recap
Descriptive metadata elements can be used for access or selection
For access, it is important to have good authority control to enable users to:
–Find known items from the information they have available
–Gather all the items of a similar nature together
–Choose the right one from among retrieved items
Authority control takes time and effort, but pays off in better results for users
–Need to balance cost against benefits and make a decision on your approach for each project
–Don’t do it halfway, because it’s not worth it
2
Module 5b: Subject Analysis and Indexing
IMT530: Organization of Information Resources
Winter 2008
Michael Crandall
3
Module 5b Outline
Subject analysis
–Definition
–Why do this?
–Mai’s domain-centered analysis
–Consistency
Subject indexing
–Definition and purpose of subject indexing
–Types of subject indexing
–Indexing non-text objects
–Types of terms used in subject indexing
–The subject indexing process
4
Some Questions
Library catalogs often lump fiction into one subject heading. Why?
Would you describe the subject of “The Organization of Information” to your mother the same way you would to a classmate?
Would you use the same subjects to describe Chapter 9 in Taylor that you would to describe the whole book?
If you wanted to assign a subject to your kitchen or garage, what would it be?
What if you had to describe snow to a Papua New Guinea native? What words would you use? Would they be the same for an Inuit?
How do you describe the subject of a picture or film?
5
Subject Analysis - Definition
The process of determining the subject and other content-related attributes of an object
The purpose of subject analysis is to come to an understanding of or judgment regarding:
–what an object is about, in the context of how it might be used;
–what an object exemplifies;
–what discipline (or other aspect, including community) an object reflects (for classification)
6
Why Subject Analysis?
One of the primary means of access to information is through “subjects”
For a computer to give access to those subjects, there has to be some way to get to them: an index of some kind
–Remember Soergel’s model, and the necessity for a means to match user requests to information objects
Automatic indexing works for some situations, but not all
–As we’ll see, subject concepts are not necessarily contained in words (especially not in images!)
–A specific audience may dictate specific analysis
7
Wilson on Subjects
One of the main purposes of Wilson’s chapter on subjects is to analyze the subject analysis process, to take it apart
He starts with the words, then the sentences, then the work itself, and asks how you can elicit descriptions of “aboutness”
Wilson suggests four different ways to approach this:
–Purposive: why did the author write the work?
–Figure-ground: what stands out among all the possible subjects?
–Objective: count what is most frequently mentioned
–Appeal to unity and completeness: what questions are answered within the work?
Ultimately, he concludes that any extraction will miss some part of the work, and will not satisfy some user
8
Subject Analysis in Context
Subject analysis should always be done in context
Context considerations include:
–user (children, medical practitioners, etc.)
–uses (developing egg substitutes, learning how to cook)
–the document itself (the “text” of a document, intended audience, uses, etc.)
–institution (public library, corporate intranet)
–administrative and information systems context
9
Mai’s Domain-Centered Approach
10
Relevance
Taylor’s stages in development of an information need
–The visceral need
–The conscious need
–The formalized need
–The compromised need
Relevance is usually measured against the last of these, while ignoring the more complex situational aspects that affect the other stages
–Mai concludes that evaluation should be less mechanistic (focused on terminology matches) and more humanistic (focused on the visceral needs)
–Requires contextual analysis and qualitative research rather than just precision/recall measures
11
Consistency
Taylor points out the difficulty of getting people to assign similar subjects to objects
But when controlled vocabularies and rules for selecting subject terms from those vocabularies are used, consistency is much better
–Assumes trained subject indexers
–Not likely to be the case in most settings other than libraries
–Again points out the need to determine what your objectives in building a taxonomy are before you make the investment
So how do you go about subject indexing?
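One way to make “consistency” concrete is to compare the term sets two indexers assign to the same object. The sketch below uses a simple overlap ratio (shared terms divided by all distinct terms). The choice of measure, the function name, and the sample terms are assumptions for illustration, not something the slide or Taylor prescribes.

```python
def indexing_consistency(terms_a, terms_b):
    """Overlap between two indexers' term sets: shared terms / all distinct terms."""
    # Normalize case and whitespace so trivial variants count as the same term.
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    if not a and not b:
        return 1.0  # two empty term sets agree trivially
    return len(a & b) / len(a | b)


# Hypothetical terms assigned by two indexers to the same article.
indexer_1 = ["Marriage", "Job satisfaction", "Commuting"]
indexer_2 = ["marriage", "Dual-career families", "Job satisfaction"]

print(indexing_consistency(indexer_1, indexer_2))
```

A score of 1.0 means the two indexers chose exactly the same terms, and 0.0 means they share none.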
12
Definition and Purpose of Subject Indexing
Subject indexing is the process or technique of identifying and selecting terms (words, phrases, sentences, taxonomic categories, notation) used in a domain of information to indicate the subject content of a resource for users and to provide subject access
Purposes of subject indexing may be seen in light of Cutter’s objects of the catalog:
–To facilitate finding a particular object on the basis of its subject content (finding function)
–To display to a user all of the objects that exhibit particular subject content (collocating function)
–To aid a user in the selection of a particular object (choice function)
13
Rowley Article
Trade-off between precision and recall
Four eras in indexing
–Era 1: Pre-computer access: title indexing
–Era 2: Online age: Cranfield and other retrieval studies showed free indexing worked as well as controlled indexing in abstract databases
–Era 3: Full-text vs. subject indexing: shown to complement each other (Taylor also points out the trade-off between summarization for document retrieval vs. depth indexing for information retrieval)
–Era 4: Tests with real users instead of controlled experiments: difficulty in using search interfaces because of complex and varied systems
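The precision/recall trade-off can be seen with a small calculation. Precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The document IDs and the two searches below are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are actually relevant to the request.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow search returns few items, all of them relevant.
narrow = precision_recall({"d1", "d2"}, relevant)

# A broad search finds everything relevant, plus as many irrelevant items.
broad = precision_recall({"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}, relevant)

print("narrow:", narrow)
print("broad:", broad)
```

The narrow search scores high on precision and low on recall. The broad search reverses that, which is the trade-off in miniature.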
14
Types of Subject Indexing: Derived Indexing
Derived indexing: terms used for indexing are limited to those that actually appear in the document or resource
Derived indexing may be done manually or automatically
–Search engine indexes are examples of automatic derived indexing
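A minimal sketch of automatic derived indexing, assuming a tiny made-up stopword list and two made-up documents. Every index term comes from the words of the document itself, so a document can only be found through words it actually contains.

```python
import re
from collections import defaultdict

# Illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for"}


def derive_terms(text):
    """Derived indexing: keep only words that appear in the text itself."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def build_index(docs):
    """Map each derived term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in derive_terms(text):
            index[term].add(doc_id)
    return index


# Hypothetical documents.
docs = {
    "doc1": "The organization of information in libraries",
    "doc2": "Indexing images and film",
}
index = build_index(docs)

print(sorted(index["information"]))
# doc2 is about pictures, but the word "pictures" never occurs in it.
print(sorted(index.get("pictures", set())))
```

The second lookup comes back empty, which illustrates the limitation noted earlier: subject concepts are not necessarily contained in the words of the object.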
15
Assigned Indexing
Assigned indexing: terms used for indexing are not limited to those in the object, but may come from the object, the mind of the indexer, or from a controlled vocabulary
There are two types of assigned indexing: free indexing and indexing from controlled vocabularies
16
Free Indexing
In free indexing, the indexer or indexing program is free to assign terms from anywhere inside or outside the object
–The indexer may take terms from the object, or use any terms that occur to them
–In some “free” indexing settings, very detailed instructions guide indexers in their selection of terms
–Other settings are much looser: users can pick any terms that mean something to them or others
Pictures (http://flickr.com)
Folksonomies (http://del.icio.us)
17
Controlled Vocabulary Indexing
In indexing from controlled vocabularies, indexers are constrained by the terms that are available in lists of terms called “controlled vocabularies”: they must assign one or more terms from the controlled vocabulary
Controlled vocabulary indexing is much like choosing terms from a very large drop-down menu
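The “drop-down menu” constraint can be sketched as a lookup: a candidate term is accepted only if the vocabulary knows it, and variant entry terms are mapped to a single preferred term. The mini-vocabulary and the function name below are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical mini-vocabulary: entry term -> preferred term.
CONTROLLED_VOCABULARY = {
    "marriage": "Marriage",
    "matrimony": "Marriage",
    "job satisfaction": "Job Satisfaction",
    "work satisfaction": "Job Satisfaction",
}


def assign_terms(candidates):
    """Return (assigned preferred terms, rejected candidates)."""
    assigned, rejected = [], []
    for candidate in candidates:
        preferred = CONTROLLED_VOCABULARY.get(candidate.strip().lower())
        if preferred is None:
            rejected.append(candidate)   # not in the vocabulary: cannot be used
        elif preferred not in assigned:
            assigned.append(preferred)   # variants collapse to one preferred term
    return assigned, rejected


assigned, rejected = assign_terms(
    ["Matrimony", "marriage", "work satisfaction", "commuting"]
)
print(assigned)
print(rejected)
```

Rejected candidates are useful to keep: they show where the vocabulary may need a new term or a new entry term.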
18
Automatic Indexing
In automatic indexing, it is common for indexing software applications to use derived indexing techniques only, enhanced with word stemming and spelling algorithms to improve matching
However, more advanced programs are being developed that mimic free indexing (e.g., text summarization programs)
Some advanced automatic indexing programs (particularly those in medicine) are making use of controlled vocabularies in term selection and identification
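To show what word stemming adds to derived indexing, here is a deliberately crude suffix stripper. It is an illustrative assumption, not a real stemming algorithm such as those used in production search software, and it will over-stem or under-stem many words.

```python
# Suffixes are checked in this order; the list is illustrative only.
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three letters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, text):
    """True if the stemmed query word equals the stem of any word in the text."""
    stems = {crude_stem(w) for w in text.split()}
    return crude_stem(query) in stems


for word in ["indexing", "indexers", "indexes", "indexed"]:
    print(word, "->", crude_stem(word))

# Without stemming, "indexing" would not match a text that only says "indexers".
print(matches("indexing", "Trained indexers assign terms"))
```

Because all four word forms reduce to the same stem, a query using one form can match documents using another.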
19
Mai’s Conceptions of Indexing
Simplistic conception of indexing
–automatic extraction (derived indexing)
Document-oriented indexing
–focus on document & document parts
Content-oriented indexing
–focus on content in document (still document oriented)
User-oriented indexing
–focus on user & possible uses of the document
Requirement-oriented indexing
–relies on in-depth knowledge of users & uses of documents; complete knowledge of context
20
Types of Terms Used in Subject Indexing
Words or short phrases
–descriptors, identifiers, subject headings, or keywords
Sentences
–abstracts, summaries, or annotations
–derived indexing may use whole sentences, but this is rarely done (used in some web documents and for derived abstracts)
Taxonomic categories (such as the type used in the Yahoo directory)
Notation (such as the type used in the Dewey Decimal Classification)
21
Sample ERIC Indexing Record
PERSONAL AUTHOR: Magnuson,-Sandy; Norem,-Ken
TITLE: Challenges for Higher Education Couples in Commuter Marriages: Insights for Couples and Counselors Who Work with Them.
PUBLICATION YEAR: 1999
SOURCE (JOURNAL CITATION): Family-Journal:-Counseling-and-Therapy-for-Couples-and-Families; v7 n2 p125-34 Apr 1999
DOCUMENT TYPE: Journal-Articles (080); Reports-Research (143)
LANGUAGE: English
MAJOR DESCRIPTORS: *Counseling-Techniques; *Dual-Career-Family; *Job-Satisfaction; *Marital-Satisfaction; *Marriage-
MINOR DESCRIPTORS: Trust-Psychology
MAJOR IDENTIFIERS: *Career-Commitment
MINOR IDENTIFIERS: Quality-Time
ABSTRACT: Focuses on the experiences of dual-career couples that maintain two homes to attain career satisfaction. Findings include support for the potential strength and satisfaction of commuting relationships. Trust, commitment, regular communication, and quality shared time were endorsed as factors contributing to successful distance marriages. (Author/GCP)
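A record like this is easy to process once its conventions are known. The sketch below assumes, as the sample suggests, that terms in a field are separated by semicolons and that a leading asterisk marks a major term. The function name and the combined sample field are hypothetical.

```python
def parse_descriptors(field):
    """Split a semicolon-separated term field into (major, minor) lists."""
    major, minor = [], []
    for raw in field.split(";"):
        term = raw.strip()
        if not term:
            continue
        if term.startswith("*"):
            major.append(term[1:])   # assumed convention: "*" marks a major term
        else:
            minor.append(term)
    return major, minor


# Terms taken from the sample record, combined into one field for illustration.
field = "*Counseling-Techniques; *Dual-Career-Family; Trust-Psychology"
major, minor = parse_descriptors(field)
print(major)
print(minor)
```

Separating major from minor terms lets a search system weight the main subjects of a document more heavily than the subjects it only touches on.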
22
Indexing Non-text Objects
Layne discusses the indexing of images and points out some useful distinctions
–Defines four general types of attributes: biographical, subject, exemplified, and relationship
–While she discusses these in the context of images, they can prove useful when indexing almost any object
23
Identification of Concepts
Taylor lists several concepts that can be helpful in teasing out subject terms
–Topics
–Names (persons, corporations, geographic, other)
–Time periods
–Form (genre): http://isotropic.org/papers/chicken.pdf
See the appendix in Taylor for an example and checklist
24
Indexing Policies
Many indexers are guided by indexing policies that determine the types of terms that are finally used in indexing
Three characteristics of indexing upon which indexing policies may be built:
–Exhaustivity
–Specific entry (sometimes called “specificity”, but incorrectly)
–Coextensivity
25
ISO 5963
Despite Wilson’s assertion that subject analysis is impossible, a variety of standards exist prescribing how it should be done. The British Standard ISO 5963 in your readings this week is one of them
Viewed from Wilson’s or Mai’s perspective (and your own), what are the problems with this standard?
27
Steps in Free and Assigned Indexing
1. Identify subject content
2. Identify disciplinary context or domain (for classifications or taxonomies)
3. Express or describe content (steps 1-3 describe the subject analysis process)
4. Select or create terms and add them to the document representation
5. If working with a controlled vocabulary (CV), update and maintain the CV based on the indexing experience
28
Questions?
If not, take a break!
29
Exercise 5
The purpose is to try different methods of extracting concepts from an article, so you can see the impact on users
Spend the rest of class working through the questions in Exercise 5
We’ll discuss before the end of class
30
Differences
Hopefully, this exercise gave you a chance to see a couple of things:
–How difficult it can be to actually determine what something is about
–How different methods of assigning terms would result in very different access for users
We didn’t include Mai’s perspective on domain indexing in this exercise, which makes it even more difficult
–This is obviously not a simple thing to do well
–But you are now aware of the issues, and can keep them in mind when working in this area
31
Next Week
We’ll start looking in more detail at controlled vocabularies and discuss how they might interact with emergent social tagging systems
Remember to read assignments BEFORE class
Important: your mid-term assignments are due at the start of class next week!