LIS 677 Indexing with a Controlled Vocabulary: Basic Concepts

Indexing: Topics Covered
The “concept triangle”
The five-axiom theory of indexing
The indexing process

The “Concept Triangle”
Referent
Concept
Expression

The Referent
“The referent is everything about which a meaningful statement can be made.”
For example, many statements can be made about a certain table: the material of which it is made, its price, purpose, producer, weight, the structure of its surface, etc.

The Concept
“We define the concept as the sum of the essential statements that can be made about a referent.”
Essential statements are those which contribute to the characterization of the referent itself.
Inessential statements are those which do not contribute to the characterization of the referent itself.

Kinds of Concepts
General concepts: The general concept describes a class of interrelated referents. For example: metal, oxidation, information.
Individual concepts: The individual concept is one to which no meaningful conceptual feature can be added. For example: Albert Einstein; Fritz the Cat.

General vs. Individual Concepts in Indexing
“It is the task of subject indexes to provide access to documents or text passages relevant to general concepts.”
“An information system which works quite well for individual concepts, may totally fail when it is required to manage general concepts too.”

The Mode of Expression
Lexical expressions: linear strings of characters commonly agreed upon to express concepts or concept connections
Non-lexical expressions: linear strings of characters by which concepts or concept relations are expressed and upon which no firm agreement has been made

Forms of Expression & Indexing
Lexical expressions require little indexing work
  Often appear in Identifier fields rather than in Descriptor fields of database records
Non-lexical expressions require indexing work
  Non-lexical expressions exhibit ambiguity and multiplicity

Concepts & Expressions
Individual concepts are almost always expressed lexically
General concepts are almost always expressed non-lexically
In natural, uncontrolled language there is an unlimited multitude of non-lexical, paraphrasing expressions for concepts
Multiplicity & ambiguity of natural language expressions are largely restricted to general concepts
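
The multiplicity described above is what a controlled vocabulary is meant to absorb: many paraphrasing expressions are routed to one agreed descriptor. The following is a minimal sketch of that idea, not a method given in the lecture. The table `ENTRY_VOCABULARY`, the function names, and the sample paraphrases are all hypothetical and chosen only for illustration.

```python
# Hypothetical entry vocabulary: each free-text variant points to one
# preferred descriptor. The variants are illustrative, not from the lecture.
ENTRY_VOCABULARY = {
    "oxidation": "oxidation",
    "oxidising": "oxidation",
    "oxidative reaction": "oxidation",
    "being oxidized": "oxidation",
}


def to_descriptor(expression):
    """Return the preferred descriptor for an expression, or None if unmapped."""
    return ENTRY_VOCABULARY.get(expression.strip().lower())


def index_terms(expressions):
    """Collapse free-text expressions into a sorted list of distinct descriptors."""
    found = {to_descriptor(e) for e in expressions}
    found.discard(None)  # expressions with no agreed descriptor are left out
    return sorted(found)


# Three different wordings, one of them unknown to the vocabulary.
print(index_terms(["Oxidising", "oxidative reaction", "unknown phrase"]))
```

A searcher then only has to predict the single descriptor rather than every wording an author might have used.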

Five-Axiom Theory of Indexing
Definability
Order
Sufficient degree of order
Representational predictability
Representational fidelity

Axiom of Definability
The compilation of information relevant to a topic can be delegated (to a skilled specialist or a programmed search mechanism) only to the extent to which the inquirer can define the topic in terms of concepts and concept relations.

Axiom of Order
Any compilation of information relevant to a topic is an order-creating process.
Order is defined as the meaningful proximity of the parts of a whole at a foreseeable place.

Axiom of Sufficient Degree of Order
The demands made on the degree of order increase as the size of the collection, the frequency of the searches, and/or the specificity of the searches increase.

Axiom of Representational Predictability
The completeness of any search for documents relevant to a topic of interest depends on the predictability of the modes of expression for concepts in the search file.
Successful searches require a language with predictable modes of expression for concepts.

Axiom of Representational Fidelity
The precision of any search for documents relevant to a topic of interest depends on the fidelity with which the modes of expression for concepts can be expressed in the system’s language.
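
The two representational axioms speak of the completeness and the precision of a search. In information retrieval these are commonly measured as recall and precision. The sketch below shows the usual set-based definitions. It is an illustration, not part of the lecture. The function names, the document IDs, and the convention of returning 1.0 for an empty denominator are assumptions.

```python
def recall(retrieved, relevant):
    """Completeness: share of the relevant documents that the search found."""
    if not relevant:
        return 1.0  # assumed convention: nothing to find, nothing missed
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Precision: share of the retrieved documents that are relevant."""
    if not retrieved:
        return 1.0  # assumed convention: nothing retrieved, nothing wrong
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical document IDs: four relevant documents, three retrieved.
relevant = {1, 2, 3, 4}
retrieved = {1, 2, 5}

print(recall(retrieved, relevant))     # 0.5, two of four relevant found
print(precision(retrieved, relevant))  # two of three retrieved are relevant
```

Read against the axioms, unpredictable modes of expression lower the first number (relevant documents are missed), and low fidelity lowers the second (irrelevant documents are retrieved).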

The Indexing Process
Step 1: Determine the essence of a document
Step 2: Represent this essence with sufficient degrees of predictability and fidelity

Importance of Categories
“The predictability of essence selection is markedly enhanced when the indexers have an orientation to conceptual categories.”
For example, in some chemistry databases, all descriptors belong to the following categories:
MATTER
LIVING ENTITY
APPARATUS
PROCESS
In ERIC, the nine Descriptor Groups serve as categories.

Natural Language Indexing
“Natural language expressions, as derived from original texts, can only in the case of individual concepts lead to an information system of adequate quality and survival power.”
The specificity of natural language expressions is compromised by their lack of predictability.

Importance of “Cutter’s Rule”
Precise and complete searches require that the most specific descriptors that the vocabulary provides be chosen for the indexing of a subject.
A query with a specific descriptor must not retrieve concepts that are more general than the search descriptor.
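
One way to picture the rule above is as a filter over candidate descriptors in a hierarchy: any candidate that is a broader term of another candidate is dropped. This is a minimal sketch under stated assumptions. The `BROADER` table, the function names, and the sample terms are hypothetical, and the lecture does not prescribe any particular procedure.

```python
# Hypothetical hierarchy: each descriptor maps to its broader term (or None).
BROADER = {
    "copper": "metal",
    "iron": "metal",
    "metal": None,
}


def broader_terms(term):
    """Collect every descriptor above `term` in the hierarchy."""
    result = set()
    parent = BROADER.get(term)
    while parent is not None:
        result.add(parent)
        parent = BROADER.get(parent)
    return result


def most_specific(candidates):
    """Keep only candidates that are not a broader term of another candidate."""
    candidates = set(candidates)
    return sorted(
        term
        for term in candidates
        if not any(term in broader_terms(other) for other in candidates)
    )


# A document about copper is indexed under "copper", not also under "metal".
print(most_specific(["metal", "copper"]))
```

With indexing done this way, a search on "copper" does not pull in documents that were indexed only under the more general "metal".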

Importance of Syntax
In the interests of enhanced representational fidelity any advanced indexing language needs a syntax in addition to its vocabulary.
The syntax should represent the manner in which the concepts are connected with each other in the texts to be stored.
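
The effect of syntax on fidelity can be illustrated with a toy comparison. Without syntax, a document is an unordered set of descriptors. With syntax, the connection between the descriptors is recorded as well. The sketch below assumes a hypothetical three-role statement (agent, process, object). The lecture does not specify any particular syntax, and the descriptors and function names are invented for illustration.

```python
# Hypothetical syntax: a statement is an (agent, process, object) triple.
doc_a = ("teachers", "evaluation", "students")  # teachers evaluate students
doc_b = ("students", "evaluation", "teachers")  # students evaluate teachers


def matches_without_syntax(query, doc):
    """Vocabulary only: compare unordered sets of descriptors."""
    return set(query) <= set(doc)


def matches_with_syntax(query, doc):
    """Vocabulary plus syntax: the roles of the descriptors must agree too."""
    return query == doc


query = ("teachers", "evaluation", "students")

# Without syntax both documents match, so the search loses precision.
print(matches_without_syntax(query, doc_a), matches_without_syntax(query, doc_b))
# With syntax only the document with the same connections matches.
print(matches_with_syntax(query, doc_a), matches_with_syntax(query, doc_b))
```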