Languages are bridges … not barriers Chiara Carlucci – CEDEFOP Library ReferNet Technical Meeting 24-25 September 2009.

Presentation transcript:

Languages are bridges … not barriers Chiara Carlucci – CEDEFOP Library ReferNet Technical Meeting September 2009

What it is … Why to use it … How to use it … What else …

Is there any place left for thesauri in this new information retrieval environment? What

There is certainly a place for thesauri, but they must change in order to remain of value. A true thesaurus has equivalence relationships, but it also supports other kinds of relationship and provides navigation assistance by means of scope notes and other aids.

A thesaurus suggests other ways of expressing an idea that is already in the user's mind and reminds the user of related ideas that might be valuable in searching.

It is useful to recall some classic points about indexing: because documents are changing rapidly, because the habit of always doing the same things leads to repetitive, unconsidered behaviour, and because the thesaurus is to be used as a thesaurus!

It must be remembered that, although a thesaurus appears to be made up of natural language terms, it is an artificial language: a controlled vocabulary with a limited number of descriptors, the meaning of each being understood through the context provided by the descriptors as a whole. In a bibliographic context (such as VET-Bib), the information provided by the whole system of descriptors is also supported by:
–the title of the document
–the abstract of the document

A thesaurus is not:
–a dictionary, which contains definitions and pronunciations. Unlike a dictionary entry, a thesaurus entry does not define words.
–a glossary, which contains explanations of concepts relevant to a certain field of study or action.
–a lexicon, because the lexicon of a language is its vocabulary, including its words and expressions.
–a vocabulary, which is the set of words a person is familiar with in a language. A vocabulary usually grows and evolves with age, and serves as a useful and fundamental tool for communication and acquiring knowledge.

The thesaurus is a thesaurus

The thesaurus is a thesaurus, with its own hierarchical relationships, which are used to indicate terms that are narrower and broader in scope. A "Broader Term" (BT) is a more general term, e.g. “Apparatus” is a generalization of “Computers”. Reciprocally, a "Narrower Term" (NT) is a more specific term, e.g. “Digital Computer” is a specialization of “Computer”. BT and NT are reciprocals; a broader term necessarily implies at least one other term which is narrower. BT and NT are used to indicate class relationships, as well as part-whole relationships.

The thesaurus is a thesaurus, with its own equivalence relationships, which are used primarily to connect synonyms and near-synonyms. The Use (USE) and Used For (UF) indicators are used when an authorized term is to be used in place of another, unauthorized term: the entry for the authorized term carries the indicator "UF" and, reciprocally, the entry for the unauthorized term carries the indicator "USE". Unauthorized terms are often called "entry vocabulary", "entry points", "lead-in terms", or "non-preferred terms"; they point to the authorized term (also referred to as the Preferred Term or Descriptor) that has been chosen to stand for the concept.

The thesaurus is a thesaurus, with its own associative relationships, which are used to connect two related terms whose relationship is neither hierarchical nor equivalent. This relationship is described by the indicator "Related Term" (RT). Associative relationships should be applied with caution, since excessive use of RT will reduce specificity in searches. Consider the following: if the typical user is searching with term "A", would they also want resources tagged with term "B"? If the answer is no, then an associative relationship should not be established.
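The three relationship types can be pictured as a small data structure. The following is only a minimal Python sketch: the class name `Thesaurus`, its methods, the entry term "PCs" and the related term "software" are illustrative assumptions, not ETT content or the interface of any real tool.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus holding BT/NT, USE/UF and RT relationships."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its BT entries
        self.narrower = defaultdict(set)  # term -> its NT entries
        self.related = defaultdict(set)   # term -> its RT entries
        self.use = {}                     # non-preferred term -> descriptor
        self.used_for = defaultdict(set)  # descriptor -> its UF entries

    def add_hierarchy(self, narrower, broader):
        # BT and NT are reciprocal, so both directions are stored together.
        self.broader[narrower].add(broader)
        self.narrower[broader].add(narrower)

    def add_equivalence(self, non_preferred, descriptor):
        # USE points from the entry term to the descriptor; UF is the reverse.
        self.use[non_preferred] = descriptor
        self.used_for[descriptor].add(non_preferred)

    def add_related(self, term_a, term_b):
        # RT is symmetric: each term lists the other.
        self.related[term_a].add(term_b)
        self.related[term_b].add(term_a)

    def preferred(self, term):
        # Resolve an entry term to its descriptor; descriptors map to themselves.
        return self.use.get(term, term)


thesaurus = Thesaurus()
thesaurus.add_hierarchy("computers", "apparatus")
thesaurus.add_hierarchy("digital computers", "computers")
thesaurus.add_equivalence("PCs", "computers")   # hypothetical entry term
thesaurus.add_related("computers", "software")  # hypothetical RT pair
```

Storing both directions in a single call is one way to keep the reciprocal indicators (BT/NT, USE/UF) from drifting apart.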

To translate the concept you are looking for into keywords. Multilingualism and standardisation are the main advantages of this powerful indexing tool covering the fields of VET. The thesaurus is an operational tool used to retrieve documents according to their semantic content. The thesaurus must be made available to users to help them identify their information needs. The thesaurus provides a conceptual framework for understanding reality through graphic presentations that preserve specificity. It presents the conceptual content of documents in an unambiguous way. Why

A thesaurus is well suited to the digital environment, where it can show its versatility. It is open to information interoperability, because the thesaurus context is not only an operating environment but also an organizational criterion. It can be integrated with other information retrieval tools.

Searching in systems of unstructured information → web

ETT is used to index and represent the content of a document. It is mostly used by documentalists and librarians to identify the concepts laid down in the text and to represent them by attributing keywords from the thesaurus. This operation makes it possible to extract the relevant records from a collection of bibliographic references or from a full-text documentary database to answer the user’s query. End-users can combine ETT descriptors in order to represent their search query. Indexing with ETT enables all documents on the same subject to be retrieved through a single query.

ETT is useful for taxonomy and semantic web applications. The main role of a thesaurus is to standardise the indexing process in order to make searches simpler, more efficient and consistent regardless of the language of the query. It is a multilingual conceptual thesaurus which strives to satisfy both Community and national needs on a wide range of subjects. Each descriptor is related to one concept in each of the languages.

Another interesting option offered by ETT is the possibility for users to ask questions in one language and retrieve the answers in different languages. Google does not do this, or at least not yet!

Is only a term: in this case the descriptor ‘transparency of qualifications’ represents a precise concept and can retrieve many web pages, not necessarily documents, that have the descriptor in its exact form in the text.

In this case ‘transparency of qualifications’ is more than a descriptor: it is a concept. We can find documents relating to the subject even if: 1. the term is not within the text; 2. the document is in a different language.
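The difference between matching a term and matching a concept can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the concept identifiers, the French label, the document records and the function names are hypothetical and are not taken from ETT or from any Cedefop system.

```python
# Hypothetical concept identifier with one label per language.
LABELS = {
    "c001": {
        "en": "transparency of qualifications",
        "fr": "transparence des qualifications",  # assumed translation
    },
}

# Hypothetical documents, indexed by concept rather than by words in the text.
DOCUMENTS = [
    {"id": "doc-en", "lang": "en", "concepts": {"c001"}},
    {"id": "doc-fr", "lang": "fr", "concepts": {"c001"}},
    {"id": "doc-other", "lang": "en", "concepts": {"c002"}},
]


def concept_for(label, lang):
    """Return the concept id whose label in `lang` matches, or None."""
    for concept_id, labels in LABELS.items():
        if lang in labels and labels[lang].lower() == label.lower():
            return concept_id
    return None


def search(label, lang):
    """Find documents indexed with the concept, whatever their language."""
    concept_id = concept_for(label, lang)
    if concept_id is None:
        return []
    return [doc["id"] for doc in DOCUMENTS if concept_id in doc["concepts"]]


results = search("transparency of qualifications", "en")
```

Because the lookup goes through the concept, the French document is returned for an English query even though the English term never appears in its text.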

ETT is also used on the Cedefop website for automatic categorisation or classification of documents, and at the Library’s reference desk to categorise users’ questions. A simple click gives cross-lingual access to the translation of a descriptor or of the complete semantic chain of a descriptor. These advanced options open the door to many cross-lingual applications, such as calculating document similarity across languages.

Indexing with the ETT’s updated version … knowing how something is stored makes finding it easier How

–Hierarchical presentation –KWIC index –Alphabetical presentation with semantic relations

The main, word-by-word alphabetical display is the most familiar, since it provides a variety of information for each descriptor. The term’s main entry in the alphabetical display shows the appropriate coordination. This includes a scope note (SN), BT and NT, USE and UF relations, and RT. But be careful: this approach is easy for specialists to understand but not so easy for end-users. For example, the fact that BT and NT mean that two terms are related hierarchically is obvious only to specialists!

Showing users hierarchical structures is a useful mechanism for query expansion, also because: - users with varying levels of domain knowledge make use of thesauri in different ways - thesauri are capable of providing end-users with additional, useful terms for query formulation and expansion

A KWIC index is formed by sorting and aligning the words within an article title so that each word in a title (except the stop words) can be looked up alphabetically in the index. It was a useful indexing method for technical manuals before computerized full-text search became common. The term permuted index is another name for a KWIC index, referring to the fact that it indexes all cyclic permutations of the headings. Here a cyclic permutation of a heading is a rotation: the words keep their cyclic order, but a different word is brought to the front.
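As an illustration, a KWIC index can be sketched in Python by rotating each title so that every non-stop word comes to the front in turn, then sorting the rotations by that keyword. The stop-word list, the function name `kwic_index` and the sample titles are assumptions made for this sketch.

```python
# A deliberately tiny stop-word list, assumed for this example only.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def kwic_index(titles, stop_words=STOP_WORDS):
    """Return (keyword, rotated title, original title) tuples, sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stop_words:
                continue  # stop words never become index keywords
            # Cyclic rotation: bring the keyword to the front, keep the order.
            rotation = " ".join(words[i:] + words[:i])
            entries.append((word.lower(), rotation, title))
    entries.sort(key=lambda entry: (entry[0], entry[1].lower()))
    return entries


index = kwic_index(["Transparency of qualifications", "Quality in training"])
for keyword, rotation, _title in index:
    print(f"{keyword:<16}{rotation}")
```

Each title yields one index line per significant word, so a reader who remembers only "qualifications" can still find the first title.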

Indexing with the ETT’s updated version: 465 new descriptors have been added to the thesaurus since the 2008 edition, so you cannot search earlier literature using these descriptors. Older literature on topics represented by these terms is searchable using related descriptors.

415 deleted descriptors are no longer used in indexing, but they may be used for searching database entries made prior to the ETT’s 2008 edition. More recent literature on topics represented by these terms is searchable using related descriptors.

How can I add the new descriptors using VET det?
1) Enter the new descriptors (p of ETT printed version) in the Notes field, preceded by the word NEWDESCRIPTOR and separated by commas, e.g. Notes field: NEWDESCRIPTOR certification of learning outcomes, key competences
–If the new descriptor is a main descriptor, put NEWMAINDESCRIPTOR at the beginning
2) Do not enter the deleted descriptors (p of ETT printed version)

Fundamental, basic, classic indexing rules: really important because VET-Bib contains records!!! Index ONLY what is in the document, and index at the LEVEL of specificity of the document. 1. Statements or assumptions are not indexed.

2. Very general descriptors are not used unless the document covers a topic very broadly. 3. Main descriptors cover the main focus or subject of a document. 4. Other descriptors indicate less important aspects within the document.

5. ETT avoids ‘indexing up’ to a broader descriptor when an appropriate, more specific descriptor exists.


Indexing is complementary to information found in other parts of the document (mainly title and abstract).

The number of descriptors should be proportionate to the number of pages.


“Indexable” concepts are translated into descriptors. Using the thesaurus helps maintain consistency and prevents the proliferation of concepts.

Thus a single descriptor may be imprecise, even ambiguous, while the greater the number of descriptors used together, the greater the precision.

The word precision is used here in a technical sense to mean the proportion of documents in a retrieved set that are relevant, i.e. the ratio of relevant documents retrieved to all documents retrieved.

The word recall is used to mean the ratio of relevant documents retrieved to all relevant documents in the collection, whether retrieved or not.
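The two measures can be computed with a few lines of Python. This sketch takes the standard information retrieval definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the function name and the document identifiers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set and a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, two of them relevant; three relevant overall.
precision, recall = precision_recall(
    retrieved={"d1", "d2", "d3", "d4"},
    relevant={"d1", "d2", "d5"},
)
# precision is 2/4 = 0.5, recall is 2/3
```

Returning 0.0 for an empty set is a convention chosen here to avoid division by zero; other conventions exist.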

… for the future: Permitting the searcher to switch between navigating the thesaurus and searching the database can only improve access. An obvious way in which a thesaurus can be applied directly in retrieval is to use the relationships as a means of expanding the search. Research, however, has shown that these relationships must be used with caution (precision/recall). What else …

In general, expanding a search to include the narrower terms tends to improve recall without great sacrifice in precision. Expanding to include broader or related terms, while it does improve recall, typically has a significant negative impact on precision.
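Expansion to narrower terms can be sketched as a walk down the NT links. The Python below is an illustrative assumption, not a description of how any particular retrieval system does it; the `NARROWER` table reuses the example terms from the hierarchical relationships slide.

```python
# Narrower-term (NT) links, keyed by the broader term; sample data only.
NARROWER = {
    "apparatus": ["computers"],
    "computers": ["digital computers"],
}


def expand_with_narrower(term, narrower=NARROWER):
    """Return the term followed by all of its narrower terms, level by level."""
    expanded = [term]
    seen = {term}
    queue = [term]
    while queue:
        current = queue.pop(0)
        for child in narrower.get(current, []):
            if child not in seen:  # guard against repeated or circular links
                seen.add(child)
                expanded.append(child)
                queue.append(child)
    return expanded


query_terms = expand_with_narrower("apparatus")
# ['apparatus', 'computers', 'digital computers']
```

The expanded list would then be combined with OR in the search, which is where the gain in recall comes from; broader and related terms are deliberately left out of this sketch because of their cost in precision.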

How is it possible to remain positive about the need for continued use of thesauri? Because only a thesaurus can become the basis of a more extensive semantic network that provides information not just on what terms are used in indexing but on how they are used within the system.