CS336 Lecture 8: Indexing Languages


File organizations or indexes are used to increase the performance of a system
  – Inverted files, signature files, bitmaps
Text indexing is the process of deciding which terms will be used to represent a given document
Index terms are then used to build indexes for the documents
A retrieval model describes how the indexed terms are incorporated into a model
  – Relationship between retrieval model and indexing model
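A minimal sketch of an inverted file, assuming whitespace tokenization and lowercased terms (simplifications the lecture does not specify). The function name `build_inverted_index` and the three sample documents are illustrative only. Each term maps to the sorted list of documents that contain it.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: split on whitespace and lowercase; real systems tokenize more carefully.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# Hypothetical toy collection.
docs = {
    1: "ebola virus infection",
    2: "west nile virus",
    3: "immunization for ebola",
}
index = build_inverted_index(docs)
print(index["virus"])  # [1, 2]
print(index["ebola"])  # [1, 3]
```

A lookup then touches only the postings list for the query term instead of scanning every document, which is the performance gain the slide refers to.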

Generating Document Representations
Want to use significant terms to build representations
Manual indexing: professional indexers
  – Manually assign terms from a controlled vocabulary
  – Typically phrases
Automatic indexing: machine selects
  – Terms can be single words, phrases, or other features from the text of documents
  – Takes ~1 hour to index 10 GB
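One way automatic term selection could look, as a rough sketch: single words are kept after dropping a few function words, and two-word phrases are formed from neighbouring kept words. The stopword list, the regular expression, and the name `extract_terms` are assumptions for illustration, not the method used in the course.

```python
import re

# Illustrative stopword list; not a standard one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "the", "to"}

def extract_terms(text):
    """Return (single-word terms, two-word phrases) selected from raw text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    words = [t for t in tokens if t not in STOPWORDS]
    # Phrases pair neighbouring words that remain after stopword removal,
    # so words separated only by a stopword also form a phrase.
    phrases = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words, phrases

words, phrases = extract_terms("Immunization for Ebola virus infection.")
print(words)    # ['immunization', 'ebola', 'virus', 'infection']
print(phrases)  # ['immunization ebola', 'ebola virus', 'virus infection']
```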

Index Languages
Language used to describe docs and queries
Exhaustivity: number of different topics indexed; completeness or breadth
  – Increased exhaustivity => higher recall / lower precision
Specificity: accuracy of indexing; detail
  – Increased specificity => higher precision / lower recall
Pre-coordinate indexing
  – Combinations of terms (e.g. phrases) used as an indexing label
Post-coordinate indexing
  – Combinations generated at search time
  – Most common
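The difference between pre- and post-coordination can be sketched as follows. In the post-coordinate case, single terms are stored separately and combined with a Boolean AND when the query arrives. In the pre-coordinate case, the combination itself is the label assigned at indexing time. The helper names, the toy documents, and the `pre_coordinated` table are hypothetical.

```python
def postings(docs):
    """Single-term index: term -> set of doc ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def post_coordinate_search(index, query_terms):
    """Combine single-term postings at search time (Boolean AND)."""
    sets = [index.get(t, set()) for t in query_terms]
    return set.intersection(*sets) if sets else set()

# Hypothetical toy collection.
docs = {
    1: "ebola virus vaccine trial",
    2: "virus spread without vaccine",
    3: "vaccine for ebola virus",
}
index = postings(docs)

# Post-coordination: any combination of indexed terms can be formed by the searcher.
print(sorted(post_coordinate_search(index, ["ebola", "vaccine"])))  # [1, 3]

# Pre-coordination: only combinations chosen by the indexer exist as labels.
pre_coordinated = {"ebola virus": {1, 3}}
print(sorted(pre_coordinated.get("ebola virus", set())))    # [1, 3]
print(sorted(pre_coordinated.get("ebola vaccine", set())))  # [] (label was never assigned)
```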

The Trade Off
[Figure: precision vs. recall plot, axes marked at 0.5, with regions labeled "Broad terms" and "Narrow terms"]
Students want high precision: narrow terms.
Lawyers want high recall: broad terms.
For an unknown population, use terms in the middle.
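The trade-off can be made concrete with the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The document ids and the two result sets below are invented purely to illustrate a narrow versus a broad term.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {1, 2, 3, 4}            # hypothetical relevant documents
narrow = {1, 2}                    # narrow term: few documents, all on topic
broad = {1, 2, 3, 4, 5, 6, 7, 8}   # broad term: everything relevant plus noise

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```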

MeSH Medical Subject Headings Faceted classification:

Disadvantages of Manual Indexing
Human effort is considerable
Controlled vocabulary per collection
Subjective
  – Agreement between indexers (overlap in the terms they assign) is only about 40%
  – But human experts who use indexing aids describing allowable vocabulary and usage (e.g. “scope notes”) achieve good indexing uniformity
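The lecture does not say how the 40% figure is measured. One plausible reading, assumed here, is the share of terms two indexers have in common out of all terms either of them assigned. The function name and both term sets are hypothetical.

```python
def indexer_overlap(terms_a, terms_b):
    """Shared terms divided by all distinct terms assigned by either indexer."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical assignments by two indexers for the same document.
indexer_1 = {"Ebola Virus", "Vaccines, DNA", "Guinea Pigs", "Plasmids"}
indexer_2 = {"Ebola Virus", "Vaccines, DNA", "Viral Vaccines"}

print(indexer_overlap(indexer_1, indexer_2))  # 0.4
```

Here the indexers share 2 of 5 distinct terms, so the overlap is 0.4.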

Development of Automatic Methods
1960s: search services relied on manual approaches
  – Automatic methods were sometimes an add-on
  – Focus remained on the use of intermediaries (specialists)
  – Strong belief that manual indexing must be better than natural-language indexing
What caused the focus to shift?
  – Sheer volume of text: very costly to maintain vocabulary and indexing
  – Full text of documents became more readily available, so less reliance on abstracts and titles
  – Computing power and access increased
  – The Web! Encouraged direct searching by users and reduced dependence on professional searchers

Which is better?
Salton claims that the results of automatic indexing are comparable to manual indexing
  – Based on small databases
Can depend upon task and environment
Experiments have shown that using both manual and automatic indexing improves performance
  – “Combination of evidence”
Typically, manual indexing is not a practical option
  – Why?

Automatic Indexing with Full Text
More flexible: no decisions about doc content are made at the time of indexing
  – No a priori assumptions about future search needs
  – Indexing effort not devoted to docs outside search scope
  – Document left open to a variety of index descriptions
Post-coordinate indexing lets the user define the representation
But no effort is given to explaining document content
  – Pressures user to think more carefully about search
  – Pressures system designer to develop tools to aid user

Manual vs Automatic Indexing

MeSH Medical Subject Headings Faceted classification:

Category C. Diseases
  C1. Bacterial Infections and Mycoses
  C2. Virus Diseases
  C3. Parasitic Diseases
  C4. Neoplasms
  C5. Musculoskeletal Diseases
  C6. Digestive System Diseases
  C7. Stomatognathic Diseases
  C8. Respiratory Tract Diseases
  C9. Otorhinolaryngologic Diseases
  C10. Nervous System Diseases
  C11. Eye Diseases
  C12. Urologic and Male Genital Diseases
  C13. Female Genital Diseases and Pregnancy Complications
  C14. Cardiovascular Diseases
  C15. Hemic and Lymphatic Diseases
  C16. Neonatal Diseases and Abnormalities
  C17. Skin and Connective Tissue Diseases
  C18. Nutritional and Metabolic Diseases
  C19. Endocrine Diseases
  C20. Immunologic Diseases
  C21. Injuries, Poisonings, and Occupational Diseases
  C22. Animal Diseases
  C23. Symptoms and General Pathology
Category C2. Virus Diseases
  Arbovirus Infections
  African Horse Sickness
  Bluetongue
  Dengue
  Dengue Hemorrhagic Fever
  Encephalitis, Epidemic
  Encephalitis, California
  Encephalitis, Japanese
  Encephalitis, St. Louis
  Encephalitis, Tick-Borne
  West Nile Fever
  Encephalomyelitis, Equine
  Encephalomyelitis, Venezuelan Equine
  Phlebotomus Fever
  Rift Valley Fever
  Tick-Borne Diseases
  African Swine Fever
  Colorado Tick Fever
  Encephalitis, Tick-Borne
  Hemorrhagic Fever, Crimean
  Hemorrhagic Fever, Omsk
  Kyasanur Forest Disease
  Nairobi Sheep Disease
  West Nile Fever
  Yellow Fever

Example “Ebola” document
Nat Med 1998 Jan;4(1):37-42
Immunization for Ebola virus infection.
Xu L, Sanchez A, Yang Z, Zaki SR, Nabel EG, Nichol ST, Nabel GJ
Department of Biological Chemistry, University of Michigan Medical Center, Ann Arbor, USA.
Infection by Ebola virus causes rapidly progressive, often fatal, symptoms of fever, hemorrhage and hypotension. Previous attempts to elicit protective immunity for this disease have not met with success. We report here that protection against the lethal effects of Ebola virus can be achieved in an animal model by immunizing with plasmids encoding viral proteins. We analyzed immune responses to the viral nucleoprotein (NP) and the secreted or transmembrane forms of the glycoprotein (sGP or GP) and their ability to protect against infection in a guinea pig infection model analogous to the human disease. Protection was achieved and correlated with antibody titer and antigen-specific T-cell responses to sGP or GP. Immunity to Ebola virus can therefore be developed through genetic vaccination and may facilitate efforts to limit the spread of this disease.

Indexing
If you were to look for documents about immunization against the Ebola virus, what might your query look like?

Example “Ebola” document
Nat Med 1998 Jan;4(1):37-42
Immunization for Ebola virus infection.
Xu L, Sanchez A, Yang Z, Zaki SR, Nabel EG, Nichol ST, Nabel GJ
MH - Animal
MH - Antibody Formation
MH - Disease Models, Animal
MH - Ebola Virus/*immunology
MH - Female
MH - Guinea Pigs
MH - Hemorrhagic Fever, Ebola/*immunology/*prevention & control
MH - Human
MH - Male
MH - Mice
MH - Mice, Inbred BALB C
MH - Nucleocapsid Proteins/immunology
MH - Plasmids
MH - T-Lymphocytes/immunology
MH - Transfection
MH - *Vaccines, DNA
MH - Viral Proteins/biosynthesis/immunology
MH - *Viral Vaccines
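The `MH -` entries follow the MEDLINE convention in which a slash separates a heading from its subheadings and an asterisk marks a major topic. The slides do not spell this convention out, so the sketch below treats it as an assumption. The function name `parse_mh` and the dictionary layout are illustrative.

```python
def parse_mh(line):
    """Split one 'MH - ...' entry into heading, major-topic flag, and subheadings."""
    tag, _, value = line.partition("-")  # split at the first hyphen only
    if tag.strip() != "MH":
        raise ValueError(f"not an MH line: {line!r}")
    heading, *qualifiers = value.strip().split("/")
    return {
        "heading": heading.lstrip("*"),
        "major": heading.startswith("*"),
        # Each subheading is stored with its own major-topic flag.
        "qualifiers": [(q.lstrip("*"), q.startswith("*")) for q in qualifiers],
    }

record = parse_mh("MH - Hemorrhagic Fever, Ebola/*immunology/*prevention & control")
print(record["heading"])     # Hemorrhagic Fever, Ebola
print(record["qualifiers"])  # [('immunology', True), ('prevention & control', True)]
print(parse_mh("MH - *Vaccines, DNA")["major"])  # True
```

Parsed this way, a query for immunization against Ebola could match on the controlled heading and its subheadings instead of relying on the words in the abstract.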