ABOUTNESS: THE HUMAN FACTOR. SOASIS&T Aboutness: Automatic Indexing & Categorization, June 21, 2001.


Inherent contradictions?
- Indexing, cataloging, classification depend on ability to discover and label topics in data
- A consistent, controlled, classification scheme facilitates
  – Data analysis & visualization
  – Intra-document linking by taxonomy nodes
  – Investigative analysis of content (Shewhart June 21, 2001)
- The world cannot be organized into a single coherent vocabulary (Tobias 1996)

Aboutness: The view from TREC-9
- The notion of aboutness is considered as a set of terms evoking a subject concept, which is hopefully shared by many people including authors, indexers and the users of the system (Fujita 2000)
- In almost all computer applications, users must enter correct words for the desired objects or actions. For success without extensive training... the system must recognize terms that will be chosen spontaneously. We studied spontaneous word choice for objects... and found the variability to be surprisingly large. In every case two people favored the same term with probability < 0.20. (Furnas et al. 1987)
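One way to read the Furnas et al. figure is as a collision probability: if two people pick a term independently from the same distribution of spontaneous choices, the chance they agree is the sum of the squared term probabilities. The sketch below illustrates that reading only; the term list is hypothetical, not data from the study, and the study's own estimation procedure may differ.

```python
from collections import Counter


def agreement_probability(choices):
    """Chance that two people, each drawing a term independently from the
    observed distribution of choices, name the same term.

    Computed as the sum over terms of p(term) squared.
    """
    counts = Counter(choices)
    total = sum(counts.values())
    return sum((count / total) ** 2 for count in counts.values())


# Hypothetical survey: ten people asked to name the same object.
choices = (
    ["refrigerators"] * 3
    + ["fridges"] * 2
    + ["major appliances"] * 2
    + ["appliances", "kitchen appliances", "iceboxes"]
)

p_same = agreement_probability(choices)
print(f"Probability two people favor the same term: {p_same:.2f}")
```

Even when the most popular term is chosen by 30 percent of people, agreement between any two of them stays low, which is the vocabulary problem in miniature.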

The Human Factor: Whose concept?
- A subject authority file (or a controlled vocabulary list or a thesaurus or list of approved topics or selected key words) is essential to an indexing project because it maintains consistency and accuracy. (Semonche 1993, p. 373)

The Human Factor: The right label
- HEMMER: So... the FBI mess right now... why you believe we were led to this point, to the FBI documents?
- HARRIS: Well, you know, most people... do word searches. And the same in Lexis legal research. You put a word in and you can find any document in which that word appears. Unfortunately, the FBI does not use that system in its document storage. They use indexes. So the index is only as good as the indexer. So if I index a certain document in certain ways, and for example, in the yellow pages, if you look under refrigerators, you won't find anything. You have to go to major appliances. And because the FBI uses this kind of an indexing system, I understand that it is very difficult to retrieve each and every document that might be relevant. (CNN May 31, 2001, Bill Hemmer interview with Jeffrey Harris, Former Federal Prosecutor)
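The yellow-pages example is the classic case for "use" references, where a searcher's entry term is redirected to the heading the indexer actually applied. A minimal sketch, assuming a hand-built mapping; `USE_REFERENCES`, `INDEX`, and the document identifiers are hypothetical names invented for illustration:

```python
# Hypothetical "use" references: entry term -> preferred heading.
USE_REFERENCES = {
    "refrigerators": "major appliances",
    "fridges": "major appliances",
}

# Hypothetical index: preferred heading -> documents filed under it.
INDEX = {
    "major appliances": ["doc-17", "doc-42"],
}


def lookup(term):
    """Resolve a searcher's term to its preferred heading, then return
    the heading together with the documents indexed under it."""
    key = term.strip().lower()
    heading = USE_REFERENCES.get(key, key)
    return heading, INDEX.get(heading, [])


heading, documents = lookup("Refrigerators")
print(heading, documents)
```

Without the cross-reference, a search on "refrigerators" returns nothing even though relevant material exists, which is the failure Harris describes.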

Consistently applied
- At most 90 percent of the Census returns are correctly classified by the human editors, and the Dow Jones editors achieve only 83 percent recall and 88 percent precision when compared to an expert editor.... Human systems set the targets for acceptable levels of inconsistency and error for an automated system. (Fidel 1993, p. 285)
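Recall and precision against an expert editor can be computed per document by treating each set of assigned terms as a set. A minimal sketch, assuming the expert's terms are the reference; the term sets below are hypothetical and do not reproduce the Census or Dow Jones figures:

```python
def precision_recall(assigned, expert):
    """Compare one editor's terms with an expert's terms for a document.

    precision = share of the editor's terms that the expert also assigned
    recall    = share of the expert's terms that the editor also assigned
    """
    assigned, expert = set(assigned), set(expert)
    hits = assigned & expert
    precision = len(hits) / len(assigned) if assigned else 0.0
    recall = len(hits) / len(expert) if expert else 0.0
    return precision, recall


# Hypothetical term assignments for a single news story.
expert_terms = {"MURDER", "CAPITAL PUNISHMENT", "SENTENCING", "COMPETENCE"}
editor_terms = {"MURDER", "SENTENCING", "COCAINE"}

precision, recall = precision_recall(editor_terms, expert_terms)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same function scores an automated indexer, so the human figures quoted above can serve as the target it is measured against.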

Indexing is about access
- Topics are inherently fuzzy
- Searchers are naive
- All documents retrieved should be relevant
- All relevant documents should be retrieved

Exercise
- What are the articles about?
  – Note language that is evidence of aboutness
  – Is there more than one subject in the document?
  – What is their relative importance to overall meaning?
- Assign index terms
  – Select from list of subject terms
  – Apply all relevant terms
  – Rank terms in order of importance to the document
  – Note prominent subjects that are NOT on the approved list
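The "assign index terms" steps can be sketched as a small routine: rank the candidate subjects by importance, keep those on the approved list, and set aside the rest for review. This is an illustration under stated assumptions; the function name, the importance scores, and the approved list are all hypothetical.

```python
def assign_terms(candidates, approved):
    """Split ranked candidate subjects into approved index terms and
    prominent subjects that are not on the approved list.

    candidates: list of (term, importance) pairs, higher means more important
    approved:   iterable of approved subject terms
    """
    approved_upper = {term.upper() for term in approved}
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    index_terms = [term for term, _ in ranked if term.upper() in approved_upper]
    off_list = [term for term, _ in ranked if term.upper() not in approved_upper]
    return index_terms, off_list


# Hypothetical approved list and an indexer's judgments for one article.
approved = ["SPACE EXPLORATION", "BACTERIA", "RESEARCH"]
candidates = [
    ("RESEARCH", 2),
    ("SPACE EXPLORATION", 5),
    ("QUARANTINE", 4),
    ("BACTERIA", 3),
]

index_terms, off_list = assign_terms(candidates, approved)
print(index_terms)  # approved terms, most important first
print(off_list)     # prominent subjects missing from the approved list
```

Terms that keep landing in the off-list output are candidates for addition to the controlled vocabulary.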

Discussion
- Major References
- Minor References, Subtopics
- Document types
  – Survey
  – Summary
  – Review

Bibliography
- Albrechtsen, Hanne. Subject analysis and indexing: from automated indexing to domain analysis. The Indexer 16:234-8.
- Farrow, John. All in the mind: concept analysis in indexing. The Indexer 19:243-7.
- Fidel, Raya, Trudi Ballardo Hahn, Edie M. Rasmussen, Philip J. Smith. Challenges in Indexing Electronic Text and Images. ASIS monograph. Medford, N.J.: Learned Information,
- Fujita, Sumio. Reflections on aboutness: TREC-9 evaluation experiments at Justsystem. Proceedings of the Ninth Text REtrieval Conference (TREC-9), Gaithersburg, MD, November 13-16, Available as of June 18, 2001 at
- Furnas, G., T.K. Landauer, L.M. Gomez, S.T. Dumais. The vocabulary problem in human-system communication. Communications of the ACM 1987 (30):

Bibliography
- Semonche, Barbara. Newspaper indexing policies and procedures. In News Media Libraries: A Management Handbook. Westport, CT: Greenwood, 1993.
- Star, Susan Leigh. To classify is human. Keynote talk, Hypertext 96, Champaign-Urbana, GLIS, March 3,
- Tobias, Jennifer. Seeking the subject. Library Trends 47 (1998), no. 2:
- Weinberg, Bella Hass. Why indexing fails the searcher. The Indexer 16:3-6.

The Associated Press State & Local Wire May 9, 2001, Wednesday
TUCSON, Ariz.
Trial renewed for man who blamed slayings on alien orders
A man sentenced to death for two murders he blamed on orders from space aliens will not attend his second trial.
ROBERT J MOODY (95%); MICHAEL CRUIKSHANK (91%); ALAN P GOLDBERG (78%); PATRICIA MAGDA (62%); ARIZONA, USA (55%); LAWYERS (90%); SENTENCING (90%); TESTIMONY (90%); CAPITAL PUNISHMENT (90%); MURDER (78%); WITNESSES (78%); COMPETENCE (73%); COCAINE (51%)

Associated Press State & Local Wire December 27, 200
LAS VEGAS
Las Vegas minor league team adopts out-of-this-world Area 51 image
The mascot and theme music possibilities are, well, otherworldly. Known for 18 years as the Stars, Las Vegas' minor league baseball team is remaking itself. They're not the Blackjacks, Royal Flushes or Silver Dollars, but the 51s, as in Area 51, the famously top-secret test site in the Nevada desert oft-rumored to be the mysterious location of captured UFO spacecraft and even alien beings.
LOS ANGELES DODGERS (78%); AIR FORCE (78%) NEVADA, USA (57%); LAS VEGAS, NV, USA (83%); SEATTLE, WA, USA (53%); LOS ANGELES, CA, USA (53%) AMERICAN BANK OF COMMERCE (74%); LOS ANGELES DODGERS (78%); AIR FORCE (78%) BASEBALL & SOFTBALL (90%); DESERTS (77%); ALIENS

Business Wire January 30, 2001, Tuesday
SANTA MONICA, Calif., Jan. 30, 2001
Atlantis Group and Alien Voices Form an Extraterrestrial Alliance
Atlantis Group, Inc., an audio recording and post-production studio, announces an alliance with Alien Voices, a multimedia production company co-owned by Leonard Nimoy and John de Lancie, veteran actors of the cult classic sci-fi series, "Star Trek."
LEONARD NIMOY (93%); JOHN CHOMINSKY (88%); JOHN DE LANCIE (76%); ATLANTIS GROUP INC (97%); ALIEN VOICES (94%) INDUSTRY: ENTERTAINMENT MARKETING AGREEMENTS SIC3081 UNSUPPORTED PLASTICS FILM & SHEET CALIFORNIA NEW YORK SANTA MONICA (59%); KANSAS CITY, MO, USA (53%); CALIFORNIA, USA (59%) ENTERTAINMENT & ARTS (92%); PRESS RELEASES (92%); ALLIANCES & PARTNERSHIPS (90%); MOVIE INDUSTRY (90%); RECORDING INDUSTRY (90%); ADVERTISING AGENCIES (60%); ADVERTISING (50%)

Los Angeles Times May 24, 2001 Thursday
YUMA, Ariz.
12 Border-Crossers Die, 4 Still Missing in Desert
At least 12 men trying to cross a remote stretch of scorching desert from Mexico into southwestern Arizona died from exposure Wednesday and a search was launched for additional victims. Southern Arizona has become a popular crossing point for illegal immigrants since Border Patrol crackdowns in Texas and California prompted people to try to enter the United States through more isolated, inhospitable areas. Scores have died from exposure. It is believed that some of the undocumented immigrants were from the state of Veracruz.
BORDER PATROL (96%); YUMA REGIONAL MEDICAL CENTER (61%); ILLEGAL ALIENS; IMMIGRANTS; HOT WEATHER; DEHYDRATION NATIONAL BORDERS (93%); SMUGGLING (90%); ILLEGAL IMMIGRANTS (90%); ACCIDENTAL FATALITIES (78%); WILDLIFE (76%); DESERTS (73%)

The New York Times May 30, 2001, Wednesday
Panel Advises Quarantine for Any Material From Mars
Rocks and soil brought back to Earth from Mars by a future space mission should be handled as if they were chock full of deadly microbes, even though they will almost certainly prove lifeless, a panel of experts said yesterday. Upon arrival on Earth, the material should be quarantined in a special laboratory similar to those used to study Ebola and other highly contagious, lethal diseases, the panel said, and unless it is completely devoid of any possible signs of life, it should be sterilized through heat or radiation before being released to researchers outside of the quarantine
SPACE; MEDICINE AND HEALTH; ROCK AND STONE; MARS (PLANET); LIFE, EXTRATERRESTRIAL BACTERIA (90%); RESEARCH (90%); SPACE EXPLORATION (78%); TROPICAL DISEASES (76%) ORGANIZATION: AMERICAN GEOPHYSICAL UNION (59%); NATIONAL RESEARCH COUNCIL (59%) CHANG, KENNETH JOHN A WOOD (59%)

St. Louis Post-Dispatch June 8, 2001 Friday
"EVOLUTION" DESCENDS FROM A LONG LINE OF MOVIES; BUT IT IS THE CAST THAT MAKES THE LIGHTWEIGHT COMEDY WORK.
Half the movies at the multiplex are really just an excuse for Hollywood to show off its toys. From computer-generated imagery to surgically enhanced bodies, summer movies are all about spectacle. Often, the difference between good and bad (that is, the difference between a "Shrek" and a "Pearl Harbor") is the quality of the script that the producers drape over the effects. But in the case of "Evolution," this effects-laden comedy succeeds because of the easygoing charm of its stars.
DAVID DUCHOVNY (94%); ORLANDO JONES (85%); IVAN REITMAN (52%); DANIEL EDWARD 'DAN' AYKROYD (51%) ARIZONA, USA (57%) ALIEN; HUMOR MOVIE REVIEWS (78%); SPECIAL EFFECTS (78%); CELEBRITIES (78%); COMMUNITY COLLEGES (70%); TELEVISION PROGRAMMING (62%)

AP Worldstream February 26, 2001; Monday
WASHINGTON
Study: Crystal in meteorite proves life once existed on Mars
A crystal found in a meteorite from Mars could only have been formed by a microbe and may be evidence of the oldest life form ever found, U.S. researchers say. Scientists at the Johnson Space Center in Houston say that a crystalized magnetic mineral, called magnetite, found in Martian meteorite is similar to crystals formed on Earth by bacteria.
KATHIE THOMAS-KEPRTA (92%); E IMRE FRIEDMANN (78%) JOHNSON SPACE CENTER (68%); CALIFORNIA, USA (50%); RESEARCH (92%); RESEARCH REPORTS (91%); SPACE EXPLORATION (90%); BACTERIA (90%); SCIENCE NEWS (90%)

Business Wire June 12, 2001, Tuesday
FAIRFAX, Va., June 12, 2001
Xybernaut's Wearable Computer Chosen for Mars Training Mission in Northern Canada; Mobile Assistant to be Integrated Into Space Suit
Xybernaut(R) Corporation (Nasdaq:XYBR), the leader in mobile wearable computing and wireless communications, today announced that it has been chosen to provide its Mobile Assistant(R) wearable computers for the 2001 Haughton-Mars Project (HMP). Heavily funded through NASA and the non-profit SETI (Search for Extra-Terrestrial Intelligence) Institute, this research project is dedicated to the exploration of the planet Mars.
COMPUTERS/ELECTRONICS ELECTRONIC GAMES/MULTIMEDIA GOVERNMENT HARDWARE SOFTWARE VIRGINIA INTERNATIONAL CANADA CANADA (89%); WEARABLE COMPUTERS (96%); PRESS RELEASES (91%); COMPUTING & TECHNOLOGY (90%); MULTIMEDIA SOFTWARE (90%); SPACE EXPLORATION (90%); WIRELESS & BROADCAST EQUIPMENT

Tampa Tribune June 11, 2001, Monday
Fast Forward
Air pollution and infants: Pollution in our atmosphere may be responsible for as much as 9 percent of infant deaths in the United States, concludes a study by an international team of researchers. Investigators evaluated air quality in eight cities from and compared it to total infant mortality in the same period, eliminating such variables as race, education and marital status.
LOS ANGELES, CA, USA (55%); PHILADELPHIA, PA, USA (55%); SEATTLE, WA, USA (55%); CHICAGO, IL, USA (55%); DETROIT, MI, USA (55%); HOUSTON, TX, USA (55%); ENVIRONMENTAL PROTECTION AGENCY (82%); FASTFORWARD; BRIEF POLLUTION (94%); ENVIRONMENT (90%); AIR POLLUTION (90%); AIR QUALITY (90%); LIFE EXPECTANCY (90%); RESEARCH (90%); RESEARCH REPORTS (78%); WEATHER (78%); CHILDREN (78%); PHYSICIANS & SURGEONS (76%); CARDIOLOGY (72%); PAPER & PACKAGING INDUSTRY (72%); CARDIOVASCULAR DISEASE (68%); AGING (68%); HYPERTENSION (65%); CARDIOVASCULAR DRUGS (65%); TELEVISION PROGRAMMING (62%); LANDFILLS (52%)
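For the discussion of major versus minor references, the term lists attached to the sample articles can be split mechanically. The sketch below assumes the percentage after each term is a relevance score, which the slides do not define, and uses a hypothetical cutoff of 85 to separate major from minor terms.

```python
import re

# A term followed by a percentage in parentheses, e.g. "MURDER (78%)".
TERM_RE = re.compile(r"^(?P<term>.+?)\s*\((?P<score>\d{1,3})%\)$")


def parse_terms(line):
    """Split a semicolon-delimited term line into (term, score) pairs.
    Terms that carry no percentage get a score of None."""
    terms = []
    for chunk in line.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = TERM_RE.match(chunk)
        if match:
            terms.append((match.group("term"), int(match.group("score"))))
        else:
            terms.append((chunk, None))
    return terms


def major_terms(terms, threshold=85):
    """Keep terms scored at or above the (hypothetical) threshold."""
    return [term for term, score in terms if score is not None and score >= threshold]


# Excerpt of the term line from the first sample article.
sample = "ROBERT J MOODY (95%); LAWYERS (90%); MURDER (78%); COCAINE (51%) ;"

parsed = parse_terms(sample)
print(major_terms(parsed))
```

Lowering the threshold moves terms such as MURDER from minor to major, a quick way to test whether the scores match a human reader's sense of what the story is about.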