Workshop on The Transformation of Science
    Max Planck Society, Elmau, Germany
    June 1, 1999
    TOWARDS INFORMATIONAL SCIENCE
    Indexing and Analyzing the Knowledge of Scientific Communities
    Bruce R. Schatz
    CANIS Laboratory
    Graduate School of Library and Information Science
    University of Illinois at Urbana-Champaign
Informational Science: Towards the Fourth Branch of Science
    Computer Science => Computational Science
    Information Science => Informational Science
    Correlation of Knowledge across Sources
    Distributed Community Repositories
    Semantic Indexing of Community Knowledge
    Analysis Environments on the Net
Community Repositories in the Interspace
    Every Person performs Every Role
        USER: request
        LIBRARIAN: reference
        INDEXER: classify
        PUBLISHER: quality
        AUTHOR: generate
    The Distributed World
Community Systems
    browse and share all the knowledge of a community
    Formal                                Informal
    data (database management)            results (electronic mail)
    literature (information retrieval)    news (bulletin boards)
    knowledge (hypertext annotations)
Worm Community System (WCS)
    WCS Information
        Literature: BIOSIS, MEDLINE, newsletters, meetings
        Data: Genes, Maps, Sequences, strains, people
    WCS Functionality
        Browsing: search, navigation
        Filtering: selection, analysis
        Sharing: linking, publishing
    WCS: 250 users at 50 labs across Internet (1991)
WCS
THE THIRD WAVE OF NET EVOLUTION
    PACKETS
    OBJECTS
    CONCEPTS
from Objects to Concepts
from Syntax to Semantics
    Infrastructure is Interaction with Abstraction
    Internet is packet transmission across computers
    Interspace is concept navigation across repositories
    CONCEPT SPACES
LEVELS OF INDEXES
    [Figure labels: Technology, Engineering, Electrical; FORMAL (manual), INFORMAL (automatic); IEEE, communities, groups, individuals]
SCALABLE SEMANTICS
    Automatic indexing
    Domain-Independent indexing
    Statistical clustering
    Compute Context
        concepts within documents
        documents within repositories
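A minimal sketch of what "computing the context of concepts within documents" could look like, assuming a plain co-occurrence count as the statistical measure. The slide does not give the actual weighting, so the asymmetric weight used here (co-occurrences divided by the document frequency of the source concept) is an assumption, as are the function name build_concept_space and the toy documents.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_space(docs):
    """Map each concept to its co-occurring concepts, strongest first.

    docs is a list of concept lists, one list per document.
    The weight formula is an illustrative assumption, not the deck's method.
    """
    doc_freq = Counter()   # number of documents containing each concept
    pair_freq = Counter()  # number of documents containing both concepts
    for concepts in docs:
        unique = sorted(set(concepts))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    space = defaultdict(list)
    for (a, b), n in pair_freq.items():
        # Asymmetric weight: how often b appears given that a appears.
        space[a].append((b, n / doc_freq[a]))
        space[b].append((a, n / doc_freq[b]))
    for concept in space:
        # Sort by descending weight, then alphabetically for stable output.
        space[concept].sort(key=lambda item: (-item[1], item[0]))
    return dict(space)


# Toy documents, invented for illustration only.
docs = [
    ["arthritis", "analgesic", "bleeding"],
    ["arthritis", "analgesic"],
    ["analgesic", "bleeding"],
    ["arthritis", "inflammation"],
]
space = build_concept_space(docs)
print(space["arthritis"])
```

Because the computation only counts which concepts share a document, it needs no domain-specific rules, which is one reading of "domain-independent indexing" on the slide.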
CROSS-OVERS IN SEMANTIC INDEXING
COMPUTING CONCEPTS
    ‘92: 4,000 (molecular biology)
    ‘93: 40,000 (molecular biology)
    ‘95: 400,000 (electrical engineering)
    ‘96: 4,000,000 (engineering)
    ‘98: 40,000,000 (medicine)
SIMULATING A NEW WORLD
    Obtain discipline-scale collection
        MEDLINE from NLM, 10M bibliographic abstracts
        human classification: Medical Subject Headings
    Partition discipline into Community Repositories
        4 core terms per abstract for MeSH classification
        32K nodes with core terms (classification tree)
        Community is all abstracts classified by core term
        40M abstracts containing 280M concepts
        concept spaces took 2 days on NCSA Origin 2000
    Simulating World of Medical Communities
        10K repositories with > 1K abstracts (1K w/ > 10K)
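The partitioning step can be sketched as follows, assuming each abstract arrives as an identifier plus its list of core MeSH terms. The function names, the record layout, and the sample records are hypothetical. One reading of the slide's figures, offered as an assumption: if each of the 10M abstracts is filed once under each of its 4 core terms, the repositories together hold about 40M entries.

```python
from collections import defaultdict


def partition_into_communities(abstracts):
    """Group abstract ids by core term; one repository per core term.

    abstracts is an iterable of (abstract_id, core_terms) pairs.
    An abstract with several core terms is filed in several repositories.
    """
    repos = defaultdict(list)
    for abstract_id, core_terms in abstracts:
        for term in set(core_terms):  # guard against repeated terms
            repos[term].append(abstract_id)
    return dict(repos)


def repositories_larger_than(repos, threshold):
    """Keep only repositories holding more than threshold abstracts."""
    return {term: ids for term, ids in repos.items() if len(ids) > threshold}


# Hypothetical sample records, not taken from MEDLINE.
abstracts = [
    ("a1", ["Arthritis, Rheumatoid", "Analgesics"]),
    ("a2", ["Analgesics", "Gastrointestinal Hemorrhage"]),
    ("a3", ["Analgesics"]),
]
repos = partition_into_communities(abstracts)
total_entries = sum(len(ids) for ids in repos.values())
print(repos["Analgesics"], total_entries)
```

With the real collection, a call such as repositories_larger_than(repos, 1000) would correspond to the slide's count of repositories with more than 1K abstracts.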
COMMUNITY PROCESSING
INTERSPACE NAVIGATION
    Semantic Indexes for Community Repositories
    Navigating Abstractions within Repository
        concept space
        category map
    Interactive browsing by Community experts
Interspace Remote Access Client
Navigation in MEDSPACE
    For a patient with Rheumatoid Arthritis
    Find a drug that reduces the pain (analgesic)
    but does not cause stomach (gastrointestinal) bleeding
    Choose Domain
Concept Search
Concept Navigation
Retrieve Document
Navigate Document
Retrieve Document
Category Map
Category Navigation
Concept Navigation
SWITCHING
    In the Interspace…
        each Community maintains its own repository
    Switching is navigating Across repositories
        use your vocabulary to search another specialty
Medicine Session
Categories and Concepts
Concept Switching
Document Retrieval
CONCEPT SWITCHING
    “Concept” versus “Term”
        set of “semantically” equivalent terms
    Concept switching
        region to region (set to set) match
    [Figure labels: term, Semantic region, Concept Space]
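A minimal sketch of region-to-region matching, assuming each concept space is a mapping from a term to its set of related terms and that a "semantic region" is a term together with those related terms. The overlap-count scoring, the function names, and the two toy concept spaces are illustrative assumptions; the slide does not specify the matching procedure.

```python
def region(space, term):
    """Return the semantic region of a term: the term plus its related terms."""
    return {term} | set(space.get(term, ()))


def switch_concept(source_space, target_space, term):
    """Rank terms in the target space whose regions overlap the source region.

    Scoring by the size of the set intersection is an assumption made
    for illustration.
    """
    source_region = region(source_space, term)
    scores = {}
    for candidate in target_space:
        overlap = source_region & region(target_space, candidate)
        if overlap:
            scores[candidate] = len(overlap)
    # Largest overlap first, alphabetical order to break ties.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Toy concept spaces for two specialties, invented for illustration.
rheumatology = {
    "analgesic": {"pain relief", "nsaid", "aspirin"},
}
gastroenterology = {
    "gastric hemorrhage": {"nsaid", "aspirin", "ulcer"},
    "ibuprofen": {"nsaid"},
    "endoscopy": {"ulcer", "biopsy"},
}
matches = switch_concept(rheumatology, gastroenterology, "analgesic")
print(matches)
```

Matching set to set rather than term to term lets a query succeed even when the exact term is absent from the other community's vocabulary, as long as the two regions share related terms.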
Building Your Interspace
    Gather the Information Sources
        external bibliographic and community documents
        community meta-data and specialty data
    Generate the Community Repositories
        concept spaces (terms) & category maps (documents)
    Construct the Analysis Environment
        concept switching and community links
    Evolve Community Interspace
        concept navigation and object sharing
THE NET OF THE 21st CENTURY
    Beyond Objects to Concepts
    Beyond Search to Analysis
    Every Community has its own special library
    Every Community does semantic indexing
    Problem Solving via Cross-Correlating Concepts Across the Interspace
The Zen of the Net