Graduate School of Informatics, Kyoto University, November 21, 2001
Technologies of the Interspace
Peer-Peer Semantic Indexing
Bruce Schatz
CANIS Laboratory
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
THE THIRD WAVE OF NET EVOLUTION
  PACKETS
  OBJECTS
  CONCEPTS
SCALABLE SEMANTICS
  Automatic indexing
  Domain-Independent indexing
  Statistical clustering
  Compute Context of
    concepts within documents
    documents within repositories
CROSS-OVERS IN SEMANTIC INDEXING
COMPUTING CONCEPTS
  ’92: 4,000 (molecular biology)
  ’93: 40,000 (molecular biology)
  ’95: 400,000 (electrical engineering)
  ’96: 4,000,000 (engineering)
  ’98: 40,000,000 (medicine)
SIMULATING A NEW WORLD
  Obtain discipline-scale collection
    MEDLINE from NLM, 10M bibliographic abstracts
    human classification: Medical Subject Headings
  Partition discipline into Community Repositories
    4 core terms per abstract for MeSH classification
    32K nodes with core terms (classification tree)
    Community is all abstracts classified by core term
    40M abstracts containing 280M concepts
    concept spaces took 2 days on NCSA Origin 2000
  Simulating World of Medical Communities
    10K repositories with > 1K abstracts (1K w/ > 10K)
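The partitioning step above can be sketched in Python. This is a toy illustration, not the original pipeline: the abstract ids and core terms are made up, and `partition_into_communities` is a hypothetical name. Because each abstract joins one community per core term, 10M abstracts with 4 core terms each would give 40M community memberships, which is one plausible reading of the 40M figure on the slide.

```python
from collections import defaultdict


def partition_into_communities(abstracts):
    """Map each core term to the ids of all abstracts classified under it.

    `abstracts` maps an abstract id to its list of core terms
    (hypothetical input layout, chosen for illustration).
    """
    communities = defaultdict(list)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            communities[term].append(abstract_id)
    return dict(communities)


# Toy data: three abstracts with made-up core terms.
abstracts = {
    "a1": ["Neoplasms", "Genes"],
    "a2": ["Neoplasms", "Drug Therapy"],
    "a3": ["Genes"],
}

communities = partition_into_communities(abstracts)

# An abstract is counted once per community it belongs to.
total_memberships = sum(len(ids) for ids in communities.values())

# Same arithmetic at the scale quoted on the slide.
slide_scale_memberships = 10_000_000 * 4

print(communities)
print(total_memberships, slide_scale_memberships)
```

The same abstract appears in several communities, so the repositories overlap rather than forming a strict partition of the collection.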
COMMUNITY PROCESSING
Existing Technologies
  Extracting Concepts (AI)
    Canonical noun phrases
    Generic statistical parser
  Computing Context (IR)
    Co-occurrence frequency, in collection
    Useful interactively, not strict ordering
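A minimal sketch of computing context by co-occurrence frequency. It assumes noun phrases have already been extracted from each document (the parser is not shown), and it uses raw pair counts rather than whatever weighting the original system applied. The names `build_concept_space` and `related` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_space(docs):
    """Count how often each pair of concepts appears in the same document."""
    cooc = defaultdict(Counter)
    for phrases in docs:
        # Count each pair once per document, regardless of repeats.
        for a, b in combinations(sorted(set(phrases)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def related(cooc, concept, k=3):
    """Return up to k concepts that co-occur most often with `concept`."""
    neighbours = cooc.get(concept, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Toy collection: each document is a list of already-extracted noun phrases.
docs = [
    ["gene expression", "tumor suppressor", "p53"],
    ["tumor suppressor", "p53"],
    ["gene expression", "cell cycle"],
]

concept_space = build_concept_space(docs)
suggestions = related(concept_space, "p53")
print(suggestions)
```

The ranking is meant as a list of suggestions for a person to browse, in line with the slide's note that the context is useful interactively rather than as a strict ordering.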
CONCEPT NAVIGATION
  Semantic Indexes for Community Repositories
  Navigating Abstractions within Repository
    concept space
    category map
  Interactive browsing by Community experts
Category Map
Category Navigation
Concept Navigation
CONCEPT SWITCHING
  “Concept” versus “Term”
    set of “semantically” equivalent terms
  Concept switching
    region to region (set to set) match
  [Figure labels: term, Semantic region, Concept Space]
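One way to picture region-to-region matching, as a sketch under stated assumptions: a concept space is a dictionary from a term to its weighted related terms, a region is a term plus its strongest neighbours, and the switch uses terms shared by both spaces as the bridge. The functions `region` and `switch_concept` and all data are hypothetical, and the real system may match regions quite differently.

```python
def region(space, term, k=2):
    """A term plus its k most strongly related terms within one concept space."""
    neighbours = sorted(space.get(term, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return {term} | {name for name, _ in neighbours[:k]}


def switch_concept(source, target, term, k=2):
    """Map a term's region in `source` to a region in `target`.

    Terms of the source region that also exist in the target space act
    as bridges; the target region is the union of their regions there.
    """
    source_region = region(source, term, k)
    bridge = sorted(t for t in source_region if t in target)
    target_region = set()
    for t in bridge:
        target_region |= region(target, t, k)
    return bridge, target_region


# Toy concept spaces for two communities (made-up weights).
patient_space = {
    "heart attack": {"myocardial infarction": 5, "chest pain": 3, "aspirin": 1},
}
cardiology_space = {
    "myocardial infarction": {"thrombolysis": 4, "troponin": 2, "angioplasty": 1},
}

bridge, target_region = switch_concept(patient_space, cardiology_space, "heart attack")
print(bridge, sorted(target_region))
```

Here the query term itself never appears in the second space; the switch succeeds because a neighbour in its region does, which is the point of matching sets rather than single terms.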
Medicine Session
Categories and Concepts
Concept Switching
Document Retrieval
Future Technologies
  Concept Switching
    Spreading activation, similarity clusters
  Path Matching
    Aggregating indexes, many repositories
  Dynamic Indexing
    On-the-fly collections, during session
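Spreading activation, named above as a candidate technique for concept switching, can be sketched generically as follows. This is a textbook-style version, not the talk's algorithm: the graph, the link weights, the `decay` factor, and the fixed number of `steps` are all assumptions chosen for illustration.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed concepts along weighted links.

    `graph` maps a concept to {neighbour: weight}. At each step, every
    node activated in the previous step passes energy * weight * decay
    to each of its neighbours.
    """
    activation = {seed: 1.0 for seed in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbour, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + passed
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


# Toy concept graph with made-up link weights.
graph = {
    "p53": {"tumor suppressor": 1.0, "cell cycle": 0.5},
    "tumor suppressor": {"apoptosis": 1.0},
}

activation = spread_activation(graph, ["p53"])
print(activation)
```

Concepts two links away from the seed still receive some activation, which is how this kind of method can surface related terms that never co-occur directly with the starting term.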
Peer-Peer Computations
  Local Interaction
    Your PC does small computations
    e.g. screensaver for SETI
  Global Merging
    Partition computation into small parts
    Each local forms part of global whole
  Large-Scale Distribution
    3M users of Public Health.
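The local-then-global pattern above can be illustrated with co-occurrence counting, which splits cleanly across machines because partial counts simply add. This is a single-process sketch with hypothetical names (`local_counts`, `merge_counts`) and toy data; no networking or peer discovery is shown.

```python
from collections import Counter
from itertools import combinations


def local_counts(docs):
    """Co-occurrence pair counts computed by one peer over its own documents."""
    counts = Counter()
    for phrases in docs:
        counts.update(combinations(sorted(set(phrases)), 2))
    return counts


def merge_counts(partials):
    """Combine per-peer counts into one global count by summation."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Toy collection split across two peers.
peer_a_docs = [["gene expression", "p53"], ["p53", "tumor suppressor"]]
peer_b_docs = [["gene expression", "p53"], ["cell cycle", "p53"]]

merged = merge_counts([local_counts(peer_a_docs), local_counts(peer_b_docs)])

# Reference result: the same computation done centrally.
central = local_counts(peer_a_docs + peer_b_docs)
print(merged == central)
```

Because the merge is a plain sum, peers can report in any order and the global index comes out the same as a centralized run over the whole collection.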
THE NET OF THE 21st CENTURY
  Beyond Objects to Concepts
  Beyond Search to Analysis
  Problem Solving via Cross-Correlating Multimedia Information across the Net
  Every community has its own special library
  Every community does semantic indexing
Zen of Information Retrieval
  Searching without Searching
    Navigate concepts into documents
    Based on interactive recognition
  Indexing without Indexing
    Compute context on dynamic collections
    Based on distributed extraction
  Sharing without Sharing
    Record paths during user sessions
    Based on community practices