Published by Nancy Hopkins. Modified over 8 years ago.
1
Department of Computer Science seminar
University of Illinois, February 14, 2005
The Evolution of the Net: Predicting Global Infrastructure
Bruce R. Schatz
CANIS Laboratory, Graduate School of Library & Information Science
schatz@uiuc.edu, www.canis.uiuc.edu
2
Art of Physical Architecture
3
Art of Logical Architecture
4
The Evolution of the Net Niels Bohr on Quantum Theory “Prediction is very Difficult, especially about the Future”
5
THE THIRD WAVE OF NET EVOLUTION PACKETS OBJECTS CONCEPTS
6
Transparent Federation across Sources Generic Protocols for Global Infrastructure Ultimate Goal is cyberspace visions of “being one with all the world’s knowledge” Computer Science and Infrastructure
7
Computer Science and Infrastructure
1985  Operating Systems        caching
1995  Database Management      tagging
2005  Information Retrieval    clustering
2015  Artificial Intelligence  recognizing
8
Linguistics Levels and Universal Units
1985  Syntax      Files (wholes)
1995  Structure   Records (parts)
2005  Semantics   Concepts (meaning)
2015  Pragmatics  Features (reality)
9
Evolution of Information Retrieval
[Figure: Evolution of Information Retrieval across the Net. Timeline axis: 1960, 1970, 1980, 1990, 2000, 2010. Stages: Grand Visions, Text Search, Document Search, Concept Search. Levels: Structure, Syntax, Semantics.]
from: Bruce R. Schatz, “Information Retrieval in Digital Libraries: Bringing Search to the Net”, cover article in Science, vol 275, Jan 17, 1997, special issue on Bioinformatics
11
Same Query into Multiple Sources Results return Uniform Packages Packets are for Bits, but Objects need more Information Units are for Database Items 1985 Syntax Federation
12
CMU Computer Science – Andrew Apollo Domain – distributed file system Xerox Star – multimedia document system Bellcore Network Systems – Fibers Telenet – International Packet Switches Dialog – Bibliographic Text Searches 1985 Technology Environment
13
Distributed Documents Distributed Collections Multimedia Documents Networked Hypertext Document Browsing (links across sources) Document Search (texts across sources) Telesophy Prototype
14
Telesophy Session
15
Bitmapped Workstation with Custom Software $30K Apollo with 10Mb/s WAN Windows via Brown [hypertext] Objects via Xerox [Smalltalk] Information Units and Data Items 300K Units across 20 sources Bellcore R&D, $2.5M 1984-1988 Telesophy Implementation
16
Browsing requires Caching across Internet Raw bandwidth insufficient 200ms Ping versus 250ms Saccade Lookahead Applications Specific Protocols 1987 Internet Research Task Force 1989 ARPANET 20th Anniversary 1990 Dissertation on Interactive Retrieval Operating System Research
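The lookahead idea on this slide (hide a roughly 200ms round trip by fetching what the reader is likely to open next) can be sketched in Python. This is a minimal illustration, not the Telesophy code: the class name `LookaheadCache` and the `fetch` and `links` callables are hypothetical, and prefetching is done synchronously here for simplicity where a real browser would fetch in the background.

```python
class LookaheadCache:
    """Cache documents and prefetch the documents they link to.

    `fetch(doc_id)` and `links(doc_id)` are caller-supplied callables
    (hypothetical names, assumed for this sketch).
    """

    def __init__(self, fetch, links):
        self.fetch = fetch
        self.links = links
        self.cache = {}

    def get(self, doc_id):
        # Fetch the requested document only if it is not cached yet.
        if doc_id not in self.cache:
            self.cache[doc_id] = self.fetch(doc_id)
        # Lookahead: pull in every linked document before it is requested.
        for linked in self.links(doc_id):
            if linked not in self.cache:
                self.cache[linked] = self.fetch(linked)
        return self.cache[doc_id]
```

Following a link after the first `get` is then served from the local cache, so the network delay is paid before the reader's eye reaches the link rather than after.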
18
Search using Parts of Documents Transparent merge different Schema Results return Complete Displays Displayers invoked for all types 1995 Structure Federation
19
NCSA and the World-Wide Web Mosaic – multimedia document browsing HTTP – standard query protocol University Library and Online Retrieval Ovid – full-text journal searching SGML – standard document protocol 1995 Technology Environment
20
Full Distributed Documents Full Displays with tables and equations Distributed Collections from publishers Single Federated Collection Streamlined search using tag structure Canonical tag schema with translation DeLIver System
21
DeLIver Session
22
Desktop PC plus Custom Software Integration $5K IBM Personal Computer Mosaic via NCSA [hypertext] Displays via SoftQuad [viewers] Custom DTD and SSL for tags and styles 100K articles for 3000 users NSF DLI, $5M 1994-1998 DeLIver Implementation
23
Metadata Extraction for Structure Federation Raw schema insufficient Different names and different types Author tags in physics vs mathematics 1995 interactive databases using Mosaic 1997 Beat Elsevier using canonical tags 1999 production distributed XML federation Database Management Research
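The canonical-tag approach named on this slide can be illustrated with a short Python sketch. The source names and tag names below are invented for illustration and are not DeLIver's actual DTD; the point is only that each source's tags are translated to one shared schema before searching.

```python
# Hypothetical per-source tag names mapped to one canonical schema.
# These names are illustrative assumptions, not taken from DeLIver.
TAG_MAP = {
    "physics_publisher": {"au": "author", "ti": "title"},
    "math_publisher": {"authorname": "author", "arttitle": "title"},
}


def to_canonical(source, record):
    """Rename source-specific tags to canonical tags; unknown tags are kept."""
    mapping = TAG_MAP[source]
    return {mapping.get(tag, tag): value for tag, value in record.items()}


def federated_search(collections, tag, needle):
    """Search every source through one canonical tag, ignoring case."""
    hits = []
    for source, records in collections.items():
        for record in records:
            canonical = to_canonical(source, record)
            if needle.lower() in canonical.get(tag, "").lower():
                hits.append((source, canonical))
    return hits
```

A single query on the canonical `author` tag then reaches both the physics-style and the mathematics-style records without the user knowing either source's schema.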
25
Search using Concepts above Words Extraction of Concepts from Documents Statistical Index on Community Collections Concept Navigation across Collections 2005 Semantic Federation
26
Web Portals and statistical NLP Google – statistical linked contexts NLP – statistical generic parsers Fast Processors and Big Disks Gigaflops – Beowulfs and cluster computing Terabytes – RAIDs and literature scaling 2005 Technology Environment
27
Fully Parsed Documents Concepts and Entities auto generated Distributed Collections from communities Fully Related Concepts Switching across Community Repositories Automatic Links to Entity Databases BeeSpace System
28
BeeSpace Session
29
Commodity PC plus Custom Software $1K Dell Personal Computer $15K Server 1 Gflops 2 TBytes Semantic Indexing generic scalable Concept Extraction and Normalization Concept Co-occurrence on Collections 50M articles across 50K repositories BeeSpace Implementation
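Concept co-occurrence on a collection, as listed on this slide, can be sketched in a few lines of Python. This is a simplified illustration that assumes concepts have already been extracted and normalized per document; it counts raw document-level co-occurrence and is not the weighting actually used in BeeSpace, which the slides do not specify.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how many documents each unordered pair of concepts shares."""
    counts = Counter()
    for concepts in docs:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts


def related(counts, concept, top=3):
    """Rank the concepts that co-occur most often with `concept`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == concept:
            scores[b] += n
        elif b == concept:
            scores[a] += n
    return [c for c, _ in scores.most_common(top)]
```

The ranked list from `related` is the kind of neighborhood a concept space exposes for navigation: starting from one concept, the user sees the concepts that most often appear alongside it in the collection.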
30
Statistical Clustering Equivalent Phrases Raw phrases insufficient Phrase parsing with normalization Entity recognition with normalization 1998 semantic indexing (concepts from terms) 1999 information spaceflight (categories from documents) Information Retrieval Research
31
from Objects to Concepts from Syntax to Semantics Infrastructure is Interaction with Abstraction Internet is packet transmission across computers Interspace is concept navigation across repositories CONCEPT SPACES
32
LEVELS OF INDEXES
[Figure labels: Technology, Engineering, Electrical; FORMAL (manual), INFORMAL (automatic); IEEE, communities, groups, individuals]
33
Technology Trends IEEE Computer for January 2002 Information Infrastructure for Trends issue Document Representation (Semantic Web) Language Parsing (TIPSTER) Statistical Indexing (TREC) Peer-Peer Networking (SETI@home) Vocabulary Switching (UMLS)
34
SCALABLE SEMANTICS Automatic indexing Domain-Independent indexing Statistical clustering Compute Context of concepts within documents documents within repositories
35
COMPUTING CONCEPTS
‘92: 4,000 (molecular biology)
‘93: 40,000 (molecular biology)
‘95: 400,000 (electrical engineering)
‘96: 4,000,000 (engineering)
‘98: 40,000,000 (medicine)
36
SIMULATING A NEW WORLD
Obtain discipline-scale collection
  MEDLINE from NLM, 10M bibliographic abstracts
  human classification: Medical Subject Headings
Partition discipline into Community Repositories
  4 core terms per abstract for MeSH classification
  32K nodes with core terms (classification tree)
  Community is all abstracts classified by core term
  40M abstracts containing 280M concepts
  concept spaces took 2 days on NCSA Origin 2000
Simulating World of Medical Communities
  10K repositories with > 1K abstracts (1K w/ > 10K)
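The partitioning step on this slide amounts to an inverted index from core term to abstracts. The Python sketch below illustrates it under the assumption that each abstract arrives with its list of core MeSH terms; the function name and data layout are hypothetical. With 10M abstracts and 4 core terms each, such a partition would hold about 40M abstract entries, which appears to be the source of the 40M figure on the slide.

```python
from collections import defaultdict


def partition(abstracts):
    """Group abstract ids into one community repository per core term.

    `abstracts` maps an abstract id to its list of core terms
    (an assumed layout for this sketch).
    """
    repos = defaultdict(set)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            # An abstract joins every community named by its core terms.
            repos[term].add(abstract_id)
    return dict(repos)


def large_repositories(repos, minimum):
    """Return the core terms whose repository has more than `minimum` abstracts."""
    return sorted(term for term, ids in repos.items() if len(ids) > minimum)
```

`large_repositories` mirrors the slide's cut of keeping only communities above a size threshold (more than 1K abstracts in the simulation).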
37
COMMUNITY PROCESSING
38
INTERSPACE NAVIGATION Semantic Indexes for Community Repositories Navigating Abstractions within Repository concept space & category map Interactive browsing by Community experts *www.canis.uiuc.edu/interspace-prototype
39
Interspace Remote Access Client
40
Navigation in MEDSPACE For a patient with Rheumatoid Arthritis Find a drug that reduces the pain (analgesic) but does not cause stomach (gastrointestinal) bleeding Choose Domain
41
Concept Search
42
Concept Navigation
43
Retrieve Document
44
Navigate Document
45
Retrieve Document
46
Concept Navigation
48
SWITCHING In the Interspace… each Community maintains its own repository Switching is navigating Across repositories use your vocabulary to search another specialty
49
CONCEPT SWITCHING “Concept” versus “Term” set of “semantically” equivalent terms Concept switching region to region (set to set) match term Semantic region Concept Space
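Region-to-region switching can be sketched as a set-overlap match. In this hedged Python illustration, a concept space is assumed to be a dictionary from a term to the set of terms related to it, and Jaccard overlap stands in for whatever similarity measure the Interspace prototype actually used, which the slide does not state.

```python
def jaccard(a, b):
    """Overlap of two term sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch(term, source_space, target_space):
    """Map `term` to the target-space concept whose region overlaps most.

    Both spaces map a term to its set of related terms (assumed layout).
    Raises ValueError if `target_space` is empty.
    """
    # The region is the term plus its neighbors in the source space.
    region = source_space[term] | {term}
    return max(
        target_space,
        key=lambda candidate: jaccard(region, target_space[candidate] | {candidate}),
    )
```

The match is made set to set rather than term to term, so a user's word can reach another specialty's concept even when the two vocabularies share no exact label, as long as their surrounding regions overlap.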
50
Biomedical Session
51
Categories and Concepts
52
Concept Switching
53
Document Retrieval
54
THE NET OF THE 21st CENTURY Beyond Objects to Concepts Beyond Search to Analysis Problem Solving via Cross-Correlating Multimedia Information across the Net Every community has its own special library Every community does semantic indexing The Interspace approximates Cyberspace
56
Beyond Words and Concepts to Reality Feature Vectors describing Situation Each Individual has Vector (< Community) Discrete Samples into Continuous Monitors 2015 Pragmatics Federation
57
Continuous Vector Recording Health Grid – personal lifestyle monitors Peer-to-Peer – beyond Napster and Amazon Individual User Modeling Cohort Grouping – custom clustering Adaptable Interfaces – multiple levels 2015 Technology Environment
58
Continuous Monitoring Adaptive Questionnaires full-spectrum Distributed Collections from individuals Situational Analysis Structured Vectors custom for Individuals Population Cohorts for Decision Support Lifestyle Monitor System
59
Lifestyle Monitor Questions
Sample General Health Questions for User Modeling
How good is your health?
What is your typical energy level?
Do you eat well-balanced foods?
How much do you eat?
Do you exercise for at least half an hour?
How often are you tired without exercising?
How much do you sleep a night?
Do you get enough sleep (to not be tired)?
How often are you in pain?
Do you feel happy with your life?
Can you lead a full life with your current health?
Can you deal adequately with all your problems?
Are you worried about things you cannot control?
Do you feel too tired to function properly?
Does time hang heavy on you in an average day?
60
Lifestyle Monitor Session
61
Structured Vectors Individual customized Raw concepts insufficient Adaptive Concepts for individual situations Structured Vectors for cohort clustering Situational Analysis infrastructure support 2007 Internet Health Monitors prototypes 2011 Population Health Monitors for chronic illness regionally deployed Artificial Intelligence Research
62
THE DISTRIBUTED WORLD
Community Repositories in the Interspace
Peer to Peer Networking Infrastructure
Every Person performs Every Role
USER       request
LIBRARIAN  reference
INDEXER    classify
PUBLISHER  quality
AUTHOR     generate
63
from Concepts to Features from Semantics to Pragmatics Infrastructure is Interaction with Abstraction Interspace is concept navigation across repositories Intermind is feature comparison across individuals FEATURE VECTORS
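Feature comparison across individuals can be illustrated with a small Python sketch. It assumes each person's questionnaire answers are encoded as a numeric vector and that each cohort is summarized by a centroid vector; cosine similarity is one common choice of comparison and is an assumption here, since the slides do not name a measure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_cohort(person, cohorts):
    """Return the name of the cohort whose centroid is most similar to `person`.

    `cohorts` maps a cohort name to its centroid vector (assumed layout).
    """
    return max(cohorts, key=lambda name: cosine(person, cohorts[name]))
```

For example, with answers scored on the same scale for sleep, energy, and tiredness, an individual is assigned to whichever population cohort their vector points closest to, which is the grouping step that decision support would build on.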
64
Towards the Intermind Beyond Concepts to Features Beyond Analysis to Synthesis Problem Solving via Cross-Correlating Universal Knowledge across the Net Every individual has its own special vector Every viewpoint does semantic clustering The Intermind is true Cyberspace
65
Today the Hive Tomorrow the HiveMind