Presentation is loading. Please wait.

Presentation is loading. Please wait.

Department of Computer Science seminar University of Illinois, February 14, 2005 The Evolution of the Net: Predicting Global Infrastructure Bruce R. Schatz.

Similar presentations


Presentation on theme: "Department of Computer Science seminar University of Illinois, February 14, 2005 The Evolution of the Net: Predicting Global Infrastructure Bruce R. Schatz."— Presentation transcript:

1 Department of Computer Science seminar University of Illinois, February 14, 2005 The Evolution of the Net: Predicting Global Infrastructure Bruce R. Schatz CANIS Laboratory Graduate School of Library & Information Science schatz@uiuc.edu, www.canis.uiuc.edu

2 Art of Physical Architecture

3 Art of Logical Architecture

4 The Evolution of the Net Niels Bohr on Quantum Theory “Prediction is very Difficult, especially about the Future”

5 THE THIRD WAVE OF NET EVOLUTION PACKETS OBJECTS CONCEPTS

6 Transparent Federation across Sources Generic Protocols for Global Infrastructure Ultimate Goal is cyberspace visions of “being one with all the world’s knowledge” Computer Science and Infrastructure

7 1985Operating Systemscaching 1995 Database Managementtagging 2005Information Retrievalclustering 2015Artificial Intelligencerecognizing Computer Science and Infrastructure

8 1985SyntaxFiles (wholes) 1995 StructureRecords (parts) 2005SemanticsConcepts (meaning) 2015PragmaticsFeatures (reality) Linguistics Levels and Universal Units

9 196019701980199020002010 Grand Visions Text Search Document Search Concept Search StructureSyntaxSemantics Evolution of Information Retrieval across the Net from: Bruce R. Schatz, “Information Retrieval in Digital Libraries: Bringing Search to the Net” cover article in Science, vol 275, Jan 17, 1997 special issue on Bioinformatics Evolution of Information Retrieval

10

11 Same Query into Multiple Sources Results return Uniform Packages Packets are for Bits, but Objects need more Information Units are for Database Items 1985 Syntax Federation

12 CMU Computer Science – Andrew Apollo Domain – distributed file system Xerox Star – multimedia document system Bellcore Network Systems – Fibers Telenet – International Packet Switches Dialog – Bibliographic Text Searches 1985 Technology Environment

13 Distributed Documents Distributed Collections Multimedia Documents Networked Hypertext Document Browsing (links across sources) Document Search (texts across sources) Telesophy Prototype

14 Telesophy Session

15 Bitmapped Workstation with Custom Software $30K Apollo with 10Mb/s WAN Windows via Brown [hypertext] Objects via Xerox [Smalltalk] Information Units and Data Items 300K Units across 20 sources Bellcore R&D, $2.5M 1984-1988 Telesophy Implementation

16 Browsing requires Caching across Internet Raw bandwidth insufficient 200ms Ping versus 250ms Saccade Lookahead Applications Specific Protocols 1987 Internet Research Task Force 1989 ARPANET 20 th Anniversary 1990 Dissertation on Interactive Retrieval Operating System Research

17

18 Search using Parts of Documents Transparent merge different Schema Results return Complete Displays Displayers invoked for all types 1995 Structure Federation

19 NCSA and the World-Wide Web Mosaic – multimedia document browsing HTTP – standard query protocol University Library and Online Retrieval Ovid – full-text journal searching SGML – standard document protocol 1995 Technology Environment

20 Full Distributed Documents Full Displays with tables and equations Distributed Collections from publishers Single Federated Collection Streamlined search using tag structure Canonical tag schema with translation DeLIver System

21 DeLIver Session

22 Desktop PC plus Custom Software Integration $5K IBM Personal Computer Mosaic via NCSA [hypertext] Displays via SoftQuad [viewers] Custom DTD and SSL for tags and styles 100K articles for 3000 users NSF DLI, $5M 1994-1998 DeLIver Implementation

23 Metadata Extraction for Structure Federation Raw schema insufficient Different names and different types Author tags in physics vs mathematics 1995 interactive databases using Mosaic 1997 Beat Elsevier using canonical tags 1999 production distributed XML federation Database Management Research

24

25 Search using Concepts above Words Extraction of Concepts from Documents Statistical Index on Community Collections Concept Navigation across Collections 2005 Semantic Federation

26 Web Portals and statistical NLP Google – statistical linked contexts NLP – statistical generic parsers Fast Processors and Big Disks Gigaflops – Beowulfs and cluster computing Terabytes – RAIDs and literature scaling 2005 Technology Environment

27 Fully Parsed Documents Concepts and Entities auto generated Distributed Collections from communities Fully Related Concepts Switching across Community Repositories Automatic Links to Entity Databases BeeSpace System

28 BeeSpace Session

29 Commodity PC plus Custom Software $1K Dell Personal Computer $15K Server 1 Gflops 2 TBytes Semantic Indexing generic scalable Concept Extraction and Normalization Concept Co-occurrence on Collections 50M articles across 50K repositories BeeSpace Implementation

30 Statistical Clustering Equivalent Phrases Raw phrases insufficient Phrase parsing with normalization Entity recognition with normalization 1998 semantic indexing (concepts from terms) 1999 information spaceflight (categories from documents) Information Retrieval Research

31 from Objects to Concepts from Syntax to Semantics Infrastructure is Interaction with Abstraction Internet is packet transmission across computers Interspace is concept navigation across repositories CONCEPT SPACES

32 Technology Engineering Electrical FORMAL INFORMAL (manual) (automatic) IEEE communities groups individuals LEVELS OF INDEXES

33 Technology Trends IEEE Computer for January 2002 Information Infrastructure for Trends issue Document Representation (Semantic Web) Language Parsing (TIPSTER) Statistical Indexing (TREC) Peer-Peer Networking (SETI@home) Vocabulary Switching (UMLS)

34 SCALABLE SEMANTICS Automatic indexing Domain-Independent indexing Statistical clustering Compute Context of concepts within documents documents within repositories

35 COMPUTING CONCEPTS ‘92: 4,000 (molecular biology) ‘93: 40,000 (molecular biology) ‘95: 400,000 (electrical engineering) ‘96: 4,000,000 (engineering) ‘98: 40,000,000 (medicine)

36 SIMULATING A NEW WORLD Obtain discipline-scale collection MEDLINE from NLM, 10M bibliographic abstracts human classification: Medical Subject Headings Partition discipline into Community Repositories 4 core terms per abstract for MeSH classification 32K nodes with core terms (classification tree) Community is all abstracts classified by core term 40M abstracts containing 280M concepts concept spaces took 2 days on NCSA Origin 2000 Simulating World of Medical Communities 10K repositories with > 1K abstracts (1K w/ > 10K)

37 COMMUNITY PROCESSING

38 INTERSPACE NAVIGATION Semantic Indexes for Community Repositories Navigating Abstractions within Repository concept space & category map Interactive browsing by Community experts *www.canis.uiuc.edu/interspace-prototype

39 Interspace Remote Access Client

40 Navigation in MEDSPACE For a patient with Rheumatoid Arthritis Find a drug that reduces the pain (analgesic) but does not cause stomach (gastrointestinal) bleeding Choose Domain

41 Concept Search

42 Concept Navigation

43 Retrieve Document

44 Navigate Document

45 Retrieve Document

46 Concept Navigation

47

48 SWITCHING In the Interspace… each Community maintains its own repository Switching is navigating Across repositories use your vocabulary to search another specialty

49 CONCEPT SWITCHING “Concept” versus “Term” set of “semantically” equivalent terms Concept switching region to region (set to set) match term Semantic region Concept Space

50 Biomedical Session

51 Categories and Concepts

52 Concept Switching

53 Document Retrieval

54 THE NET OF THE 21st CENTURY Beyond Objects to Concepts Beyond Search to Analysis Problem Solving via Cross-Correlating Multimedia Information across the Net Every community has its own special library Every community does semantic indexing The Interspace approximates Cyberspace

55

56 Beyond Words and Concepts to Reality Feature Vectors describing Situation Each Individual has Vector (< Community) Discrete Samples into Continuous Monitors 2015 Pragmatics Federation

57 Continuous Vector Recording Health Grid – personal lifestyle monitors Peer-to-Peer – beyond Napster and Amazon Individual User Modeling Cohort Grouping – custom clustering Adaptable Interfaces – multiple levels 2015 Technology Environment

58 Continuous Monitoring Adaptive Questionnaires full-spectrum Distributed Collections from individuals Situational Analysis Structured Vectors custom for Individuals Population Cohorts for Decision Support Lifestyle Monitor System

59 Lifestyle Monitor Questions How good is your health? What is your typical energy level? Do you eat well-balanced foods? How much do you eat? Do you exercise for at least half an hour? How often are you tired without exercising? How much do you sleep a night? Do you get enough sleep (to not be tired)? How often are you in pain? Do you feel happy with your life? Can you lead a full life with your current health? Can you deal adequately with all your problems? Are you worried about things you cannot control? Do you feel too tired to function properly? Does time hang heavy on you in an average day? Sample General Health Questions for User Modeling

60 Lifestyle Monitor Session

61 Structured Vectors Individual customized Raw concepts insufficient Adaptive Concepts for individual situations Structured Vectors for cohort clustering Situational Analysis infrastructure support 2007 Internet Health Monitors prototypes 2011 Population Health Monitors for chronic illness regionally deployed Artificial Intelligence Research

62 THE DISTRIBUTED WORLD Community Repositories in the Interspace Peer to Peer Networking Infrastructure Every Person performs Every Role USERrequest LIBRARIANreference INDEXERclassify PUBLISHERquality AUTHORgenerate

63 from Concepts to Features from Semantics to Pragmatics Infrastructure is Interaction with Abstraction Interspace is concept navigation across repositories Intermind is feature comparison across individuals FEATURE VECTORS

64 Towards the Intermind Beyond Concepts to Features Beyond Analysis to Synthesis Problem Solving via Cross-Correlating Universal Knowledge across the Net Every individual has its own special vector Every viewpoint does semantic clustering true The Intermind is true Cyberspace

65 Today the Hive Tomorrow the HiveMind


Download ppt "Department of Computer Science seminar University of Illinois, February 14, 2005 The Evolution of the Net: Predicting Global Infrastructure Bruce R. Schatz."

Similar presentations


Ads by Google