Department of Computer Science seminar University of Illinois, February 14, 2005 The Evolution of the Net: Predicting Global Infrastructure Bruce R. Schatz.

Presentation transcript:

Department of Computer Science seminar University of Illinois, February 14, 2005 The Evolution of the Net: Predicting Global Infrastructure Bruce R. Schatz CANIS Laboratory Graduate School of Library & Information Science

Art of Physical Architecture

Art of Logical Architecture

The Evolution of the Net Niels Bohr on Quantum Theory “Prediction is very Difficult, especially about the Future”

THE THIRD WAVE OF NET EVOLUTION PACKETS OBJECTS CONCEPTS

Transparent Federation across Sources Generic Protocols for Global Infrastructure Ultimate Goal is cyberspace visions of “being one with all the world’s knowledge” Computer Science and Infrastructure

Computer Science and Infrastructure
1985  Operating Systems        caching
1995  Database Management      tagging
2005  Information Retrieval    clustering
2015  Artificial Intelligence  recognizing

Linguistics Levels and Universal Units
1985  Syntax      Files (wholes)
1995  Structure   Records (parts)
2005  Semantics   Concepts (meaning)
2015  Pragmatics  Features (reality)

Evolution of Information Retrieval
Grand Visions
Text Search, Document Search, Concept Search
Structure, Syntax, Semantics
Evolution of Information Retrieval across the Net, from: Bruce R. Schatz, “Information Retrieval in Digital Libraries: Bringing Search to the Net”, cover article in Science, vol. 275, Jan 17, 1997, special issue on Bioinformatics

Same Query into Multiple Sources Results return Uniform Packages Packets are for Bits, but Objects need more Information Units are for Database Items 1985 Syntax Federation
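The idea on this slide (one query sent to several sources, results returned in a uniform package) can be sketched roughly as follows. This is a minimal illustration, not the Telesophy design: the `InformationUnit` fields, the `make_source` helper, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class InformationUnit:
    """Uniform package handed back to the user (hypothetical fields)."""
    source: str
    title: str
    body: str


# A source is modeled as a function from a query string to raw records.
Source = Callable[[str], List[dict]]


def federated_search(query: str, sources: Dict[str, Source]) -> List[InformationUnit]:
    """Send the same query to every source and wrap every hit uniformly."""
    units = []
    for name, search in sources.items():
        for record in search(query):
            units.append(InformationUnit(
                source=name,
                title=record.get("title", ""),
                body=record.get("text", ""),
            ))
    return units


def make_source(records: List[dict]) -> Source:
    """Build a toy source that does case-insensitive substring matching."""
    def search(query: str) -> List[dict]:
        q = query.lower()
        return [r for r in records if q in r["text"].lower()]
    return search


sources = {
    "bibliography": make_source([{"title": "Packet switching", "text": "Packets carry bits."}]),
    "news": make_source([{"title": "Fiber rollout", "text": "New fibers, more packets."}]),
}
results = federated_search("packets", sources)
```

The caller sees only `InformationUnit` values, whichever source produced them, which is the sense of "uniform packages" assumed here.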

CMU Computer Science – Andrew Apollo Domain – distributed file system Xerox Star – multimedia document system Bellcore Network Systems – Fibers Telenet – International Packet Switches Dialog – Bibliographic Text Searches 1985 Technology Environment

Distributed Documents Distributed Collections Multimedia Documents Networked Hypertext Document Browsing (links across sources) Document Search (texts across sources) Telesophy Prototype

Telesophy Session

Bitmapped Workstation with Custom Software $30K Apollo with 10Mb/s WAN Windows via Brown [hypertext] Objects via Xerox [Smalltalk] Information Units and Data Items 300K Units across 20 sources Bellcore R&D, $2.5M Telesophy Implementation

Browsing requires Caching across Internet Raw bandwidth insufficient 200ms Ping versus 250ms Saccade Lookahead Applications Specific Protocols 1987 Internet Research Task Force 1989 ARPANET 20th Anniversary 1990 Dissertation on Interactive Retrieval Operating System Research
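The slide's arithmetic is that a 200 ms round trip leaves almost nothing of a 250 ms eye movement, so the next item has to be local before the user asks for it. A minimal sketch of lookahead caching under that reading follows. The `LookaheadCache` class, its `fetch` callback, and the link table are hypothetical, and a real system would prefetch in the background rather than inline as done here.

```python
class LookaheadCache:
    """Cache that prefetches the items linked from whatever is being viewed."""

    def __init__(self, fetch, links):
        self._fetch = fetch        # hypothetical remote fetch: item id -> content
        self._links = links        # hypothetical link table: item id -> linked ids
        self._cache = {}
        self.remote_fetches = 0    # counts trips to the remote source

    def _load(self, item_id):
        if item_id not in self._cache:
            self._cache[item_id] = self._fetch(item_id)
            self.remote_fetches += 1
        return self._cache[item_id]

    def get(self, item_id):
        """Return (content, was_cache_hit) and prefetch linked items."""
        hit = item_id in self._cache
        content = self._load(item_id)
        for linked in self._links.get(item_id, []):
            self._load(linked)     # lookahead: done inline here for simplicity
        return content, hit


documents = {"a": "A", "b": "B", "c": "C"}
links = {"a": ["b", "c"]}
cache = LookaheadCache(documents.__getitem__, links)

first = cache.get("a")     # a miss, which also pulls in "b" and "c"
second = cache.get("b")    # already local because of the lookahead
```

The second request is served locally, so the user-visible latency no longer depends on the network round trip.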

Search using Parts of Documents Transparent merge different Schema Results return Complete Displays Displayers invoked for all types 1995 Structure Federation

NCSA and the World-Wide Web Mosaic – multimedia document browsing HTTP – standard query protocol University Library and Online Retrieval Ovid – full-text journal searching SGML – standard document protocol 1995 Technology Environment

Full Distributed Documents Full Displays with tables and equations Distributed Collections from publishers Single Federated Collection Streamlined search using tag structure Canonical tag schema with translation DeLIver System

DeLIver Session

Desktop PC plus Custom Software Integration $5K IBM Personal Computer Mosaic via NCSA [hypertext] Displays via SoftQuad [viewers] Custom DTD and SSL for tags and styles 100K articles for 3000 users NSF DLI, $5M DeLIver Implementation

Metadata Extraction for Structure Federation Raw schema insufficient Different names and different types Author tags in physics vs mathematics 1995 interactive databases using Mosaic 1997 Beat Elsevier using canonical tags 1999 production distributed XML federation Database Management Research
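The point about author tags differing between physics and mathematics publishers suggests a translation step into one canonical schema before searching. The sketch below assumes invented tag names (`au`, `creator`, and so on) and invented records; the actual DeLIver DTD and mappings are not given in the slides.

```python
# Hypothetical per-publisher mappings from native tags to canonical tags.
CANONICAL_TAGS = {
    "physics": {"au": "author", "ti": "title"},
    "mathematics": {"creator": "author", "heading": "title"},
}


def to_canonical(record, publisher):
    """Rename a record's tags to the canonical schema; unknown tags pass through."""
    mapping = CANONICAL_TAGS[publisher]
    return {mapping.get(tag, tag): value for tag, value in record.items()}


def search_by_author(collections, name):
    """Search every publisher's collection through the canonical 'author' tag."""
    hits = []
    for publisher, records in collections.items():
        for record in records:
            canonical = to_canonical(record, publisher)
            if name.lower() in canonical.get("author", "").lower():
                hits.append(canonical)
    return hits


collections = {
    "physics": [{"au": "B. Schatz", "ti": "Concept spaces"}],
    "mathematics": [
        {"creator": "B. Schatz", "heading": "Clustering"},
        {"creator": "A. Other", "heading": "Graphs"},
    ],
}
hits = search_by_author(collections, "schatz")
```

One query against the canonical `author` tag reaches both collections, even though neither publisher uses that tag name natively.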

Search using Concepts above Words Extraction of Concepts from Documents Statistical Index on Community Collections Concept Navigation across Collections 2005 Semantic Federation

Web Portals and statistical NLP Google – statistical linked contexts NLP – statistical generic parsers Fast Processors and Big Disks Gigaflops – Beowulfs and cluster computing Terabytes – RAIDs and literature scaling 2005 Technology Environment

Fully Parsed Documents Concepts and Entities auto generated Distributed Collections from communities Fully Related Concepts Switching across Community Repositories Automatic Links to Entity Databases BeeSpace System

BeeSpace Session

Commodity PC plus Custom Software $1K Dell Personal Computer $15K Server 1 Gflops 2 TBytes Semantic Indexing generic scalable Concept Extraction and Normalization Concept Co-occurrence on Collections 50M articles across 50K repositories BeeSpace Implementation

Statistical Clustering Equivalent Phrases Raw phrases insufficient Phrase parsing with normalization Entity recognition with normalization 1998 semantic indexing (concepts from terms) 1999 information spaceflight (categories from documents) Information Retrieval Research

from Objects to Concepts from Syntax to Semantics Infrastructure is Interaction with Abstraction Internet is packet transmission across computers Interspace is concept navigation across repositories CONCEPT SPACES

Technology Engineering Electrical FORMAL INFORMAL (manual) (automatic) IEEE communities groups individuals LEVELS OF INDEXES

Technology Trends IEEE Computer for January 2002 Information Infrastructure for Trends issue Document Representation (Semantic Web) Language Parsing (TIPSTER) Statistical Indexing (TREC) Peer-Peer Networking Vocabulary Switching (UMLS)

SCALABLE SEMANTICS Automatic indexing Domain-Independent indexing Statistical clustering Compute Context of concepts within documents documents within repositories
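"Compute context of concepts within documents" can be read as counting how often two concepts appear in the same document. The following is a minimal sketch under that assumption: it uses raw co-occurrence counts, not whatever weighting the Interspace indexing actually applied, and the function names and sample documents are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_space(documents):
    """documents: list of concept lists, one per document (already extracted)."""
    cooccur = defaultdict(Counter)
    for concepts in documents:
        # Count each unordered pair once per document.
        for a, b in combinations(sorted(set(concepts)), 2):
            cooccur[a][b] += 1
            cooccur[b][a] += 1
    return cooccur


def related(space, concept, k=3):
    """Return up to k concepts that co-occur most often with the given one."""
    ranked = sorted(space[concept].items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, _ in ranked[:k]]


docs = [
    ["arthritis", "analgesic", "bleeding"],
    ["arthritis", "analgesic"],
    ["analgesic", "aspirin"],
]
space = build_concept_space(docs)
neighbors = related(space, "arthritis", k=2)
```

Nothing here depends on the subject domain, which is the sense of "domain-independent" assumed in this sketch.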

COMPUTING CONCEPTS
’92: 4,000 (molecular biology)
’93: 40,000 (molecular biology)
’95: 400,000 (electrical engineering)
’96: 4,000,000 (engineering)
’98: 40,000,000 (medicine)

SIMULATING A NEW WORLD
Obtain discipline-scale collection
MEDLINE from NLM, 10M bibliographic abstracts
human classification: Medical Subject Headings
Partition discipline into Community Repositories
4 core terms per abstract for MeSH classification
32K nodes with core terms (classification tree)
Community is all abstracts classified by core term
40M abstracts containing 280M concepts
concept spaces took 2 days on NCSA Origin 2000
Simulating World of Medical Communities
10K repositories with > 1K abstracts (1K w/ > 10K)
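The partitioning step can be sketched as filing each abstract under every one of its core terms, so that a community repository is simply the set of abstracts sharing a term. On that reading, the 40M figure is consistent with 10M abstracts each filed under 4 core terms; this is an interpretation of the slide, not something it states. The function name and sample data below are hypothetical.

```python
from collections import defaultdict


def partition_into_communities(abstracts):
    """abstracts: dict of abstract id -> list of core terms.

    Returns a dict of core term -> list of abstract ids (one repository per term).
    """
    repositories = defaultdict(list)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            repositories[term].append(abstract_id)
    return dict(repositories)


abstracts = {
    "a1": ["arthritis", "analgesics"],
    "a2": ["analgesics", "hemorrhage"],
    "a3": ["arthritis"],
}
repositories = partition_into_communities(abstracts)

# Back-of-envelope check of the slide's numbers (assumed reading):
abstracts_in_medline = 10_000_000
core_terms_per_abstract = 4
classified_abstracts = abstracts_in_medline * core_terms_per_abstract
```

An abstract with several core terms belongs to several communities, so the repositories overlap by design.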

COMMUNITY PROCESSING

INTERSPACE NAVIGATION Semantic Indexes for Community Repositories Navigating Abstractions within Repository concept space & category map Interactive browsing by Community experts

Interspace Remote Access Client

Navigation in MEDSPACE For a patient with Rheumatoid Arthritis Find a drug that reduces the pain (analgesic) but does not cause stomach (gastrointestinal) bleeding Choose Domain

Concept Search

Concept Navigation

Retrieve Document

Navigate Document

Retrieve Document

Concept Navigation

SWITCHING In the Interspace… each Community maintains its own repository Switching is navigating Across repositories use your vocabulary to search another specialty

CONCEPT SWITCHING “Concept” versus “Term” set of “semantically” equivalent terms Concept switching region to region (set to set) match term Semantic region Concept Space
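Region-to-region switching can be sketched as follows: take the term's neighborhood in the source concept space, look up each member in the target space, and rank the target-side concepts by how many members point to them. This is one plausible reading of "set to set" matching, not the Interspace algorithm itself. The spaces are plain dictionaries of term to related terms, and the sample vocabulary is illustrative, not medical guidance.

```python
from collections import Counter


def switch_concept(term, source_space, target_space, k=3):
    """Map a term's region in the source space to concepts in the target space."""
    # Region: the term plus its related terms in the user's own specialty.
    region = {term} | set(source_space.get(term, []))
    votes = Counter()
    for member in region:
        for neighbor in target_space.get(member, []):
            if neighbor not in region:
                votes[neighbor] += 1   # one vote per region member that reaches it
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:k]]


# Hypothetical concept spaces for two specialties.
rheumatology = {"pain": ["analgesic", "arthritis"]}
gastroenterology = {
    "analgesic": ["gastrointestinal bleeding", "nsaid"],
    "arthritis": ["nsaid"],
}
switched = switch_concept("pain", rheumatology, gastroenterology)
```

Matching a whole region rather than a single term lets the switch succeed even when the starting term itself does not appear in the other specialty's vocabulary.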

Biomedical Session

Categories and Concepts

Concept Switching

Document Retrieval

THE NET OF THE 21st CENTURY Beyond Objects to Concepts Beyond Search to Analysis Problem Solving via Cross-Correlating Multimedia Information across the Net Every community has its own special library Every community does semantic indexing The Interspace approximates Cyberspace

Beyond Words and Concepts to Reality Feature Vectors describing Situation Each Individual has Vector (< Community) Discrete Samples into Continuous Monitors 2015 Pragmatics Federation

Continuous Vector Recording Health Grid – personal lifestyle monitors Peer-to-Peer – beyond Napster and Amazon Individual User Modeling Cohort Grouping – custom clustering Adaptable Interfaces – multiple levels 2015 Technology Environment

Continuous Monitoring Adaptive Questionnaires full-spectrum Distributed Collections from individuals Situational Analysis Structured Vectors custom for Individuals Population Cohorts for Decision Support Lifestyle Monitor System

Lifestyle Monitor Questions
Sample General Health Questions for User Modeling
How good is your health?
What is your typical energy level?
Do you eat well-balanced foods?
How much do you eat?
Do you exercise for at least half an hour?
How often are you tired without exercising?
How much do you sleep a night?
Do you get enough sleep (to not be tired)?
How often are you in pain?
Do you feel happy with your life?
Can you lead a full life with your current health?
Can you deal adequately with all your problems?
Are you worried about things you cannot control?
Do you feel too tired to function properly?
Does time hang heavy on you in an average day?

Lifestyle Monitor Session

Structured Vectors Individual customized Raw concepts insufficient Adaptive Concepts for individual situations Structured Vectors for cohort clustering Situational Analysis infrastructure support 2007 Internet Health Monitors prototypes 2011 Population Health Monitors for chronic illness regionally deployed Artificial Intelligence Research

THE DISTRIBUTED WORLD
Community Repositories in the Interspace
Peer to Peer Networking Infrastructure
Every Person performs Every Role
USER       request
LIBRARIAN  reference
INDEXER    classify
PUBLISHER  quality
AUTHOR     generate

from Concepts to Features from Semantics to Pragmatics Infrastructure is Interaction with Abstraction Interspace is concept navigation across repositories Intermind is feature comparison across individuals FEATURE VECTORS
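"Feature comparison across individuals" can be illustrated by scoring questionnaire answers as a vector and comparing that vector with cohort profiles. The sketch below assumes cosine similarity as the comparison and a three-question vector with made-up scores; the slides do not specify the measure or the encoding.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_cohort(individual, cohorts):
    """Return the name of the cohort whose profile is most similar."""
    return max(cohorts, key=lambda name: cosine(individual, cohorts[name]))


# Hypothetical features: (energy level, sleep quality, tiredness), scored 1 to 5.
cohorts = {
    "rested": [5, 5, 1],
    "fatigued": [1, 2, 5],
}
individual = [4, 5, 2]
cohort = nearest_cohort(individual, cohorts)
```

Grouping individuals by their nearest profile is one simple form of the cohort clustering the slides mention; a deployed system would presumably learn the profiles from the population rather than fix them by hand.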

Towards the Intermind Beyond Concepts to Features Beyond Analysis to Synthesis Problem Solving via Cross-Correlating Universal Knowledge across the Net Every individual has its own special vector Every viewpoint does semantic clustering The Intermind is true Cyberspace

Today the Hive Tomorrow the HiveMind