1 William Y. Arms September 26, 2002 A Research Program for Information Science with the NSDL as an Example.

Presentation transcript:

1 William Y. Arms September 26, 2002 A Research Program for Information Science with the NSDL as an Example

2 A Scenario A faculty member wished to find a paper for students to read in a class. He began by asking an expert. She suggested the original research paper as suitable. Later, he typed a few terms into Google, browsed the hits, selected one that led to ResearchIndex, found the paper, and downloaded a PDF version from the author's web site.

3 Computer Science Internet Web Google ResearchIndex PDF Computer Science

4 HCI Browsing Searching User interface design Human Computer Interaction Computer Science

5 HCI: Eye Tracking

6 Roles of expert/instructor/student Cognitive psychology Linguistics Natural language processing Cognitive Studies HCI Cognitive Studies Computer Science

7

8 Organizational change Economics Ethics Social culture Law Society Cognitive Studies HCI Society Computer Science

9 Society Cognitive Studies HCI Computer Science Applications Information Science

10 Open Access to Scientific, Scholarly and Professional Information

11 Before the Web Access to scientific, medical, legal information. In the United States: excellent if you belonged to a rich organization (e.g., a major university), very poor otherwise. In many countries of the world: very poor for everybody.

12 Some Light Reading
William Y. Arms, "Economic models for open-access publishing." iMP, March
William Y. Arms, "Automated digital libraries." D-Lib Magazine, July/August
William Y. Arms, "What are the alternatives to peer review? Quality control in scholarly publishing on the web." Journal of Electronic Publishing, 8(1), August

13 Research Libraries are Expensive library materials buildings & facilities staff

14 Baumol's Cost Disease [Chart: price versus year, extending to 2050, for a bundle of goods and services, labor-intensive services, and manufactured goods]

15 Baumol's Cost Disease [Chart: price versus year, extending to 2050, for a bundle of goods and services, labor-intensive services, and manufactured goods, with Moore's Law added]

16 Brute Force Computing Few people really understand Moore's Law. Computing power doubles every 18 months: it increases 100 times in 10 years and 10,000 times in 20 years. Simple algorithms plus immense computing power can outperform human intelligence.
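The growth figures on this slide follow directly from the 18-month doubling period. A minimal sketch of the arithmetic, where the function name `growth_factor` is purely illustrative and not from the talk:

```python
def growth_factor(years, doubling_months=18.0):
    """Multiplicative increase after `years` if power doubles every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


# Print the rounded factors for the two horizons quoted on the slide.
for years in (10, 20):
    print(f"{years} years: about {growth_factor(years):,.0f} times")
```

Ten years is roughly 6.7 doublings and twenty years roughly 13.3, which gives the slide's rounded figures of 100 and 10,000.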

17 Example: Catalogs and Indexes Cost disease: catalogs and indexes. Catalog, index and abstracting records are very expensive when created by skilled professionals. Moore's Law: automatic indexing of full text. Retrieval using automatic indexing can be at least as effective as retrieval using manual indexing with controlled vocabularies (Cleverdon 1967, reporting on experiments by Salton).

18 Resistance to Change "I used to be a heavy user of INSPEC. Now I use Google instead."

19 Information Discovery: 1992 and
(first value: 1992; second value: the later year)
Content: print / digital
Computing: expensive / inexpensive
Choice of content: selective / comprehensive
Index creation: human / automatic
Frequency: one time / monthly
Vocabulary: controlled / not controlled
Query: Boolean / ranked retrieval
Users: trained / untrained

20 Brute Force Computing: Substitutes for Human Intelligence Automated algorithms for information discovery. Similarity of two documents: vector space and statistical methods (Salton, Sparck Jones, et al.). Importance of digital object: rank importance of web pages by analysis of the graph of web links (Kleinberg, Page, et al.).
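Both ideas on this slide can be illustrated in a few lines. The sketch below is a simplification under stated assumptions, not the method of any of the cited authors: `cosine_similarity` uses raw term counts with whitespace tokenization (real systems add term weighting and stemming), and `link_rank` is a basic PageRank-style power iteration with an assumed damping factor of 0.85. All names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_rank(links, damping=0.85, iterations=50):
    """PageRank-style scores for a graph given as {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
toy_ranks = link_rank(toy_links)
print(max(toy_ranks, key=toy_ranks.get))  # the page with the most incoming links
```

Neither function needs a human-built catalog record: the first uses only the words in the documents, the second only the link structure.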

21 Brute Force Computing: Automated Metadata Extraction Informedia (Carnegie Mellon) Automatic processing of segments of video, e.g., television news. Algorithms for: dividing raw video into discrete items generating short summaries indexing the sound track using speech recognition recognizing faces (Wactlar, et al.)

22

23 Simple algorithms plus immense computing power plus the intelligence of the user can replace labor-intensive services Cognitive Studies HCI Low Cost Information Computer Science

24 The National Science Digital Library (NSDL)

25 Scope All digital information relevant to any level of education in any branch of science. Scientific and technical information Materials used in education Materials tailored to education

26 How Big might the NSDL be? All branches of science, all levels of education, very broadly defined. Five year targets: 1,000,000 different users; 10,000,000 digital objects; 10,000 to 100,000 independent sites.

27 Resources Integration team Budget $4-6 million Staff Management Diffuse How can a small team, without direct management control, create a very large-scale digital library?

28 Philosophy It is possible to build a very large digital library with a small staff. But... Every aspect of the library must be planned with scalability in mind. Some compromises will be made.

29 Basic Assumptions The integration team will not manage any collections. The integration team will not create any metadata.

30 The Integration Task... ... to provide a coherent set of collections and services across great diversity

31 Interoperability: The Problem Conventional approaches require partners to support agreements (technical, content, and business). But NSDL needs thousands of very different partners, most of whom are not directly part of the NSDL program. The challenge is to create incentives for independent digital libraries to adopt agreements.

32 Function Versus Cost of Acceptance [Chart: function plotted against cost of acceptance, with regions labeled "Many adopters" and "Few adopters"]

33 Example: Textual Mark-up [Chart: SGML, ASCII, HTML and XML placed by function and cost of acceptance]

34 The Spectrum of Interoperability
Level: Federation. Agreements: strict use of standards (syntax, semantic, and business). Example: AACR, MARC, Z
Level: Harvesting. Agreements: digital libraries expose metadata; simple protocol and registry. Example: Open Archives metadata harvesting
Level: Gathering. Agreements: digital libraries do not cooperate; services must seek out information. Example: web crawlers and search engines
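To make the harvesting level concrete, here is a minimal sketch of how a service might form a request in the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The `ListRecords` verb and the `metadataPrefix`, `set` and `from` arguments are part of the protocol; the base URL `https://example.org/oai` and the helper name `list_records_url` are hypothetical, and the sketch builds the URL without contacting any server.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one set
    if from_date:
        params["from"] = from_date    # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository; a harvester would fetch this URL and parse the XML reply.
url = list_records_url("https://example.org/oai", from_date="2002-09-01")
print(url)
```

The cost of acceptance for a collection is low: it only has to answer simple HTTP requests like this one with its metadata, while the harvesting service does the rest.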

35 Searching What to Index? Full text indexing is excellent, but full text indexing is not possible for all materials (non-textual, no access for indexing). Comprehensive metadata is an alternative, but available for very few of the materials. What Architecture to Use? Few collections support an established search protocol (e.g., Z39.50).

36 Broadcast Searching does not Scale [Diagram: User; User interface server; Collections]

37 The Metadata Repository [Diagram: Users; Services; Metadata repository; Collections] The metadata repository is a resource for service providers. It holds information about every collection and item known to the NSDL.
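The idea of a central repository can be sketched as a small in-memory store with a term index, so that a search service queries one place instead of broadcasting to every collection. This is an illustration only, not the NSDL design: the `Record` fields, the class name `MetadataRepository` and the sample records are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Record:
    identifier: str
    collection: str
    title: str
    description: str = ""


class MetadataRepository:
    """Toy central store of metadata records gathered from many collections."""

    def __init__(self):
        self._records = {}   # identifier -> Record
        self._index = {}     # term -> set of identifiers

    def add(self, record):
        # Simplification: re-adding an identifier does not remove old index terms.
        self._records[record.identifier] = record
        for term in (record.title + " " + record.description).lower().split():
            self._index.setdefault(term, set()).add(record.identifier)

    def search(self, query):
        """Return records whose metadata contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self._index.get(t, set()) for t in terms))
        return sorted((self._records[i] for i in ids), key=lambda r: r.identifier)


repo = MetadataRepository()
repo.add(Record("item-1", "collection-a", "Plate tectonics animation", "earth science"))
repo.add(Record("item-2", "collection-b", "Plate boundaries quiz", "earth science"))
print([r.identifier for r in repo.search("plate tectonics")])
```

A query touches only the repository, so its cost does not grow with the number of collections that must respond at search time.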

38 Search Architecture [Diagram: Portal; Search and Discovery Services; Metadata repository; Collections; links labeled SDLIP, OAI and http] James Allan, Bruce Croft (University of Massachusetts, Amherst)

39 Other Topics User interfaces: data driven portals using a channel architecture Selection: selective web crawling, machine learning Quality measures: ???

40 The Mortal behind the Portal [This space left intentionally blank.]

41 Acknowledgement The NSDL is a program of the National Science Foundation's Directorate for Education and Human Resources, Division of Undergraduate Education. The NSDL Core Integration is a collaboration between the University Corporation for Atmospheric Research (Dave Fulker), Columbia University (Kate Wittenberg) and Cornell University (Bill Arms). The Technical Director is Carl Lagoze (Cornell University).

42 Society Cognitive Studies HCI Computer Science Applications Information Science