Digital Libraries: An Overview
President’s Board Room, 210 Burruss
Jan. 18, 1999
Edward Fox, John Carroll, Gail McMillan, Clifford Shaffer, Robert Williges
Digital Libraries
• Why of global interest?
• Why of interest in computing?
• Why of interest in universities?
• Definitions
• NSF Digital Libraries Initiative
DLs: Why of Global Interest?
• National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly
• Knowledge and information are essential to economic and technological growth and to education
• DL - a domain for international collaboration
  – wherein all can contribute and benefit
  – which leverages investment in networking
  – which provides useful content on Internet & WWW
  – which will tie nations and peoples together more strongly and through deeper understanding
Why of Interest in Computing?
• Presents exciting challenges in key fields like: database management, multimedia, hypertext, information retrieval
• Efficiency requires advances in, e.g.,
  – software: programs, algorithms
  – hardware: storage, computers
  – networking: faster, more reliable, quality multimedia
• Effectiveness requires advances in, e.g.,
  – HCI (e.g., visualization, DLs embedded in distance education)
• Computing can help others who want DLs
Why of Interest in Universities?
• Source of funding for select research groups
• DLs can be used for research and teaching
  – California Digital Library
  – NSF’s Science, Mathematics, Engineering and Technology Education (SMETE) initiative to build a digital library to support undergraduates
• DLs can support outreach, dissemination
  – Scholarly Publishing: ETDs, e-journals, tech reports
  – Extension information, public relations,...
• Students can find jobs if trained suitably
DLs: Definitions
• Super information systems
• Knowledge management systems with persistence, organization, and usability
• Collections of digital objects with expanded services, for distributed user communities, without limitations of space, time, physical copies
• Latest implementation of visions of Bush, Licklider, Nelson, and previous scholars
• Systems, services, institutions, enterprises, and projects of the digital library community
DIGITAL LIBRARIES INITIATIVE
Funded through a joint initiative of:
  National Science Foundation
  Defense Advanced Research Projects Agency
  National Aeronautics and Space Administration
Stephen M. Griffin
Division of Information and Intelligent Systems
National Science Foundation
National Synchronization Home Page
Locating Digital Libraries in Computing and Communications Technology Space
[Figure. Labels: Computing (flops); Communications (bandwidth, connectivity); Digital content; scale from less to more. Digital Libraries technology trajectory: intellectual access to globally distributed information.]
Digital Libraries Initiative - Phase 2
Core Sponsors: NSF, DARPA, NLM, LoC, NASA, NEH
Program Goals: new DL research, technologies and applications to advance the use of distributed, networked information of all types around the nation and the world
• ~$8-10 million/yr for 4-5 years (beginning FY98)
• sponsor a full spectrum of activities
  – fundamental research, content & collections development, domain applications, testbeds, operational environments, new resources for education and preserving America’s cultural heritage
• address topics over entire DL lifecycle
  – information creation, dissemination, access, use, preservation, impact, contexts
• implement a modular, open program structure
  – add new sponsors, performers, projects at any time
Goals for the Future
• Gather information and build collections (to understand the incompleteness of our knowledge)
• Create new communities (to communicate and collaborate)
• Make technology disappear (from our awareness and experience)
Prior/Current Grants
• NCSTRL
• ENVISION
• EI
• CSTC, CRIM, NRG
• NDLTD
NCSTRL: CS TECHNICAL REPORTS
• CS TR project supported by ARPA (Berkeley, CMU, Cornell, MIT, Stanford)
• WATERS project funded by NSF and led by ODU, SUNY Buffalo, UVA, VPI&SU
• Merger in summer 1995 to form NCSTRL (Networked CS Tech Report Library)
• Most large departments have now joined
• “Central” server: UVA, “backup”: VPI&SU
• 1998 extension to preprint service, with LANL
Larger NSF Grants
• ENVISION: one of first DL projects
• “Interactive Learning with a Digital Library in CS” (11M accesses to over 45 courses)
• “Computer Science Teaching Center” by NSF and ACM Education Board
• “Curriculum Resources in Interactive Multimedia”
ENVISION
• A User-Centered Database from the Computer Science Literature
• Collected bibliographic data, converted to SGML
• Converted typesetter data to SGML
• Scanned thousands of page images
• MARIAN search engine (also applied to the Virginia Tech library catalog) used as part of a prototype object-based DL, with tailored visualization interface (L. Nowell dissertation)
NSF Education Innovation (EI)
• NSF “Interactive Learning with a Digital Library in Computer Science”
• 45 online courses, 100+K accesses/wk, plus: DL courseware, overall EI project pages
• Tools: SWAN (visualization), QUIZIT
• Evaluation
  – traditional
  – network logging and analysis
  – tools for visualization
PAPERLESS COURSES
• CS1604: Introduction to the Internet
• CS3604: Professionalism in Computing
• CS4624: Multimedia, Hypertext and Information Access (MHIA)
• CS5604: Information Storage and Retrieval
• CS6604: Digital Libraries
• Self Study Course on Digital Libraries
CS Teaching Center (CSTC)
• Instead of building large, expensive multimedia packages that become obsolete and are difficult to re-use, concentrate on small knowledge units.
• Learners benefit from having well-crafted modules that have been reviewed and tested.
• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built. (See NSF SMETE-Lib Study.)
CSTC -> CRIM
• CSTC will have a variety of focused centers so that different types of resources can be collected, tested, and suitably packaged:
  – laboratory exercises, activities, assignments (CNJ)
  – visualizations/visualization tools (U Ill. Springfield)
  – interactive multimedia resources (VT & GWU)
• CRIM focuses on interactive multimedia
  – repository of materials (like SMETE-lib)
  – curriculum for courses in CS and other areas, sequences, undergraduate and graduate programs
Network Research Group
• NSF 3-year grant on WWW logging, characterization, and optimization: Abrams, Fox, Pollard (CNS)
• Core member of Web Characterization Activity of World-Wide Web Consortium
• Providing DL to support WCA:
  – logs
  – tools
  – publications
NDLTD
• News, Background/History
• Vision, Benefits, Approach, Possibilities
• Concerns, Problems, Opposition
• Solutions, Implementation, Results, Plans
What led to today’s situation?
• 1987 meeting in Ann Arbor: UMI, VT, …
• 1992 meeting in Washington: CNI, CGS, UMI, VT and 10 universities with 3 reps each
• 1993 meeting in Atlanta to start Monticello Electronic Library (MEL): SURA, SOLINET
• 1994 meeting in Blacksburg re ETD project: standard of PDF + SGML + multimedia objects
• 1996 funding by SURA and US Dept. of Education (FIPSE) for regional, national projects (NDLTD)
What are we doing?
• Aiding universities to enhance graduate education, publishing and IPR efforts: to help improve the availability and content of theses and dissertations
• Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive)
• Demonstrating how, for other organizations
A Digital Library Case Study
• Electronic theses and dissertations (ETDs)
• Submission:
• Collection: Networked Digital Library of Theses and Dissertations (NDLTD) (formerly “National” because of Fed. funds, before international members started joining)
Something for Everyone
• Students - contribute -> gain acclaim
• Universities - join -> help your students, gain increased DL experience + visibility
• Researchers - use, encourage -> content
• Publishers - liaise, support -> have more knowledgeable authors + backup details
• DL enthusiasts - adapt resources / ideas -> have exemplary pilot / model project
What are the key ideas?
• People can switch to electronic documents
  – Becoming more expressive with hypermedia
• Mandating ETDs will change all future scholarship (for 100’s of thousands/yr)
• Scalability
  – Empower authors to submit to DL, as a natural part of the educational process
  – Study workflow & apply automation, so institutions streamline processing and build their part of the DL
  – Federate along most suitable cultural/political lines
Key Ideas:
• Networked infrastructure
• Scalability
• Education is the rationale
• University collaboration
• Workflow, automation
• Authors must submit
• Maximal access
• PDF, SGML, MM
• Standards
• Federated search
• 8th graders vs. grads
• MARC, DC, URNs
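Federated search, listed among the key ideas above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NDLTD's software: the `Record` fields, the site names, the `urn:example:` identifiers, and the `search_site` and `federated_search` helpers are all hypothetical. Each member site is searched independently, and results are merged with duplicates dropped by identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical ETD metadata record (fields are illustrative only)."""
    urn: str      # assumed persistent identifier
    title: str
    creator: str
    site: str     # member site holding this copy


def search_site(records, query):
    """Return records at one site whose title contains the query (case-insensitive)."""
    q = query.lower()
    return [r for r in records if q in r.title.lower()]


def federated_search(sites, query):
    """Search every site independently, then merge, keeping one record per URN."""
    seen = set()
    merged = []
    for name in sorted(sites):  # sorted for a deterministic result order
        for rec in search_site(sites[name], query):
            if rec.urn not in seen:
                seen.add(rec.urn)
                merged.append(rec)
    return merged


# Made-up sample data: two member sites, one thesis mirrored at both.
SITES = {
    "site-a": [
        Record("urn:example:etd-1", "Digital Library Interfaces", "A. Student", "site-a"),
        Record("urn:example:etd-2", "Parallel Storage Servers", "B. Student", "site-a"),
    ],
    "site-b": [
        Record("urn:example:etd-3", "Visualization for Digital Libraries", "C. Student", "site-b"),
        Record("urn:example:etd-1", "Digital Library Interfaces", "A. Student", "site-b"),
    ],
}

hits = federated_search(SITES, "digital librar")
for rec in hits:
    print(rec.urn, "-", rec.title, f"({rec.site})")
```

Each site keeps its own collection and the merge happens at query time, which is one way to let universities "build their part of the DL" while users see a single search. A real deployment would query remote servers and rank results, which this sketch does not attempt.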
User Search Support
Note: There are 51 members worldwide, and the number is growing.
ETD Initiative (and UMI)
[Diagram. Participants: Students, Universities, UMI]
• Students learn about DL, EPub
• TDs become more expressive
• N. Amer. (T)Ds are accessible, archived
• Global TDs become more accessible, archived
Support Services Developed
• CD/WWW site with > 300M: student guidelines, listservs, FAQs, press info, multimedia training materials
• Automated submission system
• SGML DTD for ETDs, SGML to HTML (web generator)
• Donations: Adobe, Microsoft
• Evaluation: instruments, analysis
NDLTD Future Work
• Recruit more members, support current ones: May Blacksburg workshop
• Interoperability tests among universities and with UMI to provide integrated services
• Study with testbed that emerges, to improve information retrieval, browsing, interface, and other types of user support
• Evaluation, improving learning experience, spread to worldwide initiative, sustainable support and coordination
Pending/Future Proposals
• NUDL, NATO, other int’l
• PetaPlex (RI, MRI)
• EI2, Smaller DLI2
• SMETE-lib
• DL4U
NUDL/NATO
• July 1998: Proposal to help Russia establish NDLTD, with assistance from US, Portugal
• 1/15/99: Proposal to NSF under DLI2 international program for $0.5M
  – Fox, Kleiner, McMillan, Eaton
  – Partners: UK (2), Singapore, Russia, Korea, Germany, plus Iberoamerican group (Spain, Portugal, Argentina, Brazil, Chile)
  – Multilingual search, multimedia submissions, requirements/usability,...
PetaPlex
• High-performance “superstore”
• 1000 to 1,000,000 gigabytes (terabyte to petabyte)
• Parallel computer, video WWW server, …
• Part of NSF CISE proposal of 11/98 (40 Tbytes)
• Preproposal submitted for NSF MRI
  – CWT: wireless connections for flexibility
  – CPES: low cost power
  – (CS): software, applications, experiments for digital library server
EI2, Other DLI2
• EI2: successor to the EI effort, to be led by Osman Balci, focusing on DL of models and simulations to help learners
• DLI2 second competition supports small, medium and large grants
  – Recommender for a library of software
  – Experiments using PetaPlex for DL applications
  – Possible joint efforts, e.g., evaluation methods with UNC Chapel Hill
SMETE-lib
• Central coordination: manage NSF’s DL for undergraduates
• Content coordination: help collect and provide access to body of material by topic or genre
  – CS, Math, …
  – Models/simulations, algorithms/programs, laboratory materials, …
• Partnering necessary: OCLC?...
DL4U
• Submitted 7/15/98 for $4M (5 years)
• Virtual corporation (similar model to efforts of ECpE but with different type of activity)
• 5 Divisions
  – User Support: Local, Remote
  – Collection and Testbed
  – User Interface & Environments
  – Evaluation & Usability
  – Business
DL4U Organization
DL4U VT Investigators
DL4U VT Investigators cont’d
DL4U Partners