Building the digital library community
John Mark Ockerbloom, Carnegie Mellon University
February 8, 1999

1 Building the digital library community
John Mark Ockerbloom
Carnegie Mellon University
February 8, 1999

2 The library community
Centuries of people who have created the works we can read and research in the library
The modern-day community of library maintainers and users
More than just a collection of information and people:
  – information is organized in a usable manner
  – people are trained professionals, supported by their universities and governments
More than just one library

3 The digital revolution
The population of the Internet (both people and data) is growing exponentially
  – much of the information library users want is now digital
Particular appeals:
  – near-instant, cheap access to information
  – breadth (and sometimes depth) of content
  – easy to store and serve large quantities of information
  – just about anyone can easily publish content, as well as “consume” it
Although…
  – it’s often hard to find the high-quality information you really need (it may be obscure, or not put on-line)
  – there is little coordination of information
  – distractions abound (noise mixed in with signal)

4 A vision: Revolutions in learning, teaching, creating, collaborating
Very large libraries (tens of millions of “volumes”) available to scholars, students, and the general public
Distributed: enables distance learning and collaboration
Accessible quickly, at low cost, through a variety of methods
Easily searchable, citable, extendable, preservable
New kinds of resources available (e.g. conceptual networks, intelligent tutors)

5 A nightmare?
Thousands of uncoordinated, isolated systems?
Prohibitive cost structures?
Hard-to-use systems, lacking the affordances of previous media?
Data and metadata that become unusable within 10-20 years, or even less?
  – “404 Not found”
  – Vendor becomes unreachable… content becomes unreadable!

6 An opportunity
Libraries can take the lead in forming communities for digital information
  – they have expertise in managing information and organizing it for users
  – they acquire much of the content
University libraries can play a special role
  – they have large collections and a variety of expertise
  – university DLs both serve the university and promote it
  – they need to be both conservative and innovative
Challenges are both technical and social
  – library science, computer science, HCI, sociology/anthropology...
  – organization and support (politics, economics…)

7 Some design challenges in digital libraries
Acquisition
Cataloging/Searching
  – can be done at much finer grain
  – new ways of searching for things
  – new kinds of metadata may be important
Access control
Presentation / Interface
Preservation and Maintenance

8 A key design principle: Sharing the work
Useful even at small scale
  – Coordinated cataloging: The On-Line Books Page
    » 8400 listings, 1M hits/month (60% nongraphical)
  – Sharing crucial metadata: Catalog of Copyright Entries
  – Coordinated acquisition: Catholic Encyclopedia
  – Inter-project dialogue: Book People mailing list
Larger libraries and projects can enhance each other’s collections at a larger scale

9 A specific problem: Data format mismatch
Much of the information in a digital library comes from outside sources, in a variety of formats
Most clients only understand a few formats
They therefore cannot effectively use many materials
  – data may be in an incomprehensible form
  – data may be in a form not easily worked with
Particularly problematic:
  – formats that have complex (but useful) structure
  – legacy data and programs (obsolete format assumptions)
Most of the information in large libraries is “legacy”; a long lifespan is essential!

10 Standards are a partial solution
“The wonderful thing about standards is that there are so many to choose from”
  – Data: SGML/XML, word processor formats, HTML, PDF, Quark, specialized scientific formats, page image formats…
  – Metadata: USMARC, Dublin Core, RDF...
Standards allow common understandings...
…but no one standard fits all
  – different sources may make different data choices
  – the lowest common denominator is often not good enough
  – needs, applications, and standards change (sometimes quickly)
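To make the “lowest common denominator” point concrete, here is a minimal Python sketch of a metadata crosswalk from a hypothetical richer local schema into the fifteen Dublin Core elements. The local field names and the mapping table are assumptions for illustration, not an official crosswalk. The point is that fields with no Dublin Core home are dropped, and some distinctions (an illustrator versus any other contributor) are flattened.

```python
# The fifteen elements of the (unqualified) Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical mapping from a local schema to Dublin Core (illustrative only).
CROSSWALK = {
    "title": "title",
    "author": "creator",
    "illustrator": "contributor",   # the specific "illustrator" role is lost
    "subject_heading": "subject",
    "publisher": "publisher",
    "pub_year": "date",
    "language": "language",
}


def to_dublin_core(record: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Map a local record to Dublin Core; return (dc_record, unmapped_fields)."""
    dc: dict[str, list[str]] = {}
    lost: list[str] = []
    for name, value in record.items():
        element = CROSSWALK.get(name)
        if element is None:
            lost.append(name)            # no Dublin Core element can hold this
        else:
            dc.setdefault(element, []).append(value)
    return dc, lost


if __name__ == "__main__":
    local = {
        "title": "Alice's Adventures in Wonderland",
        "author": "Lewis Carroll",
        "illustrator": "John Tenniel",
        "pub_year": "1865",
        "binding": "cloth",
    }
    dc, lost = to_dublin_core(local)
    print(dc)    # names survive, but not who illustrated the book
    print(lost)  # ['binding']
```

A richer target schema would keep more, which is exactly why different sources make different choices.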

11 TOM: A data model for mediating among diverse data formats
Allows unfamiliar formats to be
  – operated on via outside services that understand the format
  – related to familiar formats
  – converted into usable formats
    » for the needs of a particular application or user
    » for migration from obsolete technology to new technology
Works for data that is:
  – accessible as a (typed or typable) sequence of bytes, or
  – accessible through a well-defined, working protocol
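One way to picture “converted into usable formats” is as a search over a graph whose nodes are formats and whose edges are available converters: if no single service goes from A to B, a chain of them might. The sketch below uses breadth-first search to find the shortest chain. The format names and the converter table are hypothetical, loosely modeled on the layered e-mail attachment of slide 13, and this is not a claim about how TOM itself plans conversions.

```python
from collections import deque
from typing import Optional

# Hypothetical converter table: each entry means "some service converts A to B".
CONVERSIONS: dict[str, list[str]] = {
    "mime-message": ["base64-attachment"],
    "base64-attachment": ["binhex"],
    "binhex": ["powerpoint-3"],
    "powerpoint-3": ["postscript", "plain-text"],
    "postscript": ["pdf"],
}


def conversion_path(source: str, target: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest chain of conversions."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt not in seen:          # never revisit a format
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of known converters reaches the target


if __name__ == "__main__":
    print(conversion_path("mime-message", "plain-text"))
    # ['mime-message', 'base64-attachment', 'binhex', 'powerpoint-3', 'plain-text']
```

Migration from an obsolete format is the same query with an old format as the source.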

12 TOM lets you get this...

13 …from this
From: Sherry T Haddock
To: caeti@nosc.mil
Subject: CAETI Community Meeting Info
Date: Thu, 15 Feb 1996 17:12:52 -0600 (CST)
Mime-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="608184028-521714262-824425972=:20798"
Cc: Sherry T Haddock

This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info.

--608184028-521714262-824425972=:20798
Content-Type: TEXT/PLAIN; charset=US-ASCII

Here are maps detailing the March CAETI Community Meeting Location....
Thanks again,
Sherry

--608184028-521714262-824425972=:20798
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="CaetiMap.hqx"
Content-Transfer-Encoding: BASE64
Content-ID:
Content-Description:

KFRoaXMgZmlsZSBtdXN0IGJlIGNvbnZlcnRlZCB3aXRoIEJpbkhleCA0LjAp
DQoNCjojODBLQ0E0VCklZUtGISI2NiUzYzgmIjgtYCMzIiQpIU4hLSlEMyVt
ZC1tNGkrJ2EnWiUhTiIhbCEhLSFyW20NCg0KKiEhQiFOIVgiISohJCEzIzMj
IiEhISFtIU4hLSIhKiEkcltxMyFgIzMjMnEzcnJxM1hJaHJOIS1BISohJCFg
Iw0KDQozIWAzIU4hLSYhKiEkIkojMyFgRiFOIS0pISohJCMzIzMhYFMhTiEt
LCEqISQkISMzIWBkIU4hLTEhKiEkcltxDQoNCjMhcmxyTiEtNCEqISQlSiMz
IWEtIU4hLTghKiEkJjMjMyFhQiFOITJxcmohJHJbcTNycnEzVCEiNSEqIXE
(Emailed, MIME-attached, base64, binhexed, PowerPoint 3)
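The outer layers of that stack can be peeled off with Python’s standard `email` package. The sketch below parses a trimmed copy of the message (headers abbreviated, only the first base64 line of the attachment kept, and a closing boundary added) and names each layer it recognizes. That first base64 line decodes to the BinHex 4.0 banner `(This file must be converted with BinHex 4.0)`, which is how the next layer can be identified. Decoding the BinHex and then reading PowerPoint 3 would need further converters (the standard library’s old `binhex` module was removed in Python 3.11), so the sketch stops there. The function name `unwrap_layers` and the layer labels are illustrative.

```python
import email

# Trimmed copy of the slide's message: a sketch of unwrapping, not a full decode.
RAW = """\
From: Sherry T Haddock
Subject: CAETI Community Meeting Info
Mime-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="608184028-521714262-824425972=:20798"

This message is in MIME format.

--608184028-521714262-824425972=:20798
Content-Type: TEXT/PLAIN; charset=US-ASCII

Here are maps detailing the March CAETI Community Meeting Location....
--608184028-521714262-824425972=:20798
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="CaetiMap.hqx"
Content-Transfer-Encoding: BASE64

KFRoaXMgZmlsZSBtdXN0IGJlIGNvbnZlcnRlZCB3aXRoIEJpbkhleCA0LjAp
--608184028-521714262-824425972=:20798--
"""

BINHEX_BANNER = b"(This file must be converted with BinHex"


def unwrap_layers(raw: str) -> list[str]:
    """Name each encoding layer that the standard library can recognize."""
    layers = ["RFC 822 e-mail"]
    msg = email.message_from_string(raw)
    if msg.is_multipart():
        layers.append("MIME multipart/mixed")
    for part in msg.walk():
        if part.get("Content-Transfer-Encoding", "").lower() != "base64":
            continue
        layers.append("base64")
        data = part.get_payload(decode=True)   # base64 -> raw bytes
        if data.startswith(BINHEX_BANNER):     # BinHex 4.0 files start with this line
            layers.append("BinHex 4.0 (needs a separate decoder)")
    return layers


if __name__ == "__main__":
    print(unwrap_layers(RAW))
    # ['RFC 822 e-mail', 'MIME multipart/mixed', 'base64', 'BinHex 4.0 (needs a separate decoder)']
```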

14 TOM: Key ideas
1. Formats can be described by
  – what information they contain
  – how they represent it
  – how they relate to other formats
Object-oriented models capture these aspects
So: I use an object-oriented metadata schema to describe data formats
2. Much useful format information, and many services, are distributed throughout the Net
Mediators give uniform access to diverse knowledge bases and services
So: I use a network of mediators (type brokers) to assist with unfamiliar formats
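The first idea, describing a format by what it contains, how it represents it, and how it relates to other formats, maps naturally onto a class. The Python dataclass below is a minimal sketch under that reading; the field names, the single-parent “is-a” link, and the sample `text`/`html` entries are assumptions for illustration, not TOM’s actual metadata schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # formats compare by identity, not by field values
class FormatType:
    """Hypothetical metadata record describing one data format."""
    name: str
    contains: set[str] = field(default_factory=set)     # what information it holds
    representation: str = ""                            # how that information is encoded
    parent: Optional[FormatType] = None                 # more general format (is-a link)
    converts_to: set[str] = field(default_factory=set)  # known relations to other formats

    def all_contents(self) -> set[str]:
        """Information carried by this format, including what it inherits."""
        inherited = self.parent.all_contents() if self.parent is not None else set()
        return inherited | self.contains

    def is_a(self, other: FormatType) -> bool:
        """True if this format is `other` or a specialization of it."""
        t: Optional[FormatType] = self
        while t is not None:
            if t is other:
                return True
            t = t.parent
        return False


text = FormatType("text", {"characters"}, "bytes in some character encoding")
html = FormatType(
    "html",
    {"markup", "hyperlinks"},
    "tags embedded in text",
    parent=text,
    converts_to={"plain-text", "postscript"},
)
```

Here `html.all_contents()` yields characters, markup, and hyperlinks, so a client that only understands `text` can still learn what it would lose by treating an HTML file as plain text.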

15 The architecture supporting TOM (simplified)
[Diagram: Client, Servers, and Type Brokers]
Clients get info on formats and request operations (e.g. conversions)
Servers implement operations
Brokers maintain info on formats and invoke servers for operations
Brokers can trade info and consult other brokers
Clients can also register new formats, operations, server information...
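The broker roles above can be sketched in a few lines of Python. In this hypothetical version a “server” is just a callable that converts bytes, each broker keeps a table of known conversions, and a broker that lacks an entry asks its peers and caches the answer, a simple form of trading info. The class and method names are invented for the example; a real deployment would reach brokers and servers over the network rather than through in-process references.

```python
import base64
from typing import Callable, Optional

Converter = Callable[[bytes], bytes]  # stand-in for a conversion server


class TypeBroker:
    """Hypothetical type broker: knows some conversions, can ask peer brokers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.converters: dict[tuple[str, str], Converter] = {}
        self.peers: list["TypeBroker"] = []

    def register(self, src: str, dst: str, server: Converter) -> None:
        """Record that `server` can convert data from format src to format dst."""
        self.converters[(src, dst)] = server

    def find(self, src: str, dst: str, visited: Optional[set[str]] = None) -> Optional[Converter]:
        """Look up a converter locally, then ask peers; cache what peers report."""
        visited = set() if visited is None else visited
        if self.name in visited:          # already asked: avoid loops between brokers
            return None
        visited.add(self.name)
        if (src, dst) in self.converters:
            return self.converters[(src, dst)]
        for peer in self.peers:
            server = peer.find(src, dst, visited)
            if server is not None:
                self.converters[(src, dst)] = server
                return server
        return None

    def convert(self, data: bytes, src: str, dst: str) -> bytes:
        """Client-facing operation: find a server for src -> dst and invoke it."""
        server = self.find(src, dst)
        if server is None:
            raise LookupError(f"no known converter from {src} to {dst}")
        return server(data)


if __name__ == "__main__":
    campus, remote = TypeBroker("campus"), TypeBroker("remote")
    campus.peers.append(remote)
    remote.peers.append(campus)   # brokers may consult each other
    remote.register("base64", "octet-stream", base64.b64decode)
    print(campus.convert(b"aGVsbG8=", "base64", "octet-stream"))  # b'hello'
```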

16 What’s good about this design?
It’s simple (and therefore flexible):
  – minimal, basic, well-understood standards
It’s accommodating:
  – it describes past, present and future data formats with good breadth and depth of expressiveness
  – it can be composed with a wide variety of programs and databases (including the Web and off-the-shelf programs)
  – benefits start with very low investment, then increase
It’s scalable (largely by taking advantage of the distributed, interactive nature of the Net):
  – anyone can define new formats and services
  – brokers coordinate contributions from the Net community

17 A cooperative library network (simplified)
[Diagram: Client, Servers, and Library Brokers]
Clients get info on materials and request services (search, convert...)
Servers provide materials and services
Brokers maintain info on services and invoke servers for operations
Brokers can trade info and consult other brokers
Clients can also register new materials, metadata, services...
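The same broker pattern applied to library services might look like the following federated-search sketch. Each “server” is a hypothetical callable that returns catalog records for a query; the broker sends the query to every registered server, skips servers that are unreachable, and merges hits that share an identifier. The record fields (`id`, `title`), the toy `catalog` helper, and the deduplicate-by-identifier rule are all assumptions for illustration.

```python
from typing import Callable

Record = dict[str, str]
SearchServer = Callable[[str], list[Record]]  # stand-in for a remote catalog


class LibraryBroker:
    """Hypothetical broker that sends one query to many library servers."""

    def __init__(self) -> None:
        self.servers: dict[str, SearchServer] = {}

    def register(self, name: str, server: SearchServer) -> None:
        """Libraries (or clients) can register new search services."""
        self.servers[name] = server

    def search(self, query: str) -> list[Record]:
        """Query every server, skip unreachable ones, merge duplicate works."""
        merged: dict[str, Record] = {}
        for name, server in self.servers.items():
            try:
                hits = server(query)
            except ConnectionError:
                continue  # one unreachable library should not sink the search
            for rec in hits:
                # Records sharing an identifier are treated as the same work;
                # the first server to report it is kept as the source.
                merged.setdefault(rec["id"], {**rec, "source": name})
        return sorted(merged.values(), key=lambda r: r["title"])


def catalog(records: list[Record]) -> SearchServer:
    """Toy in-memory catalog that matches title substrings (hypothetical)."""
    return lambda q: [r for r in records if q.lower() in r["title"].lower()]


if __name__ == "__main__":
    broker = LibraryBroker()
    broker.register("lib-a", catalog([{"id": "w1", "title": "Pride and Prejudice"}]))
    broker.register("lib-b", catalog([{"id": "w1", "title": "Pride and Prejudice"},
                                      {"id": "w2", "title": "Persuasion"}]))
    for rec in broker.search("p"):
        print(rec["title"], rec["source"])
    # Persuasion lib-b
    # Pride and Prejudice lib-a
```

Merging only works as well as the shared identifiers do, which is one reason coordinated cataloging and shared metadata matter.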

18 The importance of open content
Sharing content and/or metadata gives each library a boost from the others
Enables distributed indexing and cross-referencing
  – AltaVista et al. were made possible by open content
Enables replication, minimizing the risk of information loss
More flexibility in adapting and migrating information to new situations
Users can improve and augment resources and feed them back to libraries
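Because openly shared metadata can be harvested by anyone, a third party can build its own combined index across libraries; that is what makes distributed indexing possible. A minimal sketch of the idea is an inverted index keyed by word and pointing back to (library, record) pairs. The record format and the naive tokenizer are assumptions for illustration, not how AltaVista or any particular library indexed content.

```python
import re
from collections import defaultdict

Posting = tuple[str, str]  # (library name, record id)


def build_index(collections: dict[str, list[dict[str, str]]]) -> dict[str, set[Posting]]:
    """Map each lowercase word to the (library, record id) pairs containing it."""
    index: dict[str, set[Posting]] = defaultdict(set)
    for library, records in collections.items():
        for rec in records:
            for word in re.findall(r"[a-z0-9]+", rec["text"].lower()):
                index[word].add((library, rec["id"]))
    return dict(index)


def lookup(index: dict[str, set[Posting]], *words: str) -> set[Posting]:
    """Return the records that contain every query word (a simple AND query)."""
    if not words:
        return set()
    hits = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*hits)


if __name__ == "__main__":
    shared = {
        "lib-a": [{"id": "r1", "text": "Catalog of Copyright Entries, 1923"}],
        "lib-b": [{"id": "r7", "text": "Copyright renewal records"}],
    }
    idx = build_index(shared)
    print(sorted(lookup(idx, "copyright")))     # [('lib-a', 'r1'), ('lib-b', 'r7')]
    print(lookup(idx, "copyright", "renewal"))  # {('lib-b', 'r7')}
```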

19 Summary
The best large-scale digital libraries are built around community
  – technically: distributed, cooperative infrastructure (e.g. the broker architecture of TOM) developed by experts in multiple domains
  – socially: cooperation between disciplines and organizations; designs that meet the needs of various constituencies
University libraries like Penn’s can take a leading role in creating a DL community
  – they have much of the collections and the experts
  – they can provide testbeds, help their users, and gain visibility
Potential for revolutionary benefits, with the right designs

20 To find out more:
TOM home page: http://tom.cs.cmu.edu/
  – conversion service, other demos, technical details, thesis document
The On-Line Books Page: http://www.cs.cmu.edu/books.html
  – catalog, Book People archives, copyright entries, selected resources on on-line texts and libraries
Personal home page: http://www.cs.cmu.edu/~spok/

