Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search William H. Mischo Grainger Engineering Library.

Presentation transcript:

Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search William H. Mischo Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign 2002 International Conference on Digital Archive Technologies (ICDAT2002) December 19, 2002

Outline Digital Libraries and the Distributed Information Environment. Document Representation and Full-Text. Digital Library Tools. Illinois Projects. XML Technologies. Metadata Technologies. DOIs, Linking, Local Resolver. Portals, Simultaneous Search, Linking. Grainger Search Aid. Issues & Trends.

The Digital Library ‘Digital’, ‘Virtual’, ‘Electronic’ Library as network-based library without regard to place and time. Tendency to apply term to collections and resources. Digital Collections vs. Digital Library. Emphasis on the integration of collections and services (e.g. NSDL grant). Application of standards and protocols is important.

Scholarly Communication Overview E-Resources are Web-based and publisher-centric. Growth of Heterogeneous Distributed Repositories. Value-added services and ‘branding’ of journals. Prestige of Journals and Publishers. Reciprocal linking relationships between publishers. Cooperation on linking standards (DOI, CrossRef). Alternative publishing models - Academia, Preprint Servers, disintermediation.

Distributed Information Environment We live in a world of multiple, heterogeneous information repositories, resources, portals, and IR systems.
–OPACs – local, regional, national shared bibliographic databases.
–Local and remote A & I Services.
–Discrete publisher and vendor repositories (full-text).
–Web search engines, vertical portals, custom portals (NSDL, ARL Portal).
–Local metadata, digital objects, GIS, finding aids.
–Preprint servers and institutional repositories (DSpace).
–Instructional (course) management systems (WebCT, Blackboard).
–Harvestable (OAI) sites and services.

Distributed Repository - Issues Integration of discrete, heterogeneous information resources. Role of federated and broadcast searching of distributed resources. Integration of collections with reference, instructional and navigation services -TOC, remote reference assistance. Integration of Library, institutional, vendor, publisher, and government portals and information services. Linking technologies. Metadata harvesting, archiving.

Distributed Environment Action Plan Pressing need for document representation, retrieval, transmission, and linking middleware tools and standards. Metadata standards, DOIs, OpenURL. Factor: changing landscape of Scholarly Communication and disintermediation of publishers and libraries. Federated search and simultaneous search with reference linking as mechanism to integrate DL landscape.

Portal Functions:
--Authorization
--Linking mechanisms between resources and among resources.
--Simultaneous search.
--Navigation
Linking:
--Between full-text using DOI, CrossRef, Appropriate Copy.
--Between A & I and full-text.
--Between OPAC and full-text.
[Diagram labels: Web Client; Portal Presentation Level; Local Link Server, Local Value-Added; Local Databases and OAI Resources via DBMS; OPAC; A & I Services (Local and Remote); Full-Text Resources; Web Resources & Knowledge Environments; E-Resource Registry; Aggregator (Ebsco, OCLC); Publisher Portal (Elsevier); CrossRef Metadata; DOI Server]

Document Representation Continuum of Web-Enabled technologies -- all presently being utilized. Evolving technologies and standards. Role and history of markup. XML: its role and importance. The Smart Document.

Digital Library Tools We have at our disposal the tools to create integrated digital libraries from the distributed digital resources environment in which we operate:
–Standard retrieval environment (Web) and interface/client (Web Browser);
–Standard transport mechanisms to connect heterogeneous content (HTTP, OAI, SOAP);
–Standard metalanguages and tools for describing and transforming content and metadata (XML, DTDs & Schemas, XSLT, DC/DCQ, RDF, METS);
–Standardized search/retrieval mechanisms (HTTP Post/Get, SQL, Z39.50, Object Oriented Databases);
–Standard linking tools and infrastructure (DOI, OpenURL, CrossRef).
Candidate set of ‘best practices’ for IR.

Work by Illinois DLI Group We are attempting to address many of these issues within the Digital Library Initiatives group. Headquartered at Grainger Engineering Library Information Center at UIUC. Grant Work:
–Digital Library Initiative I (NSF, others),
–Corporation for National Research Initiatives (CNRI) D-Lib Test Suite,
–Collaborating Partners Program,
–Andrew Mellon Foundation OAI Harvesting grant,
–NSF NSDL (National Science, Engineering, Technology, and Mathematics Digital Library) Program,
–Institute of Museum and Library Services (IMLS) Registry and Integration grant.

Illinois Testbed Project Funded under DLI-I by NSF, DARPA, and NASA, Awards made to 6 universities. Large-scale Testbed, Distributed Repository models, evaluation, Web software. Funded under CNRI D-Lib Test Suite Program, 1998—2001. Collaborating Partners Program. AIP, APS, ASCE, IEE, NRL, ASM, ACM, NTT Learning Systems, Elsevier. All XML Journal -- AIP, APS, ACM.

Illinois Full-Text Testbed
American Institute of Physics--APL, JAP, RSI
–19,000+ articles
American Physical Society--PRL
–15,000+ articles, weekly updates.
ASCE Journals (25 titles)
–11,000+ articles
IEE Proceedings and Electronics Letters
–9,500+ articles
IEEE Computer Society.
ASM (American Society for Materials) Handbook.
ACM (Association for Computing Machinery) Transactions.
Elsevier Science.

Accomplishments Process & retrieve from multiple publishers & heterogeneous DTDs. SGML to XML Conversion. Development of a metadata specification that uses RDF, Dublin Core (DCQ and XML), XML Schemas, local Namespace. Cross-repository searching (Testbed & D-Lib Test Suite). Full-Text and Metadata. XSLT, CSS for transformation & rendering, including Mathematics.

Accomplishments (2) Introduction of numerous technologies now deployed within publisher repositories: –Forward and Backward links in bibliographies -- within Testbed/Repository, from/to A & I Services. –Use of XSLT for transforming XML to HTML. –Rich extended abstracts. Conversion of ISO math markup to MathML. CSS/DHTML mathematics rendering. Use of plug-ins. Enhanced Web retrieval mechanisms: Author Word Wheels, Co-Occurrence Matrices. Local Link Server for DOIs, Context-Sensitive linking.

XML (eXtensible Markup Language) Like SGML, a Data Description Metalanguage. XML is a subset/version of SGML. Document representation and interchange Standard. Allows fine-granularity markup of content and structure. Authors can create their own elements (extensible). Tags define the structure of the document, not the presentation format. Validated vs. “well-formed” - separation of authoring process from representation & presentation. A document is either validated against a DTD/Schema or simply well-formed. Integrated with relational DBs.
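The well-formed half of that distinction can be sketched in a few lines. This is a minimal illustration, assuming Python's standard-library `xml.etree.ElementTree`, which checks well-formedness only and does not validate against a DTD or Schema. The sample documents and the name `is_well_formed` are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML.

    This checks syntax only (matched tags, a single root element);
    it does not validate against any DTD or Schema.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Author-defined elements: none of these tags come from a fixed tag set.
good = "<article><title>Quantum-well lasers</title><vol>38</vol></article>"
# The <title> element is never closed, so this is not well-formed.
bad = "<article><title>Quantum-well lasers</article>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Validation against a DTD or Schema is a separate, stricter step that needs a validating parser.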

XML Features The milestones in document description and transmission: ASCII, TCP/IP, HTTP and HTML, XML. Web Programmability. DTD not required with XML. Needed if internal entities. Use of Document Object Model (DOM). Technology approach from Web developer’s standpoint: XML data, CSS presentation layer, XSLT to transform the structure (‘view’) of the data/document.

XML in Information Technologies Used in Open Archives Initiative (OAI), NSDL. Compatible with MS SQL Server, Tamino (Software AG), Oracle, DLXS/XPAT (University of Michigan/OpenText), others. Integral to Web Services (WSDL) and SOAP – Google Web Service. Used in Library of Congress MODS and METS metadata technologies. Baked into XyVision and publishing packages.

XML, XSLT, and CSS Use XML full-text articles as ordered hierarchy of content objects. Generate item-level metadata in XML, using RDF and Dublin Core syntax and semantics. XSLT and CSS used to present metadata and articles in either XML or HTML format depending on Browser. Mathematics rendering using MathML tools (conversion from ISO to MathML). Real-time transformation between XML and HTML using XSLT.

Schemas vs. DTDs Both are systems of representing a data model that defines the data’s elements and attributes, and the relationship among elements. Schema addresses limitations of DTDs and the increasingly data-oriented role of XML. W3C XML Schema Working Group: two documents: XML structures and datatypes.

Schema Justification Description of document type’s structure should be in an XML document instead of written in special syntax (DTD). Schemas are in XML: easier to edit and process using standard XML DOM manipulation tools. DTD notation doesn’t allow schema designers the power to impose strong data typing -- for example, the ability to say that a certain element type must always have a positive integer value, that it may not be empty, or that it must be one of a list of possible choices.
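Because a schema is itself an XML document, ordinary XML tooling can read it. The sketch below assumes a small, hypothetical W3C XML Schema fragment (the element names `volume` and `format` are invented) and uses Python's standard-library parser to list each element's declared type or enumerated choices. It inspects the schema only; it does not validate instance documents.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment: 'volume' must be a positive integer and
# 'format' must be one of an enumerated list of choices.
SCHEMA_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="volume" type="xs:positiveInteger"/>
  <xs:element name="format">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="XML"/>
        <xs:enumeration value="HTML"/>
        <xs:enumeration value="PDF"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""


def describe_schema(schema_text: str) -> dict:
    """Map each top-level element name to its declared type or choices."""
    root = ET.fromstring(schema_text)
    described = {}
    for element in root.findall(f"{{{XSD_NS}}}element"):
        name = element.get("name")
        declared_type = element.get("type")
        if declared_type is not None:
            described[name] = declared_type
        else:
            # No named type: collect the enumerated values instead.
            described[name] = [
                item.get("value")
                for item in element.iter(f"{{{XSD_NS}}}enumeration")
            ]
    return described


print(describe_schema(SCHEMA_TEXT))
```

The same constraints could not be stated in DTD notation, which has no positive-integer type for element content.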

Metadata and Linking Standards Digital Object Identifier (DOI) and Persistent Object Identifiers. OpenURL and Value-Added Service Components (SFX). Open Archives Initiative (OAI), Dublin Core and Qualifiers, RDF. Local Resolver Servers.

Open Archives Initiative (OAI) Released version 1.0 of metadata harvesting protocols. Frozen through second quarter. Mechanism for data providers to expose their metadata through an HTTP protocol and a mechanism for harvesting records containing metadata from repositories. Roots in e-print archives. Lightweight, low-barrier. Easy to implement Web server to handle OAI protocol requests; need to develop procedures to access and extract your metadata.
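A harvesting exchange of this kind can be sketched as a plain HTTP GET request plus an XML response. The example below is an assumption-laden illustration: it follows the element names and namespace of OAI-PMH 2.0, which is later than the 1.0 release mentioned on the slide, the repository address is hypothetical, and the response is a hand-written sample rather than real harvested data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository address, used for illustration only.
BASE_URL = "http://repository.example.org/oai"

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_request(verb: str, **arguments: str) -> str:
    """Build a harvesting request as a plain HTTP GET URL."""
    return BASE_URL + "?" + urlencode({"verb": verb, **arguments})


# Hand-written sample response, trimmed to a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample article title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_titles(response_text: str) -> list:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_text)
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(
            f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier"
        )
        title = record.findtext(f".//{{{DC_NS}}}title")
        pairs.append((identifier, title))
    return pairs


print(build_request("ListRecords", metadataPrefix="oai_dc"))
print(extract_titles(SAMPLE_RESPONSE))
```

The data provider's real work is on the other side of this exchange: extracting its own metadata and serializing it into records like the sample.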

Ongoing Investigations Relationship between interoperability models for search and discovery: federated searching (OAI harvested) and broadcast, simultaneous searching of distributed repositories. Not mutually exclusive. OAI Provider and Harvesting software. Encoded Archival Description (EAD). OAI Engineering/CS/Physics site. Role of HTTP harvesting, Spider technology. Reference Linking integration built on OpenURL and DOI. Reference Assistant software with simultaneous search, point-of-contact assistance, and remote reference capability.

Portals and Gateways Role is to bring together and integrate disparate e-resources. Provide a systematic ‘view’ of the information landscape, particularly full-text. Two primary foci: robust search/navigation and the ability to link everywhere from anywhere in the environment of OPACs, A & I Services, full-text. Central to this implementation is federated and simultaneous search and reference linking technologies.

Digital Object Identifier (DOI) DOI is both a unique identifier of a piece of digital content AND a system to access that content digitally. Persistent object identifier. ‘The ISBN for the 21st Century’ -- Norman Paskin. DOI system has two main parts (the identifier and a directory system) and a third logical component, a database. Developed by AAP (Association of American Publishers), now managed by International DOI Foundation.

DOI Construction First real open standard for content identification. DOI is a number that identifies a digital object: – /S Registration Agency Prefix 1063 Publisher Prefix S Suffix (Publisher-assigned ID). Suffix can be SICI or PII. The DOI and the URL pointing to the digital object are registered with the International DOI Foundation, e.g: – /333
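The prefix/suffix structure can be sketched as a small parser. This is an illustration only: the DOI used below is made up, and the field name `agency_prefix` simply follows the slide's wording for the leading part before the dot.

```python
from typing import NamedTuple


class ParsedDoi(NamedTuple):
    agency_prefix: str     # leading part before the dot
    publisher_prefix: str  # publisher code after the dot
    suffix: str            # publisher-assigned identifier


def parse_doi(doi: str) -> ParsedDoi:
    """Split a DOI of the form <agency>.<publisher>/<suffix> into parts."""
    # Split at the first slash only; a suffix may itself contain slashes.
    prefix, slash, suffix = doi.partition("/")
    if not slash or not suffix:
        raise ValueError(f"no suffix found in {doi!r}")
    agency_prefix, dot, publisher_prefix = prefix.partition(".")
    if not dot or not publisher_prefix:
        raise ValueError(f"no publisher prefix found in {doi!r}")
    return ParsedDoi(agency_prefix, publisher_prefix, suffix)


# Made-up DOI, for illustration only.
print(parse_doi("10.9999/EXAMPLE-0001"))
```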

Using a DOI DOIs are resolved using the Handle System technology from CNRI (Corporation for National Research Initiatives). Retrieval of object is a two-step process: link is sent to central directory where current Web address is stored, location is sent back to browser with special message to redirect to address, e.g: –dx.doi.org/ /333 redirects to
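The two-step lookup and redirect can be sketched with a toy directory. This is not the Handle System itself, only a stand-in that assumes a dictionary in place of the central directory; the DOI and URLs are invented.

```python
from typing import Dict, Tuple

# Toy stand-in for the central directory: DOI -> current Web address.
# Both the DOI and the URLs are made up for illustration.
DIRECTORY: Dict[str, str] = {
    "10.9999/EXAMPLE-0001": "http://publisher.example.org/articles/0001",
}


def resolve(doi: str) -> Tuple[int, Dict[str, str]]:
    """Return an HTTP status and headers, as a resolver proxy might."""
    # Step 1: look up the current address in the directory.
    location = DIRECTORY.get(doi)
    if location is None:
        return 404, {}
    # Step 2: tell the browser to redirect to the stored address.
    return 302, {"Location": location}


def move_object(doi: str, new_location: str) -> None:
    """Update the stored address; the DOI itself never changes."""
    DIRECTORY[doi] = new_location


print(resolve("10.9999/EXAMPLE-0001"))
move_object("10.9999/EXAMPLE-0001", "http://publisher.example.org/new/0001")
print(resolve("10.9999/EXAMPLE-0001"))
```

The second call shows why the identifier is persistent: links that cite the DOI keep working after the object moves, because only the directory entry changes.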

Reference Linking CrossRef Publisher system: major Sci-Tech professional societies and commercial publishers. System design calls for one URL for each DOI; underlying technology can handle multiple URLs however. Issue: Directing users to locally held or licensed version of Digital Object (locally loaded or from Aggregator). Appropriate Copy problem.
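The appropriate copy problem can be sketched as a choice between a local holdings table and the publisher's default target. The tables, DOIs, URLs, and the `use_local` switch below are all hypothetical; a real local link server would consult licensing and holdings data rather than a dictionary.

```python
from typing import Dict, List

# Hypothetical holdings table for one institution: DOI -> local copies.
LOCAL_HOLDINGS: Dict[str, List[str]] = {
    "10.9999/EXAMPLE-0001": ["http://library.example.edu/fulltext/0001"],
}

# Hypothetical default targets, as a global DOI directory would hold them.
PUBLISHER_COPY: Dict[str, str] = {
    "10.9999/EXAMPLE-0001": "http://publisher.example.org/articles/0001",
    "10.9999/EXAMPLE-0002": "http://publisher.example.org/articles/0002",
}


def appropriate_copy(doi: str, use_local: bool = True) -> str:
    """Prefer a locally held or licensed copy; else use the publisher's."""
    if use_local and LOCAL_HOLDINGS.get(doi):
        return LOCAL_HOLDINGS[doi][0]
    return PUBLISHER_COPY[doi]


print(appropriate_copy("10.9999/EXAMPLE-0001"))
print(appropriate_copy("10.9999/EXAMPLE-0002"))
```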

[Diagram labels: Client (Web Browser); Cookie on client; DOI Proxy; dx.doi.org/ /1234; Handle Server; Illinois Local Link Server; OpenURL Aware; Local AIP, IEE; CrossRef Metadata Database; AIP; IEE; Elsevier; DOI Metadata; Local Value Added; Nosfx=y; UIUC Metadata Registry; OpenURL]

Simultaneous Search Implementations DialIndex from Dialog. Ex Libris MetaLib service. Endeavor EnCompass. Innovative Interfaces MetaFind. Ovid Multiple Search and reference De-Duping. ISI Web of Knowledge. Gale Corporation InfoTrac Total Access. WebFeat. California Digital Library SearchLight system. Los Alamos FlashPoint system. Fretwell-Downing partnering with ARL Portal and Monash University.

Grainger Search Aid Assist users in the selection of appropriate databases. Normalize user search arguments and display search results from candidate databases. Cross-database asynchronous concurrent searching. Article level and e-journal Web site access to publisher full-text repositories. Utilize OpenURL, CrossRef metadata database and DOI for reference linking at the article level. Proxying of vendor systems and capability of ‘taking over’ the search in vendor native mode.

Grainger Search Aid

Reference Assistant Project Utilize Search Aid simultaneous search and link capabilities. Opportunity to explore interface and navigation issues. Mimics the behavior of reference librarian. Allows the application of ‘best match’ and ‘quorum searching’ algorithms.
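Quorum searching can be sketched as ranking by the number of query terms matched, with a minimum threshold. This is one common reading of the term, not necessarily the algorithm used in the Reference Assistant; the documents and the function name `quorum_search` are invented.

```python
from typing import Dict, List, Tuple


def quorum_search(
    query: str, documents: Dict[str, str], quorum: int
) -> List[Tuple[str, int]]:
    """Rank documents by how many distinct query terms they contain.

    A document is returned only if it matches at least `quorum` terms,
    so an exact Boolean AND across all terms is not required.
    """
    terms = set(query.lower().split())
    ranked = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        matched = len(terms & words)
        if matched >= quorum:
            ranked.append((doc_id, matched))
    # Best match first; ties broken by document id for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


docs = {
    "a": "temperature analysis of quantum well lasers",
    "b": "quantum computing overview",
    "c": "library portal design",
}
print(quorum_search("quantum well lasers", docs, quorum=2))
print(quorum_search("quantum well lasers", docs, quorum=1))
```

Lowering the quorum widens the result set while the sort order still puts the best match first.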

Reference Assistant Top Menu

Simultaneous Search Implementations Shared Blackboard approach employing Independent Searchbots dedicated to searching information resources and passing results to Web clients. Event-Driven, Asynchronous HTTP Queries from within a Single Script returning results to Web browser.

Event-Driven, Asynchronous Queries Single, event-driven web server process, asynchronously querying multiple resources. Uses WinHTTP from ASP and VBScript. Simpler, not as flexible. Search algorithms and processing coded in scripts. This is the approach we currently use for our service. Implementation of multi-step login and session variable passthru being investigated.
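The single-process, asynchronous pattern can be sketched with Python's `asyncio` in place of the WinHTTP/ASP/VBScript stack named on the slide. The targets, delays, and hit counts are invented, and `asyncio.sleep` stands in for the network round trip so the sketch runs without any connection.

```python
import asyncio
from typing import List, Tuple

# Simulated targets: (name, seconds to respond, hits). All values invented.
TARGETS = [("catalog", 0.03, 12), ("abstracts", 0.01, 40), ("fulltext", 0.02, 7)]


async def search_target(
    name: str, delay: float, hits: int, query: str
) -> Tuple[str, int]:
    """Stand-in for one HTTP query; sleeping simulates network latency."""
    await asyncio.sleep(delay)
    return name, hits


async def simultaneous_search(query: str) -> List[Tuple[str, int]]:
    """Query every target concurrently and collect results as they arrive."""
    tasks = [
        asyncio.create_task(search_target(name, delay, hits, query))
        for name, delay, hits in TARGETS
    ]
    arrived = []
    for finished in asyncio.as_completed(tasks):
        arrived.append(await finished)
    return arrived


results = asyncio.run(simultaneous_search("quantum well lasers"))
print(results)
```

Total elapsed time is close to the slowest target rather than the sum of all targets, which is the practical benefit of issuing the queries from one event-driven process.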

OpenURL-Based Services Standard for expressing and transmitting metadata. Promise of standardized, normalized search results. Provides value-added links to the Ovid search results. Using CrossRef metadata database to look up DOIs.

CiteParse.dll An ActiveX DLL which can parse various Ovid citations and turn them into OpenURLs:
Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan MRT. Mawst LJ. Temperature analysis … quantum-well lasers. [Article] IEEE Journal of Quantum Electronics. 38(6): , 2002 Jun.
N&atitle=Temperature+analysis+…+quantum-well+lasers&title=IEEE+Journal+of+Quantum+Electronics&volume=38&issue=6&spage=640&epage=651&pages= &date=
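The final step, turning parsed citation fields into an OpenURL-style query string, can be sketched as follows. This is not CiteParse.dll: the resolver address and the function name `to_openurl` are hypothetical, the citation-parsing step is omitted, and only field values that appear in the example above are used.

```python
from urllib.parse import urlencode

# Hypothetical local resolver address, for illustration only.
RESOLVER = "http://resolver.example.edu/openurl"


def to_openurl(citation: dict) -> str:
    """Encode parsed citation fields as an OpenURL-style query string."""
    # Drop fields the parser could not fill so no empty keys are sent.
    fields = {key: value for key, value in citation.items() if value}
    return RESOLVER + "?" + urlencode(fields)


# Field values as they appear in the example; the step that would
# extract them from the raw Ovid citation is not shown here.
parsed = {
    "title": "IEEE Journal of Quantum Electronics",
    "volume": "38",
    "issue": "6",
    "spage": "640",
    "epage": "651",
    "date": "",  # left empty, as in the example
}
url = to_openurl(parsed)
print(url)
```

`urlencode` replaces spaces with `+`, which matches the encoding visible in the example query string.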

Conclusions User reactions very positive. The one-stop-shopping approach has been successful. Users consider ability to link to full-text from citations in A & I Services and from references on publisher portals very helpful. Technically, best approach appears to be a hybrid of asynchronous client interface with Web Services querying databases. Moves database middleware to Web Services and eliminates extensive custom script code for search and database query.

Publishing Trends Publishers will continue to add value to online journal articles. Digital version will become version of record. Virtual journals (both publisher-based and cross-publisher) will become common. Next-generation knowledge environments will evolve. Multimedia, data exposed, live equations with in-place calculations.

Publishing Trends (Continued) Personalized services will be available -- agent technology, alerting services. Different economic and subscription models will be introduced. Deconstruction of Journal (Bob Kelly, APS); article at a time publishing. Journal branding or perhaps publisher branding. Academia issues: publishing, tenure.

Continuing Issues Role of Authors, Academic Institutions, Libraries, Publishers, Abstracting & Indexing Services. Disintermediation may affect both Libraries and Publishers. Information as Function not Place. Provide a ‘Digital Library’ out of digital collections. Role of XML technology. Service mechanisms: processing & archiving, search and discovery, presentation, linking.