A Web Services Approach for Search and Retrieve
The Next Generation Z39.50
Access 2004, October 13-16, 2004, Halifax, Nova Scotia
William E. Moen
School of Library and Information Sciences
Texas Center for Digital Knowledge
University of North Texas
Denton, TX 76203
Overview
  Quick description of SRW
  Brief background: historical, political, conceptual
  Non-technical (almost) introduction to SRW
  Common Query Language (CQL) briefly
  Concluding thoughts
What is SRW?
  Search and Retrieve Web Service (SRW)
  An XML-based protocol for searching, retrieving, and other information retrieval transactions
  Cast in the standards/technologies for web services
    XML
    SOAP
    HTTP
  Brings the concepts and experience of Z39.50 into the web environment using web technologies
Why SRW?
  Genesis: several years of soul searching by Z39.50 developers and implementors
    The “web” had become the common implementation environment
    Z39.50 was not perceived as web friendly
  Pivotal moments:
    December 2000 ZIG meeting
    July 2001 meeting
Turning point: December 2000
  “Z39.50 Future” discussion
  Perceptions of Z39.50
    broken
    heavy-weight
    difficult and complex
    old technology
    not web friendly
  Several options presented
    Rewrite the protocol from the ground up
    Rewrite as an XML protocol
    Separate the Z39.50 protocol from its use of BER as a wire protocol
    Simplify the protocol specifications to focus on core features
  Recognition of the intellectual contribution of Z39.50
Taking action: June 2001
  Invitational meeting to discuss moving Z39.50 to an XML-based protocol
  Goal
    Lower the barriers to implementation while preserving the existing intellectual contributions of Z39.50, discarding those aspects no longer useful or meaningful.
  Objective
    Define specifications for a new web service definition based on Z39.50 together with web technologies
    Separate the Z39.50 abstract and associated semantic model from its specific encoding and wire protocol (i.e., ASN.1/BER and TCP/IP)
  Initially called Z39.50 Next Generation (ZNG)
  Intended as proof-of-concept
  Defining only those protocol specifications that would actually be implemented by participants
ZING: Z39.50 International Next Generation
  Make intellectual/semantic content of Z39.50 more broadly available
  Make Z39.50 more attractive by lowering barriers to implementation
    Use of XML to represent and encode data
    Use of HTTP for transport
    Use of SOAP for interaction between client and server
  Several ZING initiatives: ZOOM, ez39.50, ZeeRex, SRW/U
  FOR MORE INFORMATION, VISIT THE ZING WEBSITE…
SRW/U, SRW, SRU
  SRW/U: Search and Retrieve for the Web
    General designation for this initiative
  SRW: Search and Retrieve Web Service
    XML messages
    Simple Object Access Protocol (SOAP)
    HTTP POST
  SRU: Search and Retrieve URL Service
    Request parameters included in URL syntax
    HTTP GET
  Development
    Version 1.0 November 2001
    Version 1.1 February 2004
  FOR MORE INFORMATION, VISIT THE SRW WEBSITE…
Networked information retrieval
  What’s needed:
    Identifying a target to search
    A vocabulary for expressing search requests, search criteria, retrieval requests, etc.
    Methods to encode the requests and responses from the target
    Methods to transport the requests and responses across a network
  In other words, a protocol and supporting specifications
Abstract Model of IR
Abstract model of Z39.50
Z39.50 classic & SRW
SRW Overview
  Builds on Z39.50 concepts and web technologies
    Web technologies: XML, SOAP, HTTP
  Uses new, human-readable query language
  Combines several Z39.50 features into several “operation types”
    searchRetrieve operation
    scan operation
    explain operation
searchRetrieve operation
  The core of the protocol
  Expresses the search and additional criteria
  Records are returned in XML
  Request parameters
    version
    query
    Optional parameters
      sortkeys
      recordPacking
      recordSchema
      recordXPath
      stylesheet
  Response parameters
    version
    numberOfRecords
    Optional parameters
      resultSetID
      resultSetIdleTime
      records
      diagnostics
SRW & XML
  XML as foundation for protocol
    Provides syntax for intelligent markup
    Defines or references XML schemas
  Example XML schema for SRW specifications
    searchRetrieveRequest
    searchRetrieveResponse
searchRetrieveRequest example
  XML document is sent to the server
  Using SOAP to wrap the request
  Sent as an HTTP POST
    1.1 dc.title all "Squirrel Hungry" 1 dc
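The request on this slide can be sketched in Python as a searchRetrieveRequest wrapped in a SOAP envelope. This is a minimal sketch under stated assumptions: the two namespace URIs, the helper name build_search_retrieve_request, and the reading of the slide's trailing values as maximumRecords and recordSchema are not taken from the slide. The resulting document would be sent as the body of an HTTP POST.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: SOAP 1.1 envelope and the SRW namespace (not shown on the slide).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SRW_NS = "http://www.loc.gov/zing/srw/"


def build_search_retrieve_request(query, version="1.1",
                                  maximum_records=None, record_schema=None):
    """Return a SOAP-wrapped searchRetrieveRequest as an XML string (hypothetical helper)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")

    # Mandatory parameters listed on the searchRetrieve slide.
    ET.SubElement(request, f"{{{SRW_NS}}}version").text = version
    ET.SubElement(request, f"{{{SRW_NS}}}query").text = query

    # Optional parameters are emitted only when supplied.
    if maximum_records is not None:
        ET.SubElement(request, f"{{{SRW_NS}}}maximumRecords").text = str(maximum_records)
    if record_schema is not None:
        ET.SubElement(request, f"{{{SRW_NS}}}recordSchema").text = record_schema

    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_search_retrieve_request('dc.title all "Squirrel Hungry"',
                                        maximum_records=1, record_schema="dc"))
```

Building the tree with ElementTree rather than string concatenation means the quotation marks in the CQL query are escaped correctly in the XML.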
searchRetrieveResponse
  Records returned in response
  All records in XML syntax
  According to one or more XML schemas (semantics)
    Dublin Core
    Onix
    MODS
    MarcXML
searchRetrieveResponse example
    info:srw/schema/1/dc-v1.1
    Squirrel is Hungry
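To illustrate how a client might read such a response, here is a minimal Python sketch. The sample document is hypothetical: it uses only the schema identifier and title shown on this slide, and its element layout (including placing dc:title directly inside recordData) and both namespace URIs are simplifying assumptions rather than a copy of a real server response.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for SRW and Dublin Core elements.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, simplified for illustration.
SAMPLE_RESPONSE = """\
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/"
                            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>1</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>info:srw/schema/1/dc-v1.1</srw:recordSchema>
      <srw:recordData>
        <dc:title>Squirrel is Hungry</dc:title>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""


def parse_search_retrieve_response(xml_text):
    """Return (numberOfRecords, [(recordSchema, title), ...]) from a response string."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    records = []
    for record in root.iterfind("srw:records/srw:record", NS):
        schema = record.findtext("srw:recordSchema", default=None, namespaces=NS)
        title = record.findtext("srw:recordData/dc:title", default=None, namespaces=NS)
        records.append((schema, title))
    return total, records


if __name__ == "__main__":
    print(parse_search_retrieve_response(SAMPLE_RESPONSE))
```

Because every record arrives as XML under a named schema, the same parsing approach applies whether the server returns Dublin Core, MODS, or another schema; only the paths inside recordData change.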
searchRetrieve example
  Retrieval results
  XML view
  Screen shot
    1.1 dc.title computer 1 10 xml dc
SRW results
SRU briefly
  Protocol requests can be carried via HTTP GET
  searchRetrieveRequest parameters expressed in standard URL syntax
  baseURL and search part separated by question mark “?”
  Response is XML document containing records
  The searchRetrieveRequest in SRU:
    rchRetrieve&version=1.1&query=dc.title=%22computer%22&recordSchema=DC&startRecord=1&maximumRecords=10&recordPacking=xml
  Eric Lease Morgan’s Journal Locator
  Use of “extra data parameters” allows implementers to add additional functionality
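A short Python sketch of how an SRU request URL like the one above could be assembled. The base URL http://example.org/sru and the helper name build_sru_url are hypothetical, and the `operation` parameter name is an assumption, since the start of the URL on the slide is cut off. The other parameter names are those visible in the slide's URL.

```python
from urllib.parse import quote, urlencode


def build_sru_url(base_url, query, version="1.1", **optional):
    """Join a base URL and SRU request parameters with '?' (hypothetical helper)."""
    params = {
        "operation": "searchRetrieve",  # assumed parameter name
        "version": version,
        "query": query,
    }
    # Optional parameters such as recordSchema or maximumRecords.
    params.update(optional)
    # quote (rather than the default quote_plus) encodes spaces as %20.
    return base_url + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_sru_url(
        "http://example.org/sru",  # hypothetical server
        'dc.title="computer"',
        recordSchema="DC",
        startRecord=1,
        maximumRecords=10,
        recordPacking="xml",
    ))
```

This sketch percent-encodes the "=" inside the CQL query as %3D, whereas the slide's URL leaves it literal. Both forms decode to the same query string.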
search/Retrieve query
  SRW query consists of one or more query statements linked by Boolean operators
  Five categories of query statements:
    1. single search clause
    2. two or more search clauses linked by Boolean
    3. search clauses and result sets linked by Boolean
    4. two or more result sets linked by Boolean
    5. single result set
  Expressed in the Common Query Language (CQL)
Common Query Language (CQL)
  A formal language for representing queries to information retrieval systems
    Simple free text
    Complex Boolean, proximity
  Human-readable
  Search clause
    Always includes a term
      simple terms consist of one or more words
    May include index name
      To limit search to a particular field/element
      Index name includes base name and may include prefix
        title, subject
        dc.title, dc.subject
    Several index sets have been defined
      dc
      bath
      cql
  Context sets in SRW define the available indexes for a particular application and additional query specifications (e.g., relation operators)
    Context sets
    Legend of the Five Rings Database
Other components of CQL
  Relation
    <, >, <=, >=, =, <>
    exact: used for string matching
    all: when term is list of words, to indicate all words must be found
    any: when term is list of words, to indicate any words must be found
  Boolean operators: and, or, not
  Proximity (prox operator)
    relation (<, >, <=, >=, =, <>)
    distance (integer)
    unit (word, sentence, paragraph, element)
    ordering (ordered or unordered)
  Masking rules and special characters
    single asterisk (*) to mask zero or more characters
    single question mark (?) to mask a single character
    caret/hat (^) to indicate anchoring, left or right
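One way to make the masking rules concrete is to translate a masked term into a regular expression. The Python sketch below is illustrative only: the helper name cql_mask_to_regex is hypothetical, and it assumes the term is matched against a single field value, case-insensitively, with no support for backslash escapes. A real server would apply these rules against its own indexes.

```python
import re


def cql_mask_to_regex(term):
    """Translate CQL masking characters in a term into a compiled regex (sketch)."""
    # A caret at either end anchors the term to that end of the field.
    anchored_left = term.startswith("^")
    anchored_right = term.endswith("^") and len(term) > 1
    core = term[1:] if anchored_left else term
    core = core[:-1] if anchored_right else core

    parts = []
    for ch in core:
        if ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character

    pattern = ("^" if anchored_left else "") + "".join(parts) + ("$" if anchored_right else "")
    return re.compile(pattern, re.IGNORECASE)


if __name__ == "__main__":
    for term, field in [("dino*", "dinosaur"), ("wom?n", "women"), ("^dino*", "the dinosaur")]:
        print(term, field, bool(cql_mask_to_regex(term).search(field)))
```

Unanchored terms are matched anywhere in the field with `search`, so only the caret changes where a match may start or end.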
CQL examples
  Simple queries:
    dinosaur
    "the complete dinosaur"
  Boolean
    dinosaur and bird or dinobird
    "feathered dinosaur" and (yixian or jehol)
  Proximity
    foo prox bar
    foo prox/>/4/word/ordered bar
  Indexes
    title = dinosaur
    bath.title="the complete dinosaur"
    srw.serverChoice=dinosaur
  Relations
    year > 1998
    title all "complete dinosaur"
    title any "dinosaur bird reptile"
    title exact "the complete dinosaur"
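The examples above follow a regular pattern (optional index, relation, term, joined by Boolean or proximity operators), which a client could generate programmatically. The following Python sketch shows one possible set of helpers. All function names are hypothetical, and the quoting rule (quote a term when it is empty or contains whitespace or one of `( ) = < > / "`) is an assumption about CQL term syntax, not something stated on the slides.

```python
def quote_term(term):
    """Wrap a term in double quotes when it needs them (assumed quoting rule)."""
    if term == "" or any(ch.isspace() or ch in '()=<>/"' for ch in term):
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return term


def search_clause(term, index=None, relation="="):
    """Build a search clause: a bare term, or index + relation + term."""
    if index is None:
        return quote_term(term)
    return f"{index} {relation} {quote_term(term)}"


def boolean(op, left, right):
    """Link two clauses with and, or, or not."""
    return f"{left} {op} {right}"


def group(expression):
    """Parenthesize a sub-expression to make precedence explicit."""
    return f"({expression})"


def proximity(left, right, relation, distance, unit, ordering):
    """Link two clauses with the prox operator and its four modifiers."""
    return f"{left} prox/{relation}/{distance}/{unit}/{ordering} {right}"


if __name__ == "__main__":
    print(search_clause("the complete dinosaur", "title", "exact"))
    print(boolean("and", search_clause("feathered dinosaur"),
                  group(boolean("or", "yixian", "jehol"))))
    print(proximity("foo", "bar", ">", 4, "word", "ordered"))
```

Keeping grouping explicit through `group` avoids relying on operator precedence in queries such as the second Boolean example.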
SRW & classic Z39.50
  SRW
    No explicit concept of connection, session, or state
    Result sets named by server
    Single record syntax (XML), multiple schemas
    String (i.e., human-readable) queries
    CQL
    Named indexes
  Classic Z39.50
    Stateful
    Result sets named by client
    Multiple record syntaxes
    No human-readable query language
    Type 1 query using attribute sets
    Use attribute to identify access point
  Z39.50 Concepts Retained
    Result sets
    Abstract access points
    Abstract record schemas
    Explain
    Diagnostics
What problems does SRW solve?
  Addresses need for standards-based searching in the networked environment
  Shows the vitality of the Z39.50 concepts and implements those in a web services & URL access context
  Offers database providers a web-friendly method for offering standards-based searching of resources
  Provides a low-barrier-to-entry solution using commonly available technologies
  XML format of records provides for more reuse, and more interesting use, of resources
Possible implementation venues
  Gateways to existing Z39.50 servers
  Lightweight SRW/U servers to specialized databases
  Standard search interface for OAI service providers and institutional repositories
  Cost-effective search access to commercial databases (e.g., citation, full-text)
  Metasearching
  Beyond libraries to many other information communities
References
  Z39.50 International Next Generation (ZING)
  Search and Retrieve for the Web (SRW/U)
  A Gentle Introduction to SRW
  A Gentle Introduction to CQL
  An Introduction to the Search/Retrieve URL Service (SRU), by Eric Lease Morgan, in Ariadne (July 04)
  Search and Retrieval in The European Library: A New Approach, by van Veen and Oldroyd, in D-Lib (Feb 04)