1 Introducing some Standards Paul Miller Interoperability Focus UK Office for Library & Information Networking (UKOLN)

2 1 Introducing some Standards Paul Miller Interoperability Focus UK Office for Library & Information Networking (UKOLN) P.Miller@ukoln.ac.uk http://www.ukoln.ac.uk/ UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the Further and Higher Education Funding Councils, as well as by project funding from JISC and the EU. UKOLN also receives support from the Universities of Bath and Hull where staff are based.

3 2 So… why use standards? Benefit from the expertise of others Enforce rigour in internal practices Facilitate interoperability (and access) –Considered deployment of standard solutions makes access to your resources feasible for many.

4 3 What do standards do? Help identify what’s important –CIMI’s “Access Points” –Mandatory fields Allow for consistent use of terminology –Name Authority Files –Thesauri –Look–up tables Enable internal and external data exchange or access Reduce duplication of effort Minimise (hopefully!) wasted effort Reflect consensus.

5 4 What types of standard are there? Terminology –‘Roma’, not ‘Rome’ –‘Roma’ is preferred to ‘Rome’ Format –‘Miller, A.P. 1971–’, not ‘Paul Miller’ ‘Semantics’ –A gross simplification, and a very big bucket –‘Creator’, ‘Subject’, ‘Title’, ‘Description’… Syntax – Transfer –ftp://ftp.niso.org/ ….

6 5 Terminological Standards (Based upon an earlier presentation with Matthew Stiff of mda) See www.ariadne.ac.uk/issue23/metadata/

7 6 The need for control… European Community E.E.C. Common Market European Union !

8 7 Without control of terms... Users are –incorrectly utilising search terms –failing to find significant resources –suffering from information overload –almost as well off using Google Creators are –cataloguing inconsistently –unable to convey hierarchical concepts (Scotland is in United Kingdom is in Europe is in...) –perpetuating localised terminology –unable to assess, let alone undertake, integration projects.

9 8 With control... Users might –gain more effective access to a resource –gain far more effective access across resources –reduce the number of ‘false hits’ –find what they are looking for –even learn to think and express themselves in a structured manner. Creators might –produce more valuable resources –convey complex semantic and structural concepts –move towards disciplinary, national, international or global terminologies –effectively integrate both new and existing resources.

10 9 Controlled Vocabulary European Union, E.E.C., Common Market, European Community... etc. With a controlled vocabulary, one or more of these terms might be permitted. Use of the others for record creation or retrieval would be rejected by the system.
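The rejection behaviour described on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the slide does not say which of the four terms a real system would permit, so the contents of `PERMITTED_TERMS` and the name `validate_term` are assumptions made for the example.

```python
# Hypothetical controlled list. Which term is permitted is an assumption
# made for illustration; the slide only says "one or more" might be.
PERMITTED_TERMS = {"European Union"}


def validate_term(term):
    """Return the term unchanged if it is permitted, otherwise reject it."""
    if term not in PERMITTED_TERMS:
        raise ValueError(f"'{term}' is not in the controlled vocabulary")
    return term


if __name__ == "__main__":
    for candidate in ["European Union", "E.E.C.", "Common Market"]:
        try:
            print("accepted:", validate_term(candidate))
        except ValueError as err:
            print("rejected:", err)
```

The same check would apply at both ends: when a cataloguer creates a record and when a user submits a search term.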

11 10 Thesaurus-based Control European Union [preferred term] E.E.C. [synonym] Common Market [synonym] European Community [synonym]... Etc. [synonyms] In a thesaurus, all of the terms might be considered equally valid, with one identified as the preferred term and the others as synonyms But... Are they really synonymous...?

12 11 Thesauri A traditional thesaurus defines synonyms and, perhaps, antonyms for terms within a given language. E.g. –‘workshop’ atelier, factory, mill, plant, shop, studio, workroom...or... ? class, discussion group, seminar, study group.

13 12 Thesauri in Information Retrieval In the context of information retrieval, thesauri do more, facilitating the creation of hierarchies of meaning....

14 13 Hierarchies of Meaning
‘Glass’
  ‘Beer Glass’
  ‘Wine Glass’
    ‘Red wine glass’
    ‘White wine glass’

15 14 Thesaurus Components Most thesauri are constructed in a standard form, as defined by ISO 2788 and various national standards. –ISO 5964 extends discussion to multilingual issues Four basic relationships are fundamental in thesaurus construction and use... –Equivalence (preferred and non-preferred terms) –Hierarchy (‘glass’ is broader than ‘wine glass’) –Association (establishes non-hierarchical relationships) –Scope notes (provide guidance and clarification).

16 15 Equivalence As with the European Union example, there are often situations in which users or cataloguers wish to allow multiple synonyms for any one term. –In these cases, one term may be defined as a preferred term “Electricity Plant USE Power Station” –Here, ‘Power Station’ is the preferred term Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
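A USE reference is, in effect, a lookup from a non-preferred term to its preferred term. The sketch below assumes a plain dictionary and case-insensitive matching; both are choices made for this illustration, and only the "Electricity Plant USE Power Station" pair comes from the slide.

```python
# Non-preferred term -> preferred term, as in "Electricity Plant USE Power Station".
USE_REFERENCES = {"Electricity Plant": "Power Station"}


def build_lookup(use_references):
    """Index both preferred and non-preferred terms by lower-cased spelling."""
    lookup = {}
    for non_preferred, preferred in use_references.items():
        lookup[non_preferred.lower()] = preferred
        lookup[preferred.lower()] = preferred  # a preferred term maps to itself
    return lookup


LOOKUP = build_lookup(USE_REFERENCES)


def preferred_term(term):
    """Return the preferred term for any known spelling, or None if unknown."""
    return LOOKUP.get(term.strip().lower())


if __name__ == "__main__":
    print(preferred_term("Electricity Plant"))
    print(preferred_term("Windmill"))
```

Unlike the strict controlled vocabulary, nothing is rejected here for being a synonym: the cataloguer or searcher is simply steered to the preferred term.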

17 16 Hierarchy An important capability of thesauri is their ability to reflect hierarchies, whether conceptual, spatial, or whatever. –Individual thesaurus entries are linked to a class (CL), as well as to broader (BT) and narrower (NT) terms.
“BAYONET
CL Armour and Weapons
BT Edged Weapon
NT Plug Bayonet
NT Socket Bayonet”
Example from mda Archaeological Objects Thesaurus, © mda, English Heritage, RCHME 1997.

18 17 Association In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy. –Related Terms (RT) can be used to show these links within the thesaurus.
“CHURCH
RT Churchyard
RT Crypt
RT Presbytery”
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.

19 18 Scope Notes Thesaurus entries can often be terse, and difficult to interpret for the non-expert. –Scope Notes (SN) serve to clarify entries and avoid possible confusion. They serve to embody the underlying concept, rather than the language-specific word. “CHITTING HOUSE SN A building in which potatoes can sprout and germinate” “FERRY SN Includes associated structures” Examples from RCHME Thesaurus of Monument Types, © RCHME 1995.

20 19 Putting it all together...
“FERROUS METAL EXTRACTION SITE
SN Includes preliminary processing
CL Industrial
BT Metal Industry Site
NT Ironstone Mine
NT Ironstone Pit
NT Ironstone Workings
RT Ironstone Workings”
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
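To show how the four relationship types fit into one record, here is a hedged Python sketch that parses an entry like the one above into a small data structure. The one-tag-per-line layout, the `Entry` class and the `parse_entry` function are assumptions made for this illustration; they do not describe how RCHME or mda actually store their thesauri.

```python
from dataclasses import dataclass, field

# Tag -> attribute name. The tags are the ones used on the slides.
TAGS = {
    "SN": "scope_notes",
    "CL": "classes",
    "BT": "broader",
    "NT": "narrower",
    "RT": "related",
}


@dataclass
class Entry:
    term: str
    scope_notes: list = field(default_factory=list)
    classes: list = field(default_factory=list)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


def parse_entry(text):
    """Parse a term line followed by 'TAG value' lines (layout is assumed)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    entry = Entry(term=lines[0])
    for line in lines[1:]:
        tag, _, value = line.partition(" ")
        if tag not in TAGS:
            raise ValueError(f"unknown tag: {tag!r}")
        getattr(entry, TAGS[tag]).append(value.strip())
    return entry


SAMPLE = """
FERROUS METAL EXTRACTION SITE
SN Includes preliminary processing
CL Industrial
BT Metal Industry Site
NT Ironstone Mine
NT Ironstone Pit
NT Ironstone Workings
RT Ironstone Workings
"""

if __name__ == "__main__":
    print(parse_entry(SAMPLE))
```

Every relationship is kept as a list because tags such as NT and RT repeat within a single entry.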

21 20 Working with the tools Thesauri, controlled vocabulary lists, etc, are all useful, but they –often rely upon both cataloguers and users having direct access to these usually weighty tomes –require an awareness of cataloguing issues and practice to be used most effectively –have predominantly developed within –– rather than between –– communities, regions, etc. –rapidly become destabilised as distributed users add new terms in a non–complementary fashion

22 21 Effective distributed thesauri [1] In order for thesauri to be effective in the online environment, research and good practice need to address: –mapping between existing thesauri –technical mapping –semantic mapping (are ‘E.E.C.’ and ‘Common Market’ synonymous?) –restructuring one or both where necessary/possible –inter–disciplinary mapping (the ‘God Problem’) –addressing legacy data

23 22 Effective distributed thesauri [2] –delivery of training to remote cataloguers –providing online access to more existing thesauri –development of cataloguing tools –capable of accessing various remote thesauri and selecting terms in an intuitive, timely, fashion Nordic Metadata Project Dublin Core tool –raising the profile of thesauri as “A Good Thing”! –Development of user interface tools –capable of integrating various remote thesauri into the search process without slowing it intolerably, losing contextual awareness or subjecting the browser to information overload.

24 23 Some links –English Heritage Thesauri: www.rchme.gov.uk/thesaurus/thes_splash.htm –Getty Thesauri: www.getty.edu/gri/vocabularies/ –HASSET: biron.essex.ac.uk/searching/zhasset.html –High Level Thesaurus Project (HILT): hilt.cdlr.strath.ac.uk/ –Pan–Government Thesaurus: should be visible from www.govtalk.gov.uk/ eventually.

25 24 Metadata

26 25 What is ‘Metadata’? –meaningless jargon –or a fashionable, and terribly misused, term for what we’ve always done –or “a means of turning data into information” –and “data about data” –and the name of a person (‘Tony Blair’) –and the title of a book (‘The Name of the Rose’).

27 26 What is ‘Metadata’? Metadata exists for almost anything: –People –Places –Objects –Concepts –Web pages –Databases.

28 27 What is ‘Metadata’? Metadata fulfils three main functions; Description of resource content –“What is it?” Description of resource form –“How is it constructed?” Description of resource use –“Can I afford it?”.

29 28 Challenges –Many flavours of metadata: which one do I use? –Managing change: new varieties, and evolution of existing forms –Tension between functionality and simplicity, extensibility and interoperability [Diagram labels: Functions, features, and cool stuff; Simplicity and interoperability; Opportunities]

30 29 Introducing the Dublin Core An attempt to improve resource discovery on the Web –now adopted more broadly Building an interdisciplinary consensus about a core element set for resource discovery –simple and intuitive –cross–disciplinary — not just libraries!! –international –open and consensual –flexible. See purl.org/dc/

31 30 Introducing the Dublin Core –15 elements of descriptive metadata –All elements optional –All elements repeatable –The whole is extensible –offers a starting point for semantically richer descriptions.

32 31 Introducing the Dublin Core Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights. See purl.org/dc/
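As an illustration of "all elements optional, all elements repeatable", the following Python sketch builds a simple Dublin Core description as XML using only the standard library. The namespace URI is that of the Dublin Core Metadata Element Set, version 1.1; the `record` wrapper element, the `build_record` function and the sample subject values are assumptions made for this example and are not prescribed by the slides.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements listed on the slide, in lower case as used in XML.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_record(fields):
    """Build an XML record from {element name: [values]}.

    Any element may be omitted (optional) or carry several values
    (repeatable). The <record> wrapper is a hypothetical container.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise KeyError(f"not a Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = value
    return record


if __name__ == "__main__":
    rec = build_record({
        "title": ["Introducing some Standards"],
        "creator": ["Paul Miller"],
        "subject": ["Metadata", "Z39.50"],  # repeated element, sample values
    })
    print(ET.tostring(rec, encoding="unicode"))
```

Richer, community-specific descriptions would extend this core rather than replace it, which is the sense in which the slide calls the whole "extensible".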

33 32 Z39.50

34 33 What is Z39.50? ANSI/NISO Z39.50–1995, Information Retrieval (Z39.50): Application Service Definition and Protocol Specification ISO 23950:1998, Information and Documentation — Information Retrieval (Z39.50) — Application Service Definition and Protocol Specification. See lcweb.loc.gov/z3950/agency/1995doce.html

35 34 What is Z39.50? “This standard specifies a client/server based protocol for Information Retrieval. It specifies procedures and structures for a client to search a database provided by a server, retrieve database records identified by a search, scan a term list, and sort a result set. Access control, resource control, extended services, and a ‘help’ facility are also supported. The protocol addresses communication between corresponding information retrieval applications, the client and server (which may reside on different computers); it does not address interaction between the client and the end-user.” (Z39.50–1995, page 0). See lcweb.loc.gov/z3950/agency/1995doce.html

36 35 Some gory details… Z39.50 follows client/server model But calls them Origin and Target Client/origin Server/target

37 36 Client/Server architecture

38 37 Client/Server architecture

39 38 Some gory details… Z39.50–1995 is divided into eleven ‘Facilities’ –Initialization –Search –Retrieval –Result–set–delete –Browse –Sort –Access Control –Accounting –Explain –Extended Services –Termination. See www.ariadne.ac.uk/issue21/z3950/

40 39 Facilities and Services Each Facility comprises at least one Service A Service facilitates a particular interaction between Origin and Target The three key services are Init, Search, and Present. See www.ariadne.ac.uk/issue21/z3950/

41 40 Init The only Service of the Initialization Facility Origin–initiated Used to start a ‘Z–association’ Origin requests a number of parameters under which the searches will be conducted Target responds, either accepting offered parameters or proposing others if necessary.

42 41 Search The only Service of the Search Facility Origin–initiated Used to actually conduct a search Origin specifies databases to be searched, attribute combinations, and query Target responds, identifying the number of matching results.

43 42 Present Main Service of the Retrieval Facility (along with Segment) Origin–initiated although Target can initiate a Segment request if the result set is very large Used to return records to the user.

44 43 Init for dummies Origin: Hello. Do you speak English? Target: Hello. Yes, I do. Let’s talk.

45 44 Search for dummies Origin: Cool. Can I have anything you’ve got on a place called “London”? Target: I’ve got 25 records matching your request, and here’s the first five. As you didn’t specify anything else, I’ve sent them to you in MARC, so I hope that’s OK.

46 45 Present for dummies Origin: 25, eh? Can I have the first ten, please? Oh, and I really don’t like MARC. If you can send Dublin Core that would be great, and if not I’ll settle for some SUTRS. Target: DC:Creator – blah DC:Title – blah …
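The three "for dummies" exchanges can be mimicked with a toy Python class. This is emphatically not a Z39.50 implementation: there is no network connection and no protocol encoding, and the class `ToyTarget`, its method names, the substring title match and the two rendered syntaxes are all assumptions made for illustration. It only shows the order of the conversation: Init first, then Search (which returns a hit count), then Present (which returns records in a syntax both sides accept).

```python
class ToyTarget:
    """Stand-in for a Target. MARC output is deliberately left out."""

    SUPPORTED_SYNTAXES = ("DublinCore", "SUTRS")
    HIGHEST_VERSION = 3

    def __init__(self, records):
        self._records = list(records)  # dicts with "title" and "creator"
        self._result_set = []
        self._version = None

    def init(self, offered_version):
        # Accept the offered version, or propose the highest one supported.
        self._version = min(offered_version, self.HIGHEST_VERSION)
        return self._version

    def search(self, term):
        if self._version is None:
            raise RuntimeError("Init must precede Search")
        needle = term.lower()
        self._result_set = [
            r for r in self._records if needle in r["title"].lower()
        ]
        return len(self._result_set)  # Search reports a count

    def present(self, start, count, preferred_syntaxes):
        # Pick the first syntax the Origin asked for that we can produce.
        syntax = next(
            (s for s in preferred_syntaxes if s in self.SUPPORTED_SYNTAXES),
            None,
        )
        if syntax is None:
            raise ValueError("no mutually supported record syntax")
        chunk = self._result_set[start - 1:start - 1 + count]  # 1-based start
        return syntax, [self._render(r, syntax) for r in chunk]

    @staticmethod
    def _render(record, syntax):
        if syntax == "DublinCore":
            return (f"DC:Creator - {record['creator']}\n"
                    f"DC:Title - {record['title']}")
        return f"{record['title']} / {record['creator']}"  # plain text, SUTRS-like


if __name__ == "__main__":
    target = ToyTarget([
        {"title": "London Bridge", "creator": "A. Author"},
        {"title": "Bath Abbey", "creator": "B. Author"},
    ])
    target.init(3)
    print(target.search("London"), "record(s) found")
    print(target.present(1, 10, ["DublinCore", "SUTRS"]))
```

The Origin's list of preferred syntaxes mirrors the request on the slide: Dublin Core if possible, SUTRS otherwise.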

47 46 Now it gets hairy… To communicate successfully, Origin and Target need to use the same Attribute Set. An Attribute Set like Bib–1 defines six forms of Attribute — –Use –Relation –Truncation –Completeness –Position –Structure.

48 47 Use Attributes Define the ‘access points’ on which a search takes place Title, author, subject, etc. See lcweb.loc.gov/z3950/agency/defns/bib1.html

49 48 Relation Attributes Define the relationship between the search term and values stored in the database/index Less than, greater than, equal to, phonetically matched, etc.

50 49 Truncation Attributes Define which part of the stored value is to be searched on Beginning of any word, end of any word, etc. Matching on the beginning of a word, ‘Smith’ finds ‘Smithsonian’ but not ‘Wordsmith’; matching on the end of a word, the reverse holds.

51 50 Completeness Attributes Define how much of the stored index term must be in the search term. ‘Smith’ finds ‘Smith’, but not ‘Smithsonian’ or ‘the Smith’, etc.
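The ‘Smith’ examples on the Truncation and Completeness slides can be checked with a simplified Python sketch. The function names and the mode labels "right", "left" and "none" are assumptions made here; a real Target applies these attributes against its own indexes, and behaviour varies between implementations.

```python
def truncation_match(term, word, mode):
    """Compare a search term with one stored word under a truncation mode."""
    term, word = term.lower(), word.lower()
    if mode == "right":   # term is the beginning of the stored word
        return word.startswith(term)
    if mode == "left":    # term is the end of the stored word
        return word.endswith(term)
    if mode == "none":    # no truncation: the whole word must match
        return word == term
    raise ValueError(f"unknown truncation mode: {mode!r}")


def complete_field_match(term, field_value):
    """Complete-field matching: the term must be the entire stored value."""
    return term.lower() == field_value.lower()


if __name__ == "__main__":
    for word in ("Smithsonian", "Wordsmith"):
        print(word,
              truncation_match("Smith", word, "right"),
              truncation_match("Smith", word, "left"))
    for value in ("Smith", "Smithsonian", "the Smith"):
        print(value, complete_field_match("Smith", value))
```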

52 51 Position Attributes Define where in the index the search term should be located At the start of the field, anywhere, etc.

53 52 Structure Attributes Specify the form to be searched for Word, phrase, date, etc.
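In practice the six Bib–1 attribute types are sent as numeric type/value pairs attached to each search term. The Python sketch below writes such a combination in Prefix Query Format (PQF), a textual notation used by the YAZ toolkit and not mentioned in the slides. The numeric codes are taken from the Bib–1 attribute set; the dictionaries cover only a handful of values, and the function `pqf` and its keyword names are assumptions made for this example. Quoting of special characters in the term is not handled.

```python
# Bib-1 attribute type numbers.
ATTRIBUTE_TYPES = {
    "use": 1, "relation": 2, "position": 3,
    "structure": 4, "truncation": 5, "completeness": 6,
}

# A small selection of Bib-1 attribute values (far from exhaustive).
VALUES = {
    "use": {"title": 4, "subject": 21, "author": 1003},
    "relation": {"less_than": 1, "equal": 3, "greater_than": 5, "phonetic": 100},
    "position": {"first_in_field": 1, "any": 3},
    "structure": {"phrase": 1, "word": 2, "date": 5},
    "truncation": {"right": 1, "left": 2, "none": 100},
    "completeness": {"incomplete_subfield": 1, "complete_field": 3},
}


def pqf(term, **attrs):
    """Render one search term with its attributes as a PQF string."""
    unknown = set(attrs) - set(ATTRIBUTE_TYPES)
    if unknown:
        raise KeyError(f"unknown attribute type(s): {sorted(unknown)}")
    parts = []
    for name, type_number in ATTRIBUTE_TYPES.items():  # fixed type order
        if name in attrs:
            value = VALUES[name][attrs[name]]
            parts.append(f"@attr {type_number}={value}")
    parts.append(f'"{term}"')
    return " ".join(parts)


if __name__ == "__main__":
    print(pqf("London", use="title", relation="equal", position="any",
              structure="word", truncation="none",
              completeness="incomplete_subfield"))
```

Spelling out all six attributes, rather than relying on a Target's defaults, is what lets two independently built systems interpret the same query in the same way.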

54 53 Record Syntaxes Record Syntaxes define the structure in which results are returned to the Origin. This does not mean that Targets need to store data in these formats. –MARC: UKMARC, USMARC/MARC21, DANMARC, MARB, UNIMARC… –SUTRS: Simple Unstructured Text Record Syntax –GRS–1: Generic Record Syntax –XML.

55 54 Profiles Groupings of Attribute Sets, Record Syntaxes, etc. to meet specific needs Disciplinary –Cultural Heritage (CIMI) –Geospatial (GEO) Geographic/Cultural/National –Texas Profile –OPAC Network for Europe (ONE) –Conference of European National Librarians (CENL) Functional –Collections Profile Etc.

56 55 What’s wrong with Z39.50? –Profiles for each discipline: defeats interoperability? –Vendor interpretation of the standard –Bib–1 bloat –Largely invisible to the user –Seen as complicated, expensive and old–fashioned –Surely no match for XML/RDF/whatever.

57 56 Some Joined up working: The Bath Profile Vendors and systems implement areas of the Z39.50 standard differently Regional, National, and disciplinary Profiles have appeared over previous years, many of which have basic functions in common Users wish to search across national/regional boundaries, and between vendors. See www.ariadne.ac.uk/issue21/z3950/

58 57 Learning from the past The Bath Profile is heavily influenced by –ATS–1 –CENL –DanZIG –MODELS –ONE –Z Texas –vCUC. See www.ukoln.ac.uk/interop–focus/bath/

59 58 Learning from the past See www.ukoln.ac.uk/interop–focus/bath/

60 59 Doing the work ZIP–PIZ–L mailing list, hosted by National Library of Canada Meeting face–to–face JISC supported a face–to–face meeting in Bath (UK) over the summer of 1999 A draft was widely circulated for comment ISO accreditation process Resulting in Internationally Registered Profile status Ongoing Maintenance Agency activity. See www.ukoln.ac.uk/interop–focus/bath/

61 60 Doing the work –Makx Dekkers, PricewaterhouseCoopers/EC –Janifer Gatenby, GEAC –Juha Hakala, National Library of Finland –Poul Henrik Jørgensen, Danish Library Centre –Carrol Lunau, National Library of Canada –Paul Miller, UKOLN –Slavko Manojlovich, SIRSI/Memorial University of Newfoundland –Bill Moen, University of North Texas –Judith Pearce, National Library of Australia –Joe Zeeman, CGI. See www.ukoln.ac.uk/interop–focus/bath/

62 61 What we proposed Minimisation of ‘defaults’ Where possible, every attribute is defined in the Profile (Use, Relation, Position, Structure, Truncation, Completeness) Three Functional Areas Basic Bibliographic Search & Retrieval Bibliographic Holdings Search & Retrieval Cross–Domain Search & Retrieval Three Levels of Conformance in each Area. See www.ukoln.ac.uk/interop–focus/bath/

63 62 What we proposed SUTRS or XML and UNIMARC or MARC21 for Bibliographic Search results SUTRS and Dublin Core (in XML) for Cross–Domain results Other record syntaxes also permitted, but conformant tools must support at least these. See www.ukoln.ac.uk/interop–focus/bath/

64 63 Making it work… Adopted already by Texas, Atlantic Canada, CIC (Big 10), CENL, etc. Interoperability suite MARC21 in Texas UNIMARC and cross–domain in Europe? Direct approaches to international vendors User testing in Europe and North America Addition of Functional Areas and Levels of Conformance as required Community Information? See www.ukoln.ac.uk/interop–focus/bath/

65 64 Standards… Technical standards make the job easier in the long run for users, curators, and managers, but can make it harder to get started. There is rarely a ‘right’ standard for all situations, so –identify a need to do something, without being specific about how –know who your audience is, what you have to offer, and what your purpose/message is.

