2002.10.08 - SLIDE 1 - IS 202 – FALL 2002, Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS, Tuesday and Thursday 10:30 am - 12:00 pm, Fall 2002


SLIDE 1: SIMS 202: Information Organization and Retrieval
Lecture 12: Metadata and Markup
IS 202 – FALL 2002
Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm, Fall 2002

SLIDE 2: Lecture Overview
Review
– Thesaurus Design And Development
– Thesaurus Design
– Steps In Thesaurus Development
Metadata And Markup
– XML As A Metadata Lingua Franca
– XML DTD Construction
– XML For Protocols And Metadata Languages

SLIDE 3: Lecture Overview
Review
– Thesaurus Design And Development
– Thesaurus Design
– Steps In Thesaurus Development
Metadata And Markup
– XML As A Metadata Lingua Franca
– XML DTD Construction
– XML For Protocols And Metadata Languages

SLIDE 4: Structure of an IR System
[Diagram: Information Storage and Retrieval System. Adapted from Soergel, p. 19]
Search Line: Interest profiles & Queries -> Formulating query in terms of descriptors -> Storage of profiles -> Store1: Profiles/Search requests
Storage Line: Documents & data -> Indexing (Descriptive and Subject) -> Storage of Documents -> Store2: Document representations
Rules of the game = Rules for subject indexing + Thesaurus (which consists of Lead-In Vocabulary and Indexing Language)
Comparison/Matching -> Potentially Relevant Documents

SLIDE 5: Thesauri
A thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower and other related terms
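The definition above can be sketched as a small data structure. This is only an illustration: the class names, the `add` and `preferred` methods, and the sample terms are assumptions made for the example, not part of the lecture or of any standard thesaurus software.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term (descriptor) and its links to other terms."""
    name: str
    used_for: set = field(default_factory=set)   # synonymous/equivalent entry terms
    broader: set = field(default_factory=set)    # BT links
    narrower: set = field(default_factory=set)   # NT links
    related: set = field(default_factory=set)    # RT links


class Thesaurus:
    """Hypothetical container: descriptors plus a lead-in vocabulary."""

    def __init__(self):
        self.terms = {}     # preferred term -> Term
        self.lead_in = {}   # lower-cased non-preferred term -> preferred term

    def add(self, name, used_for=(), broader=(), related=()):
        term = self.terms.setdefault(name, Term(name))
        for synonym in used_for:
            term.used_for.add(synonym)
            self.lead_in[synonym.lower()] = name
        for parent in broader:
            term.broader.add(parent)
            # Keep BT/NT links reciprocal.
            self.terms.setdefault(parent, Term(parent)).narrower.add(name)
        for other in related:
            term.related.add(other)
            self.terms.setdefault(other, Term(other)).related.add(name)
        return term

    def preferred(self, word):
        """Map any entry word to its descriptor, or None if unknown."""
        for name in self.terms:
            if name.lower() == word.lower():
                return name
        return self.lead_in.get(word.lower())


# Sample terms invented for the illustration.
thesaurus = Thesaurus()
thesaurus.add("Vehicles")
thesaurus.add("Automobiles", used_for=["Cars", "Motorcars"], broader=["Vehicles"])
```

Looking up a non-preferred word such as "cars" returns the descriptor "Automobiles", which is the role the lead-in vocabulary plays during indexing and query formulation.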

SLIDE 6: Thesauri (cont.)
Examples
– The ERIC Thesaurus of Descriptors
– The Medical Subject Headings (MeSH) of the National Library of Medicine
– The Art and Architecture Thesaurus

SLIDE 7: Why Develop a Thesaurus?
To provide a conceptual structure or “space” for a body of information
– To make it possible to adequately describe the topical contents of informational objects at an appropriate level of generality or specificity
– To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material)

SLIDE 8: Development of a Thesaurus
Term selection
Merging and development of concept classes
Definition of broad subject fields and subfields
Development of classificatory structure
Review, testing, application, revision

SLIDE 9: Flow of Work in Thesaurus Construction
[Flowchart, based on Soergel. Boxes and decision points:]
Select Sources
Assign codes
Select Terms
Record Selected Terms
Sort Terms
Merge identical Terms
Define Broad Subject Fields
Merge Terms in Same Concept class
Sort Terms into Broad Subject Fields
Define Subfields within one Subject Field
Work out detailed structure of the Subject Field
Select Preferred Terms
All Subfields of Broad Subject finished? (Yes/No)
All Broad Subjects finished? (Yes/No)
Improve Class Structure
Print Classified Index and review
Discuss with Experts and Users
Select descriptors and checklist items
Produce Full Thesaurus and Check references
Assign Notation
Review and Test
Many Modifications? (Yes/No)
Revise as needed

SLIDE 10: The Indexing Process
Concept identification
Term selection (via thesaurus)
Term assignment

SLIDE 11: Application: The Indexing Process (Manual)
[Flowchart, adapted from ISO 5963, p. 5. Boxes and decision points:]
Start
Examine Document and Identify Significant Concepts
Consider First Concept
Establish Term Denoting Concept
Does Thesaurus contain term for Concept? (Yes/No)
Preferred Term? (Yes/No)
Select Preferred Term
Consider Preferred Term
Is Term suitable? (Yes/No)
Consider any associated terms in Thesaurus (NT, BT)
Would Concept be better represented by one of these terms?
Select Alternative term to represent Concept
Prefer Alternative Term(s)
Can Concept be expressed combining terms?
Consider Each of These Terms
Admit New Term Into Thesaurus
Is There Another Concept?
Assign Terms to Document
End

SLIDE 12: Lecture Overview
Review
– Thesaurus Design And Development
– Thesaurus Design
– Steps In Thesaurus Development
Metadata And Markup
– XML As A Metadata Lingua Franca
– XML DTD Construction
– XML For Protocols And Metadata Languages

SLIDE 13: What is SGML/XML?
SGML stands for Standard Generalized Markup Language
– XML stands for eXtensible Markup Language
What it is NOT:
– Not a visual document description
– Not an application specific markup
– Not proprietary

SLIDE 14: What is SGML/XML?
What it is:
– An international standard (SGML: ISO 8879:1986)
– A generic language for describing the structure of documents, and markup that can be used for those documents
– Intended for generating markup for content rather than form elements
XML is a simplified subset of SGML (established by W3C)

SLIDE 15: The Documents of Commerce
Customer profiles, Vendor profiles, Catalogs, Datasheets, Price lists, Purchase orders, Invoices, Inventory reports, Bill of materials, Contracts, Credit reports, Bank statements, Proposals, Directories, Transportation schedules, Receipts
Source: Dr. Robert J. Glushko

SLIDE 16: Alternatives for Exchanging Documents
Format based
– HTML: Publish information for a universal client
– EDI: Batch and high-volume exchange between trading partners
API based
– CORBA / COM: Application integration
Source: Dr. Robert J. Glushko

SLIDE 17: Limitations of Each Exchange Model
Format based
– HTML: Formatting markup “for eyes”; “Scrape and hope” integration
– EDI: Must be pre-arranged; High cost; Rigid and inflexible
API based
– CORBA / COM: Pre-wired; Heavyweight to implement; Not native to the web
Source: Dr. Robert J. Glushko

SLIDE 18: Having Our Cake And Eating It Too
We need:
– The precision of APIs
– The simplicity of HTML
Source: Dr. Robert J. Glushko

SLIDE 19: XML to the Rescue (SGML and HTML++)
Extensible Markup Language
– A simplification of SGML, the Standard Generalized Markup Language
– Instead of a fixed set of format-oriented tags like HTML, XML allows you to create the schema (whatever set of tags is needed) for your information type or application
– This makes any XML instance “self-describing” and easily understood by computers and people
Version 1.0 ratified by W3C in 2/98
– Backed by Microsoft, Sun, Netscape, many others
Source: Dr. Robert J. Glushko

SLIDE 20: Why XML is Revolutionary
XML enables a business to preserve any “document type” or “database schema” when it publishes on the Web
XML enables a business to send self-describing “business messages” that can be understood by programs, not just “by eye”
This information cannot be encoded in HTML
XML-encoded information is smart enough to support new classes of Web applications
Source: Dr. Robert J. Glushko

SLIDE 21: XML Enables New Web Applications
Data interchange between Web clients
– Use Web for application integration without information loss (example: product information in supply chain, EDI)
Moving processing from server to client
– Reduce network traffic and server load (example: download airline schedule, find best flights without “back-and-forth” thrashing)
Source: Dr. Robert J. Glushko

SLIDE 22: XML Enables New Web Applications
Multiple client-side views of same data
– Expert and novice versions
– Manager and worker versions
– Localization (currency or measurement conversions)
“Information push” from personalized applications
– Selecting information based on user preferences (example: custom news feed by matching article keywords against user profile)
Source: Dr. Robert J. Glushko

SLIDE 23: The First Generation Web
[Diagram: Computers -> scripts, HTML -> Browsers]
... making information accessible through browsers
– Eyeballs only
– No automation
– Limited integration
Source: Dr. Robert J. Glushko

SLIDE 24: HTML Airline Schedule Seen “By Eye”
Airline Schedule
Flight Information
United Airlines #200
San Francisco 9:30 AM
Honolulu 12:30 PM
$
Source: Dr. Robert J. Glushko

SLIDE 25: HTML Airline Schedule Seen “By Computer”
Airline Schedule
Flight Information
United Airlines #200
San Francisco 9:30 AM
Honolulu 12:30 PM
$
Source: Dr. Robert J. Glushko

SLIDE 26: Next Generation Web
[Diagram: Computers -> Java, XML -> Computers]
... making information and services accessible to computers (and people)
– Structured searches
– Agents
– New models
Source: Dr. Robert J. Glushko

SLIDE 27: Airline Schedule in XML
San Francisco
9:30 AM
Honolulu
12:30 PM
Source: Dr. Robert J. Glushko
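The markup on this slide did not survive in the transcript, so the following is only a sketch of how a program might read such a schedule. The element names (`schedule`, `flight`, `origin`, `departure`, `destination`, `arrival`, and so on) and the `flights_to` helper are assumptions invented for the illustration, not the tags used in the original slide; the values come from the slides above.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names; only the values are taken from the slides.
SCHEDULE = """
<schedule>
  <flight>
    <airline>United Airlines</airline>
    <number>200</number>
    <origin>San Francisco</origin>
    <departure>9:30 AM</departure>
    <destination>Honolulu</destination>
    <arrival>12:30 PM</arrival>
  </flight>
</schedule>
"""


def flights_to(xml_text, city):
    """Return one dict per flight whose destination element equals city."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in flight}
        for flight in root.iter("flight")
        if flight.findtext("destination") == city
    ]
```

Because each value is labeled by its element name, the program can select flights by destination without scraping positions out of formatting markup, which is the contrast with the HTML version that the lecture draws.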

SLIDE 28: Shared Semantics for Time and Location
Shared semantics for location and time in all schemas that need them enables richer “commerce networks” of services:
– ...
– Honolulu
– ...
– Honolulu
– …
– Honolulu
Source: Dr. Robert J. Glushko

SLIDE 29: Automated Vacation Planning Service
Book me the cheapest flight to Honolulu the first week of January
Find a hotel room for the day I arrive
What concerts are taking place the next day?
Source: Dr. Robert J. Glushko

SLIDE 30: The Common Business Language
Specifies common semantics, common syntax, and message packaging for information held by and exchanged among transaction partners and market participants
These documents are the interfaces among the commerce components envisioned in the overall eCo architecture being realized in a current ATP project being carried out by CNgroup, CommerceNet, BusinessBots, and Tesserae
CBL’s focus is on the functions and information that are common to all business domains
Source: Dr. Robert J. Glushko

SLIDE 31: CBL and XML
CBL documents are described by XML DTDs to make them “self-descriptive” and validatable
CBL builds on existing standard or industry semantics where possible
Complex descriptions and messages can be composed from primitives
Domain-specific XML applications can be implemented in “native” form or as “hybrids” for maximal interoperability
Source: Dr. Robert J. Glushko

SLIDE 32: CBL Building Blocks
CBL Documents
– Business Forms: Catalog, Purchase Order, Invoice
– Business Descriptions: Vendor, Services, Products
Core
– Measurements: Time, Currency, Weight
– Locale: Address, Country, Language
– Classification: SIC, NAICS, FSC
Source: Dr. Robert J. Glushko

SLIDE 34: If Interested In CBL
Visit:
–
And for e-commerce applications using CBL, visit:
–

SLIDE 35: Lecture Overview
Review
– Thesaurus Design And Development
– Thesaurus Design
– Steps In Thesaurus Development
Metadata And Markup
– XML As A Metadata Lingua Franca
– XML DTD Construction
– XML For Protocols And Metadata Languages

SLIDE 36: SGML/XML Structure
An SGML document consists of three parts:
– The SGML Declaration
– The Document Type Definition (DTD)
– The Document Instance
An XML document REQUIRES only the document instance, but for effective processing a DTD is very important
XML Schema provides an alternative to DTDs for XML applications

SLIDE 37: Document Type Definitions
The DTD describes the structural elements and "shorthand" markup for a particular document type and defines:
– Names of "legal" elements
– How many times elements can appear
– The order of elements in a document
– Whether markup can be omitted (SGML only)
– Contents of elements (i.e., nested structures)
– Attributes associated with elements
– Names of "entities"
– Short-hand conventions for element tags (SGML only)

SLIDE 38: DTD Components
The major components of a DTD are:
– Entity Declarations
– Element Declarations
– Attribute Declarations

SLIDE 39: Document Type Definitions
Entity Declarations are a "macro" definition facility for both DTD and Document instance parts
– General Internal Entity Definitions, referenced by &name;
– General External Entity Definitions, referenced by &name;
– Parameter Entity Definitions (used only inside DTDs), referenced by %name; or %name
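The entity declarations themselves were lost from the transcript. As a hedged illustration of a general internal entity acting as a "macro", the sketch below declares one in an internal DTD subset and lets Python's standard `xml.etree.ElementTree` parser expand the `&name;` reference. The `memo` document, the entity name `sims`, and the `expand` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity "sims" is declared once in the
# internal DTD subset and referenced as &sims; in the instance.
MEMO = """<!DOCTYPE memo [
  <!ENTITY sims "School of Information Management and Systems">
]>
<memo>From the &sims; office</memo>"""


def expand(xml_text):
    """Parse the document and return the root element's text,
    with internal general entities already expanded by the parser."""
    return ET.fromstring(xml_text).text
```

External entities and parameter entities are not covered by this sketch; it shows only the internal general case.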

SLIDE 40: Document Type Definitions
Element Declarations define the structural elements of a document and its associated markup
– Omitted tag minimization indicates whether start-tags or end-tags can be omitted in the markup; the indicators (o) or (-) are required in SGML but can NOT be used in XML

SLIDE 41: Document Type Definitions
The content model provides a nested structural description of the elements that make up this element, e.g.: ...
– ANY (in SGML) may be used to indicate a content model of any elements in the DTD, in any order

SLIDE 42: Document Type Definitions
Same content model in XML
<!DOCTYPE memo [ … ]>
– Note the XML processing instruction “Prolog”
– Note that the & connector on the previous slide is not legal in XML
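The example content model was lost from the transcript. To illustrate what a content model constrains, the sketch below translates a simplified XML-style model (sequence `,`, choice `|`, and the occurrence indicators `?`, `*`, `+`) into a regular expression over child element names. The model string, the `memo` child names, and both helper functions are assumptions for the illustration; this is not how a validating parser is implemented, and it ignores `#PCDATA`, `ANY`, and `EMPTY`.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Translate a simplified element-content model into a compiled regex
    that matches a space-terminated sequence of child element names."""
    tokens = re.findall(r"[A-Za-z_][\w.-]*|[()|,?*+]", model)
    parts = []
    for token in tokens:
        if token == ",":
            continue                      # sequence is plain concatenation
        elif token == "(":
            parts.append("(?:")
        elif token in ")|?*+":
            parts.append(token)           # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def conforms(element, model):
    """True if the element's children appear in an order the model allows."""
    child_names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_names) is not None


# Hypothetical memo model: optional date, one or more body elements.
MEMO_MODEL = "(to, from, date?, subject, body+)"
good = ET.fromstring("<memo><to/><from/><subject/><body/><body/></memo>")
bad = ET.fromstring("<memo><from/><to/><subject/><body/></memo>")
```

Here `conforms(good, MEMO_MODEL)` holds while `conforms(bad, MEMO_MODEL)` does not, because `to` and `from` are out of order. The SGML `&` connector has no counterpart in this sketch, consistent with the note that it is not legal in XML.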

SLIDE 43: Document Type Definitions
Declared content can be: PCDATA, CDATA, RCDATA, EMPTY
Inclusion and Exclusion lists can be used to indicate elements that can occur or are forbidden to occur in any sub-elements of the content model (NOT in XML), e.g.: ...
– Says that element fn can appear anyplace in the memo

SLIDE 44: Document Type Definitions
Attribute Declarations define attributes associated with (potentially) each element of a document and provide the acceptable values for those attributes

SLIDE 45: Attributes
Example
– In markup of a document: ...
– Also, because of the default set: ... would be the same as ...
There are a variety of special defaults and data types that can be given in attribute definitions
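The attribute example was lost from the transcript. The sketch below illustrates the point about defaults: an attribute declared with a default value in the DTD is reported by the parser even when the start-tag omits it. It uses Python's `xml.parsers.expat` module directly, relying on its documented default of reporting defaulted attributes. The `memo` element, the `status` attribute, and the `attributes_seen` helper are assumptions for the illustration.

```python
import xml.parsers.expat

# Hypothetical DTD: "status" is an enumerated attribute defaulting to "draft".
DOC = """<!DOCTYPE memo [
<!ELEMENT memo (#PCDATA)>
<!ATTLIST memo status (draft|final) "draft">
]>
<memo>Budget review</memo>"""


def attributes_seen(xml_text):
    """Return {element name: attributes} as reported by the parser,
    including attributes supplied by DTD defaults."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return seen
```

With this declaration, writing the start-tag without the attribute is treated the same as writing it with `status="draft"`; an explicit value in the start-tag takes precedence over the default.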

SLIDE 46: Sample SGML DTD
<!doctype ELIB-TEXTS [
<!-- This is a DTD for bibliographic records extracted from the elib/rfc1357 simple bibliographic format. -->
<!ELEMENT ELIB-BIB - - (BIB-VERSION, ID, ENTRY?, DATE?, TITLE*, ORGANIZATION*,
    (SERIES | TYPE | REVISION | REVISION-DATE | AUTHOR-PERSONAL |
     AUTHOR-INSTITUTIONAL | AUTHOR-CONTRIBUTING-PERSONAL |
     AUTHOR-CONTRIBUTING-INSTITUTIONAL | CONTACT-AUTHOR | PROJECT |
     PAGES | BIOREGION | CERES-BIOREGION | TEXTSOUP | LOCATION |
     ULTIMATE-CLIENT | URL | KEYWORDS | NOTES | ABSTRACT)*,
    (TEXT-REF | PAGED-REF)* )>
… etc…
]>

SLIDE 47: XML Version
<!DOCTYPE ELIB-TEXTS [
<!-- This is a DTD for bibliographic records extracted from the elib/rfc1357 simple bibliographic format. -->
<!ELEMENT ELIB-BIB (BIB-VERSION, ID, ENTRY?, DATE?, TITLE*, ORGANIZATION*,
    (SERIES | TYPE | REVISION | REVISION-DATE | AUTHOR-PERSONAL |
     AUTHOR-INSTITUTIONAL | AUTHOR-CONTRIBUTING-PERSONAL |
     AUTHOR-CONTRIBUTING-INSTITUTIONAL | CONTACT-AUTHOR | PROJECT |
     PAGES | BIOREGION | CERES-BIOREGION | TEXTSOUP | LOCATION |
     ULTIMATE-CLIENT | URL | KEYWORDS | NOTES | ABSTRACT)*,
    (TEXT-REF | PAGED-REF)* )>
… etc…
]>
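The visible difference between the two element declarations is the pair of tag-minimization indicators after the element name, which SGML uses and XML does not allow. As a rough sketch of that one step only, the hypothetical helper below strips the indicators from a declaration. It is not a general SGML-to-XML converter: it does nothing about `&` connectors, inclusion or exclusion lists, or case differences.

```python
import re

# Matches the element name followed by the two minimization indicators
# ("-" or "O"/"o") that appear in SGML element declarations.
_MINIMIZATION = re.compile(r"(<!ELEMENT\s+\S+)\s+[-Oo]\s+[-Oo]\s+")


def drop_tag_minimization(declaration):
    """Remove SGML omitted-tag indicators from one element declaration.
    Declarations without indicators are returned unchanged."""
    return _MINIMIZATION.sub(r"\1 ", declaration)
```

For instance, `<!ELEMENT Leader - O (LRL, RecStat)>` becomes `<!ELEMENT Leader (LRL, RecStat)>`.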

SLIDE 48: Document Using That DTD
ELIB-v1.0 6 February March 1, 1993 Water Conditions in California Report 2 California Department of Water Resources bulletin California Department of Water Resources 17
/elib/data/disk/disk5/documents/6/HYPEROCR/hyperocr.html
/elib/data/disk/disk5/documents/6/OCR-ASCII-NOZONE

SLIDE 49: A More Complex DTD
<!DOCTYPE USMARC [
<!ATTLIST USMARC
    Material (BK|AM|CF|MP|MU|VM|SE) "BK"
    id CDATA #IMPLIED>
<!-- Author's Note: the id attribute for the USMARC element is intended to hold a unique record number for each MARC record in the local database. That is to say, it is intended ONLY as an aid in maintaining the local database of MARC records -->
<!ELEMENT Leader - O (LRL, RecStat, RecType, BibLevel, UCP, IndCount, SFCount, BaseAddr, EncLevel, DscCatFm, LinkRec, EntryMap)>
…etc…

SLIDE 50: More Complex DTD (cont.)
<!ELEMENT VarDFlds - O (NumbCode, MainEnty?, Titles, EdImprnt?, PhysDesc?, Series?, Notes?, SubjAccs?, AddEnty?, LinkEnty?, SAddEnty?, HoldAltG?, Fld9XX?)>
<!ELEMENT NumbCode - O (Fld010?, Fld011?, Fld015?, Fld017*, Fld018?, Fld019*, Fld020*, Fld022*, Fld023*, Fld024*, Fld025*, Fld027*, Fld028*, Fld029*, Fld030*, Fld032*, Fld033*, Fld034*, Fld035*, Fld036?, Fld037*, Fld039*, Fld040?, Fld041?, Fld042?, Fld043?, Fld044?, Fld045?, Fld046?, Fld047?, Fld048*, Fld050*, Fld051*, Fld052*, Fld055*, Fld060*, Fld061*, Fld066?, Fld069*, Fld070*, Fld071*, Fld072*, Fld074*, Fld080?, Fld082*, Fld084*, Fld086*, Fld088*, Fld090*, Fld096*)>
<!ELEMENT Titles - O (Fld210?, Fld211*, Fld212*, Fld214*, Fld222*, Fld240?, Fld242*, Fld243?, Fld245, Fld246*, Fld247*)>
<!ELEMENT EdImprnt - O (Fld250?, Fld254?, Fld255*, Fld256?, Fld257?, Fld260?, Fld261?, Fld262?, Fld263?, Fld265?)>
<!ELEMENT PhysDesc - O (Fld300*, Fld305*, Fld306?, Fld310?, Fld315?, Fld321*, Fld340*, Fld350?, Fld351*, Fld355*, Fld357*, Fld362*)>
…etc…

SLIDE 51: Complex DTD (cont.)
<!ATTLIST Fld245
    AddEnty (No|Yes|Blank) #IMPLIED
    NFChars (0|1|2|3|4|5|6|7|8|9|Blnk) #IMPLIED>
…etc…

SLIDE 52: Document Markup
All document markup is derived from the DTD for the particular document type
The DTD must be referenced in the document using the DOCTYPE declaration
The doctype_declaration_subset can be any combination of element, entity, and attribute declarations
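The declaration forms shown on the slide were lost from the transcript. As an illustration under stated assumptions, the sketch below parses two common forms, an external reference using `SYSTEM` and an internal declaration subset in square brackets, and reports what the parser sees. The `memo` document type, the file name `memo.dtd`, and the `doctype_of` helper are hypothetical; the handler is the `StartDoctypeDeclHandler` callback of Python's `xml.parsers.expat` module.

```python
import xml.parsers.expat

# Hypothetical documents: one points at an external DTD file,
# the other carries its declarations in an internal subset.
EXTERNAL = '<!DOCTYPE memo SYSTEM "memo.dtd"><memo/>'
INTERNAL = '<!DOCTYPE memo [<!ELEMENT memo EMPTY>]><memo/>'


def doctype_of(xml_text):
    """Return the pieces of the DOCTYPE declaration reported by the parser."""
    found = {}
    parser = xml.parsers.expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        found.update(
            name=name,
            system_id=system_id,
            public_id=public_id,
            internal_subset=bool(has_internal_subset),
        )

    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(xml_text, True)
    return found
```

The parser here only reports the reference; it does not fetch or validate against `memo.dtd`.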

SLIDE 53: HTML
HTML was not originally "real" SGML; the DTD was invented after the language
It is often more concerned with the form of the output on the screen than with the structural contents of the HTML docs
Relies on the application (such as Netscape) to implement interesting actions like hypertext linking

SLIDE 54: Lecture Overview
Review
– Thesaurus Design And Development
– Thesaurus Design
– Steps In Thesaurus Development
Metadata And Markup
– XML As A Metadata Lingua Franca
– XML DTD Construction
– XML For Protocols And Metadata Languages

SLIDE 55: Dublin Core
Review…
Simple metadata for describing internet resources
For “Document-Like Objects”
15 Elements

SLIDE 56: Dublin Core Elements
Title, Creator, Subject, Description, Publisher, Other Contributors, Date, Resource Type, Format, Resource Identifier, Source, Language, Relation, Coverage, Rights Management

SLIDE 57: DC DTD Implementation
There have been various versions
This one is the one recommended (required) by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Uses XML Name Spaces
Available at
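To show what "uses XML Name Spaces" means in practice, here is a minimal sketch that builds a record whose fields are qualified with the Dublin Core element namespace. The namespace URI is the published Dublin Core 1.1 one; the `record` wrapper element, the `dc_record` helper, and the choice of sample values (taken from the bibliographic example earlier in the lecture) are assumptions for the illustration and do not reproduce the OAI container format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)   # serialize with the conventional dc: prefix


def dc_record(fields):
    """Build a record whose children are Dublin Core elements.
    'record' is a hypothetical wrapper, not the OAI container element."""
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, "{%s}%s" % (DC_NS, name)).text = value
    return record


sample = dc_record({
    "title": "Water Conditions in California",
    "creator": "California Department of Water Resources",
    "type": "bulletin",
})
xml_text = ET.tostring(sample, encoding="unicode")
```

Qualifying each element with a namespace lets Dublin Core fields sit inside other XML vocabularies without name clashes, which is why the harvesting protocol can wrap them in its own envelope.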

SLIDE 58: DC Element and Attribute Definitions
<!-- An entity primarily responsible for making the content of the resource. -->
<!-- An entity responsible for making contributions to the content of the resource. -->

SLIDE 59: DC Element Definitions (cont.)

SLIDE 60: Other Protocols and Metadata Systems Using XML
SOAP (Simple Object Access Protocol)
DAV/DASL (Distributed Authoring and Versioning)
SDLIP (Simple Digital Library Interoperability Protocol)
RDF (Resource Description Framework)
ADL Gazetteer Protocol
OAI-PMH (already discussed)
MPEG-7
Also versions of MARC and other formats in XML

SLIDE 61: SGML and XML Sources and Resources
Books:
– van Herwijnen, Eric. Practical SGML. (2nd Ed.) Boston: Kluwer Academic Publishers.
– Goldfarb, Charles F. The SGML Handbook. Oxford: Clarendon Press. (and MANY XML books)
Web Sites:
– The W3C web site (all XML standards documents)
– Robin Cover’s SGML/XML Site

SLIDE 62: Next Time
Assignment 5 Due
Come to class having thought about the strengths and weaknesses of the consolidated photo classification