11/15/2001Information Organization and Retrieval Information Structures and Metadata University of California, Berkeley School of Information Management.

Similar presentations
Copyright, UCL LEADERS: Linking EAD to Electronically Retrievable Sources Developing a Generic Toolkit: Architecture and technology issues ALLC/ACH Conference.

An Introduction to MODS: The Metadata Object Description Schema Tech Talk By Daniel Gelaw Alemneh October 17, 2007 October 17, 2007.
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
StatCat Building a Statistical Data Finder ssrs.yale.edu/statcat Steven Citron-Pousty Ann Green Julie Linden Yale University.
SLIDE 1IS 202 – FALL 2002 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002
CS 898N – Advanced World Wide Web Technologies Lecture 21: XML Chin-Chih Chang
IS 373—Web Standards Todd Will
© Tefko Saracevic, Rutgers University1 metadata considerations for digital libraries.
RDF Kitty Turner. Current Situation there is hardly any metadata on the Web search engine sites do the equivalent of going through a library, reading.
SLIDE 1IS FALL 2004 Lecture 18: Metadata & Controlled Vocabulary Introduction Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday.
11/9/2000Information Organization and Retrieval Information Structures and Metadata University of California, Berkeley School of Information Management.
11/7/2000Information Organization and Retrieval Controlled Vocabularies: Name Authority Control University of California, Berkeley School of Information.
CS155a: E-Commerce Lecture 14: October 25, 2001 Introduction to XML Acknowledgement: R. Glushko and A. Gregory.
CS155b: E-Commerce Lecture 10: Feb. 13, 2003 XML and its relationship to B2B commerce Acknowledgements: R. Glushko, A. Gregory, and V. Ramachandran.
8/28/97Information Organization and Retrieval Controlled Subject Vocabularies and Thesauri University of California, Berkeley School of Information Management.
SLIDE 1IS 202 – FALL 2003 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003
SLIDE 1IS 257 – Fall 2009 Controlled Vocabularies University of California, Berkeley School of Information IS 245: Organization of Information.
SLIDE 1IS 257 – Fall 2007 Subject Access to Collections: Introduction University of California, Berkeley School of Information IS 245: Organization.
11/13/2001Information Organization and Retrieval Controlled Vocabularies: Name Authority Control University of California, Berkeley School of Information.
B2B e-commerce standards for document exchange In350: week 13: Nov. 19,2001 Judith A. Molka-Danielsen.
XML(EXtensible Markup Language). XML XML stands for EXtensible Markup Language. XML is a markup language much like HTML. XML was designed to describe.
8/28/97Information Organization and Retrieval Controlled Vocabularies: Name Authority Control University of California, Berkeley School of Information.
Distributed Collaborations Using Network Mobile Agents Anand Tripathi, Tanvir Ahmed, Vineet Kakani and Shremattie Jaman Department of computer science.
System Integration (Cont.) Week 7 – Lecture 2. Approaches Information transfer –Interface –Database replication –Data federation Business process integration.
SLIDE 1IS 202 – FALL 2004 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2004
SLIDE 1IS FALL 2002 Lecture 06: Controlled Vocabularies Introduction Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and.
Introduction to XML This material is based heavily on the tutorial by the same name at
Introducing HTML & XHTML:. Goals  Understand hyperlinking  Understand how tags are formed and used.  Understand HTML as a markup language  Understand.
Metadata: Its Functions in Knowledge Representation for Digital Collections 1 Summary.
Processing of structured documents Spring 2003, Part 6 Helena Ahonen-Myka.
Digital Encoding What’s behind E-text Resources?.
XML – Extensible Markup Language Sivakumar Kuttuva & Janusz Zalewski.
Chapter 16 The World Wide Web. 2 The Web An infrastructure of information combined and the network software used to access it Web page A document that.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
16-1 The World Wide Web The Web An infrastructure of distributed information combined with software that uses networks as a vehicle to exchange that information.
MIS 315 Bsharah An Introduction to XML 1MIS Bsharah.
1 herbert van de sompel CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel
1 © Netskills Quality Internet Training, University of Newcastle Introducing XML © Netskills, Quality Internet Training University.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
The Metadata Object Description Schema (MODS) NISO Metadata Workshop May 20, 2004 Rebecca Guenther Network Development and MARC Standards Office Library.
Introduction to XML. What is XML? Extensible Markup Language XML Easier-to-use subset of SGML (Standard Generalized Markup Language) XML is a.
XML 1 Enterprise Applications CE00465-M XML. 2 Enterprise Applications CE00465-M XML Overview Extensible Mark-up Language (XML) is a meta-language that.
TEXT ENCODING INITIATIVE (TEI) Inf 384C Block II, Module C.
Organizing Internet Resources OCLC’s Internet Cataloging Project -- funded by the Department of Education -- from October 1, 1994 to March 31, 1996.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Metadata: Essential Standards for Management of Digital Libraries ALI Digital Library Workshop Linda Cantara, Metadata Librarian Indiana University, Bloomington.
XML A web enabled data description language 4/22/2001 By Mark Lawson & Edward Ryan L’Herault.
1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.
10/21/98Organization of Information in Collections Subject Access to Collections: Introduction University of California, Berkeley School of Information.
1 Metadata Standards Catherine Lai MUMT-611 MIR January 27, 2005.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
Evolving MARC 21 for the future Rebecca Guenther CCS Forum, ALA Annual July 10, 2009.
Introduction to XML This presentation covers introductory features of XML. What XML is and what it is not? What does it do? Put different related technologies.
Metadata and Documentation Iain Wallace Performing Arts Data Service.
Metadata Bridget Jones Information Architecture I February 23, 2009.
1 Tutorial 11 Creating an XML Document Developing a Document for a Cooking Web Site.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
GenX- XML Mapping of GenCAM Andy Dugenske Andy Scholand Manufacturing Research Center Georgia Institute of Technology January 23, 1999.
Document Computing Technologies for Managing Electronic Document Collections Ross Wilkinson... [et al.] Circulation Counter [RES3H] ZA4080.D
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
XML CSC1310 Fall HTML (TIM BERNERS-LEE) HyperText Markup Language  HTML (HyperText Markup Language): December  Markup  Markup is a symbol.
8/28/97Information Organization and Retrieval Introduction University of California, Berkeley School of Information Management and Systems SIMS 245: Organization.
Linda Schmandt Structured Text & XML in Medicine 16 Jan 2004.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
A centre of expertise in digital information management UKOLN is supported by: Metadata – what, why and how Ann Chapman.
Geospatial metadata Prof. Wenwen Li School of Geographical Sciences and Urban Planning 5644 Coor Hall
XML QUESTIONS AND ANSWERS
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Catherine Lai MUMT-611 MIR January 27, 2005
Presentation transcript:

11/15/2001 Information Organization and Retrieval
Information Structures and Metadata
University of California, Berkeley, School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
Slides by Ray R. Larson and Robert Glushko

Review
Controlled Vocabularies
– Authority Control and Name Authorities
– Subject Headings vs. Descriptors
– Hierarchical vs. Faceted Organizations

Metadata
Metadata is:
– “data about data” (a term from the database field)
– information about information
– structures and languages for the description of information resources and their elements (components or features)
– “Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates, p. 142)

Types of Metadata Systems and Standards
– Naming and ID systems: URLs, ISBNs
– Bibliographic description: MARC, Dublin Core, TEI, etc.
– Music: SMDL
– Images and objects: CIMI, VRA Core Categories
– Numeric data: DDI, SDSM
– Geospatial data: FGDC
– Collections: EAD

Controlled Vocabularies
Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, and classifications) with the intent of aiding the searcher in finding information.

The Problem
Proliferation of the forms of names:
– different names for the same person
– different people with the same name
Examples:
– from Books in Print (semi-controlled but not consistent)
– ERIC author index (not controlled)

Name Authority Files: Different Names for the Same Person
ID:NAFL ST:p EL:n STH:a MS:c UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d
Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
053 PR6005.R
Creasey, John
Cooke, M. E; Cooke, Margaret,$d; Cooper, Henry St. John,$d; Credo,$d; Fecamps, Elise; Gill, Patrick,$d; Hope, Brian,$d; Hughes, Colin,$d; Marsden, James; Matheson, Rodney; Ranger, Ken; St. John, Henry,$d; Wilde, Jimmy
$wnnnc$aAshe, Gordon,$d

Name Authority Files
ID:NAFO ST:p EL:n STH:a MS:n UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d
OCoLC$cOCoLC
Marric, J. J.,$d
$wnnnc$aCreasey, John
663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John
670 OCLC : His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric)
670 LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric)
670 Pseuds. and nicknames dict., c1987$b(Creasey, John, ; British author; pseud.: Marric, J. J.)

Name Authority Files: Different People Writing with the Same Name
ID:NAFL ST:p EL:n STH:a MS:c UIP:a TD: KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF: RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d
Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
Butler, William Vivian,$d
Butler, W. V.$q(William Vivian),$d
Marric, J. J.,$d
His The durable desperadoes,
His The young detective's handbook, c1981:$bt.p. (W.V. Butler)
670 His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric)
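The cross-references in these three records can be modeled as a lookup from any name form to the heading(s) a searcher should be sent to. The Python sketch below uses a hypothetical, hand-built excerpt of the Creasey and Butler records; it treats every listed form as a simple cross-reference and ignores real MARC distinctions such as see versus see-also references (the Marric record, for instance, is its own heading with a see-also to Creasey). Its point is the second problem case: one form, "Marric, J. J.", legitimately leads to two different people.

```python
from collections import defaultdict

# Hypothetical excerpt of a name authority file, built by hand from the
# records above: authorized heading -> name forms that should lead to it.
AUTHORITY = {
    "Creasey, John": ["Cooper, Henry St. John", "Hope, Brian", "Ranger, Ken",
                      "Marric, J. J."],
    "Butler, William Vivian": ["Butler, W. V.", "Marric, J. J."],
}

def build_index(authority):
    """Invert the file: any name form -> set of authorized headings."""
    index = defaultdict(set)
    for heading, variants in authority.items():
        index[heading].add(heading)  # the authorized form points to itself
        for variant in variants:
            index[variant].add(heading)
    return index

def resolve(name, index):
    """Sorted list of headings a searcher should be sent to ([] if unknown)."""
    # .get() avoids inserting unknown names into the defaultdict.
    return sorted(index.get(name, ()))

index = build_index(AUTHORITY)
print(resolve("Hope, Brian", index))    # ['Creasey, John']
print(resolve("Marric, J. J.", index))  # ['Butler, William Vivian', 'Creasey, John']
```

A real authority system would also carry the dates and source citations (the 670 fields) that let a cataloger decide which person a given item belongs to.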

Other Types of Controlled Vocabularies
– Gazetteers (geographic names)
– Code lists (e.g., LC language codes)
– Subject heading lists
– Classification schemes
– Thesauri

Today
– SGML
– XML
– DTDs
– Document markup
– Uses of XML

SGML & XML
– What is SGML/XML?
– Document Type Definitions
– Document Markup
– Sources and Resources

What is SGML/XML?
A. SGML stands for Standard Generalized Markup Language
– XML stands for eXtensible Markup Language
B. What it is NOT:
– Not a visual document description
– Not an application-specific markup
– Not proprietary

What is SGML/XML?
What it is:
– An international standard (SGML: ISO 8879:1986)
– A generic language for describing the structure of documents, and the markup that can be used for those documents
– Intended for marking up content rather than form (presentation)
XML is a simplified subset of SGML, standardized by the W3C (XML 1.0 was ratified in February 1998).

The Documents of Commerce
Customer Profiles, Vendor Profiles, Catalogs, Datasheets, Price Lists, Purchase Orders, Invoices, Inventory Reports, Bill of Materials, Contracts, Credit Reports, Bank Statements, Proposals, Directories, Transportation Schedules, Receipts
Source: Dr. Robert J. Glushko

Alternatives for Exchanging Documents
Format based:
– HTML: publish information for a universal client
– EDI: batch and high-volume exchange between trading partners
API based:
– CORBA / COM: application integration
Source: Dr. Robert J. Glushko

Limitations of Each Exchange Model
Format based:
– HTML: formatting markup “for eyes”; “scrape and hope” integration
– EDI: must be pre-arranged; high cost; rigid and inflexible
API based:
– CORBA / COM: pre-wired; heavyweight to implement; not native to the web
Source: Dr. Robert J. Glushko

Having Our Cake and Eating It Too
We need:
– the precision of APIs
– the simplicity of HTML
Source: Dr. Robert J. Glushko

XML to the Rescue (SGML-- and HTML++)
Extensible Markup Language
– a simplification of SGML, the Standard Generalized Markup Language
– instead of a fixed set of format-oriented tags like HTML, XML allows you to create the schema (whatever set of tags is needed) for your information type or application
– this makes any XML instance “self-describing” and easily understood by computers and people
Version 1.0 ratified by the W3C in February 1998; backed by Microsoft, Sun, Netscape, and many others
Source: Dr. Robert J. Glushko

Why XML is Revolutionary
– XML enables a business to preserve any “document type” or “database schema” when it publishes on the Web
– XML enables a business to send self-describing “business messages” that can be understood by programs, not just “by eye”
– This structural information cannot be encoded in HTML
– XML-encoded information is smart enough to support new classes of Web applications
Source: Dr. Robert J. Glushko

XML Enables New Web Applications
Data interchange between Web clients
– use the Web for application integration without information loss (example: product information in a supply chain, EDI)
Moving processing from server to client
– reduce network traffic and server load (example: download an airline schedule and find the best flights without “back-and-forth” thrashing)
Source: Dr. Robert J. Glushko

XML Enables New Web Applications
Multiple client-side views of the same data
– expert and novice versions
– manager and worker versions
– localization (currency or measurement conversions)
“Information push” from personalized applications
– selecting information based on user preferences (example: a custom news feed built by matching article keywords against a user profile)
Source: Dr. Robert J. Glushko

The First Generation Web
Computers and browsers: making information accessible through browsers
– HTML and scripts
– Eyeballs only
– No automation
– Limited integration
Source: Dr. Robert J. Glushko

HTML Airline Schedule Seen “By Eye”
Airline Schedule
Flight Information
United Airlines #200
San Francisco 9:30 AM
Honolulu 12:30 PM
$
Source: Dr. Robert J. Glushko

HTML Airline Schedule Seen “By Computer”
Airline Schedule
Flight Information
United Airlines #200
San Francisco 9:30 AM
Honolulu 12:30 PM
$
Source: Dr. Robert J. Glushko

Next Generation Web
Java, computers, and XML: making information and services accessible to computers (and people)
– Structured searches
– Agents
– New models
Source: Dr. Robert J. Glushko

Airline Schedule in XML
San Francisco 9:30 AM
Honolulu 12:30 PM
Source: Dr. Robert J. Glushko
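The markup on this slide did not survive extraction, so the sketch below invents plausible element and attribute names (`AirlineSchedule`, `Flight`, `Departure`, `City`, `Time`); they are assumptions, not the slide's tags. What it illustrates is the contrast with the HTML version: a program can ask for "the departure city" by name with Python's standard `xml.etree.ElementTree`, instead of guessing from screen position.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every element and attribute name here is illustrative.
SCHEDULE = """
<AirlineSchedule>
  <Flight airline="United Airlines" number="200">
    <Departure><City>San Francisco</City><Time>9:30 AM</Time></Departure>
    <Arrival><City>Honolulu</City><Time>12:30 PM</Time></Arrival>
  </Flight>
</AirlineSchedule>
"""

def flights(xml_text):
    """Yield one dict per <Flight>, addressing each value by its element path."""
    root = ET.fromstring(xml_text.strip())
    for flight in root.iter("Flight"):
        yield {
            "airline": flight.get("airline"),
            "number": flight.get("number"),
            "from": flight.findtext("Departure/City"),
            "departs": flight.findtext("Departure/Time"),
            "to": flight.findtext("Arrival/City"),
            "arrives": flight.findtext("Arrival/Time"),
        }

for f in flights(SCHEDULE):
    print(f"{f['airline']} #{f['number']}: {f['from']} {f['departs']} -> {f['to']} {f['arrives']}")
# United Airlines #200: San Francisco 9:30 AM -> Honolulu 12:30 PM
```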

XML is a Foundation for Interoperability
Format based: Web, EDI
API based: CORBA / COM
XML: exchange information in an application- and vendor-neutral format
Source: Dr. Robert J. Glushko

Open Framework for Commerce
Computer, Automotive, Pinnacles, HL/7, Common Business Language, Procure, Retail, XML/EDI, OBI, OTP, SCOR, OFX, Health Care, Office, Consumer, Manufacturing-Supply Chain, Appliances
Shared semantics; extensible and “aggressively interoperable”
Source: Dr. Robert J. Glushko

Shared Semantics for Time and Location
Shared semantics for location and time in all schemas that need them enable richer “commerce networks” of services:
... Honolulu ... Honolulu … Honolulu
Source: Dr. Robert J. Glushko
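One way to get shared semantics across schemas is an XML namespace that every schema reuses for its location elements. In the hypothetical sketch below, a flight, a hotel, and a concert document (all tag names and the namespace URI are invented for illustration) nest the shared `loc:city` element in different places, yet a single namespace-qualified query in `xml.etree.ElementTree` finds it in all three.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared vocabulary: this URI and all element names are invented.
LOC = "http://example.org/shared/location"

FLIGHT = f'<flight xmlns:loc="{LOC}"><number>200</number><arrive><loc:city>Honolulu</loc:city></arrive></flight>'
HOTEL = f'<hotel xmlns:loc="{LOC}"><room nights="1"/><address><loc:city>Honolulu</loc:city></address></hotel>'
CONCERT = f'<concert xmlns:loc="{LOC}"><venue><loc:city>Honolulu</loc:city></venue></concert>'

def cities(xml_text):
    """Every shared loc:city value, however deeply the host schema nests it."""
    root = ET.fromstring(xml_text)
    # ElementTree names namespaced tags as "{uri}local" (Clark notation).
    return [el.text for el in root.iter(f"{{{LOC}}}city")]

# One query works across three otherwise unrelated document types.
for doc in (FLIGHT, HOTEL, CONCERT):
    print(cities(doc))  # ['Honolulu']
```

Because the qualified name, not the surrounding structure, carries the meaning, a service can join flights, hotels, and events on location without knowing each partner's full schema.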

Automated Vacation Planning Service
– Book me the cheapest flight to Honolulu the first week of January
– Find a hotel room for the day I arrive
– What concerts are taking place the next day?
Source: Dr. Robert J. Glushko
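The first request, "the cheapest flight to Honolulu the first week of January", becomes a short query once schedules arrive as structured data. The sketch below assumes a hypothetical schedule format; flight numbers, dates, fares, attribute names, and the year are all invented to exercise the query.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical schedule: all values and attribute names are invented.
SCHEDULE = """<schedule>
  <flight number="200" to="Honolulu" date="2002-01-02" fare="389.00"/>
  <flight number="210" to="Honolulu" date="2002-01-03" fare="349.00"/>
  <flight number="305" to="Seattle" date="2002-01-02" fare="129.00"/>
  <flight number="220" to="Honolulu" date="2002-01-09" fare="299.00"/>
</schedule>"""

def cheapest(xml_text, destination, first_day, last_day):
    """Cheapest flight to `destination` departing between two ISO dates, inclusive.
    ISO-8601 date strings compare correctly as plain strings."""
    root = ET.fromstring(xml_text)
    candidates = [
        f for f in root.iter("flight")
        if f.get("to") == destination and first_day <= f.get("date") <= last_day
    ]
    # Decimal avoids float rounding on money; returns None if nothing qualifies.
    return min(candidates, key=lambda f: Decimal(f.get("fare")), default=None)

best = cheapest(SCHEDULE, "Honolulu", "2002-01-01", "2002-01-07")
print(best.get("number"), best.get("fare"))  # 210 349.00
```

The cheaper Seattle and January 9 flights are filtered out by destination and date before the fare comparison, which is exactly the client-side processing an HTML page would not support.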

The Common Business Language
– Specifies common semantics, common syntax, and message packaging for information held by and exchanged among transaction partners and market participants
– These documents are the interfaces among the commerce components envisioned in the overall eCo architecture, which is being realized in a current ATP project carried out by CNgroup, CommerceNet, BusinessBots, and Tesserae
– CBL’s focus is on the functions and information that are common to all business domains
Source: Dr. Robert J. Glushko

CBL and XML
– CBL documents are described by XML DTDs to make them “self-descriptive” and validatable
– CBL builds on existing standard or industry semantics where possible
– Complex descriptions and messages can be composed from primitives
– Domain-specific XML applications can be implemented in “native” form or as “hybrids” for maximal interoperability
Source: Dr. Robert J. Glushko

Common Business Language Building Blocks
CBL Documents
– Business Forms: Catalog, Purchase Order, Invoice
– Business Descriptions: Vendor, Services, Products
Core
– Measurements: Time, Currency, Weight
– Locale: Address, Country, Language
– Classification: SIC, NAICS, FSC
Source: Dr. Robert J. Glushko


If Interested in CBL
Visit: –
And for e-commerce applications using CBL, visit: –

SGML/XML
– Definitions
– Defining DTDs
– Markup

SGML/XML Structure
An SGML document consists of three parts:
– the SGML Declaration
– the Document Type Definition (DTD)
– the Document Instance
An XML document REQUIRES only the document instance, but for effective processing a DTD is very important.
XML Schema provides an alternative to DTDs for XML applications.
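Because XML needs only the document instance, any XML parser will accept a DTD-less document as long as it is well-formed (properly nested, every element closed). The sketch below checks that with Python's `xml.etree.ElementTree`, which is a non-validating parser: it reports well-formedness errors but never checks a document against a DTD. The memo element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A bare document instance: no SGML declaration, no DTD. Element names are hypothetical.
instance_only = "<memo><to>Staff</to><body>Meeting moved to 3 PM.</body></memo>"

# Overlapping tags: not well-formed, so no conforming XML parser may accept it.
malformed = "<memo><to>Staff<body></to></body></memo>"

def is_well_formed(text):
    """True if `text` parses as XML. This checks well-formedness only;
    ElementTree does not validate against a DTD even when one is present."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(instance_only))  # True
print(is_well_formed(malformed))      # False
```

Checking that the memo actually has the elements a DTD requires would need a validating parser, which is why the slide stresses the DTD for effective processing.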

Document Type Definitions
The DTD describes the structural elements and "shorthand" markup for a particular document type. It defines:
– names of "legal" elements
– how many times elements can appear
– the order of elements in a document
– whether markup can be omitted (SGML only)
– contents of elements (i.e., nested structures)
– attributes associated with elements
– names of "entities"
– short-hand conventions for element tags (SGML only)

DTD Components
The major components of a DTD are:
– Entity Declarations
– Element Declarations
– Attribute Declarations
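The three kinds of declarations can be seen directly by asking a parser to report them. Python's `xml.parsers.expat` has one callback per declaration type (`EntityDeclHandler`, `ElementDeclHandler`, `AttlistDeclHandler`), called as it reads the internal DTD subset. The memo DTD below is a hypothetical example containing one of each.

```python
from xml.parsers import expat

# Hypothetical memo DTD in an internal subset: entity, element, and attribute declarations.
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY company "Example Corp.">
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST memo priority (low|high) "low">
]>
<memo><to>Staff</to><from>&company;</from><body>Hello</body></memo>"""

def dtd_components(text):
    """Collect the names declared in a document's internal DTD subset."""
    found = {"entities": [], "elements": [], "attributes": []}
    parser = expat.ParserCreate()
    # Each handler fires once per declaration, in document order.
    parser.EntityDeclHandler = (
        lambda name, is_param, value, base, sys_id, pub_id, notation:
            found["entities"].append(name))
    parser.ElementDeclHandler = lambda name, model: found["elements"].append(name)
    parser.AttlistDeclHandler = (
        lambda elname, attname, type_, default, required:
            found["attributes"].append(f"{elname}@{attname}"))
    parser.Parse(text, True)
    return found

print(dtd_components(DOC))
```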

Document Type Definitions
Entity Declarations are a "macro" definition facility for both the DTD and the document instance.
– General Internal Entity Definitions, referenced by &name;
– General External Entity Definitions, referenced by &name;
– Parameter Entity Definitions (used only inside DTDs), referenced by %name; (or by %name without the semicolon, in SGML only)
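The "macro" behavior of a general internal entity is easy to observe: the parser replaces each `&name;` reference with the declared text before the application sees it. The sketch below uses Python's `xml.etree.ElementTree` (which relies on expat to do the substitution) with a hypothetical one-entity memo DTD; parameter and external entities are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo DTD declaring one general internal entity.
DOC = """<!DOCTYPE memo [
  <!ENTITY sims "School of Information Management and Systems">
]>
<memo><from>&sims;</from></memo>"""

root = ET.fromstring(DOC)
# The reference &sims; has already been expanded to its replacement text.
print(root.findtext("from"))  # School of Information Management and Systems
```

Changing the declaration once changes every place the entity is used, which is the point of the macro facility.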

Document Type Definitions
Element Declarations define the structural elements of a document and their associated markup.
– Omitted-tag minimization indicates whether start-tags or end-tags can be omitted in the markup: "O" means the tag may be omitted, "-" means it is required. These indicators are required in SGML but can NOT be used in XML.

Document Type Definitions
The content model provides a nested structural description of the elements that make up this element, e.g.: ...
– ANY (in both SGML and XML) may be used to indicate that the element may contain character data and any elements declared in the DTD, in any order.

Document Type Definitions
Same content model in XML: <!DOCTYPE memo [ … ]>
– Note the XML declaration in the “prolog”, which looks like a processing instruction
– Note that the & connector on the previous page (SGML's “all of these, in any order”) is not legal XML
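An element-content model is essentially a regular expression over the names of an element's children: "," is concatenation, "|" is alternation, and "?", "*", "+" mean what they mean in regex syntax. The Python sketch below makes that concrete by translating a model into a `re` pattern and checking a child sequence against it. It is a teaching sketch, not a validator: mixed content (#PCDATA), ANY, and EMPTY are out of scope, and the memo model is hypothetical. It rejects SGML's "&" connector, which has no XML equivalent.

```python
import re

NAME_OR_SYMBOL = re.compile(r"[A-Za-z_:][\w.\-:]*|[(),|?*+]")

def model_to_regex(model):
    """Translate an element-content model such as "(to, from, body)" into a
    regular expression over child element names (sketch; see lead-in)."""
    if "&" in model:
        raise ValueError("SGML '&' connector is not allowed in XML")
    if "#PCDATA" in model:
        raise ValueError("mixed content is not handled by this sketch")
    parts = []
    for tok in NAME_OR_SYMBOL.findall(model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue  # a sequence is plain concatenation
        elif tok in ")|?*+":
            parts.append(tok)  # same meaning in regex syntax
        else:
            # Each name consumes "name " so adjacent names cannot blur together.
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def children_match(model, child_names):
    """True if the ordered child element names satisfy the content model."""
    subject = "".join(name + " " for name in child_names)
    return model_to_regex(model).fullmatch(subject) is not None

MEMO = "(to, from, date?, subject?, body)"  # hypothetical memo content model
print(children_match(MEMO, ["to", "from", "body"]))  # True
print(children_match(MEMO, ["from", "to", "body"]))  # False: order is fixed
```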

Document Type Definitions
Declared content can be: PCDATA, CDATA, RCDATA, EMPTY (CDATA and RCDATA declared content exist only in SGML).
Inclusion and exclusion lists can be used to indicate elements that can occur, or are forbidden to occur, in any sub-elements of the content model (NOT in XML).
E.g.: an inclusion list on the memo element says that element fn can appear anyplace in the memo.

Document Type Definitions
Attribute Declarations define the attributes associated with (potentially) each element of a document and provide the acceptable values for those attributes.

Attributes
Example:
– In markup of a document: …
– Also, because of the default set, … would be the same as …
A variety of special defaults and data types can be given in attribute definitions.
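Attribute defaults mean an omitted attribute is treated as if it were written out. The sketch below borrows the `ATTLIST` from the USMARC DTD that appears later in these slides and wraps it around a hypothetical one-element record. Python's `xml.parsers.expat` reports defaulted attributes alongside specified ones unless its `specified_attributes` flag is turned on, so `Material` comes back as "BK" even though the instance never sets it.

```python
from xml.parsers import expat

# ATTLIST taken from the USMARC DTD in these slides; the instance is hypothetical.
DOC = """<!DOCTYPE USMARC [
  <!ATTLIST USMARC Material (BK|AM|CF|MP|MU|VM|SE) "BK" id CDATA #IMPLIED>
]>
<USMARC id="rec-001"/>"""

def root_attributes(text):
    """Root element attributes as the parser reports them, including any
    whose value was supplied by an ATTLIST default."""
    seen = []
    parser = expat.ParserCreate()
    # specified_attributes defaults to False, so defaulted attributes are reported too.
    parser.StartElementHandler = lambda name, attrs: seen.append(attrs)
    parser.Parse(text, True)
    return seen[0]

print(root_attributes(DOC)["Material"])  # BK
```

An explicit value always wins over the default, and an `#IMPLIED` attribute with no value (like `id` when omitted) is simply absent.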

Sample SGML DTD
<!doctype ELIB-TEXTS [
<!-- This is a DTD for bibliographic records extracted from the elib/rfc1357 simple bibliographic format. -->
<!ELEMENT ELIB-BIB - - (BIB-VERSION, ID, ENTRY?, DATE?, TITLE*, ORGANIZATION*,
    (SERIES | TYPE | REVISION | REVISION-DATE | AUTHOR-PERSONAL | AUTHOR-INSTITUTIONAL
     | AUTHOR-CONTRIBUTING-PERSONAL | AUTHOR-CONTRIBUTING-PERSONAL
     | AUTHOR-CONTRIBUTING-INSTITUTIONAL | CONTACT AUTHOR | PROJECT | PAGES
     | BIOREGION | CERES-BIOREGION | TEXTSOUP | LOCATION | ULTIMATE-CLIENT
     | URL | KEYWORDS | NOTES | ABSTRACT)*,
    (TEXT-REF | PAGED-REF)* )>
… etc…
]>

XML version
<!DOCTYPE ELIB-TEXTS [
<!-- This is a DTD for bibliographic records extracted from the elib/rfc1357 simple bibliographic format. -->
<!ELEMENT ELIB-BIB (BIB-VERSION, ID, ENTRY?, DATE?, TITLE*, ORGANIZATION*,
    (SERIES | TYPE | REVISION | REVISION-DATE | AUTHOR-PERSONAL | AUTHOR-INSTITUTIONAL
     | AUTHOR-CONTRIBUTING-PERSONAL | AUTHOR-CONTRIBUTING-PERSONAL
     | AUTHOR-CONTRIBUTING-INSTITUTIONAL | CONTACT AUTHOR | PROJECT | PAGES
     | BIOREGION | CERES-BIOREGION | TEXTSOUP | LOCATION | ULTIMATE-CLIENT
     | URL | KEYWORDS | NOTES | ABSTRACT)*,
    (TEXT-REF | PAGED-REF)* )>
… etc…
]>
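The mechanical part of moving the ELIB-BIB declaration from SGML to XML is small: drop the tag-minimization flags ("- -", "- O", and so on) and write declaration keywords in upper case, since XML keywords are case-sensitive. The regex-based Python sketch below does only that; it is an illustration, not a converter, and SGML-only features such as inclusion/exclusion lists, the "&" connector, or RCDATA still need manual redesign.

```python
import re

# Minimization flags sit between the element name and the content model.
MINIMIZATION = re.compile(r"(<!ELEMENT\s+[^\s(]+)\s+[-Oo]\s+[-Oo]\s+", re.IGNORECASE)
KEYWORD = re.compile(r"<!(doctype|element|attlist|entity)\b", re.IGNORECASE)

def sgml_decls_to_xml(dtd_text):
    """Sketch: remove omitted-tag minimization flags and upper-case keywords."""
    text = MINIMIZATION.sub(r"\1 ", dtd_text)
    return KEYWORD.sub(lambda m: "<!" + m.group(1).upper(), text)

print(sgml_decls_to_xml("<!ELEMENT ELIB-BIB - - (BIB-VERSION, ID, ENTRY?)>"))
# <!ELEMENT ELIB-BIB (BIB-VERSION, ID, ENTRY?)>
print(sgml_decls_to_xml("<!doctype ELIB-TEXTS ["))
# <!DOCTYPE ELIB-TEXTS [
```

Declarations that are already XML pass through unchanged, so the function can be run over a mixed file.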

Document Using That DTD
ELIB-v1.0 6 February March 1, 1993 Water Conditions in California Report 2 California Department of Water Resources bulletin California Department of Water Resources 17 /elib/data/disk/disk5/documents/6/HYPEROCR/hyperocr.html /elib/data/disk/disk5/documents/6/OCR-ASCII-NOZONE

A More Complex DTD
<!DOCTYPE USMARC [
<!ATTLIST USMARC Material (BK|AM|CF|MP|MU|VM|SE) "BK"
                 id CDATA #IMPLIED>
<!-- Author's Note: the id attribute for the USMARC element is intended to hold a unique record number for each MARC record in the local database. That is to say, it is intended ONLY as an aid in maintaining the local database of MARC records -->
<!ELEMENT Leader - O (LRL, RecStat, RecType, BibLevel, UCP, IndCount, SFCount, BaseAddr, EncLevel, DscCatFm, LinkRec, EntryMap)>
…etc…

More complex DTD (cont.)
<!ELEMENT VarDFlds - O (NumbCode, MainEnty?, Titles, EdImprnt?, PhysDesc?, Series?, Notes?, SubjAccs?, AddEnty?, LinkEnty?, SAddEnty?, HoldAltG?, Fld9XX?)>
<!ELEMENT NumbCode - O (Fld010?, Fld011?, Fld015?, Fld017*, Fld018?, Fld019*, Fld020*, Fld022*, Fld023*, Fld024*, Fld025*, Fld027*, Fld028*, Fld029*, Fld030*, Fld032*, Fld033*, Fld034*, Fld035*, Fld036?, Fld037*, Fld039*, Fld040?, Fld041?, Fld042?, Fld043?, Fld044?, Fld045?, Fld046?, Fld047?, Fld048*, Fld050*, Fld051*, Fld052*, Fld055*, Fld060*, Fld061*, Fld066?, Fld069*, Fld070*, Fld071*, Fld072*, Fld074*, Fld080?, Fld082*, Fld084*, Fld086*, Fld088*, Fld090*, Fld096*)>
<!ELEMENT Titles - O (Fld210?, Fld211*, Fld212*, Fld214*, Fld222*, Fld240?, Fld242*, Fld243?, Fld245, Fld246*, Fld247*)>
<!ELEMENT EdImprnt - O (Fld250?, Fld254?, Fld255*, Fld256?, Fld257?, Fld260?, Fld261?, Fld262?, Fld263?, Fld265?)>
<!ELEMENT PhysDesc - O (Fld300*, Fld305*, Fld306?, Fld310?, Fld315?, Fld321*, Fld340*, Fld350?, Fld351*, Fld355*, Fld357*, Fld362*)>
…etc…

Complex DTD (cont.)
<!ATTLIST Fld245 AddEnty (No|Yes|Blank) #IMPLIED
                 NFChars (0|1|2|3|4|5|6|7|8|9|Blnk) #IMPLIED>
…etc…

Document Markup
All document markup is derived from the DTD for the particular document type. The DTD must be referenced in the document using the DOCTYPE declaration:
– <!DOCTYPE doctype_name SYSTEM "system_identifier">
or
– <!DOCTYPE doctype_name PUBLIC "public_identifier" "system_identifier">
or
– <!DOCTYPE doctype_name [ doctype_declaration_subset ]>
The doctype_declaration_subset can be any combination of element, entity, and attribute declarations.
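The different ways of pointing a document at its DTD can be told apart by what the parser reports when it reads the DOCTYPE declaration. The sketch below uses the `StartDoctypeDeclHandler` callback of Python's `xml.parsers.expat`. The file name "memo.dtd" and the public identifier are hypothetical; expat does not fetch external DTDs by default, so these parse without any file being present.

```python
from xml.parsers import expat

# Three ways to reference a DTD; identifiers are hypothetical.
FORMS = {
    "system": '<!DOCTYPE memo SYSTEM "memo.dtd"><memo/>',
    "public": '<!DOCTYPE memo PUBLIC "-//Example//DTD Memo//EN" "memo.dtd"><memo/>',
    "internal": '<!DOCTYPE memo [<!ELEMENT memo EMPTY>]><memo/>',
}

def doctype_info(text):
    """(name, system id, public id, has internal subset) for one document."""
    info = []
    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = (
        lambda name, sys_id, pub_id, has_subset:
            info.append((name, sys_id, pub_id, bool(has_subset))))
    parser.Parse(text, True)
    return info[0]

for label, doc in FORMS.items():
    print(label, doctype_info(doc))
```

The forms can also be combined: a declaration may name an external DTD and still carry an internal subset in brackets, with the internal declarations read first.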

HTML
HTML was not originally "real" SGML; the DTD was invented after the language. It is often more concerned with the form of the output on the screen than with the structural contents of HTML documents. It relies on the application (such as Netscape) to implement interesting actions like hypertext linking.

How can you describe an information-bearing object?

Dublin Core Review
– Simple metadata for describing Internet resources
– For “Document-Like Objects”
– 15 elements

Dublin Core Elements
Title, Creator, Subject, Description, Publisher, Other Contributors, Date, Resource Type, Format, Resource Identifier, Source, Language, Relation, Coverage, Rights Management

DC DTD Implementation
– There have been various versions
– This is the one recommended (required) by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
– Uses XML Namespaces
– Available at
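A namespaced Dublin Core record can be built with Python's `xml.etree.ElementTree`. The sketch below uses the Dublin Core 1.1 element-set namespace and the `oai_dc` container namespace from OAI-PMH 2.0; version 2.0 postdates these slides, and the 1.x releases wrapped DC slightly differently, so treat this as an illustration of the namespace approach rather than the implementation the slide refers to. The field values are borrowed from the ELIB sample record shown earlier, with the date rewritten in ISO form.

```python
import xml.etree.ElementTree as ET

# Dublin Core 1.1 elements and the OAI-PMH 2.0 oai_dc container namespace.
DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)

def dc_record(fields):
    """Build an oai_dc record from (element, value) pairs. DC elements are
    optional and repeatable, hence a list of pairs rather than a dict."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root

record = dc_record([
    ("title", "Water Conditions in California"),
    ("creator", "California Department of Water Resources"),
    ("date", "1993-03-01"),
    ("type", "Text"),
])
print(ET.tostring(record, encoding="unicode"))
```

The registered prefixes make the output read as `oai_dc:dc` and `dc:title`, but a consuming program should match on the namespace URIs, not on the prefixes.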

DC Element and Attribute Definitions…
<!-- An entity primarily responsible for making the content of the resource. -->
<!-- An entity responsible for making contributions to the content of the resource. -->

DC Element Definitions, Cont.

SGML and XML Sources and Resources
Books:
– van Herwijnen, Eric. Practical SGML. 2nd ed. Boston: Kluwer Academic Publishers.
– Goldfarb, Charles F. The SGML Handbook. Oxford: Clarendon Press.
– (And MANY XML books)
Web Sites:
– Robin Cover’s SGML/XML Site