Do More with Digital Scholarship: XML-Based Metadata

Slides:



Advertisements
Similar presentations
Introduction to Web Design Lecture number:. Todays Aim: Introduction to Web-designing and how its done. Modelling websites in HTML.
Advertisements

Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
LIS650lecture 1 XHTML 1.0 strict Thomas Krichel
DOCUMENT TYPES. Digital Documents Converting documents to an electronic format will preserve those documents, but how would such a process be organized?
XML: Extensible Markup Language
KompoZer. This is what KompoZer will look like with a blank document open. As you can see, there are a lot of icons for beginning users. But don't be.
XML and Enterprise Computing. What is XML? Stands for “Extensible Markup Language” –similar to SGML and HTML –document “tags” are used to define content.
WeB application development
Website Design.
Teppo Räisänen LIIKE/OAMK 2010
1 XSLT – eXtensible Stylesheet Language Transformations Modified Slides from Dr. Sagiv.
XHTML Basics.
SPECIAL TOPIC XML. Introducing XML XML (eXtensible Markup Language) ◦A language used to create structured documents XML vs HTML ◦XML is designed to transport.
INF201 Fall2010 Intro. to Info. Technologies Department of Informatics University at Albany – SUNY Original Source: w3schools.com Prepared by Xiao Liang,
Introducing XHTML: Module B: HTML to XHTML. Goals Understand how XHTML evolved as a language for Web delivery Understand the importance of DTDs Understand.
Basics of HTML Shashanka Rao. Learning Objectives 1. HTML Overview 2. Head, Body, Title and Meta Elements 3.Heading, Paragraph Elements and Special Characters.
Basics of HTML.
By Carrie Moran. To examine the Metadata Object Description Schema (MODS) metadata scheme to determine its utility based on structure, interoperability.
Creating a Simple Page: HTML Overview
Mark Sullivan University of Florida Libraries Digital Library of the Caribbean.
CREATED BY ChanoknanChinnanon PanissaraUsanachote
IS432 Semi-Structured Data Lecture 5: XSLT Dr. Gamal Al-Shorbagy.
XP 1 CREATING AN XML DOCUMENT. XP 2 INTRODUCING XML XML stands for Extensible Markup Language. A markup language specifies the structure and content of.
An Introduction to XML Presented by Scott Nemec at the UniForum Chicago meeting on 7/25/2006.
Essential Tags Web Design – Sec 3-3 Part or all of this lesson was adapted from the University of Washington’s “Web Design & Development I” Course materials.
Learning Web Design: Chapter 4. HTML  Hypertext Markup Language (HTML)  Uses tags to tell the browser the start and end of a certain kind of formatting.
HTML. WHAT IS HTML HTML stands for Hyper Text Markup Language HTML is not a programming language, it is a markup language A markup language is a set of.
CSC 551: Web Programming Fall 2001 emerging & alternate Web technologies  Dynamic HTML  ActiveX  XML course overview  online review sheet  advice.
ECA 228 Internet/Intranet Design I XSLT Example. ECA 228 Internet/Intranet Design I 2 CSS Limitations cannot modify content cannot insert additional text.
Waqas Anwar Next SlidePrevious Slide. Waqas Anwar Next SlidePrevious Slide XML XML stands for EXtensible Markup Language.
Tutorial 1 Developing a Basic Web Page. Objectives Learn the history of the Web and HTML Describe HTML standards and specifications Understand HTML elements.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
XML Schema – XSLT Week 8 Web site:
XML Notes taken from w3schools. What is XML? XML stands for EXtensible Markup Language. XML was designed to store and transport data. XML was designed.
HTML Structure & syntax
HTML Structure & syntax
Intro to HTML CS 1150 Spring 2017.
Online PD Basic HTML The Magic Of Web Pages
HTML – The COMPUTING UNLOCKED GUIDE
HTML5 Basics.
XML Introduction Bill Jerome.
This is the cover slide..
Intro to HTML CS 1150 Fall 2016.
Introduction to OBIEE:
XML Related Technologies
Essential Tags Web Design – Sec 3-3
Do More with Digital Scholarship: XML-Based Metadata
Demystifying Digital Scholarship 10: TEI
XML QUESTIONS AND ANSWERS
XHTML Basics.
Essential Tags Web Design – Sec 3-3
Introduction to XHTML.
Lesson 9 Sharing Documents
WEBSITE DESIGN Chp 1
XHTML Basics.
XHTML Basics.
Digital Design – Copyright Law
What is HTML anyway? HTML stands for HyperText Markup Language. Developed by scientist Tim Berners-Lee in 1990, HTML is the "hidden" code that helps us.
Internet Technologies I - Lect.01 - Waleed Ibrahim Osman
HTML Structure.
XHTML Basics.
Understand basic HTML and CSS terminology, concepts, and basic operations. Objective 3.01.
Tutorial 7 – Integrating Access With the Web and With Other Programs
HTML – The COMPUTING UNLOCKED GUIDE
HyperText Markup Language
XHTML Basics.
CSE591: Data Mining by H. Liu
AN INTRODUCTION BY FAITH BRENNER
HTML Structure & syntax
Lesson 2: HTML5 Coding.
Presentation transcript:

Do More with Digital Scholarship: XML-Based Metadata September 26th, 2017 Matthew Evan Davis Postdoctoral Fellow, Sherman Centre for Digital Scholarship davism17@mcmaster.ca @medievalmatt

So what does XML-Based Metadata mean? XML – eXtensible Markup Language (https://www.w3.org/standards/xml/core) A set of rules for encoding a document in a format that both humans and machines can read. These rules are meant for annotating documents, but can also be used to store data. Very similar in appearance to HTML, but both more flexible and stricter in implementation.

So what does XML-Based Metadata mean? XML – eXtensible Markup Language (https://www.w3.org/standards/xml/core) Metadata – data that gives information about other data (how meta!). It falls into three types: Descriptive metadata is information, such as a title, abstract, author information, or keywords that might help find and identify the underlying data Structural metadata is metadata about how the underlying data is put together – chapters in a book, songs on an album, or other aspects of the types, versions and relationships of materials Administrative metadata is information, such as a creation date, that helps to manage the underlying data.

What’s What?

What’s What?

What’s what?

So what does this do for me? Because XML-based metadata schemes allow you to encode both information about at text and the content of that text, you can do various things with the information despite only having encoded it once. So what? Why not just put your transcription into an excel sheet, a word file, or something else and be done with it? How is this useful?

So what? .pdf files, text files, and html, once created, can generally only be used in the original contexts they were created for.

So what? An XML-based file, on the other hand, can be converted to any of the formats mentioned, can be utilized in whole or in part by other projects, and can be searched on across multiple documents easily.

How does XML do this? Generally speaking, XML comes in various ”flavors,” called standards, that set out rules to lay over the top of the general XML framework, but are specific to the project in question. The example on the previous page was written using the TEI, or Text Encoding Initiative, standard for XML, and that standard can be found at http://www.tei-c.org/Guidelines/P5/. There are a number of other XML standards for metadata, and a good sampling of them (with links to the standards for each) can be found at the Getty’s Metadata Standards Crosswalk page: https://www.getty.edu/research/publications/electronic_publications/intrometadata/crosswa lks.html. A crosswalk is a piece of code that is used to translate from one metadata standard to another. You can also have multiple standards on a single XML page, and if you’re interested in this we can talk about it at the workshop on Friday. No matter the standard, though, there are certain aspects that remain the same because they’re part of the underlying XML framework.

Terminology XML, much like HTML, is built on the concepts of start tags, end tags, and empty elements: <p>The quick brown fox jumped over the lazy dog</p> <p/> The element is the information contained by the angle brackets, and the slash indicates whether it is at the beginning of the element, the end of the element, or an empty element.

Terminology The example given in the last slide consists entirely of elements: <p>The quick brown fox jumped over the lazy dog</p> <p/> If you want to add additional information to an XML element that is meant to be machine readable, but not readable by humans, you can include what is called an attribute: <p rend=“align(center) case(allcaps)”>The quick brown fox jumped over the lazy dog</p> The description of each element in the TEI guidelines not only tells you what the intended purpose of the element is, but what attributes can be attached to that element. For the attributes that can attach to the element <p>, browse to http://www.tei-c.org/release/doc/tei- p5-doc/en/html/ref-p.html.

Terminology

How do I actually build a document? This is the smallest TEI document possible: <TEI version="5.0" xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader> <fileDesc> <titleStmt> <title>The shortest TEI Document Imaginable</title> </titleStmt> <publicationStmt> <p>First published as part of TEI P2, this is the P5 version using a name space.</p> </publicationStmt> <sourceDesc> <p>No source: this is an original work.</p> </sourceDesc> </fileDesc> </teiHeader> <text> <body> <p>This is about the shortest TEI document imaginable.</p> </body> </text> </TEI> A TEI document consists of two parts – the metadata about the document, which is contained in a tree under the <teiHeader>, and the actual document itself, which is contained either under a <text> or a <sourceDoc> element..

How do I actually build a document? Everything else you do is a process of checking the guidelines, determining what element matches with what you are trying to transcribe, and applying it accordingly. The underlying structure of the TEI Header can be found at http://www.tei-c.org/Vault/P5/3.1.0/doc/tei-p5- doc/en/html/HD.html, while the underlying structure for a default text can be found at http://www.tei-c.org/Vault/P5/3.1.0/doc/tei- p5-doc/en/html/DS.html. Note that the default text structure does not handle every situation that is likely to come up! The Table of Contents for the full Guidelines (at http://www.tei-c.org/Vault/P5/3.1.0/doc/tei-p5- doc/en/html/) can be useful for ferreting out the various ways that your document fits (and doesn’t fit!) TEI as it is written.

Editorial Tools Anything that can edit text can be used to create an XML document. XML-specific tools simply allow you to validate the XML for correctness against the schema, run various programmatic transformations against the document, or autocomplete certain elements. Possible examples (a fuller list comparing features can be found on Wikipedia: https://en.wikipedia.org/wiki/Comparison_of_XML_editors): OxyGen (https://www.oxygenxml.com/): academic license (US$99) Xeditor (http://www.xeditor.com/portal): web-based, pricing by enquiry. Adobe Framemaker (http://www.adobe.com/products/framemaker.html): commerical license (US$999)

Online Tools TAPAS (http://tapasproject.org/): designed to “Visualize, Store, and Share Your TEI.” CWRC-Writer (http://www.cwrc.ca/projects/infrastructure- projects/technical-projects/cwrc-writer/). Based on a modified version of TEI Lite (http://www.tei- c.org/Guidelines/Customization/Lite/) http://apps.testing.cwrc.ca/editor/dev/index.htm Username: cwrc Password: cwrcy

Displaying, Searching and Working With an XML Text There are two languages that will let you manipulate the document you’ve just created: XSLT and XQuery. XSLT stands for eXtensible Stylesheet Langauge Transformations. It is a language designed to transform XML documents into other formats – other versions of XML, Word Documents, plain text, etc. XQuery is a programming language that can both query and transform collections of data, including TEI documents. XSLT and XQuery are similar to, but significantly different in structure to XML, and idiosyncratic enough that it’s easier to talk about them with an example of your encoded data to work with.

Matthew Evan Davis davism17@mcmaster.ca @medievalmatt Thank you! Matthew Evan Davis davism17@mcmaster.ca @medievalmatt