Digital Media Technology Week 3: Introduction to TEI Peter Verhaar.

1 Digital Media Technology Week 3: Introduction to TEI Peter Verhaar

2 Terminology □ Elements □ Attributes □ DTD □ Well-formed XML □ Valid XML □ Meta-language
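The difference between an element, an attribute and well-formedness can be sketched in a few lines of Python. This is a minimal illustration using the standard-library `xml.etree.ElementTree` parser; the `letter` element, its `lang` attribute and the helper `is_well_formed` are made up for the example. Note that this parser only checks well-formedness. It does not validate a document against a DTD, so "valid XML" is not tested here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample: one root element, one attribute, properly nested tags.
good = '<letter lang="en"><p>Dear Sirs,</p></letter>'
# The same content with overlapping tags, which XML does not allow.
bad = '<letter lang="en"><p>Dear Sirs,</letter></p>'

root = ET.fromstring(good)
print(root.tag, root.attrib)   # element name and its attributes
print(is_well_formed(good))    # True
print(is_well_formed(bad))     # False
```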


4 Text Encoding □ More advanced search actions □ Explicit expression of implicit information □ Logical structure of the text □ Intellectual contents of the text (semantics)

5 Logical structure

6 Semantic contents Though his chief focus was the eighteenth century – before the convergence of linguistic and national boundaries had consolidated – Robert Darnton’s remarks in What Is the History of Books? are pertinent for any period.

7 Journal Article paragraph quotation author title name

8 Book Trade Archives to Book Trade Networks Adriaan van der Weel and Peter Verhaar Though his chief focus was the eighteenth century – before the convergence of linguistic and national boundaries had consolidated – Robert Darnton’s remarks in What Is the History of Books? are pertinent for any period.
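Once the name and the title in this sentence are marked up, they can be retrieved directly instead of being guessed at with a free text search. The sketch below assumes one possible encoding of the sentence; the choice of `<p>`, `<name>` and `<title>` is illustrative and is not taken from the slide.

```python
import xml.etree.ElementTree as ET

# Assumed encoding of the sample sentence; the tag names are illustrative.
encoded = (
    "<p>Though his chief focus was the eighteenth century – before the "
    "convergence of linguistic and national boundaries had consolidated – "
    "<name>Robert Darnton</name>’s remarks in "
    "<title>What Is the History of Books?</title> "
    "are pertinent for any period.</p>"
)

root = ET.fromstring(encoded)

# Search on the markup rather than on the character string.
names = [el.text for el in root.iter("name")]
titles = [el.text for el in root.iter("title")]

# The reading text is still recoverable by dropping the tags.
plain_text = "".join(root.itertext())

print(names)
print(titles)
print(plain_text)
```

The same approach would support an index of names or titles across a whole collection, since each occurrence is labelled explicitly.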

9 Uses of text encoding □ “Intelligent texts”: Searching beyond free text searches □ Indexes □ Separation of form and content

10 □ OHCO theory: Ordered Hierarchy of Content Objects □ Multiple hierarchies? (1) Book, Chapter, Section, Paragraph, Sentence (2) Book, Cover, Section / gathering, Folium

11 Text Encoding Initiative □ More than 500 elements □ Developed by consortium of scholars □ First established in 1987 □ Text in general: “texts in any natural language, of any date, in any literary genre”





16 Dear Sirs, I will accept £10 for the rights to make a translation into Dutch of my novel entitled Wanda Printers will send you entire proofs from London instantly. Please to send money on receipt of this / Address Madame Ouida. ~c. 2 words illegible~ ~c. 1 word illegible~ Ouida L. de la Ramée

17 Yrs. Yours Impressions Impressions of Theophrastus Such

18 Madame Ouida London

19 ASCII □ Character encoding scheme □ Uses 7 bits (128 characters)

20 Unicode □ 16 bits □ UTF-8 □ 1,112,064 characters α : α
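The contrast between ASCII and Unicode can be checked with the Greek letter α from the slide. This is a small sketch using only built-in Python functions; the numeric character reference `&#945;` is one way an XML document could refer to the same character, and the `<c>` element is a made-up wrapper for the demonstration.

```python
import sys
import xml.etree.ElementTree as ET

alpha = "α"

# ASCII covers code points 0..127, so "A" fits and α does not.
print(ord("A"), ord("A") < 128)       # 65 True
print(ord(alpha), hex(ord(alpha)))    # 945 0x3b1

try:
    alpha.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)                  # False

# In UTF-8 the same character takes two bytes.
utf8_bytes = alpha.encode("utf-8")
print(utf8_bytes)                     # b'\xce\xb1'

# All code points minus the 2,048 surrogates gives the figure on the slide.
usable_code_points = (sys.maxunicode + 1) - 2048
print(usable_code_points)             # 1112064

# A numeric character reference in XML resolves to the same character.
from_reference = ET.fromstring("<c>&#945;</c>").text
print(from_reference == alpha)        # True
```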


22 Entities En réponse à votre lettre du 30 Janvier nous avons l'honneur de vous informer que nous avons payé Mon- sieur Midderigh déjà depuis longtemps et presque toujours d'avance.

23 VLQ 1

24 Entities This sentence is in the &lt;p&gt; element. □ &gt; Greater than □ &lt; Less than □ &quot; Quotation mark □ &amp; Ampersand
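Characters that have a special meaning in XML markup have to be written as entity references when they occur in the text itself. The sketch below uses the standard-library helper `xml.sax.saxutils.escape` to show the substitution for the sentence on this slide, and then parses the result to confirm that the original text comes back. The variable names are chosen for the example only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "This sentence is in the <p> element."

# escape() replaces &, < and > by default.
escaped = escape(text)
print(escaped)   # This sentence is in the &lt;p&gt; element.

# Quotation marks are only replaced when asked for explicitly.
quoted = escape('"Wanda" & others', {'"': "&quot;"})
print(quoted)    # &quot;Wanda&quot; &amp; others

# Parsing the escaped text inside an element restores the original string.
round_trip = ET.fromstring("<p>" + escaped + "</p>").text
print(round_trip == text)   # True
```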

25 Comments Used to improve the readability of the XML document:
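A comment is written between `<!--` and `-->` and is meant for human readers of the file rather than for the software that processes it. As a rough illustration, the sketch below parses a made-up fragment with Python's standard `xml.etree.ElementTree`, whose default parser leaves comments out of the resulting tree. The fragment and the comment wording are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with one comment for the human reader.
document = """<letter>
  <!-- The signature follows the closing formula -->
  <p>Ouida</p>
</letter>"""

root = ET.fromstring(document)

# The default parser skips the comment, so only the <p> element remains.
child_tags = [child.tag for child in root]
print(child_tags)         # ['p']
print(root.find("p").text)  # Ouida
```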

