Published by Uriel Wilder. Modified over 10 years ago.
1
From characters to text: XML in a nutshell
Tamás Váradi
varadi@nytud.hu
2
BTANT129 w3
Introduction
The need for text markup
A simple example
XML annotation
HTML vs. XML
Benefits
Applications
Tools
3
From characters to texts
If the computer sees only character streams, how can it build up the notion of a text?
Conventional means of indicating text structure:
– formatting
– layout
– even so, we often need to understand a text element to recognize its role (semantics!)
4
A simple example
Tamás Váradi (name)
Chair (position)
Department of English Linguistics (dept.)
Miskolc University (univ.)
Miskolc-Egyetemváros (address)
(+36) 46 1234589/10-75 (dept.)
varadi@nytud.hu (email)
12
No need for formatting anymore
Tamás Váradi
Chair
Department of English Linguistics
Miskolc University
Miskolc-Egyetemváros
(+36) 46 1234589/10-75
varadi@nytud.hu
13
An XML file is born
Tamás Váradi
Chair
Department of English Linguistics
Miskolc University
Miskolc-Egyetemváros
Miskolc-Egyetemváros
varadi@nytud.hu
Zuzsanna Fülöp
http://www.oasis-open.org/committees/relax-ng/tutorial.html
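To make the step from labelled fields to an XML file concrete, here is a minimal Python sketch. The element names (`person`, `name`, `position`, `dept`, `univ`, `address`, `email`) are assumptions borrowed from the labels used in the simple example, not the author's actual markup, and `parse_record` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names are borrowed from the slide labels.
RECORD = """\
<person>
  <name>Tamás Váradi</name>
  <position>Chair</position>
  <dept>Department of English Linguistics</dept>
  <univ>Miskolc University</univ>
  <address>Miskolc-Egyetemváros</address>
  <email>varadi@nytud.hu</email>
</person>
"""


def parse_record(xml_text):
    """Return a dict mapping each child tag of the root to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


record = parse_record(RECORD)
print(record["name"])
print(record["email"])
```

Once the role of each string is named by its tag, a program can pick out the e-mail address or the department without guessing from layout.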
14
A close look at XML tags
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml-p2.html
15
An empty tag
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml-p2.html
16
Well-formed XML: some basic rules
Everything is embedded in one element
– The whole file has one root element
No overlapping tags
– proper nesting
Let's call the whole thing off   WRONG!
Let's call the whole thing off   RIGHT!
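The two rules can be checked mechanically with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. The tags `<line>`, `<b>` and `<i>` are hypothetical stand-ins chosen for illustration, and `is_well_formed` is an assumed helper name. Only the nesting pattern matters.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical tags; only the nesting pattern is of interest.
overlapping = "<line><b>Let's call <i>the whole</b> thing off</i></line>"
nested = "<line><b>Let's call <i>the whole</i></b><i> thing off</i></line>"
two_roots = "<line>Let's call</line><line>the whole thing off</line>"

print(is_well_formed(overlapping))  # False: <b> closes while <i> is still open
print(is_well_formed(nested))       # True: every element closes inside its parent
print(is_well_formed(two_roots))    # False: more than one root element
```

This tests well-formedness only. Whether the elements are the ones a particular document type allows is a separate question, answered by a DTD or schema.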
17
Well-formed XML: some basic rules
No unclosed tags
Attribute values must be in quotes
The markup characters (<), (>) and (") are written as entity references when they occur in text
– &lt;
– &gt;
– &quot;
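In practice the escaping is usually left to a library. The following sketch uses `escape` and `quoteattr` from Python's standard `xml.sax.saxutils`. The sample sentence, the `note` element and its `source` attribute are made up for illustration. Note that `escape` also replaces the ampersand with `&amp;`, which XML requires as well.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Illustrative sample text containing all the special characters.
raw_text = 'if a < b, print "smaller" & stop'

# escape() always replaces &, < and >; the extra mapping adds the double quote.
escaped = escape(raw_text, {'"': "&quot;"})
print(escaped)

# quoteattr() escapes the value and returns it already wrapped in quotes.
element = "<note source=" + quoteattr("lecture") + ">" + escaped + "</note>"

# Parsing turns the entity references back into the original characters.
parsed = ET.fromstring(element)
print(parsed.text == raw_text)
```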
18
DTD: Document Type Definition
19
HTML: HyperText Markup Language
The World Wide Web is based on the notion of hypertext
HTML goes back to SGML (Standard Generalized Markup Language)
HTML: display-oriented markup language
XML: designed to capture content
20
The pretty-printed surface
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml-p2.html
21
The code in HTML
Lime Jello Marshmallow Cottage Cheese Surprise
Lime Jello Marshmallow Cottage Cheese Surprise
My grandma's favorite (may she rest in peace).
Ingredients
Qty   Units   Item
1     box     lime gelatin
500   g       multicolored tiny marshmallows
500   ml      cottage cheese
      dash    Tabasco sauce (optional)
Instructions
Prepare lime gelatin according to package instructions...
22
The same info coded in XML
Lime Jello Marshmallow Cottage Cheese Surprise
My grandma's favorite (may she rest in peace).
1 lime gelatin
500 multicolored tiny marshmallows
500 Cottage cheese
Tabasco sauce
Prepare lime gelatin according to package instructions
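A short Python sketch shows what content-oriented markup buys: a program can ask for the ingredients directly instead of scraping table cells. The element and attribute names below (`recipe`, `ingredient`, `qty`, `unit`, `item`, `optional`) are illustrative guesses, not the original slide's markup. The units are taken from the HTML version of the recipe, and `list_ingredients` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative guesses.
RECIPE = """\
<recipe>
  <name>Lime Jello Marshmallow Cottage Cheese Surprise</name>
  <description>My grandma's favorite (may she rest in peace).</description>
  <ingredients>
    <ingredient><qty unit="box">1</qty><item>lime gelatin</item></ingredient>
    <ingredient><qty unit="g">500</qty><item>multicolored tiny marshmallows</item></ingredient>
    <ingredient><qty unit="ml">500</qty><item>Cottage cheese</item></ingredient>
    <ingredient><qty unit="dash"/><item optional="1">Tabasco sauce</item></ingredient>
  </ingredients>
  <instructions>
    <step>Prepare lime gelatin according to package instructions</step>
  </instructions>
</recipe>
"""


def list_ingredients(xml_text):
    """Return (quantity, unit, item) tuples for every ingredient element."""
    root = ET.fromstring(xml_text)
    rows = []
    for ingredient in root.iter("ingredient"):
        qty = ingredient.find("qty")
        # qty.text is None for the empty <qty/> element (no amount given).
        rows.append((qty.text, qty.get("unit"), ingredient.findtext("item")))
    return rows


rows = list_ingredients(RECIPE)
for row in rows:
    print(row)
```

The same query against the HTML version would have to rely on the position of cells in a table, which breaks as soon as the page layout changes.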
23
The "grammar" of text: DTD
24
HTML vs. XML
HTML: display-oriented markup language
– The tag set is fixed and serves display
– Here to stay as a page description language
XML: designed to capture content
– The tag set is open and can be adapted to the content
– XML is a general-purpose annotation scheme for encoding data
25
Conclusions
Computers do not recognize text structure and text elements
To do so, they would often need to "understand" and "interpret" the text
Until they are smart enough (if ever) to do so, we explicitly mark up text in a standard way, using annotation that machines can parse