From characters to text: XML in a nutshell Tamás Váradi


1 From characters to text: XML in a nutshell Tamás Váradi varadi@nytud.hu

2 BTANT129 w3 Introduction
The need for text markup
A simple example
XML annotation
HTML vs. XML
Benefits
Applications
Tools

3 From characters to texts
If the computer sees only character streams, how can it build up the notion of a text?
Conventional means to indicate text structure:
  – formatting
  – layout
  – still, we often need to understand a text element to recognize its role (semantics!)

4 A simple example
Tamás Váradi                         name
Chair                                position
Department of English Linguistics    dept.
Miskolc University                   univ.
Miskolc-Egyetemváros                 address
(+36) 46 1234589/10-75               dept.
varadi@nytud.hu                      email

12 No need for formatting anymore
Tamás Váradi
Chair
Department of English Linguistics
Miskolc University
Miskolc-Egyetemváros
(+36) 46 1234589/10-75
varadi@nytud.hu

13 An XML file is born
Tamás Váradi
Chair
Department of English Linguistics
Miskolc University
Miskolc-Egyetemváros
Miskolc-Egyetemváros
varadi@nytud.hu
Zuzsanna Fülöp
http://www.oasis-open.org/committees/relax-ng/tutorial.html
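
The markup of this slide did not survive in the transcript, so here is a minimal sketch of what such a file could look like and how a program can read it. The tag names (`card`, `name`, `position`, `dept`, `univ`, `address`, `phone`, `email`) are assumptions modeled on the labels of the earlier slide, not the original file, and `card_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the contact card; tag names are assumptions
# based on the labels shown on the "A simple example" slide.
CARD = """
<card>
  <name>Tamás Váradi</name>
  <position>Chair</position>
  <dept>Department of English Linguistics</dept>
  <univ>Miskolc University</univ>
  <address>Miskolc-Egyetemváros</address>
  <phone>(+36) 46 1234589/10-75</phone>
  <email>varadi@nytud.hu</email>
</card>
"""


def card_fields(xml_text):
    """Return a dict mapping each child tag name to its text."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}


fields = card_fields(CARD)
print(fields["name"])
print(fields["email"])
```

Once each line carries its own tag, the program no longer has to guess from layout which line is the name and which is the email.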

14 A close look at XML tags
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml-p2.html

15 An empty tag
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml-p2.html
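
The figures for these two slides are not in the transcript. As a rough illustration, the sketch below takes one element apart into tag name, attribute, and content, and then compares the two spellings of an empty element. The element names `ingredient` and `br` and the attribute `unit` are made up for the example.

```python
import xml.etree.ElementTree as ET

# A start tag with one attribute, text content, and a matching end tag.
elem = ET.fromstring('<ingredient unit="g">marshmallows</ingredient>')
print(elem.tag)     # the element name
print(elem.attrib)  # attributes as a dict
print(elem.text)    # the text between start tag and end tag

# An empty element has no content. It can be written with the short
# form <br/> or as a start tag immediately followed by an end tag.
short_form = ET.fromstring("<br/>")
long_form = ET.fromstring("<br></br>")
print(short_form.tag == long_form.tag)
print(len(short_form), len(long_form))  # number of child elements
```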

16 Well-formed XML: some basic rules
Everything is embedded in one element
  – The whole file has one root element
No overlapping tags: elements must be properly nested
Let's call the whole thing off WRONG!
Let's call the whole thing off RIGHT!
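
The tags of the WRONG and RIGHT examples were lost in the transcript. The sketch below shows how the two rules can be checked with a parser. The `p`, `b`, and `i` tags and their placement are assumptions, and `is_well_formed` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Assumed markup: <b> is closed while <i> is still open, so the tags overlap.
overlapping = "<p><b>Let's call <i>the whole</b> thing off</i></p>"

# The same text with every element closed inside its parent.
nested = "<p><b>Let's call <i>the whole</i></b><i> thing off</i></p>"

# Two top-level elements, so there is no single root element.
two_roots = "<a>one</a><b>two</b>"

for sample in (overlapping, nested, two_roots):
    print(is_well_formed(sample), sample)
```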

17 Well-formed XML: some basic rules
No unclosed tags
Attribute values must be in quotes
The text characters (< and >) and (") must be written as character entities
  – &lt;
  – &gt;
  – &quot;
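
These rules can be tried out with Python's standard library. This is a sketch under assumptions: the `note` element, its `lang` attribute, and the sample sentence are invented for illustration, and `is_well_formed` is a hypothetical helper. Note that `escape()` also replaces `&`, which must be escaped as well.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical raw text containing the characters that need entities.
raw_text = 'if a < b and b > c, say "ok"'

# escape() handles &, < and > by default; the extra mapping adds the double quote.
safe_text = escape(raw_text, {'"': "&quot;"})

# quoteattr() escapes an attribute value and wraps it in quotes.
doc = "<note lang=" + quoteattr("en") + ">" + safe_text + "</note>"

root = ET.fromstring(doc)
print(safe_text)
print(root.text == raw_text)  # the parser turns the entities back into characters

print(is_well_formed("<note>text"))                 # unclosed tag
print(is_well_formed("<note lang=en>text</note>"))  # unquoted attribute value
```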

18 DTD: Document Type Definition
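
The DTD shown on this slide is missing from the transcript. The sketch below embeds a small, made-up DTD in a document to show what one looks like. The element names are assumptions. Python's `xml.etree.ElementTree` checks well-formedness only and does not validate against a DTD, so the hypothetical helper `matches_model` is a hand-written stand-in that mirrors the content model `(name, email)`. It is not real DTD validation.

```python
import xml.etree.ElementTree as ET

# A made-up document with an internal DTD subset between "[" and "]>".
DOC = """<!DOCTYPE card [
  <!ELEMENT card (name, email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<card>
  <name>Tamás Váradi</name>
  <email>varadi@nytud.hu</email>
</card>"""

# The parser accepts the DOCTYPE but does not enforce its declarations.
root = ET.fromstring(DOC)

# Stand-in for the content model (name, email): same children, same order.
EXPECTED_CHILDREN = ["name", "email"]


def matches_model(element, expected):
    """Return True if the child tags equal the expected sequence."""
    return [child.tag for child in element] == expected


print(matches_model(root, EXPECTED_CHILDREN))

# A card without a name is well-formed but does not fit the model.
incomplete = ET.fromstring("<card><email>varadi@nytud.hu</email></card>")
print(matches_model(incomplete, EXPECTED_CHILDREN))
```

A validating parser would perform this kind of check automatically from the declarations in the DTD.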

19 HTML
HyperText Markup Language
The World Wide Web is based on the notion of hypertext
HTML goes back to SGML (Standard Generalized Markup Language)
HTML: display-oriented markup language
XML: designed to capture content

20 The pretty-printed surface
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml-p2.html

21 The code in HTML
Lime Jello Marshmallow Cottage Cheese Surprise
Lime Jello Marshmallow Cottage Cheese Surprise
My grandma's favorite (may she rest in peace).
Ingredients
Qty  Units  Item
1    box    lime gelatin
500  g      multicolored tiny marshmallows
500  ml     cottage cheese
     dash   Tabasco sauce (optional)
Instructions
Prepare lime gelatin according to package instructions...

22 The same info coded in XML
Lime Jello Marshmallow Cottage Cheese Surprise
My grandma's favorite (may she rest in peace).
1 lime gelatin
500 multicolored tiny marshmallows
500 Cottage cheese
Tabasco sauce
Prepare lime gelatin according to package instructions
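
The XML tags of the recipe are not in the transcript. The sketch below encodes the same data with assumed tag and attribute names (`recipe`, `ingredient`, `qty`, `unit`, `item`, and so on), which may differ from the original slide, and shows what content-oriented markup buys: a program can ask for ingredients by unit without knowing anything about page layout. `ingredients_with_unit` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed encoding of the recipe; tag and attribute names are illustrative.
RECIPE = """
<recipe>
  <name>Lime Jello Marshmallow Cottage Cheese Surprise</name>
  <description>My grandma's favorite (may she rest in peace).</description>
  <ingredients>
    <ingredient><qty unit="box">1</qty><item>lime gelatin</item></ingredient>
    <ingredient><qty unit="g">500</qty><item>multicolored tiny marshmallows</item></ingredient>
    <ingredient><qty unit="ml">500</qty><item>cottage cheese</item></ingredient>
    <ingredient><qty unit="dash"/><item optional="1">Tabasco sauce</item></ingredient>
  </ingredients>
  <instructions>
    <step>Prepare lime gelatin according to package instructions</step>
  </instructions>
</recipe>
"""


def ingredients_with_unit(xml_text, unit):
    """Return (quantity, item) pairs for ingredients measured in the given unit."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for ingredient in root.iter("ingredient"):
        qty = ingredient.find("qty")
        if qty is not None and qty.get("unit") == unit:
            pairs.append((qty.text, ingredient.findtext("item")))
    return pairs


print(ingredients_with_unit(RECIPE, "g"))
print(ingredients_with_unit(RECIPE, "ml"))
```

The HTML version carries the same words, but its tags only say how to display them, so a query like this would have to rely on table positions.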

23 The "grammar" of text: DTD

24 HTML vs. XML
HTML: display-oriented markup language
  – Tag set is fixed and serves display
  – Here to stay as a page description language
XML: designed to capture content
  – Tag set is open and can be tailored to content
  – XML is a general-purpose annotation scheme for encoding data

25 Conclusions
Computers do not recognize text structure and text elements
To do so, they would often need to "understand" and "interpret" text
Until they are smart enough (if ever) to do so, we explicitly mark up text in a standard way, using annotation that machines can parse

