Published by Walter Welch. Modified over 9 years ago.
1
Linguistic Annotation and Standoff Markup
Henry S. Thompson
HCRC Language Technology Group
World Wide Web Consortium
Markup Technology Ltd.
University of Edinburgh
© 2001 Henry S. Thompson
2
Annotation; Standoff Markup
Henry S. Thompson
IRCS, Philadelphia, 2001-12-12

Ontology and 'ontology'
- It's the 'in' word just now
- Philosophy
  - The nature of being(s)
- Computing Industry
  - Scholastic taxonomy
  - I.e. (a description of) a data model
- Where does XML fit in?
4
XML is ASCII for the 21st century
- ASCII (ISO 646) solved a fundamental interchange problem for flat text documents
  - What bits encode what characters
    - (For a pretty parochial definition of 'character')
- UNICODE/ISO 10646 extends that solution to the whole world
- XML thought it was doing the same for simple tree-structured documents
  - The emphasis in the XML design was on simplifying SGML to move it to the Web
  - XML didn't touch SGML's architectural vision
    - flexible linearisation/transfer syntax
    - for tree-structured documents with internal links
5
The essence of XML
- It's a markup language used for annotating text
- It is concerned with logical structure
  - to identify sections, titles, section headers, chapters, paragraphs, …
- It is not concerned with appearance
  - you say 'this is a subtitle' not 'this is in bold, 14pt, centered'
  - you say 'this is an example' not 'this is in verbatim, indented by 5pts, ragged right'
- It is authored and consumed by people
6
XML marked up text
Internet-based Application Architectures for the 21st Century: The Role of XML
Let's skip straight to an example of XML syntax for a simple bit of structure:
<tip><emph>Never</emph> stand up in a canoe!</tip>
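To make the logical structure of the slide's example concrete, here is a minimal Python sketch that parses it with the standard library's `xml.etree.ElementTree`. The helper name `describe` is an illustration of ours, not something from the talk.

```python
import xml.etree.ElementTree as ET

# The example from the slide, parsed into a tree.
tip = ET.fromstring("<tip><emph>Never</emph> stand up in a canoe!</tip>")

def describe(element, depth=0):
    """Return one line per element, indented by nesting depth (hypothetical helper)."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

structure = describe(tip)             # element names only: the logical structure
full_text = "".join(tip.itertext())   # the text with all markup stripped away

print("\n".join(structure))
print(full_text)
```

The markup says what each piece of text is (a tip, an emphasised word) and nothing about how it should look.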
7
Connecting structure and form
- There is a stylesheet language called XSLT, which allows us to write simple style rules that produce the formatted presentation from the structured version
- For example, […] will do part of the transformation job
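The idea of style rules that map structure to form can be sketched in a few lines of Python. This is not XSLT and not the rule shown on the original slide; it is a stand-in under our own assumptions, with a hypothetical `STYLE_RULES` table that maps element names to HTML wrappers.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical style rules: element name -> (opening, closing) HTML.
STYLE_RULES = {
    "tip": ('<p class="tip">', "</p>"),
    "emph": ("<b>", "</b>"),
}

def render(element):
    """Apply the matching rule to an element, then recurse into its children."""
    open_tag, close_tag = STYLE_RULES.get(element.tag, ("", ""))
    parts = [open_tag, escape(element.text or "")]
    for child in element:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))  # text following the child
    parts.append(close_tag)
    return "".join(parts)

formatted = render(ET.fromstring("<tip><emph>Never</emph> stand up in a canoe!</tip>"))
print(formatted)
```

Changing the presentation means editing the rule table only; the structured document stays untouched.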
8
The essence of XML, mark two
- It's a markup language used for transferring data
- It is concerned with data models
  - to convert between application-appropriate and transfer-appropriate forms
- It is not concerned with human beings
  - It's produced and consumed by programs
9
What just happened!?
- The whole transfer syntax story just went meta, that's what happened!
- XML has been a runaway success, on a much greater scale than its designers anticipated
  - Not for the reason they had hoped
    - Because separation of form from content is right
  - But for a reason they barely thought about
    - Data must travel the web
- Tree-structured documents are a usable transfer syntax for just about anything
  - So data-oriented web users think of XML as a transfer mechanism for their data
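As a rough illustration of XML used purely as a transfer syntax between programs, the following Python sketch converts an application-side record to a tree-structured document and back. The record layout, the element name `record`, and the functions `to_xml` and `from_xml` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def to_xml(record):
    """Serialise a flat dict of strings as <record><key>value</key>...</record>."""
    root = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Rebuild the dict from the transfer form."""
    root = ET.fromstring(text)
    return {child.tag: child.text or "" for child in root}

order = {"customer": "A. Reader", "item": "canoe", "quantity": "1"}
wire = to_xml(order)          # transfer-appropriate form
received = from_xml(wire)     # application-appropriate form again
print(wire)
```

No human needs to read `wire`; it only has to survive the trip from one program to another.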
12
What's missing?
- The relationship between transfer syntax and tree-structured data is well-defined
  - Defined by XML 1.0 + XML Namespaces + XML Infoset
- The relation between tree-structured data and application data is not
  - Left up to each application
    - an XML application = syntax and semantics
  - No official or even consensus standard for expressing the relation
    - So more-or-less ad-hoc scripting solutions predominate
    - And it's easy to confuse domain analysis ('ontology') and document design (DSD)
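The gap described on this slide can be illustrated with a small Python sketch: the parser fixes the tree, but two hand-written scripts are free to read the same tree as different application data. The document and both functions are hypothetical examples of ours.

```python
import xml.etree.ElementTree as ET

# Syntax to tree is well-defined: any conforming parser yields the same tree.
doc = ET.fromstring("<reading><value>21.5</value><unit>C</unit></reading>")

# Application A: an ad-hoc script that treats the tree as a number plus a unit.
def as_measurement(tree):
    return (float(tree.findtext("value")), tree.findtext("unit"))

# Application B: another ad-hoc script that treats every child as a labelled string.
def as_fields(tree):
    return {child.tag: child.text for child in tree}

measurement = as_measurement(doc)
fields = as_fields(doc)
print(measurement, fields)
```

Nothing in the document itself says which reading is intended; that knowledge lives only in each application's code.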
13
A twin-track declarative approach
- My colleague Ari Krupnikov and I are working on an approach based on annotating W3C XML Schemas with data-binding information
- In looking at existing uses of markup, a number of pre-existing patterns of practice emerged
  - A data-oriented example
- Raises the question of what aspects of the XML Data Model map to what aspects of the application data, at a generic/ontological level
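To give a feel for what a declarative binding might look like, here is a deliberately simplified Python sketch. It is not the schema-annotation scheme the slide refers to, whose details are not given here; the `BINDINGS` table, the `bind` function, and the `book` document are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical binding table: child element name -> converter for its text.
# The mapping is stated as data, not buried in hand-written script logic.
BINDINGS = {"title": str, "year": int, "price": float}

def bind(element, bindings):
    """Build application data from a tree by looking up each child's converter."""
    return {
        child.tag: bindings[child.tag](child.text)
        for child in element
        if child.tag in bindings
    }

book = ET.fromstring(
    "<book><title>Canoeing</title><year>2001</year><price>9.5</price></book>"
)
record = bind(book, BINDINGS)
print(record)
```

The same generic `bind` function serves any document type; only the table changes.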
14
Where does annotation fit in?
- It's not just a lexical accident
- Annotation is markup
  - In practice, often quite literally, with coloured pens
- The question of semantics remains
- So we're in the curious state of using trees both as data model and as external representation
- There's a tension between two views of XML documents:
  - Opaque transfer mechanism
  - Repurposable information store
    - XML Query/XPath/XSLT
15
Overlapping Hierarchies
- In either case, overlapping more-or-less orthogonal annotations are a challenge
- Consider for example annotating poetry
- There is a verse/stanza/line perspective
- And a sentence/clause/phrase perspective
- Ordinary trees can't handle this
- Initially we thought of it as a markup problem
  - But latterly we've embraced schizophrenia
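The poetry problem can be demonstrated directly. In the Python sketch below, an invented two-line verse has a sentence that runs across the line break; each perspective is a well-formed tree on its own, but combining them in one document is not. The verse and the helper `is_well_formed` are our own illustrations.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the text parses as a single XML tree."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Line perspective only.
lines_only = (
    "<poem><line>I wandered off,</line>"
    "<line>and then came back. So it goes.</line></poem>"
)

# Sentence perspective only.
sentences_only = (
    "<poem><sentence>I wandered off, and then came back.</sentence> "
    "<sentence>So it goes.</sentence></poem>"
)

# Both at once: the first sentence starts in line 1 and ends in line 2,
# so the tags cross and the result is not a tree.
overlapping = (
    "<poem>"
    "<line><sentence>I wandered off,</line>"
    "<line>and then came back.</sentence> <sentence>So it goes.</sentence></line>"
    "</poem>"
)

results = [is_well_formed(t) for t in (lines_only, sentences_only, overlapping)]
print(results)
```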
16
A little sloppiness is a good thing
- Just as in ordinary natural language communication
- XML trees can bear a wide range of interpretations
- Lack of absolute precision is not necessarily a flaw
  - The ontology of linguistic artefacts in general, and annotation in particular, is just not clear
  - XML/Trees seem to be at a useful point with respect to concreteness and precision
17
What's standoff markup?
- Separating markup from the material marked up
- Three obvious reasons
  - Base material may be read-only and large
    - Or not freely distributable
    - Or of necessity somewhere else
  - Markup may involve multiple overlapping hierarchies
  - Multiple analysts may be at work simultaneously
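One common way to realise this separation is to leave the base text untouched and record annotations as character-offset spans kept elsewhere. The Python sketch below assumes that representation; the talk does not prescribe it, and the `Span` class, the layer names, and the sample text are hypothetical.

```python
from dataclasses import dataclass

# Base material: treated as read-only; annotations never modify it.
BASE_TEXT = "I wandered off,\nand then came back. So it goes."

@dataclass(frozen=True)
class Span:
    layer: str   # which hierarchy (or analyst) the annotation belongs to
    label: str
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive

def span_of(layer, label, fragment):
    """Locate a fragment in the base text and annotate it by offsets."""
    start = BASE_TEXT.index(fragment)
    return Span(layer, label, start, start + len(fragment))

def crosses(a, b):
    """True if two spans overlap without one containing the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end

line1 = span_of("verse", "line", "I wandered off,")
line2 = span_of("verse", "line", "and then came back. So it goes.")
sent1 = span_of("syntax", "sentence", "I wandered off,\nand then came back.")
sent2 = span_of("syntax", "sentence", "So it goes.")

print(crosses(sent1, line2))  # the two layers overlap, yet both coexist
```

Because each layer only points into the base text, the two hierarchies can overlap freely, and separate analysts can maintain separate layers.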
18
Where's the beef?
- At the data model level, there's not much difference:
  - Instead of a parent function from node to node
  - We have a children function from node to node sequence
- At the document level, this means using reference mechanisms (URIs) instead of containment
- In practice, the distinction provides a lot of leverage
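The difference between the two functions can be sketched with plain Python dictionaries. The node identifiers below are hypothetical stand-ins for the URIs a real standoff document would use.

```python
# Containment view: a parent function, so each node has exactly one parent.
# Only one hierarchy (here, lines) can own the words w1..w4.
parent = {"w1": "line1", "w2": "line1", "w3": "line2", "w4": "line2"}

# Standoff view: a children function from each node to a sequence of
# referenced nodes. Several hierarchies can point at the same words.
children = {
    "line1": ["w1", "w2"],
    "line2": ["w3", "w4"],
    "sentence1": ["w1", "w2", "w3"],
    "sentence2": ["w4"],
}

def parents_of(node, children_map):
    """All nodes whose child sequence refers to the given node."""
    return sorted(p for p, kids in children_map.items() if node in kids)

w3_parents = parents_of("w3", children)
print(w3_parents)
```

Word `w3` ends up under both `line2` and `sentence1`, which a single parent function cannot express.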