1
SemAF – Basics: Semantic annotation framework
Harry Bunt, Tilburg University
ISA-6, Joint ISO-ACL/SIGSEM workshop, Oxford, 11-12 January 2011
TC 37/SC 4/WG 2 (Kiyong Lee, convenor)
2
Outline
- Background: ad-hoc Task Domain Group TDG 3; LIRICS; SemAF part 1 (time and events); part 2 (dialogue acts); ...
- General (ISO, LAF) considerations on annotation standards
- Specific LAF requirements
- Additional or elaborated methodological requirements
  - Principle of Additivity (Complementarity)
  - Abstract versus concrete syntax
  - Semantics for abstract syntax
  - Requirements on representation formats
  - Metamodel and abstract syntax
  - Core entities, extensions and subschemas
  - Layers and integrated annotation/representation
- Conclusion: How to move further?
3
Aims
- Make explicit what is, or should be, common to the various parts of SemAF (24617)
- Ensure consistency of the various parts of SemAF (24617):
  - Their aims
  - Their methodology
  - Their annotation schemes
  - Their representation schemes
- Provide guidelines for future parts of SemAF
4
General requirements on linguistic annotation standards
- Media independence: common mechanisms should be provided to handle all media types, including text, audio, and video.
- Data integrity: use a standoff rather than an inline representation format.
- Machine processability: representations must be machine readable and interpretable; the burden of interpretation should not be left to the processing software.
- Human readability: representations must be human readable, at least for creation and editing.
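The data-integrity requirement can be illustrated with a minimal sketch of standoff annotation. The class name, field names, labels, and example sentence below are assumptions made for illustration; they are not taken from LAF or from any SemAF part. The point is only that the source text is never modified: annotations live in a separate structure and point into the text by character offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffAnnotation:
    """A hypothetical standoff record: it refers to the source by offsets."""
    start: int   # offset of the first character of the annotated span
    end: int     # offset just past the last character of the span
    label: str   # illustrative label, not an official data category


def span_text(source: str, ann: StandoffAnnotation) -> str:
    """Recover the annotated segment without touching the source text."""
    return source[ann.start:ann.end]


# The source data stays exactly as it was delivered.
SOURCE = "John arrived on Monday."

# Annotations are stored apart from the source (standoff, not inline).
ANNOTATIONS = [
    StandoffAnnotation(0, 4, "participant"),
    StandoffAnnotation(5, 12, "event"),
    StandoffAnnotation(16, 22, "time"),
]

if __name__ == "__main__":
    for ann in ANNOTATIONS:
        print(ann.label, "->", span_text(SOURCE, ann))
```

An inline format would instead insert markup into the sentence itself, so that several annotation layers over the same text would have to share, and could corrupt, one copy of the data.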
5
LAF requirements
- Distinguish annotation from representation:
  - An annotation is certain linguistic information that is added to language data, independent of its representation.
  - A representation is the format into which an annotation is rendered, independent of its content.
- Distinguish systematically between content and reference in annotation representations
- Uniform and TEI-compliant way of referring to relevant segments of source data
- Uniform way of cross-referencing between different layers of annotation
6
SemAF-specific requirements (1)
- Semantic additivity: semantic annotations should add semantic information to source data, rather than, e.g., merely 'flag' semantic phenomena.
- Semantic explicitness: information in an annotation scheme must be explicit; the burden of interpretation should not be left to the processing software.
- Conceptual consistency: concepts used in annotations in different SemAF parts should have the same meaning; related concepts in different SemAF parts should be semantically consistent; underlying metamodels should be mutually consistent.
- Representational consistency: a single mechanism should be used to represent the same type of information; there must be a consistent underlying data model.
7
SemAF-specific requirements (2)
Methodological consistency (Bunt, ICGL-2, Hong Kong, January 2010; Ide & Bunt, LAW-IV, Uppsala, July 2010):
- Conceptual analysis: metamodel
- Abstract syntax: extended formal specification of metamodel
- Definition of formal semantics of abstract syntax
- Concrete syntax: definition of 'ideal' representation format
- Core entities; extensions; subschemas
- Relation to Data Category Registry
8
Additivity and Explicitness
- Annotations (ad notare ≈ adding notes to) add information to portions of source text (cf. LAF); semantic annotations add semantic information to source text.
- Semantic annotations can only count as such if they have a formal semantics (Bunt & Romary, 2002), which makes them machine-interpretable.
9
Conceptual consistency
- ISO-TimeML: events subdivided into transitions, processes, and states; ISO-Semantic Roles?
- ISO-TimeML: event-time relations like AT, DURING; DURATION; ISO-Semantic Roles: temporal semantic roles
- ISO-Space: event-location relations; ISO-Semantic Roles: semantic roles relating motion events to locations etc. (Location, Source, Goal, Distance, ...)
- ISO-Dialogue Acts: rhetorical relations between dialogue acts like Explanation, Justification, Exemplification; ISO-DS: similar discourse relations
10
Abstract and concrete syntax of an annotation language
- Abstract syntax is a formal specification of the categories of objects and relations in a metamodel, describing how these elements may be combined to form annotations, defined as set-theoretical constructs.
- Concrete syntax specifies a particular format for the representation of annotations.
- The abstract/concrete syntax distinction implements the fundamental distinction between annotations and representations made by LAF.
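The distinction can be sketched in code as follows. This is an illustration only: the structure names (`Entity`, `Link`, `Annotation`), the category and relation labels, and the XML element names are all hypothetical and do not reproduce the abstract or concrete syntax of any ISO 24617 part. The abstract annotation is a plain set-theoretical object (tuples and sets); `to_xml` is one possible concrete syntax for rendering it.

```python
import xml.etree.ElementTree as ET
from typing import FrozenSet, NamedTuple, Tuple


class Entity(NamedTuple):
    """Abstract entity structure: an identifier, a markable, a category."""
    ident: str
    span: Tuple[int, int]   # markable, given as (start, end) offsets
    category: str           # hypothetical category label


class Link(NamedTuple):
    """Abstract link structure: a labelled relation between two entities."""
    source: str
    target: str
    relation: str           # hypothetical relation label


class Annotation(NamedTuple):
    """An annotation as a pair of sets, independent of any file format."""
    entities: FrozenSet[Entity]
    links: FrozenSet[Link]


def to_xml(annotation: Annotation) -> str:
    """Render the abstract structure in one (assumed) concrete XML syntax."""
    root = ET.Element("annotation")
    for ent in sorted(annotation.entities):
        ET.SubElement(root, "entity", id=ent.ident, start=str(ent.span[0]),
                      end=str(ent.span[1]), category=ent.category)
    for link in sorted(annotation.links):
        ET.SubElement(root, "link", source=link.source,
                      target=link.target, relation=link.relation)
    return ET.tostring(root, encoding="unicode")


EXAMPLE = Annotation(
    entities=frozenset({Entity("e1", (5, 12), "event"),
                        Entity("t1", (16, 22), "time")}),
    links=frozenset({Link("e1", "t1", "anchored-at")}),
)

if __name__ == "__main__":
    print(to_xml(EXAMPLE))
```

Nothing in `Annotation` mentions XML; a different rendering function could serialise the same object in another format without changing what the annotation says.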
11
Semantics, abstract and concrete syntax
- Semantics of semantic annotations should be defined for abstract syntax, rather than for some concrete representation format.
- Advantage: every representation format for the same abstract syntax has the same semantics.
12
Requirements on representation formats
- Expressive adequacy: each annotation structure can be represented in this format.
- 'Unambiguity': each representation encodes a unique annotation structure.
- A representation format that satisfies these requirements is called ideal (Bunt, ICGL-2, Hong Kong, January 2010).
- Representations in one ideal format can be converted in a meaning-preserving way to any other ideal format.
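A toy sketch of why conversion between ideal formats preserves meaning: if each format has an encoding function from abstract structures and a decoding function back, a converter can be obtained by decoding from one format and encoding into the other. Everything below is an assumption for illustration: the abstract structure is reduced to a set of (start, end, category) triples, the two formats are an ad hoc XML and an ad hoc JSON layout, and `semantics` is a stand-in for a real formal interpretation.

```python
import json
import xml.etree.ElementTree as ET
from typing import FrozenSet, Set, Tuple

# Simplified abstract annotation structure: a set of triples.
Abstract = FrozenSet[Tuple[int, int, str]]


def f1(a: Abstract) -> str:
    """Encode an abstract structure in format 1 (an assumed XML layout)."""
    root = ET.Element("annotation")
    for start, end, category in sorted(a):
        ET.SubElement(root, "entity", start=str(start), end=str(end),
                      category=category)
    return ET.tostring(root, encoding="unicode")


def f1_inv(xml_text: str) -> Abstract:
    """Decode format 1 back into the unique abstract structure it encodes."""
    root = ET.fromstring(xml_text)
    return frozenset((int(e.get("start")), int(e.get("end")), e.get("category"))
                     for e in root.findall("entity"))


def f2(a: Abstract) -> str:
    """Encode an abstract structure in format 2 (an assumed JSON layout)."""
    return json.dumps([list(triple) for triple in sorted(a)])


def f2_inv(json_text: str) -> Abstract:
    """Decode format 2 back into the abstract structure."""
    return frozenset((start, end, category)
                     for start, end, category in json.loads(json_text))


def c12(xml_text: str) -> str:
    """Convert format 1 to format 2 by going through the abstract syntax."""
    return f2(f1_inv(xml_text))


def c21(json_text: str) -> str:
    """Convert format 2 to format 1 by going through the abstract syntax."""
    return f1(f2_inv(json_text))


def semantics(a: Abstract) -> Set[str]:
    """Toy interpretation, defined on the abstract structure only."""
    return {f"{category}({start},{end})" for start, end, category in a}


EXAMPLE: Abstract = frozenset({(5, 12, "event"), (16, 22, "time")})

if __name__ == "__main__":
    as_xml = f1(EXAMPLE)
    as_json = c12(as_xml)
    # Both representations decode to the same structure, hence same meaning.
    print(semantics(f1_inv(as_xml)) == semantics(f2_inv(as_json)))
```

Because `semantics` takes only the abstract structure as input, any two representations that decode to the same structure necessarily receive the same interpretation, which is the advantage the slides attribute to defining the semantics at the abstract level.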
13
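Ideal concrete syntax
[Diagram: the abstract syntax is interpreted by I_a into the semantics; encoding functions F_1 and F_2 map the abstract syntax to ideal concrete syntax-1 and ideal concrete syntax-2, with inverses F_1^-1 and F_2^-1; conversion functions C_12 and C_21 map between the two concrete syntaxes.]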
14
Core concepts, extensions, and subschemas; and the DCR
A standard specifies:
- core concepts;
- principles for adding elements to the set of core concepts;
- principles for subschemas of a standard annotation schema.
Core concepts should be entered into the ISO DCR.
15
Things that cut across SemAF parts
Overlaps, e.g.:
- Events and their classification (ISO-TimeML, ISO-Space, Semantic roles)
- Time and place (ISO-TimeML, ISO-Space, Semantic roles, ISO-NE)
- Rhetorical and other coherence relations in dialogue and discourse (ISO-Dialogue acts, ISO-DS)
Cutting across:
- Negation; modality
- Quantification; modification
16
References
Bunt, Harry (2010) A methodology for designing semantic annotation languages. In Proceedings of the 2nd International Conference on Global Interoperability for Language Resources (ICGL-2), Hong Kong, January 2010, pp. 29-46.
Bunt, Harry (2011) Multifunctionality in dialogue. Computer Speech and Language 25, 225-245.
Ide, Nancy and Harry Bunt (2010) Anatomy of semantic annotation schemes: Mappings to GrAF. In Proceedings of the 4th Linguistic Annotation Workshop (LAW-IV), Uppsala, July 2010.