Linguistic Annotation Framework SC4 WG 1 Nancy Ide Vassar College USA.

Presentation transcript:

Linguistic Annotation Framework SC4 WG 1 Nancy Ide Vassar College USA

LAF Goal ISO TC37 SC4 - WG 1
- Provide a generic means to represent linguistic data and annotations
  - Based on a formal model
- Users map their formats into/out of LAF
  - User formats must conform to the underlying model
- Pivot or “dump” format for exchange and machine processing

[Figure: User A’s representation and User B’s representation, connected through the DUMP FORMAT (“interlingua”)]
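The pivot arrangement can be illustrated with a small sketch. Everything named here is hypothetical: the two toy user formats ("A", a string of base/pos items; "B", a list of pairs), the list-of-dicts pivot, and the function names are assumptions made for illustration and are not part of LAF. The point is only that each user format needs one mapping into the pivot and one out of it, so n formats need 2n mappings rather than n(n-1) direct converters.

```python
# Hypothetical toy formats; neither is a real LAF or user format.
def a_to_pivot(s):
    """Format A: 'the/det clock/nn' -> pivot list of feature dicts."""
    return [dict(zip(("base", "pos"), item.split("/"))) for item in s.split()]

def pivot_to_a(pivot):
    return " ".join(f"{x['base']}/{x['pos']}" for x in pivot)

def b_to_pivot(rows):
    """Format B: [('the', 'det'), ...] -> pivot list of feature dicts."""
    return [{"base": base, "pos": pos} for base, pos in rows]

def pivot_to_b(pivot):
    return [(x["base"], x["pos"]) for x in pivot]

# One (into, out of) pair per format: 2n mappings for n formats.
CONVERTERS = {"A": (a_to_pivot, pivot_to_a), "B": (b_to_pivot, pivot_to_b)}

def convert(data, src, dst):
    """Go from any registered format to any other via the pivot."""
    to_pivot, _ = CONVERTERS[src]
    _, from_pivot = CONVERTERS[dst]
    return from_pivot(to_pivot(data))

result = convert("the/det clock/nn", "A", "B")
```

Adding a third format in this sketch would mean registering one more pair of functions; no existing mapping has to change.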

Principles
- Separation of data and annotations
  - Stand-off annotation
- Separation of user annotation formats and the exchange (“dump”) format
  - Mappable to one another
- Separation of referential structure and annotation content in dump format
- Separation of annotation structure (relationships among parts) and content (data categories) in representation of annotations

LAF Development
- LAF has gone through a slow evolution
  - Model development (GMT as base)
  - Consideration of processing needs
  - Application to different annotation types/structures/formats
  - Adjustments to development in other WGs on specific annotation types and feature structures
- “Proof of concept” instantiation in the American National Corpus
  - Transduction of several different annotation types and formats to LAF format
  - API to merge, transduce to other formats

LAF Status
- Have now
  - Reduced FS specification
  - Final XML format / schema
    - GrAF: Graph Annotation Format
  - Mapping “rules” and examples
- Also
  - Coordination with UIMA
  - Header specification including information about annotation, similar to UIMA type definition

Basic Model
- Annotation content represented by feature structures
  - Powerful means to represent any/all annotations
- Referential structure represented as a directed acyclic graph (DAG)
  - Enables exploitation of well-understood graph traversal and manipulation algorithms

Referential Structure
- Means by which annotation content is associated with primary data or other annotations
- Very simple DAG model
- No need to consider internal structure of annotation content (i.e. relations among bits of annotation information)

Primary Data
- Primary data contains no annotations
  - “Read-only”
  - Modifications can be regarded as annotations
- Insistence on the identification of a base segmentation of the primary data
  - Identifies contiguous sequences of indivisible logical units
  - For text, usually a character
- “Compatible” annotations (i.e. those that can be merged etc.) use a common base segmentation

Primary Segmentation
- Set of disjoint edges over primary data
- Vertices
  - Virtual, located between each logical unit
  - Sequentially numbered
- Edges
  - Each edge (x,y) in the graph delimits a non-divisible region of primary data
  - Conformance to MAF, SynAF: call these edges over primary data a span

- Multiple primary segmentations may be defined over a single primary data set
  - Specify segmentations at different levels of granularity
  - A segmentation is “primary” vis-à-vis a given annotation, not the data itself
- Edges in a primary segmentation can be defined over any span of contiguous primary data, regardless of its length
  - No need for spans to be contiguous
- For text, the most common primary segmentation is the token
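A minimal sketch of a token-level primary segmentation over character-based primary data, assuming virtual vertices numbered 0..n between the characters. The whitespace tokenizer and the function names are hypothetical stand-ins chosen for illustration; any segmenter that yields disjoint spans would fit the description above.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # edge (x, y) between virtual vertices x and y

def vertices(text: str) -> List[int]:
    """Virtual vertices sit between logical units (here characters): 0..len(text)."""
    return list(range(len(text) + 1))

def token_segmentation(text: str) -> List[Span]:
    """Hypothetical whitespace tokenizer: one span per run of non-space characters."""
    spans, start = [], None
    for i, ch in enumerate(text):
        if ch.isspace():
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(text)))
    return spans

def is_disjoint(spans: List[Span]) -> bool:
    """Spans may leave gaps (they need not be contiguous) but must not overlap."""
    ordered = sorted(spans)
    return all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))

text = "The clock struck twenty-two "
spans = token_segmentation(text)
tokens = [text[x:y] for x, y in spans]
```

The primary data is only read, never modified: the segmentation is a list of vertex pairs kept apart from the text.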

Referring to Primary Segmentation
- Define an edge graph over the edges (spans) in the primary segmentation
  - Given an edge set, E, create an edge graph E’ such that for each edge (x,y) in E, there is a vertex xy in E’
- Annotations are associated with regions of primary data by referencing the edge graph vertices
  - Annotations never reference the primary data directly
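The construction of E’ from E can be sketched as follows. The identifiers "e1", "e2", ... are an assumed naming scheme (the slide only says that each edge (x,y) gets a vertex xy), and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]

def build_edge_graph(edges: List[Span]) -> Dict[str, Span]:
    """Create one vertex in E' for each edge (x, y) in E, keyed by an assumed ID."""
    return {f"e{i}": (x, y) for i, (x, y) in enumerate(edges, start=1)}

def region(text: str, edge_graph: Dict[str, Span], vertex_id: str) -> str:
    """Resolve an edge-graph vertex back to the region of primary data it delimits."""
    x, y = edge_graph[vertex_id]
    return text[x:y]

text = "The clock struck twenty-two "
E = [(0, 3), (4, 9), (10, 16), (17, 27)]  # token spans over the primary data
E_prime = build_edge_graph(E)

print(region(text, E_prime, "e2"))  # clock
```

Annotations in this sketch would hold only IDs such as "e2"; the character offsets stay inside the edge graph.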

- Edges in E’ are defined when annotations reference vertices in E’
  - Vertices may or may not be contiguous
- An annotation is associated with vertices in E’ as follows:
  1. Create a new vertex, v
  2. Label it with the FS containing the annotation content
  3. Create an edge from v to 0 or more vertices in E’
- Zero reference is used in the special case where the annotation applies to information not present in the data
- References to 2 or more vertices in E’ by default concatenate the information covered by the referenced vertices (in order)
  - Can be overridden to specify that the vertices are to be regarded as an ordered list or “bag”
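The three steps above can be sketched with a small data class. The class name, field names, vertex IDs, and the choice of a single space as the separator when concatenating regions are all assumptions for illustration; the feature structure is simplified to a flat dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

TEXT = "The clock struck twenty-two "
# Edge-graph vertices E' (hypothetical IDs) mapped to their spans (x, y).
E_PRIME: Dict[str, Tuple[int, int]] = {
    "e1": (0, 3), "e2": (4, 9), "e3": (10, 16), "e4": (17, 27),
}

@dataclass
class AnnotationNode:
    node_id: str                                        # step 1: the new vertex v
    fs: Dict[str, str]                                  # step 2: its FS label
    edges_to: List[str] = field(default_factory=list)   # step 3: unlabeled edges

def annotate(node_id: str, fs: Dict[str, str], targets: Sequence[str] = ()) -> AnnotationNode:
    """Create a vertex labeled with fs and edges to zero or more vertices in E'."""
    missing = [t for t in targets if t not in E_PRIME]
    if missing:
        raise KeyError(f"unknown edge-graph vertices: {missing}")
    return AnnotationNode(node_id, dict(fs), list(targets))

def covered_text(node: AnnotationNode, sep: str = " ") -> str:
    """Default reading: concatenate the referenced regions in the order given."""
    return sep.join(TEXT[x:y] for x, y in (E_PRIME[t] for t in node.edges_to))

token = annotate("t2", {"type": "token", "pos": "nn", "base": "clock"}, ["e2"])
chunk = annotate("c1", {"type": "chunk"}, ["e1", "e2"])  # two references
note = annotate("z1", {"type": "note"})                  # zero reference
```

A processor that wants the "ordered list" or "bag" reading would read `edges_to` directly instead of calling `covered_text`.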

Edge graph over primary data
Primary data: |T|h|e| |c|l|o|c|k| |s|t|r|u|c|k| |t|w|e|n|t|y|-|t|w|o| |
Annotations associated with vertices in the primary data edge graph:
- type=token pos=det base=the
- type=token pos=nn base=clock
- type=token pos=vbd base=strike
- type=token pos=cd base=twenty+two

As many annotations as desired can reference the same segmentation or be layered over lower-level annotations
[Figure: annotation layers over the primary data; labels: Primary data, SEG1, SEG2, MS1, MS2, MS3, NP, Syn1, Syn2, Co-Ref, Sem]

Annotating Annotations
- Vertices in an annotation may be referenced from other annotations
  1. Create a new vertex, v’
  2. Label it with the FS containing the annotation content
  3. Create an edge from v’ to one or more vertices associated with an annotation
- The strategy described above may be applied recursively, thus creating a DAG whose leaves are the vertices in E’
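A sketch of the recursive layering, assuming the same hypothetical IDs as the slides' example ("e1".."e4" for edge-graph vertices, "t1".."t4" for tokens) plus an invented "np1" node. Following edges downward from any annotation eventually reaches vertices in E’, which are the leaves of the DAG.

```python
from typing import Dict, List, Tuple

# Leaves of the DAG: vertices in the edge graph E' (hypothetical IDs).
E_PRIME = {"e1", "e2", "e3", "e4"}

# Annotation vertices: id -> (feature structure, outgoing edge targets).
NODES: Dict[str, Tuple[Dict[str, str], List[str]]] = {
    "t1": ({"type": "token", "pos": "det", "base": "the"}, ["e1"]),
    "t2": ({"type": "token", "pos": "nn", "base": "clock"}, ["e2"]),
    "t3": ({"type": "token", "pos": "vbd", "base": "strike"}, ["e3"]),
    "t4": ({"type": "token", "pos": "cd", "base": "twenty+two"}, ["e4"]),
    # An annotation over annotations: its edges point at token vertices.
    "np1": ({"type": "np", "number": "sing"}, ["t1", "t2"]),
}

def leaves(node_id: str) -> List[str]:
    """Follow edges recursively until edge-graph vertices are reached."""
    if node_id in E_PRIME:
        return [node_id]
    _, targets = NODES[node_id]
    found: List[str] = []
    for target in targets:
        found.extend(leaves(target))
    return found

np_leaves = leaves("np1")
```

Because the graph is acyclic, the recursion always terminates; a cycle check would be needed if that guarantee were not enforced elsewhere.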

Annotations associated with token annotations
Phrase annotations:
- type=np number=sing
- type=vp tense=past
- type=np number=sing
Token annotations:
- type=token pos=det base=the
- type=token pos=nn base=clock
- type=token pos=vbd base=strike
- type=token pos=cd base=twenty+two
Primary data: |T|h|e| |c|l|o|c|k| |s|t|r|u|c|k| |t|w|e|n|t|y|-|t|w|o| |

XML Instantiation

Token Annotation
- Creates a new vertex (node) associated with the FS, with a single edge to vertex “e2” in the primary segmentation edge graph

NP Annotation
- Creates a new vertex (node) associated with the FS, with two outgoing edges to vertices “t1” and “t2” in the token annotation
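The XML shown on these slides did not survive in the transcript, so the following is only a guess at its general shape, not the actual LAF/GrAF schema. The element names `node`, `fs`, and `f`, the `id`, `name`, and `value` attributes, and the space-separated encoding of targets are assumptions; only the `edgesTo` attribute name and the vertex IDs "e2", "t1", "t2" come from the slides. The sketch uses Python's standard `xml.etree.ElementTree` to build the two nodes described above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

def node_element(node_id: str, edges_to: List[str], features: Dict[str, str]) -> ET.Element:
    """Build a hypothetical <node> with an edgesTo attribute and a nested <fs>."""
    node = ET.Element("node", {"id": node_id, "edgesTo": " ".join(edges_to)})
    fs = ET.SubElement(node, "fs")
    for name, value in features.items():
        ET.SubElement(fs, "f", {"name": name, "value": value})
    return node

# Token annotation: a single edge to vertex "e2" in the edge graph.
token = node_element("t2", ["e2"], {"type": "token", "pos": "nn", "base": "clock"})

# NP annotation: two outgoing edges to token vertices "t1" and "t2".
np_node = node_element("np1", ["t1", "t2"], {"type": "np", "number": "sing"})

token_xml = ET.tostring(token, encoding="unicode")
np_xml = ET.tostring(np_node, encoding="unicode")
```

An empty `edges_to` list yields `edgesTo=""` in this sketch, which corresponds to the zero-reference case.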

Question ISO TC37 SC4 - WG 1 Beijing 2006
- When referring to annotations, edge targets typically represent components
  - E.g. in the example: “the” and “clock” are components of “NP”
- But this is not always the case
  - Could be e.g. a list of co-referents
  - Others?
- Possible solution: let the processor deal with it using the FS type

Note
- Edges are never labeled, unlike in many linguistic analyses
  - Preserves simplicity of the graph
  - Relations are DatCats
- edgesTo attribute can be empty
  - Can create pseudo-nodes
- Implies a flat (non-nested) structure in the dump format

[Figure: example graph; labels recoverable from the slide: obj, head, s, FLEA, HAVE, head, gen, subj, DOG, MY, [DOG]]

Advantages of DAG
- Can apply graph algorithms to traverse the graph
  - Breadth-first, depth-first traversal, shortest path, minimum spanning tree
  - Connectedness, articulation vertices
  - Topological sort
  - Graph coloring, graph partitioning
  - Etc.
- What can we do with this?
  - What is all info on path to/from node x
  - What is nearest common ancestor of nodes x and y
  - Find matching sub-graphs
  - Identify connected components
  - Which nodes (phenomena) are most connected, form articulation vertices, etc.
  - …
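Two of the questions above can be answered with a plain breadth-first traversal, sketched here over a small annotation graph. The node IDs and the phrase structure (which tokens "vp1" and "np2" cover) are assumptions made for the example; the slides do not give the exact attachments. Full nearest-common-ancestor selection is left out; the sketch only collects the set of common ancestors.

```python
from collections import deque
from typing import Dict, List, Set

# Adjacency list: vertex -> targets of its outgoing (unlabeled) edges.
EDGES_TO: Dict[str, List[str]] = {
    "np1": ["t1", "t2"],
    "vp1": ["t3", "np2"],
    "np2": ["t4"],
    "t1": ["e1"], "t2": ["e2"], "t3": ["e3"], "t4": ["e4"],
    "e1": [], "e2": [], "e3": [], "e4": [],
}

def reachable(start: str) -> List[str]:
    """Breadth-first traversal: every vertex on a path from start, in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in EDGES_TO[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

def ancestors(target: str) -> Set[str]:
    """Every vertex with a path to target (brute force; fine for small graphs)."""
    return {v for v in EDGES_TO if v != target and target in reachable(v)}

def common_ancestors(x: str, y: str) -> Set[str]:
    return ancestors(x) & ancestors(y)

below_np1 = reachable("np1")
shared = common_ancestors("t3", "t4")
```

For large graphs, a reversed adjacency list would avoid recomputing `reachable` for every vertex in `ancestors`.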

Feature Structures
- Each edge is labeled with a feature value
  - Can be FS, collection (list, bag, set), atom
- Alternation and grouping handled by the FS mechanisms
- Need to identify “basic” FS mechanisms
  - 90% of annotations use only these
  - Annotations may (optionally) use only this set
    - Ease of use
    - No need to implement procedures to handle full power of FS
- Need to create a FS library for abbreviation

Implications for Other WGs
- Should (conceptually at least) separate referential structure from annotation content
  - E.g. “tlink” in TimeML/SemAF: the link itself is the edge, “tlink” is the annotation content (?)
- Need for coordination
  - Inter-project coordination committee?
- Need examples!

Today’s Work
- Discuss the format in terms of specific annotation types
  - Remember that dump format is in principle never seen by the user
  - Map user format into and out of dump format
- Two topics
  - DAG for referential structure
  - FS for representing annotation content