BRAT: a web-based tool for manual annotation
Hans Paulussen
ITEC, KU Leuven KULAK

Overview
- Annotation
- BRAT
- LCF (Learner corpus French)
- Alternative editors
- Conclusion
Annotation
Annotation
- Annotation = metadata: data on data
- Editing textual data and editing multimedia data require different approaches: stand-off vs. inline markup
- Typical multimedia editors: ELAN and ADVENE

Stand-off vs inline annotation
- Inline:
  - Data and metadata (annotation or markup) are intermingled
- Stand-off:
  - Metadata is stored in a separate document, using reference anchors
  - Alignment: based on token or character offsets
  - Primary data is left untouched
Inline
John went to Paris yesterday. He loved the excursion.
John_NNP went_VBD to_TO Paris_NNP yesterday_NN._. He_PRP loved_VBD the_DT excursion_NN._.
Stand-off
John went to Paris yesterday. He loved the excursion.
1 4 NNP
6 9 VBD
TO NNP NN PRP VBD DT NN
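The relation between the two representations can be sketched in a few lines of Python. This is only an illustration, not part of BRAT: the function name `inline_to_standoff` is hypothetical, the input is assumed to be well-formed, whitespace-separated `token_TAG` items, and offsets are counted 1-based and inclusive so that they agree with the `1 4 NNP` and `6 9 VBD` entries on the slide.

```python
import re

# Inline-tagged input, with punctuation written as separate "._." items (an assumption).
TAGGED = ("John_NNP went_VBD to_TO Paris_NNP yesterday_NN ._. "
          "He_PRP loved_VBD the_DT excursion_NN ._.")


def inline_to_standoff(tagged):
    """Return (plain_text, records) where records are (start, end, tag) tuples.

    Offsets are 1-based and inclusive. The plain text is rebuilt from the
    tokens, so the primary data contains no markup at all.
    """
    pieces = []
    records = []
    position = 0  # number of characters emitted so far
    for item in tagged.split():
        # Split on the last underscore: "._." gives token "." and tag ".".
        token, _, tag = item.rpartition("_")
        # Punctuation attaches to the previous token; other tokens get a space.
        if pieces and not re.fullmatch(r"[.,;:!?]", token):
            pieces.append(" ")
            position += 1
        start = position + 1
        pieces.append(token)
        position += len(token)
        records.append((start, position, tag))
    return "".join(pieces), records


text, records = inline_to_standoff(TAGGED)
for start, end, tag in records:
    # Each record points back into the untouched text through its offsets.
    print(start, end, tag, text[start - 1:end])
```

Because the annotations only refer to the text through offsets, several annotation layers can be stored next to the same primary data without modifying it.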
BRAT
BRAT rapid annotation tool: online environment for collaborative text annotation

Motivation
- Web-based environment
- Multi-user
- Easy to install & configure
- “Comprehensive” visualization
- Well-documented
LCF Learner corpus French
LCF
- LCF: Learner corpus French
- French texts written by Dutch students from 4 Flemish institutions
- 500K words (971 texts)
- Text types: argumentative, informative, journalistic, letter, self-portrait, summary
Configuring BRAT
- Corpus preparation: conversion of XML to read-only text format
- Create annotation configuration file
- Set up user accounts
- Create export filter to summarize annotated features
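The last step, an export filter, could look roughly like the sketch below. It relies on BRAT's stand-off `.ann` format, in which a text-bound annotation is a tab-separated line with an ID starting with `T`, then the type and character offsets, then the covered text. The function name `summarize_ann` and the labels `GrammarError` and `LexicalError` are hypothetical placeholders; the actual LCF annotation scheme is not shown in these slides.

```python
from collections import Counter

# Hypothetical sample in BRAT stand-off format (labels and offsets are made up).
SAMPLE_ANN = (
    "T1\tGrammarError 0 4\tJean\n"
    "T2\tLexicalError 8 13\tParis\n"
    "T3\tGrammarError 20 24\tbien\n"
    "#1\tAnnotatorNotes T1\tagreement\n"
)


def summarize_ann(ann_text):
    """Count text-bound annotations (IDs starting with 'T') per type."""
    counts = Counter()
    for line in ann_text.splitlines():
        fields = line.split("\t")
        # Skip blank lines and non-text-bound entries (relations, attributes, notes).
        if len(fields) < 2 or not fields[0].startswith("T"):
            continue
        # The second field is "Type start end"; keep only the type.
        label = fields[1].split(" ", 1)[0]
        counts[label] += 1
    return counts


summary = summarize_ann(SAMPLE_ANN)
for label, count in summary.most_common():
    print(label, count)
```

In practice the same function would be applied to every `.ann` file in the corpus directory and the counters summed per text or per student.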
Alternative editors
Alternative annotation editors /1
- MAT (MITRE Annotation Toolkit): a suite of tools which can be used for automated and human tagging of annotations
- TEITOK (The Tokenized TEI Environment): a web-based system for viewing, creating, and editing corpora with both rich textual mark-up and linguistic annotation
- EGAS: a web-based platform for biomedical text mining and collaborative curation, supporting manual and automatic annotation of concepts and relations

Alternative annotation editors /2
- TextAE: web-based (RESTful) annotation editor for HTML documents
- WebAnno: a general purpose web-based annotation tool for a wide range of linguistic annotations
WebAnno workflow
WebAnno pros and cons
- First impressions (from colleagues):
  - Improved project and user management
  - Browser ‘sensitive’ behaviour
  - Accepts larger texts than BRAT
  - Data management only possible when files are closed

Conclusion
- Annotation editors for textual data have improved considerably, mainly because of the standardisation of data formats (XML) and web technology (HTML5)
- Selection of an editor depends mainly on the user-friendliness of the tool and the quality of the features for further exploitation