Published by Hannah Arnold. Modified over 8 years ago.
1
BRAT: a web-based tool for manual annotation
Hans Paulussen
ITEC, KU Leuven KULAK
2
Overview
Annotation
BRAT
LCF (Learner corpus French)
Alternative editors
Conclusion
3
Annotation
4
Annotation = metadata:
  o data on data
Editing textual data or multimedia data requires a different approach: stand-off vs. inline markup
Typical multimedia editors: ELAN & ADVENE
  o https://tla.mpi.nl/tools/tla-tools/elan/
  o http://liris.cnrs.fr/advene/
5
Stand-off vs inline annotation
Inline:
  o Data and metadata (annotation or markup) are intermingled
Stand-off:
  o Metadata is stored in a separate document, using reference anchors
  o Alignment: based on token or character offsets
  o Primary data is left untouched
6
Inline
John went to Paris yesterday. He loved the excursion.
John_NNP went_VBD to_TO Paris_NNP yesterday_NN ._. He_PRP loved_VBD the_DT excursion_NN ._.
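To make the inline case concrete, here is a minimal Python sketch that splits the tagged sentence from this slide back into (token, tag) pairs. It assumes every item has the form word_TAG, that items are separated by whitespace, and that punctuation is written as its own item (`._.`); the function name `parse_inline` is hypothetical.

```python
def parse_inline(tagged: str) -> list[tuple[str, str]]:
    """Split whitespace-separated word_TAG items into (token, tag) pairs."""
    pairs = []
    for item in tagged.split():
        # Split on the LAST underscore, so tokens containing "_" survive.
        token, _, tag = item.rpartition("_")
        pairs.append((token, tag))
    return pairs


tagged = (
    "John_NNP went_VBD to_TO Paris_NNP yesterday_NN ._. "
    "He_PRP loved_VBD the_DT excursion_NN ._."
)
pairs = parse_inline(tagged)
for token, tag in pairs:
    print(f"{token}\t{tag}")
```

Because data and metadata are intermingled, the original sentence can only be recovered by stripping the tags and guessing the original spacing.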
7
Stand-off
12345678901234567890123456789012345678901234567890123
         1         2         3         4         5
John went to Paris yesterday. He loved the excursion.

start  end  tag
1      4    NNP
6      9    VBD
11     12   TO
14     18   NNP
20     28   NN
29     29   .
31     32   PRP
34     38   VBD
40     42   DT
44     52   NN
53     53   .
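The offsets on this slide are 1-based and inclusive. As a minimal sketch (the names `TEXT`, `ANNOTATIONS` and `resolve` are hypothetical), the stand-off records can be resolved against the untouched primary text like this in Python:

```python
TEXT = "John went to Paris yesterday. He loved the excursion."

# (start, end, tag) records copied from the slide: 1-based, inclusive offsets.
ANNOTATIONS = [
    (1, 4, "NNP"), (6, 9, "VBD"), (11, 12, "TO"), (14, 18, "NNP"),
    (20, 28, "NN"), (29, 29, "."), (31, 32, "PRP"), (34, 38, "VBD"),
    (40, 42, "DT"), (44, 52, "NN"), (53, 53, "."),
]


def resolve(text, annotations):
    """Return (token, tag) pairs by slicing the primary text.

    Python slices are 0-based with an exclusive end, so a 1-based
    inclusive span (start, end) maps to text[start - 1:end].
    """
    return [(text[start - 1:end], tag) for start, end, tag in annotations]


resolved = resolve(TEXT, ANNOTATIONS)
for token, tag in resolved:
    print(f"{token}\t{tag}")
```

The primary text is never modified; a second annotation layer could reuse the same offsets without touching the first.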
8
Stand-off
9
BRAT
10
BRAT
rapid annotation tool: online environment for collaborative text annotation
  o http://brat.nlplab.org/
11
Motivation
Web-based environment
Multi-user
Easy to install & configure
“Comprehensive” visualization
Well-documented
12
LCF Learner corpus French
13
LCF
LCF: Learner corpus French
French texts written by Dutch students from 4 Flemish institutions
500K words (971 texts)
Text types: argumentative, informative, journalistic, letter, self-portrait, summary
14
LCF
16
Configuring BRAT
Corpus preparation: conversion of XML to read-only text format
Create annotation configuration file
Set up user accounts
Create export filter to summarize annotated features
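The slides do not show the export filter itself. As a rough Python sketch of what such a filter could look like, the snippet below counts annotation labels in the content of a BRAT stand-off `.ann` file. It assumes the usual layout of text-bound annotations (lines whose ID starts with `T`, with tab-separated fields: ID, "Label start end", covered text). The labels `Spelling`, `Agreement` and `Severity`, the sample lines, and the function name `summarize_ann` are invented for illustration and are not taken from the LCF project.

```python
from collections import Counter


def summarize_ann(ann_text: str) -> Counter:
    """Count text-bound annotations per label in .ann file content."""
    counts = Counter()
    for line in ann_text.splitlines():
        # Text-bound annotations have IDs such as T1, T2, ...;
        # other lines (attributes, relations, notes) are skipped here.
        if not line.startswith("T"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        # Second field looks like "Label start end"; keep only the label.
        label = fields[1].split(" ", 1)[0]
        counts[label] += 1
    return counts


# Hypothetical sample content, for illustration only.
SAMPLE_ANN = (
    "T1\tSpelling 0 4\tJonh\n"
    "T2\tAgreement 10 16\tallées\n"
    "T3\tSpelling 25 31\tparceq\n"
    "A1\tSeverity T1 minor\n"
)

summary = summarize_ann(SAMPLE_ANN)
for label, n in summary.most_common():
    print(f"{label}\t{n}")
```

In practice the same function could be applied to each `.ann` file of a collection and the counters summed.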
17
LCF
18
Alternative editors
19
Alternative annotation editors /1
MAT (MITRE Annotation Toolkit): a suite of tools which can be used for automated and human tagging of annotations.
  o http://mat-annotation.sf.net
TEITOK (The Tokenized TEI Environment): a web-based system for viewing, creating, and editing corpora with both rich textual mark-up and linguistic annotation
  o http://alfclul.clul.ul.pt/teitok
EGAS: a web-based platform for biomedical text mining and collaborative curation, supporting manual and automatic annotation of concepts and relations.
  o https://demo.bmd-software.com/egas/
20
Alternative annotation editors /2
TextAE: web-based (RESTful) annotation editor for HTML documents
  o http://textae.pubannotation.org/
WebAnno: a general purpose web-based annotation tool for a wide range of linguistic annotations
  o https://code.google.com/p/webanno/
21
WebAnno workflow
22
WebAnno pros and cons
First impressions (from colleagues):
  o Improved project and user management
  o Browser ‘sensitive’ behaviour
  o Accepts larger texts than BRAT
  o Data management only possible when files are closed
23
Conclusion
Annotation editors for textual data have improved considerably, mainly because of the standardisation of data formats (XML) and web technology (HTML5)
The choice of editor depends mainly on the user-friendliness of the tool and the quality of its features for further exploitation