BRAT: a web based tool for manual annotation Hans Paulussen ITEC, KU Leuven KULAK.

Overview
  Annotation
  BRAT
  LCF (Learner corpus French)
  Alternative editors
  Conclusion

Annotation

Annotation = metadata:
    o data on data
  Editing textual data and editing multimedia data require different approaches: stand-off vs. inline markup
  Typical multimedia editors: ELAN & ADVENE

Stand-off vs inline annotation
  Inline:
    o Data and metadata (annotation or markup) are intermingled
  Stand-off:
    o Metadata is stored in a separate document, using reference anchors
    o Alignment: based on token or character offsets
    o Primary data is left untouched

Inline
  John went to Paris yesterday. He loved the excursion.
  John_NNP went_VBD to_TO Paris_NNP yesterday_NN ._. He_PRP loved_VBD the_DT excursion_NN ._.

Stand-off
  John went to Paris yesterday. He loved the excursion.
  1 4 NNP
  6 9 VBD
  TO NNP NN PRP VBD DT NN [offsets for these tokens not preserved in the transcript]
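The relation between the two examples can be sketched in Python. This is only an illustration, not how any particular tool does it: it assumes the offsets on the slide ("1 4" for John, "6 9" for went) are 1-based, inclusive character positions, and that tagged tokens are separated by spaces. The names parse_inline and to_standoff are made up for this sketch.

```python
from typing import List, Tuple

TEXT = "John went to Paris yesterday. He loved the excursion."
# Inline version; a space before "._." is assumed so every token splits cleanly.
TAGGED = ("John_NNP went_VBD to_TO Paris_NNP yesterday_NN ._. "
          "He_PRP loved_VBD the_DT excursion_NN ._.")


def parse_inline(tagged: str) -> List[Tuple[str, str]]:
    """Split 'token_TAG' items into (token, tag) pairs."""
    pairs = []
    for item in tagged.split():
        token, _, tag = item.rpartition("_")  # split on the last underscore
        pairs.append((token, tag))
    return pairs


def to_standoff(text: str, pairs: List[Tuple[str, str]]) -> List[Tuple[int, int, str]]:
    """Align tokens with the untouched primary text and return
    (start, end, tag) triples with 1-based, inclusive offsets (assumed convention)."""
    annotations = []
    cursor = 0  # 0-based position where the search for the next token starts
    for token, tag in pairs:
        start = text.index(token, cursor)  # raises ValueError if the token is absent
        end = start + len(token)           # 0-based, exclusive
        annotations.append((start + 1, end, tag))
        cursor = end
    return annotations


pairs = parse_inline(TAGGED)
annotations = to_standoff(TEXT, pairs)
for start, end, tag in annotations:
    print(start, end, tag)  # first two lines: "1 4 NNP" and "6 9 VBD"
```

The primary text is never modified: each annotation only points into it, so text[start - 1:end] gives back the annotated token.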

Stand-off

BRAT

BRAT rapid annotation tool: online environment for collaborative text annotation

Motivation
  Web-based environment
  Multi-user
  Easy to install & configure
  “Comprehensive” visualization
  Well-documented

LCF Learner corpus French

LCF
  LCF: Learner corpus French
  French texts written by Dutch students from 4 Flemish institutions
  500K words (971 texts)
  Text types: argumentative, informative, journalistic, letter, self-portrait, summary

LCF

Configuring BRAT
  Corpus preparation: conversion of XML to read-only text format
  Create annotation configuration file
  Set up user accounts
  Create export filter to summarize annotated features
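The last step, an export filter that summarizes annotated features, could look roughly like the Python sketch below. It is not the filter used for LCF. It assumes annotations are stored in brat's standoff .ann files, where a text-bound annotation line has the form ID, TAB, "Type start end", TAB, covered text. The labels SpellingError and GrammarError and the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_annotation_types(ann_lines: Iterable[str]) -> Counter:
    """Count text-bound annotations (IDs starting with 'T') per type label."""
    counts: Counter = Counter()
    for line in ann_lines:
        line = line.rstrip("\n")
        if not line.startswith("T"):
            continue  # skip relations, attributes, notes and blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # malformed line: no "Type start end" field
        label = fields[1].split(" ", 1)[0]  # first word of the second field
        counts[label] += 1
    return counts


def summarize_directory(directory: str) -> Counter:
    """Add up the counts of all .ann files found in one directory."""
    total: Counter = Counter()
    for path in sorted(Path(directory).glob("*.ann")):
        with path.open(encoding="utf-8") as handle:
            total.update(count_annotation_types(handle))
    return total


# Hypothetical sample lines, for illustration only.
sample = [
    "T1\tSpellingError 0 5\tHello",
    "T2\tGrammarError 10 14\twent",
    "T3\tSpellingError 20 25\tParis",
    "A1\tSeverity T1 high",
    "#1\tAnnotatorNotes T2\tcheck agreement",
]
summary = count_annotation_types(sample)
for label, count in sorted(summary.items()):
    print(label, count)
```

Calling summarize_directory on a folder of annotated texts would give one total per label, which can then be written to a spreadsheet or report.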

LCF

Alternative editors

Alternative annotation editors /1
  MAT (MITRE Annotation Toolkit): a suite of tools which can be used for automated and human tagging of annotations
  TEITOK (The Tokenized TEI Environment): a web-based system for viewing, creating, and editing corpora with both rich textual mark-up and linguistic annotation
  EGAS: a web-based platform for biomedical text mining and collaborative curation, supporting manual and automatic annotation of concepts and relations

Alternative annotation editors /2
  TextAE: web-based (RESTful) annotation editor for HTML documents
  WebAnno: a general purpose web-based annotation tool for a wide range of linguistic annotations

WebAnno workflow

WebAnno pros and cons
  First impressions (from colleagues):
    o Improved project and user management
    o Browser ‘sensitive’ behaviour
    o Accepts larger texts than BRAT
    o Data management only possible when files are closed

Conclusion
  Annotation editors for textual data have improved considerably, mainly because of the standardisation of data formats (XML) and web technology (HTML5)
  The choice of editor depends mainly on the user-friendliness of the tool and the quality of its features for further exploitation