Database „Multilingualism“ – Perspectives for collaborative corpus construction and collaborative commentary Thomas Schmidt Sonderforschungsbereich 538.

Presentation transcript:

Database „Multilingualism“ – Perspectives for collaborative corpus construction and collaborative commentary
Thomas Schmidt
Sonderforschungsbereich 538 Mehrsprachigkeit, University of Hamburg
LREC-Conference Panel „Collaborative Commentary“, Lisbon, 27 May 2004

Background
SFB „Multilingualism“, University of Hamburg
- 13 projects organized in 3 groups (Multilingual acquisition / Multilingual communication / Historical multilingualism)
- Empirical work: corpora of written texts and corpora of transcribed recordings (video / audio), all computerized
- Roughly 2000 transcripts / 1000 hours of transcribed speech
- “Raison d'être”: Collaboration (!)

Background
Diversity of transcription data
- Research background: Generative Grammar / Discourse Analysis / Phonetic Research
- Transcription systems: HIAT / IPA / ...
- Presentation formats: Score notation / Line notation / Column notation
- Writing systems: Latin, Greek, Cyrillic, Japanese
- Transcription software: syncWriter / WordBase / HIAT-DOS / Lapsus
- Operating systems: Windows / Macintosh / Linux
- Interrelatedness of these dimensions

Data exchange
Problems:
- Use project A's data with project B's operating system?
- Use project B's tools with project C's data?
- Use project C's transcription system with project B's tools?
- Exchange corpora?
- Build larger corpora from existing ones?
- Build a common tool for all projects' data?
- Collaborative commentary?

Data exchange
Vision: A framework for computer transcription
- Let software and formats operate on a common conception of transcription data
- Make transcription systems a parameter rather than a principle for data models
- Use standard technologies (Java, XML, Unicode) to achieve “platform independence”
  - Use one tool with different transcription systems
  - Use different tools with one data format
  - Use one tool on different operating systems
  - Facilitate collaboration in corpus construction and analysis
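The idea of making the transcription system "a parameter rather than a principle" can be illustrated with a minimal sketch. Everything named here (`TranscriptionSystem`, `segment`, and the two conventions `DOT_STYLE` and `PAREN_STYLE`) is invented for illustration; the markers are not the actual HIAT, GAT, or CHAT conventions, and this is not EXMARaLDA code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionSystem:
    """Convention-specific rules, handed to a tool as a parameter."""
    name: str
    pause_pattern: str       # regex matching a pause marker
    nonverbal_pattern: str   # regex matching a non-verbal description


# Two hypothetical conventions, made up for this sketch.
DOT_STYLE = TranscriptionSystem("dot-style", r"\.\.+", r"\[\[.*?\]\]")
PAREN_STYLE = TranscriptionSystem("paren-style", r"\(-+\)", r"<<.*?>>")


def segment(text: str, system: TranscriptionSystem) -> list[tuple[str, str]]:
    """Split an utterance into (kind, content) units using the given system."""
    combined = re.compile(
        f"(?P<pause>{system.pause_pattern})"
        f"|(?P<nonverbal>{system.nonverbal_pattern})"
    )
    units = []
    position = 0
    for match in combined.finditer(text):
        # Everything between two markers is ordinary transcribed speech.
        words = text[position:match.start()].strip()
        if words:
            units.append(("words", words))
        units.append((match.lastgroup, match.group()))
        position = match.end()
    tail = text[position:].strip()
    if tail:
        units.append(("words", tail))
    return units


if __name__ == "__main__":
    print(segment("well .. I think [[laughs]] yes", DOT_STYLE))
    print(segment("well (--) I think <<laughs>> yes", PAREN_STYLE))
```

The same `segment` function handles both conventions and yields the same sequence of unit kinds; only the parameter object changes.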

Database „Multilingualism“ System architecture

EXMARaLDA
- Separate model and visualization / three-level architecture
- Describe models as Directed Acyclic Graphs
  - Time reference of all transcription entities (Annotation Graphs)
- Calculate visualization(s) from model
- Store as XML files
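A minimal sketch of these four points, under stated assumptions: the element names (`transcription`, `timeline`, `point`, `tier`, `event`), the sample tiers, and the `CONTINUED` marker are hypothetical and do not reproduce the actual EXMARaLDA file format. Timeline points act as graph nodes, events as edges that always run forward in time, and a score-style layout is computed from the model rather than stored in it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    tier: str    # tier the event belongs to (hypothetical names below)
    start: str   # id of the timeline point where the event begins
    end: str     # id of the timeline point where the event ends
    text: str


# Ordered timeline points: the nodes of the graph.
TIMELINE = ["T0", "T1", "T2", "T3"]

# Events are edges between timeline points.
EVENTS = [
    Event("A-verbal", "T0", "T2", "Good morning."),
    Event("B-verbal", "T1", "T3", "Hello there."),
    Event("A-english", "T0", "T2", "(greeting)"),
]

CONTINUED = "~"  # marks an interval still covered by the preceding event


def validate(timeline, events):
    """Every edge must run forward in time, which keeps the graph acyclic."""
    order = {point: index for index, point in enumerate(timeline)}
    return all(order[e.start] < order[e.end] for e in events)


def to_xml(timeline, events):
    """Store the model (not the layout) as XML."""
    root = ET.Element("transcription")
    timeline_element = ET.SubElement(root, "timeline")
    for point in timeline:
        ET.SubElement(timeline_element, "point", id=point)
    tiers = {}
    for e in events:
        if e.tier not in tiers:
            tiers[e.tier] = ET.SubElement(root, "tier", id=e.tier)
        event_element = ET.SubElement(tiers[e.tier], "event", start=e.start, end=e.end)
        event_element.text = e.text
    return ET.tostring(root, encoding="unicode")


def to_score(timeline, events):
    """Calculate a score layout: one row per tier, one column per interval."""
    order = {point: index for index, point in enumerate(timeline)}
    rows = {}
    for e in events:
        row = rows.setdefault(e.tier, [None] * (len(timeline) - 1))
        first, last = order[e.start], order[e.end]
        row[first] = e.text
        for column in range(first + 1, last):
            row[column] = CONTINUED
    return rows


if __name__ == "__main__":
    assert validate(TIMELINE, EVENTS)
    print(to_xml(TIMELINE, EVENTS))
    for tier, cells in to_score(TIMELINE, EVENTS).items():
        print(tier, cells)
```

Because the layout is derived, a line or column notation could be computed from the same model by adding another function next to `to_score`.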

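[Diagram: EXMARaLDA in relation to transcription systems, software and data formats, and operating systems]
- Transcription systems: HIAT, GAT, DIDA, IPA, CHAT
- Software and data formats: EXMARaLDA Partitur-Editor, TASX Annotator, Praat, ELAN, Collaborative Commentary Tools
- Operating systems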

Collaborative Commentary I: Collaborative transcription and annotation
1. Transcription
2. Transcription control
3. Utterance translation
4. Translation control
5. Morphological transliteration
6. Transliteration control
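The six steps alternate between a work step and a control step. One way to read this is as a small workflow in which a stage only advances once its control step approves it. The sketch below assumes exactly that; the class `WorkItem`, its methods, and the rule that a rejected stage is redone are assumptions, not something the slide specifies.

```python
from dataclasses import dataclass, field

# The three work stages from the list above; each is followed by a control step.
STAGES = ("transcription", "utterance translation", "morphological transliteration")


@dataclass
class WorkItem:
    name: str
    stage: int = 0            # index into STAGES
    submitted: bool = False   # has the current stage been worked on?
    history: list = field(default_factory=list)

    def submit(self, author: str) -> None:
        """Record that someone has completed the current work stage."""
        if self.stage >= len(STAGES):
            raise ValueError("all stages are already complete")
        self.submitted = True
        self.history.append((STAGES[self.stage], "submitted", author))

    def control(self, reviewer: str, approved: bool) -> None:
        """Run the control step; only an approval moves on to the next stage."""
        if not self.submitted:
            raise ValueError("nothing to control: current stage not submitted")
        verdict = "approved" if approved else "rejected"
        self.history.append((STAGES[self.stage], verdict, reviewer))
        self.submitted = False
        if approved:
            self.stage += 1

    @property
    def done(self) -> bool:
        return self.stage == len(STAGES)


if __name__ == "__main__":
    item = WorkItem("recording-01")    # hypothetical transcript name
    item.submit("transcriber")
    item.control("checker", approved=False)   # stays at stage 0
    item.submit("transcriber")
    item.control("checker", approved=True)    # moves to utterance translation
    print(STAGES[item.stage], item.history)
```

The `history` list keeps a record of who did and who checked each stage, which is the kind of information collaborators would need when work is split across people.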

Collaborative Commentary II: Collaborative analysis
- Negotiate categorizations / interpretations
- HTML example

Collaborative Commentary III: Collaborative publication
- Negotiate transcription conventions
- Get user feedback

Collaborative Commentary: Technology
[Diagram labels: Model (time-based, XML files); Visualisation(s) (HTML documents); EXMARaLDA data model; ProjectPad data model; ProjectPad; Annozilla]
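The step from a time-based XML model to an HTML visualisation might look roughly like the sketch below. The XML vocabulary, the `id` attributes on events, and the idea that a commentary tool could address a table cell through that `id` are all assumptions made for this example; they are not taken from EXMARaLDA, ProjectPad, or Annozilla. The sketch also assumes that events within one tier do not overlap.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical model file content; element and attribute names are invented.
MODEL_XML = """
<transcription>
  <timeline><point id="T0"/><point id="T1"/><point id="T2"/></timeline>
  <tier id="A-verbal">
    <event id="e1" start="T0" end="T2">Good morning.</event>
  </tier>
  <tier id="B-verbal">
    <event id="e2" start="T1" end="T2">Hello &amp; welcome.</event>
  </tier>
</transcription>
"""


def model_to_html(model_xml: str) -> str:
    """Render the model as an HTML table: one row per tier, columns by time."""
    root = ET.fromstring(model_xml)
    points = [point.get("id") for point in root.find("timeline")]
    column = {point: index for index, point in enumerate(points)}
    intervals = len(points) - 1
    rows = []
    for tier in root.findall("tier"):
        cells = []
        cursor = 0  # first interval not yet covered by a cell
        events = sorted(tier.findall("event"), key=lambda e: column[e.get("start")])
        for event in events:
            start, end = column[event.get("start")], column[event.get("end")]
            if start > cursor:
                cells.append(f'<td colspan="{start - cursor}"></td>')
            # The id makes the cell addressable, e.g. for an attached comment.
            cells.append(
                f'<td id="{escape(event.get("id"))}" colspan="{end - start}">'
                f'{escape(event.text or "")}</td>'
            )
            cursor = end
        if cursor < intervals:
            cells.append(f'<td colspan="{intervals - cursor}"></td>')
        rows.append(f'<tr><th>{escape(tier.get("id"))}</th>{"".join(cells)}</tr>')
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(model_to_html(MODEL_XML))
```

Keeping stable identifiers from the model in the generated HTML means a comment attached to the visualisation can be traced back to the underlying event.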

Summary
- Research on multilingualism is a “market” for collaborative commentary:
  - collaborative transcription and annotation
  - collaborative analysis
  - collaborative publication
- A common framework for computerized transcription data
  - use different tools on (different flavors of) the same data structure
  - Collaborative commentary can simply be the task of one of those tools
- Time-based data models and tools like ProjectPad seem to go with one another