IRCS Workshop on Linguistic Databases, 11-13 December 2001
EXMARaLDA
Thomas Schmidt, SFB 538 „Mehrsprachigkeit“ (Multilingualism), University of Hamburg

Data Formats and Tools at the SFB
≈ 2200 transcriptions of spoken language (30 min recording each)
Data types: language acquisition data, interviews, expert discourse, classroom discourse, presentation discourse, interpreted discourse
Languages: German, English, Swedish, Norwegian, Danish, French, Spanish, Portuguese, Turkish, Italian, Basque, Japanese, Chinese, Russian, Luganda
9 different data formats (dBase, syncWriter, HIAT-DOS, Verbmobil, ...)
3 different operating systems (Mac OS 9.x, Windows, Linux) + Mac OS X
Research interests: phonetics, syntax, discourse, ...

Data Formats and Tools at the SFB
syncWriter: editor for interlinear text; runs on Mac OS 9.x and earlier; outputs binary data

Data Formats and Tools at the SFB
HIAT-DOS: editor for HIAT transcription; runs on MS-DOS/Windows; outputs text files

Data Formats and Tools at the SFB
dBase/Access/4th Dimension: utterance databases

Data Formats and Tools at the SFB
Verbmobil: 7-bit ASCII files

Database “Multilingualism”
Goals:
1. To have one common tool for accessing (querying) the data
   → Data must come in one format (AG)
   → Multilingual issues must be taken care of (Unicode)
   → Data format should be software-independent (XML)
   → Software should work across different operating systems (Java)
2. To have different tools reflecting the habits and needs of the different projects
   → Different input methods (score, column, vertical notation)
   → Different output methods (ditto)
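
Goal 1 asks for a single, software-independent format that copes with multilingual text. The following is a minimal sketch of that idea, not the EXMARaLDA format itself: it is written in Python rather than the Java named on the slide, the element and attribute names (`transcription`, `event`, `speaker`, `start`, `end`) are hypothetical, and the last sample row is illustrative data added only to exercise non-ASCII characters.

```python
import xml.etree.ElementTree as ET


def events_to_xml(events):
    """Serialize (speaker, start, end, text) tuples to UTF-8 encoded XML bytes.

    Element and attribute names are hypothetical, chosen for illustration.
    """
    root = ET.Element("transcription")
    for speaker, start, end, text in events:
        event = ET.SubElement(
            root, "event", speaker=speaker, start=str(start), end=str(end)
        )
        event.text = text
    return ET.tostring(root, encoding="utf-8")


def xml_to_events(data):
    """Parse the bytes produced by events_to_xml back into tuples."""
    root = ET.fromstring(data)
    return [
        (e.get("speaker"), int(e.get("start")), int(e.get("end")), e.text)
        for e in root.iter("event")
    ]


SAMPLE = [
    ("MAX", 0, 2, "You keep interrupting me, Tom."),
    ("MAX", 0, 2, "Immer unterbrichst Du mich, Tom."),
    ("TOM", 1, 3, "Oh, das tut mir Leid."),
    # Illustrative non-ASCII content (language names), not from the slides.
    ("DEMO", 0, 1, "日本語 Türkçe Русский"),
]

if __name__ == "__main__":
    print(events_to_xml(SAMPLE).decode("utf-8"))
```

Because the text is stored as Unicode and written as UTF-8, the same file can be read back unchanged on any platform with an XML parser, which is the point of the XML and Unicode requirements above.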

Database “Multilingualism”
Existing formats: SyncWriter, HIAT-DOS, Verbmobil, ACCESS / dBase
?
SQL database

Database “Multilingualism”
Existing formats: SyncWriter, HIAT-DOS, Verbmobil, ACCESS / dBase
EXMARaLDA: Basic Transcription, Segmented Transcription, List Transcription
Input / Editing Tools
Output / Visualization Tools
SQL database

“Traditional” layout principles
Score notation (“Partitur”)
Timeline: 0 1 2 3
MAX [v]: You keep interrupting me, Tom
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm sorry for that
TOM [nv]: smiling
Labelled elements: tiers, speakers, categories, timeline, events

“Traditional” layout principles
1. Score notation (“Partitur”) → Basic Transcription
Elements: tiers, speakers, categories, timeline, events
Example: You keep interrupting me, Tom. / pointing at Tom
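
The five elements named on this slide (tiers, speakers, categories, timeline, events) can be sketched as a small data structure. This is an illustration under assumptions, not the EXMARaLDA data model: the class and method names are hypothetical, and the timeline positions of the two non-verbal events are guesses, since the extracted slide does not preserve their alignment. Only the overlap of "me, Tom." with "Oh, I'm" is taken from the slides (the bracketed stretch in the vertical notation).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    start: int  # timeline point where the event begins
    end: int    # timeline point where the event ends
    text: str


@dataclass
class Tier:
    speaker: str
    category: str  # "v" = verbal, "nv" = non-verbal
    events: list = field(default_factory=list)


@dataclass
class BasicTranscription:
    timeline: list
    tiers: list

    def events_at(self, point):
        """Return (speaker, category, text) for every event that covers
        the timeline interval starting at `point`."""
        return [
            (tier.speaker, tier.category, event.text)
            for tier in self.tiers
            for event in tier.events
            if event.start <= point < event.end
        ]


transcription = BasicTranscription(
    timeline=[0, 1, 2, 3],
    tiers=[
        Tier("MAX", "v", [Event(0, 1, "You keep interrupting"),
                          Event(1, 2, "me, Tom.")]),
        # Assumed position: the slide text does not show where this aligns.
        Tier("MAX", "nv", [Event(1, 2, "pointing at Tom")]),
        Tier("TOM", "v", [Event(1, 2, "Oh, I'm"),
                          Event(2, 3, "sorry for that.")]),
        # Assumed position, as above.
        Tier("TOM", "nv", [Event(2, 3, "smiling")]),
    ],
)

if __name__ == "__main__":
    for point in transcription.timeline[:-1]:
        print(point, transcription.events_at(point))
```

Keeping events anchored to shared timeline points, rather than to a page position, is what lets the same data be laid out as a score, in columns, or vertically.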

“Traditional” layout principles
2. Column notation → Basic Transcription
MAX [v]: You keep interrupting me, Tom.
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm sorry for that.
TOM [nv]: smiling
Elements: tiers, speakers, categories, timeline, events

“Traditional” layout principles
3. Vertical notation
MAX: You keep interrupting [me, Tom.] (pointing at Tom)
TOM: [Oh, I'm] sorry for that. (smiling)
Elements: tiers, speakers, categories, timeline, events, speaker turns
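
Vertical notation can be derived from timeline-anchored events. The sketch below shows one way to do that, under simplifying assumptions: the function name and tuple layout are hypothetical, each speaker's events are treated as a single speaker turn, and the positions of the non-verbal events are assumed. Verbal events that overlap another speaker's verbal event are bracketed, and non-verbal events are put in parentheses, as on the slide.

```python
def to_vertical(events):
    """Render (speaker, category, start, end, text) tuples as one line per
    speaker. Overlapping verbal events are wrapped in [ ], non-verbal
    events in ( ). All events of a speaker are treated as one turn."""
    verbal = [e for e in events if e[1] == "v"]

    # Preserve the order in which speakers first appear.
    speakers = []
    for event in events:
        if event[0] not in speakers:
            speakers.append(event[0])

    lines = []
    for speaker in speakers:
        parts = []
        own_verbal = sorted(
            (e for e in verbal if e[0] == speaker), key=lambda e: e[2]
        )
        for _, _, start, end, text in own_verbal:
            # Two intervals overlap if each starts before the other ends.
            overlaps = any(
                other[0] != speaker and other[2] < end and start < other[3]
                for other in verbal
            )
            parts.append(f"[{text}]" if overlaps else text)
        parts += [
            f"({e[4]})" for e in events if e[0] == speaker and e[1] == "nv"
        ]
        lines.append(f"{speaker}: " + " ".join(parts))
    return lines


EVENTS = [
    ("MAX", "v", 0, 1, "You keep interrupting"),
    ("MAX", "v", 1, 2, "me, Tom."),
    ("MAX", "nv", 1, 2, "pointing at Tom"),  # assumed position
    ("TOM", "v", 1, 2, "Oh, I'm"),
    ("TOM", "v", 2, 3, "sorry for that."),
    ("TOM", "nv", 2, 3, "smiling"),          # assumed position
]

if __name__ == "__main__":
    print("\n".join(to_vertical(EVENTS)))
```

A fuller treatment would split each speaker's events into separate turns wherever another speaker takes over; that step is left out here to keep the overlap logic visible.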

Structure Of Annotated Data
Events (temporal structure): You keep interrupting me, Tom. / Oh, I'm sorry for that
Utterances (linguistic structure): Immer unterbrichst Du mich, Tom / Oh, das tut mir Leid.
Words (linguistic structure): Pro V Vpart Pro PN . / Int Pro V Adj Prep Pro

Structure Of Annotated Data
Node labels (in the order extracted): a b 1 c 2
W: You, keep, interrupting, me, Tom
POS: pro, v, vpart, pro, pn
U: You keep interrupting me, Tom.
GER: Immer unterbrichst Du mich, Tom.
Node labels (in the order extracted): 1 d 2 e 3
W: Oh, I, 'm
POS: int, pn, v
U: Oh, I'm sorry for that.
GER: Oh, das tut mir Leid.
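
This last slide reads like a graph of nodes connected by labelled arcs (word, part of speech, utterance, translation), presumably the "AG" format named in the goals. A minimal sketch of such a structure for the first utterance follows. The names `Arc` and `labels_within` are hypothetical, the starting node "0" is an assumption (its label appears to have been lost in extraction), and the assignment of arcs to node pairs is inferred from the word order rather than read off the slide.

```python
from collections import namedtuple

# One labelled arc between two nodes; `type` is the annotation layer.
Arc = namedtuple("Arc", "start end type label")

# Node order for the first utterance. "0" is assumed; a, b, 1, c, 2
# are the labels that survive in the slide text.
NODES = ["0", "a", "b", "1", "c", "2"]

WORDS = ["You", "keep", "interrupting", "me", "Tom"]
TAGS = ["pro", "v", "vpart", "pro", "pn"]

ARCS = []
for i, (word, tag) in enumerate(zip(WORDS, TAGS)):
    # Assumed alignment: word i spans consecutive nodes i and i + 1.
    ARCS.append(Arc(NODES[i], NODES[i + 1], "W", word))
    ARCS.append(Arc(NODES[i], NODES[i + 1], "POS", tag))
ARCS.append(Arc("0", "2", "U", "You keep interrupting me, Tom."))
ARCS.append(Arc("0", "2", "GER", "Immer unterbrichst Du mich, Tom."))


def labels_within(arcs, nodes, span, arc_type):
    """Return labels of all arcs of `arc_type` lying inside `span`,
    ordered by their start node."""
    position = {node: index for index, node in enumerate(nodes)}
    low, high = position[span.start], position[span.end]
    inside = [
        arc for arc in arcs
        if arc.type == arc_type
        and low <= position[arc.start]
        and position[arc.end] <= high
    ]
    return [arc.label for arc in sorted(inside, key=lambda a: position[a.start])]


if __name__ == "__main__":
    utterance = next(arc for arc in ARCS if arc.type == "U")
    print(labels_within(ARCS, NODES, utterance, "W"))
    print(labels_within(ARCS, NODES, utterance, "POS"))
```

Because every layer is just a set of arcs over the same nodes, a query such as "all words inside this utterance" needs no knowledge of the tool that produced the transcription.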