IRCS Workshop on Linguistic Databases, December 2001

EXMARaLDA
Thomas Schmidt
SFB 538 „Mehrsprachigkeit“ (Multilingualism), University of Hamburg
Data Formats and Tools at the SFB
  2200 transcriptions of spoken language (30 min recording each)
  Language acquisition data, interviews, expert discourse, classroom discourse, presentation discourse, interpreted discourse
  Languages: German, English, Swedish, Norwegian, Danish, French, Spanish, Portuguese, Turkish, Italian, Basque, Japanese, Chinese, Russian, Luganda
  9 different data formats (dBase, syncWriter, HIAT-DOS, Verbmobil, ...)
  3 different operating systems (Mac OS 9.x, Windows, Linux) + Mac OS X
  Research interests: phonetics, syntax, discourse, ...
Data Formats and Tools at the SFB: syncWriter
  Editor for interlinear text
  Mac OS 9.x and earlier
  Outputs binary data
Data Formats and Tools at the SFB: HIAT-DOS
  Editor for HIAT transcription
  MS-DOS/Windows
  Outputs text files
Data Formats and Tools at the SFB: dBase / Access / 4th Dimension
  Utterance databases
Data Formats and Tools at the SFB: Verbmobil
  7-bit ASCII files
Database „Multilingualism“: Goals
  1. To have one common tool for accessing (querying) the data
     Data must come in one format (AG, annotation graphs)
     Multilingual issues must be taken care of (Unicode)
     Data format should be software-independent (XML)
     Software should work across different operating systems (Java)
  2. To have different tools reflecting the habits and needs of the different projects
     Different input methods (score, column, vertical notation)
     Different output methods (ditto)
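To make the Unicode and XML requirements concrete, here is a minimal Python sketch of a software-independent encoding that survives a round trip with non-ASCII characters intact. The element and attribute names (`transcription`, `tier`, `event`, `project`) are hypothetical and chosen for illustration only; they are not the actual EXMARaLDA file format.

```python
import xml.etree.ElementTree as ET

# Project name with non-ASCII quotation marks (U+201E, U+201C).
PROJECT = "SFB 538 \u201eMehrsprachigkeit\u201c"


def build_document() -> ET.Element:
    """Build a tiny transcription document (hypothetical element names)."""
    root = ET.Element("transcription", {"project": PROJECT})
    tier = ET.SubElement(root, "tier", {"speaker": "TOM", "category": "v"})
    for start, end, text in [("1", "2", "Oh, I'm "), ("2", "3", "sorry for that.")]:
        event = ET.SubElement(tier, "event", {"start": start, "end": end})
        event.text = text
    return root


def round_trip(root: ET.Element) -> ET.Element:
    """Serialize to UTF-8 bytes and parse again, as another tool would."""
    data = ET.tostring(root, encoding="utf-8")
    return ET.fromstring(data)


restored = round_trip(build_document())
```

Any XML-aware tool on any operating system can read the serialized bytes, which is the point of the third and fourth sub-goals.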
Database „Multilingualism“
  [Diagram] SyncWriter, HIAT-DOS, Verbmobil, ACCESS / dBase; ?; SQL database
Database „Multilingualism“
  [Diagram] SyncWriter, HIAT-DOS, Verbmobil, ACCESS / dBase; EXMARaLDA; SQL database
  EXMARaLDA: Basic Transcription, List Transcription, Segmented Transcription
  Input / Editing Tools
  Output / Visualization Tools
„Traditional“ layout principles: Score notation („Partitur“)
  Timeline: 0 1 2 3
  MAX [v]:  You keep interrupting | me, Tom
  MAX [nv]: pointing at Tom
  TOM [v]:  Oh, I'm | sorry for that
  TOM [nv]: smiling
  (| marks an event boundary)
  Labelled elements: Tiers, Speakers, Categories, Timeline, Events
„Traditional“ layout principles: 1. Score notation („Partitur“)
  Basic Transcription: Tiers, Speakers, Categories, Timeline, Events
  Example events: You keep interrupting me, Tom. / pointing at Tom
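The five elements named on this slide can be sketched as a small Python data model. This is an illustration under assumptions, not EXMARaLDA's own implementation: the class names and the `simultaneous` helper are hypothetical, and the time spans of the two [nv] events are assumed because the flattened slide does not show where they start and end.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    start: int  # timeline point where the event begins
    end: int    # timeline point where the event ends
    text: str


@dataclass
class Tier:
    speaker: str
    category: str  # "v" or "nv", as labelled on the slide
    events: List[Event] = field(default_factory=list)


@dataclass
class BasicTranscription:
    timeline: List[int]
    tiers: List[Tier]

    def simultaneous(self, start: int, end: int) -> List[Tuple[str, str, str]]:
        """Return (speaker, category, text) for every event that shares
        part of the interval from `start` to `end`."""
        return [
            (tier.speaker, tier.category, ev.text)
            for tier in self.tiers
            for ev in tier.events
            if ev.start < end and start < ev.end
        ]


transcription = BasicTranscription(
    timeline=[0, 1, 2, 3],
    tiers=[
        Tier("MAX", "v", [Event(0, 1, "You keep interrupting "), Event(1, 2, "me, Tom.")]),
        Tier("MAX", "nv", [Event(0, 2, "pointing at Tom")]),  # span is an assumption
        Tier("TOM", "v", [Event(1, 2, "Oh, I'm "), Event(2, 3, "sorry for that.")]),
        Tier("TOM", "nv", [Event(1, 3, "smiling")]),  # span is an assumption
    ],
)

# Verbal events of all speakers between timeline points 1 and 2.
verbal_overlap = [
    (speaker, text)
    for speaker, category, text in transcription.simultaneous(1, 2)
    if category == "v"
]
```

Because every event refers to the shared timeline, simultaneity between speakers is a simple interval comparison rather than something encoded in the page layout.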
„Traditional“ layout principles: 2. Column notation
  MAX [v]:  You keep interrupting me, Tom.
  MAX [nv]: pointing at Tom
  TOM [v]:  Oh, I'm sorry for that.
  TOM [nv]: smiling
  Basic Transcription: Tiers, Speakers, Categories, Timeline, Events
„Traditional“ layout principles: 3. Vertical notation
  MAX: You keep interrupting [me, Tom.] (pointing at Tom)
  TOM: [Oh, I'm] sorry for that. (smiling)
  Tiers, Speakers, Categories, Timeline, Events
  Speaker-Turns
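As a rough illustration of how speaker turns and overlap brackets could be derived from timeline-anchored events, consider the following Python sketch. The tuple layout, the function names, and the rule that a turn is a run of contiguous verbal events by one speaker are assumptions made for this example, as are the spans of the two "nv" events; the slides do not specify an algorithm.

```python
from collections import defaultdict

# (speaker, category, start, end, text)
EVENTS = [
    ("MAX", "v", 0, 1, "You keep interrupting"),
    ("MAX", "v", 1, 2, "me, Tom."),
    ("MAX", "nv", 0, 2, "pointing at Tom"),  # span is an assumption
    ("TOM", "v", 1, 2, "Oh, I'm"),
    ("TOM", "v", 2, 3, "sorry for that."),
    ("TOM", "nv", 1, 3, "smiling"),          # span is an assumption
]


def overlaps(a, b):
    """True if two events share part of the timeline."""
    return a[2] < b[3] and b[2] < a[3]


def speaker_turns(events):
    """Group each speaker's contiguous verbal events into turns, ordered by start."""
    by_speaker = defaultdict(list)
    for ev in events:
        if ev[1] == "v":
            by_speaker[ev[0]].append(ev)
    turns = []
    for speaker, evs in by_speaker.items():
        evs.sort(key=lambda e: e[2])
        current = [evs[0]]
        for ev in evs[1:]:
            if ev[2] == current[-1][3]:  # starts where the previous event ended
                current.append(ev)
            else:
                turns.append((speaker, current))
                current = [ev]
        turns.append((speaker, current))
    return sorted(turns, key=lambda t: t[1][0][2])


def vertical_notation(events):
    """Render one line per turn; bracket verbal events that overlap another speaker."""
    lines = []
    for speaker, turn in speaker_turns(events):
        parts = []
        for ev in turn:
            simultaneous = any(
                o[1] == "v" and o[0] != speaker and overlaps(ev, o) for o in events
            )
            parts.append(f"[{ev[4]}]" if simultaneous else ev[4])
        start, end = turn[0][2], turn[-1][3]
        for o in events:
            if o[0] == speaker and o[1] == "nv" and start <= o[2] and o[3] <= end:
                parts.append(f"({o[4]})")
        lines.append(f"{speaker}: {' '.join(parts)}")
    return lines


lines = vertical_notation(EVENTS)
```

The turn is not stored in the data; it is computed from events and the timeline, which is why it appears on the slide as an extra concept for this notation.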
Structure Of Annotated Data
  Events (temporal structure)
    You keep interrupting me, Tom.
    Oh, I'm sorry for that
  Utterances (linguistic structure)
    Immer unterbrichst Du mich, Tom
    Oh, das tut mir Leid.
  Words (linguistic structure)
    Pro V Vpart Pro PN.
    Int Pro V Adj Prep Pro
Structure Of Annotated Data
  MAX (nodes a, b, 1, c, 2)
    W:   You | keep | interrupting | me | Tom
    POS: pro | v | vpart | pro | pn
    U:   You keep interrupting me, Tom.
    GER: Immer unterbrichst Du mich, Tom.
  TOM (nodes 1, d, e, 2, 3)
    W:   Oh | I | 'm
    POS: int | pro | v
    U:   Oh, I'm sorry for that.
    GER: Oh, das tut mir Leid.
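The layered structure on this slide can be read as a graph whose arcs carry a type (W, POS, U, GER) and a label. The Python sketch below encodes MAX's utterance that way and recovers the words and POS tags spanned by the utterance arc. The `Arc` type, the `chain` and `spanned` helpers, and the start node "0" are assumptions for illustration; the first node label is not legible in the extracted slide.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    speaker: str
    start: str  # node where the arc begins
    end: str    # node where the arc ends
    kind: str   # "W", "POS", "U" or "GER"
    label: str


def chain(speaker: str, nodes: List[str], kind: str, labels: List[str]) -> List[Arc]:
    """Create one arc per pair of adjacent nodes."""
    return [Arc(speaker, s, e, kind, lab) for s, e, lab in zip(nodes, nodes[1:], labels)]


# "0", "1", "2" are timeline points; "a", "b", "c" are intermediate nodes.
MAX_NODES = ["0", "a", "b", "1", "c", "2"]  # start node "0" is an assumption

ARCS = (
    chain("MAX", MAX_NODES, "W", ["You", "keep", "interrupting", "me", "Tom"])
    + chain("MAX", MAX_NODES, "POS", ["pro", "v", "vpart", "pro", "pn"])
    + [
        Arc("MAX", "0", "2", "U", "You keep interrupting me, Tom."),
        Arc("MAX", "0", "2", "GER", "Immer unterbrichst Du mich, Tom."),
    ]
)


def spanned(arcs: List[Arc], outer: Arc, kind: str) -> List[str]:
    """Labels of `kind` arcs on the path from outer.start to outer.end."""
    nxt = {a.start: a for a in arcs if a.kind == kind and a.speaker == outer.speaker}
    labels, node = [], outer.start
    while node != outer.end and node in nxt:
        labels.append(nxt[node].label)
        node = nxt[node].end
    return labels


utterance = next(a for a in ARCS if a.kind == "U")
words = spanned(ARCS, utterance, "W")
tags = spanned(ARCS, utterance, "POS")
```

Only some nodes need to be tied to the recording; the intermediate ones simply order the word-level arcs between two timeline points.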