Published by Howard Lockridge. Modified over 10 years ago.
1
IRCS Workshop on Linguistic Databases, 11-13 December 2001
EXMARaLDA
Thomas Schmidt
SFB 538 „Mehrsprachigkeit“ (Multilingualism), University of Hamburg
2
Data Formats and Tools at the SFB
2200 transcriptions of spoken language (30 min recording each)
Language acquisition data, interviews, expert discourse, classroom discourse, presentation discourse, interpreted discourse, ...
15 languages (German, English, Swedish, Norwegian, Danish, French, Spanish, Portuguese, Turkish, Italian, Basque, Japanese, Chinese, Russian, Luganda)
9 different data formats (dBase, syncWriter, HIAT-DOS, Verbmobil, ...)
3 different operating systems (Mac OS 9.x, Windows, Linux), plus Mac OS X
Research interests: phonetics, syntax, discourse, ...
3
Data Formats and Tools at the SFB
syncWriter: editor for interlinear text
Mac OS 9.x and earlier
outputs binary data
4
Data Formats and Tools at the SFB
HIAT-DOS: editor for HIAT transcription
MS-DOS / Windows
outputs text files
5
Data Formats and Tools at the SFB
dBase / Access / 4th Dimension: utterance databases
6
Data Formats and Tools at the SFB
Verbmobil: 7-bit ASCII files
7
Database „Multilingualism“
Goals:
1. To have one common tool for accessing (querying) the data
   Data must come in one format (AG, annotation graphs)
   Multilingual issues must be taken care of (Unicode)
   Data format should be software-independent (XML)
   Software should work across different operating systems (Java)
2. To have different tools reflecting the habits and needs of the different projects
   different input methods (score, column, vertical notation)
   different output methods (ditto)
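As a rough illustration of why a Unicode-aware XML format serves the first goal, the sketch below writes one tier to UTF-8 encoded XML and reads it back unchanged. The element and attribute names (`tier`, `event`, `speaker`, `category`, `start`, `end`) are hypothetical and are not taken from the slides. The Turkish and Japanese strings are made-up sample data, and Python is used here only for brevity.

```python
import xml.etree.ElementTree as ET

def tier_to_xml(speaker, category, events):
    """Serialize one tier to UTF-8 encoded XML (hypothetical element names)."""
    tier = ET.Element("tier", {"speaker": speaker, "category": category})
    for start, end, text in events:
        event = ET.SubElement(tier, "event", {"start": str(start), "end": str(end)})
        event.text = text
    # With encoding="utf-8", tostring returns bytes and writes non-ASCII
    # characters as UTF-8 rather than as numeric character references.
    return ET.tostring(tier, encoding="utf-8")

def xml_to_events(xml_bytes):
    """Parse the XML produced above back into (start, end, text) tuples."""
    tier = ET.fromstring(xml_bytes)
    return [(int(e.get("start")), int(e.get("end")), e.text) for e in tier]

# Sample data: the German line appears on the slides, the other two are invented.
SAMPLE = [
    (0, 1, "Immer unterbrichst Du mich, Tom."),
    (1, 2, "Günaydın"),
    (2, 3, "こんにちは"),
]
xml_bytes = tier_to_xml("MAX", "v", SAMPLE)
restored = xml_to_events(xml_bytes)
```

Because the file is plain XML in a Unicode encoding, any XML parser on any operating system can read it back, which is the software independence the goal asks for.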
8
Database „Multilingualism“
[Diagram: the existing formats (syncWriter, HIAT-DOS, Verbmobil, Access / dBase), a question mark, and an SQL database]
9
Database „Multilingualism“
[Diagram: the existing formats (syncWriter, HIAT-DOS, Verbmobil, Access / dBase); EXMARaLDA with Basic Transcription, List Transcription and Segmented Transcription; Input / Editing Tools; Output / Visualization Tools; SQL database]
10-14
„Traditional“ layout principles
1. Score notation („Partitur“)

          0                      1         2               3
MAX [v]   You keep interrupting  me, Tom.
    [nv]  ------ pointing at Tom -------------
TOM [v]                          Oh, I'm   sorry for that.
    [nv]                         ----- smiling ---------------

Labelled elements: Tiers, Speakers, Categories, Timeline, Events
15
„Traditional“ layout principles
1. Score notation („Partitur“)
Basic Transcription: Tiers, Speakers, Categories, Timeline, Events
You keep interrupting me, Tom.
pointing at Tom
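The Basic Transcription elements named on this slide (tiers, speakers, categories, a timeline, events) can be sketched as a small data model with a score renderer. This is a minimal illustration under assumptions, not the EXMARaLDA implementation: the class and function names are hypothetical, and the spans of the two non-verbal events are guesses, since the slides do not state them.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    start: int  # index of the timeline point where the event begins
    end: int    # index of the timeline point where it ends
    text: str

@dataclass
class Tier:
    speaker: str
    category: str  # e.g. "v" (verbal) or "nv" (non-verbal)
    events: list = field(default_factory=list)

@dataclass
class BasicTranscription:
    timeline: list  # ordered timeline points, e.g. [0, 1, 2, 3]
    tiers: list

def render_score(bt):
    """Render one text row per tier, with events aligned on the shared timeline."""
    n = len(bt.timeline) - 1  # number of intervals between timeline points
    widths = [1] * n
    events = [e for t in bt.tiers for e in t.events]
    # Shorter events first, so single-interval events fix the column widths
    # before events spanning several intervals are checked.
    for e in sorted(events, key=lambda e: e.end - e.start):
        need = len(e.text) + 1 - sum(widths[e.start:e.end])
        if need > 0:
            widths[e.end - 1] += need
    offsets = [sum(widths[:i]) for i in range(n + 1)]
    labels = [f"{t.speaker} [{t.category}]" for t in bt.tiers]
    pad = max(len(label) for label in labels) + 1
    rows = []
    for label, t in zip(labels, bt.tiers):
        cells = [" "] * offsets[-1]
        for e in t.events:
            at = offsets[e.start]
            cells[at:at + len(e.text)] = e.text  # write the text at its start column
        rows.append((label.ljust(pad) + "".join(cells)).rstrip())
    return "\n".join(rows)

example = BasicTranscription(
    timeline=[0, 1, 2, 3],
    tiers=[
        Tier("MAX", "v", [Event(0, 1, "You keep interrupting"), Event(1, 2, "me, Tom.")]),
        Tier("MAX", "nv", [Event(0, 2, "pointing at Tom")]),  # span is an assumption
        Tier("TOM", "v", [Event(1, 2, "Oh, I'm"), Event(2, 3, "sorry for that.")]),
        Tier("TOM", "nv", [Event(1, 3, "smiling")]),  # span is an assumption
    ],
)
score = render_score(example)
```

Printing `score` gives one row per tier. Events that start at the same timeline point, such as "me, Tom." and "Oh, I'm", begin in the same column, which is the defining property of score notation.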
16-17
„Traditional“ layout principles
2. Column notation
One column per tier, with time running downwards along the timeline 0 1 2 3:
MAX [v]: You keep interrupting / me, Tom.
MAX [nv]: pointing at Tom
TOM [v]: Oh, I'm / sorry for that.
TOM [nv]: smiling
Basic Transcription: Tiers, Speakers, Categories, Timeline, Events
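Column notation can be derived from the same kind of data by turning each tier into a column and each timeline point into a row. The sketch below is an illustration only: the function names and the tuple layout `(start, end, text)` are assumptions, and only the two verbal tiers are used because the slides do not give the spans of the non-verbal events.

```python
def render_columns(timeline, tiers):
    """Return a table: a header row, then one row per timeline point except the last."""
    header = ["time"] + [f"{speaker} [{category}]" for (speaker, category) in tiers]
    table = [header]
    for i, point in enumerate(timeline[:-1]):
        row = [str(point)]
        for events in tiers.values():
            # An event is listed in the row of the timeline point where it starts;
            # start and end are indices into the timeline.
            starting = [text for (start, end, text) in events if start == i]
            row.append(" ".join(starting))
        table.append(row)
    return table

def format_table(table):
    """Pad every column to the width of its longest cell."""
    widths = [max(len(row[c]) for row in table) for c in range(len(table[0]))]
    lines = []
    for row in table:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)

verbal_tiers = {
    ("MAX", "v"): [(0, 1, "You keep interrupting"), (1, 2, "me, Tom.")],
    ("TOM", "v"): [(1, 2, "Oh, I'm"), (2, 3, "sorry for that.")],
}
table = render_columns([0, 1, 2, 3], verbal_tiers)
text = format_table(table)
```

Simultaneous events end up in the same row ("me, Tom." next to "Oh, I'm"), just as they share a column position in score notation.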
18-20
„Traditional“ layout principles
3. Vertical notation
MAX: You keep interrupting [me, Tom.] (pointing at Tom)
TOM: [Oh, I'm] sorry for that. (smiling)
Labelled elements: Tiers, Speakers, Categories, Timeline, Events, Speaker-Turns
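Vertical notation marks simultaneous speech with square brackets instead of spatial alignment. A minimal sketch of that idea follows, assuming events are `(start, end, text)` tuples over timeline indices. It emits a single turn per speaker, which is a simplification, and it leaves out the non-verbal remarks in parentheses. All names are hypothetical.

```python
def overlaps(a, b):
    """True if two (start, end, text) events share part of the timeline."""
    return a[0] < b[1] and b[0] < a[1]

def render_vertical(tiers):
    """Render one line per speaker, bracketing events that overlap another speaker."""
    lines = []
    for speaker, events in tiers.items():
        others = [e for s, evs in tiers.items() if s != speaker for e in evs]
        parts = []
        for event in events:
            text = event[2]
            if any(overlaps(event, other) for other in others):
                text = f"[{text}]"  # brackets mark simultaneous speech
            parts.append(text)
        lines.append(f"{speaker}: " + " ".join(parts))
    return "\n".join(lines)

turns = render_vertical({
    "MAX": [(0, 1, "You keep interrupting"), (1, 2, "me, Tom.")],
    "TOM": [(1, 2, "Oh, I'm"), (2, 3, "sorry for that.")],
})
```

The brackets are computed from the timeline rather than typed by hand, so the same data can be shown in score, column or vertical notation.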
21-23
Structure Of Annotated Data
Events (temporal structure): You keep interrupting | me, Tom. | Oh, I'm | sorry for that.
Utterances (linguistic structure):
   You keep interrupting me, Tom. (German tier: Immer unterbrichst Du mich, Tom.)
   Oh, I'm sorry for that. (German tier: Oh, das tut mir Leid.)
Words (linguistic structure):
   You/Pro keep/V interrupting/Vpart me/Pro Tom/PN
   Oh/Int I/Pro 'm/V sorry/Adj for/Prep that/Pro
...
24
Structure Of Annotated Data
[Diagram: annotation graph of the example]
Nodes 0, a, b, 1, c, 2:
   W: You | keep | interrupting | me | Tom
   POS: pro | v | vpart | pro | pn
   U: You keep interrupting me, Tom.
   GER: Immer unterbrichst Du mich, Tom.
Nodes 1, d, 2, e, 3:
   W: Oh | I | 'm
   POS: int | pn | v
   U: Oh, I'm sorry for that.
   GER: Oh, das tut mir Leid.
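The first half of this graph can be written down as a list of labelled arcs between nodes, which makes queries across annotation levels straightforward. The sketch below covers only the nodes 0, a, b, 1, c, 2. It assumes that each word arc connects two neighbouring nodes in that order. The `Arc` type and the helper names are hypothetical and do not come from any annotation graph toolkit.

```python
from collections import namedtuple

# One arc of the graph: it runs from node `start` to node `end` and carries
# a type (W, POS, U, GER) and a label.
Arc = namedtuple("Arc", "start end type label")

NODE_ORDER = ["0", "a", "b", "1", "c", "2"]  # temporal order of the nodes
WORDS = ["You", "keep", "interrupting", "me", "Tom"]
TAGS = ["pro", "v", "vpart", "pro", "pn"]

ARCS = [
    Arc("0", "2", "U", "You keep interrupting me, Tom."),
    Arc("0", "2", "GER", "Immer unterbrichst Du mich, Tom."),
]
for i, (word, tag) in enumerate(zip(WORDS, TAGS)):
    # Assumption: word i spans the neighbouring nodes i and i + 1.
    ARCS.append(Arc(NODE_ORDER[i], NODE_ORDER[i + 1], "W", word))
    ARCS.append(Arc(NODE_ORDER[i], NODE_ORDER[i + 1], "POS", tag))

def lies_within(arc, start, end):
    """True if the arc starts and ends inside the node span start..end."""
    position = NODE_ORDER.index
    return position(start) <= position(arc.start) and position(arc.end) <= position(end)

def labels_between(start, end, arc_type):
    """Labels of all arcs of one type inside a node span, in insertion order."""
    return [a.label for a in ARCS if a.type == arc_type and lies_within(a, start, end)]

first_event_words = labels_between("0", "1", "W")
second_event_tags = labels_between("1", "2", "POS")
```

Here `labels_between("0", "1", "W")` collects the words of the first event, and the same call with `"U"` or `"GER"` over the span 0 to 2 retrieves the utterance or its German translation.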