
1 The AMITIÉS Corpus up to the minute report

2 The GE English corpus
Around 716 English dialogues have been received so far from GE Leeds, of which 642 are “good ones”.
The GE transcribers use the Transcriber tool, version 1.4.2, to deliver *.TRS documents based on an XML syntax.

3 Good things
Because the TRS documents are XML based, they are well suited to automatic processing and to conversion into the format we are interested in (DAMSL-like, for example).
The transcribers successfully applied the AMITIÉS guidelines for transcribing.
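As an illustration of the automatic processing mentioned above, here is a minimal Python sketch that reads speaker turns out of a TRS document. It assumes the usual Transcriber layout (Trans, Speakers, Episode, Section, Turn, Sync elements). The sample document and the function name `read_turns` are made up for this example and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written TRS fragment (not a real corpus file).
SAMPLE_TRS = """<Trans>
  <Speakers>
    <Speaker id="spk1" name="A"/>
    <Speaker id="spk2" name="C"/>
  </Speakers>
  <Episode>
    <Section type="report" startTime="0" endTime="4.2">
      <Turn speaker="spk1" startTime="0" endTime="2.5">
        <Sync time="0"/>
        And your telephone number please?
      </Turn>
      <Turn speaker="spk2" startTime="2.5" endTime="4.2">
        <Sync time="2.5"/>
        11111
      </Turn>
    </Section>
  </Episode>
</Trans>"""


def read_turns(trs_text):
    """Return a list of (speaker name, text) pairs, one per Turn element."""
    root = ET.fromstring(trs_text)
    # Map speaker ids (spk1, ...) to their display names.
    names = {s.get("id"): s.get("name") for s in root.iter("Speaker")}
    turns = []
    for turn in root.iter("Turn"):
        # The text sits after the empty Sync elements; collapse whitespace.
        text = " ".join("".join(turn.itertext()).split())
        speaker_id = turn.get("speaker")
        turns.append((names.get(speaker_id, speaker_id), text))
    return turns


for speaker, text in read_turns(SAMPLE_TRS):
    print(f"{speaker}: {text}")
```

The printed lines have the same "A: ... / C: ..." shape as the DAMSL-like examples in the slides, which is why the XML form is convenient as a starting point.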

4 Issues
The transcribers started to transcribe the audio files using the Turn and Utterance levels of annotation provided by the Transcriber tool.
We noticed that some special situations, such as overlapping, acknowledging, and completion, were not represented correctly in the received TRS documents.

5 Solution and examples
The solution makes use of the third logical level of annotation provided by the Transcriber tool, called Section.
The transcribers were required to create a new kind of Section called “exception” and to use it to encapsulate all the Turns containing one of the situations described previously.
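A minimal Python sketch of how the “exception” sections could be separated from the default ones during processing. How the label is actually stored in the TRS files is not stated in the slides. This sketch assumes it appears in the Section element's `type` or `topic` attribute, and the sample document and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical TRS fragment: one default section, one "exception" section.
SAMPLE_TRS = """<Trans>
  <Episode>
    <Section type="report" startTime="0" endTime="1.0">
      <Turn speaker="spk1" startTime="0" endTime="1.0"><Sync time="0"/>And your telephone number please?</Turn>
    </Section>
    <Section type="report" topic="exception" startTime="1.0" endTime="3.0">
      <Turn speaker="spk2" startTime="1.0" endTime="2.0"><Sync time="1.0"/>11111</Turn>
      <Turn speaker="spk1" startTime="2.0" endTime="3.0"><Sync time="2.0"/>Uh hmmm</Turn>
    </Section>
  </Episode>
</Trans>"""


def is_exception(section):
    # Assumption: the label is kept in the "type" or "topic" attribute.
    return "exception" in (section.get("type"), section.get("topic"))


def split_sections(trs_text):
    """Return (regular, exceptions); each is a list of sections,
    and each section is a list of (speaker id, text) turns."""
    root = ET.fromstring(trs_text)
    regular, exceptions = [], []
    for section in root.iter("Section"):
        turns = [
            (t.get("speaker"), " ".join("".join(t.itertext()).split()))
            for t in section.iter("Turn")
        ]
        (exceptions if is_exception(section) else regular).append(turns)
    return regular, exceptions


regular, exceptions = split_sections(SAMPLE_TRS)
print(len(regular), "default section(s),", len(exceptions), "exception section(s)")
```

Keeping the turns of an exception section together lets a later step treat them as one unit, for example when deciding how to print overlapping or interleaved speech.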

6 Example of overlapping
BEFORE using the “exception” section / AFTER using the “exception” section

DAMSL LIKE annotation
A: That’s [lovely](1) my name’s Louise Mr Smith and you want to change address?
C: [Hello](1)

DAMSL LIKE annotation
A: That’s
A: [lovely](1)
C: [Hello](1)
A: my name’s Louise Mr Smith and you want to change address?
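The bracket-and-index notation in this example, `[text](n)`, can be read mechanically. The Python sketch below collects, for each index, the speakers and words that share it. The notation is taken from the slide, but the function name `overlap_groups` and the parsing rules are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Matches "[some text](3)"; the text may be empty, as in "[](1)".
MARK = re.compile(r"\[([^\]]*)\]\((\d+)\)")


def overlap_groups(lines):
    """Map each overlap index to the (speaker, text) pairs that carry it."""
    groups = defaultdict(list)
    for line in lines:
        # Each line is assumed to look like "A: utterance".
        speaker, _, utterance = line.partition(":")
        for text, index in MARK.findall(utterance):
            groups[int(index)].append((speaker.strip(), text))
    return dict(groups)


lines = [
    "A: That’s [lovely](1) my name’s Louise Mr Smith and you want to change address?",
    "C: [Hello](1)",
]
print(overlap_groups(lines))
```

For the two lines above, index 1 groups A's “lovely” with C's “Hello”, which is the overlap the slide illustrates.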

7 Example of acknowledging (similar to completion)
BEFORE using the “exception” section / AFTER using the “exception” section

DAMSL LIKE annotation
A: And your telephone number please?
C: 11111
A: Uh hmmm
C: 111
A: Uh hmmm
C: 111111

DAMSL LIKE annotation
A: And your telephone number please?
C: 11111 [](1) 111 [](2) 111111
A: [Uh hmmm](1) [Uh hmmm](2)

8 Additional facts
The Turns that were not considered to be exceptions were encapsulated by the default Section.
We trained the transcribers to use this logical level, and the last 100 dialogues received are annotated with the “exception” level.
542 dialogues are not annotated with this level.

9 A rough classification of the corpus
English Amities corpus: 716
    moreThanTwoPartiesDlgs: 40
        oneDlgPerFile: 35
            Annot With Exception: 11
            Annot Without Exc: 24
        multipleDlgsPerFile: 5
            Annot With Exception: 1
            Annot Without Exc: 4
    twoPartiesDlgs: 673
        oneDlgPerFile: 642
            Annot With Exception: 100
            Annot Without Exc: 542
        multipleDlgsPerFile: 31
            Annot With Exception: 6
            Annot Without Exc: 25
    noPartiesDlgs: 3
        oneDlgPerFile: 3
            Annot Without Exc: 3
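The counts in this classification can be cross-checked with a few lines of Python. The numbers below are copied from the slide. The tuple layout and the function names are only one possible way to write the check.

```python
# Each node is (label, count, children); counts are copied from the slide.
TREE = ("English Amities corpus", 716, [
    ("moreThanTwoPartiesDlgs", 40, [
        ("oneDlgPerFile", 35, [
            ("Annot With Exception", 11, []),
            ("Annot Without Exc", 24, []),
        ]),
        ("multipleDlgsPerFile", 5, [
            ("Annot With Exception", 1, []),
            ("Annot Without Exc", 4, []),
        ]),
    ]),
    ("twoPartiesDlgs", 673, [
        ("oneDlgPerFile", 642, [
            ("Annot With Exception", 100, []),
            ("Annot Without Exc", 542, []),
        ]),
        ("multipleDlgsPerFile", 31, [
            ("Annot With Exception", 6, []),
            ("Annot Without Exc", 25, []),
        ]),
    ]),
    ("noPartiesDlgs", 3, [
        ("oneDlgPerFile", 3, [
            ("Annot Without Exc", 3, []),
        ]),
    ]),
])


def inconsistencies(node):
    """Return labels of nodes whose children do not add up to their count."""
    label, count, children = node
    bad = []
    if children and sum(child[1] for child in children) != count:
        bad.append(label)
    for child in children:
        bad.extend(inconsistencies(child))
    return bad


def leaf_totals(node, totals=None):
    """Sum the leaf counts per leaf label across the whole tree."""
    totals = {} if totals is None else totals
    label, count, children = node
    if not children:
        totals[label] = totals.get(label, 0) + count
    for child in children:
        leaf_totals(child, totals)
    return totals


print(inconsistencies(TREE))
print(leaf_totals(TREE))
```

Every level adds up to its parent (40 + 673 + 3 = 716, 642 + 31 = 673, 100 + 542 = 642, and so on). Across all categories the leaves give 118 dialogues annotated with the “exception” level and 598 without it.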

10 Task distribution inside the 100 exception annotated dialogues

11 Thank you.

