Clarke, R. J (2001) S951-10: 1 Critical Issues in Information Systems BUSS 951 Seminar 10 Transcription & Coding.


1 Clarke, R. J (2001) S951-10: 1 Critical Issues in Information Systems BUSS 951 Seminar 10 Transcription & Coding

2 Clarke, R. J (2001) S951-10: 2 Transcription & Coding An Introduction

3 Clarke, R. J (2001) S951-10: 3 Transcribing & Coding
transcription and coding are a major requirement for language-based methods of analysis
transcription: the conversion of speech to writing
coding: the addition of relevant information to the transcription
both are needed because spoken and written language are very different

4 Clarke, R. J (2001) S951-10: 4 Speech is not Writing: Differences in Spoken & Written Texts
Spoken texts:
+ interactive: 2 or more participants
+ face-to-face: in the same place and time
+ language as action: using language to accomplish some task
+ spontaneous: without rehearsing what is going to be said
+ casual: informal and everyday
Written texts:
- interactive: one participant
- face-to-face: on his or her own
- language as action: using language to reflect
- spontaneous: planning, drafting and rewriting
- casual: formal and special occasions

5 Clarke, R. J (2001) S951-10: 5 Transcribing & Coding
[Diagram labels: Seek to Lead-in Zone, Playback, Coding, Transcribe]
cue the tape (rewind and fast forward) until you get to the part of the tape you are seeking
iterate until the text is transcribed and coded

6 Clarke, R. J (2001) S951-10: 6 CHAT Standard

7 Clarke, R. J (2001) S951-10: 7 CHAT
one of the best standards is CHAT (Codes for the Human Analysis of Transcripts)
it is a well-defined standard
even in the research literature, transcriptions are often ad hoc and idiosyncratic
formal standards are difficult to obtain

8 Clarke, R. J (2001) S951-10: 8 CHAT
developed with subsequent computer processing in mind
a suite of programs called CLAN is available to parse the text
excellent provision for creating transcripts even when the text is difficult to understand, for example when the speaker has an accent or a speech problem

9 Clarke, R. J (2001) S951-10: 9 CHAT
the standard is extensible; it provides a consistent way of adding new headers if necessary
developed by Brian MacWhinney and Jane Walter at the CHILDES (Child Language Data Exchange System) research centre, Department of Psychology, Carnegie Mellon University

10 Clarke, R. J (2001) S951-10: 10 CHAT Structure
CHAT has a basic structure common to all transcripts:
a block of so-called Constant Headers at the top of the transcript, starting with @Begin
the body of the transcript, consisting of turns taken by speakers (called Mainlines), each followed by zero or more Dependent Tiers
a single command used to signal the end of the transcript, @End
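The three-part layout described on this slide can be checked mechanically. The following is only a minimal sketch, not the CLAN parser: the function name `check_structure` is hypothetical, and it assumes every non-blank line begins with `@` (header), `*` (mainline) or `%` (dependent tier), as the slides describe.

```python
def check_structure(lines):
    """Return a list of problems found in the overall transcript layout.

    Assumption (from the slides): a transcript opens with @Begin, closes
    with @End, and every other line is a header (@), a mainline (*) or a
    dependent tier (%).
    """
    rows = [ln.strip() for ln in lines if ln.strip()]  # ignore blank lines
    problems = []
    if not rows or rows[0] != "@Begin":
        problems.append("first line must be @Begin")
    if not rows or rows[-1] != "@End":
        problems.append("last line must be @End")
    for n, row in enumerate(rows, start=1):
        if row[0] not in "@*%":
            problems.append(f"line {n}: expected @, * or % at the start")
    return problems


transcript = [
    "@Begin",
    "@Participants: MCL MicroLabs Assistant, STU Student",
    "*MCL what software do you want",
    "%sit STU and MCL are at the service desk",
    "@End",
]
problems = check_structure(transcript)
```

An empty list means the outline is sound; it says nothing about whether individual headers or codes are valid.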

11 Clarke, R. J (2001) S951-10: 11 CHAT Structure Top of Transcript

12 Clarke, R. J (2001) S951-10: 12 CHAT Structure Top of the Transcript (1)
the top of any transcript always has two compulsory commands:
    @Begin
    @Participants: MCL MicroLabs Assistant, STU Student
@Begin indicates the start of the transcript. It must always be the first line of any CHAT transcript. It does not include any other information...

13 Clarke, R. J (2001) S951-10: 13 CHAT Structure Top of the Transcript (2)
@Participants is a mandatory Constant Header (a command used only once per transcript) which lists the interactants in the transcript. The syntax, as with all transcripts, is critical.
the three-letter code after the header identifies a person who speaks or is otherwise involved with the text
the string after the three-letter code explains the role of that participant in the text
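To make the @Participants syntax concrete, here is a small sketch that splits the header into three-letter codes and roles. The function name `parse_participants` is hypothetical, and the comma-separated layout is taken from the example on the slides rather than from the full CHAT specification.

```python
def parse_participants(header):
    """Map each three-letter participant code to its role description.

    Assumes the layout shown on the slides:
    '@Participants: CODE role words, CODE role words'
    """
    name, _, rest = header.partition(":")
    if name.strip() != "@Participants":
        raise ValueError("not an @Participants header")
    participants = {}
    for entry in rest.split(","):
        # First word is the code, everything after it is the role
        code, _, role = entry.strip().partition(" ")
        if len(code) != 3 or not code.isalpha():
            raise ValueError(f"participant code must be three letters: {code!r}")
        participants[code] = role.strip()
    return participants


roles = parse_participants("@Participants: MCL MicroLabs Assistant, STU Student")
```

Here `roles` maps `MCL` to "MicroLabs Assistant" and `STU` to "Student".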

14 Clarke, R. J (2001) S951-10: 14 CHAT Structure Top of the Transcript (3)
below @Begin and @Participants, other optional Constant Headers can be listed, including @Age of, @Sex of and @SES of:
    @Age of MCL: 35
    @SES of MCL: middle
    @Sex of MCL: male

15 Clarke, R. J (2001) S951-10: 15 CHAT Structure Top of the Transcript (4)
optional Constant Headers must follow the @Participants header because they need to refer to the three-letter participant identifier
whether you include them depends on whether they are significant: is the age of a participant important in the text?
a complete list follows...

16 Clarke, R. J (2001) S951-10: 16 Table 1: CHAT Constant Headers. Constant Headers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Constant Headers are presented against a shaded background.
    @Begin               indicates the start of the CHAT file
    @Participants:       list of actors in the file
    @Age of XXX:         speaker's age in yymmdd format
    @Birth of XXX:       date of birth of the speaker
    @SES of XXX:         socio-economic status of the speaker
    @Education of XXX:   speaker's education in years
    @Sex of XXX:         gender of the speaker
    @Filename:           name of the transcription data file
    @Coding:             version of CHAT being used
    @Warning:            relative completeness of the transcript
    @End                 indicates the end of the CHAT file

17 Clarke, R. J (2001) S951-10: 17 CHAT Structure Top of the Transcript (6)
the CHAT Constant Headers can also be represented using a syntax diagram, of the kind also used to describe the syntax rules of computer languages like Pascal
a diagram follows...

18 Clarke, R. J (2001) S951-10: 18 Figure 3 :CHAT Constant Headers Syntax Diagram

19 Clarke, R. J (2001) S951-10: 19 CHAT Structure Top of the Transcript (8)
Completed transcript so far...
    @Begin
    @Participants: MCL MicroLabs Assistant, STU Student
    @Age of MCL: 35
    @SES of MCL: middle
    @Sex of MCL: male
    @Age of STU: 18
    @SES of STU: middle
    @Sex of STU: male
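The header block built up so far can be read into a simple per-participant record. This is an illustrative sketch only: `participant_attributes` is a hypothetical name, and the `@Field of XXX: value` pattern is inferred from the examples on the slides. It also enforces the rule that optional headers come after @Participants.

```python
import re

# '@Age of MCL: 35' -> field 'Age', code 'MCL', value '35'
ATTRIBUTE = re.compile(r"^@(\w+) of (\w{3}):\s*(.+)$")


def participant_attributes(lines):
    """Collect the optional '@Field of XXX:' headers for each participant."""
    declared = set()
    attrs = {}
    for line in lines:
        line = line.strip()
        if line.startswith("@Participants:"):
            entries = line.split(":", 1)[1].split(",")
            declared = {e.split()[0] for e in entries if e.strip()}
            continue
        match = ATTRIBUTE.match(line)
        if match:
            field, code, value = match.groups()
            if code not in declared:
                # Optional headers must follow @Participants
                raise ValueError(f"{code} used before being listed in @Participants")
            attrs.setdefault(code, {})[field] = value
    return attrs


header_block = [
    "@Begin",
    "@Participants: MCL MicroLabs Assistant, STU Student",
    "@Age of MCL: 35",
    "@SES of MCL: middle",
    "@Sex of MCL: male",
    "@Age of STU: 18",
    "@SES of STU: middle",
    "@Sex of STU: male",
]
attributes = participant_attributes(header_block)
```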

20 Clarke, R. J (2001) S951-10: 20 CHAT Structure Transcript Body

21 Clarke, R. J (2001) S951-10: 21 CHAT Structure Transcript Body (1)
most of the transcript body consists of mainlines, which indicate that a participant is taking a turn in the conversation
other features found in the transcript body include:
Dependent Tiers, which are used to add special coding for a given turn
Changeable or Repeating Headers

22 Clarke, R. J (2001) S951-10: 22 CHAT Structure Mainlines (1)
a mainline is a turn taken by a participant, indicated by an *
who takes the turn is indicated by one of the participant identifiers listed in the @Participants constant header...

23 Clarke, R. J (2001) S951-10: 23 CHAT Structure Mainlines (2)
the text comprising the speaker's turn is transcribed after the * and participant identifier
an example of a completed mainline:
    *MCL what software do you want

24 Clarke, R. J (2001) S951-10: 24 CHAT Structure Dependent Tiers (1)
Dependent Tiers are used to add extra detail
there are many different types of them
they always relate only to a specific turn and, if needed, are only ever listed below the mainline to which they refer

25 Clarke, R. J (2001) S951-10: 25 CHAT Structure Dependent Tiers (2)
dependent tiers are identified in a transcript by the use of a % followed by the appropriate dependent tier code
the dependent tier code tells the reader what kind of information is being coded for the above mainline

26 Clarke, R. J (2001) S951-10: 26 CHAT Structure Dependent Tiers (3)
an example showing a mainline and its two dependent tiers (%sit, %com) is provided below:
    *MCL what software do you want
    %sit STU and MCL are at the service desk
    %com STU looks like he is lost
a list of valid dependent tiers follows...
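The relationship between a mainline and the dependent tiers beneath it can be sketched as a small grouping routine. This is a hedged illustration, not part of CLAN: `parse_body` is a hypothetical name, and it assumes the line layout used on the slides (marker, code, a space, then the text).

```python
def parse_body(lines):
    """Group each mainline (*) with the dependent tiers (%) listed below it."""
    turns = []
    for line in lines:
        line = line.strip()
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(" ")
            turns.append({"speaker": speaker, "text": text, "tiers": {}})
        elif line.startswith("%"):
            if not turns:
                # A tier always refers to the mainline above it
                raise ValueError("dependent tier with no preceding mainline")
            code, _, text = line[1:].partition(" ")
            turns[-1]["tiers"][code] = text
    return turns


body = [
    "*MCL what software do you want",
    "%sit STU and MCL are at the service desk",
    "%com STU looks like he is lost",
]
turns = parse_body(body)
```

Each turn carries its own dictionary of tiers, which mirrors the rule that a dependent tier relates only to the mainline directly above it.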

27 Clarke, R. J (2001) S951-10: 27
    %phs   phrase structure notation
    %err   error coding
    %cod   general purpose coding

28 Clarke, R. J (2001) S951-10: 28 CHAT Structure Changeable/Repeating Headers (1)
Repeating Headers can be inserted repeatedly in a transcript, but they are only used when a significant condition has changed
once inserted in a transcript, a Repeating Header is valid for the remainder of the transcript, or until another Header of the same type overrides it
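The "valid until overridden" rule can be illustrated by attaching the currently active header values to each turn. This is only a sketch: `annotate_turns` is a hypothetical name, and `@Situation` is used as an example of a repeating header on the assumption that it is one, since the list of valid headers is not reproduced in this transcript.

```python
def annotate_turns(lines, repeating=("@Situation",)):
    """Pair each mainline with the repeating-header values in force for it.

    'repeating' names the headers treated as changeable; '@Situation' is
    an assumed example, not taken from the slides.
    """
    current = {}
    annotated = []
    for line in lines:
        line = line.strip()
        name, sep, value = line.partition(":")
        if sep and name in repeating:
            # A later header of the same type overrides the earlier one
            current[name] = value.strip()
        elif line.startswith("*"):
            annotated.append((line, dict(current)))
    return annotated


body = [
    "@Situation: service desk",
    "*MCL what software do you want",
    "@Situation: computer lab",
    "*STU this one",
]
annotated = annotate_turns(body)
```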

29 Clarke, R. J (2001) S951-10: 29 CHAT Structure Changeable/Repeating Headers (2) a list of valid Changeable or Repeating Headers is provided on the next slide just like the Constant Headers, Changeable or Repeating Headers can be described using a syntax diagram, which is on the slide following the list

30 Clarke, R. J (2001) S951-10: 30 [Table: valid Changeable or Repeating Headers]

31 Clarke, R. J (2001) S951-10: 31 [Figure: Changeable or Repeating Headers syntax diagram]

32 Clarke, R. J (2001) S951-10: 32 CHAT Structure Summary...so far!
so far we have described three separate types of structure that occur within the body of a CHAT transcript:
Mainlines (for transcribing turns)
Dependent Tiers (for coding turns)
Changeable or Repeating Headers

33 Clarke, R. J (2001) S951-10: 33 CHAT Structure Special Mainline Codes (1)
sometimes it is important to add additional information into the mainline itself
NOTE the following about the body of the CHAT transcript: an actual turn is shown in lower case on a mainline, and there is normally no punctuation on mainlines

34 Clarke, R. J (2001) S951-10: 34 CHAT Structure Special Mainline Codes (2)
this is because when punctuation is used it conforms to CHAT Special Mainline Codes
Special Mainline Codes occur in one of two types:
Utterance Junctures and Delimiters
Utterance Ambiguity Codes
we will describe both types in order...

35 Clarke, R. J (2001) S951-10: 35 CHAT Structure Special Mainline Codes (3)
Utterance Junctures and Delimiters:
indicate either junctures or breaks in the turn (pauses etc.). These Special Mainline Codes are referred to as Utterance Internal Junctures
indicate how a turn was completed (as a question, the speaker was interrupted etc.). These Special Mainline Codes are referred to as Post Utterance Delimiters

36 Clarke, R. J (2001) S951-10: 36 CHAT Structure Special Mainline Codes (4)
Utterance Junctures and Delimiters continued...
indicate how a turn was started, either by a participant taking up another's talk (called latching), or by completing another's talk (called completion). These Special Mainline Codes are referred to as Pre Utterance Delimiters
a list follows...

37 Clarke, R. J (2001) S951-10: 37 Utterance Junctures and Delimiters
(a) Utterance Internal Junctures
    Short Pause     [#]
    Long Pause      [#long]
    Timed Pause     [#ss.mm]
    Comma           ,
(b) Post Utterance Delimiters
    Period          .
    Question        ?
    Exclamation     !
    Trailing off    [...]
    Interruption    [\]
(c) Pre Utterance Delimiters
    Latching        [>]
    Completion      [+]
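The codes listed on this slide lend themselves to a simple lookup. The sketch below reports how a turn starts, how it ends, and how many pauses it contains. It is an illustration under assumptions: `describe_turn` is a hypothetical name, the example utterances are invented, and the timed pause `[#ss.mm]` is assumed to be written with digits on both sides of the point.

```python
import re

# Code tables copied from the slide
POST = {".": "period", "?": "question", "!": "exclamation",
        "[...]": "trailing off", "[\\]": "interruption"}
PRE = {"[>]": "latching", "[+]": "completion"}
# [#], [#long] or a timed pause such as [#2.50] (digit format assumed)
PAUSE = re.compile(r"\[#(?:long|\d+\.\d+)?\]")


def describe_turn(text):
    """Report the pre/post utterance delimiters and the pause count of a turn."""
    text = text.strip()
    start = next((label for code, label in PRE.items()
                  if text.startswith(code)), None)
    end = next((label for code, label in POST.items()
                if text.endswith(code)), None)
    return {"start": start, "end": end, "pauses": len(PAUSE.findall(text))}


# Invented example turns, for illustration only
latched = describe_turn("[>] i need [#] the spreadsheet one [...]")
question = describe_turn("what software do you want ?")
interrupted = describe_turn("hold on [#long] let me check [#2.50] [\\]")
```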

38 Clarke, R. J (2001) S951-10: 38 CHAT Structure Special Mainline Codes (6)
Utterance Ambiguity Codes can also be inserted into a mainline
they are used when there has been a problem with the transcription process, or when an unusual condition occurs (for example, when a gesture substitutes for a word)
in these cases the words used require special coding...

39 Clarke, R. J (2001) S951-10: 39 CHAT Structure Special Mainline Codes (7)
Utterance Ambiguity Codes may also be moved to their own dependent tiers if the mainline is getting cluttered with coding
the table that follows shows the valid CHAT Utterance Ambiguity Codes...

40 Clarke, R. J (2001) S951-10: 40 [Table: valid CHAT Utterance Ambiguity Codes]

41 Clarke, R. J (2001) S951-10: 41 CHAT Structure Bottom of the Transcript

42 Clarke, R. J (2001) S951-10: 42 CHAT Structure Bottom of the Transcript (1)
the only unique syntax for the bottom of the transcript is @End, a mandatory Constant Header needed to indicate when a transcript is finished
a relatively complete transcript extract showing the required features follows. NOTE that : is not part of the CHAT standard...

43 Clarke, R. J (2001) S951-10: 43 [Transcript extract showing the required features]

44 Clarke, R. J (2001) S951-10: 44 Tool Support

45 Clarke, R. J (2001) S951-10: 45 Tool Support (1)
the CHAT system has a number of tools available for it
one tool, called CLAN, consists of a parser for checking the syntax of CHAT transcripts
multimedia versions of CLAN are being developed; these are useful when meetings have been videotaped

46 Clarke, R. J (2001) S951-10: 46 Tool Support (2) Needed for Transcription NOT Coding
these tools are great for building elaborately coded transcripts
they are not so helpful when dealing with workplace language
coding is not the major problem; it's transcription that takes the greatest effort in workplace language studies

47 Clarke, R. J (2001) S951-10: 47 Tool Support (3) Transcription
there are of course a number of transcription systems which, when combined with CHAT and CLAN, could form a useful workplace language system
but the 'State-of-the-Art' is still not very good

48 Clarke, R. J (2001) S951-10: 48 Tool Support (4) Speech Recognition?
some manufacturers claim to get 95% accuracy in transcription, but this is only possible under very constrained conditions:
these systems cannot handle speech which is continuous and flowing; the software cannot find where words start and end
these systems cannot transcribe speech unless the system has been trained to understand each and every speaker

49 Clarke, R. J (2001) S951-10: 49 Tool Support (5)
in some circumstances the inability of current systems to recognise Flowing Speech may not be a great problem, because workplace transcripts can be sparse
some excellent systems are becoming available, e.g. Dragon DICTATE for Windows

50 Clarke, R. J (2001) S951-10: 50 Tool Support (6)
but it has taken the IS Discipline 20 years to come up with reasonable CASE tools to support traditional systems development activities
we may need another 20 years to provide the same level of support for semio-informatics!

