CHAT and CLAN Fraibet Aveledo ESRC Centre for Research on Bilingualism in Theory and Practice
Page 2 The corpora (brief summary of the importance of corpora) –The computerization of the data –Spontaneous speech that represents a community –The size of the corpus –Homogeneity –Transcriptions and notations –Analysis of the data
Page 3 CHILDES and TalkBank The CHILDES Project: Child Language Data Exchange System The goal of TalkBank is to foster fundamental research in the study of human and animal communication. –It will construct sample databases within each of the subfields studying communication. –It will use these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary materials via networked computers.
Page 4 CHAT Codes for the Human Analysis of Transcripts A standardized format for computerized transcripts of face-to-face conversational interactions. CHAT –allows basic conversations to be transcribed –provides options for coding more specialized information » these make it possible to analyze syntactic, phonological, and morphological phenomena.
Page 5 CHAT Codes for the Human Analysis of Transcripts When transcribing Be careful not to transcribe spoken language as written language. Some issues have to be discussed, depending on the characteristics of the corpus. There is a tendency to use punctuation as in written language.
Page 6 Transcription in CHAT Transcription is done in the CLAN programme. The sound can be accessed in the same file where the transcription is taking place. The CHAT format has three main components: –Headers –Main tiers –Dependent tiers
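The three components can be told apart by the first character of each line: headers begin with "@", main tiers with "*", and dependent tiers with "%". A minimal Python sketch of that distinction follows; the function name `classify_line` and the sample header text are illustrative assumptions, not part of CLAN.

```python
def classify_line(line: str) -> str:
    """Return the CHAT component a line belongs to, judged by its first character."""
    if line.startswith("@"):
        return "header"
    if line.startswith("*"):
        return "main tier"
    if line.startswith("%"):
        return "dependent tier"
    # Anything else (e.g. a wrapped continuation line) is left unclassified.
    return "other"


# Sample lines: the header text is a hypothetical placeholder,
# the two tiers are taken from the slides.
sample = [
    "@Comment:\thypothetical header line",
    "*JUL:\tmamá, quiero agua [c] y quiero chocolate [c]!",
    "%com:\tthe child does not master the liquids.",
]

for line in sample:
    print(classify_line(line), "<-", line)
```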
Page 7
Page 8 Headers –Component for including information about the subjects of the transcription, the date of recording, the date of transcription, ages, etc. –There are hidden, initial, constant and changeable headers. » Hidden headers do not appear in CLAN but are necessary for running the programme. –Headers should start with “@” –Then comes the name of the header, followed by “:” and a tab
Page 9 IMPORTANT: headers never end in punctuation. Between the “:” and the number 2 there is a TAB.
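The two rules on this slide (a TAB after the colon, no final punctuation) can be checked mechanically. The sketch below is only an illustration in Python: `check_header` is a hypothetical helper, and the set of characters treated as "punctuation" is an assumption.

```python
import re

# Assumed shape of a header that carries a value:
# "@" + header name + ":" + TAB + value
HEADER_WITH_VALUE = re.compile(r"^@[^:\t]+:\t\S.*$")


def check_header(line: str) -> list[str]:
    """Return a list of problems found in one header line (empty if none)."""
    problems = []
    if not line.startswith("@"):
        problems.append("does not start with @")
    # Headers without a colon are let through; only headers with a value are checked.
    if ":" in line and not HEADER_WITH_VALUE.match(line):
        problems.append("name must be followed by ':' and a TAB, then the value")
    # Assumption: these five characters count as final punctuation.
    if line.rstrip().endswith((".", "!", "?", ",", ";")):
        problems.append("ends in punctuation")
    return problems


print(check_header("@Number:\t2"))   # well formed
print(check_header("@Number: 2"))    # space instead of TAB
print(check_header("@Number:\t2."))  # final full stop
```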
Page 10 There are three initial headers, and they are obligatory. Without them, CLAN does not work (STATFREQ and OUTPUT TO EXCEL). - it is placed at the beginning of the transcription. This header is not followed by a It tells the programme which language has been used in the dialogues. In the CHAT manual there is a table with the abbreviation for each language.
Page 11 The participants have to be placed in the second line of the transcription. The ID, the name, and the role of each one are given: SAR Sue Target_Child, CAR Carol Mother Participants are identified by three letters, usually taken from a pseudonym. These letters have to be capitals. When transcribing conversations with children, the role of each participant is written.
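The participant list on this slide has a regular shape: entries separated by commas, each made up of an ID, a name and a role. A rough Python sketch of reading it follows. It assumes every entry has all three fields, as in the slide's example; `Participant` and `parse_participants` are hypothetical names.

```python
import re
from typing import NamedTuple


class Participant(NamedTuple):
    code: str  # three capital letters, e.g. SAR
    name: str  # usually a pseudonym
    role: str  # e.g. Target_Child, Mother


def parse_participants(value: str) -> list[Participant]:
    """Split a participant list into (ID, name, role) records."""
    participants = []
    for entry in value.split(","):
        fields = entry.split()
        # Assumption: each entry has exactly three fields.
        if len(fields) != 3:
            raise ValueError(f"expected 'ID name role', got: {entry.strip()!r}")
        code, name, role = fields
        if not re.fullmatch(r"[A-Z]{3}", code):
            raise ValueError(f"ID must be three capital letters: {code!r}")
        participants.append(Participant(code, name, role))
    return participants


for p in parse_participants("SAR Sue Target_Child, CAR Carol Mother"):
    print(p.code, p.name, p.role)
```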
Page 12 Not obligatory ** obligatory.
Page 13
Page 14 Participant-specific headers There is another set of headers that are optional. They offer important information about the participants. In a case where the child, Julio, is called JUL, each of these headers ends in “of JUL”.
Page 15 Constant headers: these are optional. @Number
Page 16 Quality Layout Duration
Page 17 Other headers Start:
Page 18 Changeable headers They can go in any part of the transcription. background material for GEM date of the interaction for GEM episode Language only written text location
Page 19 Main tiers Main tiers contain the utterances produced by the speakers. Each tier must start with “*”, the speaker's ID and “:”: *JUL:	mamá, quiero agua [c] y quiero chocolate [c]! *MAM:	ya te los traigo [c]. Transcribers decide what each tier should contain. Each tier must end in one of . ! ? Utterances begin with lowercase letters; exceptions: the first-person pronoun “I” and proper names.
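The two formal requirements on a main tier (how it starts and how it ends) can be sketched as a small check in Python. `check_main_tier` is a hypothetical helper, and the three-capital-letter ID pattern is taken from the participant slide. The lowercase rule is not checked, because proper names cannot be recognized automatically.

```python
import re

# "*" + three-letter speaker ID + ":" + optional whitespace + utterance
MAIN_TIER = re.compile(r"^\*([A-Z]{3}):\s*(.+)$")


def check_main_tier(line: str) -> list[str]:
    """Return a list of problems found in one main tier (empty if none)."""
    match = MAIN_TIER.match(line)
    if not match:
        return ["must start with '*', a three-letter ID and ':'"]
    utterance = match.group(2).rstrip()
    problems = []
    if not utterance.endswith((".", "!", "?")):
        problems.append("must end in . ! or ?")
    return problems


print(check_main_tier("*MAM:\tya te los traigo [c]."))  # no problems
print(check_main_tier("*JUL:\tmamá, quiero agua"))      # missing terminator
```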
Page 20 Transcription markers In the main tiers of our transcriptions we mark the language of each word: –*KAY: –Language = = = word with first morpheme(s) English, second morpheme(s) = word with first morpheme(s) Spanish, second morpheme(s) = word with first morpheme(s) undetermined, second morpheme(s) English. –There is constant discussion about cases in which it is difficult to determine which language a word belongs to.
Page 21 Transcription markers Trailing off: +... –*TOD:	I think that I +... Interruption: +/. –*TOD:	it’s your +/. –*LEO:	do you have a lion ? Lazy overlap: +< –*TOD:	it’s your +/. –*LEO:	+< do you have a lion ? Self-interruption: +//. –*TOD:	I don’t think +//. –*TOD:	let’s play Go Fish. Self-completion: +, –*TOD:	I don’t think that I +... –*SUS:	what ? –*TOD:	+, that I know how to play.
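The markers on this slide fall into two groups: those that close an utterance (+..., +/., +//.) and those that open one (+<, +,). As an illustration only, the Python sketch below labels an utterance by the markers it carries. The function name `describe` and the grouping into two lists are assumptions made for the example.

```python
# Markers that end an utterance, with the labels used on the slide.
TERMINATORS = [
    ("+//.", "self-interruption"),
    ("+/.", "interruption"),
    ("+...", "trailing off"),
]

# Markers that begin an utterance.
LINKERS = [
    ("+<", "lazy overlap"),
    ("+,", "self-completion"),
]


def describe(utterance: str) -> list[str]:
    """Return the labels of the special markers found at the edges of an utterance."""
    text = utterance.strip()
    labels = []
    for marker, label in LINKERS:
        if text.startswith(marker):
            labels.append(label)
            break
    for marker, label in TERMINATORS:
        if text.endswith(marker):
            labels.append(label)
            break
    return labels


for utterance in [
    "I think that I +...",
    "it’s your +/.",
    "+< do you have a lion ?",
    "I don’t think +//.",
    "+, that I know how to play.",
]:
    print(describe(utterance), "<-", utterance)
```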
Page 22 Other symbols Repetition: [/] *TOD:	what [/] what did you say ? If the repetition applies to more than one word, use angle brackets. Repetition with self-repair: [//] *TOD:	[//] what did you say ? Retracing with reformulation: [///] *TOD:	what did [///] when are you coming ?
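Because the three retracing codes differ only in the number of slashes, they are easy to count once a transcript is finished. The following Python sketch shows one way of doing so; `count_retraces` is a hypothetical helper and not a CLAN command.

```python
import re
from collections import Counter

# One to three slashes inside square brackets: [/], [//], [///]
RETRACE = re.compile(r"\[(/{1,3})\]")

NAMES = {
    "/": "repetition",
    "//": "repetition with self-repair",
    "///": "retracing with reformulation",
}


def count_retraces(utterance: str) -> Counter:
    """Count each kind of retracing code in one utterance."""
    return Counter(NAMES[m.group(1)] for m in RETRACE.finditer(utterance))


print(count_retraces("what [/] what did you say ?"))
print(count_retraces("what did [///] when are you coming ?"))
```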
Page 23 Other symbols Quotations –*TOD:	he said +"/. –*TOD:	+" do you have a lion ? Pauses: –# –## long –### very long Not understood, or transcriber’s best guess: [?] *SIM:	pairs [?] I want to play Candyland.
Page 24
Page 25
Page 26
Page 27 Simple events
Page 28 Dependent tiers Comments on the transcription, and coding, should go in the dependent tiers. *JUL:	mamá, quie(r)o XXX [c] y quie(r)o choco(l)ate [c]! %com:	the child does not master the liquids. *MAM:	ya te los traigo [c].
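A dependent tier belongs to the main tier that precedes it. The Python sketch below pairs the two up, using the example from this slide. The function name `group_tiers` and the dictionary layout are assumptions made for illustration.

```python
def group_tiers(lines: list[str]) -> list[dict]:
    """Attach each dependent tier (%) to the main tier (*) that precedes it."""
    utterances = []
    for line in lines:
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            utterances.append(
                {"speaker": speaker, "text": text.strip(), "dependent": {}}
            )
        elif line.startswith("%") and utterances:
            name, _, text = line[1:].partition(":")
            utterances[-1]["dependent"][name] = text.strip()
    return utterances


transcript = [
    "*JUL:\tmamá, quie(r)o XXX [c] y quie(r)o choco(l)ate [c]!",
    "%com:\tthe child does not master the liquids.",
    "*MAM:\tya te los traigo [c].",
]

for utterance in group_tiers(transcript):
    print(utterance["speaker"], utterance["dependent"])
```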
Page 29
Page 30 Transcription process Before starting the transcription, the header tiers must be ready. Transcription is done in CLAN. Sound mode: the sound file can be accessed in the same file where the transcription is taking place. –Sound playing from the waveform –Waveform demarcation –Linking: transcription to the sound Bullet system: allows you to save in the transcription each bit of conversation transcribed in each tier (e.g. SASTRE 9) –Changing the waveform window: +H, -H (time displayed in the window); +V, -V (wave amplitude). –Channels R and L.
Page 31 OPTIONS ◄◄
Page 32
Page 33 CLAN Programmes CLAN: Computerized Language Analysis Instructions: –Open CLAN –Open Commands –Set Working and Lib
Page 34