© Marta Gómez Palou, Ottawa, Canada, 2006
A guide through the unknown: using corpora to translate into a non-native dialect
Marta Gómez Palou
New Research in Translation and Interpreting Studies
Tarragona, Spain, 21 October 2006
Outline
  Key definitions
  Project conception
  Methodology
  Study 1
  Study 2
  Conclusion
Key definitions
  Corpus: “large collection of authentic texts that have been gathered in electronic form according to a specific set of criteria” (Bowker and Pearson 2002: 9)
  Dialect: in the context of this thesis, a language variety, such as French Canadian or Argentinean Spanish
  Non-native dialect: dialect which one does not grow up speaking
CONCEPTION
Mental Process
  Globalization + Localization + Lack of translators + Complexification of the translation process … non-native dialect translation?
  Advantages
  Possible?
  Maybe with corpora?
  Why corpora?
METHODOLOGY
2 goals, 2 studies, 2 lit reviews
  Assess the usefulness and advantages (if any) of using corpora to study dialects.
  Test their helpfulness for translators translating into a non-native dialect.
Limitations
  Language/dialect: Buenos Aires Spanish and peninsular Spanish
  Dialectal characteristics: morphosyntactic and not lexical or phonetic
  Corpora: self-compiled, size
  Source texts: unequal
  Participants: volunteers
Study 1: Reference Books vs. Corpora
Nature of the experiment:
  Take the linguistic characteristics of a given dialect as described in textbooks and investigate their presence in a monodialectal corpus of that same dialect, as well as their absence in a monodialectal corpus of another dialect.
Linguistic descriptions review:
  Alvar (1996) Manual de dialectología hispánica: el español de América
  Fontanella de Weinberg (1987) El español bonaerense: cuatro siglos de evolución lingüística
  Fontanella de Weinberg (1992) El español de América
  Fontanella de Weinberg (2004) El español de la Argentina y sus variedades regionales
  Lipski and Iglesias Recuerdo (1996) El español de América
  Placer (2003) ¿Los argentinos hablan español?
  Vaquero de Ramírez (1996) El español de América
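The presence/absence check described for Study 1 can be sketched as a normalized frequency comparison between two corpora. This is only an illustration under stated assumptions, not the procedure actually used in the study: the toy sentences, the `VOSEO_FORMS` set, and the `per_million` helper are all hypothetical, and a real analysis would run on the full tagged corpora.

```python
from collections import Counter

# Hypothetical sample of voseo verb forms (assumption, not from the slides);
# a real study would need a fuller list or POS-tagged data.
VOSEO_FORMS = {"tenés", "podés", "querés", "sabés"}

def per_million(tokens, forms):
    """Return the frequency of `forms` in `tokens`, normalized per million tokens."""
    if not tokens:
        return 0.0
    counts = Counter(t.lower() for t in tokens)
    hits = sum(counts[f] for f in forms)
    return hits / len(tokens) * 1_000_000

# Toy stand-ins for the two monodialectal corpora (invented sentences).
buenos_aires = "si querés podés bajar el programa y tenés soporte".split()
peninsular = "si quieres puedes bajar el programa y tienes soporte".split()

ba_rate = per_million(buenos_aires, VOSEO_FORMS)
pen_rate = per_million(peninsular, VOSEO_FORMS)
print(f"Buenos Aires: {ba_rate:.0f} per million; peninsular: {pen_rate:.0f} per million")
```

Normalizing per million tokens keeps the comparison meaningful when the two corpora differ in size, as a self-compiled corpus and a large reference corpus usually do.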
Description of Buenos Aires Spanish:
Compiling a monodialectal corpus:
  Scope: specialized, not highly, not culturally bound > popularized IT texts
  Channel: written
  Linguistic quality: professional
  Publication date: recent
  Pre-processing: TreeTagger
  Articles from the supplement Informática 2.0 in the Clarín newspaper. June 05 – May 06, 126 texts, words.
Identifying peninsular Spanish and Argentinean reference corpora: CREA
Analysing the corpora
  1. Voseo
  2. Pl. existential haber
  3. V number variation in passive constructions (se)
  4. Use of le as pl. pron.
  5. Nouns with ambiguous gender
  6. Dequeísmo and queísmo
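Analysing a corpus for a feature such as voseo typically starts from concordance lines. A minimal keyword-in-context (KWIC) sketch follows; the `kwic` function, its `window` parameter, and the sample sentence are hypothetical illustrations and do not reproduce the tools used in the study.

```python
def kwic(tokens, keyword, window=3):
    """Return keyword-in-context lines for each occurrence of `keyword`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            # Take up to `window` tokens on each side of the hit.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented, pre-tokenized sample sentence (assumption).
sample = "Si tenés una PC vieja , podés instalar el programa igual".split()
hits = kwic(sample, "podés", window=2)
for line in hits:
    print(line)
```

Reading hits in context is what separates a genuine voseo form from a homograph, which a raw frequency count alone cannot do.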
Results
  Feature                        Result
  Voseo                          Confirmed
  Pl. existential haber          Unconfirmed
  V number variation passives    Tentative support
  Le (pl)                        Tentative support
  Ambiguous gender               Confirmed
  Dequeísmo                      Tentative negative
  Queísmo                        Confirmed
Additional observations
  Characteristics described in textbooks are not equally representative of this dialect in authentic texts.
  The normative counterparts of non-normative characteristics are more popular in the corpus.
Study 2: Translation experiment
Two groups of translators (Agua and Fuego)
  All native speakers of peninsular Spanish
Two source texts to translate
  From En/Fr into Buenos Aires Spanish
Two types of resources
  Conventional (dictionaries, ref books, Web)
  Monodialectal Buenos Aires corpus (Clarín)
Procedure
  Agua translate ST1 with corpus, ST2 with conv. res.
  Fuego translate ST1 with conv. res. and ST2 with corpus
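The procedure is a two-by-two crossover: each group works under both conditions, and each source text is translated under both. A small sketch of that assignment, assuming hypothetical names (`GROUPS`, `TEXTS`, `crossover_plan`) chosen only for illustration:

```python
GROUPS = ("Agua", "Fuego")
TEXTS = ("ST1", "ST2")

def crossover_plan():
    """Map each (group, text) pair to the resource type used in the procedure."""
    plan = {}
    for g_idx, group in enumerate(GROUPS):
        for t_idx, text in enumerate(TEXTS):
            # Agua starts with the corpus; Fuego starts with conventional resources.
            uses_corpus = (g_idx + t_idx) % 2 == 0
            plan[(group, text)] = "corpus" if uses_corpus else "conventional"
    return plan

plan = crossover_plan()
for (group, text), resource in plan.items():
    print(f"{group} translates {text} with {resource} resources")
```

Crossing groups and texts this way means a difference between conditions cannot be attributed solely to one group being stronger or one text being easier.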
Preparatory work
  Selecting the subject field, corpus and corpus analysis tool
  Selecting the participants
  Selecting the source texts
  Selecting the conventional resources
  Preparing a WordSmith Tools tutorial
  Preparing translators' and evaluator's information packages
Execution
Quantitative Results
  General

  # of scores        Without corpus    With corpus
  Inadequate         4                 1
  Mostly adequate    9                 9
  Adequate           3                 6
  Total              16                16

Qualitative Results
  Verbal: improvement of voseo
  Lexical:
  Phraseological: no comments

  Solution        Without       With
  Buenos Aires    7 (58%)       7 (58%)
  Peninsular      5 (42%)       2 (17%)
  Neutral         0 (0%)        3 (25%)
  Total           12 (100%)     12 (100%)
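The percentages in the solutions table can be re-derived from the raw counts. The sketch below does this for the "without corpus" column only, reading its counts as 7, 5 and 0; the `shares` helper is a hypothetical name introduced for illustration.

```python
def shares(counts):
    """Convert raw counts to whole-number percentages of their total."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0 for label in counts}
    return {label: round(100 * n / total) for label, n in counts.items()}

# Counts as read from the "without corpus" column of the solutions table.
without_corpus = {"Buenos Aires": 7, "Peninsular": 5, "Neutral": 0}
pct = shares(without_corpus)
print(pct)
```

With only 12 solutions per condition, a single solution moves a percentage by about 8 points, which is one reason the results are best read as tentative.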
CONCLUSION
Assessment
Modifications
  Scale up and deepen the experiments (bigger corpus, more participants, more detailed tagger, etc.)
  Organize the translation experiment on-site
  Minimize “open” questions in questionnaires
  Give participants more training in tools and corpus use
Further Work
  Extend to new dialects
  Reverse the experiment (Buenos Aires >> peninsular Spanish)
  Cover different registers and subjects
  Factor in the impact of familiarity with the non-native dialect
  Look into lexical differences
  …
Conclusion
  Corpora have proved useful for the empirical investigation of dialectal linguistic characteristics.
  Tentative results suggest that corpus-based resources could be useful for translating into a non-native dialect.
  More research is needed.
- Thanks for your attention!