Using a parallel corpus in translation practice and research Ana Frankenberg-Garcia
Machine Translation Using machines to analyse Human Translation
The study of human translation Traditionally not a hard science Difficult to be systematic But with the technology of corpus linguistics, things can change …
What is a corpus? A large collection of texts, selected according to specific criteria, machine-readable, and searchable with text-retrieval software
Advantages of using corpora to study human translation An enormous number of translated texts Systematic analyses Quantifiable results
A bi-directional parallel corpus of Portuguese and English COMPARA Project leaders Ana Frankenberg-Garcia & Diana Santos Research assistants Rosário Silva & Susana Inácio Initial support ( ) FCT (Portugal) ISLA (Lisboa) Oxford University (Language Centre) Present funding ( ) Linguateca: FCT/ POSI (POSI/PLP/43931/2001)
COMPARA structure: PT source texts with EN translations; EN source texts with PT translations
Source texts and translations: original Portuguese with translated English; original English with translated Portuguese
COMPARA 8.0 varieties PORTUGUESE: Portugal, Brazil, Angola, Mozambique ENGLISH: UK, US, South Africa Unbalanced distribution!
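The bi-directional structure above can be sketched as a list of aligned units, each recording which language the source text was written in. This is only an illustration: the `AlignedUnit` class, the `search` function and the example sentences are hypothetical and are not taken from COMPARA or its query interface.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedUnit:
    """One source segment paired with its translation (hypothetical schema)."""
    source_lang: str   # "PT" or "EN": language of the original text
    source: str        # segment from the source text
    translation: str   # aligned segment from the published translation


# Invented example sentences, for illustration only.
CORPUS = [
    AlignedUnit("EN", "She nodded and smiled.", "Ela acenou com a cabeça e sorriu."),
    AlignedUnit("EN", "He nodded.", "Ele fez que sim com a cabeça."),
    AlignedUnit("PT", "Ele disse que sim.", "He nodded."),
]


def search(corpus, pattern, side="source", source_lang=None):
    """Return units whose source or translation side matches a regex."""
    if side not in ("source", "translation"):
        raise ValueError("side must be 'source' or 'translation'")
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for unit in corpus:
        # Optionally restrict the search to one translation direction.
        if source_lang is not None and unit.source_lang != source_lang:
            continue
        text = unit.source if side == "source" else unit.translation
        if regex.search(text):
            hits.append(unit)
    return hits


# "nodded" in original English, shown with its Portuguese translations.
for unit in search(CORPUS, r"\bnodded\b", side="source", source_lang="EN"):
    print(unit.source, "=>", unit.translation)
```

Keeping the source language on every unit is what allows the same data to be queried in either direction, for example for "nodded" in original English or in English translated from Portuguese.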
COMPARA 8.0 Publication dates
COMPARA 8.0 genre Published fiction other genres EXTENSIBLE
COMPARA 8.0 authors Portuguese writers Camilo Castelo Branco Eça de Queirós José Cardoso Pires José Saramago Jorge de Sena Lídia Jorge Mário de Carvalho Sá Carneiro
COMPARA 8.0 authors Brazilian writers Aluísio Azevedo Autran Dourado Chico Buarque Jô Soares José de Alencar Machado de Assis Manuel Antônio de Almeida Marcos Rey Patrícia Melo Paulo Coelho Rubem Fonseca
COMPARA 8.0 authors Angolan writers José Eduardo Agualusa Mozambican writers Mia Couto
COMPARA 8.0 authors British writers David Lodge Ian McEwan Julian Barnes Joseph Conrad Joanna Trollope Kazuo Ishiguro Lewis Carroll Mary Shelley Oscar Wilde
COMPARA 8.0 authors American writers Henry James Edgar Allan Poe Richard Zimler South African writers Nadine Gordimer
Can any text be included in the corpus? Only published source texts and translations Only English translated directly from Portuguese and Portuguese translated directly from English Only human translations!
COMPARA 8.0 texts 71 source texts (extracts) 74 translations
COMPARA 8.0 size 1,536,269 words in English 1,423,937 words in Portuguese Largest edited parallel corpus containing Portuguese
COMPARA users and uses Language learners: bilingual dictionary with examples; Language teachers: exercises and tests; Translators: language equivalents; Translation lecturers: exercises & problems; Translation theorists: test translation hypotheses; Lexicographers: bilingual dictionaries; Computational linguists: machine translation Latest statistics: queries per month
COMPARA availability Free, online For research and education
COMPARA access COMPARA
“nodded”
Studies using COMPARA 1. Observing source texts and translations 2. Contrasting Portuguese and English 3. Comparing translated and untranslated language 4. Examining the characteristics of translated texts
1. Observing source texts & translations Improving bilingual dictionaries and machine-translation programs Frankenberg-Garcia (2002): nod; Ribeiro & Dias (2005): grande; Specia et al. (2005): word-sense disambiguation
2. Contrasting English and Portuguese Contrasting original fiction in English and Portuguese Frankenberg-Garcia (2005) PT Loan words EN Loan words PT Loan languages EN Loan languages
3. Comparing translated and untranslated language Frequency per 100 K words in COMPARA, translations vs. source texts: diferente(s) 30.7 vs. 15.4 (x 2); simplesmente 15.6 vs. 5.1 (x 3); end.* up 13.5 vs. 2.8 (x 4); lemma “rezar” 5.6 vs. 12.4 (x 2 in source texts)
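Frequencies per 100 K words make subcorpora of different sizes comparable. The sketch below shows the arithmetic only; the raw counts and subcorpus sizes are hypothetical and are not COMPARA figures, and the function names are chosen for illustration.

```python
def per_100k(raw_count: int, corpus_words: int) -> float:
    """Normalise a raw count to occurrences per 100,000 words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return raw_count * 100_000 / corpus_words


def ratio(freq_a: float, freq_b: float) -> float:
    """How many times more frequent an item is in A than in B."""
    if freq_b == 0:
        raise ValueError("freq_b must be non-zero")
    return freq_a / freq_b


# Hypothetical counts of one word in translated and untranslated text.
translated = per_100k(raw_count=150, corpus_words=500_000)
original = per_100k(raw_count=90, corpus_words=600_000)

print(f"translations: {translated:.1f} per 100K words")
print(f"source texts: {original:.1f} per 100K words")
print(f"ratio: x {ratio(translated, original):.1f}")
```

A ratio above 1 means the item is more frequent in translations than in untranslated text of the same language; a ratio below 1 means the opposite.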
4. Examining the characteristics of translated texts Are translations longer than source texts? Frankenberg-Garcia (2004) Explicitation Hypothesis
Source texts: 8 x Pt 1500 words (8 PT authors) and 8 x En 1500 words (8 EN authors) Translations: ? words (8 PT translators, 8 EN translators)
Source texts (ST) vs. translations (TT): TT = ST + 5% Matched t-test: TT longer than ST at the 95% confidence level
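A matched (paired) t-test of this kind can be sketched as follows. The 16 pairs mirror the design of 8 PT and 8 EN extracts of 1500 words, but the translation word counts are invented for illustration and are not the study's data. The critical value 1.753 is the standard one-tailed 5% value for 15 degrees of freedom, taken from t tables.

```python
import math
from statistics import mean, stdev


def paired_t(source_lengths, translation_lengths):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(source_lengths) != len(translation_lengths):
        raise ValueError("samples must be paired one-to-one")
    # Positive difference = translation longer than its source text.
    diffs = [t - s for s, t in zip(source_lengths, translation_lengths)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    # stdev() is the sample standard deviation (n - 1 denominator).
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical data: 16 source extracts of 1500 words each and
# invented word counts for their translations.
source = [1500] * 16
translations = [1570, 1602, 1498, 1555, 1630, 1541, 1589, 1476,
                1612, 1560, 1533, 1595, 1620, 1510, 1578, 1549]

t_stat, df = paired_t(source, translations)

# One-tailed critical value at the 5% level for df = 15 (from t tables).
T_CRIT_ONE_TAILED_05_DF15 = 1.753

print(f"t = {t_stat:.2f}, df = {df}")
if t_stat > T_CRIT_ONE_TAILED_05_DF15:
    print("Translations are significantly longer than source texts (p < 0.05).")
else:
    print("No significant difference in length at the 5% level.")
```

The paired design matters because each translation is compared with its own source text, so differences between authors and texts do not inflate the variance.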
To conclude... Studies such as these were unthinkable before corpora. Many other studies are possible! COMPARA is free and available online. Contact us: