Published by Jaron Croll. Modified over 9 years ago.
1
Using a parallel corpus in translation practice and research. Ana Frankenberg-Garcia, ana.frankenberg@sapo.pt
2
Machine Translation Using machines to analyse Human Translation
3
The study of human translation: traditionally not a hard science; difficult to be systematic. But with the technology of corpus linguistics, things can change …
4
What is a corpus? A large collection of texts, compiled according to specific criteria, machine-readable, and searched with text-retrieval software.
5
Advantages of using corpora to study human translation: an enormous amount of translated texts; systematic analyses; quantifiable results.
6
COMPARA: a bi-directional parallel corpus of Portuguese and English. Project leaders: Ana Frankenberg-Garcia & Diana Santos. Research assistants: Rosário Silva & Susana Inácio. Initial support (1999-2000): FCT (Portugal), ISLA (Lisboa), Oxford University (Language Centre). Present funding (2001-2006): Linguateca: FCT/POSI (POSI/PLP/43931/2001).
7
COMPARA structure: PT source texts with their EN translations; EN source texts with their PT translations.
8
Source texts and translations: original English with translated Portuguese; original Portuguese with translated English.
9
COMPARA 8.0 varieties. Portuguese: Portugal, Brazil, Angola, Mozambique. English: UK, US, South Africa. Unbalanced distribution!
10
COMPARA 8.0 publication dates (timeline labels): 1837, 2002, 1880, 1997, 1988, 1914.
11
COMPARA 8.0 genre: published fiction. Extensible to other genres.
12
COMPARA 8.0 authors. Portuguese writers: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, José Saramago, Jorge de Sena, Lídia Jorge, Mário de Carvalho, Sá Carneiro.
13
COMPARA 8.0 authors. Brazilian writers: Aluísio Azevedo, Autran Dourado, Chico Buarque, Jô Soares, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca.
14
COMPARA 8.0 authors. Angolan writers: José Eduardo Agualusa. Mozambican writers: Mia Couto.
15
COMPARA 8.0 authors. British writers: David Lodge, Ian McEwan, Julian Barnes, Joseph Conrad, Joanna Trollope, Kazuo Ishiguro, Lewis Carroll, Mary Shelley, Oscar Wilde.
16
COMPARA 8.0 authors. American writers: Henry James, Edgar Allan Poe, Richard Zimler. South African writers: Nadine Gordimer.
17
Can any text be included in the corpus? Only published source texts and translations. Only English translated directly from Portuguese and Portuguese translated directly from English. Only human translations!
18
COMPARA 8.0 texts: 71 source texts (extracts), 74 translations.
19
COMPARA 8.0 size: 1,536,269 words and 1,423,937 words, in English and Portuguese. Largest edited parallel corpus containing Portuguese.
20
COMPARA users and uses. Language learners - bilingual dictionary with examples. Language teachers - exercises and tests. Translators - language equivalents. Translation lecturers - exercises & problems. Translation theorists - test translation hypotheses. Lexicographers - bilingual dictionaries. Computational linguists - machine translation. Latest statistics: more than 6000 queries per month.
21
COMPARA availability: free, online, for research and education.
22
COMPARA access: www.linguateca.pt/COMPARA/
24
“nodded”
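A query like the one on this slide retrieves every aligned unit whose source side contains the search word, together with its translation. The following is only a minimal sketch of that idea, not COMPARA's actual search engine: the function name `search_parallel` and the sentence pairs are invented for illustration and are not taken from the corpus.

```python
import re

# Hypothetical aligned sentence pairs (invented examples, NOT COMPARA data).
ALIGNED_PAIRS = [
    ("She nodded and left the room.", "Ela acenou com a cabeça e saiu da sala."),
    ("He said nothing.", "Ele não disse nada."),
    ("The old man nodded slowly.", "O velho assentiu lentamente."),
]


def search_parallel(pairs, pattern):
    """Return the aligned pairs whose source side matches `pattern` as a whole word."""
    regex = re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)
    return [(source, target) for source, target in pairs if regex.search(source)]


# Print each hit next to its aligned translation, concordance-style.
for source, target in search_parallel(ALIGNED_PAIRS, "nodded"):
    print(source, "|", target)
```

Because the pattern is a regular expression, a query such as `end.* up` could be passed to the same hypothetical function unchanged.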
27
Studies using COMPARA: 1. Observing source texts and translations 2. Contrasting Portuguese and English 3. Comparing translated and untranslated language 4. Examining the characteristics of translated texts
28
1. Observing source texts & translations. Improving bilingual dictionaries and machine-translation programs: Frankenberg-Garcia (2002), nod; Ribeiro & Dias (2005), grande; Specia et al. (2005), word-sense disambiguation.
29
2. Contrasting English and Portuguese. Contrasting original fiction in English and Portuguese: Frankenberg-Garcia (2005). Charts: PT loan words, EN loan words, PT loan languages, EN loan languages.
30
3. Comparing translated and untranslated language. Frequency per 100 K words in COMPARA 7.0.4 (columns: translations, source texts): diferente(s) 30,7 and 15,4 (2 x); simplesmente 15,6 and 5,1 (3 x); end.* up 13,5 and 2,8 (4 x). Lemma “rezar”: 5,6 and 12,4 (2 x).
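The figures on this slide are normalised frequencies, which is what makes subcorpora of different sizes comparable. As a rough sketch of the arithmetic only, the raw counts and corpus sizes below are made-up assumptions (they are not COMPARA figures), as are the helper names `per_100k` and `times_more_frequent`.

```python
def per_100k(hits, corpus_words):
    """Normalise a raw hit count to a frequency per 100,000 words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return hits * 100_000 / corpus_words


def times_more_frequent(freq_a, freq_b):
    """Return how many times the larger frequency exceeds the smaller one."""
    low, high = sorted((freq_a, freq_b))
    if low <= 0:
        raise ValueError("frequencies must be positive")
    return high / low


# Hypothetical raw counts and subcorpus sizes (assumptions, NOT COMPARA figures).
translated = per_100k(hits=307, corpus_words=1_000_000)
untranslated = per_100k(hits=154, corpus_words=1_000_000)

print(f"{translated:.1f} vs {untranslated:.1f} per 100K words")
print(f"ratio: {times_more_frequent(translated, untranslated):.1f}x")
```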
31
4. Examining the characteristics of translated texts. Are translations longer than source texts? Frankenberg-Garcia (2004). Explicitation Hypothesis.
32
Source texts: 8 PT extracts of 1500 words each (8 PT authors) and 8 EN extracts of 1500 words each (8 EN authors). Translations: 8 PT translators, 8 EN translators; length: ?
33
Source texts (ST) and translations (TT): TT + 5%. Matched t-test: 95% probability that TT are longer than ST.
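A matched (paired) t-test compares each translation with its own source text and asks whether the mean length difference is reliably above zero. The slides do not give the underlying word counts, so the sketch below uses invented translation lengths for eight 1500-word extracts; the function name `paired_t` and the data are assumptions, and the critical value is the standard one-tailed 5% value for 7 degrees of freedom.

```python
import math
import statistics


def paired_t(source_lengths, translation_lengths):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(source_lengths) != len(translation_lengths):
        raise ValueError("each source text needs exactly one translation")
    diffs = [t - s for s, t in zip(source_lengths, translation_lengths)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t_stat, n - 1


# Hypothetical word counts (invented for illustration, NOT the study's data).
source = [1500] * 8
translations = [1590, 1545, 1620, 1560, 1530, 1605, 1575, 1575]

mean_diff, t_stat, df = paired_t(source, translations)

# One-tailed critical value of Student's t at the 5% level for df = 7.
T_CRITICAL_DF7 = 1.895

print(f"mean difference: {mean_diff} words, t = {t_stat:.2f}, df = {df}")
print("translations significantly longer:", t_stat > T_CRITICAL_DF7)
```

Pairing matters here because texts differ widely from one another; testing the per-text differences removes that between-text variation from the comparison.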
34
To conclude: studies such as these were unthinkable before corpora. Many other studies are possible! COMPARA is free and available online. Contact us: ana.frankenberg@sapo.pt, diana.santos@sintef.no