Working with COMPARA: an online parallel corpus of English and Portuguese fiction. Ana Frankenberg-Garcia
An online parallel corpus of English and Portuguese fiction? An online corpus that allows you to study Portuguese and English fiction, and their translations into English and Portuguese, in an automatic way…
Machine Translation Human Translation COMPARA
The study of human translation: traditionally not a hard science; difficult to be systematic. But with the technology of corpus linguistics, things can change…
What is a corpus?
Advantages of using corpora to study human translation: an enormous amount of translated texts; systematic analyses; quantifiable results. Baker (1993), Frankenberg-Garcia (2004), Olohan & Baker (2000), Øverås (1998), Sardinha (2002)
A parallel corpus can also be used in language learning: Barlow (2000), Frankenberg-Garcia (2000, 2004, forthcoming), Pearson (2003), Roussel (1991)
Advantages of using corpora in language learning: authentic examples of language use; access to information often absent from conventional grammars and dictionaries; learner autonomy (don’t have to rely on native speakers); risk-taking
COMPARA team: Ana Frankenberg-Garcia, Diana Santos, Rosário Silva, Susana Inácio, Rosa Pires. Initial support ( ): FCT (Portugal), ISLA Lisboa, Oxford University Language Centre. Present funding ( ): Linguateca: FCT/POSI (POSI/PLP/43931/2001)
COMPARA structure: PT source texts with their EN translations; EN source texts with their PT translations
Source texts: English original, Portuguese original. Translations: translated Portuguese, translated English
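One way to picture this two-way structure is as a list of aligned pairs, each tagged with the language of its source text. The sketch below is only an illustration: the class name, the `search` helper, and the three sentences are invented for this example and are neither taken from COMPARA nor a description of how COMPARA is actually stored or queried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedUnit:
    """One source passage paired with its translation (hypothetical model)."""
    source_lang: str   # "EN" or "PT": the language of the original text
    source: str        # text as written by the author
    translation: str   # text as published by the translator


# Toy data invented for this sketch; these sentences are not from COMPARA.
UNITS = [
    AlignedUnit("EN", "She opened the window.", "Ela abriu a janela."),
    AlignedUnit("PT", "O rio corre para o mar.", "The river runs to the sea."),
    AlignedUnit("EN", "The window was closed.", "A janela estava fechada."),
]


def search(units, term, side="source", source_lang=None):
    """Return (source, translation) pairs whose chosen side contains term.

    side        -- "source" to search originals, "translation" to search translations
    source_lang -- optionally restrict to texts originally written in "EN" or "PT"
    """
    term = term.lower()
    hits = []
    for unit in units:
        if source_lang is not None and unit.source_lang != source_lang:
            continue
        text = unit.source if side == "source" else unit.translation
        if term in text.lower():
            hits.append((unit.source, unit.translation))
    return hits


# English originals containing "window", shown with their Portuguese translations
for original, translated in search(UNITS, "window", source_lang="EN"):
    print(original, "|", translated)
```

Keeping the source language on every pair is what lets a query go in either direction: from originals to translations, or from translated text back to what the author wrote.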
COMPARA users and uses: Language learners (bilingual dictionary with examples); Language teachers (exercises and tests); Translators (language equivalents); Translation lecturers (exercises & problems); Translation theorists (test translation hypotheses); Bilingual lexicographers (bilingual dictionaries); Computational linguists (machine translation). Since 2001: queries
Before using it… Remember that the results you get are “only as good as the corpus” (J. Sinclair, Corpus, Concordance, Collocation, 1991: 13). Why can’t I find the Portuguese translation of greenhouse gas in COMPARA?
COMPARA 5.6 varieties: PORTUGUESE (Portugal, Brazil, Angola, Mozambique); ENGLISH (UK, US, South Africa)
COMPARA 5.6 Publication dates
COMPARA 5.6 genre: published fiction; EXTENSIBLE to other genres
COMPARA 5.6 authors: Portuguese writers: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, Jorge de Sena, Mário de Carvalho, Sá Carneiro
COMPARA 5.6 authors: Brazilian writers: Aluísio Azevedo, Autran Dourado, Chico Buarque, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca
COMPARA 5.6 authors: Angolan writers: José Eduardo Agualusa. Mozambican writers: Mia Couto
COMPARA 5.6 authors: British writers: David Lodge, Julian Barnes, Joseph Conrad, Joanna Trollope, Lewis Carroll, Oscar Wilde
COMPARA 5.6 authors: American writers: Henry James, Edgar Allan Poe, Richard Zimler. South African writers: Nadine Gordimer. + copyright permission to use more
Can any text be included in the corpus? Only published source texts and translations. Only English translated directly from Portuguese, and Portuguese translated directly from English. Only human translations!
COMPARA 5.6 texts: 46 source texts (extracts), 49 translations
COMPARA 5.6 size: words in English and in Portuguese. Largest edited parallel corpus in the world
Now I know why I can’t find greenhouse gas in COMPARA!
syntax general language technical terms fiction other genres COMPARA 5.6
One more thing… When using corpora, remember: language is “constructed out of a finite set of elements” (N. Chomsky, Syntactic Structures, 1957: 13), but it is something that is used creatively! “rule”, “as a rule”, “rule of thumb”: “As a rule of thumb you need a litre of paint to every 12 square metres of wall”
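A small keyword-in-context sketch makes the point concrete: a search for the single word “rule” also returns the fixed expressions “as a rule” and “rule of thumb”, so each hit has to be read in its context. The `concordance` function is a hypothetical helper written for this example, not COMPARA's search interface. The first sample sentence is the one quoted above; the other two are invented.

```python
import re


def concordance(text, phrase, width=30):
    """Return keyword-in-context lines for each whole-word match of phrase."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the matches line up in a column
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# First sentence is quoted from the slide; the other two are invented examples.
SAMPLE = (
    "As a rule of thumb you need a litre of paint to every 12 square metres of wall. "
    "As a rule, she walked to work. "
    "Every rule has an exception."
)

for line in concordance(SAMPLE, "rule"):
    print(line)
```

Searching for the longer phrases (`"as a rule"`, `"rule of thumb"`) narrows the hits to the expression of interest, which is usually the better starting point when studying how such expressions are translated.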
COMPARA availability: free, online; for research and education
COMPARA access COMPARA