Working with COMPARA: an online parallel corpus of English and Portuguese fiction
Ana Frankenberg-Garcia


An online parallel corpus of English and Portuguese fiction???
• An online corpus
• Lets you study Portuguese and English fiction, together with their English and Portuguese translations, automatically…

Machine Translation Human Translation COMPARA

The study of human translation  Traditionally not a hard science  Difficult to be systematic But with the technology of corpus linguistics, things can change …

What is a corpus?

Advantages of using corpora to study human translation
• Access to an enormous amount of translated text
• Systematic analyses
• Quantifiable results
Baker (1993), Frankenberg-Garcia (2004), Olohan & Baker (2000), Øverås (1998), Sardinha (2002)
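“Quantifiable results” usually means comparing normalized frequencies, because subcorpora of original and translated text rarely have the same size. The sketch below is a minimal illustration in Python, not COMPARA's own tooling (COMPARA is queried through its web interface): it counts a word per million tokens in a set of original texts and in a set of translated texts. The tokenizer and the toy sentences are assumptions invented for the example.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters (Unicode-aware, so accented Portuguese letters survive)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def per_million(word: str, texts: list[str]) -> float:
    """Occurrences of `word` per million tokens across all `texts`."""
    counts = Counter()
    total = 0
    for text in texts:
        tokens = tokenize(text)
        counts.update(tokens)
        total += len(tokens)
    if total == 0:
        return 0.0
    # Multiply before dividing so small toy counts give round numbers.
    return counts[word.lower()] * 1_000_000 / total

# Toy data, invented for illustration only.
originals = ["He said she left.", "She knew that he was there."]
translations = ["He said that she had left, and that he knew."]

print(per_million("that", originals))     # 100000.0
print(per_million("that", translations))  # 200000.0
```

Here the toy translation uses optional *that* twice as often per million words as the toy originals. With real corpus data, a difference like this would still need a significance test before it could support any translation hypothesis.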

A parallel corpus can also be used in language learning: Barlow (2000), Frankenberg-Garcia (2000, 2004, forthcoming), Pearson (2003), Roussel (1991)

Advantages of using corpora in language learning
• Authentic examples of language use
• Access to information often absent from conventional grammars and dictionaries
• Learner autonomy (no need to rely on native speakers)
• Risk-taking

COMPARA team: Ana Frankenberg-Garcia, Diana Santos, Rosário Silva, Susana Inácio, Rosa Pires
Initial support: FCT (Portugal), ISLA Lisboa, Oxford University Language Centre
Present funding: Linguateca, FCT/POSI (POSI/PLP/43931/2001)

COMPARA structure
• Portuguese (PT) source texts paired with their English (EN) translations
• English (EN) source texts paired with their Portuguese (PT) translations
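One way to picture this two-directional structure in code: every aligned unit records which language the text was originally written in, so the same collection can be read as Portuguese-to-English or English-to-Portuguese. This is a hypothetical data model for illustration only; the class name, fields and toy sentences are assumptions, not COMPARA's internal format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlignedPair:
    """A source-text sentence paired with its translation (hypothetical model)."""
    text_id: str      # hypothetical identifier of the source-text extract
    source_lang: str  # "pt" or "en": the language the text was originally written in
    source: str
    translation: str

    @property
    def target_lang(self) -> str:
        return "en" if self.source_lang == "pt" else "pt"

def by_direction(pairs: list[AlignedPair], source_lang: str) -> list[AlignedPair]:
    """Keep only pairs translated out of `source_lang`."""
    return [p for p in pairs if p.source_lang == source_lang]

def side(pair: AlignedPair, lang: str) -> str:
    """Return the sentence in `lang`, whether it is the original or the translation."""
    return pair.source if pair.source_lang == lang else pair.translation

# Toy corpus with one unit in each direction (invented sentences).
corpus = [
    AlignedPair("PT-01", "pt", "Estava a chover.", "It was raining."),
    AlignedPair("EN-01", "en", "She closed the door.", "Ela fechou a porta."),
]

# All Portuguese text in the toy corpus, original or translated.
print([side(p, "pt") for p in corpus])
```

Keeping the direction explicit matters because original Portuguese and translated Portuguese need not behave alike; `by_direction` lets a query target just one of them.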

COMPARA users and uses  Language learners - bilingual dictionary with examples  Language teachers - exercises and tests  Translators - language equivalents  Translation lecturers - exercises & problems  Translation theorists - test translation hypotheses  Bilingual lexicographers - bilingual dictionaries  Computational linguists - machine translation Since 2001: queries
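For learners and translators, the basic operation is a parallel concordance: find every sentence containing a word in one language and show it next to its aligned counterpart. A minimal Python sketch, assuming the corpus is available locally as (language of the original, original sentence, translation) triples; the function name and toy sentences are hypothetical, and COMPARA itself is searched through its own online interface.

```python
import re

# Toy aligned triples: (language of the original, original sentence, translation).
PAIRS = [
    ("en", "As a rule, he never drank coffee.", "Em regra, ele nunca bebia café."),
    ("en", "She broke the rule.", "Ela quebrou a regra."),
    ("pt", "A regra era clara.", "The rule was clear."),
]

def parallel_concordance(word, lang, pairs):
    """Yield (sentence in `lang`, aligned sentence in the other language) for each hit."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for src_lang, original, translation in pairs:
        # Search the `lang` side, whether it is the original or the translation.
        text, other = (original, translation) if src_lang == lang else (translation, original)
        if pattern.search(text):
            yield text, other

for text, other in parallel_concordance("regra", "pt", PAIRS):
    print(f"{text:45} | {other}")
```

Because the search looks at whichever side is in the requested language, “regra” matches the Portuguese original and the two Portuguese translations alike.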

Before using it…
Remember that the results you get are “only as good as the corpus” (J. Sinclair, Corpus, Concordance, Collocation, 1991: 13).
Why can’t I find the Portuguese translation of greenhouse gas in COMPARA?

COMPARA 5.6 language varieties
• Portuguese: Portugal, Brazil, Angola, Mozambique
• English: UK, US, South Africa
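Because each text belongs to a national variety, a query can in principle be restricted to, say, Brazilian or European Portuguese. A hypothetical sketch of such metadata filtering in Python; the fields and the sample records are invented for illustration and are not COMPARA's catalogue.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TextMeta:
    text_id: str   # hypothetical identifier
    language: str  # "pt" or "en"
    variety: str   # e.g. "Portugal", "Brazil", "UK", "US"
    role: str      # "source" or "translation"

def select(texts, language: Optional[str] = None, variety: Optional[str] = None,
           role: Optional[str] = None) -> list[TextMeta]:
    """Return the texts that match every criterion that is not None."""
    return [
        t for t in texts
        if (language is None or t.language == language)
        and (variety is None or t.variety == variety)
        and (role is None or t.role == role)
    ]

# Invented sample records.
catalogue = [
    TextMeta("PT-A", "pt", "Portugal", "source"),
    TextMeta("PT-B", "pt", "Brazil", "source"),
    TextMeta("PT-C", "pt", "Brazil", "translation"),
    TextMeta("EN-A", "en", "UK", "source"),
]

brazilian_originals = select(catalogue, variety="Brazil", role="source")
print([t.text_id for t in brazilian_originals])  # ['PT-B']
```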

COMPARA 5.6 Publication dates

COMPARA 5.6 genre
• Published fiction
• Other genres: extensible

COMPARA 5.6 authors
• Portuguese writers: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, Jorge de Sena, Mário de Carvalho, Sá Carneiro
• Brazilian writers: Aluísio Azevedo, Autran Dourado, Chico Buarque, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca
• Angolan writers: José Eduardo Agualusa
• Mozambican writers: Mia Couto
• British writers: David Lodge, Julian Barnes, Joseph Conrad, Joanna Trollope, Lewis Carroll, Oscar Wilde
• American writers: Henry James, Edgar Allan Poe, Richard Zimler
• South African writers: Nadine Gordimer
+ copyright permission to use more

Can any text be included in the corpus?
• Only published source texts and translations
• Only English translated directly from Portuguese, and Portuguese translated directly from English
• Only human translations!
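The three inclusion criteria can be written down as explicit checks. This is only a sketch: the `Candidate` fields and the function name are hypothetical and simply mirror the three rules above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """A candidate source text plus translation (hypothetical fields)."""
    title: str
    published: bool          # both the source text and the translation are published
    source_lang: str         # language the original was written in
    translated_from: str     # language the translator actually worked from
    target_lang: str
    human_translation: bool  # False for machine-translated output

def rejection_reasons(c: Candidate) -> list[str]:
    """List every rule the candidate breaks; an empty list means it may be included."""
    reasons = []
    if not c.published:
        reasons.append("not published")
    if (c.source_lang, c.target_lang) not in {("en", "pt"), ("pt", "en")}:
        reasons.append("not an English-Portuguese pair")
    if c.translated_from != c.source_lang:
        reasons.append("indirect translation via another language")
    if not c.human_translation:
        reasons.append("not a human translation")
    return reasons

ok = Candidate("Novel A", True, "pt", "pt", "en", True)
relay = Candidate("Novel B", True, "pt", "fr", "en", True)  # translated from a French version
print(rejection_reasons(ok))     # []
print(rejection_reasons(relay))  # ['indirect translation via another language']
```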

COMPARA 5.6 texts: 46 source texts (extracts) and 49 translations

COMPARA 5.6 size (words in English and in Portuguese)
The largest edited parallel corpus in the world

Now I know why I can’t find greenhouse gas in COMPARA!

COMPARA 5.6
• Fiction: syntax, general language
• Other genres: technical terms

One more thing…
When using corpora, remember: language is “constructed out of a finite set of elements” (N. Chomsky, Syntactic Structures, 1957: 13), but it is used creatively!
“rule”, “as a rule”, “rule of thumb”
“As a rule of thumb you need a litre of paint to every 12 square metres of wall”
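The paint example shows why raw hits need reading in context: a search for “as a rule” retrieves this sentence, although the writer is using “as a rule of thumb”, which combines “as a rule” and “rule of thumb”. A minimal Python sketch, assuming plain regular-expression matching (real concordancers may work differently), that finds each query's matches in the sentence and flags queries whose matches overlap.

```python
import re

SENTENCE = "As a rule of thumb you need a litre of paint to every 12 square metres of wall"
QUERIES = ["rule", "as a rule", "rule of thumb"]

def find_matches(sentence, queries):
    """Map each query to the (start, end) character spans where it occurs as whole words."""
    hits = {}
    for q in queries:
        pattern = re.compile(rf"\b{re.escape(q)}\b", re.IGNORECASE)
        hits[q] = [m.span() for m in pattern.finditer(sentence)]
    return hits

def overlapping(hits):
    """Return pairs of queries whose matches share at least one character."""
    found = []
    queries = list(hits)
    for i, a in enumerate(queries):
        for b in queries[i + 1:]:
            if any(s1 < e2 and s2 < e1 for s1, e1 in hits[a] for s2, e2 in hits[b]):
                found.append((a, b))
    return found

hits = find_matches(SENTENCE, QUERIES)
print(hits)
print(overlapping(hits))
```

All three queries hit the same stretch of text, so counting them separately would credit one sentence to three different expressions.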

COMPARA availability: free, online, for research and education

COMPARA access COMPARA