Using Parallel Corpora for Contrastive Studies
Michael Barlow
Overview
  Introduce the use of parallel (translation) corpora in contrastive studies
  Examine some simple searches to illustrate the potential of ParaConc and the general corpus-based approach
  Focus on the mechanics, but also consider some issues related to parallel concordancing
Multilingual Concordancing: Advantages
  Specific corpora -- e.g. architecture
  Potentially, several examples of the target structure can be examined -- measures of congruence are possible
  Empirical data
  Context (sentence/paragraph) is present
Multilingual Concordancing: Disadvantages
  Locating/aligning corpora
  Appropriate corpora may not be available
  Time needed to process information
  Direction of translation
  Translationese
  Hot words may not be translations
Using corpora
  Corpora -- samples of monolingual texts produced by writers and samples of translation texts produced by writers and translators
  Translators are creating the best fit between two languages
  Software aids the analysis of monolingual and bilingual formal patterns
  Analysts need to evaluate and analyse the patterns
Using corpora
  Language is understood by reference to frames or cultural models -- large corpora can reveal cultural patterns
  Each form has many meanings -- different meanings are indicated by different collocations and co-text (interpreted by the corpus analyst)
  We can determine translational rather than formal equivalence -- based on parallel corpora
Parallel corpora
  Translated texts
  Translation focus: single book plus one or more translations
  Language focus: large corpora (e.g., European Parliament output)
Contrastive studies
  Contrastive analysis -- 1960s and 1970s
  Nickel (1971) refers to the problem of equivalence: while “formal equivalence can be established relatively easily”, it is difficult to identify “functional-semantic equivalence.”
Contrastive studies
  Use corpora to identify functional-semantic equivalence
  Thus while passives may be formally equivalent in two languages, there may be little overlap in terms of usage
  Exploit large text corpora to pursue a corpus-based or usage-based approach to contrastive studies (Gellerstam 1996; Aijmer, Altenberg and Johansson 1996)
Parallel corpora: language focus
  A parallel corpus gives a summation of many individual decisions about what is equivalent
  Each translator considers all the particular factors associated with an individual translation and makes a best-fit estimate
Contrastive studies
  Perennial problem of what to contrast: what are equivalent words and structures in two languages?
  Formal equivalence
  Functional/pragmatic equivalence
  Translation equivalence
  Focus on translational equivalence
Contrastive studies
  Relying on translational equivalence
  The translator is translating texts, not words or phrases, and so the matching is more approximate than we would like for our contrastive purposes
  Direction of translation is important
Use parallel (translated) corpora
  Access equivalence using parallel corpora
  Look for congruence and non-congruence for particular language features, e.g., passive, prepositions, spatial adjectives
Assessing congruence
  Search for a preposition in L1 and assess the uniformity of the equivalents in L2
  In addition, assess backwards congruence (from L2 back to L1)
  Distinguish formal/token congruence from meaning/functional congruence
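One way to picture forward and backward congruence is the rough sketch below. It is not ParaConc's method: the sentence pairs are invented, the candidate equivalents are supplied by the analyst, and the names `equivalents` and `congruence` are hypothetical. It measures token congruence only (does a candidate appear in the aligned segment?); whether the match is a functional equivalent still has to be judged by the analyst.

```python
import re
from collections import Counter

# Toy English-French sentence pairs, invented for illustration only.
PAIRS = [
    ("The book is on the table.", "Le livre est sur la table."),
    ("She lives on the coast.", "Elle habite sur la côte."),
    ("He arrived on Monday.", "Il est arrivé lundi."),
    ("The film is on television tonight.", "Le film passe à la télévision ce soir."),
    ("A bridge over the river.", "Un pont sur la rivière."),
]

def tokens(segment):
    """Lowercase a segment and split it into word tokens."""
    return re.findall(r"\w+", segment.lower())

def equivalents(pairs, word, candidates):
    """For each segment containing `word`, record the first candidate
    found in the aligned segment (None if no candidate is present)."""
    counts = Counter()
    for source, target in pairs:
        if word not in tokens(source):
            continue
        target_tokens = tokens(target)
        found = next((c for c in candidates if c in target_tokens), None)
        counts[found] += 1
    return counts

def congruence(counts, equivalent):
    """Share of the word's occurrences aligned with `equivalent`."""
    total = sum(counts.values())
    return counts[equivalent] / total if total else 0.0

# Forward: English "on" -> French candidates chosen by the analyst.
forward = equivalents(PAIRS, "on", ["sur", "à"])

# Backward: swap the pair order and start from French "sur".
swapped = [(fr, en) for en, fr in PAIRS]
backward = equivalents(swapped, "sur", ["on", "over"])

print(forward, congruence(forward, "sur"))
print(backward, congruence(backward, "on"))
```

In this toy data "on" is matched by "sur" in half of its occurrences, while "sur" is matched by "on" in two of three, so the two directions need not agree.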
Practical session
  Simple searches
  Locating equivalents
Assessing congruence
  For a particular corpus, search for and count the number of instances of word1
  Find the most usual translation, trans1
  (i) Perform a parallel search for word1 - trans1 and examine the usage and collocations for word1 (= which uses of word1 translate as trans1)
  (ii) Perform a parallel search for word1 - NOT trans1 and examine the usage of word1 (= which uses of word1 don’t translate as trans1)
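The procedure above can be sketched outside ParaConc as follows. This is a minimal illustration under stated assumptions: the aligned pairs are invented, `parallel_search` is a hypothetical helper rather than a ParaConc function, and trans1 is supplied by the analyst rather than discovered automatically.

```python
import re

# Toy sentence-aligned English-French pairs, invented for illustration only.
PAIRS = [
    ("The book is on the table.", "Le livre est sur la table."),
    ("She lives on the coast.", "Elle habite sur la côte."),
    ("He arrived on Monday.", "Il est arrivé lundi."),
    ("The film is on television tonight.", "Le film passe à la télévision ce soir."),
    ("There is no milk left.", "Il ne reste plus de lait."),
]

def tokens(segment):
    """Lowercase a segment and split it into word tokens."""
    return re.findall(r"\w+", segment.lower())

def parallel_search(pairs, word1, trans1, negate=False):
    """Return aligned pairs whose source contains word1 and whose target
    contains trans1 (or, with negate=True, does NOT contain trans1)."""
    hits = []
    for source, target in pairs:
        if word1 not in tokens(source):
            continue
        has_trans = trans1 in tokens(target)
        if has_trans != negate:
            hits.append((source, target))
    return hits

word1, trans1 = "on", "sur"  # trans1 is the analyst's choice

# Step 1: count the segments containing word1.
total = sum(word1 in tokens(source) for source, _ in PAIRS)

# Step (i): word1 with trans1. Step (ii): word1 with NOT trans1.
with_trans = parallel_search(PAIRS, word1, trans1)
without_trans = parallel_search(PAIRS, word1, trans1, negate=True)

print(total, len(with_trans), len(without_trans))
for source, target in without_trans:
    print(source, "|", target)
```

The two result sets are what the analyst then reads through: the first shows the uses of word1 that go with trans1, the second the uses (here temporal "on Monday" and "on television") that are rendered some other way.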
Parallel search
Issues
  The translator is translating texts (typically sentences) rather than words, collocations or constructions
  Consequently we need large corpora to find examples of equivalences
  Monolingual corpus investigations typically supplement the translation corpus findings
Issues
  Tools such as ParaConc provide a window on translation data
  Good software design makes the tool invisible, but the tool highlights some views of the data and obscures others
  ParaConc is a word(s) window
  Alternatives -- sentence (information structure) or paragraph or cohesion windows
Software tools
  Computer software brings out patterns in language data
  Tools also hide and obscure data
  “If you only have a hammer, every problem looks like a nail”
  Or “Using a hammer, everything becomes a nail”
Searching and counting
  You formulate the search based on what is in the corpus -- words, tags etc.
  Software does the counting
Corpus insights
  Frequency counts
  Genre/text type affects the form of language
  The notion of lexico-grammar
Cognitive insights
  Polysemy
  Metaphors and blended spaces -- metaphors are part of ordinary language -- do they translate?
  Construal
  Categorisation
Alignment
  Texts need to be aligned at roughly the sentence level
  Alignment is difficult and restricts the availability of parallel texts for analysis
Loading texts
Thank you