Using Parallel Corpora for Contrastive Studies Michael Barlow.



Overview
Introduce the use of parallel (translation) corpora in contrastive studies.
Examine some simple searches to illustrate the potential of ParaConc and the general corpus-based approach.
Focus on the mechanics, but also consider some issues related to parallel concordancing.

Multilingual Concordancing: Advantages
Specific corpora, e.g. architecture.
Potentially, several examples of the target structure can be examined, so measures of congruence are possible.
Empirical data.
Context (sentence/paragraph) is present.

Multilingual Concordancing: Disadvantages
Locating/aligning corpora.
Appropriate corpora may not be available.
Time needed to process information.
Direction of translation.
Translationese.
Hot words may not be translations.

Using corpora
Corpora: samples of monolingual texts produced by writers, and samples of translation texts produced by writers and translators.
Translators are creating the best fit between two languages.
Software aids the analysis of monolingual and bilingual formal patterns.
Analysts need to evaluate and analyse the patterns.

Using corpora
Language is understood by reference to frames or cultural models; large corpora can reveal cultural patterns.
Each form has many meanings; different meanings are indicated by different collocations and co-text (interpreted by the corpus analyst).
We can determine translational rather than formal equivalence, based on parallel corpora.

Parallel corpora
Translated texts.
Translation focus: a single book plus one or more translations.
Language focus: large corpora (e.g., European Parliament output).

Contrastive studies
Contrastive analysis: 1960s and 1970s.
Nickel (1971) refers to the problem of equivalence: while “formal equivalence can be established relatively easily”, it is difficult to identify “functional-semantic equivalence.”

Contrastive studies
Use corpora to identify functional-semantic equivalence.
Thus, while passives may be formally equivalent in two languages, there may be little overlap in terms of usage.
Exploit large text corpora to pursue a corpus-based or usage-based approach to contrastive studies (Gellerstam 1996; Aijmer, Altenberg and Johansson 1996).

Parallel corpora: language focus
A parallel corpus gives a summation of many individual decisions about what is equivalent.
Each translator considers all the particular factors associated with any individual translation and makes a best-fit estimate.

Contrastive studies
Perennial problem of what to contrast: what are equivalent words and structures in two languages?
Formal equivalence.
Functional/pragmatic equivalence.
Translation equivalence.
Focus on translational equivalence.

Contrastive studies
Relying on translational equivalence.
The translator is translating texts, not words or phrases, so the matching is more approximate than we would like for our contrastive purposes.
Direction of translation is important.

Use parallel (translated) corpora
Access equivalence using parallel corpora.
Look for congruence and non-congruence for particular language features, e.g., passive, prepositions, spatial adjectives.

Assessing congruence
Search for a preposition in L1 and assess the uniformity of the equivalents in L2.
In addition, assess backwards congruence.
Distinguish formal/token congruence from meaning/functional congruence.
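
As a rough illustration of forward and backward congruence, here is a minimal Python sketch. It is not ParaConc's method: the sentence pairs, the candidate equivalents, and the function names `equivalents` and `uniformity` are all invented for this example, and the measure only captures formal/token congruence, not meaning/functional congruence.

```python
import re
from collections import Counter

# Toy aligned English-French segments (invented for illustration).
PAIRS = [
    ("The book is on the table.", "Le livre est sur la table."),
    ("She lives on the second floor.", "Elle habite au deuxième étage."),
    ("He put the keys on the shelf.", "Il a mis les clés sur l'étagère."),
    ("We met on Monday.", "Nous nous sommes rencontrés lundi."),
]

def tokens(text):
    """Lowercase word tokens; an apostrophe splits a word in two."""
    return re.findall(r"\w+", text.lower())

def equivalents(pairs, word, candidates):
    """Count which candidate equivalents occur opposite each hit for `word`."""
    counts = Counter()
    for src, tgt in pairs:
        if word in tokens(src):
            tgt_tokens = tokens(tgt)
            found = [c for c in candidates if c in tgt_tokens]
            counts.update(found or ["(none of the candidates)"])
    return counts

def uniformity(counts):
    """Share of all counted equivalents taken by the most frequent one."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0

# Forward: English "on" against two hypothetical French candidates.
forward = equivalents(PAIRS, "on", ["sur", "au"])
# Backward: swap the pair order and start from French "sur".
backward = equivalents([(t, s) for s, t in PAIRS], "sur", ["on", "over"])

print(forward, uniformity(forward))
print(backward, uniformity(backward))
```

In this toy data "on" is rendered in several ways while "sur" always appears opposite "on", which is the kind of asymmetry a backward check is meant to expose. The analyst still has to read the hits to judge functional congruence.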

Practical session
Simple searches.
Locating equivalents.

Assessing congruence
For a particular corpus, search for and count the number of instances of word1.
Find the most usual translation, trans1.
(i) Perform a parallel search for word1 - trans1 and examine the usage and collocations for word1 (= which uses of word1 translate).
(ii) Perform a parallel search for word1 - NOT trans1 and examine the usage of word1 (= which uses of word1 don’t translate).
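
The two parallel searches in steps (i) and (ii) can be approximated outside ParaConc with a few lines of Python. This is a sketch under stated assumptions: the corpus is already aligned into (source, target) segment pairs, the example sentences are invented, and `has_word` and `split_hits` are hypothetical helper names. Matching is on exact word forms, so inflected forms such as "heads" would need their own search.

```python
import re

def has_word(word, text):
    """True if `word` occurs as a whole word in `text` (case-insensitive)."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None

def split_hits(pairs, word1, trans1):
    """Split the hits for word1 by whether trans1 occurs in the aligned segment."""
    hits = [(s, t) for s, t in pairs if has_word(word1, s)]
    matched = [(s, t) for s, t in hits if has_word(trans1, t)]
    unmatched = [(s, t) for s, t in hits if not has_word(trans1, t)]
    return hits, matched, unmatched

# Toy aligned English-French segments (invented for illustration).
PAIRS = [
    ("She shook her head.", "Elle secoua la tête."),
    ("He is the head of the department.", "Il est le chef du département."),
    ("My head hurts.", "J'ai mal à la tête."),
    ("The door was open.", "La porte était ouverte."),
]

hits, matched, unmatched = split_hits(PAIRS, "head", "tête")
print(len(hits), "hits;", len(matched), "with trans1;", len(unmatched), "without")
for src, tgt in unmatched:
    print("NOT trans1:", src, "|", tgt)
```

The `unmatched` list corresponds to the word1 - NOT trans1 search: these are the uses worth reading closely, since the presence of trans1 in a segment does not by itself prove it translates word1.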

Parallel search

Issues
The translator is translating texts (typically sentences) rather than words, collocations or constructions.
Consequently, we need large corpora to find examples of equivalences.
Monolingual corpus investigations typically supplement the translation corpus findings.

Issues
Tools such as ParaConc provide a window on translation data.
Good software design makes the tool invisible, but the tool highlights some views of the data and obscures others.
ParaConc is a word(s) window.
Alternatives: sentence (information structure), paragraph, or cohesion windows.

Software tools
Computer software brings out patterns in language data.
Tools also hide and obscure data.
“If you only have a hammer, every problem looks like a nail.”
Or: “Using a hammer, everything becomes a nail.”

Searching and counting
You formulate the search based on what is in the corpus: words, tags etc.
Software does the counting.

Corpus insights
Frequency counts.
Genre/text type affects the form of language.
The notion of lexico-grammar.

Cognitive insights
Polysemy.
Metaphors and blended spaces: metaphors are part of ordinary language. Do they translate?
Construal.
Categorisation.

Alignment
Texts need to be aligned at roughly the sentence level.
Alignment is difficult and restricts the availability of parallel texts for analysis.
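
To give a sense of why sentence alignment is hard, here is a deliberately simplified, length-based dynamic-programming sketch in Python. It is not how ParaConc aligns texts. The cost function (absolute difference in character length, plus a fixed `skip_penalty` for unmatched sentences) is an assumption chosen for brevity rather than a proper probabilistic model, and the example sentences are invented.

```python
def align(src, tgt, skip_penalty=20):
    """Align two sentence lists using 1-1, 1-0, 0-1, 2-1 and 1-2 groupings.

    The cost of a grouping is the absolute difference in character length
    between its two sides; leaving a sentence unmatched costs extra.
    """
    inf = float("inf")
    n, m = len(src), len(tgt)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                step = abs(s_len - t_len)
                if di == 0 or dj == 0:
                    step += skip_penalty  # discourage unmatched sentences
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk back from the end to recover the chosen groupings.
    result = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        result.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return result[::-1]

# Toy example: the translator merged the first two source sentences.
src = ["He came in.", "He sat down.", "Nobody spoke for a long time."]
tgt = ["Il entra et s'assit.", "Personne ne parla longtemps."]
result = align(src, tgt)
for s, t in result:
    print(s, "<->", t)
```

Length alone is a weak signal: short texts, omissions, and reordered passages can all mislead it, which is one reason aligned parallel texts are in limited supply and usually need manual checking.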

Loading texts

Thank you