In pursuit of the ‘third code’: Using the ZJU Corpus of Translational Chinese (ZCTC) in Translation Studies
Richard Xiao, Lianzhen He, Ming Yue


07/09/2015, UCCTS Hangzhou

CBTS: A new paradigm
Laviosa (1998a)
– “the corpus-based approach is evolving, through theoretical elaboration and empirical realisation, into a coherent, composite and rich paradigm that addresses a variety of issues pertaining to theory, description, and the practice of translation.”
Hypotheses that translation universals can be tested by corpus data (Baker 1993, 1995)
Rapid development of corpus linguistics, esp. multilingual corpus research in the early 1990s
Increasing interest in Descriptive Translation Studies (Toury 1995)
Tymoczko (1998)
– “Corpus Translation Studies is central to the way that Translation Studies as a discipline will remain vital and move forward.”
Meta 43/4 (1998); Kenny (2001); Laviosa (2002); Granger et al (eds.) (2003); Olohan (2004); Mauranen et al (eds.) (2004); Kruger (ed.) (forthcoming)

TU: A focus of CBTS
An important area of corpus-based TS over the past decade
– Baker (1993, 1996); Chesterman (2004); Kenny (1998, 1999, 2000, 2001); Laviosa (1998b); Mauranen & Kujamäki (2004); McEnery & Xiao (2002, 2007); Olohan (2004); Olohan & Baker (2000); Øverås (1998); Pym (2005); Xiao and Yue (2008)
The Translational English Corpus (TEC)
– Manual
– Software

Features of translated English
Laviosa (1998b): Four core patterns of lexical use
– A relatively low proportion of lexical words over function words
– A relatively high proportion of high-frequency words over low-frequency words
– A relatively great repetition of the most frequent words
– Less variety in the most frequently used words

Features of translated English
Beyond the lexical level
– Simplification: simpler than native language lexically / syntactically / stylistically; “the tendency to simplify the language used in translation” (Baker 1996: )
– Normalization: more “normal” than the target native language; the “tendency to exaggerate features of the target language and to conform to its typical patterns” (Baker 1996: 183)
– Explicitation: more frequent use of conjunctions, increased cohesion in translated text; the tendency in translations to “spell things out rather than leave them implicit” (Baker 1996: 180)
– Sanitization: reduced connotational meaning; translated texts are “somewhat ‘sanitized’ versions of the original” (Kenny 1998: 515)

TU: A target of debate
Is translational language different from target native language?
– Translational language is at best an unrepresentative special variant of the target language because translations cannot possibly avoid the effect of translationese
– e.g. Baker 1993; Gellerstam 1996; Hartmann 1985; Laviosa 1997; McEnery & Wilson 2001; McEnery & Xiao 2002, 2007; Teubert 1996

TU: A target of debate
Are the features uncovered on the basis of translational English generalizable to other translated languages?
– Existing evidence has largely come from translational English and related European languages
– If such features are to be generalized as “translation universals”, the language pairs involved must not be restricted to English and closely related languages
– Cheong’s (2006) study of English-Korean translation contradicts even the explicitation hypothesis, the least controversial of these hypotheses
– Evidence from “genetically” distinct language pairs such as English and Chinese is undoubtedly more convincing, if not indispensable

The ZCTC corpus
Created with the explicit aim of studying the features of translated Chinese
A translational counterpart of the Lancaster Corpus of Mandarin Chinese (LCMC), a one-million-word balanced corpus of native Chinese (McEnery & Xiao 2004)
– Five hundred 2,000-word text samples taken proportionally from fifteen written text categories published in China in the 1990s

LCMC / ZCTC corpus design

ZCTC vs. LCMC

Corpus markup and annotation
CES-compliant XML
– CES: Corpus Encoding Standard
Tokenization and POS tagging
– ICTCLAS2008: A precision rate of 98.54% for tokenization
Paragraph, sentence, word token
Encoded in Unicode (UTF-8)
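A corpus marked up at paragraph, sentence and word level can be read with a few lines of code. The following is a minimal sketch only: the element names `p`, `s` and `w`, the `n` and `POS` attributes, and the sample sentence are assumptions made for illustration and are not taken from the ZCTC documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute names are assumed,
# and the sentence is invented for illustration.
SAMPLE = """<p>
  <s n="0001"><w POS="r">我</w><w POS="v">喜欢</w><w POS="n">翻译</w></s>
</p>"""

def read_sentences(xml_text):
    """Return one list of (word, POS tag) pairs per <s> element."""
    root = ET.fromstring(xml_text)
    return [
        [(w.text, w.get("POS")) for w in s.iter("w")]
        for s in root.iter("s")
    ]

sentences = read_sentences(SAMPLE)
```

Each sentence comes back as a list of word and tag pairs, which is the shape the measures discussed in the following slides need.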

Core patterns of lexical use
Do the core patterns of lexical use Laviosa (1998b) observes in translational English also apply in translated Chinese?
Same criteria and parameters as in Laviosa (1998b)
– Lexical density
– Frequency profiles
– Mean sentence length

Lexical density
The Stubbs-style lexical density: the ratio between the number of lexical words (i.e. content words) and the total number of words (Stubbs 1986: 33; 1996: 172)
– Measure of informational load
– Adopted in Laviosa (1998b)
Lexical density measured by type-token ratio (TTR) or Standardized TTR (Scott 2004)
– Measure of lexical variability
– Commonly used in Corpus Linguistics
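The two measures can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the set of POS tags counted as lexical words (`LEXICAL_TAGS`) is hypothetical, and the standardized TTR is taken here to be the mean TTR over consecutive full chunks of a fixed size, with any incomplete final chunk discarded.

```python
# Assumed tag set for lexical (content) words: noun, verb, adjective, adverb.
LEXICAL_TAGS = {"n", "v", "a", "d"}

def lexical_density(tagged_tokens):
    """Percentage of lexical words among all (word, tag) tokens."""
    if not tagged_tokens:
        return 0.0
    lexical = sum(1 for _, tag in tagged_tokens if tag in LEXICAL_TAGS)
    return 100.0 * lexical / len(tagged_tokens)

def standardized_ttr(tokens, chunk=1000):
    """Mean type-token ratio (%) over consecutive full chunks of `chunk` tokens."""
    ratios = []
    for start in range(0, len(tokens) - chunk + 1, chunk):
        window = tokens[start:start + chunk]
        ratios.append(100.0 * len(set(window)) / chunk)
    return sum(ratios) / len(ratios) if ratios else 0.0

# Invented toy data, far smaller than any real text sample.
toy_tagged = [("我", "r"), ("喜欢", "v"), ("翻译", "n"), ("的", "u")]
toy_density = lexical_density(toy_tagged)
toy_sttr = standardized_ttr(["a", "a", "b", "c", "x"], chunk=2)
```

Averaging over fixed-size chunks keeps the score comparable across texts of different lengths, which a raw TTR is not.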

Stubbs-style lexical density
Mean LD: LCMC (66.93%) vs. ZCTC (61.59%)
– The mean difference is statistically significant (t = -4.94, p<0.001)
All 15 genres have a greater lexical density in native than in translated Chinese
– Significant for all genres barring M (science fiction)
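The slide reports a t statistic but does not say which variant of the t test was used. As one possible reading, the sketch below computes Welch's unequal-variance t statistic and its degrees of freedom for two samples of per-text scores. The function name `welch_t` and the toy values are invented for illustration.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Invented toy scores, not corpus data.
t_stat, dof = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

A negative value simply means that the first sample has the lower mean; the p value would then be read from a t distribution with the computed degrees of freedom.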

Standardized TTR
LCMC as a whole has a slightly higher STTR than ZCTC (46.58 vs )
– Not significant
The differences in most genres are marginal
Depending on the genre, the greater STTR score is found in either native or translated Chinese

Lexical-function word ratio
The mean ratio between lexical and function words is significantly greater in native Chinese (2.08) than in translated Chinese (1.64) (t = -4.88, p<0.001)
Native Chinese has a greater ratio in all genres, and the differences are statistically significant for all genres barring M (science fiction)
In line with Laviosa’s (1998b) initial hypothesis that translational language has a relatively low proportion of lexical words over function words

Frequency profiles
Laviosa’s (1998b) ‘list head’ or ‘high frequency words’
– Wordlist items which individually account for at least 0.10% of the total tokens in a corpus
The same criterion for high frequency words is used in the present study to ensure comparability

Frequency profiles
The numbers of high frequency words are very similar in the two corpora
High frequency words account for a considerably greater proportion of tokens in the translational corpus
High frequency words display a much greater repetition rate in translational Chinese
The ratio between high- and low-frequency words is also greater in the translational corpus
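The frequency profile described above can be sketched in a few lines. The 0.10% cut-off is the criterion named on the slides; everything else is an assumption for illustration, in particular the reading of "repetition rate" as high-frequency tokens per high-frequency type, the function name `head_profile`, and the toy data.

```python
from collections import Counter

def head_profile(tokens, threshold_pct=0.10):
    """Profile of the 'list head': items each covering >= threshold_pct % of tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    total = len(tokens)
    # Compare count * 100 with threshold * total to avoid dividing first.
    head = {w: c for w, c in counts.items() if c * 100 >= threshold_pct * total}
    head_tokens = sum(head.values())
    rest_tokens = total - head_tokens
    return {
        "head_types": len(head),
        "head_coverage_pct": 100.0 * head_tokens / total,
        # Assumed definition: tokens per type within the list head.
        "repetition_rate": head_tokens / len(head) if head else 0.0,
        "high_low_ratio": head_tokens / rest_tokens if rest_tokens else float("inf"),
    }

# Invented toy data; a 20% threshold stands in for 0.10% on a ten-token "corpus".
toy = ["的"] * 5 + ["是"] * 3 + ["翻译", "语料"]
profile = head_profile(toy, threshold_pct=20)
```

Running the same function over a native and a translational corpus would yield the four figures the slide compares.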

Mean sentence length vs. simplification
Conflicting observations of mean sentence length as an indicator of simplification (e.g. Laviosa 1998b vs. Malmkjaer 1997)
In our corpora, native Chinese shows a slightly greater mean sentence length (t = , p = 0.17)
Mean sentence length appears to be sensitive to genre rather than a reliable indicator of native versus translational language

Lexical use in translational Chinese
Summary
– The core lexical features proposed by Laviosa (1998b) for translational English are essentially also applicable in translated Chinese
– But mean sentence length is less reliable as an indicator of simplification in translational Chinese

Connectives: Device for explicitation?
Perhaps the most studied topic in TU research and the least controversial hypothesis
Chen (2006)
– Connectives are a device for explicitation in English-Chinese translation of popular science books
Xiao and Yue (2008)
– Connectives are significantly more frequent in translated than native Chinese fiction
Question
– Can we generalize this finding from specific genres to Mandarin Chinese in general?

Conjunctions in ZCTC and LCMC
Mean frequency of conjunctions is significantly greater in ZCTC ( instances per 10,000 tokens) than in LCMC (243.23) (LL= for 1 d.f., p<0.001)
Genres of imaginative writing (K-P, R) generally demonstrate a significantly more frequent use of conjunctions in translational Chinese
In expository writing, conjunctions are considerably more frequent in translated Chinese in most genres (e.g. A, H), but there are also genres in which conjunctions are more common in native Chinese (e.g. F, J)
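The comparison above rests on two calculations: normalizing raw counts to a rate per 10,000 tokens, and a log-likelihood (LL) test on the raw counts. The sketch below assumes the common two-corpus form of the LL statistic, in which expected counts are derived from the pooled frequency; the slides do not spell out the formula, and the counts used here are invented. With 1 d.f., an LL value above 10.83 corresponds to p<0.001.

```python
import math

def per_10k(count, corpus_size):
    """Normalized frequency: instances per 10,000 tokens."""
    return 10000.0 * count / corpus_size

def log_likelihood(count_a, count_b, size_a, size_b):
    """Two-corpus log-likelihood statistic (1 d.f.) for one item."""
    pooled = (count_a + count_b) / (size_a + size_b)
    expected_a = size_a * pooled
    expected_b = size_b * pooled
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2.0 * ll

# Invented counts: 30 vs. 10 hits in two corpora of 1,000 tokens each.
ll_toy = log_likelihood(30, 10, 1000, 1000)
rate_toy = per_10k(243, 10000)
```

Normalizing makes corpora of different sizes comparable, while the LL test is computed on the raw counts and corpus sizes rather than on the normalized rates.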

Conjunctions of different frequency bands
More types of conjunctions of high frequency bands (0.10%, 0.05%, and 0.01%) are used in the translational corpus
There are an equal number of conjunctions (56 types) with a proportion greater than 0.005% in the translational and native corpora
After this balance point, the native corpus displays a greater number of less frequent conjunctions (a usage band of 0.001% and below)
The tendency to use conjunctions more frequently can be taken as a sign of explicitation
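Counting conjunction types per frequency band can be sketched as follows. The band values are those named on the slide; the function name `types_in_band`, the corpus size, and the counts are invented for illustration, and a band is read here as "at least this percentage of all corpus tokens".

```python
def types_in_band(conj_counts, corpus_size, band_pct):
    """Number of conjunction types that each cover >= band_pct % of corpus tokens."""
    return sum(
        1 for count in conj_counts.values()
        if count * 100 >= band_pct * corpus_size
    )

# Invented counts in a hypothetical 100,000-token corpus.
toy_counts = {"和": 150, "但是": 60, "因为": 12, "故": 2}
bands = (0.10, 0.05, 0.01, 0.001)
band_profile = {b: types_in_band(toy_counts, 100000, b) for b in bands}
```

Computing the same profile for the native and the translational corpus shows at which band the two counts cross over.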

Conjunctions of different styles
A closer comparison of the lists of frequent conjunctions (proportion of 0.001%) in each corpus also sheds some new light on the simplification hypothesis
– There are 91 and 99 types of frequent conjunctions in the two corpora
– 86 items overlap in the two lists
– Conjunctions on the ZCTC but not the LCMC list are all informal, colloquial, and simple, and usually have more formal alternatives
– Conjunctions on the LCMC but not the ZCTC list are typically formal and archaic
Evidence for the simplification hypothesis but against the normalization hypothesis

Conclusions
Laviosa’s (1998b) observations of the core patterns of lexical features of translational English are also applicable in translated Chinese
Beyond the lexical level
– Mean sentence length is sensitive to genres and may not be a reliable indicator of simplification
– A comparison of frequent connectives in native and translational Chinese appears to suggest that simpler forms tend to be used in translations
– In spite of some genre-based subtleties, translational Chinese uses conjunctions more frequently than native Chinese, which provides evidence in favour of the explicitation hypothesis

Conclusions
We believe that the newly created ZCTC will play a leading role in the study of translational Chinese by producing more empirical evidence
It is our hope that the study of translational Chinese will help to address the imbalance in the current state of translation universal research

Thank you!