TaLC Graz
The English Italian Translational Corpus: A resource for learning about translation
Federico Zanettin
Università di Bologna, SSLMIT – Forlì
CEXI Project
Corpus English X / Cross / Translational Italian
- Bi-lingual
- Parallel
- Bi-directional
- Translation-driven
Types of comparison
[Diagram: Italian and English; STs and TTs]
Aims
- CEXI as a resource for
  - Learning about (language, culture, translation)
  - Learning to (read, write, translate)
- Limitations
  - Funds (corpus size: 4M words)
  - Copyright
Design criteria
- Selection features
- Description features
Primary selection criteria
- Translations / translated texts
- Medium: books
- Time: contemporary
- Audience: adult
- Prose
- Country of publication
Secondary selection criteria
- Author
- Translator
- Publisher
- Price and availability
Descriptive features
- Size vs. variety
- Full texts vs. samples
- Domain: fiction vs. non-fiction
Fiction vs. non-fiction (Italy)
Corpus description
- Translation vs. non-translation
- Fiction vs. non-fiction
- English vs. Italian
Translations
                          E → I (Italy 76-95)    I → E (USA 77-96)
UDC category              Texts      %           Texts      %
Literature/Child. Lit     [...]      [...]       502        28%
Art/Games/Sports          757        6%          343        19%
Edu/Law/Soc. Sci          [...]      [...]       187        11%
Applied Science           1835       16%         138        8%
History/Geo./Biog.        919        8%          171        10%
Natural & Ex. Sci         643        6%          111        6%
Philosophy/Psychol.       833        7%          53         3%
Generalities/Info. Sci    101        1%          2          0%
Religion/Theology         477        4%          267        15%
Total                     [...]      [...]       [...]      [...]
Translation components, non-fiction
                  % from I.T.
UDC               English (USA)    Italian (Italy)
Rel/Theo          21%              7%
Art/G/S           27%              11%
Edu/Law/SocSci    15%              19%
AppSci            11%              28%
His/Geo/Bio       13%              13%
Nat & ExSci       9%               9%
Phi/Psy           4%               12%
Gen/InfoSci       0%               1%
Total             100%             100%
No. of texts (provisional): English, Italian
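The English (USA) column above appears to follow from the I → E counts on the preceding Translations slide once the literature category is dropped and the remaining categories are re-normalised. The sketch below checks that reading. The counts are as read from the flattened slide (the digit boundaries are an assumption), and the function name `nonfiction_shares` is purely illustrative, not something from the CEXI project.

```python
# I -> E translation counts (USA, 1977-96), as read from the Translations slide.
# Assumption: the digit boundaries in the flattened slide were split correctly.
i_to_e_counts = {
    "Literature/Child. Lit": 502,
    "Art/Games/Sports": 343,
    "Edu/Law/Soc. Sci": 187,
    "Applied Science": 138,
    "History/Geo./Biog.": 171,
    "Natural & Ex. Sci": 111,
    "Philosophy/Psychol.": 53,
    "Generalities/Info. Sci": 2,
    "Religion/Theology": 267,
}


def nonfiction_shares(counts, fiction_key="Literature/Child. Lit"):
    """Drop the fiction category and return rounded percentage shares."""
    nonfiction = {k: v for k, v in counts.items() if k != fiction_key}
    total = sum(nonfiction.values())
    return {k: round(100 * v / total) for k, v in nonfiction.items()}


shares = nonfiction_shares(i_to_e_counts)
for category, pct in shares.items():
    print(f"{category:<24}{pct:>3}%")
```

Under these assumptions the rounded shares reproduce the English (USA) column, for example 21% for Religion/Theology and 27% for Art/Games/Sports.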
Non-translations
               Italian (titles)                         English (titles)
               Translations (E → I)   Book production   Translations (I → E)   Book production
Fiction        40%                    27% (Italy)       28% (USA), 31% (UK)    22% (USA), 24% (UK)
Non-fiction    60%                    73% (Italy)       72% (USA), 69% (UK)    78% (USA), 76% (UK)
Non-fictional, non-translational components in the corpus vs. total book production
                  Non-translation     Production   Non-translation     Production   Production
                  Italian component   (Italy)      English component   (USA)        (UK)
Rel/Theo          21%                 8%           7%                  7%           12%
Art/G/S           27%                 16%          11%                 9%           27%
Edu/Law/SocSci    15%                 28%          19%                 29%          20%
AppSci            11%                 16%          28%                 23%          15%
His/Geo/Bio       13%                 15%          13%                 13%          13%
Nat & ExSci       9%                  5%           9%                  8%           4%
Phi/Psy           4%                  8%           12%                 5%           3%
Gen/InfoSci       0%                  4%           1%                  6%           6%
Total             100%                100%         100%                100%         100%
Corpus composition (non-fiction)
                No. of texts          Supp. texts           Total
                English   Italian     English   Italian
Rel/Theo        8         3                     5           8
Art/G/S         11        4                     7           11
Edu/Law/SS      6         8           2                     8
AppSci          4         11          7                     11
His/Geo/Bio     5         5                                 5
Nat & ExSci     4         4                                 4
Phi/Psy         2         5           3                     5
Gen/InfoSci     0         0                                 0
Total           40        40                                52
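The counts on this slide look consistent with a simple allocation rule: give each category its percentage share of 40 texts, rounded, then add supplementary texts to the smaller language so both reach the same total. That rule is an inference from the figures, not something the slides state. The sketch below tests it against the percentages on the "Translation components, non-fiction" slide, assuming the single 13% and 9% entries there apply to both languages. The names `BASE_TEXTS` and `allocate` are illustrative.

```python
BASE_TEXTS = 40  # texts per language before supplements (Total row of the slide)

# Non-fiction shares in percent: (English, Italian).
# Assumption: 13% and 9% appear once on the source slide and hold for both columns.
shares = {
    "Rel/Theo": (21, 7),
    "Art/G/S": (27, 11),
    "Edu/Law/SS": (15, 19),
    "AppSci": (11, 28),
    "His/Geo/Bio": (13, 13),
    "Nat & ExSci": (9, 9),
    "Phi/Psy": (4, 12),
    "Gen/InfoSci": (0, 1),
}


def allocate(shares, base=BASE_TEXTS):
    """Hypothesised rule: round each share of `base`, then top up the smaller side."""
    rows = {}
    for category, (en_pct, it_pct) in shares.items():
        en = round(base * en_pct / 100)
        it = round(base * it_pct / 100)
        total = max(en, it)
        rows[category] = {
            "en": en,
            "it": it,
            "supp_en": total - en,  # supplementary English texts
            "supp_it": total - it,  # supplementary Italian texts
            "total": total,
        }
    return rows


rows = allocate(shares)
for category, r in rows.items():
    print(f"{category:<12}{r['en']:>3}{r['it']:>4}{r['supp_en']:>4}{r['supp_it']:>4}{r['total']:>4}")
print("Total texts per component:", sum(r["total"] for r in rows.values()))
```

With these inputs the rule gives 40 base texts per language and 52 texts per component, which matches the Total row.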
Core corpus
Sub-category                             No. of text samples   No. of words per sample
Fiction                                  40                    12,500
Non-fiction                              52                    12,500
Total for one component                  92 texts              1,150,000 words
Total for core corpus (4 components)     368 texts             4,600,000 words
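The totals on this slide follow directly from the sample counts and the fixed sample size. A minimal sketch of the arithmetic, using only the figures given above (the variable names are illustrative):

```python
WORDS_PER_SAMPLE = 12_500
SAMPLES_PER_COMPONENT = {"fiction": 40, "non-fiction": 52}
COMPONENTS = 4  # number of components in the core corpus, as stated on the slide

texts_per_component = sum(SAMPLES_PER_COMPONENT.values())
words_per_component = texts_per_component * WORDS_PER_SAMPLE
core_texts = texts_per_component * COMPONENTS
core_words = words_per_component * COMPONENTS

print(f"One component: {texts_per_component} texts, {words_per_component:,} words")
print(f"Core corpus:   {core_texts} texts, {core_words:,} words")
```

The result, 4.6 million words, is somewhat above the 4M-word figure quoted under Limitations, which may simply be a rounded budget target.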
Expansions
- Full texts
- Corrections to corpus composition
- Satellite corpora