Published by Shannon Andrews. Modified over 6 years ago.
1
Introduction to Computational Methods for Classical Philology
David Bamman The Perseus Project, Tufts University
3
Homer Multitext
39-megapixel scans of the 10th-century Marcianus Graecus Z. 454 (= 822) manuscript of the Iliad, publicly released under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License by the Biblioteca Nazionale Marciana and the Center for Hellenic Studies.
4
Physical Access
Perseus Digital Library (http://www.perseus.tufts.edu)
Latin Library
LASLA
Index Thomisticus
Documenta Catholica Omnia
TLG
Brepols corpora [BTL etc.]
Google Books
Internet Archive
5
Perseus Digital Library
6
“Open” Access: XML
7
Philologic (Chicago)
8
Archimedes (Harvard)
9
Diogenes (Durham)
10
Hestia (Open University)
11
Open Source Perseus http://www.perseus.tufts.edu/hopper/opensource
4.5 million words of Classical Latin
4.9 million words of Ancient Greek
TEI-compliant XML
12
Internet Archive www.archive.org
27,000+ works in Latin; 1 billion words.
13
Intellectual Access
Large-scale linguistic analysis: tracking language change in 2000 years of Latin.
Downstream computational tasks: automatically creating dynamic bilingual dictionaries; discovering textual allusions.
14
Tracking Language Change
Lexical change (new vocabulary, shifts in the meanings of words)
Syntactic change (including the influence of the author’s first language on Latin syntax)
Topical change (the rise of new genres)
Identifying the flow of information, e.g., Cicero and Augustine influencing Petrarch; Petrarch influencing Leonardo Bruni.
15
6,385 Latin works in the Internet Archive, charted by date of publication.
16
6,385 Latin works in the Internet Archive, charted by date of composition.
17
“America” (1,006)
18
“de” (2,955,462)
An interesting pattern emerges when we start looking at linguistic features over these two thousand years. This chart shows the changing frequency of the preposition “de” from the Classical Latin period up to the 19th century. There is a clearly visible rise: Latin authors use “de” much more frequently as time goes on. But it’s not just “de.”
19
“ad” (3,655,191)
20
“in” (8,126,487)
21
“et” (9,317,773)
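The frequency charts above can be approximated with a short script. This is a minimal sketch, not the method used for the slides: the slides do not say how counts were normalized, so the per-1,000-token rate, the 100-year bins, the function name `relative_frequency_by_period`, and the toy texts are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def relative_frequency_by_period(works, target, bin_size=100):
    """Return {period_start: occurrences of `target` per 1,000 tokens}.

    `works` is an iterable of (year, text) pairs; BCE years are negative.
    Normalizing by token count matters because the amount of surviving
    text differs greatly from one period to the next.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in works:
        tokens = re.findall(r"[a-z]+", text.lower())
        period = (year // bin_size) * bin_size  # floor to the start of the bin
        hits[period] += tokens.count(target)
        totals[period] += len(tokens)
    return {p: 1000 * hits[p] / totals[p] for p in sorted(totals) if totals[p]}


# Toy input with illustrative dates; a real run would use dated full texts.
works = [
    (-50, "gallia est omnis divisa in partes tres"),
    (1250, "de ente et essentia de natura rerum"),
]
freqs = relative_frequency_by_period(works, "de")
for period, rate in freqs.items():
    print(period, round(rate, 1))
```

Plotting `freqs` by period for "de", "ad", "in", and "et" would give curves comparable in kind to the ones on these slides.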
22
Vocabulary density in Latin authors from 200 BCE to 1900 CE (Type-Token Ratio)
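Type-token ratio is the number of distinct word forms (types) divided by the number of running words (tokens). The sketch below is an illustration under stated assumptions: the tokenizer is a simple letter-sequence regex, and the optional `window` parameter is a hypothetical addition, included because the ratio falls as texts get longer, so authors are best compared on samples of equal size.

```python
import re


def type_token_ratio(text, window=None):
    """Distinct word forms divided by running words.

    If `window` is given, only the first `window` tokens are used so that
    texts of different lengths can be compared on equal-sized samples.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if window is not None:
        tokens = tokens[:window]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Four tokens, three distinct forms ("arma" repeats).
print(type_token_ratio("arma virumque cano arma"))
```

Note that this counts inflected forms, not lemmas; for a highly inflected language like Latin, lemmatizing first would give a different (lower) ratio.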
23
Intellectual Access
Large-scale linguistic analysis: tracking language change in 2000 years of Latin.
Computational tasks to extract information from texts: automatically creating dynamic bilingual dictionaries; discovering textual allusions.
24
Use #1: Automatically Building Bilingual Dictionaries
Based on parallel text analysis: aligning source texts (here, in Greek and Latin) to translations (English, Spanish, etc.) Driven mainly by statistical machine translation for modern languages.
25
Parallel Text Data
The Internet Archive alone contains editions of Horace’s Odes in eight different languages.
Latin: carpe diem quam minimum credula postero (Horace, Ode 1.11)
English: Seize the present; trust tomorrow e’en as little as you may (Conington 1872)
French: Cueille le jour, et ne crois pas au lendemain (De Lisle 1887)
Early Modern French: Jouissez donc en repos du jour present, & ne vous attendez point au lendemain (Dacier 1681)
Italian: tu l’oggi goditi: e gli stolti al domani s’affidino (Chiarini 1916)
Spanish: Coge este dia, dando muy poco credito al siguiente (Campos and Minguez 1783)
Portuguese: colhe o dia, do de amanhã mui pouco confiando (Duriense 1807)
German: Pflücke des Tag’s Blüten, und nie traue dem morgenden (Schmidt 1820)
26
Sense Discovery
SMT based on Brown et al. (1990)
Different senses of a word in one language are translated by different words in another.
“Bank” (English)
financial institution = French “banque”
side of a river = French “rive” (e.g., la rive gauche)
27
Progressive Alignment
Sentence level: Moore’s Bilingual Sentence Aligner (Moore 2002) aligns sentences that are 1-1 translations of each other with high precision (98.5% on a corpus of 10K English-Hindi sentences).
Word level: MGIZA++ (Gao and Vogel 2008), a parallel version of GIZA++ (Och and Ney 2003), which implements IBM Models 1-5.
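To give a feel for what word-level alignment estimates, here is a minimal sketch of IBM Model 1, the simplest of the five models, trained with expectation-maximization. It is a teaching illustration and not GIZA++ or MGIZA++ code: it omits the NULL source word, and the function name `train_ibm_model1` and the two toy sentence pairs are assumptions made for the example.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate translation probabilities t[(target_word, source_word)].

    `pairs` is a list of (source_tokens, target_tokens) sentence pairs that
    are already aligned at the sentence level. No NULL word is modeled.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of each source word
        for src, tgt in pairs:
            for e in tgt:
                # E-step: share one unit of credit for e among the source words.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


# Toy parallel data (Latin source, simplified English glosses).
pairs = [
    (["arma", "cano"], ["arms", "sing"]),
    (["arma"], ["arms"]),
]
t = train_ibm_model1(pairs)
print(round(t[("arms", "arma")], 3), round(t[("sing", "cano")], 3))
```

Because "arma" also appears alone with "arms", the model learns to pair those two, which in turn pushes "cano" toward "sing". The same effect over a large corpus is what allows different senses of a word to surface as different translation equivalents.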
28
Multilingual Alignment
Word-level alignment of Homer’s Odyssey
29
Interlinear translations
30
Interlinear translations
31
Latin/Greek English Senses
32
English Greek/Latin Senses
33
Automatic Bilingual Dictionaries
34
Use #2: Allusion detection
Given a large collection of texts, we can apply computational techniques to look at all pairs of sentences in a collection and determine which are most similar (however we define similarity).
“Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation ...” (Martin Luther King, Jr., 1963)
“Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal” (Abraham Lincoln, 1863)
35
Classical allusion
Arma virumque cano (Vergil, Aeneid 1.1)
(Arms and the man I sing)
μῆνιν ἄειδε θεὰ (Homer, Iliad 1.1)
(Rage sing, goddess)
ἄνδρα μοι ἔννεπε, μοῦσα (Homer, Odyssey 1.1)
(Man me tell, Muse)
Of man’s first disobedience, and the fruit
Of that forbidden tree, whose mortal taste
Brought death into the world, and all our woe,
With loss of Eden, till one greater Man
Restore us, and regain the blissful seat,
Sing, heavenly Muse (Milton, Paradise Lost 1.1-6)
36
Allusion in Latin poetry
Arma virumque cano … (Vergil, Aen. 1.1)
(“I sing of arms and man”)
Arma gravi numero violentaque bella parabam
Edere … (Ovid, Amores 1.1-2)
(“I was planning to write about arms and violent wars in a heavy meter”)
First, we need to identify the variables to look for: what defines similarity?
37
#1: Identical words
Arma gravi numero violentaque bella parabam
Edere … (Ovid, Amores 1.1-2)
(“I was planning to write about arms and violent wars in a heavy meter”)
Arma virumque cano … (Vergil, Aen. 1.1)
(“I sing of arms and man”)
38
#2: Word order
Arma gravi numero violentaque bella parabam
Edere … (Ovid, Amores 1.1-2)
(“I was planning to write about arms and violent wars in a heavy meter”)
Arma virumque cano … (Vergil, Aen. 1.1)
(“I sing of arms and man”)
39
#3: Syntax
Arma -que bella edere (Ovid)
Arma virumque cano (Vergil)
40
#4: Meter/phonetic similarity
Ārmă grăvī nŭmĕrō || …
Ārmă vĭrūmqŭe cănō || …
41
#5: Semantic similarity
Arma gravi numero violentaque bella parabam
Edere … (Ovid, Amores 1.1-2)
(“I was planning to write about arms and violent wars in a heavy meter”)
Arma virumque cano … (Vergil, Aen. 1.1)
(“I sing of arms and man”)
Both are about war (violenta bella) and the instruments of war (arma).
42
Translate traditional variables into computational terms
Identical words = token similarity
Word order = n-gram similarity
Syntax = dependency tree similarity
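The first two mappings can be sketched in a few lines. This is an illustration under stated assumptions: Jaccard overlap is one common choice of similarity measure and is not named in the slides, and the helper names `ngrams` and `jaccard` are hypothetical. Dependency tree similarity needs parsed input and is not shown.

```python
def ngrams(tokens, n):
    """Set of contiguous n-token sequences in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


ovid = "arma gravi numero violentaque bella parabam edere".split()
vergil = "arma virumque cano".split()

# Token similarity: the two openings share exactly one word form, "arma".
token_sim = jaccard(ovid, vergil)

# N-gram similarity: no two-word sequence is shared, so the score is zero.
bigram_sim = jaccard(ngrams(ovid, 2), ngrams(vergil, 2))

print(token_sim, bigram_sim)
```

The example shows why a single variable is not enough: the shared sentence-initial "arma" registers as token overlap, but nothing else about this allusion is visible to word-form or word-order matching alone.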
43
Allusion Discovery
Test corpus of Latin poets from the Perseus Digital Library. Data syntactically parsed using McDonald et al.’s MSTParser (2005), trained on data from the Latin Dependency Treebank.

Author        Words      Sentences
Ovid          141,091    10,459
Vergil         97,495     6,553
Horace         35,136     2,345
Catullus       14,793       903
Propertius      4,867       366
Total         293,382    20,626
44
Discovery
nulli illum iuvenes, nullae tetigere puellae (Ov., Met )
“No youths, no girls touched him.”
idem cum tenui carptus defloruit ungui
nulli illum pueri, nullae optavere puellae (Cat., Carm. 62)
“This same one withered when plucked by a slender nail; no boys, no girls hope for it.”
45
Discovery

Variable                                              TF/IDF
nullae:puellae:ATR                                    9.24
nullae:puellae
nulli/illum p:SBJ_EXD_OBJ_CO:u:COORD:v ,/nullae       8.84
nullus1:puella1                                       8.55
...
nulli                                                 6.30
puellae                                               5.55
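The weights in this table reward features that are rare across the corpus. The slides do not give the exact formula, so the sketch below is an assumption: it uses a plain logarithmic inverse document frequency with each sentence treated as a document, and cosine similarity over the weighted vectors. The function names and the four-sentence toy corpus are hypothetical, and only word forms are used as features (the slide also uses word pairs, lemmas, and syntactic paths).

```python
import math
from collections import Counter


def idf_weights(sentences):
    """Inverse document frequency of each feature; one sentence = one document."""
    n = len(sentences)
    df = Counter(f for s in sentences for f in set(s))
    return {f: math.log(n / d) for f, d in df.items()}


def tfidf_cosine(a, b, idf):
    """Cosine similarity between the TF-IDF vectors of two feature lists."""
    va = {f: c * idf.get(f, 0.0) for f, c in Counter(a).items()}
    vb = {f: c * idf.get(f, 0.0) for f, c in Counter(b).items()}
    dot = sum(w * vb.get(f, 0.0) for f, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus built from lines quoted in this presentation.
s1 = "nulli illum iuvenes nullae tetigere puellae".split()
s2 = "nulli illum pueri nullae optavere puellae".split()
s3 = "arma virumque cano".split()
s4 = "arma gravi numero violentaque bella parabam".split()
idf = idf_weights([s1, s2, s3, s4])

print(round(tfidf_cosine(s1, s2, idf), 3))  # Ovid vs. Catullus
print(round(tfidf_cosine(s1, s3, idf), 3))  # no shared features
```

In a realistic corpus, very common words such as "et" would receive weights near zero, so matches on them contribute almost nothing to the ranking.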
46
Arma gravi numero ...
Arma gravi numero violentaque bella parabam Edere ... (Ov., Amores 1.1)
1. Arma procul currusque virum miratur inanes (.059) (Verg., Aen ): “At a distance he marvels at the arms and the shadowy chariots of men”
2. Quid tibi de turba narrem numeroque virorum (.042) (Ov., Ep ): “What could I tell you of the crowd and the number of men?”
11. Arma virumque cano, Troiae qui primus ab oris Italiam, fato profugus, Laviniaque venit litora, multum ille et terris iactatus et alto vi superum saevae memorem Iunonis ob iram (.025) (Aen. 1.1): “I sing of arms and the man ...”
47
Summary: elements of computational philology
48
Tomorrow II. Linguistic Annotation of Classical Texts
how traditional (non-computational) scholars in Classical Studies can get involved in digital philological projects.