Published by Emil Nicholson. Modified over 9 years ago.
1
21st Century Classics Gregory Crane Professor and Chair Department of Classics Adjunct Professor of Computer Science Winnick Family Chair of Technology and Entrepreneurship
2
Changing Face of Education
3
The Gauntlet
4
Science Majors The Classes have grown more challenging and demanding…
5
Science Majors The Classes have grown more challenging and demanding… Education has focused upon recognizable problems of various levels
6
Science Majors The Classes have grown more challenging and demanding… Education has focused upon recognizable problems of various levels Concrete methods for contribution at multiple levels
7
Laboratory Culture Freshman Term Paper 2009 Linguistic Patterns in Cambridge, MA – 1930 Atlas of the City – Interviews digitally recorded in 2009 Phonetic patterns analyzed with open source software – Study of linguistic change over 80 years Production of new knowledge
8
The world is already digital
9
2009 Tenure Track Job Candidates must have a strong record of scholarship and teaching and a commitment to expand the use of Greek and Latin within Tufts and beyond. Candidates should be comfortable teaching classes at any level in either language. The department seeks a candidate who can advance the study of classics within an interdisciplinary context. We especially welcome candidates who can support contributions to and original research by undergraduates as well as MA students within the field of Classics. Where are the digital humanities in this job description?
11
Changing scales of research
12
Depth Quality Scale Breadth
13
Quality
14
Depth
15
Machine actionable interpretation
16
Scale
17
How much Latin?
Classical Latin (200 BCE - 500 CE)
– PHI Disk c. 5 million (through c. 200 CE)
– Total corpus c. 50 million (probably less)
Current working collection
– 9,000 books from 27,000 “Latin” books dated
– 380 million words that really are Latin
Total corpus of Latin through 1800
– Billions of words
18
A Classic Lexicon Project
19
1894 -- work begins
20
A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context
21
A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status
22
A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work
23
A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work – C. 67% of the lexicon complete
24
A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work – C. 67% of the lexicon complete 2050? -- completion of the project
25
A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work – C. 67% of the lexicon complete 2050? -- completion of the project What do we do with a billion words?
26
A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work – C. 67% of the lexicon complete 2050? -- completion of the project What do we do with a billion words? … 10 billion words?
27
A Scalable Lexicon Project
28
Breadth
29
Time
30
Space
31
Digital Humanities balance two forces Absolute necessity to work with far more content than we can ever read and far more languages than we could ever learn.
32
Digital Humanities balance two forces Absolute necessity to work with far more content than we can ever read and far more languages than we could ever learn. The need to read slowly and to think about every word and phrase from every angle.
33
“Philological” Reading Philology is that venerable art which requires of those who honor her one thing above all: to turn aside, to take one's time, to become still and slow.... Precisely for this reason, she is more necessary today than ever, precisely on this account, she attracts and enchants us most powerfully, in an age of "work," which is to say, haste, the unseemly and sweating hurry that wants to be "done" with everything right away, even with every old and new book. She herself will not so easily be done with anything, she instructs reading well, that means, slowly, deeply, carefully, regardfully, looking forward and backward, with second thoughts, with doors left open, reading with delicate fingers and eyes.... F. Nietzsche, Morgenröte (1881)
34
One answer… The re-emergence of editing as a primary activity The definition of tasks that have tangible value in the real world and that begin to be accessible at an early stage
35
One answer… The re-emergence of editing as a primary activity The definition of tasks that have tangible value in the real world and that begin to be accessible at an early stage Example: the commented edition and translation as undergraduate thesis
36
Venetus A MS of Homer
37
Diplomatic Edition by a class
38
What are new elements in editing?
39
Annotations not predicated on error
40
Editions --> Visualization: Perseus Herodotus --> Hestia Proj.
41
Syntactic Analysis (Treebanks) Τρώων δ᾽ οἰώθη καὶ Ἀχαιῶν φύλοπις αἰνή (Homer, Il. 6.1)
42
Machine actionable interpretation
44
Iliad 6 Treebank by a class
45
Another answer…
46
Expository narrative Machine actionable annotations
47
21st Century Classics
48
Who represents Greece and Rome?
52
From Rabat to Kandahar
53
Who founded Kandahar and what was its original name?
54
Alexander the Great Alexandria
55
Who was the most important classicist of the 20th century?
56
Sometimes political philosophers do have an impact… Plato’s Republic and the Guardians The Islamic Republic of Iran and the Guardianship of Islamic Jurists
57
How would you go about studying the impact of Plato in Islamic thought?
60
Classics at the U of C
61
Particular emphasis on – The School of Alexandria and its influence – The Translation Movement from Greek into Syriac and Arabic – The Relations of the Ancient Arabs and the Greco-Roman World – The Translation of Arabic into Latin and its effect upon the literary Renaissance
62
Classicists
63
Hisham and Farouk at Furman
64
Where is the English?
66
Goals Learner corpora -- – how much have you mastered? – How much can you transfer to new material? – Customized assessment of corpus/competence User portfolios – Aggregation of increasingly sophisticated contributions Undergraduate research projects – Automatically linked to relevant texts, sites, objects
67
Goals for 2010/2011 Canonical Text Services Protocol middleware for DuraCloud Open Greek and Latin exams for students in the English speaking world based upon student defined corpora.
68
Thank you!
69
Categories of Development Transform existing research – Integrated Papyri, Homer Multitext Enable new areas of research – More people using papyrological data “Physical” access -- done Intellectual access -- can be addressed
70
Transforming Classics Enhancing what scholars can do
71
Transforming Classics Enhancing what scholars can do Lowering barriers to entry
72
Transforming Classics Enhancing what scholars can do Lowering barriers to entry Developing a global, multilingual, multiethnic intellectual community
74
Funded Projects Greek and Latin Treebanks (Cantus) Greco-Arabic (Mellon) Mining a Million Books (NSF) Digging into Data (NEH/JISC/SSHRC) Google Digital Humanities Hellespont: Arachne and Perseus -- DFG/NEH
75
What can you do?
76
Build up a portfolio of what Greek and/or Latin you have mastered
77
What can you do? Build up a portfolio of what Greek and/or Latin you have mastered – Ask for an evaluation of your knowledge of this corpus and of Greek and Latin
78
What can you do? Build up a portfolio of what Greek and/or Latin you have mastered – Ask for an evaluation of your knowledge of this corpus and of Greek and Latin Look for ways to make a tangible contribution
79
What can you do? Build up a portfolio of what Greek and/or Latin you have mastered – Ask for an evaluation of your knowledge of this corpus and of Greek and Latin Look for ways to make a tangible contribution – Treebank -- how many sentences?
80
What can you do? Build up a portfolio of what Greek and/or Latin you have mastered – Ask for an evaluation of your knowledge of this corpus and of Greek and Latin Look for ways to make a tangible contribution – Treebank -- how many sentences? – XML tagging? GIS analysis?
81
What can you do? Think about an MA thesis that is a publishable contribution.
82
What can you do? Think about an MA thesis that is a publishable contribution. – Publish an inscription, a medieval text, a canonical work
83
What can you do? Think about an MA thesis that is a publishable contribution. – Publish an inscription, a medieval text, a canonical work – Analyze some data about a word, a text, a site, a topic
84
What can you do? Think about an MA thesis that is a publishable contribution. – Publish an inscription, a medieval text, a canonical work – Analyze some data about a word, a text, a site, a topic – Do something!
85
Good luck!
86
Treebanks and Parallel Text Analysis David Bamman The Perseus Project
87
Parallel Text Analysis Driven in large part by statistical MT for modern languages (French/English, German/English, Arabic/English etc). Parliamentary proceedings (Canadian Hansards, Europarl, UN) Legal/government docs (JRC Acquis) Historical texts have often been translated many times into several different languages. Perseus: 4.9M Greek/6.8M English; 3.4M Latin/5M English.
88
Parallel Texts
The Internet Archive alone contains editions of Horace’s Odes in eight different languages
Latin: carpe diem quam minimum credula postero (Horace, Ode 1.11)
English: Seize the present; trust tomorrow e’en as little as you may (Conington 1872)
French: Cueille le jour, et ne crois pas au lendemain (De Lisle 1887)
Early Modern French: Jouissez donc en repos du jour present, & ne vous attendez point au lendemain (Dacier 1681)
Italian: tu l’oggi goditi: e gli stolti al domani s’affidino (Chiarini 1916)
Spanish: Coge este dia, dando muy poco credito al siguiente (Campos and Minguez 1783)
Portuguese: colhe o dia, do de amanhã mui pouco confiando (Duriense 1807)
German: Pflücke des Tag’s Blüten, und nie traue dem morgenden (Schmidt 1820)
89
Dynamic Lexicon http://nlp.perseus.tufts.edu/lexicon
90
Sense Discovery SMT based on Brown et al (1990) Different senses for a word in one language are translated by different words in another. “Bank” (English) – financial institution = French “banque” – side of a river = French “rive” (e.g., la rive gauche)
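The idea on this slide can be sketched in a few lines: once words are aligned across a parallel corpus, the distinct translations of a source word serve as a rough proxy for its senses. The aligned pairs, the counts, the function names, and the `min_share` threshold below are all invented for illustration; this is not the Dynamic Lexicon's actual code.

```python
from collections import Counter, defaultdict


def translation_profile(aligned_pairs):
    """Count how often each source word is aligned to each target word."""
    profile = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        profile[source_word][target_word] += 1
    return profile


def candidate_senses(profile, word, min_share=0.2):
    """Return translations that account for at least min_share of the
    alignments of `word`, most frequent first. Each surviving translation
    is treated as evidence for one candidate sense."""
    counts = profile[word]
    total = sum(counts.values())
    return [t for t, c in counts.most_common() if c / total >= min_share]


# Hypothetical English->French word alignments (counts are made up).
pairs = (
    [("bank", "banque")] * 6
    + [("bank", "rive")] * 3
    + [("bank", "bord")] * 1
    + [("river", "rivière")] * 2
)

profile = translation_profile(pairs)
senses = candidate_senses(profile, "bank")
print(senses)
```

The threshold is a crude way to drop rare, probably noisy alignments; a real system would need something more careful.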
91
Progressive Alignment Sentence level: Moore’s Bilingual Sentence Aligner (Moore 2002) – aligns sentences that are 1-1 translations of each other w/ high precision (98.5% on a corpus of 10K English-Hindi sentences) Word level: MGIZA++ (Gao and Vogel 2008) – parallel version of: GIZA++ (Och and Ney 2003) - implementation of IBM Models 1-5.
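To make the word-level step concrete, here is a minimal sketch of IBM Model 1, the simplest of the five IBM models that GIZA++ implements. It is a teaching version only: it omits the NULL word, runs on a toy corpus invented for the example, and is not how MGIZA++ itself is written. The function names are assumptions.

```python
from collections import defaultdict


def train_ibm_model1(sentence_pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM.

    sentence_pairs is a list of (source_tokens, target_tokens) for
    sentences already aligned 1-1 at the sentence level.
    """
    target_vocab = {w for _, tgt in sentence_pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # keyed by (target_word, source_word)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: distribute each target word over the source words.
        for src, tgt in sentence_pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate the translation probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def best_alignment(t, src, tgt):
    """For each target word, pick the most probable source word."""
    return [max(src, key=lambda e: t[(f, e)]) for f in tgt]


# Toy sentence-aligned corpus (English source, German target).
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]

t = train_ibm_model1(corpus)
print(best_alignment(t, ["the", "house"], ["das", "Haus"]))
```

Even on three sentences the co-occurrence statistics are enough to separate "das" from "Haus"; the higher IBM models add word order and fertility on top of this.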
92
Tufts cluster
40 nodes, each w/ two 2.83 GHz Quad-Core Xeon processors (= 320 cores)
Impact
– Two 1M word alignments (English->Greek, Greek->English) on single 2 GHz Mac Pro: 15 hours
– Two (simultaneous) 5M word alignments on computing cluster using multi-threaded version (i.e., on one 8-core node): 45 minutes.
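The figures on this slide can be checked with a little arithmetic. The sketch below compares the two runs as words aligned per hour, which assumes the workload is comparable per word; alignment cost need not be linear in corpus size, so treat the ratio as a rough indication only.

```python
# Core count stated on the slide.
nodes = 40
processors_per_node = 2
cores_per_processor = 4
total_cores = nodes * processors_per_node * cores_per_processor

# Two alignment directions in each run.
mac_pro_words = 2 * 1_000_000
mac_pro_hours = 15
node_words = 2 * 5_000_000
node_hours = 45 / 60  # one 8-core node, multi-threaded version

mac_pro_rate = mac_pro_words / mac_pro_hours  # words per hour
node_rate = node_words / node_hours           # words per hour
throughput_ratio = node_rate / mac_pro_rate

print(total_cores)
print(round(throughput_ratio))
```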
93
Multilingual Alignment Word-level alignment of Homer’s Odyssey
94
Latin/Greek English Senses
95
English Greek/Latin Senses
96
Use #1: Automatic Bilingual Dictionaries http://nlp.perseus.tufts.edu/lexicon
97
Use #2: Interlinear translations
99
Use #3: Bootstrapping Multilingual Digital Library http://www.perseus.tufts.edu
100
Multilingual Digital Libraries http://www.worldofdante.org
101
TEI XML Gallos ab Aquitanis Garumna flumen, a Belgis Matrona et Sequana dividit. Horum omnium fortissimi sunt … (“The Garonne river separates the Gauls from the Aquitani and the Marne and the Seine (rivers) separate them from the Belgae. The bravest of all of these are …”)
102
Solution: Markup Transfer
1. Alignment of the source document with the target document in a cascading process: document -> sentence -> word
2. Projection of XML tags in the source document to the target document in a way that exploits the linguistic similarity of the text pair.
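The projection step can be sketched as follows, assuming the word alignment has already been computed. The sentence is a shortened form of the Caesar example; the tag assignments, the hand-written alignment, and the function names are assumptions made for illustration, not the project's actual markup or code.

```python
def project_tags(source_tags, alignment, target_length):
    """Copy tags from source tokens to the target tokens aligned with them.

    source_tags: dict mapping source token index -> tag name
    alignment:   list of (source_index, target_index) pairs
    Returns one tag (or None) per target token.
    """
    projected = [None] * target_length
    for src_idx, tgt_idx in alignment:
        tag = source_tags.get(src_idx)
        # Keep the first tag that reaches a target token.
        if tag is not None and projected[tgt_idx] is None:
            projected[tgt_idx] = tag
    return projected


def render(tokens, tags):
    """Wrap each tagged token in an XML element of that name."""
    return " ".join(
        f"<{tag}>{tok}</{tag}>" if tag else tok
        for tok, tag in zip(tokens, tags)
    )


latin = ["Gallos", "ab", "Aquitanis", "Garumna", "flumen", "dividit"]
english = ["The", "Garonne", "river", "separates", "the",
           "Gauls", "from", "the", "Aquitani"]

# Hypothetical markup on the Latin source.
latin_tags = {0: "name", 2: "name", 3: "placeName"}

# Hypothetical word alignment (Latin index, English index).
alignment = [(0, 5), (1, 6), (2, 8), (3, 1), (4, 2), (5, 3)]

english_tags = project_tags(latin_tags, alignment, len(english))
tagged_english = render(english, english_tags)
print(tagged_english)
```

This version handles only single-token elements; markup that spans several words, or nested elements, would need the projection to work on spans rather than tokens.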
103
Bootstrapping a Multilingual DL Expands depth of translations in a collection to expand the reach of inquiry.
104
Treebanks Annotated corpora where the syntactic role and head of each word in a sentence is made explicit.
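A dependency treebank of this kind can be pictured as one record per word, holding the word's head and its syntactic role. The sketch below is illustrative only: the sentence is a made-up textbook example, and the field names and relation labels are assumptions rather than the treebank's actual file format.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int        # 1-based position in the sentence
    form: str      # the word as it appears in the text
    head: int      # id of the head token; 0 marks the root
    relation: str  # syntactic role relative to the head


# "puella rosam amat" ("the girl loves the rose"), invented example.
sentence = [
    Token(1, "puella", 3, "SBJ"),
    Token(2, "rosam", 3, "OBJ"),
    Token(3, "amat", 0, "PRED"),
]


def root(tokens):
    """Return the token whose head is 0."""
    return next(t for t in tokens if t.head == 0)


def dependents(tokens, head_id):
    """Return the forms of all tokens attached to head_id."""
    return [t.form for t in tokens if t.head == head_id]


print(root(sentence).form)
print(dependents(sentence, 3))
```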
105
Historical treebanks Most recent research and investment in treebanks has focused on modern languages, but treebanks for historical languages are now arising as well: – Middle English (Kroch and Taylor 2000) – Medieval Portuguese (Rocio et al. 2000) – Classical Chinese (Huang et al. 2002) – Old English (Taylor et al. 2003) – Early Modern English (Kroch et al. 2004) – Latin (Bamman and Crane 2006, Passarotti 2007) – Ugaritic (Zemánek 2007) – New Testament Greek, Latin, Gothic, Armenian, Church Slavonic (Haug and Jøhndal 2008)
106
Prague Arabic Dependency Treebank
107
Latin Dependency Treebank
Author        Words
Caesar        1,488
Cicero        6,229
Sallust       12,311
Vergil        2,613
Jerome        8,382
Ovid          4,789
Petronius     12,474
Propertius    4,857
Total         53,143
108
Ancient Greek Dependency Treebank
Work                      Words
Aeschylus (complete)      48,158
Hesiod, Works and Days    6,303
Homer, Iliad              38,390
Homer, Odyssey            99,353
Total                     192,204
109
Building Treebanks
Solicit annotations from two independent annotators; reconcile differences between them.
Background: ranges from advanced undergraduates to PhD and professors, with the majority being students in graduate programs in Classics.
Average speed: 124 words per hour.
Interannotator accuracy: attachment (ATT), label (LAB), labeled attachment (LABATT):
                  ATT      LAB      LABATT
Hesiod, W&D       85.1%    85.9%    79.5%
Homer, Iliad      87.1%    83.2%    79.3%
Homer, Odyssey    87.5%    85.7%    80.9%
Total             87.4%    85.3%    80.6%
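The slide does not say how the three figures were computed. One plausible reading, assumed here, is the share of tokens on which the two annotators agree: on the head (ATT), on the label (LAB), and on both at once (LABATT). The annotations below are invented.

```python
def agreement(annotator_a, annotator_b):
    """Compare two annotations of the same sentence.

    Each annotation is a list of (head, label) tuples, one per token.
    Returns the proportion of tokens with matching head (ATT),
    matching label (LAB), and matching head and label (LABATT).
    """
    if not annotator_a or len(annotator_a) != len(annotator_b):
        raise ValueError("annotations must be non-empty and equally long")
    n = len(annotator_a)
    pairs = list(zip(annotator_a, annotator_b))
    att = sum(a[0] == b[0] for a, b in pairs)
    lab = sum(a[1] == b[1] for a, b in pairs)
    labatt = sum(a == b for a, b in pairs)
    return {"ATT": att / n, "LAB": lab / n, "LABATT": labatt / n}


# Hypothetical annotations of one four-word sentence.
first = [(3, "SBJ"), (3, "OBJ"), (0, "PRED"), (3, "ADV")]
second = [(3, "SBJ"), (3, "ATR"), (0, "PRED"), (2, "ADV")]

scores = agreement(first, second)
print(scores)
```

LABATT can never exceed either ATT or LAB, which matches the pattern in the table above.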
110
Student Contributions...
111
Syntax in the Dynamic Lexicon
112
URLs
Treebank data: http://nlp.perseus.tufts.edu/syntax/treebank/
Treebank annotation environment: http://nlp.perseus.tufts.edu/hopper/
Translation information: http://nlp.perseus.tufts.edu/hopper/sense.jsp
Greek lexicon: http://nlp.perseus.tufts.edu/lexicon/