21st Century Classics Gregory Crane Professor and Chair Department of Classics Adjunct Professor of Computer Science Winnick Family Chair of Technology.

1 21st Century Classics Gregory Crane Professor and Chair Department of Classics Adjunct Professor of Computer Science Winnick Family Chair of Technology and Entrepreneurship

2 Changing Face of Education

3 The Gauntlet

4 Science Majors The Classes have grown more challenging and demanding…

5 Science Majors The Classes have grown more challenging and demanding… Education has focused upon recognizable problems of various levels

6 Science Majors The Classes have grown more challenging and demanding… Education has focused upon recognizable problems of various levels Concrete methods for contribution at multiple levels

7 Laboratory Culture Freshman Term Paper 2009 Linguistic Patterns in Cambridge, MA – 1930 Atlas of the City – Interviews digitally recorded in 2009 Phonetic patterns analyzed with open source software – Study of linguistic change over 80 years Production of new knowledge

8 The world is already digital

9 2009 Tenure Track Job Candidates must have a strong record of scholarship and teaching and a commitment to expand the use of Greek and Latin within Tufts and beyond. Candidates should be comfortable teaching classes at any level in either language. The department seeks a candidate who can advance the study of classics within an interdisciplinary context. We especially welcome candidates who can support contributions to and original research by undergraduates as well as MA students within the field of Classics. Where are the digital humanities in this job description?

11 Changing scales of research

12 Depth Quality Scale Breadth

13 Quality

14 Depth

15 Machine actionable interpretation

16 Scale

17 How much Latin? Classical Latin (200 BCE - 500 CE) – PHI Disk c. 5 million (through c. 200 CE) – Total corpus c. 50 million (probably less) Current working collection – 9,000 books from 27,000 “Latin” books dated – 380 million words that really are Latin Total corpus of Latin through 1800 – Billions of words

18 A Classic Lexicon Project

19 1894 -- work begins

20 A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context

21 A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status

22 A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work

23 A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work – C. 67% of the lexicon complete

24 A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work – C. 67% of the lexicon complete 2050? -- completion of the project

25 A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work – C. 67% of the lexicon complete 2050? -- completion of the project What do we do with a billion words?

26 A Classic Lexicon Project 1894 -- work begins – 10 million slips with keyword in context 2010 -- current status – 20 FTE at work – C. 67% of the lexicon complete 2050? -- completion of the project What do we do with a billion words? … 10 billion words?

27 A Scalable Lexicon Project

28 Breadth

29 Time

30 Space

31 Digital Humanities balance two forces Absolute necessity to work with far more content than we can ever read and far more languages than we could ever learn.

32 Digital Humanities balance two forces Absolute necessity to work with far more content than we can ever read and far more languages than we could ever learn. The need to read slowly and to think about every word and phrase from every angle.

33 “Philological” Reading Philology is that venerable art which requires of those who honor her one thing above all: to turn aside, to take one's time, to become still and slow.... Precisely for this reason, she is more necessary today than ever, precisely on this account, she attracts and enchants us most powerfully, in an age of "work," which is to say, haste, the unseemly and sweating hurry that wants to be "done" with everything right away, even with every old and new book. She herself will not so easily be done with anything, she instructs reading well, that means, slowly, deeply, carefully, regardfully, looking forward and backward, with second thoughts, with doors left open, reading with delicate fingers and eyes.... F. Nietzsche, Morgenröte (1881)

34 One answer… The re-emergence of editing as a primary activity The definition of tasks that have tangible value in the real world and that begin to be accessible at an early stage

35 One answer… The re-emergence of editing as a primary activity The definition of tasks that have tangible value in the real world and that begin to be accessible at an early stage Example: the commented edition and translation as undergraduate thesis

36 Venetus A MS of Homer

37 Diplomatic Edition by a class

38 What are new elements in editing?

39 Annotations not predicated on error

40 Editions --> Visualization: Perseus Herodotus --> Hestia Proj.

41 Syntactic Analysis (Treebanks) Τρώων δ᾽ οἰώθη καὶ Ἀχαιῶν φύλοπις αἰνή (Homer, Il. 6.1)

42 Machine actionable interpretation

43

44 Iliad 6 Treebank by a class

45 Another answer…

46 Expository narrative Machine actionable annotations

47 21st Century Classics

48 Who represents Greece and Rome?

49

50

51

52 From Rabat to Kandahar

53 Who founded Kandahar and what was its original name?

54 Alexander the Great Alexandria

55 Who was the most important classicist of the 20th century?

56 Sometimes political philosophers do have an impact.. Plato’s Republic and the Guardians The Islamic Republic of Iran and the Guardianship of Islamic Jurists

57 How would you go about studying the impact of Plato in Islamic thought?

58

59

60 Classics at the U of C

61 Particular emphasis on – The School of Alexandria and its influence – The Translation Movement from Greek into Syriac and Arabic – The Relations of the Ancient Arabs and the Greco-Roman World – The Translation of Arabic into Latin and its effect upon the literary Renaissance

62 Classicists

63 Hisham and Farouk at Furman

64 Where is the English?

65

66 Goals Learner corpora -- – how much have you mastered? – How much can you transfer to new material? – Customized assessment of corpus/competence User portfolios – Aggregation of increasingly sophisticated contributions Undergraduate research projects – Automatically linked to relevant texts, sites, objects

67 Goals for 2010/2011 Canonical Text Services Protocol middleware for DuraCloud Open Greek and Latin exams for students in the English speaking world based upon student defined corpora.

68 Thank you!

69 Categories of Development Transform existing research – Integrated Papyri, Homer Multitext Enable new areas of research – More people using papyrological data “Physical” access -- done Intellectual access -- can be addressed

70 Transforming Classics Enhancing what scholars can do

71 Transforming Classics Enhancing what scholars can do Lowering barriers to entry

72 Transforming Classics Enhancing what scholars can do Lowering barriers to entry Developing a global, multilingual, multiethnic intellectual community

74 Funded Projects Greek and Latin Treebanks (Cantus) Greco-Arabic (Mellon) Mining a Million Books (NSF) Digging into Data (NEH/JISC/SSHRC) Google Digital Humanities Hellespont: Arachne and Perseus -- DFG/NEH

75 What can you do?

76 Build up a portfolio of what Greek and/or Latin you have mastered

77 What can you do? Build up a portfolio of what Greek and/or Latin you have mastered – Ask for an evaluation of your knowledge of this corpus and of Greek and Latin

78 What can you do? Build up a portfolio of what Greek and/or Latin you have mastered – Ask for an evaluation of your knowledge of this corpus and of Greek and Latin Look for ways to make a tangible contribution

79 What can you do? Build up a portfolio of what Greek and/or Latin you have mastered – Ask for an evaluation of your knowledge of this corpus and of Greek and Latin Look for ways to make a tangible contribution – Treebank -- how many sentences?

80 What can you do? Build up a portfolio of what Greek and/or Latin you have mastered – Ask for an evaluation of your knowledge of this corpus and of Greek and Latin Look for ways to make a tangible contribution – Treebank -- how many sentences? – XML tagging? GIS analysis?

81 What can you do? Think about an MA thesis that is a publishable contribution.

82 What can you do? Think about an MA thesis that is a publishable contribution. – Publish an inscription, a medieval text, a canonical work

83 What can you do? Think about an MA thesis that is a publishable contribution. – Publish an inscription, a medieval text, a canonical work – Analyze some data about a word, a text, a site, a topic

84 What can you do? Think about an MA thesis that is a publishable contribution. – Publish an inscription, a medieval text, a canonical work – Analyze some data about a word, a text, a site, a topic – Do something!

85 Good luck!

86 Treebanks and Parallel Text Analysis David Bamman The Perseus Project

87 Parallel Text Analysis Driven in large part by statistical MT for modern languages (French/English, German/English, Arabic/English etc). Parliamentary proceedings (Canadian Hansards, Europarl, UN) Legal/government docs (JRC Acquis) Historical texts have often been translated many times into several different languages. Perseus: 4.9M Greek/6.8M English; 3.4M Latin/5M English.

88 Parallel Texts The Internet Archive alone contains editions of Horace’s Odes in eight different languages
Latin: carpe diem quam minimum credula postero (Horace, Ode 1.11)
English: Seize the present; trust tomorrow e’en as little as you may (Conington 1872)
French: Cueille le jour, et ne crois pas au lendemain (De Lisle 1887)
Early Modern French: Jouissez donc en repos du jour present, & ne vous attendez point au lendemain (Dacier 1681)
Italian: tu l’oggi goditi: e gli stolti al domani s’affidino (Chiarini 1916)
Spanish: Coge este dia, dando muy poco credito al siguiente (Campos and Minguez 1783)
Portuguese: colhe o dia, do de amanhã mui pouco confiando (Duriense 1807)
German: Pflücke des Tag’s Blüten, und nie traue dem morgenden (Schmidt 1820)

89 Dynamic Lexicon http://nlp.perseus.tufts.edu/lexicon

90 Sense Discovery SMT based on Brown et al. (1990) Different senses for a word in one language are translated by different words in another. “Bank” (English) – financial institution = French “banque” – side of a river = French “rive” (e.g., la rive gauche)
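The idea on this slide can be sketched in a few lines: if word-level alignments are already available, the distinct target words that a source word aligns to are candidate senses. This is only a minimal illustration; the function name `translation_profile` and the aligned pairs are invented for the example and are not data from the Perseus lexicon.

```python
from collections import Counter, defaultdict


def translation_profile(aligned_pairs):
    """Count, for each source word, how often it aligns to each target word."""
    profile = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        profile[source_word][target_word] += 1
    return profile


# Hypothetical English -> French word alignments, invented for illustration.
pairs = [
    ("bank", "banque"), ("bank", "banque"), ("bank", "rive"),
    ("river", "fleuve"), ("bank", "banque"), ("bank", "rive"),
]

profile = translation_profile(pairs)
# Each distinct translation is treated as one candidate sense, most frequent first.
candidate_senses = [word for word, _ in profile["bank"].most_common()]
print(candidate_senses)
```

Counting translations is the simplest possible proxy for senses; a real system would also need to filter alignment noise and merge synonymous translations.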

91 Progressive Alignment Sentence level: Moore’s Bilingual Sentence Aligner (Moore 2002) – aligns sentences that are 1-1 translations of each other w/ high precision (98.5% on a corpus of 10K English-Hindi sentences) Word level: MGIZA++ (Gao and Vogel 2008) – parallel version of: GIZA++ (Och and Ney 2003) - implementation of IBM Models 1-5.
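To give a feel for the word-level step, here is a toy version of IBM Model 1, the simplest of the five models the slide mentions. It is a sketch only, not GIZA++ or MGIZA++ code: it omits the NULL word and everything in Models 2-5, and the three-sentence Latin/English corpus and the function names are invented for the example.

```python
from collections import defaultdict


def ibm_model1(corpus, iterations=10):
    """Estimate t(target | source) by EM, as in IBM Model 1 (no NULL word)."""
    target_vocab = {tw for _, tgt in corpus for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # keyed by (target_word, source_word)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over the source words of its sentence.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts per source word.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    return max((p, tw) for (tw, sw), p in t.items() if sw == source_word)[1]


# Hypothetical sentence-aligned toy corpus (Latin source, English target).
corpus = [
    (["carpe", "diem"], ["seize", "day"]),
    (["carpe", "librum"], ["seize", "book"]),
    (["lege", "librum"], ["read", "book"]),
]

t = ibm_model1(corpus)
for word in ["carpe", "diem", "librum", "lege"]:
    print(word, "->", best_translation(t, word))
```

Even with three sentences, co-occurrence alone is enough for EM to separate the words, which is why sentence alignment comes first in the cascade.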

92 Tufts cluster 40 nodes, each w/ two 2.83 GHz Quad-Core Xeon processors (= 320 cores) Impact – Two 1M word alignments (English->Greek, Greek->English) on single 2 GHz Mac Pro: 15 hours – Two (simultaneous) 5M word alignments on computing cluster using multi-threaded version (i.e., on one 8-core node): 45 minutes.
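A back-of-envelope comparison of the two timings, assuming that throughput can be measured as total words aligned per hour of wall-clock time and that "1M" and "5M" are the sizes of each individual alignment. Both are assumptions of this sketch, and alignment cost need not scale linearly with corpus size.

```python
def words_per_hour(total_words, hours):
    """Throughput as total words aligned divided by wall-clock hours."""
    return total_words / hours


# Two 1M-word alignments in 15 hours on the single Mac Pro.
single_machine = words_per_hour(2 * 1_000_000, 15.0)
# Two 5M-word alignments in 45 minutes on one 8-core cluster node.
cluster_node = words_per_hour(2 * 5_000_000, 0.75)

speedup = cluster_node / single_machine
print(f"{single_machine:,.0f} vs {cluster_node:,.0f} words/hour, about {speedup:.0f}x")
```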

93 Multilingual Alignment Word-level alignment of Homer’s Odyssey

94 Latin/Greek -> English Senses

95 English -> Greek/Latin Senses

96 Use #1: Automatic Bilingual Dictionaries http://nlp.perseus.tufts.edu/lexicon

97 Use #2: Interlinear translations

98 Use #2: Interlinear translations

99 Use #3: Bootstrapping Multilingual Digital Library http://www.perseus.tufts.edu

100 Multilingual Digital Libraries http://www.worldofdante.org

101 TEI XML Gallos ab Aquitanis Garumna flumen, a Belgis Matrona et Sequana dividit. Horum omnium fortissimi sunt … (“The Garonne river separates the Gauls from the Aquitani and the Marne and the Seine (rivers) separate them from the Belgae. The bravest of all of these are …”)

102 Solution: Markup Transfer +
1. Alignment of the source document with the target document in a cascading process: document -> sentence -> word
2. Projection of XML tags in the source document to the target document in a way that exploits the linguistic similarity of the text pair.
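The projection step can be sketched as follows, assuming the word alignment has already been computed. Everything here is illustrative: the alignment is written by hand, the sentence is a shortened form of the Caesar example, `placeName` is used only as a sample TEI element, and `project_tags` and `render` are hypothetical helper names, not part of any Perseus tool.

```python
def project_tags(source_tags, alignment):
    """Copy tags from source token indices onto the aligned target token indices."""
    target_tags = {}
    for src_index, tag in source_tags.items():
        for tgt_index in alignment.get(src_index, []):
            target_tags[tgt_index] = tag
    return target_tags


def render(tokens, tags):
    """Serialize tokens, wrapping tagged ones in an XML element."""
    return " ".join(
        f"<{tags[i]}>{tok}</{tags[i]}>" if i in tags else tok
        for i, tok in enumerate(tokens)
    )


latin = ["Gallos", "ab", "Aquitanis", "Garumna", "flumen"]
english = ["The", "Garonne", "river", "separates", "the", "Gauls", "from", "the", "Aquitani"]

# Hand-written word alignment (Latin index -> English indices), for illustration only.
alignment = {0: [5], 1: [6], 2: [8], 3: [1], 4: [2]}
# Markup present in the source: token 3 ("Garumna") is tagged as a place name.
source_tags = {3: "placeName"}

projected = project_tags(source_tags, alignment)
print(render(english, projected))
```

Tags that span several tokens, or source words with no aligned counterpart, would need extra handling that this sketch leaves out.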

103 Bootstrapping a Multilingual DL Expands depth of translations in a collection to expand the reach of inquiry.

104 Treebanks Annotated corpora where the syntactic role and head of each word in a sentence is made explicit.
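One way to picture such an annotation is as a list of tokens, each carrying the index of its head and a role label. The sketch below is an assumption-laden illustration: the `Token` class is hypothetical, and the labels `PRED` and `OBJ` are used in the Prague style only as examples, not as a statement of the exact tagset in the Perseus treebanks.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int     # 1-based position in the sentence
    form: str      # the word as it appears in the text
    head: int      # index of the head token; 0 marks the sentence root
    relation: str  # syntactic role label


# "carpe diem": the verb is the root and "diem" is its object.
sentence = [
    Token(1, "carpe", 0, "PRED"),
    Token(2, "diem", 1, "OBJ"),
]


def dependents(tokens, head_index):
    """Return the forms of all tokens whose head is head_index."""
    return [tok.form for tok in tokens if tok.head == head_index]


print(dependents(sentence, 1))
```

Because every token stores exactly one head, the annotation is a tree, and queries such as "all objects of this verb" reduce to simple filters.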

105 Historical treebanks Most recent research and investment in treebanks has focused on modern languages, but treebanks for historical languages are now arising as well: – Middle English (Kroch and Taylor 2000) – Medieval Portuguese (Rocio et al. 2000) – Classical Chinese (Huang et al. 2002) – Old English (Taylor et al. 2003) – Early Modern English (Kroch et al. 2004) – Latin (Bamman and Crane 2006, Passarotti 2007) – Ugaritic (Zemánek 2007) – New Testament Greek, Latin, Gothic, Armenian, Church Slavonic (Haug and Jøhndal 2008)

106 Prague Arabic Dependency Treebank

107 Latin Dependency Treebank
Author        Words
Caesar        1,488
Cicero        6,229
Sallust       12,311
Vergil        2,613
Jerome        8,382
Ovid          4,789
Petronius     12,474
Propertius    4,857
Total         53,143

108 Ancient Greek Dependency Treebank
Work                      Words
Aeschylus (complete)      48,158
Hesiod, Works and Days    6,303
Homer, Iliad              38,390
Homer, Odyssey            99,353
Total                     192,204

109 Building Treebanks Solicit annotations from two independent annotators; reconcile differences between them. Background: ranges from advanced undergraduates to PhD and professors, with the majority being students in graduate programs in Classics. Average speed: 124 words per hour. Interannotator accuracy: attachment (ATT), label (LAB), labeled attachment (LABATT):
                  ATT      LAB      LABATT
Hesiod, W&D       85.1%    85.9%    79.5%
Homer, Iliad      87.1%    83.2%    79.3%
Homer, Odyssey    87.5%    85.7%    80.9%
Total             87.4%    85.3%    80.6%
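The three agreement figures can be computed along these lines, assuming each annotation is a list of (head, label) pairs over the same tokens. This is a sketch of the usual definitions, not the project's own evaluation script, and the two four-token annotations are invented.

```python
def agreement(annotation_a, annotation_b):
    """Return (ATT, LAB, LABATT) agreement between two annotations of the same tokens."""
    if len(annotation_a) != len(annotation_b):
        raise ValueError("annotations must cover the same tokens")
    n = len(annotation_a)
    pairs = list(zip(annotation_a, annotation_b))
    att = sum(a[0] == b[0] for a, b in pairs) / n   # same head
    lab = sum(a[1] == b[1] for a, b in pairs) / n   # same label
    labatt = sum(a == b for a, b in pairs) / n      # same head and same label
    return att, lab, labatt


# Hypothetical annotations: one (head index, label) pair per token.
annotator_1 = [(2, "SBJ"), (0, "PRED"), (2, "OBJ"), (3, "ATR")]
annotator_2 = [(2, "SBJ"), (0, "PRED"), (2, "ADV"), (2, "ATR")]

att, lab, labatt = agreement(annotator_1, annotator_2)
print(f"ATT {att:.1%}  LAB {lab:.1%}  LABATT {labatt:.1%}")
```

LABATT can never exceed either ATT or LAB, since a token counts only when both the head and the label match; the table above shows the same pattern.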

110 Student Contributions...

111 Syntax in the Dynamic Lexicon

112 URLs
Treebank data: http://nlp.perseus.tufts.edu/syntax/treebank/
Treebank annotation environment: http://nlp.perseus.tufts.edu/hopper/
Translation information: http://nlp.perseus.tufts.edu/hopper/sense.jsp
Greek lexicon: http://nlp.perseus.tufts.edu/lexicon/

