Digital Media Technology Week 8: XSLT 3
Seminar 11 November □ One long seminar (four hours) □ Exports from UBL catalogue □ Records contain data about dates of publication, languages, subjects of books □ Groups first work separately on assignments; results are discussed during presentations
□ Natural language text is rife with ambiguities and irregularities □ XML bridges the gap between linear texts and discrete data
Big data: examples □ Production of 20,000 exabytes on a yearly basis □ 50 million tweets sent daily □ 84,000 hours of video uploaded daily on YouTube □ David Leahy: Three V’s of big data: Volume, Velocity and Variety
The End of Theory?
E-science and e-research □ E-science: a confluence of three developments: big data, collaboration and grid computing (Wouters and Beaulieu, 2009) □ E-research: a more inclusive term covering the various ways in which computer-based methodologies can transform scholarly and scientific practices
Digital Humanities □ Focus on the various ways in which the computer can be used to investigate traditional questions in the humanities. □ Investigation of the phenomenon of computation from a humanities perspective
Terminology □ Digital Humanities, Humanities Computing, e-Humanities, Humanistic Informatics □ The term does not cover pure digitisation, use of word processors, weblogs, applications
Blackwell Companion to Digital Humanities
□ Alliance of DH Organisations, EADH, CenterNet □ DH conferences, THATCamp □ Digital Humanities Quarterly, Journal of Digital Humanities, Literary and Linguistic Computing
Father Busa’s Index Thomisticus □ PhD research on the notion of ‘presence’ in the work of Thomas Aquinas □ Restructured and transformed version of the text □ Rationale: dealing with large quantities of text
Text Collections □ Mass digitisation projects: □ Million Book Project at Carnegie Mellon: ca. 1,500,000 titles □ Project Gutenberg: ca. 70,000 titles □ Delpher: books; 1 million newspapers □ Google Books: ca. 15 million
Digital Scholarly Editions □ William Blake Archive □ Rossetti Archive
days: “If we could read a book on each of those days, it would take almost forty lifetimes to work through every volume in a single million book library”
Distant reading vs. Close reading
Google n-gram viewer & culturomics
Optical character recognition
Repetitions in individual works
Questions □ What sort of knowledge is produced? □ Is this objective knowledge? A positivist approach within the humanities? □ Can the application of an algorithm be considered a form of reading?
XML and XSLT □ XML divides a linear text into discrete units □ XSLT and XPath can be used to analyse these units in a quantitative way: e.g. counts of elements, string lengths, number of words □ Example during seminar: play by Oscar Wilde, encoded in TEI
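
The kinds of counts listed above can be sketched outside XSLT as well. The Python example below is a minimal illustration, not the seminar's stylesheet: the XML fragment is invented (it is not taken from the Wilde play and omits the TEI namespace), and the function name `speech_stats` is hypothetical. It assumes speeches are encoded as `<sp who="...">` elements containing `<p>` children, as is common in TEI drama encoding. The comments name the XPath functions that perform the same measurements.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment: invented text, no namespace, illustration only.
SAMPLE = """
<text>
  <body>
    <div type="act" n="1">
      <sp who="#A">
        <speaker>A</speaker>
        <p>Good morning to you.</p>
      </sp>
      <sp who="#B">
        <speaker>B</speaker>
        <p>Is it a good morning?</p>
        <p>I had not noticed.</p>
      </sp>
      <sp who="#A">
        <speaker>A</speaker>
        <p>You never do.</p>
      </sp>
    </div>
  </body>
</text>
"""


def speech_stats(xml_source):
    """Return per-speaker counts of speeches, characters and words."""
    root = ET.fromstring(xml_source)
    stats = {}
    # root.iter("sp") visits every <sp> element; in XPath the total
    # number of speeches would be count(//sp).
    for sp in root.iter("sp"):
        who = sp.get("who")
        # Collect the text of each <p> and collapse runs of whitespace,
        # comparable to XPath's normalize-space().
        paragraphs = [" ".join("".join(p.itertext()).split())
                      for p in sp.findall("p")]
        text = " ".join(paragraphs)
        entry = stats.setdefault(
            who, {"speeches": 0, "characters": 0, "words": 0})
        entry["speeches"] += 1
        # Comparable to string-length() in XPath.
        entry["characters"] += len(text)
        # Whitespace tokenisation, comparable to counting the result
        # of tokenize() in XPath 2.0.
        entry["words"] += len(text.split())
    return stats


if __name__ == "__main__":
    for speaker, counts in speech_stats(SAMPLE).items():
        print(speaker, counts)
```

Splitting on whitespace is a deliberately crude definition of a word; punctuation stays attached to the token, which is usually acceptable for rough comparisons between speakers but not for vocabulary counts.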