Digital histories Workshop 1: introduction K. Navickas.

2 Searching for information and making notes before digitised sources
- Dewey catalogue
- Card catalogue
- My notes, organised by where I read the information, and their cataloguing system
- A bit of a gamble whether I find what I am looking for
- Boxes of books & archives
- County record office; Library; National Archives

3 Searching for information and making notes after digitisation: googlerisation?
- Digitised archives and newspapers from all over the world
- Download them to my own computer
- Write notes on computer, or annotate files
- Still a gamble to find what I want…
- Catalogue system, online:
  - could be determined by original order of the repository
  - could be a completely new system
  - no system? e.g. Flickr collections with crowdsourced tagging
- Which archives are available depends on what is digitised, which in turn depends on funding, conservation and volunteers
- Pay to access?



6 Metadata old style: Leonard Bloomfield, Language (New York: Holt, Rinehart & Winston, 1933)
Document type: Book
Last name of author: Bloomfield
First name of author: Leonard
Year of publication: 1933
Title: Language
City: New York
Publisher: Holt, Rinehart & Winston

7 Metadata now: view the page source and see the HTML or the XML schema
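To make "view page source" concrete, here is a minimal sketch of pulling metadata out of a page's HTML using only the Python standard library. The page source, the `DC.*` field names and the helper names (`MetaTagCollector`, `extract_metadata`) are all invented for illustration; a real catalogue page may label its metadata quite differently, or keep it in a separate XML record.

```python
from html.parser import HTMLParser

# Hypothetical page source, invented for illustration: a catalogue page
# that describes a book with Dublin Core-style <meta> tags.
PAGE_SOURCE = """
<html>
  <head>
    <title>Language</title>
    <meta name="DC.type" content="Book">
    <meta name="DC.creator" content="Bloomfield, Leonard">
    <meta name="DC.date" content="1933">
    <meta name="DC.title" content="Language">
  </head>
  <body><p>Catalogue record</p></body>
</html>
"""


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from the <meta> tags of a page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # Ignore every tag except <meta>.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Keep only tags that carry both a field name and a value.
        if name and content:
            self.metadata[name] = content


def extract_metadata(page_source):
    """Return a dict mapping metadata field names to their values."""
    collector = MetaTagCollector()
    collector.feed(page_source)
    return collector.metadata


if __name__ == "__main__":
    for field, value in extract_metadata(PAGE_SOURCE).items():
        print(f"{field}: {value}")
```

The fields recovered this way are the same kind of information as the old-style record (type, author, year, title), only stored where software rather than a reader will find it.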

8 FAQ for the Burney collection of 17th and 18th century newspapers

9 OCR text from the Burney newspaper collection: what on earth is it saying?
This Day wvii) he offered to publick TnfivtiG, a: a commudiot.s,]oom, oppoflte the New Inn, Surry S.l of 'W;liL initer Bridge, at is. each, the Ethiopian Sav:;ge. I his aftrinihiriag Animal is of a different Species fi-om any ever feen in Etirope, and feenrs to hie the Link betwcen tile R;:;ion:l and 1,> ut, Ci-eation, as he is a ftriking Relinbl!ance of the Huma: Species, and is allQwcd to lio thle g~reateft *:urioG~y ever exhijib-J. in Lnoiantd. lligl: \Varer- tlls Ddy at l om.,. n-Isridge, i lt C illiiitcs aIter S inl thc Mood nig, andl at Minotes.:ter 5 in rtie Ahternonit. B> 111; :>ocsZ 109, 4 lIper (t. t I -6, 6?', k a ' hIdia ditto, - 4 per Ct. 17',"', 7' T South Sea b3itto, - Ind. Bctin;, Ss d IOS. Dif, Ditto Old Ann. `o § Navy and Vi&t. Pi:is.- )itto New Ann. - Long Annuities, ita I 3 :ier Ct. 13k. red. 6 oEl a 6ai c Short ditto x77S, 3 C rCt. Cf. 6xz a i Scrip, 62 Ditto 1724, - Omniniu, - Ditto 1751, A-nti. 17ZS, 13 3. AnCt. 7_
OCR is rubbish with tables and old fonts
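One reason old fonts defeat OCR is the long s, which the software tends to read as "f" (hence "feen" and "ftriking" in the sample). The sketch below is only a rough illustration, not a real clean-up tool: it tries swapping "f" for "s" and keeps a spelling only if it appears in a word list, and it reports what share of the tokens the word list recognises as a crude measure of OCR quality. The word list `KNOWN_WORDS` and the function names are assumptions made up for this example; a real check would need a full dictionary, ideally one that includes period spellings.

```python
import re
from itertools import product

# Tiny illustrative word list (an assumption, not a real dictionary).
KNOWN_WORDS = {
    "is", "of", "a", "different", "species", "from", "any", "ever",
    "seen", "in", "striking", "the", "and", "animal",
}


def long_s_candidates(token):
    """Yield every spelling made by swapping some 'f's for 's'."""
    options = [("f", "s") if ch == "f" else (ch,) for ch in token]
    for combination in product(*options):
        yield "".join(combination)


def repair_token(token, known_words=KNOWN_WORDS):
    """Return a long-s repair of the token if the word list confirms one."""
    lowered = token.lower()
    if lowered in known_words:
        return token  # already a recognised word, leave it alone
    for candidate in long_s_candidates(lowered):
        if candidate in known_words:
            return candidate
    return token  # nothing recognised, so do not guess


def recognised_share(text, known_words=KNOWN_WORDS):
    """Share of alphabetic tokens found in the word list (0.0 to 1.0)."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in known_words)
    return hits / len(tokens)


if __name__ == "__main__":
    sample = "is of a different Species fi-om any ever feen in Etirope"
    print(recognised_share(sample))
    print([repair_token(t) for t in re.findall(r"[A-Za-z]+", sample)])
```

Note that this only helps with one kind of error. Tokens such as "Etirope" (Europe) or the mangled stock prices table are left untouched, and the number of candidate spellings doubles with every "f" in a word, which is why damage like this often needs manual correction or re-keying.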

10 Big data?
- From ‘computing and history’ to data and text mining, corpus linguistics, topic modelling
- Are we moving from ‘close reading’ to ‘distant reading’?
- Methods:
  - N-grams: finding the proportion of occurrences of a word in a corpus of texts
  - Topic modelling: assessing the probability of occurrence of a group of words within a text
- Study: ‘culturomics’
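The n-gram method can be sketched in a few lines: count how often a word or phrase occurs in each year's text and divide by the total number of n-grams of the same length in that year. The two-sentence corpus below is invented purely for illustration, and the function names are my own; this is roughly the kind of proportion an n-gram viewer plots over time, not a reproduction of any particular tool.

```python
from collections import Counter

# Invented toy corpus keyed by year (an assumption for illustration).
CORPUS = {
    1800: "the mill and the loom and the weaver",
    1850: "the railway and the mill and the railway station",
}


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(phrase, text):
    """Occurrences of the phrase divided by all n-grams of its length."""
    n = len(phrase.split())
    grams = ngrams(text.lower().split(), n)
    if not grams:
        return 0.0
    return Counter(grams)[phrase.lower()] / len(grams)


if __name__ == "__main__":
    for year, text in sorted(CORPUS.items()):
        print(year, relative_frequency("railway", text))
```

Using a proportion rather than a raw count matters because the amount of surviving (and digitised) text differs from year to year; a raw count would mostly track how much was printed and scanned.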

11 Google Ngram Viewer

12 Topic modelling in Mining the Dispatch

