
1 Digital Text and Data Processing Week 8

2 □ Is it a valid scholarly discipline? Can these technologies genuinely enable scholars to generate valuable insights? □ Adam Kirsch, "Technology Is Taking Over English Departments" □ Stephen Marche, "Literature Is Not Data: Against Digital Humanities" □ Melissa Dinsman, "The Digital in the Humanities" Themes in the debate about DH

3 □ From deduction to induction (or abduction) □ Stanley Fish, "Mind Your P's and B's: The Digital Humanities and Interpretation" □ Chris Anderson, "The End of Theory" □ DH as a complementary tool set; different levels of analysis □ Martin Mueller, "Stanley Fish and the Digital Humanities" □ Matthew Jockers and Julia Flanders, "A Matter of Scale"

4 □ Pushing the interpretative paradigm □ Jerome McGann and Lisa Samuels, "Deformance and Interpretation" □ Stephen Ramsay, "Algorithmic Criticism" □ Alan Liu, "The State of the Digital Humanities" □ Stephen Ramsay, "The Hermeneutics of Screwing Around"

5 □ Practical work: recognition and theoretical aspects □ Stephen Ramsay and Geoffrey Rockwell, "Developing Things" □ Richard Grusin, "The Dark Side of Digital Humanities" □ Changing nature of "evidence" □ John Burrows, "Never Say Always Again: Reflections on the Numbers Game" □ Stephen Ramsay, "Algorithmic Criticism"

6 □ Vocabulary diversity (type-token ratios) □ Grammatical categories (POS categories) □ Average number of words per sentence □ Word frequency: lists or PCA diagrams □ Distinctive words: tf-idf □ KWIC lists and collocation Quantitative analyses
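The first measure above, vocabulary diversity, is simply the number of unique word types divided by the total number of tokens. The course tooling computes this in Perl and R; as an illustration only, the same idea in a minimal Python sketch (the tokeniser here is deliberately crude):

```python
import re

def type_token_ratio(text):
    """Lowercase the text, extract alphabetic tokens, and divide
    the number of unique types by the number of tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

ratio = type_token_ratio("The cat sat on the mat; the mat sat still.")
# 6 types / 10 tokens = 0.6
```

A higher ratio suggests a more varied vocabulary, though the measure is sensitive to text length, which is why the later slides plot it against text size.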

7 □ Eight short stories by Rudyard Kipling (1865-1936) from the collection In Black and White (1888) □ Four stories have an Indian narrator and four have a British narrator Case study

8 □ The program analyseTexts.pl creates data about tokens, types and POS categories □ To analyse differences and similarities, it can be useful to add a facet variable, using, for example, R's ifelse() function: d$narrator <- ifelse( rownames(d) == "AtHowliThana" | rownames(d) == "DrayWaraYowDee" | rownames(d) == "Gemini" | rownames(d) == "InFloodTime", "Indian", "British" )
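The facet variable above labels each row "Indian" or "British" according to its title. For illustration, the same labelling logic in Python (the list of titles here is a hypothetical subset; only the four Indian-narrator titles come from the slide):

```python
# The four Indian-narrator titles, as given on the slide.
indian = {"AtHowliThana", "DrayWaraYowDee", "Gemini", "InFloodTime"}

# Hypothetical row names; any other title gets the "British" label.
titles = ["AtHowliThana", "OnTheCityWall", "Gemini"]
narrator = ["Indian" if t in indian else "British" for t in titles]
```

Testing membership in a set is easier to extend than the chain of `|` comparisons in the R snippet: adding a fifth title means adding one element, not another clause.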

9 □ Two-dimensional hashes in Perl are indexed by two keys. □ The first key represents the row; the second indicates the column. □ Example: $tdm{ $text }{ $word }++;
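The Perl idiom above increments a count keyed first by text, then by word, building a term-document matrix as a hash of hashes. The equivalent structure in Python is a nested dictionary; a sketch with an invented two-text corpus:

```python
from collections import defaultdict

# tdm[text][word] holds the frequency of word in text,
# mirroring the Perl idiom $tdm{ $text }{ $word }++;
tdm = defaultdict(lambda: defaultdict(int))

corpus = {
    "StoryA": "the river rose in flood",
    "StoryB": "the river was still",
}
for text, content in corpus.items():
    for word in content.split():
        tdm[text][word] += 1
```

As in Perl, missing keys spring into existence on first use, so no explicit initialisation of rows or columns is needed.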

10 Number of tokens

11 Type-token ratios

12 tiff("image.tif", compression = "lzw") print(p) dev.off() Saving images

13 Type-token ratios and average number of words

14 □ Summing the values of different columns (here the Penn Treebank adjective tags JJ, JJR and JJS): d$adjectives <- d$JJ + d$JJR + d$JJS

15 □ Graphs can be combined using the "gridExtra" package. □ Example: library(gridExtra) p1 <- ggplot( d, aes( x = rownames(d), y = verbs / tokens, fill = narrator ) ) + geom_bar( stat = "identity" ) p2 <- ggplot( d, aes( x = rownames(d), y = adjectives / tokens, fill = narrator ) ) + geom_bar( stat = "identity" ) grid.arrange( p1, p2, ncol = 1 )

16 Occurrences of POS categories

17 □ The program tdm.pl creates data about the 40 most common words in a corpus □ It is also possible to supply words yourself in the @words array □ It also creates a term-document matrix of the 40 most distinctive words, on the basis of the tf-idf formula
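The tf-idf weighting mentioned above scores a word highly when it is frequent in one text but rare across the corpus as a whole. The exact normalisation used by tdm.pl is not shown on the slide; a sketch of one common variant (raw term frequency times the log of inverse document frequency) in Python:

```python
import math

def tf_idf(tdm):
    """tdm: dict mapping text -> {word: count}.
    Returns text -> {word: score}, where score is the raw term
    frequency times log(number of texts / document frequency)."""
    n_docs = len(tdm)
    df = {}                      # document frequency per word
    for counts in tdm.values():
        for word in counts:
            df[word] = df.get(word, 0) + 1
    scores = {}
    for text, counts in tdm.items():
        scores[text] = {
            word: tf * math.log(n_docs / df[word])
            for word, tf in counts.items()
        }
    return scores
```

A word that occurs in every text gets a score of zero (log 1 = 0), which is why function words such as "the" never count as distinctive under this weighting.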

18 Subsetting a dataframe d[ condition for rows, condition for columns ] Example: t <- read.csv("data.csv") i <- t[ rownames(t) == "AtHowliThana" | rownames(t) == "DrayWaraYowDee" | rownames(t) == "Gemini" | rownames(t) == "InFloodTime", ]

19 colSums() function Example: i <- as.data.frame( t( colSums(i) ) ) □ Adds up the values in each column □ All values must be numeric □ The result is a named vector with one value per column □ The function t() can be used to transpose a dataframe (convert rows into columns or vice versa)

20

21 Principal Component Analysis □ Performed on a "term-document matrix" □ Frequency counts ought to be normalised by text length: tdm <- read.csv("tdm.csv", header = TRUE ) md <- read.csv("data.csv", header = TRUE ) tdm <- tdm / md$tokens □ The analysis creates new variables (principal components) which account for most of the variability in the data set
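The same pipeline can be sketched outside R: normalise each row of counts by the text's token total, centre and scale each column (as prcomp does with center = TRUE, scale. = TRUE on the next slide), and extract the components via a singular value decomposition. A NumPy sketch on an invented toy matrix of four texts by three word columns:

```python
import numpy as np

# Invented toy data: 4 texts x 3 word-frequency columns.
counts = np.array([[4., 1., 0.],
                   [2., 2., 1.],
                   [0., 3., 4.],
                   [1., 1., 2.]])
tokens = counts.sum(axis=1, keepdims=True)

tdm = counts / tokens                            # normalise by text length
centred = tdm - tdm.mean(axis=0)                 # centre each column
scaled = centred / centred.std(axis=0, ddof=1)   # unit variance per column

# Principal components via SVD; rows of vt are the loadings,
# u * s gives the component scores for each text.
u, s, vt = np.linalg.svd(scaled, full_matrices=False)
scores = u * s
```

Plotting the first two columns of `scores` against each other gives the kind of PCA diagram shown on the following slides, with texts that use similar vocabulary landing close together.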

22 Principal Component Analysis

23 pca <- prcomp( tdm, center = TRUE, scale. = TRUE) summary(pca)

24 loadings <- pca$rotation

25

26 pca$rotation

27 Collocation Indian British

