1
Digital Text and Data Processing
Week 2
2
Computation and Literary studies
Statistical analysis as a “blunt hermeneutic instrument” (Trumpener)
“Does the digital component of digital humanities give us new ways to think, or only ways to illustrate what we already know?” (Kirsch)
“It’s like hitting a painting with a fish – why would you?” (Kennedy)
3
Algorithmic Criticism
There is a need for a “criticism derived from algorithmic manipulation of text” (Ramsay)
Digital methods ought to “assist the critic in the unfolding of interpretative possibilities”
Cf. “Literary informatics” (Martin Mueller)
4
“Secondary query potential” of digital text (Mueller)
From the “conduit model” to “transformation” and to “object manipulation” (Bradley)
“Performative” and “deformative criticism” (McGann)
5
Moretti dismisses close reading as a “theological exercise” and as a “very solemn treatment of very few texts taken very seriously”
Literary research as “a patchwork of other people’s research, without a single direct textual reading”
Chronological and geographical developments in “devices, themes, tropes — or genres and systems”
Literary research which uses the analogy of science
A method resting “solidly on facts”
Concepts and visual models from natural sciences
6
From “The Slaughterhouse of Literature”
7
Effects on the research agenda
Martin Mueller
“The underlying methods (…) are probabilistic and in many ways more compatible with a spirit of tentative inquiry”
“Is it an instance of the old joke about the drunk who is looking for his lost car key under a lamp post because that is where the light is?”
Digital methods are concerned more with “establishing the ‘fact that’ than with explaining the ‘reason why’”
Shawna Ross
Digital humanities needs to focus on “the conditional and the subjunctive, rather than inside absolutes and interdictions”
8
Source Criticism
Which edition was digitised precisely?
Does this edition have authority, or any historical importance?
Which organisation has digitised the text?
Does this organisation have sufficient expertise in digitisation projects?
Which measures have been taken to avoid errors?
Has the digitised text been appraised or checked?
Did the digitisation process introduce changes to the text? If yes, has this editorial process been documented accurately?
Which organisation has published these sources?
Are you allowed to perform text mining on these sources?
9
IPR and licences
Possibilities to mine recent texts depend on Intellectual Property Rights (IPR) and on licence agreements with publishers
The National Library assumes that texts published before 1873 (2 x 70 years) are in the public domain
Texts from the period between 1873 and 1940 can be made available because of agreements with organisations such as LIRA and Pictoright
10
Study commissioned by the EC, led by Prof. Ian Hargreaves
The right to read does not imply the right to mine
11
The Hague Declaration
“A lack of clarity around the legality of TDM is inhibiting TDM-based research in Europe”
“The solutions offered by publishers are insufficient to meet the needs of researchers and are placing European researchers at a disadvantage”
“The introduction of a mandatory copyright exception to allow anyone to use computers to analyse anything to which they have legal access is essential”
12
Regular expressions Components of text patterns
Character classes, e.g. \w , \d or .
Quantifiers, e.g. {2,4} or ? , + , *
Anchors, e.g. \b , ^ , $
Patterns need to be enclosed in forward slashes
13
/\bthe (\w+ ){0,2}light\b/
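To make the pattern above concrete, here is a minimal sketch that tests it against a handful of phrases. The phrases and the variable names (@phrases, @matched) are invented for this illustration and are not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical sample phrases, made up for this illustration.
my @phrases = (
    "the light",
    "the green light",
    "the single green light",
    "the one single green light",
    "the lights",
);

my @matched;
foreach my $p (@phrases) {
    # "the", then zero to two words each followed by a space, then "light"
    if ( $p =~ /\bthe (\w+ ){0,2}light\b/ ) {
        push @matched, $p;
    }
}

# Only the first three phrases match: the fourth has three words between
# "the" and "light", and in the fifth "light" is not followed by a word boundary.
print "$_\n" foreach @matched;
```

The quantifier {0,2} limits how many words may intervene, and the final \b stops the pattern from matching longer words such as "lights".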
14
Match variables
Parentheses create substrings within a regular expression
Perl stores the text matched by the first pair of parentheses in the variable $1
Example:
$keyword = "well-known" ;
if ( $keyword =~ /(\w+)-\w+/ ) {
    print $1 ; # This will print "well"
}
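When a pattern contains more than one pair of parentheses, Perl numbers the captured substrings from left to right ($1, $2, and so on). The sketch below is a small variation on the slide's example; the variable names $first and $second are assumptions made for this illustration.

```perl
use strict;
use warnings;

my $keyword = "well-known";
my ( $first, $second );

# Two pairs of parentheses: the first fills $1, the second fills $2.
if ( $keyword =~ /(\w+)-(\w+)/ ) {
    ( $first, $second ) = ( $1, $2 );
    print "$first\n";     # the part before the hyphen
    print "$second\n";    # the part after the hyphen
}
```

Copying $1 and $2 into named variables straight away is a common precaution, since the next successful match overwrites them.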
15
Three types of variables
Scalars: a single value; start with $
Arrays: multiple values; start with @
Hashes: multiple values which can be referenced with ‘keys’; start with %
16
@potter = ("The Philosopher's Stone", "The Chamber of Secrets", "The Prisoner of Azkaban", "The Goblet of Fire", "The Order of the Phoenix", "The Half-Blood Prince", "The Deathly Hallows") ;
$potter[0] # The Philosopher's Stone
$potter[4] # The Order of the Phoenix
$potter[-1] # The Deathly Hallows
17
Looping through an array
foreach my $book ( @potter ) {
    print $book ;
}
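As a runnable illustration of looping through an array, the following sketch numbers each title and reports how many elements the array holds. It uses a shortened version of the @potter array; the helper variables @lines, $n and $count are assumptions added for this example.

```perl
use strict;
use warnings;

# Shortened version of the array from the slides.
my @potter = (
    "The Philosopher's Stone",
    "The Chamber of Secrets",
    "The Prisoner of Azkaban",
);

# An array evaluated in scalar context gives its number of elements.
my $count = scalar @potter;

my @lines;
my $n = 1;
foreach my $book (@potter) {
    push @lines, "$n. $book";    # e.g. "1. The Philosopher's Stone"
    $n++;
}

print "$_\n" foreach @lines;
print "Total: $count\n";
```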
18
A hash
Can be thought of as an array in which you specify the keys yourself
my %capitals = ( "Italy" => "Rome", "Belgium" => "Brussels" ) ;
print $capitals{"Italy"} ; ## Rome
19
key        value
Belgium    Brussels
Italy      Rome
France     Paris
…
20
Looping through a hash
foreach my $c ( keys %capitals ) {
    print $c . ': ' . $capitals{$c} ;
}
21
Sorting a hash
foreach my $f ( sort keys %hash ) {
    print $f ;
}
Sorting, by default, is done alphabetically, by key, in ascending order
22
Ways of sorting
Numerically by key: sort { $a <=> $b }
Numerically by value: sort { $hash{$a} <=> $hash{$b} }
Alphabetically by value: sort { $hash{$a} cmp $hash{$b} }
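The sort blocks above only produce a result when they are applied to a list of keys. The sketch below shows them in use under some assumptions: the counts in %hash are invented, %capitals extends the earlier example with the France/Paris pair from the table, and the descending variant (swapping $a and $b) is an addition that is not on the slide.

```perl
use strict;
use warnings;

# Hypothetical word counts, invented for this illustration.
my %hash = ( "love" => 3, "music" => 12, "food" => 7 );

# Default: alphabetically by key, ascending.
my @by_key = sort keys %hash;

# Numerically by value, ascending.
my @by_value = sort { $hash{$a} <=> $hash{$b} } keys %hash;

# Swapping $a and $b reverses the order (descending).
my @by_value_desc = sort { $hash{$b} <=> $hash{$a} } keys %hash;

# Alphabetically by value: countries ordered by the name of their capital.
my %capitals = ( "Italy" => "Rome", "Belgium" => "Brussels", "France" => "Paris" );
my @by_capital = sort { $capitals{$a} cmp $capitals{$b} } keys %capitals;

print join( ", ", @by_key ),        "\n";
print join( ", ", @by_value ),      "\n";
print join( ", ", @by_value_desc ), "\n";
print join( ", ", @by_capital ),    "\n";
```

Inside a sort block, $a and $b are the two elements being compared; <=> compares them as numbers and cmp compares them as strings.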
23
Exercises 13 and 14
24
Finding words
$line = "If music be the food of love, play on" ;
@array = split( /\s/ , $line ) ;
# $array[0] contains "If"
# $array[4] contains "food"
25
Tokenisation
@words = split( /\s+/ , $line ) ;
foreach my $w ( @words ) {
    print $w ;
}
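The slides use both /\s/ and /\s+/ as the separator for split. The difference shows up when words are separated by more than one whitespace character, as the following sketch illustrates. The input line, with two spaces after "music", and the array names @single and @multi are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical input: note the two spaces after "music".
my $line = "If music  be the food";

# Splitting on a single whitespace character leaves an empty string
# between the two consecutive spaces.
my @single = split( /\s/, $line );

# Splitting on one or more whitespace characters treats the run as one separator.
my @multi = split( /\s+/, $line );

print scalar(@single), "\n";    # 6 elements, one of them empty
print scalar(@multi),  "\n";    # 5 elements
```

For tokenisation, /\s+/ is therefore usually the safer choice, because empty strings would otherwise end up being counted as words.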
26
Frequency lists
Frequency list for Heart of Darkness produced using TaporWare
27
Creating a hash
my %freq ;
Assigning / updating a value
$freq{"if"}++ ;
$freq{"music"}++ ;
print $freq{"if"} . "\n" ;
28
N.B. $a = $a + 1 ; is the same as $a++ ;
29
Calculation of frequencies
my %freq ;
@words = split( /\s+/ , $line ) ;
foreach my $w ( @words ) {
    $freq{$w}++ ;
}
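Putting tokenisation, counting and sorting together gives a small frequency list. The following is a minimal sketch, not the course's reference solution: the input line is invented so that some words repeat, and the tie-breaking rule (alphabetical order for equal counts) and the name @ranked are assumptions.

```perl
use strict;
use warnings;

# Hypothetical input with repeated words.
my $line = "the cat and the dog and the bird";

my %freq;
my @words = split( /\s+/, $line );
foreach my $w (@words) {
    $freq{$w}++;    # a missing key starts at 0 and becomes 1
}

# Highest count first; words with equal counts in alphabetical order.
my @ranked = sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq;

foreach my $w (@ranked) {
    print "$w\t$freq{$w}\n";
}
```

The ++ operator works on a key that does not exist yet, so no separate initialisation step is needed.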
30
But she returned to the writing-table, observing, as she passed her son, "Still page 322?" Freddy snorted, and turned over two leaves. For a brief space they were silent. Close by, beyond the curtains, the gentle murmur of a long conversation had never ceased.
31
Actually a “word”?
foreach my $w ( @words ) {
    if ( $w =~ /(\w+)/ ) {
        $freq{ $1 }++ ;
    }
}
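Applied to part of the passage quoted on slide 30, the check above strips the punctuation that clings to tokens such as "observing," and "322?". The sketch below is a hedged illustration: the use of lc to fold case is an addition that is not on the slides, and /(\w+)/ keeps only the first run of word characters, so "writing-table," is counted as "writing".

```perl
use strict;
use warnings;

# Part of the passage from slide 30 (single quotes, because the text
# itself contains double quotes).
my $line = 'But she returned to the writing-table, observing, as she passed '
         . 'her son, "Still page 322?" Freddy snorted, and turned over two leaves.';

my %freq;
foreach my $w ( split( /\s+/, $line ) ) {
    # Keep the first run of word characters; tokens without any are skipped.
    if ( $w =~ /(\w+)/ ) {
        $freq{ lc($1) }++;    # lc() is an addition: "Still" and "still" count together
    }
}

foreach my $w ( sort keys %freq ) {
    print "$w\t$freq{$w}\n";
}
```

Whether hyphenated forms should count as one word or two is a tokenisation decision that the pattern would need to reflect.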