Published by Ferdinand McGee. Modified over 9 years ago.
1
Digital Text and Data Processing
Introduction to R
2
Objectivity of DH Research
□ Tools themselves are often based on specific assumptions and subjective decisions
□ There is subjectivity in the way in which tools are used
□ Reproducible results
□ Rockwell & Ramsay, in “Developing Things”: a tool is a theory
3
Willard McCarty, Humanities Computing (Palgrave, 2005)
"The point of all modelling exercises, as of scholarly research generally, is the process seen in and by means of a developing product, not the definitive achievement" (p. 22).
Models, "however finely perfected, are better understood as temporary states in a process of coming to know rather than fixed structures of knowledge" (p. 27).
-> Clash between the tacit and intuitive knowledge of the scholar and the computer’s need for consistency and explicitness
4
Two stages in text mining
□ Data creation
□ Data analysis
5
Types of analyses
□ Finding distinctive vocabulary
□ Finding stylistic or grammatical differences and similarities
□ Examining topics or themes
□ Clustering texts on the basis of quantifiable aspects
6
Reading a directory
my @files;
opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
while (my $file = readdir($dh)) {
    # keep only file names that end in .txt
    if ($file =~ /\.txt$/) {
        push(@files, $file);
    }
}
closedir($dh);
7
Inverse document frequency
For an application, see Stephen Ramsay, Algorithmic Criticism
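The slide names inverse document frequency without giving a formula. The sketch below assumes one common definition, idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents that contain term t. This is not necessarily the variant used in the lecture or by Ramsay, and the toy corpus is made up.

```r
# Toy corpus (made-up data): each element holds the word tokens of one document.
docs <- list(
  doc1 = c("the", "house", "the", "garden"),
  doc2 = c("the", "sea"),
  doc3 = c("the", "house", "by", "the", "sea")
)

n_docs <- length(docs)

# Document frequency: in how many documents does each term occur at least once?
vocabulary <- sort(unique(unlist(docs)))
doc_freq <- sapply(vocabulary, function(term) {
  sum(sapply(docs, function(tokens) term %in% tokens))
})

# Inverse document frequency under the assumed definition log(N / df).
idf <- log(n_docs / doc_freq)

print(round(idf, 3))
```

A term that occurs in every document ("the") gets a score of zero, while a term found in a single document ("garden") gets the highest score.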
8
□ Both a programme and a programming language
□ Successor of “S”
□ “a free software environment for statistical computing and graphics”
□ The capabilities of R can be extended via external “packages”
10
Variables in R
□ Names may be any combination of alphanumeric characters, underscores and dots
□ Unlike Perl variables, they do not begin with a $
□ The first character cannot be a digit; the second character cannot be a digit if the first character is a dot
Allowed: data, my.data, my_2ndDataSet, .myCsv
Not allowed: 3rdDataSet, .4thData.set
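One way to check these naming rules is sketched below. It relies on base R's make.names(), which returns a name unchanged only if it is already syntactically valid. The candidate names are the ones on the slide; the helper variable is_valid is introduced here for illustration.

```r
# Candidate variable names taken from the slide.
candidates <- c("data", "my.data", "my_2ndDataSet", ".myCsv",
                "3rdDataSet", ".4thData.set")

# make.names() rewrites invalid names (e.g. by prepending "X"),
# so a name is valid exactly when it comes back unchanged.
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates

print(is_valid)
```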
11
Vectors
□ A collection of indexed values
□ Can be created using the c() function, or by supplying a range
□ N.B. The assignment operator in R is <-
□ Examples:
x <- c( 4, 5, 3, 7) ;
y <- 1:30 ;
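A short sketch of what "indexed values" means in practice, using the two vectors from the slide. The variable names second, middle and doubled are introduced here for illustration.

```r
x <- c(4, 5, 3, 7)   # vector built with c()
y <- 1:30            # vector built from a range

second  <- x[2]      # indexing starts at 1, so this is the second value
middle  <- x[2:3]    # a range of indices returns a shorter vector
doubled <- x * 2     # arithmetic is applied to every element

print(second)
print(middle)
print(doubled)
print(length(y))
```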
12
Data frame
□ A collection of vectors, all of the same length
□ Each column of the table is stored in R as a vector.
      V1   V2   V3
R1     3    4    5
R2     1   21    8
R3    23    5    6
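A minimal sketch of how such a table could be built by hand. The column names V1 to V3 and row names R1 to R3 follow the slide; the cell values are illustrative only.

```r
# Each column is supplied as a vector; all vectors have the same length.
df <- data.frame(
  V1 = c(3, 1, 23),
  V2 = c(4, 21, 5),
  V3 = c(5, 8, 6),
  row.names = c("R1", "R2", "R3")
)

print(df)
print(df$V2)          # one column, returned as a plain vector
print(df["R2", "V2"]) # one cell, addressed by row name and column name
```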
13
Comma Separated Values
i,you,he
Emma,160416,3178,1994
Persuasion,77431,1284,918
PrideAndPrejudice,121812,2068,1356
N.B. The first row has one column fewer than the other rows
14
Reading data
□ Use the read.csv function, with parameter header = TRUE
□ The CSV file will be represented as a data frame
□ The values on the first line are used as column names, and the first value of each subsequent line is used as the row name
data <- read.csv( "data.csv", header = TRUE) ;
colnames(data)
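The behaviour described here can be tried without a file on disk. The sketch below passes the sample CSV from the earlier slide through the text argument of read.csv instead of reading "data.csv"; that substitution is made here only to keep the example self-contained.

```r
# Sample CSV from the slides; the header line has one field fewer than the data lines.
csv_text <- "i,you,he
Emma,160416,3178,1994
Persuasion,77431,1284,918
PrideAndPrejudice,121812,2068,1356"

data <- read.csv(text = csv_text, header = TRUE)

print(colnames(data))  # taken from the first line
print(rownames(data))  # taken from the first value of each later line
print(data$you)        # a single column, accessed with $
```

Because the header has one field fewer than the rows, R treats the first field of each row as the row name rather than as data.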
15
Data frame columns
□ Can be accessed using the $ operator
data <- read.csv( "data.csv", header = TRUE) ;
data$you
16
Calculations
□ max(), min(), mean(), sd()
y <- data$you ;
max(y) ;
sd(y) ;
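A self-contained variant of these calculations, assuming the "you" counts from the sample CSV shown earlier are typed in directly rather than read from "data.csv".

```r
# Counts of "you" for Emma, Persuasion and PrideAndPrejudice (from the sample CSV).
y <- c(3178, 1284, 2068)

highest <- max(y)
lowest  <- min(y)
average <- mean(y)
spread  <- sd(y)    # sample standard deviation

print(c(highest = highest, lowest = lowest, average = average, spread = spread))
```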
17
Exercise
□ Run the program “typeToken.pl”
□ Use the file “ratio.csv” that is created by this program.
□ Print a list of all the texts that have been read
□ Calculate the average number of tokens
□ Calculate the total number of tokens in the full corpus
□ Identify the lowest number in the column “types”
□ Identify the highest number in the column “ratio”
18
Subsetting and sorting
d <- read.csv("data.csv") ;
cell <- d[ 1, 2 ] ;              # the value in row 1, column 2
row <- d[ 2, ] ;                 # all columns of row 2
od <- d[ order( d$ratio ), ] ;   # all rows, sorted by the column "ratio"
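The same operations on a small made-up data frame, so that the results can be inspected without an input file. The column names types and ratio follow the exercise; the column name tokens, the row names and all values are assumptions for illustration.

```r
# Hypothetical type/token data for three texts.
d <- data.frame(
  tokens = c(1000, 2500, 1800),
  types  = c(400, 700, 650),
  row.names = c("textA", "textB", "textC")
)
d$ratio <- d$types / d$tokens

cell <- d[1, 2]             # row 1, column 2: the "types" value of textA
row2 <- d[2, ]              # leaving the column index empty selects every column
od   <- d[order(d$ratio), ] # order() returns row positions sorted by ratio, ascending

print(cell)
print(row2)
print(od)
```

Writing d[order(-d$ratio), ] instead sorts from the highest ratio to the lowest.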
19
Quantitative and Qualitative
□ Qualitative data (categorical)
   □ Nominal scale (unordered scale), e.g. eye colour, marital status
   □ Ordinal scale (ordered scale), e.g. educational level
□ Quantitative data
   □ Interval (scale with no mathematical zero)
   □ Ratio (multipliable scale), e.g. age
Source: Seminar Basic Statistics, Laura Bettens
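In R these scales map naturally onto different data types. The sketch below is one possible representation: an unordered factor for nominal data, an ordered factor for ordinal data, and a numeric vector for ratio data. The category labels and values are made up.

```r
# Nominal: categories without an order.
eye_colour <- factor(c("brown", "blue", "green", "brown"))

# Ordinal: categories with an explicit order, given through `levels`.
education <- factor(
  c("secondary", "primary", "tertiary"),
  levels = c("primary", "secondary", "tertiary"),
  ordered = TRUE
)

# Ratio: plain numbers, for which multiplication is meaningful.
age <- c(23, 41, 35)

print(table(eye_colour))            # counting is the main operation for nominal data
print(education[1] > education[2])  # comparisons are defined for ordered factors
print(mean(age))
```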
20
Diagrams
□ Two quantitative variables can be visualised in a variety of ways (e.g. line chart, pie chart)
□ A combination of one qualitative variable and one quantitative variable is best presented using a bar chart or a dot chart
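A minimal sketch of the second case, using base R's barplot() and dotchart(). The qualitative variable is the title of the text and the quantitative variable is the count of "you" from the sample CSV shown earlier; the axis labels are chosen here for illustration.

```r
# One quantitative value (count of "you") per category (text title).
you_counts <- c(Emma = 3178, Persuasion = 1284, PrideAndPrejudice = 2068)

# Bar chart: one bar per text. barplot() returns the x positions of the bars.
bar_positions <- barplot(you_counts, ylab = "Occurrences of 'you'")

# Dot chart: the same values, one labelled dot per text.
dotchart(you_counts, xlab = "Occurrences of 'you'")
```

When run non-interactively (for example with Rscript), R writes the plots to a file named Rplots.pdf instead of opening a window.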