

Digital Text and Data Processing Introduction to R.


1 Digital Text and Data Processing Introduction to R

2 Objectivity of DH Research
□ Tools themselves are often based on specific assumptions / subjective decisions
□ There is subjectivity in the way in which tools are used
□ Reproducible results
□ Rockwell & Ramsay, in “Developing Things”: A tool is a theory

3 Willard McCarty, Humanities Computing (Palgrave, 2005)
"The point of all modelling exercises, as of scholarly research generally, is the process seen in and by means of a developing product, not the definitive achievement" (p. 22).
Models, "however finely perfected, are better understood as temporary states in a process of coming to know rather than fixed structures of knowledge" (p. 27).
-> Clash between the scholar’s tacit and intuitive knowledge and the computer’s need for consistency and explicitness

4 Two stages in text mining
□ Data creation
□ Data analysis

5 Types of analyses
□ Finding distinctive vocabulary
□ Finding stylistic or grammatical differences and similarities
□ Examining topics or themes
□ Clustering texts on the basis of quantifiable aspects

6 Reading a directory
my @files;
opendir(my $dh, $dir) or die "Can't open directory '$dir': $!";
while (my $file = readdir($dh)) {
    # keep only file names ending in "txt"
    if ($file =~ /txt$/) {
        push(@files, $file);
    }
}
closedir($dh);

7 Inverse document frequency
For an application, see Stephen Ramsay, Algorithmic Criticism
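
The slide's formula did not survive in this transcript. A common definition is idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents that contain the term t. The R sketch below assumes that definition; the toy corpus and all variable names are hypothetical and only serve as illustration.

```r
# Toy corpus (hypothetical): each document is a vector of tokens
docs <- list(
  d1 = c("the", "whale", "the", "sea"),
  d2 = c("the", "ship", "sea"),
  d3 = c("the", "captain")
)

n_docs <- length(docs)
vocab  <- unique(unlist(docs))

# Document frequency: in how many documents does each term occur?
doc_freq <- sapply(vocab, function(term) {
  sum(sapply(docs, function(doc) term %in% doc))
})

# Inverse document frequency, assuming idf = log(N / df)
idf <- log(n_docs / doc_freq)
print(round(idf, 3))
```

A term that occurs in every document (here "the") receives an idf of 0, while terms confined to a single document receive the highest value.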

8
□ Both a programme and a programming language
□ Successor of “S”
□ “a free software environment for statistical computing and graphics”
□ The capabilities of R can be extended via external “packages”

9

10 Variables in R
□ Any combination of alphanumerical characters, underscore and dot
□ Unlike Perl, they do not begin with a $
□ The first character cannot be a digit. The second character cannot be a digit if the first character is a dot
Allowed: data, my.data, my_2ndDataSet, .myCsv
Not allowed: 3rdDataSet, .4thData.set
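
One way to check these rules is sketched below. It relies on the base R function make.names(), which turns any string into a syntactically valid name; a string that comes back unchanged was already valid. The candidate names are based on the slide's examples, and the variable names candidates and is_valid are illustrative only.

```r
# Candidate names based on the examples on the slide
candidates <- c("data", "my.data", "my_2ndDataSet", ".myCsv",
                "3rdDataSet", ".4thData.set")

# make.names() rewrites a string into a syntactically valid name;
# if the result equals the input, the input was already valid.
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
print(is_valid)
```

Note that make.names() also alters reserved words such as "if", so this check is slightly stricter than the rules listed on the slide.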

11 Vectors
□ A collection of indexed values
□ Can be created using the c() function, or by supplying a range
□ N.B. The assignment operator in R is <-
□ Examples:
x <- c(4, 5, 3, 7)
y <- 1:30
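
As a brief sketch of what "indexed" means in practice, the lines below reuse the two vectors from the slide and retrieve values by position. Indexing in R starts at 1, not 0.

```r
x <- c(4, 5, 3, 7)   # vector built with c()
y <- 1:30            # vector built from a range

second  <- x[2]      # a single element, selected by its index
middle  <- x[2:3]    # a range of indices returns a shorter vector
n_items <- length(y) # number of elements in y

print(second)
print(middle)
print(n_items)
```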

12 Data frame
□ A collection of vectors, all of the same length
□ Each column of the table is stored in R as a vector.
V1 V2 V3
R1 3,4,5
R2 1,21,8
R3 23,5,6
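
A minimal sketch of this idea: two vectors of equal length are combined into a data frame, and each column can be taken out again as a vector. The column names and values are hypothetical and do not reproduce the table on the slide.

```r
# Two vectors of the same length (hypothetical values)
tokens <- c(1200, 950, 1430)
types  <- c(400, 380, 510)

# Combine them into a data frame; each vector becomes one column
texts <- data.frame(tokens = tokens, types = types,
                    row.names = c("R1", "R2", "R3"))

print(dim(texts))               # number of rows, number of columns
print(is.vector(texts$tokens))  # a single column is an ordinary vector
```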

13 Comma Separated Values
i,you,he
Emma,160416,3178,1994
Persuasion,77431,1284,918
PrideAndPrejudice,121812,2068,1356
N.B. The first row has one column less

14 Reading data
□ Use the read.csv function, with parameter header = TRUE
□ The CSV file will be represented as a data frame
□ The values on the first line will be used as column names; if that line has one field fewer than the others, the first value of each subsequent line will be used as a row name
data <- read.csv("data.csv", header = TRUE)
colnames(data)
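
The behaviour described above can be tried without a file on disk. The sketch below passes the CSV lines from the earlier slide to read.csv through its text argument; the variable names csv_text and d are illustrative.

```r
# The CSV content from the slide, supplied as a string instead of a file
csv_text <- "i,you,he
Emma,160416,3178,1994
Persuasion,77431,1284,918
PrideAndPrejudice,121812,2068,1356"

d <- read.csv(text = csv_text, header = TRUE)

print(colnames(d))  # taken from the first line
print(rownames(d))  # taken from the first value of each later line
print(d$you)        # one column, accessed with the $ operator
```

Because the header has one field fewer than the data lines, R treats the first field of each data line as a row name rather than as a regular column.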

15 Data frame columns
□ Can be accessed using the $ operator
data <- read.csv("data.csv", header = TRUE)
data$you

16 Calculations
□ max(), min(), mean(), sd()
y <- data$you
max(y)
sd(y)
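
A short sketch of these four functions, applied to the "you" counts from the CSV example (typed in by hand here so that the lines run on their own). Note that sd() computes the sample standard deviation, dividing by n - 1 rather than n.

```r
# Counts of "you" in the three novels, copied from the CSV example
y <- c(3178, 1284, 2068)

highest <- max(y)
lowest  <- min(y)
average <- mean(y)
spread  <- sd(y)   # sample standard deviation (denominator n - 1)

# The same standard deviation, written out by hand for comparison
spread_by_hand <- sqrt(sum((y - mean(y))^2) / (length(y) - 1))

print(c(highest, lowest, average, spread))
```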

17 Exercise
□ Run the program “typeToken.pl”
□ Use the file “ratio.csv” that is created by this program.
□ Print a list of all the texts that have been read
□ Calculate the average number of tokens
□ Calculate the total number of tokens in the full corpus
□ Identify the lowest number in the column “types”
□ Identify the highest number in the column “ratio”

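18 Subsetting and sorting
d <- read.csv("data.csv")
# a single cell: first row, second column
cell <- d[1, 2]
# a full row: second row, all columns
row2 <- d[2, ]
# all rows, sorted by the column "ratio" (ascending)
od <- d[order(d$ratio), ]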
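
The sketch below runs the same three operations on a small data frame built in place, so it does not depend on a CSV file. The texts, the counts and the definition of ratio as types divided by tokens are assumptions made for illustration.

```r
# Hypothetical type/token counts for three texts
d <- data.frame(
  tokens = c(1000, 2500, 1800),
  types  = c(400, 700, 630),
  row.names = c("TextA", "TextB", "TextC")
)
d$ratio <- d$types / d$tokens   # assumed definition of "ratio"

cell <- d[1, 2]                 # first row, second column
row2 <- d[2, ]                  # second row, all columns

# order() returns the row indices that sort the column;
# using them as a row subscript reorders the whole data frame
od      <- d[order(d$ratio), ]
od_desc <- d[order(d$ratio, decreasing = TRUE), ]

print(od)
```

Keeping each result in its own variable (cell, row2, od) leaves the original data frame d intact for later steps.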

19 Quantitative and Qualitative
□ Qualitative data (categorical)
    □ Nominal scale (unordered scale), e.g. eye colour, marital status
    □ Ordinal scale (ordered scale), e.g. educational level
□ Quantitative data
    □ Interval (scale with no mathematical zero)
    □ Ratio (multipliable scale), e.g. age
Source: Seminar Basic Statistics, Laura Bettens

20 Diagrams
□ The relation between two quantitative variables can be visualised in a variety of ways (e.g. line chart, pie chart)
□ A combination of one qualitative variable and one quantitative variable is best presented using a bar chart or a dot chart

