Digital Text and Data Processing Week 8

Themes in the debate about DH

□ Is it a valid scholarly discipline? Can these technologies genuinely enable scholars to generate valuable insights?
□ Adam Kirsch, "Technology Is Taking Over English Departments"
□ Stephen Marche, "Literature Is Not Data: Against Digital Humanities"
□ Melissa Dinsman, "The Digital in the Humanities"

□ From deduction to induction (or abduction)
□ Stanley Fish, "Mind Your P's and B's: The Digital Humanities and Interpretation"
□ Chris Anderson, "The End of Theory"
□ DH as a complementary tool set; different levels of analysis
□ Martin Mueller, "Stanley Fish and the Digital Humanities"
□ Matthew Jockers and Julia Flanders, "A Matter of Scale"

□ Pushing the interpretative paradigm
□ Jerome McGann and Lisa Samuels, "Deformance and Interpretation"
□ Stephen Ramsay, "Algorithmic Criticism"
□ Alan Liu, "The State of the Digital Humanities"
□ Stephen Ramsay, "The Hermeneutics of Screwing Around"

□ Practical work: recognition and theoretical aspects
□ Stephen Ramsay and Geoffrey Rockwell, "Developing Things"
□ Richard Grusin, "The Dark Side of Digital Humanities"
□ Changing nature of "evidence"
□ John Burrows, "Never Say Always Again: Reflections on the Numbers Game"
□ Stephen Ramsay, "Algorithmic Criticism"

Quantitative analyses

□ Vocabulary diversity (type-token ratios)
□ Grammatical categories (POS categories)
□ Average number of words per sentence
□ Word frequency: lists or PCA diagrams
□ Distinctive words: tf-idf
□ KWIC lists and collocation

Case study

□ Eight short stories by Rudyard Kipling (1865–1936) from the collection In Black and White (1888)
□ Four stories have an Indian narrator and four stories have a British narrator

□ Program analyseTexts.pl creates data about tokens, types and POS categories
□ To analyse differences and similarities, it can be useful to add a facet variable, using, for example, the ifelse function:

d$narrator <- ifelse( rownames(d) == "AtHowliThana" | rownames(d) == "DrayWaraYowDee" | rownames(d) == "Gemini" | rownames(d) == "InFloodTime", "Indian", "British" )
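The same variable can be added a little more compactly with %in%. The lines below are a minimal sketch, assuming analyseTexts.pl writes its counts to a file named "data.csv" with one row per story; the file name and layout are assumptions, not something stated on these slides.

# Read the per-story counts (assumed file name and layout)
d <- read.csv( "data.csv", header = TRUE, row.names = 1 )
# Stories with an Indian narrator, as listed above
indian <- c( "AtHowliThana", "DrayWaraYowDee", "Gemini", "InFloodTime" )
# Every other story gets the label "British"
d$narrator <- ifelse( rownames(d) %in% indian, "Indian", "British" )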

□ Two-dimensional hashes have two indexes.
□ The first of these represents the row, and the second indicates the column.
□ Example:

$tdm{ $text }{ $word }++ ;

Number of tokens

Type-token ratios
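A bar chart of this kind can be produced with ggplot2 along the following lines. This is a sketch, assuming the data frame d built above contains "types" and "tokens" columns; it is not necessarily the exact plotting code used for the slide.

library(ggplot2)
# Type-token ratio per story, coloured by narrator (column names are assumed)
p <- ggplot( d, aes( x = rownames(d), y = types / tokens, fill = narrator ) ) +
  geom_bar( stat = "identity" )
print(p)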

tiff("image.tif", compression = "lzw") print(p) dev.off() Saving images

Type-token ratios and average number of words

□ Sum of the values in different columns:

d$adjectives <- d$JJ + d$JJR + d$JJS
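The verbs / tokens ratio plotted on the next slide needs an analogous sum over the verb columns. The line below assumes the Penn Treebank verb tags appear as columns in d; adjust it to whichever tags analyseTexts.pl actually produces.

# Assumed Penn Treebank verb tags; not confirmed by the slides
d$verbs <- d$VB + d$VBD + d$VBG + d$VBN + d$VBP + d$VBZ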

□ Graphs can be combined using the "gridExtra" package.
□ Example:

library(gridExtra)
p1 <- ggplot( d, aes( x = rownames(d), y = verbs / tokens, fill = narrator ) ) +
  geom_bar( stat = "identity" )
p2 <- ggplot( d, aes( x = rownames(d), y = adjectives / tokens, fill = narrator ) ) +
  geom_bar( stat = "identity" )
grid.arrange( p1, p2, ncol = 1 )

Occurrences of POS categories

□ Program tdm.pl creates data about the 40 most common words in a corpus
□ It is also possible to supply words yourself in an array
□ It also creates a term-document matrix of the 40 most distinctive words on the basis of the tf-idf formula
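As a reminder of what the weighting does, the sketch below computes a common variant of tf-idf in R from a term-document matrix. It assumes tdm.csv contains purely numeric word counts, one row per text and one column per word (as on the PCA slide below), and it is not necessarily the exact formula implemented in tdm.pl.

tdm <- read.csv( "tdm.csv", header = TRUE )   # assumed: numeric counts only
df  <- colSums( tdm > 0 )                     # number of texts containing each word
idf <- log( nrow(tdm) / df )                  # inverse document frequency
tfidf <- sweep( tdm, 2, idf, `*` )            # weight each count by its word's idf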

Subsetting a dataframe

d[ condition for rows, condition for columns ]

Example:

t <- read.csv( "data.csv" )
i <- t[ rownames(t) == "AtHowliThana" | rownames(t) == "DrayWaraYowDee" | rownames(t) == "Gemini" | rownames(t) == "InFloodTime", ]

colSums() function

Example:

i <- as.data.frame( t( colSums(i) ) )

□ Adds up the values in each column
□ All values must be numeric
□ Creates a list in which all the calculated values are on separate rows
□ The function t() can be used to transpose a dataframe (convert rows into columns or vice versa)
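Combined with the subsetting shown above, this makes it easy to aggregate the counts for the two groups of stories. The sketch below does not appear on the slides; it reuses the "indian" vector of story titles defined earlier and assumes data.csv contains only numeric columns.

t <- read.csv( "data.csv", header = TRUE, row.names = 1 )
# One row of totals per subcorpus
indian_totals  <- as.data.frame( t( colSums( t[ rownames(t) %in% indian, ] ) ) )
british_totals <- as.data.frame( t( colSums( t[ !( rownames(t) %in% indian ), ] ) ) )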

Principal Component Analysis

□ Performed on a "term-document matrix"
□ Frequency counts ought to be normalised:

tdm <- read.csv( "tdm.csv", head = TRUE )
md <- read.csv( "data.csv", head = TRUE )
tdm <- tdm / md$tokens

□ The analysis creates new variables which account for most of the variability in the data set

Principal Component Analysis

pca <- prcomp( tdm, center = TRUE, scale. = TRUE )
summary(pca)
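The PCA diagram shown earlier can be reproduced roughly as follows. This is a sketch, assuming the rows of tdm come in the same order as the stories in d; it is not the exact plotting code from the course.

library(ggplot2)
scores <- as.data.frame( pca$x )   # coordinates of the texts on the new components
scores$narrator <- d$narrator      # assumes the same row order as data.csv
ggplot( scores, aes( x = PC1, y = PC2, colour = narrator ) ) +
  geom_point() +
  geom_text( aes( label = rownames(scores) ), vjust = -0.8 )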

loadings <- pca$rotation

pca$rotation

Collocation: Indian vs. British narrators