1
Introduction to Textual Analysis
Mikal Eckstrom and Gabi Kirilloff Digital Humanities Bootcamp 2016
2
What we are covering
What is textual analysis
How it strengthens the humanities
Its application in the classroom and to your research
Terminology
Various online methods
3
Text Analysis
“It’s not that we no longer read books, but we now have new ways of studying them in their natural habitat.” -Matthew Jockers (2013)
“But it must be recognized that the notion of “probability of a sentence” is an entirely useless one, under any known interpretation of this term.” -Noam Chomsky (1969)
Mikal
4
What is text analysis?
Analyzing text(s) computationally, using new methodologies, in an effort to construct new meaning from an existing written work (or set of works).
Mikal
5
Text as Science
We often have a hypothesis, even as close readers
We have conclusions; even our own worst paper has conclusions
Now, with text analysis or data mining, we, like scientists, have data.
Like scientists, digital humanists also seek to discover new evidence and meaning from texts, whatever the scale of the corpora.
Mikal
6
Mikal
7
Terminology
Sentence: unit of written language
Utterance: unit of spoken language
Word Form: the inflected form as it actually appears in the corpus
Lemma: an abstract form, shared by word forms having the same stem, part of speech, and word sense; stands for the class of words with the same stem
Types: number of distinct words in a corpus (vocabulary size)
Tokens: total number of words
Mikal
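The distinction between types and tokens can be made concrete with a few lines of Python. This is a minimal sketch, not part of the original slides: the sample sentence and the simple lowercase, letters-only tokenizer are assumptions chosen for illustration, and real projects usually need a more careful tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.

    This is a deliberately simple tokenizer (an assumption for the example).
    """
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical sample text, invented for this example.
text = "The cat sat on the mat. The dog sat too."

tokens = tokenize(text)      # every word occurrence
counts = Counter(tokens)     # one entry per distinct word

num_tokens = len(tokens)     # total number of words
num_types = len(counts)      # vocabulary size
type_token_ratio = num_types / num_tokens

print(num_tokens, num_types, round(type_token_ratio, 2))
```

Here "the" contributes three tokens but only one type, which is why the type count is lower than the token count.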
8
What Text Analysis Enables
What you can do:
Categorize and cluster documents
Compare and contrast vocabulary
Examine syntactical relationships
Entity recognition
This can allow you to:
Examine differences based on metadata
Examine voice and style
Create geographic maps and helpful visualizations
Gabi
9
Clustering and Examining Similarity
Context Words
High Frequency Words
Punctuation
Sentence Length
Gabi
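As a rough illustration of how features like these can feed a similarity measure, the sketch below turns a text into a small vector (relative frequency of a few high-frequency words, punctuation rate, mean sentence length) and compares two vectors with cosine similarity. The word list, the feature choices, and the function names are all assumptions made for this example, not the method used by any particular tool.

```python
import math
import re
from collections import Counter

# Hypothetical list of high-frequency words to track (an assumption).
HIGH_FREQUENCY_WORDS = ["the", "and", "of", "to", "a"]


def style_features(text):
    """Return a feature vector: word rates, punctuation rate, mean sentence length."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    n_words = max(len(words), 1)  # avoid division by zero on empty input

    features = [counts[w] / n_words for w in HIGH_FREQUENCY_WORDS]
    features.append(sum(text.count(p) for p in ",;:") / n_words)
    features.append(len(words) / max(len(sentences), 1))
    return features


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


doc_a = "The cat sat. The dog ran."
doc_b = "A storm came, and the river rose to the bridge."
print(cosine_similarity(style_features(doc_a), style_features(doc_b)))
```

In practice the features would usually be rescaled first, since mean sentence length is on a much larger scale than the word rates and would otherwise dominate the comparison.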
10
Exploring Syntactical Relationships
“He quickly ran up the old steps to the castle.” Gabi
11
Word Clouding | Text Analysis
Word clouds: American Indian Male, Jewish Male, Jewish Female
Mikal
12
Data Collection
Getting good data is trickier than you think
Large Corpus
Metadata
Clean text
Where to find data
Hathitrust
Internet Archive
Gutenberg
Women Writers Project
Gabi and Mikal
13
Martha Ballard’s Diary
Mikal
14
Textalyzer Mikal
15
Voyant Gabi
16
WordSeer Gabi
17
Stanford Tools
NER: http://nlp.stanford.edu:8080/ner/
DParse:
Gabi
18
N-Grams Mikal
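An n-gram is a run of n consecutive tokens. A minimal sketch of extracting them from a tokenized sentence might look like the following; the function name `ngrams` and the whitespace tokenization are assumptions for illustration.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Whitespace splitting is a simplification; punctuation is ignored here.
tokens = "he quickly ran up the old steps".split()

bigrams = ngrams(tokens, 2)    # pairs of adjacent words
trigrams = ngrams(tokens, 3)   # triples of adjacent words

print(bigrams[:3])
print(trigrams[:2])
```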
19
Human Word Prediction
Clearly, at least some of us have the ability to predict future words in an utterance. How?
Domain knowledge: red house vs. red hat
Syntactic knowledge: the…<adj|noun>
Lexical knowledge: baked <steak vs. cake>
Mikal
20
Useful Applications for N-Grams
Why do we want to predict a word, given some preceding words?
Rank the likelihood of sequences containing various alternative hypotheses, e.g. for automatic speech recognition (ASR)
Theatre owners say popcorn/unicorn sales have doubled...
Assess the likelihood/goodness of a sentence, e.g. for text generation or machine translation
The doctor recommended a cat scan.
El doctor recomendó una exploración del gato.
Mikal
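The popcorn/unicorn case can be sketched with a tiny bigram model: count how often each word follows the previous one in a corpus, then multiply those relative frequencies along each candidate sentence. The four-sentence corpus below is invented for this example, and real systems train on far larger corpora; treat it as an illustration of the idea rather than a working speech recognizer.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration.
corpus = [
    "theatre owners say popcorn sales have doubled",
    "popcorn sales rose again",
    "owners say ticket sales have doubled",
    "the unicorn is a mythical animal",
]

unigram_counts = Counter()
bigram_counts = Counter()
for line in corpus:
    tokens = ["<s>"] + line.split()  # <s> marks the start of a sentence
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))


def bigram_prob(prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 if prev was never seen."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]


def sentence_prob(sentence):
    """Multiply bigram probabilities along the sentence."""
    tokens = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word)
    return prob


popcorn = sentence_prob("theatre owners say popcorn sales have doubled")
unicorn = sentence_prob("theatre owners say unicorn sales have doubled")
print(popcorn > unicorn)
```

Because "say unicorn" never occurs in the toy corpus, the unicorn hypothesis scores zero and the popcorn hypothesis wins. That hard zero for anything unseen is the weakness smoothing techniques are meant to address.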
21
Coding (and why you might want to consider it)
Custom questions may call for custom methods
Understanding the options available to you can make it easier to envision new research questions
R
Statistical language
Works with plain text and XML
Very easy to create complex visualizations
Python
Gabi and Mikal
22
Limitations and Constraints
“Flattening” data and obscuring information
Corpus selection bias
Imperfect datasets
Gabi
23
Summary
Text analysis can allow us to derive new meaning from text
It can help us visually understand the relationships between various texts, tokens, and data sets.
N-gram probabilities can be used to estimate the likelihood
Of a word occurring in a context (the preceding N-1 words)
Of a sentence occurring at all
Smoothing techniques deal with the problem of unseen words in a corpus
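One of the simplest smoothing techniques is add-one (Laplace) smoothing: add 1 to every bigram count so that unseen word pairs get a small nonzero probability. The slides do not say which smoothing method they have in mind, so this is only one illustrative choice; the two training sentences and the function names are assumptions for the example.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams, with <s> marking each sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def laplace_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one estimate of P(word | prev): (count + 1) / (count(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Hypothetical training sentences, invented for illustration.
sentences = [
    "the doctor recommended a scan",
    "the doctor recommended rest",
]
unigrams, bigrams = train_bigrams(sentences)

# Vocabulary size: observed types plus one slot reserved for unseen words
# (a modelling assumption made for this sketch).
vocab_size = len(unigrams) + 1

seen = laplace_prob("doctor", "recommended", unigrams, bigrams, vocab_size)
unseen = laplace_prob("doctor", "unicorn", unigrams, bigrams, vocab_size)
print(seen, unseen)
```

The seen pair keeps the higher probability, while the unseen pair no longer drives the probability of a whole sentence to zero.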
24
Resources
Stanford Lit Lab Pamphlets:
Ted Underwood: start-with-text-mining/
Lincoln Mullen:
25
Example Exercise Split into groups of 3 or 4 people and take 10 minutes to use Voyant to explore your text. Report to the group at least 1 interesting finding.