Distant Reading with HathiTrust
analytics.hathitrust.org
Presented By: Grant Glass
TABLE OF CONTENTS
• Distant Reading Pro/Cons
• Word Frequency
• Algorithms
• Named Entity Recognizer
• Topic Models
• Reflection Activity
Distant Reading Issues - Positive
• How can computers help us understand traditional reading processes in new ways?
• How can we find new ways of reading through technology?
• How can we use computers to understand complicated categories like emotions and themes?
Distant Reading Issues - Negative
• How does computer-assisted interpretation undermine the very point of reading?
• Do these techniques show us anything new, or are they all fancy ways to describe what we already know?
• How does reading with technology exacerbate racial, social, and economic inequalities?
Algorithms
Making a Collection gathers the corpus of material you want to use in one place. It lets you constrain the algorithms' analysis to particular texts.
• Topic Models
• Named Entity Recognizer
• Word Frequency
Algorithm Questions
Word Frequency
• What words were most commonly occurring in the corpus?
• If there are titles that include Queen Anne, what were the most frequently occurring terms?
Named Entity
• What people or places exist in the corpus?
Topic Models
• What words are closely associated with one another in the corpus?
queensofantiquity.web.unc.edu
What is a Topic Model?
Topic modeling can be described as a method for finding a group of words (i.e., a topic) from a collection of documents that best represents the information in the collection. It can also be thought of as a form of text mining: a way to obtain recurring patterns of words in textual material.
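To make "finding a group of words" concrete, here is a minimal sketch of one common topic-model algorithm, Latent Dirichlet Allocation (LDA), fitted with a toy collapsed Gibbs sampler. It is an illustration only: the slides do not say which algorithm the HathiTrust tools run, and the function names (`fit_lda`, `top_words`), the parameter values, and the four tiny documents are all made up for this example.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents.

    Returns one Counter per topic, mapping each word to the number of its
    tokens currently assigned to that topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic in each document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts within each topic
    topic_total = [0] * num_topics                       # total tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for w, z in zip(doc, assignments[d]):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1

    topics = list(range(num_topics))
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample: favor topics common in this document and
                # topics in which this word is already common.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in topics
                ]
                z = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_words(topic_word, n=3):
    """List the n most frequent words of each topic, skipping zero counts."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Made-up miniature corpus, already tokenized.
docs = [
    "queen throne crown queen empire".split(),
    "crown empire throne queen".split(),
    "ship sea storm sailor ship".split(),
    "sea sailor storm ship".split(),
]
model = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words(model)):
    print(f"topic {k}: {words}")
```

Each printed topic is just a ranked word list; deciding what a topic "means" is left to the reader, which is where close reading comes back in.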
What is it used for?
• Topic modeling provides us with methods to organize, understand, and summarize large collections of textual information.
• We can discover hidden topical patterns that are present across the collection.
• We can annotate documents according to these topics and use these annotations to organize, search, and summarize texts.
• Keywords for journal searching.
InPho Topic Model Explorer
• Create a new topic model for each number of topics specified. The model shows 20 topics, 40 topics, 60 topics, and 80 topics.
• Display a visualization of how topics across models cluster together. This enables a user to see the granularity of the different models and how terms may be grouped together into "larger" topics.
• Creates an interactive visualization.
Named Entity Recognizer
Generate a list of all of the names of people and places, as well as dates, times, percentages, and monetary terms, found in a workset.
Result of job: table of the named entities found in a workset.
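As a rough illustration of what such a table might contain, the sketch below tags a few entity types with regular expressions. This is a deliberately naive, rule-based stand-in: real named entity recognizers generally rely on trained statistical models, and nothing here reflects how the HathiTrust algorithm is actually implemented. The patterns, the labels, the function name `find_entities`, and the sample sentence are all assumptions made up for this example.

```python
import re

# Hypothetical, deliberately simple patterns; they will miss or mislabel many real cases.
PATTERNS = {
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?\s?(?:%|percent\b)"),
    "MONEY": re.compile(r"[$£€]\s?\d+(?:,\d{3})*(?:\.\d+)?"),
    "DATE": re.compile(r"\b(?:1\d{3}|20\d{2})\b"),          # four-digit years only
    "NAME": re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+"),  # two or more capitalized words
}


def find_entities(text):
    """Return (label, matched text) rows in the order they appear in the text."""
    rows = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            rows.append((match.start(), label, match.group()))
    return [(label, found) for _, label, found in sorted(rows)]


# Made-up sample sentence, not taken from any workset.
sample = "In 1702 Queen Anne granted £500 to the theatre, about 12% of its costs."
for label, found in find_entities(sample):
    print(f"{label:8} {found}")
```

Even this toy version shows the kind of error to watch for in a real results table: any run of capitalized words is treated as a name, whether or not it refers to a person or place.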
Word Count
• Identify the tokens (words) that occur most often in a workset and the number of times they occur.
• Create a tag cloud visualization of the most frequently occurring words in a workset, where the size of the word is displayed in proportion to the number of times it occurred.
• Removes stop words as specified by the user.
Result of job: tag cloud showing the most frequently occurring words, and a file with a list of those words and the number of times they occur.
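The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the HathiTrust implementation: the function names (`word_counts`, `tag_cloud_sizes`), the tokenizing rule, the 48-point maximum font size, and the two sample sentences are assumptions made up for this example.

```python
import re
from collections import Counter


def word_counts(texts, stop_words=frozenset()):
    """Count lowercase word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts


def tag_cloud_sizes(counts, top_n=10, max_size=48):
    """Map each of the top_n words to a font size proportional to its count."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    largest = top[0][1]
    return {word: round(max_size * n / largest) for word, n in top}


# Made-up two-text "workset" and a user-specified stop word list.
workset = [
    "The queen spoke to the queen of Egypt.",
    "The queen of Palmyra ruled.",
]
stop_words = {"the", "of", "to"}

counts = word_counts(workset, stop_words)
print(counts.most_common(3))
print(tag_cloud_sizes(counts))
```

Changing the stop word list changes what rises to the top, so the list itself is an interpretive choice worth recording alongside the results.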
Queens of Antiquity: Critical Visualization
• What visualization is the most useful? Why?
• What does the visualization help you understand about the corpus? What does it obscure?
• What research questions can you generate from the visualization? Generate one research question based on your observations of the visualization.
• Answer: How can close reading help answer one of your research questions? And what texts will you use to better contextualize the visualization? Why?
Next time: we discuss your research questions and the direction you will take for your response papers.