Presented By: Grant Glass


Distant Reading with HathiTrust
analytics.hathitrust.org

TABLE OF CONTENTS
• Distant Reading Pro/Cons
• Algorithms: Word Frequency, Named Entity Recognizer, Topic Models
• Reflection Activity

Distant Reading Issues: Positive
• How can computers help us understand traditional reading processes in new ways?
• How can we find new ways of reading through technology?
• How can we use computers to understand complicated categories like emotions and themes?

Distant Reading Issues: Negative
• How does computer-assisted interpretation undermine the very point of reading?
• Do these techniques show us anything new, or are they all fancy ways to describe what we already know?
• How does reading with technology exacerbate racial, social, and economic inequalities?

Algorithms
Making a collection gathers the corpus of material you want to use in one place. It lets you constrain the algorithms' analysis to particular texts.
• Word Frequency
• Named Entity Recognizer
• Topic Models

Algorithm Questions
• Word Frequency: What words occur most commonly in the corpus? If there are titles that include Queen Anne, what were the most frequently occurring terms?
• Named Entity: What people or places exist in the corpus?
• Topic Models: What words are closely associated with one another in the corpus?

queensofantiquity.web.unc.edu

What is a Topic Model?
Topic modeling is a method for finding groups of words (i.e., topics) in a collection of documents that best represent the information in the collection. It can also be thought of as a form of text mining: a way to find recurring patterns of words in textual material.
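To make "recurring patterns of words" concrete, here is a minimal Python sketch. It is not a topic model: it only counts how often two words appear in the same document, which is a much simpler stand-in for the idea that a topic is a group of words that tend to occur together. The sample documents and the function name `cooccurrence` are invented for illustration and have nothing to do with the HathiTrust tools.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(documents, stop_words=frozenset()):
    """Count, for every pair of words, how many documents contain both."""
    pairs = Counter()
    for doc in documents:
        # Unique lowercase words in this document, minus stop words.
        words = set(re.findall(r"[a-z]+", doc.lower())) - stop_words
        # Sorting gives each pair one canonical (alphabetical) order.
        pairs.update(combinations(sorted(words), 2))
    return pairs

# Invented toy "documents"; a real corpus would be full texts.
documents = [
    "queen egypt nile",
    "queen egypt throne",
    "senate rome law",
    "senate rome consul",
]

pairs = cooccurrence(documents)
for (first, second), n in pairs.most_common(2):
    print(first, second, n)
```

In this toy corpus the pairs that recur are "egypt"/"queen" and "rome"/"senate", which hints at two word groups. A real topic model estimates such groups statistically across the whole collection instead of counting pairs.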

What is it used for?
Topic modeling gives us methods to organize, understand, and summarize large collections of textual information.
• We can discover hidden topical patterns that are present across the collection.
• We can annotate documents according to these topics and use the annotations to organize, search, and summarize texts.
• We can use topics as keywords for journal searching.

InPho Topic Model Explorer
• Create a new topic model for each number of topics specified. The model shows 20 topics, 40 topics, 60 topics and 80 topics.
• Display a visualization of how topics across models cluster together. This enables a user to see the granularity of the different models and how terms may be grouped together into "larger" topics.
Creates an interactive visualization.

Named Entity Recognizer
Generate a list of all of the names of people and places, as well as dates, times, percentages, and monetary terms, found in a workset.
Result of job: table of the named entities found in a workset.
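As a rough illustration of what an entity table looks like, the Python sketch below finds a few of these categories with regular expressions. This is only a pattern-matching stand-in: real named entity recognizers rely on trained language models, and nothing here reflects how the HathiTrust algorithm works. The patterns, the labels, and the sample sentence are all assumptions made for the example.

```python
import re

# Assumed, deliberately crude patterns; real recognizers use trained models.
PATTERNS = {
    "DATE": re.compile(r"\b1[0-9]{3}\b|\b20[0-9]{2}\b"),  # four-digit years
    "MONEY": re.compile(r"[$£]\s?\d+(?:,\d{3})*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?\s?(?:%|percent\b)"),
    # Only catches a royal title followed by one capitalized name.
    "PERSON": re.compile(r"\b(?:Queen|King|Empress|Emperor)\s+[A-Z][a-z]+"),
}

def find_entities(text):
    """Return (entity, label) rows in the order they appear in the text."""
    rows = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            rows.append((match.start(), match.group(), label))
    rows.sort()  # order by position in the text
    return [(entity, label) for _, entity, label in rows]

# Invented sample sentence, standing in for a text in a workset.
sample = "In 1702 Queen Anne granted £1,200, about 5 percent of the fund."
table = find_entities(sample)
for entity, label in table:
    print(f"{label:8} {entity}")
```

Patterns like these miss most people and places, which is exactly why a trained recognizer is used on a real workset.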

Word Count
• Identify the tokens (words) that occur most often in a workset and the number of times they occur.
• Create a tag cloud visualization of the most frequently occurring words in a workset, where the size of each word is proportional to the number of times it occurred.
• Remove stop words as specified by the user.
Result of job: tag cloud showing the most frequently occurring words, and a file listing those words and the number of times they occur.
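The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the HathiTrust implementation: the stop-word list, the sample sentences, the function names, and the font-size range are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; in practice the user supplies one.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "was", "her"}

def word_counts(texts, stop_words=STOP_WORDS):
    """Count lowercase word tokens across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

def tag_sizes(counts, top_n=5, min_pt=10, max_pt=40):
    """Map the top_n words to font sizes proportional to their counts."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    biggest = top[0][1]  # count of the most frequent word
    return {word: max(min_pt, round(max_pt * n / biggest)) for word, n in top}

# Invented sample sentences, standing in for the texts in a workset.
workset = [
    "The queen of Egypt ruled the kingdom.",
    "The queen was admired and her kingdom prospered.",
    "A queen of antiquity.",
]

counts = word_counts(workset)
sizes = tag_sizes(counts)
for word, size in sizes.items():
    print(f"{word}: {counts[word]} occurrences, {size}pt")
```

Here "queen" occurs three times and gets the largest size, while "the" never appears because it is on the stop-word list. Changing the stop-word list changes what the tag cloud shows, which is worth keeping in mind when interpreting one.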

Critical Visualization: Queens of Antiquity
• What visualization is the most useful? Why?
• What does the visualization help you understand about the corpus? What does it obscure?
• What research questions can you generate from the visualization? Generate one research question based on your observations of the visualization.
Answer: How can close reading help answer one of your research questions? What texts will you use to better contextualize the visualization? Why?

Next time: we discuss your research questions and the direction you will take for your response papers.