Mess with Text: textual analysis using AntConc and TagAnt


Mess with Text: textual analysis using AntConc and TagAnt
Liorah Golomb, Humanities Librarian, University of Oklahoma
ResBaz OU 2017

Find text to mess with
- Project Gutenberg (public domain, mostly pre-1923)
- tvsubtitles.net
- Any HTML that can be saved as text using Notepad or any text editor
- PDF and Word docs, using a tool such as AntFileConverter
- Epubs, Kindle books, etc., using a tool such as Calibre

The tools
- AntConc: a freeware corpus analysis toolkit for concordancing and text analysis.
- TagAnt: a freeware part-of-speech (POS) tagger built on TreeTagger (developed by Helmut Schmid).
Developer: Laurence Anthony, Waseda University, laurenceanthony.net

The corpora
- Common Sense by Thomas Paine (1776). Project Gutenberg Ebook #147.
- The Federalist Papers by Alexander Hamilton, John Jay, and James Madison (1787-1788). Project Gutenberg Etext #1404.
- The tweets of Donald J. Trump, Jan. 20 to Oct. 3 (7:30 a.m.), 2017. Via Trump Twitter Archive.

Basic AntConc features
- Create a word list from one or multiple text files
- Use concordance to find specific words
- Compare one corpus to another using tool preferences
- Find a term in context
- Find the words near a term using collocates
- Find the words surrounding a term using clusters/n-grams
- Create a keyword list by comparing a corpus to a larger standard corpus
- Can use truncation (*)
- Export results to a text file
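To make the word list, concordance, and truncation features concrete, here is a minimal Python sketch of the same ideas. It is not how AntConc itself is implemented. The function names (`tokenize`, `word_list`, `concordance`) and the simple letters-and-apostrophes token definition are assumptions made for illustration, so counts may differ from what AntConc reports with its own token settings.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes.

    This token definition is an assumption for the sketch; AntConc lets you
    configure its own.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_list(text):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


def concordance(text, pattern, width=3):
    """Return (left context, keyword, right context) for each matching token.

    A * in the pattern works as truncation, so 'govern*' matches
    'government', 'governs', and so on.
    """
    tokens = tokenize(text)
    # Escape the pattern, then turn the escaped * back into "any characters".
    regex = re.compile(re.escape(pattern.lower()).replace(r"\*", ".*"))
    hits = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits
```

Calling `concordance(text, "govern*", width=5)` on the text of a corpus file would list every word beginning with "govern" together with five words of context on each side, which is roughly what the Concordance tool shows on screen.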

Some comparisons
- Thomas Paine used 3817 individual words in Common Sense (word count ~21736). 2001 words were used only once.
  Uses of the word: sad=0, awful=1, bad=4, terrible=0, fake=0
- Hamilton et al. used 8608 individual words in the Federalist Papers (word count ~129423). 2941 words were used only once.
  Uses of the word: sad=0, awful=4, bad=14, terrible=0, fake=0
- Trump used 4797 individual words in his tweets* (word count ~29935). 2632 words were used only once.**
  Uses of the word: sad=15, awful=0, bad=43, terrible=15, fake=114
* Retweets were stripped, but date and time stamps, URLs, and components of URLs were not.
** Includes strings of letters that were part of a URL.
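The figures above (individual words, word count, words used only once, and uses of particular words) can be approximated outside AntConc. The following is a rough Python sketch, assuming a simple letters-and-apostrophes token definition; `corpus_stats` and `watch_words` are hypothetical names, and the numbers it produces will not necessarily match the ones on the slide, which came from AntConc.

```python
import re
from collections import Counter


def corpus_stats(text, watch_words=()):
    """Summarize a corpus: total words, distinct words, and words used once.

    watch_words is a list of words whose individual frequencies are reported,
    such as "sad" or "fake".
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    freq = Counter(tokens)
    return {
        "word_count": len(tokens),                       # every running word
        "individual_words": len(freq),                   # distinct words
        "used_once": sum(1 for n in freq.values() if n == 1),
        "watch": {w: freq[w] for w in watch_words},      # 0 if never used
    }
```

Running this on each corpus with `watch_words=["sad", "awful", "bad", "terrible", "fake"]` gives a side-by-side comparison of the same kind. Note that with unstripped tweets, fragments of URLs would be counted as words here as well.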

TagAnt part of speech tagger
- Creates a new file labelled as tagged and places it in the same folder as the corpus examined
- Sort vertically to place into a spreadsheet
- Column A=word, Column B=tag, Column C=lemma
- Interpret the tags using a site such as the Penn Treebank Project or (my preference) Georgetown’s
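As an alternative to pasting the vertical output into a spreadsheet by hand, the three columns can be read with a short script. This is only a sketch: it assumes each line of the tagged file holds a word, a tag, and a lemma separated by tabs, which should be checked against the actual file TagAnt produces. The function names `read_tagged` and `to_csv` are hypothetical.

```python
import csv
import io


def read_tagged(lines):
    """Parse tab-separated word/tag/lemma lines into a list of dicts.

    Assumes three tab-separated fields per line; anything else is skipped.
    """
    rows = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # blank or malformed line
        word, tag, lemma = parts
        rows.append({"word": word, "tag": tag, "lemma": lemma})
    return rows


def to_csv(rows):
    """Return CSV text with the same columns as the spreadsheet layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "tag", "lemma"])  # Column A, B, C
    for row in rows:
        writer.writerow([row["word"], row["tag"], row["lemma"]])
    return buffer.getvalue()
```

For example, `read_tagged(open("tagged_file.txt", encoding="utf-8"))` (the filename is a placeholder) would return the rows, and the text from `to_csv` can be saved with a .csv extension and opened in any spreadsheet program.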