Published by Hillary Shana Hodge. Modified over 9 years ago.
1
More HTRC
Loretta Auvil, Boris Capitanu
University of Illinois at Urbana-Champaign
lauvil@illinois.edu, capitanu@ncsa.uiuc.edu
2
Outline
HTRC Analysis
– Topic Modeling
– Spell Checking
3
Meandre Flow
Encapsulation and integration environment for tools and algorithms
4
Topic Modeling
5
Topic Modeling Flow
6
Topic Modeling in HTRC
7
Topics for Jane Austen Workset
Some of the topics from Jane Austen
8
Topic Modeling References
http://www.matthewjockers.net/2011/09/29/the-lda-buffet-is-now-open-or-latent-dirichlet-allocation-for-english-majors/
http://dsl.richmond.edu/dispatch/pages/intro
http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/
http://www.ics.uci.edu/~newman/pubs/JASIST_Newman.pdf
https://dhs.stanford.edu/visualization/topic-networks/
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 96–104, Portland, OR, USA, 24 June 2011. © 2011 Association for Computational Linguistics
Matthew Jockers, Macroanalysis: Digital Methods and Literary History, University of Illinois Press, 2013
Termite: Visualization Techniques for Assessing Textual Topic Models, Jason Chuang, Christopher D. Manning, Jeffrey Heer, Advanced Visual Interfaces, 2012
Mallet website: http://mallet.cs.umass.edu
David Mimno’s website: http://www.cs.princeton.edu/~mimno/
9
Spell Checking
10
Spell Check in HTRC
11
Spell Check Report
12
Spell Check Replacement Rules
13
Spellchecking Analysis
Not just OCR error detection but OCR error correction
Can also be used for cleaning other messy data
14
Spell Check Flow
15
Demonstration
HTRC Portal
– Topic Modeling
– Spellcheck
16
Learning Exercises (1)
1. Run Meandre_Topic_Modeling Algorithm
   A. Click on “Algorithms”
   B. Click on “Meandre_Topic_Modeling”
      1. Provide Job Name (required)
      2. Select a Workset (required)
      3. Adjust Additional Parameters (optional)
         a. Provide the number of tokens to be displayed in the tagcloud (default: 200)
         b. Provide the number of topics to be created (default: 10)
      4. Click “Submit” button
   C. Once Job finishes, select Job Name
   D. View Results by clicking on “topic_tagclouds.html”
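To make the two optional parameters in the exercise above more concrete, here is a toy sketch of topic modeling in plain Python. It is not the Meandre flow or Mallet; it is a minimal collapsed Gibbs sampler for LDA, written only for illustration. The function names (`toy_lda`, `top_tokens`), the `alpha` and `beta` values, and the sample documents are all assumptions. `num_topics` plays the role of "number of topics to be created" and `num_tokens` the role of "number of tokens to be displayed in the tagcloud".

```python
import random
from collections import Counter


def toy_lda(docs, num_topics=10, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the HTRC code).

    docs is a list of token lists. Returns one Counter per topic that maps
    each word to the number of times it is currently assigned to that topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_tokens(topic_word, num_tokens=200):
    """Return the most frequent tokens per topic, e.g. to feed a tag cloud."""
    return [
        [w for w, count in counts.most_common(num_tokens) if count > 0]
        for counts in topic_word
    ]


# Hypothetical two-document "workset", already tokenized.
sample_docs = [
    ["love", "marriage", "sister", "love", "sister", "marriage"],
    ["ship", "sea", "captain", "sea", "ship", "captain"],
]
for topic in top_tokens(toy_lda(sample_docs, num_topics=2), num_tokens=3):
    print(topic)
```

Raising `num_topics` splits the vocabulary into more, narrower groups, while `num_tokens` only changes how many words from each topic are shown.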
17
Learning Exercises (2)
2. Run Meandre_Spellcheck_Report_Per_Volume
   A. Click on “Algorithms”
   B. Click on “Meandre_Spellcheck_Report_Per_Volume”
      1. Provide Job Name (required)
      2. Select a Workset (required)
      3. Adjust Additional Parameters (optional)
         a. Provide a text for transformation, e.g. h=li; li=h; rn=m; m=rn; s=f;
         b. Provide a URL that contains the dictionary
         c. Provide a URL for token counts that can be used for choosing the best correctly spelled word based on popularity
      4. Click “Submit” button
   C. Once Job finishes, select Job Name
   D. View Results by clicking on “spellcheck_report.html”, “replacement_rules.txt”, etc.
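The three optional parameters in the exercise above (transformation rules, dictionary, token counts) can be illustrated with a short Python sketch. This is not the HTRC implementation. It assumes that a rule `x=y` means "an occurrence of x in the OCR token may be rewritten as y"; the slides do not state the direction the algorithm actually uses. The function names, the in-memory dictionary, and the counts are hypothetical stand-ins for the files the two URLs would point to.

```python
def parse_rules(text):
    """Parse a rule string such as 'h=li; li=h; rn=m;' into (from, to) pairs."""
    rules = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        left, right = part.split("=", 1)
        rules.append((left.strip(), right.strip()))
    return rules


def candidates(token, rules):
    """Apply each rule once, at every position where it matches."""
    found = set()
    for left, right in rules:
        start = token.find(left)
        while start != -1:
            found.add(token[:start] + right + token[start + len(left):])
            start = token.find(left, start + 1)
    return found


def correct(token, rules, dictionary, counts):
    """Return token unchanged if it is known; otherwise pick the most
    popular dictionary word reachable by one rule application."""
    if token in dictionary:
        return token
    valid = [c for c in candidates(token, rules) if c in dictionary]
    if not valid:
        return token  # nothing better found, leave the token as it is
    # Ties on popularity are broken alphabetically so the result is stable.
    return max(valid, key=lambda w: (counts.get(w, 0), w))


# Hypothetical inputs standing in for the dictionary and token-count files.
rules = parse_rules("h=li; li=h; rn=m; m=rn; s=f;")
dictionary = {"the", "burn", "modern"}
counts = {"the": 1000, "burn": 40, "modern": 75}

print(correct("tlie", rules, dictionary, counts))  # → the
print(correct("bum", rules, dictionary, counts))   # → burn
```

Applying only one rule per token keeps the candidate set small; a token that needs two substitutions would be left unchanged by this sketch.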
18
Attendee Project Plan
Study/Project Title
Team Members and their Affiliation
Procedural Outline of Study/Project
– Research Question/Purpose of Study
– Data Sources
– Analysis Tools
Activity Timeline or Milestones
Report or Project Outcome(s)
Ideas on what your team needs from SEASR staff to help you achieve your goal
Identify Research Question
19
Discussion Questions
What analytical tools or applications do you want to utilize with HT data?