More HTRC Loretta Auvil, Boris Capitanu University of Illinois at Urbana-Champaign
Outline
HTRC Analysis
– Topic Modeling
– Spell Checking
Meandre Flow Encapsulation and integration environment for tools and algorithms
Topic Modeling
Topic Modeling Flow
Topic Modeling in HTRC
Topics for Jane Austen Workset Some of the topics from Jane Austen
Topic Modeling References
latent-dirichlet-allocation-for-english-majors/
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 96–104, Portland, OR, USA, 24 June 2011. © 2011 Association for Computational Linguistics
Matthew Jockers, Macroanalysis: Digital Methods and Literary History, University of Illinois Press, 2013
Jason Chuang, Christopher D. Manning, Jeffrey Heer, Termite: Visualization Techniques for Assessing Textual Topic Models, Advanced Visual Interfaces, 2012
Mallet website:
David Mimno’s website:
Spell Checking
Spell Check in HTRC
Spell Check Report
Spell Check Replacement Rules
Spellchecking Analysis
– Not just OCR error detection but OCR error correction
– Can also be used for cleaning other messy data
Spell Check Flow
Demonstration
HTRC Portal
– Topic Modeling
– Spellcheck
Learning Exercises (1)
1. Run the Meandre_Topic_Modeling algorithm
   A. Click on “Algorithms”
   B. Click on “Meandre_Topic_Modeling”
      1. Provide Job Name (required)
      2. Select a Workset (required)
      3. Adjust Additional Parameters (optional)
         a. Provide the number of tokens to be displayed in the tagcloud (default: 200)
         b. Provide the number of topics to be created (default: 10)
      4. Click “Submit” button
   C. Once Job finishes, select Job Name
   D. View Results by clicking on “topic_tagclouds.html”
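To make the two optional parameters concrete, here is a minimal sketch of what a topic model computes. It is not the Meandre_Topic_Modeling implementation: it is a toy collapsed Gibbs sampler for LDA in plain Python, and the function names, the `alpha`, `beta`, `iterations` and `seed` settings, and the sample documents are all assumptions made for illustration. Only `num_topics` (default 10) and `num_tokens` (default 200) mirror the parameters in the exercise.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=10, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per topic in each doc
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # tokens per topic overall
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Weight each topic by how much the document and the word favor it.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_tokens(topic_word, num_tokens=200):
    """Return the most frequent tokens of each topic, as a tag cloud would show."""
    return [
        [w for w, c in counts.most_common(num_tokens) if c > 0]
        for counts in topic_word
    ]


# Hypothetical mini-workset: each inner list stands in for one volume's tokens.
docs = [
    "ship sea captain sail ship".split(),
    "sea sail harbor captain".split(),
    "ball dance music partner ball".split(),
    "dance partner music evening".split(),
]
topics = top_tokens(fit_lda(docs, num_topics=2, iterations=50), num_tokens=3)
for number, tokens in enumerate(topics):
    print(number, tokens)
```

Raising `num_topics` splits the vocabulary into finer groupings, while `num_tokens` only changes how many words of each topic are displayed.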
Learning Exercises (2)
2. Run the Meandre_Spellcheck_Report_Per_Volume algorithm
   A. Click on “Algorithms”
   B. Click on “Meandre_Spellcheck_Report_Per_Volume”
      1. Provide Job Name (required)
      2. Select a Workset (required)
      3. Adjust Additional Parameters (optional)
         a. Provide the transformation rules as text, e.g. h=li; li=h; rn=m; m=rn; s=f;
         b. Provide a URL that contains the dictionary
         c. Provide a URL for token counts, which are used to choose the best correctly spelled word based on popularity
      4. Click “Submit” button
   C. Once Job finishes, select Job Name
   D. View Results by clicking on “spellcheck_report.html”, “replacement_rules.txt”, etc.
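The way the three optional parameters work together can be sketched as follows. This is an illustrative approximation, not the Meandre flow itself: the function names are hypothetical, the dictionary and token counts are small in-memory stand-ins for the files behind the two URLs, and the rule direction is an assumption (in `a=b`, an occurrence of `b` in an OCR token is treated as a possible misreading of `a`), since the slides do not specify it.

```python
def parse_rules(text):
    """Parse 'a=b; c=d;' into (intended, ocr) pairs (direction is an assumption)."""
    rules = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        intended, ocr = part.split("=", 1)
        rules.append((intended.strip(), ocr.strip()))
    return rules


def candidates(token, rules):
    """Yield every token obtained by undoing one rule at one position."""
    for intended, ocr in rules:
        start = token.find(ocr)
        while start != -1:
            yield token[:start] + intended + token[start + len(ocr):]
            start = token.find(ocr, start + 1)


def correct(token, rules, dictionary, token_counts):
    """Return the most popular dictionary word one substitution away, else the token."""
    if token in dictionary:
        return token
    valid = {c for c in candidates(token, rules) if c in dictionary}
    if not valid:
        return token
    # Prefer the highest token count; break ties alphabetically for determinism.
    return max(valid, key=lambda w: (token_counts.get(w, 0), w))


rules = parse_rules("h=li; li=h; rn=m; m=rn; s=f;")
dictionary = {"the", "such", "turn"}                  # hypothetical dictionary
token_counts = {"the": 1000, "such": 120, "turn": 45}  # hypothetical counts

print([correct(t, rules, dictionary, token_counts) for t in ["tlie", "fuch", "tum"]])
# ['the', 'such', 'turn']
```

The token counts matter only when several candidates are valid dictionary words; in that case the more frequent word wins.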
Attendee Project Plan
Study/Project Title
Team Members and their Affiliation
Procedural Outline of Study/Project
– Research Question/Purpose of Study
– Data Sources
– Analysis Tools
Activity Timeline or Milestones
Report or Project Outcome(s)
Ideas on what your team needs from SEASR staff to help you achieve your goal
Identify Research Question
Discussion Questions What analytical tools or applications do you want to utilize with HT data?