More HTRC Loretta Auvil, Boris Capitanu University of Illinois at Urbana-Champaign


1 More HTRC Loretta Auvil, Boris Capitanu University of Illinois at Urbana-Champaign lauvil@illinois.edu, capitanu@ncsa.uiuc.edu

2 Outline
HTRC Analysis
– Topic Modeling
– Spell Checking

3 Meandre Flow Encapsulation and integration environment for tools and algorithms

4 Topic Modeling

5 Topic Modeling Flow

6 Topic Modeling in HTRC

7 Topics for Jane Austen Workset Some of the topics from Jane Austen

8 Topic Modeling References
http://www.matthewjockers.net/2011/09/29/the-lda-buffet-is-now-open-or-latent-dirichlet-allocation-for-english-majors/
http://dsl.richmond.edu/dispatch/pages/intro
http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/
http://www.ics.uci.edu/~newman/pubs/JASIST_Newman.pdf
https://dhs.stanford.edu/visualization/topic-networks/
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 96–104, Portland, OR, USA, 24 June 2011. © 2011 Association for Computational Linguistics
Matthew Jockers, Macroanalysis: Digital Methods and Literary History, UIUC Press, 2013
Termite: Visualization Techniques for Assessing Textual Topic Models, Jason Chuang, Christopher D. Manning, Jeffrey Heer, Advanced Visual Interfaces, 2012
Mallet website: http://mallet.cs.umass.edu
David Mimno’s website: http://www.cs.princeton.edu/~mimno/

9 Spell Checking

10 Spell Check in HTRC

11 Spell Check Report

12 Spell Check Replacement Rules

13 Spellchecking Analysis
Not just OCR detection but OCR correction
Can also be used for cleaning other messy data

14 Spell Check Flow

15 Demonstration
HTRC Portal
– Topic Modeling
– Spellcheck

16 Learning Exercises (1)
1. Run Meandre_Topic_Modeling Algorithm
   A. Click on “Algorithms”
   B. Click on “Meandre_Topic_Modeling”
      1. Provide Job Name (required)
      2. Select a Workset (required)
      3. Adjust Additional Parameters (optional)
         a. Provide the number of tokens to be displayed in the tagcloud (default: 200)
         b. Provide the number of topics to be created (default: 10)
      4. Click “Submit” button
   C. Once Job finishes, select Job Name
   D. View Results by clicking on “topic_tagclouds.html”
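To make the two optional parameters concrete, here is a minimal sketch of topic modeling as a toy collapsed Gibbs sampler for LDA in plain Python. It is not the HTRC flow: the slides do not say which implementation the flow uses, and the function names, the `alpha`, `beta`, and `iterations` values, and the sample documents are all assumptions made for illustration. Only the defaults of 10 topics and 200 tag-cloud tokens come from the exercise above.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics=10, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns one word Counter per topic.

    docs is a list of token lists. alpha, beta, and iterations are
    illustrative values, not settings taken from the HTRC flow.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

def top_tokens(topic_word, n=200):
    """Most frequent tokens per topic, i.e. what a tag cloud would display."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]

# Hypothetical four-document workset.
SAMPLE_DOCS = [
    "sister marriage fortune ball sister".split(),
    "ship captain sea navy ship".split(),
    "marriage fortune estate sister".split(),
    "sea captain navy voyage".split(),
]
SAMPLE_TOPICS = lda_gibbs(SAMPLE_DOCS, num_topics=2, iterations=50)
SAMPLE_CLOUDS = top_tokens(SAMPLE_TOPICS, n=3)
```

Raising the number of topics splits the vocabulary into finer groups, while the token limit only changes how many words from each topic are shown.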

17 Learning Exercises (2)
2. Run Meandre_Spellcheck_Report_Per_Volume
   A. Click on “Algorithms”
   B. Click on “Meandre_Spellcheck_Report_Per_Volume”
      1. Provide Job Name (required)
      2. Select a Workset (required)
      3. Adjust Additional Parameters (optional)
         a. Provide the transformation rules as text, e.g. h=li; li=h; rn=m; m=rn; s=f;
         b. Provide a URL that contains the dictionary
         c. Provide a URL for token counts, used to choose the best correctly spelled word based on popularity
      4. Click “Submit” button
   C. Once Job finishes, select Job Name
   D. View Results by clicking on “spellcheck_report.html”, “replacement_rules.txt”, etc.
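The three optional inputs (transformation rules, dictionary, token counts) can be combined as in the following minimal sketch. It is an illustration under stated assumptions, not the Meandre component: the function names, the tiny dictionary, and the counts are hypothetical, and because the slide does not say which side of a rule is the misread form, the sketch tries every rule in both directions.

```python
def parse_rules(text):
    """Parse a rule string such as 'h=li; li=h; rn=m;' into (left, right) pairs."""
    rules = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        left, right = part.split("=", 1)
        rules.append((left.strip(), right.strip()))
    return rules

def candidates(token, rules):
    """Yield each string made by applying one rule at one position.

    Assumption: the rule direction is unspecified in the slides, so each
    pair is applied both ways (left -> right and right -> left).
    """
    for left, right in rules:
        for src, dst in ((left, right), (right, left)):
            start = token.find(src)
            while start != -1:
                yield token[:start] + dst + token[start + len(src):]
                start = token.find(src, start + 1)

def correct(token, rules, dictionary, counts):
    """Return the most popular dictionary word reachable by one rule, else the token."""
    if token in dictionary:
        return token  # already spelled correctly
    valid = {c for c in candidates(token, rules) if c in dictionary}
    if not valid:
        return token  # detected as unknown, but no correction available
    # sorted() makes ties deterministic; counts decide between valid candidates.
    return max(sorted(valid), key=lambda w: counts.get(w, 0))

# Rule string from the exercise; dictionary and counts are made-up sample data.
RULES = parse_rules("h=li; li=h; rn=m; m=rn; s=f;")
DICTIONARY = {"the", "some", "modern", "him", "fist", "sift"}
COUNTS = {"the": 5000, "some": 900, "modern": 300, "him": 700, "fist": 120, "sift": 15}

CORRECTED = [correct(t, RULES, DICTIONARY, COUNTS)
             for t in ["tlie", "fome", "rnodern", "sist", "him", "xyzzy"]]
```

In this sample, "sist" has two dictionary candidates ("fist" and "sift"), and the token counts break the tie in favor of the more frequent word.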

18 Attendee Project Plan
Study/Project Title
Team Members and their Affiliation
Procedural Outline of Study/Project
– Research Question/Purpose of Study
– Data Sources
– Analysis Tools
Activity Timeline or Milestones
Report or Project Outcome(s)
Ideas on what your team needs from SEASR staff to help you achieve your goal.
Identify Research Question

19 Discussion Questions What analytical tools or applications do you want to utilize with HT data?

