More HTRC
Loretta Auvil, Boris Capitanu
University of Illinois at Urbana-Champaign


Outline
HTRC Analysis
- Topic Modeling
- Spell Checking

Meandre Flow
Encapsulation and integration environment for tools and algorithms

Topic Modeling

Topic Modeling Flow

Topic Modeling in HTRC

Topics for Jane Austen Workset
Some of the topics from Jane Austen

Topic Modeling References
- latent-dirichlet-allocation-for-english-majors/
- Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 96–104, Portland, OR, USA, 24 June 2011. © 2011 Association for Computational Linguistics
- Matthew Jockers, Macroanalysis: Digital Methods and Literary History, UIUC Press, 2013
- Jason Chuang, Christopher D. Manning, Jeffrey Heer, Termite: Visualization Techniques for Assessing Textual Topic Models, Advanced Visual Interfaces, 2012
- Mallet website:
- David Mimno’s website:

Spell Checking

Spell Check in HTRC

Spell Check Report

Spell Check Replacement Rules

Spellchecking Analysis
- Not just detection of OCR errors but correction of them
- Can also be used for cleaning other messy data

Spell Check Flow

Demonstration
HTRC Portal
- Topic Modeling
- Spellcheck

Learning Exercises (1)
1. Run the Meandre_Topic_Modeling algorithm
   A. Click on “Algorithms”
   B. Click on “Meandre_Topic_Modeling”
      1. Provide Job Name (required)
      2. Select a Workset (required)
      3. Adjust Additional Parameters (optional)
         a. Provide the number of tokens to be displayed in the tag cloud (default: 200)
         b. Provide the number of topics to be created (default: 10)
      4. Click “Submit” button
   C. Once Job finishes, select Job Name
   D. View Results by clicking on “topic_tagclouds.html”
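To make the two optional parameters in the exercise above concrete, here is a toy topic model in plain Python. It is only a sketch under stated assumptions: it uses a collapsed Gibbs sampler for LDA, and it is not the code behind Meandre_Topic_Modeling. The function names (lda_gibbs, top_tokens), the alpha and beta values, and the sample documents are all hypothetical; only the defaults of 10 topics and 200 tag cloud tokens come from the exercise.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=10, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical helper).

    docs is a list of token lists. Returns one Counter per topic that maps
    each word to the number of tokens currently assigned to that topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # all tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_tokens(topic_word, num_tokens=200):
    """Return up to num_tokens of the most frequent words per topic."""
    return [
        [w for w, c in counts.most_common(num_tokens) if c > 0]
        for counts in topic_word
    ]


# Made-up sample input: three tiny "volumes", already tokenized.
sample_docs = [
    ["love", "marriage", "sister", "love"],
    ["ship", "sea", "captain", "sea"],
    ["sister", "letter", "love"],
]
for topic in top_tokens(lda_gibbs(sample_docs, num_topics=2, iterations=50), num_tokens=3):
    print(topic)
```

Raising the number of topics splits the vocabulary into finer groups, while the number of tokens only changes how many words from each topic would be shown.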

Learning Exercises (2)
2. Run Meandre_Spellcheck_Report_Per_Volume
   A. Click on “Algorithms”
   B. Click on “Meandre_Spellcheck_Report_Per_Volume”
      1. Provide Job Name (required)
      2. Select a Workset (required)
      3. Adjust Additional Parameters (optional)
         a. Provide a text for transformation, e.g. h=li; li=h; rn=m; m=rn; s=f;
         b. Provide a URL that contains the dictionary
         c. Provide a URL for token counts that can be used for choosing the best correctly spelled word based on popularity
      4. Click “Submit” button
   C. Once Job finishes, select Job Name
   D. View Results by clicking on “spellcheck_report.html”, “replacement_rules.txt”, etc.
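The three optional inputs in the exercise above (transformation rules, a dictionary, and token counts) can be combined roughly as in the sketch below. It is an illustration only, not the Meandre flow's code. The function names (parse_rules, candidates, correct) are hypothetical, the dictionary and counts are passed in as plain Python objects rather than fetched from URLs, and reading a rule x=y as "replace x in the OCR text with y" is an assumption, since the slide does not state the direction.

```python
def parse_rules(text):
    """Parse a rule string such as 'h=li; li=h; rn=m;' into (source, target) pairs."""
    rules = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        src, _, dst = part.partition("=")
        src, dst = src.strip(), dst.strip()
        if src:  # ignore malformed rules with an empty left-hand side
            rules.append((src, dst))
    return rules


def candidates(token, rules):
    """Yield each string produced by applying one rule at one position in token."""
    for src, dst in rules:
        start = token.find(src)
        while start != -1:
            yield token[:start] + dst + token[start + len(src):]
            start = token.find(src, start + 1)


def correct(token, rules, dictionary, counts):
    """Return the most popular dictionary word reachable by one rule, else token.

    Assumption: a token already in the dictionary is left alone, and ties in
    popularity are broken alphabetically so the result is deterministic.
    """
    if token in dictionary:
        return token
    valid = {c for c in candidates(token, rules) if c in dictionary}
    if not valid:
        return token
    return max(valid, key=lambda w: (counts.get(w, 0), w))


# Made-up sample data for illustration.
sample_rules = parse_rules("rn=m; m=rn; h=li; li=h")
sample_dictionary = {"turn", "arm", "light"}
sample_counts = {"turn": 120, "arm": 80, "light": 300}
for word in ["tum", "arrn", "hght"]:
    print(word, correct(word, sample_rules, sample_dictionary, sample_counts))
```

A token that is not in the dictionary and has no valid candidate is returned unchanged, so in this sketch the same pass both flags misspellings and corrects them where a rule applies.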

Attendee Project Plan
- Study/Project Title
- Team Members and their Affiliation
- Procedural Outline of Study/Project
  - Research Question/Purpose of Study
  - Data Sources
  - Analysis Tools
- Activity Timeline or Milestones
- Report or Project Outcome(s)
- Ideas on what your team needs from SEASR staff to help you achieve your goal
Identify Research Question

Discussion Questions
- What analytical tools or applications do you want to utilize with HT data?