Text Mining with JMP Pro 13: A Case Study
Mia Stephens, mia.stephens@jmp.com
Who is familiar with JMP?
JMP 12 release
Who am I?
Copyright © 2010, SAS Institute Inc. All rights reserved.
What is JMP (and JMP Pro)?
  Statistical Discovery Software from SAS
  Developed in 1989
  For teaching and doing
  Comprehensive
    Basic
    Advanced
  Extendible
    powerful scripting language
    application and add-in builders
    Excel, R, MATLAB, and SAS
  Visual, dynamic and interactive
  JMP Pro: Advanced tools for analytics and modeling
  Runs natively
Notes: By its very nature, JMP is visual, interactive and dynamic, which makes it an ideal tool for teaching and learning statistical concepts. JMP has a programming language, which is not required to use JMP but allows users to extend what JMP can do.
Agenda
  Introduction to Text Mining
  General Workflow
  Simple Example: Pet Survey
  Case Study: Toronto Casino Survey
Reference: Karl, Wisnowski, and Rushing (2015), A practical guide to text mining with topic extraction. WIREs Comput Stat 2015, 7:326–340. doi: 10.1002/wics.1361
Text Mining: General Workflow
  Define the problem
  Collect/compile the data (unstructured text, and other relevant information)
  Process/prepare the text
    Correct spelling/capitalization (Recode)
    Combine words with same root (Stemming)
    Remove common words, symbols, numbers,… (Stopwords)
  Transform text (Document Term Matrix)
  Explore clusters and topics to identify themes
    Group similar documents and words
  *Create new variables for predictive modeling
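The "Process/prepare the text" steps can be sketched outside JMP in a few lines of Python. This is only an illustration of the ideas, not how Text Explorer implements them: the recode map, the stopword list, and the suffix-stripping `crude_stem` helper are all assumptions made up for this example, and a real stemmer handles far more cases.

```python
import re

# Hypothetical recode map and stopword list, chosen only for illustration.
RECODE = {"lowe's": "lowes"}
STOPWORDS = {"the", "at", "are", "in", "is", "a"}


def crude_stem(token):
    """Strip a couple of common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def prepare(text):
    """Lowercase, drop symbols and numbers, recode, remove stopwords, then stem."""
    # The pattern keeps only letters and apostrophes, so digits and symbols vanish.
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = [RECODE.get(t, t) for t in tokens]          # Recode
    tokens = [t for t in tokens if t not in STOPWORDS]   # Stopwords
    return [crude_stem(t) for t in tokens]               # Stemming


print(prepare("The prices at Lowe's are amazing"))
print(prepare("Finding help in Lowes is a problem!"))
```

Note that the crude stemmer also trims the store name, which is one reason text mining tools let the analyst review and override the recode, stopword, and stemming choices.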
What is a Document Term Matrix?
  Take the following “documents”
    The prices at Lowes are amazing
    Finding help in Lowes is a problem
  Find the number of unique terms
  Create indicator variables for each term
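The three steps on this slide can be reproduced with a short Python sketch using the same two documents. It assumes simple whitespace tokenization, lowercasing, and no stopword removal or stemming, and it builds binary indicators as the slide describes; the function name `document_term_matrix` is made up for this example, and software such as JMP may offer counts or other weightings instead.

```python
def document_term_matrix(documents):
    """Return (terms, matrix), where matrix[i][j] is 1 if term j occurs in document i."""
    # Tokenize by whitespace after lowercasing (an assumption of this sketch).
    tokenized = [doc.lower().split() for doc in documents]
    # The unique terms across all documents become the columns.
    terms = sorted({token for tokens in tokenized for token in tokens})
    # One row per document, one 0/1 indicator per term.
    matrix = [[1 if term in tokens else 0 for term in terms] for tokens in tokenized]
    return terms, matrix


docs = [
    "The prices at Lowes are amazing",
    "Finding help in Lowes is a problem",
]
terms, dtm = document_term_matrix(docs)

print(len(terms), "unique terms")
print(terms)
for row in dtm:
    print(row)
```

With these assumptions the two documents share only the term "lowes", so there are 12 unique terms and therefore 12 indicator columns.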
Simple Example: Pet Survey
Case Study: Toronto Casino Survey
Case Study: Toronto Casino Survey
  Data exploration
    Are respondents generally in favor of or opposed to the casino?
    Does the response depend on gender or age?
  Text Explorer
    Term and Phrase Lists: What are the most frequently used terms and phrases? In what context are the terms and phrases used?
    Word Cloud: Do those in favor use different words/phrases than those opposed?
    Clustering: Which terms tend to appear together? Can similar documents be grouped together?
    Topic Analysis: What are the recurring themes?
  Predictive Modeling
    Can the topics or terms be used to predict whether a respondent was in favor of or opposed to the casino?
JMP Text Explorer Resources
  Info Kit
    Includes a webinar and book chapter on text mining
  Short guides (in the JMP Learning Library)
    jmp.com/learn
  Videos:
    http://www.jmp.com/en_us/events/ondemand/mastering-jmp/text-explorer.html
    http://www.jmp.com/en_us/events/ondemand/technically-speaking/tackling-unstructured-data-with-text-exploration.html
  General overview discussion of Text Explorer (Heath Rushing) with a simple example:
    http://www.jmp.com/en_us/events/ondemand/analytically-speaking/analytically-speaking-heath-rushing.html
Discussion and Q&A
Mia.stephens@jmp.com
jmp.com/academic