Workshop: JMP & R for Analytics Instruction
Stephen Hill & Barry Wray
JMP www.jmp.com
Descriptive Analytics
Analyze/Distribution
Continuous – box plots/outliers; scale for the X axis; selecting a cell or many cells (by selecting a “portion” of a cell, the user can learn something about how the data are dispersed within a range)
Ordinal – be sure to use “Value Ordering”
Nominal – detect data cleaning issues in labeling
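For students moving between the two tools, the three Distribution checks above have rough base-R counterparts. The sketch below is an illustration under stated assumptions, not a reproduction of JMP's platform: the data vectors are made up, `boxplot.stats()` uses Tukey's 1.5 × IQR rule on the hinges, and explicit factor levels play a role similar in spirit to JMP's Value Ordering.

```r
# Sketch: base-R analogues of the Distribution-platform checks (made-up data).

# Continuous: box-plot statistics flag points beyond 1.5 * IQR from the hinges
income <- c(42, 45, 47, 50, 51, 53, 55, 58, 60, 140)
bp <- boxplot.stats(income)
print(bp$stats)   # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$out)     # points drawn individually as outliers

# Ordinal: give the level order explicitly; otherwise R sorts levels alphabetically
satisfaction <- factor(c("Low", "High", "Medium", "Low", "High"),
                       levels = c("Low", "Medium", "High"), ordered = TRUE)
print(table(satisfaction))

# Nominal: a frequency table exposes inconsistent labels that need cleaning
state <- c("Ohio", "ohio", "OH", "Ohio", "Texas", "Texas ")
print(table(state))   # five "different" labels for two states

# One possible cleanup: trim stray spaces, then map case/abbreviation variants
state_clean <- trimws(state)
state_clean[tolower(state_clean) %in% c("ohio", "oh")] <- "Ohio"
print(table(state_clean))
```

Here `bp$out` is `140`, and the nominal table shrinks from five labels to two once the variants are merged, which is the same kind of labeling problem the Nominal view is meant to surface.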
Descriptive Analytics Graph Builder – Freedom to choose and be creative
Descriptive Analytics Scatter Plot 3D – See patterns
Predictive Analytics
Ease of creating (stratified) validation sets
Regression (Fit Model): Standard Least Squares (SLS) and Generalized Linear Model (GLM), Stepwise, Nominal Logistic
Ease of adding cross products and factorial designs
Artificial Neural Networks – flexibility of the hidden layer (number of nodes; activation method: Sigmoid, Tangent, Linear, Gaussian); penalty method
Decision Trees – Recursive Partitioning (classification/regression); K-fold validation
Bootstrap Forest, Boosted Tree, K Nearest Neighbors, Naïve Bayes
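JMP builds (stratified) validation columns and K-fold assignments through menus; in R, students have to see the mechanics. A minimal base-R sketch, assuming a made-up data frame `churn` with an imbalanced 20/80 outcome, a 30% validation share, and k = 5 (all arbitrary illustrative choices):

```r
# Sketch: stratified train/validation split and K-fold labels in base R.
set.seed(42)
churn <- data.frame(
  tenure  = round(runif(100, 1, 60)),                      # hypothetical predictor
  churned = factor(rep(c("Yes", "No"), times = c(20, 80)))  # imbalanced outcome
)

# Stratified split: sample 30% of the row indices *within each class*
# so the training and validation sets keep the same 20/80 mix
valid_idx <- unlist(lapply(split(seq_len(nrow(churn)), churn$churned),
                           function(idx) sample(idx, round(0.3 * length(idx)))))
train <- churn[-valid_idx, ]
valid <- churn[valid_idx, ]
print(prop.table(table(train$churned)))
print(prop.table(table(valid$churned)))

# K-fold labels: every row gets a fold 1..k; in round j, fold j is held out
k <- 5
churn$fold <- sample(rep(seq_len(k), length.out = nrow(churn)))
print(table(churn$fold))
```

Sampling within each class is what makes the split stratified: a plain `sample()` over all 100 rows could, by chance, leave the validation set with very few "Yes" cases. The fold labels here are not stratified; the same per-class idea applies if that is needed.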
Data Cleaning
Columns – Recode; Make indicator columns; Combine Columns; Explore Missing Values (Multivariate Normal Imputation); Explore Outliers
Rows – Exclude
Tables – Concatenate, Transpose, Vlookup-like, Split, Stack, Join
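Several of these JMP column and table operations map onto short base-R idioms. The sketch below is one possible translation, not the workshop's own code: the `orders` and `managers` tables are made up, and the tidyverse functions named in the comments (`tidyr::pivot_longer`, `dplyr::left_join`) are the usual equivalents rather than what is used here.

```r
# Sketch: base-R versions of several Cols/Tables operations (made-up data).
orders <- data.frame(
  id     = 1:4,
  region = c("E", "W", "E", "N"),
  q1     = c(10, NA, 7, 12),
  q2     = c(11, 9, NA, 14),
  stringsAsFactors = FALSE
)

# Recode: map short codes to readable labels via a named lookup vector
orders$region <- unname(c(E = "East", W = "West", N = "North")[orders$region])

# Make indicator columns: one 0/1 column per region
for (r in sort(unique(orders$region))) {
  orders[[paste0("is_", r)]] <- as.integer(orders$region == r)
}

# Combine columns: paste two columns into one key
orders$key <- paste(orders$id, orders$region, sep = "-")

# Explore missing values: count NAs per column
print(colSums(is.na(orders)))

# Stack: wide q1/q2 columns become long (quarter, sales) rows (cf. tidyr::pivot_longer)
long <- reshape(orders[, c("id", "q1", "q2")], direction = "long",
                varying = c("q1", "q2"), v.names = "sales",
                timevar = "quarter", times = c("q1", "q2"), idvar = "id")
print(long)

# Join: look up each row's manager by region, keeping unmatched rows (cf. dplyr::left_join)
managers <- data.frame(region = c("East", "West"), manager = c("Ana", "Raj"),
                       stringsAsFactors = FALSE)
joined <- merge(orders, managers, by = "region", all.x = TRUE)
print(joined[, c("id", "region", "manager")])
```

`all.x = TRUE` is what makes the join behave like a lookup: the "North" order is kept with a missing manager instead of being dropped.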
R & RStudio www.r-project.org www.rstudio.com
Context
Undergraduate “Big Data” Analytics course
Required for the BAN concentration; elective for others
Only prerequisite: Business Statistics
Topic coverage: data preparation, descriptive analytics, visualization, with a focus on predictive analytics
Unofficial Textbook: R for Data Science (Wickham & Grolemund), free (legally!) at r4ds.had.co.nz
The Tidyverse
Workflow
Working directory, containing:
R Project (.Rproj)
R Markdown document (.Rmd)
Data (.csv, etc.)
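Keeping the .Rproj, the .Rmd, and the data in one folder means code can use short relative paths. When a project is opened in RStudio, the working directory is set to the project folder, and when a document is knitted, its chunks run in the folder that contains the .Rmd by default. The sketch below imitates that setup with a temporary folder; the folder name, `sales.csv`, and its contents are hypothetical.

```r
# Sketch: relative paths resolve against the working directory.
project_dir <- file.path(tempdir(), "demo_project")   # stand-in for a project folder
dir.create(project_dir, showWarnings = FALSE)

old_wd <- setwd(project_dir)   # roughly what opening the .Rproj does for you

# A hypothetical data file saved alongside the .Rmd
write.csv(data.frame(region = c("East", "West"), sales = c(120, 95)),
          "sales.csv", row.names = FALSE)

sales <- read.csv("sales.csv")   # no full path needed: found in the working directory
print(getwd())
print(sales)

setwd(old_wd)   # restore the previous working directory
```

The design point for students: absolute paths such as `C:/Users/...` break as soon as a project is moved to another machine, while relative paths inside a project folder keep working.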
R Markdown Code (shown in Markdown Pad)
Detail View of R Code Chunk
Knitted HTML Link
Lessons Learned
Don’t underestimate the learning curve
Use frequent assignments and feedback
Plenty of online resources are available
Be prepared to evolve