Data Analytics at CNU
  Dmitriy Shaltayev
  Associate Professor of Management
  Christopher Newport University

Introduction to Data Analytics (BUSN 305)
  Undergraduate program only at the moment
  Four majors: ACCT, FINC, MGMT, MKTG
  Business core class (required for all majors except ACCT)
  Replaced the general Management Information Systems class in fall 2017
  Class received favorable evaluations after its first semester
  Material is still being tweaked

Software Used
  Tried a combination of JMP and R
    JMP was selected for an introductory class
  Tableau is used for visualizations
  Continuation of JMP from the business statistics class
  Course is meant to be a “story telling with data” class
  Focus is on both informing students of popular data analytics techniques and the value of communicating results effectively

BUSN 305 Course Content
  What is “big data”?
    After this introduction, the term “big data” is never used; the class works with “small data”
  Descriptive/diagnostic/predictive/prescriptive analytics
    Focus is on the first three
  Visualization:
    Use JMP
    Focus is on Tableau
    Types of charts, with focus on line, bar, bubble, boxplot, map (bubble and filled), treemap, tables/highlight tables/heat maps
    Why are pie charts bad?

BUSN 305 Course Content
  Pre-attentive attributes
  Use of color in visualizations, color blindness
  Invited speaker to discuss effective oral presentation delivery techniques
  Supervised methods:
    Classification/regression trees
    Multiple regression
    Logistic regression
    Neural networks
  Unsupervised methods:
    Clustering (k-means, hierarchical)

Learning by Doing
  Four group projects (groups of 3)
  Project 1 – build charts with Tableau (line, bar, map, treemap), create a fully interactive dashboard
  Project 2 – use a classification tree to predict delayed flights from IAD/DCA/BWI to JFK/LGA/EWR
    Students are introduced to the idea of binning a continuous variable
  Project 3 – predict movie box office based on budget, genre, cast composition
    Choose three movies and predict their box office
  Project 4 – choose a data set, create a presentation, deliver it to the class (10-15 minutes)
    Graded by both peers and instructor
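
The binning idea mentioned under Project 2 can be illustrated with a minimal Python sketch. The bin edges (15 and 60 minutes) and the labels are assumptions chosen for illustration only; the slides do not say how delays are binned in class, and the class itself works in JMP rather than Python.

```python
from bisect import bisect_right

# Hypothetical bin edges (minutes of departure delay) and labels.
# These cut points are an assumption, not taken from the course.
EDGES = [15, 60]
LABELS = ["on time", "minor delay", "major delay"]


def bin_delay(minutes):
    """Map a continuous delay in minutes to a categorical label."""
    # bisect_right counts how many edges are <= minutes,
    # which is exactly the index of the bin the value falls into.
    return LABELS[bisect_right(EDGES, minutes)]


delays = [-3, 0, 14.9, 15, 42, 60, 185]
binned = [bin_delay(d) for d in delays]
print(binned)
```

With this convention a value equal to an edge falls into the upper bin, so a delay of exactly 15 minutes is labeled "minor delay". A classification tree can then be trained on the label instead of the raw minutes.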

Learning by Doing
  Each project should follow the structure:
    What is the issue at hand and why is it important?
    Describe the data that were given and any data manipulations (binning)
    Any preliminary insights from data visualization
    Explain the choice of modeling tool
    Discuss model properties
      Which variables are important/significant?
      How accurate is the model?
    Examples of model applications

Advanced Data Analytics (MGMT 495)
  Issues relevant to data analytics
    How data are stored (database design principles)
    How data are retrieved (SQL “by hand”)
    What are the most popular data formats (.xlsx, .csv, .json, .xml)
    Data quality issues
      Outliers
      Missing values and imputation
      Skewed distributions
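
Two of the data quality issues listed above, missing values and outliers, can be sketched in a few lines of standard-library Python. The sample values, the choice of median imputation, and the 1.5 × IQR rule are all assumptions made for illustration; the slides do not state which techniques the course covers.

```python
import statistics

# Hypothetical sample: None marks a missing value, 250 is an obvious outlier.
values = [12, 15, None, 14, 13, 250, None, 16, 11]

# Median imputation: the median is less sensitive to outliers and skew
# than the mean, so it is a common simple default.
observed = [v for v in values if v is not None]
median = statistics.median(observed)
imputed = [median if v is None else v for v in values]

# IQR rule: flag points more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < low or v > high]

print("imputed:", imputed)
print("outliers:", outliers)
```

Whether a flagged point should be removed, capped, or kept is a judgment call that depends on the data; the rule only identifies candidates.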

Advanced Data Analytics (MGMT 495)
  Introduction to Python programming
    Variables
    Built-in functions
    If-else
    For loops and while loops
    User-defined functions
    Importing libraries
    Most popular data analysis and visualization libraries: pandas, seaborn, matplotlib
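
The Python topics listed above fit into one short script. This is only an illustrative sketch: the quiz scores, the passing mark, and the `grade` function are hypothetical and are not course material.

```python
import statistics  # importing a library (here, a standard-library module)

# Variables (hypothetical quiz scores, for illustration only)
scores = [72, 88, 95, 61, 79]
passing_mark = 70


# User-defined function built around if-else
def grade(score):
    if score >= passing_mark:
        return "pass"
    else:
        return "fail"


# For loop: grade every score
results = []
for s in scores:
    results.append(grade(s))

# While loop: count how many scores pass before the first failure
count = 0
while count < len(scores) and grade(scores[count]) == "pass":
    count += 1

# Built-in functions and a library function
print(len(scores), max(scores), sum(scores))
print(statistics.mean(scores))
print(results, count)
```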

Advanced Data Analytics (MGMT 495)
  Advanced modeling techniques
    Random forest
    A/B testing
    Association rules
    Text mining
    Feature engineering
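
As one concrete instance of A/B testing, the sketch below compares the conversion rates of two variants with a pooled two-proportion z-test. The test choice, the function name `two_proportion_z_test`, and the visitor and conversion counts are assumptions for illustration; the slides do not specify how A/B testing is taught.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200 of 2000 visitors convert on A, 260 of 2000 on B.
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone. The normal approximation used here is reasonable only when each group has a fairly large number of conversions and non-conversions.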

Where Is It Going?
  Plans are to split the current management concentration and create “HR/Leadership” and “Decision Analytics” concentrations
  While the introductory class is required for everyone, advanced analytics will be in the “Decision Analytics” stream
  Trying to develop partnerships with companies
    Sentara Healthcare
    Booz Allen Hamilton
    Ferguson Enterprises