What Would Constitute the Data Component in a Business Analytics Course? David Stephan, Baruch College, New York City (prev.); Kathy Szabat, La Salle University, Philadelphia. DSI 2016 DASI Session, “Laying the Right Foundation for Analytics by Properly Framing the Business Problem and Getting Usable Data,” Sunday, November 20, 2016, 8:30-10 AM, as presented on that date, with special formatting removed.
Data in an introductory business statistics course vs. a business analytics course: comparison to instant coffee or a teabag vs. an espresso machine.
What Would Constitute the Data Component in a Business Analytics Course? Operating assumptions for such a course: business analytics with an emphasis on management decision-making (not data science); students entering the course have varied backgrounds.
Why does a Business Analytics Student Need to Know About Data? Choosing and preparing data may be difficult because the data may not be well known or may not have been collected under the direct oversight of those analyzing it. The data to be analyzed may be complex. The data may need to be manipulated in ways not done when using basic statistical methods.
What does a Business Analytics Student Need to Know About Data? Concepts and skills from business statistics; concepts and skills from information systems. Not considered today: practical skills. Related: must know problem formulation basics.
What does a Business Analytics Student Need to Know About Data? Problem formulation basics (from business statistics or elsewhere): state the problem/opportunity; specify the business objective(s); specify the business questions; specify the business analytics questions (fine-tune business questions as necessary).
What does a Business Analytics Student Need to Know About Data? Concepts from (introductory) business statistics such as: importance of operational definitions; data cleaning; outliers, missing values, and inconsistent values (categories); recoding variables; data encoding and type.
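The cleaning and recoding concepts listed above can be sketched in a few lines. This is a minimal illustration, not a recommended procedure: the rows, the column names, the "10 times the median" outlier rule, and the 130 cut-off for recoding are all invented assumptions.

```python
from statistics import median

# Hypothetical raw rows; every name and value here is invented for illustration.
rows = [
    {"region": "East", "sales": "120"},
    {"region": "east ", "sales": "135"},   # inconsistent category spelling
    {"region": "WEST", "sales": ""},       # missing value
    {"region": "West", "sales": "9999"},   # suspiciously large value
    {"region": "West", "sales": "128"},
]

def clean_region(raw):
    """Collapse inconsistent spellings of a category into one label."""
    return raw.strip().title()

def parse_sales(raw):
    """Convert text to a number; an empty string becomes None (missing)."""
    return float(raw) if raw.strip() else None

cleaned = [
    {"region": clean_region(r["region"]), "sales": parse_sales(r["sales"])}
    for r in rows
]

# Flag possible outliers rather than silently deleting them.
# The "more than 10 times the median" rule is an arbitrary assumption.
known = [r["sales"] for r in cleaned if r["sales"] is not None]
typical = median(known)
for r in cleaned:
    r["possible_outlier"] = r["sales"] is not None and r["sales"] > 10 * typical

def recode(sales):
    """Recode a numeric variable into categories (cut-off is an assumption)."""
    if sales is None:
        return "unknown"
    return "high" if sales >= 130 else "low"

for r in cleaned:
    r["sales_level"] = recode(r["sales"])
```

Whether a flagged value is an error or a genuine large sale is a business question that the operational definition of "sales" has to answer; the code only surfaces the row.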
The Parable of the ASCII Table
DEC  HEX  BIN       Symbol  Description
0    00   00000000  NUL     Null char
1    01   00000001  SOH     Start of Heading
2    02   00000010  STX     Start of Text
3    03   00000011  ETX     End of Text
4    04   00000100  EOT     End of Transmission
5    05   00000101  ENQ     Enquiry
6    06   00000110  ACK     Acknowledgment
7    07   00000111  BEL     Bell
8    08   00001000  BS      Back Space
9    09   00001001  HT      Horizontal Tab
10   0A   00001010  LF      Line Feed
Lessons from the Parable. Minimize technical descriptions of how things work (they change over time!), for example Hadoop, NoSQL, MapReduce. Emphasize descriptions of how to apply things to make them work for you. Emphasize the conceptual and use non-technical examples.
What does a Business Analytics Student Need to Know About Data? Concepts and skills from business statistics; concepts and skills from information systems. Not considered today: practical skills. Related: must know problem formulation basics.
Cross-discipline Problems One Example: What is a “data model?”
Some Data Models
“Real” Data Models: Some may be more complete than others
What does a Business Analytics Student Need to Know About Data? Concepts from the information systems curriculum? Require/borrow a second course in IS? ACM/AIS IS2010.2, “Data & information management.” Draft MSIS 2016 (refers to IS2010.2 as a bridge or foundational course); curriculum to be developed.
What does a Business Analytics Student Need to Know About Data (from IS)? Awareness of: how data is organized in an information system; the ways data can be stored; the ways data can be manipulated before and during analysis.
Specifics: Storing and Retrieving Data. Concept of a fixed record and file and its equivalence to a worksheet data table of 20 rows and 10 columns. Static data led to duplication of data. Reducing duplication of stored data involves building relationships that remove, or factor out, variables and placing those removed variables in separate, new tables.
Specifics: Connecting Data from Different Tables. There must be at least one way back that connects or links the removed variable to the original table from which it came. “Key” concept: a new variable that uniquely identifies each row of the original table could be duplicated in the second (new) table. How to explain this?
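One possible way to explain factoring out and the "way back" is a small worked example. The sketch below is an assumption-laden illustration, not the presenters' own explanation: the table, the column names, and the `customer_id` key are hypothetical, and it assumes a customer's city never differs between orders.

```python
# Hypothetical flat worksheet: customer details repeat on every order row.
flat = [
    {"order": 1, "customer": "Acme", "city": "Philadelphia", "amount": 250},
    {"order": 2, "customer": "Acme", "city": "Philadelphia", "amount": 100},
    {"order": 3, "customer": "Birch", "city": "New York", "amount": 75},
]

# Factor the repeated variables out into a new table, one row per customer.
# customer_id is the new key variable; it appears in both tables.
customers = {}    # customer_id -> customer details (the new table)
id_by_name = {}   # helper used only while building the tables
orders = []       # the original table with customer details removed
for row in flat:
    name = row["customer"]
    if name not in id_by_name:
        new_id = len(id_by_name) + 1
        id_by_name[name] = new_id
        customers[new_id] = {"customer": name, "city": row["city"]}
    orders.append(
        {"order": row["order"], "customer_id": id_by_name[name], "amount": row["amount"]}
    )

# The "way back": follow the key from each order row to its customer row.
rebuilt = []
for o in orders:
    details = customers[o["customer_id"]]
    rebuilt.append(
        {
            "order": o["order"],
            "customer": details["customer"],
            "city": details["city"],
            "amount": o["amount"],
        }
    )
```

Here `rebuilt` reproduces the original rows, while "Philadelphia" is stored once instead of twice. That round trip is the non-technical point: nothing is lost by splitting the table, provided the key is kept.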
Specifics: Matching without keys. Adding keys is not always an option. What to do?
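The slide leaves the question open. One common approach, offered here only as an illustration and not as the presenters' answer, is to match on a normalized version of a shared descriptive variable such as a company name. The lists, the `normalize` helper, and the suffix list below are all hypothetical.

```python
import re

# Legal suffixes to ignore when comparing names (an assumed, incomplete list).
SUFFIXES = {"inc", "co", "corp", "llc"}

def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    text = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    words = [w for w in text.split() if w not in SUFFIXES]
    return " ".join(words)

# Two hypothetical systems that share no key variable.
crm_names = ["Acme, Inc.", "Birch & Co", "Cedar LLC"]
billing_names = ["ACME INC", "birch co.", "Dogwood Corp"]

# Index one list by normalized name, then look up each name from the other.
billing_index = {normalize(b): b for b in billing_names}
matches = {c: billing_index.get(normalize(c)) for c in crm_names}
```

In this sketch "Acme, Inc." pairs with "ACME INC", and "Cedar LLC" has no match. The risk is that normalization can also merge names that belong to different businesses, so matched pairs still need review by someone who knows the business context.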
Specifics: Subsetting. Examine data by excluding rows or variables that contain particular values. Determining the best level of “grain” requires understanding the business context. Basic statistical methods permit subsetting before an analysis begins; business analytics methods permit subsetting during analysis.
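Subsetting as described above can be sketched as follows. The rows, the column names, and the `subset` helper are hypothetical and chosen only to make the idea concrete.

```python
# Hypothetical store-level rows.
rows = [
    {"store": "A", "region": "East", "channel": "online", "sales": 120},
    {"store": "B", "region": "East", "channel": "retail", "sales": 80},
    {"store": "C", "region": "West", "channel": "online", "sales": 150},
    {"store": "D", "region": "West", "channel": "retail", "sales": 60},
]

def subset(data, exclude_if, drop_columns=()):
    """Return a copy of data without rows where exclude_if(row) is true
    and without the variables named in drop_columns."""
    return [
        {name: value for name, value in row.items() if name not in drop_columns}
        for row in data
        if not exclude_if(row)
    ]

# Exclude retail rows, then drop the channel variable, which is now constant.
online_only = subset(
    rows,
    exclude_if=lambda row: row["channel"] == "retail",
    drop_columns=("channel",),
)
```

Because `subset` returns a new list and leaves `rows` unchanged, different subsets can be tried repeatedly during an analysis rather than fixed once before it begins.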
Specifics: Aggregating. Aggregating data reduces the number of rows or columns of the data being analyzed, thereby easing both data processing and computational requirements. Descriptive statistics can be used to aggregate data. Other transformations, particularly those associated with predictive analytics methods, can be more abstract or mathematically complex, but also result in fewer rows (or columns) to analyze.
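A minimal sketch of aggregating with descriptive statistics, assuming hypothetical store-level rows and choosing region as the level of aggregation:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical store-level rows (four rows before aggregation).
rows = [
    {"store": "A", "region": "East", "sales": 120},
    {"store": "B", "region": "East", "sales": 80},
    {"store": "C", "region": "West", "sales": 150},
    {"store": "D", "region": "West", "sales": 60},
]

# Collect the sales values for each region.
sales_by_region = defaultdict(list)
for row in rows:
    sales_by_region[row["region"]].append(row["sales"])

# Replace each group of rows with one row of descriptive statistics.
summary = [
    {
        "region": region,
        "stores": len(values),
        "total_sales": sum(values),
        "mean_sales": mean(values),
    }
    for region, values in sorted(sales_by_region.items())
]
```

Four store rows become two region rows. The store-level detail is no longer visible in `summary`, which is the trade-off that comes with the reduced processing load.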
Specifics: Influence of Problem Being Analyzed. Aggregation may be driven by calculation complexity or by the nature of the business problem being analyzed. Aggregation can also be necessary because the data being analyzed has not been collected in a way that best serves the business problem being analyzed.
Specifics: Sensible Manipulation. The decision-maker must be certain that the aggregating or subsetting is consistent with the business and the business problem. Automated processes may aggregate or subset in ways that make no practical sense to the decision-maker.
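As a hypothetical illustration of an automated aggregation that makes little practical sense, consider averaging per-store conversion rates when the stores differ greatly in size. The numbers below are invented.

```python
from statistics import mean

# Hypothetical stores: one large, one very small.
stores = [
    {"store": "A", "visits": 1000, "purchases": 50},   # 5% of visits convert
    {"store": "B", "visits": 10, "purchases": 5},      # 50% of visits convert
]

# What an automated "average every numeric column" step might report:
# the unweighted mean of the per-store rates.
per_store_rates = [s["purchases"] / s["visits"] for s in stores]
mean_of_rates = mean(per_store_rates)

# What the business question "what share of visits end in a purchase?" needs:
# total purchases divided by total visits.
overall_rate = sum(s["purchases"] for s in stores) / sum(s["visits"] for s in stores)
```

The mean of the two rates is 27.5%, while the overall rate is about 5.4%. Both calculations are arithmetically valid; only knowledge of the business problem tells the decision-maker which one answers the question being asked.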
Specifics: Data Retrieval and “Structure” skipped
Specifics: Conceptualizing Data that is Highly Structured skipped
Specifics: Is there such a thing as semistructured data? skipped
Specifics: Unstructured data. To be truly unstructured, data must be values that are not comprehensible without additional interpretation. Unstructured data: pictures, videos, and audio tracks, as well as unstructured text such as product reviews posted online. Business analytics methods are more developed for unstructured text than for other types of unstructured data.
“Couldn’t an IS course teach these things?” Would the emphasis be the same? The “Excel” Challenge. Would all of the “data concepts” be found/taught in a typical IS course?
Specifics: Training Data skipped
Open Question: When should the data component be introduced? skipped
Other open questions skipped
What Would Constitute the Data Component in a Business Analytics Course? Thank you! Kathy Szabat, szabat@lasalle.edu David Stephan, david@TwoBridgesIT.com DSI 2016 DASI Session, Sunday, November 20, 2016, 8:30-10 AM