Published by Flora Neal. Modified over 6 years ago.
1
Spring 2016 BUSA 3110 - Statistics for Business Module 1: Data
Kim I. Melton, Ph.D.
2
Syllabus Text, MyStatLab, JMP, D2L, MS Office
Accessing material: D2L and MyStatLab
Software availability: JMP and MS Office
Course format
Grading
General expectations (especially deadlines, make-ups, extra credit, academic integrity, phones)
Inclement weather
3
Data/Information/Knowledge/Wisdom
Doing things right (Efficiency)
Doing the right things (Effectiveness)
DATA: Symbols (raw values) that represent properties of objects/events
INFORMATION: Describes; provides answers to who, what, where, and when questions
KNOWLEDGE/UNDERSTANDING: Explains; provides answers to how-to and why questions
WISDOM: Evaluates knowledge/understanding; deals with values; uses judgment; answers what is best and why
Based on the work of Russell Ackoff. See “From Data to Wisdom” in Ackoff’s Best, pp , 1999.
4
Content Six “Modules” (Sets of Slides)
Data / Information / Knowledge / Wisdom
1. Data – What is it, Types of data, How can we use it
2. Summarizing Data – Visually and Quantitatively
3. Collecting “Good” Data
4. Inference Involving One Variable
5. Simple Linear Regression
6. Multiple Regression and Model Building
5
Grading
Grade scale: 90 and above A; 80 – 89 B; 70 – 79 C; 60 – 69 D; Below 60 F
MyStatLab Homework (16 points): Drop the lowest two and average the rest [then take percent of 16]
MyStatLab Quizzes (16 points): Average all [then take percent of 16]
Instructor Supplied Assignments (64 points): Eight assignments each graded out of 8 [add them up]
Preparation / Participation (10 points): Total earned/total available [then take percent of 16]
Pre-final grade = Add the points from each section
Final (0-16 points): Two problems, each out of eight points
Final Grade = Points from (HW + Quizzes + Preparation / Participation + 8 Highest Instructor Supplied Assignments)
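The grading arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the instructor's gradebook: the function names are hypothetical, homework and quiz scores are assumed to be percentages (0-100), preparation/participation is passed in as points already earned, the two final-exam problems are assumed to be pooled with the eight instructor supplied assignments before the eight highest are kept, and the letter cutoffs are assumed to apply without rounding.

```python
def scaled_average(scores, points, drop_lowest=0):
    """Average percent scores (0-100) after dropping the lowest few,
    then scale the result to the given number of points."""
    kept = sorted(scores)[drop_lowest:]
    return sum(kept) / len(kept) / 100 * points


def letter_grade(total):
    """Map total points to a letter (assumption: cutoffs apply without rounding)."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if total >= cutoff:
            return letter
    return "F"


def final_grade(homework, quizzes, assignments, participation_points, final_problems):
    """Add up the components described on the slide.

    homework, quizzes: percent scores (0-100)
    assignments: eight scores, each out of 8
    participation_points: points already earned (assumption: passed in directly)
    final_problems: two scores, each out of 8 (assumption: pooled with the
        assignments, and the eight highest of the ten scores count)
    """
    hw = scaled_average(homework, 16, drop_lowest=2)   # drop the lowest two
    qz = scaled_average(quizzes, 16)                   # average all
    best_eight = sorted(list(assignments) + list(final_problems), reverse=True)[:8]
    return hw + qz + participation_points + sum(best_eight)


# Hypothetical student
total = final_grade(
    homework=[100, 90, 80, 40, 0],
    quizzes=[75, 85],
    assignments=[8, 7, 6, 8, 5, 7, 8, 6],
    participation_points=9,
    final_problems=[7, 4],
)
print(round(total, 1), letter_grade(total))
```

Under these assumptions a strong final can only replace weaker assignment scores; it never lowers the pre-final grade.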
6
Instructor Supplied Assignment Topics (Tentative List)
1. Fundamentals of using JMP
2. Summarizing Data
3. Collecting “Good” Data for Statistical Inference
4. Inference about One Variable
5. Equations, Graphs, Model Statements, Hypotheses
6. Simple Linear Regression
7. Multiple Regression and Testing Theories
8. Model Building and Selecting the “Best” Model
7
General Expectations
Learning is not a divided responsibility (I teach, you learn); learning is a joint responsibility (we learn together)
My “hot buttons”
  Timeliness
  Ethical behavior
  Professional orientation toward learning (this includes putting phones away and engaging in class)
  Recognition that “true” learning involves more than getting the right answer
8
What is/are Statistics?
Statistics vs. statistics
Statistics vs. Math
9
How Does Statistics (as a field of study) Apply to these Videos? … And to what industries?
Videos: Think Business Analytics and Optimization Turning Data into Insight Business Analytics: Data Trends let Businesses Spot New Opportunities THINK: A Film about Making the World Work Better
10
How (and Why) is the Field of Statistics Changing?
Source:
11
Analytics
12
Impact of Analytics on the Way we Think about ___
Statistics?
Evolution vs. revolution
Improvement vs. innovation
1st order change vs. 2nd order change: change in how we do something (1st order) vs. change in what we do (2nd order)
Paradigm shift…makes us go back to the most basic assumptions
13
Based on the HBR Article
Analytics 2.0: What is this? Examples from your life:
Analytics 3.0: What is this? Data aligned with analytics 3.0 that you are providing companies:
14
A Word about Deadlines (MyStatLab and D2L)
Deadlines are set to:
  Allow you time to see assignments well before the due date
  Allow you time to complete the assignments after the material is covered
  Provide you with as much time as possible prior to when I will start grading
Therefore, I will use early morning deadlines rather than late night deadlines (giving you the option of the overnight hours to work)
Remember, you can submit assignments before the deadline
15
Why Start with the LAST Chapter in the Book? (Chapter 24)
This is a second course in statistics. This chapter lets you reflect on the tools/techniques from the first course…and sets the stage for this course.
CONTEXT
16
24.8 The Data Mining Process
(and also applies to almost any data analysis in practice)
QTM1310/ Sharpe
17
24.8 The Data Mining Process (and data analysis in practice)
The process must start with the Business Understanding phase.
Data Understanding is central to the entire data mining project: it is crucial to understand the data warehouse, what it contains, and what limitations are present.
Once variables are selected and the response variable has been agreed upon, the Data Preparation phase begins.
Following preparation is the Data Modeling phase. The more knowledge of the data and the variables that goes into the model, the higher the chances of success for the entire project.
Finally, if the model seems to give business insight, it’s time for the Deployment phase. Just keep in mind that the business environment changes rapidly, so models can become stale quickly.
18
24.4 Data Mining Myths
Myth 1: Find answers to unasked questions.
Myth 2: Automatically monitor a database for interesting patterns.
Myth 3: Eliminate the need to understand the business.
Myth 4: Eliminate the need to collect good data.
Myth 5: Eliminate the need for good data analysis skills.
19
24.5 Successful Data Mining
The first step is to have a well-defined business problem, which can help you avoid going down a lot of blind paths.
Typically, 65% to 90% of the time is spent in data preparation: investigating missing values, correcting wrong entries, reconciling data definitions, or creating new variables from old ones.
20
Be sure that the question to be answered is specific. A goal as vague as “improving the business” is not likely to be successful.
Be sure that the data have the potential to answer the question. Check the variables to see whether a model can reasonably be built to predict the response.
Be aware of overfitting the data. Make sure you validate the model on a test set.
Make sure that the data are ready to use in the data mining model. Missing values, incorrect entries, and different time scales are all challenges that need to be overcome.
Don’t try it alone. Data mining projects require a variety of skills and a lot of work. Assemble the right team of people.
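The advice about overfitting and validating on a test set can be illustrated with a small holdout split. The sketch below is not from the course materials (which use JMP); it is plain Python with made-up data, and the function names are hypothetical. It fits a least-squares line on a training portion only and then compares the error on the training rows with the error on rows the model never saw.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(points, slope, intercept):
    """Average squared prediction error of the line on the given points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points) / len(points)


# Made-up data for illustration: y is roughly 2x + 1 plus random noise
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in range(40)]

train, test = holdout_split(data)
slope, intercept = fit_line(train)            # the model sees only the training rows
train_mse = mean_squared_error(train, slope, intercept)
test_mse = mean_squared_error(test, slope, intercept)
print(f"train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```

A test error far above the training error is a typical warning sign of overfitting; with a simple straight-line model like this one the two are usually close.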
21
What do we mean by “good data”?
Considerations when collecting data
Considerations when evaluating claims based on data
22
Characteristics of “Good” Data
Accuracy of measurement
Precision of measurement
Uses an appropriate type of data (level of measurement)
  Nominal, Ordinal, Interval, Ratio
  Interval and Ratio are often grouped as continuous or quantitative
Aligns with the characteristic of interest
Different numbers reflect differences in the items measured (rather than an inability to measure consistently)
Measurement is a yardstick for “how we are doing” rather than the “mission”
Parking Space Reserved for Drive-Thru
23
Putting Data in Context (5 W’s and H)
Who do the data describe? (doesn’t have to be people)
What characteristics are recorded? (variables of interest)
Why are we collecting data? (purpose, guiding questions,…)
How were the data collected? (theory-wise and physically)
  Sampling, convenience, primary or secondary data, training for data collection
  Operational definitions describe what is “measured” and how the measurements are taken (getting to the level of measurement/modeling type and the method of measurement), and provide a way for two people looking at the same item to come to the same conclusion about the characteristic.
When were the data collected? (date/time, across time, …)
Where were the data collected? (geographic, point in process, source…)
24
Describe, Explain, Understand, Predict, Prescribe
What were our sales for the month? (describing)
How does this compare to the same month last year? (still describing)
What’s changed that might account for the differences? (moves toward explaining)
Why have sales changed? (starts to move from explaining to understanding)
What will sales be in the future? (predicting and/or prescribing)
25
Data for Decision Making
Major issues
  Purpose (descriptive, predictive, prescriptive)
  Measurement Level (quantitative/qualitative, nominal, ordinal, interval, ratio)
  Variable choice and definition
  Sources of variation (population, across time, process)
  Methods of accessing (primary, secondary)
  Choice of observations (random, convenience, rational)
  External influences (ethical and practical)
26
Variable Choice and Measurement Level (Modeling Type)
Identify the Level/Type
  Nominal (Qualitative, Categorical)
  Ordinal (Qualitative, Categorical, Logical to Order the Categories)
  Interval (Quantitative, Differences have consistent meaning)
  Ratio (Quantitative, Differences and Ratios have meaning)
NOTE: JMP combines Interval and Ratio into Continuous
Variables:
  Major
  Grade in a course
  Job title
  Year in school (Freshman,…, Senior)
  Price of a gallon of regular gas
  Salary
  Rank of your favorite college team
  Size of a house
  Gender
  Level of agreement (1, 2, …, 9, 10 where higher numbers relate to stronger agreement)
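One way to see why the level of measurement matters is to look at which summaries make sense at each level. The Python sketch below is illustrative only and is not how JMP works internally: the `summarize` function and the sample values are hypothetical, and ordinal categories are assumed to be stored as integer codes in their logical order. The mapping to JMP modeling types follows the note above.

```python
from statistics import mean, median, median_low, mode

# From the note above: JMP combines Interval and Ratio into Continuous.
JMP_MODELING_TYPE = {
    "nominal": "Nominal",
    "ordinal": "Ordinal",
    "interval": "Continuous",
    "ratio": "Continuous",
}


def summarize(values, level):
    """Return only the summaries that are meaningful at the given level.

    Assumption: ordinal categories are passed as integer codes in their
    logical order (for example Freshman=1, ..., Senior=4).
    """
    if level not in JMP_MODELING_TYPE:
        raise ValueError(f"unknown level: {level}")
    summary = {"mode": mode(values)}              # counting works at every level
    if level == "ordinal":
        summary["median"] = median_low(values)    # order is meaningful; always an observed category
    if level in ("interval", "ratio"):
        summary["median"] = median(values)
        summary["mean"] = mean(values)            # differences are meaningful
    return summary


# Hypothetical sample values
majors = ["Accounting", "Marketing", "Accounting"]
year_in_school = [1, 2, 2, 3, 4, 2]               # Freshman=1, ..., Senior=4
gas_prices = [1.89, 1.95, 1.95, 2.05]

print(JMP_MODELING_TYPE["nominal"], summarize(majors, "nominal"))
print(JMP_MODELING_TYPE["ordinal"], summarize(year_in_school, "ordinal"))
print(JMP_MODELING_TYPE["ratio"], summarize(gas_prices, "ratio"))
```

Averaging the codes for a nominal variable such as Major would produce a number with no meaning, which is why the function refuses to report a mean below the interval level.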
27
Lists of Most Stolen Vehicles
List: Ford F-250 crew 4WD; Chevrolet Silverado 1500 crew; Chevrolet Avalanche 1500; GMC Sierra 1500 crew; Ford F-350 crew 4WD; Cadillac Escalade 4WD; Chevrolet Suburban 1500; GMC Sierra 1500 extended cab; GMC Yukon; Chevrolet Tahoe
List: Toyota Camry/Solara; Toyota Corolla; Chevrolet Impala; Dodge Charger; Chevrolet Malibu; Ford Fusion; Nissan Altima; Ford Focus; Chevrolet Cobalt; Honda Civic
List: 1994 Honda Accord; 1998 Honda Civic; 2006 Ford Full Size Pickup; 1991 Toyota Camry; 2000 Dodge Caravan; 1994 Acura Integra; 1999 Chevrolet Full Size Pickup; 2004 Dodge Full Size Pickup; 2002 Ford Explorer; 1994 Nissan Sentra
List: Dodge Charger; Pontiac G6; Chevrolet Impala; Chrysler 300; Infiniti FX35; Mitsubishi Galant; Chrysler Sebring; Lexus SC; Dodge Avenger; Kia Rio
Sources: Highway Loss Data Institute (Insurance claims $); National Insurance Crime Bureau (thefts reported to law enforcement); National Highway Traffic Safety Adm. (FBI data)
28
http://www. realclearpolitics
(accessed 1/19/16)
29
Cross Sectional vs. Time Series
(accessed 1/19/16)
30
Other Issues in Data Collection
External influences (practical and ethical)
  Practical (time, money, access)
  Ethical: policies that can interfere with collecting good data
    evaluation systems that look at components separately
    reward systems
    quotas and arbitrary goals
    fiscal year budgets
Other issues to cover in Chapter 8
  Methods of accessing (primary, secondary; survey, experiment, observational)
  Choice of observations (random, convenience, rational)