Published by Marybeth Payne. Modified over 7 years ago.
1
BUSA 3110 – Spring 2017 Module 1: Data (Chapters 24 and 1)
Kim I. Melton, Ph.D.
2
Timeline
Day 0: The changing role of data in decision making (carry over from second day of class)
Day 1: Data in the context of decision making
  Types of data
  Uses of data
Day 2: Orientation to JMP (NOC 109)
Day 3: Data as part of a larger “picture” (assumptions, limitations, and cautions)
3
The Historical Role of Data in Statistics
Describe (Descriptive Statistics)
  Summarizes data
    Graphically
    Through formulas and tables
Infer (Inferential Statistics)
  Use data from a small number of observations to draw conclusions about the larger group
Improve (Process Studies)
  Use data from past experience to help predict expected outcomes at a different time or place or to direct action to influence future outcomes
4
How (and Why) is the Field of Statistics Changing? Think “Data”
Source:
5
The Evolving Role of Data in Statistics
Descriptive/Informative
  Includes current descriptive and inferential statistics
  Looks at past and current performance to “describe”
Predictive/Explanatory
  Looks at past and current performance with a goal of predicting future performance (i.e., to be able to “explain”)
  Addresses “what if” questions
Prescriptive/Understanding of Interactions & Implications
  Uses quantitative models to assess how to operate in order to achieve some objective within constraints (and may include deterministic and probabilistic aspects)
6
Analytics
7
Data
There is no such thing as “objective data.” Someone decides:
  What data to collect
  When to collect the data
  How to collect the data
  How to define the characteristic of interest
Some data are more objective than other data. Examples:
  Count the f’s
  Count the pages
  Write a one page paper describing _____.
  What constitutes “most” of the time?
8
Data for Decision Making
Major issues
  Purpose (descriptive, predictive, prescriptive)
  Measurement Level (quantitative/qualitative, nominal, ordinal, interval, ratio)
  Variable choice and definition
  Sources of variation (within a population, across time, in an on-going process, sample, census)
  Cross sectional, time series
  Population, process
  Methods of accessing (primary, secondary)
  Choice of observations (random, convenience, rational)
  External influences (ethical and practical)
9
What is/are data? Vocabulary
Variable – generic name for a characteristic of interest [each variable has its own column]
  Examples: Gender, Height, Price, Weight, Color, Pay Grade
Observations/Items/Records/Respondents/Experimental units – the “thing” that is being measured/described [each row corresponds to the same observation/item…]
Entries/Values – the observed value in the cell (these are the “data” for analysis)
  Examples: Male/Female; 62”/59”/72”; $1.25/$2.00; 6/8/10 oz; Red/Blue/Green; E-1/E-2/E-3
Sample (as used in statistics vs. some places in science) – the collection of observations to be analyzed (e.g., a sample of size 6 would include 6 observations on the same variable)
Census – data collected on every item in a population
10
Getting Data into a Data Table
Tidy Data
  There is a column for each variable
  Each row represents an observation/item
  Each type of observational unit forms its own table
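The tidy layout can be illustrated with a minimal sketch. The data, the column names (`student`, `exam1`, `exam2`), and the helper `to_tidy` are all hypothetical and are not part of the original slides; the sketch assumes a "wide" spreadsheet in which exam numbers appear as column headers, and reshapes it so that each row is one observation.

```python
# Hypothetical wide "spreadsheet" layout: one row per student, one column per exam.
# The exam number is hidden in the column headers instead of being a variable.
wide = [
    {"student": "A", "exam1": 82, "exam2": 90},
    {"student": "B", "exam1": 75, "exam2": 68},
]


def to_tidy(rows, id_column, value_columns, variable_name, value_name):
    """Reshape wide rows so that each output row is a single observation."""
    tidy_rows = []
    for row in rows:
        for column in value_columns:
            tidy_rows.append(
                {
                    id_column: row[id_column],   # who/what the row describes
                    variable_name: column,       # former column header becomes a value
                    value_name: row[column],     # the observed entry
                }
            )
    return tidy_rows


tidy = to_tidy(wide, "student", ["exam1", "exam2"], "exam", "score")
for record in tidy:
    print(record)
```

In the reshaped table every column is a variable (student, exam, score) and every row is one observation (one student's score on one exam).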
11
Table (tabulated results) vs. Data Table
12
Spreadsheet vs. Data Table
Variable names (column headers – NOT entries in the data table) Data Table 1 Data Table 2 Data Table 3
13
Putting Data in Context (5 W’s and H)
Who does the data describe (doesn’t have to be people) [Rows]
What characteristics are recorded (variables of interest) [Columns]
Why are we collecting data (purpose, guiding questions,…)
How were the data collected (theory-wise and physically)
  Sampling, convenience, primary or secondary data, training for data collection
  Operational definitions will describe what is “measured”, how the measurements are taken (getting to the level of measurement level/modeling type and method of measurement), and provide a way that two people looking at the same item would come to the same conclusion about the characteristic.
When were the data collected (date/time, across time, …)
Where were the data collected (geographic, point in process, source…)
14
Not all data are “created” equally
Quantitative vs. qualitative
Level of specificity/precision
  Height of an individual: tall vs. 6’7”
  Distance to the nearest fire hydrant: 3 blocks vs. 0.25 miles
  Diameter of a bottle closure: 1.0, 1.001, …
Match with the characteristic being considered (operational definition)
15
How many bedrooms are in a house?
16
Question at budget hearing at a university: How many students do you have?
Would the same methods of “counting” work for English, Math, Nursing, and Business?
  Majors/Minors/Service courses
  Full-time/Part-time
  Degree level
At the time of consolidation, GSC had about 9,000 students and NGCSU had about 6,000 students. How many students would you expect UNG to have had the day after the two consolidated?
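A small sketch can show why the answer depends on the operational definition of "student." Everything here is an illustrative assumption: the enrollment records, the student IDs, and the 12-credit-hour full-time threshold are hypothetical and do not come from the slides or from any institution's actual data.

```python
# Hypothetical enrollment records: (student_id, institution, credit_hours).
enrollments = [
    ("s1", "GSC", 15),
    ("s2", "GSC", 6),
    ("s3", "NGCSU", 12),
    ("s2", "NGCSU", 3),  # the same person enrolled at both institutions
    ("s4", "NGCSU", 9),
]

FULL_TIME_HOURS = 12  # assumed threshold, for illustration only


def enrollment_records(records):
    """Definition 1: count every enrollment record."""
    return len(records)


def unique_headcount(records):
    """Definition 2: count distinct people, however many records they have."""
    return len({student_id for student_id, _, _ in records})


def full_time_headcount(records, threshold=FULL_TIME_HOURS):
    """Definition 3: count people whose total hours reach the threshold."""
    hours = {}
    for student_id, _, credit_hours in records:
        hours[student_id] = hours.get(student_id, 0) + credit_hours
    return sum(1 for total in hours.values() if total >= threshold)


def full_time_equivalent(records, threshold=FULL_TIME_HOURS):
    """Definition 4: total credit hours divided by a full-time load."""
    return sum(credit_hours for _, _, credit_hours in records) / threshold


print(enrollment_records(enrollments))    # records
print(unique_headcount(enrollments))      # distinct people
print(full_time_headcount(enrollments))   # full-time people
print(full_time_equivalent(enrollments))  # FTE
```

The four functions give four different "counts" for the same records, which is the point of the budget-hearing question: simply adding two institutions' headcounts double counts anyone enrolled at both.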
17
Different Measurement Scales for Data Require Different Types of Analysis
Data Type
  Character
    At least some observations include non-numeric values
    Analysis must use approaches appropriate for qualitative data
  Numeric
    All observations are reported as numbers
    Analysis can use approaches appropriate for quantitative data UNLESS the observations are a way of identifying a category or rank rather than a measurable characteristic
Modeling Type (measurement scale/level)
  Nominal
    Observations represent categories where there is no agreed-upon ordering
  Ordinal
    Observations represent categories where there is an agreed-upon ordering
  Interval and Ratio (JMP combines these and calls them Continuous)
    Quantitative data that represent measurements or counts, where the distance between values has a consistent meaning
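As a rough sketch of how the modeling type, not the data type, drives the choice of summary, the hypothetical helper below picks a summary for each scale. The function `summarize`, the example values, and the choice of the lower middle value as an ordinal "median" are assumptions made for illustration; this is not how JMP implements its modeling types.

```python
from collections import Counter
from statistics import mean, median


def summarize(values, modeling_type, order=None):
    """Return a summary appropriate for the given modeling type."""
    if modeling_type == "nominal":
        # Categories with no ordering: only counts are meaningful.
        return {"counts": dict(Counter(values))}
    if modeling_type == "ordinal":
        # Categories with an agreed-upon ordering: counts plus a middle category.
        ranked = sorted(values, key=order.index)
        middle = ranked[(len(ranked) - 1) // 2]  # lower middle value, no averaging
        return {"counts": dict(Counter(values)), "median": middle}
    if modeling_type == "continuous":
        # Interval/ratio data: arithmetic on the values is meaningful.
        return {"mean": mean(values), "median": median(values)}
    raise ValueError(f"unknown modeling type: {modeling_type}")


# Numeric data type, but the numbers only identify categories (hypothetical ZIP codes).
zip_codes = [30597, 30597, 30501]
print(summarize(zip_codes, "nominal"))

# Character data type with an agreed-upon ordering.
YEAR_ORDER = ["Freshman", "Sophomore", "Junior", "Senior"]
years = ["Senior", "Freshman", "Junior", "Junior", "Sophomore"]
print(summarize(years, "ordinal", order=YEAR_ORDER))

# Numeric data type where distances between values are meaningful (hypothetical salaries).
salaries = [40000, 50000, 60000]
print(summarize(salaries, "continuous"))
```

Averaging the ZIP codes would run without error, which is exactly why the modeling type has to be set deliberately rather than inferred from the fact that the entries are numbers.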
18
Examples
  Major
  Grade in a course
  Job title
  Year in school (Freshman, …, Senior)
  Price of a gallon of regular gas
  Salary
  Rank of your favorite college team
  Size of a house
  Grade on a test
  Gender
  Level of agreement (1, 2, …, 9, 10 where higher numbers relate to stronger agreement)
19
Focus of data collection— snap shot vs. motion picture
Cross-sectional vs. Time Series
Population vs. Process
Cross-sectional
  All data for the variable are from a single point in time
Time Series
  Data for a single variable are collected at regular intervals over time (often with the goal of looking for trends/patterns)
Population
  Homogeneous group where the goal is to describe the characteristics of that group
Process
  On-going melding of materials, methods, machines, environment, and people to create an output where the aim is to understand the “cause system” that creates current and potentially future output
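The snapshot vs. motion picture distinction can be sketched with one small set of records viewed two ways. The headcounts, campus labels, and helper names below are hypothetical and chosen only for illustration.

```python
# Hypothetical headcounts: (term, campus, headcount).
records = [
    ("2015", "Campus A", 100),
    ("2015", "Campus B", 60),
    ("2016", "Campus A", 110),
    ("2016", "Campus B", 55),
    ("2017", "Campus A", 125),
    ("2017", "Campus B", 50),
]


def cross_section(rows, term):
    """Snapshot: one variable across many units at a single point in time."""
    return {campus: count for t, campus, count in rows if t == term}


def time_series(rows, campus):
    """Motion picture: one variable for one unit at regular intervals over time."""
    return [(t, count) for t, c, count in sorted(rows) if c == campus]


print(cross_section(records, "2016"))
print(time_series(records, "Campus A"))
```

The cross-section supports questions such as how students are distributed across campuses in a given term; the time series supports questions about whether enrollment is growing.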
20
Questions about Focus…
Building schedules involves forecasting demand.
  Can we use headcount (of majors) to estimate credit hours?
  Would we expect the same relationship between headcount and credit hours in the English Department vs. Business?
How is enrollment changing over time?
  Are we growing? If so, how?
  What is happening to enrollment by campus?
How are our students distributed across majors?
How are students distributed across earned hours?
How is the level of student learning changing over time?
21
Day 2: Data – Lab Session 1
JMP Session
  Accessing a JMP Data File
  Running an existing script
  Incorporating JMP output into Word (as a picture)
  Accessing data stored in an Excel file
  Understanding the layout of a Data Table
  Selecting Data Type and Modeling Type
  Value Ordering (Ordinal Scale data)
  Cleaning data (using the Recode Utility)
  Creating a new variable (using the Recode Utility)
  Obtaining summary output
  Saving scripts and data tables
22
Day 3 – Data as Part of the Bigger Picture
Inter-disciplinary
Assumptions
  Good data – meaning…???
Limitations
  Time
  Money
  Skill
  Accessibility
Cautions
  Changing conditions between collection and application
  Understanding the data collection process for secondary data
  When Measurement becomes the mission!
23
24.5 Successful Data Mining
QTM1310/ Sharpe
The first step is to have a well-defined business problem, which can help you avoid going down a lot of blind paths.
Typically, 65% to 90% of the time is spent in data preparation – investigating missing values, correcting wrong entries, reconciling data definitions, or creating new variables from old ones.
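The data preparation tasks named above can be sketched in a few lines. The raw records, the recode mapping, and the helper `prepare` are hypothetical and stand in for the kind of work done with a recode tool; this is an illustration under those assumptions, not the textbook's procedure.

```python
# Hypothetical raw records with inconsistent entries and a missing value.
raw = [
    {"id": 1, "gender": "M", "height_in": "62"},
    {"id": 2, "gender": "male", "height_in": ""},      # missing height
    {"id": 3, "gender": "F", "height_in": "59"},
    {"id": 4, "gender": "Femle", "height_in": "72"},   # wrong entry (typo)
]

# Assumed recode mapping that reconciles the different spellings.
RECODE = {"M": "Male", "male": "Male", "F": "Female", "Femle": "Female"}


def prepare(rows):
    """Recode categories, mark missing values, and derive a new variable."""
    cleaned = []
    for row in rows:
        text = row["height_in"].strip()
        height_in = float(text) if text else None  # keep missing as None
        cleaned.append(
            {
                "id": row["id"],
                "gender": RECODE.get(row["gender"], row["gender"]),
                "height_in": height_in,
                # New variable created from an old one (inches to centimeters).
                "height_cm": round(height_in * 2.54, 1) if height_in is not None else None,
            }
        )
    return cleaned


cleaned = prepare(raw)
missing = sum(1 for row in cleaned if row["height_in"] is None)
print(cleaned)
print("rows with missing height:", missing)
```

Missing heights are kept as `None` rather than replaced with 0, so later summaries cannot mistake a missing value for a measurement.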
24
24.8 The Data Mining Process (and data analysis in practice)
The process must start with the Business Understanding phase.
Data Understanding is central to the entire data mining project – it is crucial to understand the data warehouse, what it contains, and what limitations are present.
Once variables are selected and the response variable has been agreed upon, the Data Preparation phase begins.
Following preparation is the Data Modeling phase. The more knowledge of the data and the variables that goes into the model, the higher the chances of success for the entire project.
Finally, if the model seems to give business insight, it’s time for the Deployment phase – just keep in mind that the business environment changes rapidly, so models can become stale quickly.
25
24.8 The Data Mining Process
(This process also applies to most data analysis in practice.)
26
What do we mean by “good data”?
Considerations when you are collecting data
Considerations when you are evaluating reports that claim to be based on data
27
Other Issues in Data Collection
External influences (practical and ethical)
  Practical (time, money, access)
  Ethical policies that can interfere with collecting good data:
    evaluation systems that look at components separately
    reward systems
    quotas and arbitrary goals
    fiscal year budgets
Other issues to cover in Chapter 8
  Methods of accessing (primary, secondary; survey, experiment, observational)
  Choice of observations (random, convenience, rational)
Parking Space Reserved for Drive-Thru