1
Chapter 2 Describing Data: Graphs and Tables
Basic Concepts
Frequency Tables and Histograms
Bar and Pie Charts
Scatter Plots
Time Series Plots
Some information adapted from: Levine, Berenson and Stephan's Statistics for Managers
Alok Srivastava
2
Basic Concepts in Data Analysis
Data, Information, and Knowledge
Populations and Samples
Variables and Observations
Types of Data: Categorical and Numerical
Types of Data: Cross Sectional and Time Ordered
3
Data, Information, and Knowledge
Data are the building blocks of information: observations on entities (observation units). Variables are used to measure observations.
Information is processed data (organized, summarized, analyzed and filtered), made meaningful and relevant to the situation or phenomenon being understood.
Knowledge is the ability to apply information to decision situations. It is the meaning associated with information: actionable information.
Diagram labels: Processing, Analysis, Reports; Application, Meaning, Relevance.
4
Populations and Samples
Population: the collection of all possible entities (observation units). Data on the whole population are usually not available (UNKNOWN). Parameters are used to describe populations; these are constants for a population.
Sample: a subset of the collection of all possible entities (observation units). Data on the sample are what is available (KNOWN). Statistics are used to describe samples; these can vary across samples.
Statistical Inference is the art and science of drawing conclusions about a population of interest. It is the process by which characteristics of a population come to be understood: conclusions about the population are drawn (inferred) based on the knowledge gained from the sample.
A sample should be a good representation of the population.
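The contrast between a constant parameter and a varying statistic can be illustrated with a small simulation. This is only a sketch: the population of heights, its size, the sample size of 50 and the seed are all made-up assumptions, not values from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population (assumption): heights in inches of 10,000 entities.
population = [random.gauss(67, 4) for _ in range(10_000)]

# Parameter: a constant for this population (usually unknown in practice).
population_mean = statistics.mean(population)

# Statistics: computed from samples, so they vary from sample to sample.
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(5)
]

print(f"population mean (parameter): {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Each sample mean lands near the population mean but none is guaranteed to equal it, which is why inference from a sample carries uncertainty.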
5
Variables and Observations
Entity     Height (inches)  Weight (pounds)  Age (years)  Sex (category)
Person 1   67               170              33           Male
Person 2   61               120              38           Female
Person 3   72               220              62
*

Variables are characteristics (aspects) of entities that differ from one entity to another. Observations on an entity are the measured values of these characteristics. So, a dataset is a collection of observations on a group (sample) of entities.
Each row is an observation on a particular entity.
Each column is an aspect or characteristic of individual entities (measured as a variable).
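One possible way to hold such a dataset in code is a list of records, one per entity. The sketch below assumes the values on this slide are read column by column (heights 67, 61, 72, and so on); the key names are invented for illustration, and the sex of Person 3 is left empty because the slide does not give it.

```python
# Each record (dict) is one observation on an entity; each key is a variable.
# Key names are hypothetical; values follow a column-by-column reading of the slide.
dataset = [
    {"entity": "Person 1", "height_in": 67, "weight_lb": 170, "age_yr": 33, "sex": "Male"},
    {"entity": "Person 2", "height_in": 61, "weight_lb": 120, "age_yr": 38, "sex": "Female"},
    {"entity": "Person 3", "height_in": 72, "weight_lb": 220, "age_yr": 62, "sex": None},
]

# A "column" is the list of values of one variable across all entities.
heights = [row["height_in"] for row in dataset]

# A "row" is everything measured on one entity.
person_2 = dataset[1]

print(heights)
print(person_2)
```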
6
Types of Data: Categorical and Numerical
Numerical data are actual measurements, and we can do arithmetic on them (for example, age and salary).
Categorical data are qualitative. Sometimes qualitative data are coded: for example, opinion can be coded 1-5 so that calculations can be performed. Such data are ordinal (they have an implied order).
State is a categorical variable and cannot be used for calculations. Such data are nominal.
7
Types of Data: Cross-sectional and Time Ordered
Questions:
What was the absenteeism at Plant 1 in Jan. 1998?
Was the annual absenteeism the same for all plants?
Was absenteeism stable at Plant 1 during 1998?
8
Frequency Tables
A Frequency Table showing a classification of the AGE of attendees at an event.
Columns: Class, Frequency, Relative Frequency, Percentage.
Classes: 10 but under 20, 20 but under 30, 30 but under 40, 40 but under 50, 50 but under 60, Total.
Class is a range for the values of a variable.
Frequency is the number of observations associated with a class.
Relative Frequency is the proportion of observations (frequency) associated with a class.
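The definitions above can be written compactly. The symbols are introduced here only for illustration and are not from the slides: $f_i$ is the frequency of class $i$, $k$ is the number of classes, and $n$ is the total number of observations.

```latex
n = \sum_{i=1}^{k} f_i, \qquad
\text{relative frequency}_i = \frac{f_i}{n}, \qquad
\text{percentage}_i = 100 \times \frac{f_i}{n}
```

By construction the relative frequencies sum to 1 and the percentages sum to 100.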
9
Frequency Histograms
A graphical display of the distribution of frequencies.
10
Developing Frequency Tables and Histograms
Sort Raw Data in Ascending Order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find Range: 58 - 12 = 46
Select Number of Classes: 5 (usually between 5 and 15)
Compute Class Interval (width): 10 (range/classes = 46/5 = 9.2, then round up)
Determine Class Boundaries (limits): 10, 20, 30, 40, 50
Compute Class Midpoints: 15, 25, 35, 45, 55
Count Observations & Assign to Classes
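The steps above can be sketched in a few lines of Python using the slide's own raw data. Rounding the width up to a multiple of 10 and starting the first class at 10 are assumptions chosen to reproduce the boundaries on the slide; other rounding conventions are possible.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

data_sorted = sorted(data)
data_range = data_sorted[-1] - data_sorted[0]             # 58 - 12 = 46
num_classes = 5

# Assumption: round range/classes (9.2) up to the next multiple of 10.
width = math.ceil(data_range / num_classes / 10) * 10     # 10

# Assumption: start the first class at the minimum rounded down to the width.
start = (data_sorted[0] // width) * width                 # 10
boundaries = [start + i * width for i in range(num_classes)]
midpoints = [b + width // 2 for b in boundaries]

# Class "b but under b + width" includes b and excludes b + width.
frequencies = [sum(1 for x in data if b <= x < b + width) for b in boundaries]
relative = [f / len(data) for f in frequencies]

for b, m, f, r in zip(boundaries, midpoints, frequencies, relative):
    label = f"{b} but under {b + width}"
    # The row of asterisks is a crude text histogram of the frequency.
    print(f"{label:<16} mid={m:<3} freq={f:<2} rel={r:.2f}  {'*' * f}")
print(f"{'Total':<16} {'':<7} freq={sum(frequencies)}")
```

Every observation falls into exactly one class, so the frequencies add up to the 20 raw values.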
11
Bar and Pie Charts: Displaying Categorical Data
Investment Category   Amount (in thousands $)   Percentage
Stocks                                          42%
Bonds                                           29%
CD                                              14%
Savings                                         15%
Total
12
Displaying Categorical Bivariate Data: Contingency Tables and Side-by-Side Charts
13
Scatter Plot for bivariate numerical data
Shows the relationship between two variables. Can one be used to predict the other?
Time-series and regression analysis are used to predict one variable's value based on the other.
Correlation analysis is used to measure the strength of the linear relationship between two variables.
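As a rough illustration of measuring the strength of a linear relationship, the sketch below computes the Pearson correlation coefficient by hand. The function name `pearson_r` and the two small data lists are hypothetical and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical bivariate numerical data (assumption, for illustration only).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
print(f"r = {r:.4f}")
```

Values of r close to +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one; r says nothing about non-linear patterns that a scatter plot might still reveal.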
14
Chapter Summary