Graphical and Tabular Descriptive Techniques
Statistics for Management and Economics, Chapter 2
Updated: 11/28/2015
Objectives
Types of Data and Information
Graphical and Tabular Techniques for Nominal Data
Graphical Techniques for Interval Data
Describing the Relationship Between Two Variables
Describing Time-Series Data
Types of data and information
A variable is a characteristic of a population or sample of interest. Data, or data points, are the actual values of variables. There are three general types: interval, nominal, and ordinal. We also commonly designate them as quantitative or categorical.
Types of data and information: Interval Data
Interval data are real numbers, for example heights, weights, and prices. They are also referred to as quantitative or numerical. Arithmetic operations can be performed on interval data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.
Types of data and information: Nominal Data
Nominal data are also called qualitative or categorical. The values of nominal data are categories, for example responses to a question about marital status, coded as Single = 1, Married = 2, Divorced = 3, Widowed = 4. Because the numbers are arbitrary, arithmetic operations don’t make any sense (e.g. does Widowed ÷ 2 = Married?!).
Types of data and information: Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order, a ranking. For example, a college course rating system: poor = 1, fair = 2, good = 3, very good = 4, excellent = 5. Any numeric values can be assigned to the categories as long as the order is maintained. While it’s still not meaningful to do arithmetic on these data (e.g. does 2*fair = very good?!), we can say things like excellent > poor or fair < very good.
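The point about ordinal codes can be checked with a small sketch. The second coding below is a hypothetical alternative invented for illustration; only the 1 to 5 coding comes from the slide. Both codings rank the ratings identically, while an arithmetic statement such as 2*fair = very good holds under one coding and fails under the other.

```python
# Coding from the slide, plus a hypothetical alternative that keeps the same order.
CODING_A = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
CODING_B = {"poor": 6, "fair": 18, "good": 23, "very good": 45, "excellent": 88}

def ranked(ratings, coding):
    """Sort ratings from worst to best using the given numeric coding."""
    return sorted(ratings, key=coding.__getitem__)

# Hypothetical course ratings.
ratings = ["good", "poor", "excellent", "fair", "very good", "good"]

# Order comparisons agree under both codings.
same_order = ranked(ratings, CODING_A) == ranked(ratings, CODING_B)

# Arithmetic does not: "2 * fair == very good" depends on the arbitrary codes.
arithmetic_a = 2 * CODING_A["fair"] == CODING_A["very good"]
arithmetic_b = 2 * CODING_B["fair"] == CODING_B["very good"]

print(same_order, arithmetic_a, arithmetic_b)
```

Because the arithmetic result changes with the coding while the ranking does not, only order-based statements are meaningful for ordinal data.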
Types of data and information
Cross-sectional data are collected at a single point in time. Examples: a marketing survey; starting salaries of MBA graduates. Longitudinal data are collected over a period of time and are sometimes also referred to as time-series data. Examples: weekly starting prices of gold; the daily bid price on a posted eBay item.
Types of data and information
Prospective data are collected from the current point into the future. Retrospective, or historical, data are collected on events that have happened in the past. Example: comparing the total return over past years for a particular IRA.
Graphical and Tabular Techniques for Nominal Data
To summarize nominal data, counts (frequencies) are used to describe the number of observations in each category. These counts are reported in a table called a frequency distribution. If the percentages (or proportions) associated with each count are also reported, the table is a relative frequency distribution (i.e., the amount in each group relative to the total).
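As a minimal sketch of a frequency and relative frequency distribution, the following counts coded marital-status responses. The response list is hypothetical, and the function name is chosen here for illustration.

```python
from collections import Counter

# Hypothetical coded responses: 1 = Single, 2 = Married, 3 = Divorced, 4 = Widowed.
labels = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}
responses = [1, 2, 2, 3, 1, 2, 4, 2, 1, 3]

def frequency_distribution(values):
    """Map each category to (count, proportion of the total)."""
    counts = Counter(values)
    total = len(values)
    return {cat: (counts[cat], counts[cat] / total) for cat in sorted(counts)}

table = frequency_distribution(responses)
for code, (count, proportion) in table.items():
    print(f"{labels[code]:<10}{count:>5}{proportion:>8.0%}")
```

The counts column is the frequency distribution; the proportions column turns it into a relative frequency distribution.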
Graphical and Tabular Techniques for Nominal Data
Bar chart: often used to display frequencies. Each bar represents a category, and the height of the bar represents the frequency. The width of the base of the bar is arbitrary.
Pie chart: shows relative frequencies. A circle is divided into a number of slices, each of which represents a category. The size of each slice is proportional to the percentage for the corresponding category.
Graphical and Tabular Techniques for Interval Data
The histogram, the stem-and-leaf plot, and the ogive are used when the data are interval (i.e. numeric, non-categorical). The most important of these graphical methods is the histogram. The histogram is not only a powerful graphical technique for summarizing interval data; it is also used to help explain probabilities.
Histogram
A bar-like graph where the bases are intervals and the heights are frequencies.
Steps to building a histogram:
1. Collect data
2. Create a frequency distribution
3. Draw the histogram
Interpretation of the histogram: symmetry, skewness, modality, bell shape.
Frequency Distribution for Interval Data
Interval data have to be broken down into a series of intervals, or classes. Table 2.6 shows the approximate number of classes in a frequency distribution based on the number of observations (more observations, more classes). Alternatively, use Sturges’ formula: Number of class intervals = 1 + 3.3 log10(n), where n is the number of observations. Determine the class width: (largest observation - smallest observation) / number of classes. Then summarize the data into classes by counting the observations that fall in each.
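A minimal sketch of these steps, assuming the common form of Sturges’ formula, 1 + 3.3 log10(n), rounded to the nearest whole number. The data values and function names are hypothetical, and the class width is left unrounded for simplicity.

```python
import math

def sturges_classes(n):
    """Approximate number of classes: 1 + 3.3 * log10(n), rounded."""
    return round(1 + 3.3 * math.log10(n))

def class_frequencies(data, k):
    """Split the range of data into k equal-width classes and count each one."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The largest observation is placed in the last class.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return width, counts

# Hypothetical observations.
data = [12, 15, 21, 22, 25, 28, 30, 31, 34, 35, 38, 41, 42, 47, 52, 60]

k = sturges_classes(len(data))
width, counts = class_frequencies(data, k)
print(k, width, counts)
```

In practice the width is usually rounded up to a convenient number so that the class limits are easy to read.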
Interpretation: Symmetry
A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size.
[Figure: three example symmetric histograms; axes labeled Frequency and Variable]
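Interpretation: Skewness
A skewed histogram is one with a long tail extending to either the right or the left. A tail extending to the right means the histogram is positively skewed; a tail extending to the left means it is negatively skewed.
[Figure: two example histograms, Positively Skewed and Negatively Skewed; axes labeled Frequency and Variable]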
Interpretation: Modality
A unimodal histogram is one with a single peak, while a bimodal histogram is one with two peaks. A modal class is the class with the largest number of observations.
[Figure: two example histograms, Unimodal and Bimodal; axes labeled Frequency and Variable]
Interpretation: Bell Shape
A special type of symmetric unimodal histogram is one that is bell shaped.
[Figure: example Bell Shaped histogram; axes labeled Frequency and Variable]
Many statistical techniques require that the population be bell shaped. Drawing the histogram helps verify the shape of the population in question. Let’s see…
Stem-and-leaf plot
A graphical technique often used in preliminary analyses. The actual value of each observation is used to build the plot (as opposed to summaries, as in the histogram). Each observation is split into a stem and a leaf and plotted.
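Creating a Stem-and-Leaf Plot
Start with a single observation value. There are several ways to split it into a stem and a leaf. We could split it at the decimal point, or split it at the “tens” position (first rounding to 42, which gives stem 4 and leaf 2).
[Figure: Stem and Leaf columns for each way of splitting]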
Creating a Stem-and-Leaf Plot
Continue this process for all the observations. Then line up the stems in increasing numerical order, with the leaves to the right (also in increasing numerical order).
[Figure: completed Stem and Leaf table]
Thus, we still have access to each original data point’s value! The length of each line represents the frequency of the class defined by the stem.
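The procedure can be sketched in a few lines, assuming the split at the “tens” position after rounding to whole numbers. The observations are hypothetical non-negative values, and stems with no leaves are simply omitted in this sketch.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Round each value, then split it at the tens position: stem = tens, leaf = units."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(round(v), 10)
        rows[stem].append(leaf)
    # Stems in increasing order, leaves in increasing order within each stem.
    return {stem: sorted(rows[stem]) for stem in sorted(rows)}

# Hypothetical observations (assumed non-negative).
observations = [42.3, 38.45, 29.23, 89.35, 45.77, 35.06, 26.84, 41.38, 30.62, 49.91]

plot = stem_and_leaf(observations)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}")
```

The number of leaves on each printed line is the frequency of the class defined by that stem, which is why the plot doubles as a rough histogram.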
Ogives
Pronounced “Oh-jive”. An ogive is a graph of a cumulative relative frequency distribution.
Steps to an ogive:
1. Create the relative frequency distribution
2. Calculate the cumulative relative frequencies
3. Graph the cumulative relative frequencies
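The first two steps can be sketched as follows; the class limits and frequencies are hypothetical, and the function name is chosen for illustration. Plotting each class’s upper limit against its cumulative relative frequency gives the ogive.

```python
from itertools import accumulate

# Hypothetical class upper limits and class frequencies.
upper_limits = [20, 30, 40, 50, 60]
frequencies = [4, 10, 16, 6, 4]

def ogive_points(counts):
    """Return (relative, cumulative relative) frequencies for ordered classes."""
    total = sum(counts)
    relative = [c / total for c in counts]
    # Accumulate the counts first, then divide, to avoid floating-point drift.
    cumulative = [running / total for running in accumulate(counts)]
    return relative, cumulative

relative, cumulative = ogive_points(frequencies)
for limit, rel, cum in zip(upper_limits, relative, cumulative):
    print(f"<= {limit:>3}  {rel:6.2f}  {cum:6.2f}")
```

The last cumulative value is always 1, since every observation falls at or below the largest upper limit.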
Interpretation: Ogive
Q: 75% of first-year accountants earn no more than what salary? A: About $58,600.
Q: An accounting firm is offering Suzy $55,000 for her first year. Where does she fall among her peers? A: Around the 50th percentile.
Graphical and Tabular Techniques for Two Variables
Sometimes it is of interest to summarize more than one variable in a dataset or to compare similar variables from two groups. Different combinations of variables: Nominal/Ordinal, Nominal/Nominal, Ordinal/Ordinal, and time-series data. Techniques: contingency tables, scatter diagrams, bar graphs, line plots, side-by-side box plots, and back-to-back stem plots.
Tabular Techniques For Two Nominal Variables
A contingency table (also called a cross-classification table or cross-tabulation table) is used to describe the relationship between two nominal variables. A contingency table lists the frequency of each combination of the values of the two variables. Relative frequencies are also reported in the table. The data can then be summarized graphically with a bar chart.
Creating a Contingency Table
The rows and columns of the table represent the categories of each variable. Each combination of the levels of the two variables is summarized in a cell of the table. A single reader’s response, for instance, is counted as part of the total in the cell that matches that reader’s combination of categories (Example 2.8).
Interpretation of a Contingency Table
Percentages or proportions are calculated to allow comparisons across cells. Identify patterns that appear among the data.
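A minimal sketch of building a contingency table and its row proportions. The category names and responses are hypothetical (they are not the data of Example 2.8), and both function names are chosen for illustration.

```python
from collections import Counter

def contingency_table(pairs):
    """Count each (row category, column category) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Combinations that never occur get a count of 0.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

def row_proportions(table):
    """Divide each cell by its row total so rows can be compared."""
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: n / total for c, n in cells.items()}
    return result

# Hypothetical survey: (occupation, newspaper read).
responses = (
    [("Blue collar", "Paper A")] * 3 + [("Blue collar", "Paper B")] * 1
    + [("White collar", "Paper A")] * 2 + [("White collar", "Paper B")] * 4
)

table = contingency_table(responses)
proportions = row_proportions(table)
print(table)
print(proportions)
```

If the two variables were unrelated, the row proportions would be roughly the same in every row; differences between rows are the patterns to look for.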
Using the Pivot Table in Excel
To create a contingency table in Excel to summarize or graph data, you can use the Pivot Table. Let’s see…
Graphical Comparison Between Two Interval Variables
Sometimes we are interested in how two interval variables are related. To explore this relationship, we employ a scatter diagram, or scatterplot, which plots the two variables against one another. The independent (predictor, explanatory) variable is labeled X and is usually placed on the horizontal axis, while the dependent (outcome, response) variable, Y, is mapped to the vertical axis. Let’s see…
Interpretation: Scatterplot
[Figure: three example scatterplots: Positive Linear Relationship, Negative Linear Relationship, Weak or Non-Linear Relationship]
Strength: the extent to which the data points fit the pattern in the plot.
Pattern: do the data fall in a linear pattern? The pattern tells us whether there is a relationship.
Direction: the direction in which the data fall tells us whether the relationship is positive or negative.
Relationship Between One Nominal and One Interval Variable
A bar chart is an effective way to summarize the data, with one set of bars for each level of the nominal variable. Other options are the back-to-back stem plot and the side-by-side box plot.
Time Series Plot
Observations measured at successive points in time are called time-series data. Time-series data are graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis. A line chart can also be used to compare multiple groups over time. Let’s see…
Time Series Plot
Example: total amounts of U.S. income tax for the years 1987 to 2002. Identify patterns in the chart in order to see what is happening over time. From ’87 to ’92, the tax was fairly flat. Starting in ’93, there was a rapid increase in taxes. Finally, there was a downturn in 2002. We could plot several lines here as well to compare groups.