Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 02 Graphical and Tabular Descriptive Techniques
Definitions
A variable is some characteristic of a population or sample, e.g. student grades. Variables are typically denoted with a capital letter: X, Y, Z, ...
The values of a variable are the range of possible values it can take, e.g. student marks (0..100).
Data are the observed values of a variable, e.g. student marks: {67, 74, 71, 83, 93, 55, 48}.
Interval Data
Interval data are real numbers, e.g. heights, weights, prices. They are also referred to as quantitative or numerical data. Arithmetic operations can be performed on interval data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.
Nominal Data
The values of nominal data are categories, e.g. responses to questions about marital status, coded as: Single = 1, Married = 2, Divorced = 3, Widowed = 4. Because the numbers are arbitrary, arithmetic operations on them do not make sense (e.g. does Widowed ÷ 2 = Married?!). Nominal data are also called qualitative or categorical.
Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order, a ranking, e.g. a college course rating system: poor = 1, fair = 2, good = 3, very good = 4, excellent = 5. While it is still not meaningful to do arithmetic on these data (e.g. does 2*fair = very good?!), we can say things like excellent > poor or fair < very good. That is, the order is maintained no matter what numeric values are assigned to each category.
Time-series data: ordered data values observed over time.
Cross-sectional data: data values observed at a fixed point in time.
Ex2.1 - For each of the following examples of data, determine whether the data are quantitative, qualitative, or ranked.
a. the month of the highest sales for each firm in a sample (qualitative)
b. the department in which each of a sample of university professors teaches (qualitative)
c. the weekly closing price of gold throughout a year (quantitative)
d. the size of soft drink (large, medium, or small) ordered by a sample of customers in a restaurant (ranked)
e. the number of barrels of crude oil imported monthly by the United States (quantitative)
Tabular and Graphical Techniques for Nominal Data
Categorical variables: frequency distribution, bar chart, pie chart, Pareto diagram
Numerical variables: line chart, frequency distribution, histogram and ogive, stem-and-leaf display, scatter plot
The only allowable calculation on nominal data is to count the frequency of each value of the variable. We can summarize the data in a table, called a frequency distribution, that presents the categories and their counts. A relative frequency distribution lists the categories and the proportion with which each occurs.
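As a minimal sketch of this counting step, the snippet below builds a frequency and a relative frequency distribution for nominal data. The `responses` list and the `frequency_table` helper are hypothetical and invented for illustration; they are not data or code from these notes.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (count, relative_frequency)} for nominal data."""
    counts = Counter(observations)
    n = len(observations)
    return {category: (count, count / n) for category, count in counts.items()}

# Hypothetical marital-status responses, invented for illustration only.
responses = ["Single", "Married", "Married", "Divorced", "Single",
             "Married", "Widowed", "Married", "Single", "Married"]

table = frequency_table(responses)
for category, (count, rel) in sorted(table.items()):
    # Columns: category, frequency, relative frequency in percent.
    print(f"{category:<10}{count:>4}{rel:>8.1%}")
```

Only counting is used here, which matches the point that counting is the one calculation nominal data allow.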
Ex2.2 – The student placement office at a university conducted a survey of last year’s business school graduates to determine the general areas in which the graduates found jobs. The placement office intended to use the resulting information to help decide where to concentrate its efforts in attracting companies to campus to conduct job interviews. The areas of employment are:
1. Accounting
2. Finance
3. General Management
4. Marketing/Sales
5. Other
Frequency table for Ex2.2. Columns: Area, Frequency, Relative Frequency (%). Rows: Accounting, Finance, General Management, Marketing/Sales, Other, Total.
It is all the same information (based on the same data), just a different presentation.
Graphical Techniques for Interval Data
There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical). The most important of these is the histogram. The histogram is not only a powerful graphical technique for summarizing interval data, but it is also used to help explain probabilities.
To draw a histogram:
1. Collect the data.
2. Create a frequency distribution for the data.
3. Draw the histogram.
Ex2.3 – Draw a histogram for the long-distance telephone bills data.
Frequency distribution and histogram for Ex2.3:
Class Limits    Frequency
0 to 15         71
15 to 30        37
30 to 45        13
45 to 60        9
60 to 75        10
75 to 90        18
90 to 105       28
105 to 120      14
Total           200
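The step of turning raw observations into a frequency distribution can be sketched as follows. This is an illustration under stated assumptions: classes have a width of 15, each class is assumed to include its upper limit, and the `bills` list is made up (it is not the telephone bills data of Ex2.3). The helper name `class_frequencies` is hypothetical.

```python
import math

def class_frequencies(values, width, n_classes):
    """Count values per class; class k covers (k*width, (k+1)*width].

    Assumption: each class includes its upper limit, and a value of 0
    falls in the first class.
    """
    freqs = [0] * n_classes
    for v in values:
        k = max(math.ceil(v / width) - 1, 0)
        if v < 0 or k >= n_classes:
            raise ValueError(f"{v} lies outside the class limits")
        freqs[k] += 1
    return freqs

# Hypothetical bills, invented for illustration (not the Ex2.3 data).
bills = [0.0, 8.5, 15.0, 15.01, 29.9, 42.3, 61.0, 75.0, 88.2, 104.5, 119.6]
freqs = class_frequencies(bills, width=15, n_classes=8)

# A text histogram: one '*' per observation in each class.
for k, f in enumerate(freqs):
    print(f"{k * 15:>3} to {(k + 1) * 15:<3} | {'*' * f}")
```

Whether a boundary value such as 15 belongs to the lower or the upper class is a convention; what matters is that every observation falls in exactly one class.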
Skewness
A measure of the degree of asymmetry of a frequency distribution. A distribution can be skewed to the left, skewed to the right, or symmetric (unskewed).
Kurtosis
A measure of the flatness or peakedness of a frequency distribution. A distribution can be relatively peaked, relatively flat, or normal.
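These notes describe skewness and kurtosis only in words. As a hedged illustration, the sketch below uses the moment-based coefficients, which are one common choice and an assumption here: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment. The function names and the sample lists are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean)**k."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    """Moment coefficient of skewness (assumed definition)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment coefficient of kurtosis minus 3 (assumed definition)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

# Hypothetical samples, invented for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

Under this definition a negative skewness indicates a distribution skewed to the left, a positive value one skewed to the right, and a value near zero a roughly symmetric one. An excess kurtosis above zero is commonly read as relatively peaked and below zero as relatively flat, compared with the normal distribution.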
Stem-and-Leaf Display (Stemplot)
A stemplot (or stem-and-leaf display) is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution.
Ex2.4 - Given the weights in pounds of a group of workers, construct a stem-and-leaf display for these data.
The first step in constructing a stem-and-leaf display is to decide how to split each observation (weight) into two parts: a stem and a leaf. Each weight is recorded in a table with the columns Weight, Stem, and Leaf.
Next, we consider each observation in turn and place its leaf in the same row as its stem, to the right of the vertical line. The resulting stem-and-leaf display groups the 25 weights into five categories.
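The construction just described can be sketched in a few lines. This is an illustration only: the `weights` list is made up (the Ex2.4 data are not reproduced here), and the split assumed is that the last digit is the leaf and the remaining digits form the stem. The helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 152 -> stem 15, leaf 2
        rows[stem].append(leaf)
    return dict(rows)

# Hypothetical weights in pounds, invented for illustration.
weights = [152, 148, 163, 171, 159, 166, 145, 180, 174, 168]

display = stem_and_leaf(weights)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>3} | {leaves}")
```

The sketch assumes whole-number observations and prints only stems that have at least one leaf.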
Ogive
An ogive is a graph of a cumulative frequency distribution. We create an ogive in three steps:
1. Calculate relative frequencies.
2. Calculate cumulative relative frequencies by adding the current class’s relative frequency to the previous class’s cumulative relative frequency. (For the first class, its cumulative relative frequency is just its relative frequency.)
3. Draw the ogive.
Ex2.5 – Draw an ogive for the data given in Ex2.3.
Class Limits    Relative Frequency    Cumulative Rel. Freq.
0 to 15         71/200 = 0.355        0.355
15 to 30        37/200 = 0.185        0.540
30 to 45        13/200 = 0.065        0.605
45 to 60        9/200 = 0.045         0.650
60 to 75        10/200 = 0.050        0.700
75 to 90        18/200 = 0.090        0.790
90 to 105       28/200 = 0.140        0.930
105 to 120      14/200 = 0.070        1.000
Total           200/200 = 1.000
Figure: Ogive for Telephone Bills
What telephone bill value is at the 50th percentile?
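The three ogive steps can be checked numerically with the class frequencies of the telephone bills table (71, 37, 13, 9, 10, 18, 28, 14 out of 200). The sketch below also estimates a percentile by reading the ogive with linear interpolation; that interpolation assumes observations are spread evenly within each class, which is an assumption of this example and not a claim made in the notes. The function name `percentile_from_ogive` is hypothetical.

```python
from itertools import accumulate

upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]  # class frequencies; total 200

n = sum(frequencies)
relative = [f / n for f in frequencies]   # step 1: relative frequencies
cumulative = list(accumulate(relative))   # step 2: cumulative relative frequencies

def percentile_from_ogive(p, lower=0.0):
    """Value at cumulative proportion p, by linear interpolation between
    ogive points (assumes values are spread evenly within a class)."""
    prev_x, prev_c = lower, 0.0
    for x, c in zip(upper_limits, cumulative):
        if p <= c:
            return prev_x + (p - prev_c) / (c - prev_c) * (x - prev_x)
        prev_x, prev_c = x, c
    return float(upper_limits[-1])

median_bill = percentile_from_ogive(0.50)
print(round(median_bill, 2))
```

Because the cumulative relative frequency passes 0.50 inside the second class, the estimated 50th percentile lies between 15 and 30.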
Describing the Relationship Between Two Variables
Tabular method: A contingency table (also called a cross-classification or cross-tabulation table) is used to describe the relationship between two nominal variables.
Ex2.5 – To help plan advertising campaigns, the advertising managers of the newspapers need to know which segments of the newspaper market are reading their papers. Discuss the relationship between newspaper and occupation using the contingency table.
Contingency table. Columns (Occupation): Blue Collar, White Collar, Professional, Total. Rows (Newspaper): G&M, Post, Star, Sun, Total.
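A contingency table is built by counting how often each pair of categories occurs together. The sketch below shows that counting for the newspaper and occupation categories named above; the `responses` list is entirely made up for illustration and does not reproduce the survey counts.

```python
from collections import Counter

# Hypothetical (newspaper, occupation) responses, invented for illustration.
responses = [
    ("G&M", "Professional"), ("Post", "Blue Collar"), ("Star", "White Collar"),
    ("Sun", "Blue Collar"), ("G&M", "White Collar"), ("Post", "Blue Collar"),
    ("Star", "Blue Collar"), ("G&M", "Professional"), ("Sun", "White Collar"),
    ("Post", "White Collar"),
]

cell_counts = Counter(responses)                        # joint counts
row_totals = Counter(paper for paper, _ in responses)   # totals per newspaper
col_totals = Counter(job for _, job in responses)       # totals per occupation

papers = ["G&M", "Post", "Star", "Sun"]
jobs = ["Blue Collar", "White Collar", "Professional"]

print(f"{'Newspaper':<10}" + "".join(f"{j:>14}" for j in jobs) + f"{'Total':>8}")
for p in papers:
    cells = "".join(f"{cell_counts[(p, j)]:>14}" for j in jobs)
    print(f"{p:<10}{cells}{row_totals[p]:>8}")
print(f"{'Total':<10}" + "".join(f"{col_totals[j]:>14}" for j in jobs)
      + f"{len(responses):>8}")
```

Comparing the rows of such a table, ideally as row proportions, is one way to discuss whether the occupation mix differs between newspapers.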
Graphical method (scatter diagram):
Ex2.6 - A real estate agent wanted to know to what extent the selling price of a home is related to its size. Use a graphical technique to describe the relationship between size and price. The data are recorded in two columns: Size and Price.
We create a scatter diagram in three steps:
1) Collect the data.
2) Determine the independent variable (X – house size) and the dependent variable (Y – selling price).
3) Use Excel to create a scatter diagram.
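The notes use Excel for the plot. As an alternative sketch, the same three steps can be followed in Python with a rough text-based scatter diagram. The `sizes` and `prices` lists, their units, and the helper `text_scatter` are hypothetical and invented for illustration; they are not the Ex2.6 data.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a list of text rows plotting (x, y) points with '*'.

    The horizontal axis is x (independent variable), the vertical axis is y
    (dependent variable); larger y values appear nearer the top.
    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(r) for r in grid]

# Hypothetical house sizes (square feet) and prices ($1000s), invented here.
sizes = [1200, 1500, 1700, 1900, 2100, 2400, 2800]
prices = [210, 245, 250, 290, 310, 330, 395]

for line in text_scatter(sizes, prices):
    print("|" + line)
print("+" + "-" * 40)
```

With these made-up values the points rise from the lower left to the upper right, which is the pattern expected when larger houses sell for more.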
Scatter Diagram
It appears that there is in fact a relationship: the greater the house size, the greater the selling price.
When reading a scatter diagram, we are interested in two concepts: linearity (whether the points follow a straight-line pattern) and direction (whether Y tends to increase or decrease as X increases).
Describing Time Series Data
Observations measured at the same point in time are called cross-sectional data. Observations measured at successive points in time are called time-series data. Time-series data are graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.
Exercises
Q2.1 - Identify the type of data observed for each of the following variables.
a. the number of students in a statistics class
b. the student evaluations of the professor (1 = poor, 5 = excellent)
c. the political preferences of voters
d. the states in the United States of America
e. the size of a condominium (in square feet)
Q2.2 - For the salaries (in hundreds of dollars) of a sample of 40 government employees:
a. Construct a stem-and-leaf display for these data. (When leaves consist of two digits, they should be separated from one another by commas.)
b. Construct a frequency distribution for these data.
c. Construct a relative frequency histogram for the data.
d. Construct an ogive for the data.
Q2.3 - The number of men and women who have received an M.B.A. degree from a particular university in each of five years is recorded in a table with the columns Year, Men, and Women.
a. Use a component bar chart to depict these data.
b. Use a line chart to depict these data.
Q2.4 - The manager of a large furniture store wanted to know if the number of ads influenced the number of customers. During the past eight months, she kept track of both figures, recorded in a table with the columns Month, Number of Ads (x), and Number of Customers (y). Construct a scatter diagram for these data, and describe the relationship between the number of ads and the number of customers.