Picturing Distributions with Graphs BPS chapter 1 Chapter 1: Picturing Distributions with Graphs © 2006 W. H. Freeman and Company
Objectives (BPS chapter 1): Picturing Distributions with Graphs
- Individuals and variables
- Two types of data: categorical and quantitative
- Levels of data
- Ways to chart quantitative data: histograms and stemplots
- Interpreting histograms
- Time plots
What is STATISTICS?
- Using “data” to draw a conclusion about something unknown.
- Decision making in the presence of uncertainty.
Statistics: Meaning?
- Method of analysis: a collection of methods for planning experiments or observational studies, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.
- Our book: Statistics is the science (or ‘art’) of data.
Math 161 Spring 2008, Chapters 10-11, 31st Jan.
What Is “Data”?
- Pieces of information.
- Numbers.
- These are data only if the information has a meaning attached.
The Nature of Data
- Data are observations that have been collected.
- An observation may be numerical (example: age, height, GPA) or non-numerical (example: gender, eye colour, province of residence).
Individuals and variables
- Individuals are the objects described by a set of data. Individuals may be people, animals, or things. Example: freshmen, 6-week-old babies, golden retrievers, fields of corn, cells.
- A variable is any characteristic of an individual. A variable can take different values for different individuals. Example: age, height, blood pressure, ethnicity, length, first language.
Two types of variables
- Quantitative: something that can be counted or measured for each individual and then added, subtracted, averaged, etc., across individuals in the population. Example: how tall you are, your age, your blood cholesterol level, the number of credit cards you own.
- Categorical: something that falls into one of several categories. What can be computed is the count or proportion of individuals in each category. Example: your blood type (A, B, AB, O), your hair color, your ethnicity, whether or not you paid income tax last tax year.
- See page 5 for description of variables.
Levels of Measurement
1. Nominal: characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high). Example: survey responses may be yes, no, or undecided; eye colour, gender, etc. (page 7 of text)
2. Ordinal: involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. Example: course grades (A, B, C, D, or F); dress size (small, medium, large, XL).
Note: Understanding the differences between the levels of data will help students later in determining what type of statistical tests to use. Nominal and ordinal data should not be used for calculations (even when assigned ‘numbers’ for computerization), as differences and magnitudes of differences are meaningless.
3. Interval: like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, there is no natural zero starting point (where none of the quantity is present). Example: the years 1000, 2000, 1776, and 1492; temperature in °C (0 °C does not mean no temperature).
Note: Students usually have some difficulty understanding the difference between interval and ratio data. Fortunately, interval data occur in very few instances.
4. Ratio: the interval level modified to include the natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences and ratios are meaningful. Example: prices of college textbooks.
Levels of Measurement (review of the four levels of measurement)
- ______________________: categories only
- ______________________: categories with some order
- ______________________: differences but no natural starting point
- ______________________: differences and a natural starting point
Summary
Exploratory Data Analysis (EDA)
- Distribution: tells what values a variable takes and how often it takes these values. Can be a table, graph, or function.
EDA Cont.
- The W’s: Who, What (in what units), When, Where, Why.
“University Students are Healthier than you Think”
A Spring 2002 random undergraduate classroom survey of n=810 students was conducted by the Office of Health Promotion within a University Student Health Services, Division of Student Affairs. Statistics from this survey led to the following conclusions:
- most students (67%) have 0-4 drinks when they go out
- most (69%) have had 0-1 sex partners in the past year
- most (76%) either don’t drink, or use designated drivers if they do
Identify the W’s.
Important Characteristics of Data
The following characteristics of data are usually important:
- Center: An average value that indicates where the middle of the dataset is located.
- Variation/Spread: A measure of the amount of variation in the data (average variation from the center).
- Distribution: The shape of the distribution of the data (symmetric, uniform, or skewed).
- Outliers: Sample values that lie far away from the vast majority of the other values.
- Time: Trends, that is, changing characteristics of the data over time.
Presentation of Data
To better analyze a dataset, we first present it in a summarized form using:
- Frequency tables
- Pictures or graphs
- Numerical summaries (center and variation)
Frequency Tables
A frequency table lists data values (either individually or in groups of intervals called classes), along with the number of items that fall into each class (the frequency).
Example:
Test Score   Frequency
0-4          3
5-9          10
10-14        12
15-19        35
20-24        20
25-29        15
30-34        5
This frequency table has 7 classes (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34). The frequency represents the number of students receiving a score in that class.
Example
The heights (in inches) of 30 students are as follows:
68 64 70 67 67 68 64 65 68 64
70 72 71 69 72 64 63 70 71 63
68 67 67 65 69 65 67 66 61 65
Create a frequency table for the above data using the classes 60-61, 62-63, 64-65, etc.
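One way to work the example above is sketched here in Python. The function name `frequency_table` and its `start` and `width` parameters are illustrative choices, not part of the course material, and the sketch assumes integer data and equal-width classes.

```python
from collections import Counter

# Heights (in inches) of the 30 students from the example.
heights = [
    68, 64, 70, 67, 67, 68, 64, 65, 68, 64,
    70, 72, 71, 69, 72, 64, 63, 70, 71, 63,
    68, 67, 67, 65, 69, 65, 67, 66, 61, 65,
]

def frequency_table(values, start, width):
    """Count integer values in the classes start..start+width-1, and so on.

    Hypothetical helper for illustration; assumes every value is >= start.
    """
    # Each value belongs to the class with index (value - start) // width.
    counts = Counter((v - start) // width for v in values)
    n_classes = max(counts) + 1
    table = []
    for i in range(n_classes):
        low = start + i * width
        high = low + width - 1
        table.append(((low, high), counts.get(i, 0)))
    return table

table = frequency_table(heights, start=60, width=2)
for (low, high), freq in table:
    print(f"{low}-{high}: {freq}")
```

The frequencies should add up to 30, the number of students, which is a quick check that no observation was missed or counted twice.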
RELATIVE FREQUENCY TABLES
- Relative frequency = frequency / total # of items
- The relative frequency gives the proportion (or, multiplied by 100, the percent) of items in each class.
- A relative frequency table is a frequency table with a column for the relative frequencies.
- The relative frequencies might not add to exactly 1 (100%) due to rounding.
Example: Construct a relative frequency table for our last example.
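A minimal Python sketch of the relative frequency calculation follows. The class frequencies are the ones obtained by tallying the 30 heights from the previous example; the variable names and the choice to round to three decimal places are assumptions made for illustration.

```python
# Class frequencies tallied from the 30 heights in the previous example.
frequencies = {
    "60-61": 1, "62-63": 2, "64-65": 8, "66-67": 6,
    "68-69": 6, "70-71": 5, "72-73": 2,
}

total = sum(frequencies.values())

# Relative frequency = frequency / total number of items, rounded to 3 places.
relative = {cls: round(f / total, 3) for cls, f in frequencies.items()}

print("Class  Freq  Rel. freq")
for cls, f in frequencies.items():
    print(f"{cls}  {f:>4}  {relative[cls]:.3f}")

# The rounded values need not add to exactly 1.
rounded_sum = sum(relative.values())
print(f"Sum of rounded relative frequencies: {rounded_sum:.3f}")
```

With this rounding the column adds to 1.001 rather than 1, which illustrates the rounding remark above.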
Graphs for Categorical Data
A picture (a good one) is worth a thousand words.
Bar Graph
- Horizontal axis represents the categories.
- Vertical axis represents the frequencies.
- A bar whose height is proportional to the frequency is drawn centered at the category.
Example
The following table gives the grade distribution of a Math 161 test:
Grade   Frequency
A       5
B       7
C       12
D       5
F       3
Draw a bar graph for the data.
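As a rough illustration, the grade table can be turned into a text-only bar graph in Python. Note that this sketch draws the bars sideways (categories down the side, bar length proportional to frequency) because that is easier to print; the helper name `text_bar_graph` is hypothetical.

```python
# Grade frequencies from the Math 161 test example.
grades = {"A": 5, "B": 7, "C": 12, "D": 5, "F": 3}

def text_bar_graph(freqs, symbol="#"):
    """Return one line per category; the bar length equals the frequency."""
    return [f"{cat} | {symbol * n} ({n})" for cat, n in freqs.items()]

lines = text_bar_graph(grades)
print("\n".join(lines))
```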
Top 10 causes of death in the U.S., 2001
- Bar graph sorted by rank: easy to analyze.
- Bar graph sorted alphabetically: much less useful.
Graphs for Categorical Data: Double (Side-by-side) Bar Graphs
- Used to compare two different distributions.
- For each category, draw two adjacent bars (one for each distribution).
Example
Suppose we now have the grades for two sections of Math 161:
        Frequency
Grade   Section 1   Section 2
A       5           3
B       7           5
C       12          9
D       5           4
F       3           1
Draw a double bar graph for the data.
Example
Are the frequency bar graphs the right way to compare the performance of the two sections? Note that the class sizes are not the same. What would be a better way to compare the two sections?
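One possible answer is to compare percentages (relative frequencies) instead of raw counts, since the sections have different sizes. The Python sketch below does this for the table above; the helper name `percentages` is an illustrative assumption.

```python
# Grade frequencies for the two sections of Math 161.
section_1 = {"A": 5, "B": 7, "C": 12, "D": 5, "F": 3}
section_2 = {"A": 3, "B": 5, "C": 9, "D": 4, "F": 1}

def percentages(freqs):
    """Convert frequencies to percentages of the section total."""
    total = sum(freqs.values())
    return {grade: 100 * n / total for grade, n in freqs.items()}

pct_1 = percentages(section_1)
pct_2 = percentages(section_2)

print("Grade  Section 1  Section 2")
for grade in section_1:
    print(f"{grade:<5}  {pct_1[grade]:>8.1f}%  {pct_2[grade]:>8.1f}%")
```

Section 1 has 32 students and Section 2 has 22. Section 1 has more C grades in absolute terms (12 versus 9), but a smaller share of C grades (37.5% versus roughly 40.9%), which is the kind of difference raw frequencies hide.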
Graphs for Categorical Data: Pie Chart
- Shows the whole group of categories in a circle.
- Shows the parts of some whole.
- The area of the sector representing a category is proportional to the frequency of the category.
Example
The following table gives the grade distribution of a Math 161 test:
Grade   Frequency
A       5
B       7
C       12
D       5
F       3
Draw a pie chart for the data.
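To draw the pie chart by hand, each sector needs a central angle. Because sectors of the same circle have areas proportional to their central angles, the angle for a category is 360 degrees times its relative frequency. The Python sketch below applies that to the grade table; the variable names are illustrative.

```python
# Grade frequencies from the Math 161 test example.
grades = {"A": 5, "B": 7, "C": 12, "D": 5, "F": 3}

total = sum(grades.values())

# Central angle (in degrees) = 360 * frequency / total.
angles = {grade: 360 * n / total for grade, n in grades.items()}

for grade, angle in angles.items():
    print(f"{grade}: {angle:.2f} degrees")
```

The angles should add up to 360 degrees; for example, the C sector takes 135 degrees because 12 of the 32 grades are C.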
Example: The pie chart shows the distribution of the type of housing a random sample of VIU students have. If 20 students live in apartments, how many students live in shared houses?
A) 15  B) 20  C) 45  D) 60  E) 75
What central angle does the sector for Dorm have?
Pictographs
- A picture of a set of small figures or icons used to represent data, and often to represent trends.
- Usually, the icons are suggestively related to the data being represented.
- They can be misleading: double the length, width, and height of a cube, and the volume increases by a factor of eight.
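The cube remark can be checked with a few lines of Python. The helper `scale_factors` is a hypothetical name; it simply records that scaling every dimension by k multiplies area by k squared and volume by k cubed.

```python
def scale_factors(k):
    """Effect of multiplying every dimension of a figure by k."""
    return {"length": k, "area": k ** 2, "volume": k ** 3}

print(scale_factors(2))
```

So an icon drawn twice as tall and twice as wide to show a doubled quantity covers four times the area, and a three-dimensional icon scaled the same way has eight times the volume, which is why pictographs can exaggerate differences.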
Time (Line) Graphs
- A time graph shows behavior over time.
- Time is always on the horizontal axis.
- Look for an overall pattern (trend).
- Look for patterns that repeat at known regular intervals (seasonal variations).
- Look for any striking deviations that might indicate unusual occurrences.
Misleading Graphs
Changing the scale of a line graph or a bar graph can make increases or decreases appear more rapid. Both graphs plot the same data. Which one makes the increase in cancer deaths appear more rapid? Which graph would a cancer advocate use?
Salaries of People with Bachelor’s Degrees and with High School Diplomas
Which graph is misleading?
[Figure: two bar graphs, (a) and (b), each comparing Bachelor Degree ($40,500) with High School Diploma ($24,400) on different vertical scales.]
Graphs whose vertical scales do not start at 0 will give a misleading representation of the differences in heights of the bars. (page 11 of text)
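The effect of a truncated axis can be quantified with a short Python sketch. The two salaries come from the slide; the baseline of $20,000 for the truncated graph is an assumption chosen for illustration, and `visible_height_ratio` is a hypothetical helper name.

```python
# Salaries shown on the slide.
bachelor = 40_500
high_school = 24_400

def visible_height_ratio(a, b, axis_start):
    """Ratio of the drawn bar heights when the vertical axis begins at axis_start."""
    return (a - axis_start) / (b - axis_start)

# Axis starting at 0: bar heights reflect the true ratio of the salaries.
honest = visible_height_ratio(bachelor, high_school, axis_start=0)

# Axis starting at 20,000 (assumed baseline): the difference is exaggerated.
truncated = visible_height_ratio(bachelor, high_school, axis_start=20_000)

print(f"axis from 0: {honest:.2f}; axis from 20,000: {truncated:.2f}")
```

Under these assumptions the bachelor's bar is about 1.66 times as tall as the high school bar when the axis starts at 0, but about 4.66 times as tall when the axis starts at 20,000, even though the data are identical.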
Important skills
Computer programs will construct plots and calculate summary statistics automatically. The important skills for people are:
- Knowing what to use when.
- Interpretation.
The tools used to analyze and summarize data depend upon the type of variable one is interested in.
Principles for plots
The way plots are used depends upon their purpose:
- Exploration. Principle: look at the data in as many different ways as possible, searching for its important features.
- Communication to others (follows exploration). Principle: be selective. Choose the displays that best show a reader the features you have observed.
Making Good Graphs
- Title your graph.
- Make sure labels and legends describe variables and their measurement units.
- Be careful with the scales used.
- Make the data stand out. Avoid distracting grids, artwork, etc.
- Pay attention to what the eye sees. Avoid pictograms and tacky effects.
Key Concepts
- Categorical and quantitative variables
- Distributions
- Pie charts
- Bar graphs
- Line graphs
- Techniques for making good graphs
Graphs for Quantitative Variables
Stemplots (Stem-and-Leaf Plots)
- For quantitative variables.
- Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number). Usually, the last digit is used as the leaf and the remaining digits form the stem.
- If using the last digits as they are results in too many stem values, round the numbers to more convenient values first.
- Write the stems in a vertical column; draw a vertical line to the right of the stems.
- Write each leaf in the row to the right of its stem; order the leaves if desired.
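The steps above can be sketched in Python as follows. The sketch reuses the 30 student heights from the earlier frequency table example, assumes non-negative integers with the last digit as the leaf, and uses a hypothetical helper name `stemplot`.

```python
# Heights (in inches) of the 30 students from the frequency table example.
heights = [
    68, 64, 70, 67, 67, 68, 64, 65, 68, 64,
    70, 72, 71, 69, 72, 64, 63, 70, 71, 63,
    68, 67, 67, 65, 69, 65, 67, 66, 61, 65,
]

def stemplot(values):
    """Map each stem (all digits but the last) to its ordered leaves (last digits)."""
    # Include every stem between the smallest and largest, even if it has no leaves.
    stems = {s: [] for s in range(min(values) // 10, max(values) // 10 + 1)}
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return stems

plot = stemplot(heights)
for stem, leaves in plot.items():
    print(f"{stem} | {''.join(str(d) for d in leaves)}")
```

With only two stems (6 and 7) almost all the leaves land on one row, which is a case where splitting each stem in two gives a more informative display.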
Weight Data
Weights (in pounds) for a group of 40 students.
Weight Data: Stemplot (Stem and Leaf Plot)
[Figure: stemplot of the weight data with stems 10 through 26.]
Key: 20|3 means 203 pounds. Stems = 10’s, leaves = 1’s.
Stem-and-Leaf Plots
- Double stemmed (expanded) stem-and-leaf: If there are a lot of leaves on one stem, we could break it up into two stems, one for the digits 0-4 and the other for the digits 5-9.
- Back to back: Used to compare two different sets of data.
Histogram
A histogram is a bar graph in which
- the horizontal axis represents the items or classes;
- the vertical axis represents the frequencies;
- the heights of the bars are proportional to the frequencies.
There are usually no gaps between the bars (unless some classes have 0 frequencies).
To draw a histogram, we first need to construct a frequency table.
Example: draw a histogram for our weights example.
The number of classes can affect the shape of the histogram. http://www.stat.sc.edu/~west/javahtml/Histogram.html
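To see how the number of classes changes the picture, the Python sketch below counts the same data with two different class widths and prints a sideways text histogram for each. It uses the 30 student heights from the earlier frequency table example (the weight data are not reproduced here); `class_counts` is a hypothetical helper, and each class includes its left endpoint but not its right.

```python
# Heights (in inches) of the 30 students from the frequency table example.
heights = [
    68, 64, 70, 67, 67, 68, 64, 65, 68, 64,
    70, 72, 71, 69, 72, 64, 63, 70, 71, 63,
    68, 67, 67, 65, 69, 65, 67, 66, 61, 65,
]

def class_counts(values, start, width):
    """Counts for the classes [start, start+width), [start+width, start+2*width), ..."""
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    return counts

for width in (2, 4):
    counts = class_counts(heights, start=60, width=width)
    print(f"class width {width}:")
    for i, c in enumerate(counts):
        low = 60 + i * width
        print(f"  {low}-{low + width - 1} | {'#' * c}")
```

With a width of 2 there are seven classes; with a width of 4 there are only four, and the finer detail in the middle of the distribution is no longer visible.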
[Figure: histograms of the same data set; one is not summarized enough, another is too summarized.]
Weight Data: Frequency Table
* Left endpoint is included in the group, right endpoint is not.
Weight Data: Histogram
* Left endpoint is included in the group, right endpoint is not.
Shape of the Data
- Symmetric: bell-shaped, other symmetric shapes
- Asymmetric: skewed to the right, skewed to the left
- Unimodal, bimodal
Symmetric Distributions: mound-shaped, bell-shaped, uniform
Asymmetric Distributions: skewed to the left, skewed to the right
Most common distribution shapes
- Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
- Right-skewed distribution: A distribution is skewed to the right if the right side of the histogram (the side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
- Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.
Number of Books Read for Pleasure
Outliers
- Extreme values, far from the rest of the data.
- May occur naturally.
- May occur due to error in recording.
- May occur due to error in measuring.
- Observational unit may be fundamentally different.
Key Concepts
- Displays (stemplots and histograms)
- Graph shapes: symmetric, skewed to the right, skewed to the left
- Outliers