Presentation is loading. Please wait.

Presentation is loading. Please wait.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Organizing and Summarizing Data 2.

Similar presentations


Presentation on theme: "Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Organizing and Summarizing Data 2."— Presentation transcript:

1 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Organizing and Summarizing Data 2

2 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Organizing Qualitative Data 2.1

3 2-3 When data is collected from a survey or experiment, they must be organized into a manageable form. Data that is not organized is referred to as raw data. Ways to Organize Data Tables (“Frequency Distributions”) Graphs (“Bar Graphs”, “Pie Charts”) Numerical Summaries (Chapter 3)

4 2-4 A frequency distribution is a Table that lists each category of data (Class) and the number of occurrences (frequency) in each Class of data.

5 2-5 EXAMPLE Organizing Qualitative Data into a Frequency Distribution The data on the next slide represent the color of M&M candies in a bag of plain M&Ms. Construct a frequency distribution of the color of plain M&Ms.

6 2-6 Frequency Distribution Class (Color) TallyFrequency Brown||||| ||||| ||12 Yellow||||| 10 Red||||| ||||9 Orange||||| |6 Blue|||3 Green|||||5

7 2-7 The relative frequency is the proportion (ratio) of observations within a class and is found using the formula: A relative frequency distribution lists each class of data and its relative frequency.

8 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Organizing Qualitative Data 2.1

9 2-9 ClassTallyFrequencyRelative Frequency Brown||||| ||||| ||1212/45 ≈ 0.2667 Yellow||||| 100.2222 Red||||| ||||90.2 Orange||||| |60.1333 Blue|||30.0667 Green|||||5 ----- 45 0.1111 -------- ~ 1.0

10 2-10 A bar graph is constructed by labeling each class of data on the horizontal axis and the frequency/relative frequency of that class on the vertical axis. Rectangles of equal width are drawn for each class. The height of each rectangle represents the class’s frequency/relative frequency.

11 2-11

12 2-12

13 2-13 “Pareto” Chart

14 2-14 EXAMPLEComparing Two Data Sets The following data represent the marital status of U.S. residents (millions) in 1990 and 2006. Draw a “side-by-side” relative frequency bar graph of the data. Marital Status19902006 Never married40.455.3 Married112.6127.7 Widowed13.813.9 Divorced15.1 181.9 22.8 219.7

15 2-15 Relative Frequency Marital Status Marital Status in 1990 vs. 2006 1990 2006

16 2-16 EXAMPLE Constructing a “Pie Chart” The following data represent the marital status of U.S. residents (M) in 2006. Draw a pie chart of the data. Marital StatusFrequency Never married55.3 Married127.7 Widowed13.9 Divorced22.8 219.7 M

17 2-17 EXAMPLEConstructing a Pie Chart

18 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Organizing Quantitative Data: The Popular Displays 2.2

19 2-19 The first step in summarizing quantitative data is to determine whether the data are discrete or continuous. If the data are discrete and there are relatively few different values of the variable, the classes of data will be the individual observations. If the data are discrete, but there are many different values of the variables, or if the data are continuous, the classes of data must be created using intervals of numbers (like Ages of Students).

20 2-20 The following data represent the number of available cars in a household based on a random sample of 50 households. Construct a frequency and relative frequency distribution. EXAMPLE Constructing Frequency /Relative Frequency Distribution from Discrete Data 3012111202422212202411324121223321220322232122113530121112024222122024113241212233212203222321221135 Data based on results reported by the United States Bureau of the Census.

21 2-21 # of Cars TallyFrequencyRelative Frequency 0||||44/50 = 0.08 1||||| ||||| |||1313/50 = 0.26 2||||| ||||| ||||| ||||| || 220.44 3||||| ||70.14 4|||30.06 5|10.02

22 2-22 A Histogram (like a Bar Chart) is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency/relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other.

23 2-23

24 2-24

25 2-25 When a data set consists of a large number of different discrete data values or when a data set consists of continuous data, we must create classes by using intervals of numbers.

26 2-26 The following data represents the number of persons aged 25 - 64 who are currently work-disabled. The lower class limit (LCL) of a class is the smallest value within the class while the upper class limit (UCL) of a class is the largest value within the class. The class width (CW) is the difference between consecutive LCLs. The class width of the data given above is 35 – 25 = 10.

27 2-27 EXAMPLEOrganizing Continuous Data into a Frequency/Relative Frequency Distribution The following data represent the time (sec) between eruptions for a random sample of 45 eruptions at Yellowstone’s “ Old Faithful” geyser in Wyoming. Construct a frequency/relative frequency data distribution. Source: Ladonna Hansen, Park Curator

28 2-28 The smallest data value is 672 and the largest data value is 738, so the Range of the data is 738-672 = 66 sec. We will create 7 classes and set the lower class limit of the first class to be 670. (66/7 = 9.4 so round up to 10 to get Class Width)

29 2-29 Time between Eruptions (seconds) TallyFreqRel Freq 670 – 679||22/45 = 0.044 680 - 68900 690 - 699||||| ||70.1556 700 - 709||||| ||||90.2 710 - 719||||| ||||90.2 720 - 729||||| ||||| |110.2444 730 - 739||||| ||7 --- 45 0.1556 -------- ~ 1.0

30 2-30 Guidelines for Determining Class Width Decide on the number of classes. Generally, there should be between 5 and 20 classes. The smaller the data set, the fewer classes you should have. CW = Range / # Classes (rounded up to nearest Integer)

31 2-31 Constructing a Freq/Rel Freq Histogram for Continuous Data Using class width of 10

32 2-32

33 2-33 A stem-and-leaf plot uses digits to the left of the rightmost digit to form the stem. Each rightmost digit forms a leaf. For example, a data value of 147 would have 14 as the stem and 7 as the leaf. Displayed as: 14 | 7 So: 147, 147, and 149  14 | 779

34 2-34 EXAMPLEConstructing a Stem-and-Leaf Plot An individual is considered to be unemployed if they do not have a job, but are actively seeking employment. The following data represent the unemployment rate in each of the 50 States plus the District of Columbia in June, 2008.

35 2-35 StateUnemployment Rate % StateUnemployment Rate % StateUnemployment Rate % Alabama4.7Kentucky6.3North Dakota3.2 Alaska6.8Louisiana3.8Ohio6.6 Arizona4.8Maine5.3Oklahoma3.9 Arkansas5.0Maryland4.0Oregon5.5 California6.9Mass5.2Penn5.2 Colorado5.1Michigan8.5Rhode Island7.5 Conn5.4Minnesota5.3South Carolina6.2 Delaware4.2Mississippi6.9South Dakota2.8 Dist Col6.4Missouri5.7Tenn6.5 Florida5.5Montana4.1Texas4.4 Georgia5.7Nebraska3.3Utah3.2 Hawaii3.8Nevada6.4Vermont4.7 Idaho3.8New Hamp4.0Virginia4.0 Illinois6.8New Jersey5.3Washington5.5 Indiana5.8New Mexico3.9W. Virginia5.3 Iowa4.0New York5.3Wisconsin4.6 Kansas4.3North Carolina6.0Wyoming3.2

36 2-36 We let the stem represent the integer portion of the number and the leaf will be the decimal portion. For example, the stem of Alabama (4.7%) will be 4 and the leaf will be 7 or: 4|7

37 2-37 2 8 3 222388899 4 000012346778 5 0122333334555778 6 02344568899 7 5 8 5

38 2-38 A split stem-and-leaf plot: 2 8 3 2223 3 88899 4 00001234 4 6778 5 0122333334 5 555778 6 02344 6 568899 7 7 5 8 8 5 This stem represents 3.0 – 3.4 This stem represents 3.5 – 3.9

39 2-39 Once a frequency distribution or histogram of continuous data is created, the raw data is no longer available, so some accuracy (true mean, etc.) may also be lost. However, the raw data can be retrieved from an accompanying stem-and-leaf plot. of Stem-and-Leaf Diagrams over Histograms Advantage of Stem-and-Leaf Diagrams over Histograms

40 2-40 DOT PLOT A dot plot is drawn by sorting each data point (observation) horizontally, in ascending (increasing) order, and then placing a dot above the observation each time the data point is observed. This is similar to a Stem & Leaf plot in that it displays the pattern of all the actual data points.

41 2-41

42 2-42 Shapes of Distributions Uniform distribution: the frequency of each value of the variable is constant. Bell-shaped (Normal) distribution: the highest frequency occurs in the middle. Frequencies tail off symmetrically to the left and right. Skewed right : the tail to the right of the peak is longer than the tail to the left. Skewed left : the tail to the left of the peak is longer than the tail to the right.

43 2-43

44 2-44 EXAMPLEIdentifying the Shape of the Distribution Identify the shape of the following histogram which represents the time between eruptions at Old Faithful.

45 Chap 245

46 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Additional Displays of Quantitative Data 2.3

47 2-47 A class midpoint (MP) is the sum of (LCL + UCL) / 2. A frequency polygon is a graph that uses points, connected by lines (connect the dots) to display the freq/rel freq for the classes. Plot a point above each class MP at a height equal to the class frequency, Next, draw straight lines between the points. Two additional lines are drawn connecting each end MP with the horizontal axis.

48 2-48 Time between Eruptions (seconds) Class Midpoint FrequencyRelative Frequency 670 – 679674.520.0444 680 – 689684.500 690 – 699694.570.1556 700 – 709704.590.2 710 – 719714.590.2 720 – 729724.5110.2444 730 – 739734.570.1556

49 2-49 Time (seconds) Frequency Polygon

50 2-50 Ogive A cumulative frequency distribution displays the aggregate freq/rel freq of the classses displayed so far. For discrete data, it displays the total number of observations less than/equal to that class. For continuous data, it displays the total number of observations less than/equal to the UCL of that class. The graph of this distribution is known as an “ogive” (read as “oh-jive”)

51 2-51 Time between Eruptions (seconds) Frequency Relative Frequency Cumulative Frequency Cumulative Relative Frequency 670 – 67920.04442 680 – 6890020.0444 690 – 69970.155690.2 700 – 70990.2180.4 710 – 71990.2270.6 720 – 729110.2444380.8444 730 – 7397 ---- 45 0.155 -------- ~ 1.0 45=n1=100%

52 2-52 Time (seconds) Frequency Ogive

53 2-53 Time (seconds) Relative Frequency Ogive

54 2-54 If the variable changes over time, the data are referred to as “time series data”. A “time-series plot” is obtained by plotting the time on the horizontal axis and the corresponding value of the variable on the vertical axis. Line segments are then drawn connecting the points.

55 2-55 YearClosing Value 19902753.2 19912633.66 19923168.83 19933301.11 19943834.44 19955117.12 19966448.27 19977908.25 19989212.84 19999,181.43 200011,497.12 200110021.71 20028342.38 200310452.74 200410783.75 200510,783.01 200610,717.50 200713264.82 The data to the right shows the closing prices (31 Dec) of the Dow Jones Industrial Average (DJIA = stock market) for the years: 1990 – 2007.

56 2-56 Year Closing Value Time-Series Plot Year-End DJIA (1990 – 2007)

57 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Graphical Misrepresentations of Data 2.4

58 2-58 Statistics “The only science that enables different experts to draw different conclusions using the same numbers. ” – Evan Esar

59 2-59 EXAMPLEMisrepresentation of Data The data in the following table represent the life expectancies (years) of residents of the United States. Year, xLife Expectancy, y 195068.2 196069.7 197070.8 198073.7 199075.4 200077.0 Source: National Center for Health Statistics

60 2-60 EXAMPLEMisrepresentation of Data (a) (b)

61 2-61 EXAMPLEMisrepresentation of Data The National Survey of Student Engagement is a 2007 survey that asked US college freshman students how many hours they spend preparing for class each week.

62 2-62 EXAMPLEMisrepresentation of Data HoursRelative Frequency 00 1 – 50.13 6 – 100.25 11 – 150.23 16 – 200.18 21 – 250.10 26 – 300.06 31 – 350.05

63 2-63 (a)

64 2-64 (b)

65 65Chap 2 Key Concepts Always remember, that graphs and plots can distort the way you perceive what you are seeing. This happens because of the way the visual-cortex (your brain) receives and re-transmits inputs from the sensor (eye). Always remember, that graphs and plots can distort the way you perceive what you are seeing. This happens because of the way the visual-cortex (your brain) receives and re-transmits inputs from the sensor (eye). People (salesmen) who purposely want this distortion to occur know how to use this feature of brain processing to leave a false impression in your mind. People (salesmen) who purposely want this distortion to occur know how to use this feature of brain processing to leave a false impression in your mind.

66 66Chap 2 Here Endeth Chapter 2 (Applause !) Next, comes Midterm Exam #1 (boo, hiss…), so here are a few pointers…

67 Chap 267


Download ppt "Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Organizing and Summarizing Data 2."

Similar presentations


Ads by Google