
1 Chapter 6 - Random Sampling and Data Description: More joy of dealing with large quantities of data (Chapter 6B). You can never have too much data.

2 Today in Prob & Stat

3 6-2 Stem-and-Leaf Diagrams: Steps for Constructing a Stem-and-Leaf Diagram
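A stem-and-leaf diagram is usually built in three steps:

- Split each observation into a stem (the leading digit or digits) and a leaf (the next digit).
- List the stems in a column.
- Write each leaf beside its stem.

The sketch below makes that concrete. Several choices are assumptions for illustration and are not taken from the slides:

- The data are nonnegative, and values are truncated rather than rounded to the leaf unit.
- Every stem between the smallest and largest is listed, including stems with no leaves.
- The function name `stem_and_leaf` and the sample yields are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data, leaf_unit=1):
    """Build a stem-and-leaf display for nonnegative data (hypothetical helper).

    Each value is divided by leaf_unit and truncated to an integer; the last
    digit becomes the leaf and the remaining leading digits form the stem.
    Returns (stem, leaves) pairs for every stem from the smallest to the
    largest, so empty stems still appear in the display.
    """
    rows = defaultdict(list)
    for x in data:
        scaled = int(x // leaf_unit)      # drop digits below the leaf unit
        stem, leaf = divmod(scaled, 10)   # leading digits / last digit
        rows[stem].append(leaf)
    stems = range(min(rows), max(rows) + 1)
    return [(s, "".join(str(d) for d in sorted(rows[s]))) for s in stems]


def print_display(rows):
    for stem, leaves in rows:
        print(f"{stem:>3} | {leaves}")


# Hypothetical yields, used only to illustrate the layout
yields = [68, 71, 74, 75, 76, 83, 85, 85, 88, 92]
print_display(stem_and_leaf(yields))
```

With `leaf_unit=100`, a value such as 543.4 gets stem 0 and leaf 5, which matches the "Leaf Unit = 100" convention Minitab uses.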

4 6-2 Stem-and-Leaf Diagrams

5 Example 6-4

6 Figure 6-4 Stem-and-leaf diagram for the compressive strength data in Table 6-2.

7 Figure 6-5 Stem-and-leaf displays for Example 6-5 (25 observations on batch yields). Stem: tens digits. Leaf: ones digits. The three displays illustrate too few stems, too many stems, and a number of stems that is just right.
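The number of stems can be changed by splitting each stem over several lines. For example, leaves 0-4 can go on one line and 5-9 on the next.

The sketch below lets you compare one and two lines per stem. It assumes:

- The data are nonnegative integers, with tens digits as stems and ones digits as leaves.
- The number of lines per stem divides 10.

The name `split_stem_display` and the data are hypothetical.

```python
from collections import defaultdict


def split_stem_display(data, parts=2):
    """Stem-and-leaf display in which each stem is split into `parts` lines
    (parts must divide 10: 1, 2, or 5). With parts=2, the first line of a
    stem holds leaves 0-4 and the second holds leaves 5-9. Every line from
    the lowest occupied one to the highest is returned, even if empty."""
    if 10 % parts:
        raise ValueError("parts must divide 10")
    width = 10 // parts                   # leaf values covered by one line
    lines = defaultdict(list)
    for x in data:
        stem, leaf = divmod(int(x), 10)   # tens digit(s) / ones digit
        lines[stem * parts + leaf // width].append(leaf)
    first, last = min(lines), max(lines)
    return [(k // parts, "".join(map(str, sorted(lines[k]))))
            for k in range(first, last + 1)]


# Hypothetical yields: one line per stem versus two lines per stem
yields = [68, 71, 74, 75, 76, 83, 85, 85, 88, 92]
for parts in (1, 2):
    print(f"parts={parts}")
    for stem, leaves in split_stem_display(yields, parts):
        print(f"{stem:>3} | {leaves}")
```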

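8 Figure 6-6 Stem-and-leaf diagram from Minitab. The left-hand column gives the number of observations (the depth); the count in parentheses is the number of observations in the middle stem, the one containing the median.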
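The depth column can be reproduced from the number of leaves on each row:

- Rows above the median row show the cumulative count from the top.
- Rows below it show the cumulative count from the bottom.
- The median row shows its own count in parentheses.

This is a sketch, assuming the median row is the first row whose cumulative count from the top reaches position (n + 1)/2. Minitab's handling of ties at a row boundary may differ. The function name `depths` is hypothetical. The counts in the example are the leaf counts of the battery-life display on slide 32.

```python
from itertools import accumulate


def depths(counts):
    """Minitab-style depth column (hypothetical helper).

    counts[i] is the number of leaves on row i, listed top to bottom.
    Assumption: the median row is the first row whose cumulative count
    from the top reaches position (n + 1) / 2.
    """
    n = sum(counts)
    from_top = list(accumulate(counts))
    from_bottom = list(accumulate(reversed(counts)))[::-1]
    median_pos = (n + 1) / 2
    median_row = next(i for i, c in enumerate(from_top) if c >= median_pos)
    out = []
    for i, c in enumerate(counts):
        if i < median_row:
            out.append(str(from_top[i]))      # cumulative from the top
        elif i == median_row:
            out.append(f"({c})")              # own count, in parentheses
        else:
            out.append(str(from_bottom[i]))   # cumulative from the bottom
    return out


# Leaves per row of the battery-life display (Leaf Unit = 100)
print(depths([21, 15, 40, 31, 11, 1, 0, 1]))
```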

9 6-4 Box Plots: The box plot is a graphical display that simultaneously describes several important features of a data set, such as center, spread, departure from symmetry, and identification of observations that lie unusually far from the bulk of the data. (Figure labels: whisker, outlier, extreme outlier.)

10 Figure 6-13 Description of a box plot.
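The numbers behind a box plot can be computed directly. This sketch assumes a common set of conventions:

- Quartiles are found by the (n + 1)p interpolation rule, which is the "exclusive" method of Python's `statistics.quantiles`.
- Whiskers reach the most extreme observations within 1.5 IQR of the box.
- Points between 1.5 and 3 IQR from the box are outliers, and points beyond 3 IQR are extreme outliers.

The helper name and the data are hypothetical.

```python
import statistics


def box_plot_summary(data):
    """Quantities needed to draw a box plot (hypothetical helper).

    Assumed conventions: quartiles by the (n + 1)p rule, whiskers to the
    most extreme points within 1.5 IQR of the box, outliers between 1.5
    and 3 IQR from the box, extreme outliers beyond 3 IQR.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr
    inside = [x for x in data if inner_lo <= x <= inner_hi]
    outliers = [x for x in data
                if not inner_lo <= x <= inner_hi and outer_lo <= x <= outer_hi]
    extreme = [x for x in data if x < outer_lo or x > outer_hi]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(outliers),
        "extreme_outliers": sorted(extreme),
    }


# Hypothetical data with one outlier (30) and one extreme outlier (60)
sample = [10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 60]
print(box_plot_summary(sample))
```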

11 Figure 6-14 Box plot for compressive strength data in Table 6-2.

12 Figure 6-15 Comparative box plots of a quality index at three plants.

13 6-5 Time Sequence Plots: A time series or time sequence is a data set in which the observations are recorded in the order in which they occur. A time series plot is a graph in which the vertical axis denotes the observed value of the variable (say x) and the horizontal axis denotes the time (which could be minutes, days, years, etc.). When measurements are plotted as a time series, we often see trends, cycles, or other broad features of the data.

14 Figure 6-16 Company sales by year (a) and by quarter (b).

15 Figure 6-17 A digidot plot of the compressive strength data in Table 6-2. Gosh! A stem-and-leaf diagram combined with a time series plot.

16 Figure 6-18 A digidot plot of chemical process concentration readings, observed hourly.

17 6-6 Probability Plots Probability plotting is a graphical method for determining whether sample data conform to a hypothesized distribution based on a subjective visual examination of the data. Probability plotting typically uses special graph paper, known as probability paper, that has been designed for the hypothesized distribution. Probability paper is widely available for the normal, lognormal, Weibull, and various chi-square and gamma distributions.

18 Probability (Q-Q) Plots
Forget 'normal probability paper': plot the z score versus the ranked observations, $x_{(j)}$.
This is a subjective, visual technique usually applied to test normality; it can also be adapted to other distributions.
Method (for the normal distribution):
1. Rank the observations $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ from smallest to largest.
2. Compute $(j - 1/2)/n$ for each $x_{(j)}$.
3. Plot $z_j = \Phi^{-1}\left(\frac{j - 1/2}{n}\right)$ versus $x_{(j)}$.
Note: parentheses in subscripts usually indicate ordered data.

19 Computing $z_j$, where $z_j = \Phi^{-1}\left[(j - 1/2)/n\right]$; the $x_{(j)}$ values are ordered from least to greatest.
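The ranking-and-quantile recipe above is short enough to write out. In this sketch, Python's `statistics.NormalDist().inv_cdf` plays the role of $\Phi^{-1}$, which Excel computes with NORMSINV. The function name and the five observations are hypothetical.

```python
from statistics import NormalDist


def normal_scores(observations):
    """Return (x_(j), z_j) pairs for a normal probability plot.

    The observations are ranked from smallest to largest, and each x_(j)
    is paired with z_j = Phi^{-1}((j - 1/2) / n), a standard normal quantile.
    """
    ordered = sorted(observations)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, std_normal.inv_cdf((j - 0.5) / n))
            for j, x in enumerate(ordered, start=1)]


# Hypothetical observations, for illustration only
for x, z in normal_scores([12.1, 9.8, 11.4, 10.3, 10.9]):
    print(f"{x:>5}  {z:+.4f}")
```

If the plotted points fall close to a straight line, the normal model is plausible. Systematic curvature suggests light tails, heavy tails, or skew, as in Figure 6-21.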

20 Example in Excel: Table 6-6, p. 214. $z_j$ is computed with the function NORMSINV.

21 Example in Excel: Table 6-6, cont'd. $z_j$ is computed with the function NORMSINV.

22 Example 6-7

23 Example 6-7 (continued)

24 Figure 6-19 Normal probability plot for battery life.

25 Figure 6-20 Normal probability plot obtained from standardized normal scores.

26 Figure 6-21 Normal probability plots indicating a nonnormal distribution. (a) Light-tailed distribution. (b) Heavy-tailed distribution. (c) A distribution with positive (or right) skew.

27 The Beginning of a Comprehensive Example: Descriptive Statistics in Action. See real numbers, real data! Watch as they are manipulated in perverse ways! Be thrilled as they are sorted, and be amazed as they are compressed into single numbers!

28 The Raw Data: As part of a life span study of a particular type of lithium polymer rechargeable battery, 120 batteries were operated and the life span of each, in operating hours, was determined. The data were generated from a Weibull distribution with shape parameter $\beta = 2.8$ and scale parameter $\delta = 2000$.
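A data set like this one can be simulated in a few lines.

The Greek symbols were lost from the original slide. This sketch reads the parameters as shape 2.8 and scale 2000 hours. That reading is consistent with the reported sample mean: the Weibull mean $\delta\,\Gamma(1 + 1/\beta)$ works out to about 1780 hours.

Python's `random.weibullvariate(alpha, beta)` takes the scale as `alpha` and the shape as `beta`. The seed is arbitrary, so the simulated summaries will not reproduce the slide's numbers exactly.

```python
import math
import random
import statistics

# Assumed reading of the setup: shape beta = 2.8, scale delta = 2000 hours.
# random.weibullvariate(alpha, beta) takes alpha = scale and beta = shape.
SCALE, SHAPE, N = 2000.0, 2.8, 120

rng = random.Random(6)  # arbitrary seed so reruns give the same sample
lives = [rng.weibullvariate(SCALE, SHAPE) for _ in range(N)]

# Population mean of a Weibull distribution: delta * Gamma(1 + 1/beta)
theoretical_mean = SCALE * math.gamma(1 + 1 / SHAPE)

print(f"sample mean {statistics.mean(lives):.1f} h, "
      f"sample median {statistics.median(lives):.1f} h, "
      f"population mean {theoretical_mean:.1f} h")
```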

29 Descriptive Statistics: Minitab
Variable       N    Mean    Median  TrMean  StDev  SE Mean
Battery Life   120  1789.4  1813.4  1773.9  661.5  60.4

Variable       Minimum  Maximum  Q1      Q3
Battery Life   543.4    4300.8   1348.7  2210.3

(TrMean = trimmed mean)
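Each column of this output is a simple function of the sample. The sketch below computes the same columns. The Minitab conventions it follows are assumptions:

- TrMean drops round(0.05n) observations from each end.
- Q1 and Q3 use the (n + 1)p interpolation rule.

The slide's SE Mean can be checked by hand: SE Mean = StDev/√N, and 661.5/√120 ≈ 60.4, as the output shows. The function name `describe` is hypothetical.

```python
import math
import statistics


def describe(data, trim=0.05):
    """Minitab-like summary (hypothetical helper; conventions assumed).

    TrMean drops round(trim * n) observations from each end, and Q1/Q3
    use the (n + 1)p rule ("exclusive" method in the statistics module).
    """
    xs = sorted(data)
    n = len(xs)
    k = round(trim * n)                    # observations trimmed per end
    s = statistics.stdev(xs)               # sample standard deviation (n - 1)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return {
        "N": n,
        "Mean": statistics.mean(xs),
        "Median": statistics.median(xs),
        "TrMean": statistics.mean(xs[k:n - k]),
        "StDev": s,
        "SE Mean": s / math.sqrt(n),       # standard error of the mean
        "Minimum": xs[0],
        "Maximum": xs[-1],
        "Q1": q1,
        "Q3": q3,
    }


print(describe(range(1, 21)))
```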

30 More Minitab


32 Stem-and-Leaf Plot (Leaf Unit = 100)
Depth  Stem  Leaves
21     0     555677778888889999999
36     1     000222333333334
(40)   1     5555556666666677777777888888888899999999
44     2     0000000000111222223333333444444
13     2     56777888999
2      3     3
1      3
1      4     3


34 More Minitab


36 Time Series Plot: based on the order in which the data were generated

37 Time Series Plot: sorted by failure time


41 Computer Support: This is easy if you use the computer. Hang on, we are going to Excel…

42 A Recap…
Population: the totality of observations with which we are concerned. Issue: conceptual vs. actual populations.
Sample: a subset of observations selected from a population.
Statistic: any function of the observations in a sample.
Sample range: if the n observations in a sample are denoted by $x_1, x_2, \ldots, x_n$, then the sample range is $r = \max(x_i) - \min(x_i)$.
Sample mean and variance: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$. Note that these are functions of the observations in a sample and are, therefore, statistics.
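Each of these statistics is a few lines of arithmetic. This sketch computes the sample range, the sample mean, and the sample variance with the n - 1 denominator. The function name and the data are hypothetical.

```python
def sample_summary(xs):
    """Sample range r = max - min, sample mean x-bar, and sample variance
    s^2 = sum((x - x-bar)^2) / (n - 1) (hypothetical helper)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return max(xs) - min(xs), xbar, s2


r, xbar, s2 = sample_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(r, xbar, s2)
```

For these eight values, the squared deviations from $\bar{x} = 5$ sum to 32, so $s^2 = 32/7 \approx 4.571$.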

43 More Recapping…
Note the difference in denominators. The sample variance uses an estimate of the mean, $\bar{x}$, in its calculation; if the sum of squared deviations were divided by n rather than n - 1, the sample variance would be a biased estimate (biased low).
Note the terminology: 'population parameter' vs. 'sample statistic'.
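The bias is easy to see by simulation. The sketch below draws many small samples from a population whose variance is known and averages the two estimators. On average:

- dividing by n gives about $\sigma^2 (n-1)/n$;
- dividing by n - 1 gives about $\sigma^2$.

The population is assumed for illustration: normal, mean 0, standard deviation 2. The sample size and number of repetitions are arbitrary.

```python
import random
import statistics

# Assumed population: normal with mean 0 and sd 2, so sigma^2 = 4
rng = random.Random(1)
n, reps, sigma2 = 5, 20000, 4.0
div_n, div_n_minus_1 = [], []
for _ in range(reps):
    xs = [rng.gauss(0, 2) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
    div_n.append(ss / n)
    div_n_minus_1.append(ss / (n - 1))

print(f"average of SS/n:     {statistics.mean(div_n):.3f} "
      f"(theory {sigma2 * (n - 1) / n})")
print(f"average of SS/(n-1): {statistics.mean(div_n_minus_1):.3f} "
      f"(theory {sigma2})")
```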

44 Sampling Process
X is a random variable that represents one selection from a population.
Each observation in the sample is obtained under identical conditions: the population does not change during sampling, and the probability distribution of values does not change during sampling.
If the sample is independent, $f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$.
Notation: $X_1, X_2, \ldots, X_n$ are the random variables; $x_1, x_2, \ldots, x_n$ are the values of the random variables.
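The factorization of the joint density can be checked numerically. For an independent sample, the joint density is the product of the individual densities. Summing log-densities gives the same value and avoids underflow when n is large.

The population (normal, mean 10, standard deviation 2) and the four sample values are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist

# Assumed population for illustration: normal with mean 10 and sd 2
population = NormalDist(mu=10, sigma=2)
sample = [9.1, 10.4, 11.8, 8.7]

# Independence: f(x1, ..., xn) = f(x1) f(x2) ... f(xn)
joint = math.prod(population.pdf(x) for x in sample)

# Same quantity via a sum of log-densities (numerically safer for large n)
joint_via_logs = math.exp(sum(math.log(population.pdf(x)) for x in sample))

print(joint, joint_via_logs)
```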

45 A Final Recap… A probability distribution is often a model for a population; this is often the case when the population is conceptual or infinite. The histogram should resemble the distribution of population values. The bigger the sample, the stronger the resemblance.
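The claim that larger samples resemble the population more closely can be illustrated by simulation. This sketch draws samples of increasing size from an assumed standard normal population. It then reports the largest gap between each histogram's relative frequencies and the true bin probabilities, using unit-width bins from -3 to 3.

The population, the bins, the seed, and the name `max_bin_gap` are all assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def max_bin_gap(n, seed=0):
    """Largest |relative frequency - population probability| over the
    unit-width bins [-3, -2), ..., [2, 3), for a sample of size n drawn
    from a standard normal population (hypothetical helper)."""
    rng = random.Random(seed)
    pop = NormalDist()
    counts = [0] * 6
    for _ in range(n):
        x = rng.gauss(0, 1)
        if -3 <= x < 3:
            counts[math.floor(x) + 3] += 1   # bin index 0..5
    gaps = []
    for i, c in enumerate(counts):
        lo, hi = i - 3, i - 2                # edges of bin i
        p = pop.cdf(hi) - pop.cdf(lo)        # population probability of bin
        gaps.append(abs(c / n - p))
    return max(gaps)


for n in (20, 200, 20000):
    print(n, round(max_bin_gap(n), 4))
```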

46 Our Work Here Today Is Done. Next Week: The Glorious Midterm. (Image caption: Prob/Stat students discussing stem-and-leaf plots.)

