Basic Statistics for Engineers. Collection, presentation, interpretation and decision making. Prof. Dudley S. Finch
Statistics Four steps: – Data collection including sampling techniques – Data presentation – Data analysis – Conclusions and decisions based on the analysis
Data types Discrete – Defined as: A variable consisting of separate values; for example the number of bolts in a packet. There may be 8 or 9 but there cannot be 8.5. Continuous – Defined as: A variable which may have any value; for example the diameter of steel bars after machining. Any diameter is possible within the allowable tolerance to which the machine is set.
Sampling It is often not practical to examine every component, therefore sampling techniques are used. The sample should be representative of the complete set (the population) of values from which it has been chosen. Although this cannot be guaranteed, we attempt to choose an unbiased sample. To be unbiased, every possible sample must have an equal chance of being chosen. This is satisfied if the sample is chosen at random; that is, if there is no order in the way the sample is chosen. This is called a random sample.
Random samples The larger the random sample the more representative of the population it is likely to be. Random sampling can be carried out by allocating a number to each member of the population and then drawing numbered balls from a bag or using a random number generator. Sampling techniques involve probability theory (will be dealt with later).
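The numbered-draw procedure can be sketched in a few lines of Python. This is only an illustration: the population size of 500, the sample size of 20 and the seed are assumptions chosen for the example, not values from the slides.

```python
import random

# Hypothetical population: 500 components, each allocated a number 1..500.
population = list(range(1, 501))

# Seeded generator so the draw is repeatable; drop the seed in real use.
rng = random.Random(42)

# Draw 20 distinct members without replacement; every possible
# sample of size 20 has the same chance of being chosen.
sample = rng.sample(population, k=20)
print(sorted(sample))
```

Sampling without replacement mirrors drawing numbered balls from a bag: once a ball is drawn it cannot be drawn again.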
Data presentation Measured weights of a casting (lbs).
Frequency distribution Table columns: mass of casting (lbs); number of castings (frequency, f). The class interval should be one that emphasizes any pattern in the data. Typically between 8 and 15 class intervals should be chosen. In the example used, a class interval of 1 lb is chosen. The 50 lb class therefore includes 49.5 to 50.4 lbs. We can therefore compile a frequency distribution table.
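A minimal sketch of compiling such a table with a 1 lb class interval. The weights below are hypothetical stand-ins, not the measured casting data, and `class_of` is a helper name introduced only for this example.

```python
import math
from collections import Counter

# Hypothetical measured weights in lbs (not the data from the slides).
weights = [52.3, 53.7, 54.1, 54.4, 54.6, 55.2, 55.4, 55.5, 56.1, 57.8]

def class_of(weight):
    # 1 lb class centred on a whole pound: 49.5 up to (but not
    # including) 50.5 falls in the 50 lb class.
    return math.floor(weight + 0.5)

# Count how many weights fall in each class.
frequency = Counter(class_of(w) for w in weights)

for mass in sorted(frequency):
    print(mass, frequency[mass])
```

`math.floor(weight + 0.5)` is used instead of the built-in `round`, which rounds exact halves to the nearest even number and would put 50.5 in the 50 lb class.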
Bar chart
Histogram (axes: variable x in lbs; frequency f)
Frequency polygon
Frequency curve
Pie chart showing relative frequency Relative frequency = class frequency / total frequency of the sample, e.g. the relative frequency of the 53 lb class is 8/66 or 0.121.
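The 53 lb figure can be checked directly. The sector angle is an extra step added here to show how a relative frequency maps onto a pie chart; it assumes the usual 360 degrees for the whole sample.

```python
# Figures quoted above: 8 castings in the 53 lb class, 66 in total.
class_frequency = 8
total_frequency = 66

relative_frequency = class_frequency / total_frequency
print(round(relative_frequency, 3))

# Angle of the matching pie-chart sector, in degrees.
sector_angle = 360 * relative_frequency
print(round(sector_angle, 1))
```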
Numerical methods of a distribution A frequency distribution can be represented by two numerical quantities: – Central tendency or average value of the distribution – Dispersion or scatter of variables about the average value
Numerical measures of central tendency Mid-point of range: – The value midway between the largest and smallest values of the variable, i.e. (largest + smallest)/2. Generally a poor measure of central tendency since it depends only on the extreme values of the variable and is not influenced by the form of the distribution. Mode: – The most frequently occurring value of the variable. Easily obtained from the frequency table. For the castings the mode = 55 lbs.
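A short sketch of both measures, using hypothetical values rather than the casting data. Adding one extreme value shows why the mid-point of the range is a poor measure: it shifts with the outlier while the mode does not.

```python
from collections import Counter

# Hypothetical class values in lbs (not the data from the slides).
values = [52, 54, 54, 55, 55, 55, 56, 58]

def midrange(data):
    # Halfway between the smallest and largest values.
    return (min(data) + max(data)) / 2

def mode(data):
    # Most frequently occurring value.
    return Counter(data).most_common(1)[0][0]

print(midrange(values), mode(values))

# One extreme value moves the mid-point of range but not the mode.
with_outlier = values + [70]
print(midrange(with_outlier), mode(with_outlier))
```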
Arithmetic mean – Determined by adding all the values of the variable and dividing this by the total number of values. If $x_1, x_2, x_3, \ldots, x_n$ are the $n$ values then $\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
For frequency distribution tables: $\bar{x} = \frac{\sum f x}{\sum f}$, where $f$ is the frequency of the class with value $x$.
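A minimal sketch of the mean taken from a frequency table, weighting each class value by its frequency. The table is hypothetical and is not the casting data.

```python
# Hypothetical frequency table: class value (lbs) -> frequency f.
table = {52: 1, 54: 3, 55: 3, 56: 2, 58: 1}

# Total number of values is the sum of the frequencies.
n = sum(table.values())

# Each class value contributes f times to the total.
mean = sum(x * f for x, f in table.items()) / n
print(n, round(mean, 2))
```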
To calculate standard deviation:
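One common way to carry out the calculation from a frequency table is sketched below. Both conventions for the divisor are shown, since texts differ: n - 1 gives the sample standard deviation and n the population form. Which one is intended here is an assumption, and the table is hypothetical.

```python
import math

# Hypothetical frequency table: class value (lbs) -> frequency f.
table = {52: 1, 54: 3, 55: 3, 56: 2, 58: 1}

n = sum(table.values())
mean = sum(x * f for x, f in table.items()) / n

# Sum of squared deviations from the mean, weighted by frequency.
sum_sq = sum(f * (x - mean) ** 2 for x, f in table.items())

# Sample form (divisor n - 1) and population form (divisor n).
s_sample = math.sqrt(sum_sq / (n - 1))
s_population = math.sqrt(sum_sq / n)
print(round(s_sample, 3), round(s_population, 3))
```

The two results converge as n grows, so the choice matters most for small samples.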
Estimation Addresses the problem of obtaining information about the population from which the sample was drawn, and of setting up a mathematical model to describe this population. Two components: estimation and testing of hypotheses about the chosen model.
Two types of estimates: Point estimate – Estimate of a population parameter expressed as a single number. This method gives no indication as to the accuracy of the estimate. Interval estimate – Estimate of a population parameter expressed as two numbers. This method is preferable as it gives an indication as to where the population parameter is expected to lie.
Confidence intervals In practice, the true standard deviation, $\sigma$, is unknown and the sample standard deviation, $s$, is used to estimate $\sigma$. If a random sample of size $n$ is drawn, an estimate of the standard error of the sample mean is given by $s/\sqrt{n}$. We need to determine the confidence interval for the true mean, $\mu$. For $n > 30$ a good approximation can be obtained. For small samples a wider interval is used.
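Use of Student t-distribution tables Look up the value of $t$ for $(n-1)$ degrees of freedom at the desired confidence limits ($\tfrac{1}{2}\alpha = 0.01$ for 98%, $0.005$ for 99%, $0.001$ for 99.8%, etc.). Find $t_{\frac{1}{2}\alpha,\,n-1}$. The true mean is then $\mu = \bar{x} \pm t_{\frac{1}{2}\alpha,\,n-1}\,\frac{s}{\sqrt{n}}$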
For castings example: Sample mean = 54.3 lbs. Standard deviation, s = 1.83 lbs. n = 66. Using $t_{0.005,\,65}$ the true mean is given by: $54.3 \pm 2.66 \times \frac{1.83}{\sqrt{66}} = 54.3 \pm 0.6$. Thus we can be 99% confident that the true mean lies between 53.7 and 54.9 lbs.
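The arithmetic for the castings example can be reproduced as a quick check. The t value of 2.66 is taken from the slide as a tabulated figure rather than computed; the variable names are introduced only for this sketch.

```python
import math

# Figures from the castings example.
sample_mean = 54.3   # lbs
s = 1.83             # sample standard deviation, lbs
n = 66               # sample size
t_value = 2.66       # tabulated t used in the example (99% confidence)

# Estimated standard error of the sample mean.
standard_error = s / math.sqrt(n)

# Half-width of the confidence interval.
half_width = t_value * standard_error

lower = sample_mean - half_width
upper = sample_mean + half_width
print(f"{lower:.1f} to {upper:.1f} lbs")
```

A different confidence level only changes `t_value`; the standard error depends on the sample alone.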