Published by Laurel Chambers. Modified over 8 years ago.
7
Discrete vs. Continuous Variables Quantitative variables can be further classified as discrete or continuous. If a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable. Some examples will clarify the difference between discrete and continuous variables.
8
Suppose the fire department mandates that all fire fighters must weigh between 150 and 250 pounds. The weight of a fire fighter would be an example of a continuous variable, since a fire fighter's weight could take on any value between 150 and 250 pounds. Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be just any number between 0 and plus infinity. We could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.
9
Statistical data is often classified according to the number of variables being studied. Univariate data. When we conduct a study that looks at only one variable, we say that we are working with univariate data. Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data. Bivariate data. When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.
10
Our Heights… SOCS. What about if we use numbers…
11
Statisticians use summary measures to describe patterns of data. Measures of central tendency refer to the summary measures used to describe the most "typical" value in a set of values. The most common of these are the MEAN and the MEDIAN.
12
As measures of central tendency, the mean and the median each have advantages and disadvantages. Some pros and cons of each measure are summarized below. To illustrate these points, consider the following example... Suppose we examine a sample of 10 households to estimate the typical family income. Nine of the households have incomes between $20,000 and $100,000, but the tenth household has an annual income of $1,000,000,000. That tenth household is an outlier. If we choose a measure to estimate the income of a typical household, the mean will greatly over-estimate family income (because of the outlier), while the median will not. Thus, we say the Median is RESISTANT and the Mean is NOT RESISTANT…
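The household example can be checked numerically. The sketch below is only an illustration: the nine ordinary incomes are made-up values chosen to lie between $20,000 and $100,000 (the slides do not list them), and only the $1,000,000,000 outlier comes from the example itself.

```python
from statistics import mean, median

# Hypothetical incomes for the nine ordinary households (illustrative values
# between $20,000 and $100,000; they are NOT given in the slides).
typical = [20_000, 30_000, 40_000, 50_000, 60_000,
           70_000, 80_000, 90_000, 100_000]
outlier = 1_000_000_000  # the tenth household, as stated in the example

incomes = typical + [outlier]

# The mean is dragged far upward by the single outlier...
print("mean without outlier:  ", mean(typical))
print("mean with outlier:     ", mean(incomes))

# ...while the median barely moves, which is what "resistant" means.
print("median without outlier:", median(typical))
print("median with outlier:   ", median(incomes))
```

With these assumed incomes, adding the outlier moves the mean from tens of thousands of dollars to over a hundred million, while the median shifts only to the midpoint of the two middle incomes.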
13
The range is the difference between the largest and smallest values in a set of values. For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. For this set of numbers, the range would be 11 - 1 or 10.
14
The interquartile range (IQR) is the difference between the largest and smallest values in the middle 50% of a set of data. For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11.
15
1, 3, 4, 5, 5, 6, 7, 11
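As a rough sketch, the range and IQR of these eight numbers can be computed as follows. The slides do not say which quartile rule they use, so this assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data; other conventions can give slightly different quartiles. The helper name `quartiles_by_halves` is hypothetical.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q3), assuming each quartile is the median of one half.

    For an odd number of values the middle value is left out of both halves.
    This is one common convention, assumed here for illustration.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + len(data) % 2:]
    return median(lower), median(upper)

data = [1, 3, 4, 5, 5, 6, 7, 11]

data_range = max(data) - min(data)   # largest minus smallest
q1, q3 = quartiles_by_halves(data)
iqr = q3 - q1                        # spread of the middle 50%

print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Under this convention the lower half is 1, 3, 4, 5 and the upper half is 5, 6, 7, 11, so the IQR is the distance between the medians of those two halves.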
16
In a population, variance is the average squared deviation from the population mean, as defined by the following formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where $\sigma^2$ is the population variance, $\mu$ is the population mean, $X_i$ is the $i$th element from the population, and $N$ is the number of elements in the population. The variance of a sample is defined by a slightly different formula, and uses a slightly different notation:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the $i$th element from the sample, and $n$ is the number of elements in the sample. Using this formula, the sample variance can be considered an unbiased estimate of the true population variance. Therefore, if you need to estimate an unknown population variance, based on data from a sample, this is the formula to use.
17
The standard deviation is the square root of the variance…
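A minimal sketch of both variance definitions and the standard deviation, reusing the numbers 1, 3, 4, 5, 5, 6, 7, 11 from the range example purely for illustration. The hand-written sums divide by N for the population and by n - 1 for the sample; Python's standard `statistics` module is used as a cross-check.

```python
import math
from statistics import mean, pvariance, variance, pstdev, stdev

data = [1, 3, 4, 5, 5, 6, 7, 11]
n = len(data)
m = mean(data)

# Sum of squared deviations from the mean, shared by both formulas.
ss = sum((x - m) ** 2 for x in data)

pop_var = ss / n          # treat data as the whole population: divide by N
samp_var = ss / (n - 1)   # treat data as a sample: divide by n - 1

# The standard deviation is the square root of the variance.
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print("population variance:", pop_var, "sd:", pop_sd)
print("sample variance:    ", samp_var, "sd:", samp_sd)

# Cross-check against the standard library.
print(math.isclose(pop_var, pvariance(data)),
      math.isclose(samp_var, variance(data)),
      math.isclose(pop_sd, pstdev(data)),
      math.isclose(samp_sd, stdev(data)))
```

The sample variance is always a little larger than the population variance computed from the same numbers, because it divides the same sum by n - 1 instead of n.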
18
5-Number Summary and Boxplots… A boxplot splits the data set into quartiles. The body of the boxplot consists of a "box" (hence, the name), which goes from the first quartile (Q1) to the third quartile (Q3). Within the box, a vertical line is drawn at Q2, the median of the data set. Two horizontal lines, called whiskers, extend from the front and back of the box. The front whisker goes from Q1 to the smallest non-outlier in the data set, and the back whisker goes from Q3 to the largest non-outlier. How do we determine the outliers?
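The slides leave the outlier question open. One widely used convention, assumed here and not taken from the slides, is the 1.5 × IQR rule: a value is flagged as an outlier if it lies more than 1.5 × IQR below Q1 or above Q3. The sketch below also assumes quartiles are medians of the lower and upper halves; the function names are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles are medians of halves."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[half + len(data) % 2:])
    return data[0], q1, median(data), q3, data[-1]

def split_outliers(values, k=1.5):
    """Split values into (non_outliers, outliers) using the k * IQR fences.

    k = 1.5 is the conventional choice, assumed here for illustration.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    outside = [x for x in values if x < low_fence or x > high_fence]
    return inside, outside

data = [1, 3, 4, 5, 5, 6, 7, 11]
inside, outside = split_outliers(data)

print("five-number summary:", five_number_summary(data))
print("whiskers reach:", min(inside), "and", max(inside))
print("outliers:", outside)
```

For this data set the upper fence falls exactly at 11, so nothing is flagged and the whiskers run to the minimum and maximum. Appending a value such as 20 would place it beyond the upper fence, and the back whisker would stop at 11.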
20
Range- IQR- Median-
21
CHANGING UNITS Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of variability are affected when we change units. If you add a constant to every value, the distance between values does not change. As a result, all of the measures of variability (range, interquartile range, standard deviation, and variance) remain the same. On the other hand, suppose you multiply every value by a constant. This has the effect of multiplying the range, interquartile range (IQR), and standard deviation by that constant. It has an even greater effect on the variance. It multiplies the variance by the square of the constant.
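These rules are easy to verify on a small data set. The sketch below uses the numbers 1, 3, 4, 5, 5, 6, 7, 11 from the range example with an arbitrary shift of 10 and an arbitrary scale factor of 3 (both chosen only for illustration), and checks the range, standard deviation, and variance; the IQR behaves like the range.

```python
import math
from statistics import pstdev, pvariance

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

data = [1, 3, 4, 5, 5, 6, 7, 11]
shift, scale = 10, 3   # arbitrary constants for the demonstration

shifted = [x + shift for x in data]   # add a constant to every value
scaled = [x * scale for x in data]    # multiply every value by a constant

# Adding a constant leaves every measure of variability unchanged.
print(value_range(shifted) == value_range(data))
print(math.isclose(pstdev(shifted), pstdev(data)))
print(math.isclose(pvariance(shifted), pvariance(data)))

# Multiplying by a constant scales range and standard deviation by that
# constant, and the variance by its square.
print(value_range(scaled) == scale * value_range(data))
print(math.isclose(pstdev(scaled), scale * pstdev(data)))
print(math.isclose(pvariance(scaled), scale ** 2 * pvariance(data)))
```

Each line prints `True`. The population versions (`pstdev`, `pvariance`) are used here, but the same relationships hold for the sample versions.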
22
Choosing a summary of your data…