Descriptive Statistics for one Variable

Variables and measurements

A variable is a characteristic of an individual or object in which the researcher is interested, for example the SAT score of a college student. For a particular individual or object, the variable takes a value called a measurement. For example, John’s SAT score is 720.

Different Types of Variables

Some variables are quantitative, like the time a person takes to finish a task or the person’s age. Other variables are qualitative, such as a person’s nationality or preferred sport. In these notes we work with quantitative variables. All the measurements collected from individuals about a particular variable are referred to as “data”. Our data will contain the measurements for only one variable.

Statistics has two major branches:

– Descriptive Statistics: provides numerical and graphical procedures to summarize the information in the data in a clear and understandable way.
– Inferential Statistics: provides procedures to draw inferences about a population from a sample.

Population and Samples

The population under study is the set of all individuals of interest for the research. In practice, the variable is measured only for a part of the population. The part of the population for which we collect measurements is called a sample. The number of individuals in a sample is denoted by n. In these notes and examples we assume that our data correspond to a sample of the population under study.

Descriptive Measures

– Central tendency measures. They are computed to give a “center” around which the measurements in the data are distributed.
– Variation or variability measures. They describe the “data spread”, or how far the measurements are from the center.
– Relative standing measures. They describe the relative position of a specific measurement in the data.

Measures of Central Tendency

– Mean: the sum of all measurements in the data divided by the number of measurements.
– Median: a number such that at most half of the measurements are below it and at most half of the measurements are above it.
– Mode: the most frequent measurement in the data.
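The three measures above can be sketched with Python's standard `statistics` module. The sample below is hypothetical (it is not data from these notes) and is chosen only so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical sample of n = 10 measurements (not taken from the notes).
data = [1, 2, 3, 3, 3, 4, 5, 5, 6, 8]

mean = statistics.mean(data)      # sum of the measurements divided by n
median = statistics.median(data)  # average of the two middle values when n is even
mode = statistics.mode(data)      # most frequent measurement

print(mean, median, mode)
```

Here the measurements sum to 40, so the mean is 40/10; the two middle values are 3 and 4; and 3 is the value that appears most often.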

Example of Mean

Mean = 40/10 = 4

Notice that the sum of the deviations from the mean is 0. Notice also that every single observation enters into the computation of the mean.
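As a minimal sketch of the remark about deviations, the snippet below uses a hypothetical sample of 10 measurements that also sums to 40 (an assumption for illustration, not the data of the example) and checks that the deviations from the mean add up to zero.

```python
# Hypothetical sample: 10 measurements that sum to 40 (not the data from the example).
data = [1, 2, 3, 3, 3, 4, 5, 5, 6, 8]

mean = sum(data) / len(data)           # 40 / 10
deviations = [x - mean for x in data]  # each measurement minus the mean
total_deviation = sum(deviations)      # positive and negative deviations cancel

print(mean, total_deviation)
```

With non-integer data the total may differ from zero by a tiny floating-point rounding error, so comparing against a small tolerance is safer than testing for exact equality.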

Example of Median

Median: (4 + 5)/2 = 4.5

Notice that only the two central values are used in the computation. The median is not sensitive to extreme values.
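The following sketch illustrates why the median is insensitive to extreme values. The sample is hypothetical; replacing its largest value with an extreme one moves the mean a great deal but leaves the median where it was.

```python
import statistics

# Hypothetical sample, and the same sample with its largest value made extreme.
data = [1, 2, 3, 4, 4, 5, 6, 7, 8, 10]
with_outlier = data[:-1] + [1000]

median_before = statistics.median(data)          # (4 + 5) / 2
median_after = statistics.median(with_outlier)   # the two central values are unchanged
mean_before = statistics.mean(data)              # 50 / 10
mean_after = statistics.mean(with_outlier)       # 1040 / 10

print(median_before, median_after)
print(mean_before, mean_after)
```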

Example of Mode

In this case the data have two modes, 5 and 7: both measurements appear twice.

Example of Mode

Mode: 3

Notice that it is possible for a data set not to have any mode.
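Both situations, several modes and no mode at all, can be sketched with `statistics.multimode` (available in Python 3.8 and later). The two samples are hypothetical, and treating "every value ties" as "no mode" is a convention assumed here for illustration.

```python
import statistics

two_modes = [2, 5, 5, 7, 7, 9]  # hypothetical: 5 and 7 each appear twice
no_repeats = [1, 2, 3, 4]       # hypothetical: every value appears exactly once

modes = statistics.multimode(two_modes)        # all values tied for most frequent
candidates = statistics.multimode(no_repeats)  # here every value ties

# Assumed convention: if all distinct values tie, the data have no mode.
has_mode = len(candidates) < len(set(no_repeats))

print(modes, has_mode)
```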

Measures of Variability

– Range
– Variance
– Standard Deviation

The Range

Definition: The range of a data set is the difference between the largest and the smallest measurements in the data. To find the range, first order the data from least to greatest. Then subtract the smallest value from the largest value in the set.

Example: A marathon race was completed by 7 participants. What is the range of the times given in hours below?

2.3 hr, 8.7 hr, 3.5 hr, 5.1 hr, 4.9 hr, 7.1 hr, 4.2 hr

Ordering the data from least to greatest, we get: 2.3, 3.5, 4.2, 4.9, 5.1, 7.1, 8.7. So highest - lowest = 8.7 hr - 2.3 hr = 6.4 hr.

Answer: The range of marathon times is 6.4 hr.
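The marathon example can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative.

```python
# Marathon times in hours, as listed in the example.
times_hr = [2.3, 8.7, 3.5, 5.1, 4.9, 7.1, 4.2]

ordered = sorted(times_hr)             # order from least to greatest
data_range = ordered[-1] - ordered[0]  # largest minus smallest

print(ordered)
print(round(data_range, 1))  # rounded to hide floating-point noise in the subtraction
```

Sorting is not strictly needed, since `max(times_hr) - min(times_hr)` gives the same result, but it mirrors the steps in the definition.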

The Range is not Enough

Consider the following examples of data:

– 1, 1, 1, 1, 8
– 1, 2, 4, 6, 8
– 1, 8, 1, 8, 1

In the three cases the range is the same: Range = 8 - 1 = 7. However, the three series exhibit completely different distributions of values along the range.
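One way to see the difference numerically is to compare the range with the sample variance (the measure introduced next in these notes) for the three series. The dictionary keys are illustrative labels.

```python
import statistics

# The three series from the example; the labels "a", "b", "c" are illustrative.
datasets = {
    "a": [1, 1, 1, 1, 8],
    "b": [1, 2, 4, 6, 8],
    "c": [1, 8, 1, 8, 1],
}

ranges = {name: max(values) - min(values) for name, values in datasets.items()}
variances = {name: statistics.variance(values) for name, values in datasets.items()}

for name in datasets:
    print(name, ranges[name], round(variances[name], 1))
```

All three ranges are 7, while the sample variances come out different, which is the sense in which the range alone does not describe the spread.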

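The sample variance

The variance takes into account the deviations of the measurements around the mean of the data. For measurements $x_1, \dots, x_n$ with mean $\bar{x}$, the formula for the sample variance is

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$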

The standard deviation is the square root of the variance. Notice that the mean and the standard deviation have the same units as the measurements.

Variance (for a sample)

Steps:
– Compute each deviation (measurement minus mean)
– Square each deviation
– Sum all the squares
– Divide by the data size (sample size) minus one: n - 1
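The steps above can be sketched directly in Python and compared with the library routine `statistics.variance`, which also divides by n - 1. The sample is hypothetical and is not the data behind the example in these notes.

```python
import statistics

# Hypothetical sample of n = 10 measurements (not the data from the notes).
data = [1, 2, 3, 3, 3, 4, 5, 5, 6, 8]
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]       # step 1: compute each deviation
squared = [d ** 2 for d in deviations]      # step 2: square each deviation
sum_of_squares = sum(squared)               # step 3: sum all the squares
sample_variance = sum_of_squares / (n - 1)  # step 4: divide by n - 1

print(sum_of_squares, round(sample_variance, 3))
print(round(statistics.variance(data), 3))  # library result for comparison
```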

Example of Variance

Variance = 54/9 = 6

The variance is a measure of “spread”. Notice that the larger the deviations (positive or negative), the larger the variance.

The standard deviation

It is defined as the square root of the variance. In the previous example, Variance = 6, so:

Standard deviation = square root of the variance = square root of 6 ≈ 2.45

The standard deviation summarizes the deviations in one number.
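The arithmetic of this example is a one-liner; the sketch below only checks the square root of the variance value 6 quoted in the example.

```python
import math

variance = 6                   # value from the variance example
std_dev = math.sqrt(variance)  # standard deviation = square root of the variance

print(round(std_dev, 2))
```

For raw data, `statistics.stdev` performs both steps (sample variance, then square root) in one call.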

Percentiles

The p-th percentile is a number such that at most p% of the measurements are below it and at most (100 - p)% of the measurements are above it.

For example, if the 85th percentile of a data set is 340, then at most 85% of the measurements are below 340 and at most 15% of the measurements are above 340.

Notice that the median is the 50th percentile.
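The definition can be turned into a small check. The helper `is_pth_percentile` and the sample are hypothetical names and data introduced for illustration; the function simply tests the two "at most" conditions of the definition.

```python
import statistics

def is_pth_percentile(data, value, p):
    """Check the definition: at most p% below `value`, at most (100 - p)% above it."""
    n = len(data)
    pct_below = 100 * sum(x < value for x in data) / n
    pct_above = 100 * sum(x > value for x in data) / n
    return pct_below <= p and pct_above <= 100 - p

data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40]  # hypothetical sample, n = 10
median = statistics.median(data)                 # (22 + 25) / 2

median_is_p50 = is_pth_percentile(data, median, 50)  # the median is the 50th percentile
p85_holds = is_pth_percentile(data, 35, 85)          # 80% below, 10% above
p85_fails = is_pth_percentile(data, 30, 85)          # 20% above exceeds 15%

print(median_is_p50, p85_holds, p85_fails)
```

Library routines such as `statistics.quantiles` interpolate between data points, so on small samples they may return values that differ slightly from this definition.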

Tchebichev’s Rule

The standard deviation can be used to construct an interval enclosing a large percentage of the data. For any data set:

– At least 75% of the measurements differ from the mean by less than twice the standard deviation.
– At least 89% of the measurements differ from the mean by less than three times the standard deviation.

Note: These are special cases of a general property called Tchebichev’s Rule (also spelled Chebyshev): for any k > 1, at least $1 - 1/k^2$ of the observations fall within k standard deviations of the mean. It is true for every data set.
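As a hedged illustration, the sketch below compares the guaranteed share $1 - 1/k^2$ with the share actually observed in a deliberately skewed, hypothetical sample. The function names are illustrative.

```python
import statistics

def chebyshev_lower_bound(k):
    """Guaranteed minimum share of data within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

# Hypothetical, strongly skewed sample: the rule needs no assumption about shape.
data = [1, 1, 1, 2, 2, 3, 4, 6, 10, 30]
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

def share_within(k):
    """Observed share of measurements closer than k standard deviations to the mean."""
    return sum(abs(x - mean) < k * sd for x in data) / len(data)

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), share_within(k))
```

The observed shares are at least as large as the bounds; the rule gives a floor, not an estimate, so the actual share is often much higher.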

Example of Tchebichev’s Rule

Suppose that for a certain data set:

Mean = 20
Standard deviation = 3

Then:
– At least 75% of the measurements are between 20 - 2(3) = 14 and 20 + 2(3) = 26.
– At least 89% of the measurements are between 20 - 3(3) = 11 and 20 + 3(3) = 29.
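The intervals in this example follow from mean ± k standard deviations. A minimal sketch, with `chebyshev_interval` as an illustrative helper name:

```python
def chebyshev_interval(mean, sd, k):
    """Return (low, high, guaranteed share) for k standard deviations around the mean."""
    return mean - k * sd, mean + k * sd, 1 - 1 / k ** 2

# Values from the example: mean 20, standard deviation 3.
low2, high2, share2 = chebyshev_interval(20, 3, 2)
low3, high3, share3 = chebyshev_interval(20, 3, 3)

print(low2, high2, share2)
print(low3, high3, round(share3, 2))
```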

Further Notes

– When the mean is greater than the median, the data distribution is skewed to the right.
– When the median is greater than the mean, the data distribution is skewed to the left.
– When the mean and the median are very close to each other, the data distribution is approximately symmetric.
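The comparison above can be sketched as a small helper. The function name, the `tolerance` parameter (how close counts as "very close"), and the three samples are all assumptions made for illustration.

```python
import statistics

def describe_shape(data, tolerance=0.0):
    """Apply the mean-versus-median rule of thumb from the notes."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if abs(mean - median) <= tolerance:  # "very close": assumed threshold
        return "approximately symmetric"
    if mean > median:
        return "skewed to the right"
    return "skewed to the left"

# Hypothetical samples.
right = describe_shape([1, 2, 2, 3, 3, 4, 5, 9, 16])         # mean 5, median 3
left = describe_shape([1, 8, 12, 13, 14, 14, 15, 15, 16])    # mean 12, median 14
symmetric = describe_shape([1, 2, 3, 4, 5])                  # mean 3, median 3

print(right, left, symmetric, sep="\n")
```

This is a rule of thumb rather than a formal test, so a histogram remains a useful check on the shape.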

Empirical Rule (68-95-99.7 Rule)

For “normal distributions” (data sets whose histograms are bell- or mound-shaped):

– Approx. 68% of values are within 1 standard deviation of the mean
– Approx. 95% of values are within 2 standard deviations of the mean
– Approx. 99.7% of values are within 3 standard deviations of the mean
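The three percentages can be checked against the theoretical normal curve with `statistics.NormalDist` (Python 3.8 and later). This sketch uses the standard normal distribution; the shares are the same for any mean and standard deviation.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{100 * share_within(k):.1f}%")
```

Real data are only approximately normal, so observed shares will usually be close to, but not exactly, these values.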

Example of Empirical Rule

Suppose that the hourly wages of a certain type of worker have a “normal distribution” (bell-shaped histogram). Assume also that the mean is $16 with a standard deviation of $1.5. Then we have:

1 standard deviation = $1.5
2 standard deviations = $3.0
3 standard deviations = $4.5

What does the empirical rule allow us to say?

Solution

The empirical rule allows us to say that:

– Approx. 68% of workers in this occupation earn wages that are within 1 standard deviation of the mean: between 16 - 1.5 and 16 + 1.5, that is, between $14.5 and $17.5.
– Approx. 95% of workers in this occupation earn wages that are within 2 standard deviations of the mean: between 16 - 3 and 16 + 3, that is, between $13.0 and $19.0.
– Approx. 99.7% of workers in this occupation earn wages that are within 3 standard deviations of the mean: between 16 - 4.5 and 16 + 4.5, that is, between $11.5 and $20.5.