1
Statistics
2
Some Stats Quotes
"There are three kinds of lies: lies, damned lies, and statistics." Benjamin Disraeli
"The statistics on sanity are that one out of every four Americans is suffering from some form of mental illness. Think of your three best friends. If they're okay, then it's you." Rita Mae Brown
"Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital." Aaron Levenstein
3
What is Statistics
What is statistics? It is the science of learning from data.
Why is it important? It is everywhere! It is widely used in areas such as medicine, psychology, politics, and business.
4
Definition
Data: Information in raw or unorganized form (such as letters, numbers, or symbols) that refers to, or represents, conditions, ideas, or objects.
Population: The complete set of items that share at least one property in common and that is the subject of a statistical analysis.
Sample: A subset of the population.
5
How can we learn from data?
Collecting data
  Survey, interview, census, experiments
  Primary data (you collect the data yourself) or secondary data (you take the data from other sources)
  Unbiased and random
Analyzing data
Drawing conclusions from data
6
Types of Data
Data
  Quantitative
    Discrete (test score, no. of students)
    Continuous (height, weight, temperature)
  Qualitative
    Nominal
    Ordinal
7
Types of data
Nominal Variable: A qualitative variable that categorizes (or describes, or names) an element of a population.
Ordinal Variable: A qualitative variable that incorporates an ordered position, or ranking.
Discrete Variable: A quantitative variable that can assume a countable number of values. Intuitively, a discrete variable can assume values corresponding to isolated points along a line interval. That is, there is a gap between any two values.
Continuous Variable: A quantitative variable that can assume an uncountable number of values. Intuitively, a continuous variable can assume any value along a line interval, including every possible value between any two values.
8
Exercise
Example: Identify each of the following as examples of (1) nominal, (2) ordinal, (3) discrete, or (4) continuous variables:
1. The length of time until a pain reliever begins to work.
2. The number of chocolate chips in a cookie.
3. The number of colors used in a statistics textbook.
4. The brand of refrigerator in a home.
5. The overall satisfaction rating of a new car.
6. The number of files on a computer’s hard disk.
7. The pH level of the water in a swimming pool.
8. The number of staples in a stapler.
9
Data
There are 26 children aged 1 to 6. The data are as follows:
2,3,1,4,5,1,3,4,5,3,6,1,3,3,5,3,1,4,6,2,1,4,5,3,3,4
We need to rearrange it:
1,1,1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,5,5,5,5,6,6
10
Frequency Distribution
Frequency: how many times a value occurs.
The frequency distribution of the variable ‘age’ can be tabulated as follows:

Frequency Distribution of Age
Age        1  2  3  4  5  6
Frequency  5  3  7  5  4  2

Grouped Frequency Distribution of Age
Age Group  1-2  3-4  5-6
Frequency    8   12    6
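The tabulation can be reproduced with a short Python sketch. It assumes the rearranged list of ages from the previous slide as input; the helper `age_group` and the variable names are illustrative and not part of the slides.

```python
from collections import Counter

# Rearranged ages as listed on the previous slide (26 children).
ages = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,
        4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6]

# Frequency: how many times each age occurs.
frequency = Counter(ages)


def age_group(age):
    """Hypothetical helper: map an age to its group label 1-2, 3-4 or 5-6."""
    low = age if age % 2 == 1 else age - 1
    return f"{low}-{low + 1}"


# Grouped frequency: count the ages that fall into each group.
grouped = Counter(age_group(a) for a in ages)

for age in sorted(frequency):
    print(age, frequency[age])
for group in sorted(grouped):
    print(group, grouped[group])
```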
11
Cumulative Frequency
The total of a frequency and all frequencies so far in a frequency distribution.
Cumulative frequency of the data on the previous page:

Age                   1  2   3   4   5   6
Frequency             5  3   7   5   4   2
Cumulative Frequency  5  8  15  20  24  26

Age Group             1-2  3-4  5-6
Frequency               8   12    6
Cumulative Frequency    8   20   26
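A cumulative frequency is a running total, so it can be sketched with `itertools.accumulate`. The frequencies below are taken from the rearranged list of ages; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies for ages 1..6, counted from the rearranged list of ages.
frequency = [5, 3, 7, 5, 4, 2]
# Frequencies for the age groups 1-2, 3-4, 5-6.
grouped_frequency = [8, 12, 6]

# Running totals: each entry adds the current frequency to all earlier ones.
cumulative = list(accumulate(frequency))
grouped_cumulative = list(accumulate(grouped_frequency))

print(cumulative)
print(grouped_cumulative)
```

The last cumulative value always equals the total number of observations, which is a quick check on the table.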
12
Data Presentation
There are two types of statistical presentation of data: graphical and numerical.
Graphical Presentation: We look for the overall pattern and for striking deviations from that pattern. The overall pattern is usually described by the shape, center, and spread of the data.
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots, and box plots are used for numerical variables.
13
Data Presentation –Categorical Variable
Bar Diagram: Lists the categories and presents the percent or count of individuals who fall in each category.

Treatment Group  Frequency  Proportion        Percent (%)
1                15         (15/60) = 0.250   25.0
2                25         (25/60) = 0.417   41.7
3                20         (20/60) = 0.333   33.3
Total            60         1.00              100
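The proportion and percent columns follow directly from the frequencies. A minimal sketch, assuming the three treatment-group counts from the table (the dictionary and variable names are illustrative):

```python
# Frequency of each treatment group, as given in the table.
counts = {1: 15, 2: 25, 3: 20}
total = sum(counts.values())

# Proportion = frequency / total; percent = 100 * proportion.
proportions = {group: round(n / total, 3) for group, n in counts.items()}
percents = {group: round(100 * n / total, 1) for group, n in counts.items()}

for group in counts:
    print(group, counts[group], proportions[group], percents[group])
```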
14
Data Presentation –Categorical Variable
Pie Chart: Lists the categories and presents the percent or count of individuals who fall in each category.

Treatment Group  Frequency  Proportion        Percent (%)
1                15         (15/60) = 0.250   25.0
2                25         (25/60) = 0.417   41.7
3                20         (20/60) = 0.333   33.3
Total            60         1.00              100
15
Graphical Presentation –Numerical Variable
Histogram: The overall pattern can be described by its shape, center, and spread. The following age distribution is right skewed. The center lies between 80 and 100.

Mean
Standard Error
Median              84
Mode
Standard Deviation
Sample Variance
Kurtosis
Skewness
Range               95
Minimum             48
Maximum             143
Sum                 5425
Count               60
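Some of the listed statistics are related to one another, which the following sketch illustrates using only the values shown on the slide (sum, count, minimum, maximum, median). The variable names are illustrative, and the computed mean is derived here rather than quoted from the slide.

```python
# Values taken from the summary on the slide.
total, count = 5425, 60
minimum, maximum = 48, 143
median = 84

# Mean = sum / count; range = maximum - minimum.
mean = total / count
value_range = maximum - minimum

print(round(mean, 2))   # mean derived from sum and count
print(value_range)      # should agree with the listed range
print(mean > median)    # a mean above the median is typical of right skew
```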
16
Graphical Presentation –Numerical Variable
Box-Plot: Describes the five-number summary
17
Numerical Presentation
Find the center value of the whole set of observations.
Measures of center:
  Mean
  Median
  Mode
Find the dispersion (e.g., average distance from the mean) to indicate how well the central value characterizes the data as a whole.
Measures of variability:
  Variance
  Standard deviation
  Range
18
Definition
Mean: The average of the data.
Median: The middle number of an ordered set.
Mode: The number which appears most often in a set of numbers.
Variance: Measures how far a set of numbers is spread out.
Standard deviation: A measure of the dispersion of a set of data from its mean.
Range: The difference between the largest and smallest value in a set.
19
Mean
The mean is the average of the data.
To calculate the mean, add up all the numbers and then divide by how many numbers there are.
E.g. Find the mean of 2, 3, 5, 7, 8:
(2 + 3 + 5 + 7 + 8)/5 = 25/5 = 5
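The same calculation as a small Python sketch, once by hand and once with the standard library's `statistics.mean` (variable names are illustrative):

```python
import statistics

data = [2, 3, 5, 7, 8]

# Add up all the numbers, then divide by how many there are.
mean_manual = sum(data) / len(data)

# The standard library gives the same result.
mean_library = statistics.mean(data)

print(mean_manual, mean_library)
```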
20
Median
The ‘middle number’ of a set of numbers.
To find the median, the list of numbers should first be rearranged into numerical order.
E.g. 13, 18, 13, 14, 13, 16, 14, 21, 13
Rearranged: 13, 13, 13, 13, 14, 14, 16, 18, 21
There are nine numbers in the list, so the middle one is the (9 + 1) ÷ 2 = 5th number.
Median = 14
21
Median
What if the number of values is even?
Choose the middle pair and then take their average.
E.g. 13, 13, 13, 13, 14, 14, 16, 18
There are eight numbers in the list, so the middle pair is the 4th and 5th numbers, which are 13 and 14.
Median = (13 + 14)/2 = 13.5
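Both cases, an odd and an even number of values, can be sketched in a few lines. The function `median_of` is an illustrative helper, not something from the slides; `statistics.median` from the standard library is used as a cross-check.

```python
import statistics


def median_of(values):
    """Sort the values, then take the middle one (or average the middle pair)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_list = [13, 18, 13, 14, 13, 16, 14, 21, 13]   # nine values
even_list = [13, 13, 13, 13, 14, 14, 16, 18]      # eight values

print(median_of(odd_list), statistics.median(odd_list))
print(median_of(even_list), statistics.median(even_list))
```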
22
Mean or Median?
Outlier: A very extreme number in a set.
The median is less sensitive to outliers (extreme scores) than the mean and is thus a better measure than the mean for highly skewed distributions, e.g. family income.
For example, the mean of 20, 30, 40, and 990 is (20 + 30 + 40 + 990)/4 = 1080/4 = 270. The median of these four observations is (30 + 40)/2 = 35.
Here 3 observations out of 4 lie between 20 and 40, so the mean of 270 fails to give a realistic picture of the major part of the data. It is influenced by the extreme value 990.
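The effect of the outlier can be checked with a short sketch that computes both measures with and without the extreme value (variable names are illustrative):

```python
import statistics

data = [20, 30, 40, 990]
without_outlier = [20, 30, 40]

# With the outlier, the mean is pulled far from the bulk of the data.
mean_with = statistics.mean(data)
median_with = statistics.median(data)

# Without it, mean and median agree.
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)

print(mean_with, median_with)
print(mean_without, median_without)
```

Dropping the single value 990 moves the mean from 270 to 30, while the median only moves from 35 to 30.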
23
Exercise 1
A student has received the following grades on his tests: 87, 95, 76, and 88. He wants an average of 85 or better overall. What is the minimum grade he must get on the last test in order to achieve that average?
24
Solution
The unknown score is "x". Then the desired average is:
(87 + 95 + 76 + 88 + x)/5 = 85
Multiplying through by 5 and simplifying, we get:
87 + 95 + 76 + 88 + x = 425
346 + x = 425
x = 79
He needs to get at least a 79 on the last test.
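The same algebra as a brief sketch: the required total is the target average times the number of tests, and the missing grade is whatever remains after the known grades are subtracted. Variable names are illustrative.

```python
grades = [87, 95, 76, 88]      # the four known test grades
target_average = 85
total_tests = len(grades) + 1  # one more test still to come

# Total needed across all five tests to reach the target average.
required_total = target_average * total_tests

# Whatever the known grades do not cover must come from the last test.
minimum_last_grade = required_total - sum(grades)

print(required_total, minimum_last_grade)
```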
25
Variance
The average of the squared differences from the mean.
1. Work out the mean (the simple average of the numbers).
2. Then for each number: subtract the mean and square the result (the squared difference).
3. Then work out the average of those squared differences. (Why square?)
E.g. Find the variance of 2, 3, 5, 7, 8:
Mean = (2 + 3 + 5 + 7 + 8)/5 = 5
Variance = ((-3)^2 + (-2)^2 + 0^2 + 2^2 + 3^2)/5 = 26/5 = 5.2
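The three steps translate directly into code. This sketch divides by the number of values, as the slide does (the population variance); note that the sample variance, which divides by n - 1, would give a different number. Variable names are illustrative.

```python
data = [2, 3, 5, 7, 8]

# Step 1: the mean.
mean = sum(data) / len(data)

# Step 2: squared difference of each value from the mean.
squared_differences = [(x - mean) ** 2 for x in data]

# Step 3: average of the squared differences (dividing by n).
variance = sum(squared_differences) / len(data)

print(squared_differences)
print(variance)
```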
26
Standard deviation & Range
Standard deviation is calculated as the square root of variance. The more spread apart the data, the higher the deviation. Range is a crude measure of variability.
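For the data set used in the variance example, both measures can be sketched as follows. `statistics.pstdev` is the standard library's population standard deviation, chosen here to match a variance that divides by n; that choice is an assumption about the slide's convention.

```python
import statistics

data = [2, 3, 5, 7, 8]

# Standard deviation: square root of the (population) variance.
std_dev = statistics.pstdev(data)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

print(round(std_dev, 3), data_range)
```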
27
Exercise 2
Find the mean, median, mode, variance, standard deviation and range for the following set of numbers:
11, 8, 10, 5, 12, 11, 10, 11, 13, 9
28
Five Number Summary
Five Number Summary: The five number summary of a distribution consists of the smallest (Minimum) observation, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the largest (Maximum) observation, written in order from smallest to largest.
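A five number summary can be sketched with the standard library. Quartile conventions differ between textbooks and software, so the `method="inclusive"` setting of `statistics.quantiles` is an assumption, not the slides' stated method; the data are the nine values from the median example, and `five_number_summary` is an illustrative helper.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
summary = five_number_summary(data)
print(summary)
```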
29
The Box Plot
Box Plot: A box plot is a graph of the five number summary. The central box spans the quartiles. A line within the box marks the median. Lines extending above and below the box mark the smallest and the largest observations (i.e., the range). Outlying samples may additionally be plotted outside the range.
30
Choosing a Summary
The five number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with extreme outliers. The mean and standard deviation are reasonable for symmetric distributions that are free of outliers.
31
Skewness
Measures the asymmetry of data.
Positive or right skewed: longer right tail.
Negative or left skewed: longer left tail.
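The slides do not give a formula for skewness. One common choice, used here as an assumption, is the moment coefficient: the average cubed deviation from the mean divided by the variance raised to the power 1.5. Its sign matches the direction of the longer tail. The function and data names are illustrative.

```python
def skewness(values):
    """Moment coefficient of skewness (population version)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


right_skewed = [20, 30, 40, 990]           # long right tail
left_skewed = [-x for x in right_skewed]   # mirrored: long left tail
symmetric = [2, 3, 5, 7, 8]                # deviations cancel out

print(skewness(right_skewed) > 0)
print(skewness(left_skewed) < 0)
print(skewness(symmetric))
```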