Chapter 5 STATISTICS (PART 1)
What is statistics?
- Numerical facts.
- The science of learning from data.
- A collection of methods for planning experiments, obtaining data, and then organizing, analyzing, and interpreting the data to draw conclusions or make decisions.
Basic Terms in Statistics
- Population: A collection, or set, of individuals, objects, or events whose properties are to be analyzed.
- Sample: A subset of the population.
- Element: An entity on which data are collected.
- Observation: The value of a variable for an element.
- Data set: A collection of observations on one or more variables.
Grouped Data vs Ungrouped Data
- Ungrouped data: Data that has not been organized into groups. Also called raw data.
- Grouped data: Data that has been organized into groups (into a frequency distribution).

Ungrouped data:
Data      Frequency
2         8
3         4
5         6
7         9

Grouped data:
Data      Frequency
2 – 4     5
5 – 7     6
8 – 10    10
11 – 13   8
14 – 16   4
17 – 19   3
Data type
VARIABLES
- QUALITATIVE
  - NOMINAL. Example: gender, color
  - ORDINAL. Example: Pass/Fail, Good, Bad
- QUANTITATIVE
  - DISCRETE. Example: counts (number of items, integers)
  - CONTINUOUS. Example: measurements (length, weight)
Example 6.1
Identify each of the following examples as qualitative or quantitative variables.
1. The residence hall for each student in a statistics class. (qualitative)
2. The amount of gasoline pumped by the next 10 customers at the local Unimart. (quantitative)
3. The amount of radon in the basement of each of 25 homes in a new development. (quantitative)
4. The color of the baseball cap worn by each of 20 students. (qualitative)
5. The length of time to complete a mathematics homework assignment. (quantitative)
6. The state in which each truck is registered when stopped and inspected at a weigh station. (qualitative)
Discrete & Continuous
Discrete data can take only certain values, or can be counted. The number of people in a room can only be 1, 2, 3, … and not 1.23, 1.57, or 10.22.
Example:
- Number of cars on a road
- Number of children in a family
Continuous data can take any value between two given values, so a recorded value is never exact. The data is acquired through the process of measuring. For example, the height 175 cm (correct to the nearest cm) could have arisen from any value in the range 174.5 cm ≤ height < 175.5 cm.
Example:
- Weights of people
- Speeds of motor boats at a particular part of a race
- The times taken by each student to run 100 m
STATISTICS
- Descriptive: Provides simple summaries about the sample and the measures.
  - Measures of central tendency
  - Measures of dispersion
- Inferential: Tries to reach conclusions that extend beyond the immediate data alone.
  - Confidence intervals
  - Hypothesis testing
Descriptive Statistics
The study of summarizing and describing a collection of data, and of organizing it (presenting data in a more informative way, such as graphs, diagrams, and charts). In general, it is divided into two categories:
- Data presentation (display): tabular; charts/graphs
- Data summary: measures of central tendency; measures of dispersion
Inferential Statistics
The branch of statistics that uses a sample to draw conclusions about a population (basic tool: probability). It consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions. It is the area of statistics that deals with decision-making procedures.
- Population: consists of all subjects (human or otherwise) that are being studied.
- Sample: a group of subjects selected from a population.
Constructing Frequency Distribution
When summarizing large quantities of raw data, it is often useful to distribute the data into classes. A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.

Weight of 100 male students in XYZ university
Weight    Frequency
60-62     5
63-65     18
66-68     42
69-71     27
72-74     8
Total     100
For quantitative data, a class is an interval that includes all the values that fall between two numbers, the lower and upper class limits. Classes are listed in the first column of the frequency distribution table.
*Classes always represent a variable and never overlap; each value belongs to one and only one class.
The numbers listed in the second column are called frequencies; they give the number of values that belong to each class. Frequencies are denoted by f.

Table 6.1: Weight of 100 male students in XYZ university
Weight (variable)   Frequency, f
60-62               5
63-65               18
66-68               42
69-71               27
72-74               8
Total               100

In Table 6.1, the third class (class interval) is 66-68, and the frequency of the third class is 42.
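As a rough illustration of how raw values are tallied into classes, the sketch below counts a short list of weights against the class limits of Table 6.1. The raw weights and the function name `frequency_distribution` are hypothetical and chosen only for illustration; they are not the data behind Table 6.1.

```python
# Hypothetical raw weights (illustrative only, not the source data).
raw_weights = [60, 61, 63, 64, 64, 66, 67, 67, 68, 70, 71, 73]

# Class limits taken from Table 6.1: (lower limit, upper limit).
classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]


def frequency_distribution(values, class_limits):
    """Count how many values fall in each class (limits are inclusive)."""
    counts = {limits: 0 for limits in class_limits}
    for value in values:
        for lower, upper in class_limits:
            if lower <= value <= upper:
                counts[(lower, upper)] += 1
                break  # classes do not overlap, so stop at the first match
    return counts


table = frequency_distribution(raw_weights, classes)
for (lower, upper), f in table.items():
    print(f"{lower}-{upper}: {f}")
```

Because the classes do not overlap, each value is counted exactly once, which is the property the slide marks with an asterisk.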
The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. The difference between the two boundaries of a class gives the class width, also called the class size. For example, the class 60-62 has boundaries 59.5 and 62.5, so its width is 3.
Formula:
- Class Midpoint or Mark
  Class midpoint or mark = (Lower Limit + Upper Limit)/2
- Class Width / Class Size
  Class width, c = Upper Boundary – Lower Boundary
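The boundary, midpoint, and width calculations can be sketched as follows for the weight classes 60-62 through 72-74. The helper name `class_boundaries` is hypothetical, and the sketch assumes the gap between adjacent classes is the same throughout the table.

```python
def class_boundaries(class_limits):
    """Return (lower boundary, upper boundary) for each class.

    Assumes every pair of adjacent classes is separated by the same gap,
    e.g. 63 - 62 = 1 for the weight classes used here.
    """
    gap = class_limits[1][0] - class_limits[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in class_limits]


limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

boundaries = class_boundaries(limits)
midpoints = [(lower + upper) / 2 for lower, upper in limits]
widths = [upper_b - lower_b for lower_b, upper_b in boundaries]

for (lo, hi), (lb, ub), x, c in zip(limits, boundaries, midpoints, widths):
    print(f"{lo}-{hi}: boundaries {lb}-{ub}, midpoint {x}, width {c}")
```

Note that the width is taken between boundaries, not between limits: 62.5 - 59.5 = 3, whereas 62 - 60 would give 2.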
Cumulative Frequency Distributions
A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class. In a cumulative frequency distribution table, each class has the same lower limit but a different upper limit.

Table 6.2: Class Limit, Class Boundaries, Class Width, Cumulative Frequency
Weight (Class Interval)   Number of Students, f   Class Boundaries   Cumulative Frequency
60-62                     5                       59.5-62.5          5
63-65                     18                      62.5-65.5          5 + 18 = 23
66-68                     42                      65.5-68.5          23 + 42 = 65
69-71                     27                      68.5-71.5          65 + 27 = 92
72-74                     8                       71.5-74.5          92 + 8 = 100
TOTAL                     100
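The running totals in the cumulative frequency column can be reproduced with a short sketch. It uses `itertools.accumulate` from the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies of the weight classes 60-62, 63-65, 66-68, 69-71, 72-74.
frequencies = [5, 18, 42, 27, 8]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [5, 23, 65, 92, 100]
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.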
Example 6.9: From Table 6.1: Class Boundary

Weight (Class Interval)   Class Boundary   Frequency
60-62                     59.5-62.5        5
63-65                     62.5-65.5        18
66-68                     65.5-68.5        42
69-71                     68.5-71.5        27
72-74                     71.5-74.5        8
Total                                      100
Data summary
- Measures of Central Tendency: Mean, Median, Mode
- Measures of Dispersion: Variance, Standard deviation
Measures of average are also called measures of central tendency and include the mean, median, mode, and midrange. Once you know the average, you also need to know how the data values are dispersed, that is, whether the data values cluster around the mean.
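Measures of Central Tendency
Mean
The mean of a sample is the sum of the sample data divided by the total number of observations in the sample.
GROUPED DATA: When the data has been grouped into intervals, the mid-points of the intervals are denoted by x, and the class frequencies are denoted by f, the mean is
\[ \bar{x} = \frac{\sum f x}{\sum f} \]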
Exercise 6.2: Consider a data set of the weights of 30 students. Find the mean of the grouped data.

Weight (kg)   Frequency (f)
20-29         1
30-39         8
40-49         10
50-59         6
60-69         5

Answer: 46.5 kg
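A minimal sketch of the grouped-mean calculation for Exercise 6.2, assuming the usual grouped-data formula (sum of f times x, divided by sum of f, with x the class midpoint). The function name `grouped_mean` is illustrative.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of grouped data using class midpoints as representative values."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)


# Data from Exercise 6.2 (weights of 30 students, in kg).
limits = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
freqs = [1, 8, 10, 6, 5]

mean_weight = grouped_mean(limits, freqs)
print(mean_weight)  # 46.5
```

Here the midpoints are 24.5, 34.5, 44.5, 54.5, and 64.5, the sum of f times x is 1395, and 1395 / 30 = 46.5 kg, which agrees with the stated answer.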
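Median
The median is the middle value of a set of numbers arranged in order of magnitude.
GROUPED DATA: The median of frequency distribution data can be described as:
\[ \text{Median} = L_m + \left( \frac{\frac{n}{2} - F}{f_m} \right) c \]
where L_m is the lower boundary of the median class, n is the total frequency, F is the cumulative frequency before the median class, f_m is the frequency of the median class, and c is the class width.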
Exercise 6.3: Find the median of the following data:

1.
Class     Frequency
1-5       2
6-10      4
11-15     9
16-20     7
21-25     5
26-30     3
Total     30

Answer: 15.5

2.
Class interval   Frequency
1 – 3            5
4 – 6            3
7 – 9            2
10 – 12          1
13 – 15          6
16 – 18          4

Answer: 11
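The two answers in Exercise 6.3 can be checked with a short sketch. It assumes the common grouped-median formula, lower boundary of the median class plus ((n/2 - F) / f) times the class width, where the median class is the first class whose cumulative frequency reaches n/2. The function name `grouped_median` is illustrative.

```python
def grouped_median(class_limits, frequencies):
    """Median of grouped data by interpolation inside the median class."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    # Gap between adjacent classes, e.g. 6 - 5 = 1 (assumed constant).
    gap = class_limits[1][0] - class_limits[0][1]
    cumulative_before = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(class_limits, frequencies):
        if cumulative_before + f >= n / 2:
            lower_boundary = lower - gap / 2
            width = (upper + gap / 2) - lower_boundary
            return lower_boundary + (n / 2 - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("median class not found")


# Data set 1 from Exercise 6.3.
limits_1 = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
freqs_1 = [2, 4, 9, 7, 5, 3]

# Data set 2 from Exercise 6.3.
limits_2 = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
freqs_2 = [5, 3, 2, 1, 6, 4]

median_1 = grouped_median(limits_1, freqs_1)
median_2 = grouped_median(limits_2, freqs_2)
print(median_1, median_2)  # 15.5 11.0
```

For data set 1, n/2 = 15 falls in class 11-15 (F = 6, f = 9, c = 5), giving 10.5 + (9/9)(5) = 15.5. For data set 2, n/2 = 10.5 falls in class 10-12 (F = 10, f = 1, c = 3), giving 9.5 + (0.5)(3) = 11.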
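Mode
The mode of a set of numbers is the value which occurs most often.
GROUPED DATA: The mode of frequency distribution data can be described as:
\[ \text{Mode} = L + \left( \frac{d_1}{d_1 + d_2} \right) c \]
where L is the lower boundary of the modal class, d_1 is the frequency of the modal class minus the frequency of the class before it, d_2 is the frequency of the modal class minus the frequency of the class after it, and c is the class width.
NOTE: The class with the highest frequency is called the MODAL CLASS.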
Exercise 6.4: Find the mode of the following data:

1.
Class     Frequency
1-5       2
6-10      4
11-15     9
16-20     7
21-25     5
26-30     3
Total     30

Answer: 14.07

2.
Class interval   Frequency
1 – 3            5
4 – 6            3
7 – 9            2
10 – 12          1
13 – 15          6
16 – 18          4
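The first answer in Exercise 6.4 can be reproduced with a sketch that assumes the common grouped-mode formula, lower boundary of the modal class plus d1 / (d1 + d2) times the class width, where d1 and d2 are the differences between the modal frequency and the frequencies of the neighbouring classes. The function name `grouped_mode` is illustrative, and if several classes tie for the highest frequency the sketch simply uses the first one.

```python
def grouped_mode(class_limits, frequencies):
    """Mode of grouped data by interpolation inside the modal class."""
    i = frequencies.index(max(frequencies))  # first class with highest frequency
    f_modal = frequencies[i]
    f_before = frequencies[i - 1] if i > 0 else 0
    f_after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    if d1 + d2 == 0:
        raise ValueError("mode is undefined when neighbouring frequencies are equal")
    # Gap between adjacent classes, e.g. 6 - 5 = 1 (assumed constant).
    gap = class_limits[1][0] - class_limits[0][1]
    lower, upper = class_limits[i]
    lower_boundary = lower - gap / 2
    width = (upper + gap / 2) - lower_boundary
    return lower_boundary + d1 / (d1 + d2) * width


# Data set 1 from Exercise 6.4.
limits = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
freqs = [2, 4, 9, 7, 5, 3]

mode_value = grouped_mode(limits, freqs)
print(round(mode_value, 2))  # 14.07
```

The modal class is 11-15 with d1 = 9 - 4 = 5 and d2 = 9 - 7 = 2, so the mode is 10.5 + (5/7)(5), about 14.07.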
Relationships between the measurements
When the mean, median and mode are all equal, the distribution of the data set is said to be symmetric; a bell-shaped curve is the typical example.
If Mode < Median < Mean, then the distribution is said to be positively (right) skewed, meaning there are a few unusually large values.
If Mean < Median < Mode, then the distribution is said to be negatively (left) skewed, meaning there are a few unusually small values.
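These three rules can be written as a small decision function. This is only a sketch: the function name `describe_shape`, the tolerance parameter, and the example values are hypothetical, and real data may not fit any of the three orderings.

```python
def describe_shape(mean, median, mode, tol=1e-9):
    """Classify a distribution from the ordering of mean, median, and mode."""
    if abs(mean - median) <= tol and abs(median - mode) <= tol:
        return "symmetric"
    if mode < median < mean:
        return "positively (right) skewed"
    if mean < median < mode:
        return "negatively (left) skewed"
    return "no clear pattern"


# Hypothetical summary values, for illustration only.
print(describe_shape(10.0, 10.0, 10.0))  # symmetric
print(describe_shape(12.0, 10.0, 8.0))   # positively (right) skewed
print(describe_shape(8.0, 10.0, 12.0))   # negatively (left) skewed
```

The tolerance is there because computed means and medians are rarely exactly equal in floating-point arithmetic.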
Measures of Dispersion
The standard deviation from the mean is widely used in statistics as a measure of dispersion. A small standard deviation indicates that most of the data is close to the mean, while a large standard deviation indicates that much of the data is far from the mean.
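Variance & standard deviation
GROUPED DATA: For sample data with class midpoints x and class frequencies f,
\[ s^2 = \frac{\sum f x^2 - \frac{\left( \sum f x \right)^2}{\sum f}}{\sum f - 1}, \qquad s = \sqrt{s^2} \]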
Example 6.10 (Grouped data)
Find the variance and standard deviation of the sample data below:

Weight (Class Interval)   Frequency, f   Class Mark, x   fx     Cumulative Frequency, F   Class Boundary
60-62                     5              61              305    5                         59.5-62.5
63-65                     18             64              1152   23                        62.5-65.5
66-68                     42             67              2814   65                        65.5-68.5
69-71                     27             70              1890   92                        68.5-71.5
72-74                     8              73              584    100                       71.5-74.5
Total                     100                            6745

Answer: s² = 8.61; s = 2.93
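The answer to Example 6.10 can be checked with a short sketch. It assumes the sample (n - 1) form of the grouped variance, (sum of f x squared minus (sum of f x) squared over n) divided by (n - 1), which is the form that reproduces the stated answer. The function name `grouped_sample_variance` is illustrative.

```python
import math


def grouped_sample_variance(midpoints, frequencies):
    """Sample variance of grouped data using the computational formula."""
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


# Class marks and frequencies from Example 6.10.
class_marks = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]

variance = grouped_sample_variance(class_marks, freqs)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))  # 8.61 2.93
```

With sum of fx = 6745, sum of fx² = 455803, and n = 100, the numerator is 455803 - 454950.25 = 852.75, and 852.75 / 99 is about 8.61.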
Exercise 6.45: Consider a data set of the weights of 30 students. Find the standard deviation.

Weight (kg)   Frequency (f)
20-29         1
30-39         8
40-49         10
50-59         6
60-69         5

Answer: