Business Statistics
Outline: dealing with decision problems in the face of uncertainty. Topics: Descriptive Statistics; Sampling and Sampling Distributions; Point and Interval Estimation; Hypothesis Testing; Non-parametric Tests (Chi-square Test); Analysis of Variance
Outline (cont.): Time Series and Forecasting; Survey and Sampling Methods; Multivariate Analysis; Bayesian Statistics and Decision Analysis
Descriptive Statistics Session 1
Descriptive Statistics: Population and sample; Measures of central tendency (mean, median, mode); Measures of dispersion (variance, standard deviation, percentiles, inter-quartile range); Grouped data and histogram; Other data representations
Population and Sample. Population: the population consists of the set of all measurements in which the investigator is interested. The population is also called the universe. Sample: a sample is a subset of measurements selected from the population. Sampling from the population is often done randomly, i.e. such that every possible sample of n elements has an equal chance of being selected. A sample created in this way is called a simple random sample, or just a random sample.
Example 1.1. A medical manufacturer interested in marketing a new drug may be required by the Food and Drug Administration (FDA) to prove that the drug does not cause any serious side effects. A sample of people is selected at random, and the results of testing the drug on this sample may then be used in a statistical inference about the entire population of people who may use the drug if it is introduced.
Illustration for simple random sampling
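The definition of a simple random sample can be illustrated with a short sketch. It assumes a hypothetical population of 100 numbered units; the function name `simple_random_sample` and the seed are illustrative choices, not part of the course material.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: 100 units labelled 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

`random.sample` draws without replacement, which is what "a subset of the population" requires.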
Measures of Central Tendency. Mean. Arithmetic Mean (AM): given a set of data $x_1, \dots, x_n$, the arithmetic mean is defined as $\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} x_i$. This is the most frequently used kind of mean. Mode: the mode of a data set is the value that occurs most frequently.
Measures of Central Tendency. Harmonic Mean (HM): $\mathrm{HM} = n \big/ \sum_{i=1}^{n} \frac{1}{x_i}$. This kind of mean is used when averaging rates, for example velocities over equal distances.
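A minimal sketch of why the harmonic mean suits velocities, assuming a hypothetical trip driven at 40 km/h one way and 60 km/h back over the same distance. The numbers are made up for illustration.

```python
def arithmetic_mean(xs):
    """Sum of the values divided by their count."""
    return sum(xs) / len(xs)

def harmonic_mean(xs):
    """Count of the values divided by the sum of their reciprocals."""
    return len(xs) / sum(1 / x for x in xs)

# Hypothetical speeds (km/h) over two legs of equal distance.
speeds = [40, 60]
am = arithmetic_mean(speeds)   # 50 km/h: overstates the true average speed
hm = harmonic_mean(speeds)     # about 48 km/h: total distance / total time
print(am, hm)
```

With equal distances, total time is proportional to 1/40 + 1/60, so the true average speed is the harmonic mean rather than the arithmetic mean.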
Measures of Central Tendency. Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$. Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Median: the median of a set of observations is the point positioned so that half of the data lie below it and half lie above it.
Example 1.2. Find the median of the following two sets of data. Set 1 (n=5): ordering gives 7, 9, 15, 18, 20; the median is 15. Set 2 (n=6): after ordering, the median is the average of the two middle values: Median = ( )/2 = 21.8
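The median rule for odd and even n can be sketched as follows. Set 1 is taken from Example 1.2. The even-sized list `even_demo` is hypothetical, since the values of Set 2 are not reproduced here.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

set1 = [7, 9, 15, 18, 20]    # Set 1 from Example 1.2 (n = 5)
even_demo = [3, 5, 8, 10]    # hypothetical data with an even number of points

print(median(set1))       # 15
print(median(even_demo))  # 6.5
```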
Measurements of Dispersion. Variance and Standard Deviation. The variance of a set of observations is the average squared deviation of the data points from their mean. Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$. Note: the denominator is $(n-1)$, not $n$.
Measurements of Dispersion. Variance and Standard Deviation. Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$. The standard deviation of a set of observations is the square root of the variance of the set.
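The difference between the two denominators can be seen in a small sketch. The data set below is hypothetical and chosen so that the population variance comes out as a round number.

```python
import math

def population_variance(xs):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Squared deviations from the sample mean, dividing by n - 1."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations, mean = 5

pop_var = population_variance(data)    # 32 / 8 = 4
samp_var = sample_variance(data)       # 32 / 7, slightly larger
pop_sd = math.sqrt(pop_var)            # 2
print(pop_var, samp_var, pop_sd)
```

The sample variance is always the larger of the two for the same data, because it divides the same sum of squares by n - 1 instead of n.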
Measurements of Dispersion. Percentiles: the $P$th percentile of a group of numbers is the value below which lie $P\%$ of the numbers in the group. Its position in the ordered data is given by $(n+1)P/100$, where $n$ is the number of data points. Percentiles are used, for example, in reporting GRE and GMAT test scores.
Measurements of Dispersion. Quartiles: the percentage points that break the ordered data set into four groups of equal size. The 1st quartile $Q_1$ is the 25th percentile. The 2nd quartile $Q_2$ is the 50th percentile. The 3rd quartile $Q_3$ is the 75th percentile. Inter-Quartile Range: $\mathrm{IQR} = Q_3 - Q_1$.
Measurements of Dispersion. Example 1.3. Given a data set of 22 points: 88, 56, 64, 45, 52, 76, 54, 79, 38, 98, 69, 77, 71, 45, 60, 78, 90, 81, 87, 44, 80, 41. Find the 20th, 30th and 90th percentiles. Also find the IQR. What are the mean, mode and median? What is the variance of the set? (SPSS)
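One way to work Example 1.3 by hand or in code is sketched below. It assumes the percentile position (n+1)P/100 with linear interpolation between neighbouring ordered values when the position is fractional, and it treats the data as a sample (denominator n-1). Other software, SPSS included, may use a different interpolation convention and so give slightly different percentiles.

```python
from collections import Counter

data = [88, 56, 64, 45, 52, 76, 54, 79, 38, 98, 69,
        77, 71, 45, 60, 78, 90, 81, 87, 44, 80, 41]

def percentile(values, p):
    """Value at position (n + 1) * p / 100 in the ordered data (1-based),
    interpolating linearly between the two neighbouring observations."""
    ordered = sorted(values)
    n = len(ordered)
    pos = (n + 1) * p / 100
    k = int(pos)              # rank of the lower neighbour
    frac = pos - k            # how far to move towards the upper neighbour
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])

p20, p30, p90 = (percentile(data, p) for p in (20, 30, 90))
iqr = percentile(data, 75) - percentile(data, 25)
median = percentile(data, 50)
mean = sum(data) / len(data)
mode = Counter(data).most_common(1)[0][0]   # most frequent value
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

print(p20, p30, p90)   # about 45, 53.8, 89.4
print(iqr, median, mode)
print(mean, variance)
```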
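Grouped Data and Histogram. Classes: we divide the data values into classes that have the same width and together cover all data points. Each class is represented by a value $m_i$. Frequencies $f_i$: the number of observations in each class. The total of the frequencies is the number of observations $N$. The relative frequency of each class is the ratio of its frequency to $N$, i.e. $f_i / N$. Histogram: a chart of the class frequencies (or relative frequencies).
Grouped Data and Histogram. Mean and variance of grouped data. Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{K} f_i m_i$. Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{K} f_i (m_i - \mu)^2$. Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{K} f_i m_i$. Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{K} f_i (m_i - \bar{x})^2$. Here $K$ is the number of classes, $N$ is the number of observations in the population, and $n$ is the number of observations in the sample.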
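The grouped-data formulas can be checked with a short sketch. The class values and frequencies below are hypothetical (for instance, errors per page and the number of pages with that many errors). They are not the figures from the course example.

```python
def grouped_mean(m, f):
    """Mean of grouped data: sum of f_i * m_i divided by the total frequency."""
    total = sum(f)
    return sum(fi * mi for mi, fi in zip(m, f)) / total

def grouped_variance(m, f, sample=True):
    """Variance of grouped data; divides by n - 1 for a sample, N for a population."""
    total = sum(f)
    mean = grouped_mean(m, f)
    ss = sum(fi * (mi - mean) ** 2 for mi, fi in zip(m, f))
    return ss / (total - 1 if sample else total)

# Hypothetical classes: representative values m_i and frequencies f_i.
m = [0, 1, 2, 3, 4]
f = [10, 20, 12, 6, 2]

n = sum(f)                                   # 50 observations
rel_freq = [fi / n for fi in f]              # relative frequencies, sum to 1
mean = grouped_mean(m, f)                    # 70 / 50 = 1.4
s2 = grouped_variance(m, f, sample=True)     # 56 / 49
sigma2 = grouped_variance(m, f, sample=False)  # 56 / 50 = 1.12
print(mean, s2, sigma2, rel_freq)
```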
Grouped Data and Histogram. Example 1.4. The number of errors in a textbook was counted. The number of errors per page is placed in column ($m_i$), while column ($f_i$) shows the number of pages containing that many errors. The following table and charts show the histogram of the error distribution:
Example 1.4 (cont.)
Other Descriptive Statistics. Index numbers. Simple index numbers: an index number is a number that measures the relative change in a set of measurements over time. Index number for period i = 100 × (value in period i / value in base period)
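The simple index formula translates directly into code. The yearly values below are hypothetical and serve only to show the arithmetic. The function name `simple_index` is an illustrative choice.

```python
def simple_index(values, base_period):
    """Index for each period: 100 * (value in period / value in base period)."""
    base = values[base_period]
    return {period: 100 * value / base for period, value in values.items()}

# Hypothetical yearly values of some quantity (e.g. a price).
values = {1993: 50.0, 1994: 55.0, 1995: 60.0}
index = simple_index(values, base_period=1993)
print(index)   # the base period always has index 100
```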
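Other Descriptive Statistics. Consumer Price Index (Laspeyres Index): the Laspeyres index measures the change in the total cost (quantity × price) of a set of items relative to a base period. In its standard form the quantities are held at their base-period values: $L_t = 100 \times \frac{\sum_j p_{j,t}\, q_{j,0}}{\sum_j p_{j,0}\, q_{j,0}}$.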
Other Descriptive Statistics. Table: Items | Price | Quantity | Price | Quantity | Price | Quantity. Rows: Beef, Pork, Eggs, Milk, Bread, Potatoes, Tomatoes, Oranges.
Other Descriptive Statistics. Compute the Laspeyres Index. Select year 1993 as the base year. Sum of quantity × price: for 1993 = ; for 1994 = ; for 1995 = . Laspeyres Index: for 1993: 100; for 1994: ; for 1995: .
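The computation outlined above can be sketched in a few lines. This assumes the standard Laspeyres form, in which each period's prices are weighted by base-period quantities. The two-item basket and all prices and quantities are hypothetical, not the values from the table in the course.

```python
def laspeyres_index(base_prices, base_quantities, current_prices):
    """100 * (cost of the base-period basket at current prices)
           / (cost of the base-period basket at base prices)."""
    base_cost = sum(base_prices[item] * base_quantities[item]
                    for item in base_quantities)
    current_cost = sum(current_prices[item] * base_quantities[item]
                       for item in base_quantities)
    return 100 * current_cost / base_cost

# Hypothetical basket: item names echo the table, numbers are invented.
base_quantities = {"Beef": 10, "Milk": 20}
prices_base = {"Beef": 3.0, "Milk": 1.0}      # base year
prices_later = {"Beef": 3.3, "Milk": 1.2}     # a later year

print(laspeyres_index(prices_base, base_quantities, prices_base))   # 100 for the base year
print(laspeyres_index(prices_base, base_quantities, prices_later))  # about 114
```

Holding the quantities fixed means the index isolates the effect of price changes on the cost of the same basket.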
Other Descriptive Statistics. Stem-and-Leaf Displays: a way of rearranging data that lets the data "speak for themselves". Example: given the data set 11, 12, 12, 13, 14, 15, 15, 16, 20, 21, 21, 21, 21, 22, 25, 25, 26, 27, 28, 29, 29, 31, 32, 34, 35, 36, 38, 41, 42, 45, 47, 50, 52, 55, 60, 62
Other Descriptive Statistics. The stem-and-leaf display
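A stem-and-leaf display for the data set of the example can be produced with a short sketch. It assumes the tens digit is the stem and the units digit is the leaf, which is the usual choice for two-digit data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return one text line per stem: tens digit, then the ordered units digits."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 27 -> stem 2, leaf 7
        groups[stem].append(leaf)
    return [f"{stem} | {' '.join(str(leaf) for leaf in groups[stem])}"
            for stem in sorted(groups)]

data = [11, 12, 12, 13, 14, 15, 15, 16, 20, 21, 21, 21, 21, 22, 25, 25,
        26, 27, 28, 29, 29, 31, 32, 34, 35, 36, 38, 41, 42, 45, 47, 50,
        52, 55, 60, 62]

lines = stem_and_leaf(data)
print("\n".join(lines))
```

The length of each row shows how many observations fall in that range of ten, so the display doubles as a sideways histogram that keeps the individual values.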
Other Descriptive Statistics. Box-and-whisker plot
Examples of box-and-whisker plots
Box-and-whisker plots (or box plots) are useful for the following purposes: to identify the spread of a data set; to identify the location of a data set based on its median; to identify possible skewness of the distribution; to identify suspected outliers and outliers; to quickly compare data sets. See the example in SPSS.
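The numbers behind a box plot can be sketched as follows. The fence rule used here (suspected outliers beyond 1.5 × IQR from the box, outliers beyond 3 × IQR) is a common convention assumed for illustration. It is not stated in the slides, and SPSS may label points differently. The data set is hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, IQR, and points flagged by the assumed 1.5*IQR / 3*IQR fences."""
    ordered = sorted(values)
    # method="exclusive" places quartiles at position (n + 1) * P / 100.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    outliers = [x for x in ordered if x < outer_low or x > outer_high]
    suspected = [x for x in ordered
                 if outer_low <= x < inner_low or inner_high < x <= outer_high]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "suspected_outliers": suspected, "outliers": outliers}

# Hypothetical data with one moderately extreme and one very extreme value.
data = [15, 17, 18, 19, 20, 21, 22, 23, 24, 35, 60]
summary = box_plot_summary(data)
print(summary)
```

For this data the box spans 18 to 24 with the median at 21. The value 35 falls between the inner and outer fences and 60 falls beyond the outer fence.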