Business Statistics for Managerial Decision Farideh Dehkordi-Vakil.

Comparing Two Proportions We often want to compare the proportions of two groups (such as men and women) that share some characteristic. We call the two groups being compared Population 1 and Population 2, and we call the two population proportions of “successes” $P_1$ and $P_2$. The data consist of two independent simple random samples (SRSs): a sample of size $n_1$ from Population 1 and a sample of size $n_2$ from Population 2.

Comparing Two Proportions The proportion of successes in each sample estimates the corresponding population proportion. Here is the notation we will use:

Population   Population proportion   Sample size   Count of successes   Sample proportion
1            $P_1$                   $n_1$         $X_1$                $\hat{p}_1 = X_1/n_1$
2            $P_2$                   $n_2$         $X_2$                $\hat{p}_2 = X_2/n_2$

Sampling Distribution of $\hat{p}_1 - \hat{p}_2$ Choose independent SRSs of sizes $n_1$ and $n_2$ from two populations with proportions $P_1$ and $P_2$ of successes. Let $D = \hat{p}_1 - \hat{p}_2$ be the difference between the two sample proportions of successes. Then, as both sample sizes increase, the sampling distribution of $D$ becomes approximately Normal. The mean of the sampling distribution is $\mu_D = P_1 - P_2$. The standard deviation of the sampling distribution is $\sigma_D = \sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$.

Sampling Distribution of $\hat{p}_1 - \hat{p}_2$ The sampling distribution of the difference of two sample proportions is approximately Normal. The mean and standard deviation are found from the two population proportions of successes, $P_1$ and $P_2$.
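The mean and standard deviation of the difference between two sample proportions can be checked numerically. The sketch below is illustrative only: the function names and the values 0.3, 0.2, 200, and 150 are hypothetical and do not come from the slides. It computes the theoretical mean and standard deviation and compares them with a small simulation.

```python
import math
import random
import statistics


def diff_mean_sd(p1, n1, p2, n2):
    """Theoretical mean and standard deviation of D = p1_hat - p2_hat."""
    mean = p1 - p2
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return mean, sd


def simulate_diffs(p1, n1, p2, n2, reps=2000, seed=0):
    """Draw `reps` pairs of independent samples and return the observed differences."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical population proportions and sample sizes (not from the slides).
mean, sd = diff_mean_sd(0.3, 200, 0.2, 150)
diffs = simulate_diffs(0.3, 200, 0.2, 150)
sim_mean = statistics.mean(diffs)
sim_sd = statistics.pstdev(diffs)

print(f"theory:     mean={mean:.4f}, sd={sd:.4f}")
print(f"simulation: mean={sim_mean:.4f}, sd={sim_sd:.4f}")
```

With a few thousand replications the simulated mean and standard deviation should land close to the theoretical values, which is what the Normal approximation relies on.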

Confidence Interval Just as in the case of estimating a single proportion, a small modification of the sample proportions greatly improves the accuracy of confidence intervals. The Wilson estimates of the two population proportions are $\tilde{p}_1 = \frac{X_1 + 1}{n_1 + 2}$ and $\tilde{p}_2 = \frac{X_2 + 1}{n_2 + 2}$.

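Confidence Interval The standard deviation of $\tilde{D} = \tilde{p}_1 - \tilde{p}_2$, the difference between the two Wilson estimates, is approximately $\sigma_{\tilde{D}} = \sqrt{\frac{P_1(1-P_1)}{n_1+2} + \frac{P_2(1-P_2)}{n_2+2}}$. To obtain a confidence interval for $P_1 - P_2$, we replace the unknown parameters in the standard deviation by estimates to obtain an estimated standard deviation, or standard error.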

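Confidence Interval for Comparing Two Proportions With the Wilson estimates $\tilde{p}_1 = \frac{X_1+1}{n_1+2}$ and $\tilde{p}_2 = \frac{X_2+1}{n_2+2}$, the estimated difference is $\tilde{D} = \tilde{p}_1 - \tilde{p}_2$ and its standard error is $SE_{\tilde{D}} = \sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1+2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2+2}}$. An approximate level $C$ confidence interval for $P_1 - P_2$ is $\tilde{D} \pm z^* SE_{\tilde{D}}$, where $z^*$ is the standard Normal critical value for confidence level $C$ (for example, $z^* = 1.96$ for 95%).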

Example: “No Sweat” Garment Labels Following complaints about the working conditions in some apparel factories both in the United States and abroad, a joint government and industry commission recommended in 1998 that companies that monitor and enforce proper standards be allowed to display a “No Sweat” label on their products. A survey of U.S. residents aged 18 or older asked a series of questions about how likely they would be to purchase a garment under various conditions.

Example: “No Sweat” Garment Labels For some conditions, it was stated that the garment had a “No Sweat” label; for others, there was no mention of such a label. On the basis of the responses, each person was classified as a “label user” or a “label nonuser.” About 16.5% of those surveyed were label users. One purpose of the study was to describe the demographic characteristics of users and nonusers.

Example: “No Sweat” Garment Labels The study suggested that there is a gender difference in the proportion of label users. Here is a summary of the data. Let $X$ denote the number of label users.

Population   n     X    $\hat{p} = X/n$   $\tilde{p} = (X+1)/(n+2)$
1 (women)    296   63   0.213             0.215
2 (men)      251   27   0.108             0.111

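Example: “No Sweat” Garment Labels First calculate the standard error of the observed difference, using the Wilson estimates $\tilde{p}_1 = 64/298 \approx 0.2148$ and $\tilde{p}_2 = 28/253 \approx 0.1107$: $SE_{\tilde{D}} = \sqrt{\frac{(0.2148)(0.7852)}{298} + \frac{(0.1107)(0.8893)}{253}} \approx 0.0309$. The 95% confidence interval is $\tilde{D} \pm z^* SE_{\tilde{D}} = (0.2148 - 0.1107) \pm (1.96)(0.0309) = 0.1041 \pm 0.0606$, that is, approximately $(0.0435, 0.1647)$.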

Example: “No Sweat” Garment Labels With 95% confidence we can say that the difference in the proportions is between 0.04 and 0.16. Alternatively, we can report that the proportion of label users is about 10 percentage points higher among women than among men, with a 95% margin of error of 6 percentage points. In this example we chose women to be the first population. Had we chosen men as the first population, the estimate of the difference would be negative (-0.104). Because it is easier to discuss positive numbers, we generally choose the first population to be the one with the higher proportion. The choice does not affect the substance of the analysis.
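The interval in this example can be reproduced with a few lines of code. The following is a minimal sketch, assuming the Wilson estimates take the form (X + 1)/(n + 2), which matches the 0.215 and 0.111 shown in the data summary. The function name `wilson_two_prop_ci` is hypothetical.

```python
import math


def wilson_two_prop_ci(x1, n1, x2, n2, z_star=1.96):
    """Approximate confidence interval for P1 - P2 based on Wilson estimates."""
    p1 = (x1 + 1) / (n1 + 2)  # Wilson estimate for population 1
    p2 = (x2 + 1) / (n2 + 2)  # Wilson estimate for population 2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    margin = z_star * se
    return diff, se, (diff - margin, diff + margin)


# Survey counts from the example: 63 of 296 women and 27 of 251 men are label users.
diff, se, (lower, upper) = wilson_two_prop_ci(63, 296, 27, 251)
print(f"difference = {diff:.4f}, SE = {se:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

Swapping the order of the two groups flips the sign of the difference and of both endpoints but leaves the width of the interval unchanged.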

Significance Tests It is sometimes useful to test the null hypothesis that the two population proportions are the same. We standardize $D = \hat{p}_1 - \hat{p}_2$ by subtracting its mean $P_1 - P_2$ and then dividing by its standard deviation $\sigma_D = \sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$. If $n_1$ and $n_2$ are large, the standardized difference is approximately $N(0, 1)$. To estimate $\sigma_D$ we take into account the null hypothesis that $P_1 = P_2$.

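Significance Tests If these two proportions are equal, we can view all of the data as coming from a single population. Let $P$ denote the common value of $P_1$ and $P_2$. The standard deviation of $D = \hat{p}_1 - \hat{p}_2$ is then $\sigma_{Dp} = \sqrt{P(1-P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.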

Significance Tests We estimate the common value of $P$ by the overall proportion of successes in the two samples, $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$. This estimate of $P$ is called the pooled estimate. To estimate the standard deviation of $D$, substitute $\hat{p}$ for $P$ in the expression for $\sigma_{Dp}$. The result is a standard error for $D$ under the condition that the null hypothesis $H_0: P_1 = P_2$ is true. The test statistic uses this standard error to standardize the difference between the two sample proportions.

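Significance Tests for Comparing Two Proportions To test $H_0: P_1 = P_2$, compute the statistic $z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}}$, where $SE_{Dp} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ and $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$ is the pooled estimate. In terms of a standard Normal random variable $Z$, the P-value is $P(Z \ge z)$ for $H_a: P_1 > P_2$, $P(Z \le z)$ for $H_a: P_1 < P_2$, and $2P(Z \ge |z|)$ for $H_a: P_1 \ne P_2$.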

Example: Men, Women, and Garment Labels The previous example presented the survey data on whether consumers are “label users” who pay attention to label details when buying a shirt. Are men and women equally likely to be label users? Here is the data summary:

Population   n     X    $\hat{p} = X/n$
1 (women)    296   63   0.213
2 (men)      251   27   0.108

Example: Men, Women, and Garment Labels We compare the proportions of label users in the two populations (women and men) by testing the hypotheses $H_0: P_1 = P_2$ versus $H_a: P_1 \ne P_2$. The pooled estimate of the common value of $P$ is $\hat{p} = \frac{63 + 27}{296 + 251} = \frac{90}{547} \approx 0.1645$. This is the proportion of label users in the entire sample.

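Example: Men, Women, and Garment Labels The test statistic is calculated as follows, using the pooled estimate $\hat{p} \approx 0.1645$: $SE_{Dp} = \sqrt{(0.1645)(0.8355)\left(\frac{1}{296} + \frac{1}{251}\right)} \approx 0.0318$ and $z = \frac{0.213 - 0.108}{0.0318} \approx 3.30$. The observed difference is more than 3 standard deviations away from zero.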

Example: Men, Women, and Garment Labels The P-value is $P = 2P(Z \ge 3.30) \approx 0.001$. Conclusion: 21% of women are label users versus only 11% of men; the difference is statistically significant.
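The pooled two-proportion z test from this example can be sketched in code as follows. The function name `two_prop_z_test` is hypothetical, and the two-sided P-value is obtained from the standard Normal tail via `math.erfc`. Because the code uses unrounded sample proportions, it gives a z value slightly above the hand calculation (about 3.31 rather than 3.30).

```python
import math


def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: P1 = P2 using the pooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # For a standard Normal Z, 2 * P(Z >= |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, z, p_value


# Survey counts from the example: 63 of 296 women and 27 of 251 men are label users.
pooled, z, p_value = two_prop_z_test(63, 296, 27, 251)
print(f"pooled estimate = {pooled:.4f}")
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```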

Simple Regression Simple regression analysis is a statistical tool that lets us estimate the mathematical relationship between a dependent variable (usually called y) and an independent variable (usually called x). The dependent variable is the variable for which we want to make a prediction. While various non-linear forms may be used, simple linear regression models are the most common.

Introduction The primary goal of quantitative analysis is to use current information about a phenomenon to predict its future behavior. Current information is usually in the form of a set of data. In a simple case, when the data form a set of pairs of numbers, we may interpret them as representing the observed values of an independent (or predictor) variable X and a dependent (or response) variable Y.

Introduction The goal of the analyst who studies the data is to find a functional relation between the response variable y and the predictor variable x.

Regression Function The statement that the relation between X and Y is statistical should be interpreted as providing the following guidelines: 1. Regard Y as a random variable. 2. For each x, take f(x) to be the expected value (i.e., mean value) of Y. 3. Given that E(Y) denotes the expected value of Y, call the equation $E(Y) = f(x)$ the regression function.

Historical Origin of Regression Regression Analysis was first developed by Sir Francis Galton, who studied the relation between heights of sons and fathers. Heights of sons of both tall and short fathers appeared to “revert” or “regress” to the mean of the group.

Basic Assumptions of a Regression Model A regression model is based on the following assumptions: 1. There is a probability distribution of Y for each level of X. 2. Given that $\mu_Y$ is the mean value of Y, the standard form of the model is $Y = \mu_Y + \varepsilon$, where $\varepsilon$ is a random variable with a normal distribution.

Statistical Relation Between Lot Size and Number of Man-Hours: Westwood Company Example

Pictorial Presentation of Linear Regression Model

Construction of Regression Models 1. Selection of independent variables 2. Functional form of the regression relation 3. Scope of the model

Uses of Regression Analysis Regression analysis serves three major purposes: 1. Description 2. Control 3. Prediction. The several purposes of regression analysis frequently overlap in practice.

Formal Statement of the Model General regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ 1. $\beta_0$ and $\beta_1$ are parameters 2. $X_i$ is a known constant 3. Deviations $\varepsilon_i$ are independent $N(0, \sigma^2)$
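To make the model statement concrete, the sketch below generates responses from a straight-line mean plus independent Normal deviations. All numeric values (intercept 10, slope 2, sigma 1.5, and the x levels) are hypothetical and chosen only for illustration. The function name `simulate_responses` is likewise an assumption.

```python
import random


def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Generate Y_i = beta0 + beta1 * X_i + e_i with e_i independent N(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values and fixed, known X levels (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = simulate_responses(xs, beta0=10.0, beta1=2.0, sigma=1.5)

for x, y in zip(xs, ys):
    mean_response = 10.0 + 2.0 * x  # the point on the true line at this x
    print(f"x={x}: mean response={mean_response:.1f}, observed y={y:.2f}")
```

Each observed y scatters around its mean response. Setting sigma to 0 would put every point exactly on the line.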

Meaning of Regression Coefficients The values of the regression parameters $\beta_0$ and $\beta_1$ are not known. We estimate them from data. $\beta_1$ indicates the change in the mean response per unit increase in X.

Regression Line If the scatter plot of our sample data suggests a linear relationship between two variables, we can summarize the relationship by drawing a straight line on the plot. The least squares method gives us the “best” estimated line for our set of sample data.

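Regression Line We will write an estimated regression line based on sample data as $\hat{y} = b_0 + b_1 x$. The method of least squares chooses the values of $b_0$ and $b_1$ that minimize the sum of squared errors $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$.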

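Regression Line Using calculus, we obtain the estimating formulas $b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$.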
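The least squares estimates can be computed directly from the standard closed-form formulas (slope equal to the sum of cross-deviations over the sum of squared x-deviations, intercept equal to the mean of y minus slope times the mean of x). The sketch below is illustrative: the function name `least_squares` and the five data points are hypothetical and are not the advertising data from the slides.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical sample data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
print(f"estimated line: y_hat = {b0:.2f} + {b1:.2f} x")
```

The formulas require at least two distinct x values, since otherwise the denominator of the slope is zero.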

Estimation of Mean Response The fitted regression line can be used to estimate the mean value of y for a given value of x. Example: the weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.

Point Estimation of Mean Response From previous table we have: The least squares estimates of the regression coefficients are:

Point Estimation of Mean Response The slope of the estimated regression function is 10.8. This means that if the weekly advertising expenditure is increased by $1, we would expect weekly sales to increase by $10.80.

Point Estimation of Mean Response Fitted values for the sample data are obtained by substituting each x value into the estimated regression function. For example, if the advertising expenditure is $50, then the estimated sales is: This is called the point estimate of the mean response (sales).

Residual The residual is the difference between the observed value $y_i$ and the corresponding fitted value $\hat{y}_i$: $e_i = y_i - \hat{y}_i$. Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.
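Computing residuals takes one subtraction per observation once a line has been fitted. In the sketch below, the data and the coefficients 0.05 and 1.99 are hypothetical. The coefficients were chosen as the least squares fit to these five illustrative points, so the residuals sum to zero up to rounding error.

```python
def residuals(xs, ys, b0, b1):
    """Return e_i = y_i - (b0 + b1 * x_i) for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data with its least squares coefficients (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99

res = residuals(xs, ys, b0, b1)
for x, y, e in zip(xs, ys, res):
    print(f"x={x}: observed={y:.2f}, fitted={b0 + b1 * x:.2f}, residual={e:+.2f}")
print(f"sum of residuals = {sum(res):.6f}")
```

A plot of these residuals against x is a common next step. A visible curve or fan shape would suggest that the straight-line model is not appropriate.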

Example: weekly advertising expenditure
