Statistical Analysis Of Population Prepared by, Sushruth Puttaswamy.



Contents: Population; Sampling a Population; Relation between Mean of a Sample & Mean of the Population; Estimation of Error; Sample Query

Population: A population is simply a collection of numbers, P = {y1, y2, y3, ..., yn}. The objective is to do some statistical analysis on the population.

Sampling a Population: Let x be a number drawn at random from the population, with each element of P equally likely to be selected. A sample is a collection of such random draws from P. Sampling here is assumed to be done with replacement.
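A minimal Python sketch of this sampling scheme, assuming the population is held in an in-memory list. The names `population` and `sample_with_replacement` and the example values are illustrative and do not come from the slides. `random.choices` draws each element independently and uniformly, which is sampling with replacement.

```python
import random

def sample_with_replacement(population, k, seed=None):
    """Draw k elements uniformly at random from population, with replacement."""
    rng = random.Random(seed)  # local generator so a seed gives repeatable draws
    return rng.choices(population, k=k)

# Hypothetical population P = {y1, ..., yn}
population = [3, 8, 1, 9, 4, 7, 2, 6]
sample = sample_with_replacement(population, k=5, seed=0)
print(sample)
```

Because the draws are made with replacement, the same element can appear more than once and k may exceed the population size.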

Relation between Mu & Y: Let the mean of 'P' be 'Y'. Suppose we sample 'k' numbers from 'P' (with replacement), and let 'Mu' be the mean of the sample. The objective is to find a relation between 'Mu' and 'Y', that is, to estimate the absolute difference mod('Mu' - 'Y'). Instead of mod, the squared difference ('Mu' - 'Y')^2 is better.

Standard Error Formula: The Standard Error Formula gives us an estimate of the error in the sampling process. It is given by E[('Mu' - 'Y')^2] = Var / k, where Var is the variance of the population 'P' and 'k' is the sample size. The RHS is the square of the standard error of the sampling process; the standard error itself is its square root, sqrt(Var / k). The formula does not depend on 'n', the number of elements in the population.
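The formula can be checked empirically. The sketch below is only an illustration under assumed inputs: a hypothetical 0/1 population of 1,000 elements, a sample size of 100, and 10,000 repeated samples. It compares the average of (Mu - Y)^2 over those samples with Var / k. The function name and all parameter values are assumptions, not part of the slides.

```python
import random
import statistics

def mean_squared_error_of_sample_mean(population, k, trials, seed=0):
    """Estimate E[(Mu - Y)^2] by repeated sampling with replacement."""
    rng = random.Random(seed)
    y = statistics.fmean(population)  # population mean Y
    total = 0.0
    for _ in range(trials):
        mu = statistics.fmean(rng.choices(population, k=k))  # sample mean Mu
        total += (mu - y) ** 2
    return total / trials

# Hypothetical 0/1 population: 20% ones, 80% zeros
population = [1] * 200 + [0] * 800
k = 100
var = statistics.pvariance(population)  # population variance (divides by n)
predicted = var / k                     # right-hand side of the formula
observed = mean_squared_error_of_sample_mean(population, k, trials=10_000)
print(predicted, observed)
```

The two printed values should be close but not identical, since the observed value is itself a Monte Carlo estimate. The population here has only 1,000 elements, which is consistent with the formula not depending on n.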

Sampling Methods: Sampling must fetch all columns of each sampled row from the database. The aim is to reduce the error of the estimate. The estimate should be unbiased, that is, E['Mu'] = 'Y'. Random sampling does not give a good estimate when the query has low selectivity.

Sample Query: Let us apply random sampling to a database query. Let 'Emp' be a DB table with 100,000 records that has Gender as one of its columns. How many female employees are there? The SQL query for this is "SELECT COUNT(*) FROM Emp WHERE gender='F';".

Query Using Random Sampling: Let us select a sample of size k = 100 (Emp_sam) and assume that no extra overhead is required to obtain the sample. The query on the sample is "SELECT COUNT(*)*n/k FROM Emp_sam WHERE gender='F';", where n = 100,000 is the size of the table. To evaluate this, assume a hypothetical column in the DB that holds 0 for male and 1 for female. Adding up all the 1's in the sample, taking the average, and multiplying by n gives the estimated number of females. Suppose the estimate obtained this way is 20,000, which means an estimated 80,000 males.
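The sampled query can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions rather than the method used in the slides: the table contents (exactly 20,000 'F' rows), the `id` column, the seed, and the way `Emp_sam` is built by drawing row ids with replacement are all made up for illustration.

```python
import random
import sqlite3

n, k = 100_000, 100  # table size and sample size from the slides

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (id INTEGER PRIMARY KEY, gender TEXT)")
# Assumed data: 20,000 'F' rows and 80,000 'M' rows
conn.executemany(
    "INSERT INTO Emp (id, gender) VALUES (?, ?)",
    ((i, "F" if i <= 20_000 else "M") for i in range(1, n + 1)),
)

# Build Emp_sam by drawing k row ids uniformly, with replacement
rng = random.Random(42)
sampled_ids = [rng.randint(1, n) for _ in range(k)]
conn.execute("CREATE TABLE Emp_sam (id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO Emp_sam SELECT id, gender FROM Emp WHERE id = ?",
    ((i,) for i in sampled_ids),
)

# Scaled count on the sample: COUNT(*) * n / k
(estimate,) = conn.execute(
    "SELECT COUNT(*) * ? / ? FROM Emp_sam WHERE gender = 'F'", (n, k)
).fetchone()
# Exact count on the full table, for comparison only
(exact,) = conn.execute("SELECT COUNT(*) FROM Emp WHERE gender = 'F'").fetchone()
print(estimate, exact)
```

With k = 100 and n = 100,000, each sampled 'F' row contributes 1,000 to the estimate, so the estimate is always a multiple of 1,000 and will usually differ somewhat from the exact count.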

Estimation of Error: To find the error we need the variance. From the previous result, the estimated number of females is 20,000, which means the hypothetical column holds 80,000 0's and 20,000 1's. The mean of the sample is 'Mu' = 20000/100000 = 0.2. Var = [(0 - 0.2)^2 * 80000 + (1 - 0.2)^2 * 20000] / 100000, which gives a variance of 0.16. From the Standard Error formula we have E = Var / k, that is, 0.16 / 100 = 0.0016.

Estimation of Error: 0.0016 is the square of the error when trying to estimate the ratio of females to the population. Taking the square root gives 0.04. Multiplying this value by 'n' gives 4,000, which is the error in our calculation. This means the number of females is 20,000 +/- 4,000.
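The arithmetic of this error estimate can be sketched in a few lines of Python, using the figures stated in the example (n = 100,000 rows, k = 100 sampled rows, sample mean 0.2). The function name `count_estimate_error` is hypothetical. For a 0/1 column with mean p, the variance (0 - p)^2 (1 - p) + (1 - p)^2 p simplifies to p (1 - p), which is what the code uses.

```python
import math

def count_estimate_error(n, k, p):
    """Standard error of the scaled count n * Mu for a 0/1 column.

    n: number of rows in the table, k: sample size,
    p: fraction of 1's (here, the sample mean Mu).
    """
    var = p * (1 - p)                       # variance of a 0/1 column with mean p
    squared_error = var / k                 # E[(Mu - Y)^2] = Var / k
    ratio_error = math.sqrt(squared_error)  # error in the estimated ratio
    return n * ratio_error                  # error in the estimated count

error = count_estimate_error(n=100_000, k=100, p=0.2)
print(round(error))
```

The returned value is one standard error of the estimated count, not a hard bound on how far the estimate can be from the true count.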

Conclusion: The error can be reduced by increasing the sample size. According to the formula, reducing the variance also lessens the error. Without going through all the records, we can estimate the result of the query along with the level of error associated with it.
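To illustrate how the error shrinks with sample size, the short sketch below evaluates n * sqrt(Var / k) for a few values of k. It reuses the example's n = 100,000 and ratio 0.2; the particular sample sizes are arbitrary choices for illustration.

```python
import math

n, p = 100_000, 0.2  # table size and female ratio from the example
var = p * (1 - p)    # variance of the 0/1 gender column

def count_error(k):
    """Error in the estimated count for a sample of size k."""
    return n * math.sqrt(var / k)

for k in (100, 400, 1_600, 6_400):
    print(f"k={k:>5}  error ~ {count_error(k):,.0f}")
```

Since the error scales as 1 / sqrt(k), the sample size has to be quadrupled to halve the error.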