Introduction to Statistics Elan Ding Clemson University
What is statistics? Statistics answers questions about our world through: data collection; data summarization (descriptive statistics); data analysis (inferential statistics); interpretation of results.
Example Your friend claimed that among 1000 tosses of a fair coin, she obtained 535 heads. Is her claim plausible?
List of Topics 1. Descriptive Statistics 2. Probability Overview 3. Sampling Distributions 4. Hypothesis Testing 5. Common Statistical Tests
1. Descriptive Statistics Graphical Methods
Perceived Risk in Smoking
Comparative Bar Graph
Life Insurance for Cartoon Character
Should doctors get an auto insurance discount?
Stem-and-leaf plot
Math SAT score in 2005
Frequency Histogram
GPA report errors
1. Descriptive Statistics Numerical Methods
Centrality
Variability
Quartiles
Boxplot
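The numerical summaries named on the slides above (centrality, variability, quartiles, and the fences used when drawing a boxplot) can be sketched with Python's standard library. The data values below are hypothetical and chosen only for illustration; the 1.5 × IQR fence rule is the common convention and is an assumption about what the boxplot slide uses.

```python
import statistics

# Hypothetical data, chosen only for illustration (not taken from the slides).
data = [2, 4, 4, 5, 7, 9, 10, 12]

mean = statistics.mean(data)      # centrality: arithmetic mean
median = statistics.median(data)  # centrality: middle value of the sorted data
stdev = statistics.stdev(data)    # variability: sample standard deviation (n - 1 divisor)

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                     # interquartile range

# Boxplot fences (assumed 1.5 * IQR convention); points outside are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("mean:", mean, "median:", median, "stdev:", round(stdev, 3))
print("quartiles:", (q1, q2, q3), "IQR:", iqr)
print("outliers:", outliers)
```

Different software packages interpolate quartiles slightly differently, so the cut points may not match a hand calculation exactly.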
2003-2004 NBA Salaries
2. Probability The Central Limit Theorem
Example Two players play a game until one player wins two games in a row. Identify the following: experiment, outcome, event, sample space.
Probability Distribution
Probability Density
Mean of discrete random variable
Mean of continuous random variable
Variance of discrete random variable
Variance of continuous random variable
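The formulas on these four slides are not reproduced in this text. The standard textbook definitions are given here as a reminder, with assumed notation: $p(x)$ for the probability mass function of a discrete random variable and $f(x)$ for the density of a continuous one.

```latex
% Mean (expected value)
\mu = E[X] = \sum_{x} x \, p(x) \quad \text{(discrete)}, \qquad
\mu = E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx \quad \text{(continuous)}

% Variance
\sigma^2 = \operatorname{Var}(X) = \sum_{x} (x - \mu)^2 \, p(x) \quad \text{(discrete)}, \qquad
\sigma^2 = \operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 \, f(x) \, dx \quad \text{(continuous)}
```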
Binomial Distribution The probability of $k$ successes in $n$ independent Bernoulli trials, each with success probability $p$, is: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. What does it look like? Web app
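To get a feel for what the binomial distribution looks like without the web app, here is a minimal sketch that evaluates the binomial probability and prints a rough text histogram. The parameters `n = 10` and `p = 0.4` are illustrative assumptions, not values from the slides.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative parameters (assumed): 10 trials, success probability 0.4.
n, p = 10, 0.4
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# One row per k: the probability and a bar whose length is proportional to it.
for k, prob in enumerate(pmf):
    print(f"{k:2d}  {prob:.4f}  {'#' * round(prob * 50)}")
```

Changing `p` toward 0 or 1 skews the bars; increasing `n` makes the shape look more bell-like.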
Normal Distribution
Standard Normal Distribution
Z-Table
Standardization
The Empirical Rule
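The empirical rule (roughly 68%, 95%, and 99.7% of a normal population within 1, 2, and 3 standard deviations of the mean) can be checked numerically instead of with a Z-table. This sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"P(|Z| < {k}) = {prob:.4f}")
```

Because standardization maps any normal variable to $Z$, the same three probabilities hold for every normal distribution.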
Example Suppose 𝑋 = the height of a randomly selected 5-year-old child follows a normal distribution with 𝜇 = 100 cm and 𝜎 = 6 cm. What proportion of heights lies between 94 cm and 112 cm?
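One way to work this example is to standardize both endpoints and take the difference of the standard normal CDF values. The sketch below does that with the standard library in place of a Z-table lookup.

```python
from statistics import NormalDist

mu, sigma = 100, 6  # cm, from the example

# Standardize the endpoints: z = (x - mu) / sigma.
z_low = (94 - mu) / sigma    # one standard deviation below the mean
z_high = (112 - mu) / sigma  # two standard deviations above the mean

Z = NormalDist()
proportion = Z.cdf(z_high) - Z.cdf(z_low)
print(f"P(94 < X < 112) = P({z_low:.0f} < Z < {z_high:.0f}) = {proportion:.4f}")
```

The result, about 0.82, agrees with the empirical rule: roughly 34% of heights lie between one standard deviation below the mean and the mean, and roughly 47.5% lie between the mean and two standard deviations above it.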
The Central Limit Theorem Web app
3. Sampling Distribution The Central Limit Theorem
Population vs sample
What is a statistic?
What is a sampling distribution? Suppose a random variable $X$ has a Bernoulli distribution such that $X = 1$ with probability 0.4 and $X = 0$ with probability 0.6.
What is a sampling distribution? Suppose we take a sample of size 2 from the population, and call the observations $X_1$ and $X_2$. Define the statistic $T = X_1 + \sin X_2$. What is the distribution of $T$? We draw 1000 random samples of size 2 and obtain an approximation:
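The simulation described on this slide can be sketched as follows. The seed and helper names are assumptions made for illustration; the exact distribution is computed alongside by enumerating the four possible samples, so the approximation can be compared against it.

```python
import math
import random

rng = random.Random(0)  # fixed seed (arbitrary choice) so the run is repeatable

def draw_x() -> int:
    """One Bernoulli draw: 1 with probability 0.4, 0 with probability 0.6."""
    return 1 if rng.random() < 0.4 else 0

def draw_t() -> float:
    """One value of the statistic T = X1 + sin(X2) from a sample of size 2."""
    x1, x2 = draw_x(), draw_x()
    return round(x1 + math.sin(x2), 4)

# Approximate sampling distribution from 1000 simulated samples.
counts = {}
for _ in range(1000):
    t = draw_t()
    counts[t] = counts.get(t, 0) + 1

# Exact sampling distribution: enumerate all (x1, x2) pairs.
exact = {}
for x1, p1 in ((0, 0.6), (1, 0.4)):
    for x2, p2 in ((0, 0.6), (1, 0.4)):
        t = round(x1 + math.sin(x2), 4)
        exact[t] = exact.get(t, 0.0) + p1 * p2

for t in sorted(exact):
    print(f"T = {t:<7} simulated {counts.get(t, 0) / 1000:.3f}   exact {exact[t]:.3f}")
```

$T$ takes only four values here, so its sampling distribution is a four-point distribution; the simulated frequencies should land close to the exact probabilities.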
Why sampling distribution? Population parameters such as $\mu$ and $\sigma^2$ are often unknown. The sample mean $\bar{X}$ and the sample variance $S^2$ are unbiased estimators of $\mu$ and $\sigma^2$: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Why is sampling distribution useful? The Central Limit Theorem! The most important statistic is $\bar{X}$, which is approximately normal when the sample size is large. As a rule of thumb, the CLT approximation is usually considered adequate when $n$ exceeds 30. Web app
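As a stand-in for the web app, here is a small simulation sketch of the Central Limit Theorem. It assumes a Bernoulli(0.4) population, samples of size 30, and 5000 repetitions; these settings and the seed are illustrative choices. The simulated sample means should centre on $\mu = p$ with standard deviation close to $\sigma/\sqrt{n}$.

```python
import random
import statistics

rng = random.Random(1)       # fixed seed (arbitrary choice) for repeatability
p, n, reps = 0.4, 30, 5000   # assumed population parameter, sample size, repetitions

def sample_mean() -> float:
    """Mean of one random sample of size n from a Bernoulli(p) population."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n

means = [sample_mean() for _ in range(reps)]

se = (p * (1 - p) / n) ** 0.5  # theoretical standard error: sigma / sqrt(n)
within_2se = sum(abs(m - p) < 2 * se for m in means) / reps

print("mean of sample means:", round(statistics.mean(means), 4), "vs mu =", p)
print("sd of sample means:  ", round(statistics.stdev(means), 4), "vs sigma/sqrt(n) =", round(se, 4))
print("fraction within 2 standard errors:", round(within_2se, 3))
```

If $\bar{X}$ is approximately normal, roughly 95% of the simulated means should fall within two standard errors of $\mu$.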
Example Revisited Your friend claimed that among 1000 tosses of a fair coin, she obtained 535 heads. Is her claim plausible?
Solution
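The solution slides are not reproduced in this text. One way to check the claim, sketched below, is to ask how likely it is to see 535 or more heads in 1000 tosses of a fair coin, using both the exact binomial tail and the normal approximation suggested by the Central Limit Theorem.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, observed = 1000, 0.5, 535

# Exact tail P(X >= 535) for X ~ Binomial(1000, 0.5), using integer arithmetic.
exact_tail = sum(comb(n, k) for k in range(observed, n + 1)) / 2**n

# Normal approximation: X is roughly Normal(np, np(1 - p)).
mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (observed - mu) / sigma
approx_tail = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}")
print(f"P(X >= 535): exact {exact_tail:.4f}, normal approximation {approx_tail:.4f}")
```

Both tails come out at about 1.3% to 1.5%, so the result is unusual for a fair coin but not impossible.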
4. Hypothesis Testing Application of sampling distribution
Forming Hypotheses Let's look at the previous problem in a different light. Suppose your friend did indeed get 535 heads in 1000 tosses. Can you say anything about whether the coin is fair? To answer that, we set up the following hypotheses: $H_0: p = 0.5$ (the coin is fair) versus $H_1: p \neq 0.5$ (the coin is not fair).
Suppose $H_0$ is true
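Here is a minimal sketch of the test that follows from supposing $H_0$ is true. It assumes the hypotheses are $H_0: p = 0.5$ against the two-sided alternative $H_1: p \neq 0.5$, and a significance level of 0.05; the slides do not state the level in this text, so treat it as an assumption.

```python
from math import sqrt
from statistics import NormalDist

n, heads, p0 = 1000, 535, 0.5
alpha = 0.05  # assumed significance level (not stated in the slides)

p_hat = heads / n

# Under H0 the sample proportion is approximately Normal(p0, p0(1 - p0)/n).
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sided p-value: probability of a result at least this extreme in either direction.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0 at {alpha}: {reject_h0}")
```

With these assumptions the p-value is below 0.05, so the data give some evidence against a fair coin, although the evidence would not survive a stricter level such as 0.01.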
5. Common Statistical Tests Application of hypothesis testing
1. Z-test for sample proportion Researchers at the University of Luton conducted a survey of 321 faculty members at a variety of academic institutions. It was reported that 36% of those surveyed said they occasionally used online searches with key words from student work to check for plagiarism. Assuming it is reasonable to regard this sample as representative of university faculty members, does the sample provide convincing evidence that more than one-third of faculty members occasionally use key word searches to check student work?
1. Z-test for sample proportion
$H_1: p \neq p_0$ (two-tailed)
$H_1: p > p_0$ (one-tailed)
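For the plagiarism survey above, the question "more than one-third" points to a one-tailed alternative. The sketch below wraps the Z-test for a proportion in a hypothetical helper, `proportion_z_test`, that handles both kinds of alternative; the function name and its `tail` parameter are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(p_hat: float, p0: float, n: int, tail: str = "two"):
    """Z-test for one proportion. tail is 'two' (p != p0) or 'upper' (p > p0)."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    upper = 1 - NormalDist().cdf(z)
    if tail == "upper":
        p_value = upper
    elif tail == "two":
        p_value = 2 * min(upper, 1 - upper)
    else:
        raise ValueError("tail must be 'two' or 'upper'")
    return z, p_value

# Survey values from the example: n = 321, sample proportion 0.36, p0 = 1/3.
z, p_value = proportion_z_test(p_hat=0.36, p0=1 / 3, n=321, tail="upper")
print(f"z = {z:.3f}, one-tailed p-value = {p_value:.4f}")
```

The p-value is about 0.16, well above conventional significance levels, so this sample would not count as convincing evidence that more than one-third of faculty members use key word searches.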
2. Z-test for sample mean A study investigated whether time perception is impaired during nicotine withdrawal. After a 24-hr smoking abstinence, 20 smokers were asked to estimate how much time had passed during a 45-sec period. Suppose the resulting data on perceived elapsed time (in seconds) were as shown: Researchers want to know whether smoking abstinence can cause overestimation of elapsed time.
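The 20 measurements are not reproduced in this text, so the sketch below only defines the standardized test statistic as a function of the data; no values are filled in. The helper name `one_sample_statistic` is hypothetical. The null value of 45 seconds comes from the example, and the alternative of interest is $\mu > 45$ (overestimation).

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence

def one_sample_statistic(times: Sequence[float], mu0: float = 45.0) -> float:
    """Standardized distance of the sample mean from mu0: (x_bar - mu0) / (s / sqrt(n))."""
    n = len(times)
    standard_error = stdev(times) / sqrt(n)  # s / sqrt(n), with s the sample standard deviation
    return (mean(times) - mu0) / standard_error

# Usage (supply the 20 perceived-time measurements from the slide):
# statistic = one_sample_statistic(perceived_times)
```

A large positive value supports overestimation. When the population standard deviation is estimated by $s$ and the sample is small, the statistic is usually compared against a t-distribution with $n - 1$ degrees of freedom instead of the standard normal.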
2. Z-test for sample mean
The t-distribution
t-table
Statistical Significance does not imply practical importance
3. t-test for two sample means (paired)
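The slide content for the paired test is not reproduced here. As a hedged sketch of the usual approach: a paired t-test reduces the two samples to one sample of within-pair differences and tests whether their mean is zero. The helper name `paired_t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple

def paired_t_statistic(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired samples, testing mean difference = 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]  # within-pair differences
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Usage: t, df = paired_t_statistic(before, after), then compare t with a t-table at df.
```

Pairing removes subject-to-subject variation, which is why the standard error is computed from the differences and not from the two samples separately.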
4. t-test for two sample means (unpaired)
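The slides for the unpaired test are not reproduced here, and they may use either the pooled-variance or the unequal-variance form. The sketch below assumes the unequal-variance (Welch) form with the Welch-Satterthwaite degrees of freedom; the helper name `welch_t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple

def welch_t_statistic(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (t, approximate degrees of freedom) for two independent samples."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # squared standard error contributed by sample 1
    v2 = variance(y) / n2  # squared standard error contributed by sample 2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Usage: t, df = welch_t_statistic(group_a, group_b), then compare t with a t-table at df.
```

The degrees of freedom are generally not an integer; when reading a printed t-table, rounding down is the conservative choice.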
Thank you! yirending@gmail.com