Statistics - methodology for collecting, analyzing, interpreting and drawing conclusions from collected data
Anastasia Kadina, GM presentation, 6/15/2015.

Presentation transcript:

Statistics: a methodology for collecting, analyzing, interpreting and drawing conclusions from collected data.
1. Design: planning and carrying out research studies.
2. Description: summarizing and exploring data.
3. Inference: making predictions and generalizing about phenomena represented by the data.
Homer Simpson: "Aw, you can come up with statistics to prove anything, Kent. 14 percent of all people know that."

Statistical Data Analysis
Population: the collection of all individuals or items under consideration in a statistical study.
Sample: the part of the population from which information is collected.
Parameter: a statistical description of the population.

Variable: a characteristic that varies from one item to another.
A variable is either quantitative (numerical) or qualitative (categorical). A quantitative variable is either discrete or continuous.

Observing the values of the variables yields data.
Observation: an individual piece of data.
Data set (data matrix): the collection of observations for the variables.
A data matrix holds k variables measured in a sample of size n.

Presenting data
Relative frequency = frequency / total number of observations
[Figure: sample and population distributions]
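
As a minimal sketch of the relative-frequency formula above, the following Python snippet counts each distinct value and divides by the total number of observations. The `colors` data and the function name `relative_frequencies` are hypothetical and do not come from the slides.

```python
from collections import Counter


def relative_frequencies(observations):
    """Map each distinct value to frequency / total number of observations."""
    counts = Counter(observations)
    total = len(observations)
    return {value: count / total for value, count in counts.items()}


# Hypothetical sample of a qualitative (categorical) variable.
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
rel_freq = relative_frequencies(colors)
print(rel_freq)
```

The relative frequencies of all distinct values always sum to 1, which is a quick sanity check on any frequency table.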

Measures of center (averages)
1. The mode: the value that occurs with the highest frequency.
   Example: in 4, 2, 5, 2, 6, 1, 2 the value 2 occurs with the greatest frequency.
   If the greatest frequency is 1, there is no mode. A data set can have more than one mode.
2. The median: arrange the observed values of the variable in increasing order.
   a. If the number of observations is odd, the median is the value in the middle.
   b. If the number of observations is even, the median is the number halfway between the two middle values.
   Example: for 2, 5, 7, 8, 9, 11 (6 observations), the median is (7 + 8) / 2 = 7.5.
3. The sample mean: the sum of the observed values divided by the number of observations, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
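
The three measures of center can be checked against the slide's own examples with a short Python sketch. The helper `modes` is a hypothetical name; it follows the slide's convention that a data set has no mode when the greatest frequency is 1, which differs from the standard library's `statistics.multimode`.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return all modes, or an empty list when every value occurs only once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # slide convention: greatest frequency of 1 means no mode
    return sorted(v for v, c in counts.items() if c == top)


mode_example = modes([4, 2, 5, 2, 6, 1, 2])    # data from the mode example
median_example = median([2, 5, 7, 8, 9, 11])   # data from the median example
mean_example = mean([2, 5, 7, 8, 9, 11])       # sum 42 divided by 6 observations

print(mode_example, median_example, mean_example)
```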

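Measures of variability
1. Range: range = max - min
2. Standard deviation: for a variable x, the standard deviation is denoted by $s_x$ or $\sigma_x$ (for a sample), or $\sigma$ (for a population):
   Sample: $s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$
   Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$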
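
A small sketch of both measures of variability, using Python's standard `statistics` module: `stdev` divides by n - 1 (sample) and `pstdev` divides by N (population). The data set is reused from the median example purely for illustration.

```python
from statistics import pstdev, stdev

data = [2, 5, 7, 8, 9, 11]

data_range = max(data) - min(data)  # range = max - min
sample_sd = stdev(data)             # sample standard deviation, divisor n - 1
population_sd = pstdev(data)        # population standard deviation, divisor N

print(data_range, round(sample_sd, 4), round(population_sd, 4))
```

For the same data the sample value is always the larger of the two, because it divides the same sum of squared deviations by a smaller number.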

Z-Score (Standard score)
The z-score tells how many standard deviations a value lies above or below the mean of the data set.
Sample: $z = \frac{x - \bar{x}}{s_x}$
Population: $z = \frac{x - \mu}{\sigma}$
For a normal distribution, the probability of an event (the area under the curve) can be found in tables by z.
Empirical rule for a normal distribution:
about 68% of the values lie within $\bar{x} \pm s_x$,
about 95% of the values lie within $\bar{x} \pm 2s_x$,
about 99.7% of the values lie within $\bar{x} \pm 3s_x$.
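
The sketch below computes sample z-scores and reproduces the empirical-rule percentages from the standard normal curve instead of a printed table. It assumes Python 3.8 or later for `statistics.NormalDist`; the function names and the data are illustrative only.

```python
from statistics import NormalDist, mean, stdev


def z_scores(values):
    """Standardize each value: (x - sample mean) / sample standard deviation."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]


def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


scores = z_scores([2, 5, 7, 8, 9, 11])
coverage = [round(area_within(k), 4) for k in (1, 2, 3)]
print(coverage)  # areas within 1, 2 and 3 standard deviations
```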

Z-Score (Standard score)
$z_{\alpha}$: the value of z for which the area under the standard normal curve to its right is equal to α.
To take both ends (tails) of the distribution into account, we use $z_{\alpha/2}$, which leaves an area of α/2 in each tail.
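
Instead of a table lookup, the critical value can be obtained from the inverse of the standard normal cumulative distribution. This is a sketch assuming Python 3.8 or later; `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_critical(alpha):
    """Value of z with an area of alpha to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)


alpha = 0.05
z_one_tail = z_critical(alpha)       # z_alpha, about 1.645
z_two_tail = z_critical(alpha / 2)   # z_alpha/2, about 1.960
print(round(z_one_tail, 3), round(z_two_tail, 3))
```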

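Sampling of the population
Random sample: a sample from a finite population is random if it is chosen in such a way that each of the possible samples has the same probability of being selected.
For random samples of size n from a population of size N:
Mean of the sampling distribution = population mean: $\mu_{\bar{x}} = \mu$
Standard deviation of the sampling distribution (standard error of the mean):
   Infinite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
   Finite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$, where $\sqrt{\frac{N - n}{N - 1}}$ is the standard deviation correction factor.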
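
A minimal sketch of the standard error of the mean, assuming the usual finite population correction factor sqrt((N - n) / (N - 1)). The function name, its parameters and the numbers are illustrative, not taken from the slides.

```python
import math


def standard_error(sigma, n, population_size=None):
    """Standard error of the mean; applies the finite population
    correction factor when a population size N is given."""
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


se_infinite = standard_error(sigma=10.0, n=25)                     # 10 / 5
se_finite = standard_error(sigma=10.0, n=25, population_size=101)  # times sqrt(76 / 100)
print(se_infinite, round(se_finite, 4))
```

The correction matters only when the sample is a sizeable fraction of the population; for N much larger than n the factor is close to 1.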

Central Limit Theorem
For large samples, the sampling distribution of the mean can be approximated closely with a normal distribution whose mean is the population mean, $\mu_{\bar{x}} = \mu$.
Large: sample size n ≥ 30.
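
The theorem can be illustrated with a small simulation. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), a fixed random seed, and 2000 repeated samples; all of these choices are arbitrary and are not from the slides.

```python
import random
from statistics import mean, stdev

random.seed(0)       # fixed seed so the run is repeatable
n = 30               # "large" sample size per the rule of thumb
num_samples = 2000   # number of repeated samples

# Draw many samples from a skewed population and record each sample mean.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)  # should be close to the population mean 1
sd_of_means = stdev(sample_means)   # should be close to 1 / sqrt(30)
print(round(mean_of_means, 3), round(sd_of_means, 3))
```

Although individual observations are far from normal, a histogram of `sample_means` would look roughly bell-shaped, centered near the population mean.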

Probability and Confidence of Statements
$z_{\alpha}$ denotes the value of z for which the area under the standard normal curve to its right is equal to α.
$z_{\alpha/2}$ is the value for which the area under the standard normal curve between $-z_{\alpha/2}$ and $+z_{\alpha/2}$ is equal to 1 - α.
When we use $\bar{x}$ as an estimate of μ, the probability is 1 - α that this estimate will be "off" either way by at most
$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
where $\frac{\sigma}{\sqrt{n}}$ is the standard error of the mean.
In general, we make probability statements about future values of random variables (e.g. the potential error of an estimate) and confidence statements once the data have been obtained.
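
As a rough illustration of the maximum error E, the sketch below evaluates the formula for two sample sizes. The values of σ, n and α are hypothetical, and `margin_of_error` is an illustrative name.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, alpha):
    """Maximum error E = z_(alpha/2) * sigma / sqrt(n) at confidence 1 - alpha."""
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return z_half_alpha * sigma / math.sqrt(n)


e_n36 = margin_of_error(sigma=15.0, n=36, alpha=0.05)
e_n144 = margin_of_error(sigma=15.0, n=144, alpha=0.05)
print(round(e_n36, 2), round(e_n144, 2))
```

Because E is proportional to 1 / sqrt(n), quadrupling the sample size halves the maximum error.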

Confidence intervals
For large samples (n ≥ 30) with σ known.
The probability is (1 - α) that a random variable Z having the standard normal distribution will take on a value between $-z_{\alpha/2}$ and $+z_{\alpha/2}$:
$-z_{\alpha/2} < Z < z_{\alpha/2}$
$-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2}$
Confidence interval:
$\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
As we increase the degree of certainty, namely the degree of confidence (1 - α), the confidence interval becomes wider and thus tells us less about the quantity we are trying to estimate.
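
The trade-off between confidence and interval width can be seen in a short sketch of the large-sample interval with σ known. The sample mean, σ and n below are hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, alpha):
    """Large-sample confidence interval for the mean when sigma is known."""
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_half_alpha * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


low95, high95 = z_confidence_interval(x_bar=100.0, sigma=15.0, n=36, alpha=0.05)
low99, high99 = z_confidence_interval(x_bar=100.0, sigma=15.0, n=36, alpha=0.01)
print(round(low95, 2), round(high95, 2))  # 95% interval
print(round(low99, 2), round(high99, 2))  # 99% interval, wider
```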

Student's t-test
Also good for small samples (n < 30) and/or when the population standard deviation is unknown; the t distribution has roughly the shape of the normal distribution.
Degrees of freedom: df = n - 1
t-score: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Small-sample confidence interval:
$\bar{x} - t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
$t_{\alpha/2}$ can be found in the corresponding tables by df and α.
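
The small-sample interval can be sketched as follows. Python's standard library has no t distribution, so the critical value is passed in as a table lookup: 2.262 is the standard table value of t for α/2 = 0.025 and df = 9. The measurements are hypothetical.

```python
import math
from statistics import mean, stdev


def t_confidence_interval(sample, t_crit):
    """Small-sample confidence interval for the mean.
    t_crit is t_(alpha/2) for df = len(sample) - 1, taken from a t table."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation, divisor n - 1
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical measurements, n = 10, so df = 9.
sample = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.0, 9.6]
T_CRIT_DF9_95 = 2.262  # table value for alpha/2 = 0.025, df = 9

low, high = t_confidence_interval(sample, T_CRIT_DF9_95)
print(round(low, 3), round(high, 3))
```

Since 2.262 is larger than the normal value 1.96, the t-based interval is wider than a z-based one for the same data, reflecting the extra uncertainty from estimating the standard deviation.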

Error Bars
Error bars are a graphical representation of the variability of data. They are used on graphs to indicate the error, or uncertainty, in a reported measurement.
[Figure: common error bars]

Test of Hypotheses
A statistical hypothesis is an assertion about the parameter(s) of a population.
Null hypothesis (H0): any hypothesis set up primarily to see whether it can be rejected (it is the one directly tested).
Alternative hypothesis (HA): the hypothesis that we accept when the null hypothesis can be rejected.
A significance test is a way of statistically testing a hypothesis by comparing the data to values predicted by the hypothesis. Data that fall far from the predicted values provide evidence against the hypothesis. If the difference between what we expect and what we observe is so small that it may well be attributed to chance, the results are not statistically significant.
The test statistic is a statistic calculated from the sample data to test the null hypothesis. This statistic typically involves a point estimate of the parameter to which the hypotheses refer.

p-value: the probability, when H0 is true, of a test statistic value at least as contradictory to H0 as the value actually observed.
The smaller the p-value, the more strongly the data contradict H0. The p-value is the primary reported result of a significance test; it summarizes the evidence in the data about the null hypothesis. A moderate to large p-value means that the data are consistent with H0.
Most studies require a very small p-value, such as p ≤ 0.05, before concluding that the data sufficiently contradict H0 to reject it. In such cases, results are said to be significant at the 0.05 level. This means that if the null hypothesis were true, the chance of getting results as extreme as those in the sample data would be no greater than 5%.
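
To make the test statistic and the p-value concrete, here is a sketch of one possible significance test: a two-sided, one-sample z-test for a mean with σ known. The slides do not prescribe this particular test, and all numbers and names below are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(x_bar, mu_0, sigma, n):
    """Two-sided z-test of H0: mu = mu_0 against HA: mu != mu_0 (sigma known).
    Returns the test statistic and its p-value."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    # Probability, under H0, of a statistic at least this far from 0 in either tail.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


z_stat, p_value = one_sample_z_test(x_bar=103.0, mu_0=100.0, sigma=15.0, n=100)
significant = p_value <= 0.05
print(round(z_stat, 2), round(p_value, 4), significant)
```

Here the sample mean lies 2 standard errors above the hypothesized mean, so the p-value falls just below 0.05 and H0 would be rejected at the 0.05 level but not at the 0.01 level.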