Statistics.



Definitions
Statistics is the science of collecting, summarizing, and analyzing numerical data; equivalently, the mathematics of the collection, organization, and interpretation of numerical data. It makes it possible to estimate the likelihood of events.
Descriptive statistics describe the basic features of the data in a study. They provide simple summaries of the sample and the measures and, together with simple graphical analysis, form the basis of virtually every quantitative analysis of data. Typical summaries are the distribution of the data (frequency), central tendency (mean, median, mode), and dispersion (standard deviation, variance).
Inferential statistics are used to reach conclusions that extend beyond the immediate data. They use patterns in the sample data to draw inferences about the population the sample represents, while accounting for randomness.
Thus, we use inferential statistics to generalize from our data to more general conditions (and, to some extent, to predict); we use descriptive statistics simply to describe what is going on in our data.
Inference is a vital element of scientific advance, because it provides a data-based prediction of where a theory logically leads. To further support the guiding theory, these predictions are tested in turn, as part of the scientific method.

- Descriptive statistics identify patterns, which leads to hypothesis generation.
- Inferential statistics distinguish true differences from random variation, which allows hypothesis testing.
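The descriptive summaries named in the definitions (central tendency and dispersion) can be sketched with Python's standard `statistics` module. The sample values are hypothetical and chosen only to give round results; the variance and standard deviation here are the population versions (dividing by n), which is an assumption, since the slides do not say which version is meant.

```python
import statistics

# Hypothetical sample, for illustration only (not taken from the slides).
sample = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(sample)           # central tendency: arithmetic mean
median = statistics.median(sample)       # central tendency: middle value
mode = statistics.mode(sample)           # central tendency: most frequent value
variance = statistics.pvariance(sample)  # dispersion: population variance
std_dev = statistics.pstdev(sample)      # dispersion: population standard deviation

print(mean, median, mode, variance, std_dev)
```

For a sample meant to estimate a larger population, `statistics.variance` and `statistics.stdev` (dividing by n - 1) would be the usual alternative.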

Steps in a statistical study
I. Identifying the question
* What is the question? (What are my hypotheses?)
* Is it possible to answer the question with statistics?
* Is the data obtainable? (e.g. birth weight, socioeconomic status, drug use, alcohol use)
* Is it ethical to obtain such data?
* If not, is there a reasonable substitute?
II. Designing a study
* Identify the population of interest
* Survey
* Observational studies
* Designing an experiment
* EDA: exploratory data analysis (trends, relationships, differences)
* Pilot study
III. Collecting data
* Identify variables
* Identify types of variables
* Identify limits of measurement or observation
IV. Analyzing the data
* Use proper procedures and techniques.
* Check the assumptions behind the procedures and techniques.
V. Drawing conclusions and discussing limitations
* What are the answers to the original hypotheses?
* What are the limitations of the study?
* What conclusions does the study not make?
* What new questions arise from this study?

Variables
A variable is a characteristic that can be observed and that varies among the individuals of a given population.
Classes of variables:
a) Qualitative = cannot be expressed with a number (e.g. nationality). Qualitative variables express a qualitative attribute such as hair color, eye color, or religion. They are sometimes referred to as categorical variables. Values of qualitative variables do not imply order; they are simply categories.
b) Quantitative = can be expressed with a number. Quantitative variables are measured in terms of numbers. Examples are height, weight, and shoe size.
   - Discrete = takes whole-number (integer) values
   - Continuous = the scale is continuous and not made up of discrete steps

Frequency table
A frequency table is a way of summarizing a set of data. It is a record of how often each value (or set of values) of the variable in question occurs. It may be enhanced by adding the percentage of observations that fall into each category.
Absolute frequency = number of individuals or observations in each category.
Relative frequency = proportion or percentage of observations in each category (the absolute frequency divided by the total number of observations).
A frequency table is used to summarize categorical, nominal, and ordinal data. It may also be used to summarize continuous data once the data set has been divided up into sensible groups.
When we have more than one categorical variable in our data set, a frequency table is sometimes called a contingency table, because the figures found in the rows are contingent upon (dependent upon) those found in the columns.
Example
Suppose that in thirty shots at a target, a marksman makes the following scores:
5 2 2 3 4 4 3 2 0 3 0 3 2 1 5 1 3 1 5 5 2 4 0 0 4 5 4 4 5 5
The frequencies of the different scores can be summarized as:
Score   Frequency (absolute)   Frequency (relative, %)
0       4                      13%
1       3                      10%
2       5                      17%
3       5                      17%
4       6                      20%
5       7                      23%
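As a minimal sketch, the absolute and relative frequencies of the marksman example can be computed with Python's standard library. The function name `frequency_table` is an illustrative choice, not something from the slides.

```python
from collections import Counter

# The thirty scores from the marksman example.
scores = [5, 2, 2, 3, 4, 4, 3, 2, 0, 3, 0, 3, 2, 1, 5,
          1, 3, 1, 5, 5, 2, 4, 0, 0, 4, 5, 4, 4, 5, 5]

def frequency_table(values):
    """Map each value to (absolute frequency, relative frequency), sorted by value."""
    counts = Counter(values)
    total = len(values)
    return {v: (counts[v], counts[v] / total) for v in sorted(counts)}

table = frequency_table(scores)
for score, (absolute, relative) in table.items():
    # Relative frequency is printed as a rounded percentage.
    print(f"{score}  {absolute}  {relative:.0%}")
```

The absolute frequencies sum to the number of observations (30), and the relative frequencies sum to 1.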

Contingency table
A contingency table (also referred to as a cross tabulation or cross tab) is often used to record and analyze the relation between two or more categorical variables. It displays the (multivariate) frequency distribution of the variables in a matrix format.
          Right-handed   Left-handed   TOTALS
Males     43             9             52
Females   44             4             48
TOTALS    87             13            100
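The TOTALS row and column of the handedness table are the marginal sums of the four cell counts. A small sketch, assuming the table is stored as a nested dictionary (a representation chosen here for illustration):

```python
# Cell counts from the handedness table: rows are sex, columns are handedness.
table = {
    "Males":   {"Right-handed": 43, "Left-handed": 9},
    "Females": {"Right-handed": 44, "Left-handed": 4},
}

# Row totals: sum across the columns of each row.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals: sum down the rows of each column.
columns = list(next(iter(table.values())))
col_totals = {col: sum(table[row][col] for row in table) for col in columns}

grand_total = sum(row_totals.values())
print(row_totals, col_totals, grand_total)
```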

Hypothesis
A hypothesis is a proposed explanation for an observable phenomenon. For a hypothesis to be put forward as a scientific hypothesis, the scientific method requires that one can test it. Scientists generally base scientific hypotheses on previous observations that cannot satisfactorily be explained with the available scientific theories.
The null hypothesis (H0) typically proposes a general or default position, such as that there is no relationship between two measured phenomena, or that a potential treatment has no effect. It is typically paired with a second hypothesis, the alternative hypothesis (H1), which asserts a particular relationship between the phenomena.
Hypothesis testing works by collecting data and measuring how probable the data are, assuming the null hypothesis is true. If the data are very improbable under that assumption (usually defined as occurring less than 5% of the time), the experimenter rejects the null hypothesis. If the data do not contradict the null hypothesis, no conclusion is made. In this case, the null hypothesis could be true or false; the data give insufficient evidence to reach any conclusion.
You must choose the level of significance α, usually 0.05 (5%), which determines the critical value of the test statistic.
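The decision rule can be illustrated with a test simple enough to compute exactly. The sketch below is not from the slides: it assumes a hypothetical coin-flip experiment (15 heads in 20 flips), takes H0 to be "the coin is fair", and uses an exact two-sided binomial test, where the P value is the total probability under H0 of every outcome that is no more likely than the observed one.

```python
from math import comb

def two_sided_binomial_p(successes, n, p0=0.5):
    """Exact two-sided P value for H0: success probability equals p0."""
    pmf = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Sum the probability of every outcome at most as likely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))

alpha = 0.05                                  # significance level chosen in advance
p_value = two_sided_binomial_p(15, 20)        # hypothetical data: 15 heads in 20 flips
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"P = {p_value:.4f}: {decision}")
```

With these made-up numbers the P value is about 0.041, just under α = 0.05, so H0 would be rejected; with 14 heads instead it would not be.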

Chi-square or χ²-distribution In probability theory and statistics, the chi-square distribution (also chi-squared or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in inferential statistics, e.g. in hypothesis testing, or in construction of confidence intervals. The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness of fit.
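The definition can be checked by simulation: draw k independent standard normal values, square them, and add them up. The sketch below uses k = 3, 20,000 draws, and a fixed seed, all arbitrary choices for illustration. It relies on the known fact that a chi-square distribution with k degrees of freedom has mean k.

```python
import random

def chi_square_sample(k, rng):
    """One draw: the sum of squares of k independent standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)  # fixed seed so the run is repeatable
k = 3                   # degrees of freedom (arbitrary choice)
draws = [chi_square_sample(k, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
print(f"k = {k}, sample mean = {sample_mean:.2f}")  # should be close to k
```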

Chi-square test
χ² = Σ (O - E)² / E, where O = observed and E = expected
Degrees of freedom = n - 1, where n = number of classes
- A statistical test to determine the probability that an observed deviation from the expected outcome occurs solely by chance.
- It is used to check whether the data from the sampled population agree with the theoretical distribution we suspect to be true, that is, whether the differences between the observed and the expected values are real or due to random variation.
Degrees of freedom = the number of values in the final calculation of a statistic that are free to vary.
Pearson's chi-square (χ²) test is the best known of several chi-square tests.
Yates' correction: the chi-square test approximates the discrete probabilities of the observed frequencies (counts) in the table by the continuous chi-squared distribution. To reduce the error in this approximation, Frank Yates, an English statistician, suggested a correction for continuity that adjusts the formula for Pearson's chi-square test by subtracting 0.5 from the absolute difference between each observed value and its expected value:
χ² = Σ (|O - E| - 0.5)² / E, where O = observed and E = expected
Degrees of freedom = n - 1, where n = number of classes
The effect of Yates' correction is to prevent overestimation of statistical significance for small data sets. This formula is used when at least one cell of the table has an expected count smaller than 5.
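A minimal sketch of both formulas, Pearson's statistic and the version with Yates' correction, as one function. The function name and the counts (7 and 1 observed, 4 and 4 expected) are hypothetical and chosen only so that one expected count is below 5.

```python
def chi_square_statistic(observed, expected, yates=False):
    """Sum of (O - E)^2 / E over all classes; with yates=True, (|O - E| - 0.5)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff -= 0.5  # continuity correction
        total += diff ** 2 / e
    return total

# Hypothetical counts for two classes (not from the slides).
observed = [7, 1]
expected = [4, 4]
degrees_of_freedom = len(observed) - 1  # n - 1, where n = number of classes

plain = chi_square_statistic(observed, expected)
corrected = chi_square_statistic(observed, expected, yates=True)
print(degrees_of_freedom, plain, corrected)
```

With these made-up counts the uncorrected statistic is 4.5 and the corrected one is 3.125, which shows how the correction lowers the statistic for small counts.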

Probability: table of χ² values and P values
[Table not reproduced. It marks P = 0.05 as the boundary between the "Do not reject H0" and "Reject H0" regions.]
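Reading such a table amounts to comparing the computed χ² with the critical value for the given degrees of freedom. The sketch below hard-codes the standard critical values at P = 0.05 for 1 to 5 degrees of freedom; the function name and the two example statistics are illustrative assumptions.

```python
# Standard chi-square critical values at the 0.05 significance level,
# keyed by degrees of freedom.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def decide(chi_square, degrees_of_freedom):
    """Compare a chi-square statistic with the 0.05 critical value."""
    critical = CRITICAL_VALUES_05[degrees_of_freedom]
    # A statistic above the critical value corresponds to P < 0.05.
    return "reject H0" if chi_square > critical else "do not reject H0"

print(decide(1.78, 1))  # below 3.841
print(decide(8.00, 3))  # above 7.815
```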

Chi-square test
Data used in a chi-square analysis have to satisfy the following conditions:
1. randomly drawn from the population,
2. reported in raw counts of frequency,
3. measured variables must be independent,
4. observed frequencies cannot be too small, and
5. values of independent and dependent variables must be mutually exclusive.
There are two types of chi-square test that we will use:
* The chi-square test for goodness of fit, which compares the expected and observed values to determine how well an experimenter's predictions fit the data. It is a non-parametric test used to find out whether the observed values of a given phenomenon differ significantly from the expected values. The term "goodness of fit" refers to comparing the observed sample distribution with the expected probability distribution: the test determines how well a theoretical distribution (such as normal, binomial, or Poisson) fits the empirical distribution.
* The chi-square test of contingency, which is used to test the hypothesis that the frequencies of occurrence in the various categories of one variable are independent of the frequencies in the second variable.
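A test of contingency can be sketched on the handedness counts from the contingency table slide (males 43 right-handed and 9 left-handed, females 44 and 4). Two things here are standard practice rather than statements from the slides: the expected count of a cell is (row total × column total) / grand total, and the degrees of freedom are (rows - 1) × (columns - 1). The P value uses a closed form that holds only for 1 degree of freedom.

```python
from math import erfc, sqrt

# Observed counts: rows = males, females; columns = right-handed, left-handed.
observed = [[43, 9],
            [44, 4]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count of each cell if sex and handedness were independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# For exactly 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_square / 2))
print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}, P = {p_value:.3f}")
```

All expected counts are above 5 here (the smallest is 6.24), so the uncorrected formula is used. The statistic is about 1.78, below the 0.05 critical value of 3.841 for 1 degree of freedom, so independence of sex and handedness is not rejected for these data.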

Example of chi-square goodness of fit
Calculation of the chi-square goodness of fit of data consisting of the colors of 100 flowers to a hypothesized color ratio of 3:1 (Yy x Yy).
χ² = Σ (O - E)² / E

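Example: dihybrid cross (YyRr x YyRr)
Expected counts for a total of 250 under a 9:3:3:1 ratio:
(250/16) × 9 = 140.625
(250/16) × 3 = 46.875
(250/16) × 3 = 46.875
(250/16) × 1 = 15.625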
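The expected counts in both goodness-of-fit examples come from splitting the total in proportion to the hypothesized ratio. A small sketch (the function name is illustrative); the observed counts are not given in the transcript, so the χ² value itself is not computed here.

```python
def expected_counts(total, ratio):
    """Split a total count in proportion to a hypothesized ratio such as 3:1."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

# Monohybrid cross (Yy x Yy): 100 flowers, hypothesized ratio 3:1.
print(expected_counts(100, [3, 1]))

# Dihybrid cross (YyRr x YyRr): total of 250, hypothesized ratio 9:3:3:1.
print(expected_counts(250, [9, 3, 3, 1]))
```

Each expected count then takes the place of E in χ² = Σ (O - E)² / E, with degrees of freedom equal to the number of classes minus 1 (1 for the 3:1 ratio, 3 for the 9:3:3:1 ratio).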