In the name of God, the Most Gracious, the Most Merciful


In the name of God, the Most Gracious, the Most Merciful www.biostat.ir

Biostatistics Academic Preview (2006): Session 2, 08/28/06
Descriptive Statistics

What Is Statistics?
Statistics is the science of describing or making inferences about the world from a sample of data.
Descriptive statistics are numerical estimates that organize, summarize, or present the data.
Inferential statistics is the process of inferring from a sample to the population.

Statistics has two major branches: descriptive statistics and inferential statistics.

Two types of Statistics
Descriptive statistics: used to summarize, organize and simplify data.
  What was the average height?
  What was the highest and lowest score?
  What is the most common response to a question?
Inferential statistics: techniques that allow us to study samples and then make generalizations about the populations from which they were selected.
  Are 5th grade boys taller than 5th grade girls?
  Is a treatment suitable?
Notes: We will cover descriptive methods at the beginning of the semester; in the middle and at the end of the semester we will work with inferential statistics. Honestly, as a researcher, I use both kinds daily.

Population and Samples
The population under study is the set of all individuals of interest for the research.
The part of the population for which we collect measurements is called the sample.
The number of individuals in a sample is denoted by n.

Variables

Definitions
Variable: a characteristic that changes or varies over time and/or across the different subjects under consideration.
  Changing over time: blood pressure, height, weight
  Changing across a population: gender, race

Types of variables

Types of variables: Definitions
Quantitative variables (numeric): measure a numerical quantity or amount on each experimental unit.
Qualitative variables (categorical): measure a non-numeric quality or characteristic on each experimental unit by classifying each subject into a category.

Types of variables: Quantitative variables
Discrete variables: can only take values from a list of possible values. Example: number of brushings per day.
Continuous variables: can assume the infinitely many values corresponding to the points on a line interval. Examples: weight, height.

Types of variables: Categorical variables
Nominal: unordered categories. Examples: race, gender.
Ordinal: ordered categories. Examples: Likert scales (disagree, neutral, agree), income categories.

Types of Variables
A discrete variable has gaps between its values. For example, the number of brushings per day is a discrete variable.
A continuous variable has no gaps between its values. All values or fractions of values have meaning. Age is an example of a continuous variable.

Levels of Measurement
The level of measurement reflects the type of information measured and helps determine which descriptive statistics and which statistical tests can be used.

Four Levels of Measurement
Nominal: lowest level; categories, no rank
Ordinal: second lowest; ranked categories
Interval: next to highest; ranked categories with known units between rankings
Ratio: highest level; ranked categories with known intervals and an absolute zero

Scales of Measurement
Examples to classify: Temperature; Men/Women; Good/Better/Best; Weight; Republicans/Democrats/Independents; Volume; IQ; Not at all/A little/A lot
Scales: Interval, Nominal, Ordinal, Ratio
Notes: IQ: can’t have an absence of IQ. Also, we can’t say that a person with an IQ of

Descriptive Measures
Central tendency measures: computed to give a “center” around which the measurements in the data are distributed.
Relative standing measures: describe the relative position of a specific measurement in the data.
Variation or variability measures: describe “data spread”, or how far the measurements are from the center.

Measures of Central Tendency
Mean: the sum of all measurements in the data divided by the number of measurements.
Median: a number such that at most half of the measurements are below it and at most half of the measurements are above it.
Mode: the most frequent measurement in the data.

Summary Statistics: Measures of central tendency (location)
Mean: the mean of a data set is the sum of the observations divided by the number of observations.
  Population mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
  Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Median: the median of a data set is the “middle value”.
  For an odd number of observations, the median is the observation exactly in the middle of the ordered list.
  For an even number of observations, the median is the mean of the two middle observations in the ordered list.
Mode: the mode is the single most frequently occurring data value.
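As a minimal sketch of the three measures above, the following uses Python's standard `statistics` module. The data values are made up for illustration and do not come from the presentation.

```python
import statistics

# Hypothetical sample (illustrative values, not from the slides)
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the observations / number of observations
median = statistics.median(data)  # even n: mean of the two middle ordered values
mode = statistics.mode(data)      # most frequently occurring value

print("mean:", mean, "median:", median, "mode:", mode)
```

With an even number of observations, as here, the median falls between the two middle values of the ordered list.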

Skewness
The skewness of a distribution can be judged by comparing the relative positions of the mean, median and mode.
Symmetrical distribution: mean = median = mode.
Right-skewed distribution: the median lies between the mode and the mean, and the mode is less than the mean.
Left-skewed distribution: the median lies between the mode and the mean, and the mode is greater than the mean.

Figure: relative positions of the mean and median for (a) right-skewed, (b) symmetric, and (c) left-skewed distributions.
Note: The mean is sensitive to extreme values, so it is most representative when the distribution is roughly symmetric (for example, normal). If this is not the case it is better to report the median as the measure of location.

Frequency Distributions and Histograms
Figure: histograms for symmetric and skewed distributions.

Figure: normal curves with the same mean but different standard deviations.

Further Notes
When the mean is greater than the median, the data distribution is skewed to the right.
When the median is greater than the mean, the data distribution is skewed to the left.
When the mean and median are very close to each other, the data distribution is approximately symmetric.
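The rule of thumb above can be sketched as a small function. The name `skew_direction`, the `tolerance` parameter, and the sample values are hypothetical, introduced only for this illustration; this is an informal check, not a formal skewness statistic.

```python
import statistics

def skew_direction(values, tolerance=0.0):
    """Informal rule of thumb: compare the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "skewed right"
    if median - mean > tolerance:
        return "skewed left"
    return "approximately symmetric"

# Hypothetical samples (illustrative values only)
right_tailed = [1, 2, 2, 3, 3, 4, 20]      # one large value pulls the mean up
left_tailed = [1, 17, 18, 18, 19, 19, 20]  # one small value pulls the mean down
balanced = [1, 2, 3, 4, 5]

for sample in (right_tailed, left_tailed, balanced):
    print(sample, "->", skew_direction(sample))
```

In practice a nonzero `tolerance` would be chosen relative to the spread of the data, since the mean and median are rarely exactly equal.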

Summary statistics: Measures of spread (scale)
Variance: the average of the squared deviations of each sample value from the sample mean, except that the sum of the squared deviations is divided by n-1 instead of the sample size n.
Standard deviation: the square root of the sample variance.
Range: the difference between the maximum and minimum values in the sample.
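A short sketch of these three measures, computed directly from the definitions and cross-checked against Python's standard `statistics` module. The sample values are hypothetical.

```python
import math
import statistics

# Hypothetical sample (illustrative values only)
data = [4, 8, 6, 5, 7]

n = len(data)
mean = sum(data) / n

# Sample variance: squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
std_dev = math.sqrt(variance)        # standard deviation
data_range = max(data) - min(data)   # range

print("variance:", variance)
print("standard deviation:", std_dev)
print("range:", data_range)

# statistics.variance and statistics.stdev also use the n - 1 divisor
print("library variance:", statistics.variance(data))
```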

Summary statistics: measures of spread (scale)
We can describe the spread of a distribution by using percentiles.
The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
Median = 50th percentile.
Quartiles divide data into four equal parts.
  First quartile, Q1: 25% of observations are below Q1 and 75% above Q1.
  Second quartile, Q2: 50% of observations are below Q2 and 50% above Q2.
  Third quartile, Q3: 75% of observations are below Q3 and 25% above Q3.
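One simple way to turn the percentile definition into code is the nearest-rank method, sketched below. The function name `percentile_nearest_rank` and the data are hypothetical; other conventions interpolate between neighbouring values and can give slightly different answers.

```python
import math

def percentile_nearest_rank(values, p):
    """Smallest ordered value with at least p percent of observations at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical sample (illustrative values only)
data = [15, 20, 35, 40, 50]

print("25th percentile:", percentile_nearest_rank(data, 25))
print("50th percentile (median):", percentile_nearest_rank(data, 50))
print("100th percentile:", percentile_nearest_rank(data, 100))
```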

Quartiles
Figure: Q1, Q2 and Q3 divide the ordered data into four parts, each containing 25% of the observations.

Five-number summary
Minimum
Lower quartile Q1 = 25th percentile
Median = 50th percentile
Upper quartile Q3 = 75th percentile
Maximum
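A minimal sketch of the five-number summary using `statistics.quantiles` from the Python standard library (Python 3.8 or later). The data are hypothetical, and the choice of `method="inclusive"` is an assumption: quartile conventions differ between textbooks and software, so other tools may report slightly different Q1 and Q3 for the same data.

```python
import statistics

# Hypothetical sample (illustrative values only)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 requests the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# minimum, Q1, median, Q3, maximum
five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```

These five values are the ones a box plot draws.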

Graphical display of numerical variables (histogram)
Class Interval    Frequency
20-under 30       6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70       3
70-under 80       1
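A frequency table like the one above can be built by assigning each observation to its class interval and counting. The sketch below assumes equal-width intervals of 10; the helper name `class_interval` and the raw values are hypothetical, since the slide gives only the finished table.

```python
from collections import Counter

def class_interval(value, width=10):
    """Label of the interval [lower, lower + width) that contains value."""
    lower = (value // width) * width
    return f"{lower}-under {lower + width}"

# Hypothetical raw observations (the slide shows only the counts)
ages = [23, 29, 31, 35, 38, 42, 47, 51, 64, 70]

frequencies = Counter(class_interval(a) for a in ages)

# Text histogram: one '#' per observation in each class interval
for label in sorted(frequencies):
    print(f"{label:<12} {frequencies[label]:>2} {'#' * frequencies[label]}")
```

Changing `width` changes the number of bins, which can noticeably change the apparent shape of the histogram.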

Frequency Distributions and Histograms A histogram of the compressive strength data with 17 bins.

Frequency Distributions and Histograms A histogram of the compressive strength data with nine bins.

Frequency Distributions and Histograms Histogram of compressive strength data.

Graphical display of numerical variables (box plot)
Figure: box plot marking the minimum, Q1, Q2 (median), Q3 and the maximum.

Graphical display of numerical variables (box plot)
Figure: box plots for three shapes: negatively skewed (S < 0), symmetric, not skewed (S = 0), and positively skewed (S > 0).

Univariate statistics (categorical variables)
Summary measures:
  Count = frequency
  Percent = frequency / total sample
The distribution of a categorical variable lists the categories and gives either a count or a percent of individuals who fall in each category.
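As a small illustration of counts and percents for a categorical variable, the sketch below tabulates a made-up list of responses. The categories and values are hypothetical and are not taken from the presentation.

```python
from collections import Counter

# Hypothetical categorical observations (e.g. blood type of ten subjects)
responses = ["A", "O", "B", "O", "A", "O", "AB", "O", "A", "O"]

counts = Counter(responses)  # count = frequency of each category
total = len(responses)

# percent = frequency / total sample, expressed out of 100
distribution = {
    category: (count, 100 * count / total)
    for category, count in counts.items()
}

for category, (count, percent) in sorted(distribution.items()):
    print(f"{category:<3} count={count} percent={percent:.0f}%")
```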

Displaying categorical variables
Rank     Cause of Death     Frequency (%)
1        Heart Disease      710,760 (43%)
2        Cancer             553,091 (33%)
3        Stroke             167,661 (11%)
4        CLRD               122,009 ( 7%)
5        Accidents           97,900 ( 6%)
Total    All five causes    1,651,421

Response and explanatory variables
Response variable: the variable which we intend to model, that is, the variable we intend to explain through statistical modeling.
Explanatory variable: the variable or variables which may be used to model the response variable; their values may be related to the response variable.

Bivariate relationships
An extension of univariate descriptive statistics.
Used to detect evidence of association in the sample.
Two variables are said to be associated if the distribution of one variable differs across groups or values defined by the other variable.

Bivariate Relationships
Two quantitative variables: scatter plot; side-by-side stem and leaf plots
Two qualitative variables: tables; bar charts
One quantitative and one qualitative variable: side-by-side box plots; bar chart

Two quantitative variables: Correlation
Correlation: a relationship between two variables.
Explanatory (independent) variable x    Response (dependent) variable y
Hours of Training                       Number of Accidents
Shoe Size                               Height
Cigarettes smoked per day               Lung Capacity
Height                                  IQ
What type of relationship exists between the two variables, and is the correlation significant?
Notes: In this chapter we will be concerned with linear correlation (how well the points fit a straight line). In more advanced courses you may study other types of correlation.

Scatter Plots and Types of Correlation
x = hours of training, y = number of accidents
Start with a scatter plot. It can give a picture of the relationship between the two variables.
Negative correlation: as x increases, y decreases.

Scatter Plots and Types of Correlation
x = SAT score, y = GPA
Positive correlation: as x increases, y increases.

Scatter Plots and Types of Correlation
x = height, y = IQ
There is no particular pattern here: no linear correlation.

Correlation Coefficient
The correlation coefficient r is a measure of the strength and direction of a linear relationship between two variables.
The range of r is from -1 to 1.
If r is close to -1 there is a strong negative correlation.
If r is close to 1 there is a strong positive correlation.
If r is close to 0 there is no linear correlation.
Notes: Give several examples, such as r = -0.97 and r = 0.02, and ask for the strength of the correlation. For values like 0.63 a hypothesis test is necessary to determine whether the correlation is significant.
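The slides do not give a formula for r. As a sketch, assuming r is the usual Pearson product-moment coefficient, it can be computed as below. The function name `pearson_r` and the training/accident figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: more hours of training, fewer accidents
hours_of_training = [1, 2, 3, 4, 5]
number_of_accidents = [9, 8, 6, 5, 2]

r = pearson_r(hours_of_training, number_of_accidents)
print(f"r = {r:.3f}")  # close to -1: strong negative linear correlation
```

The value only describes linear association; a strong curved relationship can still give r near 0.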

Positive and negative correlation
1. If two variables x and y are positively correlated, this means that large values of x are associated with large values of y, and small values of x are associated with small values of y.
2. If two variables x and y are negatively correlated, this means that large values of x are associated with small values of y, and small values of x are associated with large values of y.

Figure: positive correlation.

Figure: negative correlation.

Two qualitative variables (Contingency Tables)
Categorical data is usually displayed using a contingency table, which shows the frequency of each combination of categories observed in the data.
The rows correspond to the categories of the explanatory variable.
The columns correspond to the categories of the response variable.

Example: Aspirin and Heart Attacks
Explanatory variable = drug received: placebo, aspirin
Response variable = heart attack status: yes, no

Contingency table: heart attack example
           Heart Attack    No Heart Attack    Total
Aspirin    104             10,933             11,037
Placebo    189             10,845             11,034
Total      293             21,778             22,071
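One way to read this table is to compare the proportion of heart attacks within each treatment group. The sketch below uses the counts from the table; the ratio of the two proportions (often called a relative risk) is an addition for illustration and does not appear in the slides.

```python
# Counts taken from the contingency table above
table = {
    "Aspirin": {"heart_attack": 104, "no_heart_attack": 10933},
    "Placebo": {"heart_attack": 189, "no_heart_attack": 10845},
}

# Row totals: number of subjects in each treatment group
row_totals = {group: sum(cells.values()) for group, cells in table.items()}

# Proportion of subjects in each group who had a heart attack
risk = {group: table[group]["heart_attack"] / row_totals[group] for group in table}

# Ratio of the two proportions (illustrative addition, not from the slides)
relative_risk = risk["Aspirin"] / risk["Placebo"]

for group in table:
    print(f"{group}: {100 * risk[group]:.2f}% had a heart attack")
print(f"ratio of proportions: {relative_risk:.2f}")
```

Whether the difference between the groups is statistically significant is a question for inferential methods, which this descriptive summary does not address.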

Two qualitative variables
Marijuana Use in College: x = parental use (columns), y = student use (rows)
              Both    Neither    One    Total
Never         17      141        68     226
Occasional    11      54         44     109
Regular       19      40         51     110
Total         47      235        163    445
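Association in a table like this is easier to see after converting each column to percentages, so that student use is compared within each level of parental use. A minimal sketch using the counts above; the variable names are hypothetical.

```python
# Counts from the table above: one list per parental-use column,
# ordered by student use (Never, Occasional, Regular)
student_levels = ["Never", "Occasional", "Regular"]
counts = {
    "Both": [17, 11, 19],
    "Neither": [141, 54, 40],
    "One": [68, 44, 51],
}

# Column totals and the distribution of student use within each column
column_totals = {parental: sum(col) for parental, col in counts.items()}
column_percents = {
    parental: [100 * c / column_totals[parental] for c in col]
    for parental, col in counts.items()
}

for parental, percents in column_percents.items():
    summary = ", ".join(
        f"{level} {pct:.1f}%" for level, pct in zip(student_levels, percents)
    )
    print(f"Parental use = {parental}: {summary}")
```

If the column distributions differ noticeably, the two variables are associated in the sample, in the sense defined earlier in the presentation.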

One quantitative, one qualitative
Figures: box plot of age by low birth weight; mean age by low birth weight.

Trivariate Relationships
An extension of bivariate descriptive statistics.
We focus on description that helps us decide about the role variables might play in the ultimate statistical analyses.
Identify variables that can increase the precision of the data analysis used to assess associations between two other variables.

Confounding and effect modification
A factor, Z, is said to confound a relationship between a risk factor, X, and an outcome, Y, if it is not an effect modifier and the unadjusted strength of the relationship between X and Y differs from the common strength of the relationship between X and Y for each level of Z.
A factor, Z, is said to be an effect modifier of a relationship between a risk factor, X, and an outcome measure, Y, if the strength of the relationship between the risk factor, X, and the outcome, Y, varies among the levels of Z.

Example: confounding
In our low birth weight data, suppose we wish to investigate the association between race and low birth weight. Our ability to detect this association might be affected by:
  smoking status being associated with low birth weight
  smoking status being associated with race

Multiple Models
Allow one to calculate the association between an explanatory variable and a response of interest, after controlling for potential confounders.
Allow one to assess the association between a response and multiple explanatory variables of interest.

Time Sequence Plots
A time series or time sequence is a data set in which the observations are recorded in the order in which they occur.
A time series plot is a graph in which the vertical axis denotes the observed value of the variable (say x) and the horizontal axis denotes the time (which could be minutes, days, years, etc.).
When measurements are plotted as a time series, we often see trends, cycles, or other broad features of the data.

Time Sequence Plots
Figure: company sales by year (a) and by quarter (b).

Tests comparing difference between 2 or more groups
(values listed as: dependent variable; independent variable)
Paired (dependent t-test): Interval/ratio, pre and post tests; Nominal
Unpaired (independent t-test): Interval/ratio; Nominal (2 groups)
ANOVA F-test: Nominal (>2 groups)
Chi-Square (nonparametric): Nominal (dichotomous)

Tests demonstrating association between two groups
(columns: dependent variable; independent variable)
Spearman rho: Ordinal
Mann-Whitney U (nonparametric): Nominal
Pearson’s r: Interval/ratio

Tests demonstrating association between two groups, controlling for third variable
(columns: dependent variable; independent variable)
Logistic regression: Nominal
Linear regression: Interval/ratio
Pearson partial r
Kendall’s partial r: Ordinal