Univariate Data Exploration

Outline: R packages and functions. Moments: mean, SD, skew, kurtosis. Other descriptors: N, min, max. Simple graphs: histogram, kernel (density), stem-and-leaf, boxplot.
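Base R provides `mean()`, `sd()`, `min()`, `max()` and `length()`, but no built-in skewness or kurtosis. The sketch below computes all of the descriptors listed above for one numeric vector. It assumes the simple moment formulas are adequate: skew is g1 = m3 / m2^1.5, and kurtosis is reported as excess kurtosis g2 = m4 / m2^2 - 3. Packages that apply small-sample corrections may therefore report slightly different values. The function name `describe_moments` is hypothetical.

```r
# Hypothetical helper: moment-based summary of a numeric vector (base R only).
# Skew and kurtosis use population-moment formulas (no small-sample correction);
# kurtosis is excess kurtosis, so a normal distribution gives about 0.
describe_moments <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  n <- length(x)
  dev <- x - mean(x)
  m2 <- mean(dev^2)          # second central moment (divides by n)
  m3 <- mean(dev^3)          # third central moment
  m4 <- mean(dev^4)          # fourth central moment
  c(n        = n,
    mean     = mean(x),
    sd       = sd(x),        # sample SD (divides by n - 1)
    skew     = m3 / m2^1.5,
    kurtosis = m4 / m2^2 - 3,
    min      = min(x),
    max      = max(x))
}

describe_moments(c(1, 2, 3, 4, 5))
```

With the car package loaded, `describe_moments(Blackmore$exercise)` would summarize the exercise variable discussed in the slides below.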

Descriptive Stats in R. You can download the R file from Canvas (Modules).

Typing an object's name (e.g., Sample 1) at the console prints the object. ‘describe’ computes the descriptive statistics.
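A minimal sketch of these two steps, assuming the slide's ‘describe’ is `describe()` from the psych package (the slides do not name the package). The data vector `sample1` is made up for illustration. The guard falls back to base R's `summary()` when psych is not installed, so the block still runs.

```r
# Made-up data standing in for "Sample 1"
sample1 <- c(12, 15, 9, 20, 14, 11, 18, 16)

sample1   # typing the name at top level prints the object

# Assumption: the slide's 'describe' is psych::describe
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(sample1))
} else {
  print(summary(sample1))   # base-R fallback: min, quartiles, mean, max
}
```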

‘hist’ computes a histogram, which appears in the plot window. Click ‘Zoom’ to see it better; click ‘Export’ to save it to a file.
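Zoom and Export are buttons in the RStudio plot pane. As a scripted alternative, the sketch below (using made-up, right-skewed data) shows that `hist()` returns its bin breaks and counts when called with `plot = FALSE`, and saves a histogram to a PDF file with base R's `pdf()` device. The file name and the `breaks = 20` setting are arbitrary choices for illustration.

```r
set.seed(1)
x <- rexp(200, rate = 0.5)   # hypothetical right-skewed data

# plot = FALSE returns the binning without drawing; breaks = 20 is only a
# suggestion, R picks "pretty" break points near that number of bins
h <- hist(x, breaks = 20, plot = FALSE)
print(h$breaks)
print(h$counts)

# Save a histogram to a file instead of using the Export button
out_file <- file.path(tempdir(), "histogram.pdf")
pdf(out_file, width = 8, height = 6)
hist(x, breaks = 20, main = "Histogram of x", xlab = "x")
invisible(dev.off())   # close the device so the file is written
```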

Blackmore Data. The Blackmore dataset from package 'car' (the data now ship in its companion package carData): exercise histories of 138 girls hospitalized for eating disorders and 98 control subjects. The data frame has 945 rows and 4 columns. Note that there are multiple rows for each participant (but ignore this for now).

Blackmore descriptives. N is misleading because there are multiple rows per person. The SD for exercise is larger than the mean, and the minimum value for exercise is zero. What do you suppose this means (note also the skew for exercise)? Group is a label for sick or control.
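Two quick checks help interpret these descriptives: how many distinct participants lie behind the rows, and what share of the exercise values are exactly zero. The helper `check_long_data` and the toy data frame are hypothetical; the column names `subject` and `exercise` are assumed to match Blackmore (confirm with `names(Blackmore)`).

```r
# Hypothetical helper: compare row count with distinct participants in
# long-format data, and report the share of zeros in one column
check_long_data <- function(df, id, var) {
  c(rows         = nrow(df),
    participants = length(unique(df[[id]])),
    prop_zero    = mean(df[[var]] == 0, na.rm = TRUE))
}

# Toy long-format data: 3 people with 2 or 3 rows each (made up)
toy <- data.frame(
  subject  = c("a", "a", "b", "b", "b", "c", "c"),
  exercise = c(0, 1.5, 0, 0, 2.0, 4.0, 6.5)
)

check_long_data(toy, id = "subject", var = "exercise")
```

With car loaded, `check_long_data(Blackmore, id = "subject", var = "exercise")` would show how far the row count overstates the number of participants and how many records report no exercise.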

hist(Blackmore$age). Note how the $ refers to an element (here, a column) of an object. What does this tell us about the sample?
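A small illustration of `$` on a made-up data frame standing in for Blackmore. It also shows `[[`, the equivalent form to use when the column name is stored in a variable, since `$` takes the name literally.

```r
# Made-up data frame standing in for Blackmore
people <- data.frame(age = c(8, 10, 12), exercise = c(0, 2.5, 4))

people$age          # `$` extracts the column named "age" as a vector
people[["age"]]     # same column via `[[`

col <- "exercise"
people[[col]]       # uses the value of col, i.e. the "exercise" column
people$col          # NULL: `$` looks for a column literally named "col"
```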

Blackmore Exercise: hist(Blackmore$exercise)

Blackmore Exercise:
exe.dens <- density(Blackmore$exercise)
plot(exe.dens)
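By default `density()` uses a Gaussian kernel with the `bw.nrd0` bandwidth and evaluates the estimate on a grid that extends beyond the observed range. For data bounded at zero with many zeros, as exercise appears to be, part of the estimated density therefore lands on impossible negative values. The sketch below uses made-up data with that shape to check that the estimate integrates to about 1, measure the mass below zero, and overlay the curve on a density-scaled histogram.

```r
set.seed(42)
# Hypothetical non-negative, right-skewed data with many zeros
x <- c(rep(0, 30), rexp(170, rate = 0.4))

d <- density(x)            # Gaussian kernel, default bandwidth bw.nrd0
print(d$bw)                # bandwidth actually used

# Trapezoid-rule areas between consecutive grid points
seg_area <- diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2
area <- sum(seg_area)                       # total area, close to 1
mass_below_zero <- sum(seg_area[head(d$x, -1) < 0])  # mass on x < 0

# Overlay: freq = FALSE puts the histogram on the density scale
hist(x, freq = FALSE, main = "Histogram with kernel density")
lines(d)
```

Passing `from = 0` to `density()` only restricts the evaluation grid; it does not move the mass back above zero, so the curve near zero should still be read with care.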

Blackmore Exercise

Blackmore Exercise: boxplot(Blackmore$exercise, main='Exercise')
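`boxplot()` draws the numbers returned by `boxplot.stats()`: the whisker ends, hinges and median, plus any points beyond 1.5 times the hinge spread, which are plotted individually. The sketch uses made-up data with one extreme value so the flagged point is easy to see.

```r
# Toy data with one clearly extreme value (made up for illustration)
x <- c(1:10, 50)

bs <- boxplot.stats(x)   # default coef = 1.5
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
           # here c(1, 3.5, 6, 8.5, 10)
bs$out     # points beyond the whiskers: here 50

boxplot(x, main = "Toy data")
```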

Why this can be a problem: what a mess! Always plot your data!

Distribution Shapes. The shape of the population can be hard to infer from a sample, especially when the sample size is small. The next two graphs show example shapes, both sampled from N(50, 2): the first with n = 100, the second with n = 25.
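A minimal sketch for drawing comparable samples yourself. It assumes N(50, 2) means mean 50 and standard deviation 2 (the slide does not say whether 2 is the SD or the variance) and is not the animation code credited on the next slide.

```r
set.seed(2024)
# Assumption: N(50, 2) = mean 50, standard deviation 2
big   <- rnorm(100, mean = 50, sd = 2)
small <- rnorm(25,  mean = 50, sd = 2)

op <- par(mfrow = c(1, 2))   # two panels side by side
hist(big,   main = "n = 100", xlab = "x")
hist(small, main = "n = 25",  xlab = "x")
par(op)                      # restore previous graphics settings
```

Re-running the block without `set.seed()` shows how much more the n = 25 histogram changes from draw to draw than the n = 100 one.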

Shapes of Samples from Normal (n=100) Adapted from code found here: http://www.programmingr.com/content/animations-r/

Shapes of Samples from Normal (n=25)

Exercise: Create a ‘drive for thinness’ score and describe its distribution, using the DavisThin dataset in car (Companion to Applied Regression; the car manual is in Canvas). Add the items to create a scale. Run descriptive statistics, a histogram, a stem-and-leaf plot, and a boxplot.
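A sketch of the workflow on made-up item responses. It assumes the DavisThin items are columns named DT1 to DT7 scored 0 to 3; check with `names(DavisThin)` before applying it to the real data. The helper `make_scale` is hypothetical, and `stem()` is base R's stem-and-leaf plot, printed to the console.

```r
# Hypothetical helper: sum item columns into a scale score
make_scale <- function(df, items) {
  rowSums(df[, items, drop = FALSE])   # an NA in any item gives NA for that row
}

# Made-up responses standing in for DavisThin (assumed items DT1..DT7, 0-3)
set.seed(7)
items <- paste0("DT", 1:7)
fake <- as.data.frame(matrix(sample(0:3, 20 * 7, replace = TRUE), ncol = 7,
                             dimnames = list(NULL, items)))

score <- make_scale(fake, items)
summary(score)              # descriptive statistics
stem(score)                 # stem-and-leaf plot
hist(score, main = "Drive for thinness (toy data)")
boxplot(score, main = "Drive for thinness (toy data)")
```

If every column of DavisThin is an item, `make_scale(DavisThin, names(DavisThin))` would build the score from the real data.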