Lecture 2 Summarizing the Sample. WARNING: Today’s lecture may bore some of you… It’s (sort of) not my fault…I’m required to teach you about what we’re.

Slides:



Advertisements
Similar presentations
Histograms Bins are the bars Counts are the heights Relative Frequency Histograms have percents on vertical axis.
Advertisements

Lesson Describing Distributions with Numbers parts from Mr. Molesky’s Statmonkey website.
Copyright © 2014 Pearson Education, Inc. All rights reserved Chapter 2 Picturing Variation with Graphs.
Descriptive Statistics A.A. Elimam College of Business San Francisco State University.
Jan Shapes of distributions… “Statistics” for one quantitative variable… Mean and median Percentiles Standard deviations Transforming data… Rescale:
ISE 261 PROBABILISTIC SYSTEMS. Chapter One Descriptive Statistics.
Analysis of Research Data
B a c kn e x t h o m e Classification of Variables Discrete Numerical Variable A variable that produces a response that comes from a counting process.
FOUNDATIONS OF NURSING RESEARCH Sixth Edition CHAPTER Copyright ©2012 by Pearson Education, Inc. All rights reserved. Foundations of Nursing Research,
Histogram A frequency plot that shows the number of times a response or range of responses occurred in a data set.
Histogram A frequency plot that shows the number of times a response or range of responses occurred in a data set.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Today: Central Tendency & Dispersion
Describing Data: Numerical
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved. Chapter 12 Describing Data.
Programming in R Describing Univariate and Multivariate data.
Exploratory Data Analysis. Height and Weight 1.Data checking, identifying problems and characteristics Data exploration and Statistical analysis.
Objective To understand measures of central tendency and use them to analyze data.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
BIOSTATISTICS II. RECAP ROLE OF BIOSATTISTICS IN PUBLIC HEALTH SOURCES AND FUNCTIONS OF VITAL STATISTICS RATES/ RATIOS/PROPORTIONS TYPES OF DATA CATEGORICAL.
Numerical Descriptive Techniques
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Created by Tom Wegleitner, Centreville, Virginia Section 3-1 Review and.
ITEC6310 Research Methods in Information Technology Instructor: Prof. Z. Yang Course Website: c6310.htm Office:
Variable  An item of data  Examples: –gender –test scores –weight  Value varies from one observation to another.
STAT 211 – 019 Dan Piett West Virginia University Lecture 1.
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Describing Data.
Chapter 2 Describing Data.
Wednesday, May 13, 2015 Report at 11:30 to Prairieview.
Categorical vs. Quantitative…
INVESTIGATION 1.
CHAPTERS 1 AND 2: DESCRIPTIVE STATISTICS STAT 241/251.
Agenda Descriptive Statistics Measures of Spread - Variability.
June 11, 2008Stat Lecture 10 - Review1 Midterm review Chapters 1-5 Statistics Lecture 10.
To be given to you next time: Short Project, What do students drive? AP Problems.
Chapter 3 – Graphical Displays of Univariate Data Math 22 Introductory Statistics.
Copyright © 2011 Pearson Education, Inc. Describing Numerical Data Chapter 4.
UNIT #1 CHAPTERS BY JEREMY GREEN, ADAM PAQUETTEY, AND MATT STAUB.
Notes Unit 1 Chapters 2-5 Univariate Data. Statistics is the science of data. A set of data includes information about individuals. This information is.
LIS 570 Summarising and presenting data - Univariate analysis.
More Univariate Data Quantitative Graphs & Describing Distributions with Numbers.
Presenting Data Descriptive Statistics. Chapter- Presentation of Data Mona Kapoor.
Descriptive Statistics(Summary and Variability measures)
Chapter 0: Why Study Statistics? Chapter 1: An Introduction to Statistics and Statistical Inference 1
Graphs with SPSS Aravinda Guntupalli. Bar charts  Bar Charts are used for graphical representation of Nominal and Ordinal data  Height of the bar is.
AP Statistics Review Day 1 Chapters 1-4. AP Exam Exploring Data accounts for 20%-30% of the material covered on the AP Exam. “Exploratory analysis of.
1 By maintaining a good heart at every moment, every day is a good day. If we always have good thoughts, then any time, any thing or any location is auspicious.
Describing Data: Summary Measures. Identifying the Scale of Measurement Before you analyze the data, identify the measurement scale for each variable.
AP Statistics. Chapter 1 Think – Where are you going, and why? Show – Calculate and display. Tell – What have you learned? Without this step, you’re never.
Descriptive Statistics ( )
UNIT ONE REVIEW Exploring Data.
Thursday, May 12, 2016 Report at 11:30 to Prairieview
EHS 655 Lecture 4: Descriptive statistics, censored data
MATH-138 Elementary Statistics
Analysis and Empirical Results
EXPLORATORY DATA ANALYSIS and DESCRIPTIVE STATISTICS
ISE 261 PROBABILISTIC SYSTEMS
Descriptive measures Capture the main 4 basic Ch.Ch. of the sample distribution: Central tendency Variability (variance) Skewness kurtosis.
Chapter 1 & 3.
NUMERICAL DESCRIPTIVE MEASURES
Description of Data (Summary and Variability measures)
Distributions and Graphical Representations
Unit 1 - Graphs and Distributions
Basics of Statistics.
Displaying Distributions with Graphs
Basic Statistical Terms
Describing Distributions
Honors Statistics Review Chapters 4 - 5
Advanced Algebra Unit 1 Vocabulary
Lesson Plan Day 1 Lesson Plan Day 2 Lesson Plan Day 3
Introductory Statistics
Presentation transcript:

Lecture 2 Summarizing the Sample

WARNING: Today’s lecture may bore some of you… It’s (sort of) not my fault…I’m required to teach you about what we’re going to cover today.

I’ll try to make it as exciting as possible… But you’re more than welcome to fall asleep if you feel like this stuff is too easy

Lecture Summary Once we obtained our sample, we would like to summarize it. Depending on the type of the data (numerical or categorical) and the dimension (univariate, paired, etc.), there are different methods of summarizing the data. – Numerical data have two subtypes: discrete or continuous – Categorical data have two subtypes: nominal or ordinal Graphical summaries: – Histograms: Visual summary of the sample distribution – Quantile-Quantile Plot: Compare the sample to a known distribution – Scatterplot: Compare two pairs of points in X/Y axis.

Three Steps to Summarize Data 1.Classify sample into different type 2.Depending on the type, use appropriate numerical summaries 3.Depending on the type, use appropriate visual summaries

Data Classification

Data Types For each dimension… Numerical ContinuousDiscrete Categorical NominalOrdinal

Summaries for numerical data Center/location: measures the “center” of the data – Examples: sample mean and sample median Spread/Dispersion: measures the “spread” or “fatness” of the data – Examples: sample variance, interquartile range Order/Rank: measures the ordering/ranking of the data – Examples: order statistics and sample quantiles

SummaryType of Sample FormulaNotes Continuous Summarizes the “center” of the data Sensitive to outliers Continuous Summarizes the “spread” of the data Outliers may inflate this value Continuousi th largest value of the sample Summarizes the order/rank of the data Continuous Summarizes the “center” of the data Robust to outliers Continuous Summarizes the order/rank of the data Robust to outliers Sample Interquartile Range (Sample IQR) Continuous Summarizes the “spread” of the data Robust to outliers

Multivariate numerical data Each dimension in multivariate data is univariate and hence, we can use the numerical summaries from univariate data (e.g. sample mean, sample variance) However, to study two measurements and their relationship, there are numerical summaries to analyze it Sample Correlation and Sample Covariance

Sample Correlation and Covariance

How about categorical data?

Summaries for categorical data Frequency/Counts: how frequent is one category Generally use tables to count the frequency or proportions from the total Example: Stat 431 class composition a UndergradGraduateStaff Counts1712 Proportions

Are there visual summaries of the data? Histograms, boxplots, scatterplots, and QQ plots

Histograms For numerical data A method to show the “shape” of the data by tallying frequencies of the measurements in the sample Characteristics to look for: – Modality: Uniform, unimodal, bimodal, etc. – Skew: Symmetric (no skew), right/positive-skewed, left/negative-skewed distributions – Quantiles: Fat tails/skinny tails – Outliers

Boxplots For numerical data Another way to visualize the “shape” of the data. Can identify… – Symmetric, right/positive-skewed, and left/negative- skewed distributions – Fat tails/skinny tails – Outliers However, boxplots cannot identify modes (e.g. unimodal, bimodal, etc.)

Quantile-Quantile Plots (QQ Plots) For numerical data: visually compare collected data with a known distribution Most common one is the Normal QQ plots – We check to see whether the sample follows a normal distribution – This is a common assumption in statistical inference that your sample comes from a normal distribution Summary: If your scatterplot “hugs” the line, there is good reason to believe that your data follows the said distribution.

Making a Normal QQ plot

If your data is not normal… You can perform transformations to make it look normal For right/positively-skewed data: Log/square root For left/negatively-skewed data: exponential/square

Comparing the three visual techniques Histograms Advantages: – With properly-sized bins, histograms can summarize any shape of the data (modes, skew, quantiles, outliers) Disadvantages: – Difficult to compare side- by-side (takes up too much space in a plot) – Depending on the size of the bins, interpretation may be different Boxplots Advantages: – Don’t have to tweak with “graphical” parameters (i.e. bin size in histograms) – Summarize skew, quantiles, and outliers – Can compare several measurements side-by- side Disadvantages: – Cannot distinguish modes! Advantages: – Can identify whether the data came from a certain distribution – Don’t have to tweak with “graphical” parameters (i.e. bin size in histograms) – Summarize quantiles Disadvantages: – Difficult to compare side-by-side – Difficult to distinguish skews, modes, and outliers QQ Plots

Scatterplots

Lecture Summary Once we obtain a sample, we want to summarize it. There are numerical and visual summaries – Numerical summaries depend on the data type (numerical or categorical) – Graphical summaries discussed here are mostly designed for numerical data We can also look at multidimensional data and examine the relationship between two measurement – E.g. sample correlation and scatterplots

Extra Slides

Why does the QQ plot work?

Linear Interpolation in Sample Quantiles