Types of (random) variables


Categorical (nominal) variables: no natural order and no clear distance measure (e.g. favorite color: red, blue, sky blue, orange).
Ordinal variables: a natural order, but the measure of distance is undefined, imprecise, or arbitrary (e.g. rating the quality of your teacher on a scale of 1-10).
Interval variables: usually random variables with real-number values; a natural order and a clearly defined measure of distance.
Ratio variables: similar to interval variables, but with a lower value limit at 0.0 (a true zero point).
NOTE: We will not make a formal distinction between the last two. Rather, one should be aware of the physical meaning of the values: temperatures expressed in F or C are interval variables; expressed in Kelvin, they qualify as ratio variables. For our statistical analysis this makes no difference as long as the values are far away from 0 K.
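As a rough illustration, the four variable types could be represented in R along the following lines. All values and variable names here are made up for this sketch and do not come from the slides.

```r
# Hypothetical example data for each variable type.
color  <- factor(c("red", "blue", "sky blue", "orange"))          # nominal: labels only
rating <- factor(c(3, 7, 10, 7), levels = 1:10, ordered = TRUE)   # ordinal: order, no distance
temp_f <- c(14, 32, 50)                                           # interval: degrees Fahrenheit
temp_k <- (temp_f - 32) * 5 / 9 + 273.15                          # ratio: Kelvin, true zero

is.ordered(color)        # FALSE: nominal levels have no order
rating[1] < rating[2]    # TRUE: a rating of 3 ranks below a rating of 7
diff(temp_f)             # differences are meaningful on an interval scale
temp_k[3] / temp_k[1]    # ratios are only meaningful on the Kelvin scale
```

Comparisons such as `<` raise a warning and return NA for an unordered factor, which is one practical way R reflects the nominal/ordinal distinction.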

Description of random data samples (real-valued variables)
(Random data that can be sorted by size)
Sample size
Center or location of the sample
Measure of the range of the sample
Symmetry of the sample distribution
(Sometimes the range of possible outcomes of the random process is well known from physical constraints.)

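Measure for the center of the sample
Arithmetic mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
(the values x_i are summed up over all i = 1, ..., n and divided by the sample size n)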
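A minimal sketch of the arithmetic mean computed by hand (sum of the values divided by the sample size) and checked against R's built-in mean(). The sample is the one used in the summary() example later in these slides.

```r
x <- c(1, 2, 42, 3, 24, 52)   # sample reused from the summary() example
n <- length(x)                # sample size
x_bar <- sum(x) / n           # sum of all x_i divided by n

x_bar                         # about 20.67
all.equal(x_bar, mean(x))     # TRUE: matches the built-in function
```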

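Measure for the range of the sample
Standard deviation: s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}
"Bessel correction": replacing the denominator n by (n-1) gives an unbiased estimate of the variance s^2.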

Measure for the range of the sample
Variance: s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
Again: you often find the denominator (n-1) instead of n, which gives an unbiased estimate.
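The effect of the two denominators can be checked directly. The sketch below computes the variance with denominator n and with denominator (n-1) and compares the latter with R's var() and sd(). The sample is again the one from the summary() example, and the names var_n and var_nm1 are chosen only for this illustration.

```r
x <- c(1, 2, 42, 3, 24, 52)
n <- length(x)
dev2 <- (x - mean(x))^2          # squared deviations from the mean

var_n   <- sum(dev2) / n         # denominator n
var_nm1 <- sum(dev2) / (n - 1)   # denominator n-1 (Bessel correction)

all.equal(var_nm1, var(x))       # TRUE: var() uses n-1
all.equal(sqrt(var_nm1), sd(x))  # TRUE: sd() is the square root of var()
var_n / var_nm1                  # equals (n-1)/n, here 5/6
```

For large n the ratio (n-1)/n approaches 1, so the choice of denominator matters mainly for small samples.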

R Commands
mean(x)
var(x) (R uses the Bessel correction, i.e. the denominator n-1)
sd(x)
summary(x) is a more general function that gives a statistical summary of the data.
Example: summary(c(1,2,42,3,24,52)) returns
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    2.25   13.50   20.67   37.50   52.00

R Commands
median(x) is another measure of the data's center point. When you sort the data sample in ascending order, the median is the mid-point of the sorted sample.
Example:
x <- c(1,2,3,2,5,62)
median(x) returns 2.5
mean(x) returns 12.5
We see that the mean is not robust against outliers in the sample, but the median gives a robust result.

R Commands
Imagine we made a small typo in the last example, and the real data sample had seven elements:
x <- c(1,2,3,2,5,6,2)
median(x) returns 2
mean(x) returns 3
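The "sort and take the mid-point" rule can be written out explicitly for both the even and the odd sample size. The helper manual_median below is a hypothetical name introduced for this sketch; it is not part of R.

```r
# Hypothetical helper: median as the mid-point of the sorted sample.
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                  # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2   # even n: average of the two middle values
  }
}

x6 <- c(1, 2, 3, 2, 5, 62)      # six elements, with the outlier 62
x7 <- c(1, 2, 3, 2, 5, 6, 2)    # seven elements, typo corrected

manual_median(x6)   # 2.5, same as median(x6)
manual_median(x7)   # 2, same as median(x7)
mean(x6)            # 12.5: pulled up by the outlier
mean(x7)            # 3
```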

Quartiles and Quantiles
Similar to finding the median of the sample data, one can define the lower and upper quartiles of the data sample. That is, you sort the data x in ascending order: {x1, x2, x3, … , xn} are the ordered data, with xi denoting the i-th smallest value. If you have, for example, 100 data values, then the lower quartile is the value at (or interpolated between the closest ranks around) the position i such that 25% of the values are lower than the quartile value: i = 26, i.e. x26. The upper quartile would be at i = 76, i.e. x76.
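A small sketch of the 100-value case, using the integers 1 to 100 as stand-in data so that each value equals its rank. R's quantile() with its default settings interpolates between neighboring ranks, so its result lies between x25 and x26 rather than exactly at x26.

```r
x <- 1:100                      # stand-in data: value equals rank

sum(x < x[26])                  # 25 values (25%) lie below x[26]
sum(x < x[76])                  # 75 values (75%) lie below x[76]

# R's default quantile definition (type 7) interpolates between ranks:
q <- quantile(x, c(0.25, 0.75), names = FALSE)
q                               # 25.75 and 75.25
```

Other definitions are available through the type argument of quantile(); they differ mainly for small samples.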

Quartiles and Quantiles
For small samples, and depending on whether the sample size is even or odd, the estimation of the median and quartiles requires modification. In general, with a large sample size one can sort the data and use the sorted sample to estimate the probability of exceeding a certain value. Let n be your sample size (say n = 1000). Then the p-th quantile is the value (in the same units as your sample data {xi, i = 1, 2, …, n}) that exceeds the values of your sample with a probability of p.

Quantiles
[Figure: the n sample values x, sorted in ascending order, plotted against rank i. The p-th quantile value q_p is read off at rank k, where p = k/n and 0 <= p <= 1.]
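The rank-based reading of a quantile (p = k/n on the sorted sample) can be sketched as follows. The data are synthetic normal random numbers generated only for this illustration, and the seed is arbitrary.

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
n <- 1000
x <- sort(rnorm(n))          # hypothetical sample, sorted in ascending order

p <- 0.9
k <- ceiling(p * n)          # rank k with p = k/n
q_p <- x[k]                  # p-th quantile read off at rank k

mean(x < q_p)                # fraction of the sample below q_p: (k-1)/n
q7 <- quantile(x, p, names = FALSE)   # R's default, interpolated between ranks
q7
```

With n = 1000 the rank-based value and R's interpolated value lie between the same two neighboring order statistics, so the difference is small.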

R Commands
Visualization of data samples: hist(x)
[Figure: histogram of the Albany Airport January 2014 daily mean temperatures [F].]
The sample size of n=31 is small. We count the number of days with temperatures in a certain range (bins).
Instructions for R: run the script albany2.R; after running the script, execute:
hist(tday)
(The variable name is tday, not tavg.)
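Since the script albany2.R and its data are not reproduced here, the following sketch uses made-up temperatures in a hypothetical variable tday_demo to show what hist() counts. The values are random and are not the Albany observations.

```r
set.seed(42)                                        # arbitrary seed
tday_demo <- round(rnorm(31, mean = 25, sd = 10))   # hypothetical stand-in for tday

h <- hist(tday_demo, plot = FALSE)   # compute the bins without drawing
h$breaks                             # bin boundaries chosen by R
h$counts                             # number of days falling into each bin
sum(h$counts) == length(tday_demo)   # TRUE: every day is counted exactly once

# Draw the histogram itself:
hist(tday_demo, main = "Synthetic daily mean temperatures", xlab = "Temperature [F]")
```

The breaks argument of hist() can be used to suggest a different number of bins, which is worth trying when the sample is as small as 31 values.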

R Commands
Visualization of data samples: boxplot(x)
[Figure: boxplots of the Albany Airport January 2014 daily mean, min. and max. temperatures [F].]
R instructions (make sure you have run the script albany2.R), execute:
boxplot(list(tmin=jan$MIN,tavg=tday,tmax=jan$MAX))
Note: boxplot is best used if you want to compare two or more sample distributions visually. To achieve that, boxplot is given a list of data. Each data sample gets its own name in the list, and boxplot creates a separate boxplot diagram for each named data set in the list.

Boxplot:
[Figure: boxplot of tmin, labeled from top to bottom: max(tmin), upper quartile, median, lower quartile, min(tmin).]
Note: Different flavors of boxplots are in circulation. Oftentimes ‘outliers’ are plotted as extra dots, and the boxplot symbols are calculated without the outliers.
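The numbers behind a boxplot can be inspected with boxplot.stats(). The sketch below reuses the small sample with the outlier 62 from the median example. It relies on R's default rule, which treats points more than 1.5 times the box height beyond the box as outliers; other boxplot flavors may use a different rule.

```r
x <- c(1, 2, 3, 2, 5, 62)   # sample from the median example, with outlier 62

fivenum(x)                  # min, lower hinge, median, upper hinge, max: 1 2 2.5 5 62

b <- boxplot.stats(x)       # default coef = 1.5
b$stats                     # whisker ends and box without the outlier: 1 2 2.5 5 5
b$out                       # 62: the value that would be drawn as an extra dot
```

Here the upper whisker ends at 5 rather than 62, which shows how the boxplot symbols are calculated without the outlier.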