Published by Trevor Gilmore. Modified over 8 years ago.
1
Descriptive Statistics Unit 6
2
Variable Any characteristic (data) recorded for the subjects of a study ex. blood pressure, nesting orientation, phytoplankton count Can be classified as either: -categorical -quantitative: *discrete *continuous
3
Categorical Variable Data belongs to one of a set of categories Exs: 1.Gender (Male or Female) 2.Pets owned (dog, cat, great white…) 3.Type of food imported (beef, pork, shellfish …) 4.Engage in 30 minutes of exercise daily (Yes or No) Type of graph(s): bar, pie
4
Pie Charts Summarizes categorical variable Drawn as circle where each category is a slice The size of each slice is proportional to the percentage in that category
5
Bar Graphs Summarizes categorical variable Vertical bars for each category Height of each bar represents either counts or percentages Easier to compare categories with bar graph than with pie chart Called Pareto Charts when ordered from tallest to shortest
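As a rough illustration of the counts, percentages, and tallest-to-shortest ordering described on this slide, here is a minimal Python sketch. The `pets` responses are made-up data, not taken from the presentation, and the text bars only stand in for a real chart.

```python
from collections import Counter

# Hypothetical survey responses (not from the slides): pet owned by each subject.
pets = ["dog", "cat", "dog", "fish", "dog", "cat", "bird", "dog", "cat", "fish"]

counts = Counter(pets)          # number of observations in each category
total = sum(counts.values())

# most_common() lists categories from highest count to lowest,
# which is the bar order used in a Pareto chart.
pareto_order = counts.most_common()

for category, count in pareto_order:
    percent = 100 * count / total
    bar = "#" * count           # crude text bar: one mark per observation
    print(f"{category:<5} {bar:<5} {count:>2} ({percent:.0f}%)")
```

The same counts could feed a pie chart, since each slice is just the category's percentage of the total.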
6
Quantitative Variable Data is given numerical values for different magnitudes. Exs: 1.Age of test subjects 2.Number of siblings 3.Seasonal changes in pH of pond water Type of graph: scatter-plot, line, stem and leaf
7
Quantitative vs. Categorical For Quantitative variables, key features are the center (a representative value) and spread (variability). For Categorical variables, a key feature is the percentage of data in each of the categories
8
Discrete Quantitative Variable Quantitative variable is discrete if its possible values form a set of separate numbers: 0,1,2,3,…. Exs: 1.Number of calico cats sold 2.Number of nests with down linings 3.Number of students who fall asleep in Stats class
9
Continuous Quantitative Variable Quantitative variable is continuous if its possible values form an interval, as with measurements Examples: 1.Height/Weight 2.Age 3.Blood pressure
12
Most Common Way to Describe Data Central tendency Statistical variation
13
Central Tendency Used to represent entire data set Highlights distribution of data Measured by one of the following: mode, mean, or median
14
Mode Value that occurs most often Highest bar in the histogram Mode is most often used with categorical data Best if not used alone
15
Data set 1: 12, 12, 13, 14, 14, 15, 15, 15, 15, 37, 38
Data set 2: 2, 3, 3, 4, 5, 5 - bimodal
Data set 3: 65, 68, 69, 71, 72, 73, 75, 77 - mode?
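The three data sets on this slide can be checked with a short Python sketch. It assumes Python 3.8 or later, where the standard library's `statistics.multimode` returns every value tied for the highest count.

```python
from statistics import multimode

first = [12, 12, 13, 14, 14, 15, 15, 15, 15, 37, 38]
second = [2, 3, 3, 4, 5, 5]
third = [65, 68, 69, 71, 72, 73, 75, 77]

modes_first = multimode(first)    # one value occurs most often
modes_second = multimode(second)  # two values tie: the set is bimodal
modes_third = multimode(third)    # every value occurs once, so all of them tie

print(modes_first, modes_second, modes_third)
```

When every value ties, as in the third set, the result lists all of them, which is one way to see that the set has no useful mode.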
16
Mean The sum of the observations divided by the number of observations Measure of centermost point when there is a symmetrical distribution of values in a data set Mean = Σx / n, where Σ = sum and n = total number of values
17
Data: 8 g/cm³, 10 g/cm³, 7 g/cm³, 9 g/cm³ Mean = (8 g/cm³ + 10 g/cm³ + 7 g/cm³ + 9 g/cm³) / 4 = 34 g/cm³ / 4 = 8.5 g/cm³
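The same arithmetic, written as a small Python sketch. The variable names are illustrative only.

```python
# Density measurements from the slide, in g/cm^3.
densities = [8, 10, 7, 9]

total = sum(densities)   # sum of the observations
n = len(densities)       # number of observations
mean = total / n         # mean = sum / n

print(f"sum = {total}, n = {n}, mean = {mean} g/cm^3")
```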
18
Median Midpoint of the observations when ordered from least to greatest Used when there are extremes in data 1. Order observations 2. If the number of observations is: a)Odd, the median is the middle observation b)Even- the median is the average of the two middle observations
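A minimal sketch of the odd/even rule above, in Python. The function name `median_of` is made up for this example. The two test lists reuse data that appears elsewhere in the presentation.

```python
def median_of(values):
    """Return the median using the order-then-pick-the-middle rule."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # step 1: order the observations
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: the middle observation
        return ordered[middle]
    # even count: average of the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = [12, 12, 13, 14, 14, 15, 15, 15, 15, 37, 38]   # 11 values
even_example = [8, 10, 7, 9]                                 # 4 values

print(median_of(odd_example))
print(median_of(even_example))
```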
20
Central Tendency If data set has normal distribution: mean, median and mode are the same value If data set is not distributed normally: values of central tendency will vary. *requires inferential statistics: t-test, ANOVA
22
Comparing the Mean and Median Mean and median of a symmetric distribution are close In a skewed distribution: the mean is pulled farther toward the long tail than the median
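To make the comparison concrete, the sketch below reuses the first data set from the mode examples, where 37 and 38 sit far above the other values. Treating those two values as the "long tail" is an assumption made for illustration.

```python
from statistics import median

skewed = [12, 12, 13, 14, 14, 15, 15, 15, 15, 37, 38]
trimmed = [x for x in skewed if x < 30]   # same data without the two large values

mean_skewed = sum(skewed) / len(skewed)
median_skewed = median(skewed)

mean_trimmed = sum(trimmed) / len(trimmed)
median_trimmed = median(trimmed)

# With the large values included, the mean moves toward them; the median barely changes.
print(f"with large values:    mean = {mean_skewed:.2f}, median = {median_skewed}")
print(f"without large values: mean = {mean_trimmed:.2f}, median = {median_trimmed}")
```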
23
Statistical Variation Shows how scores differ from one another AKA: variation, dispersion, spread Represent average difference from the mean Four measures of variation: range, interquartile range, standard deviation, variance
24
Range Most general measure of variation Measures difference between highest and lowest values: spread of data Ex. pH 6, 6, 6, 7, 7, 7, 7, 5, 3 range: 7 - 3 = 4 pH units
25
Range The range is strongly affected by outliers.
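A short Python sketch of the range calculation, using the pH readings from the range example. Dropping the lowest reading to show the effect of an outlier is an illustrative assumption; the slides do not label any value as an outlier.

```python
# pH readings from the range example.
ph = [6, 6, 6, 7, 7, 7, 7, 5, 3]

ph_range = max(ph) - min(ph)          # highest value minus lowest value

# Assumption for illustration: treat the single lowest reading as an outlier.
lowest = min(ph)
without_lowest = [x for x in ph if x != lowest]
range_without_lowest = max(without_lowest) - min(without_lowest)

print(f"range with all readings:   {ph_range}")
print(f"range without lowest (pH {lowest}): {range_without_lowest}")
```

Removing one reading halves the range here, which is what "strongly affected by outliers" looks like in practice.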
26
Interquartile Range- IQR AKA mid-fifty or midspread Organizes data into 4 quartiles, each with 25% of data To calculate IQR: 1. Find median of entire data set 2. Find median of lower half of set- lower quartile 3. Find median of upper half of set- upper quartile
27
Quartiles
28
Measure of Spread: Quartiles Q1 = first quartile = 2.2 M = median = 3.4 Q3 = third quartile = 4.35 * 25% of the data at or below Q1 and 75% above * 50% of the observations are above the median and 50% are below * 75% of the data at or below Q3 and 25% above
29
Calculating Interquartile Range Interquartile range: distance between the third and first quartile, giving spread of middle 50% of the data: IQR = Q3 - Q1
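The median-of-each-half procedure and IQR = Q3 - Q1 can be sketched in Python as follows. The data values are hypothetical, since the presentation does not list the observations behind its quartile figure. The sketch also assumes that, for an odd number of observations, the median is left out of both halves; other quartile conventions exist and can give slightly different results.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    lower_half = ordered[: n // 2]          # values below the median position
    upper_half = ordered[(n + 1) // 2 :]    # values above the median position
    return median(lower_half), median(ordered), median(upper_half)

# Hypothetical observations, not taken from the slides.
data = [1, 3, 4, 6, 7, 9, 10, 12]

q1, m, q3 = quartiles(data)
iqr = q3 - q1                               # spread of the middle 50% of the data

print(f"Q1 = {q1}, median = {m}, Q3 = {q3}, IQR = {iqr}")
```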
30
Standard Deviation Each data value has an associated deviation from the mean. A deviation is positive if the value falls above the mean and negative if it falls below the mean The sum of the deviations is always zero
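A quick numerical check of the zero-sum property, reusing the density values from the mean example. `math.isclose` is used for the comparison because, with other data, floating-point rounding can leave a tiny nonzero remainder.

```python
import math

densities = [8, 10, 7, 9]                    # g/cm^3, from the mean example
mean = sum(densities) / len(densities)

# Positive when the value is above the mean, negative when below.
deviations = [x - mean for x in densities]
total_deviation = sum(deviations)

print(deviations)
print(math.isclose(total_deviation, 0.0, abs_tol=1e-12))
```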
31
Standard Deviation Summarizes the deviations of each observation from the mean by calculating an adjusted average of these deviations: 1. Find mean 2. Find each deviation 3. Square deviations 4. Sum squared deviations 5. Divide sum by n-1 6. Take square root
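The six steps can be followed one line at a time in Python. This sketch reuses the density values from the mean example and compares the result with the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
from statistics import stdev

densities = [8, 10, 7, 9]                        # g/cm^3, from the mean example
n = len(densities)

mean = sum(densities) / n                        # 1. find mean
deviations = [x - mean for x in densities]       # 2. find each deviation
squared = [d ** 2 for d in deviations]           # 3. square deviations
sum_squared = sum(squared)                       # 4. sum squared deviations
variance = sum_squared / (n - 1)                 # 5. divide sum by n - 1
std_dev = math.sqrt(variance)                    # 6. take square root

print(f"variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
print(math.isclose(std_dev, stdev(densities)))   # agrees with the library function
```

The value produced by step 5 is the variance, the fourth measure of variation named earlier in the presentation.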
32
Outlier An outlier falls far from the rest of the data
33
Graphs for Quantitative Data 1.Dot Plot: shows a dot for each observation placed above its value on a number line 2.Stem-and-Leaf Plot: displays individual observations 3.Histogram: uses bars to portray the data
34
Which Graph? Dot plot and stem-and-leaf plot: More useful for small data sets Data values are retained Histogram: More useful for large data sets Most compact display More flexibility in defining intervals
35
Dot Plots To construct a dot plot 1.Draw and label horizontal line 2.Mark regular values 3.Place a dot above each value on the number line Sodium in Cereals
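The three construction steps can be imitated with a text-only dot plot in Python. The values below are hypothetical small integers, not the cereal sodium data from the slide's figure, and the sketch assumes integer observations so that each one lines up with a tick on the number line.

```python
from collections import Counter

# Hypothetical integer observations (not the cereal data from the figure).
values = [0, 2, 2, 3, 3, 3, 4, 6]

counts = Counter(values)
low, high = min(values), max(values)
ticks = range(low, high + 1)             # regular values marked on the number line
tallest = max(counts.values())

# Build rows from the top down; a dot appears where the stack is at least that high.
rows = []
for level in range(tallest, 0, -1):
    row = "".join(" * " if counts[v] >= level else "   " for v in ticks)
    rows.append(row.rstrip())

axis = "".join(f"{v:^3}" for v in ticks)  # labelled horizontal line

print("\n".join(rows))
print(axis)
```

Each observation contributes exactly one dot, so the individual data values can still be read off the plot.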
36
Stem-and-leaf plots Summarizes quantitative variables Separates each observation into a stem (first part of the number) and a leaf (last digit) Write each leaf to the right of its stem; order leaves if desired Sodium in Cereals
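A minimal Python sketch of the stem/leaf split. The observations are hypothetical non-negative integers rather than the cereal data from the slide's figure, and the helper name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its ordered leaves (last digits)."""
    plot = defaultdict(list)
    for value in sorted(values):           # sorting keeps the leaves in order
        stem, leaf = divmod(value, 10)     # e.g. 23 -> stem 2, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical observations (not the cereal data from the figure).
observations = [23, 12, 47, 21, 30, 15, 28, 23, 34]

plot = stem_and_leaf(observations)

# Write each leaf to the right of its stem.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```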