Published by Elinor Tucker. Modified over 8 years ago.
2
Describing Data Week 1
3
The W’s (Where do the Numbers come from?) Who: Who was measured? By Whom: Who did the measuring? What: What was measured? Where: Where was the data measured? When: When was the measurement done? HoW: How was the data measured? Why: Why was the measurement done?
4
Always Check the W’s Anytime you see data always check the W’s. This will help spot questionable statistics. ALWAYS QUESTION DATA
5
Variables (The What) Variables are characteristics that are recorded about each individual. Categorical variables are non-numeric in nature. Quantitative variables are measurements and have units
6
Displaying and Describing Categorical Data
7
Terms Frequency table: categories and counts. Distribution: lists the frequencies of each category. Relative frequency distribution: lists the relative frequencies of each category. Contingency table: the frequencies or relative frequencies of two variables.
8
Terms Marginal distribution: the totals found on the margins of the chart; the distribution of one of the two variables. Conditional distribution: the distribution of one row or column of a contingency table. Independence: two variables are independent if the conditional distribution of one variable is the same for every value of the other variable, and so matches that variable's marginal distribution. (Huh!)
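The terms above can be sketched in a few lines of Python. This is only an illustration: the survey records, the variable names, and the `looks_independent` helper are invented here and are not part of the original slides.

```python
from collections import Counter

# Hypothetical (class_year, drink) records, invented for illustration.
records = [
    ("junior", "coffee"), ("junior", "coffee"), ("junior", "tea"),
    ("senior", "coffee"), ("senior", "coffee"), ("senior", "tea"),
]

cell_counts = Counter(records)   # the cells of the contingency table
total = len(records)

# Marginal distribution of drink: column totals divided by the grand total.
drink_totals = Counter(drink for _, drink in records)
marginal = {drink: n / total for drink, n in drink_totals.items()}

# Conditional distribution of drink within each class year (one row each).
year_totals = Counter(year for year, _ in records)
conditional = {
    year: {drink: cell_counts[(year, drink)] / year_totals[year]
           for drink in drink_totals}
    for year in year_totals
}

def looks_independent(conditional, marginal, tol=1e-9):
    """True when every conditional distribution matches the marginal one."""
    return all(
        abs(dist[category] - marginal[category]) <= tol
        for dist in conditional.values()
        for category in marginal
    )

print(marginal)
print(conditional)
print(looks_independent(conditional, marginal))
```

In this made-up table, juniors and seniors choose coffee at the same rate, so each row's conditional distribution equals the marginal distribution and the check reports independence. Real sample data will rarely match exactly, so a looser tolerance or a formal test would be needed in practice.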
9
Three Rules of Data Analysis First, make a picture!
10
Or you could
11
Why? Pictures reveal things charts don’t. Patterns can be revealed that are not readily apparent from the numbers. Pictures are the easiest way to explain the data to others
12
To Make a Graph Make piles. Organize the data into like groups Make a frequency table Make a relative frequency table by finding the percentages
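A minimal sketch of these steps, assuming a small made-up list of category labels (the `observations` data is hypothetical):

```python
from collections import Counter

# Hypothetical categorical observations, invented for illustration.
observations = ["red", "blue", "red", "green", "blue", "red"]

# "Make piles": count how many observations fall in each category.
frequency = Counter(observations)

# Relative frequency table: each count as a percentage of the total.
n = len(observations)
relative_frequency = {cat: 100 * count / n for cat, count in frequency.items()}

for cat, count in frequency.most_common():
    print(f"{cat:<6} {count:>3} {relative_frequency[cat]:6.1f}%")
```

The percentages should always add up to 100, which is a quick check that no observation was dropped or counted twice.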
13
Make a Graph Probably a bar chart graphing the frequencies or... A pie chart to graph the relative frequencies Beware of the area principle. Stay 2-D
14
To Make a Graph of Categorical Data Think Check W’s Identify the variables Check to see if categories overlap Data are counts
15
To Make a Graph of Categorical Data Show Select the appropriate graph to compare categories Bar Graph for frequencies Pie Chart for relative frequencies (percents) Stacked bar graph can be used instead of a pie chart
16
To Make a Graph of Categorical Data Tell Interpret the results Describe the results in the context of the problem Answers are sentences not numbers
17
Displaying Quantitative Data More Graphs
18
Histograms Think: Must be quantitative data Want to see the distribution Could be counts or percents
19
Stem and Leaf Plots Think Must be quantitative data Want to see the distribution Usually counts Relatively small sample size
20
Stem and Leaf Plot Show Scale is usually vertical Put the ‘Stems’ on the vertical scale Stems are usually the data without the last digit Might be rounded If there are a lot of leaves with one stem make dual stems and put 0-4 on one and 5-9 on the other Plot the ‘leaves’
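As a rough sketch of the procedure, assuming non-negative whole numbers and single stems (no dual stems, no rounding), a text stem-and-leaf plot could be built like this. The `scores` data and the `stem_and_leaf` helper are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (the value without its last digit) to its sorted leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # last digit is the leaf
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical test scores, invented for illustration.
scores = [52, 47, 61, 58, 45, 63, 59, 50]
plot = stem_and_leaf(scores)

# Stems go on the vertical scale; empty stems are kept so gaps stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```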
21
Dot Plot Think Must be quantitative data Want to see the distribution Usually counts Relatively small sample size
22
Dot Plot Show Scale can be vertical or horizontal Place a dot at the appropriate location
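A text-only sketch of a dot plot with a vertical scale, assuming small integer data. The `dot_plot_rows` helper and the sample values are hypothetical.

```python
from collections import Counter

def dot_plot_rows(values):
    """One row per integer on the scale, with one 'o' per observation."""
    counts = Counter(values)
    return [
        f"{v:>3} | " + "o" * counts.get(v, 0)
        for v in range(min(values), max(values) + 1)
    ]

# Hypothetical number of siblings reported by eight students.
siblings = [1, 0, 2, 1, 1, 4, 0, 2]
for row in dot_plot_rows(siblings):
    print(row)
```

Values with no observations still get a row, so gaps in the distribution remain visible.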
23
Describing the Distribution Tell Shape How many humps? Unimodal Bimodal - maybe more than one group thrown together Multimodal Uniform Symmetric Skewed Gaps Clusters
24
Describing the Distribution Tell (continued) Center What is the middle value What is the middle range
25
Describing the Distribution Tell (Continued) Spread Range = Maximum value - minimum value Variation: How much does the data jump around
26
Outliers Discuss any data points that do not seem to fit the overall pattern. Is there a logical explanation for them to be that different?
27
Comparing Two Distributions Compare the centers of the two distributions Compare the shapes of the two distributions Compare the spread of the two distributions Compare any extreme values (outliers) of the two distributions.
28
Time Plot Think: Quantitative data Looking for trends Show Time is horizontal scale Plot data Connect the dots Can use calculator
29
Describing Distributions with Numbers
30
Measurements of the Center Mean: The ‘Average’. µ: mean of a population; x̄: mean of a sample. Unique. Median: The middle score. Sort the data, then take the middle score or the average of the middle two scores. Unique.
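Both measures can be written out by hand in a short sketch that follows the steps on the slide. The function names and sample values are chosen here for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle score of the sorted data, or the average of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(mean([2, 4, 9]))       # 5.0
print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```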
31
More Center Measures Mode: The most common score. Not necessarily unique. Does not necessarily exist.
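A small sketch of finding the mode. It assumes one common convention, not stated on the slide, that a data set in which no value repeats has no mode. The `modes` helper is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # assumed convention: no repeated value means no mode
    return [value for value, count in counts.items() if count == top]

print(modes([2, 3, 3, 5, 7, 10]))  # [3]
print(modes([1, 2, 2, 3, 3]))      # [2, 3]  (not unique)
print(modes([1, 2, 3]))            # []      (does not exist)
```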
32
Finding Quartiles Sort the data Find the median The 1st quartile (25% mark) is the median of the smaller half of the data The 3rd quartile (75% mark) is the median of the larger half of the data
33
The Five Number Summary The minimum data point The 1st quartile The median The 3rd quartile The largest data point
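The quartile procedure and the five-number summary can be sketched as follows. One assumption is made that the slides do not spell out: when the number of data points is odd, the overall median is left out of both halves. Textbooks and software differ on this detail, so other tools may report slightly different quartiles.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, 1st quartile, median, 3rd quartile, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # smaller half of the data
    upper = ordered[half + (n % 2):]    # larger half; skips the median if n is odd
    return (
        ordered[0],
        median_of_sorted(lower),
        median_of_sorted(ordered),
        median_of_sorted(upper),
        ordered[-1],
    )

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.5, 4.5, 6.5, 8)
print(five_number_summary([1, 2, 3, 4, 5, 6, 7]))     # (1, 2, 4, 6, 7)
```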
34
InterQuartile Range and Outliers Outliers are data points that do not fit the pattern of the distribution. The interquartile range (IQR) is the 3rd quartile minus the 1st quartile. An outlier is a point more than one and a half times the IQR below the 1st quartile or more than one and a half times the IQR above the 3rd quartile.
35
Checking for Outliers Find the 5 number summary. Calculate the Interquartile Range: IQR = 3rd quartile - 1st quartile. Lower cut off point = 1st quartile - 1.5(IQR). Upper cut off point = 3rd quartile + 1.5(IQR). Check for data outside the cut off points.
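A minimal sketch of the cut off calculation, taking the two quartiles as inputs. The function names and the sample numbers are hypothetical.

```python
def outlier_cutoffs(q1, q3):
    """Lower and upper cut off points from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Data points that fall outside the cut off points."""
    low, high = outlier_cutoffs(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical data with quartiles of 10 and 20, so IQR = 10.
print(outlier_cutoffs(10, 20))                        # (-5.0, 35.0)
print(find_outliers([1, 12, 15, 18, 40, -6], 10, 20)) # [40, -6]
```

A point exactly on a cut off is not flagged here, which matches the slide's wording of "outside the cut off points".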
36
The Normal Model Density Curves and Normal Distributions
37
A Density Curve: Is always on or above the x axis Has an area of exactly 1 between the curve and the x axis Describes the overall pattern of a distribution The area under the curve above any range of values is the proportion of all the observations that fall in that range.
38
Mean vs Median The median of a density curve is the equal area point that divides the area under the curve in half The mean of a density function is the center of mass, the point where curve would balance if it were made of solid material
39
Normal Curves Bell shaped, Symmetric, Single-peaked. Mean = µ. Standard deviation = σ. Notation: N(µ, σ). The inflection points of the curve lie one standard deviation on either side of µ.
40
68-95-99.7 Rule About 68% of the data in a normal distribution is within one standard deviation of the mean. About 95% of the data in a normal distribution is within two standard deviations of the mean. About 99.7% of the data in a normal distribution is within three standard deviations of the mean.
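The three percentages can be checked numerically. For a normal distribution, the proportion within k standard deviations of the mean equals erf(k/√2), which Python's standard library can evaluate. The helper name is chosen here for illustration.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * proportion_within(k):.2f}%")
```

The computed values round to 68.27%, 95.45%, and 99.73%, so the rule's figures are convenient approximations rather than exact values or lower bounds.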
41
Why are Normal Distributions Important? Good descriptions for many distributions of real data. Good approximation to the results of many chance outcomes. Many statistical inference procedures based on normal distributions work well for other roughly symmetric distributions.
42
Standard Normal Curve
43
Standardizing (z-score) If x is from a normal population with mean µ and standard deviation σ, then the standardized value z is the number of standard deviations x is from the mean: z = (x - µ)/σ. The unit on z is standard deviations.
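As a one-function sketch, writing σ for the standard deviation, standardizing looks like this. The example numbers are invented.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical: a score of 130 where the mean is 100 and the SD is 15.
print(z_score(130, 100, 15))  # 2.0, i.e. two standard deviations above the mean
print(z_score(85, 100, 15))   # -1.0, i.e. one standard deviation below the mean
```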
44
Standard Normal Distribution A normal distribution with µ = 0 and σ = 1, N(0,1), is called a Standard Normal distribution. Z-scores are standard normal, where z = (x - µ)/σ.
45
Standard Normal Tables Table B (pg 552) in your book, or your Standard Normal table, gives the percent of the data to the left of the z value. Find the first 2 digits of the z value in the left column, move over to the column of the third digit, and read off the area. To find the cut-off point given the area, find the closest value to the area ‘inside’ the chart. The row gives the first 2 digits and the column gives the last digit.
46
Solving a Normal Proportion State the problem in terms of a variable (say x) in the context of the problem. Draw a picture and locate the required area. Standardize the variable using z = (x - µ)/σ. Use the calculator/table and the fact that the total area under the curve = 1 to find the desired area. Answer the question.
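In place of the table or calculator, the same steps can be sketched with `statistics.NormalDist` from Python's standard library, whose `cdf` method gives the area to the left of a value. The helper names and the height figures are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(0, 1)

def proportion_below(x, mu, sigma):
    """Area to the left of x: standardize, then look up the z value."""
    z = (x - mu) / sigma
    return standard_normal.cdf(z)

def proportion_above(x, mu, sigma):
    """Area to the right of x, using total area under the curve = 1."""
    return 1 - proportion_below(x, mu, sigma)

def proportion_between(a, b, mu, sigma):
    """Area between a and b as a difference of two left-hand areas."""
    return proportion_below(b, mu, sigma) - proportion_below(a, mu, sigma)

# Hypothetical heights: mean 64.5 inches, standard deviation 2.5 inches.
print(round(proportion_below(67, 64.5, 2.5), 4))        # z = 1.0
print(round(proportion_above(67, 64.5, 2.5), 4))
print(round(proportion_between(62, 67, 64.5, 2.5), 4))  # within one SD
```

Drawing the picture first is still worthwhile, because it shows whether the answer should be a left-hand area, a right-hand area, or a difference.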
47
Finding a Cutoff Given the Area State the problem in terms of a variable (say x) and area Draw a picture and shade the area Use the table to find the z value with the desired area Go z standard deviations from the mean in the correct direction. Answer the question.
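Going from an area back to a cutoff can be sketched with `NormalDist.inv_cdf`, which plays the role of reading the table "inside out". The helper name and the example numbers are hypothetical, and the area is assumed to be the area to the left of the cutoff.

```python
from statistics import NormalDist

def cutoff_for_area_below(area, mu, sigma):
    """Value of x with the given area to its left under N(mu, sigma)."""
    z = NormalDist(0, 1).inv_cdf(area)   # z value with the desired area
    return mu + z * sigma                # go z standard deviations from the mean

# Hypothetical: scores with mean 100 and SD 15; find the 90th percentile.
print(round(cutoff_for_area_below(0.90, 100, 15), 1))
```

For an area given to the right of the cutoff, subtract it from 1 first, since the table and `inv_cdf` both work with left-hand areas.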
48
Assessing Normality In order to use the previous techniques the population must be normal. To assess normality: construct a stem plot or histogram and see if the distribution is unimodal and roughly symmetric around the mean.