To be given to you next time: Short Project, What do students drive? AP Problems.

Slides:



Advertisements
Similar presentations
Very simple to create with each dot representing a data value. Best for non continuous data but can be made for and quantitative data 2004 US Womens Soccer.
Advertisements

Describing Quantitative Variables
Chapter 2 Exploring Data with Graphs and Numerical Summaries
Descriptive Measures MARE 250 Dr. Jason Turner.
Statistics 100 Lecture Set 6. Re-cap Last day, looked at a variety of plots For categorical variables, most useful plots were bar charts and pie charts.
Measures of Dispersion
CHAPTER 4 Displaying and Summarizing Quantitative Data Slice up the entire span of values in piles called bins (or classes) Then count the number of values.
1 Chapter 1: Sampling and Descriptive Statistics.
Displaying & Summarizing Quantitative Data
Exploratory Data Analysis (Descriptive Statistics)
By the end of this lesson you will be able to explain/calculate the following: 1. Histogram 2. Frequency Polygons.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/6/12 Describing Data: One Variable SECTIONS 2.1, 2.2, 2.3, 2.4 One categorical.
Chapter 1 & 3.
Chapter 1 Introduction Individual: objects described by a set of data (people, animals, or things) Variable: Characteristic of an individual. It can take.
ISE 261 PROBABILISTIC SYSTEMS. Chapter One Descriptive Statistics.
B a c kn e x t h o m e Classification of Variables Discrete Numerical Variable A variable that produces a response that comes from a counting process.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Agresti/Franklin Statistics, 1 of 63 Chapter 2 Exploring Data with Graphs and Numerical Summaries Learn …. The Different Types of Data The Use of Graphs.
AP Statistics Chapters 0 & 1 Review. Variables fall into two main categories: A categorical, or qualitative, variable places an individual into one of.
Programming in R Describing Univariate and Multivariate data.
Describing distributions with numbers
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Chapter 1 – Exploring Data YMS Displaying Distributions with Graphs xii-7.
Chapter 3: Central Tendency. Central Tendency In general terms, central tendency is a statistical measure that determines a single value that accurately.
Variable  An item of data  Examples: –gender –test scores –weight  Value varies from one observation to another.
© 2008 Brooks/Cole, a division of Thomson Learning, Inc. 1 Chapter 4 Numerical Methods for Describing Data.
1 Laugh, and the world laughs with you. Weep and you weep alone.~Shakespeare~
Chapter 2 Describing Data.
Describing distributions with numbers
Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 4 Describing Numerical Data.
1 CHAPTER 3 NUMERICAL DESCRIPTIVE MEASURES. 2 MEASURES OF CENTRAL TENDENCY FOR UNGROUPED DATA  In Chapter 2, we used tables and graphs to summarize a.
Lecture 5 Dustin Lueker. 2 Mode - Most frequent value. Notation: Subscripted variables n = # of units in the sample N = # of units in the population x.
Categorical vs. Quantitative…
1 Descriptive Statistics 2-1 Overview 2-2 Summarizing Data with Frequency Tables 2-3 Pictures of Data 2-4 Measures of Center 2-5 Measures of Variation.
1 Chapter 4 Numerical Methods for Describing Data.
Organizing Data AP Stats Chapter 1. Organizing Data Categorical Categorical Dotplot (also used for quantitative) Dotplot (also used for quantitative)
Copyright © 2011 Pearson Education, Inc. Describing Numerical Data Chapter 4.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Describing Data: One Quantitative Variable SECTIONS 2.2, 2.3 One quantitative.
UNIT #1 CHAPTERS BY JEREMY GREEN, ADAM PAQUETTEY, AND MATT STAUB.
Notes Unit 1 Chapters 2-5 Univariate Data. Statistics is the science of data. A set of data includes information about individuals. This information is.
LIS 570 Summarising and presenting data - Univariate analysis.
1 Never let time idle away aimlessly.. 2 Chapters 1, 2: Turning Data into Information Types of data Displaying distributions Describing distributions.
Plan for Today: Chapter 11: Displaying Distributions with Graphs Chapter 12: Describing Distributions with Numbers.
More Univariate Data Quantitative Graphs & Describing Distributions with Numbers.
Statistics Unit Test Review Chapters 11 & /11-2 Mean(average): the sum of the data divided by the number of pieces of data Median: the value appearing.
(Unit 6) Formulas and Definitions:. Association. A connection between data values.
1 By maintaining a good heart at every moment, every day is a good day. If we always have good thoughts, then any time, any thing or any location is auspicious.
Discrete vs. Continuous Variables Quantitative variables can be further classified as discrete or continuous. If a variable can take on any.
UNIT ONE REVIEW Exploring Data.
Exploratory Data Analysis
ISE 261 PROBABILISTIC SYSTEMS
Statistics Unit Test Review
Statistical Reasoning
NUMERICAL DESCRIPTIVE MEASURES
Description of Data (Summary and Variability measures)
Laugh, and the world laughs with you. Weep and you weep alone
CHAPTER 1 Exploring Data
Descriptive Statistics
DAY 3 Sections 1.2 and 1.3.
Chapter 5: Describing Distributions Numerically
Please take out Sec HW It is worth 20 points (2 pts
Topic 5: Exploring Quantitative data
Displaying Distributions with Graphs
Displaying and Summarizing Quantitative Data
Displaying and Summarizing Quantitative Data
Chapter 1: Exploring Data
Honors Statistics Review Chapters 4 - 5
Advanced Algebra Unit 1 Vocabulary
Types of variables. Types of variables Categorical variables or qualitative identifies basic differentiating characteristics of the population.
Lesson Plan Day 1 Lesson Plan Day 2 Lesson Plan Day 3
Presentation transcript:

To be given to you next time: Short Project, What do students drive? AP Problems

DATA – Who What Why

Categorical vs. Quantitative Variables

The distribution tells us what values the variables take and how often they take them…

Fundamentals… Every Graph should have:

PIE CHARTS What it is: A pie chart is used to show visually the proportions of parts of something being studied. The area of each slice of the pie shows the slice’s proportion to the entire category being studied and to the other slices. A pie chart shows data at one point in time, like a snapshot; it does not show change in data over time like a line chart does.

DOTPLOTS

STEMPLOTS ex: heights.. Would this be better split into two?

WATCH YOUR SOCS…

SHAPE The shape of a distribution is described by the following characteristics. Symmetry. When it is graphed, a symmetric distribution can be divided at the center so that each half is a mirror image of the other. Number of peaks. Distributions can have few or many peaks. Distributions with one clear peak are called unimodal, and distributions with two clear peaks are called bimodal. When a symmetric distribution has a single peak at the center, it is referred to as bell-shaped. Skewness. When they are displayed graphically, some distributions have many more observations on one side of the graph than the other. Distributions with most of their observations on the left (toward lower values) are said to be skewed right; and distributions with most of their observations on the right (toward higher values) are said to be skewed left. Uniform. When the observations in a set of data are equally spread across the range of the distribution, the distribution is called a uniform distribution. A uniform distribution has no clear peaks.

Examples of Each?

OUTLIERS Gaps. Gaps refer to areas of a distribution where there are no observations. The first figure below has a gap; there are no observations in the middle of the distribution. Outliers. Sometimes, distributions are characterized by extreme values that differ greatly from the other observations. These extreme values are called outliers. The second figure below illustrates a distribution with an outlier. Except for one lonely observation (the outlier on the extreme right), all of the observations fall between 0 and 4. As a "rule of thumb", an extreme value is often considered to be an outlier if it is at least 1.5 interquartile ranges below the first quartile (Q1), or at least 1.5 interquartile ranges above the third quartile (Q3).

CENTER Graphically, the of a distribution is located at the median of the distribution. This is the point in a graphic display where about half of the observations are on either side. In the chart to the right, the height of each column indicates the frequency of observations.

SPREAD The spread of a distribution refers to the variability of the data. If the observations cover a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller.range

Bar Charts A bar chart is made up of columns plotted on a graph. Here is how to read a bar chart.

HISTOGRAMS Like a bar chart, a histogram is made up of columns plotted on a graph. Usually, there is no space between adjacent columns. Here is how to read a histogram. The columns are positioned over a label that represents a quantitative variable. The column label can be a single value or a range of values. The height of the column indicates the size of the group defined by the column label.

Here is the main difference between bar charts and histograms. With bar charts, each column represents a group defined by a categorical variable; and with histograms, each column represents a group defined by a quantitative variable. One implication of this distinction: it is always appropriate to talk about the skewness of a histogram; that is, the tendency of the observations to fall more on the low end or the high end of the X axis. With bar charts, however, the X axis does not have a low end or a high end; because the labels on the X axis are categorical - not quantitative. As a result, it is less appropriate to comment on the skewness of a bar chart.

Percentiles Assume that the elements in a data set are rank ordered from the smallest to the largest. The values that divide a rank-ordered set of elements into 100 equal parts are called percentiles An element having a percentile rank of Pi would have a greater value than i percent of all the elements in the set. Thus, the observation at the 50th percentile would be denoted P50, and it would be greater than 50 percent of the observations in the set. An observation at the 50th percentile would correspond to the median value in the set.

Comparing Distributions

Discrete vs. Continuous Variables Quantitative variables can be further classified as discrete or continuous. If a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable. Some examples will clarify the difference between discrete and continuous variables.

Suppose the fire department mandates that all fire fighters must weigh between 150 and 250 pounds. The weight of a fire fighter would be an example of a continuous variable; since a fire fighter's weight could take on any value between 150 and 250 pounds. Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be any number between 0 and plus infinity. We could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.

Statistical data is often classified according to the number of variables being studied. Univariate data. When we conduct a study that looks at only one variable, we say that we are working with univariate data. Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data. Bivariate data. When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data

Statisticians use summary measures to describe patterns of data. Measures of central tendency refer to the summary measures used to describe the most "typical" value in a set of values. The most common of these is the MEDIAN and the MEAN

As measures of central tendency, the mean and the median each have advantages and disadvantages. Some pros and cons of each measure are summarized below. To illustrate these points, consider the following example... Suppose we examine a sample of 10 households to estimate the typical family income. Nine of the households have incomes between $20,000 and $100,000; but the tenth household has an annual income of $1,000,000,000. That tenth household is an outlier. If we choose a measure to estimate the income of a typical household, the mean will greatly over-estimate family income (because of the outlier); while the median will not. Thus, we say the Median is RESISTANT and the Mean is NOT RESISTANT…

The range is the difference between the largest and smallest values in a set of values.set For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. For this set of numbers, the range would be or 10.

The interquartile range (IQR) is the difference between the largest and smallest values in the middle 50% of a set of data. For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11.

1, 3, 4, 5, 5, 6, 7, 11

In a population, variance is the average squared deviation from the population mean, as defined by the following formula: where σ 2 is the population variance, μ is the population mean, X i is the i th element from the population, and N is the number of elements in the population. The variance of a sample, is defined by slightly different formula, and uses a slightly different notation: where s 2 is the sample variance, x is the sample mean, x i is the ith element from the sample, and n is the number of elements in the sample. Using this formula, the sample variance can be considered an unbiased estimate of the true population variance. Therefore, if you need to estimate an unknown population variance, based on data from a sample, this is the formula to use.

The standard deviation is the square root of the variance…

5-Number Summary and Boxplots… Boxplot splits the data set into quartiles. The body of the boxplot consists of a "box" (hence, the name), which goes from the first quartile (Q1) to the third quartile (Q3).quartiles Within the box, a vertical line is drawn at the Q2, the median of the data set. Two horizontal lines, called whiskers, extend from the front and back of the box. The front whisker goes from Q1 to the smallest non-outlier in the data set, and the back whisker goes from Q3 to the largest non-outlier.median

OUTLIERS…

Range- IQR- Median-

Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of variability are affected when we change units. CHANGING UNITS If you add a constant to every value, the distance between values does not change. As a result, all of the measures of variability (range, interquartile range, standard deviation, and variance) remain the same. On the other hand, suppose you multiply every value by a constant. This has the effect of multiplying the range, interquartile range (IQR), and standard deviation by that constant. It has an even greater effect on the variance. It multiplies the variance by the square of the constant.

Choosing a summary of your data…