Math 3680 Lecture #1 Graphical Representation of Data.

In this first lecture, we will discuss some brief quantitative measures that capture essential properties of a data set. Such summaries are especially important in presentations:
– It is often unnecessary to report exactly how each subject fared in an experiment.
– Instead, report succinct summaries of the data.
– Your audience has a short attention span.
– Communicate only the most important information.

Types of Variables

DEFINITIONS
Population - the entire class of individuals or set of measurements of interest, either existing or conceptual
Sample - a subset of measurements from the population; the part of the population actually examined
Units/Subjects - the things or people that make up a population
Inference - a generalization about a population based on a sample
Parameters - numerical facts about a population that investigators want to know
Statistics - numbers that can be computed from a sample. Parameters are estimated by statistics.

Variables: There are several ways to characterize data, starting with qualitative versus quantitative.
Qualitative (categorical) variables have answers that are descriptive words or phrases.
– Ordinal: can be meaningfully ranked (e.g., survey data, grades)
– Nominal: cannot be meaningfully ranked (e.g., race, gender)
Quantitative variables have answers that are numbers.
– Discrete variables (e.g., number of home runs) have gaps between possible values.
– Continuous variables (e.g., household income) have no gaps between possible values.

Exercise: Classify each of the following variables as qualitative (nominal or ordinal) or quantitative (discrete or continuous):
– occupation
– weight
– opinion of teaching effectiveness
– region of residence
– grade point average
– height
– number of televisions owned
– blood type
– size of wrench randomly chosen from a wrench set

Median, Interquartile Range and Box-and-Whiskers Plot

Definition: Mode. The mode is the most frequently occurring value. (With rare exceptions, the mode is useless.) What is the mode for the given data (points scored in the NFL postseason, )?
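The slide's NFL data is not reproduced here. The sketch below therefore uses made-up values. It counts how often each value occurs and returns every value tied for the highest count, since a data set can have more than one mode. `modes` is a hypothetical helper name.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs most often, sorted (ties give several modes)."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

# Hypothetical points-scored values (NOT the NFL data from the slide)
points = [24, 17, 31, 17, 20, 24, 17, 10]
print(modes(points))           # [17]
print(modes([3, 3, 7, 7, 9]))  # [3, 7]: two values tie for most frequent
```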

Definition: Median. The median is chosen so that half of the data lies above the median and half lies below. What is the median for the given data? To find the median, we first order the data, counting multiplicities:

If there is an even number of data values, the median is the average of the two middle values. If there is an odd number of data values, the median is simply the middle value.
Shortcut: For a data set with n values, the median is the $\frac{n+1}{2}$th entry of the ordered data. If this rank ends in 0.5, take the average of the data values in the two adjacent positions.
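Here is a minimal sketch of the $\frac{n+1}{2}$ rank shortcut, using small made-up data sets. Ranks are 1-based as in the rule, so position r is `xs[r - 1]` in Python's 0-based indexing. A rank ending in .5 averages the two neighboring positions.

```python
def median(data):
    """Median via the (n+1)/2 rank rule (1-based ranks in the sorted data)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    rank = (n + 1) / 2          # e.g. n = 5 -> 3.0, n = 6 -> 3.5
    lo = int(rank)              # 1-based position at or just below the rank
    if rank == lo:              # whole-number rank: a single middle value
        return xs[lo - 1]
    # rank ends in .5: average the values in positions lo and lo + 1
    return (xs[lo - 1] + xs[lo]) / 2

print(median([7, 3, 9, 1, 5]))      # rank 3 -> 5
print(median([7, 3, 9, 1, 5, 11]))  # rank 3.5 -> (5 + 7) / 2 = 6.0
```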

While the median is often a useful summary for data, it is not complete by itself. In particular, it does not provide information about the spread of the data. Example: three data sets with median 60:

Definition: Range. The range of a data set is the difference between the largest element and the smallest element: range = largest - smallest. While the range measures variation, it is not perfect. It depends only on the two most extreme values, so a single outlier can make it very large.
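The sketch below uses three hypothetical data sets, all with median 60 (not the slide's examples). The median alone cannot tell them apart, while the range reflects spread. The last set also shows the range's weakness: one extreme value inflates it while the median is unchanged.

```python
from statistics import median

def data_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)

# Hypothetical data sets, all with median 60
tight = [58, 59, 60, 61, 62]
spread = [10, 35, 60, 85, 110]
one_outlier = [58, 59, 60, 61, 200]   # same as `tight` except for one extreme value

for name, xs in [("tight", tight), ("spread", spread), ("one_outlier", one_outlier)]:
    print(name, median(xs), data_range(xs))
# tight 60 4
# spread 60 100
# one_outlier 60 142
```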

Definitions:
First Quartile. The first quartile, Q1, is chosen so that 25% of the data lie at or below it.
Second Quartile. The second quartile, Q2, is chosen so that 50% of the data lie at or below it; it is the median.
Third Quartile. The third quartile, Q3, is chosen so that 75% of the data lie at or below it.

Computing quartiles:
1. Rank the data from smallest to largest.
2. Find the median; it is the second quartile, Q2.
3. Take the lower half of the data. (If there is an odd number of measurements, include the median.) The median of this lower half is the first quartile, Q1.
4. Repeat for the upper half to find the third quartile, Q3.
The difference Q3 - Q1 is called the interquartile range (IQR).
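The four steps translate into the sketch below. It keeps the median in both halves when n is odd, following step 3. Spreadsheet and library quartile functions use other conventions and may return slightly different values for the same data. `quartiles` is a hypothetical helper name, and the data is made up.

```python
from statistics import median

def quartiles(data):
    """(Q1, Q2, Q3) using the halving method: median kept in both halves when n is odd."""
    xs = sorted(data)                      # step 1: rank the data
    n = len(xs)
    if n == 0:
        raise ValueError("quartiles of empty data")
    half = (n + 1) // 2                    # size of each half; includes the median if n is odd
    lower = xs[:half]                      # step 3: lower half
    upper = xs[n - half:]                  # step 4: upper half
    return median(lower), median(xs), median(upper)

q1, q2, q3 = quartiles([1, 2, 3, 4, 5, 6, 7])
print(q1, q2, q3, "IQR =", q3 - q1)   # 2.5 4 5.5 IQR = 3.0
```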

Computing quartiles may be facilitated by using Microsoft Excel:

BOX-AND-WHISKER PLOTS
1. Draw a vertical scale that includes the low and high values.
2. To the right of the scale, draw a box between the first and third quartiles.
3. Draw a line through the box at the median value.
4. Draw lines (whiskers) from the box to the low and high values.
5. Often the whiskers are drawn only to the most extreme values lying within 1.5 IQR below Q1 and 1.5 IQR above Q3. Symbols then mark the remaining points: + for each possible outlier (between 1.5 and 3 IQR beyond Q1 or Q3) and * for each probable outlier (more than 3 IQR beyond Q1 or Q3).
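Here is a sketch of the numbers behind step 5, computed on hypothetical data. It finds the fences at 1.5 IQR and 3 IQR, the whisker ends, and the possible and probable outliers. It assumes a boundary convention the slide does not specify: a point exactly 1.5 IQR out stays inside the whiskers, and a point exactly 3 IQR out is a possible outlier. Quartiles use the halving method above.

```python
from statistics import median

def quartiles(data):
    """(Q1, Q2, Q3) by the halving method (median kept in both halves when n is odd)."""
    xs = sorted(data)
    half = (len(xs) + 1) // 2
    return median(xs[:half]), median(xs), median(xs[len(xs) - half:])

def boxplot_summary(data):
    """Whisker ends and outlier classes under the 1.5 IQR / 3 IQR rule (assumed boundaries)."""
    q1, q2, q3 = quartiles(data)
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # whisker limits
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr       # probable-outlier limits
    inside = [x for x in data if inner_lo <= x <= inner_hi]
    possible = sorted(x for x in data
                      if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi)
    probable = sorted(x for x in data if x < outer_lo or x > outer_hi)
    return {
        "quartiles": (q1, q2, q3),
        "whiskers": (min(inside), max(inside)),
        "possible": possible,   # marked with +
        "probable": probable,   # marked with *
    }

# Hypothetical data
summary = boxplot_summary([2, 3, 4, 5, 6, 7, 8, 20, 40])
print(summary)
# {'quartiles': (4, 6, 8), 'whiskers': (2, 8), 'possible': [20], 'probable': [40]}
```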

Exercise: Draw a boxplot for the domestic gross receipts of the top 100 movies of all time.
Note: Many statistical software packages (SPSS, SAS, etc.) can create boxplots automatically. Unfortunately, older versions of Excel are not among them (a built-in box-and-whisker chart type first appeared in Excel 2016).

Stem-and-Leaf Plots
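The slide's stem-and-leaf figure is not reproduced here. This minimal sketch builds one for hypothetical data, assuming non-negative integers with the tens part as the stem and the units digit as the leaf. Empty stems are kept so that gaps in the data stay visible. `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Lines of a stem-and-leaf plot: stem = x // 10, leaf = x % 10 (non-negative ints)."""
    groups = defaultdict(list)
    for x in sorted(data):                 # sorting puts leaves in increasing order
        groups[x // 10].append(x % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):   # keep empty stems
        leaves = "".join(str(d) for d in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines

# Hypothetical data
plot = stem_and_leaf([27, 12, 34, 15, 21, 27, 51])
for line in plot:
    print(line)
#   1 | 25
#   2 | 177
#   3 | 4
#   4 |
#   5 | 1
```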

Histograms: Continuous Data

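In a histogram:
1) A histogram is a special kind of bar chart.
2) Percentages are represented by areas, not heights.
3) The height of a block represents the percentage per horizontal unit (the density): height = percentage in the class ÷ class width.
4) Be sure to decide on an endpoint convention, i.e., which class gets a value that falls exactly on the boundary between two classes.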
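To illustrate rules 2 to 4, the sketch below computes frequency, percent, and density (percent per unit) for each class of some hypothetical data. The class widths are deliberately unequal. It assumes one endpoint convention: each class includes its left endpoint and excludes its right one, except the last class, which includes both. Multiplying each density by its class width recovers the percentage, so areas add to 100.

```python
def histogram_table(data, edges):
    """Rows (left, right, frequency, percent, density) for classes between consecutive edges.

    Assumed endpoint convention: [left, right) for every class except the last,
    which is [left, right]. Density = percent / width, so density * width = percent.
    """
    n = len(data)
    rows = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        freq = sum(1 for x in data if left <= x < right or (is_last and x == right))
        percent = 100 * freq / n
        rows.append((left, right, freq, percent, percent / (right - left)))
    return rows

# Hypothetical data with unequal class widths
data = [1, 2, 2, 3, 5, 6, 8, 9, 12, 18]
table = histogram_table(data, [0, 5, 10, 20])
for row in table:
    print(row)
# (0, 5, 4, 40.0, 8.0)
# (5, 10, 4, 40.0, 8.0)
# (10, 20, 2, 20.0, 2.0)
```

The wide last class holds half as many values as each of the others and, being twice as wide, gets a block one quarter as tall. Plotting raw frequencies as heights would exaggerate it.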

Ex. Construct a histogram for the 2007 salaries of the 50 U.S. governors (p. 27, from the Council of State Governments):
[Table: columns Class, Frequency, Relative frequency, Density (per $1000); the salary classes and values were not preserved.]

Ex: Draw a histogram for the domestic gross receipts of all movies that grossed at least $100 million:
[Table: columns Class, Frequency, Relative frequency, Density (per $1M); the classes and values were not preserved.]

How do you decide on the classes?
1. Too few classes: very undescriptive.
2. Too many classes: very choppy.
3. Sturges' rule of thumb: for a data set of size n, use $k = \lceil 1 + \log_2 n \rceil = \left\lceil 1 + \frac{\ln n}{\ln 2} \right\rceil$ classes, i.e., 1 + log2 n rounded up to the nearest integer.
4. For long tails, use wide classes as appropriate.
5. Within these guidelines, there are no absolute rules.
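Here is a short sketch of Sturges' rule as it is usually stated, $k = \lceil 1 + \log_2 n \rceil$, using only the standard `math` module. For n = 50 (e.g., the 50 governors), log2 50 ≈ 5.64, so k = ⌈6.64⌉ = 7. `sturges_classes` is a hypothetical helper name.

```python
import math

def sturges_classes(n):
    """Sturges' rule of thumb: k = ceil(1 + log2(n)) classes for n observations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + math.log2(n))

print(sturges_classes(50))   # 7
print(sturges_classes(100))  # 8
```

Treat the result as a starting point only. Per rules 4 and 5, long tails may call for fewer, wider classes.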

Histograms and Excel
Doing histograms correctly with Excel is very cumbersome. This chart was generated using the Histogram tool in the Analysis ToolPak, as described on p. 42 of the textbook. What's wrong with this picture?

Conclusion: Do NOT use Excel to make histograms. Other software packages (R, Minitab, SPSS, etc.) can make correct histograms. For now, just draw histograms by hand.

Histograms: Discrete Data

Example: A production line inspector records the number of defective items produced each hour of an eight-hour shift:
[Table: columns Number of items, Frequency, Relative frequency (Density); the values were not preserved.]

Notice that the bar for 2 stretches from 1.5 to 2.5, giving that bar a width of 1. Because every bar has width 1, each bar's height (density) equals its relative frequency.
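The slide's defect counts are not preserved, so this sketch uses eight hypothetical hourly counts. Each integer value v gets a bar from v - 0.5 to v + 0.5, and values that never occur still get a zero-height bar. With width 1, the density is just the relative frequency. `discrete_bars` is a hypothetical helper name.

```python
from collections import Counter

def discrete_bars(values):
    """Bars (left, right, frequency, density) for integer data; value v spans [v - 0.5, v + 0.5]."""
    n = len(values)
    freq = Counter(values)
    bars = []
    for v in range(min(freq), max(freq) + 1):   # include values that never occurred
        count = freq.get(v, 0)
        # width is 1, so density (height) = relative frequency
        bars.append((v - 0.5, v + 0.5, count, count / n))
    return bars

# Hypothetical defect counts for the eight hours of one shift
defects = [0, 2, 1, 2, 3, 2, 1, 0]
bars = discrete_bars(defects)
for bar in bars:
    print(bar)
# (-0.5, 0.5, 2, 0.25)
# (0.5, 1.5, 2, 0.25)
# (1.5, 2.5, 3, 0.375)
# (2.5, 3.5, 1, 0.125)
```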