Lecturer Dr. Veronika Alhanaqtah


STATISTICS. Lecturer: Dr. Veronika Alhanaqtah

Topic 3. Bivariate analysis
1.1. Relationship between two categorical variables - Mosaic Plots
1.2. Relationship between two categorical variables - Contingency Tables
1.3. Relationship between one categorical and one numeric variable – Side-by-side box plots

Relationship between two variables
Goal: study the relationship between two variables:
- between two categorical variables;
- between one numeric and one categorical variable.

1.1. Relationship between two categorical variables – Mosaic Plot
We use a mosaic plot to study the relationship between two or more categorical variables. It was introduced by Hartigan and Kleiner in 1981. A mosaic plot can be thought of as a relative of the Venn diagram in which the area of each tile is proportional to the number of observations in that combination of categories.
* A Venn diagram (also called a set diagram or logic diagram) is a diagram that shows all possible logical relations between a finite collection of different sets. Venn diagrams were conceived around 1880 by John Venn. They are used to teach elementary set theory and to illustrate simple set relationships in probability, logic, statistics, linguistics and computer science.

Marimekko Chart
Marimekko Charts are used to visualise categorical data over a pair of variables. In a Marimekko Chart, both axes represent a variable with a scale, and together they determine the width and the height of each segment. This makes it possible to detect relationships between categories and their subcategories via the two variables.
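To make the width-and-height idea concrete, here is a minimal sketch of how the tiles of a mosaic (Marimekko) plot could be laid out inside a unit square. The function name `mosaic_tiles` and the counts in `example_counts` are hypothetical and serve only as an illustration; they are not taken from the lecture.

```python
def mosaic_tiles(counts):
    """Lay out mosaic-plot tiles inside the unit square.

    counts maps each category of the first variable (one column of tiles)
    to a dict of counts for the categories of the second variable.
    The width of a column is its share of all observations; the height of
    a tile is its share within its own column.
    """
    grand_total = sum(sum(inner.values()) for inner in counts.values())
    tiles = []
    x = 0.0
    for column, inner in counts.items():
        column_total = sum(inner.values())
        if column_total == 0:
            continue  # an empty category has no width, so it gets no tiles
        width = column_total / grand_total
        y = 0.0
        for row, n in inner.items():
            height = n / column_total
            tiles.append({"column": column, "row": row,
                          "x": x, "y": y, "width": width, "height": height})
            y += height  # stack the next tile on top of this one
        x += width  # the next column starts where this one ends
    return tiles


# Hypothetical counts, for illustration only.
example_counts = {
    "A": {"yes": 30, "no": 10},
    "B": {"yes": 20, "no": 40},
}

for tile in mosaic_tiles(example_counts):
    area = tile["width"] * tile["height"]
    print(tile["column"], tile["row"],
          f"width={tile['width']:.2f} height={tile['height']:.2f} area={area:.2f}")
```

Because width times height equals the tile's share of all observations, the tile areas add up to 1. Columns whose tiles are split at very different heights suggest that the two variables are related.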

Marimekko Chart

Marimekko Chart (Mosaic plot)
Disadvantages: Marimekko Charts can be hard to read, especially with a large number of segments. It is hard to compare segments accurately, because they are not arranged next to each other along a common baseline.
Application: Marimekko Charts are best suited to giving an overview of the data.

Mosaic Plot. Example

Mosaic Plot

Example: Dataset on movies (n=134)

| Name | Genre | Budget | Studio | Audience |
|---|---|---|---|---|
| 50/50 | C | 8 | Ind | 93 |
| Warrior | A | 25 | LG | |
| Harry Potter | F | 125 | WB | |
| The Help | D | | DW | 91 |
| Moneyball | | 50 | Col | 89 |

Legend: C – Comedy, A – Action, F – Fantasy, D – Documentary
Source: www.informationisbeautiful.net

Example. Data set on movies
What do we study here? We want to look at the relationship between the type of studio that produced a film (Independent or Major) and the quartile (1st, 2nd, 3rd or 4th) into which its production budget fell.
What do we know? Of our 134 movies, 24% were made by an Independent studio and 76% by a Major studio. The budget quartiles split the movies so that about 25% of them fall into each quartile.
The question of interest is: Is the distribution of production budgets across the quartiles the same for Independent movies as for Major studio movies?

Question of interest
Is the distribution of production budgets across the quartiles the same for Independent studio movies as for Major studio movies?
Construction of a mosaic plot is in the lecture.
Answer: The distributions differ. Independent movies are much more likely to be in the 1st quartile than Major studio movies are, and Major studio movies are much more likely to be in the 4th quartile than Independent studio movies are.

1.2. Relationship between two categorical variables - Contingency Tables
We use the information in the mosaic plot to produce numerical summaries of the relationship between two categorical variables: a contingency table.

Contingency Table
Each cell of the contingency table contains 4 numbers: count, total %, column % and row %.
The columns stand for the four quartiles: 1st, 2nd, 3rd, 4th.
The rows show whether the studio was Independent or Major.
The last column and the last row are the totals.
We work with a data set of 134 movies.
Table layout: rows Independent, Major, Total; columns Q1, Q2, Q3, Q4, Total; each cell lists Count, Total %, Column %, Row %.

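Contingency Table on Movie dataset
Each cell lists: Count / Total % / Column % / Row %

| | Q1 | Q2 | Q3 | Q4 | Total |
|---|---|---|---|---|---|
| Independent studio | 17 / 13% / 50% / 53% | 6 / 4.5% / 18% / 19% | 7 / 5% / 21% / 22% | 2 / 1.5% / 6% / 6% | 32 |
| Major studio | 17 / 13% / 50% / 17% | 27 / 20% / 82% / 26.5% | 27 / 20% / 79% / 26.5% | 31 / 23% / 94% / 30% | 102 |
| Total | 34 (25%) | 33 (25%) | 34 (25%) | 33 (25%) | 134 |

The contingency table shows how many movies fall into each cell.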

Exemplary exam questions:
Question: What proportion of all movies in our data set were made by a Major studio and fall in the 3rd quartile?
Answer: We look at the total percent. In the Major studio row, under the 3rd quartile, that is 20%.
Question: If a movie was made by an Independent studio, what is the proportion that falls in the 2nd quartile?
Answer: We look at the row percent. For the 2nd quartile that is 19%.
Question: Of the movies in the 4th quartile, what percent are Independent?
Answer: We look at the column percent. Under the 4th quartile, the column percent is 6%.
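The three kinds of percentages used in these answers can be checked with a short calculation. The sketch below is only an illustration: the helper name `cell_summary` is made up, and the Major studio counts for Q1 and Q3 (17 and 27) are an assumption inferred from the printed percentages and totals rather than read directly from the slide.

```python
quartiles = ["Q1", "Q2", "Q3", "Q4"]

# Counts per studio type and budget quartile.
# Assumption: Major Q1 = 17 and Major Q3 = 27 are inferred from the totals.
counts = {
    "Independent": [17, 6, 7, 2],
    "Major": [17, 27, 27, 31],
}

grand_total = sum(sum(row) for row in counts.values())
row_totals = {studio: sum(row) for studio, row in counts.items()}
col_totals = [sum(counts[studio][j] for studio in counts)
              for j in range(len(quartiles))]


def cell_summary(studio, j):
    """Return count, total %, column % and row % for one cell."""
    n = counts[studio][j]
    return {
        "count": n,
        "total_pct": 100 * n / grand_total,        # share of all movies
        "column_pct": 100 * n / col_totals[j],     # share within the quartile
        "row_pct": 100 * n / row_totals[studio],   # share within the studio type
    }


for studio in counts:
    for j, quartile in enumerate(quartiles):
        s = cell_summary(studio, j)
        print(f"{studio:12s} {quartile}: count={s['count']:3d} "
              f"total={s['total_pct']:.1f}% column={s['column_pct']:.1f}% "
              f"row={s['row_pct']:.1f}%")
```

The choice of denominator is what separates the three questions: all 134 movies for the total percent, the quartile total for the column percent, and the studio total for the row percent.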

1.3. Relationship between one categorical and one numeric variable – Side-by-side boxplots
A side-by-side boxplot shows the relationship between two variables, where one is numeric and the other is categorical. Remember that the box represents the middle 50% of the data (from the 25th to the 75th percentile). The whiskers reach out to the maximum and the minimum as long as there are no outliers.
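The numbers behind each box can be sketched as follows. This is an illustration under stated assumptions, not the lecture's own procedure: the function names and the data in `groups` are hypothetical, quartiles are computed with the "inclusive" method of Python's `statistics.quantiles` (other conventions give slightly different values), and outliers are flagged with the common 1.5 × IQR rule, which the slides do not state.

```python
import statistics


def five_number_summary(values):
    """Min, Q1, median, Q3, max and IQR of a list of numbers."""
    # Assumption: the "inclusive" quartile convention.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values), "iqr": q3 - q1}


def whisker_ends(values):
    """Whisker ends and outliers under the 1.5 * IQR rule (an assumption)."""
    s = five_number_summary(values)
    low_fence = s["q1"] - 1.5 * s["iqr"]
    high_fence = s["q3"] + 1.5 * s["iqr"]
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return min(inside), max(inside), outliers


# Hypothetical data: one numeric variable split by a categorical variable.
groups = {
    "group 1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "group 2": [1, 2, 3, 4, 5, 6, 7, 8, 30],
}

for name, values in groups.items():
    s = five_number_summary(values)
    low, high, outliers = whisker_ends(values)
    print(f"{name}: box {s['q1']} to {s['q3']}, median {s['median']}, "
          f"whiskers {low} to {high}, outliers {outliers}")
```

Drawing one box per group from these summaries, on a shared axis, gives the side-by-side boxplot.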

Side-by-side boxplot. Example 1

Side-by-side boxplot. Example 2

Side-by-side boxplot. Example 3

Example. Movie data set
Genres (columns): Action, Comedy, Drama, Horror
Max: 93, 91, 78
75%: 71, 85, 61
50% (median): 51, 58, 72, 52
25%: 45, 48, 59, 34
Min: 32, 31, 46, 25
n: 27, 21, 17
Mean: 49
SD: 18, 16, 15

Exemplary exam questions:
Question: Which genre has the smallest median? Answer: Action (median is 51).
Question: Which genre has the largest median? Answer: Drama (median is 72).
Question: Which of the boxes has the largest interquartile range? Answer: Action movies.
Question: Which genre has the highest standard deviation? Answer: Action (SD is 18).

Homework
Visit the instructor's web page on Statistics: www.alveronika.wordpress.com
Optional but useful: practice with the mosaic plot applet: http://mih5.github.io/statapps/