Probability and Statistics


Probability and Statistics Univariate Analysis @ Prof. Liping Fu, University of Waterloo

The Big Picture: 1) Data Collection (Population, Sample, Data) 2) Exploratory Data Analysis (EDA) 3) Probability 4) Inference

Exploratory data analysis (EDA): EDA on a Single Variable (Categorical, Quantitative); EDA on the Relationship between Two Variables

Categorical Data Numerical Measures Relative Frequency Table (by category)

Graphical Summary – Visualization Bar Chart and Pie Chart

Quantitative Data: Numerical Measures. Where is the center of the data? Measures of Center. How varied is the data? Measures of Variation.

Measures of Center: Sample mean, median, mode. Sample Mean = arithmetic average = 'average': $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. Sample Median = 'middle number'. Sample Mode = 'most frequent'. All of them measure the CENTER of the data. For roughly symmetric data with a single peak: mean ≈ median ≈ mode

Mean is NOT a good measure in this case...

Measures of Variation. Range: max - min. Quartiles: Q1, first quartile (one quarter of the data is less than this value); Q2, second quartile (median, halfway point); Q3, third quartile (three quarters of the data is less than this value). Inter-quartile range: IQR = Q3 - Q1. Sample Variance/Standard Deviation. Frequency distribution (relative, cumulative).
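A rough sketch of these measures in Python, using only the standard library. The sample is hypothetical (it is not from the slides), and quartile conventions differ between textbooks and software; `method="inclusive"` is one common choice and not necessarily the one used in the lecture.

```python
import statistics

# Hypothetical sample, chosen only for illustration (not from the slides).
data = [2, 4, 4, 5, 7, 9, 10, 12]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
# The "inclusive" method is an assumption; other conventions give
# slightly different quartiles on small samples.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Inter-quartile range: spread of the middle half of the data.
iqr = q3 - q1

print("range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```

Under this convention Q2 coincides with the sample median, which is a quick sanity check on whatever quartile method is in use.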

Variance (s^2) and Standard Deviation (s)
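The formulas on this slide did not survive in the transcript. The usual definitions, assuming the common $n-1$ divisor for a sample and writing $\bar{x}$ for the sample mean of the observations $x_1, \dots, x_n$, are:

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s = \sqrt{s^2}
$$

The standard deviation $s$ is in the same units as the data, which is why it is often easier to interpret than $s^2$.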

Distribution Frequency

Graphical Summary - Visualization: Dot Plot; Distribution Charts (Bar Chart, Histogram, Polygon); 5-number Plot (Box Plot)

Visualization: Dot Plot. Clusters, groups, and outliers?

Box Plot / Box-and-Whisker Plot (5-number plot). Example: HoursOnInternet by male students. Median = 4.0, Q1 = 2.5, Q3 = 6.4, IQR = Q3 - Q1. Limits marked on the plot: Q1 - 1.5 IQR and Q3 + 1.5 IQR.
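The two limits on this slide can be worked out directly from the quoted quartiles. The sketch below does that arithmetic; treating points outside the limits as potential outliers is the common box-plot convention and is assumed here, and the helper name `is_potential_outlier` is hypothetical.

```python
# Quartiles read off the slide (HoursOnInternet, male students).
q1, q3 = 2.5, 6.4

# Inter-quartile range and the two 1.5 * IQR limits.
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr


def is_potential_outlier(x):
    """Assumed convention: flag values outside the 1.5 * IQR limits."""
    return x < lower_limit or x > upper_limit


print(f"IQR = {iqr:.2f}")
print(f"limits = [{lower_limit:.2f}, {upper_limit:.2f}]")
```

The lower limit comes out negative, which hours online cannot be, so in practice only the upper limit matters for this variable.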

Bar Chart (Discrete Data) Relative Frequency Table

Histogram (Continuous Data) Relative Frequency Table

Visualize Degree of Variation

Visualize Patterns of Distribution

Cumulative Distribution Polygon Cumulative Frequency Table

Summary: EDA on a Single Variable
Categorical. Numerical Measures: Relative Frequency. Graphical Tools: Bar Chart, Pie Chart.
Quantitative. Numerical Measures: Mean, median, mode; Variance/Stdev; Quartiles; Frequency. Graphical Tools: Histogram, Polygon, Box Plot.

Descriptive Statistics - A Few Basic Concepts. Example 1.1(a): Suppose we have a batch of 1000 I-beams for building construction, and we want to find out the tensile strengths of these beams. To do so, we take a random sample of 10 beams from the batch and test their tensile strengths. The test results are: 126, 128, 135, 146, 137, 142, 125, 131, 139, 141. What is the relationship between the tensile strength of the 10 I-beams and that of the 1000 I-beams? What can we say (infer) about the tensile strength of the 1000 I-beams from that of the 10 beams?

How to Summarise Data Graphically? Example 1.1(b): Suppose we have a batch of 1000 I-beams for building construction, and we want to find out the tensile strengths of these beams. To do so, we take a random sample of 10 beams from the batch and test their tensile strengths. The test results are: 126, 128, 135, 146, 137, 142, 125, 131, 139, 141. What can we say about the test results? How are the data varied or distributed?

How to Construct a Histogram (Polygon)?
1. Identify the smallest and largest observed values, and choose a convenient range that includes both.
2. Divide the range into convenient intervals (also called classes or bins). (What is the optimal number of intervals?)
3. Count the number of observations (the frequency of occurrences) that fall within each interval.
4. For a relative frequency histogram, calculate the relative frequency for each interval.
5. Draw vertical bars with heights representing the frequency (frequency histogram) or the relative frequency (relative frequency histogram).
6. Alternatively, draw a dot at the midpoint of each interval with height matching the frequency, then connect the dots of all intervals with lines (frequency polygon).
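The counting steps can be sketched in plain Python on the Example 1.1 data. The range (125 to 150) and the bin width (5) are assumptions made for illustration; the slides do not fix them, and a different choice gives a different picture.

```python
# Tensile strengths from Example 1.1.
strengths = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]

# Assumed binning (not from the slides): range 125 to 150, width 5,
# each interval closed on the left and open on the right.
low, high, width = 125, 150, 5
edges = list(range(low, high + width, width))  # 125, 130, ..., 150

# Count the observations that fall within each interval.
counts = [0] * (len(edges) - 1)
for x in strengths:
    counts[(x - low) // width] += 1

# Relative frequency: share of all observations in each interval.
n = len(strengths)
rel_freq = [c / n for c in counts]

# Crude text histogram: one '#' per observation.
for left, c, r in zip(edges, counts, rel_freq):
    print(f"[{left}, {left + width}): {'#' * c:<4} freq={c} rel={r:.1f}")
```

With this binning the frequencies are 3, 1, 3, 2, 1, so the relative frequencies sum to 1 as they should.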

How to Construct a Cumulative Relative Frequency Polygon? Follow the histogram construction steps to determine the relative frequency for each interval. Calculate the cumulative relative frequency for each interval. Draw a dot at the midpoint of each interval with height matching the cumulative relative frequency, then connect the dots of all intervals with lines (cumulative relative frequency polygon).
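A minimal sketch of the cumulative step on the Example 1.1 data, again assuming bins of width 5 starting at 125 (an illustrative choice, not one given in the slides):

```python
from itertools import accumulate

# Tensile strengths from Example 1.1.
strengths = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]

# Assumed binning (not from the slides): five bins of width 5 from 125.
low, width, n_bins = 125, 5, 5

counts = [0] * n_bins
for x in strengths:
    counts[(x - low) // width] += 1

# Running total of the counts, then divide by the sample size.
cum_counts = list(accumulate(counts))
cum_rel_freq = [c / len(strengths) for c in cum_counts]

for i, value in enumerate(cum_rel_freq):
    upper = low + (i + 1) * width
    print(f"below {upper}: {value:.1f}")
```

Each cumulative value is the share of observations below the upper edge of its interval, and the last value is always 1.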

How to Use a Cumulative Relative Frequency Polygon? Example 1.1(c): Suppose we have a batch of 1000 I-beams for building construction, and we want to find out the tensile strengths of these beams. To do so, we take a random sample of 10 beams from the batch and test their tensile strengths. The test results are: 126, 128, 135, 146, 137, 142, 125, 131, 139, 141. What percent of (sampled) beams have a tensile strength less than 130? What is the tensile strength that is greater than or equal to the tensile strength of 95% of the sampled beams? (What is the 95th percentile of the tensile strength?)
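Both questions can also be answered directly from the raw data. The sketch below uses the nearest-rank definition of a percentile, which matches the wording of the question but is only one of several conventions; reading the value off a polygon interpolates between points and will generally give a slightly different number. The function names are hypothetical.

```python
import math

# Tensile strengths from Example 1.1.
strengths = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]


def percent_below(data, threshold):
    """Percentage of observations strictly less than the threshold."""
    return 100 * sum(x < threshold for x in data) / len(data)


def percentile_nearest_rank(data, p):
    """Smallest observation that is >= at least p percent of the data.

    Nearest-rank convention (an assumption); interpolating methods
    give different values on small samples.
    """
    ordered = sorted(data)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


print("percent below 130:", percent_below(strengths, 130))
print("95th percentile (nearest rank):", percentile_nearest_rank(strengths, 95))
```

With only 10 observations, the 95th percentile under this convention is simply the largest value, which shows how little a small sample can say about the upper tail.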

How to Summarise Data Numerically? Example 1.1(d): Suppose we have a batch of 1000 I-beams for building construction, and we want to find out the tensile strengths of these beams. To do so, we take a random sample of 10 beams from the batch and test their tensile strengths. The test results are: 126, 128, 135, 146, 137, 142, 125, 131, 139, 141. Suppose we have another batch of 1000 I-beams and we take a sample of 10 beams from it for testing. The test results are: 126, 138, 125, 132, 127, 122, 121, 131, 129, 131. Which batch has a higher tensile strength on average? Which batch is more uniform or less varied? If the design standard stipulates that 95% of beams must have a minimum tensile strength of 122, which batch meets the standard? Tools: cumulative relative frequency polygon, percentile function.
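A sketch of the numerical comparison using Python's standard library. The names `batch_1`, `batch_2`, and `summarize` are hypothetical, and the sample standard deviation here uses the usual n-1 divisor.

```python
import statistics

# Test results from Example 1.1(d): first and second batch samples.
batch_1 = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]
batch_2 = [126, 138, 125, 132, 127, 122, 121, 131, 129, 131]


def summarize(sample, minimum=122):
    """Mean, sample standard deviation, and share of values >= minimum."""
    return {
        "mean": statistics.mean(sample),
        "stdev": statistics.stdev(sample),
        "share_at_or_above_min": sum(x >= minimum for x in sample) / len(sample),
    }


summary_1 = summarize(batch_1)
summary_2 = summarize(batch_2)

for name, s in (("batch 1", summary_1), ("batch 2", summary_2)):
    print(
        f"{name}: mean={s['mean']:.1f}  stdev={s['stdev']:.2f}  "
        f"share >= 122: {s['share_at_or_above_min']:.0%}"
    )
```

On these samples the first batch has the higher mean (135.0 against 128.2) and the second has the smaller standard deviation. The shares describe only the 10 tested beams in each sample; whether a whole batch of 1000 meets the standard is a question of inference.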

Problem with the Mean? Example 1.2: A small company employs four young engineers, who each earn $24,000, and the owner (also an engineer), who gets $114,000. Comment on the claim that on the average the company pays $42,000 to its engineers and, hence, is a good place to work.
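The claim can be checked with a few lines of Python. This is a small sketch on the salaries given in Example 1.2; the variable names are hypothetical.

```python
import statistics

# Example 1.2: four young engineers at $24,000 and the owner at $114,000.
salaries = [24_000] * 4 + [114_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
mode_salary = statistics.mode(salaries)

print("mean:  ", mean_salary)
print("median:", median_salary)
print("mode:  ", mode_salary)
```

The mean is indeed $42,000, but nobody in the company earns that amount: the single large salary pulls the mean up, while the median and mode both stay at $24,000, the figure a newly hired engineer would actually see.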

Think About It (for next lecture):
For Example 1.1, suppose we pick at random another I-beam from the batch. What is the probability that the tensile strength of that beam is between 130 and 140?
For Example 1.1, what should be the minimum number of observations (sample size) in order to make our inferences credible?
Suppose we toss a coin. What is the chance of getting heads? Do we need observations in order to answer this question?
How long should a left-turn bay be in order to accommodate left-turning traffic in over 95% of the signal cycles during the peak period?