Probability and Statistics


1 Probability and Statistics
Univariate Analysis @ Prof. Liping Fu, University of Waterloo

2 The Big Picture
Diagram linking the population and the sample data: 1) data collection, 2) exploratory data analysis (EDA), 3) probability, 4) inference.

3 Exploratory Data Analysis (EDA)
EDA on a single variable (categorical or quantitative) and EDA on the relationship between two variables.

4 Categorical Data: Numerical Measures
Relative Frequency Table (by category)
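A relative frequency table divides each category's count by the total number of observations. Below is a minimal Python sketch. The list of travel-mode responses is hypothetical, for illustration only, and does not come from the slides.

```python
from collections import Counter

def relative_frequency_table(observations):
    """Map each category to (count, relative frequency), most frequent first."""
    counts = Counter(observations)
    n = len(observations)
    # Relative frequency = count in the category / total number of observations
    return {category: (count, count / n) for category, count in counts.most_common()}

# Hypothetical categorical data (illustrative only, not from the slides)
modes = ["car", "bus", "car", "walk", "car", "bus", "bike", "car"]
table = relative_frequency_table(modes)
for category, (count, rel) in table.items():
    print(f"{category:<6} {count:>3} {rel:>6.3f}")
```

The relative frequencies always sum to 1. They give the bar heights of a bar chart or the slice fractions of a pie chart.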

5 Graphical Summary - Visualization
Bar Chart and Pie Chart

6 Where is the center of the data?
Numerical measures for quantitative data answer two questions. Where is the center of the data? (measures of center) How varied is the data? (measures of variation)

7 ๐‘ฅ = ๐‘–=1 ๐‘› ๐‘ฅ ๐‘– ๐‘› Measures of Center Sample mean, median, mode
Sample Mean = arithmetic average = โ€˜averageโ€™ ๐‘ฅ = ๐‘–=1 ๐‘› ๐‘ฅ ๐‘– ๐‘› Sample Median = โ€˜middle numberโ€™ Sample Mode = โ€˜most frequentโ€™ All of them measure the CENTER of the data In most cases: mean โ‰ˆ median โ‰ˆ mode
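The three measures of center can be computed with Python's standard library. This sketch uses the ten I-beam tensile strengths from Example 1.1 (slide 22).

```python
import statistics

# Tensile strengths of the 10 sampled I-beams (Example 1.1)
strengths = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]

# Sample mean: x-bar = (x_1 + ... + x_n) / n
mean = sum(strengths) / len(strengths)
# Sample median: middle of the ordered data (average of the two middle values when n is even)
median = statistics.median(strengths)
# Sample mode(s): multimode returns every value tied for the highest count
modes = statistics.multimode(strengths)

print(f"mean = {mean}, median = {median}, modes = {modes}")
```

In this sample the mean (135) and the median (136) are close. Every value occurs exactly once, however, so the mode says little about the center of a small sample of continuous measurements.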

8 Mean is NOT a good measure in this case...

9 Measures of Variation
Range: max - min
Quartiles:
Q1: first quartile (one quarter of the data are less than this value)
Q2: second quartile (the median, the half-way point)
Q3: third quartile (three quarters of the data are less than this value)
Inter-quartile range (IQR) = Q3 - Q1
Sample variance / standard deviation
Frequency distribution (relative, cumulative)
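Below is a minimal Python sketch of the range, the quartiles and the IQR for the Example 1.1 tensile strengths. Quartile conventions differ between textbooks and software. The sketch assumes the default "exclusive" method of `statistics.quantiles`, so a hand calculation that uses another convention may give slightly different Q1 and Q3.

```python
import statistics

# Tensile strengths of the 10 sampled I-beams (Example 1.1)
strengths = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]

data_range = max(strengths) - min(strengths)  # Range = max - min
# Q1, Q2 (median), Q3: the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(strengths, n=4)  # default method="exclusive"
iqr = q3 - q1  # Inter-quartile range: spread of the middle half of the data

print(f"range = {data_range}, Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

For these data the range is 21 and the IQR is 13.75 (Q1 = 127.5, Q3 = 141.25). The IQR ignores the extreme values, which makes it less sensitive to outliers than the range.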

10 Variance ($s^2$) and Standard Deviation ($s$)
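This sketch assumes the usual sample definitions, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and $s = \sqrt{s^2}$, which divide by n - 1 rather than n. It computes both quantities for the Example 1.1 tensile strengths in Python.

```python
import math
import statistics

# Tensile strengths of the 10 sampled I-beams (Example 1.1)
strengths = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]

n = len(strengths)
x_bar = sum(strengths) / n
# Sample variance: sum of squared deviations from the mean, divided by n - 1
s2 = sum((x - x_bar) ** 2 for x in strengths) / (n - 1)
# Standard deviation: square root of the variance, in the same units as the data
s = math.sqrt(s2)

# The standard library uses the same n - 1 definition, so the results agree
assert math.isclose(s2, statistics.variance(strengths))
assert math.isclose(s, statistics.stdev(strengths))
print(f"s^2 = {s2:.2f}, s = {s:.2f}")
```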

11 Frequency Distribution

12 Graphical Summary - Visualization
Dot plot; distribution charts (histogram, bar chart, polygon); 5-number plot (box plot)

13 Visualization: Dot Plot
Clusters, groups, and outliers?

14 Box Plot / Box-and-Whisker Plot (5-number plot)
Example: HoursOnInternet by male students. Median = 4.0, Q1 = 2.5, Q3 = 6.4, IQR = Q3 - Q1. Limits marked on the plot: Q1 - 1.5 IQR and Q3 + 1.5 IQR.
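The limits on the plot follow the common 1.5 × IQR rule. Observations below Q1 - 1.5 IQR or above Q3 + 1.5 IQR are drawn as individual points, as possible outliers. Below is a minimal Python sketch that uses the slide's quartile values.

```python
# Summary values read from the slide's box plot of HoursOnInternet (male students)
q1, median, q3 = 2.5, 4.0, 6.4

iqr = q3 - q1                  # inter-quartile range
lower_limit = q1 - 1.5 * iqr   # points below this are flagged as possible outliers
upper_limit = q3 + 1.5 * iqr   # points above this are flagged as possible outliers

def is_possible_outlier(hours):
    """Apply the 1.5 x IQR rule to a single observation."""
    return hours < lower_limit or hours > upper_limit

print(f"IQR = {iqr:.2f}, limits = ({lower_limit:.2f}, {upper_limit:.2f})")
```

Consider two hypothetical observations. `is_possible_outlier(13)` is true, while `is_possible_outlier(10)` is false. The lower limit is negative, so no observed number of hours can fall below it.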

15 Bar Chart (Discrete Data)
Relative Frequency Table

16 Histogram (Continuous Data)
Relative Frequency Table

17 Visualize Degree of Variation

18 Visualize Patterns of Distribution

19

20 Cumulative Distribution Polygon
Cumulative Frequency Table

21 Summary: EDA on a Single Variable
Categorical data. Numerical measures: relative frequency. Graphical tools: bar chart, pie chart.
Quantitative data. Numerical measures: mean, median, mode; variance/standard deviation; quartiles; frequency. Graphical tools: histogram, polygon, box plot.

22 Descriptive Statistics - A Few Basic Concepts
Example 1.1(a) Suppose we have a batch of 1000 I-beams for building construction, and we want to find out the tensile strengths of these beams. In order to do so, we take at random a set of 10 beams from the batch and test their tensile strengths. The test results are 126, 128, 135, 146, 137, 142, 125, 131, 139, 141 What is the relationship between the tensile strength of the 10 I-beams and that of the 1000 I-beams? What can we say (infer) about the tensile strength of the 1000 I-beams from that of the 10 beams?

23 How to Summarise Data Graphically?
Example 1.1(b) Suppose we have a batch of 1000 I-beams for building construction, and we want to find out the tensile strengths of these beams. In order to do so, we take at random a set of 10 beams from the batch and test their tensile strengths. The test results are 126, 128, 135, 146, 137, 142, 125, 131, 139, 141 What can we say about the test results? How are the data varied or distributed?

24 How to Construct a Histogram (Polygon)?
Step 1: Identify the smallest and largest observed values, and choose a convenient range that includes them.
Step 2: Divide the range into convenient intervals (also called classes or bins). (What is the optimal number of intervals?)
Step 3: Count the number of observations (the frequency of occurrences) that fall within each interval.
Step 4: For a relative frequency histogram, calculate the relative frequency of each interval.
Step 5: Draw vertical bars with heights representing the frequency (frequency histogram) or the relative frequency (relative frequency histogram).
Alternatively, draw a dot at the midpoint of each interval with height matching the frequency, then connect the dots of all intervals with lines: this is the frequency polygon.
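Below is a Python sketch of these steps for the Example 1.1 data. The slides leave the number of intervals open. The range 125 to 150 and the bin width of 5 are choices made only for this illustration, as is the convention that each interval includes its lower limit but not its upper limit.

```python
def frequency_table(data, start, width, k):
    """Frequencies for k equal-width intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * k
    for x in data:
        i = int((x - start) // width)  # index of the interval containing x
        if not 0 <= i < k:
            raise ValueError(f"{x} lies outside the chosen range")
        counts[i] += 1
    n = len(data)
    rows = []
    for i, freq in enumerate(counts):
        lower = start + i * width
        midpoint = lower + width / 2  # x-position of the dot in a frequency polygon
        rows.append((lower, lower + width, midpoint, freq, freq / n))
    return rows

# Tensile strengths of the 10 sampled I-beams (Example 1.1)
strengths = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]
rows = frequency_table(strengths, start=125, width=5, k=5)
for lower, upper, midpoint, freq, rel in rows:
    # A text "bar" of '#' characters stands in for the histogram bar height
    print(f"[{lower}, {upper})  midpoint {midpoint}  freq {freq}  rel {rel:.1f}  {'#' * freq}")
```

A different number of intervals can change the shape of the histogram noticeably, especially for a sample as small as ten observations.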

25 How to Construct a Cumulative Relative Frequency Polygon?
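Follow the histogram construction steps (choose a range, divide it into intervals, count the observations in each) to determine the relative frequency for each interval. Calculate the cumulative relative frequency for each interval, that is, the sum of the relative frequencies of that interval and all lower intervals. Draw a dot at the midpoint of each interval with height matching the cumulative relative frequency. The dots of all intervals are then connected by lines: this is the cumulative relative frequency polygon.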
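The sketch below computes cumulative relative frequencies in Python for the Example 1.1 data. It reuses the illustrative intervals [125, 130), ..., [145, 150), which are an assumption and not from the slides. Each cumulative value is the share of observations below the interval's upper limit, even though the slide plots it at the interval's midpoint.

```python
from itertools import accumulate

# Tensile strengths of the 10 sampled I-beams (Example 1.1)
strengths = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]
start, width, k = 125, 5, 5  # illustrative intervals [125,130), ..., [145,150)

counts = [0] * k
for x in strengths:
    counts[(x - start) // width] += 1  # frequency per interval

n = len(strengths)
# Running total of the frequencies, divided by n, gives the cumulative relative frequency
cumulative = [running / n for running in accumulate(counts)]
midpoints = [start + i * width + width / 2 for i in range(k)]
upper_limits = [start + (i + 1) * width for i in range(k)]

for mid, upper, cum in zip(midpoints, upper_limits, cumulative):
    print(f"midpoint {mid}: {cum:.0%} of observations are below {upper}")
```

The last cumulative value is always 1 (100%), because every observation lies below the upper limit of the last interval.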

26 How to Use a Cumulative Relative Frequency Polygon?
Example 1.1 (c) Suppose we have a batch of 1000 I-beams for building construction, and we want to find out the tensile strengths of these beams. In order to do so, we take at random a set of 10 beams from the batch and test their tensile strengths. The test results are 126, 128, 135, 146, 137, 142, 125, 131, 139, 141 What percent of (sampled) beams have a tensile strength less than 130? What is the tensile strength that is greater than or equal to the tensile strength of 95% of the sampled beams? (What is the 95th percentile of the tensile strength?)
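Reading these answers off a polygon gives approximate values. The Python sketch below computes them directly from the ten observations instead. Its percentile definition is one of several in use: the smallest observation that is at least as large as p% of the data. Interpolating definitions, such as those in spreadsheets and numerical libraries, can return different values for small samples.

```python
import math

# Tensile strengths of the 10 sampled I-beams (Example 1.1)
strengths = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]

def share_below(data, threshold):
    """Fraction of observations strictly less than threshold."""
    return sum(x < threshold for x in data) / len(data)

def empirical_percentile(data, p):
    """Smallest observed value that is >= at least p percent of the observations."""
    ordered = sorted(data)
    k = max(1, math.ceil(p * len(ordered) / 100))  # number of observations that must be covered
    return ordered[k - 1]

print(f"{share_below(strengths, 130):.0%} of the sampled beams are below 130")
print("95th percentile:", empirical_percentile(strengths, 95))
```

With only ten observations, covering 95% of them means covering all ten. Under this definition, the 95th percentile is therefore the sample maximum.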

27 How to Summarise Data Numerically?
Example 1.1 (d) Suppose we have a batch of 1000 I-beams for building construction, and we want to find out the tensile strengths of these beams. In order to do so, we take at random a set of 10 beams from the batch and test their tensile strengths. The test results are 126, 128, 135, 146, 137, 142, 125, 131, 139, 141. Suppose we have another batch of 1000 I-beams and we take a set of 10 beams from it for testing. The test results are 126, 138, 125, 132, 127, 122, 121, 131, 129, 131. Which batch has a higher tensile strength on average? Which batch is more uniform (less varied)? If the design standard stipulates that 95% of beams must have a minimum tensile strength of 122, which batch meets the standard? (Tools: cumulative relative frequency polygon, percentile function.)
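The Python sketch below answers the three questions descriptively for the two samples. It reads the design standard as "at least 95% of beams have a tensile strength of 122 or more". That reading is an assumption, and the constant names are illustrative only.

```python
import statistics

batch_1 = [126, 128, 135, 146, 137, 142, 125, 131, 139, 141]
batch_2 = [126, 138, 125, 132, 127, 122, 121, 131, 129, 131]
MIN_STRENGTH = 122     # minimum tensile strength from the (assumed) reading of the standard
REQUIRED_SHARE = 0.95  # share of beams that must reach MIN_STRENGTH

def summarize(sample):
    """Center, spread, and compliance share for one sample of test results."""
    share_ok = sum(x >= MIN_STRENGTH for x in sample) / len(sample)
    return {
        "mean": statistics.mean(sample),    # higher mean = stronger on average
        "stdev": statistics.stdev(sample),  # smaller stdev = more uniform
        "share_ok": share_ok,
        "meets_standard": share_ok >= REQUIRED_SHARE,
    }

s1, s2 = summarize(batch_1), summarize(batch_2)
print("batch 1:", s1)
print("batch 2:", s2)
```

Batch 1 has the higher mean (135 versus 128.2), while batch 2 has the smaller standard deviation. Only batch 1 has at least 95% of its tested beams at or above 122; batch 2 has 9 of 10. These are statements about the ten tested beams. Extending them to the 1000-beam batches is the inference question raised in Example 1.1(a).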

28 Problem with the Mean?
Example 1.2: A small company employs four young engineers, who each earn $24,000, and the owner (also an engineer), who gets $114,000. Comment on the claim that on the average the company pays $42,000 to its engineers and, hence, is a good place to work.
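The arithmetic behind the claim checks out: (4 × 24,000 + 114,000) / 5 = 42,000. Yet four of the five engineers earn far less than that. A short Python sketch comparing the mean with the median:

```python
import statistics

# Example 1.2: four engineers at $24,000 and the owner at $114,000
salaries = [24_000] * 4 + [114_000]

mean_salary = statistics.mean(salaries)      # pulled upward by the single large salary
median_salary = statistics.median(salaries)  # the middle salary when sorted
below_mean = sum(s < mean_salary for s in salaries)

print(f"mean = ${mean_salary:,}, median = ${median_salary:,}")
print(f"{below_mean} of {len(salaries)} engineers earn less than the mean")
```

The median, $24,000, better describes what a typical engineer at this company earns. The one large salary pulls the mean away from the bulk of the data.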

29 Think About It: (for next lecture)
For Example 1.1, suppose we pick at random another I-beam from the batch. What is the probability that the tensile strength of that beam is between and 140? For Example 1.1, what should be the minimum number of observations (size of sample) in order to make our inferences credible? Suppose we toss a coin: what is the chance of getting heads? Do we need observations in order to answer this question? How long should a left-turn bay be in order to accommodate left-turning traffic in over 95% of the signal cycles during the peak period?


Download ppt "Probability and Statistics"

Similar presentations


Ads by Google