STATISTICS


What is statistics? The word has several meanings:
- Numerical facts.
- A field or discipline of study.
- A collection of methods for planning experiments, obtaining data, and then organizing, analyzing and interpreting the data to draw conclusions or make decisions.
Because many aspects of engineering practice involve working with data, some knowledge of statistics is important to an engineer.

Chapter 1 outline:
- Statistics in Engineering (collect, organize, analyze, interpret)
- Collecting Engineering Data
- Data Presentation and Summary: Types of Data, Graphical Data Presentation, Numerical Data Presentation

Collecting Data
- Direct observation: the simplest method of obtaining data, but it is difficult to produce useful information this way because it does not consider all aspects of the issue.
- Experiments: more expensive, but a better way to produce data.
- Surveys: the usefulness of the data depends on the response rate.
- Personal interview: higher expected response rate and fewer incorrect respondents.

BASIC TERMS IN STATISTICS
Population: the entire collection of individuals whose characteristics are being studied.
Sample: a subset of the population.

Company            2001 Sales (millions of dollars)
Wal-Mart Stores    217,799
IBM                85,866
General Motors     177,260
Dell Computer      31,168
JC Penney          32,004
Here "2001 Sales (millions of dollars)" is the variable, each company is an element (or member), and each sales figure is an observation (or measurement).

Population vs Sample, Parameter vs Statistic
Population: the entire collection of objects or outcomes about which data are collected.
Sample: a subset of the population containing the observed objects or outcomes.
Parameter: a summary measure about the population.
Statistic: a summary measure about the sample.

Statistics can be divided into two branches.
1) Descriptive statistics: describes the basic features of data by providing simple summaries of the sample in a suitable graphical or numerical form.
Graphical representations: stem-and-leaf plot, line chart, histogram, boxplot.
Numerical analyses: measures of central tendency, measures of dispersion, measures of position.
2) Inferential statistics: draws conclusions about an actual population based on the sample data that represent it.

Types of Data: Qualitative vs Quantitative
Qualitative/categorical data: deals with descriptions. The data can be observed but not measured. Examples: defect or no defect, gender, ethnic group, colors, textures.
Quantitative/numeric data: deals with numbers. The data can be measured. Examples: income, CGPA, diameter, weight, cost.
The most popular charts for qualitative data: bar chart/column chart, pie chart, line chart.
The most popular charts for quantitative data: histogram, frequency polygon, ogive, box plot, stem-and-leaf plot.

Discrete vs Continuous
Quantitative variables can be further classified as discrete or continuous.
Discrete variables are usually obtained by counting. There is a finite or countable number of possible values for discrete data. For example, you can't have 2.63 people in the room.
Continuous variables are usually obtained by measuring. Length, weight, and time are all examples of continuous variables.

Grouped vs Ungrouped Data
Ungrouped (raw) data: data that has not been organized into groups.
Grouped data: data that has been organized into groups (into a frequency distribution).
Frequency distribution: a grouping of data into mutually exclusive classes showing the number of observations in each class.
Ungrouped data: 1.0, 1.1, 1.2, 1.0, 1.1, 1.3, 1.2, 1.1, 1.0, 1.2, 1.3, 1.4, 1.2, 1.2, 1.1, 1.0, 1.0, 1.2, 1.3, 1.4, 1.0
Grouped data:
Class boundaries   Frequency
0.95 – 1.15        10
1.15 – 1.35        9
1.35 – 1.55        2
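
As a rough illustration of how the grouped table can be obtained from the raw values, the Python sketch below counts the observations that fall in each class. The function name `frequency_distribution` and the convention that a class includes its lower boundary but not its upper one are assumptions made for this example; they do not come from the slides.

```python
def frequency_distribution(data, boundaries):
    """Count how many values fall in each class [lower, upper)."""
    classes = list(zip(boundaries[:-1], boundaries[1:]))
    counts = [0] * len(classes)
    for x in data:
        for i, (lower, upper) in enumerate(classes):
            if lower <= x < upper:
                counts[i] += 1
                break
    return classes, counts


# Ungrouped data and class boundaries taken from the slide.
raw = [1.0, 1.1, 1.2, 1.0, 1.1, 1.3, 1.2, 1.1, 1.0, 1.2, 1.3,
       1.4, 1.2, 1.2, 1.1, 1.0, 1.0, 1.2, 1.3, 1.4, 1.0]
boundaries = [0.95, 1.15, 1.35, 1.55]

classes, counts = frequency_distribution(raw, boundaries)
for (lower, upper), f in zip(classes, counts):
    print(f"{lower:.2f} - {upper:.2f}: {f}")
```

The counts reproduce the frequencies 10, 9 and 2 shown in the grouped table.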

Example: About 50 UniMAP students were asked about their background, and the results are as follows. Display the data in a suitable form.
Columns: Respondent, Gender, Ethnic Group, Family Income, CGPA
1 1000 3.00 26 2 1900 2.82 1600 3.37 27 3.02 3 8000 3.59 28 1500 3.47 4 1360 2.50 29 2000 3.60 5 800 3.19 30 3.41 6 1250 2.96 31 3.23 7 1200 3.65 32 3.25 8 3000 3.04 33 3.39 9 4500 2.80 34 1570 3.20 10 35 7000 3.01 11 2380 3.16 36 2.98 12 3.67 37 3.45 13 3.40 38 3.13 14 3.10 39 1980 3.30 15 3.31 40 2.60 16 3500 3.80 41 2670 2.89 17 42 2.90 18 1803 2.84 43 3596 3.70 19 3.35 44 3.11 20 1400 45 5000 3.34 21 46 3.82 22 4000 47 2500 3.61 23 4780 2.78 48 24 4300 49 3.85 25 50
Codes used:
Gender: 1 = male, 2 = female
Ethnic group: 1 = Malay, 2 = Chinese, 3 = Indian, 4 = others

Graphical presentation of qualitative data
Frequency table:
Observation   Frequency
Malay         33
Chinese       9
Indian        6
Others        2
Bar chart: used to display the frequency distribution in graphical form.
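
To make the frequency table concrete, here is a minimal Python sketch that turns the four counts into percentage shares and prints a crude text bar chart. The variable names and the one-`#`-per-observation layout are illustrative choices only; a real chart would normally be drawn with a plotting tool.

```python
# Frequency table from the slide (ethnic group of the 50 respondents).
frequency = {"Malay": 33, "Chinese": 9, "Indian": 6, "Others": 2}

total = sum(frequency.values())

# Percentage share of each category in the whole sample.
share = {group: 100 * count / total for group, count in frequency.items()}

# Text bar chart: one '#' per observation, followed by count and share.
for group, count in frequency.items():
    print(f"{group:<8} {'#' * count} {count} ({share[group]:.0f}%)")
```

The shares come out as 66%, 18%, 12% and 4%, which add up to 100%.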

Pie chart: used to display the frequency distribution. It shows each category's share of the observations.
Line chart: used to display the trend of observations. It is a very popular display for data recorded over time.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
10 7 5 39 260 316 142 11 4 9

Graphical presentation of quantitative data
Histogram: looks like a bar chart, except that the horizontal axis represents data that is quantitative in nature. There are no gaps between the bars.

Frequency polygon: looks like a line chart, except that the horizontal axis represents the class marks of data that is quantitative in nature.
Ogive: a line graph in which the horizontal axis represents the upper limit of each class interval and the vertical axis represents the cumulative frequencies.
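
The points plotted in a frequency polygon and an ogive can be computed directly from a grouped table. The sketch below uses the class boundaries 0.95, 1.15, 1.35, 1.55 and frequencies 10, 9, 2 from the grouped-data slide; the variable names are assumptions for illustration, and the class mark is taken as the midpoint of each class.

```python
from itertools import accumulate

# Grouped data from the frequency distribution slide.
boundaries = [0.95, 1.15, 1.35, 1.55]
frequencies = [10, 9, 2]

# Class mark = midpoint of each class (x-axis of a frequency polygon).
class_marks = [round((lower + upper) / 2, 2)
               for lower, upper in zip(boundaries[:-1], boundaries[1:])]

# Running total of the frequencies (y-axis of an ogive).
cumulative = list(accumulate(frequencies))

print("Frequency polygon points:", list(zip(class_marks, frequencies)))
print("Ogive points:", list(zip(boundaries[1:], cumulative)))
```

The last cumulative frequency equals the total number of observations, which is a quick check that no class was missed.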

Data Summary
Summary statistics are used to summarize a set of observations.
- Measures of central tendency: mean, median, mode
- Measures of dispersion: range, variance, standard deviation
- Measures of position: z scores, percentiles, quartiles, outliers

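Measures of Central Tendency: Mean
The mean of a sample is the sum of the sample data divided by the number of observations in the sample.
Mean for ungrouped data: $\bar{x} = \dfrac{\sum x}{n}$, where $n$ is the number of observations.
Mean for grouped data: $\bar{x} = \dfrac{\sum f x}{\sum f}$, where $x$ is the class mark and $f$ is the frequency of each class.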

Example 2 (Ungrouped data): Find the mean for the data set 3, 5, 2, 6, 5, 9, 5, 2, 8, 6.
Solution: $\bar{x} = \dfrac{3+5+2+6+5+9+5+2+8+6}{10} = \dfrac{51}{10} = 5.1$
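
The same calculation can be sketched in Python. The helper names `mean` and `grouped_mean` are made up for this example; the grouped version assumes each class is represented by its class mark, and it is applied here to the class marks 1.05, 1.25, 1.45 with frequencies 10, 9, 2 from the grouped-data slide.

```python
def mean(values):
    """Mean of ungrouped data: sum of the values divided by their count."""
    return sum(values) / len(values)


def grouped_mean(class_marks, frequencies):
    """Mean of grouped data: sum of f*x divided by the sum of f."""
    total_fx = sum(f * x for x, f in zip(class_marks, frequencies))
    return total_fx / sum(frequencies)


data = [3, 5, 2, 6, 5, 9, 5, 2, 8, 6]
print(mean(data))  # 5.1

# Grouped example: prints a value of about 1.17.
print(round(grouped_mean([1.05, 1.25, 1.45], [10, 9, 2]), 2))
```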

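Median of ungrouped data: The median depends on the number of observations in the data, n. If n is odd, the median is the ((n+1)/2)th observation of the ordered observations. If n is even, the median is the arithmetic mean of the (n/2)th observation and the (n/2 + 1)th observation.
Median of grouped data: $\tilde{x} = L + \left(\dfrac{n/2 - F}{f}\right) c$, where $L$ is the lower boundary of the median class, $F$ is the cumulative frequency before the median class, $f$ is the frequency of the median class and $c$ is the class width.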

Example 4 (Ungrouped data):
n is odd: Find the median for the data 4, 6, 3, 1, 2, 5, 7 (n = 7).
Rearrange the data: 1, 2, 3, 4, 5, 6, 7. The median is in the (7+1)/2 = 4th place, so Median = 4.
n is even: Find the median for the data 4, 6, 3, 2, 5, 7 (n = 6).
Rearrange the data: 2, 3, 4, 5, 6, 7. Median = (4+5)/2 = 4.5.
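
A short Python sketch of the two cases follows. It sorts the data, returns the middle value when n is odd, and averages the two middle values when n is even. The function name is illustrative; Python's standard library also provides `statistics.median`, which behaves the same way on these inputs.

```python
def median(values):
    """Median of ungrouped data, following the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # n odd: the ((n+1)/2)th ordered observation.
        return ordered[middle]
    # n even: mean of the (n/2)th and (n/2 + 1)th ordered observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([4, 6, 3, 1, 2, 5, 7]))  # 4
print(median([4, 6, 3, 2, 5, 7]))     # 4.5
```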

Mode
Mode of ungrouped data: the value with the highest frequency in a data set. There can be more than one mode, and if no number occurs more than once in the set, then there is no mode for that set of numbers.
Example: Find the mode for the data set 3, 5, 2, 6, 5, 9, 5, 2, 8, 6.
Mode = number occurring most frequently = 5
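
Because a data set may have one mode, several modes or none, a sketch that returns a list is convenient. The function name `modes` and the choice to return an empty list when no value repeats are assumptions for this example.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, or [] if none repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # No number occurs more than once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 5, 2, 6, 5, 9, 5, 2, 8, 6]))  # [5]
print(modes([1, 1, 2, 2, 3]))                 # [1, 2]
print(modes([1, 2, 3]))                       # []
```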

Measures of Dispersion
Range = largest value – smallest value.
Variance measures the variability (differences) existing in a set of data.
The variance for ungrouped data:
For a sample: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
For a population: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

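The variance for grouped data, where $x$ is the class mark and $f$ is the class frequency:
For a sample: $s^2 = \dfrac{\sum f (x - \bar{x})^2}{n - 1}$ or $s^2 = \dfrac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}$, where $n = \sum f$
For a population: $\sigma^2 = \dfrac{\sum f (x - \mu)^2}{N}$, where $N = \sum f$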

Standard deviation: the positive square root of the variance.
A large variance means that the individual scores (data) of the sample deviate a lot from the mean. A small variance indicates that the scores (data) deviate little from the mean.

Example 8 (Ungrouped data): Find the variance and standard deviation of the sample data 3, 5, 2, 6, 5, 9, 5, 2, 8, 6.
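
One way to check a hand calculation for this example is the Python sketch below. It treats the data as a sample and divides the sum of squared deviations by n - 1. The function name is illustrative, and the standard library functions `statistics.variance` and `statistics.stdev` could be used instead.

```python
import math


def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean over n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [3, 5, 2, 6, 5, 9, 5, 2, 8, 6]

s2 = sample_variance(data)   # variance
s = math.sqrt(s2)            # standard deviation = positive square root

print(f"variance = {s2:.2f}, standard deviation = {s:.2f}")
```

With a mean of 5.1 the squared deviations sum to 48.9, so the variance is 48.9/9, about 5.43, and the standard deviation is about 2.33.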

Exercise 4 (submit on Thursday)
The following data give the number of iPads sold by a mail order company on each of a sample of 30 days. (Hint: use 5 classes.)
25 11 15 29 22 10 5 17 21 13 26 16 18 12 9 26 20 16 23 14 19 23 20 16 27 9 21 14
- Construct a frequency distribution table.
- Find the mean, variance, standard deviation, mode and median.
- Construct a histogram.

Rules of Data Dispersion
By using the mean and standard deviation, we can find the percentage of total observations that fall within a given interval about the mean.
Empirical Rule: applicable to a symmetric, bell-shaped (normal) distribution. There are 3 rules:
i. About 68% of the data lie within one standard deviation of the mean.
ii. About 95% of the data lie within two standard deviations of the mean.
iii. About 99.7% of the data lie within three standard deviations of the mean.

Example 10: The age distribution of a sample of 5000 persons is bell shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.
Solution: Approximately 68% of the measurements fall between 28 and 52, approximately 95% fall between 16 and 64, and approximately 99.7% fall between 4 and 76. Since 16 and 64 lie two standard deviations below and above the mean, approximately 95% of the people are 16 to 64 years old.
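
The three intervals in this example are simply the mean plus or minus one, two and three standard deviations. A minimal Python sketch is given below; the function name `empirical_intervals` is hypothetical, and the percentages apply only when the distribution is approximately bell shaped.

```python
def empirical_intervals(mean, sd):
    """Return {k: (lower, upper, approximate % of data)} for k = 1, 2, 3."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct)
            for k, pct in coverage.items()}


intervals = empirical_intervals(40, 12)
for k, (lower, upper, pct) in intervals.items():
    print(f"within {k} sd: {lower} to {upper} (about {pct}%)")
```

For a mean of 40 and a standard deviation of 12 this gives 28 to 52, 16 to 64 and 4 to 76.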

Measures of Position
Used to describe the relative position of a certain data value within the entire set of data.
- z scores
- Percentiles
- Quartiles
- Outliers

Quartiles
Quartiles divide a data set into four equal parts, where each part accounts for about 25% of the data distribution.
Minimum value | 25% of data | Q1 | 25% of data | Q2 | 25% of data | Q3 | 25% of data | Maximum value

Example: Find Q1, Q2 and Q3 for the following data: 15, 13, 6, 5, 12, 50, 22, 18
Step 1: Arrange the data in order: 5, 6, 12, 13, 15, 18, 22, 50
Step 2: Find the median (Q2): Q2 = (13+15)/2 = 14
Step 3: Find the median of the data values less than 14: 5, 6, 12, 13, so Q1 = (6+12)/2 = 9
Step 4: Find the median of the data values greater than 14: 15, 18, 22, 50, so Q3 = (18+22)/2 = 20
Hence, Q1 = 9, Q2 = 14 and Q3 = 20.

Example: 5, 8, 4, 4, 6, 3, 8 (n = 7)
1. Arrange the data in order: 3, 4, 4, 5, 6, 8, 8
2. Q2: the median is the 4th value, so Q2 = 5.
3. Q1: find the median of the data values less than 5: 3, 4, 4, so Q1 = 4.
4. Q3: find the median of the data values greater than 5: 6, 8, 8, so Q3 = 8.
Therefore, Q1 = 4, Q2 = 5 and Q3 = 8.
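
The quartile steps used in these two examples can be sketched in Python as follows. The sketch assumes the median-of-halves method shown on the slides: the ordered data are split by position into a lower and an upper half, and the middle value is left out of both halves when n is odd. Other textbooks and software use different quartile conventions and may give slightly different answers.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


print(quartiles([15, 13, 6, 5, 12, 50, 22, 18]))  # (9.0, 14.0, 20.0)
print(quartiles([5, 8, 4, 4, 6, 3, 8]))           # (4, 5, 8)
```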

Exercise: The following data represent the number of inches of rain in Chicago during the month of April for 10 randomly selected years. Determine the quartiles.
2.47 3.97 3.94 4.11 5.22 1.14 4.02 3.41 1.85 0.97

Answer: Arranging the data in order gives 0.97, 1.14, 1.85, 2.47, 3.41, 3.94, 3.97, 4.02, 4.11, 5.22, so Q1 = 1.85, Q2 = (3.41 + 3.94)/2 = 3.675 and Q3 = 4.02.

Outliers
Outliers are extreme observations. They can occur because of errors in the measurement of a variable, in data entry or in sampling.

Checking for outliers by using quartiles
Step 1: Determine the first and third quartiles of the data.
Step 2: Compute the interquartile range, IQR = Q3 - Q1.
Step 3: Determine the fences. Fences serve as cut-off points for identifying extreme values in the tails of the distribution:
Lower fence = Q1 - 1.5(IQR)
Upper fence = Q3 + 1.5(IQR)
Step 4: If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier. A point beyond an outer fence is considered an extreme outlier.

Example: Determine whether there are any outliers in the following data set.
2.47 3.97 3.94 4.11 5.22 1.14 4.02 3.41 1.85 0.97

Solution: follow the steps to find the quartiles.
Arrange the data in ascending order: 0.97, 1.14, 1.85, 2.47, 3.41, 3.94, 3.97, 4.02, 4.11, 5.22
Lower half: 0.97, 1.14, 1.85, 2.47, 3.41, so Q1 = 1.85
Upper half: 3.94, 3.97, 4.02, 4.11, 5.22, so Q3 = 4.02
IQR = 4.02 - 1.85 = 2.17
Lower fence = 1.85 - 1.5(2.17) = -1.405
Upper fence = 4.02 + 1.5(2.17) = 7.275
Since no data value is less than -1.405 or greater than 7.275, there are no outliers in the data.
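
The outlier check can be sketched in Python as below. The sketch assumes the median-of-halves quartiles used earlier and fences placed 1.5 interquartile ranges beyond Q1 and Q3; the function names and the parameter `k` are illustrative. Using Q1 = 1.85 and Q3 = 4.02 for the rainfall data, it yields fences of roughly -1.405 and 7.275 and reports no outliers.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def find_outliers(values, k=1.5):
    """Return (lower fence, upper fence, list of outliers)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


rain = [2.47, 3.97, 3.94, 4.11, 5.22, 1.14, 4.02, 3.41, 1.85, 0.97]
lower_fence, upper_fence, outliers = find_outliers(rain)
print(round(lower_fence, 3), round(upper_fence, 3), outliers)
```

Applied to the earlier quartile example 15, 13, 6, 5, 12, 50, 22, 18 (Q1 = 9, Q3 = 20), the same function flags 50 as an outlier because the upper fence is 20 + 1.5(11) = 36.5.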

Boxplot (graphical presentation for quantitative data)
The five-number summary can be used to create a simple graph called a boxplot. The five numbers are: Minimum, Q1, Median, Q3, Maximum.
From the boxplot, you can quickly detect any skewness in the shape of the distribution and see whether there are any outliers in the data set.
(Figure: boxplot with the lower fence, the upper fence and outliers marked.)

The Five-Number Summary
Compute the five-number summary and construct the box plot of the data:
2.47 3.97 3.94 4.11 5.22 1.14 4.02 3.41 1.85 0.97

- The distribution is skewed to the left
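
The five numbers behind this boxplot can be computed with the short Python sketch below, again assuming the median-of-halves quartile method; the function name is illustrative. Comparing the two halves of the box gives a rough numerical hint of the skewness seen in the plot.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


rain = [2.47, 3.97, 3.94, 4.11, 5.22, 1.14, 4.02, 3.41, 1.85, 0.97]
minimum, q1, q2, q3, maximum = five_number_summary(rain)
print(minimum, q1, round(q2, 3), q3, maximum)

# The part of the box below the median is much longer than the part above it.
print(round(q2 - q1, 3), round(q3 - q2, 3))
```

The summary is 0.97, 1.85, 3.675, 4.02, 5.22. The lower part of the box (about 1.825) is far longer than the upper part (about 0.345), which is consistent with a distribution skewed to the left.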

Interpreting a Boxplot
- Symmetric
- Left skewed or negatively skewed: the tail extends to the left
- Right skewed or positively skewed: the tail extends to the right

Mean/Median Versus Skewness: Characteristics of Skewed Distributions
- Left (negatively) skewed: Mean < Median < Mode
- Right (positively) skewed: Mean > Median > Mode
- Symmetric: Mean = Median = Mode

STEM-AND-LEAF
Another technique used to present quantitative data is the stem-and-leaf plot. An advantage of a stem-and-leaf plot over a frequency distribution is that we do not lose information on individual observations. A stem-and-leaf plot is used only for quantitative data. In a stem-and-leaf display, each value is divided into two portions: a stem and a leaf. The leaves for each stem are shown separately in the display.

A stem-and-leaf plot displays a set of data, usually a large one. Stem-and-leaf plots emphasize place value: the stem is the largest place value(s) of a number and the leaf is the smallest place value of a number in the data set.
Step 1: Find the least and the greatest number in the set of data.
Step 2: Make two columns with the titles STEM and LEAF.
Step 3: Write the digits that form the stems in the STEM column.
Step 4: Write the digits that form the leaf for each number in the LEAF column, across from the stem of that number.

Example: The following are the scores of 30 college students on a statistics test.
75 52 80 96 65 79 71 87 93 95 69 72 81 61 76 86 79 68 50 92 83 84 77 64 71 87 72 92 57 98
For the score of the first student, which is 75, 7 is the stem and 5 is the leaf. For the score of the second student, which is 52, 5 is the stem and 2 is the leaf. The stems for all scores are 5, 6, 7, 8 and 9 because all scores lie in the range 50 to 98. After we have listed the stems, we read the leaves for all scores and record them next to the corresponding stems, on the right side of the vertical line.

Now we read all the scores and write the leaves on the right side of the vertical line in the rows of the corresponding stems. The leaves for each stem are ranked in increasing order, giving the ranked stem-and-leaf display of test scores:
Stem | Leaf
5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
By looking at the stem-and-leaf display of test scores, we can observe how the data values are distributed. For example, stem 7 has the highest frequency, followed by stems 8, 9, 6 and 5.
Analysis: 9 out of 30 college students scored between 71 and 79. The distribution of the data seems skewed to the left.
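
The ranked display can be reproduced with a few lines of Python. This sketch assumes two-digit whole-number scores, so the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is made up for the example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]

plot = stem_and_leaf(scores)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Sorting the values before splitting them keeps the leaves of each stem in increasing order, which is what makes the display "ranked".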

What you MUST know
- Define statistics and its application in engineering.
- Explain the concept of population and sample.
- Compute and interpret the measures of central tendency (MCT), measures of dispersion (MD) and measures of position (MP).
- Construct and interpret several graphical presentations (histogram, box plot, stem-and-leaf plot).
- Explain how graphical presentations are used to compare two or more sets of data.
- Compare MCT, MD and MP for two or more sets of data.