Lesson 8 Introduction to Statistics


What statistics is
Statistics is the branch of mathematics that examines ways to process and analyze data. It deals with the collection, organization, and analysis of numerical data, and with problems such as experimental design and decision making. A statistic is any quantity whose value can be calculated from sample data.

Activities of statistics
Drawing inferences from data, application under uncertainty, sampling, relation analysis, forecasting, and decision making under uncertainty.

Population, Sample, Parameter, Statistic
A population consists of all the members of a group about which you want to draw a conclusion. A sample is the portion of the population selected for analysis. A parameter is a numerical measure that describes a characteristic of a population. A statistic is a numerical measure that describes a characteristic of a sample.

Example
Population: all the students at a university; all the registered voters in Svay Rieng.
Sample: a portion selected from the population above, e.g. 10 selected students, or 500 registered voters who participated in a survey.
The average grade of all the students this semester is a parameter. The average grade of the 10 selected students is a statistic, because only the information from those 10 students is used to calculate it.
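A minimal Python sketch of the parameter/statistic distinction. For illustration only, it assumes that the 36 final scores in the descriptive statistics example make up the whole population. The seed is arbitrary. The particular sample drawn, and its mean, mean nothing beyond showing that a statistic depends on which 10 students happen to be selected, while the parameter is fixed.

```python
import random
from statistics import mean

# Assumption: treat the 36 final scores from this lesson as a small "population".
population = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
              57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 95]

parameter = mean(population)          # population mean: describes the whole group

rng = random.Random(8)                # fixed (arbitrary) seed so the sample is reproducible
sample = rng.sample(population, 10)   # simple random sample of 10 students, without replacement
statistic = mean(sample)              # sample mean: computed from the 10 sampled scores only

print(f"parameter (population mean)   = {parameter:.2f}")
print(f"statistic (sample mean of 10) = {statistic:.2f}")
```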

Two types of statistics
Descriptive statistics focuses on collecting, summarizing, and presenting a set of data. These activities are also known as primary analyses.
Inferential statistics uses sample data to draw conclusions about a population. These activities are also known as secondary analyses.

Descriptive statistics example
The final scores of the students are:
84 49 61 40 83 67 45 66 70 69 80 58 68 60 67 72 73 70 57 63 70 78 52 67 53 67 75 61 70 81 76 79 75 76 58 95
Without any organization, it is difficult to get a sense of what a typical or representative score might be, whether the values are highly concentrated about a typical value or quite spread out, whether there are any gaps in the data, what percentage of the values…

Data Representation

Data representation: stem-and-leaf plot of score
Frequency   Stem & Leaf
    1.00      4 . 0
    2.00      4 . 59
    2.00      5 . 23
    3.00      5 . 788
    4.00      6 . 0113
    7.00      6 . 6777789
    6.00      7 . 000023
    6.00      7 . 556689
    4.00      8 . 0134
     .00      8 .
     .00      9 .
    1.00      9 . 5
Stem width: 10
Each leaf: 1 case(s)
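A minimal Python sketch that rebuilds this stem-and-leaf plot from the 36 scores. It assumes each stem is split into two lines, leaves 0-4 first and then 5-9, which is what the repeated stems in the plot indicate. The stem is the tens digit and the leaf is the units digit. `split_stem_and_leaf` is a hypothetical helper, not a library function.

```python
from collections import defaultdict

scores = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
          57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 95]

def split_stem_and_leaf(data, stem_width=10):
    """Return rows (frequency, stem, leaves); each stem is split into a
    low line (leaves 0-4) and a high line (leaves 5-9)."""
    rows = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, stem_width)      # e.g. 67 -> stem 6, leaf 7
        rows[(stem, leaf >= 5)].append(leaf)
    first, last = min(data) // stem_width, max(data) // stem_width
    table = []
    for stem in range(first, last + 1):
        for high in (False, True):              # low half first, then high half
            leaves = rows[(stem, high)]
            table.append((len(leaves), stem, "".join(map(str, leaves))))
    # Drop empty half-lines before the smallest and after the largest observation;
    # empty lines in the middle are kept, as in the plot.
    while table and table[0][0] == 0:
        table.pop(0)
    while table and table[-1][0] == 0:
        table.pop()
    return table

for freq, stem, leaves in split_stem_and_leaf(scores):
    print(f"{freq:5.2f}   {stem} . {leaves}")
```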

Inferential statistics
Having obtained a sample from a population, an investigator would frequently like to use sample information to draw some type of conclusion (make an inference of some sort) about the population. That is, the sample is a means to an end rather than an end in itself.

Relationship between probability and inferential statistics
Probability reasons from the population to the sample; inferential statistics reasons from the sample back to the population.

Variable
Variables are characteristics of items or individuals, e.g. your gender, your major field of study, or the amount of money you have in your wallet. The key aspect of a variable is the idea that items differ and people differ.

Variable
There are two types of variable:
Discrete: a variable is discrete if its set of possible values either is finite or can be listed in an infinite sequence (one in which there is a first number, a second number, and so on).
Continuous: a variable is continuous if its possible values consist of an entire interval on the number line.

Variable
Variables are also divided into two other types:
Quantitative variable: a variable whose values are numbers, such as people's income or boxers' weight.
Qualitative variable: a variable whose values are not numbers, such as gender or living standard.

Data
Primary data: original data collected from the source, e.g. through experiments or surveys.
Secondary data: data extracted from other reports or documents in which the data have already been collected.

Methods of organizing data
Descriptive statistics can be divided into two general subject areas: visual techniques and numerical summary measures for data sets.
Visual techniques: frequency tables, histograms, pie charts, bar graphs, scatter diagrams, etc.
Numerical summary measures: mean, variance, standard deviation, etc.

Frequency distribution
Frequency: the number of times a value $x_i$ occurs, denoted by $f_i$.
Total frequency: the sum of all frequencies, denoted by $N$ or $n$: $N = n = \sum_i f_i$.
Relative frequency: the ratio of the absolute frequency to the total frequency. Relative frequency of a category $= f_i / n$.

Frequency distribution
Cumulative frequency: the running total of the frequencies. For the $m$-th value in order, $F_m = \sum_{i=1}^{m} f_i$, so the last cumulative frequency equals $n$.
Relative cumulative frequency: the cumulative frequency divided by the total frequency, $F_m / n$.

Example
The final scores of the students: 84 49 61 40 83 67 45 66 70 69 80 58 68 60 67 72 73 70 57 63 70 78 52 67 53 67 75 61 70 81 76 79 75 76 58 95
Without any arrangement, the data are difficult to understand. Create a table of the total frequency, relative frequency, …
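A short Python sketch of such a table. It applies the frequency-distribution definitions ($f_i$, $f_i/n$, $F_m$, $F_m/n$) to each distinct score, using the 36 scores from the descriptive statistics example. `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

scores = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
          57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 95]

def frequency_table(data):
    """Rows (x_i, f_i, f_i/n, cumulative F_m, relative cumulative F_m/n), by increasing x_i."""
    n = len(data)                      # total frequency
    counts = Counter(data)             # f_i for every distinct value x_i
    rows, cumulative = [], 0
    for value in sorted(counts):
        f = counts[value]
        cumulative += f                # running total of the frequencies
        rows.append((value, f, f / n, cumulative, cumulative / n))
    return rows

rows = frequency_table(scores)
print("value  freq  rel.freq  cum.freq  rel.cum.freq")
for value, f, rel, cum, rel_cum in rows:
    print(f"{value:5d}  {f:4d}  {rel:8.3f}  {cum:8d}  {rel_cum:12.3f}")
```

The last cumulative frequency equals n = 36, and the last relative cumulative frequency equals 1.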

Stem-and-leaf displays
Steps for constructing a stem-and-leaf display:
1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
2. List possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem value.
4. Indicate the units for stems and leaves somewhere in the display.

Stem-and-leaf example
Suppose the salaries of the staff are: 120 215 170 135 216 216 181 222 150 210 225 209 175 167 130 190 155 145 177 162 197 182 215 187 172 169 205 165 144 199

Stem…
Stem   Leaf        Frequency   Cumulative frequency
12     0           1           1
13     0,5         2           3
14     4,5         2           5
15     0,5         2           7
16     2,5,7,9     4           11
17     0,2,5,7     4           15
18     1,2,7       3           18
19     0,7,9       3           21
20     5,9         2           23
21     0,5,5,6,6   5           28
22     2,5         2           30
Total              30
Stem width: 10; each leaf: one case
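The construction steps can be sketched in Python for the salary data. Assumptions: the stem is `x // 10` and the leaf is `x % 10`, matching a stem width of 10, and every possible stem between the smallest and largest is listed, even if it is empty. `stem_and_leaf` is a hypothetical helper. It returns each stem's leaves together with the frequency and cumulative frequency.

```python
from collections import defaultdict

salaries = [120, 215, 170, 135, 216, 216, 181, 222, 150, 210, 225, 209, 175, 167, 130,
            190, 155, 145, 177, 162, 197, 182, 215, 187, 172, 169, 205, 165, 144, 199]

def stem_and_leaf(data, stem_width=10):
    """Rows (stem, leaves, frequency, cumulative frequency)."""
    groups = defaultdict(list)
    for x in sorted(data):                 # sorting puts each stem's leaves in order
        stem, leaf = divmod(x, stem_width)  # e.g. 167 -> stem 16, leaf 7
        groups[stem].append(leaf)
    rows, cumulative = [], 0
    for stem in range(min(groups), max(groups) + 1):  # list every possible stem
        leaves = groups.get(stem, [])
        cumulative += len(leaves)
        rows.append((stem, leaves, len(leaves), cumulative))
    return rows

for stem, leaves, freq, cum in stem_and_leaf(salaries):
    print(f"{stem:3d} | {','.join(map(str, leaves)):<10} {freq:3d} {cum:4d}")
```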

Stem…
Using SPSS, the stem-and-leaf plot of salary is:
Frequency   Stem & Leaf
     .00      1 .
    3.00      1 . 233
    4.00      1 . 4455
    8.00      1 . 66667777
    6.00      1 . 888999
    7.00      2 . 0011111
    2.00      2 . 22
Stem width: 100
Each leaf: 1 case(s)

Dot/Lines Counts

Class
A class is a group of objects with some common property.
Class boundary: given by the midpoint between the upper limit of one class and the lower limit of the next class.
Class width = upper boundary - lower boundary

Classes
Class midpoint (class mark) = (lower limit + upper limit) / 2
Number of classes: generally given by a rule of thumb relating k to n, where k = number of classes and n = number of observations.
Create a frequency distribution of the student scores with classes of tens (1-10, 11-20, …).
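A Python sketch of this exercise. It assumes the classes of tens have integer limits 1-10, 11-20, and so on, and it uses the 36 scores from the descriptive statistics example. Each class mark is (lower limit + upper limit) / 2. `grouped_frequency` is a hypothetical helper.

```python
scores = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
          57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 95]

def grouped_frequency(data, width=10):
    """Rows (lower limit, upper limit, class mark, frequency) for classes 1-10, 11-20, ..."""
    def class_index(x):
        return (x - 1) // width           # 1..10 -> 0, 11..20 -> 1, ...
    rows = []
    for k in range(class_index(min(data)), class_index(max(data)) + 1):
        lower, upper = k * width + 1, (k + 1) * width
        freq = sum(1 for x in data if lower <= x <= upper)
        mark = (lower + upper) / 2        # class midpoint
        rows.append((lower, upper, mark, freq))
    return rows

for lower, upper, mark, freq in grouped_frequency(scores):
    print(f"{lower:3d}-{upper:<3d}  mark {mark:5.1f}  f = {freq}")
```

With these limits, the class boundaries fall halfway between classes (30.5, 40.5, …, 100.5), so the class width is 10.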

Histogram
Consider data consisting of observations on a discrete variable x. The frequency of any particular x value is the number of times that value occurs in the data set. The relative frequency of a value is the fraction or proportion of times the value occurs.
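Here is a plotting-library-free Python sketch of a histogram as a text chart. Each bar's length is the frequency of a class, and the relative frequency is printed after it. Assumptions: the data are the 36 scores from the descriptive statistics example, grouped into classes of tens (1-10, 11-20, …). `text_histogram` is a hypothetical helper, and a real chart would normally use a plotting library.

```python
from collections import Counter

scores = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
          57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 95]

def text_histogram(data, width=10):
    """One line per class: a bar of '*' (one per observation) and the relative frequency."""
    n = len(data)
    counts = Counter((x - 1) // width for x in data)   # class index -> frequency
    lines = []
    for k in range(min(counts), max(counts) + 1):     # include empty classes in between
        f = counts.get(k, 0)
        lower, upper = k * width + 1, (k + 1) * width
        lines.append(f"{lower:3d}-{upper:<3d} | {'*' * f:<14} {f / n:.3f}")
    return lines

print("\n".join(text_histogram(scores)))
```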

Histogram

Histogram for qualitative data
A frequency distribution and histogram can also be constructed when the data set is qualitative (categorical) in nature. Some categories have a natural ordering, e.g. BAC2, Bachelor, Master, Doctor; in other cases the order is arbitrary, e.g. Cambodian, English, American, French, Japanese.

Other graphs
Bar chart, pie chart

Other graphs
Frequency polygon or line chart, stock chart (low-high-close)

Qualitative data
A survey of student ratings shows the following. Construct a frequency distribution and a histogram.
Rating       Frequency
A            478
B            893
C            680
D            178
F            100
Don't know   172
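A minimal Python sketch of the frequency distribution for the rating survey. It computes the total frequency and each category's relative frequency, and draws a rough text bar scaled to 50 characters for a relative frequency of 1 (an arbitrary display choice). The categories keep the survey's order, because here the order carries meaning.

```python
# Rating -> frequency, from the survey table (dicts preserve insertion order).
ratings = {"A": 478, "B": 893, "C": 680, "D": 178, "F": 100, "Don't know": 172}

n = sum(ratings.values())                  # total frequency
print(f"Total responses: {n}")
for category, f in ratings.items():
    rel = f / n                            # relative frequency f_i / n
    print(f"{category:<11} {f:5d}  {rel:6.3f}  {'#' * round(50 * rel)}")
```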

Contingency table
For two variables, we use a contingency table. Example: age versus number of flights per year.
Rows (age): less than 25, 25-40, 40-65, 65 and over. Columns (number of flights per year): 1-2, 3-5, over 5, total.
Entries, count (proportion of the grand total of 50): 1 (0.02), 2 (0.04), 5 (0.10), 8 (0.16), 6 (0.12), 17 (0.35), 10 (0.20), 15 (0.30), 28 (0.56), 4 (0.08), 20 (0.40), 22 (0.44), 50 (1.00)

Mean
The sample mean $\bar{x}$ of observations $x_1, x_2, \ldots, x_n$ is given by
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Median
The sample median $\tilde{x}$ is obtained by first ordering the n observations from smallest to largest (with any repeated values included so that every sample observation appears in the ordered list). Then
$\tilde{x}$ = the single middle value, i.e. the $\left(\frac{n+1}{2}\right)$th ordered value, if n is odd;
$\tilde{x}$ = the average of the two middle values, i.e. of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th ordered values, if n is even.
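The two definitions translate directly into Python. This is a sketch using the 36 final scores as example data. The helper names are hypothetical, and the standard library's `statistics.mean` and `statistics.median` compute the same quantities.

```python
def sample_mean(xs):
    """x-bar = (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)

def sample_median(xs):
    """Middle ordered value if n is odd; average of the two middle values if n is even."""
    ordered = sorted(xs)                   # repeated values are kept
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                # the ((n + 1) / 2)-th value (0-based index n // 2)
    return (ordered[mid - 1] + ordered[mid]) / 2   # (n/2)-th and (n/2 + 1)-th values

scores = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
          57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 95]
print(sample_mean(scores), sample_median(scores))
```

Here n = 36 is even, so the median averages the 18th and 19th ordered scores (68 and 69), giving 68.5.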

Other measures of location
Quartiles divide the data into four parts: the first quartile, the second quartile (= the median), and the third quartile. A trimmed mean is a compromise between $\bar{x}$ and $\tilde{x}$. A 10% trimmed mean is computed by eliminating the smallest 10% and the largest 10% of the sample and then averaging what is left over.
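A Python sketch of quartiles and a 10% trimmed mean for the 36 scores. Two assumptions are labeled here. First, quartiles come from the standard library's `statistics.quantiles` (Python 3.8+) with its default "exclusive" method, and textbooks and software differ slightly in how they define quartiles. Second, the hypothetical `trimmed_mean` helper rounds the number of trimmed observations down to a whole number (3 from each end here).

```python
import statistics

scores = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
          57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 95]

def trimmed_mean(xs, proportion=0.10):
    """Drop the smallest and largest `proportion` of the ordered sample, average the rest.
    Assumption: the count trimmed from each end is rounded down."""
    ordered = sorted(xs)
    k = int(proportion * len(ordered))     # observations removed from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

q1, q2, q3 = statistics.quantiles(scores, n=4)   # three cut points; q2 is the median
print(q1, q2, q3, round(trimmed_mean(scores, 0.10), 2))
```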

Measures of variability
The mean and median give only partial information about a data set or distribution. Different samples or populations may have identical measures of center yet differ from one another in other important ways. The simplest measure of variability in a sample is the range, the difference between the largest and the smallest values.

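Sample variance
The sample variance, denoted by $s^2$, is given by
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
The sample standard deviation, denoted by $s$, is the (positive) square root of the variance: $s = \sqrt{s^2}$.
Note that $\mu$, $\sigma^2$, and $\sigma$ are used for the population, and the divisor in the $\sigma^2$ calculation is $N$, not $n-1$: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$.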
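A minimal Python sketch of the range, sample variance ($n-1$ divisor), sample standard deviation, and population variance ($N$ divisor), using the 36 scores as example data. The helper names are hypothetical. As a cross-check, the standard library's `statistics.variance` uses the $n-1$ divisor and `statistics.pvariance` uses $N$.

```python
import math
import statistics

scores = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
          57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 95]

def sample_variance(xs):
    """s^2 = sum of (x_i - x_bar)^2 divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def population_variance(xs):
    """sigma^2 = sum of (x_i - mu)^2 divided by N (not N - 1)."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

data_range = max(scores) - min(scores)     # simplest measure of variability
s2 = sample_variance(scores)
s = math.sqrt(s2)                          # sample standard deviation

print(data_range, round(s2, 3), round(s, 3))
print(statistics.variance(scores), statistics.pvariance(scores))   # cross-check
```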

Boxplots
Boxplots have been used successfully to describe several of a data set's most prominent features: center; spread; the extent and nature of any departure from symmetry; and identification of outliers (observations that lie unusually far from the main body of the data).

Boxplot example
Example 1.17 gives the following data on pit depth in the crude oil plate:
40 52 55 60 70 75 85 85 90 90 92 94 94 95 98 100 115 125 125
The five-number summary is:
Smallest = 40, lower fourth = 72.5, median $\tilde{x}$ = 90, upper fourth = 96.5, largest = 125
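A Python sketch that reproduces this five-number summary. It assumes the fourths are the medians of the lower and upper halves of the ordered data, with the median included in both halves when n is odd. That convention gives 72.5 and 96.5 for these 19 values. The outlier check uses the common rule "more than 1.5 times the fourth spread beyond the nearer fourth", which the slide does not state. Helper names are hypothetical.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(xs):
    """(smallest, lower fourth, median, upper fourth, largest)."""
    ordered = sorted(xs)
    n = len(ordered)
    half = (n + 1) // 2                    # half size; includes the median when n is odd
    lower_fourth = median(ordered[:half])
    upper_fourth = median(ordered[n - half:])
    return ordered[0], lower_fourth, median(ordered), upper_fourth, ordered[-1]

depths = [40, 52, 55, 60, 70, 75, 85, 85, 90, 90, 92, 94, 94, 95, 98, 100, 115, 125, 125]
summary = five_number_summary(depths)
fs = summary[3] - summary[1]               # fourth spread
# Assumed outlier rule: farther than 1.5 * fs from the nearer fourth.
outliers = [x for x in depths if x < summary[1] - 1.5 * fs or x > summary[3] + 1.5 * fs]
print(summary, fs, outliers)
```

With fs = 24, the cutoffs are 36.5 and 132.5, so this rule flags none of the 19 pit depths.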

Boxplot Example