STATISTICS Introduction


STATISTICS Introduction Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University

Lecture notes will be posted on the class website: www.rslabntu.net
Supplementary material: IPSUR by Kerns
Grades: Homework (40%) [No homework copying.], Midterm (30%), Final (30%)
The R language will be used for data analysis.
A tutorial session is arranged on Tuesday (6:00 – 7:30 pm).
Office hour: Thursday 3:30 – 4:30 pm
Students who are 15 minutes or more late for the class period will not be allowed to enter the classroom.
3/27/2017 Lab for Remote Sensing Hydrology and Spatial Modeling Department of Bioenvironmental Systems Engineering, National Taiwan University

What is “statistics”?
Statistics is the science of “reasoning” from data: a body of principles and methods for extracting useful information from data, for assessing the reliability of that information, for measuring and managing risk, and for making decisions in the face of uncertainty.

The major difference between statistics and mathematics is that statistics always needs “observed” data, while mathematics does not. An important feature of statistical methods is the “uncertainty” involved in analysis.

Statistics is the discipline concerned with the study of variability, with the study of uncertainty and with the study of decision-making in the face of uncertainty. As these are issues that are crucial throughout the sciences and engineering, statistics is an inherently interdisciplinary science.

One of the objectives of this course is to equip students with a critical way of thinking. Examples:
- Weather forecasting
- Flood forecasting
- Projection of rainfall extremes under certain climate change scenarios

Sources of uncertainties
- Data (sampling) uncertainty
- Parameter uncertainty
- Model structure uncertainty
An illustrative example follows.

You are given a set of (x,y) data. Apparently, Y is dependent on X.

Observed data with uncertainties (Linear model)

Observed data with uncertainties (Power model)
The linear model fits the observed data better than the power model.

Theoretical model
Measured against the theoretical model, the power model performs better than the linear model. The sums of squared errors (SSE) of the estimates of the linear and power models, with respect to the theoretical model, are 12011.7 and 8950.08, respectively.
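The contrast between "fit to the observations" and "fit to the underlying process" can be tried out on synthetic data. The sketch below does not use the data or the theoretical model from the slides: the true curve `2 * x**1.5`, the noise level, and the helper names `fit_linear`, `fit_power`, and `sse` are all assumptions made for illustration. It fits a linear model by least squares and a power model by least squares on log-transformed data, then reports each model's SSE against the noisy observations and against the assumed true curve.

```python
import math
import random


def fit_linear(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_power(xs, ys):
    """Fit y = c * x**d by least squares on (log x, log y); needs x, y > 0."""
    log_c, d = fit_linear([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_c), d


def sse(predicted, reference):
    """Sum of squared differences between two equally long sequences."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference))


def true_model(x):
    """Hypothetical 'theoretical model', chosen only for this sketch."""
    return 2.0 * x ** 1.5


rng = random.Random(0)
xs = [float(x) for x in range(1, 21)]
truth = [true_model(x) for x in xs]
# Multiplicative noise keeps every observation positive, so logs are defined.
observed = [t * math.exp(rng.gauss(0.0, 0.3)) for t in truth]

a, b = fit_linear(xs, observed)
c, d = fit_power(xs, observed)
linear_pred = [a + b * x for x in xs]
power_pred = [c * x ** d for x in xs]

print(f"SSE vs observations: linear={sse(linear_pred, observed):.1f}, "
      f"power={sse(power_pred, observed):.1f}")
print(f"SSE vs true curve:   linear={sse(linear_pred, truth):.1f}, "
      f"power={sse(power_pred, truth):.1f}")
```

Which model ranks first depends on the data and on the reference used for the comparison. In practice the true curve is unknown, so only the first comparison is available.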

Key topics in statistics
- Probability
- Estimation
- Test of hypotheses
- Regression
- Forecasting
- Quality control
- Simulation
- …

Deterministic vs Stochastic Models
An abstract model is a description of the essential properties of a phenomenon that is formulated in mathematical terms. An abstract model is used as a theoretical approximation of reality to help us understand the world around us.

“Essentially, all models are wrong, but some are useful.”
“Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.” (George E. P. Box)
Examples: the normal distribution as a model for men’s height, grades in a statistics class, etc.

Types of abstract models
- Deterministic model: a deterministic model describes a phenomenon whose outcome is fixed.
- Stochastic model: a random/stochastic model describes the unpredictable variation of the outcomes of a random experiment.

Examples
Deterministic model
Suppose we wish to measure the area covered by a lake that, for all practical purposes, appears to have a circular shoreline. Since we know the area $A = \pi r^2$, where $r$ is the radius, we would attempt to measure the radius and substitute it in the formula.

Stochastic model
Consider the experiment of tossing a balanced coin and observing the upper face. It is not possible to predict with absolute accuracy what the upper face will be, even if we repeat the experiment many times. However, it is possible to predict what will happen in the long run. We can say that the probability of heads on a single toss is ½, and we can evaluate quantities such as P(more than 60 heads in 100 trials).
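The long-run quantity mentioned on this slide can be computed exactly. The sketch below assumes the 100 tosses are independent with probability ½ of heads each, so that the number of heads follows a binomial distribution; the function name `prob_more_than` is hypothetical.

```python
from math import comb


def prob_more_than(k, n, p=0.5):
    """P(X > k) for X ~ Binomial(n, p), by summing the upper-tail terms."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))


p_tail = prob_more_than(60, 100)
print(f"P(more than 60 heads in 100 tosses) = {p_tail:.4f}")
```

Under these assumptions the probability comes out a little under 2%, so a single outcome cannot be predicted but an excess of heads this large is quantifiably rare.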

Variability and Uncertainty
Determinism versus stochasticism: is the real world deterministic or stochastic?
Determinism
We could perfectly predict future weather/climate if we knew all the physics of the weather system and were given the initial conditions of some year in the distant past.
In reality, we do not know all the physics of the weather system. Many models (numerical weather prediction models and general circulation models) have been developed, and none of them is perfect.
Variability exists whether the real world is deterministic or stochastic.

Variability of errors is due to imperfect models and/or incomplete initial conditions.
Example of variability in a deterministic process: deterministic variability (perfectly predictable under complete initial conditions; prediction errors under incomplete initial conditions).
Stochasticism
Variation due to randomness (probability) exists in one or more components of a system.
Models may consist of both deterministic and stochastic components.
Under stochasticism, a perfect stochastic model (at which point it is no longer a model) is one that perfectly describes the deterministic and stochastic behaviors of the system.

Even if we have a perfect stochastic model, we are not able to make perfect predictions. However, we can give a perfect statistical inference about our predictions: prediction errors are unpredictable (uncertainties), but their properties can be perfectly described.
In practice, we can never have a perfect model. Prediction errors combine the errors due to an imperfect model (in both its deterministic and stochastic components) with the inherent randomness.

An example of seemingly random deterministic variability.

An example of deterministic variability that looks random.
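The figures for these two slides are not part of the transcript. One common textbook illustration of random-looking deterministic variability, which may or may not be the one the slides used, is the logistic map $x_{t+1} = r\,x_t(1 - x_t)$. The sketch below assumes $r = 3.9$ and a starting value of 0.2; both values and the function name `logistic_map` are chosen only for illustration.

```python
def logistic_map(x0, r=3.9, steps=200):
    """Iterate x -> r*x*(1-x) and return the whole trajectory as a list."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Same rule, two starting points that differ by only 1e-10.
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-10)
gap = [abs(u - v) for u, v in zip(a, b)]

print("first values:", [round(v, 4) for v in a[:6]])
print("gap after 5 steps:", gap[5])
print("largest gap over 200 steps:", max(gap))
```

The trajectory is fully determined by the rule and the initial condition, yet a tiny error in the initial condition eventually grows into a large prediction error. This is the situation described above as deterministic variability under incomplete initial conditions.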


Random Experiment and Sample Space
An experiment that can be repeated under the same (or uniform) conditions, but whose outcome cannot be predicted in advance, even when the same experiment has been performed many times, is called a random experiment.

Examples of random experiments
- The tossing of a coin.
- The roll of a die.
- The selection of a numbered ball (1-50) from an urn (selection with replacement).
- The time interval between the occurrences of two earthquakes higher than scale 6.
- The amount of rainfall produced by typhoons in one year (yearly typhoon rainfall).

The following items are always associated with a random experiment:
- Sample space. The set of all possible outcomes, denoted by Ω.
- Outcomes. Elements of the sample space, denoted by ω. These are also referred to as sample points or realizations.
- Events. Subsets of Ω for which the probability is defined. Events are denoted by capital Latin letters (e.g., A, B, C).

Definition of Probability
- Classical probability
- Frequency probability
- Probability model

Classical (or a priori) probability
If a random experiment can result in $n$ mutually exclusive and equally likely outcomes and if $n_A$ of these outcomes have an attribute $A$, then the probability of $A$ is the fraction $n_A/n$.

Example 1. Compute the probability of getting two heads if a fair coin is tossed twice. (Answer: 1/4)
Example 2. Compute the probability that a card drawn from an ordinary well-shuffled deck will be an ace or a spade. (Answer: 16/52)
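Both answers follow from counting equally likely outcomes, and the count can be checked by enumerating the sample spaces. This is a minimal sketch: the helper name `classical_probability` and the way the coin tosses and the deck are encoded are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def classical_probability(sample_space, has_attribute):
    """n_A / n for a finite sample space of equally likely outcomes."""
    favourable = sum(1 for outcome in sample_space if has_attribute(outcome))
    return Fraction(favourable, len(sample_space))


# Example 1: two tosses of a fair coin, 4 equally likely outcomes.
two_tosses = list(product("HT", repeat=2))
p_two_heads = classical_probability(two_tosses, lambda o: o == ("H", "H"))

# Example 2: one card from a 52-card deck; 13 spades plus 3 other aces.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))
p_ace_or_spade = classical_probability(
    deck, lambda card: card[0] == "A" or card[1] == "spades"
)

print(p_two_heads)                             # 1/4
print(p_ace_or_spade == Fraction(16, 52))      # True
```

`Fraction` reduces 16/52 to 4/13, which is why the second check compares values rather than printing the fraction.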

Remarks
The probabilities determined by the classical definition are called “a priori” probabilities since they can be derived purely by deductive reasoning.

The “equally likely” assumption requires the experiment to be carried out in such a way that the assumption is realistic, for example by using a balanced coin, a die that is not loaded, a well-shuffled deck of cards, random sampling, and so forth. This assumption also requires that the sample space be appropriately defined.

Troublesome limitations of the classical definition of probability. The definition cannot be applied:
- if the number of possible outcomes is infinite;
- if the possible outcomes are not equally likely.

Relative frequency (or a posteriori) probability
We observe outcomes of a random experiment which is repeated many times. We postulate a number p which is the probability of an event, and approximate p by the relative frequency f with which the repeated observations satisfy the event.

Suppose a random experiment is repeated $n$ times under uniform conditions and event $A$ occurs $n_A$ times. The relative frequency with which $A$ occurs is $f_n(A) = n_A/n$. If the limit of $f_n(A)$ as $n$ approaches infinity exists, then one can assign the probability of $A$ by $P(A) = \lim_{n \to \infty} f_n(A)$.

This method requires the existence of the limit of the relative frequencies. This property is known as statistical regularity. This property will be satisfied if the trials are independent and are performed under uniform conditions.

Example 3
A fair coin was tossed 100 times with 54 occurrences of heads. The relative-frequency estimate of the probability of heads on each toss is 0.54.
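How the relative frequency behaves as the number of trials grows can be explored by simulation. The sketch below uses Python's pseudo-random generator as a stand-in for a physical coin, which is itself an assumption; the function name `relative_frequency` and the fixed seed are illustrative.

```python
import random


def relative_frequency(n, p=0.5, seed=0):
    """Simulate n independent tosses with P(heads) = p; return n_A / n."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n


for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: f_n(heads) = {relative_frequency(n):.4f}")
```

With only 100 tosses, an estimate a few hundredths away from 0.5 (such as 0.54) is unremarkable. The estimates typically settle closer to 0.5 as n increases, which is the statistical regularity the definition relies on.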

The chain of probability definition
Random experiment → Sample space → Event space → Probability space

Probability Model

Event and event space
An event is a subset of the sample space. The class of all events associated with a given random experiment is defined to be the event space.

Remarks

Probability is a mapping of sets to numbers. Probability is not a mapping of the sample space to numbers.
For an outcome ω ∈ Ω, the expression P[ω] is not defined. However, for a singleton event {ω}, P[{ω}] is defined.

Probability space
A probability space is the triplet (Ω, A, P[·]), where Ω is a sample space, A is an event space, and P[·] is a probability function with domain A.
A probability space constitutes a complete probabilistic description of a random experiment. The sample space Ω defines all of the possible outcomes, the event space A defines all possible things that could be observed as a result of an experiment, and the probability P defines the degree of belief or evidential support associated with the experiment.
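For a finite experiment, the three components of a probability space can be written out explicitly. The sketch below assumes one roll of a fair die, takes the event space to be the set of all subsets of the sample space, and assigns equal probability to each outcome. These modelling choices and the names `omega`, `event_space`, and `P` are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space: the six faces of a die.
omega = frozenset(range(1, 7))


def power_set(s):
    """All subsets of s, each returned as a frozenset."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}


# Event space: here, every subset of the sample space.
event_space = power_set(omega)


def P(event):
    """Probability function; its domain is the event space, not omega."""
    event = frozenset(event)
    if event not in event_space:
        raise ValueError("P is defined only for events in the event space")
    return Fraction(len(event), len(omega))


print(len(event_space))   # 64 events
print(P({3}))             # 1/6, a singleton event
print(P({2, 4, 6}))       # 1/2
print(P(omega))           # 1
```

Note that `P` takes a set as its argument: `P({3})` is the probability of the singleton event, whereas a bare outcome such as `3` is not an event.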

Conditional probability

Bayes’ theorem

Multiplication rule

Independent events
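The formulas for these slides are not included in the transcript. The sketch below therefore assumes the standard textbook forms, which may differ in notation from the slides: $P(A \mid B) = P(A \cap B)/P(B)$ for $P(B) > 0$, the multiplication rule $P(A \cap B) = P(A \mid B)\,P(B)$, and Bayes’ theorem $P(A \mid B) = P(B \mid A)\,P(A)/P(B)$. It checks them on one roll of a fair die; the events and function names are hypothetical.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one roll of a fair die


def P(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))


def P_given(a, b):
    """Conditional probability P(a | b); requires P(b) > 0."""
    return P(a & b) / P(b)


A = frozenset({2, 4, 6})  # the roll is even
B = frozenset({4, 5, 6})  # the roll is greater than 3

# Multiplication rule.
assert P(A & B) == P_given(A, B) * P(B)
# Bayes' theorem.
assert P_given(A, B) == P_given(B, A) * P(A) / P(B)

print(P(A), P_given(A, B))  # 1/2 2/3
```

Knowing that the roll exceeds 3 raises the probability of an even roll from 1/2 to 2/3, so these two events are not independent.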

The property of independence of two events A and B and the property that A and B are mutually exclusive are distinct, though related, properties.
If A and B are mutually exclusive events, then AB = ∅ and therefore P(AB) = 0. If instead A and B are independent events, then P(AB) = P(A)P(B). Events A and B will be both mutually exclusive and independent only if P(AB) = P(A)P(B) = 0, that is, at least one of A or B has zero probability.

But if A and B are mutually exclusive events and both have nonzero probabilities then it is impossible for them to be independent events. Likewise, if A and B are independent events and both have nonzero probabilities then it is impossible for them to be mutually exclusive.
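The three cases described above can be checked on a small example. The sketch assumes one roll of a fair die with equally likely outcomes; the events and helper names are made up for illustration.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one roll of a fair die


def P(event):
    return Fraction(len(event), len(omega))


def independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    return P(a & b) == P(a) * P(b)


def mutually_exclusive(a, b):
    """True when a and b share no outcome."""
    return len(a & b) == 0


even = frozenset({2, 4, 6})
odd = frozenset({1, 3, 5})
low = frozenset({1, 2})
impossible = frozenset()

# Exclusive, both with nonzero probability: cannot be independent.
print(mutually_exclusive(even, odd), independent(even, odd))                # True False
# Independent, both with nonzero probability: cannot be exclusive.
print(mutually_exclusive(even, low), independent(even, low))                # False True
# Both properties hold only when one event has zero probability.
print(mutually_exclusive(even, impossible), independent(even, impossible))  # True True
```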

Summarizing data
Qualitative data
Frequency table

Bar chart

Quantitative data
Histogram

Boxplot

Dealing with outliers
Should the outliers be discarded or should they be retained?
An example of outlier presence: Typhoon Morakot

Typhoon Morakot
Cumulative rainfall (Aug 7, 0:00 – 24:00)

Cumulative rainfall (Aug 8, 0:00 – 24:00)

Cumulative rainfall (Aug 9, 0:00 – 24:00)

Cumulative rainfall in mm, 2009/08/07 00:00 ~ 2009/08/09 17:00

Measures of Central Tendency
- Mean: sum of measurements divided by the number of measurements.
- Median: middle value when the data are sorted.
- Mode: value or category that occurs most frequently.

Measures of Variation
- Standard deviation: summarizes how far away from the mean the data values typically are.
- Range
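These summary measures can be computed with Python's standard `statistics` module. The data values below are made up for illustration. The range is taken as maximum minus minimum, and `statistics.stdev` is the sample standard deviation (divisor n − 1); the slides do not say which convention they use, so both are assumptions.

```python
import statistics

# Hypothetical measurements, chosen only to keep the arithmetic simple.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)        # sum / count = 42 / 7
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value
stdev = statistics.stdev(data)      # sample standard deviation (n - 1)
data_range = max(data) - min(data)  # largest minus smallest value

print(f"mean={mean}, median={median}, mode={mode}")
print(f"stdev={stdev:.4f}, range={data_range}")
```

For these values the squared deviations from the mean sum to 60, so the sample variance is 60 / 6 = 10 and the standard deviation is the square root of 10.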

Reading assignment
IPSUR (will be covered in the tutorial session)
- Chapt. 2
- Chapt. 3: 3.1.1, 3.1.3, 3.1.4; 3.3; 3.4.3, 3.4.4, 3.4.5, 3.4.6, 3.4.7