STATISTICS Exploratory Data Analysis and Probability


1 STATISTICS Exploratory Data Analysis and Probability
Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University

2 What is “statistics”? Statistics is a science of “reasoning” from data: a body of principles and methods for extracting useful information from data, assessing the reliability of that information, measuring and managing risk, and making decisions in the face of uncertainty. 4/16/2018 Lab for Remote Sensing Hydrology and Spatial Modeling Department of Bioenvironmental Systems Engineering, National Taiwan University

3 The major difference between statistics and mathematics is that statistics always requires “observed” data, while mathematics does not. An important feature of statistical methods is the “uncertainty” involved in the analysis.

4 Statistics is the discipline concerned with the study of variability, the study of uncertainty, and the study of decision-making in the face of uncertainty. Because these issues are crucial throughout the sciences and engineering, statistics is an inherently interdisciplinary science.

5 Stochastic Modeling & Simulation
Building probability models for real-world phenomena. No matter how sophisticated a model is, it represents only our understanding of complicated natural systems. Generating a large number of possible realizations. Making decisions or assessing risks based on the simulation results. The simulations are conducted by computers.
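The simulation workflow above can be sketched in Python. This is a minimal illustration, not the course's own code: the gamma model for yearly typhoon rainfall and its parameters (shape 2, scale 500 mm) are hypothetical choices, and the function names are illustrative. The script generates many realizations from the assumed model and estimates the risk that yearly rainfall exceeds a threshold; because a gamma distribution with integer shape 2 has a closed-form tail, the simulated risk can be checked against the exact value.

```python
import math
import random


def simulate_exceedance(threshold, shape, scale, n_realizations, seed=0):
    """Monte Carlo estimate of P(X > threshold) for an assumed gamma model."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    exceedances = 0
    for _ in range(n_realizations):
        x = rng.gammavariate(shape, scale)  # one possible realization
        if x > threshold:
            exceedances += 1
    return exceedances / n_realizations


def exact_exceedance_shape2(threshold, scale):
    """Exact tail of a gamma distribution with shape 2: exp(-z) * (1 + z), z = x / scale."""
    z = threshold / scale
    return math.exp(-z) * (1 + z)


# Hypothetical model: yearly typhoon rainfall ~ Gamma(shape=2, scale=500 mm)
estimate = simulate_exceedance(2000, 2, 500, 100_000)
exact = exact_exceedance_shape2(2000, 500)
print(f"simulated risk: {estimate:.4f}, exact risk: {exact:.4f}")
```

Increasing the number of realizations shrinks the Monte Carlo error roughly in proportion to 1/√n; for models without a closed-form answer, the simulated value is all one has.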

6 Exploratory Data Analysis
Features of data distributions: histograms; center (mean, median); spread (variance, standard deviation, range); shape (skewness, kurtosis); order statistics and sample quantiles; clusters; extreme observations (outliers).

7 Histogram: frequencies and relative frequencies
A sample data set X

8 Frequency histogram

9 Relative frequency histogram
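Frequencies and relative frequencies can be computed as in the sketch below. The data set X from the slides is not reproduced in this transcript, so the sample values and break points are hypothetical. The bins here are left-closed, [a, b), with the last bin also closed on the right; R's hist() uses right-closed bins by default, so counts of values falling exactly on a break can differ.

```python
def histogram_counts(data, breaks):
    """Frequencies and relative frequencies over bins [breaks[k], breaks[k + 1])."""
    n_bins = len(breaks) - 1
    freq = [0] * n_bins
    for x in data:
        for k in range(n_bins):
            lo, hi = breaks[k], breaks[k + 1]
            is_last = k == n_bins - 1
            # left-closed bins; the last bin also includes its right edge
            if lo <= x < hi or (is_last and x == hi):
                freq[k] += 1
                break
        else:
            raise ValueError(f"{x} lies outside the break points")
    n = len(data)
    rel_freq = [f / n for f in freq]  # relative frequencies sum to 1
    return freq, rel_freq


# Hypothetical sample and break points
sample = [92.1, 95.4, 97.8, 98.0, 99.3, 100.2, 101.7, 103.5, 96.6, 98.9]
freq, rel_freq = histogram_counts(sample, [90, 95, 100, 105])
print(freq)      # [1, 6, 3]
print(rel_freq)  # [0.1, 0.6, 0.3]
```

The frequency histogram plots `freq` and the relative frequency histogram plots `rel_freq`; the two have the same shape and differ only by the factor 1/n.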

10 Measures of center: sample mean and sample median
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Sample median: the middle value of the ordered observations (the average of the two middle values when n is even). For the sample data set X, the sample mean = 98.26067.

11 One desirable property of the sample median is that it is resistant to extreme observations, in the sense that its value depends only on the values of the middle observations and is quite unaffected by the actual values of the outer observations in the ordered list. The same cannot be said for the sample mean: any significant change in the magnitude of an observation results in a corresponding change in the value of the mean. Hence, the sample mean is said to be sensitive to extreme observations.
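The contrast can be seen in a small hypothetical example: replacing the largest of five observations by an extreme value leaves the median unchanged, while the mean shifts by the size of the change divided by n, here (180 − 18)/5 = 32.4.

```python
from statistics import mean, median

data = [10, 12, 13, 15, 18]
with_outlier = [10, 12, 13, 15, 180]  # largest observation replaced by an extreme value

print("median:", median(data), "->", median(with_outlier))  # unchanged
print("mean:  ", mean(data), "->", mean(with_outlier))      # shifts by (180 - 18) / 5
```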

12 Measures of spread: sample variance, sample standard deviation, and range
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Sample standard deviation: $s = \sqrt{s^2}$. Range: the difference between the largest and smallest values, $\max_i x_i - \min_i x_i$.
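A direct implementation of the three measures of spread is sketched below, assuming the n − 1 divisor for the sample variance (the convention of R's var()). The function name and the data are illustrative.

```python
import math


def sample_spread(xs):
    """Return (sample variance, sample standard deviation, range)."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # divisor n - 1
    s = math.sqrt(s2)
    data_range = max(xs) - min(xs)  # largest minus smallest value
    return s2, s, data_range


s2, s, data_range = sample_spread([2, 4, 4, 4, 5, 5, 7, 9])
print(s2, s, data_range)
```

For these data the mean is 5, the squared deviations sum to 32, and so $s^2 = 32/7$; the range is 9 − 2 = 7.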

13 Measures of shape: sample skewness and sample kurtosis
With $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$, sample skewness $= m_3/m_2^{3/2}$ and sample kurtosis $= m_4/m_2^{2}$ (or $m_4/m_2^{2} - 3$, the excess kurtosis, as reported by some R functions).
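The moment-based definitions can be coded directly. This sketch assumes skewness $m_3/m_2^{3/2}$ and kurtosis $m_4/m_2^2$ with central moments computed using divisor n; statistical packages may apply small-sample corrections, so their values can differ slightly. The function name and data are illustrative.

```python
def shape_measures(xs):
    """Return (skewness, kurtosis, excess kurtosis) from central moments with divisor n."""
    n = len(xs)
    xbar = sum(xs) / n

    def central_moment(k):
        return sum((x - xbar) ** k for x in xs) / n

    m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
    if m2 == 0:
        raise ValueError("all observations are equal; shape is undefined")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3  # excess kurtosis is 0 for a normal distribution


print(shape_measures([1, 2, 3, 4, 5]))   # symmetric data: skewness 0
print(shape_measures([1, 1, 1, 1, 10]))  # long right tail: positive skewness
```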

14 Order statistics, sample quantiles, and linear interpolation
The order statistics $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ are the observations sorted in increasing order; sample quantiles are computed from them by interpolating linearly between adjacent order statistics.
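One common linear-interpolation rule, used by R's default quantile() (type 7), places the p-th sample quantile at position $h = (n-1)p$ (counting from 0) in the order statistics and interpolates between the two neighboring values. Assuming this is the rule meant on the slide, a sketch is:

```python
import math


def quantile_linear(xs, p):
    """p-th sample quantile by linear interpolation between order statistics (R type 7)."""
    if not xs:
        raise ValueError("empty sample")
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    order_stats = sorted(xs)          # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(order_stats)
    h = (n - 1) * p                   # 0-based fractional position
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)           # stay inside the list when p = 1
    return order_stats[lo] + (h - lo) * (order_stats[hi] - order_stats[lo])


data = list(range(1, 11))
print([quantile_linear(data, p) for p in (0.25, 0.5, 0.75)])  # [3.25, 5.5, 7.75]
```

For the same data, Python's `statistics.quantiles(data, n=4, method="inclusive")` returns the same three cut points.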

15 Box-and-whisker plot (or box plot)
A box-and-whisker plot has two major parts: the box and the whiskers. In R's boxplot() function, the parameter range (default 1.5) determines how far the whiskers extend out from the box. If range is positive, the whiskers extend to the most extreme data point that is no more than range times the interquartile range (IQR) from the box. A value of zero causes the whiskers to extend to the data extremes. Outliers are marked as points that fall beyond the whiskers.
Hinges and the five-number summary
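The whisker rule can be sketched as follows, assuming (as R's boxplot.stats() does) that the box is drawn from Tukey's hinges and that "range times the IQR" means range times the distance between the hinges. The function names and the argument name `range_` (renamed to avoid shadowing Python's built-in `range`) are illustrative.

```python
import math


def tukey_fivenum(xs):
    """Min, lower hinge, median, upper hinge, max (mirroring the depth rule of R's fivenum())."""
    x = sorted(xs)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]  # 1-based depths into the sorted data
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in depths]


def whiskers(xs, range_=1.5):
    """Whisker ends and outliers for a box drawn from the hinges."""
    if range_ < 0:
        raise ValueError("range_ must be nonnegative")
    if range_ == 0:
        return min(xs), max(xs), []  # whiskers reach the data extremes
    _, lower_hinge, _, upper_hinge, _ = tukey_fivenum(xs)
    reach = range_ * (upper_hinge - lower_hinge)  # range times the box length
    low_fence, high_fence = lower_hinge - reach, upper_hinge + reach
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = sorted(x for x in xs if x < low_fence or x > high_fence)
    return min(inside), max(inside), outliers


sample = list(range(1, 10)) + [30]
print(tukey_fivenum(sample))  # [1.0, 3.0, 5.5, 8.0, 30.0]
print(whiskers(sample))       # (1, 9, [30])
```

Here the hinges are 3 and 8, so the whiskers may reach 1.5 × 5 = 7.5 beyond the box; 30 lies outside that limit and is drawn as an outlier.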

16

17 In R, a box plot is essentially a graphical representation determined by the five-number summary (5NS): minimum, lower hinge, median, upper hinge, and maximum.
The hinges are not computed by “linear interpolation”. The summary function in R yields a list of six numbers: minimum, first quartile, median, mean, third quartile, and maximum.
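The six numbers reported by R's summary() can be mimicked in Python as a sketch. R computes these quartiles with its default quantile() rule, which interpolates linearly, and Python's `statistics.quantiles` with `method="inclusive"` follows the same rule. For the data 1, 2, ..., 10 this gives quartiles 3.25 and 7.75, whereas Tukey's hinges for the same data are 3 and 8, which is why the 5NS and summary() can disagree. The function name is illustrative.

```python
import statistics


def six_number_summary(xs):
    """Min, 1st Qu., Median, Mean, 3rd Qu., Max, in the layout of R's summary()."""
    # method="inclusive" interpolates linearly, like R's default quantile()
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "Min.": min(xs),
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.mean(xs),
        "3rd Qu.": q3,
        "Max.": max(xs),
    }


print(six_number_summary(list(range(1, 11))))
```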

18 Box-and-whisker plot of X

19 Seasonal variation of average monthly rainfall in CDZ, Myanmar
The box plots are based on the average monthly rainfall at 54 rainfall stations.

20 Random Experiment and Sample Space
An experiment that can be repeated under the same (or uniform) conditions, but whose outcome cannot be predicted in advance, even when the same experiment has been performed many times, is called a random experiment.

21 Examples of random experiments
Tossing a coin; rolling a die; selecting a numbered ball (1-50) from an urn (selection with replacement); occurrences of earthquakes; the time interval between the occurrences of two consecutive earthquakes above scale 6; occurrences of typhoons; the amount of rainfall produced by typhoons in one year (yearly typhoon rainfall).

22 The following items are always associated with a random experiment:
Sample space: the set of all possible outcomes, denoted by $\Omega$. Outcomes: elements of the sample space, denoted by $\omega$; these are also referred to as sample points or realizations. Events: an event is a subset of $\Omega$ for which the probability is defined. Events are denoted by capital Latin letters (e.g., A, B, C).

23 Definition of Probability
Classical probability; frequency probability; probability model

24 Classical (or a priori) probability
If a random experiment can result in n mutually exclusive and equally likely outcomes, and if $n_A$ of these outcomes have an attribute A, then the probability of A is the fraction $n_A/n$.

25 Example 1. Compute the probability of getting two heads if a fair coin is tossed twice. (1/4) Example 2. Compute the probability that a card drawn from an ordinary well-shuffled deck will be an ace or a spade. (16/52)
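Both examples can be checked by enumerating an equally likely sample space and counting, which is exactly the classical ratio $n_A/n$. The helper name is illustrative; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product


def classical_probability(sample_space, event):
    """n_A / n for a finite sample space of equally likely outcomes."""
    outcomes = list(sample_space)
    n_a = sum(1 for w in outcomes if event(w))
    return Fraction(n_a, len(outcomes))


# Example 1: a fair coin tossed twice -> {HH, HT, TH, TT}
two_tosses = product("HT", repeat=2)
p_two_heads = classical_probability(two_tosses, lambda w: w == ("H", "H"))

# Example 2: one card from a well-shuffled 52-card deck
ranks = ["A"] + [str(k) for k in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = product(ranks, suits)
p_ace_or_spade = classical_probability(
    deck, lambda card: card[0] == "A" or card[1] == "spades"
)

print(p_two_heads, p_ace_or_spade)  # 1/4 4/13
```

The value 4/13 is 16/52 in lowest terms: 4 aces plus 13 spades, minus the ace of spades, which would otherwise be counted twice.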

26 Remarks: The probabilities determined by the classical definition are called “a priori” probabilities because they can be derived purely by deductive reasoning. The “equally likely” assumption requires the experiment to be carried out in such a way that the assumption is realistic, for example by using a balanced coin, a die that is not loaded, a well-shuffled deck of cards, random sampling, and so forth. This assumption also requires that the sample space be appropriately defined.

27 Troublesome limitations in the classical definition of probability:
If the number of possible outcomes is infinite; if the possible outcomes are not equally likely.

28 Relative frequency (or a posteriori) probability
We observe the outcomes of a random experiment that is repeated many times. We postulate a number p, the probability of an event, and approximate p by the relative frequency f with which the repeated observations satisfy the event.

29 Suppose a random experiment is repeated n times under uniform conditions and event A occurs $n_A$ times; then the relative frequency with which A occurs is $f_n(A) = n_A/n$. If the limit of $f_n(A)$ as n approaches infinity exists, one can assign the probability of A by $P(A) = \lim_{n \to \infty} f_n(A) = \lim_{n \to \infty} n_A/n$.

30 This method requires the existence of the limit of the relative frequencies, a property known as statistical regularity. This property will be satisfied if the trials are independent and are performed under uniform conditions.
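Statistical regularity can be illustrated by simulation. The sketch below assumes independent trials in which event A has probability p = 0.5 (a simulated fair coin) and records $f_n(A) = n_A/n$ at a few values of n; the relative frequency settles near p as n grows. The seed, checkpoints, and function name are arbitrary illustrative choices.

```python
import random


def relative_frequencies(p, n_max, checkpoints, seed=1):
    """Record f_n(A) = n_A / n at selected n for independent trials with P(A) = p."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n_a = 0
    record = {}
    for n in range(1, n_max + 1):
        if rng.random() < p:   # event A occurs on this trial
            n_a += 1
        if n in checkpoints:
            record[n] = n_a / n
    return record


freqs = relative_frequencies(0.5, 100_000, {10, 100, 1_000, 10_000, 100_000})
for n, f in sorted(freqs.items()):
    print(f"n = {n:>6}: f_n(A) = {f:.4f}")
```

Any single run fluctuates; the limit statement concerns the behavior as n grows without bound, not the value at any finite n.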

31 Example 3. A fair coin was tossed 100 times, and heads occurred 54 times. The probability of heads on each toss is therefore estimated to be 54/100 = 0.54.

32 The chain of probability definition
Random experiment → sample space → event space → probability space

33 Probability Model: Each outcome can be thought of as a sample point, or an element, in the sample space.

34 Event and event space An event is a subset of the sample space. The class of all events associated with a given random experiment is defined to be the event space. An event will always be a subset of the sample space, but for sufficiently large sample spaces not all subsets will be events. Thus the class of all subsets of the sample space will not necessarily correspond to the event space. If the sample space consists of only a finite number of points, then the corresponding event space will be the class of all subsets of the sample space.

35 $\emptyset$ (the empty set) and $\Omega$ (the sure event) are both subsets of $\Omega$.
An event A is said to occur if the experiment at hand results in an outcome that belongs to A. An event space is usually denoted by a script Latin letter such as $\mathcal{A}$ or $\mathcal{B}$. Two events A and B are said to be mutually exclusive if and only if $A \cap B = \emptyset$. Events $A_1, A_2, \ldots$ are mutually exclusive if and only if $A_i \cap A_j = \emptyset$ for all $i \ne j$.

36 Event space and algebra of events
Let $\mathcal{A}$ denote an event space. $\mathcal{A}$ is called a Boolean algebra, or algebra of events, if it has the following properties: (i) $\Omega \in \mathcal{A}$; (ii) if $A \in \mathcal{A}$, then its complement $\bar{A} \in \mathcal{A}$; (iii) if $A_1 \in \mathcal{A}$ and $A_2 \in \mathcal{A}$, then $A_1 \cup A_2 \in \mathcal{A}$.
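For a finite sample space, the three properties can be checked mechanically. The sketch below tests whether a given class of subsets contains $\Omega$ and is closed under complementation and pairwise union; the helper names and the die example are illustrative.

```python
from itertools import combinations


def power_set(omega):
    """All subsets of a finite set, as frozensets."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_algebra(omega, events):
    """Check: omega is in the class; closed under complement; closed under union."""
    omega = frozenset(omega)
    events = {frozenset(e) for e in events}
    if omega not in events:
        return False
    if any((omega - a) not in events for a in events):  # complements
        return False
    return all((a | b) in events for a, b in combinations(events, 2))  # unions


die = {1, 2, 3, 4, 5, 6}
print(is_algebra(die, [set(), die, {2, 4, 6}, {1, 3, 5}]))  # True
print(is_algebra(die, [set(), die, {1}]))                     # False: {2, ..., 6} is missing
print(is_algebra(die, power_set(die)))                        # True
```

The last line reflects the remark on slide 34: for a finite sample space, the class of all subsets serves as the event space.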

37 Probability function: Let $\Omega$ denote the sample space and $\mathcal{A}$ an algebra of events for some random experiment. A probability function $P[\cdot]$ is a set function with domain $\mathcal{A}$ and counterdomain the interval [0, 1] that satisfies the following axioms: (i) $P[A] \ge 0$ for every $A \in \mathcal{A}$; (ii) $P[\Omega] = 1$; (iii) if $A_1, A_2, \ldots$ is a sequence of mutually exclusive events in $\mathcal{A}$ and $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$, then $P[\bigcup_{i=1}^{\infty} A_i] = \sum_{i=1}^{\infty} P[A_i]$.

38 Probability is a mapping (function) of sets to numbers.
Probability is not a mapping of the sample space to numbers: the expression $P[\omega]$ is not defined. However, for a singleton event $\{\omega\}$, $P[\{\omega\}]$ is defined.
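In code, this distinction means the probability function takes a set as its argument. The sketch below uses a hypothetical loaded die whose point probabilities are assumed, defines `P` on subsets of $\Omega$, and checks the axioms over every event; for a finite sample space, additivity over pairs of disjoint events stands in for countable additivity.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical loaded die: assumed probability of each singleton event {w}
point_mass = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
              4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
OMEGA = frozenset(point_mass)


def P(event):
    """Probability function: the argument is an event (a set of outcomes)."""
    event = frozenset(event)  # P(6) fails here: 6 is an outcome, not a set
    if not event <= OMEGA:
        raise ValueError("an event must be a subset of the sample space")
    return sum((point_mass[w] for w in event), Fraction(0))


# With a finite sample space, every subset is an event
events = [frozenset(c) for r in range(len(OMEGA) + 1)
          for c in combinations(sorted(OMEGA), r)]
assert all(P(a) >= 0 for a in events)      # axiom (i): nonnegativity
assert P(OMEGA) == 1                         # axiom (ii): P(Omega) = 1
assert all(P(a | b) == P(a) + P(b)           # axiom (iii), finite case
           for a in events for b in events if not a & b)

print(P({6}))        # probability of the singleton event {6}
print(P({2, 4, 6}))  # probability of "even"
```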

39 Probability space: A probability space is the triplet $(\Omega, \mathcal{A}, P[\cdot])$, where $\Omega$ is a sample space, $\mathcal{A}$ is an event space, and $P[\cdot]$ is a probability function with domain $\mathcal{A}$. A probability space constitutes a complete probabilistic description of a random experiment: the sample space $\Omega$ defines all of the possible outcomes, the event space $\mathcal{A}$ defines all possible things that could be observed as a result of the experiment, and the probability $P$ defines the degree of belief or evidential support associated with the experiment.

40 Finite Sample Space: If a random experiment can result in only a finite number of possible outcomes, its sample space has a finite number of elements (points) and is called a finite sample space. Two cases arise: a finite sample space with equally likely points (a simple sample space), and a finite sample space without equally likely points.

41 Conditional probability
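The slide's formula is not preserved in this transcript. The standard definition is $P(A \mid B) = P(AB)/P(B)$ for $P(B) > 0$; the sketch below applies it on the equally likely sample space of a fair die, with illustrative events and helper names.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair die: equally likely outcomes


def prob(event):
    event = frozenset(event)
    if not event <= OMEGA:
        raise ValueError("not an event of this sample space")
    return Fraction(len(event), len(OMEGA))


def conditional(a, b):
    """P(A | B) = P(AB) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(frozenset(a) & frozenset(b)) / p_b


even = {2, 4, 6}
above_three = {4, 5, 6}
print(conditional(even, above_three))  # 2/3
print(prob(even))                      # 1/2
```

Conditioning on B shrinks the sample space to {4, 5, 6}, of which two outcomes are even, so the probability of "even" rises from 1/2 to 2/3.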

42 Bayes’ theorem
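The theorem's statement is not preserved in this transcript. In its standard form, if $B_1, \ldots, B_n$ partition $\Omega$ with $P(B_i) > 0$, then $P(B_j \mid A) = P(A \mid B_j)P(B_j) / \sum_{i=1}^{n} P(A \mid B_i)P(B_i)$. The priors and likelihoods below are hypothetical numbers chosen only to exercise the formula.

```python
from fractions import Fraction as F


def bayes_posterior(priors, likelihoods):
    """Posterior P(B_j | A) for a partition B_1, ..., B_n of the sample space."""
    if sum(priors) != 1 or any(p <= 0 for p in priors):
        raise ValueError("priors must be positive and sum to 1")
    joint = [p * lik for p, lik in zip(priors, likelihoods)]  # P(A | B_j) P(B_j)
    p_a = sum(joint)                                          # total probability P(A)
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return [j / p_a for j in joint]


# Hypothetical priors P(B_j) and likelihoods P(A | B_j)
priors = [F(5, 10), F(3, 10), F(2, 10)]
likelihoods = [F(1, 10), F(4, 10), F(8, 10)]
for j, post in enumerate(bayes_posterior(priors, likelihoods), start=1):
    print(f"P(B_{j} | A) = {post}")
```

Here $P(A) = 0.05 + 0.12 + 0.16 = 0.33$, so $B_3$, the least likely a priori, becomes the most likely given A (16/33).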

43 Multiplication rule
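In its standard form the multiplication rule reads $P(A_1A_2\cdots A_n) = P(A_1)P(A_2 \mid A_1)\cdots P(A_n \mid A_1\cdots A_{n-1})$. As an illustrative check (not an example from the slides), the probability of drawing two aces in succession without replacement from a 52-card deck is computed by the rule and by brute-force counting.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule for two events: P(A1 A2) = P(A1) P(A2 | A1)
p_first_ace = Fraction(4, 52)               # A1: first card is an ace
p_second_given_first = Fraction(3, 51)      # A2 | A1: 3 aces left among 51 cards
by_rule = p_first_ace * p_second_given_first

# Brute-force check over all ordered draws of two distinct cards
deck = ["ace"] * 4 + ["other"] * 48
draws = list(permutations(range(52), 2))    # 52 * 51 equally likely ordered pairs
favorable = sum(1 for i, j in draws if deck[i] == "ace" and deck[j] == "ace")
by_counting = Fraction(favorable, len(draws))

print(by_rule, by_counting)  # 1/221 1/221
```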

44 Independent events

45 The independence of two events A and B and the mutual exclusivity of A and B are distinct, though related, properties. If A and B are mutually exclusive events, then $AB = \emptyset$, and therefore $P(AB) = 0$; whereas if A and B are independent events, then $P(AB) = P(A)P(B)$. Events A and B can be both mutually exclusive and independent only if $P(AB) = P(A)P(B) = 0$, that is, only if at least one of A or B has zero probability.

46 But if A and B are mutually exclusive events and both have nonzero probabilities, then it is impossible for them to be independent events. Likewise, if A and B are independent events and both have nonzero probabilities, then it is impossible for them to be mutually exclusive.
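A fair-die example, chosen for illustration, shows the cases discussed above: "even" and {1, 2} are independent but not mutually exclusive; "even" and "odd" are mutually exclusive with nonzero probabilities and therefore not independent; and the empty event, which has probability zero, is both mutually exclusive with and independent of "even". The helper names are illustrative.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair die: equally likely outcomes


def prob(event):
    if not event <= OMEGA:
        raise ValueError("not an event of this sample space")
    return Fraction(len(event), len(OMEGA))


def independent(a, b):
    return prob(a & b) == prob(a) * prob(b)   # P(AB) = P(A)P(B)


def mutually_exclusive(a, b):
    return not (a & b)                        # AB is the empty set


even, odd = frozenset({2, 4, 6}), frozenset({1, 3, 5})
low = frozenset({1, 2})
empty = frozenset()

print(independent(even, low), mutually_exclusive(even, low))      # True False
print(independent(even, odd), mutually_exclusive(even, odd))      # False True
print(independent(even, empty), mutually_exclusive(even, empty))  # True True
```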

