Pattern Recognition Mathematics Review
Hamid R. Rabiee, Jafar Muhammadi, Ali Jalali

Sharif University of Technology, Department of Computer Engineering, Pattern Recognition Course

Probability Space
- A probability space is a triple (Ω, F, P).
- Ω is a nonempty set, whose elements are sometimes known as outcomes or states of nature.
- F is a set whose elements are called events; the events are subsets of Ω, and F should be a Borel field (σ-field).
- P is the probability measure.
- Fact: P(Ω) = 1.

Random Variables
- A random variable is a "function" ("mapping") from the set of possible outcomes of an experiment to an interval of real (or complex) numbers.
- In other words: X: Ω → ℝ.
- Examples:
  - Mapping the faces of a die to the first six natural numbers.
  - Mapping the height of a man to the real interval (0, 3] (in meters or another unit).
  - Mapping success in an exam to the discrete interval [0, 20] in steps of 0.1.

Random Variables (Cont'd)
- Random variables may be discrete (die, coin, grade of a course, etc.) or continuous (temperature, humidity, length, etc.).
- Random variables may also be real or complex.

Density/Distribution Functions
- Probability Mass Function (PMF)
  - Defined for discrete random variables.
  - A summation of impulses; the magnitude of each impulse represents the probability of occurrence of the corresponding outcome.
  - PMF values are probabilities.
- Example I: rolling a fair die, where P(X = k) = 1/6 for k = 1, …, 6.
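As a small sketch (not from the slides), the fair-die PMF can be written out directly, using exact fractions so the probabilities sum to one without rounding error:

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Every PMF value is a probability, and the values sum to one.
total = sum(pmf.values())
```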

Density/Distribution Functions (Cont'd)
- Example II: the sum of two fair dice.
- Note: the summation of all probabilities must equal ONE. (Why?)
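A sketch of Example II: enumerating all 36 equally likely outcomes gives the PMF of the sum, and the probabilities again sum to one:

```python
from fractions import Fraction
from collections import Counter

# PMF of the sum of two fair dice, built by enumerating all 36 outcomes.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
pmf_sum = {s: Fraction(c, 36) for s, c in counts.items()}

# The most likely sum is 7 (probability 6/36); the total mass is one.
total = sum(pmf_sum.values())
```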

Density/Distribution Functions (Cont'd)
- Probability Density Function (PDF)
  - Defined for continuous random variables.
  - The probability that X falls in the interval [a, b] is P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx.
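To make the integral concrete, a sketch with the Exponential(1) density, comparing a numerical Riemann sum against the closed-form answer (this example is not from the slides):

```python
import math

# Approximate P(a <= X <= b) for an Exponential(1) random variable
# by a midpoint Riemann sum over its density f(x) = exp(-x), x >= 0.
def exp_pdf(x):
    return math.exp(-x)

a, b, n = 0.5, 2.0, 100_000
dx = (b - a) / n
approx = sum(exp_pdf(a + (i + 0.5) * dx) for i in range(n)) * dx

# Closed form: P(a <= X <= b) = exp(-a) - exp(-b).
exact = math.exp(-a) - math.exp(-b)
```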

Famous Density Functions
- Some famous masses and densities:
  - Uniform density: f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.
  - Gaussian (Normal) density: f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)).
  - Exponential density: f(x) = λ exp(−λx) for x ≥ 0, and 0 otherwise.
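The three classic densities above can be written as plain functions; a minimal sketch:

```python
import math

# Uniform density on [a, b].
def uniform_pdf(x, a, b):
    return 1.0 / (b - a) if a <= x <= b else 0.0

# Gaussian density with mean mu and standard deviation sigma.
def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Exponential density with rate lam.
def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Peak of the standard normal density is 1/sqrt(2*pi).
peak = gaussian_pdf(0.0, 0.0, 1.0)
```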

Distribution Measures
- The most basic descriptors of a distribution include:
  - Mean (center)
  - Variance & standard deviation
  - Covariance
  - Correlation coefficient

Expected Value
- Expected value (population mean): E[X] = Σₓ x p(x) for a discrete X, or E[X] = ∫ x f(x) dx for a continuous X.
- Properties of the expected value:
  - The expected value of a constant is the constant itself: E[b] = b.
  - If a and b are constants, then E[aX + b] = a E[X] + b.
  - If X and Y are independent RVs, then E[XY] = E[X] E[Y].
  - If X is an RV with PDF f(x) and g(X) is any function of X, then E[g(X)] = Σₓ g(x) p(x) if X is discrete, and E[g(X)] = ∫ g(x) f(x) dx if X is continuous.
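The linearity property E[aX + b] = a E[X] + b can be checked numerically; a sketch for a fair die (E[X] = 7/2), using exact arithmetic:

```python
from fractions import Fraction

# PMF of a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g, pmf):
    # E[g(X)] = sum over x of g(x) * p(x)  (discrete case)
    return sum(g(x) * p for x, p in pmf.items())

a, b = 3, 5
lhs = expect(lambda x: a * x + b, pmf)          # E[aX + b]
rhs = a * expect(lambda x: x, pmf) + b          # a*E[X] + b
```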

Variance & Standard Deviation
- Let X be an RV with E[X] = μ. The distribution, or spread, of the X values around the expected value is measured by the variance Var(X) = E[(X − μ)²] (σ = √Var(X) is the standard deviation of X).
- Variance properties:
  - The variance of a constant is zero.
  - If a and b are constants, then Var(aX + b) = a² Var(X).
  - If X and Y are independent RVs, then Var(X + Y) = Var(X) + Var(Y).
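A sketch verifying Var(aX + b) = a² Var(X) on the fair die, whose variance is 35/12:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g):
    return sum(g(x) * p for x, p in pmf.items())

def variance(g):
    # Var(g(X)) = E[(g(X) - E[g(X)])^2]
    mu = expect(g)
    return expect(lambda x: (g(x) - mu) ** 2)

a, b = 2, 7
var_x = variance(lambda x: x)              # 35/12 for a fair die
var_ax_b = variance(lambda x: a * x + b)   # should equal a^2 * Var(X)
```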

Covariance
- The covariance of two RVs X and Y, Cov(X, Y) = E[(X − E[X])(Y − E[Y])], measures the nature of the association between the two.
- Properties of covariance:
  - If X and Y are independent RVs, then Cov(X, Y) = 0.
  - If a, b, c and d are constants, then Cov(aX + b, cY + d) = ac Cov(X, Y).
  - If X is an RV, then Cov(X, X) = Var(X).
- Interpreting the covariance value:
  - Cov(X, Y) large and positive: strong positive relationship between the two.
  - Cov(X, Y) large and negative: strong negative relationship between the two.
  - Cov(X, Y) = 0: no linear relationship between the two.
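A small sketch of computing covariance from a joint PMF: let X be one fair die and Y = X plus a second, independent fair die. Then Cov(X, Y) = Cov(X, X) = Var(X) = 35/12:

```python
from fractions import Fraction

# Joint PMF of (X, Y) where X is a fair die and Y = X + Z for an
# independent fair die Z; each of the 36 outcomes has probability 1/36.
joint = {(x, x + z): Fraction(1, 36) for x in range(1, 7) for z in range(1, 7)}

def expect(g):
    return sum(g(x, y) * p for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
# Cov(X, Y) = E[(X - E[X])(Y - E[Y])]
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))
```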

Covariance Matrices
- If X is an n-dimensional RV (a random vector) with mean vector μ, its covariance matrix is defined as Σ = E[(X − μ)(X − μ)ᵀ], whose ij-th element σᵢⱼ is the covariance of xᵢ and xⱼ: σᵢⱼ = Cov(xᵢ, xⱼ) = E[(xᵢ − μᵢ)(xⱼ − μⱼ)].
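A sketch of the definition on a hypothetical toy data set of equally likely 2-D points (the data here is illustrative, not from the slides); note that the result is symmetric, as a covariance matrix must be:

```python
# Population covariance matrix of a 2-D random vector given as
# equally likely samples; the points satisfy y = 2x exactly.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
n = len(points)

mean = [sum(p[k] for p in points) / n for k in range(2)]

# sigma[i][j] = E[(x_i - mu_i)(x_j - mu_j)]
sigma = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
          for j in range(2)]
         for i in range(2)]
```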

Covariance Matrices (Cont'd)
- Properties of the covariance matrix:
  - If the variables are statistically independent, the covariances are zero and the covariance matrix Σ is diagonal.
  - Noting that in the diagonal case the determinant of Σ is just the product of the variances, the joint density factors into a product of univariate normal densities.
  - The general form of the multivariate normal density, where the covariance matrix is no longer required to be diagonal, is f(x) = (2π)^(−n/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)).

Correlation Coefficient
- Correlation: knowing the value of one random variable X, how much information do we gain about the other random variable Y?
- The population correlation coefficient is defined as ρ = Cov(X, Y) / (σ_X σ_Y).
- The correlation coefficient is a measure of linear association between two variables and lies between −1 and +1:
  - −1 indicates perfect negative association.
  - +1 indicates perfect positive association.
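A sketch computing ρ = Cov(X, Y)/(σ_X σ_Y) on toy data with an exact linear relationship y = −3x + 1, so the result should be exactly −1 (perfect negative association):

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-3.0 * x + 1.0 for x in xs]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# Population covariance and standard deviations (divide by n).
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
rho = cov / (sx * sy)
```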

Moments
- The n-th order moment of an RV X is the expected value of Xⁿ: mₙ = E[Xⁿ].
- Normalized form (central moment): μₙ = E[(X − E[X])ⁿ].
- The mean is the first moment.
- The variance is the second central moment; equivalently, the second moment is the variance plus the square of the mean: E[X²] = Var(X) + (E[X])².

Moments (Cont'd)
- Third central moment:
  - A measure of asymmetry, often referred to as skewness: s = E[(X − μ)³] / σ³.
  - If the distribution is symmetric, s = 0.
- Fourth central moment:
  - A measure of flatness, often referred to as kurtosis: k = E[(X − μ)⁴] / σ⁴.
  - Flat distributions have small kurtosis; sharply peaked, heavy-tailed distributions have large kurtosis.
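A sketch computing skewness and kurtosis as normalized central moments of a small hypothetical sample (population-style, dividing by n):

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mu = sum(data) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)

def central_moment(k):
    # mu_k = E[(X - mu)^k], estimated from the sample.
    return sum((x - mu) ** k for x in data) / n

skewness = central_moment(3) / sigma ** 3   # positive: right-skewed data
kurtosis = central_moment(4) / sigma ** 4
```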

Sample Measures
- Sample: a random selection of items from a lot or population, used to evaluate the characteristics of the lot or population.
- Sample mean: x̄ = (1/n) Σ xᵢ
- Sample variance: s² = (1/(n − 1)) Σ (xᵢ − x̄)²
- Sample covariance: s_xy = (1/(n − 1)) Σ (xᵢ − x̄)(yᵢ − ȳ)
- Sample correlation: r = s_xy / (s_x s_y)
- Third and fourth central moments are estimated analogously from the sample.
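A sketch of the unbiased sample statistics above (note the n − 1 denominator, unlike the population formulas earlier), on hypothetical data:

```python
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 5.0, 11.0]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Sample variance and covariance divide by (n - 1), not n.
var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
```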

More on the Gaussian Distribution
- The Gaussian distribution function:
  - Sometimes called "normal" or "bell-shaped".
  - Perhaps the most used distribution in all of science.
  - Fully defined by 2 parameters, μ and σ.
  - About 95% of its area is within 2σ of the mean.
  - Normal distributions range from minus infinity to plus infinity.
- [Figure: a Gaussian with μ = 0 and σ = 1.]

More on the Gaussian Distribution (Cont'd)
- Normal distributions with the same variance but different means.
- Normal distributions with the same mean but different variances.

More on the Gaussian Distribution (Cont'd)
- Standard normal distribution:
  - A normal distribution whose mean is zero and whose standard deviation is one is called the standard normal distribution.
  - Any normal distribution can be converted to a standard normal distribution with simple algebra (z = (x − μ)/σ). This makes calculations much easier.
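A sketch of standardization in practice: convert to a z-score, then evaluate probabilities through the standard normal CDF, which the Python standard library can express via `math.erf`. This also recovers the "95% within 2σ" figure from the earlier slide:

```python
import math

def z_score(x, mu, sigma):
    # Standardize: z = (x - mu) / sigma.
    return (x - mu) / sigma

def std_normal_cdf(z):
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(mu - 2*sigma <= X <= mu + 2*sigma) for any normal X.
within_2_sigma = std_normal_cdf(2.0) - std_normal_cdf(-2.0)
```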

Central Limit Theorem
- Why is the Gaussian PDF so widely applicable?
- Central Limit Theorem: the sum of many independent random variables tends toward a Gaussian distribution.
- Illustrating the CLT:
  - a) 5000 random numbers
  - b) 5000 pairs (r1 + r2) of random numbers
  - c) 5000 triplets (r1 + r2 + r3) of random numbers
  - d) 5000 12-plets (r1 + r2 + … + r12) of random numbers
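A sketch of the 12-plet illustration: summing 12 uniform(0, 1) draws yields a distribution with mean 6 and variance 12 · (1/12) = 1, and a histogram of these sums would look close to a standard bell curve centered at 6:

```python
import random
import statistics

# 5000 sums of 12 uniform(0, 1) random numbers, with a fixed seed
# so the sketch is reproducible.
rng = random.Random(0)
sums = [sum(rng.random() for _ in range(12)) for _ in range(5000)]

mean = statistics.fmean(sums)     # should be close to 6
stdev = statistics.pstdev(sums)   # should be close to 1
```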

Central Limit Theorem (Cont'd)
- The Central Limit Theorem says that we can use the standard normal approximation of the sampling distribution regardless of our data: we do not need to know the underlying probability distribution of the data to make use of sample statistics.
- This only holds for samples large enough to use the CLT's "large sample properties." So, how large is large enough? Some books will tell you 30 is large enough.
- Note: the CLT does not say "in large samples the data is distributed normally"; it is the sampling distribution of the statistic that becomes normal.

Two Normal-like Distributions
- Student's t distribution
- Chi-squared distribution

Distances
- Every clustering problem is based on some kind of "distance" between points.
- A Euclidean space has some number of real-valued dimensions and "dense" points; there is a notion of the "average" of two points.
- A Euclidean distance is based on the locations of points in such a space.
- A non-Euclidean distance is based on properties of points, but not their "location" in a space.
- Distance matrices:
  - Once a distance measure is defined, we can calculate the distance between objects. These objects could be individual observations, groups of observations (samples), or populations of observations.
  - For N objects, we then have a symmetric distance matrix D whose elements are the distances between objects i and j.

Axioms of a Distance Measure
- d is a distance measure (metric) if it is a function from pairs of points to the reals such that:
  1. d(x, y) ≥ 0
  2. d(x, y) = 0 iff x = y
  3. d(x, y) = d(y, x)
  4. d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)
- For the purpose of clustering, the distance (similarity) is sometimes not required to be a metric:
  - No triangle inequality
  - No symmetry

Distance Measures
- Euclidean distance (L₂ norm)
- L₁ and L∞ norms
- Mahalanobis distance

Euclidean Distance (L₂ Norm)
- d(x, y) = √(Σᵢ (xᵢ − yᵢ)²), a natural distance measure for spaces equipped with a Euclidean metric.
- Multivariate distances between populations:
  - Calculates the Euclidean distance between two "points" defined by the multivariate means of two populations based on p variables.
  - Does not take into account differences among populations in within-population variability, nor correlations among the variables.

Euclidean Distance (L₂ Norm) (Cont'd)
- [Figure: multivariate distance between the means of Sample 1 and Sample 2 in the (X₁, X₂) plane.]

Euclidean Distance (L₁, L∞ Norms)
- L₁ norm: the sum of the absolute differences in each dimension.
  - Manhattan distance: the distance if you had to travel along coordinate axes only.
- L∞ norm: d(x, y) = the maximum of the absolute differences between x and y in any dimension.
- Example: x = (5, 5), y = (9, 8)
  - L₂ norm: dist(x, y) = √(4² + 3²) = 5
  - L₁ norm: dist(x, y) = 4 + 3 = 7
  - L∞ norm: dist(x, y) = max(4, 3) = 4
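The three norms in the slide's example can be sketched in a few lines:

```python
import math

# L2, L1 and L-infinity distances between two points of any dimension.
def l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def linf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

# The slide's example: x = (5, 5), y = (9, 8).
x, y = (5, 5), (9, 8)
d2, d1, dinf = l2(x, y), l1(x, y), linf(x, y)
```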

Mahalanobis Distance
- Distances are computed based on the means, variances and covariances for each of g samples (populations) based on p variables.
- The Mahalanobis distance "weights" the contribution of each pair of variables by the inverse of their covariance: D²(x̄₁, x̄₂) = (x̄₁ − x̄₂)ᵀ Σ⁻¹ (x̄₁ − x̄₂),
  - where x̄₁, x̄₂ are the sample means,
  - and Σ is the covariance matrix between the samples.
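A minimal 2-D sketch of the formula, inverting the 2×2 covariance matrix by hand; as a sanity check, with the identity covariance the Mahalanobis distance reduces to the ordinary Euclidean distance (the numbers reuse the earlier L₂ example and are illustrative only):

```python
import math

def mahalanobis_2d(u, v, sigma):
    # Invert the 2x2 covariance matrix sigma = [[a, b], [c, d]] by hand.
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [u[0] - v[0], u[1] - v[1]]
    # D^2 = diff^T * Sigma^{-1} * diff
    d2 = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return math.sqrt(d2)

# Identity covariance: Mahalanobis equals Euclidean distance, here 5.
dist = mahalanobis_2d((5.0, 5.0), (9.0, 8.0), [[1.0, 0.0], [0.0, 1.0]])
```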