Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Basic Ideas in Probability and Statistics for Experimenters: Part I: Qualitative Discussion Shivkumar.

Presentation transcript:

Slide 1: Basic Ideas in Probability and Statistics for Experimenters: Part I: Qualitative Discussion
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute
Based in part upon slides of Prof. Raj Jain (OSU)
“He uses statistics as a drunken man uses lamp-posts – for support rather than for illumination …” A. Lang

Slide 2: Overview
- Why probability and statistics: the empirical design method…
- Qualitative understanding of essential probability and statistics
  - Especially the notion of inference and statistical significance
  - Key distributions & why we care about them…
- Reference: Chap 12, 13 (Jain), Chap 2-3 (Box, Hunter, Hunter), and

Slide 3: Theory (Model) vs Experiment
- A model is only as good as the NEW predictions it can make (subject to “statistical confidence”)
- Physics:
  - Theory and experiment frequently changed roles as leaders and followers
  - E.g.: Newton’s gravitation theory, quantum mechanics, Einstein’s relativity
  - Einstein’s “thought experiments” vs real-world experiments that validate a theory’s predictions
- Medicine:
  - FDA clinical trials on new drugs: do they work?
  - “Cure worse than the disease”?
- Networking:
  - How does OSPF or TCP or BGP behave/perform?
  - Operator or designer: what will happen if I change …?

Slide 4: Why Probability? BIG PICTURE
- Humans like determinism
- The real world is unfortunately random!
  - => We CANNOT place ANY confidence in a single measurement or simulation result that contains potential randomness
- However,
  - We can make deterministic statements about measures or functions of underlying randomness …
  - … with bounds on the degree of confidence
- Functions of randomness:
  - Probability of a random event or variable
  - Average (mean, median, mode), distribution functions (pdf, cdf), joint pdfs/cdfs, conditional probability, confidence intervals
- Goal: build “probabilistic” models of reality
  - Constraint: minimum # experiments
  - Infer to get a model (i.e. maximum information)
- Statistics: how to infer models about reality (“population”) given a SMALL set of experimental results (“sample”)

Slide 5: Why Care About Statistics?
- How to make this empirical design process EFFICIENT?
- How to avoid pitfalls in inference!
[Diagram labels: “Measure, simulate, experiment”; “Model, hypothesis, predictions”]

Slide 6: Probability
- Think of probability as modeling an experiment
  - E.g.: tossing a coin!
- The set of all possible outcomes is the sample space: S
- Classic “experiment”: tossing a die: S = {1,2,3,4,5,6}
- Any subset A of S is an event: A = {the outcome is even} = {2,4,6}

Slide 7: Probability of Events: Axioms
P is a probability function (measure) if it maps each event A to a real number P(A), and:
i.) $P(A) \ge 0$ for every event A
ii.) $P(S) = 1$
iii.) If A and B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$

Slide 8: Probability of Events
… In fact, for any sequence of pairwise mutually exclusive events $A_1, A_2, A_3, \dots$, we have
$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
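
The die experiment and the axioms can be checked with a few lines of Python. This is only a minimal sketch: it assumes a fair die (equally likely outcomes), and the helper `prob` and the event `low` are names introduced here for illustration, not part of the slides.

```python
from fractions import Fraction

# Sample space for one roll of a die, assumed fair (equally likely outcomes).
S = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """P(event) for an event given as a subset of S."""
    event = frozenset(event)
    assert event <= S, "an event must be a subset of the sample space"
    return Fraction(len(event), len(S))

even = {2, 4, 6}   # A = {the outcome is even}
low = {1}          # hypothetical event B, mutually exclusive with A

print(prob(even))                                   # 1/2
print(prob(S))                                      # 1     (axiom ii)
print(prob(even | low) == prob(even) + prob(low))   # True  (axiom iii)
```

Exact fractions are used instead of floats so that the additivity check is an exact equality rather than a comparison up to rounding.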

Slide 9: Other Properties Can be Derived
[Eight derived properties were shown here as formulas; they did not survive in the transcript.]
They are derived by breaking up the sets above into mutually exclusive pieces and comparing to the fundamental axioms!

Slide 10: Recall: Why care about Probability?
- … We can be deterministic about measures or functions of underlying randomness …
- Functions of randomness:
  - Probability of a random event or variable
- Even though the experiment has a RANDOM OUTCOME (e.g.: 1, 2, 3, 4, 5, 6 or heads/tails) or EVENTS (subsets of all outcomes),
  - the probability function has a DETERMINISTIC value
- If you forget everything in this class, do not forget this!

Slide 11: Conditional Probability
- $P(A \mid B)$ = (conditional) probability that the outcome is in A given that we know the outcome is in B:
  $P(A \mid B) = \dfrac{P(AB)}{P(B)}$, for $P(B) > 0$
- Example: Toss one die.
- Note that: $P(AB) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$
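
The slide's own die example did not survive in the transcript, so the sketch below uses a stand-in: a fair die with the hypothetical events A = {outcome is 2} and B = {outcome is even}. The function names are illustrative assumptions.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a die, assumed fair

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = {2}           # hypothetical event: the outcome is 2
B = {2, 4, 6}     # hypothetical event: the outcome is even

print(prob(A))                                     # 1/6
print(cond_prob(A, B))                             # 1/3
print(prob(A & B) == cond_prob(A, B) * prob(B))    # True
```

Knowing that the outcome is even doubles the probability of a 2 (from 1/6 to 1/3), which is what "conditioning" on B means here.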

Slide 12: Independence
- Events A and B are independent if P(AB) = P(A)P(B).
- Also: $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$
- Example: A card is selected at random from an ordinary deck of cards.
  - A = event that the card is an ace.
  - B = event that the card is a diamond.
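
The card example can be verified by enumeration. The sketch assumes a standard 52-card deck with every card equally likely; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = set(product(RANKS, SUITS))   # 52 equally likely cards (assumed)

def prob(event):
    return Fraction(len(event), len(DECK))

ace = {card for card in DECK if card[0] == "A"}             # event A
diamond = {card for card in DECK if card[1] == "diamonds"}  # event B

print(prob(ace), prob(diamond), prob(ace & diamond))        # 1/13 1/4 1/52
print(prob(ace & diamond) == prob(ace) * prob(diamond))     # True: independent
```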

Slide 13: Random Variable as a Measurement
- We cannot give an exact description of a sample space in these cases, but we can still describe specific measurements on them
  - The temperature change produced.
  - The number of photons emitted in one millisecond.
  - The time of arrival of the packet.

Slide 14: Random Variable as a Measurement
- Thus a random variable can be thought of as a measurement on an experiment

Slide 15: Probability Distribution Function (pdf), a.k.a. frequency histogram, p.m.f. (for discrete r.v.)

Slide 16: Probability Mass Function for a Random Variable
- The probability mass function (PMF) for a (discrete-valued) random variable X is: $p_X(x) = P(X = x)$
- Note that $p_X(x) \ge 0$ for all $x$
- Also, for a (discrete-valued) random variable X: $\sum_x p_X(x) = 1$

Slide 17: PMF and CDF: Example

Slide 18: Cumulative Distribution Function
- The cumulative distribution function (CDF) for a random variable X is $F_X(x) = P(X \le x)$
- Note that $F_X(x)$ is non-decreasing in x, i.e. $x_1 \le x_2 \Rightarrow F_X(x_1) \le F_X(x_2)$
- Also $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$
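
To make the PMF/CDF relationship concrete, here is a small sketch for an assumed example: X = number of heads in three tosses of a fair coin. The example and the names `pmf` and `cdf` are illustrative choices, not taken from the missing figure.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three fair coin tosses (assumed example).
outcomes = list(product("HT", repeat=3))
counts = Counter(outcome.count("H") for outcome in outcomes)

# PMF: P(X = x), where X = number of heads.
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

def cdf(x):
    """CDF: P(X <= x), obtained by summing the PMF up to x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

print(pmf)                        # masses 1/8, 3/8, 3/8, 1/8 at x = 0, 1, 2, 3
print(cdf(-1), cdf(1), cdf(3))    # 0 1/2 1
```

The CDF is a non-decreasing staircase that starts at 0 and reaches 1 at the largest value of X.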

Slide 19: Recall: Why care about Probability?
- Humans like determinism
- The real world is unfortunately random!
  - We CANNOT place ANY confidence in a single measurement
- We can be deterministic about measures or functions of underlying randomness …
- Functions of randomness:
  - Probability of a random event or variable
  - Average (mean, median, mode), distribution functions (pdf, cdf), joint pdfs/cdfs, conditional probability, confidence intervals

Slide 20: Expectation (Mean) = Center of Gravity

Slide 21: Expectation of a Random Variable
- The expectation (average) of a (discrete-valued) random variable X is $E(X) = \sum_x x\,P(X = x)$
- Three coins example:
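
The worked numbers for the three-coins example are missing from the transcript. One plausible reading, assumed here, is X = number of heads in three fair coin tosses; the sketch computes its expectation as the probability-weighted sum of values.

```python
from fractions import Fraction

# Assumed PMF: number of heads in three fair coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(pmf):
    """E(X) = sum over x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

print(expectation(pmf))   # 3/2
```

The result, 1.5 heads, is not itself a possible outcome: the mean is a balance point ("center of gravity") of the distribution, not a typical value.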

Slide 22: Median, Mode
- Median = $F^{-1}(0.5)$, where F = CDF
  - A.k.a. the 50th-percentile element
  - I.e. order the values and pick the middle element
  - Used when the distribution is skewed
  - Considered a “robust” measure
- Mode: most frequent or highest-probability value
  - Multiple modes are possible
  - Need not be the “central” element
  - Mode may not exist (e.g.: uniform distribution)
  - Used with categorical variables
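
A short sketch with Python's standard `statistics` module shows why the median is called robust. The delay values are made-up illustrative data with one deliberate outlier.

```python
import statistics

# Hypothetical packet delays in ms; the last value is an outlier.
delays_ms = [10, 11, 12, 12, 13, 14, 200]

mean = statistics.mean(delays_ms)         # pulled far upward by the outlier
median = statistics.median(delays_ms)     # middle element of the sorted data
modes = statistics.multimode(delays_ms)   # most frequent value(s)

print(round(mean, 2), median, modes)      # 38.86 12 [12]

# When every value is equally frequent there is no single distinguished mode:
print(statistics.multimode([1, 2, 3]))    # [1, 2, 3]
```

The single outlier moves the mean from about 12 to almost 39, while the median and mode stay at 12.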

Slide 25: Measures of Spread/Dispersion: Why Care?
You can drown in a river of average depth 6 inches!

Slide 26: Standard Deviation, Coeff. of Variation, SIQR
- Variance: second moment around the mean: $\sigma^2 = E((X-\mu)^2)$
- Standard deviation = $\sigma$
- Coefficient of Variation (C.o.V.) = $\sigma/\mu$
- SIQR = Semi-Inter-Quartile Range (used with median = 50th percentile)
  - (75th percentile – 25th percentile)/2
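
These spread measures can be computed with the standard library. The sketch below treats the data as the whole population (hence `pvariance`/`pstdev`), and the data set is hypothetical. Note that quartile values depend on the interpolation convention; `statistics.quantiles` with its default method is just one such convention.

```python
import statistics

def summarize_spread(data):
    """Variance, standard deviation, C.o.V. and SIQR of a data set."""
    mu = statistics.fmean(data)
    variance = statistics.pvariance(data)             # E((X - mu)^2)
    sigma = statistics.pstdev(data)                   # sqrt of the variance
    q1, _median, q3 = statistics.quantiles(data, n=4) # 25th, 50th, 75th percentiles
    return {
        "variance": variance,
        "std": sigma,
        "cov": sigma / mu,          # coefficient of variation (needs mu != 0)
        "siqr": (q3 - q1) / 2,      # semi-inter-quartile range
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical measurements, mean 5
summary = summarize_spread(data)
print(summary["variance"], summary["std"], summary["cov"])   # 4 2.0 0.4
```

The C.o.V. is dimensionless, which makes it convenient for comparing the variability of quantities measured in different units.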

Slide 27: Covariance and Correlation: Measures of Dependence
- Covariance: $\sigma_{ij} = \mathrm{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)]$
  - For i = j, covariance = variance!
  - Independence => covariance = 0 (not vice-versa!)
- Correlation (coefficient) is a normalized (or scaleless) form of covariance: $\rho_{ij} = \sigma_{ij}/(\sigma_i \sigma_j)$
  - Between –1 and +1.
  - Zero => no correlation (uncorrelated).
  - Note: uncorrelated DOES NOT mean independent!
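
A standard counterexample makes the last point concrete. The sketch assumes X is uniform on {-1, 0, 1} and Y = X², an example chosen here for illustration: Y is completely determined by X, yet their covariance is exactly zero.

```python
from fractions import Fraction

# Joint PMF of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2 (assumed example).
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(f):
    """E[f(X, Y)] under the joint PMF."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))

print(cov)                      # 0   -> uncorrelated
print(p_x0_y0, p_x0 * p_y0)     # 1/3 1/9 -> not independent
```

Covariance only detects linear dependence; the purely quadratic relation between X and Y is invisible to it.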

Slide 28: Recall: Why care about Probability?
- Humans like determinism
- The real world is unfortunately random!
  - We CANNOT place ANY confidence in a single measurement
- We can be deterministic about measures or functions of underlying randomness …
- Functions of randomness:
  - Probability of a random event or variable
  - Average (mean, median, mode), distribution functions (pdf, cdf), joint pdfs/cdfs, conditional probability, confidence intervals

Slide 29: Continuous-valued Random Variables
- So far we have focused on discrete(-valued) random variables, e.g. X(s) must be an integer
  - Examples of discrete random variables: number of arrivals in one second, number of attempts until success
- A continuous-valued random variable takes on a range of real values, e.g. X(s) ranges from 0 to $\infty$ as s varies.
  - Examples of continuous(-valued) random variables: time when a particular arrival occurs, time between consecutive arrivals.

Slide 30: Continuous-valued Random Variables
- Thus, for a continuous random variable X, we can define its probability density function (pdf): $f_X(x) = \dfrac{dF_X(x)}{dx}$
- Note that since $F_X(x)$ is non-decreasing in x, we have $f_X(x) \ge 0$ for all x.

Slide 31: Properties of Continuous Random Variables
- From the Fundamental Theorem of Calculus, we have $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
- In particular, $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
- More generally, $P(a \le X \le b) = \int_a^b f_X(x)\,dx$

Slide 32: Expectation of a Continuous Random Variable
- The expectation (average) of a continuous random variable X is given by $E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$
- Note that this is just the continuous equivalent of the discrete expectation $E(X) = \sum_x x\,P(X = x)$

Slide 33: Important (Discrete) Random Variable: Bernoulli
- The simplest possible measurement on an experiment:
  - Success (X = 1) or failure (X = 0).
- Usual notation: $P(X = 1) = p$, $P(X = 0) = 1 - p$
- $E(X) = p$

Slide 34: Important (Discrete) Random Variables: Binomial
- Let X = the number of successes in n independent Bernoulli experiments (or trials), each with success probability p.
  - $P(X=0) = (1-p)^n$
  - $P(X=1) = n\,p\,(1-p)^{n-1}$
  - $P(X=2) = \binom{n}{2} p^2 (1-p)^{n-2}$
  - In general, $P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}$, for $x = 0, 1, \dots, n$
- Binomial variables are useful for proportions (of successes/failures) for a small number of repeated experiments. For a larger number (n), under certain conditions (p is small), the Poisson distribution is used.
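
The general binomial formula is easy to evaluate directly. In this sketch the function name and the parameter values (n = 10, p = 0.3) are illustrative assumptions.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3   # hypothetical number of trials and success probability
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(round(sum(pmf), 12))                                  # 1.0: masses sum to one
print(round(sum(x * q for x, q in enumerate(pmf)), 12))     # 3.0: the mean is n * p
```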

Slide 35: Binomial can be skewed or normal
Depends upon p and n!

Slide 36: Important Random Variable: Poisson
- A Poisson random variable X is defined by its PMF: $P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$, for $k = 0, 1, 2, \dots$, where $\lambda > 0$ is a constant
- Exercise: Show that $\sum_{k=0}^{\infty} P(X = k) = 1$ and $E(X) = \lambda$
- Poisson random variables are good for counting frequency of occurrence: like the number of customers that arrive at a bank in one hour, or the number of packets that arrive at a router in one second.
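
The exercise can be sanity-checked numerically (this is not a proof). The sketch assumes a rate of 4 and truncates the infinite sums at k = 99, where the remaining tail is negligible for this rate.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with parameter lam > 0."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4.0          # hypothetical rate, e.g. packets per second
ks = range(100)    # truncation of the infinite sum (assumption: tail negligible)

total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)

print(round(total, 9), round(mean, 9))   # 1.0 4.0
```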

Slide 37: Important Continuous Random Variable: Exponential
- Used to represent time, e.g. until the next arrival
- Has PDF $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and 0 otherwise), for some $\lambda > 0$
- Properties: $\int_0^{\infty} f_X(x)\,dx = 1$ and $E(X) = 1/\lambda$
  - Need to use integration by parts!

Slide 38: Memoryless Property of the Exponential
- An exponential random variable X has the property that “the future is independent of the past”, i.e. the fact that it hasn’t happened yet tells us nothing about how much longer it will take.
- In math terms: $P(X > s + t \mid X > s) = P(X > t)$, for $s, t > 0$
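
The memoryless property follows from the exponential tail probability P(X > t) = e^(-λt), and can be checked numerically. The rate and the times s and t below are arbitrary illustrative values.

```python
from math import exp, isclose

def survival(t, lam):
    """P(X > t) for an exponential random variable with rate lam."""
    return exp(-lam * t)

lam, s, t = 0.5, 2.0, 3.0   # hypothetical rate and waiting times

# P(X > s + t | X > s) = P(X > s + t) / P(X > s), since {X > s+t} implies {X > s}.
conditional = survival(s + t, lam) / survival(s, lam)

print(isclose(conditional, survival(t, lam)))   # True
```

Having already waited s = 2 time units changes nothing: the chance of waiting at least 3 more is the same as the chance of waiting at least 3 from the start.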

Slide 39: Recall: Why care about Probability?
- Humans like determinism
- The real world is unfortunately random!
  - We CANNOT place ANY confidence in a single measurement
- We can be deterministic about measures or functions of underlying randomness …
- Functions of randomness:
  - Probability of a random event or variable
  - Average (mean, median, mode), distribution functions (pdf, cdf), joint pdfs/cdfs, conditional probability, confidence intervals

Slide 40: Gaussian/Normal Distribution: Why?
- The uniform distribution looks nothing like bell-shaped (Gaussian)! Large spread ($\sigma$)!
- The sample mean of a uniform distribution (a.k.a. sampling distribution), after very few samples, looks remarkably Gaussian, with decreasing $\sigma$!
- CENTRAL LIMIT TENDENCY!
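
The shrinking spread of the sample mean can be observed in a small simulation. This is a sketch under stated assumptions: draws are from Uniform(0, 1), the sample sizes and the number of trials are arbitrary, and a fixed seed is used only to make the run repeatable.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Return `trials` sample means, each over n draws from Uniform(0, 1)."""
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(trials)]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
spread = {}
for n in (1, 4, 16):
    spread[n] = statistics.pstdev(sample_means(n, 20000, rng))
    print(n, round(spread[n], 3))
```

For Uniform(0, 1) the standard deviation of a single draw is 1/√12 ≈ 0.289, so the printed spreads should land near 0.289, 0.144 and 0.072: each fourfold increase in n halves the spread of the sample mean.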

Slide 41: Other interesting facts about Gaussian
- Uncorrelated r.v.s + (jointly) Gaussian => INDEPENDENT!
  - Important in random processes (i.e. sequences of random variables)
- Random variables that are independent and have exactly the same distribution are called IID (independent & identically distributed)
- IID and normal with zero mean and variance $\sigma^2$ => IIDN(0, $\sigma^2$)

Slide 42: Important Random Variables: Normal

Slide 43: Normal Distribution: PDF & CDF
- PDF: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$
- With the transformation: $z = \dfrac{x-\mu}{\sigma}$ (a.k.a. unit normal deviate)
- z-normal PDF: $f(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
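
The normal PDF and the z-transformation can be written out directly and cross-checked against `statistics.NormalDist` from the Python standard library. The function name and the parameter values are illustrative assumptions.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density, written in terms of the unit normal deviate z."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

mu, sigma, x = 10.0, 2.0, 11.0   # hypothetical mean, std dev and measurement
z = (x - mu) / sigma             # unit normal deviate: 0.5

print(abs(normal_pdf(x, mu, sigma) - NormalDist(mu, sigma).pdf(x)) < 1e-12)   # True
# Scaling by sigma maps the general density onto the z-normal density:
print(abs(sigma * normal_pdf(x, mu, sigma) - normal_pdf(z, 0.0, 1.0)) < 1e-12)  # True
```

Because of this transformation, a single table (or function) for the unit normal is enough for every choice of mean and standard deviation.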

Slide 44: Height & Spread of Gaussian Can Vary!

Slide 45: Rapidly Dropping Tail Probability!
- The sample mean is a Gaussian r.v., with mean $\bar{x} = \mu$ and standard deviation $s = \sigma/\sqrt{n}$
- With a larger number of samples, the average of sample means is an excellent estimate of the true mean.
- If the (original) $\sigma$ is known, invalid mean estimates can be rejected with HIGH confidence!
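
How fast the Gaussian tail drops can be quantified with the unit normal CDF. In this sketch the helper names and the values σ = 2 and n = 16 are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # unit normal: mean 0, standard deviation 1

def two_sided_tail(k):
    """P(|Z| > k): probability of landing more than k standard deviations from the mean."""
    return 2 * (1 - Z.cdf(k))

def standard_error(sigma, n):
    """Standard deviation of the mean of n independent samples."""
    return sigma / sqrt(n)

for k in (1, 2, 3):
    print(k, round(two_sided_tail(k), 4))   # 0.3173, 0.0455, 0.0027

print(standard_error(2.0, 16))              # 0.5
```

A candidate mean that sits three standard errors away from the sample mean would be observed by chance less than 0.3% of the time, which is why such estimates can be rejected with high confidence.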

Slide 46: Recall: Why care about Probability?
- Humans like determinism
- The real world is unfortunately random!
  - We CANNOT place ANY confidence in a single measurement
- We can be deterministic about measures or functions of underlying randomness …
- Functions of randomness:
  - Probability of a random event or variable
  - Average (mean, median, mode), distribution functions (pdf, cdf), joint pdfs/cdfs, conditional probability, confidence intervals
- Goal: build “probabilistic” models of reality
  - Constraint: minimum # experiments
  - Infer to get a model (i.e. maximum information)
- Statistics: how to infer models about reality (“population”) given a SMALL set of experimental results (“sample”)

Slide 47: Confidence Interval
- Probability that a measurement will fall within a closed interval [a,b] (MathWorld definition…): $P(a \le x \le b) = 1 - \alpha$
- Jain: the interval [a,b] = “confidence interval”;
  - the probability level, $100(1-\alpha)$ = “confidence level”;
  - $\alpha$ = “significance level”
- The sampling distribution for means leads to high confidence levels, i.e. small confidence intervals
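
As one concrete instance, the sketch below builds a 100(1 − α)% confidence interval for a mean. It makes two simplifying assumptions that go beyond the slide: the population standard deviation σ is known, and the sample mean is treated as Gaussian. The data and function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def mean_confidence_interval(samples, sigma, alpha=0.05):
    """100(1 - alpha)% interval for the mean, assuming sigma is known."""
    n = len(samples)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    half_width = z * sigma / sqrt(n)          # z times the standard error
    center = fmean(samples)
    return center - half_width, center + half_width

samples = [9.8, 10.1, 10.3, 9.9, 10.4]        # hypothetical measurements
low, high = mean_confidence_interval(samples, sigma=0.5)
print(round(low, 3), round(high, 3))
```

The half-width shrinks like 1/√n, so quadrupling the number of samples halves the interval at the same confidence level. When σ must be estimated from the samples, the t-distribution replaces the normal quantile.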

Slide 48: Meaning of Confidence Interval

Slide 49: Statistical Inference: Is $\mu_A = \mu_B$?
- Note: the sample mean $\bar{y}_A$ is not $\mu_A$, but its estimate!
- Is this difference statistically significant?
- Is the null hypothesis $\mu_A = \mu_B$ false?

Slide 50: Step 1: Plot the samples

Slide 51: Compare to (external) reference distribution (if available)
Since 1.30 is at the tail of the reference distribution, the difference between means is NOT statistically significant!

Slide 52: Random Sampling Assumption!
Under the random sampling assumption and the null hypothesis $\mu_A = \mu_B$, we can view the 20 samples as coming from a common population and construct a reference distribution from the samples themselves!

Slide 53: t-distribution: Create a Reference Distribution from the Samples Themselves!
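
A minimal sketch of how the samples supply their own reference scale: the pooled two-sample t statistic estimates the standard deviation from the samples and measures the difference of means in units of its standard error. The sketch assumes random sampling and equal population variances; the data are hypothetical, and the function name is introduced here. The Python standard library has no t-distribution CDF, so the resulting t value would be compared against a t table (or a statistics package) with the stated degrees of freedom.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for the difference mean(B) - mean(A)."""
    na, nb = len(sample_a), len(sample_b)
    dof = na + nb - 2
    # Pooled estimate of the common variance, weighted by each sample's dof.
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / dof
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (fmean(sample_b) - fmean(sample_a)) / std_err, dof

method_a = [1.0, 2.0, 3.0]   # hypothetical measurements, method A
method_b = [2.0, 3.0, 4.0]   # hypothetical measurements, method B
t, dof = pooled_t(method_a, method_b)
print(round(t, 3), dof)      # 1.225 4
```

A small |t| relative to the tabulated critical value means the observed difference is well within what random sampling alone would produce.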

Slide 54: t-distribution

Slide 55: Statistical Significance with Various Inference Techniques
- Normal population assumption not required
- Random sampling assumption required
- Std. dev. estimated from the samples themselves!
- t-distribution: an approximation for Gaussian!

Slide 56: Normal, $\chi^2$ & t-distributions: Useful for Statistical Inference

Slide 57: Relationship between Confidence Intervals and Comparisons of Means

Slide 58: Amen!