Probability and Statistics Review Thursday Mar 12.


Learning a model: What are the important variables? What are their ranges? What are the important combinations?

Quick review of probability: events and collections of events; random variables; conditional probability, the law of total probability, the chain rule, Bayes' rule; dependence and independence; expectation, variance, and covariance; moments.

Sample Space and Events
Ω: the sample space, the set of possible results of an experiment. If you toss a coin twice, Ω = {HH, HT, TH, TT}.
Event: a subset of Ω, e.g. "first toss is heads" = {HH, HT}.
S: the event space, a set of events, closed under finite union and complement (which entails the other binary operations: intersection, difference, etc.). It contains the empty event ∅ and Ω.

Probability Measure
Defined over (Ω, S) such that:
- P(α) ≥ 0 for all α in S
- P(Ω) = 1
- If α and β are disjoint, then P(α ∪ β) = P(α) + P(β)
We can deduce other properties from the above axioms. Ex: P(α ∪ β) = P(α) + P(β) − P(α ∩ β) for non-disjoint events.
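As a quick illustration (not part of the original slides), the two-coin-toss sample space can be written down directly in Python to check the axioms and the inclusion-exclusion identity; the event names and the uniform measure below are my own choices:

```python
from fractions import Fraction

# Sample space for two coin tosses, each outcome equally likely.
omega = {"HH", "HT", "TH", "TT"}

def P(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event & omega), len(omega))

first_head = {"HH", "HT"}    # event: first toss is heads
second_head = {"HH", "TH"}   # event: second toss is heads

# Axioms: P(omega) = 1 and additivity for disjoint events.
assert P(omega) == 1
assert P(first_head) + P(omega - first_head) == 1

# Inclusion-exclusion for non-disjoint events: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = P(first_head | second_head)
rhs = P(first_head) + P(second_head) - P(first_head & second_head)
assert lhs == rhs
print(lhs)   # 3/4
```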

Visualization
(Figure: events shown as regions of the sample space.) We can go on and define conditional probability using this visualization.

Conditional probability: P(F | H) = fraction of worlds in which H is true that also have F true, i.e. P(F | H) = P(F ∩ H) / P(H).

Rule of total probability
(Figure: an event A overlapping a partition B_1, ..., B_7 of the sample space.) If the B_i partition Ω, then P(A) = Σ_i P(A ∩ B_i) = Σ_i P(A | B_i) P(B_i).

Bayes Rule
We know that P(smart) = 0.7. If we also learn that the student's grade is A+, how does this affect our belief about their intelligence? Bayes' rule gives P(smart | A+) = P(A+ | smart) P(smart) / P(A+). Where does this come from? It follows directly from the definition of conditional probability.

Example: A factory runs two machines, A and B. 10% of the factory's output is produced on machine A and 90% on machine B. 1% of the items produced on machine A and 5% of the items produced on machine B are defective.
(a) An item is chosen at random; what is the probability that it is defective?
(b) An item is found to be defective; what is the probability that it was produced on machine A?
(c) After a technician services machine B, 1.9% of the factory's output is found to be defective. What is now the probability that an item produced on machine B is defective?
Solution: define the events A = the chosen item was produced on machine A, B = it was produced on machine B, C = the chosen item is defective.
(a) By the law of total probability: P(C) = P(C|A) P(A) + P(C|B) P(B) = 0.01·0.1 + 0.05·0.9 = 0.046.
(b) By Bayes' rule: P(A|C) = P(C|A) P(A) / P(C) = 0.01·0.1 / 0.046 ≈ 0.022.
(c) By the law of total probability again: 0.019 = 0.01·0.1 + P(C|B)·0.9, so P(C|B) = 0.018 / 0.9 = 0.02.
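The same computation, as a small Python sketch (the variable names are mine; the numbers come from the example above):

```python
# Machine proportions and per-machine defect rates from the example.
P_A, P_B = 0.10, 0.90
P_def_given_A, P_def_given_B = 0.01, 0.05

# (a) Law of total probability: overall defect rate.
P_def = P_def_given_A * P_A + P_def_given_B * P_B
print(P_def)                       # ~0.046

# (b) Bayes' rule: probability that a defective item came from machine A.
P_A_given_def = P_def_given_A * P_A / P_def
print(round(P_A_given_def, 3))     # ~0.022

# (c) After the repair the overall defect rate drops to 1.9%;
#     solve the total-probability equation for machine B's new rate.
P_def_new = 0.019
P_def_given_B_new = (P_def_new - P_def_given_A * P_A) / P_B
print(P_def_given_B_new)           # ~0.02
```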

Discrete random variable
A random variable is a function from the overall sample space (the "world") to an attribute space; in effect, it lets us define a new distribution in terms of the random variable.
Modeling students (Grade and Intelligence): Ω = all possible students. What are the events? Grade_A = all students with grade A, Grade_B = all students with grade B, Intelligence_High = all students with high intelligence.

Random Variables
(Figure: the sample space Ω of students partitioned by I:Intelligence ∈ {high, low} and G:Grade ∈ {A+, A, B}.)

Random Variables
(Same figure as above.) P(I = high) = P({all students whose intelligence is high}).

Joint probability
Joint probability distributions quantify this: P(X = x, Y = y) = P(x, y). How probable is it to observe these two attributes together? How can we manipulate joint probability distributions?
Example: a two-digit number is formed at random from the digits 1, 2, 3, 4 (digits may repeat, so there are 16 equally likely outcomes). Let X be the number of distinct digits that appear and Y the number of times the digit 1 appears. What is the joint distribution of the pair (X, Y)?
          X = 1    X = 2
Y = 0     3/16     6/16
Y = 1     0        6/16
Y = 2     1/16     0

Chain rule
Always true: P(x, y, z) = P(x) P(y|x) P(z|x, y) = P(z) P(y|z) P(x|y, z) = ...
To complicate things a little, add the following to the previous example: let Z be the number of digits that are strictly greater than 2. Then
P(X=2, Y=1, Z=1) = P(X=2) · P(Y=1 | X=2) · P(Z=1 | Y=1, X=2) = 0.75 · [(6/16) / P(X=2)] · [(4/16) / (6/16)] = 4/16 = 0.25.
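To make the joint table and the chain-rule product concrete, here is a small enumeration sketch of the 16 equally likely digit pairs; the helper names x, y, z are my own:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# All 16 equally likely two-digit numbers over the digits 1-4 (digits may repeat).
outcomes = list(product([1, 2, 3, 4], repeat=2))
n = len(outcomes)

def x(d):  # number of distinct digits in the number
    return len(set(d))

def y(d):  # number of times the digit 1 appears
    return d.count(1)

def z(d):  # number of digits strictly greater than 2
    return sum(1 for v in d if v > 2)

# Joint distribution of (X, Y): matches the table above (fractions printed reduced).
joint_xy = Counter((x(d), y(d)) for d in outcomes)
for key, count in sorted(joint_xy.items()):
    print(key, Fraction(count, n))   # (1,0) 3/16  (1,2) 1/16  (2,0) 3/8  (2,1) 3/8

# Chain-rule check: P(X=2, Y=1, Z=1) = P(X=2) P(Y=1|X=2) P(Z=1|X=2, Y=1) = 1/4.
p_x2 = Fraction(sum(x(d) == 2 for d in outcomes), n)
p_y1_x2 = Fraction(sum(x(d) == 2 and y(d) == 1 for d in outcomes), n)
p_z1_y1_x2 = Fraction(sum(x(d) == 2 and y(d) == 1 and z(d) == 1 for d in outcomes), n)
print(p_x2 * (p_y1_x2 / p_x2) * (p_z1_y1_x2 / p_y1_x2))   # 1/4
```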

Conditional Probability
P(α | β) = P(α ∩ β) / P(β). But we will always write it this way, in terms of events.

Marginal probability
We know P(X, Y); what is P(X = x)? We can use the law of total probability: P(X = x) = Σ_y P(X = x, Y = y). Why does it apply? Because the events {Y = y} partition the sample space. (Figure: the same partition B_1, ..., B_7 of the sample space as before.)

Marginalization Cont. Another example

Bayes Rule cont.
You can condition on more variables: P(x | y, z) = P(y | x, z) P(x | z) / P(y | z).

Independence
X is independent of Y means that knowing Y does not change our belief about X:
P(X | Y = y) = P(X), equivalently P(X = x, Y = y) = P(X = x) P(Y = y).
Why are these equivalent? The above should hold for all x, y. Independence is symmetric and is written X ⊥ Y.
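Continuing the two-digit example, a small check (my own sketch) of the product condition over every (x, y) pair shows that X and Y in that example are not independent:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product([1, 2, 3, 4], repeat=2))   # the 16 two-digit numbers again
n = len(outcomes)

def prob(pred):
    """Probability of the set of outcomes satisfying pred, under the uniform measure."""
    return Fraction(sum(1 for d in outcomes if pred(d)), n)

# Independence requires P(X=x, Y=y) = P(X=x) P(Y=y) for ALL pairs (x, y).
for xv in (1, 2):
    for yv in (0, 1, 2):
        joint = prob(lambda d: len(set(d)) == xv and d.count(1) == yv)
        prod_marg = prob(lambda d: len(set(d)) == xv) * prob(lambda d: d.count(1) == yv)
        print((xv, yv), joint, prod_marg, joint == prod_marg)

# e.g. for (x, y) = (1, 2): P(X=1, Y=2) = 1/16 but P(X=1) P(Y=2) = (1/4)(1/16) = 1/64,
# so X and Y are NOT independent in this example.
```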

CI: Conditional Independence
X ⊥ Y | Z if, once Z is observed, knowing the value of Y does not change our belief about X. The following should hold for all x, y, z:
P(X=x | Z=z, Y=y) = P(X=x | Z=z)
P(Y=y | Z=z, X=x) = P(Y=y | Z=z)
P(X=x, Y=y | Z=z) = P(X=x | Z=z) P(Y=y | Z=z)
We call these factors: a very useful concept!
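Here is a minimal sketch, not from the slides, that constructs a joint distribution in which X ⊥ Y | Z holds by design (the numeric values are arbitrary) and verifies the factorization above:

```python
from itertools import product

# A small joint distribution constructed so that X ⊥ Y | Z holds by design:
# P(x, y, z) = P(z) P(x | z) P(y | z).  The numbers are arbitrary/illustrative.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}}
p_y_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.3, 1: 0.7}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in product([0, 1], repeat=3)}

def marg(pred):
    """Sum the joint probability over all (x, y, z) triples satisfying pred."""
    return sum(p for k, p in joint.items() if pred(*k))

# Check P(X=x, Y=y | Z=z) = P(X=x | Z=z) P(Y=y | Z=z) for every value combination.
for x, y, z in product([0, 1], repeat=3):
    p_zv = marg(lambda a, b, c: c == z)
    lhs = marg(lambda a, b, c: (a, b, c) == (x, y, z)) / p_zv
    rhs = (marg(lambda a, b, c: a == x and c == z) / p_zv) * \
          (marg(lambda a, b, c: b == y and c == z) / p_zv)
    assert abs(lhs - rhs) < 1e-12

# X and Y are only conditionally independent: marginally they are still dependent.
print(marg(lambda a, b, c: a == 1 and b == 1),                      # 0.345
      marg(lambda a, b, c: a == 1) * marg(lambda a, b, c: b == 1))  # 0.275
```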

Properties of CI
Symmetry: (X ⊥ Y | Z) ⟹ (Y ⊥ X | Z)
Decomposition: (X ⊥ Y, W | Z) ⟹ (X ⊥ Y | Z)
Weak union: (X ⊥ Y, W | Z) ⟹ (X ⊥ Y | Z, W)
Contraction: (X ⊥ W | Y, Z) & (X ⊥ Y | Z) ⟹ (X ⊥ Y, W | Z)
Intersection: (X ⊥ Y | W, Z) & (X ⊥ W | Y, Z) ⟹ (X ⊥ Y, W | Z)
  Only for positive distributions: P(α) > 0 for all α.

Monty Hall Problem
You're given the choice of three doors: behind one door is a car; behind the others, goats. You pick a door, say No. 1. The host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. Do you want to pick door No. 2 instead?

(Figure: the three cases. If your pick hides Goat A, the host must reveal Goat B; if your pick hides Goat B, the host must reveal Goat A; if your pick hides the car, the host reveals Goat A or Goat B.)

Monty Hall Problem: Bayes Rule
Let C_i denote the event that the car is behind door i, i = 1, 2, 3, and H_ij the event that the host opens door j after you pick door i.

Monty Hall Problem
WLOG, i = 1, j = 3. Then P(H_13 | C_1) = 1/2 (the host picks door 2 or 3 at random), P(H_13 | C_2) = 1 (the host must open door 3), and P(H_13 | C_3) = 0 (the host never reveals the car).

Monty Hall Problem: Bayes Rule cont.
P(H_13) = P(H_13 | C_1) P(C_1) + P(H_13 | C_2) P(C_2) + P(H_13 | C_3) P(C_3) = (1/2)(1/3) + (1)(1/3) + 0 = 1/2, so
P(C_1 | H_13) = P(H_13 | C_1) P(C_1) / P(H_13) = (1/2)(1/3) / (1/2) = 1/3 and
P(C_2 | H_13) = P(H_13 | C_2) P(C_2) / P(H_13) = (1)(1/3) / (1/2) = 2/3.

Since P(C_2 | H_13) = 2/3 > P(C_1 | H_13) = 1/3, you should switch!

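A quick simulation confirms the Bayes-rule answer; this sketch (my own, with an assumed door-numbering convention) estimates the win probability of the stay and switch strategies:

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    """Estimate the win probability of the stay/switch strategies by simulation."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens a door that is neither the contestant's pick nor the car.
        host = rng.choice([d for d in (0, 1, 2) if d != pick and d != car])
        if switch:
            # Switch to the remaining unopened door.
            pick = next(d for d in (0, 1, 2) if d != pick and d != host)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=False))  # ~1/3
print(monty_hall(switch=True))   # ~2/3
```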
Moments
Mean (Expectation): E[X]
  Discrete RVs: E[X] = Σ_x x P(X = x)
  Continuous RVs: E[X] = ∫ x p(x) dx
Variance: Var(X) = E[(X − E[X])^2]
  Discrete RVs: Var(X) = Σ_x (x − E[X])^2 P(X = x)
  Continuous RVs: Var(X) = ∫ (x − E[X])^2 p(x) dx

Properties of Moments
Mean: E[X + Y] = E[X] + E[Y]; E[aX] = a E[X]. If X and Y are independent, then E[XY] = E[X] E[Y].
Variance: Var(aX + b) = a^2 Var(X). If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y).
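These properties can be sanity-checked by sampling; the distributions below (an Exponential and a Gaussian) are arbitrary illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent random variables.
x = rng.exponential(scale=2.0, size=n)       # E[X] = 2, Var(X) = 4
y = rng.normal(loc=1.0, scale=3.0, size=n)   # E[Y] = 1, Var(Y) = 9

# Linearity of expectation holds regardless of independence.
print(np.mean(x + y), np.mean(x) + np.mean(y))   # both ~3
# For independent X, Y: E[XY] = E[X]E[Y] and Var(X + Y) = Var(X) + Var(Y).
print(np.mean(x * y), np.mean(x) * np.mean(y))   # both ~2
print(np.var(x + y), np.var(x) + np.var(y))      # both ~13
```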

The Big Picture
(Figure: Model and Data connected by two arrows: probability takes us from the model to data; estimation/learning takes us from data back to the model.)

Statistical Inference
Given observations from a model:
- What (conditional) independence assumptions hold? (Structure learning)
- If you know the family of the model (e.g., multinomial), what are the values of the parameters? MLE, Bayesian estimation. (Parameter learning)

MLE: Maximum Likelihood Estimation
Example on board: given N coin tosses, what is the coin bias θ? The MLE is θ̂ = N_h / (N_h + N_t).
Sufficient Statistics (SS): a useful concept that we will make use of later. In solving the above estimation problem we only cared about N_h and N_t; these are the SS of this model. All sequences of coin tosses that have the same SS will result in the same estimate of θ. Why is this useful?
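A small sketch of the coin-bias MLE; the true bias of 0.3 and the sample size are illustrative assumptions:

```python
import numpy as np

# Simulate N coin tosses with true bias theta = P(heads) = 0.3.
rng = np.random.default_rng(0)
theta_true = 0.3
tosses = rng.random(1000) < theta_true

# Sufficient statistics: the counts of heads and tails are all the MLE needs.
n_heads = int(tosses.sum())
n_tails = len(tosses) - n_heads

# MLE for the Bernoulli/binomial model: theta_hat = N_h / (N_h + N_t).
theta_mle = n_heads / (n_heads + n_tails)
print(n_heads, n_tails, theta_mle)   # theta_mle should be close to 0.3
```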

Statistical Inference
Given observations from a model:
- What (conditional) independence assumptions hold? (Structure learning)
- If you know the family of the model (e.g., multinomial), what are the values of the parameters? MLE, Bayesian estimation. (Parameter learning)
We need some concepts from information theory.

Information Theory
P(X) encodes our uncertainty about X. Some variables are more uncertain than others. How can we quantify this intuition? Entropy: the average number of bits required to encode X. (Figure: two example distributions P(X) and P(Y), one peaked and one flat, illustrating different amounts of uncertainty.)

Information Theory cont.
Entropy: the average number of bits required to encode X: H(X) = −Σ_x P(x) log2 P(x).
We can define conditional entropy similarly: H(X | Y) = −Σ_{x,y} P(x, y) log2 P(x | y).
We can also define a chain rule for entropies (not surprising): H(X, Y) = H(X) + H(Y | X).

Mutual Information: MI
Remember independence? If X ⊥ Y then knowing Y won't change our belief about X. Mutual information can help quantify this (it is not the only way, though):
I(X; Y) = Σ_{x,y} P(x, y) log2 [ P(x, y) / (P(x) P(y)) ] = H(X) − H(X | Y).
MI is symmetric, and I(X; Y) = 0 iff X and Y are independent.
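A minimal sketch (my own) of entropy and mutual information for discrete distributions, using the identity I(X;Y) = H(X) + H(Y) − H(X, Y); the example joint tables are arbitrary:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a distribution given as an array of probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x, y]."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(joint.ravel())

# Independent case: the joint is the outer product of the marginals -> MI = 0.
px = np.array([0.5, 0.5])
py = np.array([0.25, 0.75])
print(mutual_information(np.outer(px, py)))                     # ~0

# Perfectly correlated case: knowing Y determines X -> MI = H(X) = 1 bit.
print(mutual_information(np.array([[0.5, 0.0], [0.0, 0.5]])))   # 1.0
```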

Continuous Random Variables What if X is continuous? Probability density function (pdf) instead of probability mass function (pmf) A pdf is any function that describes the probability density in terms of the input variable x.

PDF
Properties of a pdf: p(x) ≥ 0 and ∫ p(x) dx = 1 over the whole range of X. Actual probabilities are obtained by integrating the pdf; e.g., the probability of X being between 0 and 1 is P(0 ≤ X ≤ 1) = ∫_0^1 p(x) dx.
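Since the transcript does not preserve the slide's actual density, here is a sketch using an Exponential(1) pdf as a stand-in, with SciPy's quad for the integrals:

```python
import math
from scipy.integrate import quad

# A hypothetical pdf for illustration: Exponential(1), p(x) = e^{-x} for x >= 0.
pdf = lambda x: math.exp(-x)

# The pdf integrates to 1 over its support...
total, _ = quad(pdf, 0, math.inf)
# ...and P(0 <= X <= 1) is the integral of the pdf over [0, 1].
p_01, _ = quad(pdf, 0, 1)

print(total)                    # ~1.0
print(p_01, 1 - math.exp(-1))   # both ~0.632
```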

Cumulative Distribution Function
F(x) = P(X ≤ x).
Discrete RVs: F(x) = Σ_{x' ≤ x} P(X = x').
Continuous RVs: F(x) = ∫_{−∞}^{x} p(t) dt.

Acknowledgment: Andrew Moore tutorial; Monty Hall problem.