CS6825: Probability Distributions: An Introduction



Recall Probability Properties
The probability of an event, say event A, is denoted P(A).
All probabilities are between 0 and 1 inclusive (i.e., 0 ≤ P(A) ≤ 1).
The sum of the probabilities of all possible outcomes must be 1.

Remember our Shoe Examples and the Probabilities
P(Nike) = 46/100 = 0.46
P(Adidas) = 24.5/100 = 0.245
P(Reebok) = 18.5/100 = 0.185
P(Asics) = 6.5/100 = 0.065
P(Other) = 4.5/100 = 0.045
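As a quick check, the two properties recalled earlier (each probability lies between 0 and 1, and all of them sum to 1) can be tested against the shoe numbers. The sketch below is only illustrative; the function name `is_valid_distribution` and the tolerance are assumptions, not part of the slides.

```python
import math

# Probabilities copied from the shoe example.
shoe_probs = {
    "Nike": 0.46,
    "Adidas": 0.245,
    "Reebok": 0.185,
    "Asics": 0.065,
    "Other": 0.045,
}


def is_valid_distribution(probs, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs.values())
    # Compare with a tolerance because floating-point sums are inexact.
    sums_to_one = math.isclose(sum(probs.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one


print(is_valid_distribution(shoe_probs))  # True
```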

Frequency distribution
To better understand the main features of the data, we portray the frequency distribution, which quickly reveals the shape of the data.
A frequency distribution is a grouping of data into mutually exclusive classes (categories for nominal data, ranks for ranked data, numerical ranges for interval and ratio data) showing the number of observations in each class.

Probability Distribution
A probability distribution is a frequency distribution in which the frequency is represented by a probability (on a scale of 0 to 1.0).
So, a probability distribution gives the chance of every event (if you have a discrete variable) or the probability across all ranges of events (if you have a continuous variable).

Probability Distribution of our Shoe Examples

Steps to build frequency distribution (Suggestions but not the rules)
1. Decide on the number of classes.
- Sometimes you have natural knowledge of your problem and can define the classes you care about, as in our shoe example: “Nike”, “Reebok”, etc.
- Sometimes you do not know the number of classes you need. For example: Network Router Working, Network Router Failed. But what about a router working at X% capacity? What about Y% and Z%?
- When you can't decide on the number of classes, a good rule of thumb is: to estimate the number of classes for n observations (the test samples you have), find the smallest number k such that 2^k > n.
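The rule of thumb for the number of classes can be sketched in a few lines. The function name `number_of_classes` is a hypothetical helper chosen for this illustration.

```python
def number_of_classes(n):
    """Return the smallest integer k such that 2**k > n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = 0
    # Keep increasing k until 2**k exceeds the number of observations.
    while 2 ** k <= n:
        k += 1
    return k


print(number_of_classes(30))  # 5, because 2**4 = 16 <= 30 < 32 = 2**5
```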

Steps to build frequency distribution
2. Define your classes: determine the class interval or width.
- Not all classes are represented by distinct values like “Nike” in our shoe example. Instead, they may have a continuous range of values. For example, think about a set of classes representing colors. You might have classes like “light blue”, “medium blue” and “dark blue”. How are these blue classes separated? The separation is referred to as the class interval or width, in this case with respect to the color wheel (or color space).
- Rule of thumb: the class interval should be greater than or equal to the difference between the highest and the lowest value divided by the number of classes, i.e. i ≥ (highest − lowest) / k. With no prior information, you can split a range of 0 to X between k classes in spans of X/k. Hence:
Class 1 = range 0 to X/k
Class 2 = range X/k to 2X/k
Class 3 = range 2X/k to 3X/k
Class 4 = range 3X/k to 4X/k
...
Class k = range (k-1)X/k to X
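The equal-span split of a range 0 to X into k classes could look like the following sketch. The helper name `equal_width_classes` and the sample values X = 100, k = 4 are assumptions made only for illustration.

```python
def equal_width_classes(x_max, k):
    """Split the range 0..x_max into k classes of width x_max / k.

    Returns a list of (lower, upper) pairs; class j (1-based) spans
    (j - 1) * x_max / k to j * x_max / k.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [((j - 1) * x_max / k, j * x_max / k) for j in range(1, k + 1)]


# Hypothetical values: X = 100 split into k = 4 classes.
for j, (lower, upper) in enumerate(equal_width_classes(100, 4), start=1):
    print(f"Class {j} = range {lower} to {upper}")
```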

Steps to build frequency distribution (Suggestions but not the rules)
3. Tally the observations (test data) into the appropriate classes.
4. Count the number of tallies (items) in each class.
If you are building a probability distribution (rather than just a frequency distribution), simply divide the number of tallies in each class by the total number of observations. This gives an estimated probability for each class.
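For classes that are distinct labels, steps 3 and 4 amount to counting labels and dividing by the total. The sketch below uses made-up router observations (the labels and counts are hypothetical, not data from the slides) and an assumed helper name `estimate_probabilities`.

```python
from collections import Counter

# Hypothetical observations: one class label per test sample.
observations = [
    "working", "working", "failed", "working",
    "degraded", "working", "failed", "working",
]


def estimate_probabilities(samples):
    """Tally samples per class and divide by the total number of samples."""
    counts = Counter(samples)          # steps 3 and 4: tally and count
    total = len(samples)
    return {label: count / total for label, count in counts.items()}


for label, prob in estimate_probabilities(observations).items():
    print(f"P({label}) = {prob}")
```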

EXAMPLE
Professor X wishes to prepare a report showing the number of hours per week students spend studying. She selects a random sample of 30 students and determines the number of hours each student studied last week.
Organize the data into a frequency distribution:
15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6.

Example
Step 1: Determine the number of classes.
There are 30 observations, so n = 30.
2^5 = 32 > 30
Let's start with the assumption of at least 5 classes, i.e., k = 5. (P.S. This may not be valid, but it is a place to start. There are much better ways than guessing like this; they are referred to as the estimation of k, or of the number of modes in a multi-modal distribution. This is an area of continued research.)

Example continued
Step 2: Define classes.
First determine the range of our possible class values.
Min = 10.3
Max = 33.8
Interval range i: i ≥ (33.8 − 10.3) / 5 = 4.7; round up the interval to 5.
Set the lower limit of the first class at 10 hours (could make it 10.3).
Class 1 = range 10 to 15
Class 2 = range 15 to 20
Class 3 = range 20 to 25
Class 4 = range 25 to 30
Class 5 = range 30 to 35
Again, this, like choosing k = 5, is a guess and may (will often) be wrong. There are much more informed techniques than this, but this one is simple for a beginner to understand.
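The interval arithmetic for this example can be reproduced as a short sketch. It assumes k = 5 and a lower limit of 10 hours, as chosen in the example; the variable names are illustrative only.

```python
import math

# Study hours for the 30 students in the example.
hours = [
    15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
    17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
    10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6,
]

k = 5             # number of classes chosen in step 1
lower_limit = 10  # lower limit of the first class, as chosen in the example

# Rule of thumb: interval >= (highest - lowest) / k, here about 4.7.
raw_interval = (max(hours) - min(hours)) / k
interval = math.ceil(raw_interval)  # rounded up to 5

# Class j (0-based) spans lower_limit + j*interval to lower_limit + (j+1)*interval.
classes = [
    (lower_limit + j * interval, lower_limit + (j + 1) * interval)
    for j in range(k)
]
print(interval, classes)
```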

Example continued
Steps 3 & 4: Place and count the observations.
Class 1 = 8 / 30, so P(Class 1) ≈ 0.267
Class 2 = 11 / 30, so P(Class 2) ≈ 0.367
Class 3 = 7 / 30, so P(Class 3) ≈ 0.233
Class 4 = 3 / 30, so P(Class 4) = 0.1
Class 5 = 1 / 30, so P(Class 5) ≈ 0.033
These counts place the boundary value 15.0 in Class 1.
TEST DATA: 15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6.
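The tally can be checked with a small sketch. One detail matters: the value 15.0 sits exactly on a class boundary. The counts 8 and 11 for Classes 1 and 2 are reproduced when a boundary value is placed in the lower class, which is the assumption made below; with half-open classes (15.0 placed in Class 2) the first two counts would be 7 and 12 instead. The helper names `class_index` and `tally` are hypothetical.

```python
import math

hours = [
    15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
    17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
    10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6,
]

LOWER_LIMIT = 10   # lower limit of Class 1
WIDTH = 5          # class interval
NUM_CLASSES = 5


def class_index(value, upper_inclusive=True):
    """Return the 0-based class index for a value."""
    offset = (value - LOWER_LIMIT) / WIDTH
    if upper_inclusive:
        # Assumption: a value on a boundary (e.g. 15.0) goes to the lower class.
        return max(math.ceil(offset) - 1, 0)
    # Alternative convention: a boundary value goes to the upper class.
    return math.floor(offset)


def tally(values, upper_inclusive=True):
    """Count how many values fall into each class."""
    counts = [0] * NUM_CLASSES
    for value in values:
        counts[class_index(value, upper_inclusive)] += 1
    return counts


counts = tally(hours)
probabilities = [count / len(hours) for count in counts]
for j, (count, prob) in enumerate(zip(counts, probabilities), start=1):
    print(f"Class {j}: {count}/{len(hours)} -> P = {prob:.3f}")
```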

Normal Distribution
If you have an outcome that is continuous and is caused by lots of independent factors, it can probably be modeled with a Normal (aka Gaussian) distribution. The Normal distribution looks bell-shaped and is described by two values, a mean and a variance.
Using a Normal function is a common technique. Sometimes a combination of Normal functions (called a mixture of Normals) is used. However, there are many techniques to figure this out automatically, as well as to use other functions to model your distribution. Note: even a single point can be a Normal function with 0 variance, even if this is possibly a foolish model.

Normal approximation
The actual distribution of baby weights at a hospital, approximated by a Normal distribution with mean = 3400 grams and variance = 360,000 grams² (standard deviation = 600 grams).
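To make the Normal model concrete, the sketch below evaluates the standard Normal density and cumulative distribution formulas with the mean and standard deviation quoted for the baby weights (the variance is the standard deviation squared). The function names and the 2800 to 4000 gram interval are choices made for this illustration only.

```python
import math

MEAN = 3400.0     # grams, from the example
STD_DEV = 600.0   # grams, from the example (variance = STD_DEV ** 2)


def normal_pdf(x, mean=MEAN, std_dev=STD_DEV):
    """Density of the Normal distribution at x."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean=MEAN, std_dev=STD_DEV):
    """Probability that a Normal variable is less than or equal to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std_dev * math.sqrt(2.0))))


# Probability of a weight within one standard deviation of the mean.
p_within_one_sd = normal_cdf(4000.0) - normal_cdf(2800.0)
print(round(p_within_one_sd, 4))  # 0.6827
```

Because the variable is continuous, probabilities are read off ranges of values (differences of the cumulative function) rather than single points.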