OPIM 5103 Descriptive Statistics Random Sampling Intro to Probability and Discrete Distributions Jan Stallaert Professor of OPIM.


1 OPIM 5103 Descriptive Statistics Random Sampling Intro to Probability and Discrete Distributions Jan Stallaert Professor of OPIM

2 Median

3 Measures of Central Tendency
Average, Median, Mode, Geometric Mean

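4 Mean (Arithmetic Mean)
Mean (arithmetic mean) of data values
–Sample mean: \( \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \), where n is the sample size
–Population mean: \( \mu = \frac{1}{N} \sum_{i=1}^{N} X_i \), where N is the population size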

5 Mean (Arithmetic Mean)
The most common measure of central tendency
Affected by extreme values (outliers)
[Figure: two dot plots on number lines, one with Mean = 5 and one, stretched by an outlier, with Mean = 6]
Excel function: =average(range)

6 Median
Robust measure of central tendency
Not affected by extreme values
In an ordered array, the median is the “middle” number
[Figure: two dot plots on number lines, one containing an outlier; Median = 5]
Excel function: =median(range)
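The contrast between the mean and the median can be checked with a few lines of Python. This is a minimal sketch: the two data sets are illustrative values (not taken from the slides), chosen so that they reproduce the means of 5 and 6 and the median of 5 quoted on the slides.

```python
from statistics import mean, median

# Hypothetical data sets (not from the slides). The second one replaces
# the largest value with an outlier.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]

mean_without = mean(without_outlier)
mean_with = mean(with_outlier)
median_without = median(without_outlier)
median_with = median(with_outlier)

# The outlier pulls the mean upward but leaves the median unchanged.
print(f"without outlier: mean = {mean_without}, median = {median_without}")
print(f"with outlier:    mean = {mean_with}, median = {median_with}")
```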

7 Measures of Variation
Range
Interquartile Range
Variance (Population Variance, Sample Variance)
Standard Deviation (Population Standard Deviation, Sample Standard Deviation)
Coefficient of Variation

8 Example

9

10 Range
Measure of variation
Difference between the largest and the smallest observations: Range = X largest - X smallest
Ignores the way in which data are distributed
[Figure: two differently distributed data sets on a number line from 7 to 12, each with Range = 12 - 7 = 5]

11 Quartiles
Split ordered data into 4 quarters of 25% each
Q2 = Median, a measure of central tendency
Excel function: =quartile(range, number)
number = 0: minimum value
number = 1: Q1
…
number = 4: maximum value

12 Interquartile Range
Measure of spread/dispersion
Also known as midspread
–Spread in the middle 50%
Difference between the third and first quartiles: IQR = Q3 - Q1
Not affected by extreme values
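Range, quartiles, and interquartile range can be sketched in Python as follows. The data are the six values 7 through 12 from the Range slide, treated here as observations (an assumption, since they may only be axis labels). The choice of `method="inclusive"` is also an assumption: it interpolates between order statistics and is intended to mirror Excel's =quartile behavior.

```python
from statistics import quantiles

# Values 7..12 from the Range slide, assumed here to be the observations.
data = [7, 8, 9, 10, 11, 12]

# Range: largest minus smallest observation.
data_range = max(data) - min(data)

# Quartiles: three cut points that split the ordered data into four quarters.
# "inclusive" interpolation is an assumption about the intended convention.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print(f"range = {data_range}")
print(f"Q1 = {q1}, median (Q2) = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}")
```

Other quartile conventions exist (for example `method="exclusive"`), so small differences between tools are expected on short data sets.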

13 Variance
Important measure of variation
Shows variation about the mean
–Sample variance: \( S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \), the “average of squared deviations from the mean” (dividing by n - 1 rather than n)
“Standard deviation” = square root of variance

14 Excel functions
Variance: =VAR(range)
Standard Deviation: =STDEV(range)
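As a rough Python counterpart to the Excel functions, the sketch below computes the sample variance by hand (dividing by n - 1) and compares it with the standard library. The data set is hypothetical and only serves to make the arithmetic easy to follow.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical sample (not from the slides).
data = [1, 3, 5, 7, 9]

x_bar = mean(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - x_bar) ** 2 for x in data]
manual_variance = sum(squared_deviations) / (len(data) - 1)

# Standard deviation: square root of the variance.
manual_stdev = math.sqrt(manual_variance)

# statistics.variance and statistics.stdev also use the n - 1 divisor.
library_variance = variance(data)
library_stdev = stdev(data)

print(f"variance: manual = {manual_variance}, library = {library_variance}")
print(f"std dev:  manual = {manual_stdev:.4f}, library = {library_stdev:.4f}")
```

For a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by N instead.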

15 Comparing Standard Deviations
[Figure: three dot plots on a number line from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

16 Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean: \( CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \)
Is used to compare two or more sets of data measured in different units

17 Comparing Coefficient of Variation
Stock A:
–Average price last year = $50
–Standard deviation = $5
Stock B:
–Average price last year = $100
–Standard deviation = $5
Coefficient of variation:
–Stock A: CV = ($5 / $50) × 100% = 10%
–Stock B: CV = ($5 / $100) × 100% = 5%
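The stock comparison can be reproduced with a small helper. This is a minimal sketch; the function name `coefficient_of_variation` is illustrative, and the inputs are the averages and standard deviations given on the slide.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

# Figures from the slide: both stocks have a $5 standard deviation.
cv_a = coefficient_of_variation(std_dev=5, mean=50)
cv_b = coefficient_of_variation(std_dev=5, mean=100)

print(f"Stock A: CV = {cv_a}%")
print(f"Stock B: CV = {cv_b}%")
```

Although both stocks have the same standard deviation, Stock A varies twice as much relative to its price.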

18 Exploratory Data Analysis
Box-and-whisker plot
–Graphical display of data using 5-number summary: X smallest, Q1, Median (Q2), Q3, X largest
[Figure: box-and-whisker plot on an axis from 4 to 12]

19 Coefficient of Correlation Measures the strength of the linear relationship between two quantitative variables

20 Features of Correlation Coefficient
Unit free
Ranges between –1 and 1
The closer to –1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker any linear relationship
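These features can be illustrated by computing the coefficient directly. The sketch below implements the usual Pearson formula from scratch; the function name `pearson_r` and all data are hypothetical, picked to hit the extreme cases.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data (not from the slides).
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # exact increasing line
falling = [10, 8, 6, 4, 2]   # exact decreasing line
v_shaped = [2, 1, 0, 1, 2]   # clear pattern, but not a linear one

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_v_shaped = pearson_r(x, v_shaped)

print(f"rising:   r = {r_rising:.2f}")
print(f"falling:  r = {r_falling:.2f}")
print(f"v-shaped: r = {r_v_shaped:.2f}")
```

The V-shaped case is a reminder that r measures only linear association: the variables are clearly related, yet r is essentially 0.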

21 Scatter Plots of Data with Various Correlation Coefficients
[Figure: five scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = 0.6, and r = 1]

22 Producing Data Sampling methods Survey Errors

23 Probability Sampling
Subjects of the sample are chosen based on known probabilities
Probability Samples: Simple Random, Systematic, Stratified, Cluster

24 Simple Random Samples Every individual or item from the frame has an equal chance of being selected Selection may be with replacement or without replacement Samples obtained from table of random numbers or computer random number generators
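A computer random number generator makes both selection schemes easy to try. The following is a minimal sketch, assuming a hypothetical frame of 64 numbered individuals and a fixed seed (used only so the run is repeatable).

```python
import random

# Fixed seed for reproducibility only; a real study would not hard-code it.
rng = random.Random(5103)

# Hypothetical frame: individuals numbered 1 through 64.
frame = list(range(1, 65))

# Without replacement: no individual can be selected twice.
without_replacement = rng.sample(frame, k=8)

# With replacement: the same individual may appear more than once.
with_replacement = rng.choices(frame, k=8)

print("without replacement:", sorted(without_replacement))
print("with replacement:   ", sorted(with_replacement))
```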

25 Random Samples

26 Systematic Samples
Decide on sample size: n
Divide frame of N individuals into groups of k individuals: k = N/n
Randomly select one individual from the 1st group
Select every k-th individual thereafter
[Figure: frame with N = 64, n = 8, k = 8, first group highlighted]
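The systematic procedure can be sketched as below, using the slide's N = 64 and n = 8. The function name `systematic_sample` and the numbered frame are illustrative, and the sketch assumes N is an exact multiple of n.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick one random member of the first group, then every k-th after it."""
    k = len(frame) // n          # group size; assumes len(frame) is a multiple of n
    start = rng.randrange(k)     # random position within the first group
    return frame[start::k][:n]

rng = random.Random(5103)        # fixed seed for reproducibility only
frame = list(range(1, 65))       # hypothetical frame, N = 64

sample = systematic_sample(frame, n=8, rng=rng)
print("systematic sample:", sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined.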

27 Stratified Samples Population divided into two or more groups according to some common characteristic Simple random sample selected from each group The two or more samples are combined into one
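A stratified sample might be drawn as in the sketch below. The frame, the two strata ("undergraduate" and "graduate"), and the 10% sampling fraction per stratum are all assumptions made for illustration; the slide does not specify how many individuals to take from each group.

```python
import random

rng = random.Random(5103)  # fixed seed for reproducibility only

# Hypothetical frame of (id, stratum) pairs: 40 undergraduates, 20 graduates.
frame = [(i, "undergraduate" if i <= 40 else "graduate") for i in range(1, 61)]

# Step 1: divide the population into groups by the common characteristic.
strata = {}
for person_id, group in frame:
    strata.setdefault(group, []).append(person_id)

# Step 2: draw a simple random sample from each group
# (10% of each stratum here, an assumed allocation).
per_stratum = {
    group: rng.sample(members, k=len(members) // 10)
    for group, members in strata.items()
}

# Step 3: combine the per-group samples into one sample.
combined = [pid for chosen in per_stratum.values() for pid in chosen]

print(per_stratum)
print("combined sample size:", len(combined))
```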

28 Advantages and Disadvantages
Simple random sample and systematic sample
–Simple to use
–May not be a good representation of the population’s underlying characteristics
Stratified sample
–Ensures representation of individuals across the entire population
Cluster sample
–More cost effective
–Less efficient (need larger sample to acquire the same level of precision)

29 Key Definitions A population (universe) is the collection of things under consideration A sample is a portion of the frame selected for analysis A parameter is a summary measure computed to describe a characteristic of the population A statistic is a summary measure computed to describe a characteristic of the sample

30 Population and Sample
Population: use parameters to summarize features
Sample: use statistics to summarize features
Inference on the population from the sample

31 Reasons for Drawing a Sample Less time consuming than a census Less costly to administer than a census Less cumbersome and more practical to administer than a census of the targeted population

32 Evaluating Survey Worthiness What is the purpose of the survey? Is the survey based on a probability sample? Coverage error – appropriate frame Nonresponse error – follow up Measurement error – good questions elicit good responses Sampling error – always exists when sample ≠ population

33 Types of Survey Errors
Coverage error: excluded from frame
Nonresponse error: follow up on nonresponses
Sampling error: chance differences from sample to sample
Measurement error: bad question!

34 Measurement Errors Question Phrasing Avoid negations Telescoping Effect “Halo” Effect Overzealous/Underzealous

35 Probability

36 Probability is the numerical measure of the likelihood that an event will occur
Value is between 0 and 1
Sum of the probabilities of all mutually exclusive and collectively exhaustive events is 1
[Figure: probability scale running from 0 (impossible) through 0.5 to 1 (certain)]

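37 Computing Probabilities
The probability of an event E: \( P(E) = \frac{\text{number of outcomes in } E}{\text{total number of outcomes in the sample space}} \)
Each of the outcomes in the sample space is equally likely to occur
e.g. P(one die shows 6 and the other shows 4) = 2/36
(There are 2 ways to get one 6 and the other 4)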
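The two-dice example can be verified by listing the whole sample space and counting. This is a small sketch assuming two fair six-sided dice, so that all 36 ordered outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of rolling two six-sided dice: (1, 1), (1, 2), ..., (6, 6).
sample_space = list(product(range(1, 7), repeat=2))

# Event: one die shows 6 and the other shows 4, in either order.
event = [roll for roll in sample_space if sorted(roll) == [4, 6]]

# Equally likely outcomes: favorable outcomes divided by total outcomes.
probability = Fraction(len(event), len(sample_space))

print("favorable outcomes:", event)
print("total outcomes:", len(sample_space))
print("probability:", probability)
```

`Fraction` reduces 2/36 to its lowest terms, 1/18, which is the same value.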

38 Empirical Probability
Example: Find the probability that a randomly selected person will be struck by lightning this year. The sample space consists of two simple events: the person is struck by lightning or is not. Because these simple events are not equally likely, we can use the relative frequency approximation (Rule 1) or subjectively estimate the probability (Rule 3). Using Rule 1, we can research past events to determine that in a recent year 377 people were struck by lightning in the US, which has a population of about 274,037,295. Therefore, P(struck by lightning in a year) ≈ 377 / 274,037,295 ≈ 1/727,000

39 Computing Joint Probability The probability of a joint event, A and B:

40 Computing Compound Probability Probability of a compound event, A or B:

41 Compound Probability (Addition Rule)
P(A or B) = P(A) + P(B) - P(A and B)
For Mutually Exclusive Events: P(A or B) = P(A) + P(B)
[Figure: Venn diagram with regions P(A) and P(B) and their overlap P(A and B)]
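The addition rule can be checked by brute-force counting. The sketch below reuses a two-dice sample space; the events A ("first die shows 6") and B ("second die shows 4") are hypothetical choices, not taken from the slide.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes), all outcomes equally likely."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

def event_a(roll):
    return roll[0] == 6   # hypothetical event A: first die shows 6

def event_b(roll):
    return roll[1] == 4   # hypothetical event B: second die shows 4

p_a = prob(event_a)
p_b = prob(event_b)
p_a_and_b = prob(lambda roll: event_a(roll) and event_b(roll))
p_a_or_b = prob(lambda roll: event_a(roll) or event_b(roll))

print(f"P(A) = {p_a}, P(B) = {p_b}, P(A and B) = {p_a_and_b}")
print(f"P(A or B) counted directly = {p_a_or_b}")
print(f"P(A) + P(B) - P(A and B)   = {p_a + p_b - p_a_and_b}")
```

Subtracting P(A and B) avoids counting the outcome (6, 4) twice, since it belongs to both events.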

42 Computing Conditional Probability
The probability of event A given that event B has occurred: \( P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \)

43 Conditional Probability
           American   Int’l   Total
Men          0.25     0.15    0.40
Women        0.45     0.15    0.60
Total        0.70     0.30
Q: What is the probability that a randomly selected student is American, knowing that the student is female?
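One way to work out the answer is to divide the joint probability by the marginal probability of the condition. The sketch below uses the table's values; exact fractions are used only to avoid floating-point rounding, and the variable names are illustrative.

```python
from fractions import Fraction

# Joint probabilities from the table, keyed by (gender, nationality).
joint = {
    ("men", "american"): Fraction("0.25"),
    ("men", "international"): Fraction("0.15"),
    ("women", "american"): Fraction("0.45"),
    ("women", "international"): Fraction("0.15"),
}

# Marginal probabilities: sum the joint probabilities over the other variable.
p_women = sum(p for (gender, _), p in joint.items() if gender == "women")
p_american = sum(p for (_, nation), p in joint.items() if nation == "american")

# Conditional probability: P(American | Woman) = P(American and Woman) / P(Woman).
p_american_given_woman = joint[("women", "american")] / p_women

print(f"P(Woman) = {float(p_women)}")
print(f"P(American) = {float(p_american)}")
print(f"P(American | Woman) = {float(p_american_given_woman)}")
```

Knowing the student is female raises the probability of "American" from 0.70 to 0.75, so the two characteristics are not independent in this table.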

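44 Conditional Probability and Joint Probability
Conditional probability: \( P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \)
Multiplication rule for joint probability: \( P(A \text{ and } B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \)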

45 Conditional Probability and Statistical Independence (continued)
Events A and B are independent if \( P(A \mid B) = P(A) \), or equivalently \( P(A \text{ and } B) = P(A)\,P(B) \)
Events A and B are independent when the probability of one event, A, is not affected by another event, B

46 Example
A company has two suppliers, A and B. Rush orders are placed with both. If no raw material arrives in 4 days, the process shuts down.
–A can deliver within 4 days with 55% probability.
–B can deliver within 4 days with 35% probability.
1. What is the probability that both A and B deliver within 4 days?
2. What is the probability the process shuts down?
3. What is the probability at least one delivers in 4 days?
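The three questions can be worked through as follows, under the key assumption (not stated on the slide) that the two suppliers deliver independently of each other. Exact fractions are used to keep the arithmetic free of rounding.

```python
from fractions import Fraction

p_a = Fraction(55, 100)  # A delivers within 4 days
p_b = Fraction(35, 100)  # B delivers within 4 days

# ASSUMPTION: A and B deliver independently, so joint probabilities multiply.

# 1. Both deliver within 4 days.
p_both = p_a * p_b

# 2. The process shuts down: neither supplier delivers in time.
p_shutdown = (1 - p_a) * (1 - p_b)

# 3. At least one delivers: the complement of "neither delivers".
p_at_least_one = 1 - p_shutdown

print(f"1. P(both deliver)     = {float(p_both)}")
print(f"2. P(shutdown)         = {float(p_shutdown)}")
print(f"3. P(at least one)     = {float(p_at_least_one)}")
```

Question 3 can also be answered with the addition rule, P(A) + P(B) - P(A and B), which gives the same value.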

47 Stock Trader’s Almanac
1998 Stock Trader’s Almanac has 48 years of data (1950-1997)
Stocks up in January: 31 times
Stocks up in year: 36 times
Stocks up in January AND year: 29 times
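One question these counts can address is whether an up January tells us anything about the full year. The slide does not state the question, so the sketch below is one plausible reading: it treats the 48 years as relative frequencies and compares conditional and unconditional probabilities.

```python
from fractions import Fraction

years = 48
up_january = 31
up_year = 36
up_both = 29

# Unconditional probability that stocks are up for the year.
p_year_up = Fraction(up_year, years)

# Conditional probability of an up year given an up January.
p_year_up_given_jan_up = Fraction(up_both, up_january)

# Conditional probability of an up year given a down January.
down_january = years - up_january
up_year_after_down_jan = up_year - up_both
p_year_up_given_jan_down = Fraction(up_year_after_down_jan, down_january)

# Independence check: does P(Jan up and year up) equal P(Jan up) * P(year up)?
p_joint = Fraction(up_both, years)
p_product = Fraction(up_january, years) * p_year_up
independent = p_joint == p_product

print(f"P(year up)            = {float(p_year_up):.3f}")
print(f"P(year up | Jan up)   = {float(p_year_up_given_jan_up):.3f}")
print(f"P(year up | Jan down) = {float(p_year_up_given_jan_down):.3f}")
print(f"independent? {independent}")
```

In this data the year was up far more often after an up January than after a down one, so the two events do not look independent.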

48 Binomial Probability Distribution
‘n’ identical trials
–e.g.: 15 tosses of a coin; ten light bulbs taken from a warehouse
Two mutually exclusive outcomes on each trial
–e.g.: Head or tail in each toss of a coin; defective or not defective light bulb
Trials are independent
–The outcome of one trial does not affect the outcome of the others
Constant probability for each trial
–e.g.: Probability of getting a tail is the same each time we toss the coin

49 Excel’s Binomial Function
=BINOMDIST(no. of successes, no. of trials, prob. of success, cumulative?)
Example
=BINOMDIST(2,8,0.5,FALSE) (=0.11)
“Probability of tossing (exactly) two heads in 8 trials”
=BINOMDIST(2,8,0.5,TRUE) (=0.14)
“Probability of tossing two or fewer heads in 8 trials”
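The two Excel results can be reproduced from the binomial formula itself. This is a minimal sketch; the function names `binomial_pmf` and `binomial_cdf` are illustrative stand-ins for the FALSE and TRUE settings of the cumulative argument.

```python
import math

def binomial_pmf(successes, trials, p):
    """P(exactly `successes` successes in `trials` independent trials)."""
    ways = math.comb(trials, successes)  # number of ways to place the successes
    return ways * p ** successes * (1 - p) ** (trials - successes)

def binomial_cdf(successes, trials, p):
    """P(at most `successes` successes): sum of the pmf from 0 up to `successes`."""
    return sum(binomial_pmf(k, trials, p) for k in range(successes + 1))

exactly_two = binomial_pmf(2, 8, 0.5)   # like cumulative? = FALSE
at_most_two = binomial_cdf(2, 8, 0.5)   # like cumulative? = TRUE

print(f"P(exactly 2 heads in 8 tosses) = {exactly_two:.2f}")
print(f"P(at most 2 heads in 8 tosses) = {at_most_two:.2f}")
```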

50 Binomial Setting Examples
Number of times newspaper arrives on time (i.e., before 7:30 AM) in a week/month
Number of times I roll “5” on a die in 20 rolls
Number of times I toss heads within 20 trials
Students pick random number between 1 and 10. Number of students who picked “7”
Number of people who will vote “Republican” in a group of 20
Number of left-handed people in a group of 40

51 Service Center Staffing

52 Poisson Distribution
Poisson Process:
–Discrete events in an “interval”
The probability of One Success in an interval is stable
The probability of More than One Success in this interval is 0
–The probability of success is independent from interval to interval
–e.g.: number of customers arriving in 15 minutes
–e.g.: number of defects per case of light bulbs
\( P(X = x \mid \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!} \)

53 Excel’s Poisson Function
=POISSON(no. of occurrences, mean, cumulative?)
Example
=POISSON(5,2,FALSE) (=0.036)
“Probability that (exactly) five customers arrive within an hour when the overall average is two”
=POISSON(5,2,TRUE) (=0.983)
“Probability that five or fewer customers arrive within an hour when the overall average is two”
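The same two values can be obtained directly from the Poisson probability formula. This is a minimal sketch; `poisson_pmf` and `poisson_cdf` are illustrative names corresponding to the FALSE and TRUE settings of the cumulative argument.

```python
import math

def poisson_pmf(x, mean):
    """P(exactly x occurrences) when the average number of occurrences is `mean`."""
    return math.exp(-mean) * mean ** x / math.factorial(x)

def poisson_cdf(x, mean):
    """P(at most x occurrences): sum of the pmf from 0 up to x."""
    return sum(poisson_pmf(k, mean) for k in range(x + 1))

exactly_five = poisson_pmf(5, 2)   # like cumulative? = FALSE
at_most_five = poisson_cdf(5, 2)   # like cumulative? = TRUE

print(f"P(exactly 5 arrivals, mean 2) = {exactly_five:.3f}")
print(f"P(at most 5 arrivals, mean 2) = {at_most_five:.3f}")
```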

54 Poisson Setting Examples
Number of accidents at an intersection in 6 months
Number of people entering a bank in a 30-minute interval
Number of kids ringing the doorbell in 30 minutes for Halloween
Number of times a Microsoft machine crashes within 24 hours
Number of sewing flaws per (100) garment(s)

55 Halloween

