A REVIEW OF OCCUPANCY PROBLEMS AND THEIR APPLICATIONS WITH A MATLAB DEMO

Samuel Khuvis, Undergraduate
Nagaraj Neerchal, Professor of Statistics
Department of Mathematics and Statistics, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD

Abstract

Consider an experiment of randomly distributing r balls into n cells. Several easily described probability problems arise from this experiment. Obtaining the probability that no two adjacent cells are empty, finding the distribution of the number of balls occupying a given cell, and deriving the distribution of the smallest number of balls over all cells are a few examples of such problems, which are collectively referred to as occupancy problems. Solutions to some of these problems are non-trivial, and some naturally give rise to well-known probability distributions such as the binomial and multinomial distributions. Occupancy problems have found important applications in many areas. The Bose-Einstein and Fermi-Dirac statistics are the most celebrated examples of such applications. More recently, questions from genetics, involving non-randomness of occurrence of mutagen-induced mutations across loci, have also been connected to this general topic. In this poster, we provide a glimpse into the probability calculations underlying occupancy problems and demonstrate them using an interactive MATLAB program.

Fig. 1: A screenshot of the MATLAB demo used to visualize the occupancy problems. Six different operations may be selected on the right.

Fig. 2: Four realizations, generated by the MATLAB demo, of an experiment in which 5 balls are thrown into 4 cells.
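The experiment behind Fig. 2 (5 balls thrown into 4 cells) can be sketched in a few lines. The following is a minimal Python sketch, not the poster's MATLAB program; the function names `throw_balls` and `realizations` and the `seed` parameter are hypothetical and introduced only for illustration.

```python
import random


def throw_balls(r, n, rng):
    """Throw r balls into n cells uniformly at random; return the n cell counts."""
    counts = [0] * n
    for _ in range(r):
        cell = rng.randrange(n)  # every cell is equally likely for every ball
        counts[cell] += 1
    return counts


def realizations(r, n, trials, seed=0):
    """Generate `trials` independent realizations of the experiment."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [throw_balls(r, n, rng) for _ in range(trials)]


if __name__ == "__main__":
    # Four realizations of 5 balls in 4 cells, as in Fig. 2.
    for counts in realizations(5, 4, 4):
        print(counts, "minimum over cells:", min(counts))
```

Each realization is a list of cell counts that sums to r, so quantities such as the number of balls in a given cell or the minimum over all cells can be read directly from it.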
Basic Calculations Concerning the Occupancy Problem

Each of the r balls can land in any of the n cells, so the sample space S has size
$|S| = \underbrace{n \cdot n \cdots n}_{r \text{ factors}} = n^r$

In the program, realizations were generated by:

    For i = 1 to number of balls
        Generate a random number from 1 to the number of cells, with each number having a uniform probability of occurring
    End

Conclusion

This has been only a basic introduction to occupancy problems; many other calculations can be based on the experiment of throwing r balls into n cells. These problems have many applications in the natural sciences, especially in physics, where more complex calculations are able to explain the behavior of elementary particles. Using the simulation method, we can begin to understand the probability distributions that arise from these models.

Binomial Distributions

-A binomial distribution describes the results of an experiment in which:
1. There is a sequence of n trials, where n is fixed in advance
2. Each trial results in one of two possible outcomes, denoted as either a success or a failure
3. The trials are independent, so the outcome of any particular trial does not influence the outcome of any other trial
4.
The probability of success (p) is constant from trial to trial
-The probability of x successes is $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, for $x = 0, 1, \dots, n$

Binomial Distribution of 1st Cell

T = number of balls in cell 1, where T is a random variable
t = 0, 1, 2, …, r, where r is the number of balls being thrown
$\{T = t\} = \{(x_1, \dots, x_r) : \text{exactly } t \text{ of the } x_i\text{'s are 1 and the others are not 1}\}$, where $x_i$ is the cell that receives ball i
Number of outcomes in $\{T = t\}$: $\binom{r}{t}(n-1)^{r-t}$
$P(T = t) = \binom{r}{t}(n-1)^{r-t} / n^r = \binom{r}{t}\left(\tfrac{1}{n}\right)^t\left(1 - \tfrac{1}{n}\right)^{r-t}$
So T has a binomial distribution: $T \sim \mathrm{Bin}(r, 1/n)$

Multinomial Distribution

-A multinomial distribution is similar to a binomial, except that instead of 2 possible outcomes there are more than 2
-Let $T_1$ = number of balls in cell 1 and $T_2$ = number of balls in cell 2
-The third outcome is a ball going into a cell other than cell 1 or cell 2
-$(T_1, T_2) \sim \mathrm{Multinomial}(r;\ 1/n,\ 1/n,\ (n-2)/n)$

Minimum Calculations

Y = minimum number of balls occurring in any cell
So, $P(Y = 0) = P(\text{at least one cell is empty})$. For Y > 0, it is non-trivial to calculate $P(Y = y)$ without the use of a simulation.

Statistical Mechanics

-The phase space is subdivided into n small regions, or cells, and r indistinguishable particles are randomly distributed among these cells
-It would seem that all arrangements are equally probable; however, physicists have shown that this is not the case. Instead, two statistics describe the behavior of particles: Fermi-Dirac statistics and Bose-Einstein statistics

Fermi-Dirac Statistics

-In this model, no two particles may be in the same cell, and all distinguishable arrangements have equal probabilities
-This means that r ≤ n, and an arrangement is determined by choosing which r of the n cells contain a particle. Each arrangement has probability $\binom{n}{r}^{-1}$. This model describes the behavior of electrons, protons, and neutrons.
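The calculations in this part of the poster can be checked numerically. The following is a minimal Python sketch under stated assumptions, not the poster's MATLAB code: it evaluates the binomial probabilities for cell 1, estimates the distribution of the minimum Y by simulation, and computes the probability of a single arrangement when no two particles may share a cell. The function names and the `seed` parameter are hypothetical.

```python
import random
from math import comb


def cell1_pmf(t, r, n):
    """P(T = t) for T ~ Bin(r, 1/n), computed as C(r, t) (n-1)^(r-t) / n^r."""
    return comb(r, t) * (n - 1) ** (r - t) / n ** r


def estimate_min_pmf(r, n, trials, seed=0):
    """Estimate P(Y = y) by simulation, where Y is the minimum cell count."""
    rng = random.Random(seed)
    tally = {}
    for _ in range(trials):
        counts = [0] * n
        for _ in range(r):
            counts[rng.randrange(n)] += 1  # uniform choice of cell
        y = min(counts)
        tally[y] = tally.get(y, 0) + 1
    # Convert counts of each observed minimum into relative frequencies.
    return {y: c / trials for y, c in sorted(tally.items())}


def fermi_dirac_probability(r, n):
    """Probability of one arrangement with at most one particle per cell."""
    if r > n:
        raise ValueError("at most one particle per cell requires r <= n")
    return 1 / comb(n, r)


if __name__ == "__main__":
    print([round(cell1_pmf(t, 5, 4), 4) for t in range(6)])
    print(estimate_min_pmf(50, 25, 10_000))
    print(fermi_dirac_probability(2, 4))
```

The simulation route is used for Y because, as noted above, only the case Y = 0 has a simple event description; the exact formulas are used wherever the poster gives a closed form.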
Bose-Einstein Statistics

-In this model, each distinguishable arrangement is given a probability of $\binom{n+r-1}{r}^{-1}$
-This has been shown experimentally to describe the behavior of photons, nuclei, and atoms that have an even number of elementary particles

Population Genetics

-Since genetic data is often analyzed through categorical observations, the computation of expected frequencies under different genetic models can be described using occupancy problems
-These computations are important in genetics when testing the non-randomness of mutagen-induced mutations across loci
-The occupancy problem is applied to these analyses to combinatorially solve the problem of an inadequate sample size
-In this application, r is the size of the random sample and n is the number of classes being analyzed in the sample

MATLAB Demo

Function 1: Generates one realization at a time for a given number of balls and cells.
Function 2: Simulates a large number of realizations and empirically computes probabilities.
Function 3: Allows the user to change the number of balls and cells.

Output 1: One arrangement of 50 balls and 25 cells
Output 2: Randomly generates birthdays of 50 people
Output 3: Displays the number of balls in each cell over 1,000 realizations
Output 4: Displays the minimum number of balls for each of 10,000 realizations, where each arrangement has the 50 balls and 25 cells selected by the user
Output 5: Displays the distribution of the balls in the first cell over 1,000 realizations
Output 6: Shows how many days have a certain number of births in common

Acknowledgments: I would like to thank Andrew Raim and all of the members of CIRC for their help.

References

Feller, William. An Introduction to Probability Theory and Its Applications. New York: John Wiley & Sons.
Chakraborty, Ranajit. "A Class of Population Genetic Questions Formulated as the Generalized Occupancy Problem." Genetics Society of America (1993).