Statistical model for count data Speaker: Tzu-Chun Lo Advisor: Yao-Ting Huang
Outline Why use a statistical model Target ▫Gene expression Binomial distribution ▫Poisson distribution Overdispersion Negative binomial ▫Chi-square approximation Conclusion
Statistical model A statistical model is a probability distribution constructed to enable inferences to be drawn or decisions made from data. Diagram: population, sample, inference, make a decision (hypothesis testing); roles: designer, consumer. We have to choose a statistical model for the sample (mean, variance, size).
Target Gene expression ▫We would like to use a statistical model to test whether an observed difference in read counts is significant. Figure annotations: this looks like a significant region. How about this one? Can we be sure? Is it noise or not?
Count data A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3,...}, and where these integers arise from counting rather than ranking. An individual piece of count data is often termed a count variable. The binomial, Poisson, and negative binomial distributions all model this type of data.
Binomial distribution
33 goals in 110 shots this season. Success: 0.3 Fail: 0.7 What is the probability that he scores 6 goals in 10 shots?
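The question above can be worked through with the binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The following is a minimal sketch, assuming each shot is independent with the same success probability; the function name `binomial_pmf` is illustrative and not from the slides.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Success probability estimated from the season record: 33 goals in 110 shots.
p_success = 33 / 110

# Probability of exactly 6 goals in 10 shots.
prob = binomial_pmf(6, 10, p_success)
print(f"P(X = 6) = {prob:.4f}")  # prints "P(X = 6) = 0.0368"
```

Under these assumptions, scoring 6 goals in 10 shots has a probability of roughly 3.7%.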
Binomial distribution
Poisson distribution
e = …
Figure: goals per game, number of games observed (raw data) compared with the Poisson fit.
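A comparison of this kind can be sketched as follows. The per-game goal counts below are hypothetical, invented only for illustration and not taken from the slides; the rate is estimated as the sample mean, which is one common choice.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of observing k events when the expected count is lam."""
    return exp(-lam) * lam**k / factorial(k)

# Hypothetical data: goals scored in each of 20 games (not from the slides).
goals_per_game = [0, 1, 0, 2, 1, 0, 0, 1, 3, 0, 1, 0, 2, 0, 1, 1, 0, 0, 2, 1]
n_games = len(goals_per_game)

# Estimate the Poisson rate as the mean number of goals per game.
lam = sum(goals_per_game) / n_games

# Compare observed game counts with the counts the Poisson model expects.
for k in range(max(goals_per_game) + 1):
    observed = goals_per_game.count(k)
    expected = n_games * poisson_pmf(k, lam)
    print(f"{k} goals: observed {observed} games, Poisson expects {expected:.1f}")
```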
Overdispersion The presence of greater variability (statistical dispersion) in a data set than would be expected based on a given simple statistical model.
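One simple way to check for overdispersion relative to a Poisson model is the variance-to-mean ratio, since a Poisson distribution has variance equal to its mean. The sketch below assumes that reference model; the counts are hypothetical and the name `dispersion_index` is illustrative.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return variance(counts) / mean(counts)

# Hypothetical read counts (not from the slides).
counts = [0, 0, 1, 0, 7, 0, 2, 12, 0, 1, 0, 9]

ratio = dispersion_index(counts)
print(f"variance / mean = {ratio:.2f}")
if ratio > 1:
    print("more variable than a Poisson model would predict")
```

A ratio well above 1 suggests that a Poisson model understates the variability and that a model with an extra dispersion parameter may fit better.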
Negative binomial
Parameter estimation
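The slide does not say which estimator is used. As one possibility, the method of moments can be sketched as below, assuming the parameterization in which X counts failures before the r-th success, so that mean = r(1 - p)/p and variance = r(1 - p)/p^2. The function name and the data are hypothetical.

```python
from statistics import mean, variance

def nb_method_of_moments(counts):
    """Estimate (r, p) of a negative binomial by matching mean and variance."""
    m = mean(counts)
    v = variance(counts)
    if v <= m:
        # Without overdispersion the moment equations have no valid solution.
        raise ValueError("sample variance must exceed the sample mean")
    p = m / v                # from variance / mean = 1 / p
    r = m * m / (v - m)      # from mean = r * (1 - p) / p
    return r, p

# Hypothetical read counts (not from the slides).
counts = [0, 0, 1, 0, 7, 0, 2, 12, 0, 1, 0, 9]
r, p = nb_method_of_moments(counts)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Maximum likelihood is another common choice; the moment estimator is shown here only because it has a closed form.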
Approximate control limits Chi-square approximation
Example = 67.0
Conclusion Thanks for your attention
Statistical model Suitable type ▫Which distribution should we use? Parameters ▫Get some information from the data Inference ▫What do we want to know? ▫How could we make a decision? Hypothesis testing
Statistical model Suitable type ▫Binomial distribution Parameters ▫n = 10, p = 0.7 Inference ▫2 successes
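One reading of this slide is a one-sided hypothesis test: if the success probability really is 0.7, how likely is it to see 2 or fewer successes in 10 trials? The sketch below computes that tail probability. The significance level `alpha` and the function name are assumptions, not taken from the slides.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Probability of at most k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# H0: success probability is 0.7; observed 2 successes in 10 trials.
p_value = binomial_cdf(2, 10, 0.7)

alpha = 0.05  # assumed significance level
print(f"P(X <= 2) = {p_value:.5f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these numbers the tail probability is about 0.0016, so an observation of 2 successes would be very unusual under the assumed model.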
Multinomial distribution The analog of the Bernoulli distribution is the categorical distribution, where each trial results in exactly one of some fixed finite number k of possible outcomes. http://en.wikipedia.org/wiki/Multinomial_distribution
Trinomial distribution
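The trinomial case is the multinomial distribution with k = 3 outcomes per trial. A minimal sketch of the probability mass function follows; the outcome probabilities and counts are hypothetical, and `multinomial_pmf` is an illustrative name.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing the given count for each outcome category."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    # Multinomial coefficient n! / (c1! * c2! * ... * ck!), built with exact integers.
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# Hypothetical trinomial example: 4 trials, three possible outcomes.
prob = multinomial_pmf((2, 1, 1), (0.5, 0.3, 0.2))
print(f"P(2, 1, 1) = {prob:.3f}")
```

With two categories the same function reduces to the binomial probability.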
Count data A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3,...}, and where these integers arise from counting rather than ranking. We tend to use fixed fractions of genes. ▫The probability that reads appear in this region (Binomial distribution) ▫The number of read counts in this interval (Poisson distribution)
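The two views are closely related: when the number of reads n is large and the per-read probability p of landing in a region is small, the binomial distribution is well approximated by a Poisson distribution with rate n * p. The sketch below illustrates this with hypothetical numbers that are not from the slides.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Probability of observing k events when the expected count is lam."""
    return exp(-lam) * lam**k / factorial(k)

total_reads = 1_000_000  # hypothetical number of sequenced reads
p_region = 5e-6          # hypothetical chance that one read falls in the region
lam = total_reads * p_region

# Track the largest pointwise gap between the two models for small counts.
max_diff = 0.0
for k in range(11):
    b = binomial_pmf(k, total_reads, p_region)
    q = poisson_pmf(k, lam)
    max_diff = max(max_diff, abs(b - q))
    print(f"k = {k:2d}: binomial {b:.6f}, Poisson {q:.6f}")

print(f"largest difference: {max_diff:.2e}")
```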
Poisson example
Negative binomial