CS 589 Information Risk Management 6 February 2007

Today
More Bayesian ideas – Empirical Bayes
Your presentations
Prior distributions for selected distribution parameters
Updating priors → posterior distribution → updated parameter estimates

References
A. R. Solow, "An Empirical Bayes Analysis of Volcanic Eruptions," Mathematical Geology, Vol. 33, No. 1, 2001.
J. Geweke, Contemporary Bayesian Econometrics and Statistics. Wiley, 2005.
S. L. Scott, "A Bayesian Paradigm for Designing Intrusion Detection Systems," Computational Statistics and Data Analysis, Vol. 45, No. 1, 2003.

Why are we doing this?
Model risks
Model outcomes
Use the models in a model of the decision situation to help us rank alternatives
Gain deeper understanding of the problem and the context of the problem

Basic Relation
The prior distribution in the numerator should be selected with some care. The distribution in the denominator is known as the predictive distribution.
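The formula on this slide was lost in the transcript; from the context it is Bayes' rule for a parameter \theta given data x:

    \pi(\theta \mid x) = \frac{f(x \mid \theta)\, \pi(\theta)}{\int f(x \mid \theta)\, \pi(\theta)\, d\theta}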

Recall: Why a Bayesian Approach?
Incorporate prior knowledge into the analysis
From Scott – synthesize probabilistic information from many sources
Consider the following exercise: P(I) = .01, P(D | I) = .9, P(¬D | ¬I) = .95. An intrusion alarm goes off. What is the probability that it's really an intrusion?
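A worked answer using the basic relation (I = intrusion, D = alarm), noting that P(D | ¬I) = 1 − .95 = .05:

    P(I \mid D) = \frac{P(D \mid I)\, P(I)}{P(D \mid I)\, P(I) + P(D \mid \neg I)\, P(\neg I)}
                = \frac{(.9)(.01)}{(.9)(.01) + (.05)(.99)} = \frac{.009}{.0585} \approx .154

Even with a good detector, an alarm corresponds to a real intrusion only about 15% of the time, because intrusions themselves are rare.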

Priors
The conjugate prior for a Poisson parameter is the gamma distribution
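In symbols, with X | \lambda ~ Poisson(\lambda), the conjugate prior is

    \pi(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta \lambda}, \qquad \lambda > 0,

with shape \alpha, rate \beta, and prior mean E(\lambda) = \alpha / \beta.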

Gamma Parameters
How do we pick them?
– Expert
– Data
– Expert + data

Recall Our Data Example
Go from data to gamma parameters
We want to pick parameters that reflect the data
We will have to use our judgment to decide on a final prior parametric estimate

Parameterization Ideas
Set the distribution mean = data mean
Equate:
– the cumulative/frequency distribution to the data
– the sum of the distribution frequencies to 1
– the sum of absolute differences to 0
Pick the criteria that fit best

We can formulate and optimize
Pick the best parameters given what we know
I used Excel and the Solver add-in
Any optimization program will work
Canned probability functions are preferred …
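The course workbook is not reproduced in this transcript, but the same Solver-style fit is easy to sketch in Python with scipy. The hourly counts below are hypothetical stand-ins for the class dataset, and the objective follows one reading of the criteria above: the gamma mean is pinned to the data mean, and the sum of absolute differences between fitted and empirical frequencies is minimized.

    # Sketch only: hypothetical hourly intrusion counts; scipy plays the
    # role of the Excel Solver add-in.
    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import gamma

    counts = np.array([2, 3, 1, 4, 2, 3, 3, 5, 2, 1, 3, 4,
                       2, 3, 2, 6, 3, 1, 2, 4, 3, 2, 3, 4])  # 24 hourly counts
    data_mean = counts.mean()
    ks, n_k = np.unique(counts, return_counts=True)
    freqs = n_k / counts.size               # empirical relative frequencies

    def sad(shape):
        # Tie the rate to the shape so E(gamma) = shape/rate = data mean,
        # then score the fit by the sum of absolute differences between the
        # gamma density at the observed counts and their frequencies.
        rate = shape / data_mean
        return np.abs(gamma.pdf(ks, a=shape, scale=1.0 / rate) - freqs).sum()

    res = minimize_scalar(sad, bounds=(0.1, 100.0), method="bounded")
    alpha = res.x
    beta = alpha / data_mean
    print(f"alpha = {alpha:.3f}, beta (rate) = {beta:.3f}, "
          f"E(lambda) = {alpha / beta:.3f}")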

Use All the Data?
Several reasonable possibilities – this will matter for updating purposes:
– Use all the data for the parameter estimate
– Use some of the data to estimate the gamma prior – and therefore the Poisson parameter – and the rest to illustrate the idea of updating the prior

Prior Distribution
The prior should reflect our degree of certainty, or degree of belief, about the parameter we are estimating
One way to deal with this is to consider distribution fractiles
Use fractiles to help us develop the distribution that reflects the synthesis of what we know and what we believe

Prior + Information
As we collect information, we can update our prior distribution and get a – we hope – more informative posterior distribution
Recall what the distribution is for – in this case, a view of our parameter of interest
The posterior mean is now the estimate for the Poisson lambda, and can be used in decision-making

Information
For our Poisson parameter, information might consist of data similar to what we already collected in our example
We update the gamma, take its mean, and that's our new estimate for the average number of occurrences of the event per unit of measurement

Sum of Absolute Differences Minimized

Updating It’s pretty intuitive Add the number of hourly intrusions to alpha Add the number of hours (that is, the number of hour intervals) to beta Be careful with beta – sometimes it’s written in inverse form, which means we need to add the inverse of the number of hourly units

Back to Our Example
Use the first 22 observations
Update with the remaining 2
What happens to
– our distribution?
– our Poisson parameter estimate?
First, let's get our new prior

New Prior
The first one results from minimizing the sum of absolute differences between the computed and observed probabilities, with the computed probabilities constrained to sum to 1
The second is computed without the latter constraint

Updates
What can we say about them vis-à-vis
– the original gamma estimate from all 24 points?
– the measures we care about (mean, relative accuracy, etc.)?
Which one is "better"?

E(Lambda) = 2.79

E(Lambda) = 2.815

Another Way to Observe Data
In this case, we'll use the next 12 hours
And we'll update our prior distributions
Which one provides more accuracy?
How would we know in a more realistic situation?

E(Lambda) = 2.902

So, What's the Conclusion?
Do our updated priors make sense – especially in light of the original data-driven distribution?
What can we say about the way in which observed data can impact our posterior distribution and the associated estimate for the Poisson parameter?
What else can we conclude?

Another Prior Distribution
Of interest in information risk applications – and risk applications in general – is the notion of the probability of a binary outcome
– intrusion/non-intrusion
– bad item/non-bad item
In this case, we can model the probability of an event happening – or not
The number of events of interest in a space of interest could be modeled using a binomial distribution

Example
Suppose we know how many intrusion attempts (or any other event) happened in the course of normal operation of our system – and we know how many non-intrusion events happened
So our data would look something like the table on the following slide (not reproduced in this transcript)

Now …
We might be interested in the probability that a given input is malicious, bad, etc.
How could we build this risk model?
The binomial is a clear choice
We know n for a given period
We need p
p seems to vary – what can we do?

A Model for p
Develop a prior distribution for p that combines
– the data
– what we know that might not be in the data
Use the expectation of that distribution for E(p)
Use E(p) in our preliminary analysis

Another Prior
The prior distribution model for the binomial p is a beta distribution
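The two formulas labeled Binomial and Beta on the slide were lost in the transcript; they are

    \text{Binomial:}\quad P(X = k \mid n, p) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n

    \text{Beta:}\quad \pi(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}, \qquad 0 < p < 1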

Beta Prior
The predictive distribution is the beta-binomial (you can look it up)
Like the gamma prior for the Poisson, this is very easy to update after observing data
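A minimal sketch of that update in Python; the prior parameters and counts below are hypothetical, but the rule itself is standard beta-binomial conjugacy: add the successes to alpha and the failures to beta.

    # Hypothetical numbers for illustration only.
    alpha0, beta0 = 2.0, 50.0   # assumed beta prior: E(p) = 2/52, about 0.038
    s, n = 7, 500               # assumed data: 7 malicious events in 500 inputs

    alpha1 = alpha0 + s                 # successes update alpha
    beta1 = beta0 + (n - s)             # failures update beta
    p_hat = alpha1 / (alpha1 + beta1)   # posterior mean E(p | data)
    print(f"Posterior: Beta({alpha1:.0f}, {beta1:.0f}), E(p) = {p_hat:.4f}")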

Other Estimates
Outcomes
– These can be in the form of costs, both real and opportunity costs
– Distributions are better than point estimates if we know that we don't know the future
Problem: the expected-value criterion can diminish the importance of our probability modeling efforts for events and outcomes

Outcome Distributions
Unlike our discussion to this point, where the variable of interest has been associated with a discrete distribution, outcome distributions may be continuous in nature
Normal, lognormal, logistic
Usually estimating more than one parameter
Possibly a more complex prior – information – posterior structure
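As one concrete case (an addition, not on the slide): for normal data with known variance \sigma^2 and a normal prior on the mean, the posterior is again normal,

    x_i \mid \mu \sim N(\mu, \sigma^2), \quad \mu \sim N(\mu_0, \tau_0^2) \;\Longrightarrow\; \mu \mid x \sim N\!\left( \frac{\mu_0/\tau_0^2 + n\bar{x}/\sigma^2}{1/\tau_0^2 + n/\sigma^2},\; \left( \frac{1}{\tau_0^2} + \frac{n}{\sigma^2} \right)^{-1} \right)

With the variance also unknown, the conjugate prior becomes a normal-inverse-gamma, giving the more complex prior – information – posterior structure mentioned above.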

Homework
I'm going to send you sample datasets
I need team identification – same ones as today?
Due at the beginning of class next week
Presentation, not paper
Also – please be ready to discuss the Scott paper