Quantification of Digital Forensic Hypotheses Using Probability Theory Richard E Overill & Jantje A M Silomon King’s College London Kam-Pui Chow & Hayson.


Quantification of Digital Forensic Hypotheses Using Probability Theory
Richard E Overill & Jantje A M Silomon, King’s College London
Kam-Pui Chow & Hayson Tse, University of Hong Kong

Synopsis
– Introduction & Background
– Probabilistic Models
– Simplifying Assumptions
– Results & Interpretation
– Summary & Conclusions
– Questions & Comments?

Introduction & Background
– Possession of Child Pornography (CP) is a serious offence in HK, UK and elsewhere.
– Under prosecution, 2 common defences are:
  – Trojan Horse Defence (THD), when many CP images are recovered;
  – Inadvertent Defence (ID), when a few CP images are recovered amongst many non-CP images.
– We used complexity theory to quantify the plausibility of the THD (ICDFI-2012, ICDFI-2013).
– Here we use probability theory to quantify the plausibility of the ID.

Probabilistic Models
– Greedy download: every image on the website is downloaded, so the probability distribution is trivially singular.
– Selective download: a representative sample of the images on the website is downloaded.
  – Infinite website: probabilities do not change as the download proceeds; use the Binomial Theorem.
  – Finite website: probabilities change as images are downloaded (sampling without replacement); use the “Urn/Bag of balls” model.
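The two selective-download models can be sketched as follows. This is a minimal illustration and not the authors' own calculation: it assumes the infinite-website model corresponds to a binomial distribution and the finite-website "urn" model to a hypergeometric distribution. The function names and all numeric values (site size, number of CP images, download size) are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k CP images in n downloads) when each download is CP
    with fixed probability p (assumed model for the infinite website)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k, n, site_size, cp_on_site):
    """P(exactly k CP images in n downloads without replacement) from a
    site holding site_size images, cp_on_site of which are CP
    (assumed model for the finite website, i.e. the urn model)."""
    non_cp = site_size - cp_on_site
    if k < max(0, n - non_cp) or k > min(n, cp_on_site):
        return 0.0
    return comb(cp_on_site, k) * comb(non_cp, n - k) / comb(site_size, n)


# Hypothetical numbers, chosen only for illustration.
site_size, cp_on_site, n = 1000, 50, 100
p = cp_on_site / site_size

for k in (0, 5, 10):
    print(k, binomial_pmf(k, n, p), hypergeom_pmf(k, n, site_size, cp_on_site))
```

As the hypothetical site size grows with the CP fraction held fixed, the hypergeometric values approach the binomial ones, which is the sense in which the infinite model is the limit of the finite one.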

Simplifying Assumptions
– Random browsing behaviour.
– Random distribution of CP images on website.
– No duplicates in download.
– Single download session.
– Single website.
– Single computer.
– One individual.

Results & Interpretation
– 2 actual HK cases:
  – Case 1: 248/30,000 images were CP (2010);
  – Case 2: 84/714,430 images were CP (2013).
– “Worst-case” (prosecution) results:
  [Table: “worst-case” probabilities for Case 1 and Case 2 under the Finite Model and the Infinite Model]

Case 1 - Probability Distributions
[Figure: Finite Model and Infinite Model panels]

Case 2 - Probability Distributions
[Figure: Finite Model and Infinite Model panels]

Summary & Conclusions
– Infinite model worst-case results (2.5% & 4.3%) suggest a criminal prosecution is feasible.
– Finite model worst-case results (3% & 8%) also suggest a criminal prosecution is feasible but are influenced by assumptions of website size.
– Non-worst-case probabilities fall off rapidly: σ ≈ √μ.
– Simple probability models can be used to quantify the plausibility of the Inadvertent Defence (ID) against possession of CP.
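The relation σ ≈ √μ can be checked numerically. The slides do not say which distribution it refers to; the sketch below assumes a binomial model with a small per-image probability p, for which μ = np and σ = √(np(1 − p)), so that σ/√μ = √(1 − p) is close to 1. The values of n and p are hypothetical and are not taken from either case.

```python
from math import sqrt


def binomial_mean_and_sigma(n, p):
    """Mean and standard deviation of a binomial(n, p) distribution."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu, sigma


n = 30_000  # hypothetical number of downloaded images
p = 0.01    # hypothetical probability that a single image is CP

mu, sigma = binomial_mean_and_sigma(n, p)
ratio = sigma / sqrt(mu)  # equals sqrt(1 - p), close to 1 when p is small

print(mu, sigma, ratio)
```

Because the spread grows only as the square root of the mean, counts that lie many multiples of √μ away from μ carry very little probability, which is consistent with the rapid fall-off noted in the summary.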

Questions & Comments?