Given a sample from some population: What is a good “summary” value which well describes the sample? We will look at: Average (arithmetic mean) Median.


1 Measures of Location: Given a sample from some population: What is a good “summary” value that describes the sample well? We will look at: average (arithmetic mean), median, mode. For reference see (available on-line): “The Dynamic Character of Disguised Behaviour for Text-based, Mixed and Stylized Signatures”, LA Mohammed, B Found, M Caligiuri and D Rogers, J Forensic Sci 56(1), S136-S141 (2011).

2 Histogram Points of Interest: [Histogram: velocity for the first segment of genuine signatures in the (soon to be classic) Mohammed et al. study.] What is a good summary number? How spread out is the data? (We will talk about this later.)

3 Measures of Location: Arithmetic sample mean (average): the sum of the data divided by the number of observations. Intuitive formula: $\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$. Fancy formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

4 Measures of Location: Example from LAM study: Compute the average absolute size of segment 1 for the genuine signature of subject 2:

Subj. 2; Gen; Seg. 1    Absolute Size (cm)
1                       0.0548
2                       0.2951
3                       0.1026
4                       0.1005
5                       0.2491
6                       0.1287
7                       0.0496
8                       0.2299
9                       0.256
10                      0.0538
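The average asked for on this slide can be checked in R, the language used later in these slides. This is a minimal sketch: the ten values are copied from the slide's table, and the variable names (`abs_size`, `avg_by_hand`, `avg_builtin`) are made up for illustration.

```r
# Absolute size (cm), segment 1, genuine signatures of subject 2.
# Values copied from the slide; the name abs_size is hypothetical.
abs_size <- c(0.0548, 0.2951, 0.1026, 0.1005, 0.2491,
              0.1287, 0.0496, 0.2299, 0.2560, 0.0538)

# "Intuitive" formula: add everything up and divide by the count
avg_by_hand <- sum(abs_size) / length(abs_size)

# Built-in equivalent
avg_builtin <- mean(abs_size)

print(avg_by_hand)  # about 0.152 cm
```

Both routes give the same number, so the built-in `mean` is just the sum-over-n formula.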

5 Measures of Location: Example (more useful): Consider again the Absolute Average Velocity for Genuine Signatures across all writers in the LAM study: 92 subjects × 10 measurements/subject = 920 velocity measurements. Average Absolute Average Velocity:

6 Measures of Location: Follow-up question: Is there a difference in the Abs. Avg. Veloc. for Genuine signatures vs. Disguised signatures (DWM and DNM)? [Figure panels: Genuine, DWM, DNM.] We will learn how to answer this, but not yet.

7 Measures of Location: Sample median: Ordering the n pieces of data from smallest value to largest value, the median is the “middle value”. If n is odd, the median is the $\frac{n+1}{2}$-th largest data point. If n is even, the median is the average of the $\frac{n}{2}$-th and $\left(\frac{n}{2}+1\right)$-th largest data points.
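The odd/even rule can be written out directly. The sketch below is illustrative only: `sample_median` is a made-up helper name, the two small data sets are invented, and the result is compared against R's built-in `median`.

```r
# Hypothetical helper implementing the odd/even rule for the median
sample_median <- function(x) {
  s <- sort(x)        # order from smallest to largest
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                  # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2   # even n: average the two middle values
  }
}

odd_example  <- c(7, 1, 5)       # invented data, n = 3
even_example <- c(7, 1, 5, 3)    # invented data, n = 4

print(sample_median(odd_example))   # 5
print(sample_median(even_example))  # 4
```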

8 Measures of Location: Example: Median of Average Absolute Velocity for Genuine Signatures, LAM. [Figure label: Avg.]

9 Measures of Location: Sample mode: Needs careful definition, but basically: the data value that occurs the most. [Figure labels: Avg, Med, mode = 9.2541.]
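Base R has no built-in function for the statistical mode, so a small helper is one way to illustrate "the value that occurs the most". This is a sketch on invented discrete data; `sample_modes` is a made-up name. For continuous measurements such as velocities, exact repeats are rare, so the data would first have to be binned, which is presumably why the slide says the mode needs a careful definition.

```r
# Hypothetical helper: return every value tied for the highest count
sample_modes <- function(x) {
  counts <- table(x)                                   # tally each distinct value
  as.numeric(names(counts)[counts == max(counts)])     # keep the most frequent
}

toy <- c(2, 3, 3, 5, 7, 7, 9)   # invented data with two modes
print(sample_modes(toy))        # 3 7
```

Returning all tied values, rather than a single one, keeps multi-modal samples visible.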

10 Measures of Location: Some trivia: Nice and symmetric: Mean = Median = Mode. [Figure labels: Mean, Modes.]

11

12 Toss out the largest 5% and smallest 5% of the data
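The slide does not name the statistic. If it is describing a trimmed mean (an assumption here), R's `mean` has a `trim` argument that drops the given fraction of observations from each end before averaging. The data below are invented to show the effect of a single outlier.

```r
# Invented data: nineteen ordinary values and one extreme outlier
x <- c(1:19, 1000)

plain_mean   <- mean(x)               # pulled upward by the outlier
trimmed_mean <- mean(x, trim = 0.05)  # drops the smallest and largest 5% (one value each here)

print(plain_mean)    # 59.5
print(trimmed_mean)  # 10.5
```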

13 Measures of Data Spread: Sample variance: (almost) the average of squared deviations from the sample mean: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $x_i$ is data point i, $\bar{x}$ is the sample mean, and there are n data points. The standard deviation is $s = \sqrt{s^2}$. The sample average and standard dev. are the most common measures of central tendency and spread. The sample average and standard dev. have the same units.
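The "almost an average" in the definition is the division by n - 1 rather than n. A short sketch on invented data, comparing the by-hand calculation with R's built-in `var` and `sd`:

```r
# Invented data set with sample mean 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sum of squared deviations from the sample mean, divided by n - 1
var_by_hand <- sum((x - mean(x))^2) / (n - 1)
sd_by_hand  <- sqrt(var_by_hand)

print(var_by_hand)  # 32 / 7, about 4.571
print(var(x))       # built-in sample variance, same value
print(sd(x))        # built-in standard deviation, same units as x
```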

14 Measures of Data Spread: If you have “enough” data, you can fit a smooth probability density function to the histogram.

15 Measures of Data Spread: Trivia: The famous (standardized) “Bell Curve”, also called “normal” and “Gaussian”. Mean = 0, Std Dev = 1; units are in Std Devs. About 68% of the data lie within ±1s, about 95% within ±2s, and about 99.7% within ±3s.
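The percentages on this slide can be reproduced from the standard normal cumulative distribution function, `pnorm` in R. A minimal sketch; the variable names are illustrative.

```r
# Fraction of a standard normal distribution within k standard deviations of the mean
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)

print(round(100 * coverage, 1))  # 68.3 95.4 99.7
```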

16 Measures of Data Spread

17 Measures of Data Spread: Sample range: The difference between the largest and smallest value in the sample. Very sensitive to outliers (extreme observations). Percentiles: The p-th percentile data value, x, means that p percent of the data are less than or equal to x. Median = 50th percentile.
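Both quantities are one-liners in R. The sketch below uses invented data with one outlier. Note that `quantile` interpolates between data points by default, so for small samples its answers can differ slightly from the "p percent of the data are less than or equal to x" definition.

```r
# Invented data: nine ordinary values and one outlier
x <- c(1:9, 100)

sample_range <- max(x) - min(x)   # driven entirely by the outlier
p50 <- quantile(x, 0.50)          # 50th percentile

print(sample_range)               # 99
print(unname(p50))                # 5.5, the same as median(x)
print(quantile(x, c(0.01, 0.99))) # 1st and 99th percentiles
```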

18 Measures of Data Spread: [Figure labels: 1st percentile = 1.52003, 99th percentile = 1.52008.]

19

20 Confidence Intervals: A confidence interval (CI) gives a range in which a true population parameter may be found. Specifically, (1-α)×100% CIs for a parameter, constructed from random samples (of a given sample size), will contain the true value of the parameter approximately (1-α)×100% of the time. α is called the “level of significance”. CIs are different from tolerance and prediction intervals.

21 Confidence Intervals: Caution: IT IS NOT CORRECT to say that there is a (1-α)×100% probability that the true value of a parameter is between the bounds of any given CI. Graphical representation of 90% CIs for a parameter: Take a sample. Compute a CI. [Figure: a line marks the true value of the parameter.] Here 90% of the CIs contain the true value of the parameter.
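The "take a sample, compute a CI" picture can be imitated with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a made-up true mean of 10 and standard deviation of 2, each sample has 20 points, and the interval is the Student's t interval for a mean. None of these numbers come from the slides.

```r
set.seed(1)          # make the simulation repeatable

true_mean <- 10      # assumed "true" parameter, known only because we simulate
n         <- 20      # sample size per experiment
n_reps    <- 1000    # number of repeated samples
alpha     <- 0.10    # 90% confidence intervals

# For each repetition: draw a sample, build the CI, record whether it covers true_mean
covered <- replicate(n_reps, {
  x <- rnorm(n, mean = true_mean, sd = 2)
  half_width <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  abs(mean(x) - true_mean) <= half_width
})

coverage <- mean(covered)
print(coverage)      # close to 0.90
```

The 90% describes the long-run behaviour of the procedure across many samples; any single interval either contains the true value or it does not.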

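22 Confidence Intervals: Construction of a CI for a mean depends on: the sample size n; the standard error for means, $s/\sqrt{n}$, where s is the sample standard deviation; and the level of confidence 1-α (α is the significance level). Use α to compute the $t_c$-value. The (1-α)×100% CI for the population mean using a sample average $\bar{x}$ and standard error is: $\left[\bar{x} - t_c\,\frac{s}{\sqrt{n}},\ \bar{x} + t_c\,\frac{s}{\sqrt{n}}\right]$.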

23 Confidence Intervals: Compute a 99% confidence interval for the mean using this sample set:

Fragment #    Fragment nD
1             1.52005
2             1.52003
3             1.52001
4             1.52004
5             1.52000
6             1.52001
7             1.52008
8             1.52011
9             1.52008
10            1.52008
11            1.52008

Putting this together: [1.52005 - (3.17)(0.00001), 1.52005 + (3.17)(0.00001)]. Using unrounded values for the sample average and standard error, the 99% CI for the mean = [1.52002, 1.52009].
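The numbers on this slide can be reproduced step by step in R. This sketch assumes the interval is the Student's t interval for a mean (sample average plus or minus t_c times the standard error); the eleven values are copied from the slide and the variable names are illustrative.

```r
# Refractive index (nD) of the eleven fragments, copied from the slide
nD <- c(1.52005, 1.52003, 1.52001, 1.52004, 1.52000, 1.52001,
        1.52008, 1.52011, 1.52008, 1.52008, 1.52008)

n       <- length(nD)
alpha   <- 0.01                          # 99% confidence
x_bar   <- mean(nD)                      # sample average
std_err <- sd(nD) / sqrt(n)              # standard error of the mean
t_c     <- qt(1 - alpha / 2, df = n - 1) # critical t value, 10 degrees of freedom

ci <- c(x_bar - t_c * std_err, x_bar + t_c * std_err)

print(round(t_c, 2))  # 3.17
print(round(ci, 5))   # 1.52002 1.52009
```

The same interval is returned by `t.test(nD, conf.level = 0.99)$conf.int`, which is a convenient cross-check.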

24 Confidence Intervals

25 Hypothesis Testing: A hypothesis is an assumption about a statistic. Form a hypothesis about the statistic: H0, the null hypothesis. Identify the alternative hypothesis, Ha. “Accept” H0 or “reject” H0 in favour of Ha at a certain confidence level (1-α)×100%. Technically, “accept” means “do not reject”. The testing is done with respect to how sample values of the statistic are distributed: Student's t, Gaussian, binomial, Poisson, bootstrap, etc.

26 Hypothesis Testing: Do the thicknesses of float glass differ from non float glass? How can we use a computer to decide? Hypothesis testing can go wrong:

                   H0 is really true                 H0 is really false
Test rejects H0    Type I error. Probability is α    OK
Test accepts H0    OK                                Type II error. Probability is β

1-β is called the test's power.

27 Importing External Data From a Spreadsheet: Use the R function read.csv. Import (fake) float glass thickness data in the file glass_thickness_simulated.csv: read.csv("/Path/to/your/data/glass_thickness_simulated.csv", header=TRUE)
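Once the data are in R, one common way to ask whether two group means differ is a two-sample t-test. The sketch below does not use the lecture's file, whose column layout is not shown in the slides. Instead it simulates two groups with made-up means and spreads, and every variable name is hypothetical. It is one reasonable choice of test, not necessarily the one used in the lecture.

```r
set.seed(42)  # repeatable simulated data

# Simulated thicknesses (made-up numbers, arbitrary units)
float_thk    <- rnorm(30, mean = 3.0, sd = 0.1)
nonfloat_thk <- rnorm(30, mean = 3.5, sd = 0.1)

alpha <- 0.05  # significance level

# H0: the two population means are equal; Ha: they differ.
# t.test runs Welch's two-sample t-test by default.
result <- t.test(float_thk, nonfloat_thk)

reject_h0 <- result$p.value < alpha
print(result$p.value)
print(reject_h0)   # TRUE here, because the simulated means are far apart
```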

28 Hypothesis Testing

29 Analysis of Variance: Standard hypothesis testing is great for comparing two statistics. What if we have more than two statistics to compare? Use analysis of variance (ANOVA). Note that the statistics to be compared must all be of the same type. Usually the statistic is an average “response” for different experimental conditions or treatments.

30 Analysis of Variance: H0 for ANOVA: The values being compared are not statistically different at the (1-α)×100% level of confidence. Ha for ANOVA: At least one of the values being compared is statistically distinct. ANOVA computes an F-statistic from the data and compares it to a critical Fc value for: the level of confidence; D.O.F. 1 = # of levels - 1; D.O.F. 2 = # of obs. - # of levels.
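The comparison of the F-statistic with a critical Fc value can be sketched in R on simulated data. Everything below is illustrative: three made-up levels ("A", "B", "C") with ten observations each, one of which is given a clearly different mean. The one-way ANOVA table comes from `anova(lm(...))` and the critical value from `qf`.

```r
set.seed(7)  # repeatable simulated data

# Simulated responses: levels A and B share a mean, level C is shifted
response <- c(rnorm(10, mean = 5, sd = 1),
              rnorm(10, mean = 5, sd = 1),
              rnorm(10, mean = 8, sd = 1))
level <- factor(rep(c("A", "B", "C"), each = 10))

# One-way ANOVA table
tab <- anova(lm(response ~ level))

f_stat <- tab[["F value"]][1]
df1    <- tab[["Df"]][1]   # number of levels - 1
df2    <- tab[["Df"]][2]   # number of observations - number of levels

alpha  <- 0.05
f_crit <- qf(1 - alpha, df1, df2)  # critical Fc value

reject_h0 <- f_stat > f_crit
print(c(F = f_stat, Fc = f_crit))
print(reject_h0)   # TRUE here, because level C was simulated with a different mean
```

With 3 levels and 30 observations the degrees of freedom are 2 and 27, matching the two D.O.F. formulas on the slide.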

31 Analysis of Variance: Levels are “categorical variables” and can be: group names, experimental conditions, experimental treatments.

32 Analysis of Variance

