
Presentation transcript:

Summarizing Data Osborn

Measures of Central Tendency. Given a sample from some population: What is a good “summary” value which well describes the sample? We will look at: average (arithmetic mean), median, mode.
For reference see (available on-line): LA Mohammed, B Found, M Caligiuri and D Rogers, “The Dynamic Character of Disguised Behaviour for Text-based, Mixed and Stylized Signatures”, J Forensic Sci 56(1), S136-S141 (2011).

Histogram: Points of Interest. Velocity for the first segment of genuine signatures in the (soon to be classic) Mohammed et al. study. What is a good summary number (“central tendency”)? How spread out is the data?

Measures of Central Tendency. Arithmetic sample mean (average): the sum of the data divided by the number of observations.
Intuitive formula: $\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$
Fancy formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Measures of Central Tendency. Example from the L.A.M. study: Compute the average absolute size of segment 1 for the genuine signature of subject 2. Table: Subj. 2; Gen; Seg. 1 | Absolute Size (cm)

Measures of Central Tendency. A more useful example: consider again the Absolute Average Velocity for genuine signatures across all writers in the LAM study. 92 subjects × 10 measurements/subject = 920 velocity measurements. Average Absolute Average Velocity:

Measures of Central Tendency. Sample median: ordering the n pieces of data from smallest value to largest value, the median is the “middle value”. If n is odd, the median is the $\frac{n+1}{2}$-th largest data point. If n is even, the median is the average of the $\frac{n}{2}$-th and $\left(\frac{n}{2}+1\right)$-th largest data points.
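The “middle value” rule can be sketched in a few lines of Python. This is only an illustration: the function name `sample_median` and the two small samples are made up for the example and are not data from the study.

```python
def sample_median(data):
    """Return the middle value of the ordered sample."""
    ordered = sorted(data)  # smallest value to largest value
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th ordered value (0-based index mid)
        return ordered[mid]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th ordered values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical samples, for illustration only
odd_sample = [3.1, 1.2, 2.5, 4.8, 2.9]
even_sample = [3.1, 1.2, 2.5, 4.8]

print(sample_median(odd_sample))   # middle of 1.2, 2.5, 2.9, 3.1, 4.8
print(sample_median(even_sample))  # average of 2.5 and 3.1
```

Sorting a copy first keeps the caller's data in its original order, which matters when the order of the measurements carries meaning.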

Measures of Central Tendency. Example from the L.A.M. study: Compute the median absolute size of segment 1 for the genuine signature of subject 2. Table: Subj. 2; Gen; Seg. 1 | Absolute Size (cm) | Ordered

Measures of Central Tendency. Example: Median of Average Absolute Velocity for Genuine Signatures, LAM. [Figure label: Avg]

Measures of Central Tendency. Sample mode: needs careful definition, but basically it is the data value that occurs the most. Tabulate the data and see which value(s) occur the most. Sample: mode

Measures of Central Tendency. Sample mode: computing modes can get tricky if there is more than one (multi-modal). Sample: modes…
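Tabulating the data and reporting every value that ties for the highest count is one way to handle the multi-modal case. The sketch below assumes exact repeated values (for continuous measurements one would usually bin the data first); `sample_modes` and the small samples are hypothetical.

```python
from collections import Counter


def sample_modes(data):
    """Return every value that occurs the most often, in sorted order."""
    counts = Counter(data)  # tabulate: value -> number of occurrences
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


# Hypothetical samples, for illustration only
print(sample_modes([1, 2, 2, 3, 4]))  # one mode: [2]
print(sample_modes([1, 1, 2, 3, 3]))  # two modes: [1, 3]
print(sample_modes([1, 2, 3]))        # every value ties: [1, 2, 3]
```

The last call shows why the mode “needs careful definition”: when no value repeats, every value is technically a mode, which is rarely a useful summary.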

Measures of Central Tendency. Sample mode: What’s the mode here? Sample:

Measures of Central Tendency. Sample mode: Mode of Average Absolute Velocity for Genuine Signatures, LAM. [Figure labels: Avg, mode, Med]

Measures of Central Tendency. Some trivia: for a nice and symmetric distribution, Mean = Median = Mode. [Figure labels: Mean, Modes]

Measures of Data Spread. Sample variance: (almost) the average of squared deviations from the sample mean:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
where $x_i$ is data point i, $\bar{x}$ is the sample mean, and there are n data points. The standard deviation is $s = \sqrt{s^2}$.
The sample average and standard deviation are the most common measures of central tendency and spread. The sample average and standard deviation have the same units.

Measures of Data Spread. The standard deviation is “instructive” to do by hand a few times: compute the standard deviation of the following blood alcohol volumes assayed in 10 samples of 10 μL of blood drawn from a drunk driving suspect: 7.97 nL, 7.80 nL, 7.79 nL, 8.12 nL, 8.12 nL, 8.22 nL, 8.03 nL, 7.97 nL, 7.88 nL, 8.08 nL
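A hand calculation can be checked step by step with a short script. The sketch below follows the usual sample definitions (squared deviations from the mean, divided by n − 1, then a square root); the variable names are illustrative.

```python
import math

# Blood alcohol volumes from the exercise above, in nL
volumes_nl = [7.97, 7.80, 7.79, 8.12, 8.12, 8.22, 8.03, 7.97, 7.88, 8.08]

n = len(volumes_nl)
mean = sum(volumes_nl) / n                           # sample average
squared_devs = [(x - mean) ** 2 for x in volumes_nl]  # (x_i - mean)^2
variance = sum(squared_devs) / (n - 1)               # divide by n - 1, not n
std_dev = math.sqrt(variance)                        # same units as the data

print(f"mean     = {mean:.3f} nL")
print(f"variance = {variance:.5f} nL^2")
print(f"std dev  = {std_dev:.3f} nL")
```

The average comes out at about 7.998 nL and the standard deviation at about 0.143 nL. Python's built-in `statistics.stdev` uses the same n − 1 divisor and can serve as a cross-check.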

Uncertainty. There is a current national effort to standardize procedures, quantification of uncertainty and conclusions used in Forensic Science and Digital Forensics. These efforts WILL AFFECT YOU. Two major bodies are currently writing draft policy: the National Commission on Forensic Science (NCFS), and the National Institute of Standards and Technology, Organization of Scientific Area Committees for Forensic Sciences (OSAC). So get to know whatever “uncertainty” is…

Uncertainty. The “Guide to the expression of uncertainty in measurement” (GUM) is a document developed by the Joint Committee for Guides in Metrology (JCGM) and published by the International Organization for Standardization (ISO). It describes a generally accepted set of rules and methods to evaluate uncertainty in measurement.
Uncertainty is defined as a parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand (JCGM 100:2008, sect 2.2.3).
NOTE 2: Uncertainty of measurement comprises, in general, many components. Some of these components may be evaluated from the statistical distribution of the results of series of measurements and can be characterized by experimental standard deviations (very frequentist…). The other components, which also can be characterized by standard deviations, are evaluated from assumed probability distributions based on experience or other information (very Bayesian…).

Measures of Data Spread. Sample range: the difference between the largest and smallest value in the sample. Very sensitive to outliers (extreme observations). Percentiles: the p-th percentile data value, x, means that p percent of the data are smaller than or equal to x. Median = 50th percentile.
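Both quantities are easy to compute once the data are ordered. The sketch below uses the “nearest-rank” convention for percentiles, which is one of several conventions in use and is an assumption here: it returns the smallest data value with at least p percent of the sample at or below it. The function names are hypothetical, and the data are the blood alcohol volumes from the standard deviation exercise.

```python
import math


def sample_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


def percentile_nearest_rank(data, p):
    """Smallest data value x such that at least p percent of the data are <= x."""
    if not data:
        raise ValueError("percentile of an empty sample is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(data)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the ordering
    return ordered[rank - 1]


# Blood alcohol volumes from the standard deviation exercise, in nL
volumes_nl = [7.97, 7.80, 7.79, 8.12, 8.12, 8.22, 8.03, 7.97, 7.88, 8.08]

print(f"range = {sample_range(volumes_nl):.2f} nL")
for p in (50, 90, 100):
    print(f"{p}th percentile = {percentile_nearest_rank(volumes_nl, p)} nL")
```

With an even number of data points, the nearest-rank 50th percentile picks a single observed value rather than averaging the two middle ones, so it can differ slightly from the sample median. Interpolating conventions remove that difference.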

Measures of Data Spread. What is the sample range of deoxypyridinoline conc.?

Measures of Data Spread. [Figure annotations: 1st-%tile (“first 1% of the data is between here”); 99th-%tile (“first 99% of the data is between here”); RI]

Measures of Data Spread. Box-and-whisker plot again for reference (deoxypyridinoline conc.). Plot labels: 25th-%tile (1st quartile), median (50th-%tile), 75th-%tile (3rd quartile), range.

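Measures of Data Spread. Sample relative standard deviation: the ratio of the standard deviation $s$ to the average $\bar{x}$, $\mathrm{RSD} = s/\bar{x}$ (often quoted as a percentage, $\%\mathrm{RSD} = 100 \times s/\bar{x}$). Also called the coefficient of variation.
Data quality (outliers): rule of thumb, $x_i$ is suspect if
$x_i > 75\text{th-\%tile} + k \times (75\text{th-\%tile} - 25\text{th-\%tile})$ or
$x_i < 25\text{th-\%tile} - k \times (75\text{th-\%tile} - 25\text{th-\%tile})$.
By the usual box-plot convention, $x_i$ is an outlier for $k = 1.5$ and an extreme outlier for $k = 3$.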
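The %RSD and the quartile-based rule of thumb can be tried out on a small data set. This is a sketch under stated assumptions: the multipliers 1.5 and 3.0 are the conventional box-plot choices, the quartiles come from Python's `statistics.quantiles` with its default method (other software may give slightly different quartiles), and the function names and the extra 9.50 nL reading are hypothetical.

```python
import statistics


def percent_rsd(data):
    """Relative standard deviation (coefficient of variation), in percent."""
    return 100 * statistics.stdev(data) / statistics.mean(data)


def find_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 (outlier) and k = 3.0 (extreme outlier) are assumed conventions.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1  # 75th-%tile minus 25th-%tile
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


# Blood alcohol volumes from the standard deviation exercise, in nL
volumes_nl = [7.97, 7.80, 7.79, 8.12, 8.12, 8.22, 8.03, 7.97, 7.88, 8.08]
print(f"%RSD = {percent_rsd(volumes_nl):.2f}")
print("outliers:", find_outliers(volumes_nl))

# Hypothetical extra reading, added only to show the rule flagging a value
with_spike = volumes_nl + [9.50]
print("outliers (k=1.5):", find_outliers(with_spike, k=1.5))
print("extreme outliers (k=3.0):", find_outliers(with_spike, k=3.0))
```

The ten original readings produce no flags, while the made-up 9.50 nL reading falls beyond both fences. A flagged value is a prompt to look at the measurement more closely, not proof that it is wrong.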

Measures of Data Spread. Deoxypyridinoline conc.: %RSD? Which data might be outliers?