
Presentation transcript:

Measures of Location. Given a sample from some population: what is a good “summary” value that describes the sample well? We will look at the average (arithmetic mean), the median and the mode. For reference see (available online): “The Dynamic Character of Disguised Behaviour for Text-based, Mixed and Stylized Signatures,” L. A. Mohammed, B. Found, M. Caligiuri and D. Rogers, J Forensic Sci 56(1), S136-S141 (2011).

Histogram: Points of Interest. Velocity for the first segment of genuine signatures in the (soon to be classic) Mohammed et al. study. What is a good summary number? How spread out is the data? (We will talk about this later.)

Measures of Location. Arithmetic sample mean (average): the sum of the data divided by the number of observations. Intuitive formula: $\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$. Fancy formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
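As a quick illustration of the sample mean, here is a minimal Python sketch. The numbers in `sizes_cm` are made up for the example and are not taken from the LAM study.

```python
from statistics import fmean

# Hypothetical measurements in cm (illustrative only, not LAM study data).
sizes_cm = [1.2, 1.5, 1.1, 1.4, 1.3]

# Intuitive formula: add everything up, then divide by the number of observations.
mean_by_hand = sum(sizes_cm) / len(sizes_cm)

# The standard library computes the same quantity.
mean_from_library = fmean(sizes_cm)

print(mean_by_hand, mean_from_library)
```

Both values should agree up to floating-point rounding; the hand-written version is only there to mirror the formula.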

Measures of Location. Example from the LAM study: compute the average absolute size of segment 1 for the genuine signature of subject 2. (Table columns: Subj. 2; Gen; Seg. 1, Absolute Size (cm).)

Measures of Location. Example (more useful): consider again the absolute average velocity for genuine signatures across all writers in the LAM study: 92 subjects × 10 measurements/subject = 920 velocity measurements. Average absolute average velocity:

Measures of Location. Follow-up question: is there a difference in the abs. avg. velocity for genuine signatures vs. disguised signatures (DWM and DNM)? (Figure panels: Genuine, DWM, DNM.) We will learn how to answer this, but not yet.

Measures of Location. Sample median: ordering the n pieces of data from smallest value to largest value, the median is the “middle value.” If n is odd, the median is the $\frac{n+1}{2}$-th largest data point. If n is even, the median is the average of the $\frac{n}{2}$-th and $\left(\frac{n}{2}+1\right)$-th largest data points.
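The odd/even rule for the median can be sketched in Python as follows. The function name `sample_median` and the two small samples are hypothetical and chosen only for illustration.

```python
from statistics import median

def sample_median(data):
    """Median via the odd/even rule, using 1-based positions in sorted order."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        # Odd n: take the ((n + 1) / 2)-th value (shift by one for 0-based indexing).
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the (n / 2)-th and (n / 2 + 1)-th values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_sample = [7, 1, 5, 3, 9]        # hypothetical values, n = 5
even_sample = [7, 1, 5, 3, 9, 11]   # hypothetical values, n = 6

print(sample_median(odd_sample))    # middle of 1, 3, 5, 7, 9
print(sample_median(even_sample))   # average of 5 and 7
```

The standard library function `statistics.median` follows the same rule, so it can be used to cross-check the hand-written version.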

Measures of Location. Example: median of the average absolute velocity for genuine signatures, LAM study. (Figure label: Avg.)

Measures of Location. Sample mode: needs careful definition, but basically it is the data value that occurs the most. (Figure labels: Avg; mode = Med.)
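One reason the mode needs a careful definition is that several values can tie for the highest count. The Python sketch below returns every tied value; the helper name `sample_modes` and the data are hypothetical. For continuous measurements the values would normally be binned first, which this sketch does not attempt.

```python
from collections import Counter

def sample_modes(data):
    """Return every value tied for the highest count, in sorted order."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(sample_modes([2, 3, 3, 4, 5]))      # a single mode
print(sample_modes([1, 1, 2, 4, 4, 6]))   # two values tie, so there are two modes
```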

Measures of Location. Some trivia: for a nice, symmetric distribution, mean = median = mode. (Figure labels: Mean, Modes.)

Toss out the largest 5% and smallest 5% of the data

Measures of Data Spread. Sample variance: (almost) the average of squared deviations from the sample mean: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $x_i$ is data point i, $\bar{x}$ is the sample mean, and there are n data points. The standard deviation is $s = \sqrt{s^2}$. The sample average and standard deviation are the most common measures of central tendency and spread. The sample average and standard deviation have the same units.
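A short Python sketch of the sample variance and standard deviation, assuming the usual n - 1 divisor (which is what makes it "almost" an average). The data values are hypothetical.

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values

s2 = sample_variance(data)   # squared units
s = sqrt(s2)                 # same units as the data and the mean

print(s2, s)
```

Here the mean is 5 and the squared deviations sum to 32, so the variance is 32/7. The library functions `statistics.variance` and `statistics.stdev` use the same n - 1 convention.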

Measures of Data Spread. If you have “enough” data, you can fit a smooth probability density function to the histogram.

Measures of Data Spread. Trivia: the famous (standardized) “bell curve,” also called “normal” and “Gaussian,” has mean = 0 and std dev = 1; units are in std devs. About 68% of the data lie within ±1s of the mean, about 95% within ±2s, and about 99% within ±3s.
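The quoted percentages for the standardized bell curve can be checked numerically. This is a small Python sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later); the helper name `fraction_within` is made up for the example.

```python
from statistics import NormalDist

# Standardized normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Probability mass lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} std devs: {fraction_within(k):.4f}")
```

The three fractions come out close to 0.68, 0.95 and 0.997, which matches the rounded figures on the slide.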

Measures of Data Spread

Measures of Data Spread. Sample range: the difference between the largest and smallest values in the sample. It is very sensitive to outliers (extreme observations). Percentiles: the p-th percentile data value, x, means that p percent of the data are less than or equal to x. Median = 50th percentile.
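The range and percentile definitions can be sketched in Python as below. The percentile helper uses the nearest-rank convention, which is only one of several conventions in use (statistics packages often interpolate instead), so treat it as an assumption. The data are hypothetical.

```python
from math import ceil

def sample_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def percentile_nearest_rank(data, p):
    """Smallest data value x such that at least p percent of the data are <= x."""
    if not data:
        raise ValueError("empty sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(data)
    rank = ceil(p / 100 * len(ordered))   # 1-based rank in the sorted data
    return ordered[rank - 1]

data = [15, 20, 35, 40, 50]               # hypothetical values

print(sample_range(data))                  # 50 - 15
print(sample_range(data + [500]))          # one outlier inflates the range
print(percentile_nearest_rank(data, 50))   # the middle value for odd n
```

With an even number of points the nearest-rank 50th percentile picks the lower of the two middle values, so it can differ slightly from the averaged median.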

Measures of Data Spread. (Figure labels: 1st percentile, 99th percentile.)

Confidence Intervals. A confidence interval (CI) gives a range in which a true population parameter may be found. Specifically, (1-α)×100% CIs for a parameter, constructed from random samples of a given sample size, will contain the true value of the parameter approximately (1-α)×100% of the time. α is called the “level of significance.” CIs are different from tolerance and prediction intervals.

Confidence Intervals. Caution: IT IS NOT CORRECT to say that there is a (1-α)×100% probability that the true value of a parameter is between the bounds of any given CI. Graphical representation of 90% CIs for a parameter: take a sample, compute a CI, and repeat. Here 90% of the CIs contain the true value of the parameter.
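The repeated-sampling interpretation can be illustrated with a small simulation. This Python sketch draws many samples from a hypothetical normal population, builds a 90% t-based CI for the mean from each, and counts how often the true mean is covered. The population values, the seed and the trial count are all assumptions; 1.833 is the standard table value of the two-sided 90% t critical value for 9 degrees of freedom.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(1)       # fixed seed so the run is repeatable

TRUE_MEAN = 10.0     # hypothetical population mean
TRUE_SD = 2.0        # hypothetical population standard deviation
N = 10               # sample size, so 9 degrees of freedom
T_C = 1.833          # table value: two-sided 90% t critical value, 9 d.o.f.
TRIALS = 2000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    centre = fmean(sample)
    half_width = T_C * stdev(sample) / sqrt(N)
    # Does this particular interval contain the true mean?
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        hits += 1

coverage = hits / TRIALS
print(f"fraction of CIs containing the true mean: {coverage:.3f}")
```

The printed fraction should land near 0.90. Any single interval either contains the true mean or it does not; the 90% describes the procedure over many samples.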

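Confidence Intervals. Construction of a CI for a mean depends on: the sample size n; the standard error for means, $s/\sqrt{n}$; and the level of confidence 1-α (α is the significance level). Use α to compute the $t_c$-value. The (1-α)×100% CI for the population mean using a sample average and standard error is: $\left[\bar{x} - t_c\,\frac{s}{\sqrt{n}},\ \bar{x} + t_c\,\frac{s}{\sqrt{n}}\right]$.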
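A minimal Python sketch of this construction, assuming the critical value is looked up in a t-table and passed in, since the Python standard library has no Student's t quantile function. The function name `mean_confidence_interval` and the sample are hypothetical; 3.169 is the standard table value for a two-sided 99% interval with 10 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev

def mean_confidence_interval(data, t_c):
    """CI for the population mean: sample average -/+ t_c * standard error."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar = fmean(data)
    standard_error = stdev(data) / sqrt(n)
    return (x_bar - t_c * standard_error, x_bar + t_c * standard_error)

# Hypothetical sample of n = 11 values, so 10 degrees of freedom.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
T_C_99 = 3.169   # table value: two-sided 99% t critical value, 10 d.o.f.

low, high = mean_confidence_interval(sample, T_C_99)
print(f"99% CI for the mean: [{low:.3f}, {high:.3f}]")
```

For this sample the mean is 6 and the standard error works out to 1, so the interval is simply 6 -/+ 3.169. The critical value has to match both the confidence level and n - 1 degrees of freedom.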

Confidence Intervals. Compute a 99% confidence interval for the mean using this sample set (table columns: Fragment #, Fragment nD). Putting this together: [ (3.17)( ), (3.17)( )]; 99% CI for sample = [ , ]

Confidence Intervals

Hypothesis Testing. A hypothesis is an assumption about a statistic. Form a hypothesis about the statistic: $H_0$, the null hypothesis. Identify the alternative hypothesis, $H_a$. “Accept” $H_0$ or “reject” $H_0$ in favour of $H_a$ at a certain confidence level, (1-α)×100%. Technically, “accept” means “do not reject.” The testing is done with respect to how sample values of the statistic are distributed: Student's t, Gaussian, binomial, Poisson, bootstrap, etc.

Hypothesis Testing. Hypothesis testing can go wrong:

                  H0 is really true                 H0 is really false
Test rejects H0   Type I error. Probability is α    OK
Test accepts H0   OK                                Type II error. Probability is β

1-β is called the test's power.

Do the thicknesses of float glass differ from non-float glass? How can we use a computer to decide?
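One way a computer can decide is a two-sample t-test. The slides do not say which variant is used, so the Python sketch below assumes Welch's version, and the thickness values are invented for illustration rather than taken from the course data file. It computes the t statistic and approximate degrees of freedom and compares against a table critical value, because the Python standard library cannot compute t-distribution p-values. 2.306 is the standard two-sided 95% critical value for 8 degrees of freedom, which is what this particular example yields.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na   # squared standard error of mean a
    vb = variance(sample_b) / nb   # squared standard error of mean b
    t = (fmean(sample_a) - fmean(sample_b)) / sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof

# Hypothetical thicknesses in mm (not the course's simulated data file).
float_glass = [3.0, 3.1, 3.2, 3.3, 3.4]
non_float_glass = [3.5, 3.6, 3.7, 3.8, 3.9]

t_stat, dof = welch_t(float_glass, non_float_glass)
T_C = 2.306   # table value: two-sided 95% t critical value, 8 d.o.f.

reject_h0 = abs(t_stat) > T_C
print(f"t = {t_stat:.3f}, d.o.f. = {dof:.1f}, reject H0: {reject_h0}")
```

Here $H_0$ is that the two mean thicknesses are equal. With these made-up numbers the statistic is about -5, well beyond the critical value, so $H_0$ would be rejected at the 95% level.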

Importing External Data From a Spreadsheet. Use the R function read.csv. To import the (fake) float glass thickness data in the file glass_thickness_simulated.csv: read.csv("/Path/to/your/data/glass_thickness_simulated.csv", header=TRUE)

Hypothesis Testing

Analysis of Variance. Standard hypothesis testing is great for comparing two statistics. What if we have more than two statistics to compare? Use analysis of variance (ANOVA). Note that the statistics to be compared must all be of the same type. Usually the statistic is an average “response” for different experimental conditions or treatments.

Analysis of Variance. $H_0$ for ANOVA: the values being compared are not statistically different at the (1-α)×100% level of confidence. $H_a$ for ANOVA: at least one of the values being compared is statistically distinct. ANOVA computes an F-statistic from the data and compares it to a critical $F_c$ value for: the level of confidence; D.O.F. 1 = # of levels - 1; D.O.F. 2 = # of obs. - # of levels.
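The F-statistic and its two degrees of freedom can be sketched in Python for the one-way case. The function name `one_way_anova_f` and the three groups are hypothetical. The critical value is taken from an F-table rather than computed, since the Python standard library has no F distribution; 5.14 is the standard table value for the 95% level with 2 and 6 degrees of freedom.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """One-way ANOVA: return (F statistic, d.o.f. 1, d.o.f. 2)."""
    k = len(groups)                                   # number of levels
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)                         # number of observations
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    dof1 = k - 1
    dof2 = n_total - k
    f_stat = (ss_between / dof1) / (ss_within / dof2)
    return f_stat, dof1, dof2

# Hypothetical responses for three levels (illustrative only).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]

f_stat, dof1, dof2 = one_way_anova_f(groups)
F_C = 5.14   # table value: 95% critical F for (2, 6) d.o.f.

print(f"F = {f_stat:.2f} with ({dof1}, {dof2}) d.o.f.; reject H0: {f_stat > F_C}")
```

For these numbers the between-group sum of squares is 42 and the within-group sum is 6, giving F = 21, which exceeds the critical value, so at least one group mean would be judged distinct.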

Analysis of Variance. Levels are “categorical variables” and can be group names, experimental conditions, or experimental treatments.

Analysis of Variance