Visual Displays of Data and Basic Descriptive Statistics


1 Visual Displays of Data and Basic Descriptive Statistics http://www.halcyonmaps.com

2 Where to get information on R:
R: http://www.r-project.org/ (just need the base)
RStudio: http://rstudio.org/ (a great IDE for R; works on all platforms; sometimes slows down performance…)
CRAN: http://cran.r-project.org/ (library repository for R; click on Search on the left of the website to search for packages and info on packages)

3 Finding our way around R/RStudio (screenshot labels: Script Window, Command Line)

4 Basic Input and Output: Handy Commands:
x <- 4 (numeric input)
x <- "text goes in quotes" (text (character) input)
Variables store information. <- is the assignment operator.
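A minimal sketch of the assignments above, as they could be typed at the R command line. The variable names x and y are only illustrative.

```r
# Store a number in a variable using the assignment operator <-
x <- 4

# Store text (a character string); text goes in quotes
y <- "text goes in quotes"

# print() (or just typing the variable name at the prompt) shows the value
print(x)
print(y)

# class() reports what kind of information a variable stores
print(class(x))
print(class(y))
```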

5 Handy Commands: Get help on an R command:
If you know the name: ?command name (for example, ?plot brings up the HTML help page for the plot command).
If you don't know the name: use Google (my favorite) or ??key word.

6 First Thing: Look at your Data! Histograms: "bin" a variable and plot frequencies, either as counts or as relative frequencies. (Figure: histograms of nD, labelled Counts and Relative Frequencies.)

7 Histograms
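A small sketch of binning a variable in R, using made-up data (the rnorm sample below is an assumption for illustration, not data from the slides). It draws a histogram of counts and then re-plots the same bins as relative frequencies.

```r
set.seed(1)                          # made-up data, for illustration only
x <- rnorm(200, mean = 10, sd = 2)

# hist() draws the histogram of counts and returns the bins it used
h <- hist(x, main = "Counts", xlab = "x")

# Relative frequency of a bin = count in the bin / total number of points
rel_freq <- h$counts / sum(h$counts)

# Re-plot the same bins with relative frequencies on the vertical axis
barplot(rel_freq, names.arg = h$mids,
        main = "Relative Frequencies", xlab = "x (bin midpoints)")
```

Note that hist(x, freq = FALSE) plots densities rather than relative frequencies; the two agree only when the bin width is 1, which is why the relative frequencies are computed by hand here.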

8 First Thing: Look at your Data! Box and Whiskers Plots: (Figure labels: 25th percentile = 1st quartile; median = 50th percentile; 75th percentile = 3rd quartile; range; possible outliers.)

9 Note the relationship: Box-and-Whiskers

10 With Outliers: Without Outliers: Box-and-Whiskers
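A short sketch of a box-and-whiskers plot in R with and without the outlying points drawn. The data vector is made up for illustration; fivenum() and boxplot.stats() are base R functions that report the numbers behind the plot.

```r
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 50)   # made-up data; 50 sits far from the rest

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)

# The statistics boxplot() uses; $out holds the points flagged as possible outliers
stats <- boxplot.stats(x)
outliers <- stats$out

boxplot(x, horizontal = TRUE, main = "With Outliers")
boxplot(x, horizontal = TRUE, outline = FALSE, main = "Without Outliers")
```

The hinges used by boxplot() are close to, but not always identical to, the 25th and 75th percentiles returned by quantile().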

11

12 Stem-and-Leaf Displays: Consider a numerical data set $x_1, x_2, x_3, \ldots, x_n$ in which each $x_i$ consists of at least two digits. An informative visual representation of such data is a stem-and-leaf display.

13 Stem-and-Leaf Displays (figure labels: Stems; Leaves for each stem)
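As an illustration, base R's stem() prints a stem-and-leaf display to the console. The two-digit data below are made up.

```r
x <- c(12, 15, 21, 23, 23, 27, 31, 34, 38, 42, 45, 47)   # made-up two-digit data

# Stems are printed to the left of the bar, the leaves for each stem to the right
stem(x)
```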

14 Dotplots: Each observation is represented by a dot above the corresponding location on a horizontal measurement scale.
- When a value occurs more than once, there is a dot for each occurrence.
- Dots are stacked vertically.
A dotplot is useful when:
- there is not a large set of data
- there are relatively few distinct values.

15 Dotplots
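One way to sketch a dotplot in base R is stripchart() with method = "stack", which stacks repeated values vertically. The data are made up for illustration.

```r
x <- c(3, 4, 4, 5, 5, 5, 6, 6, 7, 9)   # made-up data with repeated values

# Number of dots that will be stacked at each distinct value
counts <- table(x)

# One dot per observation; repeated values are stacked
stripchart(x, method = "stack", pch = 16, offset = 0.5, xlab = "x")
```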

16 Measures of Location: Given a sample from some population, what is a good "summary" value which describes the sample well? We will look at:
- Average (arithmetic mean)
- Median
- Mode
For reference see (available on-line): "The Dynamic Character of Disguised Behaviour for Text-based, Mixed and Stylized Signatures", LA Mohammed, B Found, M Caligiuri and D Rogers, J Forensic Sci 56(1), S136-S141 (2011).

17 Histogram Points of Interest: Velocity for the first segment of genuine signatures in the (soon to be classic) Mohammed et al. study. What is a good summary number? How spread out is the data? (We will talk about this later.)

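18 Measures of Location: Arithmetic sample mean (average): the sum of the data divided by the number of observations.
Intuitive formula: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
Fancy formula: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$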

19 Measures of Location: Example from LAM study: Compute the average absolute size of segment 1 for the genuine signature of subject 2:
Subj. 2; Gen; Seg. 1    Absolute Size (cm)
1                       0.0548
2                       0.2951
3                       0.1026
4                       0.1005
5                       0.2491
6                       0.1287
7                       0.0496
8                       0.2299
9                       0.256
10                      0.0538
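The average for this example can be sketched in R as follows, using the ten absolute sizes listed on the slide. The variable names are illustrative.

```r
# Absolute size (cm) of segment 1, genuine signature, subject 2 (from the slide)
size_cm <- c(0.0548, 0.2951, 0.1026, 0.1005, 0.2491,
             0.1287, 0.0496, 0.2299, 0.2560, 0.0538)

n <- length(size_cm)

# Sum of the data divided by the number of observations
avg_by_hand <- sum(size_cm) / n

# Same result from the built-in function
avg_builtin <- mean(size_cm)

print(avg_by_hand)   # about 0.152 cm
print(avg_builtin)
```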

20 Measures of Location: Example (more useful): Consider again the Absolute Average Velocity for Genuine Signatures across all writers in the LAM study: 92 subjects × 10 measurements/subject = 920 velocity measurements. Average Absolute Average Velocity:

21 Measures of Location: Follow up question: Is there a difference in the Abs. Avg. Veloc. for Genuine signatures vs. Disguised signatures (DWM and DNM)? (Figure panels: Genuine, DWM, DNM.) We will learn how to answer this, but not yet.

22 Measures of Location: Sample median: Ordering the n pieces of data from smallest value to largest value, the median is the "middle value":
- If n is odd, the median is the $\left(\frac{n+1}{2}\right)$-th largest data point.
- If n is even, the median is the average of the $\left(\frac{n}{2}\right)$-th and $\left(\frac{n}{2}+1\right)$-th largest data points.
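A minimal sketch of the odd/even rule for the sample median, checked against R's built-in median(). The function name sample_median and the small data sets are made up for illustration.

```r
# Median "by hand": sort, then take the middle value (odd n)
# or the average of the two middle values (even n)
sample_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2
  }
}

odd_data  <- c(7, 1, 5)      # sorted: 1 5 7
even_data <- c(7, 1, 5, 3)   # sorted: 1 3 5 7

print(sample_median(odd_data))    # 5
print(sample_median(even_data))   # (3 + 5) / 2 = 4
print(median(even_data))          # built-in gives the same answer
```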

23 Measures of Location: Example: Median of Average Absolute Velocity for Genuine Signatures, LAM (figure label: Avg)

24 Measures of Location: Sample mode: Needs careful definition, but basically: the data value that occurs the most. (Figure labels: Avg, Med, mode = 9.2541.)
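Base R has no built-in function for the statistical mode, so the sketch below counts occurrences with table(). The helper name sample_modes is hypothetical, and this simple version only makes sense for data with repeated values; for continuous measurements the mode would usually be taken from binned data instead.

```r
# Return the value(s) that occur most often
sample_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

one_mode  <- sample_modes(c(2, 3, 3, 5, 7))   # 3 occurs twice
two_modes <- sample_modes(c(1, 1, 2, 2, 3))   # 1 and 2 tie

print(one_mode)
print(two_modes)
```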

25 Measures of Location: Some trivia: Nice and symmetric: Mean = Median = Mode. (Figure labels: Mean, Modes.)

26

27 Toss out the largest 5% and smallest 5% of the data
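Assuming this slide describes a 5% trimmed mean (an interpretation; the transcript gives no title for it), the idea can be sketched with the trim argument of R's mean(). The data are made up so that one extreme value dominates the ordinary average.

```r
x <- c(1:19, 1000)   # made-up data: 20 values, one of them extreme

# Ordinary average, pulled up by the extreme value
plain_mean <- mean(x)

# Drop the smallest 5% and largest 5% (here one value from each end), then average
trimmed_mean <- mean(x, trim = 0.05)

print(plain_mean)     # 59.5
print(trimmed_mean)   # 10.5
```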

28 Measures of Data Spread: Sample variance: (almost) the average of squared deviations from the sample mean:
$s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
where $x_i$ is data point i, $\bar{x}$ is the sample mean, and there are n data points.
Standard deviation is $s = \sqrt{s^2}$.
The sample average and standard dev. are the most common measures of central tendency and spread.
Sample average and standard dev. have the same units.
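A short sketch of the sample variance and standard deviation computed by hand and compared with R's var() and sd(), which also divide by n - 1. The data are made up.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up data
n <- length(x)
xbar <- mean(x)                   # sample mean, 5 for this data

# Sum of squared deviations from the mean, divided by n - 1
s2 <- sum((x - xbar)^2) / (n - 1)

# Standard deviation: square root of the variance, same units as the data
s <- sqrt(s2)

print(c(by_hand = s2, builtin = var(x)))
print(c(by_hand = s,  builtin = sd(x)))
```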

29 Measures of Data Spread If you have “enough” data, you can fit a smooth probability density function to the histogram

30 Measures of Data Spread: Trivia: The famous (standardized) "Bell Curve", also called "normal" and "Gaussian". Mean = 0, Std Dev = 1, units are in Std Devs.
~68% within ± 1s
~95% within ± 2s
~99% within ± 3s
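The percentages for the standardized bell curve can be checked with pnorm(), the standard normal cumulative distribution function in base R. The helper name within_k is made up for illustration.

```r
# Probability that a standard normal value lies within k standard deviations of the mean
within_k <- function(k) pnorm(k) - pnorm(-k)

coverage <- sapply(1:3, within_k)
print(round(coverage, 4))   # roughly 0.68, 0.95 and 0.997
```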

31 Measures of Data Spread

32 Measures of Data Spread:
Sample range: The difference between the largest and smallest value in the sample. Very sensitive to outliers (extreme observations).
Percentiles: The p-th percentile data value, x, means that p percent of the data are less than or equal to x. Median = 50th percentile.
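A brief sketch of the range and percentiles in R, on made-up data. Note that quantile() interpolates between data points by default, so its values can differ slightly from the "less than or equal to" definition on small samples.

```r
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 50)   # made-up data; 50 is an extreme value

# Sample range: largest minus smallest value (range() itself returns both ends)
sample_range <- diff(range(x))

# A few percentiles, including the 1st and 99th
print(quantile(x, probs = c(0.01, 0.25, 0.50, 0.75, 0.99)))

# The 50th percentile is the median
p50 <- quantile(x, 0.5, names = FALSE)
print(c(p50 = p50, median = median(x)))
```

Dropping the single value 50 would shrink the range from 48 to 8, which shows how sensitive the range is to one extreme observation.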

33 Measures of Data Spread (figure labels: 1st percentile, 99th percentile; 1.52003, 1.52008)

34

35 Measures of Data Spread:
Sample relative standard deviation: the ratio of the standard dev. to the average. Also called the coefficient of variation.
Data quality (outliers): rule of thumb, for a multiplier a, x_i is flagged if:
x_i > 75th percentile + a × (75th percentile - 25th percentile), or
x_i < 25th percentile - a × (75th percentile - 25th percentile).
x_i is an outlier for one value of a and an extreme outlier for a larger value of a.
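A sketch of the coefficient of variation and the percentile-based outlier rule of thumb. The multipliers 1.5 (outlier) and 3 (extreme outlier) used here are the conventional Tukey choices and are an assumption, since the transcript does not show the values; the data and the helper name flag_outliers are made up as well.

```r
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 50)   # made-up data

# Relative standard deviation (coefficient of variation)
cv <- sd(x) / mean(x)

# Flag points beyond the quartiles by more than `a` interquartile ranges
flag_outliers <- function(x, a) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x > q[2] + a * iqr | x < q[1] - a * iqr
}

outliers <- x[flag_outliers(x, a = 1.5)]   # assumed multiplier for "outlier"
extreme  <- x[flag_outliers(x, a = 3)]     # assumed multiplier for "extreme outlier"

print(cv)
print(outliers)
print(extreme)
```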

