Identifying and Correcting Outliers From Paleo Forage Fish Records using a Multivariate Statistical Approach MARS6300 Alex Filardo 2018.

Fish Debris Data: There are 5 separate matrices: one for each species' (sardine, anchovy, hake) scale count (3 total), plus one for total bone count and one for total vertebrae count (2 total). Columns: 4 slabs. Rows: 309 sampling intervals. For the sake of the presentation, I will focus primarily on hake scales, but the procedure is the same for the other matrices. Note: the example matrix shown is from sardine, not hake.

Objective: The goal of this project is to use multivariate statistics to identify outliers within fish debris counts. Scales are naturally shed to the sea floor throughout a fish's life. Skeletal debris reaches the sea floor after a predation event, and it may be deposited in a predator's fecal pellet. Outliers reach the sediments when a chunk of fish lands on the sea floor, or when a fecal pellet is packed with more fish debris than a normal fecal pellet. I hypothesize that the multivariate methods will find more outliers than simply flagging counts by their standard deviation away from the mean.

Cross Correlations: Here we look at the cross correlations between hake and bone and between hake and vertebrae. You can see that there are potential outliers. Based on the comparison of cross correlations, hake appear to be the main contributors of bones; however, one large outlier could be skewing the results. Hake contribute almost no vertebrae to the sea floor compared to the other two species.
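To illustrate how a single large outlier can skew a correlation, here is a minimal Python sketch. It assumes that "cross correlation" means a Pearson correlation between two count series, which the slides do not state. The counts are made-up illustration values, not the study data, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical counts per sampling interval (NOT the study data).
hake_scales = [2, 3, 1, 4, 2, 3, 40, 2]
bones = [5, 4, 6, 3, 5, 4, 60, 5]

# Correlation using every interval, including the one extreme interval.
r_all = pearson_r(hake_scales, bones)

# Drop the interval with the largest hake count and recompute.
i_max = hake_scales.index(max(hake_scales))
r_trimmed = pearson_r(
    [v for i, v in enumerate(hake_scales) if i != i_max],
    [v for i, v in enumerate(bones) if i != i_max],
)
```

In this toy example the one extreme interval produces a strong positive correlation, while the remaining intervals are negatively correlated, so the apparent relationship rests entirely on the outlier.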

Data Analysis - Hake: Perform a PCA with the following settings: Variance/Covariance (Centered); Distance Based Biplot; List the Cross-Products Matrix; Randomization test, using time of day, with 999 runs. Result: Axis 1 is statistically significant.
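The PCA and randomization test can be sketched roughly as follows in plain Python. The slides do not say which software or shuffling scheme was used, so this is an assumption: values are shuffled independently within each column, and the first eigenvalue of the variance/covariance matrix is compared against the shuffled runs. The function names, the fixed `seed`, and the demo data are all hypothetical.

```python
import random

def covariance_matrix(rows):
    """Variance/covariance matrix of the columns (columns are mean-centered)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
         for k in range(p)]
        for j in range(p)
    ]

def first_eigenvalue(cov, iters=200):
    """Approximate the largest eigenvalue (Axis 1 variance) by power iteration."""
    p = len(cov)
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(cov[j][k] * v[k] for k in range(p)) for j in range(p)]
        lam = sum(x * x for x in w) ** 0.5
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

def randomization_test(rows, runs=999, seed=0):
    """Assumed scheme: shuffle within columns, compare first eigenvalues."""
    rng = random.Random(seed)
    observed = first_eigenvalue(covariance_matrix(rows))
    cols = [list(c) for c in zip(*rows)]
    n_ge = 0
    for _ in range(runs):
        shuffled = [rng.sample(c, len(c)) for c in cols]
        lam = first_eigenvalue(covariance_matrix(list(zip(*shuffled))))
        if lam >= observed:
            n_ge += 1
    # p-value counts the observed data as one of the runs.
    return observed, (1 + n_ge) / (1 + runs)

# Hypothetical demo: 40 intervals x 4 slabs sharing one common signal.
rng = random.Random(1)
signal = [rng.gauss(0, 1) for _ in range(40)]
demo = [[s + rng.gauss(0, 0.3) for _ in range(4)] for s in signal]
observed, p_value = randomization_test(demo, runs=199)
```

Shuffling within columns keeps each slab's variance but breaks the correlation between slabs, so a small p-value suggests that Axis 1 captures more shared structure than chance alone would produce.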

Hake PCA Biplots: When you look at the PCA biplots, you can see that some sampling intervals lie much farther from the other points.

So I took the PCA scores and identified intervals that were more than 2 standard deviations (SD) away from the mean. For hake, I found that 13 intervals were more than 2 SD away from the mean. I then compared how many outliers this PCA method detects against an outlier analysis on the raw data with a 2 SD cutoff.
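A minimal sketch of the 2 SD cutoff, assuming the scores come from a single PCA axis and that "away from the mean" means in either direction. The scores below are invented for illustration, and `flag_outliers` is a hypothetical helper name.

```python
import statistics

def flag_outliers(scores, n_sd=2.0):
    """Return indices whose score lies more than n_sd sample SDs from the mean."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [i for i, s in enumerate(scores) if abs(s - mean) > n_sd * sd]

# Hypothetical Axis 1 scores for ten sampling intervals (NOT the study data).
axis1_scores = [0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2, 5.0, 0.1, -0.1]
outliers = flag_outliers(axis1_scores)  # -> [7]
```

The same function can be applied to a raw count column to get the raw-data comparison with the same cutoff.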

Outlier Correction: I then corrected the outliers in the raw data by changing any count that was more than 1 SD away from the interval's mean to exactly 1 SD away from the mean. I then reran the PCA on the corrected data using the same PCA settings. I ran this process with all of the matrices.
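The correction step might look like the sketch below. It assumes that the "interval's mean" and SD are computed across the 4 slab counts of the flagged interval, and that only counts above the mean are capped. Both are my reading of the slide, not something it states. `correct_interval` and the example row are hypothetical.

```python
import statistics

def correct_interval(counts, n_sd=1.0):
    """Cap counts above mean + n_sd * SD (computed across the slabs) at that cap."""
    mean = statistics.fmean(counts)
    sd = statistics.stdev(counts)
    cap = mean + n_sd * sd
    return [min(c, cap) for c in counts]

# Hypothetical counts for one flagged interval across 4 slabs.
row = [2, 3, 1, 30]
corrected = correct_interval(row)
# Only the count of 30 is pulled down to mean + 1 SD; the rest are unchanged.
```

Because the SD is computed with the outlier included, the cap is itself inflated by the outlier; computing it from the remaining slabs would be a stricter alternative.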

Same Process for the Other Matrices (Bone): I ran a PCA on the raw data, extracted the PCA scores, found the outliers within the PCA scores, and corrected those outliers to 1 SD from the interval's mean. I then reran the PCA and generally saw a slight increase in variance explained. Figure panels: PCA with the raw data; PCA with the corrected data.

Comparing Outliers (Hake): I can now compare the outliers detected by the PCA for scales to the outliers for bones and vertebrae, to see which bone and vertebrae outliers are likely due to a piece of hake falling to the sea floor. We can see that it is highly likely that interval 535 is a piece of hake (including scales and bones, but not vertebrae) that fell to the sea floor. Both have similar standard deviations. It is probable that interval 120 is also a piece of hake, though this is less likely.
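One way to sketch this comparison is as a lookup of each scale outlier in the bone and vertebrae outlier lists. The interval labels below are placeholders, not the intervals from the study, and `shared_outliers` is a hypothetical helper name.

```python
def shared_outliers(scales, bones, vertebrae):
    """For each scale outlier, list the other debris types that also flag it."""
    others = (("bones", set(bones)), ("vertebrae", set(vertebrae)))
    return {
        interval: [name for name, flagged in others if interval in flagged]
        for interval in sorted(set(scales))
    }

# Hypothetical outlier interval labels for one species (NOT the study data).
scale_out = [10, 20, 30]
bone_out = [10, 30, 40]
vert_out = [30, 50]

matches = shared_outliers(scale_out, bone_out, vert_out)
# {10: ['bones'], 20: [], 30: ['bones', 'vertebrae']}
```

An interval flagged in scales and in at least one skeletal matrix is a candidate for a single deposited piece of fish or a fecal pellet. An interval flagged in scales alone is not.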

Comparing Outliers (Sardine): The sardine outliers indicate that interval 980 was likely a piece of sardine (or a fecal pellet) that contained scales, bones, and vertebrae. Interval 835, however, was an outlier only for scales and vertebrae, so this could be a fecal pellet or chunk of fish that contained only scales and vertebrae.

Comparing Outliers (Anchovy): For anchovy, it is likely that intervals 1145, 980, and 195 each received a piece of anchovy that contained scales, bones, and vertebrae, likely in different proportions of scales to bones to vertebrae.

What now? To further test this method of finding outliers, I will run the fish debris data through a polar ordination and will choose one of the largest values as one of my endpoints. This should allow me to find outliers in a similar fashion, just with a different ordination technique.
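A rough sketch of a one-axis polar (Bray-Curtis) ordination is given below. The slide does not specify the distance measure or how the second endpoint is chosen, so this assumes Bray-Curtis (Sørensen) distance, takes the interval with the largest total count as the first endpoint, and takes the interval farthest from it as the second. All names and data are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis (Sorensen) distance between two count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def polar_axis(rows, end_a):
    """Project every row onto the axis from end_a to its farthest row (end_b)."""
    d_a = [bray_curtis(rows[end_a], r) for r in rows]
    end_b = max(range(len(rows)), key=lambda i: d_a[i])
    d_ab = d_a[end_b]
    if d_ab == 0.0:
        raise ValueError("all rows are identical to the chosen endpoint")
    d_b = [bray_curtis(rows[end_b], r) for r in rows]
    # Geometric projection: x_i = (D_ab^2 + D_ai^2 - D_bi^2) / (2 * D_ab)
    scores = [
        (d_ab ** 2 + d_a[i] ** 2 - d_b[i] ** 2) / (2 * d_ab)
        for i in range(len(rows))
    ]
    return end_b, scores

# Hypothetical counts: 4 intervals x 4 slabs (NOT the study data).
rows = [[2, 3, 1, 2], [4, 3, 2, 2], [40, 35, 30, 45], [1, 1, 0, 1]]
totals = [sum(r) for r in rows]
end_a = totals.index(max(totals))  # interval with the largest total count
end_b, scores = polar_axis(rows, end_a)
```

With the extreme interval fixed at score 0, the typical intervals cluster near the far end of the axis, so any interval that sits unusually close to the chosen endpoint is a candidate outlier.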