Published by Scot Hart. Modified over 6 years ago.
1
Identifying and Correcting Outliers From Paleo Forage Fish Records Using a Multivariate Statistical Approach
MARS6300
Alex Filardo
2018
2
Fish Debris Data
5 separate matrices:
One for each species' scale count (sardine, anchovy, hake): 3 total
One for total bone count and one for total vertebrae count: 2 total
Columns: 4 slabs
Rows: 309 sampling intervals
For the sake of the presentation, I will primarily focus on hake scales, but the procedure is the same for the other matrices.
Note: this example matrix is from sardine, not hake.
3
Objective: The goal of this project is to use multivariate statistics to identify outliers within fish debris counts.
Scales are naturally shed to the sea floor throughout a fish's life.
Skeletal debris reaches the sea floor after a predation event, and may be deposited in a predator's fecal pellet.
Outliers reach the sediments when a chunk of fish lands on the sea floor, or when a fecal pellet is packed with more fish debris than a normal fecal pellet.
I hypothesize that the multivariate methods will find more outliers than simply flagging counts by their standard deviation away from the mean.
4
Cross Correlations
Here we look at the cross correlations between hake-and-bone and hake-and-vertebrae.
You can see that there are potential outliers.
Based on the comparison of cross correlations, hake appear to be the main contributors of bones; however, this large outlier could be skewing the results.
Hake contribute almost no vertebrae to the sea floor compared to the other two species.
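To illustrate how a single large interval can skew a correlation, here is a minimal sketch. It assumes the cross correlation is an ordinary Pearson correlation between paired counts per sampling interval; the counts and the names `hake_scales` and `bones` are made up for illustration and are not the real data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up counts per sampling interval (NOT the real fish debris data).
hake_scales = [2, 0, 1, 3, 1, 0, 2, 40]
bones = [3, 4, 1, 2, 5, 2, 1, 35]

r_all = pearson_r(hake_scales, bones)

# Drop the interval with the largest scale count and recompute.
i_max = max(range(len(hake_scales)), key=hake_scales.__getitem__)
keep = [i for i in range(len(hake_scales)) if i != i_max]
r_trimmed = pearson_r([hake_scales[i] for i in keep], [bones[i] for i in keep])

print(f"r with the large interval: {r_all:.2f}, without it: {r_trimmed:.2f}")
```

In this toy example nearly all of the correlation comes from one interval, which is the concern raised about the hake-and-bone comparison.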
5
Data Analysis - Hake
Perform a PCA:
Variance/Covariance (Centered)
Distance-Based Biplot
List the Cross-Products Matrix
Randomization test: using time of day with 999 runs
Axis 1 is statistically significant.
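The PCA settings above can be sketched with NumPy. This is only an approximation of what an ordination package does, under stated assumptions: the cross-products matrix is taken to be the covariance matrix of the column-centered data, the randomization test is assumed to shuffle values within each column and compare axis 1 eigenvalues, and "time of day" is read as the source of the random seed. The function names and the synthetic matrix `X` are hypothetical.

```python
import numpy as np

def pca_cov(X):
    """PCA on the centered variance/covariance matrix of X (rows = intervals)."""
    Xc = X - X.mean(axis=0)                 # center each column (slab)
    cov = np.cov(Xc, rowvar=False)          # cross-products matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort axes by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # interval scores on each axis
    return eigvals, eigvecs, scores

def randomization_p(X, runs=999, seed=None):
    """p-value for axis 1: share of shuffled runs with eigenvalue >= observed."""
    rng = np.random.default_rng(seed)       # seed=None draws fresh entropy
    observed = pca_cov(X)[0][0]
    hits = 0
    for _ in range(runs):
        # Assumption: shuffle each column independently to break correlations.
        shuffled = np.column_stack([rng.permutation(col) for col in X.T])
        if pca_cov(shuffled)[0][0] >= observed:
            hits += 1
    return (hits + 1) / (runs + 1)

# Synthetic stand-in: 309 intervals x 4 slabs sharing a common signal.
rng = np.random.default_rng(0)
base = rng.poisson(3, size=309)
X = np.column_stack([base + rng.poisson(1, size=309) for _ in range(4)]).astype(float)

eigvals, eigvecs, scores = pca_cov(X)
p_value = randomization_p(X, runs=999, seed=0)
print("share of variance on axis 1:", eigvals[0] / eigvals.sum())
print("randomization p-value for axis 1:", p_value)
```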
6
Hake PCA Biplots
When you look at the PCA biplots, you can see that some sampling intervals lie much farther from the other points.
7
I took the PCA scores and identified intervals that were more than 2 standard deviations (SD) away from the mean.
For hake, I found 13 intervals that were more than 2 SD away from the mean.
I then compared how many outliers this PCA method detects with how many an outlier analysis on the raw data detects using the same 2 SD cutoff.
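A minimal sketch of the 2 SD screen on both the PCA scores and the raw counts follows. Several details are assumptions: only axis 1 scores are screened, the raw-data screen flags an interval if any slab count is more than 2 SD from that slab's mean, and the sample standard deviation is used. The data are random stand-ins with one planted spike, not the real matrices.

```python
import numpy as np

def beyond_k_sd(values, k=2.0):
    """Boolean mask of entries more than k sample SDs from the (column) mean."""
    values = np.asarray(values, dtype=float)
    return np.abs(values - values.mean(axis=0)) > k * values.std(axis=0, ddof=1)

# Stand-in data: 309 intervals x 4 slabs of counts, with one planted spike.
rng = np.random.default_rng(1)
counts = rng.poisson(2.0, size=(309, 4)).astype(float)
counts[100] = [30, 28, 35, 31]   # hypothetical "chunk of fish" interval

# Axis 1 scores of a covariance PCA, obtained from the SVD of the centered data.
centered = counts - counts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
axis1_scores = centered @ vt[0]

pca_outliers = np.flatnonzero(beyond_k_sd(axis1_scores))
raw_outliers = np.flatnonzero(beyond_k_sd(counts).any(axis=1))

print(len(pca_outliers), "intervals flagged from PCA scores")
print(len(raw_outliers), "intervals flagged from raw counts")
```

How the two counts compare depends entirely on the data, so this toy run says nothing about the hypothesis itself.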
8
Outlier Correction
I then corrected the outliers in the raw data by changing any count that was more than 1 SD away from the interval's mean to 1 SD away from the mean.
I then reran the PCA on the corrected data using the same PCA settings.
I ran this process with all of the matrices.
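The correction step can be sketched as follows. This is one reading of the rule, not necessarily the exact procedure used: the interval's mean and SD are taken across the four slabs in that row, only counts above the mean are capped, and the sample SD is used. The function name `cap_to_one_sd` and the example counts are hypothetical.

```python
import numpy as np

def cap_to_one_sd(counts, flagged_rows):
    """Cap counts in flagged intervals at (interval mean + 1 SD across slabs)."""
    corrected = np.array(counts, dtype=float)    # work on a copy
    for i in flagged_rows:
        row = corrected[i]
        limit = row.mean() + row.std(ddof=1)     # assumed: sample SD of the row
        corrected[i] = np.minimum(row, limit)    # only values above the limit change
    return corrected

# Made-up example: interval 0 was flagged as an outlier, interval 1 was not.
counts = [[2, 3, 1, 40],
          [1, 2, 2, 1]]
corrected = cap_to_one_sd(counts, flagged_rows=[0])
print(corrected)
```

After this step the PCA would be rerun on `corrected` with the same settings as before.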
9
Same Process for the other matrices
BONE
PCA on raw data
Extracted PCA scores
Found the outliers within the PCA scores
Corrected those outliers to 1 SD from the interval's mean
Reran the PCA and generally saw a slight increase in variance explained
Figures: PCA with the raw data; PCA with the corrected data
10
Comparing Outliers (Hake)
I can now compare the outliers detected by the PCA for scales to the outliers for bones and vertebrae to see which bone and vertebrae outliers are likely due to a piece of hake falling to the sea floor.
We can see that it is highly likely that interval 535 is a piece of hake (including scales and bones, but not vertebrae) that fell to the sea floor. They both have similar standard deviations.
It is probable that interval 120 is also a piece of hake, although this is less likely.
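The comparison across matrices amounts to intersecting sets of flagged intervals. Here is a small sketch; the function name and the interval IDs in the example are placeholders, not results from the data.

```python
def classify_intervals(scale_out, bone_out, vert_out):
    """Group scale-outlier intervals by which skeletal matrices also flag them."""
    scale_out, bone_out, vert_out = set(scale_out), set(bone_out), set(vert_out)
    return {
        "scales+bones+vertebrae": sorted(scale_out & bone_out & vert_out),
        "scales+bones": sorted((scale_out & bone_out) - vert_out),
        "scales+vertebrae": sorted((scale_out & vert_out) - bone_out),
        "scales only": sorted(scale_out - bone_out - vert_out),
    }

# Placeholder interval IDs, for illustration only.
groups = classify_intervals(
    scale_out={10, 20, 30, 40},
    bone_out={10, 20, 50},
    vert_out={10, 30},
)
for label, intervals in groups.items():
    print(label, intervals)
```

An interval that appears in more than one group of debris is a stronger candidate for a single deposited piece of fish or fecal pellet than one flagged for scales alone.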
11
Comparing Outliers (Sardine)
The sardine outliers indicate that interval 980 was likely a piece of sardine (or a fecal pellet) that contained scales, bones, and vertebrae.
Interval 835, however, was an outlier for scales and vertebrae only, so this could be a fecal pellet or chunk of fish that contained only scales and vertebrae.
12
Comparing Outliers (Anchovy)
For anchovy, we see it is likely that intervals 1145, 980, and 195 each received a piece of anchovy that contained scales, bones, and vertebrae, likely in different proportions of scales to bones to vertebrae.
13
What now?
To further test this method of finding outliers, I will run the fish debris data through a polar ordination and will choose one of the largest values as one of my endpoints.
This should allow me to find outliers in a similar fashion, just with a different ordination technique.
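A rough sketch of what that polar (Bray-Curtis) ordination could look like is below. The choices here are assumptions rather than the planned settings: Sorensen (Bray-Curtis) distance between intervals, the first endpoint taken as the interval with the largest total count, the second endpoint taken as the interval farthest from the first, and positions on the axis computed with Beals' geometric formula. The counts are made up.

```python
import numpy as np

def bray_curtis(X):
    """Pairwise Sorensen (Bray-Curtis) distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    total = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    # Pairs of all-zero rows have total == 0; define their distance as 0.
    return np.divide(diff, total, out=np.zeros_like(diff), where=total > 0)

def polar_axis(D, a, b):
    """Position of every row on the axis between endpoints a and b."""
    d_ab = D[a, b]
    return (d_ab**2 + D[a] ** 2 - D[b] ** 2) / (2 * d_ab)

# Made-up counts: 5 intervals x 4 slabs, with one very large interval.
X = np.array([[1, 2, 1, 0],
              [2, 1, 0, 1],
              [0, 1, 2, 1],
              [30, 25, 28, 33],
              [0, 0, 1, 0]], dtype=float)

D = bray_curtis(X)
a = int(np.argmax(X.sum(axis=1)))   # endpoint 1: largest total count
b = int(np.argmax(D[a]))            # endpoint 2: farthest from endpoint 1
axis_scores = polar_axis(D, a, b)
print(axis_scores)
```

Intervals that sit close to the large endpoint on this axis would then be the candidates to compare against the PCA-based outliers.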