Identifying and Correcting Outliers From Paleo Forage Fish Records using a Multivariate Statistical Approach MARS6300 Alex Filardo 2018.

Fish Debris Data. There are 5 separate matrices: one for each species' (sardine, anchovy, hake) scale count (3 total), plus one for total bone count and one for total vertebrae count (2 total). Columns: 4 slabs. Rows: 309 sampling intervals. For the sake of the presentation, I will primarily focus on hake scales, but the procedure is the same for the other matrices. Note: the example matrix shown is from sardine, not hake.

Objective: The goal of this project is to use multivariate statistics to identify outliers within fish debris counts. Scales are naturally shed to the sea floor throughout a fish's life. Skeletal debris reaches the sea floor after a predation event and may be deposited in a predator's fecal pellet. Outliers reach the sediments when a chunk of fish lands on the sea floor, or when a fecal pellet is packed with more fish debris than a normal fecal pellet. I hypothesize that the multivariate methods will find more outliers than a screen based only on each count's standard deviations from the mean.

Cross Correlations. Here we look at the cross correlations between hake and bone and between hake and vertebrae. You can see that there are potential outliers. Based on the comparison of cross correlations, hake appear to be the main contributors of bones; however, this large outlier could be skewing the results. Hake also contribute almost no vertebrae to the sea floor compared to the other two species.
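To illustrate how a single large interval can skew a cross correlation, here is a minimal sketch in Python with NumPy. It assumes each matrix has been reduced to one count per sampling interval. The counts are synthetic, generated only for this example, and the function names are hypothetical. None of this is the project's actual data or software.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two per-interval count series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def r_without_largest(x, y):
    """Correlation after dropping the interval with the largest x count."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.arange(x.size) != np.argmax(x)
    return pearson_r(x[keep], y[keep])

# Synthetic, unrelated counts for 309 intervals (NOT the real records).
rng = np.random.default_rng(0)
hake_scales = rng.poisson(3, size=309).astype(float)
bones = rng.poisson(5, size=309).astype(float)

# Plant one extreme interval in both series, like a chunk of fish.
hake_scales[100] = 80.0
bones[100] = 120.0

r_all = pearson_r(hake_scales, bones)           # dominated by interval 100
r_trim = r_without_largest(hake_scales, bones)  # outlier removed
print(f"r with outlier: {r_all:.2f}, without: {r_trim:.2f}")
```

In this synthetic case the two series are unrelated apart from the planted interval, so most of the apparent correlation disappears once that interval is dropped.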

Data Analysis - Hake. Perform a PCA with these settings: variance/covariance (centered) cross-products matrix; distance-based biplot; list the cross-products matrix; randomization test using time of day, with 999 runs. Result: Axis 1 is statistically significant.
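The PCA settings above can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the software used for the project. The randomization scheme (shuffling each column independently and comparing the first eigenvalue) is an assumed stand-in for the original test, the fixed seed replaces the time-of-day setting so the run is repeatable, and the matrix is synthetic.

```python
import numpy as np

def pca_cov(X):
    """PCA on the centered variance/covariance matrix.

    Returns eigenvalues (descending), eigenvectors (columns), and scores.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each column (slab)
    cov = np.cov(Xc, rowvar=False)          # variance/covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # interval scores on each axis
    return eigvals, eigvecs, scores

def axis1_randomization_p(X, runs=999, seed=0):
    """Assumed scheme: shuffle every column on its own, redo the PCA, and
    count how often the shuffled first eigenvalue matches or beats the
    observed one."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = pca_cov(X)[0][0]
    hits = 0
    for _ in range(runs):
        shuffled = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        if pca_cov(shuffled)[0][0] >= observed:
            hits += 1
    return (hits + 1) / (runs + 1)

# Synthetic 309 x 4 matrix with a shared signal across slabs (NOT real data).
rng = np.random.default_rng(1)
shared = rng.normal(size=(309, 1))
demo = shared + 0.5 * rng.normal(size=(309, 4))

eigvals, eigvecs, scores = pca_cov(demo)
p_axis1 = axis1_randomization_p(demo, runs=999, seed=1)
print("variance explained:", np.round(eigvals / eigvals.sum(), 3))
print("axis 1 p-value:", p_axis1)
```

With 999 runs the smallest possible p-value is 1/1000, which is why that run count is a common choice.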

Hake PCA Biplots. In the PCA biplots, some sampling intervals lie at a much greater distance from the other points.

I took the PCA scores and identified intervals that were more than 2 standard deviations (SD) from the mean. For hake, 13 intervals were more than 2 SD from the mean. I then compared the number of outliers this PCA method detects with the number found by an outlier analysis on the raw data using a 2 SD cutoff.
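The 2 SD screen can be sketched as a small function that works on either a column of PCA scores or a column of raw counts. The function name, the use of the sample SD (ddof=1), and flagging both tails are assumptions; the values in the example are made up.

```python
import numpy as np

def flag_outliers(values, n_sd=2.0):
    """Return the positions of values more than n_sd SDs from the mean.

    Assumes the sample SD (ddof=1) and flags both tails.
    """
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std(ddof=1)
    return np.flatnonzero(np.abs(z) > n_sd)

# Made-up scores: twenty ordinary intervals and one extreme interval.
example_scores = np.array([0.0] * 20 + [10.0])
flagged = flag_outliers(example_scores, n_sd=2.0)
print(flagged)  # position 20 is the only value beyond 2 SD
```

Running the same function on axis scores and on raw counts gives two lists of flagged intervals that can be compared directly.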

Outlier Correction. I then corrected the outliers in the raw data by changing any count that was more than 1 SD from the interval's mean to 1 SD from the mean. I reran the PCA on the corrected data using the same PCA settings, and I ran this process with all of the matrices.
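One way to read the correction step is sketched below: for each flagged interval (row), any slab count above the row mean plus 1 SD is pulled down to that cap. Capping only the upper side and using the sample SD are assumptions. The function name and the small matrix are hypothetical.

```python
import numpy as np

def correct_intervals(X, outlier_rows, n_sd=1.0):
    """Cap counts in flagged rows at (row mean + n_sd * row SD).

    Works on a copy, so the raw matrix is left untouched. The cap is
    computed from the uncorrected row (sample SD, ddof=1).
    """
    X = np.array(X, dtype=float)  # np.array copies the input
    for i in outlier_rows:
        row = X[i].copy()
        cap = row.mean() + n_sd * row.std(ddof=1)
        X[i] = np.minimum(row, cap)
    return X

# Hypothetical counts: 2 intervals x 4 slabs, with interval 1 flagged.
raw = np.array([[1.0, 1.0, 1.0, 1.0],
                [2.0, 2.0, 2.0, 50.0]])
corrected = correct_intervals(raw, outlier_rows=[1])
print(corrected[1])  # row mean 14, SD 24, so the 50 is capped at 38
```

Because the cap depends on the row's own mean and SD, a single extreme slab still leaves the corrected interval somewhat elevated rather than flattening it completely.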

Same Process for the Other Matrices (Bone). Ran a PCA on the raw data. Extracted the PCA scores. Found the outliers within the PCA scores. Corrected those outliers to 1 SD from the interval's mean. Reran the PCA and generally saw a slight increase in variance explained. [Figure panels: PCA with the raw data; PCA with the corrected data.]
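The comparison of variance explained before and after correction comes down to the share of each eigenvalue in the total. A minimal sketch, assuming a covariance-based PCA; the function name and the tiny matrix are hypothetical:

```python
import numpy as np

def variance_explained(X):
    """Fraction of total variance carried by each PCA axis (descending).

    Assumes the matrix has some variance, so the eigenvalue sum is nonzero.
    """
    X = np.asarray(X, dtype=float)
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    return eigvals / eigvals.sum()

# Hypothetical matrix whose two columns are perfectly proportional,
# so the first axis carries all of the variance.
tiny = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
fractions = variance_explained(tiny)
print(np.round(fractions, 3))
```

Calling this on the raw matrix and again on the corrected matrix gives the two axis 1 fractions to compare.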

Comparing Outliers (Hake). I can now compare the outliers detected by the PCA for scales to the outliers for bones and vertebrae, to see which bone and vertebrae outliers are likely due to a piece of hake falling to the sea floor. It is highly likely that interval 535 is a piece of hake (including scales and bones, but not vertebrae) that fell to the sea floor; the scale and bone outliers have similar standard deviations. It is probable, though less likely, that interval 120 is also a piece of hake.
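Cross-referencing the outlier lists is a set-membership check. The sketch below uses made-up interval labels, not the intervals reported in this presentation, and the function name is hypothetical.

```python
def shared_outliers(scale_out, bone_out, vert_out):
    """For each scale outlier, report whether the same interval is also
    an outlier in the bone and vertebrae matrices."""
    bone_set, vert_set = set(bone_out), set(vert_out)
    return {
        interval: {
            "bones": interval in bone_set,
            "vertebrae": interval in vert_set,
        }
        for interval in sorted(set(scale_out))
    }

# Made-up interval labels for illustration only.
overlap = shared_outliers(
    scale_out=[10, 20, 30],
    bone_out=[20, 30, 40],
    vert_out=[30],
)
for interval, hits in overlap.items():
    print(interval, hits)
```

An interval flagged in scales and bones but not vertebrae (label 20 in this made-up example) matches the pattern described for a piece of fish that carried scales and bones only.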

Comparing Outliers (Sardine). The sardine outliers indicate that interval 980 was likely a piece of sardine (or a fecal pellet) that contained scales, bones, and vertebrae. Interval 835, however, was an outlier for scales and vertebrae, so this could be a fecal pellet or chunk of fish that contained only scales and vertebrae.

Comparing Outliers (Anchovy). For anchovy, it is likely that intervals 1145, 980, and 195 each received a piece of anchovy that contained scales, bones, and vertebrae, likely in different proportions of scales to bones to vertebrae.

What now? To further test this method of finding outliers, I will run the fish debris data through a polar ordination and will choose one of the largest values as one of my endpoints. This should allow me to find outliers in a similar fashion, just with a different ordination technique.
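A minimal sketch of a single polar (Bray-Curtis) ordination axis follows. The Sørensen (Bray-Curtis) distance, the rule that picks the second endpoint as the interval farthest from the first, and the function names are all assumptions; the planned analysis may use different choices. The matrix is hypothetical.

```python
import numpy as np

def bray_curtis(X):
    """Pairwise Sørensen (Bray-Curtis) distances between rows.

    Pairs of all-zero rows get distance 0.
    """
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def polar_axis(X, endpoint1):
    """Project every row onto the axis between two endpoints.

    endpoint2 is taken as the row farthest from endpoint1 (assumed rule).
    Scores follow x_i = (L^2 + d1_i^2 - d2_i^2) / (2 L), where L is the
    distance between the endpoints; L must be nonzero.
    """
    D = bray_curtis(X)
    endpoint2 = int(np.argmax(D[endpoint1]))
    L = D[endpoint1, endpoint2]
    scores = (L**2 + D[endpoint1] ** 2 - D[endpoint2] ** 2) / (2 * L)
    return scores, endpoint2

# Hypothetical 3 x 2 count matrix; the first endpoint is the row with
# the largest total count (ties go to the first such row).
counts = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
first = int(np.argmax(counts.sum(axis=1)))
axis_scores, second = polar_axis(counts, first)
print(first, second, axis_scores)
```

The first endpoint scores 0 and the second scores L by construction, so intervals that resemble the chosen extreme interval cluster near 0 on this axis.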