Exploratory Data Analysis

Exploratory Data Analysis Analysis of Biological Data/Biometrics Dr. Ryan McEwan Department of Biology University of Dayton ryan.mcewan@udayton.edu

A principal task for the padawan data analyst is committing to exploratory data analysis (EDA). In some cases the scientist will be conducting a discrete experimental test of some variable; if so, skipping right to the final analysis is legitimate. In most studies this will not be the case. Often, many variables have been measured. Sometimes the study will take an unexpected turn, the focus will shift, or the scientist will simply start out looking at a broad suite of variables. Even if you have a pretty good idea of what you are getting at, and what particular comparisons you want to make, exploration is highly advised.

When you have lots of data, scattergrams are your friend. You can quickly graph relationships to get a look at the structure of the data set. In this case there is a curvilinear regression line through the points. How strong is the fit? Do you buy it?
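
One way to put a number on "how strong is this line" is the coefficient of determination, R². The sketch below is only an illustration, not the method used for the slide's figure: it assumes the curve is a quadratic (y = a + b*x + c*x²), and the function names `fit_quadratic` and `r_squared` are made up for this example.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns [a, b, c]."""
    # Normal equations: sums of powers of x on the left, x**i * y on the right.
    powers = [sum(x ** k for x in xs) for k in range(5)]
    rows = [[powers[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    m = [row + [r] for row, r in zip(rows, rhs)]  # augmented 3x4 matrix

    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]

    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        known = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - known) / m[i][i]
    return coef


def r_squared(xs, ys, coef):
    """Share of the variance in ys explained by the fitted curve."""
    mean_y = sum(ys) / len(ys)
    fitted = [coef[0] + coef[1] * x + coef[2] * x * x for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

An R² near 1 means the curve accounts for most of the scatter; a value near 0 means the line is mostly decoration. Either way, look at the points before you buy the line.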

Scattergrams are your friend. Blocking panels together in this way gives an interesting idea of the big picture and makes scanning easy.
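
A numeric companion to scanning a block of panels is to compute a correlation for every pair of variables and sort the pairs. This is a swapped-in helper, not something shown in the slides; it uses Pearson's r, which only picks up straight-line relationships, and the names `pearson_r` and `correlation_scan` are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_scan(table):
    """table maps variable name -> list of values.

    Returns (name_a, name_b, r) for every pair, strongest |r| first.
    """
    pairs = [
        (a, b, pearson_r(table[a], table[b]))
        for a, b in combinations(table, 2)
    ]
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)
```

The sorted list tells you which panels to look at first; it does not replace looking at them, since a curved pattern can hide behind a small r.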

Scattergrams are your friend. Driving regression lines through clouds of points, then panning back, yields a broader view of an extremely large data set. Of course, there may be multivariate relationships or other patterns that are difficult to discern visually, but it is still a good way to understand the data.

Scattergrams are your friend. But you don’t have to drive lines; sometimes the pattern is good enough.

Scattergrams are your friend. You can use them for illustration of complex ideas, if you are crafty!

Scattergrams are your friend. Here is a line and scatter plot, showing a temporal relationship

Histogram/Bar Charts. The classic way to compare values. In this particular case there was no way to calculate error; more on that later.

Histogram/Bar Charts. This is called a stacked bar chart. A pretty nice way to display information: the tops of the bars matter, but their composition does too.

Histogram/Bar Charts. Here is a standard bar chart, but this time there are little whiskers on the top of the bars. What are those? Those are “error bars” and are a great way to get a feel for the variation around your result. If the error bars are very large, then your confidence should be very small. Rule of thumb: if the error bars overlap, treat the bars as not different. This is a rough guide, not a formal test.
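
Here is a minimal sketch of where error bars come from and how the overlap rule of thumb plays out. It assumes the bars show the mean plus or minus one standard error (the slide does not say which kind of error bar is drawn), and the function names are made up for illustration.

```python
import math
import statistics


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of one group."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


def error_bars_overlap(group_a, group_b):
    """True if the mean +/- SE intervals of the two groups overlap."""
    mean_a, se_a = mean_and_se(group_a)
    mean_b, se_b = mean_and_se(group_b)
    return abs(mean_a - mean_b) <= se_a + se_b
```

For example, `error_bars_overlap([10, 12, 11, 13], [20, 22, 21, 23])` is False, because the means sit far apart compared with the spread inside each group. The overlap check is only the eyeball heuristic in code form; a proper comparison still needs a statistical test.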

Histogram/Bar Charts. This is a bar chart that shows change over time, so you have plus or minus in this one. Note also that I have stars; these indicate statistical significance.

Histogram/Bar Charts. This is a bar chart with standard error bars. Looks impressive!

Histogram/Bar Charts. Here is the same data set, but scaled differently. Less impressive….
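
A little arithmetic shows why scaling changes the impression. The sketch assumes the two figures differ in where the y axis starts, which is a guess about the slides, and `apparent_ratio` is a hypothetical helper.

```python
def apparent_ratio(low, high, axis_min=0.0):
    """Ratio of the drawn bar heights when the y axis starts at axis_min."""
    if axis_min >= low:
        raise ValueError("axis_min must be below the smaller value")
    return (high - axis_min) / (low - axis_min)
```

With values of 50 and 55, `apparent_ratio(50, 55)` is about 1.1 on an axis that starts at zero, while `apparent_ratio(50, 55, axis_min=48)` is 3.5. Same data, very different looking bars.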

Box plot! The box plot is the best way to actually get a look at your data set. In this case, outliers are the dots, the line in the middle is the median, the bottom and top of the box are the 25th and 75th percentiles of the data set, and the bars (whiskers) are the 10th and 90th percentiles.
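
The numbers behind a box plot drawn with these conventions can be computed directly. This is a sketch under stated assumptions: it uses the "inclusive" percentile method from Python's `statistics` module (plotting packages may interpolate differently), and `box_plot_summary` is a made-up name.

```python
import statistics


def box_plot_summary(values):
    """Percentiles for a box plot with whiskers at the 10th and 90th."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    summary = {
        "p10": pct[9],      # lower whisker
        "p25": pct[24],     # bottom of the box
        "median": pct[49],  # line in the middle
        "p75": pct[74],     # top of the box
        "p90": pct[89],     # upper whisker
    }
    # Points beyond the whiskers are the ones drawn as dots.
    summary["outliers"] = sorted(
        v for v in values if v < summary["p10"] or v > summary["p90"]
    )
    return summary
```

Note that with whiskers at the 10th and 90th percentiles, roughly a fifth of the observations end up drawn as dots, so "outlier" here just means "beyond the whiskers".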

Mixing and matching formats for effect. This is a figure format that allows you to explore the relationship between two variables. In this case, a bar chart is on the y1 axis, and a drought measure is on the y2 axis.

Ordination is nice when you have too many variables to handle easily
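
To give a feel for what an ordination does, here is a minimal sketch of principal component analysis (PCA), one common ordination method. The slide does not say which method produced its figure, so PCA is an assumption here, as is the function name. The idea is to find the single axis that captures the most variation and place each sample along it.

```python
import math


def first_principal_axis(rows, iterations=200):
    """rows: one list of measurements per sample, all the same length.

    Returns (axis, scores): the direction of greatest variance (unit
    length) and each sample's position along it.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the variables.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Power iteration: repeatedly multiply a guess by the covariance
    # matrix and renormalise; it converges to the dominant eigenvector
    # (assumes the starting guess is not orthogonal to that vector).
    axis = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        axis = [v / norm for v in nxt]
    scores = [sum(c[j] * axis[j] for j in range(p)) for c in centered]
    return axis, scores
```

Plotting the scores on one or two such axes is what turns "too many variables" into a picture you can scan. In practice you would standardize the variables first and use a statistics package rather than hand-rolled code.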

Cluster Analysis can be real useful for sussing out patterns and relationships
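
The core idea of a cluster analysis fits in a few lines. The sketch below is single-linkage agglomerative clustering with straight-line (Euclidean) distance; both choices are assumptions made for illustration, since the slide does not name its method, and `single_linkage` is a hypothetical name.

```python
import math


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    points: list of coordinate tuples. Returns lists of point indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest gap and merge them.
        a, b = min(
            (
                (x, y)
                for x in range(len(clusters))
                for y in range(x + 1, len(clusters))
            ),
            key=lambda pair: gap(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

For example, `single_linkage([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)], 2)` groups the first three points together and the last two together. Recording the order of the merges is what lets a statistics package draw the familiar tree diagram.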

Refuse to be boxed in! Invent your own figure type. Herbaceous species cover over space.

Refuse to be boxed in! Invent your own figure type

Geospatial!!!!
