Data talking to theory, theory talking to data: how can we make the connections?
Stevan J. Arnold, Oregon State University, Corvallis, OR

Conclusions
The most cited scientific articles are methods, reviews, and conceptual pieces.
A worthy goal in methods papers is to connect the best data to the most powerful theory.
The most useful theory is formulated in terms of measurable parameters.
Obstacles to making the data-theory connection can lie with the data or the theory, or arise because the solution resides in a different field.
Sometimes a good solution is worth waiting for.

The papers
Lande & Arnold 1983. The measurement of selection on correlated characters. Evolution
Arnold 1983. Morphology, performance, and fitness. American Zoologist
Arnold & Wade 1984. On the measurement of natural and sexual selection … Evolution
Phillips & Arnold 1989. Visualizing multivariate selection. Evolution
Phillips & Arnold 1999. Hierarchical comparison of genetic variance-covariance matrices … Evolution
Jones et al. 2003, 2004, 2007. Stability and evolution of the G-matrix … Evolution
Estes & Arnold 2007. Resolving the paradox of stasis … American Naturalist
Hohenlohe & Arnold 2008. MIPoD: a hypothesis testing framework for microevolutionary inference … American Naturalist

Citations
Lande & Arnold 1983: …
Arnold 1983: 413
Arnold & Wade 1984: 560
Phillips & Arnold 1989: 165
Phillips & Arnold 1999: …
Jones et al. 2003, 2004, 2007: 76
Estes & Arnold 2007: 24
Hohenlohe & Arnold 2008: 2

Format
Original goal: what we were looking for in the first place
Obstacle: why we couldn’t get there
Epiphany: how we got past the block
New goal: what we could do once we got past the block

Lande & Arnold 1983: correlated characters
Original goal: Understand the selection gradient, β.
Obstacle: β is impossible to estimate directly because it is the first derivative of an adaptive landscape.
Epiphany: β is also a vector of partial regression coefficients of fitness on traits.
New goal: Estimate β (and γ) using data from natural populations.
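A minimal sketch of the regression epiphany for two traits, using hypothetical simulated data rather than anything from Lande & Arnold (1983). It computes the selection differentials s = cov(w, z) and the gradient β = P⁻¹s, which equals the vector of partial regression coefficients of relative fitness w on the traits. Because z2 is correlated with z1, z2 shows a nonzero differential even though fitness depends on z1 alone; the gradient separates that indirect selection from direct selection. The function names are illustrative; γ would come analogously from a quadratic regression, which is not shown.

```python
import random
from statistics import fmean


def covariance(x, y):
    """Covariance of two equal-length sequences (divisor n)."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def selection_gradient(z1, z2, fitness):
    """Return (s, beta) for two traits, with beta = P^{-1} s.

    s    : selection differentials, s_i = cov(w, z_i), w = relative fitness
    beta : selection gradients, i.e. the partial regression coefficients
           of relative fitness on (z1, z2)
    """
    wbar = fmean(fitness)
    w = [f / wbar for f in fitness]
    s = (covariance(w, z1), covariance(w, z2))
    # Phenotypic variance-covariance matrix P = [[p11, p12], [p12, p22]]
    p11, p22, p12 = covariance(z1, z1), covariance(z2, z2), covariance(z1, z2)
    det = p11 * p22 - p12 * p12
    # Closed-form inverse of a 2 x 2 matrix applied to s
    beta = ((p22 * s[0] - p12 * s[1]) / det,
            (p11 * s[1] - p12 * s[0]) / det)
    return s, beta


# Hypothetical data: z2 is correlated with z1, but fitness depends on z1 only
rng = random.Random(1)
z1 = [rng.gauss(0, 1) for _ in range(2000)]
z2 = [0.8 * a + rng.gauss(0, 0.6) for a in z1]
fitness = [5 + 0.4 * a for a in z1]

s, beta = selection_gradient(z1, z2, fitness)
print("selection differentials s:", s)
print("selection gradients beta:", beta)
```

Here s2 is clearly positive while β2 is zero up to rounding, which is the distinction between total and direct selection on correlated characters.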

The selection gradient as the direction of steepest uphill slope on the adaptive landscape

Arnold 1983: morphology, performance, and fitness
Original goal: What is the relationship between performance studies and selection?
Obstacle: Performance measures are only distantly related to fitness.
Epiphany: Recognize two parts to fitness and selection (β), one easy to measure, the other difficult.
New goal: Estimate selection gradients corresponding to these two parts.

A path diagram view of the relationships between morphology, performance, and fitness, showing partitioned selection gradients (Arnold 1983)
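The two-part logic of the path diagram can be sketched with one morphological trait and one performance measure. This assumes, hypothetically, that fitness depends on morphology only through performance and that all relations are linear. Under those assumptions the overall selection gradient on morphology equals the slope of performance on morphology (often called the performance gradient) times the slope of relative fitness on performance (the fitness gradient). The data and variable names are simulated and illustrative.

```python
import random
from statistics import fmean


def slope(y, x):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)
n = 1000
morph = [rng.gauss(0, 1) for _ in range(n)]           # hypothetical morphology score
perf = [0.5 * m + rng.gauss(0, 1) for m in morph]     # hypothetical performance (e.g. speed)
fit = [3 + 0.6 * p for p in perf]                     # fitness depends on morphology only via performance
wbar = fmean(fit)
w = [f / wbar for f in fit]                           # relative fitness

performance_gradient = slope(perf, morph)   # morphology -> performance (lab-measurable part)
fitness_gradient = slope(w, perf)           # performance -> fitness (field-measurable part)
beta_total = slope(w, morph)                # overall selection gradient on morphology

print(beta_total, performance_gradient * fitness_gradient)
```

With exactly linear fitness the two printed numbers agree to rounding error; with noisy fitness they agree only approximately, which is why the partition is a model-based decomposition rather than an identity.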

Arnold & Wade 1984: natural vs. sexual selection
Original goal: Find a way to measure sexual selection using Howard’s (1979) data.
Obstacle: Howard used multiple measures of reproductive success.
Epiphany: Use a multiplicative model of fitness to analyze multiple episodes of selection.
New goal: Measure the force of natural vs. sexual selection.

Howard’s 1979 data table

Arnold & Wade’s 1984 parameterization of Howard’s data

Howard’s 1979 plot showing selection on body size

Arnold & Wade’s 1984 analysis and plot of Howard’s data, showing that most of the selection on body size is due to sexual selection
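A minimal sketch of the multiplicative-fitness bookkeeping, assuming two hypothetical episodes (survival, then number of mates among survivors) and simulated data, not Howard’s. Relative fitness in each episode is measured against the population entering that episode, so total relative fitness is the product w = w1·w2, and the total selection differential splits exactly into a natural-selection part and a sexual-selection part. Trait names and selection strengths are illustrative.

```python
import random
from statistics import fmean


def episode_differentials(z, survived, mates):
    """Partition the selection differential on trait z into two episodes.

    survived : 1 or 0 per individual (episode 1, natural selection)
    mates    : number of mates per individual, 0 for non-survivors
               (episode 2, sexual selection among survivors)
    Returns (s_natural, s_sexual, s_total).
    """
    # Episode 1: relative fitness against everyone at the start
    surv_mean = fmean(survived)
    w1 = [s / surv_mean for s in survived]
    # Episode 2: relative fitness against the survivors only
    mates_mean = fmean([m for m, s in zip(mates, survived) if s])
    w2 = [m / mates_mean if s else 0.0 for m, s in zip(mates, survived)]

    zbar = fmean(z)
    zbar_after_1 = fmean([a * b for a, b in zip(w1, z)])               # mean among survivors
    zbar_after_2 = fmean([a * b * c for a, b, c in zip(w1, w2, z)])    # mates-weighted mean
    s_natural = zbar_after_1 - zbar
    s_sexual = zbar_after_2 - zbar_after_1
    s_total = zbar_after_2 - zbar       # equals cov(w1 * w2, z)
    return s_natural, s_sexual, s_total


rng = random.Random(3)
size = [rng.gauss(0, 1) for _ in range(1000)]        # hypothetical body size, SD units
survived = [1 if rng.random() < 0.6 + 0.05 * z else 0 for z in size]
mates = [sum(rng.random() < 0.3 + 0.1 * z for _ in range(3)) if s else 0
         for z, s in zip(size, survived)]

s_nat, s_sex, s_tot = episode_differentials(size, survived, mates)
print("natural:", s_nat, "sexual:", s_sex, "total:", s_tot)
```

The total returned here matches the covariance of trait value with overall relative fitness (survival × mates, divided by its mean), which is what makes the episode-by-episode comparison of natural and sexual selection legitimate.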

Phillips & Arnold 1989: visualizing multivariate selection
Original goal: How can one visualize the selection implied by a set of β and γ coefficients?
Obstacle: Univariate and even bivariate diagrams can be misleading, so what is the solution?
Epiphany: Canonical analysis is a long-standing solution to this standard problem.
New goal: Adapt canonical analysis to the interpretation of selection surfaces.

The canonical solution is a rotation of axes (Arnold et al. 2008)
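A two-trait sketch of the rotation: the eigenvectors of γ define the canonical axes, the eigenvalues λ give the curvature of the surface along each axis, and θ_i = m_i·β gives linear selection along each new axis. It assumes γ is already expressed as the matrix of quadratic selection gradients (if it came from a quadratic regression, the diagonal coefficients may need doubling depending on how the model was fitted). The example values are hypothetical: both diagonal terms are slightly negative, which suggests weak stabilizing selection on each trait, yet the rotated surface is a saddle.

```python
import math


def canonical_analysis(beta, gamma):
    """Rotate a two-trait selection surface onto its canonical axes.

    beta  : (b1, b2) directional selection gradients
    gamma : ((g11, g12), (g12, g22)) quadratic and correlational gradients
    Returns (lams, vecs, thetas): eigenvalues (largest first), unit
    eigenvectors m_i (the canonical axes), and theta_i = m_i . beta.
    """
    (a, b), (_, d) = gamma
    if b == 0.0:
        # Already diagonal: the original trait axes are the canonical axes.
        pairs = sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))], reverse=True)
    else:
        mid, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
        pairs = []
        for lam in (mid + radius, mid - radius):
            vx, vy = b, lam - a          # satisfies (gamma - lam * I) v = 0
            norm = math.hypot(vx, vy)
            pairs.append((lam, (vx / norm, vy / norm)))
    lams = [lam for lam, _ in pairs]
    vecs = [v for _, v in pairs]
    thetas = [v[0] * beta[0] + v[1] * beta[1] for v in vecs]
    return lams, vecs, thetas


# Hypothetical surface with strong correlational selection
lams, vecs, thetas = canonical_analysis((0.1, 0.0), ((-0.1, 0.3), (0.3, -0.1)))
print("curvatures:", lams)
print("canonical axes:", vecs)
print("linear selection along axes:", thetas)
```

One eigenvalue is positive (disruptive curvature along one diagonal) and one negative (stabilizing curvature along the other), a structure that plots of each trait alone would hide.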

Phillips & Arnold 1999: comparison of G-matrices
Original goal: How can one test for the equality and proportionality of G-matrices?
Obstacle: Sampling covariances (family structure) complicate test statistics.
Epiphany: Use Flury’s (1988) hierarchical approach; use bootstrapping to account for family structure.
New goal: Implement a hierarchy of tests that compares eigenvectors and eigenvalues.

The G-matrix can be portrayed as an ellipse (Arnold et al. 2008)

The Flury hierarchy of matrix comparisons (Arnold et al. 2008)
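A descriptive sketch of the ideas behind the ellipse and the hierarchy for two traits: the ellipse axes point along the eigenvectors of G with half-lengths √λ, and two matrices can be compared by the angle between their leading eigenvectors and the ratios of their eigenvalues. This is not Flury’s maximum-likelihood test sequence and omits the family-level bootstrap Phillips & Arnold used; it assumes positive-definite matrices, and the function names and example matrices are hypothetical.

```python
import math


def eigen2(g):
    """(eigenvalue, unit eigenvector) pairs of a symmetric 2 x 2 matrix, largest first."""
    (a, b), (_, d) = g
    if b == 0.0:
        return sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))], reverse=True)
    mid, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    out = []
    for lam in (mid + radius, mid - radius):
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        out.append((lam, (vx / norm, vy / norm)))
    return out


def g_ellipse(g):
    """Axes of the G ellipse as (half-length sqrt(lambda), direction) pairs."""
    return [(math.sqrt(lam), v) for lam, v in eigen2(g)]


def compare_g(g1, g2):
    """Angle (degrees) between leading eigenvectors and eigenvalue ratios g2 / g1.

    A zero angle with equal ratios points toward proportionality;
    ratios of 1 point toward equality.
    """
    e1, e2 = eigen2(g1), eigen2(g2)
    v1, v2 = e1[0][1], e2[0][1]
    cos_angle = min(1.0, abs(v1[0] * v2[0] + v1[1] * v2[1]))  # eigenvector sign is arbitrary
    angle = math.degrees(math.acos(cos_angle))
    ratios = [lam2 / lam1 for (lam1, _), (lam2, _) in zip(e1, e2)]
    return angle, ratios


G1 = ((1.0, 0.5), (0.5, 0.8))
G2 = ((2.0, 1.0), (1.0, 1.6))    # hypothetical: exactly twice G1
angle, ratios = compare_g(G1, G2)
print("ellipse axes of G1:", g_ellipse(G1))
print("angle:", angle, "eigenvalue ratios:", ratios)
```

In a real comparison these quantities would need sampling distributions (for example from resampling families), which is the gap the bootstrap step in Phillips & Arnold (1999) addresses.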

Jones et al. 2003, 2004, 2007: stability and evolution of G
Original goal: What governs the stability and evolution of the G-matrix?
Obstacle: No theory accounts simultaneously for selection and finite population size.
Epiphany: Use simulations.
New goal: Define the conditions under which the G-matrix is least and most stable.

Alignment of mutation and selection stabilizes the G-matrix (Arnold et al. 2008)
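A toy individual-based simulation in the spirit of the simulation approach, not a reproduction of the Jones et al. model or its parameter values. All settings are hypothetical assumptions: diploid individuals, freely recombining additive loci with pleiotropic effects on two traits, mutations with correlation r_mu, Gaussian stabilizing selection with correlation r_omega around an optimum at the origin, environmental variance 1 per trait, and fitness-weighted Wright-Fisher reproduction. It returns G as the covariance matrix of breeding values in the final generation.

```python
import math
import random
from itertools import accumulate


def simulate_g(n=100, loci=10, gens=200, mu=0.002, alpha=0.05, r_mu=0.0,
               omega=9.0, r_omega=0.0, seed=1):
    """Return G = ((G11, G12), (G12, G22)) after `gens` generations (toy model)."""
    rng = random.Random(seed)
    # Each individual carries 2 * loci alleles; each allele is an (x, y) effect.
    pop = [[(0.0, 0.0)] * (2 * loci) for _ in range(n)]
    # Entries of Omega^{-1}, where Omega = omega * [[1, r_omega], [r_omega, 1]]
    scale = 1.0 / (omega * (1.0 - r_omega ** 2))
    i11, i12 = scale, -r_omega * scale

    def breeding_values(population):
        return [(sum(a[0] for a in ind), sum(a[1] for a in ind)) for ind in population]

    for _ in range(gens):
        # Mutation: each allele mutates with probability mu (correlated effects)
        for ind in pop:
            for k in range(2 * loci):
                if rng.random() < mu:
                    e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
                    dx = alpha * e1
                    dy = alpha * (r_mu * e1 + math.sqrt(1 - r_mu ** 2) * e2)
                    ind[k] = (ind[k][0] + dx, ind[k][1] + dy)
        # Gaussian fitness of phenotype = breeding value + environmental noise
        fitness = []
        for bx, by in breeding_values(pop):
            zx, zy = bx + rng.gauss(0, 1), by + rng.gauss(0, 1)
            q = i11 * (zx * zx + zy * zy) + 2 * i12 * zx * zy
            fitness.append(math.exp(-0.5 * q))
        cum = list(accumulate(fitness))
        # Reproduction: parents drawn in proportion to fitness (selfing allowed);
        # one allele per locus from each parent (free recombination)
        offspring = []
        for _ in range(n):
            mom, dad = rng.choices(pop, cum_weights=cum, k=2)
            child = []
            for locus in range(loci):
                child.append(mom[2 * locus + rng.randrange(2)])
                child.append(dad[2 * locus + rng.randrange(2)])
            offspring.append(child)
        pop = offspring

    bv = breeding_values(pop)
    mx = sum(x for x, _ in bv) / n
    my = sum(y for _, y in bv) / n
    g11 = sum((x - mx) ** 2 for x, _ in bv) / n
    g22 = sum((y - my) ** 2 for _, y in bv) / n
    g12 = sum((x - mx) * (y - my) for x, y in bv) / n
    return ((g11, g12), (g12, g22))


# Hypothetical contrast: mutational correlation aligned vs. misaligned with selection
G_aligned = simulate_g(r_mu=0.9, r_omega=0.9)
G_unaligned = simulate_g(r_mu=0.9, r_omega=-0.9)
print(G_aligned)
print(G_unaligned)
```

A single run says little about stability; a study of the kind described on the slide would repeat runs over many seeds and generations and track how the size, shape, and orientation of G fluctuate.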

Estes & Arnold 2007: paradox of stasis
Original goal: Use Gingerich’s (2001) data to test stochastic models of evolutionary process.
Obstacle: The data were in the form of rates as a function of elapsed time, whereas the models make predictions about divergence as a function of elapsed time.
Epiphany: Recast the data so they are in the same form as the models.
New goal: Test representatives of all available classes of stochastic models using the data.

Gingerich’s 2001 plot, showing decreasing rates as a function of elapsed time

Estes and Arnold 2007 plot of Gingerich’s data in a format for testing stochastic models of evolutionary process
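A minimal sketch of the recasting step, assuming each rate is in haldanes (change in the mean in standard-deviation units per generation) over a known interval in generations, so divergence is d = h × t. The hypothetical example also shows why rates plotted against elapsed time decline: a divergence that stays near 1 SD at every timescale, once divided by t, falls with a log-log slope of −1. Function names are illustrative.

```python
import math


def rates_to_divergence(rates, intervals):
    """Recast rates into divergences, d = h * t.

    rates     : rates in haldanes (SD units per generation)
    intervals : elapsed time for each rate, in generations
    Returns (t, d) pairs, with d in standard-deviation units.
    """
    return [(t, h * t) for h, t in zip(rates, intervals)]


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) on log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Hypothetical lineage: divergence stays near 1 SD at every timescale
intervals = [1, 10, 100, 1_000, 10_000, 100_000]
rates = [1.0 / t for t in intervals]
pairs = rates_to_divergence(rates, intervals)
print("log-log slope of rate on interval:", loglog_slope(intervals, rates))
print("divergences:", [d for _, d in pairs])
```

Plotted as divergence against elapsed time, the same numbers become directly comparable with model predictions such as bounded divergence versus divergence that keeps growing with time.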

Displaced optimum model (Lande 1976). [Figure: phenotype z, fitness W, optimum θ, and phenotype distribution p(z).]
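A single-trait sketch of the displaced optimum idea, with hypothetical parameter values. It assumes Gaussian fitness W(z) = exp(−(z − θ)²/(2ω)) acting on a normal phenotype distribution with variance P, so the selection gradient is β = (θ − z̄)/(ω + P) and the mean responds by Gβ each generation. Optionally, drift adds a normal deviate with variance G/Ne to the mean each generation. The function name and defaults are illustrative assumptions.

```python
import math
import random


def displaced_optimum(theta, zbar0=0.0, G=0.4, P=1.0, omega=9.0,
                      ne=None, gens=500, seed=1):
    """Mean-phenotype trajectory after the optimum moves to theta.

    beta = (theta - zbar) / (omega + P) under Gaussian fitness with width
    omega and phenotypic variance P; response per generation is G * beta.
    If ne is given, drift adds N(0, G / ne) to the mean each generation.
    Returns a list of gens + 1 means, starting with zbar0.
    """
    rng = random.Random(seed)
    zbar = zbar0
    trajectory = [zbar]
    for _ in range(gens):
        beta = (theta - zbar) / (omega + P)
        zbar += G * beta
        if ne is not None:
            zbar += rng.gauss(0.0, math.sqrt(G / ne))
        trajectory.append(zbar)
    return trajectory


deterministic = displaced_optimum(theta=2.0)
with_drift = displaced_optimum(theta=2.0, ne=100)
print(deterministic[:5], deterministic[-1])
print(with_drift[-1])
```

Without drift, the distance to the new optimum shrinks by the factor 1 − G/(ω + P) each generation; with drift, the mean wanders around θ rather than settling on it.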

Hohenlohe & Arnold 2008: MIPoD
Original goal: Combine data on inheritance (G-matrix), effective population size (Ne), selection, divergence, and phylogeny to make inferences about the processes producing adaptive radiations.
Obstacle: What theory?
Epiphany: Use neutral theory; use maximum likelihood to combine the data.
New goal: Implement a hierarchy of tests that compares the G-matrix with the divergence matrix (comparison of eigenvectors and eigenvalues).

An adaptive landscape vision of the radiation: peak movement along a selective line of least resistance
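A minimal stand-in for the neutral-theory step, under simplifying assumptions that are not MIPoD itself: two traits, a known and constant G, and lineages that diverged independently from a known ancestral mean (a star phylogeny with equal branch lengths). Under drift the divergence matrix is expected to be proportional to G, D = (t/Ne)G. The sketch estimates the scale c = t/Ne by maximum likelihood and runs a likelihood-ratio test of proportionality against an unconstrained D, in the spirit of one level of the Flury hierarchy. The function name and toy lineage means are hypothetical, and the chi-square reference is only asymptotic.

```python
import math


def neutral_proportionality_test(means, ancestor, G):
    """Likelihood-ratio test of D = c * G (drift) against an unconstrained D.

    means    : list of (x, y) lineage means (at least 3, not collinear)
    ancestor : (x, y) ancestral mean, assumed known
    G        : ((G11, G12), (G12, G22)), assumed known and constant
    Returns (c_hat, lrt, p_value); c_hat estimates t / Ne.
    """
    n, k = len(means), 2
    (g11, g12), (_, g22) = G
    det_g = g11 * g22 - g12 * g12
    h11, h12, h22 = g22 / det_g, -g12 / det_g, g11 / det_g    # entries of G^{-1}
    devs = [(x - ancestor[0], y - ancestor[1]) for x, y in means]
    # Quadratic forms (x - mu)' G^{-1} (x - mu) for each lineage
    q = [h11 * dx * dx + 2 * h12 * dx * dy + h22 * dy * dy for dx, dy in devs]
    c_hat = sum(q) / (n * k)                 # ML estimate of c under D = c * G
    # Unconstrained ML estimate of D (ancestor known, divisor n)
    s11 = sum(dx * dx for dx, _ in devs) / n
    s22 = sum(dy * dy for _, dy in devs) / n
    s12 = sum(dx * dy for dx, dy in devs) / n
    det_s = s11 * s22 - s12 * s12
    # 2 * (logL_unconstrained - logL_proportional) = n * (log|c G| - log|S|)
    lrt = n * (k * math.log(c_hat) + math.log(det_g) - math.log(det_s))
    lrt = max(lrt, 0.0)                      # guard against rounding below zero
    # 3 free parameters vs. 1: chi-square with 2 df, survival function exp(-x / 2)
    p_value = math.exp(-lrt / 2)
    return c_hat, lrt, p_value


G = ((1.0, 0.0), (0.0, 1.0))
toy_means = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
c_hat, lrt, p = neutral_proportionality_test(toy_means, (0.0, 0.0), G)
print("c_hat:", c_hat, "LRT:", lrt, "p:", p)
```

In this toy case the spread of lineage means is exactly proportional to G, so the test statistic is zero; given an independent estimate of elapsed time t, c_hat would translate into an Ne estimate as t / c_hat.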

Summary
Paper | Goal | Obstacle | Epiphany
Lande & Arnold 1983 | conceptual | data-to-theory connection not apparent | algebraic revelation
Arnold 1983 | data-to-theory connection | wrong fitness currency | use multiplicative fitness model
Arnold & Wade 1984 | data-to-theory connection | wrong fitness currency | use multiplicative fitness model
Phillips & Arnold 1989 | conceptual | available solution not applied | apply solution (canonical analysis)
Phillips & Arnold 1999 | statistical | available solution not applied | apply solution (Flury hierarchy)
Jones et al. | theoretical | no theory / limited data | simulate
Estes & Arnold 2007 | data-to-theory connection | data in wrong form | transform data so they mesh with theory
Hohenlohe & Arnold 2008 | data-to-theory connection | data-to-theory connection not apparent | use neutral theory (+ Flury hierarchy & ML)

Wait for it, wait for it …
Paper | Goal | Obstacle (duration) | Epiphany
Lande & Arnold 1983 | conceptual | 4 years | algebraic revelation
Arnold 1983 | data-to-theory connection | weeks | use multiplicative fitness model
Arnold & Wade 1984 | data-to-theory connection | weeks | use multiplicative fitness model
Phillips & Arnold 1989 | conceptual | months | apply solution (canonical analysis)
Phillips & Arnold 1999 | statistical | 10 years | apply solution (Flury hierarchy + bootstrapping)
Jones et al. | theoretical | 1 year | simulate
Estes & Arnold 2007 | data-to-theory connection | weeks | transform data so they mesh with theory
Hohenlohe & Arnold 2008 | data-to-theory connection | 10 years | use neutral theory (+ Flury hierarchy & ML)

Conclusions
The most cited scientific articles are methods, reviews, and conceptual pieces.
A worthy goal in methods papers is to connect the best data to the most powerful theory.
The most useful theory is formulated in terms of measurable parameters.
Obstacles to making the data-theory connection can lie with the data or the theory, or arise because the solution resides in a different field or needs to be invented.
Sometimes a good solution is worth waiting for.

Acknowledgments
Russell Lande (Imperial College)
Michael J. Wade (Indiana Univ.)
Patrick C. Phillips (Univ. Oregon)
Adam G. Jones (Texas A&M Univ.)
Reinhard Bürger (Univ. Vienna)
Suzanne Estes (Portland State Univ.)
Paul A. Hohenlohe (Oregon State Univ.)