Contact: Biplot Analysis of Multi-Environment Trial Data Weikai Yan May 2006.



Weikai Yan 2006 Multi-Environment Trials (MET): MET are essential; MET are expensive; MET data are valuable; MET data are not fully used.

Weikai Yan 2006 Why biplot analysis? Biplot analysis can help understand MET data graphically, effectively, and conveniently.

Weikai Yan 2006 Outline: Multi-environment trial (MET) data; Basics of biplot analysis; Biplot analysis of G-by-E data; Biplot analysis of G-by-T data; Better understanding of MET data; Conclusions.

Contact: Multi-environment trial data

Weikai Yan 2006 MET data form a genotype-environment-trait (G-E-T) 3-way table: multiple genotypes, multiple environments, multiple traits.

Weikai Yan 2006 A G-E-T 3-way table contains many 2-way tables: G by E, for each trait; G by T (trait), in each environment or across environments; E by T, for each genotype or across genotypes. G-E-T data >> G-E data.

Weikai Yan 2006 A G-E-T 3-way table is an extended 2-way table: G by V, with each E-T combination as a variable (V); P by T, with each G-E combination as a phenotype (P).

Weikai Yan 2006 A G-E-T 3-way table implies informative 2-way tables: association-by-environment 2-way tables, where the associations are among traits, or between traits and genetic markers.
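
The 2-way tables listed on the last two slides can be sketched as slices, means, and reshapes of a single 3-way array. This is only an illustration: the array `get`, its size (4 genotypes, 3 environments, 2 traits), and its random values are assumptions, not data from the presentation.

```python
import numpy as np

# Hypothetical G-E-T array: 4 genotypes x 3 environments x 2 traits (made-up values).
rng = np.random.default_rng(0)
get = rng.normal(size=(4, 3, 2))
n_g, n_e, n_t = get.shape

g_by_e_trait0 = get[:, :, 0]           # G by E, for one trait
g_by_t_env0 = get[:, 0, :]             # G by T, in one environment
g_by_t_mean = get.mean(axis=1)         # G by T, across environments
e_by_t_geno0 = get[0, :, :]            # E by T, for one genotype
e_by_t_mean = get.mean(axis=0)         # E by T, across genotypes

# Extended 2-way tables.
g_by_v = get.reshape(n_g, n_e * n_t)   # each E-T combination as a variable (V)
p_by_t = get.reshape(n_g * n_e, n_t)   # each G-E combination as a phenotype (P)
```

With NumPy's default row-major layout, column `e * n_t + t` of `g_by_v` holds trait `t` in environment `e`, and row `g * n_e + e` of `p_by_t` holds genotype `g` in environment `e`.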

Weikai Yan 2006 Goals of MET data analysis. Short-term goals: variety evaluation, in terms of response to the environment (G x E) and trait profiles (G x T). Long-term goals: to understand the target environment (G x E), the test environments (G x E), the crop (G x T), and the genotype x environment interaction (A x E).

Contact: Basics of biplot analysis Most two-way tables can be visually studied using biplots

Weikai Yan 2006 Origin of biplot: Gabriel (1971). One of the most important advances in data analysis in recent decades. Currently: > 50,000 web pages; numerous academic publications; included in most statistical analysis packages; still a very new technique to most scientists. [Photo: Prof. Ruben Gabriel, the founder of biplot. Courtesy of Prof. Purificación Galindo, University of Salamanca, Spain.]

Weikai Yan 2006 What is a biplot? Biplot = bi + plot. "plot": a scatter plot of two rows OR of two columns, or a scatter plot summarizing the rows OR the columns. "bi": BOTH rows AND columns. 1 biplot >> 2 plots.

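Weikai Yan 2006 Mathematical definition of a biplot: a graphical display of matrix multiplication, A(4, 2) × B(2, 3) = P(4, 3), with row points A1 to A4 and column points B1 to B3. Inner-product property: $P_{ij} = |OA_i| \, |OB_j| \cos\alpha_{ij}$, where $\alpha_{ij}$ is the angle between the vectors $OA_i$ and $OB_j$; this implies the product matrix. Example: $|OA_1| = 5.0$, $|OB_1| = 4.472$, $\cos\alpha_{11} \approx 0.894$, so $P_{11} = 5 \times 4.472 \times 0.894 = 20$.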
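
A minimal numerical check of the inner-product property. The slide gives only the two vector lengths (5.0 and 4.472) and the product P11 = 20; the coordinates below are assumptions chosen so that A1 = (3, 4) and B1 = (4, 2) reproduce those numbers, and the remaining points are arbitrary.

```python
import numpy as np

# Assumed coordinates: only |OA1| = 5, |OB1| = 4.472 and P11 = 20 come from the slide.
A = np.array([[3.0, 4.0], [1.0, 2.0], [-2.0, 1.0], [0.0, -3.0]])  # 4 row points, A(4, 2)
B = np.array([[4.0, 1.0, -2.0], [2.0, -3.0, 2.0]])                # 3 column points, B(2, 3)

P = A @ B  # ordinary matrix multiplication, P(4, 3)

def inner_product_from_geometry(a, b):
    """Length of a times length of b times cosine of the angle between them."""
    angle_a = np.arctan2(a[1], a[0])
    angle_b = np.arctan2(b[1], b[0])
    return np.linalg.norm(a) * np.linalg.norm(b) * np.cos(angle_a - angle_b)

# Rebuild every element of P from lengths and angles read off the plot.
P_geom = np.array([[inner_product_from_geometry(A[i], B[:, j])
                    for j in range(B.shape[1])] for i in range(A.shape[0])])
```

Both routes give the same matrix, which is why a plot of the row and column points carries all the information in P.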

Weikai Yan 2006 Practical definition of a biplot: any two-way table can be analyzed using a 2D biplot as long as it can be sufficiently approximated by a rank-2 matrix (Gabriel, 1971). Matrix decomposition of a G-by-E table: P(4, 3) ≈ G(4, 2) × E(2, 3), with genotypes G1 to G4 and environments E1 to E3. (3D biplots are now also possible.)

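Weikai Yan 2006 Singular Value Decomposition (SVD) and Singular Value Partitioning (SVP). SVD: $Y = U \Lambda V^T$, where $\Lambda$ holds the singular values, $U$ is the matrix characterising the rows, and $V$ is the matrix characterising the columns; the number of terms equals the rank of Y, i.e., the minimum number of PCs required to fully represent Y. SVP: $Y = (U \Lambda^{f})(\Lambda^{1-f} V^T)$ with $0 \le f \le 1$, giving the row scores $U \Lambda^{f}$ and the column scores $\Lambda^{1-f} V^T$ that are plotted in the biplot. SVD = PCA?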
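
The decomposition and partitioning on this slide can be sketched with NumPy's SVD. The function name `biplot_scores` and its arguments are illustrative, not part of the GGEbiplot software; the partitioning factor `f` follows the slide's 0 to 1 range.

```python
import numpy as np

def biplot_scores(Y, f=0.5, k=2):
    """Return row scores, column scores and singular values for a rank-k biplot.

    f is the singular value partitioning factor: f = 1 puts the singular
    values entirely on the rows, f = 0 entirely on the columns.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    row_scores = U[:, :k] * s[:k] ** f                     # shape (n_rows, k)
    col_scores = (s[:k] ** (1.0 - f))[:, None] * Vt[:k]    # shape (k, n_cols)
    return row_scores, col_scores, s
```

Whatever value of `f` is chosen, `row_scores @ col_scores` is the same rank-k approximation of Y; `f` only changes how the singular values are shared between the two sets of points.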

Weikai Yan 2006 Biplot interpretations: inner-product property. Biplots with f = 1 approximate $YY^T$, the distance matrix: similarity/dissimilarity among row (genotype) factors. Biplots with f = 0 approximate $Y^TY$, the variance matrix: similarity/dissimilarity among column (environment) factors. Combined use of f = 0 and f = 1 (Gabriel, 2002, Biometrika; Yan, 2002, Agron J; built into the GGEbiplot software).

Weikai Yan 2006 Biplot analysis is the use of biplots to display a two-way data set per se (Y), its distance matrix ($YY^T$), and its variance matrix ($Y^TY$), so that relationships among rows, relationships among columns, and interactions between rows and columns can be graphically visualized.
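
A small check of the two statements above, using all principal components so that the relations hold exactly (a 2D biplot only approximates them). The table `Y` is random, made-up data.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(5, 3))          # made-up 5 x 3 two-way table
Y = Y - Y.mean(axis=0)               # column-centered
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

rows_f1 = U * s                      # row scores with f = 1
cols_f0 = (s[:, None] * Vt).T        # column scores with f = 0, one row per column of Y

# f = 1: inner products (and hence distances) among rows are preserved.
row_products = rows_f1 @ rows_f1.T   # equals Y @ Y.T
# f = 0: inner products among columns are preserved.
col_products = cols_f0 @ cols_f0.T   # equals Y.T @ Y
```

Because `Y` is column-centered here, `col_products` divided by the number of rows minus one is the covariance matrix of the columns.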

Weikai Yan 2006 Data centering prior to biplot analysis. The general linear model for a G-by-E data set (P): P = M + G + E + GE. Possible two-way tables (Y):
–Y = P = M + G + E + GE: original data (QQE biplot)
–Y = P – M = G + E + GE: global-centered (PCA)
–Y = P – M – E = G + GE: column-centered (GGE biplot)
–Y = P – M – G = E + GE: row-centered
–Y = P – M – G – E = GE: double-centered (GE biplot)
All models are useful, depending on the research objectives (built into GGEbiplot).
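
The five centering options can be sketched as one function, assuming a complete (balanced) table with genotypes in rows and environments in columns. The function name `center` and the mode labels are illustrative only.

```python
import numpy as np

def center(P, mode="column"):
    """Center a genotype-by-environment table P (rows = genotypes, columns = environments)."""
    P = np.asarray(P, dtype=float)
    M = P.mean()                                # grand mean
    G = P.mean(axis=1, keepdims=True) - M       # genotype main effects
    E = P.mean(axis=0, keepdims=True) - M       # environment main effects
    if mode == "none":
        return P.copy()                         # M + G + E + GE
    if mode == "global":
        return P - M                            # G + E + GE
    if mode == "column":
        return P - M - E                        # G + GE (GGE biplot)
    if mode == "row":
        return P - M - G                        # E + GE
    if mode == "double":
        return P - M - G - E                    # GE only
    raise ValueError(f"unknown centering mode: {mode}")
```

For example, `center(P, "column")` has zero column means but keeps the genotype main effects in its row means, which is what makes the GGE biplot suitable for genotype evaluation.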

Weikai Yan 2006 Data scaling prior to biplot analysis. Different GGE biplots: $Y_{ij} = (G_i + GE_{ij}) / s_j$
–$s_j = 1$: no scaling
–$s_j = (\mathrm{s.d.})_j$: all environments are equally important
–$s_j = (\mathrm{s.e.})_j$: heterogeneity among environments is removed
(built into GGEbiplot)
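
A sketch of the three scaling options applied to an environment-centered table. The function name, the mode labels, and the `se` argument are assumptions; in particular, the within-environment standard errors would have to come from replicated trial data, which this sketch does not compute.

```python
import numpy as np

def scale_environments(P, mode="none", se=None):
    """Environment-center P, then divide each column j by s_j."""
    P = np.asarray(P, dtype=float)
    Y = P - P.mean(axis=0)                      # G + GE for each environment
    if mode == "none":
        s = np.ones(P.shape[1])                 # s_j = 1
    elif mode == "sd":
        s = Y.std(axis=0, ddof=1)               # s_j = standard deviation among genotypes
    elif mode == "se":
        if se is None:
            raise ValueError("mode='se' needs the standard error of each environment")
        s = np.asarray(se, dtype=float)         # s_j supplied by the caller
    else:
        raise ValueError(f"unknown scaling mode: {mode}")
    return Y / s
```

With `mode="sd"` every environment ends up with unit standard deviation, so no single environment dominates the biplot.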

Weikai Yan 2006 Four questions must be asked before trying to interpret a biplot. 1. What is the model? How were the data centered and scaled? What are we looking at? 2. What is the goodness of fit? How confident are we about what we see? What if the data are fitted poorly? 3. How are the singular values partitioned? What questions can be asked? 4. Are the axes drawn to scale? Are the patterns artifacts? (All are addressed explicitly in GGEbiplot.)

Contact: Biplot Analysis of G-by-E data. MEGA-ENVIRONMENT ANALYSIS; TEST ENVIRONMENT EVALUATION; GENOTYPE EVALUATION

Weikai Yan 2006 Sample G-by-E data (Yield data of 18 genotypes in 9 environments, 1993, Ontario, Canada)

Weikai Yan 2006 Before trying to interpret a biplot… 1. Model selection? Centering = 2 (G+GE), Scaling = 0. 2. Goodness of fit? 78%. 3. Singular value partitioning? SVP = 2 (environment-metric). 4. Draw to scale? Yes.

Weikai Yan 2006 G by E data analysis. MEGA-ENVIRONMENT ANALYSIS; TEST ENVIRONMENT EVALUATION; GENOTYPE EVALUATION. A mega-environment is a group of geographical locations that share the same (set of) best genotypes consistently across years.

Weikai Yan 2006 Relationships among environments. The environment-vector view. Angle vs. correlation. The angles among test environments. Environment grouping.
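
The goodness of fit of a 2D biplot is commonly computed as the share of the total sum of squares captured by the first two principal components; the sketch below assumes that definition. The function name is illustrative and the input is expected to be already centered and scaled.

```python
import numpy as np

def goodness_of_fit(Y, k=2):
    """Fraction of the total sum of squares of Y explained by the first k components."""
    s = np.linalg.svd(np.asarray(Y, dtype=float), compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

# Made-up table with singular values 3, 2 and 1.
Y_demo = np.vstack([np.diag([3.0, 2.0, 1.0]), np.zeros((1, 3))])
fit_demo = goodness_of_fit(Y_demo)   # (9 + 4) / (9 + 4 + 1)
```

A value such as the 78% quoted on the slide means the biplot displays most, but not all, of the G + GE variation, so small patterns should be read with caution.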
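
The "angle vs. correlation" idea can be checked numerically: with environment-centered data and environment-focused partitioning (f = 0), the cosine of the angle between two environment vectors equals their correlation when all components are used, and approximates it in a 2D biplot. The data below are random and only mimic the 18 x 9 shape of the sample data set.

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(18, 9))            # made-up yields: 18 genotypes x 9 environments
Y = P - P.mean(axis=0)                  # environment-centered (G + GE)
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

def cosines_between_environments(k):
    """Cosines of the angles among environment vectors using the first k components."""
    env = (s[:k, None] * Vt[:k]).T      # environment scores with f = 0, shape (9, k)
    unit = env / np.linalg.norm(env, axis=1, keepdims=True)
    return unit @ unit.T

cos_full = cosines_between_environments(len(s))   # exact
cos_2d = cosines_between_environments(2)          # what a 2D biplot shows
corr = np.corrcoef(Y, rowvar=False)               # Pearson correlations among environments
```

An acute angle therefore suggests a positive correlation, a right angle no correlation, and an obtuse angle a negative correlation, subject to the goodness of fit of the 2D display.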

Weikai Yan 2006 Which-won-where (Crossover GE is GE that caused genotype rank changes and different winners in different test environments) G12 G7 G18 G8 G13

Weikai Yan 2006 Are there meaningful crossover GE? The which-won-where view. (Crossover GE is GE that caused genotype rank changes and different winners in different test environments.)

Weikai Yan 2006 Are the crossover patterns* repeatable? Multi-year data are needed.
If YES:
–The target environment can be divided into multiple mega-environments
–GE can be exploited by selecting for each mega-environment
–GE → G
If NO:
–The target environment CANNOT be divided into multiple mega-environments
–GE CANNOT be exploited
–GE must be avoided by testing across locations and years
*Not the environment-grouping patterns. A mega-environment is a group of geographical locations that share the same (set of) best genotypes consistently across years.

Weikai Yan 2006 Classify your target environment into one of three categories (ME: mega-environment):
–(1) No crossover GE: single simple ME. A single test location in a single year suffices to select a single best variety.
–(2) Crossover GE, repeatable: multiple MEs. Select specifically adapted genotypes for each ME.
–(3) Crossover GE, not repeatable: single complex ME. Select generally adapted genotypes across the whole region and across multiple years.

Weikai Yan 2006 G by E data analysis. MEGA-ENVIRONMENT ANALYSIS; TEST ENVIRONMENT EVALUATION; GENOTYPE EVALUATION
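
A tabular counterpart of the which-won-where view, as a rough sketch: it finds the winning genotype in each environment directly from the two-way table and groups environments by winner. It does not draw the biplot polygon; the function name, labels, and values are made up.

```python
import numpy as np

def which_won_where(Y, genotypes, environments):
    """Map each winning genotype to the list of environments it won."""
    Y = np.asarray(Y, dtype=float)
    winners = {}
    for j, env in enumerate(environments):
        best = genotypes[int(np.argmax(Y[:, j]))]   # highest value in environment j
        winners.setdefault(best, []).append(env)
    return winners

# Made-up example: G1 wins in E1, G2 wins in E2 and E3 (a crossover pattern).
demo = which_won_where(
    [[5.0, 1.0, 1.0],
     [2.0, 4.0, 3.0],
     [0.0, 0.0, 0.0]],
    genotypes=["G1", "G2", "G3"],
    environments=["E1", "E2", "E3"],
)
```

Environments that share a winner are candidates for one mega-environment, but only if the grouping repeats across years.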

Weikai Yan 2006 Discriminating ability and representativeness. Vector length: discriminating ability. Angle to the AE: representativeness. [Figure labels: average-environment axis; average environment]
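
A sketch of how the two measures could be read off the biplot coordinates. It assumes that the "average environment" is the point whose coordinates are the means of the environment coordinates; the function name and the example coordinates are made up.

```python
import numpy as np

def evaluate_environments(env_xy):
    """Return (vector length, angle in degrees to the average environment) per environment.

    env_xy: array of shape (n_env, 2) holding the environment coordinates in the biplot.
    """
    env_xy = np.asarray(env_xy, dtype=float)
    avg = env_xy.mean(axis=0)                         # assumed average environment
    length = np.linalg.norm(env_xy, axis=1)           # discriminating ability
    cos = env_xy @ avg / (length * np.linalg.norm(avg))
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))   # smaller = more representative
    return length, angle

lengths, angles = evaluate_environments([[1.0, 0.0], [0.0, 1.0]])
```

A long vector at a small angle to the average-environment axis marks an environment that is both discriminating and representative.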

Weikai Yan 2006 Ideal test environments: discriminating and representative Ideal test environment

Weikai Yan 2006 Classify each test environment into one of three categories:
–(1) Not discriminative: useless
–(2) Discriminative and representative: good for selecting (more important)
–(3) Discriminative but not representative: useful for culling (less important)
For each good or useful test environment: is it essential?

Weikai Yan 2006 Vector length = discrimination = GE = GE1 + GE2. [Figure labels: contribution to proportionate GE; contribution to non-proportionate GE]

Weikai Yan 2006 G by E data analysis. MEGA-ENVIRONMENT ANALYSIS; TEST ENVIRONMENT EVALUATION; GENOTYPE EVALUATION

Weikai Yan 2006 Vector length = GGE = G + GE Contribution To GE (instability) Contribution To G (mean performance)

Weikai Yan 2006 Mean vs. Stability

Weikai Yan 2006 Genotype ranking on both MEAN and STABILITY The ideal genotype
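
The mean vs. stability reading can be sketched as two projections of each genotype point: one onto the average-environment axis (mean performance) and one perpendicular to it (instability). As in the previous sketches, the average environment is assumed to be the mean of the environment coordinates, and all names and numbers are made up.

```python
import numpy as np

def mean_and_instability(geno_xy, env_xy):
    """Project genotype points onto the average-environment axis and its perpendicular."""
    geno_xy = np.asarray(geno_xy, dtype=float)
    avg = np.asarray(env_xy, dtype=float).mean(axis=0)
    axis = avg / np.linalg.norm(avg)                  # unit vector along the axis
    perp = np.array([-axis[1], axis[0]])              # unit vector perpendicular to it
    mean_score = geno_xy @ axis                       # larger = higher mean performance
    instability = np.abs(geno_xy @ perp)              # larger = less stable
    return mean_score, instability

mean_score, instability = mean_and_instability(
    geno_xy=[[3.0, 0.5], [-1.0, -2.0]],
    env_xy=[[2.0, 1.0], [2.0, -1.0]],
)
```

Ranking on `mean_score` first and using `instability` only as a modifier matches the comment later in the deck that a stability index should only adjust a ranking based on mean performance.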

Weikai Yan 2006 Genotype classification by mean and stability:
–High mean performance, high stability: generally adapted (VERY GOOD)
–High mean performance, low stability: specifically adapted (GOOD)
–Low mean performance, low stability: bad somewhere (BAD)
–Low mean performance, high stability: bad everywhere (VERY BAD)
Are there stability genes?!

Weikai Yan 2006 G x E data analysis summary: 1) Mega-environment analysis; 2) Test environment evaluation; 3) Genotype evaluation. Important comments:
–(2) and (3) are meaningful only for a single mega-environment
–Any stability analysis is meaningful only for a single mega-environment
–Any stability index can be used only as a modifier to the ranking based on mean performance

Contact: Other ways to view a GGE biplot

Weikai Yan 2006 Inner-product property

Weikai Yan 2006 Ranking on a single environment

Weikai Yan 2006 Ranking on two environments

Weikai Yan 2006 Relative adaptation of a genotype

Weikai Yan 2006 Compare any two genotypes

Contact: Biplot analysis of Genotype by trait data

Weikai Yan 2006 Objectives of G By T data analysis Genotype evaluation based on trait profiles Relationship among breeding objectives

Weikai Yan 2006 Data of 4 traits for 19 covered oat varieties (Ontario 2004) (Background info: High yield, high groat, high protein, and low oil are desirable for milling oats)

Weikai Yan 2006 Relationships among traits

Weikai Yan 2006 Trait profile of each genotype

Weikai Yan 2006 Trait profile of a genotype

Weikai Yan 2006 Trait profile comparison between two genotypes

Weikai Yan 2006 Genotype ranking based on a trait

Weikai Yan 2006 Parent selection based on trait profiles

Weikai Yan 2006 Independent culling
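
Independent culling keeps only the genotypes that pass a threshold on every trait. The sketch below uses the four traits named for the oat data (yield, groat, protein, oil) with the stated directions (high yield, groat, and protein; low oil); the variety names, trait values, and thresholds are all made up.

```python
import numpy as np

varieties = ["V1", "V2", "V3", "V4"]          # hypothetical names
# Columns: yield, groat, protein, oil (made-up values).
traits = np.array([
    [4.2, 72.0, 13.5, 6.0],
    [3.6, 75.0, 14.2, 5.2],
    [4.5, 70.0, 12.0, 7.5],
    [4.0, 74.0, 13.8, 5.8],
])

# Hypothetical culling levels: minimum for yield, groat, protein; maximum for oil.
keep = (
    (traits[:, 0] >= 3.9)
    & (traits[:, 1] >= 72.0)
    & (traits[:, 2] >= 13.0)
    & (traits[:, 3] <= 6.0)
)
selected = [name for name, ok in zip(varieties, keep) if ok]
```

Each threshold is applied on its own, so a genotype that fails one trait is culled regardless of how good it is on the others.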

Contact: Fuller understanding of MET data MET data are more informative than you thought

Weikai Yan 2006 A G-E-T 3-way dataset contains various 2-way tables: G by E data; G by T data; E by T data, for each genotype or for all genotypes; G by V data, with each E-T as a variable (V); P by T data, with each G-E as a phenotype (P); genetic association by environment data; trait association by environment data.

Weikai Yan 2006 Genetic-covariate by environment biplot (QTL by environment biplot) Barley Genomics Data

Weikai Yan 2006 Trait-association by environment biplot Oat MET Data

Weikai Yan 2006 Four-way data analysis Year…

Contact: Conclusions

Weikai Yan 2006 Conclusion (1) GGE biplot analysis is an effective tool for G by E data analysis to achieve understandings about: 1. the target environment, 2. the test environments, and 3. the genotypes. 4. Stability analysis is useful only for a single mega-environment.

Weikai Yan 2006 Conclusion (2) GGE biplot analysis is an effective tool for G by T data analysis to achieve understandings about: 1. the interconnected plant system, 2. positively correlated traits, 3. negatively correlated traits, 4. the strengths and weaknesses of the genotypes.

Weikai Yan 2006 Conclusion (3) Biplot analysis is an effective tool for other two-way table analysis: marker by environment; QTL by environment; gene by treatment; diallel cross; …

Weikai Yan 2006 Conclusion (4) Biplot analysis can be VERY EASY…
–From reading data to displaying the biplot: 2 seconds
–Displaying any of the perspectives of a biplot and changing from one to another: 1 second
–Displaying the biplot for any subset: 1 second
–Learning how to use the software and interpret biplots: 30 minutes
–Everything can be just one mouse-click away

Contact: Thank you Contact: Weikai Yan: web: