This demo shows the analysis functionality of Phenom-Networks using a dataset generated in Dani Zamir's laboratory at the Faculty of Agriculture of the Hebrew University. To start the tutorial, select the Tomatoes database and log in to the system, either as a guest or as a registered user.
After you log in, go to Phenotype -> analysis. This is the main analysis page of the system, where the user performs statistical analysis of the data. The analysis options appear on the left side, organized into categories such as univariate and multivariate. In the middle is a table that lists all traits, and on the right are fields (which depend on the selected analysis) that hold the traits to be analyzed. The toolbar includes filters for subsetting the data, as well as other options.
The first step, before starting the analysis, is to select the study or studies you are interested in. To do this, click the studies filter button on the toolbar. This opens a popup window that contains all available studies. Let's click on the M82-pennellii ILs folder, select Akko 2004, and click OK.
All traits measured in this set of studies now appear in the traits list. In this example I'll show how to correlate a large number of traits. For this I select the Correlate Y by X analysis. If we click the trait group button (on the toolbar) we can see all available trait categories (enzyme-activity, inflorescence, metabolites, morphology, protein). Suppose we want to correlate all enzyme activity traits with the morphology traits. Let's select enzyme activity from the trait group combo box, then click Search trait.
Clicking Search trait reloads the traits list so that only enzyme activity traits are present. Let's select them all and click on Y Variables to put them in that field.
After selecting the morphology traits and moving them to the X Variables field, I'm ready to run the analysis by clicking the OK button.
The results are displayed as a table. Each row represents a correlation between two traits. For example, the selected row indicates that the correlation parameters between seed length/width and aspartate aminotransferase are: R (Pearson coefficient) = -0.34, N (sample size) = 139 and P value = 3.6e-05. The table is ordered by the P value column. Now let's click on the selected row. Note: the traits Brix (sugar concentration) and Harvest index appear in a large share of the correlations, which is fairly trivial. One may repeat this analysis without these two traits (just by not adding them to the variables field), and thus focus on more interesting correlations.
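To make the table columns concrete, here is a minimal sketch of how a Y-by-X correlation table with R, N and P value, ordered by P value, could be computed. It is not Phenom-Networks code: the function names, trait names and numbers are all invented for illustration, and the P value uses the Fisher z normal approximation, which may differ from whatever test the system actually applies.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided P value via the Fisher z approximation (assumes n > 3)."""
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation; atanh is undefined at +/-1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def correlate_y_by_x(y_traits, x_traits):
    """One row (y, x, R, N, P) per Y/X trait pair, ordered by P value."""
    rows = []
    for y_name, y_vals in y_traits.items():
        for x_name, x_vals in x_traits.items():
            r = pearson_r(x_vals, y_vals)
            n = len(x_vals)
            rows.append((y_name, x_name, r, n, approx_p_value(r, n)))
    rows.sort(key=lambda row: row[4])
    return rows


# Invented values for six genotypes; not taken from the Tomatoes database.
y_traits = {"enzyme_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
x_traits = {
    "morph_a": [2.0, 4.0, 6.0, 8.0, 10.0, 13.0],
    "morph_b": [5.0, 3.0, 6.0, 2.0, 7.0, 4.0],
}
rows = correlate_y_by_x(y_traits, x_traits)
for y_name, x_name, r, n, p in rows:
    print(f"{y_name} vs {x_name}: R={r:.2f}, N={n}, P={p:.2g}")
```

The sketch assumes complete data (every genotype measured for every trait); real data would need missing values dropped pair by pair, which is why N can differ between rows.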
The corresponding correlation plot appears on a separate result tab. Point the mouse at a data point to see the genotype's name (germplasm identification).
Now I go back to the Form tab and choose the Heatmap analysis under the Correlations category (Correlations -> Heatmap). Note that the traits stay in their fields, so I don't need to reassign them; I can just click OK.
Here is the correlation heatmap between the enzyme activity and the morphology traits. I can point the mouse at a square to see the corresponding traits and the correlation between them. Red squares represent positive correlations, green squares negative correlations, and black means no correlation (r value around 0). The image on the upper left shows the distribution of all r values.
Now I want to produce a correlation network among all selected traits. When I select the Network analysis (Correlations -> Network), the fields on the right change: the X Variables field disappears. The reason is that network analysis takes a single list of traits and performs pairwise correlations among all of them (not just between two groups of traits, as in the previous analyses). The Y Variables field now contains only the enzyme activity traits. Let's add the morphology traits to the Y Variables field, the same way as before; they are added to the enzyme activity traits already there. Now we can adjust the alpha value threshold to be more stringent (because there are many comparisons) and click OK.
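As a rough illustration of why a more stringent alpha is useful here, the sketch below counts the pairwise comparisons for a single list of traits and applies a Bonferroni correction, then keeps only the pairs that pass the threshold as network edges. Bonferroni is just one common choice and is an assumption on my part; the tutorial does not say how the alpha value should be chosen. All names and numbers are invented.

```python
from itertools import combinations


def bonferroni_alpha(n_traits, alpha=0.05):
    """Per-test alpha after dividing by the number of trait pairs."""
    n_pairs = n_traits * (n_traits - 1) // 2
    return alpha / n_pairs


def network_edges(pair_stats, alpha):
    """Keep pairs with P below alpha and label each edge by the sign of r."""
    edges = []
    for (trait_a, trait_b), (r, p) in pair_stats.items():
        if p < alpha:
            sign = "positive" if r > 0 else "negative"
            edges.append((trait_a, trait_b, sign))
    return edges


# Hypothetical trait names; a single list, as in the Network analysis.
traits = ["t1", "t2", "t3", "t4"]
pairs = list(combinations(traits, 2))      # every unordered pair of traits
threshold = bonferroni_alpha(len(traits))  # 0.05 divided by 6 pairs

# Invented (r, P) values for a few pairs, for illustration only.
pair_stats = {
    ("t1", "t2"): (0.80, 0.0001),
    ("t1", "t3"): (-0.60, 0.004),
    ("t2", "t4"): (0.30, 0.03),
}
edges = network_edges(pair_stats, threshold)
print(len(pairs), threshold, edges)
```

With the uncorrected alpha of 0.05 all three invented pairs would be drawn as edges; with the corrected threshold the weakest one is dropped.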
This is the correlation network plot. Each circle represents a trait, and a line connecting two traits represents a significant correlation between them according to the alpha threshold set earlier. A grey line indicates a positive correlation and a blue line a negative one. Red circles are enzyme activity traits and blue circles are morphology traits.
You can point the mouse at a circle to see the trait's name. If you click on a trait, a new analysis is launched, depending on the state of the radio button at the top. By default it runs a distribution analysis. For example, let's click on the fruit number trait.
We'll get the distribution of the fruit number trait in a new result tab.
Here I select the Traits pairwise correlations analysis. This analysis is not well suited to a large number of traits, because it may yield a very large figure that is difficult to interpret. First I clear the previous traits from the Y Variables field by clicking the Clear button, and then I put a few traits in it. Next I can put a factor in the Color by field, so that the data points in the figure are colored by the levels of that factor (this is optional and can be omitted). You can also check the remove outlier option.
This is the result of the Traits pairwise analysis. The traits appear on the diagonal. The lower triangle shows the correlation plots and the upper triangle shows the correlation parameters.
Now I'll go back to Correlate Y by X, but this time I check the Figures option (see the parameters on the right). This means the output will be correlation plots (figures) rather than a table (as before). You can put as many traits as you like in the X and Y fields; the number of resulting figures equals the number of traits in Y multiplied by the number of traits in X. Here I select Brix as Y and fruit number as X.
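The figure count can be checked with a short sketch that enumerates every Y/X combination. The function name and the placeholder trait names in the second example are hypothetical; only Brix and fruit number come from the tutorial.

```python
from itertools import product


def figure_pairs(y_traits, x_traits):
    """One (Y, X) pair per correlation plot: every Y crossed with every X."""
    return list(product(y_traits, x_traits))


# The selection used in this step: one Y trait and one X trait.
single = figure_pairs(["Brix"], ["fruit number"])
print(len(single))

# Placeholder names: 3 Y traits crossed with 4 X traits.
many = figure_pairs(["y1", "y2", "y3"], ["x1", "x2", "x3", "x4"])
print(len(many))
```

The count grows multiplicatively, so the Figures output is best kept to short trait lists.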
Here is the correlation plot. Each data point represents a genotype: its X value is the genotype's mean fruit number and its Y value is the genotype's mean Brix.
Now I will repeat this analysis, except that instead of calculating genotype means, I will display all replicates of each genotype on the plot (typically 9 plants per genotype in this study). To do this I uncheck the summarize by GID checkbox. This displays the summarize field. I put the field code column in this field and press OK. Note that if I put germplasm identification – variety name in the summarize field instead, it yields a figure identical to the previous one.
This gives the correlation between the same traits as before, only now all plants are displayed in the figure (instead of only the genotype means). The label of each data point now represents the field code of that plant.
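The difference between summarizing by genotype and by field code can be sketched as a group-by-and-average step. This is only an illustration of the idea under assumed column names (gid, field_code, brix, fruit_number) and invented measurements; it does not reflect how Phenom-Networks stores or processes its data.

```python
from collections import defaultdict
from statistics import mean


def summarize(records, key, traits):
    """Average each trait within the groups defined by `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return {
        group: {t: mean(r[t] for r in recs) for t in traits}
        for group, recs in groups.items()
    }


# Invented plant-level records: two genotypes, two plants each.
records = [
    {"gid": "G1", "field_code": "P1", "brix": 4.0, "fruit_number": 10.0},
    {"gid": "G1", "field_code": "P2", "brix": 6.0, "fruit_number": 20.0},
    {"gid": "G2", "field_code": "P3", "brix": 5.0, "fruit_number": 30.0},
    {"gid": "G2", "field_code": "P4", "brix": 7.0, "fruit_number": 50.0},
]
traits = ["brix", "fruit_number"]

by_gid = summarize(records, "gid", traits)           # one point per genotype
by_plant = summarize(records, "field_code", traits)  # one point per plant
print(len(by_gid), len(by_plant))
```

Because each field code identifies a single plant, grouping by it leaves the plant-level values unchanged, whereas grouping by genotype collapses the replicates into one mean per genotype.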
In this last example, I'll show the correlation of a trait between different studies. For this I'll select the Akko 2003 and Akko 2004 studies in the studies filter.
Now I choose Correlate Y by X with the output as a table, select the enzyme activity group of traits, put them in both the Y Variables and the X Variables fields, and click OK.
Here is the correlation table, only now each row corresponds to the correlation of a single trait between two studies. For example, the first row represents the correlation of phosphofructokinase between the two studies. Click on it to see the correlation plot.