CuffDiff ran successfully. Output files include gene_exp.diff. What are the next steps? Use the Navigation bar to find the files; they may be under DNA Subway if the Green Line was used. Click on gene_exp.diff and note the size of the file, 9.41 MB.
Under the Download tab, click on Simple Download. Choose Save File, then OK. Open the Downloads folder and find gene_exp.diff; make sure its size is similar to that in the iPlant folder. Open a blank Excel workbook and then open the gene_exp.diff file.
In the import wizard, click Delimited and My data has headers, then Next. Choose Tab under Delimiters, then Next. Choose General, then Finish. The workbook opens. Save As xlsx.
First, filter for fold change. Select the log2(fold change) column, then Filter, Number Filters, Between. The Custom AutoFilter window comes up. Use settings that give 2-fold cutoffs; recall that the values are log2, so a 2-fold difference corresponds to a log2 fold change ≤ -1 or ≥ 1. Then sort that column from highest to lowest using the Z to A icon.
Select the significant column and, under the Data tab, sort Z to A so that the yes genes come first. A yes means the q-value is at or below the significance cutoff. When prompted, click to expand the selection, then Sort. Scroll down to the yes/no junction. After filtering, many rows are no longer visible, so genes cannot be counted by looking at row numbers. The list of yes genes with log2 fold change ≤ -1 or ≥ 1 (2-fold difference) can be copied and pasted to another sheet to get the gene count and a clean gene list for Gene Ontology analysis: 2769 genes UP and 2759 genes DOWN. Highlight the junction between up- and down-regulated genes.
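The same filtering can be sketched in a short script for readers who prefer not to do it by hand in Excel. This is a minimal sketch, not part of the original workflow. It assumes the tab-delimited file has columns named `gene_id`, `log2(fold_change)`, and `significant`, so check the header row of your own file first. The function name `split_de_genes` and the demo rows are made up for illustration.

```python
import csv
import io
import math

def split_de_genes(handle, min_abs_log2fc=1.0):
    """Return (up, down) gene ID lists from a tab-delimited Cuffdiff table.

    Assumed column names: gene_id, log2(fold_change), significant.
    """
    up, down = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Keep only genes flagged as significant
        if row["significant"] != "yes":
            continue
        log2fc = float(row["log2(fold_change)"])  # also parses inf / -inf / nan
        if math.isnan(log2fc):
            continue
        # log2 cutoffs of +1 / -1 correspond to a 2-fold difference
        if log2fc >= min_abs_log2fc:
            up.append(row["gene_id"])
        elif log2fc <= -min_abs_log2fc:
            down.append(row["gene_id"])
    return up, down

# Tiny made-up table standing in for gene_exp.diff
demo = io.StringIO(
    "gene_id\tlog2(fold_change)\tq_value\tsignificant\n"
    "GENE_A\t2.5\t0.001\tyes\n"
    "GENE_B\t-1.0\t0.004\tyes\n"
    "GENE_C\t0.4\t0.010\tyes\n"
    "GENE_D\t3.0\t0.600\tno\n"
)
up_genes, down_genes = split_de_genes(demo)
print(len(up_genes), "UP;", len(down_genes), "DOWN")
```

For a real file, pass an open file handle instead of the demo table, for example `open("gene_exp.diff", newline="")`. The two returned lists can then be written out or pasted into a Gene Ontology tool.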
Open DAVID and open Functional Annotation. How do you go from a list of 2000-plus genes to something that is biologically relevant? Gene Ontology can be used to determine whether certain biological processes are enriched in your set of genes. A certain percentage of the genes in the genome are annotated with each biological process; are any of these biological processes observed at a higher frequency in your gene list? If so, they are "enriched". One tool to look for enrichment is DAVID. Note that if your gene list is small, it will be difficult to get significant enrichment scores.
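The idea of enrichment can be made concrete with a small calculation. The sketch below uses a plain hypergeometric test: given how many genes in the genome carry a GO term, how likely is it to see at least the observed number in a gene list of a given size by chance? This illustrates the concept only. It is not DAVID's exact scoring method, and all numbers in the example are invented.

```python
from math import comb

def enrichment_p_value(genome_size, genome_hits, list_size, list_hits):
    """Probability of seeing at least `list_hits` annotated genes when
    `list_size` genes are drawn without replacement from a genome of
    `genome_size` genes, `genome_hits` of which carry the GO term."""
    total = comb(genome_size, list_size)
    upper = min(genome_hits, list_size)
    tail = sum(
        comb(genome_hits, k) * comb(genome_size - genome_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total

# Invented example: 50 of 1000 genes carry the term (5%), so a list of
# 100 genes is expected to contain about 5; we observe 15.
p = enrichment_p_value(genome_size=1000, genome_hits=50,
                       list_size=100, list_hits=15)
print(p)
```

With a small gene list the same proportion of hits gives a much weaker p-value, which is why small lists rarely show significant enrichment.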
Select the up-regulated genes from the Excel file: click on the first gene, scroll down to the highlighted spot that separates up- and down-regulated genes, then hold Shift and click to select all up-regulated genes. Copy the selection and paste the list into section A: Paste a list. Select Identifier. The sample gene list is from Arabidopsis thaliana. Many other options are available in DAVID, but mostly for model organisms. Identify List Type, then Submit.
The gene list will now show as genes plus some unknowns. Unknowns can be viewed (View Unmapped Ids); in this case they were nearly all mitochondrial genes and not a big concern. To keep the analysis simple, click Clear All Categories, open Gene_Ontology, and select GOTERM_BP_FAT, which covers all the Biological Processes. Just use this one category. Scroll down and click on Functional Annotation Clustering.
Annotation Clusters are groups of related BP GO terms. Note that all those in Cluster 1 are related to phosphate and phosphorylation. The Count is the number of genes for each GO term. The P-value and Benjamini columns indicate how likely it is that the enrichment is spurious rather than real; lower values mean stronger evidence of enrichment. Scroll down the results to see much higher p-values. The file must be downloaded (red arrow) to get the False Discovery Rate (FDR), the probability that the enrichment is spurious.
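Because many GO terms are tested at once, raw p-values need to be corrected for multiple testing. The sketch below shows the standard Benjamini-Hochberg adjustment to illustrate the kind of correction behind such columns. It is an assumption that DAVID's Benjamini column follows this procedure, and DAVID's own FDR column may be computed differently. The p-values in the example are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented raw p-values for four GO terms
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

Terms whose adjusted value falls at or below the chosen cutoff (for example 0.01) would be kept.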
The file will open in a new window. Right-click, choose Select All, and Copy. Open Excel and select Paste Special. Choose Unicode Text and click OK. Expand column B to see the GO terms. The column with the FDR is highlighted. 1% (or even 5%) is an acceptable cutoff. The top annotation clusters will likely be much lower than the cutoff IF your gene list is large.
Scroll down to see higher FDRs. In this cluster, the top two GO terms have FDRs below 1%. Some of the other categories have FDRs at 99%; these are almost certainly spurious. Go through this list carefully to find GO terms with acceptable FDRs. Hopefully, this analysis has suggested some interesting Biological Processes to pursue. All genes for each GO term are listed in the Genes column. Highlight interesting BP GO terms. Be sure to save this Excel file. Repeat for the down-regulated genes.
An alternative GO analysis tool is BiNGO, which runs within Cytoscape. BiNGO works with a much larger group of organisms and has a useful network display that connects similar GO Biological Process terms. To use BiNGO, first download Cytoscape, a free software tool for network analysis. Open Cytoscape and, under App Manager, search for BiNGO and install it. Once it is installed, go to App Manager and click on BiNGO; the BiNGO Settings window will come up.
Provide a meaningful name. Copy and paste the gene list from the Excel file. Use the default for what to assess. Set FDR to 0.01 if 1% is the desired cutoff for enrichment. Choose the correct organism. Start BiNGO.
Output shows graphical view and BiNGO output.
Move the BiNGO Output window behind by clicking on the network image. Zoom in and use the purple square below to focus on areas of interest.
This cluster of related Biological Processes is similar to annotation cluster 1 from DAVID, protein phosphorylation.
Clicking on a particular node will bring up a panel that describes the node. Amino acid transport is related to arginine and basic amino acid transport. Spend some time looking through the BiNGO output as was done for DAVID. Be sure to save your file under Save As, so it can be opened in the future by BiNGO. These two tools will provide information for genes worthy of future study depending on the biological questions of interest.