Summarizing Differential Expression Using Mann-Whitney U-tests
RNA-Seq in Its Most Basic Form
- Samples from two conditions
- Isolate RNA
- Generate cDNA
- Create a sequencing library by fragmenting, size selection, and adding adaptors
- Run the sequencer
- Generate short reads
- Identify differentially expressed genes
- Profound biological discovery
Heat stress experiment analyzed with tag-based RNA-seq
[Figure: individual samples under control and stress conditions]
Gene Ontology enrichment analysis (classic)
Input:
- list of significant genes (“our list”)
- all GO annotations for all genes in a genome (or transcriptome)
Enrichment test: whether “our list” contains more representatives of a certain GO category than expected by chance (Fisher’s exact, hypergeometric, or similar test)
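The classic test boils down to a hypergeometric tail probability: given N annotated genes, K of which carry the GO term, how likely is it that a random list of n genes contains k or more of them? A minimal sketch, assuming the four counts for one GO category are already known (the function name and arguments are illustrative, not taken from GO_MWU.R):

```python
from math import comb


def hypergeom_enrichment_p(n_genome: int, n_category: int, n_list: int, k_overlap: int) -> float:
    """One-sided p-value P(X >= k_overlap) under the hypergeometric null.

    n_genome   -- total annotated genes in the genome/transcriptome (N)
    n_category -- genes annotated with the GO term (K)
    n_list     -- genes in "our list" of significant genes (n)
    k_overlap  -- genes in "our list" that carry the GO term (k)
    """
    total = comb(n_genome, n_list)
    upper = min(n_category, n_list)
    # Sum the probabilities of seeing k, k+1, ... term genes in a random list of size n.
    # math.comb returns 0 for impossible combinations, so no extra bounds checks are needed.
    tail = sum(
        comb(n_category, x) * comb(n_genome - n_category, n_list - x)
        for x in range(k_overlap, upper + 1)
    )
    return tail / total
```

With 20 genes, 5 annotated with the term, and all 4 genes of "our list" carrying it, the p-value is 5/4845 (about 0.001). This one-sided hypergeometric tail is the same quantity as a one-sided Fisher's exact test on the 2x2 table of "in list / not in list" by "annotated / not annotated".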
Mann-Whitney U-test
- Uses ranks to test whether the distributions of group X and group Y differ
- Robust to outliers and does not require normally distributed data
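A minimal sketch of the U statistic computed from ranks, with tied values given their average rank. The two-sided p-value here uses the normal approximation without a tie correction, a simplification chosen for brevity; exact or tie-corrected versions give somewhat different p-values for small samples:

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for group x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])               # rank sum of group x in the pooled data
    u1 = r1 - n1 * (n1 + 1) / 2        # U statistic for group x
    mu = n1 * n2 / 2                   # expected U if both groups share one distribution
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

For x = [1, 2, 3] and y = [4, 5, 6], U is 0; changing the 6 in y to 1000 leaves U at 0, because only the ordering matters. That is the sense in which the test is robust to outliers.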
Gene Ontology enrichment analysis (rank-based)
Input:
- list of all genes with measures to rank them
- GO annotations for all genes in a genome (or transcriptome)
Enrichment test: whether a GO category is significantly enriched with either top- or bottom-ranking genes (two-sided Mann-Whitney U test, or permutations)
Advantages:
- no need to choose a “significance cutoff”
- can keep track of the direction of change
[Figure: genes ranked from control to stress; genes annotated with the GO term are marked as stripes]
The MWU test determines whether genes annotated with the GO term in question (stripes on the white box to the left) are significantly “bunched up” either at the top or at the bottom of the ranked list.
“Delta rank”: mean rank of the GO-term genes minus mean rank of all other genes (how much the ranks are shifted).
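The delta rank and the MWU test for a single GO term can be sketched as below, assuming a dict mapping every gene to its ranking measure and a set of the genes annotated with the term (both names are hypothetical). This illustrates the idea rather than reproducing the code in GO_MWU.R, and it again uses the normal approximation for the p-value:

```python
import math


def rank_with_ties(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def go_term_mwu(measure: dict, genes_in_term: set):
    """Return (delta rank, two-sided MWU p-value) for one GO term.

    Assumes the term covers at least one, but not all, of the ranked genes.
    """
    genes = list(measure)
    ranks = dict(zip(genes, rank_with_ties([measure[g] for g in genes])))
    inside = [ranks[g] for g in genes if g in genes_in_term]
    outside = [ranks[g] for g in genes if g not in genes_in_term]
    n1, n2 = len(inside), len(outside)
    # Delta rank: mean rank of GO-term genes minus mean rank of all other genes.
    delta_rank = sum(inside) / n1 - sum(outside) / n2
    # MWU statistic for the GO-term genes, compared with its null mean and SD.
    u1 = sum(inside) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))
    return delta_rank, p
```

A positive delta rank means the term's genes sit toward high values of the measure (the top of the list), a negative one toward the bottom; the two-sided p-value flags either kind of bunching.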
[Figure: control and treatment samples; differential expression analysis (DESeq, edgeR) produces a table of genes (name, p-value, -log(p), rank), from which the delta rank is computed]
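One way to turn the differential expression table into a ranking measure, assumed here, is -log10(p) signed by the direction of change, so up-regulated genes rank at the top and down-regulated genes at the bottom. The logP column of heats.csv contains negative values, which is consistent with a signed measure, but how that column was computed is not stated in these slides. The log base and the helper names below are assumptions:

```python
import math


def signed_neg_log10_p(pvalue: float, log2_fold_change: float) -> float:
    """-log10(p), positive for up-regulated and negative for down-regulated genes.

    Assumption: the sign comes from the log2 fold change (treatment vs. control).
    """
    value = -math.log10(pvalue)
    return value if log2_fold_change >= 0 else -value


def rank_genes(results):
    """results: list of (gene, pvalue, log2FC) tuples (hypothetical layout).

    Returns (gene, measure) pairs ordered from most up- to most down-regulated.
    """
    scored = [(gene, signed_neg_log10_p(p, lfc)) for gene, p, lfc in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because the measure keeps the sign, the same ranked list serves both directions: strongly significant up-regulated genes end up at one end, strongly significant down-regulated genes at the other, and non-significant genes in the middle.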
Some GO categories in your data might share the same genes (and some may overlap completely).
- Cluster GO categories according to the proportion of shared genes, which brings similar biological processes together
- Merge identical or very similar categories to reduce redundancy
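A minimal sketch of merging gene-sharing categories, assuming a distance of 1 minus the fraction of shared genes in the smaller of two categories and a hypothetical cut of 0.25. Single-linkage grouping (via union-find) is used here for brevity; the distance measure, linkage, and threshold in GO_MWU.R may differ:

```python
from itertools import combinations


def shared_gene_distance(a: set, b: set) -> float:
    """1 - shared genes / size of the smaller category (0 = one contains the other).

    Assumes both categories are non-empty.
    """
    return 1 - len(a & b) / min(len(a), len(b))


def merge_similar_categories(categories: dict, cut: float = 0.25):
    """Group categories whose pairwise distance is <= cut (single linkage).

    categories: {category name: set of genes}. Returns a sorted list of name sets.
    """
    parent = {name: name for name in categories}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(categories, 2):
        if shared_gene_distance(categories[a], categories[b]) <= cut:
            parent[find(a)] = find(b)  # join the two groups

    groups = {}
    for name in categories:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))
```

A category whose genes are entirely contained in another has distance 0 and is always merged, which covers the "overlap completely" case; partially overlapping categories are merged only when the overlap is large enough for the chosen cut.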
Run R Script GO_MWU.R
- Go to ~/Desktop/Mann-Whitney_U-tests/MWU_go
- Open the file GO_MWU.R
- Execute commands by highlighting them and pressing Control + Enter
heats.csv (differential expression dataset):
gene,logP
isogroup0,0.6
isogroup1,3.5
isogroup10,6.8
isogroup100,6.4
isogroup1000,1.7
isogroup10000,0.1
isogroup10001,-0.2
isogroup10002,0.6
isogroup10003,-0.4

amil_defog_iso2go.tab (links genes with their GO terms):
V1              V2
isogroup15359   GO: ;GO: ;GO: ;
isogroup0       GO:
isogroup100     GO: ;GO:
isogroup10001   GO:
isogroup10002   GO: ;GO: ;GO: ;GO:
isogroup10003   GO: ;GO: ;GO: ;GO: ;
isogroup10004   GO: ;GO: ;GO: ;GO:
isogroup10006   GO:
isogroup10007   GO: ;GO: ;GO: ;GO: ;GO: ;GO: ;GO:

go.obo (links GO terms with names, namespaces, and definitions):
id: GO:
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO: ! mitochondrion organization
[Term]
id: GO:
name: reproduction
namespace: biological_process
alt_id: GO:
alt_id: GO:
def: "The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms." [GOC:go_curators, GOC:isa_complete, GOC:jl, ISBN: ]
subset: goslim_generic
subset: goslim_pir
subset: goslim_plant
subset: gosubset_prok
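The first two input files map naturally onto two dictionaries: gene to ranking measure, and gene to a set of GO IDs, which is then inverted to GO ID to genes for the per-term tests. A minimal parsing sketch, assuming heats.csv has the `gene,logP` header shown and the iso2go file is tab-separated with semicolon-separated GO IDs and no header line (the V1/V2 column names look like R's defaults for a headerless file). The function names are hypothetical, and GO_MWU.R does its own parsing in R:

```python
import csv
import io


def read_measures(csv_text: str) -> dict:
    """Parse a 'gene,logP' table into {gene: float measure}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["gene"]: float(row["logP"]) for row in reader}


def read_iso2go(tab_text: str) -> dict:
    """Parse 'gene<TAB>GO:a;GO:b;...' lines into {gene: set of GO IDs}.

    Assumes a tab between columns and ';' between GO IDs; empty entries
    (e.g. from a trailing ';') are skipped.
    """
    mapping = {}
    for line in tab_text.splitlines():
        if not line.strip():
            continue
        gene, _, terms = line.partition("\t")
        mapping[gene] = {t.strip() for t in terms.split(";") if t.strip()}
    return mapping


def genes_per_term(gene2go: dict) -> dict:
    """Invert {gene: GO IDs} into {GO ID: set of genes}."""
    term2genes = {}
    for gene, terms in gene2go.items():
        for term in terms:
            term2genes.setdefault(term, set()).add(gene)
    return term2genes
```

The go.obo file is only needed afterwards, to attach names and namespaces (biological process, molecular function, cellular component) to the GO IDs that turn out significant.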
GO_MWU: response of adult corals to 3 days of heat stress
[Figure panels: Molecular function; Cellular component]
Dendrograms: sharing of genes between categories.
Fractions: genes with an unadjusted p < 0.05 / total number of genes within the category.
Legend: FDR-adjusted p-values
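The two numbers in the figure can be sketched as follows. The FDR adjustment is assumed here to be Benjamini-Hochberg (the figure only says "FDR-adjusted"), and the fraction follows the definition in the legend; both function names are hypothetical:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_fraction(gene_pvalues: dict, genes_in_term: set, alpha: float = 0.05):
    """(genes in the term with unadjusted p < alpha, total genes in the term)."""
    hits = sum(1 for g in genes_in_term if gene_pvalues[g] < alpha)
    return hits, len(genes_in_term)
```

The adjustment is applied across GO categories (one MWU p-value per category), whereas the fraction counts individual genes with unadjusted p-values, so a category can be significant after FDR correction even if only a modest fraction of its genes pass p < 0.05 on their own.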
KOG-MWU: same idea as GO_MWU (“KOGMWU” package in R)
The non-hierarchical and mostly non-overlapping nature of KOG class annotations allows quantitative comparisons of diverse datasets based on KOG delta-ranks.
[Figure: KOG categories enriched with either up- or down-regulated genes]
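Because each KOG class yields one delta-rank per dataset, two datasets can be compared by correlating their delta-ranks over the classes they share. A minimal sketch using a Pearson correlation; this illustrates the idea and is not the KOGMWU package's API. The KOG class letters in the usage example are real KOG codes, but the delta-rank values are made up:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_delta_ranks(dataset_a: dict, dataset_b: dict):
    """Correlate {KOG class: delta-rank} dicts over the classes present in both.

    Returns (correlation, list of shared classes).
    """
    shared = sorted(set(dataset_a) & set(dataset_b))
    r = pearson([dataset_a[k] for k in shared], [dataset_b[k] for k in shared])
    return r, shared
```

For example, with made-up delta-ranks {"C": 100, "J": -50, "O": 20} in one experiment and {"C": 200, "J": -100, "O": 40, "Z": 5} in another, the shared classes are C, J, and O and the correlation is 1, meaning both datasets shift the same KOG classes in the same directions. Such comparisons are meaningful precisely because KOG classes do not nest the way GO terms do.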
Questions