Published by Hella Dorothea Kaiser, modified over 5 years ago
1
GSEA-Pro Tutorial: Gene Set Enrichment Analysis for Prokaryotes
Anne de Jong, University of Groningen
2
Introduction
The main principle of a Gene Set Enrichment Analysis (GSEA) is to discover which biological functions are overrepresented in a set of genes or proteins resulting from an -omics (e.g. RNA-Seq) analysis. GSEA-Pro uses the Genome2D database, which describes the relation between genes/proteins and functions (functional classification) for all RefSeq and Genbank complete genomes (>20k genomes). GSEA-Pro supports multiple classifications: COG, GO, KEGG, PFAM, InterPro, Superfamily and Keywords. GSEA-Pro accepts only locus-tags as identifiers, for genes as well as for proteins.
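The slides do not say which statistical test GSEA-Pro applies to decide that a function is overrepresented. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test: given the number of genes in the genome, how many of them carry a class (e.g. one COG category), the size of the gene set and the overlap, it gives the probability of seeing at least that overlap by chance. The sketch below assumes this test; the function names and the toy locus-tags are hypothetical and are not part of GSEA-Pro.

```python
from math import comb


def enrichment_p_value(n_genome: int, n_class: int, n_set: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    Assumption (not taken from GSEA-Pro): X counts how many of the n_set genes,
    drawn without replacement from n_genome genes, belong to a class that
    n_class genes of the genome belong to.
    """
    if not (0 <= n_class <= n_genome and 0 <= n_set <= n_genome):
        raise ValueError("class and set sizes must fit inside the genome")
    if not 0 <= n_overlap <= min(n_class, n_set):
        raise ValueError("overlap must fit inside both the class and the set")
    # Sum the hypergeometric probability mass over the upper tail k >= n_overlap.
    tail = sum(
        comb(n_class, k) * comb(n_genome - n_class, n_set - k)
        for k in range(n_overlap, min(n_class, n_set) + 1)
    )
    return tail / comb(n_genome, n_set)


def set_enrichment(gene_set: set[str], class_members: set[str], genome: set[str]) -> float:
    """Hypothetical helper: p-value for one class, with locus-tags as IDs."""
    gene_set = gene_set & genome  # ignore locus-tags that are not in the selected genome
    class_members = class_members & genome
    return enrichment_p_value(
        len(genome), len(class_members), len(gene_set), len(gene_set & class_members)
    )
```

For example, with 2,500 genes in a genome, 40 of them in one class, a gene set of 100 genes and 10 of those in the class (about 1.6 would be expected by chance), `enrichment_p_value(2500, 40, 100, 10)` returns a very small p-value, flagging the class as overrepresented.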
3
Introduction
Overview of Functional Analysis of Gene Sets
Transcriptomics / Proteomics / Metagenomics (-omics) → one or multiple sets of genes → unravel the biological function of a “Gene Set”
4
Input
STEP 1: Select Genome
GSEA-Pro offers the choice between RefSeq and Genbank. Be aware that for the same strain the locus-tags may differ between RefSeq and Genbank; RefSeq locus-tags commonly contain the 2-letter code 'RS'. Conversion between RefSeq and Genbank locus-tags is supported in STEP 2.

STEP 2: Four types of data tables can be used as input:
Single list of locus-tags: a bare list of 'Top Hits' genes (as locus-tags) deduced from transcriptome or proteome analysis results, i.e. only the genes that are significant are posted here.
Single list of locus-tags with ratio values: the first column contains the locus-tags, the second the ratio values generated by differential expression (DE) analysis. In this case the cutoff values are either set by the user or auto-detected by GSEA-Pro.
Experiments: GSEA-Pro can handle multiple experiments (e.g. derived from time series). Here too, the cutoff values can be set or auto-detected.
Clustering: clustering algorithms group genes that show similar behavior over perturbation experiments or time series. GSEA-Pro handles each cluster as a gene set and shows the biological function of each cluster. The first column of the input table should contain the locus-tags, and the column with the cluster IDs must have the header “clusterID”.
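To make the four input types concrete, the sketch below turns a tab-delimited table into named gene sets, one per list, experiment or cluster. It is an illustration under stated assumptions, not GSEA-Pro's own code: the function name `parse_gene_sets`, the fixed cutoff of 1.0, the assumption that values are log2 ratios (so a gene passes when its absolute value reaches the cutoff), the default experiment names and the toy locus-tags are all hypothetical. GSEA-Pro can also auto-detect cutoffs; that procedure is not described in the slides and is not reproduced here.

```python
import csv
import io
from collections import defaultdict


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def parse_gene_sets(text: str, cutoff: float = 1.0) -> dict[str, set[str]]:
    """Split a tab-delimited input table into named gene sets (hypothetical helper).

    Assumes locus-tags in the first column and, for ratio/experiment tables,
    log2 ratios, so a gene is kept when abs(value) >= cutoff.
    """
    rows = [
        [cell.strip() for cell in row]
        for row in csv.reader(io.StringIO(text), delimiter="\t")
        if row and row[0].strip()  # skip blank lines
    ]
    if not rows:
        return {}
    header = rows[0]

    # Clustering: the column headed "clusterID" defines the sets;
    # any other value columns are assumed to be ignored.
    if "clusterID" in header:
        col = header.index("clusterID")
        clusters: dict[str, set[str]] = defaultdict(set)
        for row in rows[1:]:
            clusters[row[col]].add(row[0])
        return dict(clusters)

    # Single list (assumed to have no header): every locus-tag is in the one gene set.
    if all(len(row) == 1 for row in rows):
        return {"input": {row[0] for row in rows}}

    # Ratio list or experiments: a non-numeric first row is treated as a header
    # naming the experiments; otherwise hypothetical default names are used.
    has_header = not all(_is_number(value) for value in header[1:])
    names = header[1:] if has_header else [f"experiment_{i}" for i in range(1, len(header))]
    sets: dict[str, set[str]] = {name: set() for name in names}
    for row in rows[1 if has_header else 0:]:
        for name, value in zip(names, row[1:]):
            if _is_number(value) and abs(float(value)) >= cutoff:
                sets[name].add(row[0])
    return sets
```

Detecting the table type from its shape (one column, a "clusterID" header, or numeric value columns) mirrors the four cases listed above; a real upload form would more likely let the user pick the type explicitly.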
5
Input
STEP 3: Examples of input data tables
Tables can be uploaded to the webserver as a tab-delimited file or copied and pasted directly from, e.g., Excel.
[Figure: example input tables for Single list; Single list + ratio data; Experiments; Clustering. Annotation: value columns will be ignored]
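When the clustering result comes from a script rather than from Excel, a tab-delimited file can be written with Python's standard `csv` module. The sketch follows the clustering layout described in STEP 2 (locus-tags in the first column, a column headed "clusterID"); the first-column header "locus_tag", the function names, the file name and the locus-tags are hypothetical.

```python
import csv
import io
from pathlib import Path


def clustering_table(assignments: dict[str, str]) -> str:
    """Render {locus_tag: cluster} as a tab-delimited clustering table.

    The first-column header "locus_tag" is an assumption; the slides only
    require locus-tags in the first column and the header "clusterID".
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["locus_tag", "clusterID"])
    for tag, cluster in sorted(assignments.items()):  # stable, sorted row order
        writer.writerow([tag, cluster])
    return buffer.getvalue()


def save_table(path: str, table: str) -> None:
    """Write the table to disk so it can be uploaded as a file."""
    Path(path).write_text(table, encoding="utf-8")
```

For instance, `save_table("clusters.tsv", clustering_table({"LT_0001": "1", "LT_0002": "2"}))` writes a file that can be uploaded; the returned string can equally be pasted into the input box.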
6
Results of five experiments in L. lactis
[Figure: results table with callouts for Ranking, p-value, Classification and # of locus-tags; columns are sortable; Detailed info]
8
Overview of Classification × Experiment
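The overview is shown only as a screenshot. One way to assemble such a classification × experiment view from per-experiment results is sketched below. The input layout (experiment → classification term → p-value), the significance threshold `alpha = 0.05`, the rule of keeping terms that are significant in at least one experiment, the ranking by best p-value, the function names and the term labels are all assumptions, not documented GSEA-Pro behaviour.

```python
from math import inf
from typing import Optional


def overview_matrix(
    results: dict[str, dict[str, float]], alpha: float = 0.05
) -> tuple[list[str], list[str], list[list[Optional[float]]]]:
    """Build a classification x experiment table of p-values (hypothetical helper).

    Terms are kept if p < alpha in at least one experiment and are ranked by
    their best p-value; cells without a result are None.
    """
    experiments = sorted(results)
    terms = {term for per_exp in results.values() for term in per_exp}
    # Best (lowest) p-value per term across experiments; a missing result counts as inf.
    best = {t: min(results[e].get(t, inf) for e in experiments) for t in terms}
    kept = sorted((t for t in terms if best[t] < alpha), key=lambda t: (best[t], t))
    matrix = [[results[e].get(t) for e in experiments] for t in kept]
    return experiments, kept, matrix


def format_overview(
    experiments: list[str], terms: list[str], matrix: list[list[Optional[float]]]
) -> str:
    """Render the matrix as tab-delimited text, with '-' for missing cells."""
    lines = ["term\t" + "\t".join(experiments)]
    for term, row in zip(terms, matrix):
        cells = ["-" if p is None else f"{p:.2g}" for p in row]
        lines.append(term + "\t" + "\t".join(cells))
    return "\n".join(lines)
```

Ranking rows by their best p-value puts the functions that respond most strongly in any experiment at the top, similar to sorting the per-experiment result table on its p-value column.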