GxDb: a universal tool to collect, analyse, manage and visualize transcriptomic data
Wolfgang Raffelsberger, Raymond Ripp and Laetitia Poidevin
BingGi Days, January 2010
Introduction
What is transcriptomics?
-> a high-throughput analysis of gene expression by measuring the amount of mRNA
What are the techniques?
-> DNA microarrays
-> SAGE
-> Differential Display
-> …
=> large quantities of data
GxDb: an integrative tool to collect, treat, analyze, manage and visualize these data
GxDb is a website and a database
Organization of data in GxDb
Sample
- Individual(s): name, age, description
- Organism
- Genotype
- Tissue
- Treatment
- SampleCondition (ex: mouse wt aged 9 days)
Arraytype (ex: Mouse430_2)
Organization of data in GxDb
Arraytype (ex: Mouse430_2)
- RealExp: Arraytype + Sample (ex: wt_d9), CEL files r1, r2, r3
- RealExp 2: Arraytype + Sample 2 (ex: wt_d11), CEL files r3, r4, r5
- RealExp 3: Arraytype + Sample 3 (ex: wt_d13), CEL files r6, r7, r8
- RealExp 4: Arraytype + Sample 4 (ex: wt_d15), CEL files r9, r10, r11
Organization of data in GxDb
Experiment
- RealExp: Arraytype + Sample, CEL files r1, r2, r3
- RealExp 2: Arraytype + Sample 2, CEL files r3, r4, r5
- RealExp 3: Arraytype + Sample 3, CEL files r6, r7, r8
- RealExp 4: Arraytype + Sample 4, CEL files r9, r10, r11
Experiment => Treatment and Analysis protocol =>
- Signal Intensity
- Ratio
- Cluster
- Differentially expressed genes
- Quality
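The hierarchy above can be pictured as nested records. The following is only an illustrative Python sketch: the class and field names mirror the slide labels, but the actual GxDb SQL schema is not shown in this talk, so every type, relation and example value here is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    # Hypothetical fields, named after the slide labels.
    name: str        # e.g. "wt_d9"
    organism: str
    genotype: str
    tissue: str
    treatment: str


@dataclass
class RealExp:
    # One RealExp links an Arraytype, a Sample and its replicate CEL files.
    arraytype: str   # e.g. "Mouse430_2"
    sample: Sample
    cel_files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    realexps: List[RealExp] = field(default_factory=list)

    def cel_count(self) -> int:
        # Total number of CEL files across all RealExps of the experiment.
        return sum(len(r.cel_files) for r in self.realexps)


# Toy values for illustration only (tissue and treatment are invented).
wt_d9 = Sample("wt_d9", "mouse", "wt", "retina", "none")
exp = Experiment(
    "demo",
    [RealExp("Mouse430_2", wt_d9, ["r1.CEL", "r2.CEL", "r3.CEL"])],
)
print(exp.cel_count())
```

Keeping the CEL files on the RealExp rather than on the Sample reflects the diagram, where the same Sample description could in principle be hybridized on more than one Arraytype.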
Treatment and Analysis protocol
1) Normalization
   6 methods: RMA, gcRMA, dChip, MAS5.0, plier, vsn
   => signal intensity
2) Calculate average (between replicates) and ratio
3) Filtering
   - Eliminate probesets that are not expressed in any array of one experiment
     based on distribution or call (according to normalization method)
   - Eliminate probesets with very low changes between condition and reference
     based on fold change
     based on standard deviation
4) Statistical analysis
   - method: t-test combined with empirical Bayes for shrinkage
   - estimation of FDR (false discovery rate)
   - tag probesets with differential expression (automatic threshold finding)
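Steps 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration, not the GxDb pipeline (which the talk says is written in R): the function names and the 2-fold threshold are hypothetical, the p-values are taken as given rather than computed with the moderated t-test, and Benjamini-Hochberg is used only as one common FDR estimator, since the slide does not say which method GxDb applies.

```python
import math
from statistics import mean


def log2_ratio(condition, reference):
    """log2 of mean(condition replicates) / mean(reference replicates)."""
    return math.log2(mean(condition) / mean(reference))


def passes_fold_change(log_ratio, min_fold=2.0):
    """Keep a probeset only if it changes at least min_fold up or down."""
    return abs(log_ratio) >= math.log2(min_fold)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented toy intensities for one probeset, three replicates each.
lr = log2_ratio([200, 220, 180], [100, 110, 90])
keep = passes_fold_change(lr)
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Working on the log2 scale makes up- and down-regulation symmetric, so a single absolute threshold covers both directions.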
1) Normalization
2) Calculate average (replicates) and ratio
3) Filtering
4) Statistical analysis
5) Clustering
   tool: Cluspack
   methods: k-means (DPC), mixture models (AIC and BIC)
   => clusters
6) Quality Control Report
   tool: RReportGenerator for Automatic Statistical Analysis
   to estimate the quality of arrays
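To make the clustering step concrete, here is a small pure-Python sketch of plain Lloyd's k-means applied to expression profiles. It does not reproduce Cluspack or its DPC variant: the seeding (first k points), the fixed iteration count and the toy profiles are all assumptions chosen to keep the example short and deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented log-ratio profiles over three time points:
# two rising probesets and two falling ones.
profiles = [
    [0.1, 1.0, 2.0],
    [-0.1, -1.1, -2.0],
    [0.0, 0.9, 2.1],
    [0.1, -0.9, -1.9],
]
labels, centroids = kmeans(profiles, k=2)
```

In practice the number of clusters has to be chosen; the slide lists AIC and BIC for the mixture models, which is one way to compare candidate cluster counts.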
Upload form
Step 1: Selection of Arraytype and Experiment
Upload form Step 1: Create your new experiment
Upload form Step 1: Create your new samples
Organism, Genotype, SampleCondition, Individual, TreatmentType, Treatment, Tissue, Sample
Upload form Step 1: Selection of Arraytype and Experiment
Upload form Step 2: Upload of .cel files
Upload form Step 3: Select the sample corresponding to each .cel file
Upload form Step 4: Select the comparisons of interest for the ratio calculation
Ratio: condition / reference
Example: C3H_rd1_d10 / C3H_wt_d10
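As a rough illustration of what a selected comparison means, the sketch below splits a "condition / reference" string and divides the corresponding mean signals for one probeset. The function names and the intensity values are hypothetical; how GxDb stores comparisons internally is not described in the talk.

```python
def parse_comparison(text):
    """Split 'condition / reference' into a (condition, reference) pair."""
    condition, reference = (part.strip() for part in text.split("/"))
    return condition, reference


def ratio(mean_signal, comparison):
    """Ratio of mean signals for one probeset: condition / reference."""
    condition, reference = parse_comparison(comparison)
    return mean_signal[condition] / mean_signal[reference]


# Invented mean intensities (already averaged over replicates).
mean_signal = {"C3H_rd1_d10": 150.0, "C3H_wt_d10": 300.0}
pair = parse_comparison("C3H_rd1_d10 / C3H_wt_d10")
value = ratio(mean_signal, "C3H_rd1_d10 / C3H_wt_d10")
```

A ratio below 1 means the probeset is lower in the condition than in the reference; swapping the two names inverts the value.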
Upload form Step 5: Launch Treatment and Analysis protocol
Upload form Step 5: Clustering, Quality analysis and loading into the database
Organization of data in GxDb
[Diagram: Arraytype-Probeset, Sample, Cel file, RealExp, Experiment; results: Signal Intensity, Ratio, differentially expressed genes, Clustering, Quality]
Query GxDb
[Query screen labels: Experiment, Probeset, Sample, RealExp, Signal Intensity, Ratio, Cluster]
Visualization in GxDb
Example: time-course of retinal development
GxDb resources
GxDb Website: Upload, Querying, Display
alnitak, Star3, Star4, Star5, Star6, Star7, Star8
/GxData
GxDb SQL database
Web Services
Café des sciences
QSub scheduler
Languages used:
- PHP (HTML): Upload, PipeWork, RadarGenerator, Fed
- R: Treatment and analysis protocol, RReportGenerator
- SQL
- Tcl: Gx (~ Gscope), Probeset loading
- C: Cluspack
Conclusion and Prospects
Automated raw-data upload, storage, treatment and analysis
- multiple treatment protocols
- multiple clustering methods
- multiple human and automatic expert analyses
=> Comparisons
=> Analyse the strengths and weaknesses of the different protocols
Improvement of the website
- More user-friendly
- Visualization of clusters and ratios
- Tools for meta-analysis
- Possibility of uploading data directly from GEO
- Diagnostic report to make the data easier to analyze
- Links to other databases and tools: STRING, GSEA..
Ratio PipeWork
[Form fields: Organism, Normalization, Ratio minimum, Ratio maximum]
Advantages of GxDb
Integration and storage in a unifying format
Automated raw-data upload, storage, treatment and analysis
- multiple treatment protocols
- multiple clustering methods
- multiple human and automatic expert analyses
=> Comparisons
=> Analyse the strengths and weaknesses of the different protocols
Facilitated querying and data visualization
GxDb transcriptomics
- RealExp: Arraytype + Sample, CEL files r1, r2, r3
- RealExp 2: Arraytype + Sample 2, CEL files r3, r4, r5
- RealExp 3: Arraytype + Sample 3, CEL files r6, r7, r8
- RealExp 4: Arraytype + Sample 4, CEL files r9, r10, r11
[Schema diagram]
Experiment
- RealExp 1 to RealExp 4: each links an Arraytype, a Sample and its replicate CEL files
Arraytype
- PROBESET, PROBESET 2, PROBESET 3, each with the fields:
  probeset_id, genename, genedescription, species, speciessymbol, representpublicid, refseqtranscriptid, gscope_id, swissprot, unigene_id, entrezgene, ensembl, mgi, cytoband, chromoloc, omim, tissuespecificity, linkeddiseases, go_biologicalprocess, go_cellularcomponent, go_molecularfunction, pathway, interpro, transmembrane
Sample
- Individual(s): name, age, description
- Organism, Genotype, Tissue, Treatment, SampleCondition
Results: Signal Intensity, Ratio, Cluster
GxDb protocol from upload to display
1) Arraytype: already exists? If not, create a new Arraytype
2) Sample: already exists? If not, create a new Sample with an existing or new Individual, Organism, Tissue, Genotype and Treatment
3) Upload your .CEL files
4) Enter their association to Arraytypes and Samples
5) Define pairs of RealExps for the ratio calculation
6) Fill in the other information for the Experiment
7) Run Automatic Analysis => Quality Report, Signal Intensity, Ratio, Cluster, Differentially Expressed Genes
8) Query and Display Results
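The "already exists? / create new" checks at the start of this protocol follow a get-or-create pattern. The sketch below shows the idea with an in-memory dictionary standing in for the database; the helper name, the registry layout and the attribute names are hypothetical and say nothing about how the GxDb upload form is actually implemented.

```python
def get_or_create(registry, key, **attributes):
    """Return (record, created): reuse the record for key or store a new one."""
    if key in registry:
        return registry[key], False
    registry[key] = dict(attributes)
    return registry[key], True


# A plain dict plays the role of the Arraytype table in this illustration.
arraytypes = {}
first, created_first = get_or_create(arraytypes, "Mouse430_2", organism="mouse")
second, created_second = get_or_create(arraytypes, "Mouse430_2", organism="mouse")
```

The same helper could be applied in turn to Individual, Organism, Tissue, Genotype and Treatment before a new Sample is assembled from them, so that repeated uploads reuse existing entries instead of duplicating them.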