BIF Group Project. Group (A)rabidopsis: David Nieuwenhuijse, Matthew Price, Qianqian Zhang, Thijs Slijkhuis
Species: Caenorhabditis elegans (nematode worm); genome of ~100 Mbp (completed 2002); ~20,000 genes
Project choice: Advanced Project. Investigation of differences in gene expression over multiple conditions.
Project Overview:
  Dataset Preparation
  Transcriptome Construction Pipeline
  Differentially Expressed Genes: Gene Function; Biological Explanation
  Co-expressed Gene Modules: Functional Description & Explanation; Module Conservation between species
  Gene Expression (Basic Project): Relationship to Transcript Properties; Visualisation of Interaction Network
Datasets to use: We will use four conditions, corresponding to four life stages of the organism (larval stages L2, L3 and L4, and young adult, YA). For each life stage there are 2-3 datasets (runs) of transcript reads, available from the NCBI SRA online database. A reference genome is also required.
Dataset preparation: .sra files are first converted to .fastq files via fastq-dump. The .fastq run files are then merged into a single .fastq file per life stage, via a command-line script (cat). The reference genome was selected from the Ensembl database, after a reference genome from WormBase failed to work.
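The merging step can be sketched in Python as a stand-in for the cat script. This is a minimal sketch, not the group's actual script: the function names, the stage-to-runs mapping and all file names are hypothetical. It concatenates the run files byte for byte, which is what cat does.

```python
import shutil
from pathlib import Path


def merge_fastq(run_files, merged_path):
    """Concatenate FASTQ run files into one file (same effect as `cat`)."""
    merged_path = Path(merged_path)
    with merged_path.open("wb") as out:
        for run in run_files:
            with Path(run).open("rb") as src:
                shutil.copyfileobj(src, out)  # stream, so large runs are not loaded into memory
    return merged_path


def merge_by_stage(runs_by_stage, out_dir):
    """Write one merged <stage>.fastq per life stage; returns {stage: merged path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return {
        stage: merge_fastq(runs, out_dir / f"{stage}.fastq")
        for stage, runs in runs_by_stage.items()
    }
```

A call might look like merge_by_stage({"L2": ["run1.fastq", "run2.fastq"]}, "merged"), where the run file names are placeholders for the converted SRA runs.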
Pipeline Overview:
  Inputs: transcript reads (.fastq file); reference genome (.gtf file)
  TopHat program: reads splice-aligned to genome
  Cufflinks program: reconstructed transcriptome; transcriptome quantified (4 files)
  Cuffmerge program: merged transcriptome file
  Cuffdiff program: differential gene expression
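As a rough illustration of how the four programs chain together, the sketch below only builds the command lines and does not run anything. The directory and file names, the Bowtie index prefix and the genome FASTA path are all assumptions, not taken from the project. The flags used (-G and -o for tophat, -o for cufflinks, -g and -s for cuffmerge, -o and -L for cuffdiff) exist in the TopHat and Cufflinks tools, but output file names such as accepted_hits.bam and merged_asm/merged.gtf should be checked against the installed versions.

```python
def assembly_list(stages):
    """Text for the file cuffmerge reads: one Cufflinks transcripts.gtf path per line."""
    return "".join(f"cufflinks_{stage}/transcripts.gtf\n" for stage in stages)


def build_pipeline_commands(stages, gtf, bowtie_index, genome_fasta):
    """Return the pipeline as a list of argument lists (suitable for subprocess.run)."""
    commands = []
    for stage in stages:
        # TopHat: splice-align the merged reads of one life stage to the genome.
        commands.append(["tophat", "-G", gtf, "-o", f"tophat_{stage}",
                         bowtie_index, f"{stage}.fastq"])
        # Cufflinks: reconstruct and quantify the transcriptome for that stage.
        commands.append(["cufflinks", "-o", f"cufflinks_{stage}",
                         f"tophat_{stage}/accepted_hits.bam"])
    # Cuffmerge: combine the per-stage assemblies listed in assemblies.txt.
    commands.append(["cuffmerge", "-g", gtf, "-s", genome_fasta, "assemblies.txt"])
    # Cuffdiff: test for differential expression between the labelled stages.
    bams = [f"tophat_{stage}/accepted_hits.bam" for stage in stages]
    commands.append(["cuffdiff", "-o", "cuffdiff_out", "-L", ",".join(stages),
                     "merged_asm/merged.gtf"] + bams)
    return commands
```

Keeping the commands as data makes it easy to print them for review before an overnight run, or to pass each one to subprocess.run once the paths have been checked.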
Project Task Delegation:
  Differentially Expressed Genes:
    (M) Determine most differentially expressed genes, and (M) Visualisation of these genes ~Qianqian
    (M) Link these genes to the NCBI database to determine gene function ~David
    (M) Biological explanation of differential gene expression across the different conditions
  Co-expressed Gene Modules:
    (S) Find modules of co-expressed genes using WGCNA ~Thijs
    (C) Visualisation of these genes in Cytoscape
    (S) Functional description and explanation of the identified modules
    (S) Conservation of modules in a closely related species
  Gene Expression (Basic Project):
    (S) Determine most highly expressed genes, for all 4 conditions, and (C) Any correlation between gene expression and transcript properties ~Matthew
    (W) Visualisation of these genes in an interaction network
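One possible way to approach the first task, determining the most differentially expressed genes, is to rank the significant genes by absolute log2 fold change. The sketch below assumes a tab-separated table with the column names used in Cuffdiff's gene_exp.diff output (gene, sample_1, sample_2, log2(fold_change), significant). The function name and the choice to skip infinite fold changes are illustrative assumptions, not the group's method.

```python
import csv
import io
import math


def top_de_genes(diff_text, n=10):
    """Return the n significant genes with the largest |log2 fold change|."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    hits = []
    for row in reader:
        if row["significant"] != "yes":
            continue
        lfc = float(row["log2(fold_change)"])
        # Assumption: skip inf/nan values, which arise when one condition
        # has zero measured expression and so cannot be ranked by magnitude.
        if not math.isfinite(lfc):
            continue
        hits.append((row["gene"], row["sample_1"], row["sample_2"], lfc))
    hits.sort(key=lambda hit: abs(hit[3]), reverse=True)
    return hits[:n]
```

The text of a real results file could be passed in with open(path).read(). Whether to drop or separately report the infinite fold changes is a design choice, since those genes are switched on or off entirely between stages and may be biologically interesting.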
Problem Management:
  Problem: Overloaded server. Solution: Run overnight.
  Problem: Online database/software unavailable. Solution: Wait; good time management.
  Problem: Online queries too large (overloading APIs). Solution: Download database and run queries locally.
  Problem: Bad time management. Solution: Good time management.
Data Validation: Run the pipeline on another closely related organism to check for comparable results? Do the biological explanations of the gene expression make sense in the context of each condition?