Case study: Saccharomyces cerevisiae grown under two different conditions
RNAseq data platform: Illumina
Goal: Build a platform in which the user supplies file and directory names, chooses whether trimmed or untrimmed RNAseq data are assembled, and retrieves the 100 most highly expressed genes together with biological information.
Bioinformatics tools to be used: TopHat, SAMtools, Cufflinks, Perl scripts, Cytoscape
Oryza group: Arjan, Claire, Robert & Ruud
Workflow: RNAseq data (trimmed or untrimmed) -> TopHat -> Cufflinks -> pick the 100 most highly expressed genes in the yeast RNAseq data
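The last step of the workflow could be sketched roughly as follows. This is a minimal illustration, not the group's script: it assumes the expression table is a tab-delimited Cufflinks tracking file with `gene_id`, `FPKM` and `FPKM_status` columns, and the function name `top_expressed` and the example gene names are hypothetical.

```python
import csv
import io


def top_expressed(handle, n=100):
    """Return the n genes with the highest FPKM as (gene_id, FPKM) pairs.

    Assumes a tab-delimited table with gene_id, FPKM and FPKM_status
    columns (as in a Cufflinks genes.fpkm_tracking file).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        # Skip genes whose FPKM estimate is not flagged as OK.
        if row.get("FPKM_status", "OK") != "OK":
            continue
        rows.append((row["gene_id"], float(row["FPKM"])))
    # Highest expression first.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


# Tiny made-up table, for illustration only (not real data).
example = (
    "gene_id\tFPKM\tFPKM_status\n"
    "geneA\t12.5\tOK\n"
    "geneB\t900.0\tFAIL\n"
    "geneC\t300.0\tOK\n"
    "geneD\t1.0\tOK\n"
)
top2 = top_expressed(io.StringIO(example), n=2)
```

With a real file, the handle would come from `open(path, newline="")` and `n` would stay at 100.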
Progress:
Arjan: developed a package to retrieve gene sequences.
Claire: correlation between GC content and gene expression.
Robert: trimming, length sorting, TopHat, Cufflinks, retrieving the top 100, formatting the pipeline.
Ruud: correlation between exon/intron size and gene expression; generating the output table.
Biological questions on the selected genes:
Correlation between exon/intron size and gene expression
Correlation between GC content and gene expression
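One possible way to test the GC content question is sketched below. It assumes a Pearson correlation between per-gene GC fraction and an expression value such as FPKM; the slides do not say which statistic is used, and the helper names `gc_content` and `pearson` are hypothetical. The same `pearson` helper would apply to exon/intron sizes.

```python
import math


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)


# Made-up sequences and expression values, for illustration only.
sequences = ["ATATGC", "GGCCAT", "GGGCCC"]
expression = [10.0, 20.0, 30.0]
gc_values = [gc_content(s) for s in sequences]
r = pearson(gc_values, expression)
```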
Using the list of conserved palindrome motifs in yeast:
1. What proportion of the repeats in the top 100 gene list are G/C-only?
2. How long are the repeats in the top 100 gene list?
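Both questions could be explored along the lines of the sketch below. It rests on assumptions not stated in the slides: that a "palindrome" means a motif equal to its own reverse complement, and that a motif counts as present when it occurs as an exact substring of a gene sequence. The function names and example motifs are hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindrome(motif):
    """True if the motif equals its own reverse complement."""
    motif = motif.upper()
    return motif == motif.translate(COMPLEMENT)[::-1]


def is_gc_only(motif):
    """True if the motif contains no bases other than G and C."""
    return set(motif.upper()) <= {"G", "C"}


def motif_summary(sequences, motifs):
    """Summarise the motifs found in at least one sequence.

    Returns (fraction of found motifs that are G/C-only,
             sorted list of found motif lengths).
    """
    upper_seqs = [s.upper() for s in sequences]
    found = [m for m in motifs if any(m.upper() in s for s in upper_seqs)]
    if not found:
        return 0.0, []
    gc_fraction = sum(is_gc_only(m) for m in found) / len(found)
    return gc_fraction, sorted(len(m) for m in found)


# Made-up sequences and motifs, for illustration only.
genes = ["AAGGCCTT", "TTGAATTCAA"]
motif_list = ["GGCC", "GAATTC", "CGCG"]
gc_fraction, lengths = motif_summary(genes, motif_list)
```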
... compare the results with the profile of less highly expressed genes?
... compare genes differentially expressed between the two growth conditions?
... GO annotation? Plot the data in Cytoscape?
Packages for next week:
Arjan: codon usage over the top 100 genes.
Claire: correlation between conserved palindrome sequences and gene expression.
Robert: GO annotation package.
Ruud: retrieve information on the top 100 genes from NCBI.
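As an illustration of the codon usage package planned above, a minimal sketch might count codons in each coding sequence and pool them across the gene list. It assumes in-frame coding sequences starting at the first base; the function names are hypothetical and this is not the group's implementation.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in one coding sequence, read in frame from base 1."""
    cds = cds.upper()
    # Ignore trailing bases that do not form a complete codon.
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def pooled_codon_frequencies(cds_list):
    """Relative codon frequencies pooled over several coding sequences."""
    total = Counter()
    for cds in cds_list:
        total.update(codon_usage(cds))
    n = sum(total.values())
    if n == 0:
        return {}
    return {codon: count / n for codon, count in total.items()}


# Made-up coding sequences, for illustration only.
usage = codon_usage("ATGGCTGCTTAA")
freqs = pooled_codon_frequencies(["ATGGCTTAA", "ATGGCTGCTTAA"])
```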