Project progress
Brachypodium
Rodenburg, Wang, Muminov, Karrenbelt
Project Planning
  - TopHat, Cufflinks and Cuffmerge [M]
  - Cuffdiff [S] ==> [C]
  - Select top & bottom 100 expressed genes [M]
  - Analyse [M]
  - Conserved regions and upstream motifs [C]
  - Cytoscape network inference [C] ==> [S]
  - GO enrichment analysis [W] ==> [S]
Analyse: transcript length, GC content, intron length, codon usage
Make a table out of this
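The analysis table mentioned above could be assembled roughly as follows. This is a minimal Python sketch covering only transcript length and GC content; it is not the project's own code (that lives in Analysis.pl), and the function names `parse_fasta`, `gc_content` and `feature_table` are hypothetical.

```python
# Illustrative sketch only; the project's real implementation is in Perl.

def parse_fasta(text):
    """Yield (sequence_id, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Use the first word of the header as the ID (assumption).
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def feature_table(records):
    """Build one row per transcript with its length and GC content."""
    return [
        {
            "transcript_id": seq_id,
            "length": len(seq),
            "gc_content": round(gc_content(seq), 3),
        }
        for seq_id, seq in records
    ]
```

Intron length and codon usage would become further columns in the same rows once those steps are in place.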
Results so far
  - Tophat.pl and Cufflinks.pl
  - transcripts.gtf converted into FASTA
  - Analysis.pl containing subs for: transcript length, GC content, translation
  - Test file in FASTA format used for verification
  - Instead of using cuffmerge to create merged.gtf, the average FPKM is calculated from the transcripts.gtf files in order to select the top and bottom 100 genes
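The FPKM averaging step could look roughly like the Python sketch below. It makes several assumptions that the slides do not confirm: transcript IDs are comparable across samples (for example because Cufflinks was run against a reference annotation), each `transcript` row carries an `FPKM "..."` attribute, and a transcript missing from a sample counts as FPKM 0 for that sample. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
ATTR = re.compile(r'(\S+) "([^"]*)"')


def transcript_fpkm(gtf_text):
    """Map transcript_id -> FPKM for the 'transcript' rows of one GTF file."""
    fpkm = {}
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = dict(ATTR.findall(cols[8]))
        if "transcript_id" in attrs and "FPKM" in attrs:
            fpkm[attrs["transcript_id"]] = float(attrs["FPKM"])
    return fpkm


def average_fpkm(gtf_texts):
    """Average FPKM per transcript over all samples (missing = 0, assumption)."""
    totals = defaultdict(float)
    for text in gtf_texts:
        for tid, value in transcript_fpkm(text).items():
            totals[tid] += value
    n_samples = len(gtf_texts)
    return {tid: total / n_samples for tid, total in totals.items()}


def top_and_bottom(avg, n=100):
    """Return the n highest and n lowest entries, ranked by average FPKM."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(avg.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n], ranked[-n:]
```

Note that this ranks transcripts, not genes; selecting genes would need an extra step that groups transcripts by `gene_id`.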
Flowchart
[Figure: pipeline flowchart; the connections between boxes are not recoverable from the text. Box labels: FastQ files, TopHat, Mapped reads, Cufflinks, Assembled transcripts, R Analysis, Average FPKM, Codon bias, Translate, GC content, Analyse, Fasta files, Fasta formatter, Select genes, Transcript length, Intron length, Transcript features, Transcript ID, Cellular component, Molecular function, GO enrichment, Network, Cytoscape, Biological process]
Future perspective
  Tasks                     Assigned to
  Pipeline assembly         Sander
  R scripting               Bob
  Codon bias                Michiel
  Network inference & GO    Simon
  Verification
  MEME analysis
  SignalP
  - Finish all Must haves, including intron length and codon bias
  - Use modules instead of merging scripts
  - Avoid confounding of variables etc.
  - MEME, SignalP, alternative initiation codons, overlapping genes
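For the outstanding intron length item, one possible approach is to derive intron lengths from the exon coordinates of each transcript. The Python sketch below assumes 1-based, inclusive, non-overlapping exon coordinates as used in GTF files; the function name `intron_lengths` is hypothetical and this is not the project's implementation.

```python
def intron_lengths(exons):
    """Return intron lengths for one transcript.

    exons: iterable of (start, end) pairs, 1-based and inclusive,
    assumed not to overlap. An intron is the gap between two
    consecutive exons once they are sorted by start position.
    """
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
```

A single-exon transcript yields an empty list, so it contributes no intron lengths to the table.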
Issues and Challenges
  - Top 100 transcripts ≠ top 100 genes
  - Analysis of isoforms
  - Cusp package
  - Cytoscape into the pipeline?
  - Verification
Cusp: the command requires file input, so we cannot write one file containing all 100 sequences; instead we have to write 100 files and loop through them
Verification: literature, protein BLAST
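The one-file-per-sequence workaround for Cusp could be sketched in Python as follows. The file naming scheme and the function names `split_fasta` and `cusp_commands` are hypothetical, and the command line assumes the EMBOSS `cusp` qualifiers `-sequence` and `-outfile`. The sketch only builds the commands and does not run them.

```python
from pathlib import Path


def split_fasta(records, out_dir):
    """Write each (sequence_id, sequence) pair to its own FASTA file.

    Files are named by position (seq_001.fasta, ...) rather than by ID,
    so that unusual characters in IDs cannot break the file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (seq_id, seq) in enumerate(records, start=1):
        path = out_dir / f"seq_{index:03d}.fasta"
        path.write_text(f">{seq_id}\n{seq}\n")
        paths.append(path)
    return paths


def cusp_commands(paths):
    """Build one cusp command per FASTA file (assumed EMBOSS qualifiers)."""
    return [
        ["cusp", "-sequence", str(path), "-outfile", str(path.with_suffix(".cusp"))]
        for path in paths
    ]
```

Each command list could then be passed to `subprocess.run` in a loop, which mirrors the "write 100 files and loop through them" approach described above.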