NGS Bioinformatics Workshop 1.5 Tutorial – Genome Annotation. April 5th, 2012, IRMACS. Facilitator: Richard Bruskiewich, Adjunct Professor, MBB.
Workflow for Today: Prepare to visualize annotation. Get a genomic sequence from GenBank. Repeat mask it.
Retrieve a genomic sequence… Retrieve a relatively small (<100 kb) eukaryotic genomic sequence clone from GenBank. Query the Nucleotide division, e.g. for an Arabidopsis BAC clone (HE ). Select FASTA, then Save.. To File.. As “Fasta” (rename?).
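The same retrieval can also be scripted. The sketch below is a minimal illustration, assuming the NCBI E-utilities `efetch` endpoint and the `nuccore` database; the function names `build_fasta_url` and `save_fasta` and the placeholder accession are hypothetical, and the accession of your own clone must be substituted.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accession):
    """Return an efetch URL requesting one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # the Nucleotide database
        "id": accession,      # GenBank accession of the clone
        "rettype": "fasta",   # ask for FASTA rather than the GenBank flat file
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def save_fasta(accession, path):
    """Download the record and write it to `path` (needs network access)."""
    with urlopen(build_fasta_url(accession)) as response:
        data = response.read().decode("utf-8")
    with open(path, "w") as handle:
        handle.write(data)


# "YOUR_ACCESSION" is a placeholder, not a real record.
print(build_fasta_url("YOUR_ACCESSION"))
```

Calling `save_fasta("YOUR_ACCESSION", "clone.fasta")` with a real accession would write the sequence to a local file, which is the scripted equivalent of the save step on the web page.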
BLAST is low-hanging fruit… Use BLAST to quickly survey for similar sequences. Megablast against the nucleotide database (e.g. is HE closest to A. thaliana chr. 5?). Megablast against the reference RNA sequence db.
Repeat Masking: Upload the clone file to RepeatMasker on the web and run it with appropriate parameters. Save the results (including the masked sequence) to your computer.
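Once the masked sequence is saved, a quick sanity check is to see how much of the clone was masked. The sketch below is an illustration only: it assumes hard-masked bases appear as N (or X) and soft-masked bases as lowercase, which depends on the options chosen when running RepeatMasker. The helper names and the toy sequence are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the identifier.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def masked_fraction(sequence):
    """Fraction of bases that are hard-masked (N/X) or soft-masked (lowercase)."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base in "NX" or base.islower())
    return masked / len(sequence)


# Toy input, not real RepeatMasker output.
example = ">clone1 toy example\nACGTNNNNacgtACGT\nACGT\n"
for name, seq in read_fasta(example).items():
    print(name, len(seq), round(masked_fraction(seq), 2))
```

For real data, read the saved masked file with `open(path).read()` and pass the text to `read_fasta`.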
ab initio Gene Predictions: Genscan (cut and paste results as text to a file) and Fgenesh.
Blast2GO: Annotation workbench, via Gene Ontology (GO) terms. First, save the predicted peptides (e.g. from fgenesh); the FASTA headers need to be fixed to assign proper identifiers (could write a script?). Start the Blast2GO workbench (Java web), load in the peptides, and do the analysis… e.g. run blastp, GO, annotation, InterPro, etc. See www.geneontology.org for details on GO, and the InterPro site for InterPro info.
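One possible shape for the header-fixing script mentioned above is sketched here. It is a minimal example, not the workshop's own script: it replaces each header with a short sequential identifier and keeps a mapping back to the original header. The function name `rename_fasta_headers`, the `pep` prefix, and the toy headers are all hypothetical; real predictor output will have its own header layout.

```python
def rename_fasta_headers(text, prefix="pep"):
    """Replace each FASTA header with a short sequential identifier.

    Returns the rewritten FASTA text and a dict mapping each new
    identifier to the original header, so no information is lost.
    """
    output = []
    mapping = {}
    count = 0
    for line in text.splitlines():
        if line.startswith(">"):
            count += 1
            new_id = "{}_{:04d}".format(prefix, count)
            mapping[new_id] = line[1:].strip()
            output.append(">" + new_id)
        elif line.strip():
            # Keep sequence lines, dropping surrounding whitespace.
            output.append(line.strip())
    return "".join(item + "\n" for item in output), mapping


# Toy peptides with made-up headers.
peptides = ">prediction 1 long description\nMKVLLA\n>prediction 2 another\nMSTQ\n"
fixed, id_map = rename_fasta_headers(peptides)
print(fixed)
for new_id, old_header in id_map.items():
    print(new_id, "<-", old_header)
```

Saving the mapping alongside the renamed file makes it possible to trace each annotated peptide back to the original gene prediction.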
EMBOSS: European Molecular Biology Open Software Suite. Download and install the version of interest (e.g. Linux, Mac OSX, Windows…). Decide what to do: let’s try a CpG island plot (cpgplot).
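To get a feel for what a CpG island plot shows, the sketch below computes, per sliding window, the GC percentage and the observed/expected CpG ratio, taking the expected count as (number of C × number of G) / window length. This is an illustration of the idea rather than the cpgplot implementation; the function name `cpg_window_stats`, its defaults, and the toy sequence are assumptions.

```python
def cpg_window_stats(sequence, window=100, step=1):
    """Return a list of (start, gc_percent, obs_exp) for each window.

    obs_exp is the observed CpG count divided by the expected count,
    where expected = (count of C * count of G) / window length.
    """
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        c = chunk.count("C")
        g = chunk.count("G")
        cpg = chunk.count("CG")  # CpG dinucleotides in this window
        gc_percent = 100.0 * (c + g) / window
        expected = c * g / window
        obs_exp = cpg / expected if expected > 0 else 0.0
        results.append((start, gc_percent, obs_exp))
    return results


# Toy sequence: a CpG-rich stretch followed by an AT-rich stretch.
toy = "CGCGCGCGCG" + "ATATATATAT"
for start, gc, ratio in cpg_window_stats(toy, window=10, step=5):
    print(start, round(gc, 1), round(ratio, 2))
```

A commonly cited rule of thumb treats windows with an observed/expected ratio above 0.6 and GC content above 50% as candidate CpG island regions; check the cpgplot documentation for the thresholds it actually applies.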
Study Genes by Comparative Genomics. JGI Vista toolkit: GenomeVista, rVista.