Using MATLAB to identify genes in novel genomes based on homology
Christine DeGennaro, Postdoc, Springer Lab
Major points
1.) You can use MATLAB to automate repetitive tasks
2.) You can integrate existing software into your MATLAB scripts
3.) With some patience and the help of Google / MATLAB documentation, this is very achievable with a basic level of MATLAB skill
Project background
Motivation: Want to understand thermostability and folding characteristics of proteins from cryophiles
Goals: Clone and express proteins from several cryophilic organisms for in vitro study
Project background
[Figure: S. cerevisiae, C. saitoi, C. socialis, C. victoriae, C. vishniacii, G. martinii, and L. antarcticum at 37°C, 30°C, 25°C, 17°C, and 13°C]
How would you do this by hand?
1.) Start from the Antarctic yeast contigs
2.) BLAST with S. cerevisiae HIS3
3.) BLAST again with S. pombe HIS3
4.) Identify possible start and stop codons
5.) Identify possible splice sites
6.) Identify the most likely gene features/boundaries
7.) Design primers to amplify the region for cloning
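To give a feel for the "start and stop codons" step, here is a minimal sketch in base MATLAB (no toolbox needed). It is not the method used in the talk. The toy sequence and variable names are made up for illustration. It scans only the forward strand and ignores introns, so it shows the idea rather than a working gene predictor.

```matlab
% Toy sequence (hypothetical, for illustration only)
seq = 'CCATGGCTGATTAAGGATGAAATAGC';

% Positions of every possible start codon
startPos = strfind(seq, 'ATG');

% Positions of every possible stop codon (TAA, TAG, TGA), sorted
stopPos = sort([strfind(seq, 'TAA'), strfind(seq, 'TAG'), strfind(seq, 'TGA')]);

% For each start, keep the first downstream stop in the same reading frame
orfs = zeros(0, 2);                       % one row per ORF: [first base, last base]
for s = startPos
    inFrame = stopPos(stopPos > s & mod(stopPos - s, 3) == 0);
    if ~isempty(inFrame)
        orfs(end+1, :) = [s, inFrame(1) + 2]; %#ok<AGROW>
    end
end

disp(orfs)   % each row is a candidate open reading frame, stop codon included
```

Each row of `orfs` can be pulled out with `seq(orfs(k,1):orfs(k,2))` for inspection. Handling the reverse strand and splice sites would need additional logic.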
How can MATLAB make this easier?
Inputs: ortholog sequences (several), assembled contigs
MATLAB ANALYSIS
1.) Identify region with BLAST
2.) Gene feature predictions
3.) Amplification primer optimization
Output: YFG1
Running BLAST with MATLAB
RUN BLAST
    blastlocal('InputQuery','FASTA/HIS3/Scer_YOR202W.fasta',...
               'database', 'C:/Users/cmd16/Genomes/C_socialis.fa',...
               'BlastPath','C:/Program Files/blast-2.2.17/bin/blastall.exe',...
               'program','tblastn',...
               'Format',8);
What each argument means:
    InputQuery: FASTA/HIS3/Scer_YOR202W.fasta (the query protein)
    database: C:/Users/cmd16/Genomes/C_socialis.fa (the genome to search)
    BlastPath: C:/Program Files/blast-2.2.17/bin/blastall.exe (the BLAST executable)
    program: tBLASTn (protein query against a translated nucleotide database)
    Format: output format 8 (tabular)
To search for a different gene, swap the query file, e.g. FASTA/ADE2/Scer_YOR128C.fasta in place of FASTA/HIS3/Scer_YOR202W.fasta
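Once BLAST has run, the tabular output has to be turned into a contig and a region. The sketch below assumes the standard 12-column layout of legacy BLAST tabular output (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The two hit lines are invented for illustration, and the variable names are not from the talk.

```matlab
% Hypothetical tabular (format 8) hits; all values are made up for illustration
raw = sprintf(['Scer_HIS3\tcontig_12\t62.50\t200\t75\t0\t5\t204\t1500\t901\t1e-70\t250\n', ...
               'Scer_HIS3\tcontig_40\t35.00\t60\t39\t0\t100\t159\t300\t479\t2e-05\t45.1\n']);

% Parse the 12 tab-separated columns
cols = textscan(raw, '%s %s %f %f %f %f %f %f %f %f %f %f', 'Delimiter', '\t');
subject  = cols{2};    % contig names
sStart   = cols{9};    % hit start on the contig
sEnd     = cols{10};   % hit end on the contig
bitScore = cols{12};

% Keep the highest-scoring hit
[~, best]  = max(bitScore);
bestContig = subject{best};
region     = sort([sStart(best), sEnd(best)]);   % [low, high] coordinates
isMinus    = sStart(best) > sEnd(best);          % start > end means reverse strand

fprintf('Best hit: %s, %d-%d, minus strand: %d\n', ...
        bestContig, region(1), region(2), isMinus);
```

In a real run, the text in `raw` would come from the BLAST output rather than a hard-coded string. Several hits on one contig would usually be merged before picking the region.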
Gene prediction output
Cryptococcus neoformans HIS3
MATLAB and Primer3
PRIMER3 INPUT FILE
    SEQUENCE_ID=Cryptococcus_socialis_HIS3
    SEQUENCE_TEMPLATE=CACCCTGATAGGGGAATCCT...
    SEQUENCE_INCLUDED_REGION=528,848
    PRIMER_TASK=pick_cloning_primers
    PRIMER_PICK_ANYWAY=0
    PRIMER_PICK_LEFT_PRIMER=1
    PRIMER_PICK_INTERNAL_OLIGO=0
    PRIMER_PICK_RIGHT_PRIMER=1
    PRIMER_OPT_SIZE=18
    PRIMER_MIN_SIZE=15
    PRIMER_MAX_SIZE=21
    PRIMER_NUM_RETURN=1
    =
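An input file like the one above can be written from MATLAB with plain `fopen`/`fprintf`. The sketch below is one possible way to do it, not necessarily how the talk's script works. The sequence ID, template, and region are placeholders (the real template is elided on the slide), and the file goes to a temporary location.

```matlab
% Hypothetical inputs; the real template would be the contig region found by BLAST
seqId     = 'example_gene';
template  = 'ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT';   % placeholder sequence
regionStr = sprintf('%d,%d', 5, 30);                      % start,length

% Tag/value pairs, one row per line of the Primer3 record
tags = {
    'SEQUENCE_ID',              seqId
    'SEQUENCE_TEMPLATE',        template
    'SEQUENCE_INCLUDED_REGION', regionStr
    'PRIMER_TASK',              'pick_cloning_primers'
    'PRIMER_PICK_LEFT_PRIMER',  '1'
    'PRIMER_PICK_RIGHT_PRIMER', '1'
    'PRIMER_OPT_SIZE',          '18'
    'PRIMER_MIN_SIZE',          '15'
    'PRIMER_MAX_SIZE',          '21'
    'PRIMER_NUM_RETURN',        '1'
    };

% Write TAG=VALUE lines, then the lone "=" that ends the record
inFile = [tempname '.txt'];
fid = fopen(inFile, 'w');
for k = 1:size(tags, 1)
    fprintf(fid, '%s=%s\n', tags{k, 1}, tags{k, 2});
end
fprintf(fid, '=\n');
fclose(fid);

% Running Primer3 is left out here; assuming primer3_core is installed and on
% the system path, a call such as system(['primer3_core "' inFile '"']) could
% be used to run it on this file.
```

Keeping the settings in a cell array makes it easy to change one value per gene while reusing everything else.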
MATLAB analysis outputs
a. MATLAB objects: BLAST data, summary of analysis, list of primers
b. MATLAB figure: showing all BLAST hits
c. FASTA file: containing sequence of contig/region
d. Genbank file: contains sequence + annotation
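As an illustration of output (c), a FASTA file can be written in a few lines of base MATLAB. This is only a sketch: the header, the sequence, and the 60-character line width are assumptions chosen for the example, not details from the talk.

```matlab
% Hypothetical header and placeholder sequence (160 nt)
header = 'example_contig_region';
seq    = repmat('ACGT', 1, 40);
width  = 60;                        % characters per sequence line (assumed)

% Write the header line, then the sequence wrapped at the chosen width
outFile = [tempname '.fasta'];
fid = fopen(outFile, 'w');
fprintf(fid, '>%s\n', header);
for i = 1:width:numel(seq)
    fprintf(fid, '%s\n', seq(i:min(i + width - 1, numel(seq))));
end
fclose(fid);

disp(fileread(outFile))             % show the file contents
```

A Genbank file can be produced the same way, by printing the annotation lines and the sequence with `fprintf`.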
MATLAB outputs: Genbank file
Major points
1.) You can use MATLAB to automate repetitive tasks
2.) You can integrate existing software into your MATLAB scripts
3.) With some patience and the help of Google / MATLAB documentation, this is very achievable with a basic level of MATLAB skill