Using MATLAB to identify genes in novel genomes based on homology

Christine DeGennaro, Postdoc, Springer Lab

Major points
You can use MATLAB to automate repetitive tasks.
You can integrate existing software into your MATLAB scripts.
With some patience and the help of Google and the MATLAB documentation, this is very achievable with a basic level of MATLAB skill.

Project background
Motivation: understand the thermostability and folding characteristics of proteins from cryophiles.
Goals: clone and express proteins from several cryophilic organisms for in vitro study.

Project background
[Figure. Panel labels: temperatures 37°C, 30°C, 25°C, 17°C, 13°C; species S. cerevisiae, C. saitoi, C. socialis, C. victoriae, C. vishniacii, G. martinii, L. antarcticum]

How would you do this by hand?
Start with the Antarctic yeast contigs.
BLAST the S. cerevisiae HIS3 sequence against them.
BLAST again with the S. pombe HIS3 sequence.

How would you do this by hand?
Identify possible start and stop codons.
Identify possible splice sites.
Identify the most likely gene features/boundaries.
Design primers to amplify the region for cloning.
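The first manual step, finding candidate start and stop codons, is the kind of task that is easy to script. The following is a minimal sketch, not the method used in the talk: it assumes a short made-up sequence (seq), scans only the forward strand, and ignores splice sites, so it illustrates the idea rather than a full eukaryotic gene predictor.

```matlab
% Hypothetical example sequence; in practice this would be the contig
% region identified by BLAST.
seq = 'CCATGGCTTAAGGATGAAATGA';

% Positions of candidate start codons (ATG) on the forward strand.
startPos = strfind(seq, 'ATG');

% Positions of candidate stop codons (TAA, TAG, TGA) on the forward strand.
stopPos = sort([strfind(seq, 'TAA'), strfind(seq, 'TAG'), strfind(seq, 'TGA')]);

% Pair each start codon with the first downstream stop codon in the same
% reading frame. Each row of orfs is [first base, last base].
orfs = zeros(0, 2);
for s = startPos
    inFrame = stopPos(stopPos > s & mod(stopPos - s, 3) == 0);
    if ~isempty(inFrame)
        orfs(end+1, :) = [s, inFrame(1) + 2]; %#ok<AGROW>
    end
end

disp(orfs)
```

Each row of orfs gives the coordinates of one open reading frame, which could then be compared against the BLAST hit coordinates by eye or in a later step.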

How can MATLAB make this easier?
Inputs: ortholog sequences and assembled contigs
MATLAB analysis:
1.) Identify region with BLAST
2.) Gene feature predictions
3.) Amplification primer optimization
Output: YFG1

Running BLAST with MATLAB

    % Run BLAST
    blastlocal('InputQuery','FASTA/HIS3/Scer_YOR202W.fasta',...
        'database', 'C:/Users/cmd16/Genomes/C_socialis.fa',...
        'BlastPath','C:/Program Files/blast-2.2.17/bin/blastall.exe',...
        'program','tblastn',...
        'Format',8);

What each argument supplies:
InputQuery: the query sequence file, FASTA/HIS3/Scer_YOR202W.fasta
database: the genome to search, C:/Users/cmd16/Genomes/C_socialis.fa
BlastPath: the BLAST executable, C:/Program Files/blast-2.2.17/bin/blastall.exe
program: tBLASTn
Format: output format 8

To search for a different gene, swap the query file, for example FASTA/ADE2/Scer_YOR128C.fasta in place of FASTA/HIS3/Scer_YOR202W.fasta.
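Because only the query file changes between genes, the call can be wrapped in a loop. The sketch below is an assumption about how that might look, not code from the talk: the variable names (queries, database, blastPath, hits) and the guard that skips the search when blastlocal or the BLAST executable is missing are illustrative. The paths are the ones shown on the slides.

```matlab
% Query files named on the slides; looping over them is an assumption.
queries   = {'FASTA/HIS3/Scer_YOR202W.fasta', 'FASTA/ADE2/Scer_YOR128C.fasta'};
database  = 'C:/Users/cmd16/Genomes/C_socialis.fa';
blastPath = 'C:/Program Files/blast-2.2.17/bin/blastall.exe';

% Only attempt the search when both the Bioinformatics Toolbox function
% and the BLAST executable can be found on this machine.
canRun = exist('blastlocal', 'file') == 2 && exist(blastPath, 'file') == 2;

hits = cell(size(queries));
for k = 1:numel(queries)
    if canRun
        % Same call as on the slide, with the query file as the loop variable.
        hits{k} = blastlocal('InputQuery', queries{k}, ...
                             'Database', database, ...
                             'BlastPath', blastPath, ...
                             'Program', 'tblastn', ...
                             'Format', 8);
    else
        fprintf('Skipping %s: blastlocal or BLAST executable not found.\n', queries{k});
    end
end
```

Keeping the results in a cell array indexed like queries makes it straightforward to pass each gene's hits on to the later prediction and primer steps.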

Gene prediction output

Cryptococcus neoformans HIS3

MATLAB and Primer3
Primer3 input file:

    SEQUENCE_ID=Cryptococcus_socialis_HIS3
    SEQUENCE_TEMPLATE=CACCCTGATAGGGGAATCCT...
    SEQUENCE_INCLUDED_REGION=528,848
    PRIMER_TASK=pick_cloning_primers
    PRIMER_PICK_ANYWAY=0
    PRIMER_PICK_LEFT_PRIMER=1
    PRIMER_PICK_INTERNAL_OLIGO=0
    PRIMER_PICK_RIGHT_PRIMER=1
    PRIMER_OPT_SIZE=18
    PRIMER_MIN_SIZE=15
    PRIMER_MAX_SIZE=21
    PRIMER_NUM_RETURN=1
    =
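A file in this TAG=VALUE format, ending with a lone "=", can be written from MATLAB with fprintf. The following is a rough sketch under stated assumptions: the variable names are illustrative, only a subset of the tags is written, and because the template sequence is elided on the slide, the 20 visible bases stand in as a placeholder (so SEQUENCE_INCLUDED_REGION is left out here).

```matlab
% Tags and values taken from the slide. The template is truncated there,
% so the visible bases are used only as a placeholder.
tags   = {'SEQUENCE_ID', 'SEQUENCE_TEMPLATE', 'PRIMER_TASK', ...
          'PRIMER_OPT_SIZE', 'PRIMER_MIN_SIZE', 'PRIMER_MAX_SIZE', ...
          'PRIMER_NUM_RETURN'};
values = {'Cryptococcus_socialis_HIS3', 'CACCCTGATAGGGGAATCCT', ...
          'pick_cloning_primers', '18', '15', '21', '1'};

% Write one TAG=VALUE line per entry to a temporary file.
inputFile = [tempname, '.txt'];
fid = fopen(inputFile, 'w');
for k = 1:numel(tags)
    fprintf(fid, '%s=%s\n', tags{k}, values{k});
end
fprintf(fid, '=\n');   % a lone "=" terminates the record
fclose(fid);

% Read the file back to check what was written.
written = fileread(inputFile);
disp(written)
```

Assuming the primer3_core executable is on the system path, the file could then be passed to it with something like system(['primer3_core < "' inputFile '"']) and the output parsed for the primer sequences.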

MATLAB analysis outputs
a. MATLAB objects: BLAST data, summary of analysis, list of primers
b. MATLAB figure: shows all BLAST hits
c. FASTA file: contains the sequence of the contig/region
d. GenBank file: contains the sequence plus annotation

MATLAB outputs: GenBank file

Major points
You can use MATLAB to automate repetitive tasks.
You can integrate existing software into your MATLAB scripts.
With some patience and the help of Google and the MATLAB documentation, this is very achievable with a basic level of MATLAB skill.