Determine ORF and BLASTP

Presentation transcript:

WSSP Chapter 9: Determine ORF and BLASTP (title slide; the background shows a sample cDNA sequence)

© 2014 WSSP

Steps and terms used in protein expression (p 9-1). Figure label: 1st ATG in mRNA.

Cloning the cDNA library (p 9-1).

Possible reading frames.

Possible types of clones in the cDNA library.

DSAP Define ORF page: link to the Toolbox translation program (p 9-3).

Toolbox: DNA Sequence Translation Program (p 9-3). Figure labels: polyA tail at 3’ end; reading frames.

EX1.14, +1 reading frame (p 9-3). Figure labels: longest ORF; translation stop.
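The translation step shown on the Toolbox slides can be sketched in a few lines of Python. This is only an illustration, not the Toolbox program itself: the function names `translate` and `longest_open_stretch` are made up for this example, the standard genetic code is assumed, and only the three forward frames are handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame):
    """Translate dna in forward frame +1, +2 or +3.

    '*' marks a stop codon; 'X' marks a codon with an unreadable base.
    Hypothetical helper for illustration only.
    """
    seq = dna.upper().replace(" ", "")
    start = frame - 1  # frame +1 starts at the first base
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_stretch(protein):
    """Return the longest run of amino acids between stop codons."""
    return max(protein.split("*"), key=len)


if __name__ == "__main__":
    dna = "ATGGCCTGA"
    for frame in (1, 2, 3):
        print(f"+{frame}", translate(dna, frame))
```

The longest stretch is only a starting point; the later slides use the BLASTx matches to decide which frame is actually correct.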

Which one of these would be the correct ORF, A) or B)? (p 9-3) Rule #1: If the ORF is downstream of a stop codon, translation of the protein MUST start with an M (Met).
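Rule #1 can be applied mechanically to a frame translation. The sketch below is a hypothetical helper (`candidate_orfs` is not part of DSAP or Toolbox) that works on a translated frame in which `*` marks stop codons. It assumes that the stretch before the first stop is kept whole, because the clone might be a partial, while every stretch downstream of a stop is trimmed to its first M.

```python
def candidate_orfs(protein):
    """List (start_index, peptide) candidates in one translated frame.

    protein: translation of a single reading frame, '*' = stop codon.
    The stretch before the first stop is kept as is (possible partial clone).
    A stretch downstream of a stop must begin at its first M (Rule #1);
    stretches without an M are dropped.
    """
    candidates = []
    position = 0  # index of the current stretch within the translation
    for n, stretch in enumerate(protein.split("*")):
        if n == 0:
            if stretch:
                candidates.append((position, stretch))
        else:
            first_met = stretch.find("M")
            if first_met != -1:
                candidates.append((position + first_met, stretch[first_met:]))
        position += len(stretch) + 1  # +1 skips the stop codon itself
    return candidates


if __name__ == "__main__":
    print(candidate_orfs("SIRK*LLMPRK*AA"))
```

Here the second stretch is reported from its M onward, and the final stretch is rejected because it has no M after the preceding stop.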

Could this ORF code for the protein?

Does this region match the BLASTx matches? (p 9-4) Figure label: region of DNA that codes for the protein sequence highlighted in the BLASTx alignment.

Could the DNA code for a partial protein?

Does this region match the BLASTx matches? (p 9-4) Figure label: region of DNA that codes for the protein sequence highlighted in the BLASTx alignment.

Examine the +2 reading frame: could this be the ORF that codes for the protein? A) Yes B) No (p 9-4)

Does this region match the BLASTx matches? (p 9-4) Figure label: region of DNA that codes for the protein sequence highlighted in the BLASTx alignment.

Examine the +3 reading frame: could this be the ORF that codes for the protein? A) Yes B) No. Why? (p 9-5)

Examine the +2 reading frame: which type of clone is this? A) or B) (shown in the figure), or C) Cannot tell (p 9-4)

An example of a partial coding sequence. Figure label: Similar Seq.

Is this a partial-ORF cDNA clone? What about this region?

The first part of the protein may not have matches because it is not conserved. Figure: Query and Sbjct bars with the region of similarity marked; coordinates shown are 2, 60, 410 and 475.

What type of clone would this sequence be? C) Cannot tell (p 9-4)

BLASTx helps determine which reading frame is correct. It also helps suggest the start point. (p 9-6)

Based on the BLASTx information shown, which ORF is correct for EX1.14? Options shown include: Both; Neither; Cannot tell.

Choose the reading frame and paste in the protein sequence (p 9-7). Do not include the * (stop codon) in the protein sequence. Make sure the ORF range includes the bases that code for the stop codon.
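The two instructions on this slide pull in opposite directions, so a small worked example may help: the protein sequence stops before the `*`, while the DNA range of the ORF runs through the last base of the stop codon. The helper below is a sketch under stated assumptions; `orf_entry` and its parameters are invented for this illustration, coordinates are 1-based and inclusive, and only forward frames are considered.

```python
def orf_entry(frame, translation, aa_start=0):
    """Return (protein, dna_start, dna_end) for an ORF in a forward frame.

    frame:       1, 2 or 3 (the +1, +2 or +3 reading frame)
    translation: translation of that frame, with '*' for stop codons
    aa_start:    index in `translation` where the ORF begins
    The protein excludes '*'; dna_end includes the stop codon when present.
    """
    offset = frame - 1                     # bases skipped before the frame starts
    stop = translation.find("*", aa_start)
    dna_start = offset + 3 * aa_start + 1  # first base of the first codon
    if stop == -1:
        # No stop codon in the clone: the ORF runs to the last full codon.
        protein = translation[aa_start:]
        dna_end = offset + 3 * len(translation)
    else:
        protein = translation[aa_start:stop]
        dna_end = offset + 3 * (stop + 1)  # last base of the stop codon
    return protein, dna_start, dna_end


if __name__ == "__main__":
    # "ATGGCCTGA" translated in frame +1 is "MA*"
    print(orf_entry(1, "MA*"))
```

For the nine-base example the protein is two residues long, yet the ORF range covers all nine bases because the stop codon counts as part of the ORF.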

The Five Commandments of DSAP
I. The stop codon is part of the ORF.

DSAP BLASTp page (p 9-8).

NCBI BLASTp page: paste in the protein sequence (p 9-8).

BLASTp results of the EX1.14 +2 ORF (p 9-9). Figure label: link to the Conserved Domain Database.

BLASTp results of the EX1.14 +1 ORF.

BLASTp results of the EX1.13 +3 ORF: no matches.

Enter the BLASTp data into the table (p 9-10). Figure: a protein running from M to * drawn above possible DNA clones, each ending in a polyA tail (AAAAAA).

Does the EX1.14 clone likely code for the start of the protein? (BLASTP alignment shown.) A) Yes B) No

Suppose the cDNA was missing the first 13 bp. Does this DNA code for the start of the protein?
>gi|226493894|ref|NP_001150519.1| dynein light chain LC6, flagellar outer arm [Zea mays]
Length=93
Score = 139 bits (351), Expect = 8e-32
Identities = 63/81 (77%), Positives =
Query  1    MPRKMQAEAMNAASHALDLFDVADCKSLAAHIKKEFDKIYGPGWQCVVGSSFGCFFTHKK  60
            MP KMQA+AM+AAS ALD FDV DC+S+A+HIKKEFD I+GPGWQCVVGS FGC+ TH K
Sbjct  13   MPAKMQAQAMSAASRALDRFDVLDCRSIASHIKKEFDAIHGPGWQCVVGSGFGCYITHSK  72
Query  61   GSFIYFRLETLHFLIFKGAAA  81
            GSFIYFRLE+L FL+FKGAAA
Sbjct  73   GSFIYFRLESLRFLVFKGAAA  93

Suppose the cDNA was missing the first 13 bp. Did they choose the correct ORF?
>gi|226493894|ref|NP_001150519.1| dynein light chain LC6, flagellar outer arm [Zea mays]
Length=93
Score = 139 bits (351), Expect = 8e-32
Identities = 63/81 (77%), Positives =
Query  1    MPRKMQAEAMNAASHALDLFDVADCKSLAAHIKKEFDKIYGPGWQCVVGSSFGCFFTHKK  60
            MP KMQA+AM+AAS ALD FDV DC+S+A+HIKKEFD I+GPGWQCVVGS FGC+ TH K
Sbjct  13   MPAKMQAQAMSAASRALDRFDVLDCRSIASHIKKEFDAIHGPGWQCVVGSGFGCYITHSK  72
Query  61   GSFIYFRLETLHFLIFKGAAA  81
            GSFIYFRLE+L FL+FKGAAA
Sbjct  73   GSFIYFRLESLRFLVFKGAAA  93

Suppose the cDNA was missing the first 13 bp. Did they choose the correct ORF?
BLASTP starting here:
>gi|226493894|ref|NP_001150519.1| dynein light chain LC6, flagellar outer arm [Zea mays]
Length=93
Score = 139 bits (351), Expect = 8e-32
Identities = 63/81 (77%), Positives =
Query  1    MPRKMQAEAMNAASHALDLFDVADCKSLAAHIKKEFDKIYGPGWQCVVGSSFGCFFTHKK  60
            MP KMQA+AM+AAS ALD FDV DC+S+A+HIKKEFD I+GPGWQCVVGS FGC+ TH K
Sbjct  13   MPAKMQAQAMSAASRALDRFDVLDCRSIASHIKKEFDAIHGPGWQCVVGSGFGCYITHSK  72
Query  61   GSFIYFRLETLHFLIFKGAAA  81
            GSFIYFRLE+L FL+FKGAAA
Sbjct  73   GSFIYFRLESLRFLVFKGAAA  93
BLASTP starting here:
>gi|226493894|ref|NP_001150519.1| dynein light chain LC6, flagellar outer arm [Zea mays]
Score = 156 bits (395), Expect = 6e-37
Identities = 72/92 (78%), Positives = 82/92 (89%), Gaps = 0/92 (0%)
Query  1    LEGRARVEDTDMPRKMQAEAMNAASHALDLFDVADCKSLAAHIKKEFDKIYGPGWQCVVG  60
            LEG+A VEDTDMP KMQA+AM+AAS ALD FDV DC+S+A+HIKKEFD I+GPGWQCVVG
Sbjct  2    LEGKAVVEDTDMPAKMQAQAMSAASRALDRFDVLDCRSIASHIKKEFDAIHGPGWQCVVG  61
Query  61   SSFGCFFTHKKGSFIYFRLETLHFLIFKGAAA  92
            S FGC+ TH KGSFIYFRLE+L FL+FKGAAA
Sbjct  62   SGFGCYITHSKGSFIYFRLESLRFLVFKGAAA  93

Q3: Does the DNA sequence code for the likely start of the protein? Yes / No / Cannot tell from data
gi|226500868|ref|NP_001151809.1| 50S ribosomal protein L20 [Zea mays]
Length=122
Score = 207 bits (528), Expect = 2e-52
Identities = 100/113 (88%), Positives = 106/113 (93%), Gaps = 0/113 (0%)
Query  1    MNKEKILKLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWIQRINAATRL  60
            MNK KI KLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWI+RINA TRL
Sbjct  1    MNKGKIFKLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWIERINAGTRL  60
Query  61   HAVNYGTFMHGLLRENVQLNRKVLSELSMHEPHSFKALVDVSRAAFPGNRQIP  113
            H VNYG FMHGL++EN+QLNRKVLSELSMHEP+SFKALVDVSR AFPGNR +P
Sbjct  61   HGVNYGNFMHGLMKENIQLNRKVLSELSMHEPYSFKALVDVSRNAFPGNRPVP  113

Q4: Does the DNA sequence code for the likely end of the protein? Yes / No / Cannot tell from data
gi|226500868|ref|NP_001151809.1| 50S ribosomal protein L20 [Zea mays]
Length=122
Score = 207 bits (528), Expect = 2e-52
Identities = 100/113 (88%), Positives = 106/113 (93%), Gaps = 0/113 (0%)
Query  1    MNKEKILKLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWIQRINAATRL  60
            MNK KI KLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWI+RINA TRL
Sbjct  1    MNKGKIFKLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWIERINAGTRL  60
Query  61   HAVNYGTFMHGLLRENVQLNRKVLSELSMHEPHSFKALVDVSRAAFPGNRQIP  113
            H VNYG FMHGL++EN+QLNRKVLSELSMHEP+SFKALVDVSR AFPGNR +P
Sbjct  61   HGVNYGNFMHGLMKENIQLNRKVLSELSMHEPYSFKALVDVSRNAFPGNRPVP  113

Q3: Does the DNA sequence code for the likely start of the protein? Yes / No / Cannot tell from data
BLASTp
>gi|226500868|ref|NP_001151809.1| 50S ribosomal protein L20 [Zea mays]
Length=122
Score = 124 bits (310), Expect = 4e-27
Identities = 57/68 (83%), Positives = 63/68 (92%), Gaps = 0/68 (0%)
Query  1    MRSLWIQRINAATRLHAVNYGTFMHGLLRENVQLNRKVLSELSMHEPHSFKALVDVSRAA  60
            MRSLWI+RINA TRLH VNYG FMHGL++EN+QLNRKVLSELSMHEP+SFKALVDVSR A
Sbjct  46   MRSLWIERINAGTRLHGVNYGNFMHGLMKENIQLNRKVLSELSMHEPYSFKALVDVSRNA  105
Query  61   FPGNRQIP  68
            FPGNR +P
Sbjct  106  FPGNRPVP  113
Figure: Query and Sbjct bars; coordinates shown are 1, 45, 68, 113, 84 and 122.

Q3: Does the DNA sequence code for the likely start of the protein? Yes / No / Cannot tell from data
BLASTp
gi|226500868|ref|NP_001151809.1| 50S ribosomal protein L20 [Zea mays]
Length=122
Score = 124 bits (310), Expect = 4e-27
Identities = 57/68 (83%), Positives = 63/68 (92%), Gaps = 0/68 (0%)
Query  1    MRSLWIQRINAATRLHAVNYGTFMHGLLRENVQLNRKVLSELSMHEPHSFKALVDVSRAA  60
            MRSLWI+RINA TRLH VNYG FMHGL++EN+QLNRKVLSELSMHEP+SFKALVDVSR A
Sbjct  46   MRSLWIERINAGTRLHGVNYGNFMHGLMKENIQLNRKVLSELSMHEPYSFKALVDVSRNA  105
Query  61   FPGNRQIP  68
            FPGNR +P
Sbjct  106  FPGNRPVP  113
BLASTx
GENE ID: 100285444 LOC100285444 | 50S ribosomal protein L20 [Zea mays]
Length=122
Score = 146 bits (369), Expect = 7e-34
Identities = 68/79 (86%), Positives = 74/79 (93%), Gaps = 0/79 (0%)
Frame = +1
Query  1    SYRDRRNKKRDMRSLWIQRINAATRLHAVNYGTFMHGLLRENVQLNRKVLSELSMHEPHS  180
            SYRDRRNKKRDMRSLWI+RINA TRLH VNYG FMHGL++EN+QLNRKVLSELSMHEP+S
Sbjct  35   SYRDRRNKKRDMRSLWIERINAGTRLHGVNYGNFMHGLMKENIQLNRKVLSELSMHEPYS  94
Query  181  FKALVDVSRAAFPGNRQIP  237
            FKALVDVSR AFPGNR +P
Sbjct  95   FKALVDVSRNAFPGNRPVP  113
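One way to approach the Q3 and Q4 questions is to count how many residues of the matched protein (Sbjct) fall outside the alignment at each end. The sketch below does only that arithmetic; `subject_coverage` is a made-up name, and the result is a hint rather than an answer, since unaligned ends can also reflect poorly conserved regions.

```python
def subject_coverage(sbjct_start, sbjct_end, sbjct_length):
    """Count Sbjct residues left unaligned before and after a BLAST hit.

    sbjct_start, sbjct_end: first and last Sbjct positions in the alignment
    sbjct_length:           the 'Length=' value reported for the Sbjct protein
    """
    return {
        "unaligned_at_start": sbjct_start - 1,
        "unaligned_at_end": sbjct_length - sbjct_end,
    }


if __name__ == "__main__":
    # BLASTp hit above: Sbjct 46-113 of a 122-residue protein.
    print(subject_coverage(46, 113, 122))
    # BLASTx hit above: Sbjct 35-113 of the same protein.
    print(subject_coverage(35, 113, 122))
```

With the BLASTp numbers, 45 residues of the Zea mays protein precede the first aligned residue, while the BLASTx alignment of the same clone reaches 11 residues further upstream (Sbjct 35 rather than 46).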

Compare the BLASTx and BLASTp results for EX1.14: Are the matches to the same proteins? (p 9-11)

Compare the BLASTx and BLASTp results for EX1.14: Are the e-values similar? (p 9-11)

Compare the BLASTx and BLASTp results for EX1.14: Are the alignments similar? Do they start and stop at the same place in the Sbjct sequence? (p 9-12)
BLASTx
>ref|NP_001150519.1| dynein light chain LC6, flagellar outer arm [Zea mays]
Length=93
Query  11   MLEGRARVEDTDMPRKMQAEAMNAASHALDLFDVADCKSLAAHIKKEFDKIYGPGWQCVV  190
            MLEG+A VEDTDMP KMQA+AM+AAS ALD FDV DC+S+A+HIKKEFD I+GPGWQCVV
Sbjct  1    MLEGKAVVEDTDMPAKMQAQAMSAASRALDRFDVLDCRSIASHIKKEFDAIHGPGWQCVV  60
Query  191  GSSFGCFFTHKKGSFIYFRLETLHFLIFKGAAA  289
            GS FGC+ TH KGSFIYFRLE+L FL+FKGAAA
Sbjct  61   GSGFGCYITHSKGSFIYFRLESLRFLVFKGAAA  93
BLASTp
>gi|226493894|ref|NP_001150519.1| dynein light chain LC6, flagellar outer arm [Zea mays]
Length=93
Score = 158 bits (400), Expect = 2e-37
Query  1    MLEGRARVEDTDMPRKMQAEAMNAASHALDLFDVADCKSLAAHIKKEFDKIYGPGWQCVV  60
            MLEG+A VEDTDMP KMQA+AM+AAS ALD FDV DC+S+A+HIKKEFD I+GPGWQCVV
Sbjct  1    MLEGKAVVEDTDMPAKMQAQAMSAASRALDRFDVLDCRSIASHIKKEFDAIHGPGWQCVV  60
Query  61   GSSFGCFFTHKKGSFIYFRLETLHFLIFKGAAA  93
            GS FGC+ TH KGSFIYFRLE+L FL+FKGAAA
Sbjct  61   GSGFGCYITHSKGSFIYFRLESLRFLVFKGAAA  93
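When comparing the two alignments, it helps to remember that BLASTx numbers the Query in nucleotides while BLASTp numbers it in amino acids. The following sketch converts BLASTx Query coordinates into a forward reading frame and a codon count; the function names are invented for this example, and reverse-strand hits are not handled.

```python
def blastx_frame(query_start):
    """Forward reading frame (+1, +2 or +3) implied by a BLASTx Query start."""
    return (query_start - 1) % 3 + 1


def aligned_codons(query_start, query_end):
    """Number of codons spanned by a BLASTx Query range (in nucleotides)."""
    span = query_end - query_start + 1
    if span % 3 != 0:
        raise ValueError("a BLASTx Query range should span whole codons")
    return span // 3


if __name__ == "__main__":
    # EX1.14 BLASTx rows above: Query 11-190 and Query 191-289.
    print(blastx_frame(11))
    print(aligned_codons(11, 190) + aligned_codons(191, 289))
```

For EX1.14, a Query start of 11 corresponds to the +2 frame, and the two BLASTx rows together span 93 codons, which matches the 93 residues covered by the BLASTp alignment (Query 1-93, Sbjct 1-93).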

Compare the BLASTx and BLASTp results for another clone: Did the student select the correct protein sequence? Yes / No / Cannot tell

Compare the BLASTx and BLASTp results: Are the matches to the same proteins? Yes / No / Cannot tell

Compare the BLASTx and BLASTp results: Are the e-values similar? Yes / No / Cannot tell

#1 Mistake in Selecting an ORF. Did the student choose the correct protein sequence? Yes / No / Cannot tell from data
BLASTp
gi|241946706|gb|EES19851.1| hypothetical protein SORBIDRAFT_09g026330
Length=122
Score = 211 bits (538), Expect = 1e-53, Method: Compositional matrix adjust.
Identities = 100/113 (88%), Positives = 106/113 (93%), Gaps = 0/113 (0%)
Query  11   MNKEKILKLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWIQRINAATRL  70
            MNK KI KLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWI+RINA TRL
Sbjct  1    MNKGKIFKLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWIERINAGTRL  60
Query  71   HAVNYGTFMHGLLRENVQLNRKVLSELSMHEPHSFKALVDVSRAAFPGNRQIP  123
            H VNYG FMHGL++EN+QLNRKVLSELSMHEP+SFKALVDVSR AFPGNR +P
Sbjct  61   HGVNYGNFMHGLMKENIQLNRKVLSELSMHEPYSFKALVDVSRNAFPGNRPVP  113
BLASTx
gi|241946706|gb|EES19851.1| hypothetical protein SORBIDRAFT_09g026330
Length=122
Score = 207 bits (528), Expect = 5e-52
Identities = 100/113 (88%), Positives = 106/113 (93%), Gaps = 0/113 (0%)
Frame = +3
Query  69   MNKEKILKLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWIQRINAATRL  248
            MNK KI KLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWI+RINA TRL
Sbjct  1    MNKGKIFKLAKGFRGRAKNCIRIARERVEKALQYSYRDRRNKKRDMRSLWIERINAGTRL  60
Query  249  HAVNYGTFMHGLLRENVQLNRKVLSELSMHEPHSFKALVDVSRAAFPGNRQIP  407
            H VNYG FMHGL++EN+QLNRKVLSELSMHEP+SFKALVDVSR AFPGNR +P
Sbjct  61   HGVNYGNFMHGLMKENIQLNRKVLSELSMHEPYSFKALVDVSRNAFPGNRPVP  113

Compare the BLASTx and BLASTp results for a different clone: Did the student select the correct protein sequence? Yes / No / Cannot tell

Compare the BLASTx and BLASTp results: Are the matches to the same proteins? Yes / No / Cannot tell

Compare the BLASTx and BLASTp results: Are the e-values similar? Yes / No / Cannot tell

#2 Mistake in Selecting an ORF. Did the student choose the correct protein sequence? Yes / No / Cannot tell from data
BLASTp
gi|226500868|ref|NP_001151809.1| 50S ribosomal protein L20 [Zea mays]
Length=122
Score = 124 bits (310), Expect = 4e-27
Identities = 57/68 (83%), Positives = 63/68 (92%), Gaps = 0/68 (0%)
Query  1    MRSLWIQRINAATRLHAVNYGTFMHGLLRENVQLNRKVLSELSMHEPHSFKALVDVSRAA  60
            MRSLWI+RINA TRLH VNYG FMHGL++EN+QLNRKVLSELSMHEP+SFKALVDVSR A
Sbjct  46   MRSLWIERINAGTRLHGVNYGNFMHGLMKENIQLNRKVLSELSMHEPYSFKALVDVSRNA  105
Query  61   FPGNRQIP  68
            FPGNR +P
Sbjct  106  FPGNRPVP  113
BLASTx
GENE ID: 100285444 LOC100285444 | 50S ribosomal protein L20 [Zea mays]
Length=122
Score = 146 bits (369), Expect = 7e-34
Identities = 68/79 (86%), Positives = 74/79 (93%), Gaps = 0/79 (0%)
Frame = +1
Query  1    SYRDRRNKKRDMRSLWIQRINAATRLHAVNYGTFMHGLLRENVQLNRKVLSELSMHEPHS  180
            SYRDRRNKKRDMRSLWI+RINA TRLH VNYG FMHGL++EN+QLNRKVLSELSMHEP+S
Sbjct  35   SYRDRRNKKRDMRSLWIERINAGTRLHGVNYGNFMHGLMKENIQLNRKVLSELSMHEPYS  94
Query  181  FKALVDVSRAAFPGNRQIP  237
            FKALVDVSR AFPGNR +P
Sbjct  95   FKALVDVSRNAFPGNRPVP  113

DSAP Review Page: data already entered in!

DSAP Review Page (p. 7-17).

Do NOT use Toolbox to determine the 5’ UTR!!!

The Five Commandments of DSAP
I. The stop codon is part of the ORF.
II. The start of the 5’ UTR is always the first base.

Do NOT use Toolbox to determine the 3’ UTR!!!

Determine the ranges of the 5’ UTR and 3’ UTR by highlighting them in the DSAP cDNA text box.

What should you do if your clone is a partial? Figure labels: 5’ UTR; no 5’ UTR.

The Five Commandments of DSAP
I. The stop codon is part of the ORF.
II. The start of the 5’ UTR is always the first base.
III. If the clone is a partial, there is no 5’ UTR.

An example of a partial coding sequence. Figure label: Similar Seq.

The first bases are part of the reading frame:
?   S   I   R
XGC TCA ATC CGT

The Five Commandments of DSAP
I. The stop codon is part of the ORF.
II. The start of the 5’ UTR is always the first base.
III. If the clone is a partial, there is no 5’ UTR.
IV. If the clone is a partial, the start of the ORF is always the first base.

What should you do if you get these results? (BLASTX results shown.)

Why is my cDNA noncoding? Recent genome-wide RNA sequencing studies show that more than 10% of polyA RNAs are non-coding. Figure labels: genomic DNA; ORF; RNA (AAAAAAA); cDNA (partial) (AAAAAAA).

If your DNA is non-coding, enter the entire sequence as 3’ UTR.

The Five Commandments of DSAP
I. The stop codon is part of the ORF.
II. The start of the 5’ UTR is always the first base.
III. If the clone is a partial, there is no 5’ UTR.
IV. If the clone is a partial, the start of the ORF is always the first base.
V. If the clone is noncoding, the entire DNA is considered 3’ UTR.

The Five Commandments of DSAP
I. The stop codon is part of the ORF.
II. The start of the 5’ UTR is always the first base.
III. If the clone is a partial, there is no 5’ UTR.
IV. If the clone is a partial, the start of the ORF is always the first base.
V. If the clone is non-coding, the entire DNA is considered 3’ UTR.
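The five rules can be read as a small decision procedure for filling in the 5’ UTR, ORF and 3’ UTR ranges. The sketch below is one possible reading, not the DSAP implementation: `dsap_ranges`, its parameter names and the three clone-type labels are assumptions made for this example, "partial" is taken to mean a clone missing the start of the coding sequence, and ranges are 1-based and inclusive.

```python
def dsap_ranges(seq_length, clone_type, orf_start=None, orf_end=None):
    """Return 5' UTR, ORF and 3' UTR ranges as (start, end) tuples or None.

    clone_type: "full", "partial" or "noncoding" (labels chosen for this sketch)
    orf_end must already include the stop codon (Commandment I).
    """
    if clone_type == "noncoding":
        # V: the entire DNA is considered 3' UTR.
        return {"5' UTR": None, "ORF": None, "3' UTR": (1, seq_length)}
    if clone_type == "partial":
        orf_start = 1     # IV: the ORF starts at the first base.
        five_utr = None   # III: a partial clone has no 5' UTR.
    elif clone_type == "full":
        # II: the 5' UTR starts at the first base and ends just before the ORF.
        five_utr = (1, orf_start - 1) if orf_start > 1 else None
    else:
        raise ValueError(f"unknown clone type: {clone_type}")
    # Everything after the stop codon is 3' UTR.
    three_utr = (orf_end + 1, seq_length) if orf_end < seq_length else None
    return {"5' UTR": five_utr, "ORF": (orf_start, orf_end), "3' UTR": three_utr}


if __name__ == "__main__":
    print(dsap_ranges(600, "full", orf_start=51, orf_end=350))
    print(dsap_ranges(600, "partial", orf_end=350))
    print(dsap_ranges(600, "noncoding"))
```

The 600-base clone and its coordinates are invented purely to exercise each branch; real ranges should still be determined by highlighting them in the DSAP cDNA text box, as the slides instruct.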