Published by Wesley Sheren. Modified over 10 years ago.
1
Xt ESTs
32,000 unique transcript set
- 16,000 clusters
- 16,000 singletons
Clusters
- 9,000 (55%) have a blastx hit
- 4,000 might be full-length
- 2,000 have ~98% probability of being full-length (FL)
Singletons
- 5,500 (35%) have a blastx hit
- 1,500 might be full-length
- 200-500 'probably' FL
2
What are we looking for?
FL perfect
- good enough to spend £500 on a morpholino
FL probable
- likely enough for a gain-of-function experiment
Gene transcript
- good enough to put on an array
For FL, distinguish between
- knowing it's full-length, and
- being sure of which ATG is the start
3
Looking for full-length transcripts
Perfect full-length
- Open reading frame
  - defined by a clear prior stop codon
  - clear ATG 3' of the stop codon
  - reasonable run of stop-free sequence before another stop signal or the end of the ESTs
  - consensus sequence agrees with the ESTs
- Blastx data
  - blastx hits indicating coding sequence
  - start of matching proteins exactly aligned with the predicted start methionine
  - no other protein alignments
[Figure: consensus sequence with blastx hits (PROTEIN Hs 1e-187, PROTEIN Mm 1e-190, PROTEIN Dr 1e-201, PROTEIN Xl 1e-202) drawn as bars above seven staggered, overlapping ESTs]
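The open-reading-frame criteria above (a prior in-frame stop, then an ATG, then a stop-free run) can be sketched in code. This is a minimal illustration only, not the method used for the Xt EST set: the function name `find_bounded_orf` and its return convention are hypothetical, and it checks a single forward frame of one sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_bounded_orf(seq, frame=0):
    """Return (stop_pos, atg_pos, end_pos) for the first ATG in `frame`
    that is preceded by an in-frame stop codon, or None if there is none.

    Positions are 0-based offsets into `seq`. `end_pos` is the offset of
    the terminating stop codon, or len(seq) if the ORF runs off the end.
    """
    seq = seq.upper()
    upstream_stop = None
    atg = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if atg is None:
            if codon in STOPS:
                upstream_stop = i          # remember the latest in-frame stop
            elif codon == "ATG" and upstream_stop is not None:
                atg = i                    # first ATG 3' of that stop
        elif codon in STOPS:
            return upstream_stop, atg, i   # ORF closed by a downstream stop
    if atg is not None:
        return upstream_stop, atg, len(seq)  # ORF still open at end of sequence
    return None
```

An ATG with no in-frame stop upstream is deliberately ignored, because without the prior stop one cannot tell whether the true start lies further 5' than the available sequence.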
4
Blast aligned with ATG
Less perfect, but possibly sufficient, indications of full-length
1. Blast hits line up with ATG
- Perfect
  [Figure: blastx hits PROTEIN Hs 1e-187, PROTEIN Mm 1e-190, PROTEIN Dr 1e-201, PROTEIN Xl 1e-202 drawn as equal-length bars above staggered, overlapping ESTs]
- Weak hits, maybe several agree
  [Figure: a single blastx hit, PROTEIN Ce 8.2e-9, above staggered, overlapping ESTs]
- Strong hits but no clear agreement; predicted proteins confuse the picture
  [Figure: blastx hits PROTEIN Hs 1e-187, PROTEIN Mm 1e-190, PREDICTED Dr 1e-201, PROTEIN Xl 1e-202 above staggered, overlapping ESTs; the Xl bar is drawn shorter than the others]
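Whether blast hits "line up with the ATG" can be checked mechanically once hit coordinates are available. The sketch below assumes each hit has been reduced to a hypothetical tuple (name, query_start, subject_start) with 1-based coordinates, where query_start is the nucleotide position on the consensus and subject_start is the residue position on the matched protein. Neither the tuple layout nor the function name comes from the slides.

```python
def hits_supporting_atg(atg_pos, hits, tolerance=0):
    """Return the names of hits whose alignment begins at the predicted ATG.

    atg_pos   -- 1-based nucleotide position of the candidate ATG
    hits      -- iterable of (name, query_start, subject_start), 1-based
    tolerance -- allowed offset, in nucleotides, between hit start and ATG

    A hit supports the ATG only if it starts at residue 1 of the matched
    protein and its query start falls within `tolerance` of the ATG.
    """
    supporting = []
    for name, q_start, s_start in hits:
        if s_start == 1 and abs(q_start - atg_pos) <= tolerance:
            supporting.append(name)
    return supporting
```

For example, with hits `[("Hs", 24, 1), ("Mm", 24, 1), ("Dr", 30, 3)]` and an ATG at position 24, only Hs and Mm count as support; the Dr alignment starts inside the protein and says nothing about where the start codon is.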
5
Protein alignments start within ORF
2. Proteins aligned within a well-defined ORF
[Figure: blastx hits PROTEIN Hs 1e-10, PROTEIN Dr 1e-19, FRAGMENT Dm 1e-19, PREDICTED Mm 1e-50, PROTEIN Dr 1e-87 above seven staggered, overlapping ESTs; the last Dr bar is drawn shorter than the others]
6
Protein alignments overlap ORF
3. Some part of the aligned proteins overlaps a well-defined ORF
- Weak hits, indication of domain homology, quite likely to be FL
  [Figure: blastx hits PROTEIN Hs 1e-4, PROTEIN Dr 1e-5, FRAGMENT Dm 1e-6, PREDICTED Mm 1e-8, PROTEIN Dr 1e-8 above five staggered, overlapping ESTs]
- Strong hits, probably a real homolog; the ORF may be an artefact of sequencing error, or lie in the UTR
  [Figure: blastx hits PROTEIN Hs 1e-81, PROTEIN Dr 1e-98, PROTEIN Xl 1e-107 above five staggered, overlapping ESTs]
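The difference between a protein alignment that sits within an ORF and one that only overlaps it is an interval comparison. A minimal sketch, assuming both the hit and the ORF are given as inclusive (start, end) nucleotide coordinates on the same consensus sequence; the function name and the three labels are hypothetical.

```python
def relate_hit_to_orf(hit, orf):
    """Classify a hit interval against an ORF interval.

    Both arguments are (start, end) tuples with inclusive coordinates on
    the same sequence. Returns "within", "overlaps" or "outside".
    """
    h_start, h_end = hit
    o_start, o_end = orf
    if h_end < o_start or h_start > o_end:
        return "outside"    # no shared positions at all
    if h_start >= o_start and h_end <= o_end:
        return "within"     # hit lies entirely inside the ORF
    return "overlaps"       # hit extends past at least one ORF boundary
```

A hit classified as "overlaps" that extends 5' of the ORF is the case flagged above: with strong hits it suggests the ORF boundary, not the homology, is what is wrong.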
7
Protein alignment has upstream STOP
4. There are protein alignments and a well-defined STOP codon upstream
[Figure: blastx hits PROTEIN Hs 1e-187, PROTEIN Mm 1e-190, PROTEIN Dr 1e-201, PROTEIN Xl 1e-202 above two overlapping ESTs]
- Mostly applicable to small clusters where codons are not well agreed
8
Long open reading frame…
5. There is a long open reading frame, but maybe no blastx hits
[Figure: ORF span marked "more than 500 (?)" above five staggered, overlapping ESTs]
- May just be in UTR
  - plenty of long ORFs observed in obvious UTR
- May not even be RNA…
- What about blastn data?
- ESTScan would also be useful
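A "long open reading frame" test can be approximated by finding the longest stop-free run in the three forward frames. The sketch below is an illustration only: the slide gives the threshold as "more than 500 (?)" without units, so treating it as 500 nucleotides is an assumption, and both function names are hypothetical. It measures stop-free runs and does not require an ATG.

```python
STOPS = {"TAA", "TAG", "TGA"}

def longest_stop_free_run(seq):
    """Return (length_nt, frame, start) of the longest stop-free run
    across the three forward reading frames (0-based start offset)."""
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if i - run_start > best[0]:
                    best = (i - run_start, frame, run_start)
                run_start = i + 3          # next run begins after the stop
        # close the final run at the last complete codon in this frame
        end = frame + ((len(seq) - frame) // 3) * 3
        if end - run_start > best[0]:
            best = (end - run_start, frame, run_start)
    return best

def is_long_orf(seq, min_len=500):
    """True if the longest stop-free run exceeds min_len nucleotides.
    The default of 500 nt is an assumed reading of the slide's threshold."""
    return longest_stop_free_run(seq)[0] > min_len
```

As the bullets above warn, passing this test is weak evidence on its own: long stop-free runs also occur in UTR, so a result from this check would still need blastn or ESTScan support.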