Work Presentation
Novel RNA genes in A. thaliana
Gaurav Moghe
Oct 2008 - Nov 2008
Source: Nature (Commentary on ENCODE
Starting databases
- Putative Unique Transcripts (PUTs)
- Expressed Sequence Tags (ESTs)
ESTs vs PUTs
- 42% of the total EST sequences in GenBank assembled into PUTs
- 82% of the ESTs can be mapped to a unique genomic region vs 72% of the PUTs
[Figure: No. of ESTs/PUT by percentile]
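The ESTs-per-PUT distribution summarised on this slide could be computed along the following lines. This is a minimal sketch, not the method used for the slide: it assumes a hypothetical mapping from EST identifier to the PUT it was assembled into, and it uses a nearest-rank percentile. The names `ests_per_put` and `percentile` and the toy identifiers are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def ests_per_put(est_to_put: Dict[str, str]) -> Counter:
    """Count how many ESTs were assembled into each PUT.

    est_to_put is a hypothetical mapping: EST id -> PUT id.
    """
    return Counter(est_to_put.values())


def percentile(values: Iterable[int], q: float) -> int:
    """Nearest-rank percentile of values, for q in (0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    # Nearest-rank definition: smallest value with at least q% of data at or below it.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Toy data, invented for illustration only.
toy_membership = {"e1": "P1", "e2": "P1", "e3": "P1", "e4": "P2", "e5": "P3"}
toy_counts = ests_per_put(toy_membership)
```

With the toy data, `percentile(toy_counts.values(), 50)` gives the median number of ESTs per PUT; real input would come from the PUT assembly's membership file.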
- Download PUT sequences
- Map them to the genome using GMAP
- Map to protein-coding regions
- Map to AT RNA genes (Yes?)
- Map to other AT features (No?)
- BLASTn against all known CDS sequences + GeneWise to confirm alignment on translated CDS sequences
- BLASTx against all known proteins to verify absence of any protein in the sequences
- Coding Index to double-verify absence of protein-like sequence
- BLASTn against Repetitive Sequence Database (No match?)
- ~324, ,
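The filtering steps above can be read as a cascade in which a PUT is discarded at the first screen it fails. The sketch below illustrates only that control flow. It assumes the outcome of each screen (GMAP feature overlap, BLASTn/GeneWise, BLASTx, coding index, repeat BLASTn) has already been computed and stored as a boolean. The class `PutRecord`, its field names, and the decision to discard any PUT that maps to an annotated feature are assumptions made for illustration, not details taken from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PutRecord:
    """One PUT with hypothetical, precomputed screening results."""
    put_id: str
    maps_to_annotated_feature: bool  # GMAP locus overlaps a known feature
    cds_hit: bool                    # BLASTn/GeneWise match to a known CDS
    protein_hit: bool                # BLASTx match to a known protein
    looks_coding: bool               # flagged by the coding index
    repeat_hit: bool                 # BLASTn match to the repeat database


# Screens in the order the slide lists them; each returns True to discard.
FILTERS: List[Tuple[str, Callable[[PutRecord], bool]]] = [
    ("annotated feature", lambda p: p.maps_to_annotated_feature),
    ("known CDS", lambda p: p.cds_hit),
    ("known protein", lambda p: p.protein_hit),
    ("coding index", lambda p: p.looks_coding),
    ("repeat", lambda p: p.repeat_hit),
]


def run_cascade(puts: Iterable[PutRecord]) -> Tuple[List[PutRecord], Dict[str, int]]:
    """Apply the screens in order; return survivors and per-screen discard counts."""
    remaining = list(puts)
    discarded: Dict[str, int] = {}
    for name, should_discard in FILTERS:
        kept = [p for p in remaining if not should_discard(p)]
        discarded[name] = len(remaining) - len(kept)
        remaining = kept
    return remaining, discarded


# Toy records, invented for illustration only.
toy_puts = [
    PutRecord("P1", False, False, False, False, False),
    PutRecord("P2", True, False, False, False, False),
    PutRecord("P3", False, False, True, False, True),
]
novel, discard_counts = run_cascade(toy_puts)
```

Tracking the discard count per screen makes it easy to see which step removes the most candidates, which helps when a screen (for example a BLAST database choice) is later changed.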
Issues
- PUT sequences are not of very good quality. Use the sequence of the region on the genome where these PUTs map. Use EST sequences?
- BLAST against a database does not give all hits. BLAST against a different database, of a different size.
- PUTs extremely close to genes may be part of extended UTR regions. Remove ridiculously close ones. Check directions of the other PUTs.
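The last issue, PUTs that may really be extended UTRs, could be screened with a simple distance and strand check like the one sketched here. The cutoffs `very_close` and `nearby` are hypothetical placeholders, since the slide gives no distances, and the `Locus` class and the labels returned are likewise illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def gap_between(a: Locus, b: Locus) -> Optional[int]:
    """Bases separating two loci; 0 if they overlap or abut, None if on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def classify_put(put: Locus, genes: Iterable[Locus],
                 very_close: int = 100, nearby: int = 1000) -> str:
    """Label a PUT by its proximity to annotated genes.

    very_close and nearby are hypothetical cutoffs in bases.
    Returns "remove", "check_direction", or "keep".
    """
    label = "keep"
    for gene in genes:
        gap = gap_between(put, gene)
        if gap is None:
            continue
        if gap <= very_close:
            return "remove"  # likely part of the gene's extended UTR
        if gap <= nearby and gene.strand == put.strand:
            label = "check_direction"  # same strand as a nearby gene
    return label


# Toy coordinates, invented for illustration only.
toy_gene = Locus("Chr1", 1000, 2000, "+")
```

A PUT on the same strand as a nearby gene is only flagged, not removed, because strand agreement alone does not show that it belongs to that gene's transcript.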
What if… A sequence passes through all filters… but still is a protein sequence?
Issues
- Most of these PUTs do not show conservation. Does that mean they are non-functional?
- Most of these PUTs do not seem to have a secondary structure like RNA. Does that mean they are not RNA genes?
Plans for the next month
- Get the final list of novel PUTs
- Assign them directionality and estimate assembly error rates using EST mapping
- Conservation
- Secondary structure