Phytome: A Data Analysis Pipeline, presented by Jason Phillips
High-Level Flow Chart: Retrieve Unigenes → Translate Unigenes → Families
Main Outline ● Unigenes (Where'd they come from, where'd they go?) ● Translation (methods and procedures) ● Building Families (the power of together-ness)
phytome » Unigene ● What are? ● Where from? ● Nine Species ● Arabidopsis, a special case ● Storage
phytome » Unigene » What Are? Combined ESTs that overlap
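To make "combined ESTs that overlap" concrete, here is a minimal sketch that joins two reads when the end of one matches the start of the other. The function names, the `min_overlap` parameter, and the toy sequences are all hypothetical; real unigene builds use a full assembler that tolerates sequencing errors, which this sketch does not.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_pair(left: str, right: str, min_overlap: int = 3):
    """Merge two exactly overlapping reads, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the part of `right` past the overlap.
    return left + right[size:]


# Toy reads (made up): the last 5 bases of est_a equal the first 5 of est_b.
est_a = "ATGGCGTACGTT"
est_b = "ACGTTGGCATAA"
print(merge_pair(est_a, est_b))
```

Exact suffix/prefix matching is the simplest possible criterion; it is used here only to show the idea of collapsing overlapping ESTs into one longer sequence.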
phytome » Unigene » Where From? ● TIGR ● Other sources?
phytome » Unigene » Nine Species
phytome » Unigene » Arabidopsis Highly annotated... Highly sequenced... Highly translated...
phytome » Unigene » Storage [table: species, count] ghir ● mcry 8455 ● osat ● hann ● mtru ● lesc ● ljap ● lsat ● atha ● total:
phytome » Translation ● Methods ● Estwise ● Estscan ● FrameFinder ● Procedure ● Numbers
phytome » Translation » methods ● EST-WISE: homologies via BLAST (sprot + trembl) ● ESTSCAN: ab initio ● FRAMEFINDER: ab initio
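As a point of reference for what any translation method has to decide, the sketch below does a naive six-frame translation and keeps the longest stop-free stretch. This is not how ESTWISE, ESTScan, or FrameFinder work; it is a baseline illustration only, and every name in it is hypothetical. The codon table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters pass through)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf_peptide(seq: str) -> str:
    """Longest stop-free peptide over all six reading frames (first one wins ties)."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) > len(best):
                    best = segment
    return best


print(translate("ATGGCCTAA"))
print(longest_orf_peptide("ATGGCCTAA"))
```

On short or error-prone sequences the longest open stretch is often not the true coding frame, which is one reason to compare several dedicated methods rather than rely on a rule like this.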
phytome » Translation » procedure ● EST-WISE (Mac OS X cluster) – BLAST vs. Swiss-Prot: 10.3 hours on 35 nodes (~15 node-days) – BLAST vs. TrEMBL: 35.7 hours on 35 nodes (~52 node-days) ● ESTSCAN (Mustard) ● FrameFinder (Mustard)
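The day totals in parentheses on this slide appear to be wall-clock hours multiplied by the node count, i.e. the time a single node would have needed. A quick check of that reading (the function name is made up for illustration):

```python
def node_days(wall_clock_hours: float, nodes: int) -> float:
    """Total compute time in days if the same work ran on a single node."""
    return wall_clock_hours * nodes / 24.0


# Figures taken from the slide.
print(f"Swiss-Prot: {node_days(10.3, 35):.1f} node-days")  # 15.0
print(f"TrEMBL:     {node_days(35.7, 35):.1f} node-days")  # 52.1
```

Both results match the slide's rounded "~15 days" and "~52 days", assuming perfect scaling across nodes.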
phytome » Translation » numbers 242,246 Unigenes [figure: translation counts for ESTWISE, FRAMEFINDER, and ESTSCAN; legible values: 15,258 and 4]
phytome » Families ● Relationships ● Clustering ● Numbers
phytome » Families » Relationships Blast everything against everything sequences blastable db of sequences query sbjct e-value mtru302 ljap mtru302 lesc mtru302 hann osat59606 osat osat59606 osat osat59606 atha
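The query / sbjct / e-value table on this slide can be pulled out of an all-against-all BLAST run along these lines. This is a sketch that assumes the standard 12-column tabular BLAST output (query ID in column 1, subject ID in column 2, e-value in column 11); the slides do not say which output format was used. The sequence IDs, scores, and the `max_evalue` cutoff below are invented for illustration.

```python
import csv
import io


def tabular_row(query: str, subject: str, evalue: str) -> str:
    """Build one 12-column tabular BLAST line; the other columns are placeholders."""
    fields = [query, subject, "90.0", "100", "10", "0",
              "1", "100", "1", "100", evalue, "200"]
    return "\t".join(fields)


# Made-up hits: a good hit, a self-hit, a weak hit, a second HSP, a reciprocal hit.
SAMPLE = "\n".join([
    tabular_row("seqA", "seqB", "1e-50"),
    tabular_row("seqA", "seqA", "0.0"),
    tabular_row("seqA", "seqC", "0.5"),
    tabular_row("seqA", "seqB", "1e-20"),
    tabular_row("seqB", "seqA", "1e-48"),
]) + "\n"


def best_hits(handle, max_evalue: float = 1e-5) -> dict:
    """Map (query, subject) to its best e-value, dropping self-hits and weak hits."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if query == subject or evalue > max_evalue:
            continue
        pair = (query, subject)
        if pair not in best or evalue < best[pair]:
            best[pair] = evalue
    return best


for (query, subject), evalue in sorted(best_hits(io.StringIO(SAMPLE)).items()):
    print(query, subject, evalue)
```

Keeping only the best e-value per pair gives one row per relationship, which is the shape the clustering step needs.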
phytome » Families » Relationships But we have 4 sets of sequences! ● tblastx: 242,246 nucleotides ● blastp: 151,830 estwise ● blastp: 226,988 estscan ● blastp: 242,242 framefinder Which method do we trust?
phytome » Families » Relationships 4 data sets... 4 family interpretations (tb, ew, es, ff) ● ~3 days, 28 nodes (~84 days) ● ~1/4 day, 21 nodes (~5 days) BLAST OFF!
phytome » Families » Relationships BLAST RESULTS [table: method, size, no blast, no trans, attrition; rows: tb, ew, ff, es]
phytome » Families » Clustering TRIBE MCL evalue gene
phytome » Families » Clustering fam id member atha atha atha atha osat osat atha atha lsat lsat query sbjct evalue atha7499 atha atha7499 atha osat23081 atha osat23081 osat atha1072 atha atha1072 lsat atha1072 lsat atha1072 atha tribe mcl
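The step from a hit table to a family table can be illustrated with a bare-bones Markov clustering loop (expansion by matrix squaring, inflation by element-wise powers, repeated until the matrix stops changing). This is a teaching sketch, not the TRIBE-MCL program itself: it uses unit edge weights and unit self-loops where a real run would weight edges by BLAST scores, and all names and the toy graph are hypothetical.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1."""
    size = len(matrix)
    sums = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    return [[matrix[i][j] / sums[j] for j in range(size)] for i in range(size)]


def multiply(a, b):
    """Plain matrix product of two square matrices."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def markov_cluster(edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as (node, node, weight) triples."""
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for a, b, weight in edges:
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = weight
    for i in range(size):
        matrix[i][i] = 1.0  # self-loops (assumed unit weight)
    matrix = normalize_columns(matrix)
    for _ in range(max_iter):
        expanded = multiply(matrix, matrix)                        # expansion
        inflated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )                                                          # inflation
        change = max(abs(inflated[i][j] - matrix[i][j])
                     for i in range(size) for j in range(size))
        matrix = inflated
        if change < tol:
            break
    # Each row that still carries weight lists the members of one cluster.
    clusters = set()
    for row in matrix:
        members = frozenset(nodes[j] for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Toy graph: a chain a-b-c and a separate pair d-e.
toy_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("d", "e", 1.0)]
for family_id, members in enumerate(markov_cluster(toy_edges), start=1):
    print(family_id, members)
```

Raising `inflation` tends to split the graph into more, smaller families; lowering it merges them.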
phytome » Families » Clustering blast results (tb, ff, es, ew) → TRIBE MCL → families (tb, ff, es, ew)
phytome » Families » Clustering Let's look at some histograms!
What should we do next round?