Protein Homology Discovery (PHD)

Mixed bag of proteins -> PHD -> Protein Homologies

Inner components of PHD
Genes Database -> Open reading frame finder -> Proteins Database -> BLAST -> Clustering -> Protein Homology Database
- Cylinders are used for databases (raw information)
- Blocks are used for operations performed on this information

Goals and motivation for the PHD
- Building a library of protein families
- Useful for functional and structural prediction
- Methodology applied to UniGene organisms
Inner workings of the PHD
- Genes downloaded from the public UniGene Db
- Apply an open reading frame (ORF) finder algorithm to extract the proteins
- Proteins Db constructed
- Do an all-against-all alignment using BLAST
- Clustering algorithms applied to the BLAST results
- Final Db contains the protein homologies
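The ORF-finding step could look roughly like the sketch below. The slides do not say which ORF finder was used, so this is only an illustration under stated assumptions: it scans the three forward reading frames only, treats ATG as the sole start codon, uses the standard genetic code, and the function name `find_orfs` and the `min_len` cutoff are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(dna, min_len=30):
    """Return protein strings for ORFs (ATG ... stop) in the 3 forward frames.

    Assumptions (not from the slides): forward strand only, ATG start,
    and ORFs shorter than `min_len` residues are discarded.
    """
    dna = dna.upper()
    proteins = []
    for frame in range(3):
        current = None  # residues of the ORF being read, or None if outside one
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            aa = CODON_TABLE.get(codon, "X")  # X marks an unrecognized codon
            if current is None:
                if codon == "ATG":
                    current = ["M"]
            elif aa == "*":
                # Stop codon closes the ORF; keep it only if long enough.
                if len(current) >= min_len:
                    proteins.append("".join(current))
                current = None
            else:
                current.append(aa)
    return proteins
```

An ORF that runs off the end of the sequence without a stop codon is dropped here; a real pipeline would also need to handle the reverse strand.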
Who goes into the funnel?
species: # of genes/proteins

From UniGene:
- Bos taurus (bovine): 6871
- zebrafish: 10,000
- Homo sapiens: 32,000
- mouse

From Stanford:
- Yeast: 15,000
From FlyBase:
- Drosophila: 15,000

From the Sanger Institute:
- Nematode: 19,000
Clustering (what we have)
- Graphs were constructed from binary matrices, obtained by applying an E-value threshold
- Clustering algorithms operated on the resulting graphs to produce the clusters

Example of clustering
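The thresholding step above can be sketched as follows. The slides do not name the clustering algorithm, so this example swaps in the simplest graph-based choice, connected components, purely for illustration. The input format (a list of `(protein_a, protein_b, e_value)` tuples), the function name `cluster_by_threshold`, and the default threshold of 1e-5 are all assumptions.

```python
from collections import defaultdict, deque


def cluster_by_threshold(hits, threshold=1e-5):
    """Group proteins into connected components of the thresholded graph.

    hits: iterable of (protein_a, protein_b, e_value) tuples (assumed format).
    An edge a-b exists when e_value <= threshold, which is the binary
    matrix entry for that pair.
    """
    adjacency = defaultdict(set)
    for a, b, e_value in hits:
        # Touch both keys so proteins without any edge still become clusters.
        adjacency[a]
        adjacency[b]
        if a != b and e_value <= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(component))
    return sorted(clusters)
```

Connected components merge any two proteins linked by a chain of hits, so a looser threshold tends to fuse families together; that sensitivity is one reason to look at other clustering methods.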
Clustering (what we will have)
- Hierarchical clustering
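As a rough picture of what hierarchical clustering involves, here is a naive agglomerative sketch. Nothing in the slides specifies the linkage rule or the distance measure, so average linkage and a precomputed symmetric distance matrix are assumptions, and the function name `agglomerate` is hypothetical. How BLAST scores would be turned into distances is left open.

```python
def agglomerate(dist, names):
    """Naive average-linkage agglomerative clustering.

    dist: symmetric matrix (list of lists) of pairwise distances (assumed input).
    names: labels for the rows/columns of dist.
    Returns the merge history as (members_a, members_b, distance) tuples,
    from the first (closest) merge to the last.
    """
    clusters = {i: [i] for i in range(len(names))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((
            sorted(names[i] for i in clusters[a]),
            sorted(names[i] for i in clusters[b]),
            d,
        ))
        # Fold cluster b into cluster a.
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges
```

The merge history is the tree: cutting it at a chosen distance gives flat clusters at that level of similarity. This version re-scans all cluster pairs on every merge, so it is only suitable for small examples.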
Current work
- Parallelizing BLAST
- Clustering / hierarchical clustering
- Hierarchical clustering
- Parallel clustering
- All against all

Future work