Published by Samantha Golden. Modified over 9 years ago.
1
Development of a Chicken Unigene Database
Project No. 9
Mentors: Dr. Wellington Martins, Dr. Joan Burnside
Animal Science Dept., University of Delaware
Jianshan Tang, Ruoming Jin
Department of CIS, University of Delaware
Lilian Lacoste
DBI, French National School of Aeronautics and Space
2
Results
Phrap clustering result:
- Input to Phrap: 17,090 ESTs
- Output: 9,205 clusters (2,815 contigs and 6,390 singlets)
3
Second clustering method: using BLAST output
- Contig 1 gives BLAST output 1; Contig 2 gives BLAST output 2
- Filtering
- Parsing
- Comparing with a similarity function
- Result: similarity matrix
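The slide does not say which similarity function is used. As a minimal sketch, assuming each contig's filtered BLAST output has been reduced to a set of hit accession IDs, a Jaccard index (an assumption made here for illustration, not the project's stated choice) can fill the similarity matrix. The names `jaccard`, `similarity_matrix`, and the sample hit sets are hypothetical.

```python
from typing import Dict, Set


def jaccard(hits_a: Set[str], hits_b: Set[str]) -> float:
    """Shared hits divided by all hits of either contig (assumed similarity function)."""
    union = hits_a | hits_b
    if not union:
        return 0.0
    return len(hits_a & hits_b) / len(union)


def similarity_matrix(hits: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Compare every pair of contigs and store the scores in a nested dict."""
    names = sorted(hits)
    return {a: {b: jaccard(hits[a], hits[b]) for b in names} for a in names}


# Hypothetical filtered hit sets (accession IDs) for three contigs.
hits = {
    "contig1": {"AF178529", "BC001965", "AF112481"},
    "contig2": {"AF178529", "BC001965", "AL133578"},
    "contig3": {"AB018110"},
}
matrix = similarity_matrix(hits)
print(matrix["contig1"]["contig2"])
```

A threshold applied to this matrix would then decide which contig pairs are connected by an edge before clustering.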
4
What's "gbc"?
Graph Based Clustering
- Clustering: the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.
- Graph: the relations among the data can be expressed as a graph. If two nodes are related, an edge connects them.
Applications in bioinformatics
- Protein sequence clustering
- EST clustering
- Many other applications
Objectives of "gbc"
- Support different input formats
- Efficiently support clustering of very large sparse graphs
- Be flexible for the user
5
How to use "gbc"
Output
- The cluster number and all the nodes that belong to each cluster
Clique clustering
- A clique is a completely connected subgraph
- Each maximal clique in the graph becomes a cluster
- Clusters may overlap
- Generally produces small but very tight clusters
Single-link clustering
- A maximal connected subgraph becomes a cluster
- Produces larger but weaker clusters
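The two clustering modes can be illustrated with a short sketch. This is not the "gbc" source code: it assumes an undirected graph stored as a dict of neighbor sets, uses a depth-first search for single-link clustering, and uses the basic Bron-Kerbosch recursion (without pivoting) to enumerate maximal cliques. All function names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def single_link_clusters(graph: Graph) -> List[Set[str]]:
    """Each maximal connected subgraph (connected component) becomes a cluster."""
    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def clique_clusters(graph: Graph) -> List[Set[str]]:
    """Each maximal clique becomes a cluster (basic Bron-Kerbosch, no pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(set(r))  # r cannot be extended, so it is maximal
            return
        for v in sorted(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v is fully explored
            x = x | {v}  # exclude v from later cliques at this level

    expand(set(), set(graph), set())
    return cliques


# Small example: a triangle a-b-c, a pendant node d attached to c, and an isolated node e.
graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
single = single_link_clusters(graph)
cliques = clique_clusters(graph)
print(single)
print(cliques)
```

On this example, single-link clustering yields one large cluster plus the isolated node, while clique clustering yields two smaller clusters that overlap at node c, which matches the trade-off described on the slide.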
6
A little about the implementation work
Two clustering algorithms
- Single-link
- Clique
Graph classes
- Efficiently support dense and sparse graphs
- Provide the same interface, so the clustering code does not need to be modified
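The idea of graph classes that share one interface can be sketched as follows. The slides do not show the actual class design, so the names `Graph`, `DenseGraph`, `SparseGraph`, and `count_components` are hypothetical; the dense variant assumes an adjacency matrix and the sparse variant assumes adjacency sets.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Set


class Graph(ABC):
    """Common interface used by the clustering code (hypothetical)."""

    @abstractmethod
    def add_edge(self, u: int, v: int) -> None: ...

    @abstractmethod
    def neighbors(self, u: int) -> Iterable[int]: ...

    @abstractmethod
    def nodes(self) -> Iterable[int]: ...


class DenseGraph(Graph):
    """Adjacency matrix: O(n^2) memory, suited to dense graphs."""

    def __init__(self, n: int) -> None:
        self._adj: List[List[bool]] = [[False] * n for _ in range(n)]

    def add_edge(self, u: int, v: int) -> None:
        self._adj[u][v] = self._adj[v][u] = True

    def neighbors(self, u: int) -> Iterable[int]:
        return [v for v, linked in enumerate(self._adj[u]) if linked]

    def nodes(self) -> Iterable[int]:
        return range(len(self._adj))


class SparseGraph(Graph):
    """Adjacency sets: memory proportional to the number of edges."""

    def __init__(self, n: int) -> None:
        self._adj: Dict[int, Set[int]] = {u: set() for u in range(n)}

    def add_edge(self, u: int, v: int) -> None:
        self._adj[u].add(v)
        self._adj[v].add(u)

    def neighbors(self, u: int) -> Iterable[int]:
        return self._adj[u]

    def nodes(self) -> Iterable[int]:
        return self._adj.keys()


def count_components(graph: Graph) -> int:
    """Clustering-side code: relies only on the Graph interface."""
    seen: Set[int] = set()
    count = 0
    for start in graph.nodes():
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(v for v in graph.neighbors(node) if v not in seen)
    return count


edges = [(0, 1), (1, 2), (3, 4)]
dense, sparse = DenseGraph(5), SparseGraph(5)
for g in (dense, sparse):
    for u, v in edges:
        g.add_edge(u, v)
print(count_components(dense), count_components(sparse))
```

Because `count_components` touches only `nodes()` and `neighbors()`, switching between the dense and sparse representation requires no change to the clustering code.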
7
Analysis program
- Reset BLAST output
- Change matrix threshold
- Reset semantics
- Run analysis
- New contig set
- Number of contigs
- Comparison algorithm
- Clustering algorithm
- Results output
- Analysis tools
- Process log output
8
Analysis tools: contig information
Display the BLAST output:
- sequence references
- sequence annotations
- percentage of matching base pairs
Display the list of contigs, sorted by their best matching percentage in the BLAST output
9
Analysis tool: EST selector
Display:
- frequency vs. length (in ESTs) of contigs
- list of ESTs in a contig
Allows the user to select the best representative EST according to length and tissue type
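The slide does not describe how length and tissue type are weighed against each other. As one possible sketch, assuming the user names a preferred tissue and that longer ESTs are better, the selection could rank ESTs first by tissue match and then by length. The `Est` record, the `select_representative` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Est:
    est_id: str
    length: int  # sequence length in base pairs
    tissue: str  # tissue type of the source library


def select_representative(ests: List[Est], preferred_tissue: Optional[str] = None) -> Est:
    """Pick one EST: tissue match first (assumed rule), then greatest length."""
    if not ests:
        raise ValueError("contig contains no ESTs")
    return max(ests, key=lambda e: (e.tissue == preferred_tissue, e.length))


# Hypothetical ESTs belonging to one contig.
contig_ests = [
    Est("est_001", 420, "liver"),
    Est("est_002", 610, "muscle"),
    Est("est_003", 550, "liver"),
]
print(select_representative(contig_ests).est_id)
print(select_representative(contig_ests, preferred_tissue="liver").est_id)
```

Without a preferred tissue the longest EST wins; with one, the longest EST from that tissue wins.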
10
First results
On a set of 400 contigs representing 1000 ESTs

Contig number: 79
Contig size: 743
Best matching fraction: 0.43587786259541983
gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 571 e-160
gb|BC001965.1|BC001965 Homo sapiens, RAD54, S. cerevisiae, homol... 143 2e-31
ref|XM_005161.3| Homo sapiens RAD54, S. cerevisiae, homolog of,... 143 2e-31
gb|AF112481.1|AF112481 Homo sapiens RAD54B protein (RAD54B) mRNA... 143 2e-31
ref|NM_012415.1| Homo sapiens RAD54, S. cerevisiae, homolog of,... 143 2e-31
emb|AL133578.1|HSM801429 Homo sapiens mRNA; cDNA DKFZp434J1672 (... 143 2e-31
dbj|AP003534.1|AP003534 Homo sapiens genomic DNA, chromosome 8q2... 76 3e-11
gb|AC009623.6|AC009623 Homo sapiens chromosome 8, clone RP11-219... 40 1.7

Contig number: 133
Contig size: 740
Best matching fraction: 0.9413109756097561
gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 1235 0.0
gb|BC001965.1|BC001965 Homo sapiens, RAD54, S. cerevisiae, homol... 184 5e-44
ref|XM_005161.3| Homo sapiens RAD54, S. cerevisiae, homolog of,... 184 5e-44
gb|AF112481.1|AF112481 Homo sapiens RAD54B protein (RAD54B) mRNA... 184 5e-44
ref|NM_012415.1| Homo sapiens RAD54, S. cerevisiae, homolog of,... 184 5e-44
emb|AL133578.1|HSM801429 Homo sapiens mRNA; cDNA DKFZp434J1672 (... 184 5e-44
dbj|AP003534.1|AP003534 Homo sapiens genomic DNA, chromosome 8q2... 76 3e-11
gb|AC084633.1|CBRG45G04 Caenorhabditis briggsae cosmid G45G04, c... 44 0.11
dbj|AB018110.1|AB018110 Arabidopsis thaliana genomic DNA, chromo... 44 0.11
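The parsing step for hit lines like those above can be sketched as follows. This is an illustration, not the project's parser: it assumes each line holds a sequence identifier, a (possibly truncated) description, a bit score, and an E-value, in that order, and that an E-value written as `e-160` means `1e-160`. The `Hit` record and `parse_hit` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    seq_id: str
    description: str
    score: int
    evalue: float


def parse_hit(line: str) -> Hit:
    """Split one summary line into identifier, description, score, and E-value."""
    seq_id, rest = line.split(None, 1)
    description, score, evalue = rest.rsplit(None, 2)
    if evalue.startswith("e"):
        evalue = "1" + evalue  # BLAST prints 1e-160 as "e-160"
    return Hit(seq_id, description, int(score), float(evalue))


# Three of the hit lines reported for contig 79.
lines = [
    "gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 571 e-160",
    "gb|BC001965.1|BC001965 Homo sapiens, RAD54, S. cerevisiae, homol... 143 2e-31",
    "gb|AC009623.6|AC009623 Homo sapiens chromosome 8, clone RP11-219... 40 1.7",
]
hits: List[Hit] = [parse_hit(line) for line in lines]
best = max(hits, key=lambda h: h.score)
print(best.seq_id, best.score, best.evalue)
```

Parsed hits of this form could then be filtered by E-value or score before the contigs are compared.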