Development of a Chicken Unigene Database Project No. 9 Mentors: Dr. Wellington Martins - Dr. Joan Burnside Animal Science Dept. University of Delaware.


1 Development of a Chicken Unigene Database
Project No. 9
Mentors: Dr. Wellington Martins - Dr. Joan Burnside, Animal Science Dept., University of Delaware
Jianshan Tang, Ruoming Jin: Department of CIS, University of Delaware
Lilian Lacoste: DBI - French National School of Aeronautics and Space

2 Results
Phrap clustering result: 17,090 ESTs -> Phrap -> 9,205 clusters
• 2815 contigs
• 6390 singlets

3 Second clustering method: using BLAST output
Contig 1 -> BLAST output 1
Contig 2 -> BLAST output 2
Filtering -> Parsing -> Comparing (similarity function) -> Similarity matrix
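The slide does not say which similarity function was used. As a minimal sketch, assuming each contig's filtered and parsed BLAST output is reduced to the set of accessions it hit, a Jaccard index over those sets is one possible way to fill the similarity matrix. The function names `jaccard` and `similarity_matrix`, the contig labels, and the accession `X00001` are hypothetical.

```python
from itertools import combinations


def jaccard(hits_a, hits_b):
    """Jaccard index of two hit sets (an assumed similarity function)."""
    union = hits_a | hits_b
    if not union:
        return 0.0
    return len(hits_a & hits_b) / len(union)


def similarity_matrix(blast_hits):
    """Build a symmetric matrix from {contig_id: set of hit accessions}."""
    ids = sorted(blast_hits)
    index = {cid: i for i, cid in enumerate(ids)}
    n = len(ids)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # a contig is treated as identical to itself
    for a, b in combinations(ids, 2):
        s = jaccard(blast_hits[a], blast_hits[b])
        matrix[index[a]][index[b]] = s
        matrix[index[b]][index[a]] = s
    return ids, matrix


# Hypothetical filtered hit sets for three contigs.
hits = {
    "contig79": {"AF178529", "BC001965", "AF112481"},
    "contig133": {"AF178529", "BC001965", "AB018110"},
    "contig200": {"X00001"},
}
ids, matrix = similarity_matrix(hits)
```

Applying a threshold to this matrix would give the edges of a graph: two contigs are connected when their similarity is at or above the threshold.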

4 What's "gbc"?
• Graph Based Clustering
  - Clustering: a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters
  - Graph: the relations among the data can be expressed as a graph
  - If two nodes are related, one edge connects them
• Working in bioinformatics
  - Protein sequence clustering
  - EST clustering
  - A lot of other applications!
• Objective of "gbc"
  - Support different input formats
  - Efficiently support clustering of very large sparse graphs
  - Flexible for the user

5 How to use "gbc"
• Output
  - Cluster number, and all the nodes belonging to the cluster
• Clique clustering
  - a clique is a completely connected subgraph
  - each maximal clique in the graph becomes a cluster
  - clusters may overlap
  - generally produces small but very tight clusters
• Single-link clustering
  - a maximal connected subgraph becomes a cluster
  - produces larger but weaker clusters
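The two clustering modes can be illustrated with a short sketch. This is not the "gbc" code: it assumes an undirected graph without self-loops, stored as a dictionary mapping each node to the set of its neighbours, and it uses a depth-first search for single-link clustering and the basic Bron-Kerbosch recursion (without pivoting) for maximal cliques. The function names and the example graph are hypothetical.

```python
def single_link_clusters(adjacency):
    """Each maximal connected subgraph (connected component) is a cluster."""
    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def clique_clusters(adjacency):
    """Each maximal clique is a cluster (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(set(current))  # nothing can extend it: maximal
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical graph: triangle a-b-c, an extra edge c-d, isolated node e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
components = single_link_clusters(graph)
cliques = clique_clusters(graph)
```

On this graph, single-link clustering puts a, b, c and d in one cluster, while clique clustering separates the triangle {a, b, c} from the pair {c, d}; node c belongs to both, which is the overlap mentioned on the slide.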

6 A little about the implementation
• Two clustering algorithms
  - Single-link
  - Clique
• Graph classes
  - Efficiently support dense and sparse graphs
  - Provide the same interface, so the clustering code needs no modification
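The idea of graph classes that share one interface can be sketched as follows. The class and method names (`Graph`, `DenseGraph`, `SparseGraph`, `add_edge`, `neighbors`, `num_nodes`) are assumptions made for illustration, not the actual "gbc" classes; nodes are assumed to be integers from 0 to n - 1.

```python
from abc import ABC, abstractmethod


class Graph(ABC):
    """Interface the clustering code relies on (hypothetical)."""

    @abstractmethod
    def add_edge(self, u, v): ...

    @abstractmethod
    def neighbors(self, u): ...

    @abstractmethod
    def num_nodes(self): ...


class DenseGraph(Graph):
    """Adjacency matrix: O(n^2) memory, suited to dense graphs."""

    def __init__(self, n):
        self._matrix = [[False] * n for _ in range(n)]

    def add_edge(self, u, v):
        self._matrix[u][v] = self._matrix[v][u] = True

    def neighbors(self, u):
        return [v for v, linked in enumerate(self._matrix[u]) if linked]

    def num_nodes(self):
        return len(self._matrix)


class SparseGraph(Graph):
    """Adjacency sets: memory grows with the number of edges."""

    def __init__(self, n):
        self._n = n
        self._adjacent = {}

    def add_edge(self, u, v):
        self._adjacent.setdefault(u, set()).add(v)
        self._adjacent.setdefault(v, set()).add(u)

    def neighbors(self, u):
        return sorted(self._adjacent.get(u, ()))

    def num_nodes(self):
        return self._n


def count_components(graph):
    """Count single-link clusters using only the Graph interface."""
    seen = set()
    count = 0
    for start in range(graph.num_nodes()):
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.neighbors(node))
    return count
```

Because `count_components` calls only the shared methods, it runs unchanged on either representation.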

7 Analysis program
Labels shown in the program window:
- Reset BLAST output
- Change matrix threshold
- Reset semantics
- Run analysis
- New contig set
- Number of contigs
- Comparison algorithm
- Clustering algorithm
- Results output
- Analysis tools
- Process log output

8 Analysis tools: contig information
Display the BLAST output:
- sequence references
- sequence annotations
- percentage of matching base pairs
Display the list of contigs sorted according to their best matching percentage in the BLAST output

9 Analysis tool: EST selector
Display:
- frequency vs length (in ESTs) of contigs
- list of ESTs in a contig
Allows selecting the best representative EST according to length and tissue type
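The slide does not state how length and tissue type are combined. One possible rule, sketched here under that caveat, is to prefer ESTs from a chosen tissue and, among those, to take the longest. The `Est` record, the `best_representative` function, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Est:
    name: str
    length: int  # sequence length in base pairs
    tissue: str


def best_representative(ests, preferred_tissue=None):
    """Pick one EST: preferred tissue first (assumed rule), then longest."""
    if not ests:
        raise ValueError("contig has no ESTs")
    return max(ests, key=lambda e: (e.tissue == preferred_tissue, e.length))


# Hypothetical ESTs belonging to one contig.
contig_ests = [
    Est("est1", 480, "liver"),
    Est("est2", 620, "spleen"),
    Est("est3", 555, "liver"),
]
longest = best_representative(contig_ests)
liver_choice = best_representative(contig_ests, preferred_tissue="liver")
```

With no preferred tissue the rule reduces to picking the longest EST; when no EST comes from the preferred tissue it also falls back to the longest one.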

10 First results
On a set of 400 contigs representing 1000 ESTs

Contig number: 79
Contig size: 743
Best matching fraction: 0.43587786259541983
gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 571 e-160
gb|BC001965.1|BC001965 Homo sapiens, RAD54, S. cerevisiae, homol... 143 2e-31
ref|XM_005161.3| Homo sapiens RAD54, S. cerevisiae, homolog of,... 143 2e-31
gb|AF112481.1|AF112481 Homo sapiens RAD54B protein (RAD54B) mRNA... 143 2e-31
ref|NM_012415.1| Homo sapiens RAD54, S. cerevisiae, homolog of,... 143 2e-31
emb|AL133578.1|HSM801429 Homo sapiens mRNA; cDNA DKFZp434J1672 (... 143 2e-31
dbj|AP003534.1|AP003534 Homo sapiens genomic DNA, chromosome 8q2... 76 3e-11
gb|AC009623.6|AC009623 Homo sapiens chromosome 8, clone RP11-219... 40 1.7

Contig number: 133
Contig size: 740
Best matching fraction: 0.9413109756097561
gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 1235 0.0
gb|BC001965.1|BC001965 Homo sapiens, RAD54, S. cerevisiae, homol... 184 5e-44
ref|XM_005161.3| Homo sapiens RAD54, S. cerevisiae, homolog of,... 184 5e-44
gb|AF112481.1|AF112481 Homo sapiens RAD54B protein (RAD54B) mRNA... 184 5e-44
ref|NM_012415.1| Homo sapiens RAD54, S. cerevisiae, homolog of,... 184 5e-44
emb|AL133578.1|HSM801429 Homo sapiens mRNA; cDNA DKFZp434J1672 (... 184 5e-44
dbj|AP003534.1|AP003534 Homo sapiens genomic DNA, chromosome 8q2... 76 3e-11
gb|AC084633.1|CBRG45G04 Caenorhabditis briggsae cosmid G45G04, c... 44 0.11
dbj|AB018110.1|AB018110 Arabidopsis thaliana genomic DNA, chromo... 44 0.11
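Hit lines like the ones on this slide can be split into fields with a few lines of code. The sketch below assumes the layout seen here: a sequence identifier, a free-text description, an integer score, and an E-value, separated by whitespace. It also assumes that an E-value printed as `e-160` stands for `1e-160`. The function name `parse_hit_line` is hypothetical, and this is not the parser used in the project.

```python
def parse_hit_line(line):
    """Split one hit line into (identifier, description, score, E-value)."""
    seq_id, rest = line.split(None, 1)
    description, score, evalue = rest.rsplit(None, 2)
    if evalue.startswith("e"):
        evalue = "1" + evalue  # assumed: "e-160" abbreviates "1e-160"
    return seq_id, description, int(score), float(evalue)


# Two hit lines copied from the slide.
first = parse_hit_line(
    "gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 571 e-160"
)
last = parse_hit_line(
    "gb|AC009623.6|AC009623 Homo sapiens chromosome 8, clone RP11-219... 40 1.7"
)
```

Splitting from the right keeps the description intact even though it contains spaces, commas, and parentheses.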


