Development of a Chicken Unigene Database Project No. 9 Mentors: Dr. Wellington Martins - Dr. Joan Burnside Animal Science Dept. University of Delaware.

Presentation transcript:

Development of a Chicken Unigene Database
Project No. 9
Mentors: Dr. Wellington Martins - Dr. Joan Burnside, Animal Science Dept., University of Delaware
Jianshan Tang, Ruoming Jin, Department of CIS, University of Delaware
Lilian Lacoste, DBI - French National School of Aeronautics and Space

Results
Phrap clustering result: 17,090 ESTs -> Phrap -> 9,205 clusters
• 2,815 contigs
• 6,390 singlets

Second clustering method: using BLAST output
Contig 1 -> BLAST output 1
Contig 2 -> BLAST output 2
Filtering -> Parsing -> Comparing (similarity function) -> Similarity matrix
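The slide does not say which similarity function was used, so the following is only a minimal sketch of the filter, parse, compare pipeline under stated assumptions. It assumes a hypothetical simplified input of one `subject_id<TAB>evalue` line per BLAST hit, an illustrative E-value cutoff, and Jaccard overlap of the hit sets as a stand-in similarity function. The names `parse_hits`, `jaccard`, and `similarity_matrix` are invented for illustration.

```python
def parse_hits(blast_lines, max_evalue=1e-10):
    """Filter and parse: keep the subject IDs of hits at or below the cutoff.

    Assumes a simplified two-column format (subject_id <TAB> evalue),
    not any real BLAST report layout.
    """
    hits = set()
    for line in blast_lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        subject_id, evalue = fields[0], float(fields[1])
        if evalue <= max_evalue:
            hits.add(subject_id)
    return hits


def jaccard(a, b):
    """Stand-in similarity function: shared hits divided by all hits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(hit_sets):
    """Compare every pair of contigs; returns (ordered names, square matrix)."""
    names = sorted(hit_sets)
    matrix = [[jaccard(hit_sets[r], hit_sets[c]) for c in names] for r in names]
    return names, matrix


# Hypothetical BLAST outputs for two contigs.
output1 = ["hitA\t1e-160", "hitB\t2e-31", "hitC\t3e-11", "hitZ\t0.5"]
output2 = ["hitA\t1e-150", "hitB\t4e-44", "hitD\t5e-11"]
hit_sets = {"contig1": parse_hits(output1), "contig2": parse_hits(output2)}
names, matrix = similarity_matrix(hit_sets)
```

Thresholding such a matrix would give the graph that a clustering step can consume: two contigs are linked when their score is at or above the chosen threshold.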

What's "gbc"?
• Graph Based Clustering
  - Clustering: the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters
  - Graph: the relations among the data can be expressed as a graph
  - If two nodes are related, an edge connects them
• Applications in bioinformatics
  - Protein sequence clustering
  - EST clustering
  - Many other applications
• Objectives of "gbc"
  - Support different input formats
  - Efficiently support clustering of very large sparse graphs
  - Flexible for the user

How to use "gbc"
• Output
  - Cluster number and all the nodes that belong to the cluster
• Clique clustering
  - A clique is a completely connected subgraph
  - Each maximal clique in the graph becomes a cluster
  - Clusters may overlap
  - Generally produces small but very tight clusters
• Single-link clustering
  - A maximal connected subgraph becomes a cluster
  - Produces larger but weaker clusters
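The two clustering modes can be sketched as follows. This is not the "gbc" source code: it is a small illustration that treats single-link clusters as connected components and enumerates maximal cliques with the basic Bron-Kerbosch recursion (a technique chosen here for illustration, not one named in the slides). The function names are hypothetical.

```python
def build_adjacency(nodes, edges):
    """Undirected graph as a dict mapping each node to its neighbor set."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def single_link_clusters(adj):
    """Each maximal connected subgraph (connected component) is a cluster."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def clique_clusters(adj):
    """Each maximal clique is a cluster (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)  # nothing can extend it: maximal
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy graph: a triangle a-b-c, a tail c-d, and an isolated node e.
adj = build_adjacency("abcde", [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
single = single_link_clusters(adj)
cliques = clique_clusters(adj)
```

On the toy graph, single-link merges a, b, c, and d into one large cluster, while clique clustering yields the tighter clusters {a, b, c} and {c, d}, which overlap at c.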

A little about the implementation
• Two clustering algorithms
  - Single-link
  - Clique
• Graph classes
  - Efficiently support dense and sparse graphs
  - Provide the same interface, so the clustering code needs no modification
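One way to read the "same interface" point is sketched below: a dense graph class backed by an adjacency matrix and a sparse one backed by neighbor sets, both exposing the same methods, so a clustering routine runs unchanged on either. All class and method names are hypothetical; the slides do not show the actual "gbc" classes.

```python
from abc import ABC, abstractmethod


class Graph(ABC):
    """Shared interface; nodes are the integers 0..n-1."""

    @abstractmethod
    def add_edge(self, u, v): ...

    @abstractmethod
    def neighbors(self, u): ...

    @abstractmethod
    def num_nodes(self): ...


class DenseGraph(Graph):
    """Adjacency matrix: O(n^2) memory, suited to dense graphs."""

    def __init__(self, n):
        self._matrix = [[False] * n for _ in range(n)]

    def add_edge(self, u, v):
        self._matrix[u][v] = self._matrix[v][u] = True

    def neighbors(self, u):
        return [v for v, linked in enumerate(self._matrix[u]) if linked]

    def num_nodes(self):
        return len(self._matrix)


class SparseGraph(Graph):
    """Neighbor sets: memory grows with the number of edges."""

    def __init__(self, n):
        self._n = n
        self._adj = {}

    def add_edge(self, u, v):
        self._adj.setdefault(u, set()).add(v)
        self._adj.setdefault(v, set()).add(u)

    def neighbors(self, u):
        return sorted(self._adj.get(u, ()))

    def num_nodes(self):
        return self._n


def count_single_link_clusters(graph):
    """Clustering code written once against the Graph interface."""
    seen, clusters = set(), 0
    for start in range(graph.num_nodes()):
        if start in seen:
            continue
        clusters += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.neighbors(node))
    return clusters


edges = [(0, 1), (1, 2), (3, 4)]
dense, sparse = DenseGraph(6), SparseGraph(6)
for u, v in edges:
    dense.add_edge(u, v)
    sparse.add_edge(u, v)
```

Both representations give the same answer for the same edges; only the memory and lookup costs differ.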

Analysis program (interface elements)
- Reset BLAST output
- Change matrix threshold
- Reset semantics
- Run analysis
- New contig set
- Number of contigs
- Comparison algorithm
- Clustering algorithm
- Results output
- Analysis tools
- Process log output

Analysis tool: contig information
Displays the BLAST output:
- sequence references
- sequence annotations
- percentage of matching base pairs
Displays the list of contigs sorted by their best matching percentage in the BLAST output

Analysis tool: EST selector
Displays:
- frequency vs. length (in ESTs) of contigs
- list of ESTs in a contig
Allows the user to select the best representative EST according to length and tissue type
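The slide does not state how length and tissue type are weighed against each other, so the sketch below assumes one plausible rule: prefer ESTs from a requested tissue, then take the longest. The `EST` record, the function names, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class EST(NamedTuple):
    est_id: str
    length: int  # sequence length in base pairs
    tissue: str


def contig_size_histogram(contigs):
    """Frequency of contigs by size, where size is the number of ESTs."""
    return Counter(len(ests) for ests in contigs.values())


def pick_representative(ests, preferred_tissue=None):
    """Assumed rule: longest EST from the preferred tissue, else longest overall.

    Raises ValueError if the contig has no ESTs.
    """
    pool = [e for e in ests if e.tissue == preferred_tissue] or list(ests)
    return max(pool, key=lambda e: e.length)


# Hypothetical contigs and ESTs.
contigs = {
    "contig_a": [EST("est1", 420, "liver"), EST("est2", 610, "spleen"),
                 EST("est3", 530, "liver")],
    "contig_b": [EST("est4", 300, "brain")],
    "contig_c": [EST("est5", 350, "liver")],
}
histogram = contig_size_histogram(contigs)
best_any = pick_representative(contigs["contig_a"])
best_liver = pick_representative(contigs["contig_a"], preferred_tissue="liver")
```

A different weighting (for example, length first with tissue as a tie-breaker) would only change the body of `pick_representative`.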

First results
On a set of 400 contigs representing 1000 ESTs

Contig number: 79
Contig size: 743
Best matching fraction:
gb|AF |AF Gallus gallus Rad54b (RAD54B) mRNA, compl e-160
gb|BC |BC Homo sapiens, RAD54, S. cerevisiae, homol e-31
ref|XM_ | Homo sapiens RAD54, S. cerevisiae, homolog of, e-31
gb|AF |AF Homo sapiens RAD54B protein (RAD54B) mRNA e-31
ref|NM_ | Homo sapiens RAD54, S. cerevisiae, homolog of, e-31
emb|AL |HSM Homo sapiens mRNA; cDNA DKFZp434J1672 ( e-31
dbj|AP |AP Homo sapiens genomic DNA, chromosome 8q e-11
gb|AC |AC Homo sapiens chromosome 8, clone RP

Contig number: 133
Contig size: 740
Best matching fraction:
gb|AF |AF Gallus gallus Rad54b (RAD54B) mRNA, compl
gb|BC |BC Homo sapiens, RAD54, S. cerevisiae, homol e-44
ref|XM_ | Homo sapiens RAD54, S. cerevisiae, homolog of, e-44
gb|AF |AF Homo sapiens RAD54B protein (RAD54B) mRNA e-44
ref|NM_ | Homo sapiens RAD54, S. cerevisiae, homolog of, e-44
emb|AL |HSM Homo sapiens mRNA; cDNA DKFZp434J1672 ( e-44
dbj|AP |AP Homo sapiens genomic DNA, chromosome 8q e-11
gb|AC |CBRG45G04 Caenorhabditis briggsae cosmid G45G04, c
dbj|AB |AB Arabidopsis thaliana genomic DNA, chromo
