First release of HOGENOM, a database of homologous genes from complete genomes


First release of HOGENOM, a database of homologous genes from complete genomes
Equipe Bioinformatique et Génomique Evolutive, Laboratoire de Biométrie et Biologie Evolutive, Université Claude Bernard - Lyon 1
Simon Penel, Laurent Duret, Pascal Calvat, Jean-François Dufayard, Guy Perrière, Manolo Gouy
POSTER JO 60

Homologous gene databases
Research fields:
– Proteome/genome comparative analysis
– Phylogenetic studies
– Orthology/paralogy relationship assignments
Development of generic and specialised databases:
– HOVERGEN: families of homologous vertebrate genes
– HOBACGEN: families of homologous bacterial genes
– NureBase, RTKdb, Hoppsigen, Mitalib, Polymorphix, etc.

The HoGenom database: Homologous Gene Families from Fully Sequenced Organisms (European project TEMBLOR)
Contents:
– Nucleic and protein sequences
– Sequence annotations
– Taxonomic data
– Protein multiple alignments
– Phylogenetic trees

The HoGenom database: building the database
Data selection from the European Bioinformatics Institute:
– SwissProt, TrEMBL, TrEMBL-new: 1 sequence → many species
– Proteome sets (Human, Mouse, Rat, etc.): 1 sequence → 1 species
Result: protein sequences

The HoGenom database: building the database
Similarity search:
– Filtering of low-complexity regions (SEG)
– BLASTP, BLOSUM62, E ≤
– Parallelised calculations at IN2P3
Result: local pairwise alignments
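The similarity-search step can be sketched as a filter over BLASTP hits. This minimal Python sketch assumes the hits are stored in BLAST+ tabular format (`-outfmt 6`, 12 default columns) and that self-hits are discarded; the poster does not state the E-value cutoff, so `max_evalue` is a required parameter. The names `Hit`, `parse_outfmt6` and `significant_hits` are hypothetical and not part of HoGenom.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float  # percent identity of the HSP
    length: int    # alignment length
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST+ default tabular lines (12 tab-separated columns)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.split("\t")
        if len(f) < 12:
            raise ValueError(f"expected 12 columns, got {len(f)}: {line!r}")
        # Default column order: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        yield Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[6]), int(f[7]), int(f[8]), int(f[9]), float(f[10]))


def significant_hits(lines: Iterable[str], max_evalue: float) -> List[Hit]:
    """Keep HSPs with E <= max_evalue; self-hits are dropped (assumption)."""
    return [h for h in parse_outfmt6(lines)
            if h.evalue <= max_evalue and h.query != h.subject]
```

For instance, passing an open file handle with `max_evalue=1e-4` (an illustrative value only) keeps every non-self HSP at or below that cutoff; the surviving HSPs are the local pairwise alignments that feed the clustering step.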

The HoGenom database: building the database
Clustering into families
Criteria for linking two sequences: HSP covering ≥ 80 % of the length, similarity ≥ 50 %
1: Clustering of complete sequences into families
2: Inclusion of partial sequences into the families defined previously
Example: the pairs A-B and A-C each meet the criteria, so A, B and C form one cluster, i.e. one protein family.
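The two-step clustering can be sketched with a union-find structure, which places A, B and C in one family when A-B and A-C are linked. This is a hypothetical Python sketch, not the HoGenom code: it assumes the 80 % coverage criterion applies to both sequences of a complete-complete pair, expresses similarity as a fraction in [0, 1], and attaches each partial sequence to the family of its most similar qualifying complete sequence (checking only the partial sequence's coverage). `Link`, `UnionFind` and `build_families` are illustrative names.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Thresholds from the poster text, expressed as fractions.
MIN_COVERAGE = 0.80    # HSP must cover >= 80 % of the sequence length
MIN_SIMILARITY = 0.50  # HSP similarity must be >= 50 %

# (seq_a, seq_b, hsp_coverage_of_a, hsp_coverage_of_b, similarity)
Link = Tuple[str, str, float, float, float]


class UnionFind:
    """Disjoint sets: linked sequences end up sharing the same root."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def build_families(complete: Set[str], partial: Set[str],
                   links: Iterable[Link]) -> List[Set[str]]:
    links = list(links)
    uf = UnionFind()
    for seq in complete:
        uf.find(seq)  # register every complete sequence as a singleton

    # Step 1: single-linkage clustering of complete sequences only.
    for a, b, cov_a, cov_b, sim in links:
        if (a in complete and b in complete and sim >= MIN_SIMILARITY
                and cov_a >= MIN_COVERAGE and cov_b >= MIN_COVERAGE):
            uf.union(a, b)

    families: Dict[str, Set[str]] = {}
    for seq in complete:
        families.setdefault(uf.find(seq), set()).add(seq)

    # Step 2: attach each partial sequence to the family of its most similar
    # qualifying complete sequence. Partials never merge existing families,
    # and a partial with no qualifying link is left out.
    best: Dict[str, Tuple[float, str]] = {}
    for a, b, cov_a, cov_b, sim in links:
        for p, c, cov_p in ((a, b, cov_a), (b, a, cov_b)):
            if (p in partial and c in complete and cov_p >= MIN_COVERAGE
                    and sim >= MIN_SIMILARITY
                    and (p not in best or sim > best[p][0])):
                best[p] = (sim, c)
    for p, (_, c) in best.items():
        families[uf.find(c)].add(p)

    # Deterministic output: families ordered by their smallest member name.
    return sorted(families.values(), key=min)
```

Running step 1 on complete sequences only means that a short fragment similar to two unrelated proteins cannot fuse their families, which is one plausible reason for the two-step design.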

The HoGenom database: alignments and trees
Protein family → multiple alignment → phylogenetic tree
– Multiple alignment: CLUSTAL W, default parameters
– Phylogenetic tree: BIONJ (neighbor joining), observed divergence
– Partial sequences: distance matrix with missing values
– Rooting: mid-point
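The observed-divergence distances, with missing values for partial sequences, can be illustrated as follows. This minimal Python sketch assumes gaps are written as `-` and computes the p-distance (fraction of differing residues) over alignment columns where neither sequence has a gap; a pair with no shared ungapped column gets `None`, the missing value. It covers only the distance step: building the BIONJ tree from such a matrix is not shown, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

GAP = "-"


def observed_divergence(x: str, y: str, min_overlap: int = 1) -> Optional[float]:
    """p-distance over columns where both aligned sequences have a residue.

    Returns None (a missing value) when fewer than min_overlap columns can be
    compared, e.g. two partial sequences covering different regions.
    """
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for a, b in zip(x, y):
        if a == GAP or b == GAP:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared < min_overlap:
        return None
    return differences / compared


def distance_matrix(aln: Dict[str, str]) -> Dict[Tuple[str, str], Optional[float]]:
    """Symmetric pairwise matrix stored as a dict keyed by (name, name)."""
    d: Dict[Tuple[str, str], Optional[float]] = {}
    for a, b in combinations(sorted(aln), 2):
        d[(a, b)] = d[(b, a)] = observed_divergence(aln[a], aln[b])
    return d
```

With the alignment `{"A": "MKV-LLA", "B": "MKVALLG", "C": "MKV----", "D": "----LLA"}`, A and B differ at 1 of 6 comparable columns, while C and D share no ungapped column, so their entry is missing.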

The HoGenom database: contents
117 organisms, including:
Arabidopsis thaliana (plant), Caenorhabditis elegans (nematode), Drosophila melanogaster (fly), Encephalitozoon cuniculi (microsporidian), Guillardia theta (alga), Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Saccharomyces cerevisiae (yeast), Schizosaccharomyces pombe (fungus)
[Figure: numbers of proteins, CDS and families, and a chart with shares of 31 %, 9 % and 60 %; counts and chart labels not recoverable]

Querying the databases
– WWW Query: query on sequences and families according to multiple criteria
– Cross Taxa Query: query on families according to complex taxonomic criteria
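A query on families by taxonomic criteria can be illustrated as set filtering over the taxa present in each family. This hypothetical Python sketch is not the WWW Query or Cross Taxa Query interface: it assumes each family is reduced to a flat set of species names, whereas a real taxonomic query would also resolve higher-level taxa (for example, all vertebrates) to their descendant species. Family identifiers such as `FAM1` are made up.

```python
from typing import Dict, Iterable, List, Set


def cross_taxa_query(family_taxa: Dict[str, Set[str]],
                     required: Iterable[str] = (),
                     excluded: Iterable[str] = (),
                     min_taxa: int = 0) -> List[str]:
    """Return families that contain every required taxon, none of the
    excluded taxa, and at least min_taxa distinct taxa (sorted by id)."""
    req, exc = set(required), set(excluded)
    hits = []
    for family, taxa in family_taxa.items():
        if req <= taxa and not (exc & taxa) and len(taxa) >= min_taxa:
            hits.append(family)
    return sorted(hits)
```

For example, asking for families present in Homo sapiens but absent from Arabidopsis thaliana amounts to `required=["Homo sapiens"]` and `excluded=["Arabidopsis thaliana"]`.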

POSTER JO-60, to be continued…