Raymond Ripp, Julie D. Thompson, Frédéric Plewniak, Jean-Claude Thierry, Olivier Poch Laboratoire de BioInformatique et Génomique Intégratives du Département.

Presentation transcript:

An integrated software platform for target selection and characterisation

Raymond Ripp, Julie D. Thompson, Frédéric Plewniak, Jean-Claude Thierry, Olivier Poch
Laboratoire de BioInformatique et Génomique Intégratives du Département de Biologie et Génomique Structurales, IGBMC (CNRS – UMR 7104), 1 rue Laurent Fries, Illkirch 67404, Strasbourg, France

Abstract. In order to fully understand the potential biomedical role of a target protein, such diverse data as the type of organism, domain organisation, splicing variants, 2D/3D structures, and mutations and their associated illnesses must be organised into an information network for presentation to the experimentalist. The Gscope genomic annotation and analysis platform has been developed to allow automatic, high-throughput data collection, cross-validation and analysis of such heterogeneous information in a single, integrated environment. The integration of the protein in the context of the complete family is the essential first step in this process. Gscope therefore incorporates the PipeAlign protein family analysis toolkit in order to construct high-quality, clustered multiple alignments of a potential target and its homologues identified by in-depth database searches. This provides the basis for the definition of the hierarchical relationships within and between subfamilies and for the reliable integration of all the structural and functional information available for the protein family. A new program, MACSIM, has been developed whose primary goal is to validate the quality of the data mined from the public databases and to propagate this information to the target of interest. The Gscope platform has been used to perform PipeAlign analyses of all targets in the SPINE target database, and an "identity card" has been created for each potential target. These "identity cards" are provided in XML format and are accessible via the SPINE Web Site.

General overview of Gscope. Gscope is our high-throughput integration and analysis platform. It supports genome investigation and database searches, and runs various analyses and correlations automatically. It provides data management and visualisation through a user-friendly interface. Protein analysis based on homology relies on the validated multiple alignment of complete sequences computed by DbClustal within the PipeAlign process. Besides the analysis of isolated sequences, Gscope provides clustering schemes for sets of nucleic or peptidic sequences, focusing especially on structural genomics insights. The first step in Gscope is to create a database containing the basic information for each sequence of the project. Database searches are then performed using each protein sequence. This defines the sets of orthologs used in further analysis.

PipeAlign is a five-step process ranging from the search for sequence homologues in protein and 3D structure databases to the definition of the hierarchical relationships within and between subfamilies.

MACSIM propagates pertinent information from sets of well-annotated sequences to their homologs detected within the multiple alignment. MACSIM highlights the following features: functional residues, domain organisation, 3D structure environment, mutagenesis experiments, and comparative genomics (phylogenetic distribution).

Gscope integrates tools for the design, ordering and database management of oligos, inserts, PCR products, recombinants and sequence verification. These data can easily be sent to any Laboratory Information Management System.

All Spine Targets are registered at the EBI Web Site. Their description and characterisation, task status and associated information are hosted and updated at EBI. The Identity Card link points to the IGBMC Spine Target Website. The XML files of the Spine Targets were downloaded and processed by Gscope at IGBMC using their protein sequence, gene name and definition.

The Identity Card associated with each target registered at EBI points to this web page showing the MACSIM files. Available features can be displayed on the multiple alignment: Pfam and structural domains, conserved blocks, secondary structures, low-complexity and transmembrane regions, functional sites, sequence errors, splicing variants, etc. Additional information, such as BLAST output, homologue descriptions and the phylogenetic tree, can easily be viewed or downloaded.

Results are hosted at the IGBMC Website and protected by a login and password. Specific additional analysis can be done on request, in particular DNA sequence analysis (codon usage, GC content, chromosome localisation, ...).
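
As an illustration of two of the DNA sequence analyses mentioned above, GC content and codon usage can be computed as in the minimal Python sketch below. This is not Gscope code: the function names `gc_content` and `codon_usage` and the example sequence are hypothetical, and the sketch assumes an in-frame coding sequence and simply ignores ambiguous bases.

```python
from collections import Counter


def gc_content(dna: str) -> float:
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = dna.upper()
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


def codon_usage(cds: str) -> dict[str, float]:
    """Return the relative frequency of each codon in an in-frame coding sequence.

    Assumption: the sequence starts at codon position 1. Trailing bases that
    do not fill a codon, and codons containing ambiguous bases, are skipped.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    codons = [codon for codon in codons if set(codon) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    example = "ATGGCCGCCAAATAA"  # made-up sequence, for illustration only
    print(f"GC content: {gc_content(example):.3f}")
    for codon, freq in sorted(codon_usage(example).items()):
        print(f"{codon}\t{freq:.2f}")
```

Relative frequencies are reported over all codons in the sequence. A per-amino-acid normalisation would need a genetic code table, which is left out here to keep the sketch short.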