PhyloPat: An Updated Version of the Phylogenetic Pattern Database Contains Gene Neighborhood. Presenter: Reihaneh Rabbany. Presented in Bioinformatics.

Presentation transcript:

PhyloPat: An Updated Version of the Phylogenetic Pattern Database Contains Gene Neighborhood
Presenter: Reihaneh Rabbany
Presented in the Bioinformatics Course (CMPUT 606), instructed by Prof. Guohui Lin, Computing Science Department, University of Alberta, Winter 2009
Paper: Tim Hulsen et al., Nucleic Acids Research, 2009, Vol. 37, Database issue

Introduction
Phylogenetic patterns
- Show the presence or absence of certain genes in a set of whole-genome sequences
- Can be used to determine sets of genes that occur only in certain evolutionary branches
- Have become more common as increasing amounts of orthology data have become available
- Phylogenetic pattern search tools are available for querying proteins, but not for querying genes
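
As a rough illustration of the idea above, a phylogenetic pattern can be written as a string with one character per species, "1" for presence and "0" for absence. The sketch below assumes a toy list of five species in a fixed order; the species list and the function name `phylogenetic_pattern` are made up for the example and are not taken from PhyloPat.

```python
def phylogenetic_pattern(species_order, species_with_gene):
    """Return a presence/absence string, one character per species in order."""
    return "".join("1" if s in species_with_gene else "0" for s in species_order)


# Toy species order (assumed for illustration only).
species_order = ["yeast", "fly", "zebrafish", "mouse", "human"]

# A gene family found only in the three vertebrates of the toy set.
vertebrate_pattern = phylogenetic_pattern(
    species_order, {"zebrafish", "mouse", "human"}
)
print(vertebrate_pattern)  # 00111
```

Reading the string left to right follows the chosen species order, so a run of trailing ones marks a gene family restricted to one branch of the toy tree.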

PhyloPat
PhyloPat is a database that allows the Ensembl database to be queried with any phylogenetic pattern.
Functionalities:
- Gene neighborhood view
- Anticorrelating patterns
- Support for Entrez Gene IDs
- Direct sequence retrieval of members of a phylogenetic lineage

Ensembl
Human genome
- 3 billion base pairs
- 35,000 genes
- The genome alone is of little use
- Locations and relationships of individual genes
- Manual annotation
Ensembl (freely accessible)
- Sequence data is fed into a software "pipeline"
- Creates a set of predicted gene locations
- Saves them in a MySQL database
- Originally focused on human; now includes mouse, fruit fly, zebrafish, plants, fungi, …

PhyloPat - Database Content
- A set of phylogenetic lineages
- Complete set of orthologies collected
  - All 39 species' genes in Ensembl
  - 741 species pairs
  - genes
  - orthologous relationships: one-to-one, one-to-many, many-to-many
Ensembl ortholog detection pipeline
- Similarity values by best reciprocal hits and best score ratio (WU-BLASTP)
- Graph of gene relations and clustering
- Multiple alignment (MUSCLE)
- Phylogenetic tree (TreeBeST)
- Orthologous relationships

PhyloPat - Database Construction
Generating phylogenetic lineages
- Determine the evolutionary order of the species using the NCBI Taxonomy
- Phylogenetic tree → phylogenetic lineages
- For each gene in the first species:
  - Look for orthologs in the other species
  - Add all orthologs to the phylogenetic lineage
  - Check the orthologs themselves for further orthologs, until no additional orthologies are found for any of the genes
- Repeat for all genes in all 39 species that are not yet connected to any phylogenetic lineage
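
The lineage-building steps above amount to repeatedly following ortholog links until nothing new is found. A minimal sketch of that idea is shown below, assuming the orthologies are available as a plain list of gene pairs; the gene names, the input format and the function name `build_lineages` are hypothetical, and this is not the code used to build PhyloPat.

```python
from collections import defaultdict


def build_lineages(genes, ortholog_pairs):
    """Group genes into lineages by following ortholog links until none remain.

    `genes` is assumed to be listed in the evolutionary order of the species.
    """
    neighbors = defaultdict(set)
    for a, b in ortholog_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    assigned = set()
    lineages = []
    for gene in genes:
        if gene in assigned:
            continue  # already connected to an earlier lineage
        lineage = []
        stack = [gene]
        assigned.add(gene)
        while stack:
            current = stack.pop()
            lineage.append(current)
            # Check the orthologs of this gene for orthologs of their own.
            for other in sorted(neighbors[current]):
                if other not in assigned:
                    assigned.add(other)
                    stack.append(other)
        lineages.append(sorted(lineage))
    return lineages


# Hypothetical toy genes: hs = human, pt = chimpanzee, mm = mouse, dr = zebrafish.
example_genes = ["hs1", "hs2", "pt1", "pt2", "mm1", "dr1"]
example_pairs = [("hs1", "pt1"), ("pt1", "mm1"), ("hs2", "pt2")]
lineages = build_lineages(example_genes, example_pairs)
print(lineages)  # [['hs1', 'mm1', 'pt1'], ['hs2', 'pt2'], ['dr1']]
```

In the toy data, mm1 joins the first lineage only through pt1, which is the "check the orthologs themselves" step. A gene with no orthologs, such as dr1, ends up in a lineage of its own.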

Web Application
- A web interface
- Queries the PhyloPat MySQL database
  - Phylogenetic lineages
  - Phylogenetic patterns

Omnipresent - Oligopresent - Polypresent Genes
Omnipresent
- Genes present in all 39 species
- Phylogenetic pattern of 39 ones (or MySQL regular expression '^1+$')
- 688 omnipresent genes
- These most likely have important functions, since they are present in all species
Oligopresent
- Genes that exist in only one or two species
- Show which species are evolutionarily most related
Polypresent
- Genes that are missing in only one or two species
- A measure of evolutionary relatedness
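
The three categories above can be told apart by counting ones and zeros in a pattern string. The sketch below uses toy six-species patterns rather than the 39-species patterns of the database; the function name `classify_pattern` and the fallback label "other" are assumptions made for the example.

```python
import re


def classify_pattern(pattern):
    """Classify a presence/absence string of '1' and '0' characters."""
    if re.fullmatch(r"1+", pattern):  # same idea as the regular expression ^1+$
        return "omnipresent"
    if pattern.count("1") in (1, 2):  # present in only one or two species
        return "oligopresent"
    if pattern.count("0") in (1, 2):  # missing in only one or two species
        return "polypresent"
    return "other"


labels = {p: classify_pattern(p) for p in ["111111", "100000", "110111", "111000"]}
print(labels)
# {'111111': 'omnipresent', '100000': 'oligopresent',
#  '110111': 'polypresent', '111000': 'other'}
```

For very short patterns the oligopresent and polypresent conditions can both hold; the sketch checks oligopresent first, which is an arbitrary choice.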

Anticorrelating Patterns
- Patterns that are exactly opposite
- Phylogenetic lineages with anticorrelating patterns can be functionally completely different, but could also be highly similar in function
- ‘ ’ ‘ ’
- These genes can be analogous, i.e. performing a similar function without being evolutionarily related
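
Since an anticorrelating pattern is the exact opposite of a given pattern, it can be obtained by swapping every "1" and "0". The sketch below shows this on toy six-character patterns; the lineage identifiers and the helper names are hypothetical.

```python
def anticorrelating(pattern):
    """Return the exactly opposite presence/absence pattern."""
    return pattern.translate(str.maketrans("01", "10"))


def find_anticorrelating_pairs(pattern_by_lineage):
    """List pairs of lineage IDs whose patterns are exact opposites."""
    pairs = []
    ids = sorted(pattern_by_lineage)
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if pattern_by_lineage[second] == anticorrelating(pattern_by_lineage[first]):
                pairs.append((first, second))
    return pairs


# Hypothetical lineage IDs with toy patterns.
toy_patterns = {"L1": "110010", "L2": "001101", "L3": "111111"}
print(anticorrelating("110010"))                 # 001101
print(find_anticorrelating_pairs(toy_patterns))  # [('L1', 'L2')]
```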

Gene Neighborhood
Inferring 'true' orthology
- Orthologous conservation of gene neighborhood
- Human gene ENSG has two predicted orthologs in chimpanzee: gene ENSPTRG and gene ENSPTRG
- Its gene neighborhood corresponds only to that of one of them, gene ENSPTRG, for nine of the nearest neighbors
Inferring functional annotation
- Build hypotheses about the processes or pathways that genes might be involved in
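
One way to picture the neighborhood comparison above is to count how many neighbors of the query gene have an ortholog among the neighbors of each candidate ortholog. The sketch below is a simplification under stated assumptions: each gene maps to at most one ortholog, the gene names are invented, and the scoring is not necessarily how the PhyloPat gene neighborhood view works.

```python
def shared_neighbors(query_neighbors, candidate_neighbors, ortholog_of):
    """Count query neighbors whose ortholog lies in the candidate's neighborhood."""
    mapped = {ortholog_of[g] for g in query_neighbors if g in ortholog_of}
    return len(mapped & set(candidate_neighbors))


# Invented toy data: neighbors of one human gene and of two candidate orthologs.
human_neighbors = ["h1", "h2", "h3", "h4"]
ortholog_of = {"h1": "c1", "h2": "c2", "h3": "c3"}
candidates = {
    "candidate_A": ["c1", "c2", "c3", "c9"],
    "candidate_B": ["c7", "c8"],
}

scores = {
    name: shared_neighbors(human_neighbors, neighbors, ortholog_of)
    for name, neighbors in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)  # {'candidate_A': 3, 'candidate_B': 0} candidate_A
```

The candidate with the most conserved neighbors would be the more plausible 'true' ortholog in this toy setting.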

FASTA-Format Sequence Files
Both the pattern search output and the gene neighborhood view contain links to FASTA files of the peptide sequences.

Discussion and Conclusion
PhyloPat is useful in
- Orthology detection
- Evolutionary studies
- Gene annotation
Complex queries: it is possible to specify
- A species set that should be included (1)
- A species set that should be excluded (0)
- A species set whose presence is indifferent (*)
- Regular expression queries
Easy-to-use web interface
Relies on only one database (Ensembl)
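
A query built from 1, 0 and * can be matched against pattern strings by turning it into a regular expression. The sketch below assumes that "*" stands for exactly one species whose presence is indifferent and maps it to the character class `[01]`. That mapping and the function name `query_to_regex` are assumptions for illustration, since the slides do not describe how PhyloPat translates queries internally.

```python
import re


def query_to_regex(query):
    """Translate a 1/0/* query into an anchored regular expression string."""
    if not set(query) <= set("01*"):
        raise ValueError("query may only contain '1', '0' and '*'")
    return "^" + query.replace("*", "[01]") + "$"


# Toy three-species query: present in the first, indifferent in the second,
# absent in the third.
regex = query_to_regex("1*0")
print(regex)  # ^1[01]0$

toy_patterns = ["110", "100", "111", "010"]
matches = [p for p in toy_patterns if re.search(regex, p)]
print(matches)  # ['110', '100']
```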

Discussion and Conclusion (Cont.)
Gene neighborhood view
- Locating evolutionarily related genomic clusters of genes
- Detecting the 'true orthologs' within large sets of predicted orthologs
- Functionally annotating less well-known genes
PhyloPat will be updated with each major Ensembl release to ensure up-to-date and reliable phylogenetic lineages (species added)

Lineage Information of PP

Questions