Nucleotide sequence alignments in Compara Stephen Fitzgerald

What is Ensembl Compara?
  A single database containing precalculated comparative genomics data
  Access via the Perl API and MySQL
  A production system for generating that database (not covered in this presentation)

Compara database & the Ensembl core databases
  Compara holds minimal primary data, so full access to the data requires re-establishing the links to the core databases.
  Example: compara_63 must be linked with the Ensembl core_63 databases.
  Proper Registry configuration is critical; load_registry_from_db is probably the best choice here.
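The release-matching rule on this slide can be checked before any connection is made. The following is a minimal sketch in Python, not part of the Ensembl Perl API: the helper names `release_of` and `mismatched_cores` are hypothetical, and the sketch assumes database names follow the patterns `ensembl_compara_<release>` and `<species>_core_<release>_<assembly>`.

```python
import re

def release_of(db_name):
    """Return the release number embedded in an Ensembl-style database
    name, or None if the name does not match the assumed pattern."""
    match = re.search(r"_(?:core|compara)_(\d+)", db_name)
    return int(match.group(1)) if match else None

def mismatched_cores(compara_db, core_dbs):
    """List the core databases whose release differs from the Compara one."""
    release = release_of(compara_db)
    return [db for db in core_dbs if release_of(db) != release]

if __name__ == "__main__":
    # Hypothetical database names, used only for illustration.
    cores = ["homo_sapiens_core_63_37", "mus_musculus_core_62_37o"]
    print(mismatched_cores("ensembl_compara_63", cores))
```

An empty list means every core database is on the same release as the Compara database.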

Sequence types and outputs
  Nucleotide sequence: pairwise alignments, multiple alignments, syntenic regions
  Protein sequence: families, protein trees, homologues

Pipelines and outputs for nucleotide sequence
  Pairwise alignments: Blastz, tBLAT
  Multiple alignments: Mercator & Pecan, Enredo-Pecan-Ortheus
  Syntenic regions: Blastz
  Slide annotations: mammals; vertebrates; more distant homologies

Generating multiple alignments
  We build homology maps for multiple alignments using:
  Mercator: a graph-based program that uses exon sequences as anchors. It does not allow for the alignment of duplicated regions in a genome.
  Enredo: also graph-based. It uses conserved regions from pairwise Blastz alignments of whole genomes as anchors. It does allow for the alignment of duplicated regions.
  Alignment is done using Pecan.
  Ancestral sequences are generated using Ortheus.

MSA in Compara (Ensembl 63)
  Mercator Pecan: 19-way amniota vertebrates
  EPO: 5-way fish, 12-way eutherian mammals, 3-way birds, 6-way primate
  EPO 2x: [...]-way eutherian mammals

Alignments are stored in the genomic_align and genomic_align_block tables
A small example:

  gorilla_gorilla/MT/   gacat-ttaactaaaac-ccc
  macaca_mulatta/MT/    aacatcttaactaaacg-ccc
  pan_troglodytes/MT/   gatac-ttaacttaaaccccc
  pongo_pygmaeus/MT/    actac-ctaactaaaac-ccc
  homo_sapiens/MT/      gacat-ttaactaaaac-ccc
                           *   ***** **   ***

This is stored as 1 genomic_align_block with 5 genomic_align entries. Each entry holds a cigar line; the sequences themselves come from core:

  GACATTTAACTAAAACCCC    5MD11MD3M
  AACATCTTAACTAAACGCCC   17MD3M
  GATACTTAACTTAAACCCCC   5MD15M
  ACTACCTAACTAAAACCCC    5MD11MD3M
  GACATTTAACTAAAACCCC    5MD11MD3M
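The link between the stored cigar lines and the gapped rows can be shown with a short Python sketch. It is illustrative only and is not the Compara API: the function name `expand_cigar` is hypothetical, and it assumes that `M` means aligned bases, `D` means gap columns, and a missing count means 1, which is what the example on this slide implies.

```python
import re

def expand_cigar(sequence, cigar_line):
    """Rebuild a gapped alignment row from an ungapped sequence and a
    cigar line such as '5MD11MD3M' (assumed: M = bases, D = gap)."""
    aligned = []
    pos = 0
    for count, op in re.findall(r"(\d*)([MD])", cigar_line):
        n = int(count) if count else 1  # a bare letter counts as 1
        if op == "M":
            aligned.append(sequence[pos:pos + n])
            pos += n
        else:
            aligned.append("-" * n)
    if pos != len(sequence):
        raise ValueError("cigar line does not consume the whole sequence")
    return "".join(aligned)

if __name__ == "__main__":
    rows = [
        ("GACATTTAACTAAAACCCC", "5MD11MD3M"),
        ("AACATCTTAACTAAACGCCC", "17MD3M"),
        ("GATACTTAACTTAAACCCCC", "5MD15M"),
    ]
    for seq, cigar in rows:
        print(expand_cigar(seq, cigar).lower())
```

Every row expands to the same 21 columns, which is why one genomic_align_block can describe all five genomic_align entries.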

Adding low-coverage (2X) genomes
  Low-coverage genomes cannot be fully assembled.
  The resulting assembly is too scattered to be used with Enredo.
  Run EPO on high-coverage genomes only.
  Map the 2X genomes using pairwise alignments.

GERP constrained elements
  Constrained elements are stretches of the alignment with high conservation.
  Constrained elements and coding exons:
    74% of coding exons are associated with constrained elements
    22% of constrained elements are associated with coding exons
  Cooper et al., Genome Research, 2005
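The two percentages differ because the comparison is asymmetric: each one divides by a different set of features. The Python sketch below shows one way such figures could be computed. It assumes that "associated with" means a plain coordinate overlap on one sequence, which the slide does not confirm. The function names and the coordinates are hypothetical and do not reproduce the published numbers.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def fraction_overlapping(features, others):
    """Fraction of `features` that overlap at least one interval in `others`."""
    if not features:
        return 0.0
    hits = sum(1 for f in features if any(overlaps(f, o) for o in others))
    return hits / len(features)

if __name__ == "__main__":
    # Made-up coordinates on a single sequence, for illustration only.
    exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
    elements = [(150, 160), (350, 450), (900, 950)]
    print(fraction_overlapping(exons, elements))   # exons hit by an element
    print(fraction_overlapping(elements, exons))   # elements hit by an exon
```

The quadratic scan is fine for a toy example; genome-scale data would need sorted intervals or an interval tree.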

ensembl-dev mailing list and HelpDesk
  The ensembl-dev mailing list is great for questions about the API and the databases.
  HelpDesk is very helpful.
  Give detailed information on what you are trying to do.
  Check that you have the modules installed ($PERL5LIB pointing to them).

Ensembl Compara Team: Javier, Kathryn, Matthieu, Leo, Stephen, Miguel