Published by Jason Hart. Modified over 8 years ago.
1
Web Databases for Drosophila
Introduction to FlyBase and Ensembl Database
Wilson Leung 6/06
2
Outline
Introduction to FlyBase
Introduction to Ensembl
Using web databases to assist annotation of novel sequences
3
Introduction to FlyBase
Available at http://www.flybase.org
4
Introduction to FlyBase
FlyBase is primarily funded by the National Institutes of Health
The FlyBase consortium includes Drosophila researchers and computer scientists at Harvard University, Indiana University, and the University of Cambridge, plus scientists worldwide
In addition to the main site at www.flybase.org, there are also many mirror sites
5
What is FlyBase?
It is a comprehensive database of genetic and molecular data for many Drosophila species:
Information on genes and mutant alleles
Expression and function of gene products
Genetic, cytological, and molecular map information
Data from the Berkeley Drosophila Genome Project
Data from the European Drosophila Genome Project
6
Introduction to Ensembl
Available at http://www.ensembl.org
7
What is Ensembl?
Ensembl is a joint project between the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute
Ensembl seeks to develop an automated system for the production and maintenance of annotations on eukaryotic genomes
These annotations should also be easily accessible to researchers
8
What is Ensembl?
While originally developed for eukaryotes, the Ensembl system has also been used to analyze prokaryotic genomes: EBI Genome Review (archaea and bacteria)
Most recent version is v38 (Apr 2006)
Genomes available include human, chimp, mouse, dog, C. elegans, fruit fly, honey bee, and mosquito, among others
9
Ensembl Gene Annotation System
All Ensembl gene predictions are based on experimental evidence
Predictions are based on the manually curated UniProt/Swiss-Prot/RefSeq databases
UTRs are annotated only if they are supported by EMBL mRNA records
Val Curwen et al. The Ensembl Automatic Gene Annotation System. Genome Res., May 2004; 14: 942-950.
10
Using Web Databases for Annotation
List of available species in the FlyBase BLAST service to use in a search for sequences homologous to your query
Exon View in Ensembl: used to obtain the sequence of a gene, exon by exon
11
Motivations for using FlyBase
Learn the biological functions of the gene of interest
Use the FlyBase BLAST service to detect sequence homology to Drosophila species or species related to Drosophila
Motivations for using Ensembl
Obtain records of a gene from multiple databases
Obtain the coding sequence of each exon of a gene
12
Walkthrough
A typical use of web databases is to identify a putative homolog of a D. melanogaster gene
We have a novel 20 kb sequence from D. erecta
Using RepeatMasker, we masked all Drosophila-specific repeats from the sequence
Using blastx, we searched this sequence against the Swiss-Prot database
The blastx results indicate that our sequence is similar to the Paired-box protein (Pax6) in D. melanogaster
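To make the blastx step concrete: blastx translates the nucleotide query in all six reading frames and compares each translation against a protein database. The sketch below illustrates only that translation step in plain Python. It assumes the standard genetic code and that masked bases appear as N (translated here as X). The function names are hypothetical, and this is not how BLAST itself is implemented.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; any codon containing N becomes X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frames("ATGGCC").items():
        print(frame, peptide)
```

A search tool then scores each of the six peptides against the protein database, which is why a blastx hit is reported with a reading frame.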
13
Function of Pax-6
Clicking on the accession number of the first hit in the blastx output shows that Pax-6 is also known as eyeless
We can learn more about eyeless using the FlyBase web site at http://flybase.org
Type eyeless in the search field, then click on the hit “ey” (#17)
14
Function of Pax-6
This brings up the gene report for eyeless in D. melanogaster
We find that eyeless is important for brain and eye development
It is expressed in embryo, larva, and adult
Phenotypic changes in mutants include changes in the antenna, arista, and eye of the fruit fly
15
Finding Homologs in Other Species
Click on the BLAST button to access the BLAST service
Search our masked sequence against the D. melanogaster, D. yakuba, D. mojavensis, and D. virilis genome assemblies using blastn
Most of the species, other than D. melanogaster, are unannotated. Nonetheless, this is useful for finding putative orthologs and for discovering regulatory regions using multiple sequence alignments
16
Using the Ensembl Database
Navigate to Ensembl at http://www.ensembl.org
Click on “Drosophila melanogaster” to access the data specific to this species
In the search box, type the name “eyeless”, then click “Go”
We find only one match: CG1464 (the eyeless protein)
17
Transcripts of eyeless
There are four different isoforms of eyeless in D. melanogaster
We would typically annotate the most “comprehensive” isoform (in this case, isoform D)
The Fruitfly GeneView provides a general overview of the gene structure and function of eyeless
Links to the FlyBase, RefSeq, Swiss-Prot, and EMBL records of eyeless are also available on this page
18
Obtaining Transcript Sequence
Click on “Exon Info” for the transcript CG1464-RD
This brings us to the exon report for this transcript: 9 exons, 3024 bp, 898 residues
The sequence is shown with each exon in its own block
The sequence is color-coded:
Purple = UTRs
Black = Coding DNA sequences (CDS)
Blue = Intronic sequences
Green = Upstream or downstream sequences
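As a quick sanity check on the numbers in the exon report, the relationship between transcript length and peptide length can be worked out directly. The sketch below assumes that 3024 bp is the length of the spliced transcript (UTRs included) and that the 898 residues exclude the stop codon. Neither assumption is stated on the slide, so treat the result as an estimate.

```python
# Values reported for transcript CG1464-RD in the exon report.
TRANSCRIPT_BP = 3024
PROTEIN_RESIDUES = 898

# Assumption: three bases per residue plus one 3-base stop codon.
coding_bp = PROTEIN_RESIDUES * 3 + 3

# Assumption: whatever is left in the spliced transcript is 5' plus 3' UTR.
utr_bp = TRANSCRIPT_BP - coding_bp

print(f"coding sequence: {coding_bp} bp")
print(f"untranslated sequence: {utr_bp} bp")
```

Under these assumptions, about 2697 bp of the transcript is coding and about 327 bp is UTR, which is consistent with the purple UTR blocks flanking the black CDS in the exon report.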
19
Obtaining Peptide Sequence
Click on the link “Protein Information” to obtain the peptide sequence of CG1464-RD
This brings us to the protein report for this transcript
The “Protein Family” section shows that there are six gene members in this species
Clicking on the link brings up the Family view, which allows visualization of multiple sequence alignments of members of this family
The peptide sequence has the following color code:
Black/Blue = Alternating text color for exons
Red = Residue overlap splice site
Green = Synonymous SNP
Yellow = Non-synonymous SNP
20
Next Step
Annotate the exact boundaries of each exon in our D. erecta sequence based on sequence homology to the D. melanogaster eyeless gene
Use an exon-by-exon BLAST search with BLAST 2 Sequences (bl2seq)
21
Questions?
22
Walk-through example
23
Determining Exon Boundaries
Use bl2seq to determine the exon boundaries of the putative ortholog in our D. erecta sequence
Go to www.ncbi.nlm.nih.gov/blast/ and select bl2seq
Copy the D. erecta sequence and paste it into the Sequence 1 box
Copy the first exon of D. melanogaster eyeless and paste it into the Sequence 2 box
Change the program to tblastx
Click “BLAST”
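The same pairwise comparison can also be scripted rather than run through the web form. The sketch below assumes a local installation of the NCBI BLAST+ command-line tools, where the `tblastx` program accepts a second sequence through `-subject`. The slides themselves use the bl2seq web page, and the file names here are hypothetical placeholders. The function only builds the command, so it can be inspected before anything is run.

```python
import shlex


def build_tblastx_command(query_fasta, subject_fasta, evalue=10.0):
    """Build a BLAST+ tblastx command comparing two FASTA files.

    query_fasta:   e.g. the masked D. erecta sequence (hypothetical file name)
    subject_fasta: e.g. one D. melanogaster eyeless exon (hypothetical file name)
    """
    return [
        "tblastx",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output, one alignment per line
    ]


if __name__ == "__main__":
    cmd = build_tblastx_command("erecta_masked.fasta", "ey_exon1.fasta")
    # Print the command; pass it to subprocess.run(cmd) only if BLAST+ is installed.
    print(shlex.join(cmd))
```

In tabular output the query start and end columns give the coordinates of the match in the D. erecta sequence, which is the information read off the web report in the next slide.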
24
Determining Exon Boundaries
We find that the first exon corresponds to bases 19307-19414 in our sequence
We can repeat the previous steps to locate the other exons in our sequence
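Once a coordinate range is in hand, it helps to check its length and pull out the corresponding bases. The sketch below assumes BLAST-style coordinates, which are 1-based and inclusive at both ends, and converts them to Python slicing. The helper names are hypothetical, and the short sequence in the demo is a stand-in rather than real D. erecta data.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def extract_span(sequence, start, end):
    """Return bases start..end (1-based, inclusive) from a sequence string."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


if __name__ == "__main__":
    # Range reported for the first exon in the walk-through.
    length = span_length(19307, 19414)
    print(length, "bp;", "whole codons" if length % 3 == 0 else "partial codon")

    # Toy stand-in sequence to show the slicing convention.
    print(extract_span("ACGTACGT", 3, 5))
```

The matched range spans 108 bases, a whole number of codons, as expected for a tblastx alignment. The aligned range only approximates the exon, so the exact boundaries still need to be confirmed against the splice sites in the sequence.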