Generic Database
What should a genome database do? Search; browse; collect; download results (multiple formats); genome browser; information (genomic, proteomic, literature); interact with other databases; generic; usable by everyone
GeneDB – An Overview
Aim: to provide a database to house the data from the many sequencing projects that the Sanger Institute has been involved in.
The database had to be:
  Generic: flexible enough to handle sequence from diverse organisms
  Curatable: capable of being manually edited by annotators and curators
  Intuitive and user friendly
  Capable of housing new data types: easily expandable
  Searchable: allow users complete flexibility in searching, selecting and downloading whatever information they want
  Interactive: community feedback
GeneDB November Datasets
Total number of organisms: 26. Number of protozoa: 12.
Kinetoplastids
Species | Genome size (kb) | Status | Curated
Leishmania major | 33600 | In finishing | Yes
Leishmania infantum | | k reads, 5 X | Yes
Trypanosoma b. brucei | 35000 | In finishing | Yes
Trypanosoma vivax | | k reads, ~6 X | Yes
Trypanosoma cruzi | ~41000 | In finishing, 19 X | No?
Leishmania braziliensis | | ~ k reads, 5 X | Yes
Trypanosoma congolense | | ~ k reads, ~5 X | Yes
Trypanosoma b. gambiense | | ~ k reads, ~5 X | Yes
a) Basic information – on the selected gene
b) Location – the chromosome number, coordinates, gene length and a graphical map
c) Curated and/or automatic annotation
d) Predicted peptide properties – statistics on the predicted protein, known or predicted domains and motifs
e) Gene Ontology – annotation using the GO controlled vocabulary
f) Database cross-references – links to other public databases
g) Curated orthologs – database links to manually selected orthologous genes
h) Similarity information and the respective database links
i) Swiss-Prot annotations – for this protein, and keywords
j) Contact – feedback forms for curators and technical queries
Orthologs and paralogs in GeneDB
  Tri-tryp orthologs: predicted by clustering and reciprocal BLAST
  Paralogs or families: predicted using BLASTP and TribeMCL; 4 BLAST E-value cutoffs
  TribeMCL: Enright A.J., Van Dongen S., Ouzounis C.A.; Nucleic Acids Res. 30(7) (2002)
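The reciprocal BLAST step can be illustrated with a short sketch. This is not GeneDB's pipeline (which also involves clustering); it only shows the reciprocal-best-hit idea, assuming each BLAST run has already been reduced to (query, subject, bit score) tuples. The gene IDs and scores below are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits between two genomes A and B.
a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 90.0),
          ("A2", "B2", 180.0), ("A3", "B1", 60.0)]
b_vs_a = [("B1", "A1", 245.0), ("B2", "A2", 175.0), ("B2", "A1", 88.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here A3 is not paired, because its best hit B1 points back to A1 rather than to A3.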
Help
Sequence viewer and annotation tool
How to access data: keyword searching; sequence searching / motif search; complex querying; browsable catalogues (product, domain); browsable contig/chromosome maps; GO (Gene Ontology) - AmiGO; across species
Searching GeneDB: simple query; sequence search analysis; browse catalogues
Chromosome/contig maps
OMNIBLAST
  Searches multiple datasets over multiple organisms
  Uses more than one BLAST algorithm if appropriate
  Produces an intermediate results page listing a summary of the top 5 hits of all searches
  If a protein sequence is used, also displays the predicted Pfam protein families found
  The full BLAST search result can be accessed from the intermediate page
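The intermediate "top 5 hits" summary can be sketched as follows. This is not OMNIBLAST's own code. It assumes each search has produced standard 12-column BLAST tabular output (query, subject, % identity, length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the dataset name and subject IDs are hypothetical.

```python
def top_hits(tabular_text, n=5):
    """Return the n best (subject, evalue, bit_score) hits, ranked by bit score."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append((fields[1], float(fields[10]), float(fields[11])))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits[:n]


def summarise(searches, n=5):
    """Build a per-search summary: search name -> top n hits."""
    return {name: top_hits(text, n) for name, text in searches.items()}


def _row(subject, evalue, bits):
    # Helper for the example only: one 12-column tabular line.
    return "\t".join(["query1", subject, "90.0", "100", "10", "0",
                      "1", "100", "1", "100", evalue, bits])


example = "\n".join([
    _row("hit_a", "1e-50", "200"), _row("hit_b", "1e-10", "80"),
    _row("hit_c", "1e-70", "260"), _row("hit_d", "0.001", "40"),
    _row("hit_e", "1e-30", "150"), _row("hit_f", "1e-20", "110"),
])
summary = summarise({"dataset_1_blastp": example})
for name, hits in summary.items():
    print(name, [subject for subject, _, _ in hits])
```

Ranking by bit score rather than E-value is a choice made for this sketch; either ordering could serve for a summary page.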
Complex querying
Complex querying with boolean search tool
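Boolean combination of query results can be pictured as set operations on lists of gene IDs. The sketch below is only an illustration of that idea, not the GeneDB query tool itself; the gene IDs, the result sets and the `combine` function are hypothetical.

```python
def combine(op, left, right):
    """Combine two sets of gene IDs with a boolean operator."""
    if op == "AND":
        return left & right   # genes found by both queries
    if op == "OR":
        return left | right   # genes found by either query
    if op == "NOT":
        return left - right   # genes in the first query but not the second
    raise ValueError(f"unknown operator: {op}")


# Hypothetical results of three simple queries.
by_product = {"geneA", "geneB", "geneC"}   # e.g. product name matches
by_domain = {"geneB", "geneC", "geneD"}    # e.g. has a given protein domain
on_chr1 = {"geneC", "geneE"}               # e.g. located on chromosome 1

# Keep a history of executed queries so earlier results can be reused.
history = []
step1 = combine("AND", by_product, by_domain)
history.append(("by_product AND by_domain", step1))
step2 = combine("NOT", step1, on_chr1)
history.append(("step1 NOT on_chr1", step2))

for label, result in history:
    print(label, sorted(result))
```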
Cross-species search for nucleoside transporter: by name or ID; by product; by protein domain
AmiGO – local Gene Ontology (GO) browser
Proteomics tool: select the dataset; select the cleavage enzyme (protease); enter peptide mass data
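A peptide mass search of this kind generally compares the entered masses with masses predicted from an in-silico digest of each protein in the dataset. The sketch below shows that idea under stated assumptions: trypsin-like cleavage (after K or R, but not before P), monoisotopic residue masses, and a fixed mass tolerance. It is not the GeneDB tool's implementation, and the example sequence is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def match_peptides(protein, observed_masses, tol=0.5):
    """Return (peptide, mass) pairs within `tol` Da of any observed mass."""
    matches = []
    for peptide in tryptic_digest(protein):
        mass = peptide_mass(peptide)
        if any(abs(mass - observed) <= tol for observed in observed_masses):
            matches.append((peptide, round(mass, 4)))
    return matches


matches = match_peptides("MKRPAK", observed_masses=[277.1])
print(tryptic_digest("MKRPAK"), matches)
```

A real search would also need to handle missed cleavages, modifications and charge states, which this sketch leaves out.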
Protein motif search
Data downloads: any search result that gives a list; history of any boolean queries
Contiguous sequence: generate a download list by adding to the gene basket
Leishmania major stats; Trypanosoma brucei stats
Gene Naming
More information: GeneDB reference guide; papers: Trends in Parasitology, (10) and the January 2004 issue of Nucleic Acids Research; feedback forms for technical and biological queries