Published by Noel Matthew Douglas. Modified over 9 years ago.
1
Generic Database
2
What should a genome database do?
- Search
- Browse
- Collect
- Download results in multiple formats
- Genome browser
- Information: genomic, proteomic, literature
- Interact with other databases
- Generic
- Usable by everyone
3
GeneDB – An Overview
Aim – To provide a database to house the data from the many sequencing projects that the Sanger Institute has been involved in.
The database had to be:
- Generic: flexible enough to handle sequence from diverse organisms
- Curatable: capable of being manually edited by annotators and curators
- Intuitive and user friendly
- Capable of housing new data types: easily expandable
- Searchable: allow users complete flexibility in searching, selecting and downloading whatever information they want
- Interactive: community feedback
4
GeneDB November 2004 - Datasets (www.genedb.org)
Total number of organisms: 26
Number of protozoa: 12

Kinetoplastids

Species                   Genome size   Status                 Curated
Leishmania major          33600         In finishing           Yes
Leishmania infantum       33600         280k reads, 5X         Yes
Trypanosoma b. brucei     35000         In finishing           Yes
Trypanosoma vivax         30000         300k reads, ~6X        Yes
Trypanosoma cruzi         ~41000        In finishing, 19X      No?
Leishmania braziliensis   ~33600        361k reads, 5X         Yes
Trypanosoma congolense    ~30000        262k reads, ~5X        Yes
Trypanosoma b. gambiense  ~30000        188k reads, ~5X        Yes
5
www.genedb.org
7
a) Basic information – on the selected gene
b) Location – the chromosome number, coordinates, gene length and a graphical map
c) Curated and/or automatic annotation
d) Predicted peptide properties – statistics on the predicted protein, known or predicted domains and motifs
8
e) Gene Ontology – annotation using the GO controlled vocabulary
f) Database cross references – linked to other public databases
g) Curated orthologs – database links to manually selected orthologous genes
h) Similarity information and the respective database links
i) Swiss-Prot annotations – for this protein, and keywords
j) Contact – feedback forms for curators and technical queries
9
Orthologs and Paralogs in GeneDB
- Tri-tryp orthologs: predicted by clustering and reciprocal BLAST
- Paralogs or families: predicted using BLASTP and TribeMCL, with 4 BLAST E-value cutoffs
TribeMCL: Enright A.J., Van Dongen S., Ouzounis C.A.; Nucleic Acids Res. 30(7):1575-1584 (2002)
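The reciprocal BLAST step mentioned on this slide can be illustrated with a short sketch. It assumes the BLAST results have already been reduced to (query, subject, E-value) rows; the gene identifiers and the function names `best_hits` and `reciprocal_best_hits` are hypothetical, and this is not GeneDB's actual pipeline, which also involves clustering.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, float]  # (query id, subject id, E-value)


def best_hits(rows: Iterable[Row]) -> Dict[str, str]:
    """Map each query to the subject with the lowest E-value."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in rows:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Row], b_vs_a: Iterable[Row]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical identifiers, for illustration only.
a_vs_b = [("a1", "b1", 1e-50), ("a1", "b2", 1e-10), ("a2", "b2", 1e-30)]
b_vs_a = [("b1", "a1", 1e-48), ("b2", "a1", 1e-12), ("b2", "a2", 1e-5)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here only a1 and b1 pick each other as best hit, so they form the single candidate ortholog pair; a2 is dropped because b2's best hit is a1.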
10
Help
13
(http://godatabase.org/cgi-bin/go.cgi?query=GO%3A0006166)
14
Sequence viewer and annotation tool
16
How to access data:
- keyword searching
- sequence searching / motif search
- complex querying
- browsable catalogues: product, domain
- browsable contig/chromosome maps
- GO (Gene Ontology) - AmiGO across species
18
Searching GeneDB
- Simple query
- Sequence search analysis
- Browse catalogues
19
Chromosome/contig maps
20
OMNIBLAST
- Searches multiple datasets over multiple organisms
- Uses more than one BLAST algorithm if appropriate
- Produces an intermediate results page, listing a summary of the top 5 hits of all searches
- If a protein sequence is used, also displays the predicted Pfam protein families found
- The full BLAST search result can be accessed from the intermediate page
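The intermediate OMNIBLAST results page can be pictured with a small sketch. It assumes that "top 5 hits" means the five lowest E-values within each individual search, which the slide does not state; the `Hit` type, the `summarise` function and the dataset labels are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    subject: str
    evalue: float


def summarise(results: Dict[str, List[Hit]], top_n: int = 5) -> Dict[str, List[Hit]]:
    """Keep the top_n hits (lowest E-value first) for every search."""
    return {
        search: sorted(hits, key=lambda hit: hit.evalue)[:top_n]
        for search, hits in results.items()
    }


# Hypothetical results from two searches, keyed by "dataset / program".
results = {
    "dataset_1 / blastp": [Hit(f"gene{i}", 10.0 ** -i) for i in range(1, 8)],
    "dataset_2 / tblastn": [Hit("contig3", 2e-4)],
}
summary = summarise(results)
```

A full report for each search would then be linked from its summary entry, mirroring the intermediate page described above.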
23
Complex querying
24
Complex querying with boolean search tool
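One way to think about a Boolean search tool is as set operations over the gene lists returned by simpler queries. The sketch below makes that assumption; the operator names, the `boolean_query` function and the gene identifiers are hypothetical and do not describe GeneDB's interface.

```python
from typing import Iterable, Set


def boolean_query(left: Iterable[str], op: str, right: Iterable[str]) -> Set[str]:
    """Combine two result lists with AND, OR or NOT (left but not right)."""
    operations = {
        "AND": set.intersection,
        "OR": set.union,
        "NOT": set.difference,
    }
    if op not in operations:
        raise ValueError(f"unsupported operator: {op}")
    return operations[op](set(left), set(right))


# Hypothetical result lists from two simple queries.
by_product = {"geneA", "geneB", "geneC"}   # e.g. product contains "transporter"
by_domain = {"geneB", "geneC", "geneD"}    # e.g. has a given protein domain

both = boolean_query(by_product, "AND", by_domain)
either = boolean_query(by_product, "OR", by_domain)
product_only = boolean_query(by_product, "NOT", by_domain)
```

Because each result is itself a set, it can be fed back in as the input to a further query, which is how longer expressions can be built up step by step.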
25
Cross species search for nucleoside transporter
- By name or ID
- By product
- By protein domain
26
AmiGO – local Gene Ontology (GO) browser
28
Proteomics Tool
- Select the dataset
- Select restriction enzyme
- Enter peptide mass data
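The slide does not say how entered masses are matched against a dataset. As a rough sketch of the general idea, the code below digests a protein in silico and compares peptide masses with observed values. It assumes a trypsin-like rule (cut after K or R unless the next residue is P), monoisotopic residue masses, and a fixed tolerance; all function names are hypothetical.

```python
from typing import List, Tuple

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def digest(protein: str) -> List[str]:
    """Cut after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def match_masses(protein: str, observed: List[float], tolerance: float = 0.5) -> List[Tuple[str, float]]:
    """Pair each predicted peptide with any observed mass within the tolerance."""
    return [
        (peptide, mass)
        for peptide in digest(protein)
        for mass in observed
        if abs(peptide_mass(peptide) - mass) <= tolerance
    ]


matches = match_masses("AKRPGK", [217.14])
```

A real tool would also need to handle missed cleavages, modifications and the choice of enzyme, none of which are covered here.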
29
Protein motif search
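A motif search of this kind can be sketched as a regular-expression scan over protein sequences. The example assumes the motif is supplied as a Python regular expression; the `find_motif` function, the gene identifiers and the sequences are hypothetical, and the slide does not describe the pattern syntax GeneDB accepts.

```python
import re
from typing import Dict, List, Tuple


def find_motif(sequences: Dict[str, str], pattern: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return 1-based start positions and matched text for every sequence with a hit.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for gene_id, sequence in sequences.items():
        found = [(match.start() + 1, match.group(1)) for match in regex.finditer(sequence)]
        if found:
            hits[gene_id] = found
    return hits


# Hypothetical sequences; the pattern is the N-glycosylation consensus N-{P}-[ST]-{P}.
proteins = {"gene1": "MNGSAANPTK", "gene2": "MKKK"}
motif_hits = find_motif(proteins, "N[^P][ST][^P]")
```

In gene1 the match starts at residue 2; the second N is followed by P and is therefore skipped, and gene2 has no hit.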
31
Data downloads
- Any search result that gives a list
- History of any boolean queries
32
- Contiguous sequence
- Generate download list by adding to gene basket
33
Leishmania major stats
Trypanosoma brucei stats
34
Gene Naming
36
GeneDB reference guide
Papers:
- Trends in Parasitology, 2002, 18(10): 465-67
- Nucleic Acids Research, January 2004 issue
Feedback forms for technical and biological queries
More information: http://www.genedb.org/