1
EMBL Outstation — The European Bioinformatics Institute The EMBL Database Helen Parkinson Nottingham University 2001
2
EMBL Nucleotide Sequence Database
EBI, Wellcome Genome Campus, Hinxton, Cambridge, UK
3
The European Bioinformatics Institute: Databases
www.ebi.ac.uk
• EMBL Nucleotide Sequence Database
• Protein Databases (SWISS-PROT & TREMBL)
• Molecular Structure Database (EBI-MSD)
• Radiation Hybrid Database (RhDB)
• Immunogenetics Database (IMGT)
• Ensembl
plus >70 additional specialized databases on EBI’s FTP server: ftp://ftp.ebi.ac.uk/pub/databases
4
The European Bioinformatics Institute: Services and Research
www.ebi.ac.uk
• SRS, Sequence Retrieval System
• Research into complete genomes
• Research into protein sequence analysis and structure prediction
• Microarray research and new database
• Industry Programme
http://www.ebi.ac.uk/Groups/index.html
5
The European Bioinformatics Institute: EMBL Database Curation Activities
www.ebi.ac.uk
• Biological and bioinformatic support for users
• Biological annotation and provision of accession numbers
• Development of features and qualifiers for new datatypes
• Updating and correction of entries (cleanup)
• Development and testing of new database tools
• Liaison with collaborating databases to maintain synchrony
6
International Nucleotide Sequence Databases: DDBJ/EMBL/GenBank
[Diagram: Genome Data, Direct Submissions and Patent Literature feed the three collaborating databases: the EMBL Nucleotide Sequence Database (EBI), GenBank (NCBI) and the DNA Databank of Japan (NIG).]
8
Sequence Data from Patent Literature
[Diagram: patent offices (EPO, USPTO, JPO) supplying GenBank, EMBL (EMBL Nucleotide Sequence Database) and DDBJ (DNA Databank of Japan).]
Release 64 (Sep 2000): entries: 207,677; bases: 67,411,887
9
Direct Submissions
• mandatory submission policy
• 60 - 70 % to be held confidential
• 3009 - 4000 direct subs/month
• exponential growth in submissions
[Diagram: Researcher, Journal and Database Curator, with the labels Sequence submission, Accession number, Manuscript, Accession # and Publication.]
10
Direct Submissions Dataflow
[Diagram: Researcher, Journal and EMBL Curator, with the labels Sequence Submission, Data Submission, Publication, Webin ID, Acc. #, EMBL-NEW, Database and Manuscript.]
11
WWW Submission System
www.ebi.ac.uk/submissions/webin.html
1. Submitter details
2. Sequence and description
3. Source information
4. Citation information
5. Feature information (coding regions, signals, etc.)
6. Final validation and submission
Also on the page: Vector Screening Service, context-sensitive ‘Help’, Bulk Alignments
12
Webin Sequence Features Central Page
14
SEQUIN Submission System
• multi-platform (Mac/PC/Unix) stand-alone software tool
• allows submissions to EMBL, GenBank and DDBJ
• Available from EBI in ftp://ftp.ebi.ac.uk/pub/software/sequin/:
  – SEQUIN program
  – detailed downloading and installation instructions
  – general information
15
Direct Submissions Dataflow (Research Institute and EMBL-EBI)
[Diagram labels: Submission (WWW, email, post); preview; Rejection (e.g. additional information required); ** ERROR **; annotation (example: Gene=abcD, Product=enzyme X, Author(s)=A. Smith, Publication=Nature, Status=in press); EMBL database; accession number notification (e-mail) (example accession numbers Y98000 to Y98005); error report form; update request; correction; update.]
16
Genome data acquisition
• Submission through genome project accounts
• Retrieval of unfinished sequence from ftp server
• Exchange with DDBJ and GenBank
Peter Sterk, EMBL Hinxton Outstation, the European Bioinformatics Institute, March 1999
17
Genome Projects Dataflow
Peter Sterk, EMBL Hinxton Outstation, the European Bioinformatics Institute, October 1998
18
Human Draft Genome
Ensembl http://www.ensembl.org
  – provides automatic annotation of the human draft genome
  – data includes confirmed peptides & cDNAs, predicted peptides & repeats, map & SNPs
Genome MOT http://www.ebi.ac.uk/Databases/Genome_MOT/genome_mot.html
  – presents status of a number of large eukaryotic genome sequencing projects
  – provides access to individual EMBL database entries
  – updated daily
EMBL Release ftp://ftp.ebi.ac.uk/pub/databases/embl/release/
  – draft sequence data included in EMBL Database HTG and HUM divisions
20
Monitoring the progress of major genome projects: the Genome MOT
• Collaboration with Sanger Centre
• Updated weekly
• Data source: EMBL database
http://www.ebi.ac.uk/Databases/Genome_MOT/genome_mot.html
Curr. Opin. Biotechnol. 9:116-120(1998)
21
Calculation of the Genome MOT
• Finished sequences, present in EMBL database
• Genomic DNA (no RNAs, cDNAs, ESTs, STSs)
• H. sapiens, C. elegans, A. thaliana, M. musculus, S. pombe, broken down according to chromosome
• Redundancies taken into account
  – Cut-off: 1000 bp
  – Redundancies estimated with CLEANUP, Grillo et al., CABIOS 12:1-8(1996)
Peter Sterk, EMBL Hinxton Outstation, the European Bioinformatics Institute, March 1999
23
Sanger Centre Sequencing Projects
Human Genome Project: Chromosomes 1, 6, 9, 10, 11, 13, 20, 22, X
Worm: Caenorhabditis elegans
Yeast: Schizosaccharomyces pombe, Candida albicans
Microbial Genomes: Bordetella parapertussis, Bordetella pertussis, Burkholderia pseudomallei, Campylobacter jejuni, Clostridium difficile, Corynebacterium diphtheriae, Mycobacterium bovis, Mycobacterium leprae, Mycobacterium tuberculosis, Neisseria meningitidis, Salmonella typhi, Staphylococcus aureus, Streptococcus pyogenes, Streptomyces coelicolor, Yersinia pestis
Protozoa: Dictyostelium discoideum, Leishmania major, Plasmodium falciparum, Trypanosoma brucei
Fly: Drosophila melanogaster
Non-human Vertebrates: Mus musculus (mouse), Gallus gallus (chicken)
24
Selected other Genome Projects
• Mouse chr. X, Oxford/HGMP-RC, UK
• Arabidopsis thaliana (ESSA), European consortium/MIPS, Germany
• Homo sapiens, GBF, Germany
• European Drosophila Mapping Consortium, UK
• Anopheles gambiae, Pasteur, France
• Miscellaneous microbial genomes, Pasteur, France
• Homo sapiens EST, MIPS, Germany
• Mouse EST Project, GENOSCOPE, France
• Homo sapiens EST, Padova, Italy
in progress: prokaryotic > 100; eukaryotic > 80
25
Homo sapiens
26
Data Management and Curation
• Accession Numbers
  – Examples: X46455, AJ343321
• Sequence Identifiers
  – nucleotide sequence identifier. Example: SV AJ400848.1
  – protein sequence identifier. Example: /protein_id="CAB88705.1"
• Data confidentiality and release dates
• Integration with external databases
  – Database cross-references (/db_xref) to: TrEMBL, SWISS-PROT, MaizeDB, FLYBASE, IMGT, MENDEL, MGD, TRANSFAC, SGD, EPD
  – Total # of links: >2.8 million
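The sequence identifiers on this slide pair an accession number with a version suffix after a dot. As a minimal sketch of that reading, the snippet below splits the two slide examples; the function name `split_sequence_id` is hypothetical and not part of any EBI tool.

```python
from typing import Tuple


def split_sequence_id(identifier: str) -> Tuple[str, int]:
    """Split 'ACCESSION.VERSION' into the accession and an integer version."""
    accession, dot, version = identifier.strip().rpartition(".")
    if not dot or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return accession, int(version)


# The nucleotide and protein identifier examples quoted on the slide.
parsed = [split_sequence_id(s) for s in ("AJ400848.1", "CAB88705.1")]
```

A bare accession number such as X46455 carries no version suffix, so this sketch rejects it with a `ValueError` rather than guessing a version.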
27
Interoperability
[Diagram: the EMBL Nucleotide Sequence Database and SWISS-PROT + TrEMBL, linked to the following databases.]
• EPD: Eukaryotic Promoters
• Flybase: D. melanogaster
• SubtiList: B. subtilis
• MaizeDB: Zea mays
• WormPep: C. elegans
• REBASE: Restriction enzymes
• StyGene: Salmonella typhimurium
• Transfac: Transcription factors
• MSD: 3D Structures
• ECDC: E. coli map
• GCRDb: G-coupled Receptors
• EcoGene: E. coli
• SGD: Yeast
• DictyDB: Dictyostelium discoideum
• ENZYME: Enzyme Nomenclature
• OMIM: Human
• ECO2DBASE (2D)
• Maize-2DPAGE (2D)
• Aarhus/Ghent (2D)
• YPD: Yeast
• HSSP: 3D Similarities
• Harefield (2D)
• Prosite: Patterns & Profiles
28
Data Management and Curation
• Hardware
  – VMS
  – UNIX: Digital UNIX, 2 Alphaservers 8400 (12/4 CPUs)
  – network of PCs
• Relational Database Management System (ORACLE)
• Database Schema facilitating integration and interoperability with other databases
29
Biological Data Curation
• annotation of new submission data
• quality control: sequence, FT (feature table), proofreading
• creation of database entries
• updates / corrections
• curation of Genome Project Data
• curation of data classes (e.g. immunoglobulins, TCR, etc.)
• classification of species in collaboration with taxonomy at NCBI
• production of annotation examples
• development and testing of submission tools
• writing submitter documentation
• liaison with linked databases
• liaison with genome projects
30
Database Entry Structure
• description
• taxonomic source information
• submitter info
• reference citation
• biological features
• sequence
31
Description / Source
ID   BMGLUCKIN standard; DNA; PRO; 1362 BP.
XX
AC   AJ000005;
XX
SV   AJ000005.1
XX
DT   22-JUL-1997 (Rel. 52, Created)
DT   17-JAN-1998 (Rel. 54, Last updated, Version 2)
XX
DE   Bacillus megaterium glk gene
XX
KW   glk gene; glucose kinase.
XX
OS   Bacillus megaterium
OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
OC   Bacillaceae; Bacillus.
XX
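Each line of the entry above starts with a two-letter line code (ID, AC, SV, OS, OC, ...), and XX lines act as spacers. A minimal Python sketch of reading such a header follows; `parse_header` is a hypothetical helper, and the exact spacing after each line code is an assumption, since the slide text does not preserve columns.

```python
from collections import defaultdict

# Header lines copied from the slide (DT and KW omitted for brevity).
# The spacing after each line code is assumed, not taken from the slide.
ENTRY = """\
ID   BMGLUCKIN standard; DNA; PRO; 1362 BP.
XX
AC   AJ000005;
XX
SV   AJ000005.1
XX
DE   Bacillus megaterium glk gene
XX
OS   Bacillus megaterium
OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
OC   Bacillaceae; Bacillus.
XX
"""


def parse_header(text):
    """Group line contents by two-letter line code, skipping XX spacers."""
    fields = defaultdict(list)
    for line in text.splitlines():
        code, value = line[:2], line[2:].strip()
        if code and code != "XX":
            fields[code].append(value)
    return dict(fields)


fields = parse_header(ENTRY)
accession = fields["AC"][0].rstrip(";")
# The sequence length is the number before "BP." in the last ID field.
length_bp = int(fields["ID"][0].split(";")[-1].split()[0])
# OC continues over several lines; join them before splitting on ";".
lineage = [t.strip() for t in " ".join(fields["OC"]).rstrip(".").split(";") if t.strip()]
```

Keeping repeated codes as lists lets continuation lines such as OC be rejoined afterwards instead of being overwritten.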
32
Submitter Reference / Citation
XX
RN   [1]
RP   1-1362
RA   Spaeth C.;
RT   ;
RL   Submitted (01-JUL-1997) to the EMBL/GenBank/DDBJ
RL   databases.
RL   Spaeth C., Institut fuer Mikrobiologie, Biochemie und
RL   Genetik, Lehrstuhl fuer Mikrobiologie, Staudtstr. 5,
RL   91058 Erlangen, GERMANY.
XX
RN   [2]
RA   Spaeth C., Kraus A., Hillen W.;
RT   "Contribution of glucose kinase to glucose repression
RT   of xylose utilization in Bacillus megaterium";
RL   J. Bacteriol. 179:7603-7605(1997).
XX
33
Feature Table
FH   Key             Location/Qualifiers
FH
FT   source          1..1362
FT                   /organism="Bacillus megaterium"
FT                   /db_xref="taxon:1404"
FT                   /sequenced_mol="DNA"
FT   CDS             270..1244
FT                   /codon_start=1
FT                   /db_xref="SPTREMBL:O31392"
FT                   /evidence=EXPERIMENTAL
FT                   /transl_table=11
FT                   /gene="glk"
FT                   /product="glucose kinase"
FT                   /EC_number="2.7.1.2"
FT                   /protein_id="CAA03848.1"
FT                   /translation="MNMDDKWLVGVDLGGTTIKMAF...
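The CDS location 270..1244 can be checked against the 324 AA protein in the linked TrEMBL entry with a little arithmetic. The sketch below is illustrative only: `span_length` is a hypothetical helper that handles plain `start..end` locations (no joins or complements), and it assumes the CDS span includes the stop codon.

```python
def span_length(location: str) -> int:
    """Length in bases of a simple 'start..end' location (1-based, inclusive)."""
    start, end = (int(part) for part in location.split(".."))
    if start > end:
        raise ValueError(f"start after end in location: {location!r}")
    return end - start + 1


cds_bp = span_length("270..1244")
codons, leftover = divmod(cds_bp, 3)
# Assumption: the final codon of the span is the stop codon,
# so it does not contribute a residue to the protein.
residues = codons - 1
```

Under that assumption the span is 975 bp, or 325 codons, which leaves 324 residues and agrees with the 324 AA stated in the TrEMBL entry.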
34
TREMBL entry
ID   O31392 PRELIMINARY; PRT; 324 AA.
AC   O31392;
DT   01-JAN-1998 (TREMBLREL. 05, CREATED)
DT   01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT   01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE   GLUCOSE KINASE (EC 2.7.1.2).
GN   GLK.
OS   BACILLUS MEGATERIUM.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
RN   [1]
RX   MEDLINE; 98053881.
RA   SPAETH C., KRAUS A., HILLEN W.;
RT   "Contribution of glucose kinase to glucose repression of xylose
RT   utilization in Bacillus megaterium.";
RL   J. Bacteriol. 179:7603-7605(1997).
DR   EMBL; AJ000005; CAA03848.1; -.
DR   PFAM; PF00480; ROK; 1.
DR   PROSITE; PS01125; ROK; 1.
KW   Transferase.
SQ   SEQUENCE 324 AA; 33899 MW; 665E9F19 CRC32;
     MNMDDKWLVG VDLGGTTIKM AFINHYGEII HKWEINTDVS EQGRKIPTDI AKAIDKKLND
     LGEVKSRLVG IGIGAPGPVN FANGSIEVAV NLGWEKFPIK DILEVETSLP VVVDNDANIA
     AIGEMWKGAG DGAKDLLCVT LGTGVGGGVI ANGEIVQGVN GAAGEIGHIT SIPEGGAPCN
     CGKTGCLETI ASATGIVRLT MEELTETDKP SELRTVLEQN GQVTSKDVFD AARSKDGLAM
     HVVDKVAFHL GLALANSANA LNPEKIVLGG GVSRAGEVLL APVRDYFKRF AFPRVAQGAE
     LAIATLGNDA GIIGGAWLVK SYFE
//
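The DR lines carry the cross-references that tie this protein entry back to the nucleotide entry AJ000005 and its protein_id CAA03848.1. A minimal sketch of reading them, assuming each DR line is a semicolon-separated list ending in a full stop; `parse_dr` is a hypothetical helper name.

```python
# DR lines copied from the slide; spacing after the line code is assumed.
DR_LINES = [
    "DR   EMBL; AJ000005; CAA03848.1; -.",
    "DR   PFAM; PF00480; ROK; 1.",
    "DR   PROSITE; PS01125; ROK; 1.",
]


def parse_dr(line):
    """Return (database name, remaining fields) for one DR line."""
    if not line.startswith("DR"):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[2:].strip().rstrip(".")
    parts = [part.strip() for part in body.split(";")]
    return parts[0], parts[1:]


xrefs = dict(parse_dr(line) for line in DR_LINES)
embl_accession, embl_protein_id = xrefs["EMBL"][0], xrefs["EMBL"][1]
```

Only the trailing full stop is stripped, so the dot inside CAA03848.1 survives as part of the identifier.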
35
Sequence
SQ   Sequence 1362 BP; 446 A; 211 C; 325 G; 380 T; 0 other;
     TGACACTTTG AGTTATCTCA AAATAAATGA ATCATATCAC CTAAAAAATA GGATAAAGGT 60
     GACAGAATGA ATACGTTATA TGATGTACAA CAATTATTAA AGTCCTTCGG CATTTTTATA 120
     TACGTGGGCG ATCGTATTGC TGATTTAGAG CTGATGGAAG CGGAAGTAAA AGAGTTATAT 180
     CAGTCTAACT TGATTGATGT ACGTGATTAC CAAATGGCAA TTCTTTTGCT TCGTCGAGAG 240
     TTAAAACAAC AAAAAGAGAA AAAGGATGAA TGAACATGGA TGACAAATGG TTAGTTGGAG 300
     TTGATTTAGG CGGTACAACA ATAAAAATGG CCTTTATTAA TCATTATGGA GAAATCATTC 360
     ACAAGTGGGA AATTAATACG GATGTGAGCG AGCAAGGCCG TAAAATTCCA ACGGATATTG 420
     CAAAAGCAAT TGATAAAAAG TTAAACGATC TTGGAGAAGT AAAATCAAGG TTAGTAGGAA 480
     TTGGCATTGG TGCACCGGGG CCTGTCAACT TTGCAAACGG TTCGATTGAA GTAGCTGTCA 540
     ATTTAGGTTG GGAAAAATTC CCTATAAAAG ATATCTTGGA AGTAGAAACT TCTCTTCCTG 600
     TTGTAGTAGA CAATGATGCA AACATTGCAG CGATTGGAGA AATGTGGAAG GGTGCTGGAG 660
     ACGGAGCAAA AGATTTACTT TGCGTTACGC TTGGCACAGG CGTTGGCGGT GGCGTCATTG 720
     CAAACGGTGA AATTGTACAA GGCGTAAATG GAGCCGCTGG TGAGATCGGG CACATTACTT 780
     CTATTCCTGA AGGCGGGGCA CCGTGTAACT GCGGTAAAAC CGGCTGTTTA GAAACCATTG 840
     CTTCAGCAAC TGGAATTGTA CGTTTAACAA TGGAAGAATT AACGGAAACG GACAAACCAA 900
     GTGAGCTTCG CACAGTGTTA GAACAAAATG GACAAGTTAC ATCTAAAGAT GTATTTGATG 960
     CAGCTCGTTC AAAAGACGGG TTAGCTATGC ATGTTGTAGA TAAAGTTGCT TTTCATTTAG 1020
     GTCTAGCACT AGCAAACTCT GCTAATGCAT TAAACCCTGA GAAGATCGTT CTAGGCGGCG 1080
     GTGTGTCTCG TGCAGGCGAG GTATTACTTG CACCGGTAAG AGATTATTTC AAACGTTTTG 1140
     CATTTCCTCG CGTAGCGCAA GGTGCTGAAC TAGCAATCGC TACTTTAGGA AACGATGCGG 1200
     GAATTATTGG AGGAGCTTGG TTAGTTAAAT CTTATTTTGA ATAATAAGCA AGAATCTAAC 1260
     TGAGATAAAA AAGCGCTTTG ACATTTAGTC AAAGCGCTTT TTTATCATGC ATCTTTTCAA 1320
     TCTTTACATA TACATAGTGT AAAGGAGTGA AGATTATGCA AA 1362
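The SQ line states the total length and the per-base counts, so one simple consistency check is that the counts add up to the total. The sketch below does that and also strips the spacing and position counter from one sequence line; both helper names are hypothetical, and the regular expressions assume the SQ layout shown on the slide.

```python
import re

SQ_HEADER = "SQ   Sequence 1362 BP; 446 A; 211 C; 325 G; 380 T; 0 other;"
FIRST_LINE = "     TGACACTTTG AGTTATCTCA AAATAAATGA ATCATATCAC CTAAAAAATA GGATAAAGGT 60"


def parse_sq_header(line):
    """Return (total length, per-base counts) from an SQ header line."""
    total = int(re.search(r"(\d+)\s+BP;", line).group(1))
    counts = {
        base: int(number)
        for number, base in re.findall(r"(\d+)\s+(A|C|G|T|other);", line)
    }
    return total, counts


def strip_sequence_line(line):
    """Drop blanks and the trailing position counter, keeping only bases."""
    return "".join(ch for ch in line if ch.isalpha()).upper()


total, counts = parse_sq_header(SQ_HEADER)
counts_match_total = sum(counts.values()) == total
first_bases = strip_sequence_line(FIRST_LINE)
```

The same stripping step applied to every sequence line would allow the bases to be counted directly and compared with the header, which is not attempted here.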
36
Database Divisions
37
Data Distribution
[Diagram labels: GenBank, DDBJ, EMBnet nodes, other mirrors]
Peter Sterk, EMBL Hinxton Outstation, the European Bioinformatics Institute, October 1998
38
Data Access
Network services (www, e-mail, ftp)
• Access to the most up-to-date data collection via Internet and WWW
• Sequence Retrieval System (SRS): Network Browser for Databanks in Molecular Biology, integrating and linking the main nucleotide and protein databases plus many specialized databases
CD-ROM
• Database releases are produced quarterly and distributed on CD-ROM.
39
Accessing Genome Data
• Completed Genomes Webserver http://www.ebi.ac.uk/genomes/
• High-Throughput Genome Sequences (HTG phases 0 - 3) ftp://ftp.ebi.ac.uk/pub/databases/embl/release
• Ensembl http://www.ensembl.org/
• Genome MOT http://www.ebi.ac.uk/Databases/Genome_MOT/
• CON(struct) division ftp://ftp.ebi.ac.uk/pub/databases/genomes
40
Database searching: fasta, blast, blitz
For sequence similarity searching, a variety of tools are available for external users to compare their own sequences against the most current data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
Sequence Analysis
• clustalw: multiple sequence alignment and inference of phylogenies
• genemark: gene prediction
41
Uses of nucleotide and derivative protein databases
• discovery of novel genes
• identification of homologous genes and additional members of gene families
• analysis of alternative splicing
• chromosomal localisation of genes
• detection of polymorphisms (SNPs)
• comparative genomics
  – molecular evolution
  – comparisons of human/mouse DNA to identify genes unique to one or more complex organisms
  – humans/fruit fly/nematodes to identify genes essential for all multicellular organisms
  – human/yeast DNA to identify genes related to functions essential for all eukaryotic cells
• regulation of gene expression
  – role of vast majority of DNA ("junk" DNA), e.g. introns with role in the function of transfer RNA critical to protein synthesis, twintrons, etc.
  – multiple genes (exact number of genes/interaction?)
42
Release Production
43
Database Growth
status: 12-OCT-2000
EMBL nucleotides: 10,290,670,274
EMBL entries: 9,113,333
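The two totals above give a rough sense of scale when combined into a mean entry length. This is only a back-of-the-envelope sketch using the slide's figures; the variable names are illustrative.

```python
# Totals quoted on the slide (status: 12-OCT-2000).
nucleotides = 10_290_670_274
entries = 9_113_333

# Mean number of nucleotides per entry, as a float and rounded down.
mean_entry_length = nucleotides / entries
mean_entry_length_floor = nucleotides // entries
```

The result is a little over 1,100 nucleotides per entry. It is a mean over very different kinds of entries, so it says little about the length of any one entry.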
45
Acknowledgements
EBI: EMBL database staff, biologists and programmers
Collaborators: DDBJ, GenBank, Sanger Centre, EPO, plus many other projects and databases
Peter Sterk, EMBL Hinxton Outstation, the European Bioinformatics Institute, October 1998