1 EMBL Outstation — The European Bioinformatics Institute Automatic and Reliable Functional Annotation of Proteins.


Slide 1: Automatic and Reliable Functional Annotation of Proteins

Slide 2: The Target Database
• Your data
• Uncharacterized
• Any kind of data
  – Protein sequences
  – Gene sequences
  – etc.
• Our target: TrEMBL
[Diagram: Target]

Slide 3: The External Database (XDB)
• Collection of conditions
  – Sequence patterns
  – Profiles
  – HMMs
  – E.C. numbers
  – Protein clusters
• Examples:
  – PROSITE
  – Pfam
[Diagram: Target, XDB]

Slide 4: Direct Transfer
• Search target
• Transfer annotation to target database
• Example: Look up E.C. number and add recommended enzyme name
[Diagram: Target, XDB]
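The direct-transfer example on this slide can be sketched as a plain lookup. This is a minimal illustration, not the implementation used for TrEMBL: the table `EC_NAMES`, the function `direct_transfer` and the accession `T00001` are hypothetical, and the single E.C. number and name are the ones that appear in the rule example on slide 15.

```python
# Hypothetical external-database table: E.C. number -> recommended enzyme name.
# Only the pair shown on slide 15 is included.
EC_NAMES = {"1.1.1.27": "L-LACTATE DEHYDROGENASE"}


def direct_transfer(target_ec_numbers, ec_names):
    """Return accession -> enzyme name for every target entry whose
    E.C. number is known to the external database."""
    transferred = {}
    for accession, ec_number in target_ec_numbers.items():
        name = ec_names.get(ec_number)
        if name is not None:
            transferred[accession] = name
    return transferred


# Hypothetical target entries: one with a known E.C. number, one without.
targets = {"T00001": "1.1.1.27", "T00002": "9.9.9.9"}
names = direct_transfer(targets, EC_NAMES)
```

Entries whose E.C. number is not in the table are simply left without a transferred name.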

Slide 5: Multiple Sources
• Usually more than one external database is used
• Combine the different results
[Diagram: Target, XDB]

Slide 6: Conflicts
• Contradictions
• Inconsistencies
• Synonyms
• Redundancy

Slide 7: Translation
• Use a translator to map XDB language to target language
[Diagram: Target, XDB]

Slide 8: Translation Examples
• ENZYME -> TrEMBL
    ENZYME: CA L-ALANINE=D-ALANINE
    TrEMBL: CC -!- CATALYTIC ACTIVITY: L-ALANINE=
            CC D-ALANINE.
• PROSITE -> TrEMBL
    PROSITE: /SITE=3,heme_iron
    TrEMBL:  FT METAL IRON
• Pfam -> TrEMBL
    Pfam:   FT DOMAIN zf_C3HC4
    TrEMBL: FT ZN_FING C3HC4-TYPE
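Two of the translations on this slide can be sketched as small functions. This is only an illustration under stated assumptions: the function names and the table `PFAM_TO_FT` are hypothetical, the table holds just the one pair shown on the slide, feature positions are ignored, and the output is written on a single line rather than wrapped over two CC lines.

```python
import re

# Hypothetical translation table: Pfam domain name -> (TrEMBL feature key, description).
PFAM_TO_FT = {"zf_C3HC4": ("ZN_FING", "C3HC4-TYPE")}


def translate_enzyme_ca(line):
    """Turn an ENZYME 'CA' line into a TrEMBL catalytic-activity comment."""
    match = re.match(r"CA\s+(.+)", line)
    if match is None:
        return None  # not a CA line
    reaction = match.group(1).rstrip(". ")
    return f"CC   -!- CATALYTIC ACTIVITY: {reaction}."


def translate_pfam_domain(line):
    """Turn a Pfam-style 'FT DOMAIN <name>' line into the TrEMBL feature key."""
    match = re.match(r"FT\s+DOMAIN\s+(\S+)", line)
    if match is None:
        return None  # not a DOMAIN feature line
    translated = PFAM_TO_FT.get(match.group(1))
    if translated is None:
        return None  # no translation known for this domain
    key, description = translated
    return f"FT   {key} {description}"


cc_line = translate_enzyme_ca("CA   L-ALANINE=D-ALANINE")
ft_line = translate_pfam_domain("FT   DOMAIN zf_C3HC4")
```

Returning `None` for untranslatable lines keeps unknown vocabulary out of the target database instead of copying it over unchanged.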

Slide 9: Automatic Translation
• Introduce a standard/reference database
• Must be:
  – highly reliable
  – well-curated
• Example: SWISS-PROT
[Diagram: Target, Standard, XDB]

Slide 10: Extract Reference Entries
• Use XDB to extract entries from standard database
• Example: Pfam:PF00509 Hemagglutinin
    HEMA_IAVI7/P03435
    HEMA_IANT6/P03436
    HEMA_IAAIC/P03437
    HEMA_IAX31/P03438
    HEMA_IAME2/P03439
    HEMA_IAEN7/P03440
    HEMA_IABAN/P03441
    HEMA_IADU3/P03442
    HEMA_IADA1/P03443
    HEMA_IADMA/P03444
    HEMA_IADM1/P03445
    HEMA_IADA2/P03446
    HEMA_IASH5/P03447
[Diagram: TrEMBL, SWISS-PROT, Pfam]

Slide 11: Extract Common Annotation
132 entries read

Count  Line
  131  ID HEMA_XXXXX
  125  DE HEMAGGLUTININ PRECURSOR.
    6  DE HEMAGGLUTININ.
  131  GN HA
  130  CC -!- FUNCTION: HEMAGGLUTININ IS RESPONSIBLE FOR ATTACHING THE
  130  CC VIRUS TO CELL RECEPTORS AND FOR INITIATING INFECTION.
  125  CC -!- SUBUNIT: HOMOTRIMER. EACH OF THE MONOMER IS FORMED BY TWO
  125  CC CHAINS (HA1 AND HA2) LINKED BY A DISULFIDE BOND.
   75  DR HSSP; P03437; 1HGD.
   31  DR HSSP; P03437; 1DLH.
  131  KW HEMAGGLUTININ; GLYCOPROTEIN; ENVELOPE PROTEIN
  102  KW SIGNAL
    1  KW COAT PROTEIN; POLYPROTEIN; 3D-STRUCTURE
  130  FT CHAIN HA1 CHAIN.
  107  FT CHAIN HA2 CHAIN.
  102  FT SIGNAL
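The counting shown on this slide can be sketched as follows. The function `common_annotation` and its `min_fraction` threshold are assumptions made for illustration; the slides do not state which cutoff was used, only that thresholds were set carefully. The three toy entries reuse lines from the table above.

```python
from collections import Counter


def common_annotation(entries, min_fraction=0.9):
    """Return the annotation lines present in at least `min_fraction`
    of the reference entries, in order of first appearance.

    `entries` is a list of entries, each given as a list of text lines.
    """
    counts = Counter()
    for lines in entries:
        # Count each distinct line once per entry.
        for line in dict.fromkeys(lines):
            counts[line] += 1
    cutoff = min_fraction * len(entries)
    return [line for line, seen in counts.items() if seen >= cutoff]


# Toy reference entries (a real run would read all 132 SWISS-PROT entries).
reference = [
    ["DE   HEMAGGLUTININ PRECURSOR.", "GN   HA", "KW   SIGNAL"],
    ["DE   HEMAGGLUTININ PRECURSOR.", "GN   HA"],
    ["DE   HEMAGGLUTININ.", "GN   HA"],
]
shared_by_most = common_annotation(reference, min_fraction=0.6)
shared_by_all = common_annotation(reference, min_fraction=1.0)
```

A stricter threshold keeps fewer lines but makes it less likely that annotation valid for only part of the family is transferred.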

Slide 12: Store Common Annotation
• Store the used pattern and the extracted common annotation in a separate database
[Diagram: Target, Standard, XDB, Common]

Slide 13: Add Annotation to Target
• Extract entries from target
• Add common annotation to the entries
[Diagram: Target, Standard, XDB, Common]
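The two steps on this slide can be sketched as one function. Everything here is a hypothetical illustration: `annotate_targets`, the shape of the three dictionaries and the accessions `T00001` and `T00002` are assumptions, and the stored common annotation is a shortened version of the hemagglutinin example.

```python
def annotate_targets(targets, hits, common_store):
    """Add stored common annotation to matching target entries.

    targets:      accession -> list of annotation lines
    hits:         accession -> set of XDB identifiers the entry matches
    common_store: XDB identifier -> list of common annotation lines
    """
    annotated = {}
    for accession, lines in targets.items():
        new_lines = list(lines)  # leave the input untouched
        for xdb_id in sorted(hits.get(accession, ())):
            for line in common_store.get(xdb_id, []):
                if line not in new_lines:  # avoid redundant lines
                    new_lines.append(line)
        annotated[accession] = new_lines
    return annotated


common_store = {"PF00509": ["GN   HA", "KW   HEMAGGLUTININ; GLYCOPROTEIN; ENVELOPE PROTEIN"]}
targets = {"T00001": ["GN   HA"], "T00002": ["GN   XYZ"]}
hits = {"T00001": {"PF00509"}}
result = annotate_targets(targets, hits, common_store)
```

Entries without a hit are returned unchanged, and lines already present in an entry are not added a second time.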

Slide 14: Modelling of the Rules
• Definition of condition types
• Definition of action types
• Encoding the logic
• Storage and retrieval of the rules
  – Version control
  – Monitoring the results

Slide 15: Formal Language for the Rules
• # Comment
    #RULE RU000001
    #DATE 1997-04-23
• ? Condition
    ?PSAC PS00057
    ?SPOC PLANTA
• ! Action
    !SPDE L-LACTATE DEHYDROGENASE
    !ECNO 1.1.1.27
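A reader for this rule format could look like the sketch below. It assumes one statement per line, a one-character prefix (`#`, `?` or `!`), a tag, and a value separated from the tag by a space; the function `parse_rule` and the dictionary layout are hypothetical, and the slides do not specify the grammar beyond the example.

```python
def parse_rule(text):
    """Parse one rule into metadata, conditions and actions."""
    rule = {"meta": {}, "conditions": [], "actions": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        prefix, body = line[0], line[1:]
        tag, _, value = body.partition(" ")
        if prefix == "#":
            rule["meta"][tag] = value
        elif prefix == "?":
            rule["conditions"].append((tag, value))
        elif prefix == "!":
            rule["actions"].append((tag, value))
        else:
            raise ValueError(f"unknown line prefix in rule: {line!r}")
    return rule


# The rule shown on the slide.
EXAMPLE_RULE = """\
#RULE RU000001
#DATE 1997-04-23
?PSAC PS00057
?SPOC PLANTA
!SPDE L-LACTATE DEHYDROGENASE
!ECNO 1.1.1.27
"""
rule = parse_rule(EXAMPLE_RULE)
```

Keeping conditions and actions as (tag, value) pairs makes it straightforward to dispatch each tag to its own routine.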

Slide 16: Implementation of Condition Types
• Every condition type must be implemented
• Example: Perl routine for '?PSAC': does the protein have a link to a given PROSITE entry?

    # The text of the entry is expected in $_.
    sub condition_PSAC {
        my $ac = shift;
        # \Q...\E quotes the accession so it is matched literally.
        return /^DR\s+PROSITE; \Q$ac\E/m;
    }

Slide 17: Implementation of Action Types
• Every action type must be implemented
• Example: Add enzyme code to the entry.

    # The text of the entry is expected in $_.
    # Appends the EC number to the first DE line.
    sub action_ECNO {
        my $ecno = shift;
        s/^DE.*$/$& (EC $ecno)/m;
    }

  or

    insert into Trembl2Enzyme values (acc,ecno);
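The Perl routines for '?PSAC' and '!ECNO' can be combined into a small dispatcher, sketched here in Python. The dispatch tables, `apply_rule` and the sample entry (including its ID and DR line details) are hypothetical; only the two routines mirror what the slides show.

```python
import re


def condition_psac(entry, ac):
    """True if the entry has a DR line linking to the given PROSITE accession."""
    pattern = rf"^DR\s+PROSITE; {re.escape(ac)}\b"
    return re.search(pattern, entry, flags=re.MULTILINE) is not None


def action_ecno(entry, ecno):
    """Append the EC number to the first DE line, as the Perl routine does."""
    return re.sub(
        r"^DE.*$",
        lambda m: f"{m.group(0)} (EC {ecno})",
        entry,
        count=1,
        flags=re.MULTILINE,
    )


# One routine per condition type and per action type.
CONDITIONS = {"PSAC": condition_psac}
ACTIONS = {"ECNO": action_ecno}


def apply_rule(entry, conditions, actions):
    """Run all actions if every condition holds; otherwise return the entry as is."""
    if not all(CONDITIONS[tag](entry, value) for tag, value in conditions):
        return entry
    for tag, value in actions:
        entry = ACTIONS[tag](entry, value)
    return entry


# Hypothetical target entry.
ENTRY = (
    "ID   EXAMPLE_ENTRY\n"
    "DE   HYPOTHETICAL PROTEIN\n"
    "DR   PROSITE; PS00057; EXAMPLE; 1.\n"
)
annotated = apply_rule(ENTRY, [("PSAC", "PS00057")], [("ECNO", "1.1.1.27")])
```

Adding a new condition or action type then means writing one function and registering it in the corresponding table.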

Slide 18: Encoding the Logic
• Any logical expression like
    a AND (b OR c) BUT NOT d
  can be written without brackets as
    a AND b AND NOT d OR a AND c AND NOT d
• Rules can be identified by their conditions: "a&b&-d|a&c&-d"
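The bracket-free form is a disjunction of conjunctions, which can be produced by distributing AND over OR. The sketch below assumes the input is given as a list of OR-groups that are ANDed together, with a leading `-` marking negation as in the rule identifier on the slide; the function names are hypothetical.

```python
from itertools import product


def to_dnf(and_of_ors):
    """Distribute AND over OR: one conjunction per choice from each OR-group."""
    return [tuple(choice) for choice in product(*and_of_ors)]


def rule_key(dnf):
    """Write the conjunctions as an identifier such as 'a&b&-d|a&c&-d'."""
    return "|".join("&".join(term) for term in dnf)


def evaluate(dnf, true_conditions):
    """Check whether at least one conjunction holds for the given set of true conditions."""
    def holds(literal):
        if literal.startswith("-"):
            return literal[1:] not in true_conditions
        return literal in true_conditions

    return any(all(holds(literal) for literal in term) for term in dnf)


# a AND (b OR c) BUT NOT d
dnf = to_dnf([["a"], ["b", "c"], ["-d"]])
key = rule_key(dnf)
```

Because every rule reduces to the same flat shape, the identifier string can also serve as a lookup key when rules are stored.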

Slide 19: Automatic Annotation of TrEMBL
• Extract conditions from XDB
• Group SWISS-PROT by conditions
• Extract common annotation
• Group TrEMBL by conditions
• Add common annotation to TrEMBL
[Diagram: TrEMBL, SWISS-PROT, PROSITE, RuleBase, Pfam, ENZYME]

Slide 20: Results: RuleBase
• Source: PROSITE patterns
  – 262 rules
  – 597 conditions
  – 1099 actions
• Result:
  – 2951 of 29330 new TrEMBL 5 entries
  – 1443 of 15078 new TrEMBL 6 entries
  – 9658 of 106330 existing TrEMBL 5 entries
  – 3254 of 140635 existing TrEMBL 6 entries
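The coverage implied by these counts can be worked out directly. The sketch below only does the arithmetic on the four pairs of numbers from the slide; the dictionary keys and variable names are made up for illustration.

```python
# (annotated entries, total entries) as listed on the slide.
RESULTS = {
    "new TrEMBL 5": (2951, 29330),
    "new TrEMBL 6": (1443, 15078),
    "existing TrEMBL 5": (9658, 106330),
    "existing TrEMBL 6": (3254, 140635),
}


def coverage_percent(annotated, total):
    """Share of entries that received annotation, in percent."""
    return 100.0 * annotated / total


coverage = {name: coverage_percent(a, t) for name, (a, t) in RESULTS.items()}
total_annotated = sum(a for a, _ in RESULTS.values())

for name, percent in coverage.items():
    print(f"{name}: {percent:.1f}%")
```

Three of the four sets come out at roughly a tenth of their entries and the fourth is well below that.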

Slide 21: Results: Keywords in TrEMBL

Slide 22: Results: TrEMBL Annotation

Slide 23: Discussion
• Stable and reliable: successfully added 68000 lines to TrEMBL
• Carefully set thresholds, therefore low coverage
• Restricted language better than free text
• Feedback loop between SWISS-PROT and TrEMBL
• Rules may be implemented in a set-oriented language
• Position-specific annotation may be improved by alignments
• Independent of hierarchy
• Based on multiple entries

Slide 24: Dynamic Updates

Slide 25: Where to get TrEMBL
ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb/

Slide 26: Credits
SWISS-PROT at EBI
• Rolf Apweiler
• Sergio Contrino
• Wolfgang Fleischmann
• Henning Hermjakob
• Viv Junker
• Fiona Lang
• Claire O'Donovan
• Michele Magrane
• Maria Jesus Martin
• Nicoletta Mitaritonna
• Steffen Moeller
• Stephanie Kappus
Collaborators
• Amos Bairoch
• Alain Gateau
• Jean-Jacques Codani
• Keith Tipton
• MGD
• Flybase
• Pfam
• Network of > 200 external experts

