Corrections. - The cacao genome is currently being sequenced - Human Chromosome 1 sequence Search ‘Genome’


1 Corrections

2

3 - The cacao genome is currently being sequenced: http://www.ncbi.nlm.nih.gov/genomes/leuks.cgi
- Human chromosome 1 sequence: search ‘Genome’ with ‘homo sapiens chromosome 1’. RefSeq has an entry for each human chromosome (genome reference); these entries are numbered NC_000001 to NC_000024.
  Mapviewer view: http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=9606&chr=1
  Entrez nucleotide view: http://www.ncbi.nlm.nih.gov/nuccore/224589800
- Measles virus complete genome (RefSeq): http://www.ncbi.nlm.nih.gov/nuccore/NC_001498
- Nucleic acid sequences available for human erythropoietin (EPO) (GenBank and RefSeq): http://www.ncbi.nlm.nih.gov/nucleotide/?term=homo+sapiens+erythropoietin+EPO
  mRNA coding for human EPO (in RefSeq): http://www.ncbi.nlm.nih.gov/nuccore/NM_000799.2
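The records above can also be retrieved programmatically. The following is a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint is used instead of the web pages linked on the slide; the helper name `efetch_url` is hypothetical and only builds the request URL, it does not download anything.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (an assumption of this sketch: the slides
# themselves only use the web interface).
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an efetch URL for a single accession (hypothetical helper)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


# Accessions taken from the slide: measles virus genome and human EPO mRNA.
measles_url = efetch_url("NC_001498")
epo_mrna_url = efetch_url("NM_000799.2", rettype="gb")

print(measles_url)
print(epo_mrna_url)
```

Opening either URL in a browser, or passing it to `urllib.request.urlopen`, should return the record as plain text.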

4

5

6 Look for the Escherichia coli strain K-12 substrain W3110 complete genome (query @ NCBI 'Genome'). Query ‘Genome’ with ‘Escherichia coli strain K-12 substrain W3110’: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&Cmd=ShowDetailView&TermToSearch=19221

7 CU466930.1 http://www.ncbi.nlm.nih.gov/nuccore/167729007
Corresponding proteins in UniProtKB (query with CU466930): http://www.uniprot.org/uniprot/?query=CU466930&sort=score
Corresponding proteins in NCBInr (query ‘nucleotide’ with CU466930 and follow the link to ‘protein’): http://www.ncbi.nlm.nih.gov/nuccore?Db=protein&DbFrom=nuccore&Cmd=Link&LinkName=nuccore_protein&IdsFromResult=167729007

8 AAFZ00000000.1 http://www.ncbi.nlm.nih.gov/nuccore/60175893 No protein yet : no annotated CDS available (or not yet submitted by the authors)!

9

10 EPO protein in different databases
- UniProtKB/Swiss-Prot (reviewed): http://www.uniprot.org/uniprot/P01588
- UniProtKB/TrEMBL (unreviewed): http://www.uniprot.org/uniprot/B7ZKK5
- UniParc (follow the link from the UniProtKB sequence section): http://www.uniprot.org/uniparc/UPI0000033477
- NCBInr (see next slide): http://www.ncbi.nlm.nih.gov/protein/?term=homo+sapiens+erythropoietin+EPO
- RefSeq (see next slide): http://www.ncbi.nlm.nih.gov/protein/NP_000790.2
- Ensembl (follow the link from the UniProt entry): http://www.ensembl.org/Homo_sapiens/Transcript/ProteinSummary?g=ENSG00000130427;r=7:100318423-100321323;t=ENST00000252723

11

12

13 http://www.uniprot.org/uniprot/P04150
- Which server? UniProt
- Which database? UniProtKB/Swiss-Prot
- What is the function of the protein? Receptor for glucocorticoids (GC).
- How many different post-translational modifications (PTMs)? Phosphorylated, sumoylated, ubiquitinated
- How many phosphorylation sites? 10 sites
- What are the associated GO terms? http://www.uniprot.org/uniprot/P04150#section_terms
- What is the evidence for the existence of the protein(s)? http://www.uniprot.org/uniprot/P04150#section_attribute: at protein level
- What is the corresponding mRNA? X03225, for example
- How many different protein sequences are available for the corresponding gene? http://www.uniprot.org/uniprot/P04150#section_alternative: 9 isoforms

14 http://www.ncbi.nlm.nih.gov/protein/NP_000167.1
- Which server? NCBI
- Which database? RefSeq
- What is the function of the protein? Receptor for glucocorticoids
- How many different post-translational modifications (PTMs)? Phosphorylated, sumoylated, ubiquitinated
- How many phosphorylation sites? 14 sites
- What are the associated GO terms? Not directly available
- What is the evidence for the existence of the protein(s)? This information is not available
- What is the corresponding mRNA? ‘The reference sequence was derived from AC091925.3, X03225.1 and AC004782.1.’
- How many different protein sequences are available for the corresponding gene? This information is not directly available; go to Entrez Gene: http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=2908

15

16 Find the UniProt entry corresponding to: -RefSeq NP_036231 -GI:584682 -Find the GI numbers corresponding to UniProtKB P04150 -Why are there so many GI numbers ?

17 Find the UniProt entry corresponding to: -GI:584682 -Find the GI numbers corresponding to UniProtKB P04150 -Why are there so many GI numbers ?

18 Find the GI numbers corresponding to UniProtKB P04150

19 Why are there so many GI numbers?
- Because several protein sequences correspond to this gene (alternative splicing events etc.), and sequences have also been updated/modified.
- The merging policy at NCBI (one sequence, one entry) is not the same as the one at UniProt (one entry per gene per species).

20

21 Look for the URL corresponding to the following queries:
- Mouse proteins localized in the nucleus
  Query: taxonomy:"Mus musculus [10090]" AND annotation:(type:location nucleus)
  URL: http://www.uniprot.org/uniprot/?query=taxonomy%3A%22Mus+musculus+[10090]%22+AND+annotation%3A%28type%3Alocation+nucleus%29&sort=score
- Proteins for which a 3D structure is known (hint: they have a cross-reference to PDB)
  Query: database:(type:pdb)
  URL: http://www.uniprot.org/uniprot/?query=database%3A%28type%3Apdb%29&sort=score
- What is the query corresponding to this URL: http://www.uniprot.org/uniprot/?query=taxonomy%3A9606+AND+keyword%3A"complete+proteome"
  Query: taxonomy:9606 AND keyword:"complete proteome" -> human complete proteome
- Modify the URL to get the same result, but for Escherichia coli (strain K12)
  Query: taxonomy:83333 AND keyword:"complete proteome"
  URL: http://www.uniprot.org/uniprot/?query=taxonomy%3A83333+AND+keyword%3A"complete+proteome"
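The query-to-URL correspondence in this exercise is ordinary URL encoding, so it can be reproduced in a few lines. This is a sketch only: the helper name `uniprot_search_url` is hypothetical, and the base URL is the one shown on the slide, which may differ from the scheme used by the current UniProt website.

```python
from urllib.parse import urlencode

# Base URL as shown on the slide (assumption: the site may have changed since).
BASE = "http://www.uniprot.org/uniprot/"


def uniprot_search_url(query, sort="score"):
    """Encode a UniProtKB query string into a search URL (hypothetical helper)."""
    return f"{BASE}?{urlencode({'query': query, 'sort': sort})}"


pdb_url = uniprot_search_url("database:(type:pdb)")
human_url = uniprot_search_url('taxonomy:9606 AND keyword:"complete proteome"')

# Switching organism only requires swapping the taxonomy identifier.
ecoli_url = human_url.replace("taxonomy%3A9606", "taxonomy%3A83333")

print(pdb_url)
print(human_url)
print(ecoli_url)
```

Here `:` becomes `%3A`, parentheses become `%28` and `%29`, spaces become `+`, and the double quote becomes `%22`. The `pdb_url` value is identical to the URL given in the exercise.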

22

23 - Yeast (Saccharomyces cerevisiae) proteins found in the nucleus in UniProtKB/Swiss-Prot.
  Query: organism:4932 AND annotation:(type:location AND nucleus) AND reviewed:yes
- How many of them have a nuclear localization which is 'experimentally proven'?
  Query: organism:"Saccharomyces cerevisiae [4932]" AND annotation:(type:location nucleus confidence:experimental) AND reviewed:yes
  1655 and 1392 entries respectively (query run on January 19).
- Download the list of corresponding accession numbers / protein names / gene names (use 'customize display').

24 - The set might not be complete: not all proteins have been tested for nuclear localization! And UniProtKB might not have annotated all the experiments showing that the yeast proteins are nuclear.

25

26 P00001 @ UniProtKB

27 P00001 @ NCBInr

28 An 'old' publication cites a protein sequence with accession number (AC) o00597: Could you find it ?

29 You did a proteomics analysis in December 2007 without any match. You repeat the analysis in April 2008 and get entry with AC P0C6S9 as the best match. Why ? P0C6S9 was first created in April 2008 in UniProtKB

30

31 Compare the GO terms associated with mouse and human erythropoietin (EPO)

32 Have a look at the different GO evidence tags. How many GO terms have been 'inferred from direct assay (IDA)' for the human EPO gene?

33 Hierarchy of the GO term 'apoptosis'

34 These GO terms are associated with insulin !

35 Several proteins have been identified in a proteomic experiment. Which GO terms do they share? (GI numbers of the identified proteins: 16130093, 20664033, 1789812, 89110178, 85677033, 27574045, 89111003, 229597766).

36 GO terms in common…
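Once the GO annotations of each identified protein have been collected, finding the shared terms is a set intersection. The sketch below assumes the annotations are already available as a mapping from protein identifier to a set of GO terms; the identifiers and term labels are placeholders for illustration, not the real annotations of the proteins listed in the exercise.

```python
def shared_go_terms(annotations):
    """Return the GO terms present in every protein's annotation set."""
    term_sets = [set(terms) for terms in annotations.values()]
    if not term_sets:
        return set()
    return set.intersection(*term_sets)


# Placeholder data: hypothetical proteins and term labels, for illustration only.
toy_annotations = {
    "protein_A": {"GO:term_x", "GO:term_y", "GO:term_z"},
    "protein_B": {"GO:term_y", "GO:term_z"},
    "protein_C": {"GO:term_w", "GO:term_y", "GO:term_z"},
}

common = shared_go_terms(toy_annotations)
print(sorted(common))
```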

37 Searching databases with Blast

38

39 Protein Sequence Databases. The ‘alternative’ sequence(s) are not ‘directly available’ to many tools, including protein identification tools and Blast, depending on the server! Murcia, February 2011

40 Blast P04150 against Swiss-Prot / Homo sapiens @ UniProt: isoform sequences

41 Blast P04150 against Swiss-Prot / Homo sapiens @ NCBI. The isoform sequences (from Swiss-Prot) are not present in the NCBI protein database! The .x number (e.g. P06401.4) corresponds to the version number of the sequence, not to an alternatively spliced sequence!

42

43 Blast A tool associated with the standard options to search sequences in UniProt databases

44

45

46 The metal binding (Fe) site is conserved between HBB human and pea leghemoglobin!

47

48 Detailed BLAST results

49

50

51 InterPro: other schema (graphical view from UniProtKB)

52 InterPro schema / Pfam graphical view

53 Prosite graphical view: not kept in the InterPro overview!

54 Do a Blast with the sequence of the domain 'C-type lectin' of protein P28175 against UniProtKB/Swiss-Prot. Discuss the overview of your results. Look at the other domains present in the most similar sequences.

55

56 UniProt: Color code for identity scores (not alignment !) Blast p28175 (complete seq) @ Swiss-Prot

57

58 UniProt: Color code for identity scores (not alignment !)

59 Blast p28175 (complete seq) @ NCBI against Swiss-Prot NCBI: Color key for alignment scores

60 Swiss-Prot at NCBI does not contain the alternative sequences (e.g. P28175-2)! NCBI gives the ‘version number’ of the Swiss-Prot sequence (e.g. Q8BU25.2).
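The two suffixes are easy to confuse, so it can help to separate them explicitly when processing identifiers. A minimal sketch, assuming the conventions described on the slide (a `-N` suffix marks a UniProtKB isoform, a `.N` suffix marks a sequence version as shown by NCBI); the function name `parse_accession` is hypothetical.

```python
def parse_accession(identifier):
    """Split an identifier into (accession, isoform, sequence_version).

    isoform and sequence_version are None when the suffix is absent.
    """
    accession = identifier
    isoform = None
    version = None
    if "." in accession:
        # ".N" suffix: version number of the sequence, not an isoform.
        accession, version_text = accession.rsplit(".", 1)
        version = int(version_text)
    if "-" in accession:
        # "-N" suffix: alternative (isoform) sequence of the entry.
        accession, isoform_text = accession.split("-", 1)
        isoform = int(isoform_text)
    return accession, isoform, version


print(parse_accession("Q8BU25.2"))
print(parse_accession("P28175-2"))
```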

61

62 N-linked glycosylation (GlcNAc): look at the Swiss-Prot annotation (in a random ‘glycosylated’ entry)

63 Query UniProtKB

64 Taxonomic distribution

65 TPNLINDTME

66 Multiple alignment (ClustalW) -[LAPIQ]-N-[HAYRCS]-[ST]-[KLESGM]

67 Logo

68

69 N-glycosylation does not occur in Bacteria: …false positive !

70

71 28 proteins (within the set of 1000 proteins) are glycosylated according to the UniProtKB annotation!

72 It is not easy to find out that there is never a P after the N-glycosylation site (this needs a lot of sequences)!
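The "no proline" rule is what the PROSITE N-glycosylation pattern PS00001, N-{P}-[ST]-{P}, encodes. The sketch below scans a sequence for that pattern with a regular expression; the function name `find_sequons` is hypothetical, and a match is only a candidate site, since a pattern hit alone does not prove glycosylation (see the bacterial false positive above).

```python
import re

# N-{P}-[ST]-{P} written as a regex; the lookahead lets overlapping
# candidate sites all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(sequence):
    """Return (1-based position of N, matched 4-mer) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


# Peptide shown on an earlier slide.
hits = find_sequons("TPNLINDTME")
print(hits)
```

In this peptide the first N is followed by L-I, so it does not fit the pattern; the second N (position 6) is followed by D-T-M and is reported as a candidate.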

73

74

75

76

77

78 C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H

79 Pattern scan

80

81

82 The pattern missed a second Zn finger in the same protein i.e. Q24174 Pattern Profile

83 The pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H should also match YRCVLCGTVAKSRNSLHSHMSrQHRGIST. Modified pattern: C-x(2,4)-C-x(3)-[LIVMFYWCA]-x(8)-H-x(3,5)-H
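The effect of adding A to the residue class can be checked by translating both patterns into regular expressions. This is a minimal sketch: `prosite_to_regex` is a hypothetical helper that handles only the simple syntax used here (single residues, `[...]` classes, `{...}` exclusions, `x`, and `(n)` or `(n,m)` repeats), not the full PROSITE pattern language.

```python
import re

ELEMENT = re.compile(r"(\[[A-Za-z]+\]|\{[A-Za-z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        residue, low, high = match.groups()
        if residue in ("x", "X"):
            token = "."                          # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # excluded residues
        else:
            token = residue                      # single residue or [class]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


ORIGINAL = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
RELAXED = "C-x(2,4)-C-x(3)-[LIVMFYWCA]-x(8)-H-x(3,5)-H"
SEQUENCE = "YRCVLCGTVAKSRNSLHSHMSrQHRGIST"

original_hit = re.search(prosite_to_regex(ORIGINAL), SEQUENCE, re.IGNORECASE)
relaxed_hit = re.search(prosite_to_regex(RELAXED), SEQUENCE, re.IGNORECASE)
print(original_hit)
print(relaxed_hit.group(0))
```

The original pattern fails on this sequence because the residue three positions after the second C is an A, which is not in [LIVMFYWC]; the relaxed class accepts it.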

84

85 Yes! But the pattern becomes less restrictive, so you also get more sequences that should not be there. As the results are limited to 1000, the number of hits is not the same…

86 Discriminators (Signatures, descriptors) for the Zinc finger C2H2 type domain can be found in Prosite (Pattern and Profile) and Pfam (HMM)

87

88 DoublecortinKinase

89 The doublecortin domain is associated with many different domains (not only kinase)

90

91

92 Seq 1 / Seq 2. Patient with cardiovascular disease: loss of the protein kinase ATP-binding site!

93

94 Step 1: scan UniProtKB/Swiss-Prot with the pattern Use the ‘scanprosite’ tool at http://www.expasy.org/tools/scanprosite/

95

96 At the bottom of the Scan prosite result page:

97 Step 2: Retrieve the 103 human entries @ UniProt (go to the bottom of the ScanProsite result page: ‘Matched UniProtKB entries’)

98 -> 19 candidates to be manually checked …. Step 3: Retrieve the sequences annotated as being ‘phosphorylated on a Thr’

99 The end

