SPH 247 Statistical Analysis of Laboratory Data
May 12, 2015
Annotation
Given that one has found one or more genes that are differentially expressed, there are a number of useful things to know:
What is the putative function?
What pathways are known to contain this gene?
What other proteins interact with the given protein?
etc.
May 12, 2015, BST 226 Statistical Methods for Bioinformatics
Two-color array example
> alldata[1,]
[1] [16]
> geneID[1,]
Name ID
1 NM_ discoidin domain receptor family, member
Official Symbol: DDR2 (provided by HGNC)
Official Full Name: discoidin domain receptor tyrosine kinase 2 (provided by HGNC)
Primary source: HGNC:2731
Locus tag: RP11-572K18.1
See related: Ensembl:ENSG ; HPRD:01868; MIM:191311; Vega:OTTHUMG
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Homo sapiens
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Also known as: TKT; MIG20a; NTRKR3; TYRO10
Summary: Receptor tyrosine kinases (RTKs) play a key role in the communication of cells with their microenvironment. These molecules are involved in the regulation of cell growth, differentiation, and metabolism. In several cases the biochemical mechanism by which RTKs transduce signals across the membrane has been shown to be ligand induced receptor oligomerization and subsequent intracellular phosphorylation. This autophosphorylation leads to phosphorylation of cytosolic targets as well as association with other molecules, which are involved in pleiotropic effects of signal transduction. RTKs have a tripartite structure with extracellular, transmembrane, and cytoplasmic regions. This gene encodes a member of a novel subclass of RTKs and contains a distinct extracellular region encompassing a factor VIII-like domain. Alternative splicing in the 5' UTR results in multiple transcript variants encoding the same protein. [provided by RefSeq, Jul 2008]
Affy Example
> source("
> biocLite("annaffy")
> biocLite("hgu95av2.db")
> library(annaffy)
> library(affy)
Loading required package: Biobase
Loading required package: tools
…
Loading required package: GO
Loading required package: KEGG
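The biocLite() installer shown on this slide was retired in later Bioconductor releases in favor of the BiocManager package. A minimal sketch of the equivalent setup follows, assuming a Bioconductor release that uses BiocManager (3.8 or later) and a working network connection; the package names are the ones from the slide.

```r
# Assumption: Bioconductor 3.8 or later, where BiocManager replaces biocLite().
# Install BiocManager from CRAN only if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the annotation tools and the chip-specific annotation database
# used in the rest of this example.
BiocManager::install(c("annaffy", "hgu95av2.db"))

# Load the packages as on the slide.
library(annaffy)
library(affy)
```

The startup messages printed by library() may differ from those on the slide, since package dependencies have changed between releases.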
> probeids <- featureNames(eset)[allp2.adj < 0.05]
> probeids[1:5]
[1] "1005_at" "1009_at" "1034_at" "1035_g_at" "1045_s_at"
> symbols <- aafSymbol(probeids,"hgu95av2.db")
Loading required package: hgu95av2
> symbols[1]
An object of class "aafList"
[[1]]
An object of class "aafSymbol"
[1] "DUSP1"
> getText(symbols[1])
[1] "DUSP1"
> descs <- aafDescription(probeids,"hgu95av2.db")[1]
> getText(descs)[1]
[1] "dual specificity phosphatase 1"
> gos <- aafGO(probeids,"hgu95av2.db")
> gos[1]
An object of class "aafList"
[[1]]
An object of class "aafGO"
[[1]][[1]]
An object of class "protein amino acid "Biological "IEA"
[[1]][[2]]
An object of class "response to oxidative "Biological "TAS"
[[1]][[3]]
An object of class "cell "Biological "IEA"
[[1]][[4]]
An object of class "non-membrane spanning protein tyrosine phosphatase "Molecular "TAS"
[[1]][[5]]
An object of class "protein "Molecular "IPI"
[[1]][[6]]
An object of class "hydrolase "Molecular "IEA"
[[1]][[7]]
An object of class "MAP kinase tyrosine/serine/threonine phosphatase "Molecular "IEA"
There are actually 33 terms; only the first seven are shown.
GO Evidence Codes
IEA = inferred from electronic annotation (e.g., BLAST); uncurated
TAS = traceable author statement (i.e., someone said so)
IDA = inferred from direct assay
IEP = inferred from expression pattern
IGI = inferred from genetic interaction
IMP = inferred from mutant phenotype
IPI = inferred from physical interaction
ISS = inferred from sequence similarity
NAS = non-traceable author statement
ND = no biological data available
NR = not recorded
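Because IEA annotations are uncurated, it can be useful to decode the evidence codes and set the electronic ones aside. The following is a minimal base-R sketch of that idea, not part of the annaffy API. The go_terms data frame is hypothetical: its term labels are placeholders, and only the type and evidence columns mirror the seven DUSP1 terms printed earlier.

```r
# GO evidence codes as listed on the slides.
evidence_codes <- c(
  IEA = "inferred from electronic annotation",
  TAS = "traceable author statement",
  IDA = "inferred from direct assay",
  IEP = "inferred from expression pattern",
  IGI = "inferred from genetic interaction",
  IMP = "inferred from mutant phenotype",
  IPI = "inferred from physical interaction",
  ISS = "inferred from sequence similarity",
  NAS = "non-traceable author statement",
  ND  = "no biological data available",
  NR  = "not recorded"
)

# Hypothetical table of GO annotations for one probe.
# The term labels are placeholders, not real GO term names.
go_terms <- data.frame(
  term = paste0("term_", 1:7),
  type = c(rep("Biological", 3), rep("Molecular", 4)),
  evidence = c("IEA", "TAS", "IEA", "TAS", "IPI", "IEA", "IEA"),
  stringsAsFactors = FALSE
)

# Decode each evidence code by name lookup in the vector above.
go_terms$evidence_meaning <- unname(evidence_codes[go_terms$evidence])

# Keep only annotations that are not purely electronic.
curated <- go_terms[go_terms$evidence != "IEA", ]

# Tally the codes, then show the curated subset.
print(table(go_terms$evidence))
print(curated)
```

Whether to discard IEA terms is a judgment call: they are often the only annotation available for a poorly studied gene.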
Online Access
> gbs <- aafGenBank(probeids,"hgu95av2.db")
> getURL(gbs[[1]])
[1] " fcgi?cmd=search&db=nucleotide&term=X68277%5BACCN%5D&doptcmdl=GenBank"
> lls <- aafLocusLink(probeids,"hgu95av2.db")
> getURL(lls[[1]])
[1] " Db=gene&Cmd=DetailsSearch&Term=1843"
Abstracts
> pmids <- aafPubMed(probeids,"hgu95av2.db")
> pmids[[1]]
An object of class "aafPubMed"
[1] [13]
> pmids[1]
An object of class "aafPubMed"
[1] [9] [17] [25] [33] [41]
> browseURL(getURL(lls[[1]]))
Abstracts
> pmids <- aafPubMed(probeids,"hgu95av2.db")
> pmids[[1]]
An object of class "aafPubMed"
[1] [9] [17] [25] [33] [41] [49] [57] [65] [73] [81] [89] [97] [105] [113] [121] [129] [137] [145] [153]
Direct Browsing
> browseURL(getURL(lls[[1]]))
> browseURL(getURL(gbs[[1]]))
> browseURL(getURL(pmids[1]))
Top Genes
> probeids.ord <- featureNames(eset)[order(allp2.adj)]
> getText(aafSymbol(probeids.ord[1:10],"hgu95av2.db"))
[1] "S100A2" "" "RPLP1" "GM2A" "" "RPS17" "GAPDH" "COPA"
[9] "PSPHP1" ""
> getText(aafDescription(probeids.ord[1:10],"hgu95av2.db"))
[1] "S100 calcium binding protein A2"
[2] ""
[3] "ribosomal protein, large, P1"
[4] "GM2 ganglioside activator"
[5] ""
[6] "ribosomal protein S17"
[7] "glyceraldehyde-3-phosphate dehydrogenase"
[8] "coatomer protein complex, subunit alpha"
[9] "phosphoserine phosphatase pseudogene 1"
[10] ""
> aafGO(probeids.ord[7],"hgu95av2.db")
An object of class "microtubule cytoskeleton "Biological "ISS"
[[1]][[2]]
An object of class "glyceraldehyde-3-phosphate dehydrogenase (NAD+) (phosphorylating) "Molecular "ISS"
[[1]][[3]]
An object of class "glyceraldehyde-3-phosphate dehydrogenase (NAD+) (phosphorylating) "Molecular "NAS"
[[1]][[4]]
An object of class "protein "Molecular "IPI"
There are actually 41 terms; only the first four are shown.
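The Top Genes ranking relies on order() applied to the adjusted p-values, and several of the top probes come back with an empty symbol. The base-R sketch below illustrates both steps on toy data. The probe names, p-values, and symbols are all made up for illustration, and names() stands in for featureNames(eset) so that the example runs without Bioconductor.

```r
# Hypothetical adjusted p-values, named by probe ID (made-up values).
allp2.adj <- c(probe_a = 0.20, probe_b = 0.001, probe_c = 0.04,
               probe_d = 0.0005, probe_e = 0.60)

# Hypothetical gene symbols; "" marks a probe with no symbol on record.
symbols <- c(probe_a = "GENE1", probe_b = "", probe_c = "GENE3",
             probe_d = "GENE4", probe_e = "GENE5")

# Rank probes from smallest to largest adjusted p-value.
probeids.ord <- names(allp2.adj)[order(allp2.adj)]

# Take the top three probes and look up their symbols.
top <- probeids.ord[1:3]
top_symbols <- symbols[top]

# Drop probes whose symbol is empty before reporting.
annotated <- top_symbols[top_symbols != ""]

# Probes passing the 0.05 threshold, in their original order.
significant <- names(allp2.adj)[allp2.adj < 0.05]

print(top)
print(annotated)
print(significant)
```

Dropping unannotated probes is only a reporting convenience; they remain differentially expressed and may deserve a manual lookup.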