Bioinformatics for biologists (2) Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented at University of Texas, Health Science Center – San Antonio 25 March 2015
Session 2 Part 1 Pathway and functional analyses (String manipulation in R, InnateDB, MouseMine) Bioinformatics for biologists workshop (2), Dr. Habil Zare, Oncinfo Lab UTHSC San Antonio, 25 March 2016
List of proteins
We start with a list of proteins obtained from mass spectrometry (MS). A sample protein_MS.xlsx file is provided in the workshop material. It was exported from the Scaffold software and contributed by Dr. Janice Deng.
Converting xlsx to csv format
In Excel, click on File > Save As and choose the .csv format.
Extracting and saving RefSeq IDs
Open R and run the following. You can copy the commands from the cheat sheet. At each step, type “numbers” to follow the process.
    ## Import the csv file in R.
    r1 <- read.csv("protein_MS.csv", stringsAsFactors=FALSE)
    ## We consider only this column. numbers is one column of the r1 data frame.
    numbers <- r1[,"Accession.Number"]
    ## Extracting and saving RefSeq IDs.
    ## Keep only the entries that contain "ref".
    inds <- grep(numbers, pattern="ref")
    numbers <- numbers[inds]
    ## Remove everything up to and including "ref|".
    numbers <- gsub(numbers, pattern=".*ref\\|", replacement="")
    ## Remove the next "|" and everything after it.
    numbers <- gsub(numbers, pattern="\\|.*", replacement="")
    ## No row or column names. No quotations.
    write.table(numbers, file="refseq.csv", row.names=FALSE, col.names=FALSE,
                sep=",", quote=FALSE)
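To see what each step of the extraction does, the same commands can be tried on a few made-up accession strings. This is only a sketch: the three strings in `toy` are hypothetical and assume that RefSeq entries look like `gi|...|ref|NP_...|`, which may differ from what the actual Scaffold export contains.

```r
# Hypothetical accession strings; the real Scaffold export may differ.
toy <- c("gi|6678483|ref|NP_033784.1|",
         "sp|P12345|EXAMPLE_MOUSE",
         "gi|111|ref|NP_000002.2| (+1)")

# Step 1: keep only the entries that contain "ref".
inds <- grep(toy, pattern="ref")
kept <- toy[inds]

# Step 2: delete everything up to and including "ref|".
trimmed <- gsub(kept, pattern=".*ref\\|", replacement="")

# Step 3: delete the next "|" and everything after it.
ids <- gsub(trimmed, pattern="\\|.*", replacement="")

print(ids)
```

The double backslash is needed because `|` means "or" in a regular expression, so `\\|` is how a literal vertical bar is written in an R string.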
The csv file
Open the refseq.csv file and make sure it is in the appropriate format, e.g., no row or column names.
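The same check can also be done in R instead of opening the file by hand. The sketch below writes two hypothetical IDs to a temporary file so that it runs on its own; to check the real file, set `path` to "refseq.csv". The assumed shape of an ID (two capital letters, an underscore, digits, and an optional version number) is an illustration and may not cover every accession in your data.

```r
# Hypothetical IDs written to a temporary file, standing in for refseq.csv.
path <- tempfile(fileext=".csv")
write.table(c("NP_033784.1", "NP_000002.2"), file=path, row.names=FALSE,
            col.names=FALSE, sep=",", quote=FALSE)

# Read the file back as plain lines of text.
lines <- readLines(path)

# Assumed ID shape: two capital letters, underscore, digits, optional version.
ok <- grepl(lines, pattern="^[A-Z]{2}_[0-9]+(\\.[0-9]+)?$")

print(length(lines))   # number of lines (IDs) in the file
print(lines[!ok])      # lines that do not look like RefSeq IDs, if any
```

If the second `print` shows a header, a row number, or quoted text, the file was probably not written with the `write.table` options used in the workshop.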
InnateDB
Use InnateDB to perform pathway and network analysis: http://www.innatedb.com/
Click on Pathway Analysis and then Analysis.
Uploading data to InnateDB
Click on Upload a file. Upload the refseq.csv file you created using R.
Uploading data to InnateDB
Choose Cross-reference ID. Choose Ensembl and click on OK. Click on Column 1.
Pathway overrepresentation analysis
Pathway analysis > Pathway overrepresentation analysis
Settings
Leave the defaults.
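For intuition about what an overrepresentation analysis computes, here is a minimal sketch in R. It assumes a hypergeometric test followed by a Benjamini-Hochberg correction, which is a common choice for this kind of analysis; whether InnateDB's defaults match this exactly should be checked on the settings page. All numbers are made up for illustration.

```r
# All numbers below are made up for illustration.
N <- 20000   # genes in the background (universe)
m <- 100     # genes annotated to the pathway
k <- 200     # genes in the uploaded list
x <- 5       # genes that are in both the list and the pathway

# Probability of an overlap of at least x when k genes are drawn
# at random from the background.
p <- phyper(x - 1, m, N - m, k, lower.tail=FALSE)

# When many pathways are tested, p-values are usually corrected for
# multiple testing, here with the Benjamini-Hochberg method.
p_all <- c(p, 0.03, 0.20, 0.75)   # the last three are hypothetical p-values
p_adj <- p.adjust(p_all, method="BH")

print(p)
print(p_adj)
```

With these numbers, a random list of 200 genes would be expected to share about one gene with the pathway (200 x 100 / 20000), so an overlap of 5 gives a small p-value.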
Pathway analysis results
Hover the mouse over a column to see the overlap with the pathways.
Pathway analysis results
Click to choose a database.
Pathway analysis results
Click on Details to see the overlap.
Pathway analysis results
The overlap.
Other analyses
Gene Ontology
Gene Ontology results
Click to choose.
Network analysis
Moving the mouse over nodes highlights interactions.
MouseMine
Use MouseMine to see pathway and functional analysis results on one page: http://www.mousemine.org
Click on advanced to upload the file.
UniProt keywords
Pathway enrichment
Click on the number of Matches for more details.
Pathway enrichment
You can download your favorite table.
Pathway enrichment
Click on a protein name for more details.
Pathway enrichment
References:
InnateDB: Lynn, David J., et al. "InnateDB: facilitating systems-level analyses of the mammalian innate immune response." Molecular Systems Biology 4.1 (2008).
MouseMine: Motenko, H., Neuhauser, S.B., O'Keefe, M., and Richardson, J.E. "MouseMine: a new data warehouse for MGI." Mammalian Genome 26(7-8):325-30 (2015).