Bioinformatics for biologists (2)


1 Bioinformatics for biologists (2)
Dr. Habil Zare, PhD, PI of Oncinfo Lab, Assistant Professor, Department of Computer Science, Texas State University. Presented at University of Texas, Health Science Center – San Antonio, 25 March 2015

2 Session 2 Part 1 Pathway and functional analyses (String manipulation in R, InnateDB, MouseMine) Bioinformatics for biologists workshop (2), Dr. Habil Zare, Oncinfo Lab UTHSC San Antonio, 25 March 2016

3 List of proteins
We start with a list of proteins obtained from mass spectrometry (MS). A sample protein_MS.xlsx file is provided in the workshop material. It was exported from Scaffold software and contributed by Dr. Janice Deng.

4 Converting xlsx to csv format
In Excel, click on File>Save as... Choose .csv format.
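A detail worth knowing once the file is in csv format: by default, read.csv rewrites column headers into syntactically valid R names, which is presumably why a header such as "Accession Number" is addressed as "Accession.Number" in the R code of this session. The sketch below illustrates this with a tiny made-up csv file written to a temporary location; the header names and the two rows are assumptions for illustration, not the contents of protein_MS.xlsx.

```r
## Write a tiny, made-up csv file that mimics an exported table.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Protein Name,Accession Number",
             "toy protein A,gi|111|ref|NP_000001.1|",
             "toy protein B,gi|222|ref|NP_000002.1|"),
           tmp)

## Default behaviour: spaces in the headers are replaced by dots.
r_default <- read.csv(tmp, stringsAsFactors = FALSE)
print(names(r_default))

## With check.names = FALSE the headers are kept exactly as written.
r_raw <- read.csv(tmp, stringsAsFactors = FALSE, check.names = FALSE)
print(names(r_raw))
```

Typing names(r1) after importing the real file shows which spelling to use when selecting a column.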

5 Extracting and saving RefSeq IDs
Open R and run the following. You can copy from the cheat sheet.
    ## Import the csv file in R.
    r1 <- read.csv("protein_MS.csv", stringsAsFactors=FALSE)
    ## We consider only this column.
    numbers <- r1[,"Accession.Number"]
    ## Extracting and saving RefSeq IDs.
    inds <- grep(numbers, pattern="ref")
    numbers <- numbers[inds]
    numbers <- gsub(numbers, pattern=".*ref\\|", replacement="")
    numbers <- gsub(numbers, pattern="\\|.*", replacement="")
    ## No row or column names. No quotations.
    write.table(numbers, file="refseq.csv", row.names=FALSE, col.names=FALSE,
                sep=",", quote=FALSE)

6 Extracting and saving RefSeq IDs
In the code above, numbers is one column of the r1 data frame.

7 Extracting and saving RefSeq IDs
At each step, type “numbers” at the R prompt to follow the process.
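To see what each step of the extraction does, the same calls can be run on a few toy strings. The accession strings below are made up for illustration (real Scaffold exports may be formatted differently); only the grep and gsub calls are taken from the slides.

```r
## Made-up accession strings; assumptions for illustration only.
numbers <- c("gi|111|ref|NP_000001.1|",
             "gi|222|sp|P00002|TOY_HUMAN",
             "gi|333|ref|NP_000003.2|,gi|444|ref|NP_000004.1|")

## Step 1: keep only the entries that contain "ref".
inds <- grep(numbers, pattern = "ref")
numbers <- numbers[inds]
print(numbers)

## Step 2: delete everything up to and including the last "ref|".
numbers <- gsub(numbers, pattern = ".*ref\\|", replacement = "")
print(numbers)

## Step 3: delete the first remaining "|" and everything after it.
numbers <- gsub(numbers, pattern = "\\|.*", replacement = "")
print(numbers)
```

Because ".*" is greedy, an entry that lists several RefSeq accessions keeps only the last one (NP_000004.1 in the third toy string). If every accession in such an entry is needed, the entries would have to be split first, for example with strsplit.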

8 The csv file
Open the refseq.csv file and make sure it is in the appropriate format, e.g., no row or column names.
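The same check can be done from R instead of opening the file by hand. The sketch below is one possible way, under two assumptions: the IDs are made-up stand-ins written to a temporary folder with the same write.table call as in the slides, and the regular expression is an assumed shape for a RefSeq accession (two capital letters, an underscore, digits, and an optional version suffix).

```r
## Stand-in for the real output: two made-up IDs written as in the slides.
ids <- c("NP_000001.1", "NP_000003.2")
out <- file.path(tempdir(), "refseq.csv")
write.table(ids, file = out, row.names = FALSE, col.names = FALSE,
            sep = ",", quote = FALSE)

## Read the raw lines back; nothing is interpreted as a header this way.
lines_in <- readLines(out)
print(lines_in)

## Flag any line that does not look like a single RefSeq accession.
looks_ok <- grepl("^[A-Z]{2}_[0-9]+(\\.[0-9]+)?$", lines_in)
print(all(looks_ok))
```

A line that fails the check usually points to leftover quotation marks, a header row, or an entry that was not fully cleaned by the gsub steps.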

9 InnateDB
Use InnateDB to perform pathway and network analysis. Click on Pathway Analysis and then Analysis.

10 Uploading data to InnateDB
Click on Upload a file. Upload the refseq.csv file you created using R.

11 Uploading data to InnateDB
Choose Cross-reference ID. Choose Ensembl and click on OK. Click on Column 1.

12 Pathway analysis
Pathway overrepresentation analysis
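Overrepresentation analysis asks whether a pathway contains more proteins from the uploaded list than would be expected by chance. A common way to quantify this is a one-sided hypergeometric test followed by a multiple-testing correction. The sketch below uses made-up counts and base R only; it illustrates the general idea and is not a description of InnateDB's internal computation.

```r
## Made-up counts (assumptions for illustration only).
N <- 20000  # genes in the background
K <- 100    # background genes annotated to the pathway
n <- 200    # genes in the uploaded list
k <- 5      # genes in the list that are also in the pathway

## P(overlap >= k) when n genes are drawn at random without replacement.
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
print(p_hyper)

## The same p-value from a one-sided Fisher's exact test on the 2x2 table.
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
print(p_fisher)

## When many pathways are tested, adjust the p-values,
## here with the Benjamini-Hochberg method.
p_all <- c(p_hyper, 0.20, 0.03)  # two extra made-up p-values
p_adj <- p.adjust(p_all, method = "BH")
print(p_adj)
```

With these counts, about one overlapping gene is expected by chance (200 x 100 / 20000), so an overlap of five yields a small p-value.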

13 Settings
Leave the defaults.

14 Pathway analysis results
Hover the mouse over a column to see the overlap with the pathways.

15 Pathway analysis results
Click to choose database.

16 Pathway analysis results
Click on Details to see the overlap.

17 Pathway analysis results
The overlap.

18 Other analyses

19 Gene Ontology

20 Gene Ontology results
Click to choose.

21 Network analysis
Moving the mouse over nodes highlights interactions.

22 MouseMine
Use MouseMine to see pathway and functional analysis results on one page. Click on advanced to upload the file.

23 UniProt keywords

24 Pathway enrichment
Click on the number of Matches for more details.

25 Pathway enrichment
You can download your favorite table.

26 Pathway enrichment
Click on a protein name for more details.

27 Pathway enrichment

28 References
InnateDB: Lynn, David J., et al. "InnateDB: facilitating systems‐level analyses of the mammalian innate immune response." Molecular Systems Biology 4.1 (2008).
MouseMine: Motenko, H., Neuhauser, S.B., O'Keefe, M., and Richardson, J.E. "MouseMine: a new data warehouse for MGI." Mamm Genome, (7-8):

