Published by Arlene Pierce. Modified over 9 years ago.
1
Using GC content to distinguish Phytophthora sequences from tomato sequences
2
Mission #1
Calculate the GC content of each sequence in the Phytophthora-tomato interactome.
We will use a Perl script to accomplish the mission.
3
Preparation
Download the Perl script (gc.pl) from the class web site and store it in the C:/BioDownload folder.
4
Running the script
Open Cygwin, Command Prompt (Vista users), or Terminal (Mac users).
Change directory (cd) to the BioDownload folder.
perl gc.pl PhytophSeq1.txt phytoph_gc.out
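The slides do not show gc.pl itself, so the following is only a sketch of the kind of calculation it performs, not the course script. It assumes the input is FASTA (header lines start with ">") and that the output is one line per sequence with the name, a tab, and the GC percentage. The function name gc_content and the one-decimal rounding are assumptions made for illustration.

```shell
# Hypothetical stand-in for gc.pl (an assumption, not the course script):
# prints "name<TAB>GC percent" for each FASTA record in the file given as $1.
gc_content() {
  awk '
    function report() {
      # Emit the previous record, if any; guard against empty sequences.
      if (name != "") printf "%s\t%.1f\n", name, (len > 0 ? 100 * gc / len : 0)
    }
    /^>/ {
      report()
      name = substr($0, 2)       # header text without the leading ">"
      gc = 0; len = 0
      next
    }
    {
      seq = toupper($0)
      gsub(/[^A-Z]/, "", seq)    # drop whitespace and stray characters such as \r
      len += length(seq)         # total bases seen so far in this record
      gc += gsub(/[GC]/, "", seq) # gsub returns the number of G or C matched
    }
    END { report() }
  ' "$1"
}
```

With this function loaded in the shell, `gc_content PhytophSeq1.txt > phytoph_gc.out` would write a two-column file of the same general shape as the one used in the rest of the slides.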
5
Results
In Cygwin (Windows users) or Terminal (Mac users):
grep --perl-regexp "\t" -c phytoph_gc.out
grep ">" -c PhytophSeq1.txt
You should get the same number from the two commands. The number should be 3921.
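The comparison of the two counts can also be made explicit in a small shell function. This is a sketch, not part of the course material: check_counts is a hypothetical name, and it counts tab-separated rows with awk instead of grep --perl-regexp, which may help if the local grep does not accept that option.

```shell
# Hypothetical helper: compare the number of FASTA headers with the number of
# tab-separated rows in the GC output file. Returns success only if they match.
check_counts() {
  fasta=$1
  gc_out=$2
  # Count lines containing ">" (same test as the grep command on the slide).
  headers=$(grep -c '>' "$fasta")
  # Count lines that have at least two tab-separated fields.
  rows=$(awk -F '\t' 'NF > 1 { n++ } END { print n + 0 }' "$gc_out")
  echo "headers=$headers rows=$rows"
  [ "$headers" -eq "$rows" ]
}
```

For example, `check_counts PhytophSeq1.txt phytoph_gc.out` prints both counts and exits with a non-zero status if they differ.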
6
The output file
Column 1: name. Column 2: GC content.
7
Mission #2
Build a histogram of the GC content values.
We will use the R program to accomplish this mission.
8
http://www.r-project.org
11
Mac users
12
All Windows users
13
XP users Vista users
15
getwd()
to see which folder you are in now
16
setwd("c:/BioDownload")
to change the working directory to C:/BioDownload
setwd("/path/to/biodownload")
for Mac users
17
data<-read.table("phytoph_gc.out",sep="\t",header=FALSE)
to read in the data from the file phytoph_gc.out (your file name may be different)
18
data[1:10,]
to see the first 10 rows of the data frame "data"
19
gc<-data[,2]
to assign the values from the 2nd column of "data" to a new vector "gc"
20
summary(gc)
to get a summary of the values in the vector "gc"
21
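hist(gc,breaks=58)
to draw a histogram of the values in the "gc" vector
breaks indicates how many cells (bins) you want in the histogram. It was calculated as 78.7 (max) - 21.2 (min) = 57.5, rounded up to 58, so each bin is about 1 unit of GC content (1%) wide. R treats a single number here as a suggestion and may choose slightly different bin boundaries.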
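The arithmetic behind breaks=58 can be reproduced outside R. The sketch below is a hypothetical shell helper (gc_breaks is not part of the course material). It assumes the output file holds the GC value in the second tab-separated column, and it rounds max - min up to the next whole number.

```shell
# Hypothetical helper: report min, max and the rounded-up range of column 2,
# i.e. the number used for breaks on the slide (assumed: range rounded up).
gc_breaks() {
  awk -F '\t' '
    NF > 1 {
      v = $2 + 0                       # force numeric interpretation
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      n++
    }
    END {
      if (n == 0) exit 1               # no data rows found
      span = max - min
      breaks = int(span)
      if (breaks < span) breaks++      # round up to the next whole number
      printf "min=%.1f max=%.1f breaks=%d\n", min, max, breaks
    }
  ' "$1"
}
```

Run as `gc_breaks phytoph_gc.out`; with the minimum and maximum quoted on the slide (21.2 and 78.7) the range is 57.5, which rounds up to 58.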
22
hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")
to make the histogram look better
23
pdf("gc_histogram.pdf")
hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")
dev.off()
to output the histogram to a PDF file
24
location file