Using GC content to distinguish Phytophthora sequences from tomato sequences.


1 Using GC content to distinguish Phytophthora sequences from tomato sequences

2 Mission #1: Calculate the GC content of each sequence in the Phytophthora-tomato interactome. We will use a Perl script to accomplish the mission.
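The class script gc.pl is not reproduced in this transcript. As a rough illustration of the calculation only, here is a minimal sketch in R (the language used later in these slides). It assumes that GC content means the percentage of G and C among the A, C, G and T bases of a sequence, and that the input is a FASTA file. The helper names read_fasta and gc_percent and the toy sequences are made up for this example; gc.pl may treat ambiguous bases or format its output differently.

```r
# Hypothetical helper (not part of gc.pl): read a FASTA file into a named
# character vector with one element per record.
read_fasta <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]                 # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)                   # record number of every line
  groups <- split(lines[!is_header],
                  factor(record[!is_header], levels = seq_len(sum(is_header))))
  seqs <- vapply(groups, paste, character(1), collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header])
  seqs
}

# Hypothetical helper: percentage of G and C among the A/C/G/T bases.
# Returns NA when the sequence contains none of those bases.
gc_percent <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  acgt <- bases[bases %in% c("A", "C", "G", "T")]
  if (length(acgt) == 0) return(NA_real_)
  100 * sum(acgt %in% c("G", "C")) / length(acgt)
}

# Toy input standing in for PhytophSeq1.txt
fasta <- tempfile(fileext = ".txt")
writeLines(c(">seq1", "ATGC", "GGCC", ">seq2", "ATATATAT"), fasta)

seqs <- read_fasta(fasta)
result <- data.frame(name = names(seqs),
                     gc = unname(vapply(seqs, gc_percent, numeric(1))))
print(result)

# Write name and GC content as two tab-separated columns (assumed layout)
out <- tempfile(fileext = ".out")
write.table(result, out, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

In the toy input, seq1 has 6 G/C bases out of 8 and seq2 has none.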

3 Preparation: Download the Perl script (gc.pl) from the class web site and store it in the C:/BioDownload folder.

4 Running the script: Open cygwin, or command prompt (Vista users), or terminal (Mac users). Change directory (cd) to the BioDownload folder, then run:
    perl gc.pl PhytophSeq1.txt phytoph_gc.out

5 Results: In cygwin (Windows users) or terminal (Mac users), run:
    grep --perl-regexp "\t" -c phytoph_gc.out
    grep ">" -c PhytophSeq1.txt
You should get the same number from the two commands. The number should be 3921.
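For users without grep (for example, in the Windows command prompt), the same comparison can be sketched in R. This is only an illustration: count_lines_containing is a made-up helper, and the two small files below are toy stand-ins for PhytophSeq1.txt and phytoph_gc.out. Like grep -c, it counts the lines that contain the given text anywhere in the line.

```r
# Hypothetical helper: number of lines in a file that contain `text`
count_lines_containing <- function(path, text) {
  sum(grepl(text, readLines(path, warn = FALSE), fixed = TRUE))
}

# Toy stand-ins for the FASTA input and the script output
fasta <- tempfile()
writeLines(c(">seq1", "ATGC", ">seq2", "GGCC"), fasta)
gc_out <- tempfile()
writeLines(c("seq1\t50", "seq2\t100"), gc_out)

n_headers <- count_lines_containing(fasta, ">")    # FASTA header lines
n_rows <- count_lines_containing(gc_out, "\t")     # tab-separated output rows
print(n_headers == n_rows)
```

With the real files in the working directory, count_lines_containing("PhytophSeq1.txt", ">") and count_lines_containing("phytoph_gc.out", "\t") should both agree with the number quoted on the slide.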

6 The output file: two tab-separated columns, the name column (column 1) and the GC content column (column 2).

7 Mission #2: Build a histogram of the values of GC content. We will use the R program to accomplish this mission.

8 http://www.r-project.org

11 Mac users

12 All Windows users

13 XP users Vista users

15 getwd() to know which folder you are in now

16 setwd("c:/BioDownload") to change the working directory to C:/BioDownload; Mac users: setwd("/path/to/biodownload")

17 data<-read.table("phytoph_gc.out",sep="\t",header=FALSE) to read in the data in the file phytoph_gc.out (your file name may be different)

18 data[1:10,] to see the first 10 rows of the data frame “data”

19 gc<-data[,2] to assign the values from the 2nd column of “data” to a new vector “gc”

20 summary(gc) to get the summary of the values in the vector “gc”
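Steps 17 to 20 can be tried on a small toy file first. The sketch below assumes a tab-separated file with the name in column 1 and the GC content in column 2; the sequence names and values are invented. It also passes quote="" and comment.char="" to read.table, which is an addition to the slides: by default read.table treats ' and " as quote characters and # as the start of a comment, so names containing those characters could otherwise be misread.

```r
# Toy stand-in for phytoph_gc.out; names deliberately contain ' and #
toy <- tempfile(fileext = ".out")
writeLines(c("seq1 5' end\t45.2", "seq2 #2\t60.1", "seq3\t30.0"), toy)

# Disable quote and comment handling so names are read literally
data <- read.table(toy, sep = "\t", header = FALSE,
                   quote = "", comment.char = "")

print(data[1:3, ])    # first rows of the data frame
gc <- data[, 2]       # second column: GC content
print(summary(gc))
```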

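21 hist(gc,breaks=58) to draw a histogram of the values in the “gc” vector. breaks indicates how many cells (bins) you want for the histogram; R treats a single number as a suggestion and may choose a nearby count that gives round bin edges. The value 58 was calculated as 78.7 (max) - 21.2 (min) = 57.5, rounded up, so each bin of the histogram is about 1 GC unit wide.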
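The arithmetic behind breaks=58 can be reproduced as follows. This is a sketch on invented values that share only the minimum (21.2) and maximum (78.7) quoted on the slide. It also shows an alternative that is not in the slides: passing explicit break points to hist(), which gives bins exactly 1 unit wide instead of leaving the final bin count to R.

```r
gc <- c(21.2, 35.4, 48.9, 52.3, 55.0, 61.7, 78.7)   # toy values

# Range of the data, rounded up: 78.7 - 21.2 = 57.5
n_breaks <- ceiling(max(gc) - min(gc))
print(n_breaks)

# Explicit bin edges one unit apart, covering the whole range
edges <- seq(floor(min(gc)), ceiling(max(gc)), by = 1)

# plot = FALSE returns the binned counts without drawing
h <- hist(gc, breaks = edges, plot = FALSE)
print(length(h$counts))   # number of bins actually used
```

Dropping plot = FALSE draws the histogram with these exact bins.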

22 hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome") to make the histogram look better

23 To output the histogram to a PDF file:
    pdf("gc_histogram.pdf")
    hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")
    dev.off()
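A slightly more defensive variant of the same steps, as a sketch: it writes to a temporary folder so it can be run anywhere, uses toy values in place of the real “gc” vector, and checks that the file was created. The width and height values (in inches) are arbitrary choices, not from the slides.

```r
gc <- c(21.2, 35.4, 48.9, 52.3, 55.0, 61.7, 78.7)   # toy values
pdf_file <- file.path(tempdir(), "gc_histogram.pdf")

pdf(pdf_file, width = 7, height = 5)   # open the PDF device
hist(gc, breaks = 58, xlab = "GC content",
     main = "Histogram of GC content of sequences\nin Phytophthora-tomato interactome")
dev.off()                              # close the device so the file is complete

print(file.exists(pdf_file))
```

If dev.off() is skipped, the PDF stays open in R and may be incomplete or unreadable. With a plain file name such as "gc_histogram.pdf", the file is written to the current working directory (see getwd()).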

24 location file

