Stubbs Lab Bioinformatics - 3
Review
RNA-Seq Analysis Overview
Alignment using Tophat2
Nov 22, 2016
Joe Troy

Agenda
Review of tools and Linux commands
Overview of the RNA-Seq analysis
Aligning short reads (.fastq files) with Tophat2 to create alignment files (accepted_hits.bam)
Also, to create bigwigs for UCSC track hubs, we use some UCSC software.
Linux commands (review and new)
cp      Copy.
        Copy a file. ex: cp oldfile.txt newfile.txt
        Copy a folder. ex: cp -R old_folder new_folder
df -h   See how much disk space is on the server.
cd      Change to a new folder. ex: cd my_new_folder
pwd     Print working directory: show the current folder.
ls -lh  List contents with details (l), showing file size and date as human readable (h).
rm      PERMANENTLY remove a file or folder.
        ex: rm my_file.txt removes a file named “my_file.txt” in the current working directory.
        ex: rm -r myfolder removes a folder named “myfolder”, and all of its contents, in the current working directory.
        ex: rm *.txt removes all files ending with “.txt” in the current working directory.
        ex: rm * removes every non-hidden file in the current working directory (folders are only removed if -r is added). BE CAREFUL.
screen  Allows you to start a “sub-process” on stubbslab.igb.illinois.edu, exit that sub-process while it continues to run (allowing you to disconnect from stubbslab.igb.illinois.edu), and reattach to the process at a later time.
sh      Used to start a shell script. ex: sh main_script_tophat_16Gso.sh
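A safe way to practice the commands listed above is inside a throwaway folder, so that rm cannot touch real data. The sketch below is only an illustration: the scratch folder comes from mktemp, and the file and folder names are the placeholder names used in the examples above, not real project files.

```shell
#!/bin/sh
# Practice the basic commands in a temporary folder so nothing real is at risk.
# All file and folder names below are placeholders.
scratch=$(mktemp -d)          # create a temporary practice folder
cd "$scratch" || exit 1       # change to the practice folder
pwd                           # show the current folder

echo "hello" > oldfile.txt
cp oldfile.txt newfile.txt    # copy a file

mkdir old_folder
cp oldfile.txt old_folder/
cp -R old_folder new_folder   # copy a folder and its contents

ls -lh                        # list contents with sizes and dates

rm newfile.txt                # permanently remove one file
rm -r old_folder              # permanently remove a folder and its contents
ls -lh                        # confirm what is left
```

After the script runs, only oldfile.txt and new_folder remain in the practice folder, which can itself be removed with rm -r once you are done.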
RNA-Seq data analysis Context and Overview
Step: Retrieve and un-compress short read files (sftp command, tar command)
    INPUT: .tgz file(s) from ftp.biotec.illinois.edu
    OUTPUT: .fastq short read files
Step: Align reads to genome (Tophat 2 script)
    INPUT: .fastq short read files
    OUTPUT: “accepted_hits.bam” file from each “.fastq” file
Next Step: review alignment stats
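The un-compress part of the first step can be tried out without downloading anything. The sketch below builds a tiny stand-in archive and then unpacks it with tar, the same way a retrieved .tgz would be unpacked. The archive name, the sample name, and the temporary folders are hypothetical; only the fastq_files folder name is taken from the example project.

```shell
#!/bin/sh
# Sketch: pack a stand-in .fastq file into a .tgz, then unpack it the way a
# downloaded archive would be unpacked. All names here are hypothetical.
work=$(mktemp -d)
mkdir "$work/download" "$work/fastq_files"

# Build a tiny stand-in read file (one 4-line FASTQ record).
printf '@read1\nACGT\n+\nIIII\n' > "$work/download/sample_A.fastq"

# Create the archive (c = create, z = gzip, f = archive file name).
tar -czf "$work/sample_A.tgz" -C "$work/download" sample_A.fastq

# List the archive contents without extracting anything (t = list).
tar -tzf "$work/sample_A.tgz"

# Extract into the fastq folder (x = extract, -C = destination folder).
tar -xzf "$work/sample_A.tgz" -C "$work/fastq_files"
ls -lh "$work/fastq_files"
```

Listing the archive with -t before extracting is a cheap way to see which files it will create and where.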
Terminal is used to access the Linux command line on a Mac
Instructions to align short reads with tophat2
INSTRUCTION SLIDE 1

Josephs-MacBook-Pro:~ josephtroy$ ssh jmtroy2@stubbslab.igb.illinois.edu
jmtroy2@stubbslab.igb.illinois.edu's password:
Last login: Mon Nov 21 20:15:51 2016 from c-73-73-226-74.hsd1.il.comcast.net
[jmtroy2@stubbslab ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       4.6T  4.2T  156G  97% /
/dev/sda2        95G   14G   77G  16% /var
/dev/sdb1       289M   29M  246M  11% /boot
tmpfs            32G     0   32G   0% /dev/shm
/dev/sdb2       275G  116G  145G  45% /var/lib/mysql
[jmtroy2@stubbslab ~]$ screen
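The df -h output above shows the root filesystem at 97% used, which is worth noticing before starting a job that writes large alignment files. As a minimal sketch, the same check can be scripted. The function name and the 95 percent threshold are illustrative assumptions, not part of the lab's scripts.

```shell
#!/bin/sh
# Sketch: warn when the filesystem holding a folder is nearly full.
# The function name and the 95 percent threshold are illustrative assumptions.
disk_use_percent() {
    # df -P prints one POSIX-format line per filesystem; column 5 is Use%.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

used=$(disk_use_percent /)
if [ "$used" -ge 95 ]; then
    echo "WARNING: / is ${used}% full; free some space before aligning."
else
    echo "/ is ${used}% full."
fi
```

The -P option is used instead of -h here because its fixed column layout is easier to parse; -h remains the friendlier choice when reading the numbers yourself.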
Instructions to align short reads with tophat2
INSTRUCTION SLIDE 2

[jmtroy2@stubbslab ~]$ cd /home/share/example_rna_seq_project_16Gso/
[jmtroy2@stubbslab example_rna_seq_project_16Gso]$ ls -1
code_010_tophat2
code_020_alignment_summary_report
code_030_MDS_plots
code_040_create_track_hub_bigwigs
code_050_cpm_means_report
code_060_differential_expression_w_edgeR
fastq_files
output_010_tophat2_RUN_20161121_092530
[jmtroy2@stubbslab example_rna_seq_project_16Gso]$ cd code_010_tophat2/
[jmtroy2@stubbslab code_010_tophat2]$ ls
main_script_tophat_16Gso.sh
[jmtroy2@stubbslab code_010_tophat2]$ sh main_script_tophat_16Gso.sh
Start of Tophat …

NOW HOLD DOWN THE CONTROL KEY AND PRESS a, THEN PRESS d, TO DETACH FROM THE SCREEN SESSION
Instructions to align short reads with tophat2
DEMONSTRATION SLIDE 3

[jmtroy2@stubbslab ~]$ screen -ls
There is a screen on:
        11559.pts-2.stubbslab   (Detached)
1 Socket in /var/run/screen/S-jmtroy2.
[jmtroy2@stubbslab ~]$ screen -r 11559
[end of tophat]
[jmtroy2@stubbslab code_010_tophat2]$ exit
[end of tophat]
[jmtroy2@stubbslab code_010_tophat2]$ screen -ls
No Sockets found in /var/run/screen/S-jmtroy2.
Review tophat2 output in Cyberduck
align_summary.txt
NOTE: The “Mapped” rate of 99.9% is this high because of how the example fastq files were created for the training exercise: they contain only reads that had already been mapped to chromosome 5.
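When there are many samples, reading each align_summary.txt by hand gets tedious, so the rate can be pulled out with a short script. The sketch below rests on an assumption: that the summary contains a line of the form "99.9% overall read mapping rate." The sample file it writes is a one-line stand-in that reuses the 99.9% figure from the note above, not real Tophat2 output, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: pull the overall mapping rate out of an align_summary.txt file.
# ASSUMPTION: the file contains a line like "99.9% overall read mapping rate."
# The sample file written here is a stand-in, not real Tophat2 output.
mapping_rate() {
    # Print the leading percentage from the "overall read mapping rate" line.
    awk '/overall read mapping rate/ { sub(/%/, "", $1); print $1 }' "$1"
}

sample=$(mktemp)
printf '99.9%% overall read mapping rate.\n' > "$sample"

rate=$(mapping_rate "$sample")
echo "Mapped rate: ${rate}%"
rm "$sample"
```

To use it on a real run, pass the path of an actual align_summary.txt to mapping_rate, and check first that the file really contains the line this sketch looks for.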
/home/share/example_rna_seq_project_16Gso/code_010_tophat2/main_script_tophat_16Gso.sh (1 of 2)
/home/share/example_rna_seq_project_16Gso/code_010_tophat2/main_script_tophat_16Gso.sh (2 of 2)