
Stubbs Lab Bioinformatics - 2 Retrieving sequence data files and Linux commands Nov 17, 2016 Joe Troy.


1 Stubbs Lab Bioinformatics - 2 Retrieving sequence data files and Linux commands
Nov 17, 2016 Joe Troy

2 Agenda
Schedule meetings for upcoming sessions
Retrieving Sequence files (.tgz) from the biotec server
  Demonstration
  Exercise
Linux commands to know

3 Retrieving Sequence files (.tgz) from the biotec server
Steps
Review email/excel from the Biotechnology Center (from Alvaro Hernandez)
“sftp” .tgz files to stubbslab or biocluster server
Uncompress .tgz files to get .fastq files
Not covered in this presentation: the .tgz files or the .fastq files also need to be archived.

4 Step 1: Retrieve and un-compress short read files (sftp command, tar command)
INPUT: .tgz file(s) from ftp.biotec.illinois.edu
OUTPUT: .fastq short read files
Step 2: Align reads to genome (Tophat 2 script)
INPUT: .fastq short read files
OUTPUT: “accepted_hits.bam” file from each “.fastq file”

5 Email/excel from the Biotechnology Center (Slide 1)

6 Email/excel from the Biotechnology Center (Slide 2)

7 “sftp” .tgz files to stubbslab or biocluster server
DEMONSTRATION SLIDE 1
Josephs-MacBook-Pro:~ josephtroy$ ssh
password:
Last login: Fri Nov 11 10:17: from sloofma2.igb.illinois.edu
~]$ pwd
/home/a-m/jmtroy2
~]$ # make a project folder
~]$ mkdir projects
~]$ cd projects
projects]$ mkdir _project1
projects]$ cd _project1
_project1]$ mkdir project_input_data
_project1]$ cd project_input_data
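The separate mkdir and cd steps in the demonstration can be collapsed with mkdir -p, which creates any missing parent folders in one call. This is a minimal sketch rather than the presenter's own commands: the project name my_project1 is hypothetical, and a scratch folder from mktemp stands in for your home folder so nothing real is touched.

```shell
# Work in a throwaway folder; on the server you would start from your home folder.
base_dir=$(mktemp -d)

# Hypothetical project name, used only for illustration.
project_name="my_project1"
input_dir="$base_dir/projects/$project_name/project_input_data"

# -p creates projects/, the project folder, and project_input_data/ in one step.
mkdir -p "$input_dir"

# Move into the new folder and confirm where we are.
cd "$input_dir"
pwd
```

Running mkdir -p a second time on the same path is harmless, which makes it convenient at the top of a script.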

8 “sftp” .tgz files to stubbslab or biocluster server
DEMONSTRATION SLIDE 2
project_input_data]$ sftp
Connecting to ftp.biotec.illinois.edu...
password:
sftp> ls
_P tgz _P tgz P tgz Stubbs tgz Stubbs_160203_P tgz Stubbs_P tgz Stubbs_P tgz Stubbs_P tgz Stubbs_P tgz Stubbs_P tgz Stubbs_P tgz Stubbs_P tgz
sftp> get P tgz
Fetching /export/home/ljstubbs/P tgz to P tgz
/export/home/ljstubbs/P tgz 100% 8327MB 23.5MB/s 05:54
sftp> exit
project_input_data]$ ls
P tgz
project_input_data]$
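The interactive get shown above can also be scripted with sftp's batch mode (the -b option), which reads commands from a file. The sketch below only writes the batch file and prints the command that would run it; it does not connect anywhere. The login, server name, and archive name are placeholders, not values from the slides, and batch mode cannot prompt for a password, so this assumes key-based login has been set up.

```shell
# Hypothetical values: replace with your own login, server, and file name.
remote="your_username@your.sftp.server"
archive="your_archive.tgz"

# Write the sftp commands to a batch file, one command per line.
batch_file=$(mktemp)
cat > "$batch_file" <<EOF
get $archive
exit
EOF

# Show the batch file and the command that would run it.
cat "$batch_file"
echo "To download, run: sftp -b $batch_file $remote"
```

For a one-off download the interactive session is simpler; a batch file mainly helps when the same files must be fetched repeatedly.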

9 Uncompress .tgz files to get .fastq files
DEMONSTRATION SLIDE 3
project_input_data]$ tar -zxvf P tgz
381Fish_D5pm_ctrl_H3K27ac_CGATGT_L001_R1_001.fastq
382Fish_D5pm_ctrl_H3K27ac_TGACCA_L001_R1_001.fastq
383Fish_D5pm_ctrl_input_ACAGTG_L001_R1_001.fastq
384Fish_D5pm_EX_H3K27ac_GCCAAT_L001_R1_001.fastq
385Fish_D5pm_EX_H3K27ac_CAGATC_L001_R1_001.fastq
417Fish_D5pm_EX_input_CTTGTA_L001_R1_001.fastq
387Fish_D9am_ctrl_H3K27ac_ATCACG_L001_R1_001.fastq
388Fish_D9am_ctrl_H3K27ac_TTAGGC_L001_R1_001.fastq
389Fish_D9am_ctrl_input_ACTTGA_L001_R1_001.fastq
390Fish_D9am_EX_H3K27ac_GATCAG_L001_R1_001.fastq
391Fish_D9am_EX_H3K27ac_TAGCTT_L001_R1_001.fastq
392Fish_D9am_EX_input_GGCTAC_L001_R1_001.fastq
393P1_H3K27ac_AGTCAA_L001_R1_001.fastq
394P1_H3K27ac_AGTTCC_L001_R1_001.fastq
395P1_WT1_ATGTCA_L001_R1_001.fastq
396P1_WT1_CCGTCC_L001_R1_001.fastq
397P1_input_GTAGAG_L001_R1_001.fastq
project_input_data]$
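Before extracting a large archive it can help to list its contents first. The sketch below builds a tiny stand-in archive in a scratch folder (the file names and reads are made up), lists it with tar -t, and then extracts it with the same tar -zxvf options used in the demonstration.

```shell
# Build a tiny example archive in a scratch folder (file names are made up).
work_dir=$(mktemp -d)
cd "$work_dir"
printf '@read1\nACGT\n+\nIIII\n' > sample_A.fastq
printf '@read1\nTTGA\n+\nIIII\n' > sample_B.fastq
tar -czf example.tgz sample_A.fastq sample_B.fastq
rm sample_A.fastq sample_B.fastq

# -t lists the archive contents without extracting anything.
tar -tzf example.tgz

# -x extracts, -z handles gzip compression, -v prints each file name,
# and -f names the archive file to read.
tar -zxvf example.tgz

# Confirm the extracted files and their sizes.
ls -lh
```

The original .tgz file is left in place after extraction, so disk usage grows by the size of the uncompressed files.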

10 Terminal is used to access the Linux-style (Unix) command line on a Mac

11 Moving files between machines (laptop, PC, server) with command line sftp
For this exercise, open Terminal on a Mac: in Finder, go to the Applications/Utilities folder and open the “Terminal” application. This should open a new “Terminal” window.
Type the commands shown after each prompt below (dark blue on the original slide).
THIS EXERCISE WILL USE ABOUT 13 GB ON YOUR LAPTOP/PC
This exercise is continued on the next 2 slides.
Josephs-MacBook-Pro:~ josephtroy$ mkdir -p projects/ project1/project_input_data
Josephs-MacBook-Pro:~ josephtroy$ cd projects/ project1/project_input_data
Josephs-MacBook-Pro:project_input_data josephtroy$ pwd
/Users/josephtroy/projects/ project1/project_input_data
Josephs-MacBook-Pro:project_input_data josephtroy$ sftp
password:
Connected to stubbslab.igb.illinois.edu.
sftp> cd /home/share/example_rna_seq_project_16Gso
sftp> cd fastq_files

12 Moving files between machines (laptop, PC, server) with command line sftp
Exercise continued – Slide 2
sftp> ls
F_CB_1_WT_GTAGAG_L00M_R1_001.fastq
F_CB_2_WT_TGACCA_L00M_R1_001.fastq
F_CB_3_WT_ACAGTG_L00M_R1_001.fastq
F_CB_4_16Gso_TAGCTT_L00M_R1_001.fastq
F_CB_5_16Gso_GAGTGG_L00M_R1_001.fastq
F_CB_6_16Gso_ATGTCA_L00M_R1_001.fastq
Gso_WT_aug_2016_chrom5.tgz
M_CB_1_WT_GGTAGC_L00M_R1_001.fastq
M_CB_2_WT_CAGATC_L00M_R1_001.fastq
M_CB_3_WT_CTTGTA_L00M_R1_001.fastq
M_CB_4_16Gso_AGTCAA_L00M_R1_001.fastq
M_CB_5_16Gso_AGTTCC_L00M_R1_001.fastq
M_CB_6_16Gso_GTCCGC_L00M_R1_001.fastq
head.fstq
test_tophat
sftp> get Gso_WT_aug_2016_chrom5.tgz
Fetching /home/share/example_rna_seq_project_16Gso/fastq_files/Gso_WT_aug_2016_chrom5.tgz to Gso_WT_aug_2016_chrom5.tgz
/home/share/example_rna_seq_project_16Gso/fastq_files/Gso_WT_aug_2016_chrom5.tgz % 775MB 3.5MB/s 03:40
sftp> exit
Josephs-MacBook-Pro:project_input_data josephtroy$ ls
Gso_WT_aug_2016_chrom5.tgz
Josephs-MacBook-Pro:project_input_data josephtroy$

13 Moving files between machines (laptop, PC, server) with command line sftp
Exercise continued – Slide 3
Josephs-MacBook-Pro:project_input_data josephtroy$ ls
Gso_WT_aug_2016_chrom5.tgz
Josephs-MacBook-Pro:project_input_data josephtroy$ tar -zxvf Gso_WT_aug_2016_chrom5.tgz
x F_CB_1_WT_GTAGAG_L00M_R1_001.fastq
x F_CB_2_WT_TGACCA_L00M_R1_001.fastq
x F_CB_3_WT_ACAGTG_L00M_R1_001.fastq
x F_CB_4_16Gso_TAGCTT_L00M_R1_001.fastq
x F_CB_5_16Gso_GAGTGG_L00M_R1_001.fastq
x F_CB_6_16Gso_ATGTCA_L00M_R1_001.fastq
x M_CB_1_WT_GGTAGC_L00M_R1_001.fastq
x M_CB_2_WT_CAGATC_L00M_R1_001.fastq
x M_CB_3_WT_CTTGTA_L00M_R1_001.fastq
x M_CB_4_16Gso_AGTCAA_L00M_R1_001.fastq
x M_CB_5_16Gso_AGTTCC_L00M_R1_001.fastq
x M_CB_6_16Gso_GTCCGC_L00M_R1_001.fastq
Josephs-MacBook-Pro:project_input_data josephtroy$ ls -lh
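After extraction, a quick sanity check is to count the reads in each .fastq file. The sketch below assumes the common four-lines-per-read FASTQ layout (header, sequence, "+", quality), so the read count is the line count divided by 4. The example file and its two reads are made up; in the exercise folder the same loop would run over the extracted files.

```shell
# Create a tiny stand-in FASTQ file with two reads (made-up data).
work_dir=$(mktemp -d)
cd "$work_dir"
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\nIIII\n' > example_R1_001.fastq

# Assumes 4 lines per read: header, sequence, "+", quality.
for fastq_file in *.fastq; do
  line_count=$(wc -l < "$fastq_file")
  read_count=$((line_count / 4))
  echo "$fastq_file: $read_count reads"
done
```

A line count that is not a multiple of 4 usually points to a truncated download or an interrupted extraction.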

14 Linux commands to know
William Pearson (who teaches the Computational and Comparative Genomics course at CSHL with Lisa) has command line lecture notes at:
Linux/Unix command line cheat sheets are available on the web. Find one you like.

15 Linux commands to know -First open Terminal application on Mac-

16 Linux commands to know pwd, ls, ls -lh, ls -1, mkdir, cd, sh, head, tail
pwd     print working directory, show the current folder
ls      list contents of the current folder
ls -lh  list contents with details (l), show file size & date as human readable (h)
ls -1   list content names in a one column list
mkdir   make a new folder in the current folder. ex: mkdir my_new_folder
cd      change to new folder. ex: cd my_new_folder
sh      run a shell script. ex: sh main_script_tophat_16Gso.sh
head    show the beginning of a file. ex: head -n 20 my_fastq_file.fastq
tail    show the end of a file. ex: tail -n 20 my_fastq_file.fastq
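To see several of these commands working together, the sketch below creates a small made-up FASTQ file in a scratch folder and inspects it. The file name matches the one used in the examples above, but its three reads are invented for illustration.

```shell
# Create a scratch folder and a tiny FASTQ file with three made-up reads.
work_dir=$(mktemp -d)
cd "$work_dir"
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\nIIII\n@r3\nTTAG\n+\nIIII\n' > my_fastq_file.fastq

# Show the current folder and list its contents, one name per line.
pwd
ls -1

# Show the first read (first 4 lines) and the last read (last 4 lines).
head -n 4 my_fastq_file.fastq
tail -n 4 my_fastq_file.fastq
```

Because head and tail read only part of the file, they stay fast even on multi-gigabyte .fastq files.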

17 Linux commands to know cp, mv, grep, wc, exit, module load, more, df -h
cp            copy.
              copy file ex: cp oldfile.txt newfile.txt
              copy folder ex: cp -R old_folder new_folder
mv            move (rename).
              move file ex: mv oldfile.txt newfile.txt
              move folder ex: mv old_folder new_folder
grep          find string in a file. ex: grep aust2 gene_list.txt
wc            stands for “word count”, but often used to count lines in a file.
              count lines in file ex: wc -l gene_list.txt
exit          exit terminal session or sftp session
module load   load software. ex: module load samtools/1.2
module avail  see a list of available software modules
more          one way to see contents of a file. ex: more gene_list.txt
cat           another way to see contents of a file. ex: cat gene_list.txt
df -h         see how much disk space is used and available on the server
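The file commands above can be tried safely in a scratch folder. In this sketch the gene names in gene_list.txt are made up, and grep -c (count matching lines) is used alongside the plain grep from the list; module load and module avail are left out because they exist only on servers that provide the module system.

```shell
# Create a scratch folder and a small gene list (made-up names).
work_dir=$(mktemp -d)
cd "$work_dir"
printf 'geneA\ngeneB\ngeneC\ngeneA_like\n' > gene_list.txt

# Copy the file, then rename the copy with mv.
cp gene_list.txt gene_list_backup.txt
mv gene_list_backup.txt gene_list_copy.txt

# Print the lines containing the string, then count them with -c.
grep geneA gene_list.txt
match_count=$(grep -c geneA gene_list.txt)

# Count all lines in the file.
line_count=$(wc -l < gene_list.txt)
echo "lines: $line_count, lines matching geneA: $match_count"

# Show disk space for the file system holding the current folder.
df -h .
```

Note that grep matches substrings, so searching for geneA also finds geneA_like.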

