NGS Bioinformatics Workshop 1.2 Tutorial – Sequence Formats, Databases and Visualization Tools
March 15th, 2012, BioSci room B9242
Facilitator: Richard Bruskiewich, Adjunct Professor, MBB
Learning Objectives
- Linux revisited
- Quick dive into the Open-Bio pool (BioPython)
- A first look at NGS data: NCBI Short Read Archive
- Processing NGS: FASTX toolkit et al.
- Visualization: IGV
Files and Permissions
Linux user permissions apply to three classes: owner, group, and others.
- Owner (user): the person who created the file and "owns" the file or directory
- Group: a set of users associated together, e.g. for a project or team work
- Others: everyone else on the server
Each file or directory can have its permissions set to (r)ead, (w)rite, or e(x)ecute.
chmod: change file permissions
Do a long listing (ls -l) to see the permission string, e.g. dr-x-wxrw-
The string separates into four sections: (d)(r-x)(-wx)(rw-)
- d: directory (a regular file shows - instead)
- r-x: user (owner)
- -wx: group
- rw-: others
Examples:
- chmod o+x foo.txt: grant 'execute' permission to 'others' on foo.txt
- chmod g-rw foo.txt: remove 'read' and 'write' permission from the group
- chmod ugo+rwx foo.txt: grant all rights to everyone
To change the user/group ('owner') of a file, use chown rather than chmod:
- chown ubuntu:ubuntu foo.txt
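The permission string and the chmod examples above can also be reproduced with Python's standard library. This is only a sketch: a temporary file stands in for foo.txt, the starting mode rw-r----- is an arbitrary assumption, and the octal values are worked out by hand (r=4, w=2, x=1).

```python
import os
import stat
import tempfile

# r=4, w=2, x=1, so r-x = 5, -wx = 3, rw- = 6.
# S_IFDIR marks the entry as a directory (the leading 'd').
listing = stat.filemode(stat.S_IFDIR | 0o536)
print(listing)  # dr-x-wxrw-

# A temporary file stands in for foo.txt (hypothetical example file).
fd, path = tempfile.mkstemp()
os.close(fd)


def mode_of(p):
    """Return only the permission bits of a file."""
    return stat.S_IMODE(os.stat(p).st_mode)


os.chmod(path, 0o640)  # assumed starting point: rw-r-----

os.chmod(path, mode_of(path) | stat.S_IXOTH)  # like: chmod o+x
after_o_plus_x = mode_of(path)

os.chmod(path, mode_of(path) & ~(stat.S_IRGRP | stat.S_IWGRP))  # like: chmod g-rw
after_g_minus_rw = mode_of(path)

os.chmod(path, mode_of(path) | 0o777)  # like: chmod ugo+rwx
after_all = mode_of(path)

print(oct(after_o_plus_x), oct(after_g_minus_rw), oct(after_all))
os.remove(path)
```

Each symbolic chmod form either sets bits (bitwise OR) or clears bits (bitwise AND with the complement), which is why the sketch reads the current mode before every change.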
A few useful tips…
- Hitting "tab" will auto-complete file or program names (or suggest possible names)
- The up arrow lets you return to previous commands
- Editing of text files: "nano" is an easier alternative to "emacs", but less powerful
- Alternatively, use an SSH client to transfer files to your Windows desktop, edit them in Windows, then transfer them back
- BUT: make sure you use a text editor that knows the difference between a Windows and a Linux text file (e.g. Notepad++)
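The difference between Windows and Linux text files is the line ending: Windows uses a carriage return plus line feed (CRLF), Linux uses a line feed only (LF). If a file comes back from Windows with the wrong endings, a conversion along these lines is one way to repair it. The function name is illustrative, not part of any tool mentioned in the workshop.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) line endings to Linux (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


windows_text = b"line one\r\nline two\r\n"
unix_text = to_unix_newlines(windows_text)
print(unix_text)  # b'line one\nline two\n'
```

Working on bytes rather than text avoids Python's own newline translation when the data is read from or written to a file opened in binary mode.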
Some more useful basic Linux commands
- "cd" changes your directory, e.g. 'cd /usr/local'
- "man" displays the manual for a command, e.g. 'man ls'
- "pwd" tells you the directory you are currently in (= working directory)
- "history" lists recent commands, enumerated with line numbers. By typing an exclamation point followed by the line number (e.g. !123), you can rerun that command
Accessing remote servers
- "ssh" – Secure Shell: ssh -i private_keypair user@host
- "scp" – Secure CoPy: scp -i private_keypair [user@host:]sourcefile [user@host:]targetfile
Here user is the account (default: the local user) and host is the internet name of the computer (default: the local host).
OpenBio Case Study: BioPython http://biopython.org/wiki/Biopython http://biopython.org/DIST/docs/tutorial/Tutorial.html
First look at NGS data
http://www.ncbi.nlm.nih.gov/sra/
http://hannonlab.cshl.edu/fastx_toolkit/ Linux, MacOSX or Unix only
Get the precompiled binary
wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
bunzip2 fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
tar -xvf fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar
sudo mv bin/* /usr/local/bin
FASTX toolkit I
- FASTQ-to-FASTA converter: converts FASTQ files to FASTA files
- FASTQ Information: charts quality statistics and nucleotide distribution
- FASTQ/A Collapser: collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts)
- FASTQ/A Trimmer: shortens reads in a FASTQ or FASTA file (removing barcodes or noise)
- FASTQ/A Renamer: renames the sequence identifiers in a FASTQ/A file
- FASTQ/A Clipper: removes sequencing adapters / linkers
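To make the operations listed above concrete, here is a small pure-Python sketch of what converting, collapsing and trimming do to the reads. It is not the FASTX implementation, and the function names and example reads are made up for illustration. It assumes well-formed FASTQ with exactly four lines per record.

```python
from collections import Counter


def read_fastq(lines):
    """Yield (identifier, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual  # drop the leading '@'


def fastq_to_fasta(records):
    """Keep identifier and sequence, drop the quality string."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)


def collapse(records):
    """Count identical sequences; most frequent first."""
    return Counter(seq for _, seq, _ in records).most_common()


def trim(records, first, last):
    """Keep bases first..last (1-based, inclusive) of every read."""
    return [(n, s[first - 1:last], q[first - 1:last]) for n, s, q in records]


# Made-up example reads, for illustration only.
example = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
IIII####
@read3
TTTTACGT
+
IIIIIIII
""".splitlines()

records = list(read_fastq(example))
print(fastq_to_fasta(records))
print(collapse(records))
print(trim(records, 5, 8))
```

For real data the FASTX tools (or BioPython) are the better choice, since they also handle malformed input and large files.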
FASTX toolkit II
- FASTQ/A Reverse-Complement: produces the reverse-complement of each sequence in a FASTQ/FASTA file
- FASTQ/A Barcode splitter: splits a FASTQ/FASTA file containing multiple samples
- FASTA Formatter: changes the width of sequence lines in a FASTA file
- FASTA Nucleotide Changer: converts FASTA sequences from/to RNA/DNA
- FASTQ Quality Filter: filters sequences based on quality
- FASTQ Quality Trimmer: trims (cuts) sequences based on quality
- FASTQ Masker: masks nucleotides with 'N' (or another character) based on quality
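The sequence- and quality-based operations listed above can be sketched in a few lines of Python. Again, this is an illustration and not how the FASTX tools are implemented. The function names, the thresholds (minimum score 20, 80% of bases) and the Phred+33 quality encoding are all assumptions, so check which encoding your own data uses.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def dna_to_rna(seq):
    """Replace thymine with uracil."""
    return seq.replace("T", "U").replace("t", "u")


def phred_scores(quality, offset=33):
    """Decode a quality string; offset 33 is an assumed encoding."""
    return [ord(c) - offset for c in quality]


def passes_quality_filter(quality, min_score=20, min_percent=80):
    """True if at least min_percent of bases score min_score or better."""
    scores = phred_scores(quality)
    good = sum(s >= min_score for s in scores)
    return 100 * good >= min_percent * len(scores)


def mask_low_quality(seq, quality, min_score=20, mask="N"):
    """Replace bases scoring below min_score with the mask character."""
    return "".join(
        base if score >= min_score else mask
        for base, score in zip(seq, phred_scores(quality))
    )


print(reverse_complement("AACG"))                  # CGTT
print(dna_to_rna("ACGT"))                          # ACGU
print(passes_quality_filter("IIII####"))           # False
print(mask_low_quality("ACGTACGT", "IIII####"))    # ACGTNNNN
```

In the example, 'I' decodes to a score of 40 and '#' to a score of 2 under Phred+33, so only half of the bases in "IIII####" pass the threshold and the read is rejected by the filter.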
www.bioinformatics.bbsrc.ac.uk/projects/download.html http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
Integrative Genomics Viewer http://www.broadinstitute.org/igv/