

Computing on TSCC
Make a folder for the class and move into it (replace username with your own login):
    mkdir -p /oasis/tscc/scratch/username/biom262_harismendy
    cd /oasis/tscc/scratch/username/biom262_harismendy
Make a symbolic link to the data folder:
    ln -s /projects/ps-yeolab/biom262_2016/data/ harismendy_data
Adjust your environment to have access to the tools:
    source exportPATH.txt
Use screen:
    screen -S name          create a named session
    ^A ^D                   detach (Ctrl-A, then Ctrl-D)
    screen -x [PID].name    attach to a session
    exit                    kill the session
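The folder-and-link steps can be rehearsed safely in throwaway directories before touching the real scratch space. This is a minimal sketch: scratch_root and shared_data are hypothetical stand-ins for the TSCC scratch and project paths, and the empty TSG.bed is only a placeholder file.

```shell
# Hypothetical stand-ins for /oasis/tscc/scratch/username and the shared data folder.
scratch_root=$(mktemp -d)
shared_data=$(mktemp -d)
touch "$shared_data/TSG.bed"    # placeholder file so the link has something to show

# Create the class folder (-p: no error if it already exists) and move into it.
mkdir -p "$scratch_root/biom262_harismendy"
cd "$scratch_root/biom262_harismendy"

# Link the shared data folder under a short local name.
ln -s "$shared_data" harismendy_data

# The link resolves to the shared folder, so its files are visible from here.
ls harismendy_data
link_target=$(readlink harismendy_data)
```

Because the link only points at the shared folder, nothing is copied into scratch; deleting harismendy_data removes the link, not the data.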

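UNIX commands
Count the number of lines:
    wc -l TSG.bed
Select lines containing "TP53":
    grep 'TP53' TSG.bed
Sort according to column 4:
    sort -k 4 TSG.bed
Select genes on chromosome 1:
    awk '$1=="chr1"' TSG.bed
Calculate gene length and store it in column 5 (written as an action, so a line is printed even when the length is 0):
    awk 'BEGIN{OFS="\t"} {$5=$3-$2; print}' TSG.bed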
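To see what each of these commands returns, they can be run on a tiny stand-in for TSG.bed. This is a sketch under assumptions: the four-column layout (chromosome, start, end, gene) is assumed, and every coordinate below is invented for illustration.

```shell
cd "$(mktemp -d)"

# Toy TSG.bed, tab-separated: chromosome, start, end, gene (invented values).
printf 'chr17\t100\t250\tTP53\n'    > TSG.bed
printf 'chr1\t500\t900\tNOTCH2\n'  >> TSG.bed
printf 'chr13\t300\t350\tRB1\n'    >> TSG.bed
printf 'chr1\t50\t80\tARID1A\n'    >> TSG.bed

# Number of lines (tr strips the padding some wc implementations add).
n_lines=$(wc -l < TSG.bed | tr -d ' ')

# Gene name on the line that matches TP53.
tp53_gene=$(grep 'TP53' TSG.bed | cut -f 4)

# First gene after sorting on column 4.
first_sorted=$(sort -k 4 TSG.bed | head -n 1 | cut -f 4)

# Number of genes on chromosome 1.
n_chr1=$(awk '$1=="chr1"' TSG.bed | wc -l | tr -d ' ')

# Length of TP53, written to column 5 as end minus start.
tp53_len=$(awk 'BEGIN{OFS="\t"} {$5=$3-$2; print}' TSG.bed | grep 'TP53' | cut -f 5)

echo "$n_lines lines, $n_chr1 on chr1, first by name: $first_sorted, TP53 length: $tp53_len"
```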

Working with Intervals
Look at the file:
    more harismendy_data/class/CGC.exons.bed
Count the number of lines:
    wc -l harismendy_data/class/CGC.exons.bed
Count the number of genes:
    cut -f 4 harismendy_data/class/CGC.exons.bed | sort | uniq | wc -l
Count the number of intervals per gene, most intervals first (-n makes the sort numeric):
    cut -f 4 harismendy_data/class/CGC.exons.bed | sort | uniq -c | sort -nr | more
Calculate the size of each interval and sort by size:
    awk '{size=$3-$2; print $0,size}' harismendy_data/class/CGC.exons.bed | sort -n -r -k 5
Calculate the average interval size:
    awk '{sum=sum+$3-$2} END {print sum/NR}' harismendy_data/class/CGC.exons.bed
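The interval pipelines are easier to check on a file small enough to verify by hand. The sketch below assumes a four-column exon file (chromosome, start, end, gene); exons.bed and all of its values are invented.

```shell
cd "$(mktemp -d)"

# Toy exon file: GENE_A has three intervals (sizes 100, 150, 50), GENE_B has one (size 100).
printf 'chr1\t100\t200\tGENE_A\n'   > exons.bed
printf 'chr1\t300\t450\tGENE_A\n'  >> exons.bed
printf 'chr1\t600\t650\tGENE_A\n'  >> exons.bed
printf 'chr2\t10\t110\tGENE_B\n'   >> exons.bed

# Distinct gene names in column 4.
n_genes=$(cut -f 4 exons.bed | sort | uniq | wc -l | tr -d ' ')

# Gene with the most intervals: count per gene, then numeric reverse sort.
top_gene=$(cut -f 4 exons.bed | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')

# Largest interval: append the size as a 5th field, sort on it numerically, descending.
largest=$(awk '{size=$3-$2; print $0, size}' exons.bed | sort -n -r -k 5 | head -n 1 | awk '{print $5}')

# Mean interval size: accumulate sizes, divide by the number of records at the end.
mean_size=$(awk '{sum = sum + $3 - $2} END {print sum / NR}' exons.bed)

echo "$n_genes genes, most intervals: $top_gene, largest: $largest bp, mean: $mean_size bp"
```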

fastqc
Inspect the fastq file:
    zcat harismendy_data/class/file.fastq.gz | more
Read the help menu:
    fastqc -h
Create an output directory:
    mkdir fastqc
Run fastqc:
    fastqc -o fastqc harismendy_data/class/file.fastq.gz
Transfer the results to your desktop
Open the results with a web browser
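When paging through a fastq file it helps to know that every read occupies four lines: name, sequence, a "+" separator, and the quality string. The sketch below builds a two-read toy file (names and sequences are invented) and derives the read count and mean read length with awk; gzip -dc is used as the portable equivalent of zcat.

```shell
cd "$(mktemp -d)"

# Two invented reads, four lines each: @name, sequence, +, qualities.
printf '@read1\nACGTACGT\n+\nIIIIIIII\n'  > toy.fastq
printf '@read2\nTTGCA\n+\nIIIII\n'       >> toy.fastq
gzip toy.fastq    # produces toy.fastq.gz

# Number of reads = number of lines divided by 4.
n_reads=$(gzip -dc toy.fastq.gz | awk 'END {print NR / 4}')

# Mean read length: sequences are line 2 of every 4-line record.
mean_len=$(gzip -dc toy.fastq.gz | awk 'NR % 4 == 2 {sum += length($0); n++} END {print sum / n}')

echo "$n_reads reads, mean length $mean_len"
```

These counts are a quick sanity check only; fastqc reports the same figures together with per-base quality.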

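BWA alignment and BAM files
Read the doc:
    bwa mem
Start a screen session:
    screen -S username
Align and convert to BAM:
    bwa mem -t 1 harismendy_data/resources/hg19_lite.fa R1.fq.gz R2.fq.gz | samtools view -buSh - > sample.bam
Sort and index the BAM file:
    samtools sort -m 2G sample.bam sample.sorted
    samtools index sample.sorted.bam
The sort command above uses the older samtools syntax, where the last argument is an output prefix. With samtools 1.3 or later, write the output file explicitly:
    samtools sort -m 2G -o sample.sorted.bam sample.bam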

Remove duplicate reads
    java -jar /opt/biotools/picard/picard.jar MarkDuplicates \
        INPUT=harismendy_data/class/P21.sorted.bam \
        OUTPUT=P21.sorted.rmdup.bam \
        METRICS_FILE=myrmdupMetrics.txt \
        REMOVE_DUPLICATES=true \
        ASSUME_SORTED=true

Stats and Slices
Calculate flag statistics:
    samtools flagstat harismendy_data/class/P21.sorted.bam > P21.flagstat.txt
How many "gapped reads" (reads whose CIGAR string contains an insertion or a deletion)?
    samtools view harismendy_data/class/P21.sorted.bam | awk '$6~/[ID]/' | wc -l
Subset the reads from a set of intervals:
    samtools view -bh -L harismendy_data/class/TSG.bed harismendy_data/class/P21.sorted.bam > P21.sorted.TSG.bam
or from a single region:
    samtools view -bh harismendy_data/class/P21.sorted.bam chr1: > P21.NOTCH2.bam
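The gapped-read filter works because column 6 of each alignment record is the CIGAR string, in which I marks an insertion and D a deletion. The sketch below applies the same awk test to four invented records, so it runs without samtools; sam_line is a hypothetical helper that only fills in the other mandatory columns with dummy values.

```shell
cd "$(mktemp -d)"

# Hypothetical helper: print one 11-column SAM record with the given read name and CIGAR.
sam_line() {
    printf '%s\t0\tchr1\t100\t60\t%s\t*\t0\t0\t*\t*\n' "$1" "$2"
}

{
    sam_line r1 50M         # plain match: not gapped
    sam_line r2 20M2I28M    # 2-base insertion: gapped
    sam_line r3 30M1D20M    # 1-base deletion: gapped
    sam_line r4 10S40M      # soft clip only: not gapped
} > toy.sam

# Keep records whose CIGAR (column 6) contains I or D, then count them.
n_gapped=$(awk '$6 ~ /[ID]/' toy.sam | wc -l | tr -d ' ')

echo "$n_gapped gapped reads"
```

Soft-clipped reads (S) are not counted, since clipping trims the read ends rather than opening a gap in the alignment.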

Visualize Alignments in IGV
Transfer the NOTCH2 BAM to your laptop (WinSCP, Fugu, Cyberduck, scp)
Start IGV
Use the human hg19 genome reference

BEDTOOLS coverage
Read the doc
Read examples
Calculate coverage depth over CGC genes (chr1 only):
    bedtools coverage -hist -abam harismendy_data/class/AA2253B_groupRealigned.chr1.bam -b harismendy_data/class/CGC.exons.chr1.bed > AA2253B.CGC.hist.cov.txt
This is the argument order used before bedtools 2.24. From 2.24 onward, coverage is reported for the -a file, so the BED file goes to -a and the BAM file to -b.
Fraction of CGC bp covered at >30x?
    grep '^all' AA2253B.CGC.hist.cov.txt | awk '$2>30' | awk '{sum=sum+$5} END {print sum}'
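In the -hist output, the summary rows start with "all" and hold the depth in column 2 and the fraction of bases at that depth in column 5, which is why summing column 5 over the rows with depth above 30 gives the covered fraction. The sketch below runs the same pipeline on invented rows, so it needs no BAM file; the per-interval row is included only to show that grep drops it.

```shell
cd "$(mktemp -d)"

# Invented summary rows: label, depth, bases at that depth, total bases, fraction.
printf 'all\t0\t100\t1000\t0.1\n'    > toy.hist.cov.txt
printf 'all\t30\t200\t1000\t0.2\n'  >> toy.hist.cov.txt
printf 'all\t31\t300\t1000\t0.3\n'  >> toy.hist.cov.txt
printf 'all\t50\t400\t1000\t0.4\n'  >> toy.hist.cov.txt
# One per-interval row (interval columns first); it must not enter the sum.
printf 'chr1\t100\t200\tGENE_A\t31\t50\t100\t0.5\n' >> toy.hist.cov.txt

# Fraction of bases at depth strictly greater than 30.
frac_gt30=$(grep '^all' toy.hist.cov.txt | awk '$2>30' | awk '{sum=sum+$5} END {print sum}')

# Same question with depth 30 included, as a single awk call.
frac_ge30=$(awk '$1=="all" && $2>=30 {sum+=$5} END {print sum}' toy.hist.cov.txt)

echo "fraction >30x: $frac_gt30, fraction >=30x: $frac_ge30"
```

Whether the threshold should be strict (>30) or inclusive (>=30) depends on how the coverage requirement is worded; the two can differ noticeably when many bases sit exactly at the cutoff.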

Calculate HS Metrics
    java -jar /opt/biotools/picard/picard.jar CalculateHsMetrics \
        BAIT_INTERVALS=harismendy_data/resources/humanV4-baits_hg19_lite.interval_list \
        TARGET_INTERVALS=harismendy_data/resources/resource/humanV4-targets_hg19_lite.interval_list \
        INPUT=harismendy_data/class/P21.sorted.bam \
        OUTPUT=P21.HsMetrics.txt