Short Read Workshop Day 1 - Experimental Design Example 1: How to log in to vieques
Syllabus
Week 1
Day 1: Intro to Illumina & Library QC. Instructors: Jamie Kershner/Daniel Malmer
Day 2: Intro Linux. Instructors: Joey Azofeifa/Daniel Malmer
Day 3: Moving, clusters, notes. Instructors: Joey Azofeifa/Daniel Malmer
Day 4: Read QC and Read Trimming. Instructors: Amber Sorenson/Daniel Malmer
Friday catch-up day
Week 2
Day 5: Mapping and visualization. Instructors: Jess Vera/Phil Richmond
Day 6: Resequencing (genome-seq). Instructors: Phil Richmond/Aaron Odell
Day 7: RNA-seq and differential expression. Instructors: Aaron Odell/Jess Vera
Day 8: Peak calling, ChIP-seq analysis. Instructors: Amber Sorenson/Tim Read
Friday catch-up day
Course structure
Before class:
– Videos
– Firefox does not always work. Use Chrome, Internet Explorer or Safari
During class (1-5 pm):
– Examples with example files (fastq, sam, bam, bed)
After class:
– Homework
You will be learning “command line”
This is not a GUI
Spelling and capitalization matter!
Google can help you learn “command line unix”
– The names for commands are not always intuitive (you will learn many this week)
Login
Mac or Unix
– Open Terminal by clicking the magnifying glass in the corner, typing terminal, and hitting Enter
– ssh -X
– Type password
– When/if it says “enter passphrase” just hit Enter twice
PC
– Open PuTTY
– Type vieques.colorado.edu under the host name
– Hit Open
– Type identikey
– Type password
– When/if it says “enter passphrase” just hit Enter twice
Check your access
pwd
ls
mkdir mine
touch temp.txt
ls
ls -lahtr
cd mine
pwd
ls -lahtr
cd ..
pwd
VPN (Virtual Private Network)
If you are not on campus, you must have the VPN installed and turned on before you log in to vieques!
– Otherwise the login will just hang.
internet-services/vpn/
Why use a server (cluster)
Many computers can often do work much faster than one computer can.
– Many programs are built to run on a server/cluster (multi-threading)
Some things take lots of space or memory
Most bioinformatic programs are written for Unix/Linux
– Because this platform was built by programmers for programmers
Often installing is a pain!
Which server can I use?
Biofrontiers
– Vieques
CU Boulder
– RC (Research Computing)
– Pro: they deal with upkeep/installing and it's mostly free
– Con: under maintenance a lot, built more for physics problems
Off campus
– We don’t know… (Ask department? Cloud computing?)
Big versus small
Class will use small files for the sake of time.
If you are using human or mouse data you will likely have much larger files.
Coffee Break…
Videos read-class
VPN internet-services/vpn/
Your project
What experiment have you done or are you planning to do?
What method are you using?
Which sequencer?
How many lanes will you sequence?
Will you barcode? Inline or in adapter?
Will you do single or paired end?
How many replicates will you do?
What controls are you using?
How much disk space will you use?
Where will you process the data?
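For the disk space question, a rough size for the raw fastq files can be sketched from the read count and read length. This is only an estimate under stated assumptions: a fastq record is four lines (header, sequence, "+", quality), the files are uncompressed, and the header line length of 50 characters is a hypothetical placeholder (real headers vary by instrument). The function name is also just illustrative.

```python
def fastq_bytes(reads, read_length, header_length=50):
    """Rough size in bytes of an uncompressed fastq file.

    header_length is an assumed length for the '@...' header line.
    Each of the four lines in a record ends with a newline character.
    """
    record = (
        (header_length + 1)   # header line
        + (read_length + 1)   # sequence line
        + 2                   # '+' separator line
        + (read_length + 1)   # quality line
    )
    return reads * record


# Hypothetical example: 30 million single-end 100 bp reads for one sample.
size = fastq_bytes(30_000_000, 100)
print(f"{size / 1e9:.2f} GB uncompressed")
```

Mapped files (sam, bam) and intermediate files add to this, so treat the number as a lower bound for planning.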
Some general guidelines
Estimated coverage (DNA):
Coverage = (Reads * Read Length) / Genome size
Application             Coverage
DNA de novo assembly    100X
DNA Resequencing        ~30X
SNP analysis            10X to 30X
Estimated reads (RNA):
Genome size         Differential Expression
Small (<20Mb)       ~10M
Medium (~100Mb)     ~20M
Large (>1Gb)        ~30M
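The coverage formula can be turned around to ask how many reads a target coverage requires. A minimal sketch follows; the function names and the example genome size are illustrative, not part of the course material.

```python
import math


def coverage(reads, read_length, genome_size):
    """Estimated fold coverage: (reads * read length) / genome size."""
    return reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest whole number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: 10 million 100 bp reads on a 100 Mb genome.
print(coverage(10_000_000, 100, 100_000_000))   # 10.0

# Reads of 100 bp needed for ~30X resequencing of the same genome.
print(reads_needed(30, 100, 100_000_000))       # 30000000
```

For paired-end data, count both reads of each pair, since both contribute bases.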
Common sequencer output
MiSeq:
– V2 kits: 15M clusters/run. Run options: 1x50, 2x150, 2x250
– V3 kits: 25M clusters/run. Run options: 1x150, 2x75, 2x300
NextSeq:
– HO kits: 400M clusters/run. Run options: 1x75, 2x75, 2x150
– MO kits: 130M clusters/run. Run options: 2x75, 2x150
HiSeq v3:
– 200M clusters/lane
– Run options: 1x50, 2x100
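To compare these options, it can help to convert clusters and run type into total bases. The sketch below reads a run option such as 2x150 as "2 reads per cluster, 150 bases each". It assumes every cluster yields usable reads, so real output will be somewhat lower. The dictionary keys and function names are made up for illustration; the cluster counts are the ones listed above.

```python
# Cluster counts per run (per lane for HiSeq), as listed on the slide.
CLUSTERS = {
    "MiSeq V2": 15_000_000,
    "MiSeq V3": 25_000_000,
    "NextSeq HO": 400_000_000,
    "NextSeq MO": 130_000_000,
    "HiSeq v3 (per lane)": 200_000_000,
}


def parse_run_option(run_option):
    """Split a run option such as '2x150' into (reads per cluster, read length)."""
    reads_per_cluster, read_length = run_option.lower().split("x")
    return int(reads_per_cluster), int(read_length)


def bases_per_run(clusters, run_option):
    """Upper-bound number of bases from one run (or one lane)."""
    reads_per_cluster, read_length = parse_run_option(run_option)
    return clusters * reads_per_cluster * read_length


# Example: a MiSeq V2 run at 2x250 versus one HiSeq v3 lane at 2x100.
print(bases_per_run(CLUSTERS["MiSeq V2"], "2x250") / 1e9, "Gb")
print(bases_per_run(CLUSTERS["HiSeq v3 (per lane)"], "2x100") / 1e9, "Gb")
```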
In Class Problem 1, RNA-Seq
You are performing an RNA-seq experiment to look at differential expression in mice
– How do you check your RNA input? What are you looking for?
– What are the different considerations for library prep?
In Class Problem 1, RNA-Seq
You are performing an RNA-seq experiment to look at differential expression in mice
– How do you check your RNA input? What are you looking for?
Concentration: by Qubit for accuracy, also check DNA concentration
Quality of total RNA: by Bioanalyzer, RIN > 8 is generally recommended
– What are the different considerations for RNA library prep?
Stranded or not?
How to remove ribosomal RNA?
What kit?
Multiplexing?
How much sequencing do you need to do (#reads/sample)?
In Class Problem 1, RNA-Seq
You are performing an RNA-seq experiment to look at differential expression in mice
– How do you check your RNA input? What are you looking for?
– What are the different considerations for library prep?
You are running 3 different conditions each with 3 biological replicates
– Which sequencing platform and run type do we need? Can we do this experiment in a single run/lane?
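One way to approach the single run/lane question is to multiply out the samples and compare the total against each platform's cluster count. This sketch assumes the ~30M reads per sample guideline for a large (>1Gb) genome such as mouse, counts one cluster as one read, and ignores reads lost to filtering or uneven multiplexing. The names are illustrative.

```python
import math

SAMPLES = 3 * 3                 # 3 conditions x 3 biological replicates
READS_PER_SAMPLE = 30_000_000   # guideline for a large (>1Gb) genome
TOTAL_READS = SAMPLES * READS_PER_SAMPLE

# Clusters per run (per lane for HiSeq), from the sequencer output slide.
CLUSTERS = {
    "MiSeq V2": 15_000_000,
    "MiSeq V3": 25_000_000,
    "NextSeq HO": 400_000_000,
    "NextSeq MO": 130_000_000,
    "HiSeq v3 (per lane)": 200_000_000,
}


def units_needed(total_reads, clusters_per_unit):
    """Number of whole runs (or lanes) needed to reach total_reads."""
    return math.ceil(total_reads / clusters_per_unit)


print(f"Total reads wanted: {TOTAL_READS:,}")
for platform, clusters in CLUSTERS.items():
    print(f"{platform}: {units_needed(TOTAL_READS, clusters)} run(s)/lane(s)")
```

Under these assumptions only the NextSeq HO kit fits all nine samples in a single run; the answer changes if you accept fewer reads per sample.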
In Class Problem 2
You want to resequence the human genome to find a dominant heterozygous mutation.
– How much sequencing coverage do you need?
– What sequencing format is the best?
– How many reads do you need?
– Do you need multiple runs or lanes?
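The last two questions can be worked through with the coverage formula. The sketch below assumes the ~30X resequencing guideline and a human genome size of roughly 3.1 Gb (an assumption brought in here, not a number from the slides). It also assumes every cluster yields usable reads, so treat the results as a lower bound on lanes or runs.

```python
import math

HUMAN_GENOME_SIZE = 3_100_000_000   # assumption: roughly 3.1 Gb
TARGET_COVERAGE = 30                # ~30X guideline for DNA resequencing


def reads_needed(coverage, genome_size, read_length):
    """Reads of a given length needed to reach the target coverage."""
    return math.ceil(coverage * genome_size / read_length)


def units_needed(coverage, genome_size, clusters, reads_per_cluster, read_length):
    """Whole lanes (or runs) needed, given clusters per lane and run type."""
    bases_per_unit = clusters * reads_per_cluster * read_length
    return math.ceil(coverage * genome_size / bases_per_unit)


# Reads needed at 100 bp; halve this for the number of 2x100 read pairs.
print(f"{reads_needed(TARGET_COVERAGE, HUMAN_GENOME_SIZE, 100):,} reads of 100 bp")

# HiSeq v3 lane at 2x100 (200M clusters) and NextSeq HO run at 2x150 (400M clusters).
print(units_needed(TARGET_COVERAGE, HUMAN_GENOME_SIZE, 200_000_000, 2, 100), "HiSeq lane(s)")
print(units_needed(TARGET_COVERAGE, HUMAN_GENOME_SIZE, 400_000_000, 2, 150), "NextSeq HO run(s)")
```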