High Throughput Sequencing: Tutorial 6
HTS tools and analysis
  Visualization: IGV
  Analysis platform: Galaxy
  Tuning up the pipelines
Working with IGV
http://www.broadinstitute.org/igv/
Why and how to work with IGV
Base qualities, comparison between samples
False positive indels
Same mapping statistics – different meaning
What might cause this low percentage of mapping?
  The sample contains a high percentage of contamination
  The sample is very different from the reference genome
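The mapping percentage discussed above can be recomputed directly from a SAM file. The following is a minimal sketch, assuming plain-text SAM records and the standard FLAG bits (0x4 for unmapped, 0x100 and 0x800 for secondary and supplementary alignments). The function name `mapping_rate` and the example records are hypothetical and are not taken from the tutorial data.

```python
def mapping_rate(sam_lines):
    """Return (mapped, total, percent) over primary SAM records."""
    mapped = 0
    total = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Ignore secondary (0x100) and supplementary (0x800) alignments
        # so that each read is counted once.
        if flag & 0x900:
            continue
        total += 1
        # Bit 0x4 set means the read is unmapped.
        if not flag & 0x4:
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return mapped, total, percent


# Hypothetical records: two mapped reads, one unmapped, one secondary hit.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t30\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t256\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

mapped, total, percent = mapping_rate(example_sam)
print(f"{mapped}/{total} reads mapped ({percent:.1f}%)")
```

The number alone cannot tell contamination apart from a divergent sample, which is why the slides suggest inspecting the alignments visually as well.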
One image is worth a thousand words…
Structural Variations
Large deletion in the sample compared to the reference genome
Galaxy
https://main.g2.bx.psu.edu/
Use your account name and password to log in to Galaxy.
Uploading data to Galaxy
Mapping, filtering and conversion to BAM
Mapping
Filter SAM file
Convert SAM to BAM
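The SAM filtering step above can be illustrated outside Galaxy. This is a minimal sketch, assuming that filtering means dropping unmapped reads and reads below a mapping-quality cutoff; the Galaxy tool may apply different criteria. The function name `filter_sam` and the default `min_mapq=20` are assumptions chosen for illustration.

```python
def filter_sam(sam_lines, min_mapq=20):
    """Keep header lines and mapped records with MAPQ >= min_mapq."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # headers are always preserved
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        if flag & 0x4:          # unmapped read
            continue
        if mapq < min_mapq:     # ambiguous or low-confidence placement
            continue
        kept.append(line)
    return kept


# Hypothetical input records.
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t30\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

filtered = filter_sam(records, min_mapq=20)
print(len(filtered) - 1, "records kept")
```

Lowering `min_mapq` keeps more reads, including non-unique mappings, at the cost of more ambiguous placements.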
Variant calling
Create pileup
Find variants
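To show what finding variants in a pileup involves, here is a minimal sketch that calls single-base variants from samtools-style pileup lines (chromosome, position, reference base, depth, read bases, qualities). The thresholds `min_depth` and `min_alt_fraction` and the function names are assumptions for illustration; this is not the algorithm used by the Galaxy tools.

```python
import re
from collections import Counter


def count_bases(ref_base, read_bases):
    """Count bases observed at one pileup position."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # Read-start marker, followed by one mapping-quality character.
            i += 2
            continue
        if c == "$":
            # Read-end marker.
            i += 1
            continue
        if c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            digits = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
            continue
        if c in ".,":
            counts[ref_base.upper()] += 1  # match on forward/reverse strand
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1         # mismatch on either strand
        # '*' (deleted base) and other symbols are ignored.
        i += 1
    return counts


def call_snv(pileup_line, min_depth=4, min_alt_fraction=0.25):
    """Return (chrom, pos, ref, alt, alt_count, depth) or None."""
    chrom, pos, ref, _depth, bases = pileup_line.rstrip("\n").split("\t")[:5]
    counts = count_bases(ref, bases)
    depth = sum(counts.values())
    if depth < min_depth:
        return None
    alts = {b: n for b, n in counts.items() if b not in (ref.upper(), "N")}
    if not alts:
        return None
    alt, alt_count = max(alts.items(), key=lambda kv: kv[1])
    if alt_count / depth < min_alt_fraction:
        return None
    return (chrom, int(pos), ref.upper(), alt, alt_count, depth)


# Hypothetical pileup lines.
pileup = [
    "chr1\t100\tA\t6\t..,G.g\tIIIIII",
    "chr1\t101\tC\t5\t.,.,.\tIIIII",
    "chr1\t102\tT\t5\t^I.+2AG,$.c.\tIIIII",
]

calls = [c for c in (call_snv(line) for line in pileup) if c is not None]
print(calls)
```

Only position 100 is reported here: position 101 has no alternate bases, and the single `c` at position 102 falls below the assumed 25% cutoff.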
Tuning up the pipelines
How can mapping parameters affect the results?
  1 mismatch per read
  5 mismatches per read
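The effect of the mismatch limit can be approximated after mapping by counting how many aligned reads would pass each threshold. This is a minimal sketch, assuming the aligner wrote the optional `NM:i:` tag. Note that `NM` is the edit distance and therefore also counts inserted and deleted bases, so this is not identical to rerunning the mapper with a different mismatch setting. The function names and example records are hypothetical.

```python
def edit_distance(sam_record):
    """Return the NM tag value of a SAM record, or None if it is absent."""
    # Optional tags start at column 12 (index 11).
    for field in sam_record.rstrip("\n").split("\t")[11:]:
        if field.startswith("NM:i:"):
            return int(field[5:])
    return None


def count_accepted(sam_records, max_mismatches):
    """Count records whose edit distance is at most max_mismatches."""
    accepted = 0
    for record in sam_records:
        nm = edit_distance(record)
        if nm is not None and nm <= max_mismatches:
            accepted += 1
    return accepted


# Hypothetical aligned reads with edit distances 0, 1, 3, 5 and 6.
reads = [
    f"r{i}\t0\tchr1\t{100 + i}\t60\t36M\t*\t0\t0\t*\t*\tNM:i:{nm}"
    for i, nm in enumerate([0, 1, 3, 5, 6])
]

for limit in (1, 5):
    print(f"max {limit} mismatches: {count_accepted(reads, limit)} of {len(reads)} reads")
```

A strict limit keeps fewer reads and can lose coverage over true variants, while a loose limit keeps more reads and can admit misplaced ones.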
False positives vs. true negatives
One pipeline for all projects?
  3-base insertion
How can you tune your analysis?
Try different programs.
Mapping:
  Change mapping parameters
  Use non-unique mappings
  Don’t filter duplicates
Variants:
  Change variant filtration
  Change variant merging (penetrance, different heredity, low coverage in one individual…)
  Look for bigger variants: big insertions/deletions, inversions, copy number variations, etc.
Gene expression:
  Change the test threshold
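To see how much the duplicate filter matters for a given dataset, it helps to count how many reads it would remove. This is a deliberately simplified sketch, assuming single-end reads and treating reads with the same chromosome, start position and strand as duplicates. Dedicated duplicate-marking tools use more careful criteria, such as clipping-aware positions and mate information. The function name `split_duplicates` and the records are hypothetical.

```python
def split_duplicates(sam_records):
    """Split records into (unique, duplicates) by chromosome, position, strand."""
    seen = set()
    unique = []
    duplicates = []
    for record in sam_records:
        fields = record.split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            # Unmapped reads have no position, so they are never duplicates here.
            unique.append(record)
            continue
        # Bit 0x10 set means the read aligned to the reverse strand.
        key = (fields[2], int(fields[3]), bool(flag & 0x10))
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Hypothetical single-end alignments.
alignments = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",   # same start and strand as r1
    "r3\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # same start, reverse strand
    "r4\t0\tchr1\t250\t60\t4M\t*\t0\t0\tACGT\tIIII",
]

unique, duplicates = split_duplicates(alignments)
print(len(unique), "unique,", len(duplicates), "duplicate")
```

If a large share of reads is flagged, comparing variant calls with and without the filter shows whether it changes the results for that project.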