Results report: 002421_roreriPE_AGTCAA_L008_R1_all.fastq


Results report: 002421_roreriPE_AGTCAA_L008_R1_all.fastq.gz. FastQC Report. AGRY 60000: Genomics

Problems: warning and failure at the last base position of the plot. Clear with TRAILING: cut bases off the end of a read, if below a threshold quality. Warning: a warning will be issued if the lower quartile for any base is less than 10, or if the median for any base is less than 25. Failure: this module will raise a failure if the lower quartile for any base is less than 5 or if the median for any base is less than 20. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf
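The warning and failure rule quoted on this slide can be sketched in Python. This is a minimal illustration, not FastQC's own code: the function names are hypothetical, the input is assumed to be one list of Phred scores per read position, and the quartile is computed with the standard library rather than with FastQC's internal method.

```python
import statistics

def classify_position(qualities):
    """Classify the Phred scores observed at one read position."""
    # Lower quartile via the standard library (assumption: FastQC's own
    # percentile calculation may differ slightly).
    lower_quartile = statistics.quantiles(qualities, n=4, method="inclusive")[0]
    median = statistics.median(qualities)
    if lower_quartile < 5 or median < 20:
        return "fail"
    if lower_quartile < 10 or median < 25:
        return "warn"
    return "pass"

def classify_module(per_position_qualities):
    """The module takes the worst status seen at any position."""
    statuses = [classify_position(q) for q in per_position_qualities]
    if "fail" in statuses:
        return "fail"
    if "warn" in statuses:
        return "warn"
    return "pass"
```

A single bad position is enough to flag the whole module, which is why a drop at the last base alone produces the failure reported here.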

No problems: the average quality is above 27. Warning: a warning is raised if the most frequently observed mean quality is below 27; this equates to a 0.2% error rate. Failure: an error is raised if the most frequently observed mean quality is below 20; this equates to a 1% error rate. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf
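The error rates quoted on this slide follow from the Phred definition, where a quality Q corresponds to an error probability of 10^(-Q/10). A minimal sketch, with hypothetical function names and assuming one mean quality value per read as input:

```python
from collections import Counter

def phred_to_error_probability(quality):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-quality / 10)

def classify_mean_quality(read_mean_qualities):
    """Apply the slide's thresholds to the most frequent (rounded) mean quality."""
    rounded = [round(q) for q in read_mean_qualities]
    most_common_quality, _ = Counter(rounded).most_common(1)[0]
    if most_common_quality < 20:
        return "fail"
    if most_common_quality < 27:
        return "warn"
    return "pass"

print(phred_to_error_probability(27))  # about 0.002, i.e. 0.2%
print(phred_to_error_probability(20))  # about 0.01, i.e. 1%
```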

No problems: the differences between A and T and between G and C are less than 10%. Warning: this module issues a warning if the difference between A and T, or G and C, is greater than 10% in any position. Failure: this module will fail if the difference between A and T, or G and C, is greater than 20% in any position. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf
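As an illustration of the per-position check, the sketch below compares base percentages at one position. It assumes that "difference" means percentage points of the called bases at that position and that N calls are left out of the total; both are assumptions, and the function name is hypothetical.

```python
def classify_base_balance(a, c, g, t):
    """Classify one read position from its A, C, G and T counts."""
    total = a + c + g + t  # assumption: N calls are excluded
    if total == 0:
        raise ValueError("no base calls at this position")
    at_difference = abs(a - t) / total * 100
    gc_difference = abs(g - c) / total * 100
    worst = max(at_difference, gc_difference)
    if worst > 20:
        return "fail"
    if worst > 10:
        return "warn"
    return "pass"
```

For example, counts of 40 A, 20 C, 15 G and 25 T give an A/T difference of 15 points, which falls in the warning range.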

No problems with the GC distribution. Warning: a warning is raised if the sum of the deviations from the normal distribution represents more than 15% of the reads. Failure: this module will indicate a failure if the sum of the deviations from the normal distribution represents more than 30% of the reads. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

No problems with N content. Warning: this module raises a warning if any position shows an N content of >5%. Failure: this module will raise an error if any position shows an N content of >20%. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

Problems: warning, sequences of variable length. No failure: no sequence has length 0. Clear with MAXINFO: an adaptive quality trimmer which balances read length and error rate to maximise the value of each read. Warning: this module will raise a warning if all sequences are not the same length. Failure: this module will raise an error if any of the sequences have zero length. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

No problems with sequence duplication. Warning: this module will issue a warning if non-unique sequences make up more than 20% of the total. Failure: this module will issue an error if non-unique sequences make up more than 50% of the total. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

Problems: sequence 1 is overrepresented (0.37%). Clear with ILLUMINACLIP: cut adapter and other Illumina-specific sequences from the read. Warning: this module will issue a warning if any sequence is found to represent more than 0.1% of the total. Failure: this module will issue an error if any sequence is found to represent more than 1% of the total. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf
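The thresholds on this slide can be applied to a list of read sequences as in the sketch below. It is a simplified illustration with a hypothetical function name: it counts every read exactly, whereas the real tool may sample or truncate reads before counting.

```python
from collections import Counter

def find_overrepresented(sequences, warn_fraction=0.001, fail_fraction=0.01):
    """Return (sequence, fraction, status) for sequences above the thresholds."""
    counts = Counter(sequences)
    total = len(sequences)
    flagged = []
    for sequence, count in counts.most_common():
        fraction = count / total
        if fraction > fail_fraction:
            status = "fail"
        elif fraction > warn_fraction:
            status = "warn"
        else:
            continue
        flagged.append((sequence, fraction, status))
    return flagged
```

A sequence at 0.37% of the total, as reported here, lies between the 0.1% and 1% thresholds and therefore raises a warning rather than a failure.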

Problems: warning, k-mers overrepresented in the sequence. Clear with: --Kmers. Warning: this module will issue a warning if any k-mer is enriched more than 3 fold overall, or more than 5 fold at any individual position. Failure: this module will issue an error if any k-mer is enriched more than 10 fold at any individual base position. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

Results report: 002421_roreriPE_AGTCAA_L008_R2_all.fastq.gz. FastQC Report

Problems: warning and failure in bases 72-95. Clear with TRAILING: cut bases off the end of a read, if below a threshold quality, and/or CROP: cut the read to a specified length by removing bases from the end. Warning: a warning will be issued if the lower quartile for any base is less than 10, or if the median for any base is less than 25. Failure: this module will raise a failure if the lower quartile for any base is less than 5 or if the median for any base is less than 20. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf
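To make the difference between the two trimming steps concrete, here is a simplified Python re-implementation of what TRAILING and CROP do to a single read. It is an illustrative sketch only, not Trimmomatic's code; the function names and the representation of a read as a sequence string plus a list of Phred scores are assumptions.

```python
def trailing(sequence, qualities, threshold):
    """Remove bases from the 3' end while their quality is below the threshold."""
    end = len(sequence)
    while end > 0 and qualities[end - 1] < threshold:
        end -= 1
    return sequence[:end], qualities[:end]

def crop(sequence, qualities, length):
    """Keep only the first `length` bases, whatever their quality."""
    return sequence[:length], qualities[:length]

read = "ACGTAC"
scores = [30, 30, 30, 2, 30, 2]
print(trailing(read, scores, 3))  # ('ACGTA', [30, 30, 30, 2, 30])
print(crop(read, scores, 4))      # ('ACGT', [30, 30, 30, 2])
```

TRAILING stops at the first base from the end that meets the threshold, so it adapts to each read, while CROP removes the same positions from every read.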

No problems: the average quality is above 27. Warning: a warning is raised if the most frequently observed mean quality is below 27; this equates to a 0.2% error rate. Failure: an error is raised if the most frequently observed mean quality is below 20; this equates to a 1% error rate. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

Problems: difference between A and T greater than 10% at positions 1-2. Clear with LEADING: cut bases off the start of a read, if below a threshold quality. Warning: this module issues a warning if the difference between A and T, or G and C, is greater than 10% in any position. Failure: this module will fail if the difference between A and T, or G and C, is greater than 20% in any position. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

No problems with the GC distribution. Warning: a warning is raised if the sum of the deviations from the normal distribution represents more than 15% of the reads. Failure: this module will indicate a failure if the sum of the deviations from the normal distribution represents more than 30% of the reads. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

No problems with N content. Warning: this module raises a warning if any position shows an N content of >5%. Failure: this module will raise an error if any position shows an N content of >20%. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

Problems: warning, sequences of variable length. No failure: no sequence has length 0. Clear with MAXINFO: an adaptive quality trimmer which balances read length and error rate to maximise the value of each read. Warning: this module will raise a warning if all sequences are not the same length. Failure: this module will raise an error if any of the sequences have zero length. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

No problems with sequence duplication. Warning: this module will issue a warning if non-unique sequences make up more than 20% of the total. Failure: this module will issue an error if non-unique sequences make up more than 50% of the total. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

Problems: sequence 1 is overrepresented (0.41%). Clear with ILLUMINACLIP: cut adapter and other Illumina-specific sequences from the read. Warning: this module will issue a warning if any sequence is found to represent more than 0.1% of the total. Failure: this module will issue an error if any sequence is found to represent more than 1% of the total. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

Problems: warning, k-mers overrepresented in the sequence. Clear with: --Kmers. Warning: this module will issue a warning if any k-mer is enriched more than 3 fold overall, or more than 5 fold at any individual position. Failure: this module will issue an error if any k-mer is enriched more than 10 fold at any individual base position. http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf
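The Trimmomatic steps named across these slides (ILLUMINACLIP, LEADING, TRAILING, MAXINFO and optionally CROP) could be combined into one paired-end run. The sketch below only assembles the command line and does not execute it. Everything numeric is a placeholder assumption rather than a value from the slides: the step parameters, the `-phred33` encoding, the jar name, the adapter file `TruSeq3-PE.fa` and the output file naming all need to be adapted to the actual data.

```python
import shlex

def build_trimmomatic_command(r1, r2, adapters, out_prefix,
                              jar="trimmomatic-0.32.jar", crop_length=None):
    """Assemble a Trimmomatic paired-end command as an argument list."""
    outputs = [
        f"{out_prefix}_R1_paired.fastq.gz",
        f"{out_prefix}_R1_unpaired.fastq.gz",
        f"{out_prefix}_R2_paired.fastq.gz",
        f"{out_prefix}_R2_unpaired.fastq.gz",
    ]
    steps = [
        f"ILLUMINACLIP:{adapters}:2:30:10",  # placeholder clipping settings
        "LEADING:3",                         # placeholder quality threshold
        "TRAILING:3",                        # placeholder quality threshold
    ]
    if crop_length is not None:
        steps.append(f"CROP:{crop_length}")
    steps.append("MAXINFO:40:0.5")           # placeholder target length and strictness
    return ["java", "-jar", jar, "PE", "-phred33", r1, r2, *outputs, *steps]

command = build_trimmomatic_command(
    "002421_roreriPE_AGTCAA_L008_R1_all.fastq.gz",
    "002421_roreriPE_AGTCAA_L008_R2_all.fastq.gz",
    "TruSeq3-PE.fa",
    "002421_trimmed",
)
print(shlex.join(command))
```

Steps are applied in the order given on the command line, so adapter clipping is listed before the quality trimming steps. Rerunning FastQC on the paired outputs shows whether the flagged modules have improved.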

Resources used: Zhou X, Rokas A. Prevention, diagnosis and treatment of high-throughput sequencing data pathologies. Mol Ecol. 2014 Apr;23(7):1679-700. doi:10.1111/mec.12680. Trimmomatic manual V0.32: http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf. FastQC manual: http://dnacore.missouri.edu/PDF/FastQC_Manual.pdf