AGEseq: Analysis of Genome Editing by Sequencing
Liang-Jiao Xue, Chung-Jui Tsai
Molecular Plant, Volume 8, Issue 9, Pages 1428-1430 (September 2015)
DOI: 10.1016/j.molp.2015.06.001
Copyright © 2015 The Author
Figure 1. The Workflow and Output of AGEseq.
(A) Experimental considerations (left panel) and data analysis process (right panel) of AGEseq. Amplicons derived from different alleles and/or homologous genes can be obtained using degenerate primers. Amplicons from different samples or unrelated genes can be indexed, normalized, and pooled for sequencing. AGEseq supports multiple sequencing platforms and file types, and calls BLAT to align reads against user-supplied reference sequences. AGEseq then identifies indels and SNPs to report genome editing events.
(B) Example of the design file. Multiple genes and alleles can be included.
(C) Example of an AGEseq output file with annotated genome editing events. Results are populated in nine columns as follows. A, input file name; B, gene/allele name; C and D, target sequence (C) from the design file that best matches the amplicon sequence in D; E, number of reads with the sequence shown in D; F and G, target (F) and read (G) sequences displayed according to their alignment, with dashes introduced to denote indels between the pair and “B” and “E” denoting the beginning and end of the alignment; H, patterns of indels, if any, denoted by the position (first integer) of the insertion (I) or deletion (D), followed by the number (second integer) of nucleotides affected; I, patterns of SNPs, if any, denoted by the nucleotide position(s). By default, only indels are considered genome editing events, as SNPs may arise from sequence errors introduced by PCR during library preparation and/or sequencing, or by base callers. For illustration purposes, only representative events with the highest and lowest read support are shown for each allele. Red boxed areas are examples of alignment artefacts that warrant manual inspection.
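The indel notation described for column H can be illustrated with a minimal Python sketch that derives such patterns from a pair of gapped sequences like those in columns F and G. This is not the AGEseq implementation: the function name `indel_patterns` is hypothetical, and the sketch assumes that positions are 1-based on the ungapped target, that an insertion is reported at the position of the target base that follows it, that the “B” and “E” markers have already been stripped, and that each pattern is rendered as position, letter, length (for example `4D2`).

```python
def indel_patterns(aligned_target: str, aligned_read: str) -> list[str]:
    """Return indel patterns (e.g. '4D2', '3I1') from a gapped pairwise alignment.

    Assumptions (not taken from AGEseq itself): positions are 1-based on the
    ungapped target, and '-' is the only gap character.
    """
    if len(aligned_target) != len(aligned_read):
        raise ValueError("aligned sequences must have equal length")

    patterns = []
    target_pos = 0  # number of target bases consumed so far
    i, n = 0, len(aligned_target)
    while i < n:
        t, r = aligned_target[i], aligned_read[i]
        if t == "-" and r != "-":
            # Gap in the target: the read carries an insertion.
            j = i
            while j < n and aligned_target[j] == "-":
                j += 1
            patterns.append(f"{target_pos + 1}I{j - i}")
            i = j
        elif r == "-" and t != "-":
            # Gap in the read: target bases are deleted.
            j = i
            while j < n and aligned_read[j] == "-" and aligned_target[j] != "-":
                j += 1
            patterns.append(f"{target_pos + 1}D{j - i}")
            target_pos += j - i
            i = j
        else:
            # Match or mismatch (a possible SNP): advance along the target.
            if t != "-":
                target_pos += 1
            i += 1
    return patterns


# Illustrative sequences only, not data from the paper.
print(indel_patterns("ACGTACGT", "ACG--CGT"))
print(indel_patterns("AC-GTACGT", "ACTGT--GT"))
```

Mismatched columns are skipped here because, as the legend notes, SNPs are not treated as editing events by default.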
Each analysis group (one allele from one sample) ends with read statistics that include “total reads” in the input file, “total hits” matching all target sequences in the design file, “sub hits” matching the specific target, and all “indel hits” for the given target. See Supplemental Text for further explanation.
(D) A summary of the AGEseq analysis is provided for each sample. Columns A–G are as above. Column H (indel or WT rate %) is calculated as the “indel or WT hits” divided by the “sub hits”. Column I denotes the editing patterns as + (insertion) or – (deletion) with the number of nucleotides affected, or WT-like. See Supplemental Text for further explanation.
(E) The summary information from (D) can be easily converted to a table suitable for publication. Allelic detection frequencies (% Reads) were recalculated by dividing the “indel hits” of each allele by the sum of the “sub hits” from both alleles. For off-target assessment, it is more appropriate to report the predominant, unedited (WT-like) event, as shown for the paralogous 4CL5.
(F) The wood discoloration phenotype of line 61 due to biallelic mutations in 4CL1 that affect lignin biosynthesis (from Zhou et al., 2015). Scale bar represents 1 cm.
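The two percentages described in (D) and (E) can be written out as a short Python sketch. The function names, dictionary keys, allele labels, and read counts below are hypothetical and chosen only for illustration; they are not taken from the AGEseq code or from the figure.

```python
def rate_percent(hits: int, sub_hits: int) -> float:
    """Column H in (D): indel (or WT) hits as a percentage of sub hits."""
    if sub_hits <= 0:
        raise ValueError("sub_hits must be positive")
    return 100.0 * hits / sub_hits


def allelic_frequencies(alleles: dict[str, dict[str, int]]) -> dict[str, float]:
    """% Reads in (E): each allele's indel hits over the summed sub hits of all alleles."""
    total_sub_hits = sum(a["sub_hits"] for a in alleles.values())
    if total_sub_hits <= 0:
        raise ValueError("summed sub_hits must be positive")
    return {
        name: 100.0 * a["indel_hits"] / total_sub_hits
        for name, a in alleles.items()
    }


# Hypothetical read counts for two alleles of one gene in one sample.
counts = {
    "allele_1": {"sub_hits": 60, "indel_hits": 57},
    "allele_2": {"sub_hits": 40, "indel_hits": 38},
}
print(rate_percent(57, 60))         # per-allele rate, as in (D)
print(allelic_frequencies(counts))  # recalculated frequencies, as in (E)
```

The distinction matters when alleles are amplified unequally: the rate in (D) is normalized within one allele, whereas the frequency in (E) is normalized across both alleles, so the per-allele values in (E) sum to at most 100%.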