Module 3: RNA-Seq. BAMView

1 Module 3: RNA-Seq. Module 3 BAMView

Introduction

Recently, the use of new sequencing technologies (pyrosequencing, Illumina-Solexa) has produced large amounts of data at the genomic and transcriptomic level. The raw output of these sequencing technologies is short (or not so short) reads that correspond to the templates found in a given sample; in the case of a transcriptome, the short reads correspond to sequences found in a given mRNA. Alignment algorithms are then used to find the location of the reads on a reference genome. Finally, one can visualize the “pile-up” of reads in a particular region by looking at coverage plots. The higher the plot, the more highly expressed the transcript is. For the purpose of the following exercises, remember that the sequences originate from a transcriptome sample (mRNA) and therefore only contain information about the exons and UTRs.

In a more visual way, imagine this transcript is present in the sample:

[Figure: transcript model with 5’ UTR, Exon 1, Exon 2, Exon 3 and 3’ UTR]

Reads belonging to the transcript are produced by the sequencing process. When the reads come out as raw data, there is no information about where they belong on the reference genome. What is more, reads from several different transcripts all come out together. An alignment algorithm finds where they belong in the reference genome based on similarity matches. The plot shown above the gene models represents the number of reads that align to the genome at each base position. This allows us to identify transcribed regions: exons (yellow) and UTRs (white).
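The idea of a coverage plot (the number of aligned reads at each base position) can be illustrated with a minimal Python sketch. It is not how BAMView computes its plots; the function name `coverage` and the toy read coordinates are assumptions made for illustration, with reads given as 1-based inclusive (start, end) pairs.

```python
from typing import List, Tuple


def coverage(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Return read depth at each position 1..length of a reference.

    Reads are 1-based inclusive (start, end) intervals (an assumed,
    simplified representation of aligned reads).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 2)
    for start, end in reads:
        start = max(start, 1)
        end = min(end, length)
        if start > end:
            continue  # read lies entirely outside the reference
        diff[start] += 1
        diff[end + 1] -= 1

    # The running sum of the difference array is the depth at each base.
    depth = []
    running = 0
    for pos in range(1, length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


# Toy data (hypothetical): three reads piling up near positions 3-5,
# and a single read at positions 9-10.
toy_reads = [(2, 5), (3, 6), (3, 6), (9, 10)]
print(coverage(toy_reads, 10))  # [0, 1, 3, 3, 3, 2, 0, 0, 1, 1]
```

Positions covered by more reads get higher values, which is what appears as a peak in the coverage plot.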

2 Module 3: RNA-Seq. Introduction

Spliced Leader Sequencing

In euglenozoans, and a wide range of other eukaryotes including many groups of metazoa, mRNAs are post-transcriptionally modified by the addition of a conserved 5’ spliced-leader RNA. In kinetoplastids this occurs as part of the processing of polycistronic mRNA into monocistronic transcripts. Spliced leaders can be used as universal sites for trapping, reverse transcribing and sequencing the 5’ ends of mRNAs, allowing for Solexa sequencing of cDNAs containing 5’ spliced leaders and downstream sequence. The spliced leader sequences can then be trimmed from the generated sequences and the reads aligned against a reference genome, as would be done with traditional RNAseq methods.

Mapping of spliced leader sequences provides valuable information on gene structure, allowing for the determination of spliced leader addition sites (often more than one per gene) and the corresponding 5’ UTRs. This information can be used to adjust the boundaries of CDS features: if a major splice addition site occurs within the range of a CDS feature, we can assume that the CDS part of the gene begins downstream of this site, and the feature can be trimmed to the next methionine.

In the example below (.bam file courtesy of P. Myler), a pile-up of reads defines a major splice addition site. As you will see in the following exercises, coverage plots can be used to find major and minor splice addition sites across a chromosome.

[Figure: gene model with reads piled up at the SL addition site; labels: SL Addition Site, 5’ UTR, CDS, Gene, Reads]
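The CDS adjustment described above (trimming to the next methionine downstream of a splice addition site that falls inside the CDS) can be sketched in Python. This is only an illustration under stated assumptions: the gene is on the forward strand, coordinates are 1-based and inclusive, and "next methionine" is taken to mean the next ATG in the original reading frame. The function name `trim_cds_start` and the toy sequence are hypothetical.

```python
from typing import Optional


def trim_cds_start(seq: str, cds_start: int, cds_end: int,
                   sl_site: int) -> Optional[int]:
    """Return a new CDS start after a spliced leader (SL) addition site.

    Assumptions (not from the course material): forward strand only,
    1-based inclusive coordinates, and the new start must be an ATG in
    the same reading frame as the original CDS.
    Returns None if no in-frame ATG is found before the CDS ends.
    """
    if not (cds_start <= sl_site <= cds_end):
        return cds_start  # site is outside the CDS: nothing to trim

    seq = seq.upper()
    # First codon that begins strictly after the SL addition site.
    codons_to_skip = (sl_site - cds_start) // 3 + 1
    pos = cds_start + 3 * codons_to_skip

    while pos + 2 <= cds_end:
        if seq[pos - 1:pos + 2] == "ATG":
            return pos
        pos += 3
    return None


# Toy example (hypothetical): CDS at 3..20 reads ATG AAA CCC ATG GGG TAA.
toy_seq = "CCATGAAACCCATGGGGTAA"
print(trim_cds_start(toy_seq, 3, 20, 7))  # 12
```

In the toy sequence the SL addition site at position 7 lies inside the annotated CDS, so the start moves from position 3 to the in-frame ATG at position 12. Any real adjustment should still be checked by eye in Artemis.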

3 Module 3: RNA-Seq. Exercise 1 - Using Spliced-Leader RNAseq data to inform structural annotation

In the following example we will use spliced leader RNAseq data provided by the Myler lab to create new sequence features (a splice site and a 5’ UTR) and to modify the coordinates of a gene model.

1. Open the Lmjchr3.embl file.
2. Select Read BAM from the File menu.
3. Read in LmjFSL.LmjF.TTG.chr3.bam.

4 Module 3: RNA-Seq

After loading the .bam file, a new window will appear showing the aligned reads. Note that BAMView will treat reads mapping to identical positions as duplicate reads. These will appear as only a single green read in the zoomed-out view of Artemis. To get a better view of the pile-up of reads in a given region, zoom in to a region containing mapped reads. This will result in a view similar to the one on the previous slide.

BAMView will also generate coverage plots that give you a general idea of how many reads have mapped across a length of chromosome. To view a coverage plot, right-click on the BAMView window, then select Show>Coverage plot. Taking a broad view of the entire chromosome, you should be able to see which genes have RNAseq evidence for their expression, and get a rough idea of where the spliced leader addition site is. You can then select the reads in the BAMView window and zoom in for a view of individual reads.

[Figure: BAMView window showing the read coverage plot and the ‘pile-up’ view; labels: Duplicate reads (green), Single reads (black). Zoom in to sequence level to see individual reads in the BAMView window.]
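To make the notion of duplicate reads concrete, here is a small Python sketch that groups reads mapping to identical positions and counts them. It does not reproduce BAMView's internal logic; treating "identical position" as the same start, end and strand is an assumption, and the function name `collapse_duplicates` and the toy reads are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# (start, end, strand), 1-based inclusive: an assumed, simplified read record.
Read = Tuple[int, int, str]


def collapse_duplicates(reads: List[Read]) -> List[Tuple[Read, int]]:
    """Group reads with identical start, end and strand.

    Returns one entry per distinct position with the number of reads
    found there, sorted by coordinate.
    """
    counts = Counter(reads)
    return sorted(counts.items())


# Toy data (hypothetical): three identical reads, plus two distinct ones.
toy_reads = [
    (100, 135, "+"),
    (100, 135, "+"),
    (100, 135, "+"),
    (102, 137, "+"),
    (100, 135, "-"),
]
for read, n in collapse_duplicates(toy_reads):
    label = "duplicate" if n > 1 else "single"
    print(read, n, label)
```

Positions with a count above one correspond to what is drawn as a single green read in the zoomed-out view, while positions with a count of one correspond to single (black) reads.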

5 Module 3: RNA-Seq

[Figure: read coverage plot and ‘pile-up’ view]

1. Click and drag to select the reads in the region where there is a coverage peak.
2. Use the scroll bar to zoom in to a view of individual reads.
3. Create a ‘splicesite’ feature.
4. Create a 5’ UTR feature.
5. Adjust the gene coordinates to the beginning of the 5’ UTR by dragging the feature, or by opening the feature editor and changing the coordinates.

Create features for both splice acceptor sites and 5’ UTRs by selecting the desired base pairs, then choosing Create>Feature from Selected Base Range. Select ‘splicesite’ or ‘5’ UTR’ as the key. Finally, adjust the size of the gene by opening the feature editor and changing the 5’ coordinates to match those of the 5’ UTR.

6 Module 3: RNA-Seq

Using a combination of the coverage plot and an individual read view:

1. Find five genes with one spliced leader addition site upstream of the putative start codon. Create a ‘splicesite’ feature and a 5’ UTR feature for each gene. Then adjust the gene coordinates to include the 5’ UTR.
2. Find five genes with two or more splice addition sites. Create ‘splicesite’ features for these and create ‘alternative’ 5’ UTR features for each site.
3. Find five genes where the spliced leader addition site overlaps with the predicted CDS. Adjust the gene boundaries to the next methionine downstream of the spliced leader addition site, and create ‘splicesite’ features and 5’ UTRs for these.
4. Find an expressed pseudogene.
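When looking for genes with one or several splice addition sites, it can help to think of each site as a position where many reads begin. The Python sketch below counts read start positions and separates a major site from minor ones. It is an illustration only: it assumes forward-strand reads whose first aligned base marks the addition site once the spliced leader has been trimmed, and the function name `call_sl_sites` and the `min_reads` threshold are hypothetical choices, not values from the course.

```python
from collections import Counter
from typing import List, Optional, Tuple


def call_sl_sites(read_starts: List[int],
                  min_reads: int = 3) -> Tuple[Optional[int], List[int]]:
    """Split candidate SL addition sites into one major and several minor.

    read_starts: 1-based start position of each aligned read (assumed to
    mark the addition site on the forward strand).
    min_reads:   hypothetical cutoff; positions with fewer reads are
                 ignored as noise.
    """
    counts = Counter(read_starts)
    candidates = {pos: n for pos, n in counts.items() if n >= min_reads}
    if not candidates:
        return None, []

    # Major site: most reads; ties go to the lowest coordinate.
    major = max(candidates, key=lambda pos: (candidates[pos], -pos))
    minor = sorted(pos for pos in candidates if pos != major)
    return major, minor


# Toy data (hypothetical): 10 reads start at 100, 4 at 140, 1 at 155.
toy_starts = [100] * 10 + [140] * 4 + [155]
print(call_sl_sites(toy_starts))  # (100, [140])
```

A gene with an empty list of minor sites would fit task 1, while a gene with one or more minor sites would fit task 2. Candidate sites should still be confirmed in the BAMView read view before creating features.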

