Presented by Samuel Chapman. Pyrosequencing-Intro The core idea behind pyrosequencing is that it utilizes the process of complementary DNA extension on.

Presented by Samuel Chapman

Pyrosequencing-Intro The core idea behind pyrosequencing is that it utilizes the process of complementary DNA extension on a single strand. A single run is capable of producing about 400,000 reads of around 250 bp each. The process takes half a day and costs only several thousand dollars.

Pyrosequencing-Intro To sequence bacterial communities, the reads are often generated using known, conserved flanking regions as primers for a homologous region. PCR is used to amplify the number of copies of the desired region. The middle of the region is where the variation among the population lies. PCR increases the number of copies of each sequence, but the proportions for each species stay the same.

Pyrosequencing- Intro These regions are homologous, but only the conserved primer regions are the same. The middle areas can be different. These regions will be our sequencing reads.

Pyrosequencing-Methods Each separate DNA sample is put onto a bead. PCR is then performed, so that each bead has one kind of sample. Each bead is put into one of hundreds of thousands of separate wells, so that each well has a distinct sample (although two wells may have identical samples). The DNA on the beads is single-stranded, and the primer is attached, allowing for extension. Enzymes and chemicals are added so that, every time a new base is added, light is released.

Pyrosequencing- Methods

Bases are added to the sequences by covering the well plate with a nucleotide, washing it away, then doing the same thing with the other three, then starting over. Ex: A..T..C..G | A..T..C..G | A..T..C..G where ‘..’ represents washing and ‘|’ denotes a new cycle. NOTICE: if a sequence has two or more of a letter in a row, all of those will be added in one step. If more than one letter is added at once, more light will be emitted from that well.
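The flow scheme above can be sketched in a few lines of Python. This is only an illustration of the idealized, noise-free case: the function name `ideal_flowgram` and its parameters are hypothetical, and the default flow order simply follows the A..T..C..G example on this slide.

```python
def ideal_flowgram(sequence, flow_order="ATCG"):
    """Return the noise-free flowgram for `sequence` (hypothetical helper).

    Each flow exposes the wells to one nucleotide; the ideal signal is the
    length of the homopolymer run that gets extended in that flow.
    """
    unknown = set(sequence) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")

    flows = []
    pos = 0
    # Keep running complete cycles until the whole sequence is extended.
    while pos < len(sequence):
        for base in flow_order:
            run = 0
            # All consecutive copies of the flowed base are added in one step.
            while pos < len(sequence) and sequence[pos] == base:
                run += 1
                pos += 1
            flows.append(run)
    return flows


print(ideal_flowgram("AAT"))  # one cycle: A is added twice, then T once
```

A run of identical bases shows up as a single flow with a larger value, which is why homopolymer length has to be inferred from light intensity.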

Pyrosequencing-Methods Each well can be monitored for the amount of light it emits at each nucleotide step, which indicates how long the “homopolymer” is. The sequence of emissions is called a flowgram. Naively, an intensity of 0 means a homopolymer of length 0, an intensity of 1 a homopolymer of length 1, an intensity of 2 a homopolymer of length 2, and so on. HOWEVER, the intensity observed for a given homopolymer length follows a distribution rather than a single value, and rounding it can therefore lead to errors such as insertions and deletions.

Example from paper Consider a known sequence, ACTGGGG. The order of nucleotide addition is T..A..C..G. The intensities “should” be 0, 1, 1, 0 | 1, 0, 0, 4. The observed flowgram was 0.18, 1.03, 1.02, 0.70 | 1.12, 0.07, 0.14, 4.65. Rounded naively, this suggests the sequence ACGTGGGGG, because 0.70 and 4.65 round up to 1 and 5. Therefore, it is better to use distributions to more accurately predict the sequence.
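The naive rounding in this example can be reproduced with a short Python sketch. The function name `naive_decode` is hypothetical, and this is the simple rounding approach the slide warns against, not the paper's method.

```python
def naive_decode(flowgram, flow_order="TACG"):
    """Decode a flowgram by rounding each intensity to the nearest integer.

    Hypothetical helper illustrating the naive approach: every flow
    contributes round(intensity) copies of the nucleotide for that flow.
    """
    bases = []
    for i, intensity in enumerate(flowgram):
        base = flow_order[i % len(flow_order)]
        length = max(0, round(intensity))
        bases.append(base * length)
    return "".join(bases)


# Observed flowgram from the slide (two cycles of T, A, C, G).
observed = [0.18, 1.03, 1.02, 0.70, 1.12, 0.07, 0.14, 4.65]
print(naive_decode(observed))  # ACGTGGGGG, not the true ACTGGGG
```

The decoded read has a spurious G inserted after the C and one extra G in the final homopolymer, which are exactly the two errors described above.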

Intensity distribution created using known sequences (from paper)

Dealing with the noisy data Using the intensity data, a “distance” measure was defined, which reflected the probability that each flowgram represented a particular sequence. All distances were applied to a mixture model, and an iterative expectation maximization algorithm was employed to gradually bring the flowgrams into agreement with the “true” data. Artifacts such as PCR chimeras were dealt with using the Mallard algorithm.

Flowgram preclustering Assumption: the likelihood of the flowgrams is represented by the mixture model. Each sequence is a different part of the mixture and has its own probability. σ is the cluster size of flowgrams around a sequence; f_i is the density of the observed flowgrams about a sequence; S_j is a particular sequence.

Flowgram preclustering The likelihood is defined for the dataset, D, of N flowgrams indexed by i. τ_j is each sequence’s relative frequency.
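The equation from this slide did not survive in the transcript. For orientation, the usual form of a mixture-model likelihood consistent with the definitions above is sketched below. The symbol F(i | S_j, σ), standing for the density of flowgram i about sequence S_j with cluster size σ, is an assumed notation and may not match the paper's exactly.

```latex
L(D \mid S, \tau, \sigma) \;=\; \prod_{i=1}^{N} \sum_{j} \tau_j \, F(i \mid S_j, \sigma),
\qquad \sum_{j} \tau_j = 1
```

Each flowgram contributes one factor, and within that factor every candidate sequence is weighted by its relative frequency τ_j.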

Preclustering analogy The flowgrams are clustered, with the size of each cluster, σ, being 5 flowgrams. We guess that each cluster represents one sequence. This is just an analogy, because the mixture is not two-dimensional like this.

Expectation maximization Assume a matrix Z, with rows representing flowgrams and columns representing sequences. z_{i,j} = δ_{j,m(i)}, where m(i) is the sequence that generated flowgram i, so each row contains a single 1. The complete data likelihood, L_C, is written in terms of Z.
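The equation itself is missing from the transcript. A standard complete-data likelihood for a mixture model with this indicator matrix would read as follows. Here F(i | S_j, σ) is an assumed symbol for the density of flowgram i about sequence S_j, and the form is the textbook one rather than a quotation from the paper.

```latex
L_C \;=\; \prod_{i=1}^{N} \prod_{j} \bigl[ \tau_j \, F(i \mid S_j, \sigma) \bigr]^{z_{i,j}}
```

Because each row of Z contains a single 1, only the term for the generating sequence m(i) survives for each flowgram.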

Expectation maximization Define z’_{i,j} as z_{i,j} given the model parameters. Expectation step: calculate z’_{i,j} given the current model parameters. Maximization step: calculate new parameters such that L_C is maximized according to z’_{i,j}. Stop when the improvement between steps falls below a cutoff, c.

Expectation maximization analogy Choose a beginning sequence (red square) in each cluster. There are many such clusters in the model. The black circles are flowgrams in the cluster. Expectation: calculate the parameters, such as likelihood that these flowgrams generate the sequence. Maximization: calculate a new sequence that is closer to the “real” sequence based on the flowgrams. You can see here that the sequence moves to a more likely position to the flowgrams. In the paper, the aggregate distance is calculated for all sequences.

Expectation maximization E step (calculating new z’_{i,j}). M step (calculating new relative frequencies, τ_j, and then sequences).
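The E and M steps can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the flowgram-to-sequence distances are supplied as a precomputed matrix, the density is assumed to be exponential in that distance (exp(-d/σ)/σ), and only the relative frequencies τ_j are updated, so the sequence-update part of the M step is left out. All function names are hypothetical.

```python
import math


def e_step(dist, tau, sigma):
    """E step: z'[i][j] proportional to tau[j] * exp(-dist[i][j] / sigma) / sigma."""
    z = []
    for row in dist:
        weights = [t * math.exp(-d / sigma) / sigma for d, t in zip(row, tau)]
        total = sum(weights)
        z.append([w / total for w in weights])
    return z


def m_step_tau(z):
    """M step (frequencies only): tau[j] is the mean of z'[i][j] over flowgrams."""
    n = len(z)
    return [sum(row[j] for row in z) / n for j in range(len(z[0]))]


def log_likelihood(dist, tau, sigma):
    """Log of the mixture likelihood under the assumed exponential density."""
    return sum(
        math.log(sum(t * math.exp(-d / sigma) / sigma for d, t in zip(row, tau)))
        for row in dist
    )


def run_em(dist, sigma, cutoff=1e-6, max_iter=100):
    """Iterate E and M steps until the improvement falls below `cutoff`."""
    k = len(dist[0])
    tau = [1.0 / k] * k          # start from equal relative frequencies
    previous = -math.inf
    z = e_step(dist, tau, sigma)
    for _ in range(max_iter):
        z = e_step(dist, tau, sigma)
        tau = m_step_tau(z)
        current = log_likelihood(dist, tau, sigma)
        if current - previous < cutoff:
            break
        previous = current
    return tau, z


# Toy distances: rows are flowgrams, columns are candidate sequences.
toy_dist = [[0.1, 5.0], [0.2, 4.0], [6.0, 0.1]]
tau, z = run_em(toy_dist, sigma=1.0)
print(tau)
```

In this toy example two of the three flowgrams lie close to the first sequence, so its estimated relative frequency ends up larger than that of the second.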

A visual example of the process

Testing the algorithm The pyrosequencing algorithm was tested on 16S rRNA from 90 known microbial clones. After sequencing, the samples were grouped phylogenetically into operational taxonomic units (OTUs) and the result was compared with the known composition. The sequence difference threshold for the creation of a separate OTU had to be larger than the noise (see next slide).

OTU assignment The assignment of OTUs depends on the required threshold of difference for a separate OTU. A higher difference results in fewer OTUs, because species become clustered together. A threshold that is below the noise level could result in the same species becoming two different OTUs.
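The effect of the threshold can be illustrated with a small average-linkage clustering sketch in Python. The distance matrix is invented toy data, the function name `average_linkage_otus` is hypothetical, and this simple quadratic-time loop is meant only to show the idea, not the procedure used in the paper.

```python
def average_linkage_otus(dist, threshold):
    """Group sequences into OTUs by average-linkage clustering.

    `dist` is a symmetric matrix of pairwise sequence differences.
    Clusters are merged, closest pair first, until the smallest average
    between-cluster distance exceeds `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                average = sum(pairs) / len(pairs)
                if best is None or average < best[0]:
                    best = (average, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data: sequences 0 and 1 differ by 1%, 2 and 3 by 2%, the groups by 10%.
toy = [
    [0.00, 0.01, 0.10, 0.10],
    [0.01, 0.00, 0.10, 0.10],
    [0.10, 0.10, 0.00, 0.02],
    [0.10, 0.10, 0.02, 0.00],
]
for threshold in (0.005, 0.03, 0.20):
    print(threshold, len(average_linkage_otus(toy, threshold)))
```

With the toy matrix, a threshold of 0.005 leaves four OTUs, 0.03 gives two, and 0.20 merges everything into one. If the 1% and 2% differences were sequencing noise, the lowest threshold would split single species into separate OTUs.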

Results

Take-home message The noise reduction algorithm employed by this paper resulted in more accurate sequence assignment. Average-linkage clustering is better at handling noise.

Questions?

Acknowledgments www.wikipedia.org Pyrosequence pic: http://jeb.biologists.org/content/vol210/issue9/images/large/JEB001370F2.jpeg
