Finding new nirK genes in metagenomic data


1 Finding new nirK genes in metagenomic data

2 What is nirK? One kind of nitrite reductase
nirK is a gene encoding nitrite reductase, an enzyme involved in denitrification. Denitrification is an essential part of nitrogen cycling. The main processes of nitrogen cycling are shown next.

3 Nitrogen Cycling This picture shows the processes in nitrogen cycling: nitrogen fixation, nitrification, and denitrification. They are important for the global nitrogen equilibrium. In general, denitrification occurs where oxygen, a more energetically favorable electron acceptor than nitrate, is depleted and bacteria respire nitrate as a substitute terminal electron acceptor. Because of the high concentration of oxygen in our atmosphere, denitrification only takes place in environments where oxygen consumption exceeds the rate of oxygen supply, such as some soils and groundwater, wetlands, poorly ventilated corners of the ocean, and seafloor sediments. NirK is the nitrite reductase that reduces nitrite to nitric oxide.

4 +5 +3 +2 +1 In denitrification, nitrate (+5) is reduced to nitrite (+3), then to nitric oxide (+2), to nitrous oxide (+1), and finally to dinitrogen (0), by different denitrifiers. Nitrite reductase turns nitrite into nitric oxide, the first gaseous product in denitrification, so it is the key enzyme, and numerous sequences are available now. Nitrous oxide is an important contributor to global warming and ozone depletion: over a 100-year period, its global warming potential is 298 times that of carbon dioxide per unit weight. We collected soil samples from the KBS LTER (Long Term Ecological Research) site, where nirS was not detected, so nirK was selected as our target gene.

5 Metagenomic Datasets
2 samples from agricultural soil, 2 sequencing runs per sample (Roche 454 pyrosequencing)
2 samples from forest soil, 2 sequencing runs per sample (Roche 454 pyrosequencing)
Data are from the Tom Schmidt Lab.
Agricultural and forest soils were chosen because fertilizer (nitrate) added to the soil might decrease or increase the number of denitrifiers.

6 Methods Start with sequence similarity search software: HMMER
HMMER: an implementation of profile hidden Markov models (profile HMMs) for biological sequence analysis.
Profile HMMs are built from a multiple sequence alignment of known members of a given protein family, produced by an alignment tool.
Profile HMMs have global and local modes. Local mode is used in my research.

7 Advantage over BLAST
HMMs have a formal probabilistic basis: probability theory guides how all the scoring parameters are set.
HMMs have a consistent theory behind gap and insertion scores.
But HMMER is much slower than BLAST.
Useful for searching or annotating protein domain structures and for finding members of a protein sequence family.
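To make the "probabilistic basis" concrete, here is a minimal sketch of log-odds scoring against a profile. It is a deliberately simplified toy: match states only, no insert or delete states, a four-letter alphabet, and made-up probabilities. It is not HMMER's actual scoring code, and the names PROFILE, BACKGROUND, and log_odds_score are hypothetical.

```python
import math

# Toy match-state emission probabilities for a 3-column profile.
# Hypothetical numbers, not derived from any real nirK alignment.
PROFILE = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
# Background (null model) residue frequencies, assumed uniform here.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_score(seq, profile=PROFILE, background=BACKGROUND):
    """Sum over columns of log2(P(residue | column) / P(residue | background)), in bits."""
    if len(seq) != len(profile):
        raise ValueError("this ungapped toy model needs len(seq) == len(profile)")
    return sum(
        math.log2(column[residue] / background[residue])
        for column, residue in zip(profile, seq)
    )


if __name__ == "__main__":
    for candidate in ("AGA", "CCA"):
        print(candidate, round(log_odds_score(candidate), 3))
```

A sequence that matches the conserved columns scores above zero, and one that does not scores below zero; every parameter is a probability rather than a hand-tuned penalty.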

8 HMMER components HMMER has components:
to build a profile HMM: hmmbuild
to search a profile against a sequence database: hmmsearch
to align sequences to an existing profile: hmmalign
hmmbuild, hmmcalibrate, hmmsearch, and hmmalign are the ones mainly used.
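As a rough illustration of how these components chain together, the sketch below only assembles the command lines and prints them; it does not run HMMER. All file names are hypothetical, options (including the one that selects local mode) are left out on purpose because they differ between HMMER versions, and hmmcalibrate is a HMMER 2 program that HMMER 3 no longer ships.

```python
import shlex


def hmmer_commands(alignment, profile, database, to_align, with_calibrate=True):
    """Return the HMMER steps as argv lists, in the order they would be run.

    Only positional arguments are used; every file name is a placeholder.
    """
    commands = [["hmmbuild", profile, alignment]]      # alignment -> profile HMM
    if with_calibrate:
        # HMMER 2 only: calibrate the profile so E-values are more accurate.
        commands.append(["hmmcalibrate", profile])
    commands.append(["hmmsearch", profile, database])  # profile vs. sequence database
    commands.append(["hmmalign", profile, to_align])   # align hits to the profile
    return commands


if __name__ == "__main__":
    steps = hmmer_commands(
        alignment="nirK_seeds.aln",      # hypothetical seed alignment
        profile="nirK.hmm",              # hypothetical profile file
        database="soil_proteins.fasta",  # hypothetical soil sequence file
        to_align="nirK_hits.fasta",      # hypothetical file of candidate hits
    )
    for argv in steps:
        print(shlex.join(argv))
```

Keeping the steps as plain argv lists makes it easy to inspect them first and pass them to subprocess.run later, once the options for the installed HMMER version have been checked.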

9 Multiple alignment format
HMMER branch: FunGene pipeline: download 6 good known nirKs -> clustalw -> multiple alignment format -> hmmbuild -> hmmcalibrate -> profile HMM -> hmmsearch against soil data -> potential nirKs
BLAST branch: 6 good known nirKs -> blast against soil data -> BLAST nirK result
Compare: BLAST nirK result with potential nirKs
Six different, well-characterized nirK genes are made into a profile HMM, which is searched against the soil data. BLAST is the most popular sequence similarity search tool, so I am interested in how the search results of the two tools differ.

10 Blast and Hmmer results
Input files, one pair per sample and run: /u/gjr/nirk2/<sample>_<run>_dereplicated_blastp.txt <==========> /u/gjr/nirk2/<sample>_<run>_dereplicated_localhmm.txt

sample_run    blastOnly   shared      hmmOnly
ma1w2_run1    23          6           2
ma1w2_run2    28          8           4
ma1w4_run1    24          not given   5
ma1w4_run2    34          16          5

Interesting
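The blastOnly / shared / hmmOnly counts are a set comparison of the hit identifiers reported by each tool. A minimal sketch follows, assuming each result has already been reduced to a list of read IDs; the actual format of the files above is not shown in the slides, and the IDs used here are made up.

```python
def compare_hits(blast_ids, hmm_ids):
    """Count hits found only by BLAST, by both tools, and only by HMMER."""
    blast, hmm = set(blast_ids), set(hmm_ids)
    return {
        "blastOnly": len(blast - hmm),
        "shared": len(blast & hmm),
        "hmmOnly": len(hmm - blast),
    }


if __name__ == "__main__":
    # Hypothetical read IDs, for illustration only.
    blast_hits = ["read_001", "read_002", "read_003", "read_004"]
    hmm_hits = ["read_003", "read_004", "read_005"]
    print(compare_hits(blast_hits, hmm_hits))
```

Converting to sets first means duplicate IDs in either list are counted once, which matches the use of dereplicated inputs.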

11 Profile matters! Run hmmsearch with the 6-seed profile HMM against all 3055 FunGene nirKs (some may not be real nirKs…). See the E-value distribution.

12 6Seed profile e-value distribution
Make the 124 sequences on the left of the distribution into a new profile.

13 124Seq e-value distribution

14 Cumulative curve The green line (124Seq) is above the blue line (6Seq) near … This means the whole distribution of E-values moves left a little. For E-values, the smaller the value, the more likely the sequence is nirK. Are the 124 seqs relatively better? At least from this perspective, yes.
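A cumulative curve of this kind can be computed as the fraction of hits whose E-value is at or below each threshold. The sketch below uses made-up E-values for two profiles; the function name and the thresholds are assumptions for illustration, not the values behind the plot.

```python
def cumulative_fraction(evalues, thresholds):
    """For each threshold t, return the fraction of hits with E-value <= t."""
    if not evalues:
        raise ValueError("need at least one E-value")
    total = len(evalues)
    return [sum(1 for e in evalues if e <= t) / total for t in thresholds]


if __name__ == "__main__":
    thresholds = [1e-40, 1e-20, 1e-5, 1.0]
    # Hypothetical E-values from searching the same sequences with two profiles.
    profile_a = [1e-60, 1e-45, 1e-30, 1e-8]
    profile_b = [1e-50, 1e-25, 1e-10, 1e-2]
    for name, values in (("A", profile_a), ("B", profile_b)):
        print(name, cumulative_fraction(values, thresholds))
```

If one curve is at or above the other at every threshold, its E-values are shifted toward smaller values, which is the "moves left" reading described on the slide.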

15 124Seq profile HMMER and BLAST Result
Input files, one pair per sample and run: /u/gjr/nirk3/<sample>_<run>_dereplicated.blastp.txt <==========> /u/gjr/nirk3/<sample>_<run>_dereplicated.localhmm.txt

sample_run    blastOnly   shared   hmmOnly
ma1w2_run1    112         7        0
ma1w2_run2    129         8        0
ma1w4_run1    109         10       not given
ma1w4_run2    120         18       0

The HMMER results are completely covered by the BLAST results. I think the BLAST results contain a lot of bad nirKs. To tell which hits are real nirKs, we try other methods and then come back to the BLAST problem.

16 Then tree method
Just to show the idea. [Figure: example tree with leaves nirK1, Seq1 (good), nirK2, nirK1, Seq2 (bad)]

17 NCBI nirK (cultured) + soil BLAST result + soil HMMER result -> hmmalign with the 6-seq profile -> quicktree -> tree

18 Too big to draw any conclusion from
The tree is too big to draw any conclusion from. I am considering writing a program to parse this tree.
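One possible shape for such a program, as a sketch only: it assumes the tree is written in Newick (New Hampshire) format, that leaf names contain none of the characters `( ) , : ;`, and that a soil sequence is judged by which other leaves share its smallest clade. The leaf names reuse the labels from the tree-method slide, and the helpers parse_newick, leaves, and clade_mates are hypothetical.

```python
def parse_newick(text):
    """Parse a Newick string into nested lists of leaf names (branch lengths dropped)."""
    text = "".join(text.split())  # remove whitespace and line breaks
    pos = 0

    def label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        return text[start:pos].split(":")[0]  # drop ":branch_length"

    def node():
        nonlocal pos
        if text[pos] != "(":
            return label()  # a leaf
        pos += 1  # skip "("
        children = [node()]
        while text[pos] == ",":
            pos += 1
            children.append(node())
        pos += 1  # skip ")"
        label()   # ignore any internal label or branch length
        return children

    return node()


def leaves(tree):
    """List all leaf names under a node."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def clade_mates(tree, query):
    """Other leaves in the smallest clade that contains `query`, or None if absent."""
    if isinstance(tree, str):
        return None
    for child in tree:
        if child == query:
            return [name for name in leaves(tree) if name != query]
        found = clade_mates(child, query)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    toy = "((nirK1:0.1,Seq1:0.2):0.05,(nirK2:0.1,other:0.3):0.2,Seq2:0.9);"
    tree = parse_newick(toy)
    for seq in ("Seq1", "Seq2"):
        print(seq, "groups with", clade_mates(tree, seq))
```

On a large tree, the same traversal could report, for every soil sequence, whether its closest clade mates are known nirKs, which avoids reading the whole tree by eye.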

19 Questions to answer
Best definition of nirK according to the current information
Criteria for choosing seeds for the profile HMM
BLAST false positive problem

20 Thanks

