This presentation uses animations and is best viewed as a slide show. To start the presentation, click Slide Show on the top tool bar and then View show.


1 This presentation uses animations and is best viewed as a slide show. To start the presentation, click Slide Show on the top tool bar and then View show

2 Welcome to Introduction to Bioinformatics Wednesday, 28 February 2007 Introduction to Viral Metagenome Project Discussion of Edwards & Rohwer (2005)* Exam retrospective (Problem 12) Other matters? *Unless otherwise noted, all figures herein are from: Edwards RA, Rohwer F (2005). Viral metagenomics. Nature Rev Microbiol 3:504-510.

3 Edwards & Rohwer (2005) Phage phylogeny and taxonomy Placement of unknown phage into phylogeny SQ11. How to test? Result of test? ~50,000 nt Blast ~500 nt

4 Edwards & Rohwer (2005) The proviral metagenome SQ11. What's a provirus or prophage? Why would a virus do such a thing?

5 Infection Phage Bacterial chromosome Phage genome Lysogenic pathway Phage genome Death General transduction Edwards & Rohwer (2005) The proviral metagenome Lytic pathway

6 Infection Phage Bacterial chromosome Phage genome Life! Lytic pathway Lysogenic pathway Edwards & Rohwer (2005) The proviral metagenome

7 Edwards & Rohwer (2005) Viral community structure and ecology SQ14. What does it mean that there are ~10^12 viruses but only ~1000 viral genotypes? Two scenarios?

8 Edwards & Rohwer (2005) Viral community structure and ecology SQX. How to measure complexity?
- Sample 1000
- How many counted once?
- How many counted twice?
- How many counted zero times?
- Model the process
Use different number of types

9 Edwards & Rohwer (2005) Viral community structure and ecology SQX. How to measure complexity? [Plot: probability vs. times encountered; 200 types]

10 Edwards & Rohwer (2005) Viral community structure and ecology SQX. How to measure complexity? [Plot: probability vs. times encountered; 200 types and 5000 types]

11 Edwards & Rohwer (2005) Viral community structure and ecology SQX. How to measure complexity? [Plot: probability vs. times encountered]
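The "model the process" step on slides 8-11 can be sketched in a few lines of Python. This is only an illustration under two assumptions that the slides do not state: every viral type is equally abundant, and the 1000 samples are drawn with replacement. Under those assumptions the number of times one type is encountered follows a binomial distribution. The function name `encounter_probabilities` is made up for this sketch.

```python
from math import comb

def encounter_probabilities(n_types, sample_size=1000, max_times=10):
    """Return [P(0), P(1), ..., P(max_times)] for one given type.

    Assumption (not from the slides): all types are equally abundant and
    sampling is with replacement, so the count for a single type is
    Binomial(sample_size, 1 / n_types).
    """
    p = 1.0 / n_types
    return [comb(sample_size, k) * p**k * (1 - p)**(sample_size - k)
            for k in range(max_times + 1)]

# Compare the two community sizes shown on the slides.
for n_types in (200, 5000):
    probs = encounter_probabilities(n_types)
    print(f"{n_types} types:")
    for k, prob in enumerate(probs[:4]):
        # Expected number of types seen exactly k times is n_types * P(k).
        print(f"  seen {k}x: P = {prob:.4f}, expected types = {n_types * prob:.1f}")
```

With few types, almost every type is seen several times; with many types, most are never seen and repeats are rare. Comparing the observed counts of singletons and doubletons against curves like these is one way to estimate how many types the community holds.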

12 Edwards & Rohwer (2005) Bioinformatics and viral metagenomics 1. How to identify genes? 2. How to identify genes' viruses?

13 Edwards & Rohwer (2005) Bioinformatics and viral metagenomics How to identify genes? Sequence Open reading frames
Sequence            151 TATTTCGTAG TTATGTTGAA CCGATGAAAC TTGTTTGTTC TCAAATTGAG
Translation-Frame-1 151 Y F V V M L N R * N L F V L K L S
Translation-Frame-2 151 I S * L C * T D E T C L F S N * A
Translation-Frame-3 151 F R S Y V E P M K L V C S Q I E
Complement          151 ATAAAGCATC AATACAACTT GGCTACTTTG AACAAACAAG AGTTTAACTC
Translation-Frame-4 151 I E Y N H Q V S S V Q K N E F Q
Translation-Frame-5 151 Y K T T I N F R H F K N T R L N L
Translation-Frame-6 151 T N R L * T S G I F S T Q E * I S
Sequence            201 CTCAATACAG CTCTTCAACT AGTTAGTAGA GCTGTAGCCA CTAGGCCTTC
Translation-Frame-1 201 S I Q L F N * L V E L * P L G L R
Translation-Frame-2 201 Q Y S S S T S * * S C S H * A F
Translation-Frame-3 201 L N T A L Q L V S R A V A T R P S
Complement          201 GAGTTATGTC GAGAAGTTGA TCAATCATCT CGACATCGGT GATCCGGAAG
Translation-Frame-4 201 A * Y L E E V L * Y L Q L W * A K
Translation-Frame-5 201 E I C S K L * N T S S Y G S P R
Translation-Frame-6 201 S L V A R * S T L L A T A V L G E
Open reading frame finder + ORF characteristics E.g. GeneMark
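The six-frame translation on slide 13 can be reproduced with a short Python sketch. It assumes the standard genetic code and is not how GeneMark works (GeneMark also scores ORF characteristics); it only translates all six frames and reports the longest stop-free stretch in each. The function names and the `+1..+3` / `-1..-3` frame labels are choices made for this sketch, and the reverse-frame numbering may not line up with the slide's frames 4-6.

```python
# Standard genetic code, built in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq, offset=0):
    """Translate complete codons starting at offset; '*' marks a stop codon."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frames(seq):
    """Return the three forward and three reverse-strand translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {f"+{o + 1}": translate(seq, o) for o in range(3)}
    frames.update({f"-{o + 1}": translate(rc, o) for o in range(3)})
    return frames

def longest_open_stretch(protein):
    """Longest run of residues not interrupted by a stop codon."""
    return max(protein.split("*"), key=len)

# Nucleotides 151-200 from the slide.
seq = "TATTTCGTAGTTATGTTGAACCGATGAAACTTGTTTGTTCTCAAATTGAG"
for name, protein in six_frames(seq).items():
    print(name, protein, "longest open stretch:", longest_open_stretch(protein))
```

For this 50-nt window the forward frames match the first 16 residues of Translation-Frame-1 to 3 on the slide; the slide's 17th residue in frames 1 and 2 comes from a codon that continues into the next line of sequence.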

14 Edwards & Rohwer (2005) Bioinformatics and viral metagenomics How to identify genes? Sequence Open reading frames Predicted function BlastP

15 Edwards & Rohwer (2005) Bioinformatics and viral metagenomics How to identify genes? Sequence Open reading frames Predicted function BlastN? SQ16. Other Blasts? TBlastX? Why so much time?

16 Edwards & Rohwer (2005) Bioinformatics and viral metagenomics How to identify genes' viruses?

17

18

19 Codon usage in different organisms SQ16. What does "codon usage" mean? How useful is it?
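As a concrete illustration of "codon usage", the sketch below counts how often each codon appears in a coding sequence and reports the fractions. It assumes the input is already in frame and starts at the first base of a codon; the function name `codon_usage` and the toy sequence are made up for this example.

```python
from collections import Counter

def codon_usage(cds):
    """Fraction of each codon among the complete codons of an in-frame sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

# Toy coding sequence (hypothetical): ATG AAA AAA GGG
for codon, fraction in sorted(codon_usage("ATGAAAAAAGGG").items()):
    print(codon, fraction)
```

Two organisms that encode the same protein can still differ in which synonymous codons they prefer, so a table like this, computed over many genes, acts as a rough fingerprint of the source genome.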

20 GC content in different organisms SQ18. GC/AT differences in cyanobacterial genomes?

21 GC content in different organisms
S6301    0.5548433
S7942    0.554378
P9313    0.50739753
S6803    0.47359636
Npun     0.4135452
A7120    0.4126833
Tery     0.34196815
PRO1375  0.3644214
S8102    0.594126
Gvi      0.6199786
TeBP1    0.5391793
PMED4    0.3079916
Cwat     0.37098223
A29413   0.4141176
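GC fractions like the ones on slide 21 can be computed with a few lines of Python. This is a minimal sketch; how the slide's values were actually produced is not stated. The function name `gc_content` is made up here, and ambiguous bases such as N are simply ignored, which is an assumption of this sketch.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    # Return 0.0 for an empty or fully ambiguous sequence instead of dividing by zero.
    return gc / (gc + at) if gc + at else 0.0

# Toy sequence (hypothetical), not taken from any of the genomes on the slide.
print(gc_content("GGCCAATT"))
```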

22 Constancy of sequence characteristics - GC content - Codon frequencies - Dinucleotide frequencies DNA sequence

23 Constancy of sequence characteristics DNA sequence - GC content - Codon frequencies - Dinucleotide frequencies

24 Constancy of sequence characteristics DNA sequence - GC content - Codon frequencies - Dinucleotide frequencies

25 Constancy of sequence characteristics Karlin S (2001). Trends Microbiol 9:335-343
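One of the sequence characteristics listed on slides 22-24, dinucleotide frequencies, is often normalized as a relative abundance: the observed frequency of dinucleotide XY divided by the product of the frequencies of X and Y. The sketch below computes that ratio on a single strand. Karlin's published signature uses strand-symmetrized frequencies, so treat this as a simplified illustration rather than his exact procedure; the function name `dinucleotide_signature` is made up for this sketch.

```python
from collections import Counter
from itertools import product

def dinucleotide_signature(seq):
    """Map each dinucleotide XY to f(XY) / (f(X) * f(Y)) on the given strand.

    Simplification: single-strand frequencies only. Values near 1 mean the
    dinucleotide occurs about as often as base composition alone predicts.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in "ACGT")
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1)
                 if seq[i] in "ACGT" and seq[i + 1] in "ACGT")
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence too short or contains no A/C/G/T")
    signature = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        fxy = di[x + y] / n_di
        # A base absent from the sequence gives an undefined ratio; report 0.0.
        signature[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return signature

# Toy sequence (hypothetical).
for dinuc, rho in dinucleotide_signature("ACGTACGTTTGACA").items():
    print(dinuc, round(rho, 2))
```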

26 Edwards & Rohwer (2005) Bioinformatics and viral metagenomics How to identify genes' viruses? - GC content - Codon frequencies - Dinucleotide frequencies Virus #1 Virus #2 Virus #3 Virus #4 Virus #5 Virus #6... Viral fragment
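The idea on slide 26, matching a viral fragment to the virus whose sequence characteristics it most resembles, can be sketched as a nearest-neighbor comparison. The example below uses plain dinucleotide frequencies and a sum of absolute differences as the distance; both are choices made for this sketch, not a method taken from the slides or the paper. The reference "genomes" are short invented strings, and `closest_reference` is a hypothetical helper name.

```python
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

def dinucleotide_frequencies(seq):
    """Frequencies of the 16 dinucleotides, in the fixed order of DINUCS."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skips pairs containing ambiguous bases
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid dinucleotides")
    return [counts[d] / total for d in DINUCS]

def closest_reference(fragment, references):
    """Name of the reference whose dinucleotide profile is nearest the fragment's."""
    frag = dinucleotide_frequencies(fragment)

    def distance(name):
        ref = dinucleotide_frequencies(references[name])
        return sum(abs(a - b) for a, b in zip(frag, ref))

    return min(references, key=distance)

# Hypothetical toy references: AT-rich, GC-rich, and mixed composition.
references = {
    "virus_1": "ATATTTAAATATTAAATTTATATAAT" * 4,
    "virus_2": "GCGCCGGGCCGCGGCCCGCGGGCGCC" * 4,
    "virus_3": "ACGTTGCAAGCTTCGAGTCAACGTGA" * 4,
}
print(closest_reference("GGCCGCGCGGGCCCGC", references))
```

Real fragments of ~500 nt give noisy frequency estimates, so a match found this way is a hint about the source virus rather than proof.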

27

28

