Introduction to Bioinformatics.

Introduction to Bioinformatics. LECTURE 6: Natural selection at the molecular basis * Chapter 6: Fighting HIV

6.1 Acquired Immune Deficiency Syndrome (AIDS)
* First noticed in 1979 as a peculiar disease in the US
* Recognized as a transmissible disease only in 1981: AIDS
* Infectious agent: HIV (Human Immunodeficiency Virus)
* Still not curable; more than 20 million victims; expensive medication (e.g. AZT) to keep the virus in check
* How does HIV manage to evade our attempts to destroy it?

HIV virus

HIV is a retrovirus
A retrovirus is an enveloped virus that possesses an RNA genome and replicates via a DNA intermediate. Retroviruses rely on the enzyme reverse transcriptase to transcribe their genome from RNA into DNA, which can then be integrated into the host's genome by an integrase enzyme.

Scanning electron micrograph of HIV-1 budding from lymphocyte.

THE WORLD Mark Newman (http://www-personal.umich.edu/~mejn/)

PEOPLE LIVING WITH HIV/AIDS Mark Newman (http://www-personal.umich.edu/~mejn/)

6.2 Evolution and natural selection
1859: Charles Darwin, On the Origin of Species by Means of Natural Selection.
At the molecular level, natural selection:
* removes deleterious mutations: purifying or negative selection
* promotes the spread of advantageous mutations: positive selection

6.3 HIV and the human immune system
* HIV has a 9.5 kb RNA genome, no DNA
* HIV is a retrovirus: RNA → DNA → virus
* HIV recognizes helper T-cells of the human immune system
* Infected T-cells display viral proteins on their surface that can be recognized by the immune system
* Short generation time: about 1.5 days to reproduce
* RNA → high error rate

Fast reproduction + high error rate = FAST EVOLUTION
Evolutionary arms race between the human immune system and HIV

6.4 Quantifying natural selection on DNA sequences
* Mutations arise in the germ line of a single individual, and some eventually become fixed in the population
* We observe fixed mutations as differences between individuals
* Most fixed mutations are neutral: genetic drift
* Some 80-90% of the non-neutral mutations are detrimental to the function of the organism
* A very small fraction of mutations is advantageous, but this is the engine of evolution

* How can we measure whether mutations are neutral, deleterious, or advantageous?
* Experimentally this is very difficult: it requires short-lived, simple organisms and large populations (typically a virus)
* Alternative: count the mutations that change the protein and those that do not
* Synonymous and non-synonymous mutations

Remember the translation from nucleotides to amino acids (read from centre outwards)

* Synonymous mutation: the new codon codes for the same amino acid, for example GTT (Val) → GTA (Val)
* Non-synonymous mutation: the new codon codes for a different amino acid
* Mutations in the first codon position are sometimes synonymous (5%)
* Mutations in the second codon position are never synonymous
* Mutations in the third codon position are mostly synonymous
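The position-by-position claims above can be checked against the standard genetic code. The following is a minimal sketch, not part of the lecture: the names `CODON_TABLE`, `is_synonymous` and `synonymous_fraction` are hypothetical, and the choice to start only from sense codons (counting changes into stop codons as non-synonymous) is an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(codon_a, codon_b):
    """True if both codons translate to the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


def synonymous_fraction(position):
    """Fraction of single-nucleotide changes at `position` (0, 1 or 2)
    that are synonymous, taken over all 61 sense codons.
    Assumption: changes that create a stop codon count as non-synonymous."""
    syn = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # do not mutate away from stop codons
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            syn += CODON_TABLE[mutant] == aa
    return syn / total


if __name__ == "__main__":
    print(is_synonymous("GTT", "GTA"))  # the Val -> Val example from the slide
    for pos in range(3):
        print(f"codon position {pos + 1}: {synonymous_fraction(pos):.3f}")
```

Under these assumptions the first position comes out at a few percent, the second at exactly zero, and the third well above one half, which matches the bullets qualitatively.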

* Almost all synonymous mutations are neutral
* A priori, many more non-synonymous than synonymous mutations are possible
* In most genes, 70% of the mutations are non-synonymous
* KA: number of non-synonymous substitutions per non-synonymous site
* KS: number of synonymous substitutions per synonymous site

Motoo Kimura (1977): comparing the non-synonymous with the synonymous substitutions in a gene tells us about the strength and form of natural selection, i.e. the ratio KA / KS.
Reasoning:
* Advantageous mutations are very rare
* Deleterious mutations will 'not' spread through a population
* Therefore, most fixed mutations are neutral
Strong negative selection → few non-synonymous substitutions

* f0 = fraction of non-synonymous mutations that are neutral
* v = mutation rate
* Number of non-synonymous substitutions after time t: KA = v f0 t
* Number of synonymous substitutions after time t: KS = v t
* Hence KA / KS = f0
* Strong negative selection: f0 is small, thus KA / KS < 1
* If KA / KS > 1, this is evidence for advantageous non-synonymous mutations

* Define: α = fraction of non-synonymous mutations that are advantageous
* Then after time t: KA = v (f0 + α) t
* and: KA / KS = f0 + α
* Thus KA / KS is a gauge of the natural selection acting on a gene
* Negative selection dominates: KA / KS < 1
* Positive selection dominates: KA / KS > 1
* But this is averaged over the whole gene!
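As a small worked illustration of this model, the sketch below evaluates KA, KS and their ratio for given parameters. The function name `ka_ks_model` and the numeric values are made up for illustration only; the formulas are the ones stated above (KA = v (f0 + α) t, KS = v t).

```python
def ka_ks_model(v, t, f0, alpha=0.0):
    """Return (KA, KS, KA/KS) under the simple model of the lecture.

    v     : mutation rate per site per unit time
    t     : elapsed time
    f0    : fraction of non-synonymous mutations that are neutral
    alpha : fraction of non-synonymous mutations that are advantageous
    """
    ka = v * (f0 + alpha) * t  # non-synonymous substitutions per site
    ks = v * t                 # synonymous substitutions per site
    return ka, ks, ka / ks


if __name__ == "__main__":
    # Hypothetical numbers: the ratio depends only on f0 + alpha, not on v or t.
    print(ka_ks_model(v=1e-3, t=100, f0=0.2))
    print(ka_ks_model(v=1e-3, t=100, f0=0.2, alpha=0.1))
```

The mutation rate and the elapsed time cancel in the ratio, which is why KA / KS can be compared between genes that evolve at different overall speeds.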

6.5 Estimating KA/KS
How to determine KA/KS?
Simplest way: count the synonymous and non-synonymous sites, and likewise the synonymous and non-synonymous differences, between two aligned sequences
Correct for multiple substitutions (e.g. Jukes-Cantor)
This yields a normalized ratio

The algorithm of Masatoshi Nei and Takashi Gojobori (1986) is based on this idea. It assumes that:
* transitions and transversions occur at the same rate
* there is no codon usage bias (i.e. no information on the ensuing protein)

Nei-Gojobori algorithm
* Consider two aligned homologous sequences without gaps, s1 and s2
* Sc = number of synonymous sites between s1 and s2
* Ac = number of non-synonymous sites between s1 and s2
* Sd = number of synonymous differences between s1 and s2
* Ad = number of non-synonymous differences between s1 and s2

* As the two sequences s1 and s2 are aligned, their codons correspond one to one.
NOTE: point mutations act on nucleotides, not on codons, but here we analyse whether a mutation results in a different amino acid

Nei-Gojobori algorithm STEP 1: Count A and S sites

Example: consider the aligned codons TTT and TTA. This is, say, the k-th codon of the sequences.

Now define:
sc(ck) = number of synonymous sites in this codon
ac(ck) = number of non-synonymous sites in this codon
fi = fraction of changes at the i-th position of the codon that result in a synonymous change (i = 1, 2, 3)
Then:
sc(ck) = ∑ fi
ac(ck) = 3 - sc(ck) = 3 - ∑ fi

In our example:
Codon TTA codes for leucine
The 6 synonymous codons for leucine (table 2.2, chapter 2, p. 27): CTA CTG CTC CTT TTA TTG
f1: 1 of 3 changes is synonymous (ATA(-), GTA(-), CTA(+)), so 1/3
f2: 0 of 3 changes are synonymous (TAA(-), TGA(-), TCA(-)), so 0/3
f3: 1 of 3 changes is synonymous (TTG(+), TTC(-), TTT(-)), so 1/3
So: sc(ck) = ∑ fi = 2/3 and ac(ck) = 3 - sc(ck) = 3 - ∑ fi = 7/3
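The per-codon site count can be sketched in a few lines. This is an illustrative sketch rather than the book's reference implementation: the names `CODON_TABLE` and `synonymous_sites` are hypothetical, and, as in the worked example, changes that create a stop codon are counted as non-synonymous with a denominator of 3 at every position.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Return sc(ck): the sum over the three positions of the fraction fi
    of single-nucleotide changes that leave the amino acid unchanged."""
    amino_acid = CODON_TABLE[codon]
    sc = Fraction(0)
    for pos in range(3):
        n_syn = sum(
            1
            for base in BASES
            if base != codon[pos]
            and CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == amino_acid
        )
        sc += Fraction(n_syn, 3)  # fi for this position
    return sc


if __name__ == "__main__":
    sc = synonymous_sites("TTA")
    print(sc, 3 - sc)  # 2/3 7/3, as in the worked example
```

Exact fractions are used here so that the result can be compared directly with the hand calculation.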

For a DNA sequence of r codons:
Sc = ∑k=1:r sc(ck)
Ac = 3r - Sc
For multiple sequences: average these quantities
Note: do not include the STOP codon
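Summing over a whole sequence and averaging over two sequences could look as follows. Again this is only a sketch under assumptions: the helper names (`synonymous_sites`, `count_sites`, `average_sites`) are hypothetical, stop codons are skipped and do not contribute to r, and sequence lengths are assumed to be multiples of three.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """sc(ck): sum of the synonymous fractions fi over the three positions."""
    aa = CODON_TABLE[codon]
    return sum(
        Fraction(
            sum(
                1
                for b in BASES
                if b != codon[p] and CODON_TABLE[codon[:p] + b + codon[p + 1:]] == aa
            ),
            3,
        )
        for p in range(3)
    )


def count_sites(seq):
    """Return (Sc, Ac) for one coding sequence; stop codons are skipped."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    sense = [c for c in codons if CODON_TABLE[c] != "*"]
    sc = sum((synonymous_sites(c) for c in sense), Fraction(0))
    return sc, 3 * len(sense) - sc  # Ac = 3r - Sc


def average_sites(seq1, seq2):
    """Average Sc and Ac over two aligned sequences."""
    s1, a1 = count_sites(seq1)
    s2, a2 = count_sites(seq2)
    return (s1 + s2) / 2, (a1 + a2) / 2


if __name__ == "__main__":
    print(count_sites("TTTTTA"))        # Sc = 1, Ac = 5
    print(average_sites("TTT", "TTA"))  # Sc = 1/2, Ac = 5/2
```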

Nei-Gojobori algorithm STEP 2: Count A and S differences

Now define:
sd(ck) = number of synonymous differences in this codon
ad(ck) = number of non-synonymous differences in this codon
For a codon pair with a single nucleotide difference, ad(ck) = 1 - sd(ck).
Example:
sequence 1: GTT (Val)
sequence 2: GTA (Val)
There is only 1 difference and it is synonymous, so sd = 1 and ad = 0.

Multiple nucleotide differences between two codons:
If there are n differences between two codons (n = 0, 1, 2, 3), then there are n! pathways from the first to the second codon.
Example:
sequence 1: TTT (Phe)
sequence 2: GTA (Val)
The two possible pathways are:
pathway 1: TTT (Phe) ↔ GTT (Val) ↔ GTA (Val)
pathway 2: TTT (Phe) ↔ TTA (Leu) ↔ GTA (Val)

Example (continued):
Pathway 1 (TTT ↔ GTT ↔ GTA) has 1 non-synonymous and 1 synonymous substitution
Pathway 2 (TTT ↔ TTA ↔ GTA) has 2 non-synonymous and 0 synonymous substitutions
Assume that both pathways occur with the same probability. Therefore:
sd = 1 synonymous / 2 pathways = 0.5
ad = 3 non-synonymous / 2 pathways = 1.5
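The pathway averaging can be sketched by enumerating every order in which the differing positions can be changed. The function name `codon_differences` is hypothetical. One assumption goes beyond the slides: pathways that pass through a stop codon are discarded before averaging, and both input codons are assumed to be sense codons.

```python
from fractions import Fraction
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_differences(codon1, codon2):
    """Return (sd, ad) for one codon pair, averaged over all n! pathways
    with equal weights. Assumption: pathways through stop codons are skipped."""
    positions = [i for i in range(3) if codon1[i] != codon2[i]]
    if not positions:
        return Fraction(0), Fraction(0)
    syn_total = non_total = 0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = codon1, 0, 0, True
        for pos in order:
            # Change one differing position at a time towards codon2.
            nxt = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                valid = False
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            n_paths += 1
            syn_total += syn
            non_total += non
    if n_paths == 0:
        raise ValueError("every pathway passes through a stop codon")
    return Fraction(syn_total, n_paths), Fraction(non_total, n_paths)


if __name__ == "__main__":
    print(codon_differences("GTT", "GTA"))  # sd = 1, ad = 0
    print(codon_differences("TTT", "GTA"))  # sd = 1/2, ad = 3/2
```

Summing these per-codon values over all aligned codon pairs gives Sd and Ad for the two sequences.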

For a codon with n differences:
* Consider all n! pathways of n point mutations
* Evaluate sd and ad for each pathway as above
* Average over all pathways with equal weights
The total numbers of synonymous and non-synonymous differences are:
Sd = ∑k=1:r sd(ck)
Ad = ∑k=1:r ad(ck)
Note: Sd + Ad is the total number of nucleotide differences between the two sequences

Nei-Gojobori algorithm STEP 3: Compute KA and KS

* Approximate the proportions of synonymous (ds) and non-synonymous (da) differences by:
ds = Sd / Sc and da = Ad / Ac
* Use the Jukes-Cantor correction to estimate the number of substitutions per site from a proportion of differences d:
K = -(3/4) ln(1 - (4/3) d)
Apply it to both ds and da to obtain KS and KA.
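A minimal sketch of this last step follows, assuming the usual definitions ds = Sd / Sc and da = Ad / Ac and the standard Jukes-Cantor formula K = -(3/4) ln(1 - (4/3) d). The function names and the counts in the usage example are hypothetical.

```python
import math


def jukes_cantor(d):
    """Jukes-Cantor distance for a proportion of differing sites d.
    The correction is only defined for 0 <= d < 3/4."""
    if not 0 <= d < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= d < 0.75")
    return -0.75 * math.log(1 - 4 * d / 3)


def ka_ks_from_counts(sd, sc, ad, ac):
    """Return (KA, KS, KA/KS) from difference counts (sd, ad)
    and site counts (sc, ac)."""
    ks = jukes_cantor(sd / sc)  # synonymous substitutions per synonymous site
    ka = jukes_cantor(ad / ac)  # non-synonymous substitutions per non-synonymous site
    ratio = ka / ks if ks > 0 else math.nan  # undefined without synonymous change
    return ka, ks, ratio


if __name__ == "__main__":
    # Made-up counts for illustration only.
    ka, ks, ratio = ka_ks_from_counts(sd=5.0, sc=50.0, ad=6.0, ac=150.0)
    print(f"KA = {ka:.4f}, KS = {ks:.4f}, KA/KS = {ratio:.3f}")
```

The corrected distance always exceeds the raw proportion of differences, because some sites have changed more than once.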

SUMMARY of the Nei-Gojobori algorithm: see the box on page 105 of the book
Remark: the algorithm is linear in the length of the sequences

6.6 Case study: natural selection and the HIV genome
* HIV is a fast-evolving virus
* HIV is a retrovirus: its genome is RNA, not DNA
* A single KA/KS value for a whole gene is not very informative, because it averages over regions under positive and negative selection
* A sliding window plot shows the selection pressure at a finer scale

* STEP 1: ORF finding
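A very simple ORF finder gives an idea of what this step involves. The sketch below is an assumption-laden illustration, not the procedure used in the lecture: it works on the DNA representation of the genome, scans only the three forward reading frames, starts at the first ATG and ends at the first in-frame stop codon. The name `find_orfs` and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon (inclusive).
    `min_codons` is the minimum number of codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and keep scanning this frame
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])  # 2 14 ATGAAATTTTAG
```

A real analysis would also scan the reverse complement and apply a much larger length threshold.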

HIV-1 genome

* STEP 2: Nei-Gojobori to find high KA/KS ratios with a sliding window plot
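The sliding window itself is independent of the statistic computed inside it. The sketch below shows only the windowing scaffold over two aligned, gap-free coding sequences. The names `sliding_windows` and `window_profile` are hypothetical, and the statistic in the usage example (a plain count of differing nucleotides) is a stand-in: an actual analysis would pass a KA/KS estimator instead.

```python
def sliding_windows(seq1, seq2, window_codons, step_codons=1):
    """Yield (start_codon, window1, window2) for codon-aligned windows."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    n_codons = len(seq1) // 3
    for start in range(0, n_codons - window_codons + 1, step_codons):
        lo, hi = 3 * start, 3 * (start + window_codons)
        yield start, seq1[lo:hi], seq2[lo:hi]


def window_profile(seq1, seq2, statistic, window_codons, step_codons=1):
    """Apply `statistic(window1, window2)` to every window and return
    a list of (start_codon, value) pairs, ready for plotting."""
    return [
        (start, statistic(w1, w2))
        for start, w1, w2 in sliding_windows(seq1, seq2, window_codons, step_codons)
    ]


def count_differences(w1, w2):
    """Placeholder statistic: number of differing nucleotides in the window."""
    return sum(a != b for a, b in zip(w1, w2))


if __name__ == "__main__":
    s1 = "TTTGTTAAACCC"
    s2 = "TTAGTAAAACCC"
    print(window_profile(s1, s2, count_differences, window_codons=2))
    # [(0, 2), (1, 1), (2, 0)]
```

Keeping the windows aligned to codon boundaries matters, because the synonymous and non-synonymous classification is only meaningful for whole codons.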

HIV epitopes: the ENV gene
An epitope is the part of a macromolecule that is recognized by the immune system, specifically by antibodies.
ENV (envelope and docking): strong selection pressure from the human immune system

HIV epitopes: the GAG polyprotein
1500 bp: viral core
Strong selection pressure from the human immune system

Visualisation of the fast evolution of HIV with a phylogenetic tree

END of LECTURE 6