Tutorial #2

Quiz next week. It covers everything you've seen in the course so far. Expect a combination of true/false, definition and short-answer questions, or questions similar to those in the problem set.

How to design a PCR primer?
Primer length and sequence are critically important when setting the parameters for a successful amplification.
A simple formula for calculating the melting temperature: $T_m = 4(G + C) + 2(A + T)$, where G, C, A and T are the counts of each base in the primer.
$T_m$ is not the only consideration when designing a PCR primer. Also check the GC content and any secondary structure, such as hairpin loops.
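The formula above is easy to check by hand, but a small helper makes it quicker to compare candidate primers. The sketch below is only an illustration: the function names and the example primer are made up for this tutorial, and the formula is the simple counting rule from the slide (often called the Wallace rule), not a full thermodynamic model.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm in degrees C with Tm = 4(G + C) + 2(A + T)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("primer must contain only A, C, G and T")
    return 4 * gc + 2 * at


def gc_content(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical 20-mer, used only to exercise the functions.
example_primer = "AGCGGATAACAATTTCACAC"
print(wallace_tm(example_primer))   # 8 G/C and 12 A/T -> 56
print(gc_content(example_primer))   # 0.4
```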

Example: design PCR primers to amplify IFI16 (interferon, gamma-inducible protein 16).

NCBI

Synonymous vs. nonsynonymous
When studying the evolutionary divergence of DNA sequences, substitutions are classed as:
Synonymous = silent
Nonsynonymous = amino acid altering
The rates of these nucleotide substitutions may be used as a molecular clock for dating the divergence of closely related species.

Calculating synonymous sites (s) and nonsynonymous sites (n)
Each codon has 3 nucleotide positions. Denote by $f_i$ ($i = 1, 2, 3$) the fraction of the possible changes at position $i$ that are synonymous.
For a codon, $s = \sum_{i=1}^{3} f_i$ and $n = 3 - s$.
Ex. TTA (Leu): $f_1 = 1/3$ (T→C), $f_2 = 0$, $f_3 = 1/3$ (A→G). Thus $s = 2/3$ and $n = 7/3$.
For a DNA sequence of $r$ codons, $s = \sum_{i=1}^{r} s_i$ and $n = 3r - s$, where $s_i$ is the value of $s$ for the $i$th codon.
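The counting rule above can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code and treating a change to a stop codon as nonsynonymous; the function names are invented for this example.

```python
from fractions import Fraction

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def site_fractions(codon):
    """Return [f_1, f_2, f_3]: fraction of synonymous changes per position."""
    fractions = []
    for pos in range(3):
        synonymous = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                synonymous += 1
        fractions.append(Fraction(synonymous, 3))
    return fractions


def syn_nonsyn_sites(sequence):
    """Return (s, n) for a coding sequence read in frame from position 0."""
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    s = sum(sum(site_fractions(codon)) for codon in codons)
    n = 3 * len(codons) - s
    return s, n


print(syn_nonsyn_sites("TTA"))  # (Fraction(2, 3), Fraction(7, 3))
```

Using `Fraction` keeps the thirds exact, so the result can be compared directly with the hand calculation.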

Calculating synonymous and nonsynonymous differences between 2 codons
Denote by $s_d$ and $n_d$ the number of synonymous and nonsynonymous differences per codon, respectively.
Ex. GTT (Val) and GTA (Val): 1 synonymous difference, so $s_d = 1$ and $n_d = 0$.

Cont'd
Ex. TTT and GTA differ at 2 positions, so there are 2 pathways between them:
Pathway #1: TTT (Phe) ↔ GTT (Val) ↔ GTA (Val)
Pathway #2: TTT (Phe) ↔ TTA (Leu) ↔ GTA (Val)
Pathway 1 involves 1 synonymous and 1 nonsynonymous substitution. Pathway 2 involves 2 nonsynonymous substitutions. Averaging over the 2 pathways:
$s_d$ = 1 synonymous substitution / 2 pathways = 0.5
$n_d$ = 3 nonsynonymous substitutions / 2 pathways = 1.5
D in the problem set is the proportion of synonymous or nonsynonymous differences; therefore, for this nonsynonymous site, $D_n$ would be 1 / 1.5 ≈ 0.67.
Note that $s_d + n_d$ is equal to the total number of nucleotide differences between the two DNA sequences compared.
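The pathway averaging can also be written out as code. The sketch below assumes the standard genetic code and, as a simplification, averages over every ordering of the differing positions without excluding pathways that pass through a stop codon. The function name is made up for this example.

```python
from fractions import Fraction
from itertools import permutations

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_differences(codon1, codon2):
    """Return (s_d, n_d) averaged over all pathways from codon1 to codon2."""
    differing = [i for i in range(3) if codon1[i] != codon2[i]]
    if not differing:
        return Fraction(0), Fraction(0)
    pathways = list(permutations(differing))
    synonymous = nonsynonymous = 0
    for order in pathways:
        current = codon1
        for pos in order:
            # Change one position at a time and classify each step.
            step = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODON_TABLE[step] == CODON_TABLE[current]:
                synonymous += 1
            else:
                nonsynonymous += 1
            current = step
    return (Fraction(synonymous, len(pathways)),
            Fraction(nonsynonymous, len(pathways)))


print(codon_differences("GTT", "GTA"))  # (Fraction(1, 1), Fraction(0, 1))
print(codon_differences("TTT", "GTA"))  # (Fraction(1, 2), Fraction(3, 2))
```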

Sequence Alignment
Every alignment has a scoring system, for example:
- Base change cost = 1
- Gap cost = 2
- Gap extension cost = 1
Ex.
ACTGTTGCC
AG-C--GCT
The score of this alignment is 3 (base changes) + 2 x 2 (gaps opened) + 1 (gap extension) = 8.
In this case, a higher score means a worse alignment.
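To check the arithmetic, the scoring scheme above can be applied column by column. This is a small sketch with an invented function name; it assumes the two rows are already aligned and of equal length, and that the first position of a gap costs the gap cost while each further position costs the extension cost.

```python
def alignment_cost(top, bottom, mismatch=1, gap_open=2, gap_extend=1):
    """Sum the costs of an existing pairwise alignment (lower is better)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    cost = 0
    previous_gap_row = None  # row that held a gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # Continuing a gap in the same row is an extension.
            cost += gap_extend if previous_gap_row == row else gap_open
            previous_gap_row = row
        else:
            if a != b:
                cost += mismatch
            previous_gap_row = None
    return cost


print(alignment_cost("ACTGTTGCC", "AG-C--GCT"))  # 8
```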

MLST - Methods
Isolate multiple strains of the species of interest.
PCR ~500 bp regions of 4-20 housekeeping genes ("loci").
Sequence the PCR products.
Assign "allele numbers" to each locus.
- Arbitrary; each number represents a different sequence

MLST - Methods
Collate the information into a table:
- Row = isolate
- Column = locus
- Fill in allele numbers

            Locus A   Locus B   Locus C
Isolate 1
Isolate 2
Isolate 3   3         1         2
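The allele-numbering and table-building steps can be sketched as follows. The isolate names, loci and short sequences are hypothetical toy data (real MLST fragments are about 500 bp), and the function name is made up for this example. Numbers are handed out in order of first appearance, which fits the point that they are arbitrary labels.

```python
def assign_allele_numbers(sequences_by_isolate):
    """Map {isolate: {locus: sequence}} to {isolate: {locus: allele number}}."""
    allele_ids = {}  # locus -> {sequence: allele number}
    profiles = {}
    for isolate, loci in sequences_by_isolate.items():
        profile = {}
        for locus, sequence in loci.items():
            known = allele_ids.setdefault(locus, {})
            if sequence not in known:
                known[sequence] = len(known) + 1  # next unused number
            profile[locus] = known[sequence]
        profiles[isolate] = profile
    return profiles


# Hypothetical toy data, far shorter than real fragments.
toy_sequences = {
    "Isolate 1": {"Locus A": "ACGT", "Locus B": "GGCA"},
    "Isolate 2": {"Locus A": "ACGA", "Locus B": "GGCA"},
    "Isolate 3": {"Locus A": "ACGT", "Locus B": "GGTA"},
}
for isolate, profile in assign_allele_numbers(toy_sequences).items():
    print(isolate, profile)
```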

MLST of a Halorubrum Population
36 isolates
4 housekeeping genes:
- atpB
- ef-2
- radA
- secY
500 bp PCR product
Allelic profiles vary:
- Few identical pairs
All loci polymorphic:
- 8-15 alleles

Insights from the MLST Data - 1
How genetically diverse is the saltern archaeal population?
Genetic diversity: $H = 1 - \sum_i x_i^2$
Overall genetic diversity = 0.69
- Varied between ponds of different salinity: 0.57 in the 23% saline pond, 0.83 in the 36% saline pond
Higher than the E. coli diversity of 0.47
>5x higher than eukaryotic diversity
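The diversity formula can be tried on a column of allele numbers. The sketch below assumes that $x_i$ is the frequency of the $i$th allele at a locus, which the slide does not spell out, and it applies no sample-size correction. The function name and the allele lists are invented for illustration.

```python
from collections import Counter


def genetic_diversity(alleles):
    """H = 1 - sum(x_i^2), with x_i the frequency of each distinct allele."""
    counts = Counter(alleles)
    total = len(alleles)
    return 1 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical allele numbers observed at one locus.
print(genetic_diversity([1, 1, 2, 2]))  # 0.5
print(genetic_diversity([1, 1, 1, 1]))  # 0.0
```

A value of 0 means every isolate carries the same allele; the value approaches 1 as more alleles occur at similar frequencies.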

Insights from the MLST Data - 2
Is recombination occurring in the Archaea?
Linkage disequilibrium calculator: mlst.net
LD = alleles are linked and are transferred together during recombination.
LE = alleles are not linked and recombination scatters them randomly.
The Halorubrum population is near linkage equilibrium, which suggests recombination is occurring.

Nature Reviews Genetics 3; (2002); 2X? Tetraodon nigroviridis

Phylogenetic tree
Phylogenetics is the field of systematics that focuses on the evolutionary relationships between organisms or genes/proteins (phylogeny).
clade: a monophyletic taxon.
taxon: any named group of organisms, not necessarily a clade.
[Figure: tree of Human, Mouse and Fly, with a clade and a node labelled]

A phylogenetic tree
[Figure: tree of Human, Mouse and Fly, with a clade and a node labelled and branch lengths A, B, C and D]
A + B + C is less than D + B + C, so in this example the mouse sequence is more closely related to the fly sequence than the human sequence is.

Tetraodon gene evolution
Fourfold degenerate (4D) site substitution: a measure of neutral nucleotide mutations.
- 4D site = 3rd base of a codon that is free to change with no effect on the amino acid
- Number of nucleotide changes at these sites = neutral mutations
Fish proteins have diverged faster than their mammalian homologues (Figure 3).
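As an illustration of the 4D-site definition, the sketch below flags codons whose third base can be any nucleotide without changing the amino acid. It assumes the standard genetic code and an in-frame coding sequence; the function names and the example sequence are made up.

```python
# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_fourfold_degenerate(codon):
    """True if all four bases at position 3 give the same amino acid."""
    amino_acid = CODON_TABLE[codon]
    return all(CODON_TABLE[codon[:2] + base] == amino_acid for base in BASES)


def fourfold_sites(coding_sequence):
    """0-based indices of third codon positions that are 4D sites."""
    return [
        start + 2
        for start in range(0, len(coding_sequence) - 2, 3)
        if is_fourfold_degenerate(coding_sequence[start:start + 3])
    ]


# Hypothetical fragment: GTT (Val), TTA (Leu), GCC (Ala).
print(fourfold_sites("GTTTTAGCC"))  # [2, 8]
```

Comparing the bases at these indices between two aligned orthologous sequences would then give a count of substitutions at 4D sites.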

Brief generalization of the papers
Comparative genomics helps identify regions of DNA that are shared between two species and allows information to be transferred between the species within the common regions. It can also detect regions that have undergone chromosomal rearrangements, which occur in many diseases. This information can be of different types:
- 1) Annotation known in one species can be transferred to the other species, where it was not known.
- 2) Regions that are under selective pressure can be identified.
- 3) Regions that have undergone chromosomal rearrangement can be compared with maps of annotated genes to identify genes responsible for a particular disease.

Homologs
Have a common origin but may or may not have a common activity.
Orthologs: homologs produced by speciation. They tend to have similar functions.
Paralogs: homologs produced by gene duplication. They tend to have differing functions.
Xenologs: homologs resulting from horizontal gene transfer between two organisms.

BLAST
Basic Local Alignment Search Tool
Developed in 1990 and 1997 (S. Altschul).
A heuristic method (fast alignment method) for performing local alignments by searching for high-scoring segment pairs (HSPs).
First to use statistics to predict the significance of initial matches, which saves on false leads.
Offers both sensitivity and speed.

BLAST
Looks for clusters of nearby or locally dense "similar or homologous" k-tuples.
Uses "look-up" tables to shorten search time.
Uses a larger "word size" than FASTA to accelerate the search process.
Can generate "domain friendly" local alignments.
Fastest and most frequently used sequence alignment tool: it became the standard.
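The look-up table idea can be illustrated with a toy word index. This is a simplified sketch, not BLAST's implementation: it finds exact k-tuple matches only, with no neighbourhood words and no extension step, and the function names and sequences are invented for the example.

```python
from collections import defaultdict


def build_word_index(subject, k):
    """Look-up table: each k-tuple in the subject -> list of start positions."""
    index = defaultdict(list)
    for start in range(len(subject) - k + 1):
        index[subject[start:start + k]].append(start)
    return index


def find_seed_hits(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where a k-tuple matches exactly."""
    index = build_word_index(subject, k)
    hits = []
    for q_pos in range(len(query) - k + 1):
        for s_pos in index.get(query[q_pos:q_pos + k], []):
            hits.append((q_pos, s_pos))
    return hits


# Hypothetical sequences; all three hits lie on the same diagonal (offset 2).
print(find_seed_hits("GATTACA", "TTGATTAC"))  # [(0, 2), (1, 3), (2, 4)]
```

Hits that cluster on one diagonal, as here, are the "locally dense" k-tuples that a heuristic search would try to join into a longer local alignment.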

Connecting HSPs

Extreme Value Distribution
$E = K m n e^{-\lambda S}$ is called the Expect value or E-value.
In BLAST, the default E cutoff is 10. With $P = 1 - e^{-E}$, this corresponds to $P \approx 0.99995$.
If E is small then P is small.
Why does BLAST report an E-value instead of a P-value?
- E-values of 5 and 10 are easier to understand than P-values of 0.993 and 0.99995.
- However, note that when E < 0.01, P-values and E-values are nearly identical.
$P(x) = 1 - e^{-e^{-x}}$
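The relationship between E and P is easy to tabulate. The sketch below assumes the usual conversion $P = 1 - e^{-E}$ (the probability of at least one chance hit when E are expected); the function name is made up for this example.

```python
import math


def p_value_from_e(e_value):
    """P = 1 - exp(-E), computed with expm1 for accuracy at small E."""
    return -math.expm1(-e_value)


for e_value in (10, 5, 0.01, 0.001):
    print(e_value, p_value_from_e(e_value))
```

For large E the P-value saturates near 1, while for small E the two numbers are almost the same, which is the point made above about E < 0.01.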

Expect value
$E = K m n e^{-\lambda S}$ = Expect value or E-value
What parameters does it depend on?
- K and $\lambda$ are two parameters: natural scales for the search space size and the scoring system, respectively.
  - $\lambda = \ln(q/p)$ and $K = (q - p)^2 / q$
  - p = probability of a match (e.g. 0.05)
  - q = probability of no match (e.g. 0.95)
  - Then $\lambda = 2.94$ and $K = 0.85$
  - p and q are calculated from a "random sequence model" (Altschul, S.F. & Gish, W. (1996) "Local alignment statistics." Meth. Enzymol. 266) based on the given substitution matrix and gap costs.
- m = length of the query sequence
- n = length of the database
- S = score for the given HSP
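The numbers on this slide can be reproduced directly. The sketch below uses the simple match/no-match formulas for $\lambda$ and K given above; the function names, the score and the sequence lengths are assumptions chosen only to show how E scales.

```python
import math


def simple_lambda_and_k(p_match):
    """lambda = ln(q/p) and K = (q - p)^2 / q for the match/no-match model."""
    q = 1.0 - p_match
    lam = math.log(q / p_match)
    k = (q - p_match) ** 2 / q
    return lam, k


def expect_value(score, m, n, lam, k):
    """E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


lam, k = simple_lambda_and_k(0.05)
print(round(lam, 2), round(k, 2))  # 2.94 0.85

# Hypothetical search: query of 300 residues, database of 1e6 residues.
print(expect_value(score=10, m=300, n=1_000_000, lam=lam, k=k))
```

Doubling the database length n doubles E for the same score, which is why the same alignment can look less significant in a larger database.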

Expect value
The Expect value is intuitive, but:
- The Expect value changes as the database changes.
- The Expect value becomes zero quickly.
Alternative: bit score
$S' = (\lambda S - \ln K) / \ln 2$, where S is the raw score and S' is in bits.
- Independent of the scoring system used (normalized)
- Larger value for more similar sequences, and therefore useful in analyses of very similar sequences
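The bit-score conversion can be checked numerically. In the sketch below the function names and input values are assumptions for illustration; the second function uses the identity $E = m n 2^{-S'}$, which follows from substituting the bit score into $E = K m n e^{-\lambda S}$.

```python
import math


def bit_score(raw_score, lam, k):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits, m, n):
    """E = m * n * 2^(-S'); lambda and K are already folded into S'."""
    return m * n * 2.0 ** (-bits)


# Hypothetical values for lambda, K, raw score and search space.
lam, k = 2.94, 0.85
m, n, raw = 300, 1_000_000, 10
bits = bit_score(raw, lam, k)
print(bits)
print(expect_from_bits(bits, m, n))
print(k * m * n * math.exp(-lam * raw))  # same E from the raw score
```

Because $\lambda$ and K are absorbed into S', bit scores from searches with different scoring systems can be compared directly.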

Similarity by chance: the impact of sequence complexity
MCDEFGHIKLAN…. High complexity
ACTGTCACTGAT…. Mid complexity
NNNNTTTTTNNN…. Low complexity
Low-complexity sequences are more likely to appear similar by chance.
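One rough way to put a number on "complexity" is the Shannon entropy of the letter composition. This is an assumption made for illustration, not the filter BLAST itself applies, and the function name is invented. On the three example fragments above it reproduces the high, mid, low ordering.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Entropy in bits per symbol of the sequence's letter composition."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


for fragment in ("MCDEFGHIKLAN", "ACTGTCACTGAT", "NNNNTTTTTNNN"):
    print(fragment, round(shannon_entropy(fragment), 2))
```

A fragment made of one repeated letter scores 0; the more evenly the letters are used, the higher the value.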