B3- Olympic High School Bioinformatics

Slides:



Advertisements
Similar presentations
Blast to Psi-Blast Blast makes use of Scoring Matrix derived from large number of proteins. What if you want to find homologs based upon a specific gene.
Advertisements

1 Introduction to Sequence Analysis Utah State University – Spring 2012 STAT 5570: Statistical Bioinformatics Notes 6.1.
Gapped Blast and PSI BLAST Basic Local Alignment Search Tool ~Sean Boyle Basic Local Alignment Search Tool ~Sean Boyle.
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
Measuring the degree of similarity: PAM and blosum Matrix
DNA sequences alignment measurement
DNA and Proteins In this guide you will be learning about DNA and proteins Presented by Garth Jensen Emerson Middle School A project from AMGEN workshop.
Lecture 9 Hidden Markov Models BioE 480 Sept 21, 2004.
Sequence Comparison Intragenic - self to self. -find internal repeating units. Intergenic -compare two different sequences. Dotplot - visual alignment.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
CECS Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka Lecture 3: Multiple Sequence Alignment Eric C. Rouchka,
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Sequencing a genome and Basic Sequence Alignment
Assessment of sequence alignment Lecture Introduction The Dot plot Matrix visualisation matching tool: – Basics of Dot plot – Examples of Dot plot.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
© Wiley Publishing All Rights Reserved.
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Assessment of sequence alignment Lecture Introduction The Dot plot Matrix visualisation matching tool: – Basics of Dot plot – Examples of Dot plot.
Evolution and Scoring Rules Example Score = 5 x (# matches) + (-4) x (# mismatches) + + (-7) x (total length of all gaps) Example Score = 5 x (# matches)
Fast Search Protein Structure Prediction Algorithm for Almost Perfect Matches1 By Jayakumar Rudhrasenan S Primary Supervisor: Prof. Heiko Schroder.
A day 3/14/ writing prompts at end of the table in a pile! If it’s not there then it’s a zero 2. Replication quiz 2- take this time to review your.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
BIOINFORMATICS IN BIOCHEMISTRY Bioinformatics– a field at the interface of molecular biology, computer science, and mathematics Bioinformatics focuses.
Scoring Matrices April 23, 2009 Learning objectives- 1) Last word on Global Alignment 2) Understand how the Smith-Waterman algorithm can be applied to.
Sequencing a genome and Basic Sequence Alignment
Applied Bioinformatics Week 8 Jens Allmer. Practice I.
Phylogenetic Prediction Lecture II by Clarke S. Arnold March 19, 2002.
Construction of Substitution Matrices
Sequence Alignment Csc 487/687 Computing for bioinformatics.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
HMMs for alignments & Sequence pattern discovery I519 Introduction to Bioinformatics.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
Applied Bioinformatics Week 8 Jens Allmer. Theory I.
Phylogenetics.
Doug Raiford Lesson 5.  Dynamic programming methods  Needleman-Wunsch (global alignment)  Smith-Waterman (local alignment)  BLAST Fixed: best Linear:
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
Construction of Substitution matrices
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
Step 3: Tools Database Searching
MGM workshop. 19 Oct 2010 Some frequently-used Bioinformatics Tools Konstantinos Mavrommatis Prokaryotic Superprogram.
HW4: sites that look like transcription start sites Nucleotide histogram Background frequency Count matrix for translation start sites (-10 to 10) Frequency.
From DNA to Proteins Section 2.3 BC Science Probe 9 Pages
Multiple Sequence Alignment (cont.) (Lecture for CS397-CXZ Algorithms in Bioinformatics) Feb. 13, 2004 ChengXiang Zhai Department of Computer Science University.
DNA sequences alignment measurement Lecture 13. Introduction Measurement of “strength” alignment Nucleic acid and amino acid substitutions Measurement.
Determining Sequence Relationships
Multiple sequence alignments with Clustal Omega Yu He 04/13/2016 Adapted from Mingchao Xie and Julie Thompson multiple sequence alignment ppt online.
Bioinformatics Overview
Introduction to Bioinformatics Resources for DNA Barcoding
Phylogeny - based on whole genome data
Basics of Comparative Genomics
Transcription Translation
Pipelines for Computational Analysis (Bioinformatics)
Overview of Multiple Sequence Alignment Algorithms
In-Text Art, Ch. 16, p. 316 (1).
B3- Olympic High School Bioinformatics
5.4 Cladistics.
LSM3241: Bioinformatics and Biocomputing Lecture 4: Sequence analysis methods revisited Prof. Chen Yu Zong Tel:
Multiple Sequence Alignment
B3- Olympic High School Bioinformatics
There are four levels of structure in proteins
Sequence Based Analysis Tutorial
How to Use This Presentation
Evidence of Evolution Several types of information support Darwin’s theory of evolution. anatomy homologous, analogous, and vestigial.
Applying principles of computer science in a biological context
Unit Genomic sequencing
Basics of Comparative Genomics
STAAR Notebook 2.
MULTIPLE SEQUENCE ALIGNMENT
Presentation transcript:

B3- Olympic High School Bioinformatics Dr. Jennifer Weller May 2016 Sequence Analysis B3- Olympic High School Bioinformatics How do you go from the organism (the American Chestnut tree) to a sample you can collect (the leaf), to the part of the leaf of interest (the chloroplast) to the chloroplast DNA (from sequencing) to putting the sequence together in order (a map) and labeling the parts (annotation) so you can pick out one particular region that you want to barcode? Part of this involves the wet-lab process that we do in the Summer Science camp, where we use wet-lab tools to purify the DNA sample and carry out the sequencing. Part of it is a computational process – it still requires tools and knowing how to use them. One of the things you can do is complete the circle to go back to the lab with better (more focused) assays. 11/15/2018 Dr. Weller B3 Olympic HS

Topics Finding patterns & comparing sequences Visualizing multiple sequence comparisons 11/15/2018 Weller UNCC

Finding a pattern in DNA DNA sequence does not come with Capital letters, periods and spaces to help identify ‘words’ or patterns of interest. A matrix is set up to scan across the DNA string, with the DNA string set up as the columns.

You only get a score when one of the ‘1s’ is in the desired position. T A T A A A

Comparing sequences We can find a pattern by using a sliding window, as we did above, but we needed prior knowledge – the pattern of interest. Data-driven approach: we can compare sequences and look for common patterns – if a pattern is present in many organisms, or many places in one organism, it may have some functional importance. The sequences should be in register (aligned) with each other or they will look more different than they are: If the sequence is a coding sequence (makes a protein) you can check whether the nucleotide change creates an amino acid change, and check whether the amino acid change looks like it will change the function of the protein in some way. Some changes that don’t change the amino acid in the protein have still been shown to affect the turn-over or regulation of the RNA: the so-called ‘silent’ or synonymous mutations are not necessarily without effect. I could transform this encoding to the protein encoding if I know the register – where the start of the protein is, and whether there are introns (interruptions) in this gene. TCAAAGATTAAGCCATGCATGTCAAAGTACAAGCCCCATTAAGGTGAAACCGCGAATGGCT GACCTCAAAGATAAAGCCAAGCATGTCAATGTACAAGGCCCATTTAGGTGAAACCGGGAATGG

Multiple Sequence Alignment With two sequences to compare you can slide them along (inserting gaps as necessary) to get the best possible alignment. With multiple sequences there you have to Pick one as the reference Align the rest of the sequences Nice extras: provide visual cues showing similarities and differences The DNA Subway Blue line has the MUSCLE program integrated in it. http://www.ebi.ac.uk/Tools/msa/muscle/

Annotating the MSA A representation of a set of sequences, in which equivalent residues (e.g. functional or structural) are aligned in columns Conserved residues Conservation profile Secondary structure

Scoring the comparison You can count the similarities (or differences) between each pair of sequences, and keep track of them in a table (called a matrix in this case). Below is a comparison of the hemoglobin a and b chains in a number of organisms with scores derived by: (1-(#identical residues/# compared residues) - you can multiply by 100 to get percent. Each sequence compared to itself is 0 difference (shown as -). - .17 - .59 .60 - .59 .59 .13 - .77 .77 .75 .75 - .81 .82 .73 .74 .80 - .87 .86 .86 .88 .93 .90 - Hbb_human Hbb_horse Hba_human Hba_horse Myg_phyca Glb5_petma Lgb2_lupla 1 2 3 4 5 6 7

Analogy and Homology When organisms share a trait due to common ancestry, it is called a homologous trait. Vertebrate animals have skeletons because the ancestor of all vertebrates had a skeleton and passed on that trait. The trait can be morphological, such as the bones in the front limb, or molecular, such as the RNA that organizes ribosomes. When organisms share a trait for reasons other than relatedness it is called an analogous trait. Functions may occur as adaptations to a common environmental challenges, giving rise to analogous traits (convergent evolution). The ‘eye’ and ‘wings’ have arisen independently a number of times.

http://www.nku.edu/~whitsonma/Bio120LSite/Bio120LReviews/Bio120LHomologyRev.html

Visualization - Sequence Variation and Homology Trees are a common way to represent the distance between two elements You can define different rules for using the distance matrix that determines the branching of the tree.

Progressive multiple alignment Sequential branching / Guide tree construction Sequential branching Guide tree Hbb_human Hbb_horse Hba_human Hba_horse Myg_phyca Hbb_horse Hba_human Hbb_human Glb5_petma Hba_horse Glb5_petma Lgb2_lupla Myg_phyca Lgb2_lupla - Join the 2 closest sequences - Recalculate distances and join the 2 closest sequences or nodes - Step 3 is repeated until all sequences are joined

Lab: Using the Blue Line for sequence comparisons Log into the Blue Line We are going to use the chloroplast gene coding for a protein used in the photon-to- sugar conversion pathway called Ribulose bis-phosphate carboxylase or RubisCO - the protein has two subunits, we will be looking at the gene for the large subunit (rbcL) for our comparisons. Select CSH HS Oct 2010 for an example of such a gene sequence. Name the project (your initials, CP1 for example)

Add more sequences from a database You can use one of the sequences as a search string against a database of sequences like GenBank – this lets you add sequences from other species than those you sampled yourself.

Sequence data can be imported and inspected Why would you want to eliminate poor quality base calls? By pairing forward and reverse reads you can Get better quality assurance on base calls that overlap Get longer reads where forward and reverse do not overlap Why do you want sequences you are comparing to be the same length? Should the consensus be perfect? What does it mean if it is an exact match? Build a agreed-upon sequence – a consensus – a region of the same length and mostly the same sequence

Screen shots for saving data

Quality scores If the information is available, a quality score cut-off can be imposed – low-quality bases will be greyed out. These can be trimmed If there are paired reads that overlap and they disagree at one position, if one is below the quality cut-off score then the other call will be used in the consensus sequence for that sample.

Multiple Alignments - MUSCLE

Visualizing the comparisons Phylip – Neighbor Joining Phylip – Maximum Likelihood