Lecture 1 BNFO 240 Usman Roshan. Course overview Perl programming language (and some Unix basics) Sequence alignment problem –Algorithm for exact pairwise.

Presentation transcript:

Lecture 1 BNFO 240 Usman Roshan

Course overview
Perl programming language (and some Unix basics)
Sequence alignment problem
–Algorithm for exact pairwise alignment
–Heuristics for exact multiple alignment
–Computational complexity
–Heuristics for pairwise alignment and BLAST, FASTA database search
–Real world alignment problems
–Substitution matrices
Phylogeny reconstruction
–Estimating distance matrices
–Distance based phylogeny reconstruction: UPGMA and neighbor joining algorithms

Overview (contd)
Wednesdays: meet in GITC 2305
Fridays: meet in PC Mall, room number PC 36
Grade: 50% monthly programming assignment and 50% final exam
Texts:
–Introduction to Bioinformatics by Arthur Lesk
–Beginning Perl for Bioinformatics by James Tisdall

DNA Sequence Evolution
[Figure: evolutionary tree with a time axis (-3 mil yrs, -2 mil yrs, -1 mil yrs, today). The ancestral sequence AAGACTT diverges into the present-day sequences TAGGCCTT (Human), TAGCCCTTA (Monkey), ACCTT (Cat), ACACTTC (Lion), and GGCTT (Mouse). A second panel shows the same leaves written with gaps: TAGGCCTT (Human), TAGCCCTTA (Monkey), A_C_CTT (Cat), A_CACTTC (Lion), _G_GCTT (Mouse).]

Comparative bioinformatics What is the evolutionary relationship of a set of DNA sequences? What are the evolutionarily conserved regions of a set of proteins? How evolutionarily close is a pair of species? How similar are two DNA sequences? How similar are a set of DNA sequences?

Representing DNA in a format manipulable by computers
DNA is a double-helix molecule built from nucleotides, each carrying one of four bases:
–Adenine (A)
–Cytosine (C)
–Thymine (T)
–Guanine (G)
Since A (adenine) always pairs with T (thymine) and C (cytosine) always pairs with G (guanine), knowing only one side of the ladder is enough.
We represent DNA as a sequence of letters where each letter is A, C, G, or T. For example, the helix shown here would be represented as CAGT.
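The pairing rule above means the second strand can always be recomputed from the first. The following is a minimal Python sketch of that idea, not taken from the lecture; the function names `complement` and `reverse_complement` are illustrative, and the input is assumed to be an uppercase string containing only A, C, G, and T.

```python
# Base pairing rule from the slide: A pairs with T, C pairs with G.
# Assumption: input is uppercase and contains only A, C, G, T.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of each letter in the strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the opposite strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("CAGT"))          # GTCA
    print(reverse_complement("CAGT"))  # ACTG
```

Storing one strand as a plain string is therefore enough; the other side of the ladder is a lookup away.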

Transcription and translation

Amino acids Proteins are chains of amino acids. There are twenty different amino acids that chain in different ways to form different proteins. For example, FLLVALCCRFGH (this is how we could store it in a file). This sequence of amino acids folds to form a 3-D structure.

Protein folding

The protein folding problem is to determine the 3-D protein structure from the sequence. Experimental techniques are very expensive. Computational methods are cheap, but the problem is difficult to solve. By comparing sequences we can deduce the evolutionarily conserved portions, which are also functional (most of the time).

Protein structure Primary structure: sequence of amino acids. Secondary structure: parts of the chain organize themselves into alpha helices, beta sheets, and coils. Helices and sheets are usually evolutionarily conserved and can aid sequence alignment. Tertiary structure: 3-D structure of the entire chain. Quaternary structure: complex of several chains.

Key points DNA can be represented as strings consisting of four letters: A, C, G, and T. They could be very long, e.g. thousands and even millions of letters. Proteins are also represented as strings, over an alphabet of 20 letters (each letter is an amino acid). Their 3-D structure determines the function to a large extent.

Pairwise sequence alignment How to align two sequences?

Pairwise alignment

Dynamic programming Define V(i,j) to be the optimal pairwise alignment score between S[1..i] and T[1..j] (|S|=m, |T|=n).

Dynamic programming Time and space complexity is O(mn). Define V(i,j) to be the optimal pairwise alignment score between S[1..i] and T[1..j] (|S|=m, |T|=n).
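As a rough illustration of why the cost is O(mn), here is a minimal Python sketch that fills the table V. The scoring values (match +1, mismatch -1, gap -2) and the function name `alignment_table` are assumptions made for this example, as is the standard global-alignment recurrence with a linear gap penalty; the transcript does not preserve the scoring scheme used on the slide.

```python
def alignment_table(s, t, match=1, mismatch=-1, gap=-2):
    """Fill V, where V[i][j] is the best score for aligning s[:i] with t[:j].

    Scoring values are illustrative assumptions, not taken from the lecture.
    """
    m, n = len(s), len(t)
    # (m+1) x (n+1) table: this is the O(mn) space.
    V = [[0] * (n + 1) for _ in range(m + 1)]

    # Aligning a prefix against the empty string costs one gap per letter.
    for i in range(1, m + 1):
        V[i][0] = i * gap
    for j in range(1, n + 1):
        V[0][j] = j * gap

    # Each cell looks at three neighbours, so the double loop is O(mn) time.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,  # align s[i-1] with t[j-1]
                V[i - 1][j] + gap,      # align s[i-1] with a gap
                V[i][j - 1] + gap,      # align t[j-1] with a gap
            )
    return V


if __name__ == "__main__":
    table = alignment_table("GGCTT", "AAGGCTT")
    print(table[-1][-1])  # optimal score, stored in the bottom-right cell
```

The optimal score for the full sequences is V[m][n], the bottom-right cell of the table.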

Tabular computation of scores

Traceback to get alignment
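One way to recover the alignment itself is to walk back from the bottom-right cell of the score table, at each step choosing a neighbour that explains the current score. The Python sketch below is self-contained and assumes illustrative scores (match +1, mismatch -1, gap -2) and a linear gap penalty; the function name `align` and the "_" gap character (borrowed from the evolution slide) are choices made for this example.

```python
def align(s, t, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_s, aligned_t) for a global alignment.

    Scoring values are illustrative assumptions, not taken from the lecture.
    """
    m, n = len(s), len(t)

    # Forward pass: fill the score table V.
    V = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        V[i][0] = i * gap
    for j in range(1, n + 1):
        V[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + gap,
                          V[i][j - 1] + gap)

    # Traceback: start at V[m][n] and step to a cell that produced it.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s[i - 1] == t[j - 1] else mismatch
            if V[i][j] == V[i - 1][j - 1] + sub:
                # Diagonal move: the two letters share a column.
                top.append(s[i - 1])
                bottom.append(t[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and V[i][j] == V[i - 1][j] + gap:
            # Upward move: letter of s sits above a gap.
            top.append(s[i - 1])
            bottom.append("_")
            i -= 1
        else:
            # Leftward move: gap sits above a letter of t.
            top.append("_")
            bottom.append(t[j - 1])
            j -= 1

    # Columns were collected from the end, so reverse them.
    return V[m][n], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, a, b = align("GGCTT", "AAGGCTT")
    print(score)
    print(a)
    print(b)
```

When several neighbours explain a cell, this sketch prefers the diagonal move; any consistent choice gives an alignment with the same optimal score.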

How do we understand this dynamic programming algorithm? Let’s first look at some example alignments. Let’s look at gaps. How do we know where to insert gaps? Let’s look at the structure of an optimal alignment of two sequences x and y and how it relates to optimal alignments of subsequences of x and y.
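One common way to state that structure is the following recurrence, given here as a sketch under assumptions: the substitution score \sigma and the per-position gap penalty g are symbols introduced for illustration, and the gap penalty is assumed to be linear. The last column of an optimal alignment of x_1..x_i and y_1..y_j either pairs x_i with y_j, pairs x_i with a gap, or pairs a gap with y_j. Removing that column must leave an optimal alignment of the remaining prefixes, since otherwise a better one could be substituted.

```latex
V(i,j) = \max
\begin{cases}
  V(i-1,\,j-1) + \sigma(x_i, y_j) & \text{($x_i$ aligned with $y_j$)} \\
  V(i-1,\,j) + g                  & \text{($x_i$ aligned with a gap)} \\
  V(i,\,j-1) + g                  & \text{(gap aligned with $y_j$)}
\end{cases}
\qquad
V(0,0) = 0,\quad V(i,0) = i\,g,\quad V(0,j) = j\,g .
```

Gaps are thus never placed by a separate rule: a gap appears wherever the second or third case gives the maximum.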