COMPUTATIONAL BIOLOGY

OUTLINE
Proteins; DNA; RNA; Genetics and evolution; The Sequence Matching Problem; RNA Sequence Matching; Complexity of the Algorithms

DEFINITION
Computational biology encompasses the computational methods and theories applicable to molecular biology, together with the computer-based techniques used to solve biological problems.

PROTEINS
Proteins are building blocks of living organisms. A protein is a large molecule composed of a sequence of amino acids. There are 20 amino acids, which are divided into classes: hydrophobic (h-phob), hydrophilic (h-phil), and charged polar, either positive (pos) or negative (neg).

Amino acid      Symbol  Class
Alanine         A       h-phob
Arginine        R       pos
Asparagine      N       h-phil
Aspartic acid   D       neg
Cysteine        C       h-phil
Glutamine       Q       h-phil
Glutamic acid   E       neg
Glycine         G       h-phob
Histidine       H       pos
Isoleucine      I       h-phob
Leucine         L       h-phob
Lysine          K       pos
Methionine      M       h-phob
Phenylalanine   F       h-phob
Proline         P       h-phob
Serine          S       h-phil
Threonine       T       h-phil
Tryptophan      W       h-phob
Tyrosine        Y       h-phil
Valine          V       h-phob
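
The class table can be encoded directly as a lookup map. The following is a minimal Python sketch, assuming one-letter symbols and the slide's class labels; the names AA_CLASS and class_profile are hypothetical. It summarizes how many residues of a protein fall into each class.

```python
from collections import Counter

# Class of each amino acid, keyed by its one-letter symbol (labels as on the slide).
AA_CLASS = {
    "A": "h-phob", "R": "pos", "N": "h-phil", "D": "neg", "C": "h-phil",
    "Q": "h-phil", "E": "neg", "G": "h-phob", "H": "pos", "I": "h-phob",
    "L": "h-phob", "K": "pos", "M": "h-phob", "F": "h-phob", "P": "h-phob",
    "S": "h-phil", "T": "h-phil", "W": "h-phob", "Y": "h-phil", "V": "h-phob",
}


def class_profile(protein: str) -> Counter:
    """Count how many residues of a protein sequence fall into each class."""
    protein = protein.upper()
    unknown = set(protein) - set(AA_CLASS)
    if unknown:
        raise ValueError(f"unknown amino acid symbols: {sorted(unknown)}")
    return Counter(AA_CLASS[aa] for aa in protein)
```

For example, class_profile("MKTAYIAK") counts 4 h-phob, 2 pos and 2 h-phil residues.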

DNA
DNA is the blueprint of living organisms. It is composed of two strands held together by weak hydrogen bonds between paired bases. Each strand is a sequence of nucleotides. DNA has four bases, which fall into two chemical types:

Base       Symbol  Type
Adenine    A       Purine
Guanine    G       Purine
Cytosine   C       Pyrimidine
Thymine    T       Pyrimidine
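
The two strands pair base by base: adenine with thymine and cytosine with guanine, so each pair joins one purine with one pyrimidine. A minimal Python sketch, assuming uppercase single-letter bases (the function names are hypothetical), that derives the partner strand and checks the purine/pyrimidine rule:

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Chemical type of each base: A and G are purines, C and T are pyrimidines.
BASE_TYPE = {"A": "purine", "G": "purine", "C": "pyrimidine", "T": "pyrimidine"}


def reverse_complement(strand: str) -> str:
    """Return the paired strand, read in the same 5'-to-3' direction as the input."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def pairs_are_purine_pyrimidine() -> bool:
    """Check that every base pair joins one purine with one pyrimidine."""
    return all(
        {BASE_TYPE[b], BASE_TYPE[COMPLEMENT[b]]} == {"purine", "pyrimidine"}
        for b in COMPLEMENT
    )
```

The strand is reversed because the two strands run in opposite directions; reverse_complement("ACCTGAG") gives "CTCAGGT".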

DNA DOUBLE HELIX

RNA
RNA is chemically very similar to DNA, with two important differences:
1. The four bases present in RNA are adenine (A), guanine (G), cytosine (C) and uracil (U); uracil takes the place of thymine.
2. RNA nucleotides contain a different sugar molecule (ribose instead of deoxyribose).
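
At the sequence level, the first difference means that an RNA copy of a DNA coding (sense) strand has the same bases with U in place of T; the sugar difference does not show up in the letter sequence at all. A minimal Python sketch, assuming the input is the coding strand (the name transcribe is hypothetical):

```python
DNA_BASES = set("ACGT")


def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a DNA coding strand: same bases, with U in place of T."""
    strand = coding_strand.upper()
    invalid = set(strand) - DNA_BASES
    if invalid:
        raise ValueError(f"not DNA bases: {sorted(invalid)}")
    return strand.replace("T", "U")
```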

GENETICS AND EVOLUTION
Mutation; natural selection; genetic drift.

SEQUENCE MATCHING PROBLEM
The problem is to match DNA, RNA, or protein sequences, for example between a diseased organism and a healthy organism. Protein sequences are long and DNA strands are even longer, so we match them by breaking them into shorter subsequences. Breaking and matching are done using the notion of alignment.

SEQUENCE MATCHING EXAMPLE
Consider two amino acid sequences, ACCTGAGAG and ACGTGGCAG. One alignment of them, with "-" marking a gap, is:
A C C T G A G - A G
A C G T G - G C A G
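
The slide does not say how such an alignment is found. One standard way to compute a best global alignment is the Needleman-Wunsch dynamic program, sketched below in Python. The scoring weights (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, and the function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score an existing alignment column by column with the same scheme."""
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total
```

Under these assumed weights, the slide's alignment scores 4 (seven matches, one mismatch, two gaps), so needleman_wunsch("ACCTGAGAG", "ACGTGGCAG") returns a score of at least 4. The alignment it returns may differ from the slide's, because several alignments can tie.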

FINITE STATE MACHINES IN BLAST
BLAST is used to find which of the sequences in a database are related to a newly given query sequence. The BLAST system is a three-step process:
1. Examine the query string and select a set of substrings of length w (between 4 and 20) that are good for producing matches.
2. Build a deterministic finite state machine (DFSM) from that set of substrings, and use it to find the sequences in the database with the highest local matches.
3. Examine the matches found in step 2 and try to extend them into longer matching sequences.
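
Steps 1 and 2 can be sketched in Python with the Aho-Corasick construction, one standard way to build a single DFSM that recognizes every word in a set. This is a simplification, not BLAST's actual code: step 1 here keeps every exact length-w substring of the query instead of scoring candidate words, step 3 (extension) is omitted, and all function names are hypothetical.

```python
from collections import deque


def query_words(query, w):
    """Step 1, simplified: every length-w substring of the query."""
    return {query[i:i + w] for i in range(len(query) - w + 1)}


def build_dfsm(words, alphabet):
    """Step 2: Aho-Corasick construction of a DFSM recognizing every word in `words`.

    Words are assumed to use only symbols from `alphabet`. Returns (delta, accept):
    delta[state][symbol] is the next state, and accept[state] is the set of words
    that end when the machine enters that state.
    """
    delta, accept = [{}], [set()]
    # Build a trie of the words; state 0 is the start state.
    for word in words:
        state = 0
        for ch in word:
            if ch not in delta[state]:
                delta.append({})
                accept.append(set())
                delta[state][ch] = len(delta) - 1
            state = delta[state][ch]
        accept[state].add(word)
    # Breadth-first pass: compute failure links and fill in every missing
    # transition, so the machine is deterministic and complete.
    fail = [0] * len(delta)
    queue = deque()
    for ch in alphabet:
        if ch in delta[0]:
            queue.append(delta[0][ch])
        else:
            delta[0][ch] = 0
    while queue:
        state = queue.popleft()
        accept[state] |= accept[fail[state]]
        for ch in alphabet:
            if ch in delta[state]:
                child = delta[state][ch]
                fail[child] = delta[fail[state]][ch]
                queue.append(child)
            else:
                delta[state][ch] = delta[fail[state]][ch]
    return delta, accept


def scan(sequence, delta, accept):
    """Run the DFSM over a database sequence; report (start_position, word) seed hits."""
    hits, state = [], 0
    for i, ch in enumerate(sequence):
        state = delta[state].get(ch, 0)  # symbols outside the alphabet reset the machine
        for word in accept[state]:
            hits.append((i - len(word) + 1, word))
    return hits
```

The point of the construction is that the database is read once, one symbol per transition, no matter how many words were selected in step 1.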

REGULAR EXPRESSIONS SPECIFY PROTEIN MOTIFS
By aligning a collection of related proteins, we can define a motif. Example: E S G H D T Y Y N K N R M D T T T T T S W Q S R G S D T T T P D M T A G P T T W R N T
Once a motif is defined, we can search for its occurrences in other protein sequences using regular expressions.
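
Motifs are often written in a PROSITE-style notation (elements joined by "-", "x" for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats), which maps directly onto regular expressions. A minimal Python sketch using the standard re module; the converter handles only this simplified subset, the function names are hypothetical, and the pattern [ST]-x-[RK] is an illustrative assumption rather than the slide's motif.

```python
import re

# One pattern element: x, a residue, [allowed], or {excluded}, optionally with (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simplified PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                       # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any amino acid except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (position, matched_text) for every occurrence, including overlapping ones."""
    # A lookahead makes each match zero-width, so overlapping occurrences are all found.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
```

For instance, find_motif("[ST]-x-[RK]", "MSAKTLR") reports "SAK" at position 1 and "TLR" at position 4.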

HMMs FOR SEQUENCE MATCHING
Hidden Markov models (HMMs) are used when the sequences in a family become fairly diverse. An HMM captures the variations among the members of the family and the probabilities associated with them. Using HMMs, we can find the best alignment between sequences and determine which family a given new sequence belongs to.

An HMM profile is given by M = (K, O, π, A, B), where K is a set of n states, one for each position in the sequence; O is the output alphabet; π contains the initial state probabilities; A contains the transition probabilities; and B contains the output probabilities.
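
Given M = (K, O, π, A, B), the probability that M generates a sequence can be computed with the forward algorithm, and a new sequence can be assigned to the family whose model gives it the highest probability. The following Python sketch uses a generic, fully connected HMM rather than the match/insert/delete layout of a real profile HMM. All probabilities in the usage example are made up, and the function names are hypothetical.

```python
def forward(seq, states, pi, A, B):
    """P(seq | M) for an HMM M = (K, O, pi, A, B), computed by the forward algorithm.

    states: the state set K; pi[k]: initial probability of state k;
    A[j][k]: probability of moving from state j to state k;
    B[k][o]: probability that state k emits symbol o (unlisted symbols count as 0).
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    # alpha[k] = P(symbols read so far, current state is k)
    alpha = {k: pi[k] * B[k].get(seq[0], 0.0) for k in states}
    for symbol in seq[1:]:
        alpha = {
            k: sum(alpha[j] * A[j][k] for j in states) * B[k].get(symbol, 0.0)
            for k in states
        }
    return sum(alpha.values())


def most_likely_family(seq, families):
    """Pick the family whose HMM (states, pi, A, B) assigns seq the highest probability."""
    return max(families, key=lambda name: forward(seq, *families[name]))


if __name__ == "__main__":
    # Two hypothetical one-state family models with made-up emission probabilities.
    families = {
        "f1": (["s"], {"s": 1.0}, {"s": {"s": 1.0}}, {"s": {"A": 0.9, "C": 0.1}}),
        "f2": (["s"], {"s": 1.0}, {"s": {"s": 1.0}}, {"s": {"A": 0.1, "C": 0.9}}),
    }
    print(most_likely_family("AAA", families))
```

The forward algorithm sums over all state paths. Replacing the sum with a maximum (the Viterbi algorithm) gives the single best path, which is what yields an alignment. For long sequences, the products should be computed in log space to avoid underflow.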

E XAMPLE OF HMM DESCRIBING PROTEIN SEQUENCE FAMILY

RNA SEQUENCE MATCHING AND SECONDARY STRUCTURE PREDICTION USING THE TOOLS OF CONTEXT-FREE LANGUAGES
In RNA, a change to a single nucleotide in a stem region could completely alter the molecule's shape and its function. So a change on one side of a stem must be matched by a corresponding change in the paired nucleotide on the other side. Context-free languages are used to describe these nested dependencies and the resulting secondary structure.
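
Nested pairings like these can be generated by a simple ambiguous grammar, S → ε | S x | S x S y, where (x, y) is a complementary pair. Dynamic programming over that grammar finds the structure with the most nested base pairs; this is the Nussinov algorithm, a standard technique we swap in here that the slide itself does not name. In the Python sketch below, the G-U wobble pairs and the minimum hairpin loop of 3 unpaired bases are assumptions, and the function name fold is hypothetical.

```python
# Base pairs allowed in a stem: Watson-Crick pairs plus G-U wobble pairs (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold(rna, min_loop=3):
    """Nussinov-style folding: maximize the number of nested base pairs.

    Returns (number_of_pairs, dot_bracket). min_loop is the smallest number of
    unpaired bases allowed inside a hairpin loop.
    """
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0, ""
    # best[i][j] = max number of nested pairs within rna[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]

    def pair_score(i, j, k):
        # rna[k] pairs with rna[j] (rule S -> S x S y): pairs left of k, +1, pairs inside.
        left = best[i][k - 1] if k > i else 0
        return left + 1 + best[k + 1][j - 1]

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # rule S -> S x: rna[j] stays unpaired
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    score = max(score, pair_score(i, j, k))
            best[i][j] = score

    # Traceback: rebuild one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to hold a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (rna[k], rna[j]) in PAIRS and best[i][j] == pair_score(i, j, k):
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return best[0][n - 1], "".join(structure)
```

For example, fold("GGGAAAUCC") returns (3, "(((...)))"): a three-pair stem closing a loop of three unpaired bases. Real structure prediction scores stacking energies rather than counting pairs, so this sketch only illustrates the nested, context-free shape of the problem.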

EXAMPLE

COMPLEXITY OF ALGORITHMS USED IN COMPUTATIONAL BIOLOGY
Many of the problems described here are computationally hard. For example, problems that arise from breaking large protein and DNA molecules up into substrings, such as finding the shortest string that contains all of the pieces, are NP-hard. Converted to a decision problem, the language
SHORTEST-SUPERSTRING = {<S, K> : S is a set of strings and there exists some superstring T such that every element of S is a substring of T and T has length less than or equal to K}
is NP-complete.
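
The decision problem can be answered exactly by brute force, which also shows why the problem is expensive: the search tries every ordering of the strings, taking O(n!) time. The Python sketch below relies on the standard fact that, once strings contained in other strings are removed, some shortest superstring is obtained by merging the remaining strings in some order with maximal overlaps. The function names are hypothetical.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_in_order(strings):
    """Merge strings left to right, overlapping each neighbour as much as possible."""
    result = strings[0]
    for prev, cur in zip(strings, strings[1:]):
        result += cur[overlap(prev, cur):]
    return result


def shortest_superstring(strings):
    """Exact shortest common superstring by trying every order: O(n!) time."""
    unique = set(strings)
    # Strings already contained in another string never affect the answer.
    core = sorted(s for s in unique if not any(s != t and s in t for t in unique))
    if not core:
        return ""
    return min((merge_in_order(list(order)) for order in permutations(core)), key=len)


def has_superstring_at_most(strings, k):
    """Decision version: is there a superstring of every string in S with length <= K?"""
    return len(shortest_superstring(strings)) <= k
```

For S = {ACG, CGT, GTA}, the shortest superstring is ACGTA, so the answer is yes for K = 5 and no for K = 4. Practical fragment assembly uses greedy or graph-based heuristics instead of exhaustive search.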

REFERENCE
Elaine Rich, Automata, Computability and Complexity: Theory and Applications (book).

Thank you