Intelligent Systems for Bioinformatics Michael J. Watts

Slides:



Advertisements
Similar presentations
The Central Dogma of Genetics
Advertisements

RNA and Protein Synthesis
Introduction to Bioinformatics Spring 2008 Yana Kortsarts, Computer Science Department Bob Morris, Biology Department.
Transcription & Translation Biology 6(C). Learning Objectives Describe how DNA is used to make protein Explain process of transcription Explain process.
DNA as the genetic code.
10-2: RNA and 10-3: Protein Synthesis
CHAPTER 17 FROM GENE TO PROTEIN Copyright © 2002 Pearson Education, Inc., publishing as Benjamin Cummings Section B: The Synthesis and Processing of RNA.
Lesson Overview 13.1 RNA.
Transcription Transcription is the synthesis of mRNA from a section of DNA. Transcription of a gene starts from a region of DNA known as the promoter.
Transcription and Translation
Protein Synthesis (Eukaryotes)
Protein Synthesis Chapter 13. Protein Synthesis  How does your DNA eventually lead to your different phenotypes (hair color, eye color, etc)
RNA and Protein Synthesis
VII RNA and Protein Synthesis
CENTRAL DOGMA OF BIOLOGY. Transcription & Translation How do we make sense of the DNA message? Genotype to Phenotype.
RNA Structure and Transcription Mrs. MacWilliams Academic Biology.
Transcription and Translation
Chapter 13.1 and 13.2 RNA, Ribosomes, and Protein Synthesis
By: Anne Russell, Madelyn Stroder, Hannah Black, And Bailey Mills.
RNA and Protein Synthesis
RNA & Protein Synthesis.
RNA & Protein Synthesis. I. DNA to Genes A. We now know how the double helix is replicated but we still don’t know how it is then transformed into genes.
What is the job of p53? What does a cell need to build p53? Or any other protein?
From Gene To Protein Chapter 17. From Gene to Protein The “Central Dogma of Molecular Biology” is DNA  RNA  protein Meaning that our DNA codes our RNA.
How Genes Work Ch. 12.
Transcription & Translation Chapter 17 (in brief) Biology – Campbell Reece.
PROTEIN SYNTHESIS The Blueprint of Life: From DNA to Protein.
Structure of RNA  Structure  Nucleic acid made up of nucleotides  composed of Ribose, phosphate group, and nitrogenous base  Nitrogenous bases  Adenine.
Lesson Overview Lesson OverviewFermentation Lesson Overview 13.1 RNA.
What is central dogma? From DNA to Protein
Lesson Overview Lesson OverviewFermentation Lesson Overview 13.1 RNA.
Structure and functions of RNA. RNA is single stranded, contains uracil instead of thymine and ribose instead of deoxyribose sugar. mRNA carries a copy.
Chapter 13 –RNA and Protein Synthesis
Leaving Cert Biology Genetics – section 2.5 Genetics ( RNA), 2.5.5,
RNA, transcription & translation Unit 1 – Human Cells.
DNA in the Cell Stored in Number of Chromosomes (24 in Human Genome) Tightly coiled threads of DNA and Associated Proteins: Chromatin 3 billion bp in Human.
PROTEIN SYNTHESIS TRANSCRIPTION AND TRANSLATION. TRANSLATING THE GENETIC CODE ■GENES: CODED DNA INSTRUCTIONS THAT CONTROL THE PRODUCTION OF PROTEINS WITHIN.
Transcription and Translation of DNA How does DNA transmit information within the cell? PROTEINS! How do we get from DNA to protein??? The central dogma.
Introduction to Molecular Biology and Genomics BMI/CS 776 Mark Craven January 2002.
Dna Transcription & Translation Presented by: Lulu Tesha Rahshad Cobb.
RNA A nucleic acid which acts as a messenger between DNA and the ribosomes to carry out the process of making proteins from amino acids. Structure is similar.
RNA, Transcription, and the Genetic Code. RNA = ribonucleic acid -Nucleic acid similar to DNA but with several differences DNARNA Number of strands21.
Transcription and The Genetic Code From DNA to RNA.
Placed on the same page as your notes Warm-up pg. 48 Complete the complementary strand of DNA A T G A C G A C T Diagram 1 A T G A C G A C T T A A C T G.
RNA and Protein Synthesis. RNA Structure n Like DNA- Nucleic acid- composed of a long chain of nucleotides (5-carbon sugar + phosphate group + 4 different.
Protein Synthesis RNA, Transcription, and Translation.
Higher Human Biology Unit 1 Human Cells KEY AREA 3: Gene Expression.
Gene Expression DNA, RNA, and Protein Synthesis. Gene Expression Genes contain messages that determine traits. The process of expressing those genes includes.
CH 12.3 RNA & Protein Synthesis. Genes are coded DNA instructions that control the production of proteins within the cell…
Macromolecular and Physical Data Michael J. Watts 1.
NOTES CHPT. 10.
Chapter 13 From DNA to Proteins
From Gene to Protein pp Discover Biology: C15 From Gene to Protein pp
RNA & Protein synthesis
Higher Biology Gene Expression Mr G R Davidson.
Protein Synthesis Genetics.
Protein Synthesis in Detail
12-3 RNA and Protein Synthesis
DNA and the Genome Key Area 3b Transcription.
What is RNA? Do Now: What is RNA made of?
RNA and Transcription DNA RNA PROTEIN.
Central Dogma Central Dogma categorized by: DNA Replication Transcription Translation From that, we find the flow of.
Lesson Overview 13.1 RNA
12-3 RNA and Protein Synthesis
Copyright Pearson Prentice Hall
Copyright Pearson Prentice Hall
12-3 RNA and Protein Synthesis
Copyright Pearson Prentice Hall
Copyright Pearson Prentice Hall
So how do we get from DNA to Protein?
Presentation transcript:

Intelligent Systems for Bioinformatics Michael J. Watts

Lecture Outline What is bioinformatics? What is DNA? How is it processed in cells? What is DNA data? How is DNA data represented? How can IS be applied to DNA data?

What is Bioinformatics? Computational analysis of biological sequences Many different kinds of biological data exist Amount of data increasing at an exponential rate Need automated methods of processing it

What is DNA? DeoxyriboNucleicAcid storage medium of genetic information in higher organisms encapsulated in cell nucleus of eukaryotic (multi-cellular) organisms consists of long chains of nucleotides Two chains in double helix structure

What is DNA? Four bases: o Adenine A o Guanine G o Thymine T o Cytosine C sequence of bases encodes genetic information

How is DNA Processed in Cells? “Central Dogma of Molecular Genetics” Describes the flow of information from DNA to protein

DNA Processing DNA is transcribed into messenger RNA (mRNA) o RNA is a less stable relative of DNA o replaces Thymine (T) with Uracil (U) RNA strand read by ribosome to produce protein (translation)

Transcription DNA split into single strands RNA polymerase binds to DNA strand at promoter site RNA strand formed from DNA base complements o A -> U G -> C o C -> G T -> A

Transcription mRNA cut into sections Coding (exon) portions of RNA strands spliced together Non-coding (intron) segments discarded

RNA Translation Triplets of RNA bases (codons) translated to amino acid (residue) o The genetic code o amino acids linked to form protein Protein folds according to electrostatic forces Shape of protein determines it’s function

DNA Data Many different kinds of DNA data and DNA related data in existence DNA promoter data RNA splice junction data The “genetic code” Protein sequences and configurations

Representing Biological Data Basic sequence data string of letters o A, C, G, T (DNA) o A,C,D,E, etc for Amino Acids Can be represented in several ways Substitute arbitrary numbers for letters o e.g. A=1, C=2, G=3, T=4 o doesn’t reflect some properties of the bases o problems dealing with uncertainty o Theoretical problems (measurement theory)

Representing Biological Data Binary representation o orthogonal encoding o each base can be one of four (or 20) o represent each base by four bits  i.e. A = , C = , etc. o handles uncertainty better  e.g. A or C = o still ignores properties of the bases

Representing Biological Data Electron Ion Interaction Potential (EIIP) o Measure of the chemical properties of DNA bases o Preserves information about the properties of the bases o specific to DNA o Problems with uncertainty remain

Representing Biological Data Charge, hydrophobicities o biophysical properties of amino acids o been tried in the past, but never really successful o specific to proteins

Applications of IS to Biological Data IS can be applied to each stage of the protein synthesis process identifying DNA promoter sites identifying RNA splice sites modelling the genetic code predicting protein configuration

Identifying DNA Promoter Sites Promoters are the start of coding regions of DNA DNA coding regions have known termination points o typically AATAAA ANN can be trained to classify a region of DNA as promoter or non-promoter

RNA Splice Site Prediction Splicing is the second step in removing unused information Sites can be either intron - exon (non-coding - coding) or exon - intron (coding - non-coding) MLPs and Knowledge based neural networks (KBNN) have both been applied

RNA Splice Site Prediction Data set consists of sequences of 60 nucleotides each sequence represents either o intron - exon junction o exon - intro junction o non-junction region binary representation of bases used Performance exceeds statistical methods

Modelling the Genetic Code 3 bases in each codon (64 combinations) 20 natural amino acids plus STOP codon o genetic code is degenerate Train a MLP to model the genetic code Problems with training indicate properties of the genetic code Internal representation matches known biochemical groupings

Protein Configuration Prediction Proteins are formed from chains of amino acids Proteins have primary, secondary, tertiary and quaternary structures Primary structure is its amino acid sequence Electrostatic forces folds it into secondary, tertiary or even quaternary structures

Protein Configuration Prediction Final structure determines it’s biological function Secondary structure consists of sub structures o alpha helix o pleated (or beta) sheet o rest is coil Can use ANN to predict the structure from amino acid sequence

Secondary Structure Prediction Modelled with an ANN in 1988 Used an MLP to predict secondary structure classes Input was a ‘window’ of 13 residues Predict which SS centre of the window is in

Secondary Structure Prediction Each residue represented by a 21-bit vector o 1 bit for each residue type o +1 for ‘spacer’ bit Total input size of 273 bits 3 output nodes o helix, sheet, coil

Secondary Structure Prediction Sequence profiles o Find a set of known proteins that are similar (homologous) to the unknown protein o Align the homologues with the unknown so that the maximum number of residues match o Use the alignment to construct a profile of residue frequencies

Secondary Structure Prediction Use the profile and other measures as inputs to the ANN This gives the best prediction results so far (>86% accuracy) Used in the PhD prediction server o based prediction server

Protein Configuration Prediction Genetic algorithms used to predict tertiary structure fitness based on the energy required to maintain each structure o GA aims to minimise energy Alternative is to use brute force search o very time consuming

Signal Peptide Cleavage Proteins that are exported from a cell are “marked” o “signal” molecule at end of protein o cleaved off before export ANN can be used to predict where the “signal” section ends Organism specific, to some extent

Signal Peptide Cleavage Prediction is based on structure, rather than sequence Statistical methods have problems modelling this kind of thing ANN have shown some promise o Still work to be done

Conclusion Bioinformatics is a very large field Filled with many challenges o and lots of $$$$! HUGE amount of data exists and being continuously produced Problems with processing it all Intelligent systems can be used to do this