Biotech 4490 Bioinformatics I Fall 2006 J.C. Salerno 1 Biological Information.



Slide 2. The basics: DNA (deoxyribonucleic acid) stores information. It codes for more DNA and for RNA (ribonucleic acid), which is the intermediate between long-term storage in the nucleus and proteins, which do most of the work in living cells.

Slide 3. Alphabets and translation: DNA and RNA use four-letter alphabets (ACGT or ACGU). Base pairing (A-T and G-C) in the DNA double helix is the key to replication, and base pairing in a DNA-RNA duplex is the key to transcription. Proteins have a basic 20-letter alphabet corresponding to the amino acids. Since strands of DNA, RNA, and polypeptide are linear, unbranched polymers, they can be treated as character strings.

Slide 4. Alphabets and translation: Transcription of DNA to RNA is a simple 1:1 read: a strand of DNA produces its complement, with U in place of T. Translation of RNA to a protein amino acid sequence is more complex.
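The 1:1 read described on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the original deck: the function name `transcribe` and the example strand are made up, and the sketch deliberately ignores strand direction (5' to 3' conventions), treating transcription as a position-by-position complement.

```python
# Base-pairing rules for reading a DNA template into RNA:
# A pairs with U (RNA has no T), T with A, G with C, C with G.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template: str) -> str:
    """Return the RNA complement of a DNA template, one base at a time.

    Simplifying assumption: strand orientation is ignored.
    """
    return "".join(DNA_TO_RNA[base] for base in template.upper())

# Hypothetical 6-base template strand.
print(transcribe("TACGGT"))  # AUGCCA
```

Because every base maps to exactly one partner, the mapping is reversible, which is not true of translation.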

Slide 5. Alphabets and translation: One base alone could code for only 4 different amino acids (AAs). Two bases together could code for 4 × 4 = 16 different AAs: close, but no cigar. Three bases could code for 4 × 4 × 4 = 64 different AAs; we need only 21 codes, for the 20 AAs used in proteins plus a stop signal.

Slide 6. Alphabets and translation: In translation, groups of three bases (codons) are translated into amino acids. Since there are 64 (4 × 4 × 4) codons, most AAs have multiple codons (serine has 6!). We say that the genetic code is degenerate. This isn't a comment on its character.
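The codon counts on this slide can be checked with a small Python sketch. It is an illustration rather than anything from the deck. It assumes the standard genetic code, written in the DNA alphabet (T instead of U) with `*` marking a stop codon; the names `CODON_TABLE`, `translate`, and `degeneracy` and the example sequence are invented for the example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate a coding DNA sequence codon by codon (trailing bases are dropped)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# How many codons map to each amino acid (or to stop).
degeneracy = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))           # 64 codons
print(len(degeneracy))            # 21 meanings: 20 AAs plus stop
print(degeneracy["S"])            # 6 codons for serine
print(translate("ATGTCTAGCTAA"))  # MSS*
```

In the example, the two serines come from different codons (TCT and AGC), which is degeneracy in action.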

Slide 7. Alphabets and translation: One consequence of the degeneracy of the genetic code is that you can translate nucleic acid sequences to AA sequences, but you can't reverse translate to a unique nucleic acid sequence.
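One way to see why reverse translation is not unique is to count how many DNA sequences encode a given peptide: the count is the product of the codon counts of its residues. The sketch below assumes the standard genetic code in the DNA alphabet; `count_encodings` and the example peptides are hypothetical names and inputs chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code in TCAG codon order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
CODONS_PER_AA = Counter(CODON_TABLE.values())

def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that translate to this peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)

print(count_encodings("MW"))   # 1: Met and Trp each have a single codon
print(count_encodings("SLR"))  # 216: Ser, Leu and Arg have 6 codons each
```

Only peptides built entirely from single-codon residues can be reverse translated unambiguously.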

Slide 8. Information content: How much information can you put into a character string? The computer age has provided the current generation of students with valuable intuition in this area. If I can put 10,000 songs on one iPod, how many songs can I put on two iPods?

Slide 9. Information content: In general, we expect the amount of information to increase linearly with the amount of space available to store it: songs with iPods, phone numbers with pages in the phone book, digital photos with memory cards.

Slide 10. Information content: More precisely, we express information content in terms of bits (or bytes) of information. The information content of a string of binary characters is just the number of characters: any 7-character binary string carries 7 bits, and the string 10 carries 2 bits (no shave or haircut). This assumes 1 and 0 are equally likely.

Slide 11. Information content: It should be obvious that the information content of a number is independent of how we express it: 999 should have the same significance written in binary as it does in base 10.

Slide 12. Information content: In general, if the characters in the alphabet are equally probable, we can express the information content of a character string as $N \log_2 M$, where $N$ is the number of characters in the sequence and $M$ is the number of letters in the alphabet. For binary strings there are only two letters, so $N \log_2 M = N$.
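The formula is easy to try out numerically. The following Python sketch is an illustration under the slide's own assumption of equally probable characters; the function name `info_bits` is invented here. It also revisits the 999 example: three decimal digits and ten binary digits carry nearly the same number of bits.

```python
from math import log2

def info_bits(n_chars: int, alphabet_size: int) -> float:
    """Information content N * log2(M), assuming equally probable letters."""
    return n_chars * log2(alphabet_size)

# A 7-character binary string carries 7 bits.
print(info_bits(7, 2))  # 7.0

# 999 written in base 10 (3 digits) and in binary (10 digits).
binary_999 = format(999, "b")
print(binary_999)                      # 1111100111
print(info_bits(len(binary_999), 2))   # 10.0
print(round(info_bits(3, 10), 2))      # 9.97
```

The small gap between 9.97 and 10 bits arises because ten binary digits can represent numbers up to 1023, slightly more than three decimal digits can.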

Slide 13. Information content: For nucleic acids, $M = 4$ (ACGT), so $N \log_2 M = 2N$. For proteins, $M = 20$ (ACDEFGHIKLMNPQRSTVWY), so $N \log_2 M \approx 4.3N$.

Slide 14. Information content: A protein sequence has more than twice the information content of a nucleic acid sequence of the same length. But since it takes 3 bases to code for a single AA, a protein sequence has only about 0.7 of the information content of the DNA sequence that originally coded for it.

Slide 15. Information content: Suppose we translate a 15-base-pair sequence into a five-AA sequence. The information content of the nucleic acid sequence is just $2N = 30$ bits. The information content of the protein sequence is $5 \log_2 20$ (an upper bound that assumes all AAs are equally probable), or about 21.6 bits. About 8.4 bits are lost to degeneracy.
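The arithmetic on this slide and the previous one can be reproduced directly. The sketch below is illustrative only; the variable names are invented, and it keeps the slides' assumption that all bases and all amino acids are equally probable.

```python
from math import log2

BITS_PER_BASE = log2(4)   # 2 bits per nucleotide
BITS_PER_AA = log2(20)    # about 4.32 bits per amino acid

n_bases = 15
n_residues = n_bases // 3  # three bases per codon

dna_bits = n_bases * BITS_PER_BASE
protein_bits = n_residues * BITS_PER_AA

print(dna_bits)                            # 30.0
print(round(protein_bits, 1))              # 21.6
print(round(dna_bits - protein_bits, 1))   # 8.4 bits lost to degeneracy
print(round(protein_bits / dna_bits, 2))   # 0.72, the "about 0.7" ratio
```

The ratio does not depend on the sequence length, since it is just $\log_2 20 / (3 \log_2 4)$.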

Slide 16. Information and Entropy: Entropy is a measure of the number of ways a system can exist. Example: the oversimplified 2-state molecule. (Energy-level diagram: state B above state A.) The molecule has two states, A and B. In a large ensemble (sample) of molecules, the populations of the states are $N_a$ and $N_b$.

Slide 17. Information and Entropy: The oversimplified 2-state molecule. If a photon with energy $h\nu$ can induce transitions between the states, the energy difference between them is just $\Delta = h\nu$, and at temperature $T$ the population ratio $N_b/N_a$ is $e^{-\Delta/kT}$, where $k$ is the Boltzmann constant.
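To get a feel for the numbers, the Boltzmann ratio can be evaluated in Python. This is a rough sketch: the function names are invented, and the transition frequency (10^13 Hz) and temperature (300 K) are hypothetical values chosen only for illustration, not taken from the slides.

```python
from math import exp

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K
PLANCK = 6.62607015e-34     # Planck constant, J*s

def photon_energy(frequency_hz: float) -> float:
    """Energy gap Delta = h * nu for a photon of the given frequency."""
    return PLANCK * frequency_hz

def population_ratio(delta_joules: float, temperature_kelvin: float) -> float:
    """Equilibrium ratio Nb/Na = exp(-Delta / kT) for two single states."""
    return exp(-delta_joules / (K_BOLTZMANN * temperature_kelvin))

# Hypothetical example: a 1e13 Hz transition at 300 K.
delta = photon_energy(1e13)
print(round(population_ratio(delta, 300.0), 3))  # roughly 0.2
```

With no energy gap the two states are equally populated, and the upper state empties as the gap grows relative to $kT$.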

Slide 18. Information and Entropy: The oversimplified 2-state molecule: multiplicity. Now suppose that A consists of $n$ substates and B of $m$ substates. The ratio of the populations of any substate of B to any substate of A is $e^{-\Delta/kT}$, so the ratio of the populations of all the B states to all the A states is just $(m/n)\,e^{-\Delta/kT}$.

Slide 19. Information and Entropy: The oversimplified 2-state molecule: free energy and entropy. We can rearrange the expression $(m/n)\,e^{-\Delta/kT}$ using simple algebra to obtain the equivalent expression $e^{-(\Delta + kT\ln(n/m))/kT}$. In the exponent, the term $\Delta + kT\ln(n/m)$ has units of energy and is a free energy. Free energies in general determine equilibria. The $\ln(n/m)$ term is an entropy term: $k\ln(m/n) = -k\ln(n/m)$ is the difference in entropy between B and A ($\Delta S = S_b - S_a$), so the free energy is $\Delta - T\Delta S$.
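The "simple algebra" can be written out step by step. The derivation below is a sketch under the slides' setup, taking A to have $n$ substates and B to have $m$, so that the B-to-A population ratio carries the multiplicity factor $m/n$; the symbol $\Delta G$ for the free energy difference is introduced here for convenience.

```latex
\begin{align*}
\frac{N_b}{N_a}
  &= \frac{m}{n}\, e^{-\Delta/kT}
   = e^{\ln(m/n)}\, e^{-\Delta/kT} \\
  &= e^{-\left(\Delta - kT\ln(m/n)\right)/kT}
   = e^{-\left(\Delta + kT\ln(n/m)\right)/kT} \\[4pt]
\Delta S &= S_b - S_a = k\ln m - k\ln n = k\ln\frac{m}{n} \\
\Delta G &= \Delta - T\,\Delta S = \Delta + kT\ln\frac{n}{m},
\qquad \frac{N_b}{N_a} = e^{-\Delta G/kT}
\end{align*}
```

Written this way, a larger multiplicity for B lowers its free energy and shifts the equilibrium toward B even when the energy gap is unchanged.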

Slide 20. Information and Entropy: Question: What has entropy got to do with information? Answer: Everything, because entropy is just a measure of the number of possible states. The entropy of a state is just the natural logarithm of the number of ways that state can exist (in units of the Boltzmann constant). (That's why it's related to the degree of order: there are more ways of making a mess than of keeping things neat.)

Slide 21. Information and Entropy: Half a century ago, Claude Shannon's seminal work on information theory showed that the information content in a message could be expressed as a function we call the Shannon entropy. The basic idea is that the information content is the difference between the ln of the number of ways the message might read before we see it and the ln of the number of ways it might read after we read it. (Shannon was interested in errors as well as perfect reads.) Other people had similar ideas (e.g., Norbert Wiener, who coined the term cybernetics), but Shannon got the details right.

Slide 22. Information and Entropy: The information content (in bits) of a string of $N$ characters with $M$ 'letters' in the alphabet is $N \log_2 M$ if the characters are equally probable. More generally, information content can be written in terms of probabilities as $-\sum \log_2 P_i$, summed over the characters in the string, where $P_i$ is the probability of each character; this looks worse than it is. Suppose that in an organism the GC content is 60%. The $P_i$ are 0.3 for C and G and 0.2 for A and T. Each C or G contributes $-\log_2(0.3)$ bits, and each A or T contributes $-\log_2(0.2)$ bits. The average information per position is $-\sum_i P_i \log_2 P_i \approx 1.97$ bits.
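The GC-content example can be worked through in Python. This is a minimal sketch, assuming each position is drawn independently from the stated base frequencies; the function names `entropy_per_position` and `message_bits` are invented for the example.

```python
from math import log2

def entropy_per_position(probs: dict) -> float:
    """Average information per character: -sum(P_i * log2(P_i)) over the alphabet."""
    return -sum(p * log2(p) for p in probs.values() if p > 0)

def message_bits(sequence: str, probs: dict) -> float:
    """Total information in a sequence: -sum(log2(P)) over its characters."""
    return -sum(log2(probs[ch]) for ch in sequence)

gc60 = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}   # 60% GC content
uniform = {base: 0.25 for base in "ACGT"}          # equally probable bases

print(round(-log2(0.3), 2))                   # 1.74 bits for each C or G
print(round(-log2(0.2), 2))                   # 2.32 bits for each A or T
print(round(entropy_per_position(gc60), 2))   # 1.97 bits per position
print(entropy_per_position(uniform))          # 2.0, the equal-probability case
```

The skewed composition costs only about 0.03 bits per position relative to the 2-bit maximum, so rarer letters (A and T here) carry more information each, but the average falls slightly.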