An intuitive introduction to information theory
Ivo Grosse, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Bioinformatics Centre Gatersleben-Halle

Outline
- Why information theory?
- An intuitive introduction

History of biology: St. Thomas Monastery, Brno

Genetics
Gregor Mendel (1822 – 1884)
Mendel's laws (1865): the foundation of genetics
Ca. 1900: Biology becomes a quantitative science

50 years later … 1953: James Watson & Francis Crick

DNA
Watson & Crick, 1953: the double-helix structure of DNA
1953: Biology becomes a molecular science

1953 – 2003 … 50 years of revolutionary discoveries

Goals:
- Identify all of the ca. 30,000 genes
- Identify all of the ca. 3,000,000,000 base pairs
- Store all information in databases
- Develop new software for data analysis

2003: Human Genome Project officially finished. Biology becomes an information science

2003 – 2053 … biology = information science

2003 – 2053 … biology = information science: Systems Biology

What is information?
- Many intuitive definitions
- Most of them are wrong
- One clean definition since 1948
- It requires 3 steps:
  - Entropy
  - Conditional entropy
  - Mutual information

Before starting with entropy … who is the father of information theory?
Claude Shannon (1916 – 2001)
"A Mathematical Theory of Communication." Bell System Technical Journal, 27, 379–423 & 623–656, 1948

Before starting with entropy … who is the grandfather of information theory?
Simon bar Kochba (ca. 100 – 135)
Jewish guerrilla fighter against the Roman Empire (132 – 135)

Entropy
- Given a text composed from an alphabet of 32 letters (each letter equally probable)
- Person A chooses a letter X (at random)
- Person B wants to know this letter
- B may ask only binary (yes/no) questions
- Question: how many binary questions must B ask in order to learn which letter X was chosen by A?
- Answer: the entropy H(X)
- Here: H(X) = 5 bits (see the sketch below)
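For 32 equally probable letters, H(X) = log2 32 = 5, so five optimally chosen yes/no questions suffice. A minimal Python sketch of this calculation (our own illustration, not from the slides; the function name `entropy` is an assumption):

```python
from math import log2

def entropy(probs):
    """Shannon entropy H(X) = -sum_x p(x) * log2 p(x), in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

# 32 equally probable letters -> H(X) = log2(32) = 5 bits,
# i.e. B needs 5 optimally chosen binary questions
uniform = [1 / 32] * 32
print(entropy(uniform))  # 5.0
```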

Conditional entropy (1)
- The sky is blu_
- How many binary questions? 5?
- No! Why? What's wrong?
- The context tells us "something" about the missing letter X

Conditional entropy (2)
- Given a text composed from an alphabet of 32 letters (each letter equally probable)
- Person A chooses a letter X (at random)
- Person B wants to know this letter
- B may ask only binary questions
- A may tell B the letter Y preceding X, e.g. L_ or Q_
- Question: how many binary questions must B ask in order to learn which letter X was chosen by A?
- Answer: the conditional entropy H(X|Y) (estimated from data in the sketch below)
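One way to make this concrete is to estimate H(X|Y) from bigram counts in an actual text. The sketch below is our own illustration, not from the slides; `conditional_entropy` is a hypothetical helper:

```python
from collections import Counter
from math import log2

def conditional_entropy(text):
    """Estimate H(X|Y) in bits, where Y is the letter preceding X."""
    n = len(text) - 1
    pair_counts = Counter(zip(text, text[1:]))  # (y, x) bigram counts
    prev_counts = Counter(text[:-1])            # counts of the preceding letter y
    # H(X|Y) = -sum_{y,x} p(y,x) * log2 p(x|y)
    return -sum((c / n) * log2(c / prev_counts[y])
                for (y, x), c in pair_counts.items())

print(conditional_entropy("the sky is blue and the sea is deep blue"))
```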

Conditional entropy (3)
- H(X|Y) <= H(X)
- Clear! In the worst case – namely if B ignores all "information" in Y about X – B needs H(X) binary questions
- Under no circumstances does B need more than H(X) binary questions
- Knowledge of Y cannot increase the number of binary questions
- Knowledge can never harm! (a mathematical statement, perhaps not true in real life)
- A numeric check follows below
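A quick numeric check of the inequality on a small made-up joint distribution (our own example, not from the slides):

```python
from math import log2

# Made-up joint distribution p(y, x) over binary Y and X
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x = {0: 0.5, 1: 0.5}  # marginal of X (column sums of p)
p_y = {0: 0.5, 1: 0.5}  # marginal of Y (row sums of p)

H_X = -sum(v * log2(v) for v in p_x.values())
# H(X|Y) = -sum_{y,x} p(y,x) * log2 p(x|y), with p(x|y) = p(y,x) / p(y)
H_X_given_Y = -sum(v * log2(v / p_y[y]) for (y, x), v in p.items())

print(H_X, H_X_given_Y)  # 1.0 vs. ~0.722: indeed H(X|Y) <= H(X)
```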

Mutual information (1)
- Compare two situations:
  - I: learn X without knowing Y
  - II: learn X while knowing Y
- How many binary questions in case I? H(X)
- How many binary questions in case II? H(X|Y)
- Question: how many binary questions can B save by knowing Y?
- Answer: I(X;Y) = H(X) – H(X|Y)
- I(X;Y) = the information in Y about X (written out below)
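Written out with the standard definitions, which match the slides' question-counting argument:

```latex
I(X;Y) \;=\; H(X) - H(X \mid Y)
       \;=\; \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)} .
```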

Mutual information (2)
- I(X;Y) >= 0
- In the worst case – namely if B ignores all information in Y about X, or if there is no information in Y about X – I(X;Y) = 0
- The information in Y about X can never be negative
- Knowledge can never harm! (a mathematical statement, perhaps not true in real life)
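The slide's claim has a standard one-line justification that it does not spell out: I(X;Y) equals the Kullback–Leibler divergence between the joint distribution and the product of its marginals, and a KL divergence is never negative:

```latex
I(X;Y) \;=\; D_{\mathrm{KL}}\bigl(p(x,y)\,\|\,p(x)\,p(y)\bigr) \;\ge\; 0,
\qquad \text{with } I(X;Y) = 0 \iff p(x,y) = p(x)\,p(y) \text{ for all } x, y .
```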

Mutual information (3)
- Example 1: a random sequence composed of A, C, G, T (equally probable)
  - I(X;Y) = ?
  - H(X) = 2 bits, H(X|Y) = 2 bits
  - I(X;Y) = H(X) – H(X|Y) = 0 bits
- Example 2: the deterministic sequence … ACGT ACGT ACGT ACGT …
  - I(X;Y) = ?
  - H(X) = 2 bits, H(X|Y) = 0 bits
  - I(X;Y) = H(X) – H(X|Y) = 2 bits
- Both examples are verified numerically in the sketch below
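Both examples can be checked empirically. The sketch below is our own code (the sequence lengths and the `mutual_information` helper are assumptions), estimating I(X;Y) between consecutive symbols:

```python
import random
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Estimate I(X;Y) in bits from a list of (y, x) symbol pairs."""
    n = len(pairs)
    p_xy = Counter(pairs)
    p_y = Counter(y for y, x in pairs)
    p_x = Counter(x for y, x in pairs)
    return sum((c / n) * log2((c / n) / ((p_y[y] / n) * (p_x[x] / n)))
               for (y, x), c in p_xy.items())

random.seed(0)
random_seq = random.choices("ACGT", k=100_000)  # Example 1: i.i.d. letters
periodic_seq = list("ACGT" * 25_000)            # Example 2: deterministic

for seq in (random_seq, periodic_seq):
    pairs = list(zip(seq, seq[1:]))             # (preceding letter, letter)
    print(mutual_information(pairs))            # ~0.0 bits, then 2.0 bits
```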

Mutual information (4)
- I(X;Y) = I(Y;X) – always, for any X and any Y!
- The information in Y about X = the information in X about Y
- Examples:
  - How much information is there in the amino acid sequence about the secondary structure, and how much in the secondary structure about the amino acid sequence?
  - How much information is there in the expression profile about the function of the gene, and how much in the function of the gene about the expression profile?
- Hence the name: mutual information (see the identity below)
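The symmetry follows directly from the chain rule for entropy, a standard identity the slide does not derive:

```latex
I(X;Y) \;=\; H(X) - H(X \mid Y)
       \;=\; H(X) + H(Y) - H(X,Y)
       \;=\; H(Y) - H(Y \mid X)
       \;=\; I(Y;X) .
```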

Summary
- Entropy, conditional entropy, mutual information
- There is no such thing as the "information content" of a single variable: information is not defined for a single variable
- 2 random variables are needed to talk about information: the information in Y about X
- I(X;Y) = I(Y;X): the info in Y about X = the info in X about Y
- I(X;Y) >= 0: information is never negative; knowledge cannot harm
- I(X;Y) = 0 if and only if X and Y are statistically independent, and I(X;Y) > 0 otherwise (final check below)
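As a final check of the last two claims, a small self-contained sketch (our own code, not from the slides): the mutual information computed from a joint probability table is zero for an independent pair and positive for a dependent one:

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits from a dict {(x, y): p(x, y)}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

independent = {(x, y): 0.25 for x in "HT" for y in "HT"}  # p(x,y) = p(x)p(y)
dependent = {("H", "H"): 0.5, ("T", "T"): 0.5}            # Y copies X
print(mutual_information(independent))  # 0.0
print(mutual_information(dependent))    # 1.0
```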