
CS 598SS Probabilistic Methods in Biological Sequence Analysis Saurabh Sinha.




1 CS 598SS Probabilistic Methods in Biological Sequence Analysis Saurabh Sinha

2 What is the course about?
Bioinformatics / Computational Biology
Tools for analyzing genomes
Probabilistic methods

3 What is the course format?
Research course
Lectures by instructor
Student presentations of research papers
– 1 or 2 paper(s) per student
Research project & presentation
– Typically, 2 students per project
– 30-minute presentation at the end of the course

4 Grading (grade distribution)
Project: 40%
Paper presentation: 25%
Assignments and/or tests: 25%
Participation: 10%

5 Expectations
Programming skills (for the project)
Basic exposure to probability theory
Basic exposure to algorithms

6 What you can do at the end of the course
Start working on research projects in bioinformatics, specifically biological sequence analysis
Use principled approaches, supported by probability theory, instead of ad hoc methods
Join me as a graduate advisee?

7 Administrative Details
Instructor:
– Saurabh Sinha
– Room 2122, Siebel Center
– Email: sinhas@uiuc.edu
Class hours: Tue & Thurs, 2:00pm - 3:15pm, 1131 SC
CRN: 43781
Credits: 4 graduate hours
Welcome to sit in, if not taking for credit

8 Books (not required)
1. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids -- Durbin, Eddy, Krogh, Mitchison
2. Bioinformatics: The Machine Learning Approach -- Baldi, Brunak
3. Statistical Methods in Bioinformatics -- Ewens and Grant
4. Bioinformatics -- Polanski and Kimmel

9 Why study bioinformatics?
Molecular biology is the new frontier of 21st-century science
Computer science is the crown prince of 20th-century engineering
Bioinformatics is the application and development of computer science with the goal of supporting molecular biology

10 Why study bioinformatics?
Flood of data: several gigabytes (terabytes?) of sequence and gene expression data
Noise in the data
– Biological
– Experimental
Algorithms needed to make discoveries
– Probabilistic methods
– Need for efficiency

11 Why study bioinformatics?
The big picture:
– Human health and quality of life
– Fundamental science
Billions of dollars being spent
– Health research gets the major chunk of the US government’s research funds
– Fundamental health research is at the molecular level
– Molecular biology research is increasingly a quantitative science

12 Why study bioinformatics?
Recent issue of Science: top 25 questions
– What Is the Universe Made Of?
– What is the Biological Basis of Consciousness?
– Why Do Humans Have So Few Genes?
– To What Extent Are Genetic Variation and Personal Health Linked?
– Can the Laws of Physics Be Unified?
– How Much Can Human Life Span Be Extended?
– What Controls Organ Regeneration?
– How Can a Skin Cell Become a Nerve Cell?
– How Does a Single Somatic Cell Become a Whole Plant?
– How Does Earth's Interior Work?
– Are We Alone in the Universe?
– How and Where Did Life on Earth Arise?
– What Determines Species Diversity?
– What Genetic Changes Made Us Uniquely Human?
– How Are Memories Stored and Retrieved?
– How Did Cooperative Behavior Evolve?
– How Will Big Pictures Emerge from a Sea of Biological Data?
– How Far Can We Push Chemical Self-Assembly?
– What Are the Limits of Conventional Computing?
– Can We Selectively Shut Off Immune Responses?
– Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?
– Is an Effective HIV Vaccine Feasible?
– How Hot Will the Greenhouse World Be?
– What Can Replace Cheap Oil -- and When?
– Will Malthus Continue to Be Wrong?

13 Basic Molecular Biology

14 Life, Cells, Proteins
The study of life → the study of cells
Cells are born, do their job, duplicate, die
– What is “their job”?
– Break down nutrients, produce energy, produce required molecules
All these processes are controlled by proteins

15 Protein functions
“Enzymes” (catalysts)
– Control chemical reactions in the cell
Transfer of signals/molecules between and inside cells
– E.g., sensing of the environment
Regulate production of other proteins

16 Protein molecule
A protein is a sequence of amino acids
20 possible amino acids
The amino-acid sequence “folds” into a 3-D structure; this folded molecule is called a protein

17 Protein Structure
Figure: the DNA repair protein MutY (blue) bound to DNA (purple). PNAS cover, courtesy Amie Boal

18 DNA
Deoxyribonucleic acid: a molecule that carries the information used to produce proteins
Double helical structure (discovered by Watson, Crick, Wilkins & Franklin)
Chromosomes are densely coiled and packed DNA

19 Chromosome and DNA (figure)
SOURCE: http://www.microbe.org/espanol/news/human_genome.asp

20 The DNA Molecule
Figure: two strands paired base by base (G with C, A with T); each base = one nucleotide
5’ GATGCGTGTTAACT 3’
3’ CTACGCACAATTGA 5’
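The base-pairing rule in the figure can be sketched in Python as a simple lookup table. This sketch assumes the left strand of the figure, GATGCGTGTTAACT, is read 5’ to 3’; the function names are illustrative only.

```python
# Watson-Crick pairing from the figure: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Partner base for each position, kept in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Partner strand read 5' -> 3' (the two strands run in opposite directions)."""
    return complement(strand)[::-1]

top = "GATGCGTGTTAACT"          # assumed to be written 5' -> 3'
print(complement(top))          # CTACGCACAATTGA (partner strand, read 3' -> 5')
print(reverse_complement(top))  # AGTTAACACGCATC (partner strand, read 5' -> 3')
```

Reversing the complement matters because sequences are conventionally written 5’ to 3’, so the partner strand is reported in its own 5’ to 3’ direction.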

21 From DNA to Amino-acid sequence (figure: inside the cell)
SRC: http://www.biologycorner.com/resources/DNA-RNA.gif

22 From DNA to Protein: In words
1. DNA = nucleotide sequence
– Alphabet size = 4 (A, C, G, T)
2. DNA → mRNA (single-stranded)
– Alphabet size = 4 (A, C, G, U)
3. mRNA → amino acid sequence
– Alphabet size = 20
4. The amino acid sequence “folds” into a 3-dimensional molecule called a protein
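A short counting argument (not on the original slide) shows why the 4-letter nucleotide alphabet must be read in groups of at least three to cover 20 amino acids:

```latex
4^1 = 4 < 20, \qquad 4^2 = 16 < 20, \qquad 4^3 = 64 \ge 20
```

Words of length 3 are therefore the shortest fixed-length words that can distinguish all 20 amino acids; in the standard genetic code the extra combinations serve as synonyms and as stop signals.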

23 Central Dogma
“Information” flows from DNA to RNA to protein
Why “information”?
The DNA in a cell holds the complete information about which proteins the cell can produce

24 DNA and genes
DNA is a very “long” molecule
Human DNA has 3 billion base pairs
– A string of 3 billion characters!
DNA harbors “genes”
– A gene is a substring of the DNA string
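Treating a gene as a substring means its location can be reported as string coordinates. The sketch below uses a made-up toy "genome" and "gene" (hypothetical sequences, not real data) and Python's built-in `str.find`:

```python
# Toy example: both sequences are hypothetical, chosen only for illustration.
genome = "TTGACAGGCTATGGCTAAGCGTTAGCAT"
gene = "ATGGCTAAGCGTTAG"

start = genome.find(gene)   # 0-based index of the first occurrence, or -1 if absent
end = start + len(gene)     # exclusive end coordinate
print(start, end)           # 10 25
print(genome[start:end] == gene)  # True: the gene is exactly this slice of the genome
```

On a real 3-billion-character genome the same idea holds, though finding genes whose sequence is not known in advance is the much harder annotation problem.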

25 Genes code for proteins
DNA → mRNA → protein can actually be written as Gene → mRNA → protein
A gene is typically at least a few hundred base pairs (bp) long

26 Transcription
Process of making a single-stranded mRNA using double-stranded DNA as a template
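Transcription follows the same pairing rule as DNA, except that RNA uses U where DNA uses T. A minimal sketch, assuming the template strand is given 5’ to 3’ (the template sequence is hypothetical):

```python
# Pairing of DNA template bases with RNA bases (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_5to3: str) -> str:
    """Return the mRNA (5' -> 3') complementary to a DNA template strand given 5' -> 3'."""
    # mRNA is built antiparallel to the template, so walk the template from its 3' end.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3))

template = "CTAACGCTTAGCCAT"   # hypothetical template strand
mrna = transcribe(template)
print(mrna)                     # AUGGCUAAGCGUUAG

# The mRNA equals the other (coding) DNA strand with T replaced by U.
coding = "ATGGCTAAGCGTTAG"
print(mrna == coding.replace("T", "U"))  # True
```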

27 Step 1: From DNA to mRNA Transcription


29 Translation
Process of making an amino acid sequence from (single-stranded) mRNA
Each triplet of bases translates into one amino acid; each such triplet is called a “codon”
The translation is basically a table lookup
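The "table lookup" can be made literal. This sketch encodes the standard genetic code as a 64-letter string (one-letter amino acid codes, `*` for stop) indexed by codons in U, C, A, G order, and assumes the reading frame starts at the first base of the mRNA:

```python
from itertools import product

# Standard genetic code; codons enumerated with bases ordered U, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]   # one table lookup per codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("AUGGCUAAGCGUUAG"))  # MAKR (AUG=M, GCU=A, AAG=K, CGU=R, UAG=stop)
```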


31 The Genetic Code SOURCE: http://www.bioscience.org/atlases/genecode/genecode.htm

32 Step 2: mRNA to Amino acid sequence Translation

33 Review so far
Proteins: important molecules, amino acid sequences
DNA: structure, base pairing
Genes: substrings of DNA
Gene → mRNA (transcription)
mRNA → amino acid sequence (translation), genetic code

34 Gene expression
Process of making a protein using a gene as the template
Transcription, then translation
Can be regulated

35 Transcriptional regulation
Figure: a TRANSCRIPTION FACTOR PROTEIN binds the DNA sequence ACAGTGA near a GENE
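Locating candidate binding sites like ACAGTGA is a string-search problem over both DNA strands. The sketch below is the crudest possible model, an exact-match scan; a transcription factor typically tolerates some variation in its site, which this ignores. The upstream sequence is hypothetical.

```python
# Watson-Crick pairing, used to look for the site on the opposite strand too.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))

def find_sites(seq: str, site: str):
    """Exact-match scan: return (0-based position, strand) for each occurrence of site."""
    rc = reverse_complement(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))   # site on the given strand
        elif window == rc:
            hits.append((i, "-"))   # site on the opposite strand
    return hits

upstream = "GGACAGTGATTTCTCACTGTCC"   # hypothetical sequence near a gene
print(find_sites(upstream, "ACAGTGA"))  # [(2, '+'), (13, '-')]
```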


37 The importance of gene regulation

38 Genetic regulatory network controlling the development of the body plan of the sea urchin embryo Davidson et al., Science, 295(5560):1669-1678.

39 That was the “circuit” responsible for development of the sea urchin embryo
Nodes = genes
Switches = gene regulation
Change the switches and the circuit changes
Gene regulation significance:
– Development of an organism
– Functioning of the organism
– Evolution of organisms
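The "change the switches" idea can be illustrated with a toy on/off circuit. Everything here is hypothetical: three made-up genes A, B, C, each either on or off, all updated together at each step. This is a simplified Boolean-network caricature, not the sea urchin network.

```python
def run(rules, state, steps=5):
    """Synchronously update every gene's on/off state from its regulators' states."""
    for _ in range(steps):
        state = {gene: rule(state) for gene, rule in rules.items()}
    return state

start = {"A": True, "B": False, "C": False}   # hypothetical initial state

normal = {
    "A": lambda s: s["A"],   # A stays on once on
    "B": lambda s: s["A"],   # A activates B
    "C": lambda s: s["B"],   # B activates C
}
# Flip one switch: B now represses C instead of activating it.
rewired = dict(normal, C=lambda s: not s["B"])

print(run(normal, start))   # {'A': True, 'B': True, 'C': True}
print(run(rewired, start))  # {'A': True, 'B': True, 'C': False}
```

Changing a single regulatory rule changes which genes end up expressed, even though the genes themselves are unchanged.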

40 Genome
The entire sequence of DNA in a cell
All cells have (essentially) the same genome
– All cells came from repeated duplications starting from the initial cell (zygote)
The human genome is 99.9% identical among individuals
The human genome is 3 billion base pairs (bp) long
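Combining the two numbers on this slide gives a sense of scale: 0.1% of 3 billion positions is still a large number of differences between two people. A quick check in Python, using only the slide's figures:

```python
genome_bp = 3_000_000_000   # human genome length (from the slide)
identity = 0.999            # fraction identical between individuals (from the slide)

differing_bp = round(genome_bp * (1 - identity))
print(f"{differing_bp:,}")  # 3,000,000
```

Roughly 3 million differing base pairs between two individuals.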

41 Genome features
Genes
Regulatory sequences
The above two make up about 5% of the human genome
What’s the rest doing?
– We don’t know for sure
“Annotating” the genome
– Task of bioinformatics
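A figure like "5% of the genome" is the kind of number that comes out of annotation: mark genes and regulatory sequences as intervals, merge overlaps, and divide the covered length by the genome length. A minimal sketch with hypothetical half-open [start, end) intervals on a 1,000 bp toy genome:

```python
def covered_fraction(genome_length, intervals):
    """Fraction of positions covered by at least one half-open [start, end) interval."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # New, non-overlapping block: bank the previous one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length

# Hypothetical annotations; a regulatory region overlaps the first gene.
genes = [(100, 120), (600, 615)]
regulatory = [(90, 105), (300, 305)]
print(covered_fraction(1000, genes + regulatory))  # 0.05
```

Merging first avoids double-counting positions annotated as both gene and regulatory sequence.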

