Introduction to bioinformatics, Lecture 2: Genes and Genomes. Centre for Integrative Bioinformatics VU (IBIVU).


Organisational
- Course website: or click on (>teaching >Introduction to Bioinformatics)
- Course book: Bioinformatics and Molecular Evolution by Paul G. Higgs and Teresa K. Attwood (Blackwell Publishing), 2005, ISBN (Pbk)
- Lots of information about Bioinformatics can be found on the web.

.....acctc ctgtgcaaga acatgaaaca nctgtggttc tcccagatgg gtcctgtccc aggtgcacct gcaggagtcg ggcccaggac tggggaagcc tccagagctc aaaaccccac ttggtgacac aactcacaca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc acggtgccca gagcccaaat cttgtgacac acctccccca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc ccggtgccca gcacctgaac tcttgggagg accgtcagtc ttcctcttcc ccccaaaacc caaggatacc cttatgattt cccggacccc tgaggtcacg tgcgtggtgg tggacgtgag ccacgaagac ccnnnngtcc agttcaagtg gtacgtggac ggcgtggagg tgcataatgc caagacaaag ctgcgggagg agcagtacaa cagcacgttc cgtgtggtca gcgtcctcac cgtcctgcac caggactggc tgaacggcaa ggagtacaag tgcaaggtct ccaacaaagc aaccaagtca gcctgacctg cctggtcaaa ggcttctacc ccagcgacat cgccgtggag tgggagagca atgggcagcc ggagaacaac tacaacacca cgcctcccat gctggactcc gacggctcct tcttcctcta cagcaagctc accgtggaca agagcaggtg gcagcagggg aacatcttct catgctccgt gatgcatgag gctctgcaca accgctacac gcagaagagc ctctc..... DNA sequence

Genome size
Organism                   Number of base pairs
φX-174 virus               5,386
Epstein-Barr virus         172,282
Mycoplasma genitalium      580,000
Haemophilus influenzae     1.8 × 10^6
Yeast (S. cerevisiae)      12.1 × 10^6
Human                      3.2 × 10^9
Wheat                      16 × 10^9
Lilium longiflorum         90 × 10^9
Salamander                 100 × 10^9
Amoeba dubia               670 × 10^9

Four DNA nucleotide building blocks
G-C is more strongly hydrogen-bonded than A-T (three hydrogen bonds versus two)

A gene codes for a protein
DNA:     CCTGAGCCAACTATTGATGAA
  (transcription)
mRNA:    CCUGAGCCAACUAUUGAUGAA
  (translation)
Protein: PEPTIDE
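
The slide's example can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original lecture: the function names `transcribe` and `translate` are made up for illustration, the DNA string is treated as the coding strand (so transcription is just T to U), and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA codon by codon, stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")  # the table above is keyed by DNA codons
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```

Real transcripts start translation at an ATG/AUG start codon; the sketch skips that step because the slide's toy sequence is already in frame.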

Central Dogma of Molecular Biology
Replication: DNA -> DNA; Transcription: DNA -> mRNA; Translation: mRNA -> Protein
- Transcription is carried out by RNA polymerase (II)
- Translation is performed on ribosomes
- Replication is carried out by DNA polymerase
- Reverse transcriptase copies RNA into DNA
- Transcription + Translation = Expression

But DNA can also be transcribed into non-coding RNA …
- tRNA (transfer): transfer of amino acids to the ribosome during protein synthesis.
- rRNA (ribosomal): essential component of the ribosomes (complex with rProteins).
- snRNA (small nuclear): mainly involved in RNA-splicing (removal of introns). snRNPs.
- snoRNA (small nucleolar): involved in chemical modifications of ribosomal RNAs and other RNAs. snoRNPs.
- SRP RNA (signal recognition particle): forms an RNA-protein complex involved in protein secretion.
- Further: microRNA, eRNA, gRNA, tmRNA etc.

Eukaryotes have spliced genes …
- Promoter: involved in transcription initiation (TF/RNApol-binding sites)
- TSS: transcription start site
- UTRs: un-translated regions (important for translational control)
- Exons will be spliced together by removal of the introns
- Poly-adenylation site: important for transcription termination (but also: mRNA stability, export of mRNA from the nucleus etc.)

DNA makes mRNA makes Protein

DNA makes RNA makes Protein … yet another picture to appreciate the above statement

Some facts about human genes
- There are about – genes in the human genome (~ 3% of the genome)
- Average gene length is ~ bp
- Average of 5-6 exons per gene
- Average exon length is ~ 200 bp
- Average intron length is ~ 2000 bp
- 8% of the genes have a single exon
- Some exons can be as small as 1 or 3 bp

DMD: the largest known human gene
- The largest known human gene is DMD, the gene that encodes dystrophin: ~ 2.4 million bp over 79 exons
- X-linked recessive disease (affects boys)
- Two variants: Duchenne-type (DMD) and Becker-type (BMD)
- Duchenne-type: more severe, frameshift mutations
- Becker-type: milder phenotype, “in frame” mutations
Figure: posture changes during progression of Duchenne muscular dystrophy

Nucleic acid basics
- Nucleic acids are polymers
- Each monomer consists of 3 moieties: a base, a sugar and a phosphate (base + sugar = nucleoside; nucleoside + phosphate = nucleotide)

Nucleic acid basics (2)
- A base can be one of 5 ring structures: the purines A and G, and the pyrimidines C, T and U
- Purines and pyrimidines can base-pair (Watson-Crick pairs; Watson and Crick, 1953)

Nucleic acid as hetero-polymers
- Nucleosides, nucleotides: ribose sugar (RNA precursor); 2’-deoxyribose sugar (DNA precursor); 2’-deoxythymidine triphosphate (nucleotide)
- DNA and RNA strands
- REMEMBER: DNA = deoxyribonucleotides; RNA = ribonucleotides (OH-groups at the 2’ position)
- Note the directionality of DNA (5’-3’ & 3’-5’) or RNA (5’-3’)
- DNA = A, G, C, T; RNA = A, G, C, U

So … DNA and RNA

Stability of base-pairing
- C-G base pairing is more stable than A-T (A-U) base pairing (why?)
- The 3rd codon position has freedom to evolve (synonymous mutations)
- Species can therefore optimise their G-C content (e.g. thermophiles are GC rich) (consequences for codon use?)
Figure: Thermocrinis ruber, heat-loving bacteria

Amino Acid       Single Letter Code   DNA codons
Isoleucine       I                    ATT, ATC, ATA
Leucine          L                    CTT, CTC, CTA, CTG, TTA, TTG
Valine           V                    GTT, GTC, GTA, GTG
Phenylalanine    F                    TTT, TTC
Methionine       M, Start             ATG
Cysteine         C                    TGT, TGC
Alanine          A                    GCT, GCC, GCA, GCG
Glycine          G                    GGT, GGC, GGA, GGG
Proline          P                    CCT, CCC, CCA, CCG
Threonine        T                    ACT, ACC, ACA, ACG
Serine           S                    TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine         Y                    TAT, TAC
Tryptophan       W                    TGG
Glutamine        Q                    CAA, CAG
Asparagine       N                    AAT, AAC
Histidine        H                    CAT, CAC
Glutamic acid    E                    GAA, GAG
Aspartic acid    D                    GAT, GAC
Lysine           K                    AAA, AAG
Arginine         R                    CGT, CGC, CGA, CGG, AGA, AGG
Stop codons      Stop                 TAA, TAG, TGA

DNA compositional biases
- Base compositions of genomes: G+C (and therefore also A+T) content varies between different genomes
- The GC-content is sometimes used to classify organisms in taxonomy
- High G+C content bacteria: Actinobacteria, e.g. in Streptomyces coelicolor it is 72%
- Low G+C content: Plasmodium falciparum (~20%)
- Other examples: Saccharomyces cerevisiae (yeast) 38%; Arabidopsis thaliana (plant) 36%; Escherichia coli (bacteria) 50%
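
G+C content is simply the fraction of G and C among the bases of a sequence. The sketch below is an illustration rather than anything from the lecture: `gc_content` is a hypothetical helper name, ambiguous symbols such as `n` are assumed to be skipped, and the test fragment is the short double-stranded example used elsewhere in these slides.

```python
def gc_content(sequence):
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)


fragment = "attcgttggcaaatcgcccctatccggc"
print(f"G+C content: {gc_content(fragment):.1%}")
```

Genome-scale percentages such as those quoted above come from running the same count over a whole genome; a 28-base fragment only illustrates the arithmetic.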

Genetic diseases: cystic fibrosis
- Known since very early on (“Celtic gene”)
- Autosomal, recessive, hereditary disease (Chr. 7)
- Symptoms:
  - Exocrine glands (which produce sweat and mucus)
  - Abnormal secretions
  - Respiratory problems
  - Reduced fertility and (male) anatomical anomalies

cystic fibrosis (2)
- Gene product: CFTR (cystic fibrosis transmembrane conductance regulator)
- CFTR is an ABC (ATP-binding cassette) transporter or traffic ATPase.
- These proteins transport molecules such as sugars, peptides, inorganic phosphate, chloride, and metal cations across the cellular membrane.
- CFTR transports chloride ions (Cl-) across the membranes of cells in the lungs, liver, pancreas, digestive tract, reproductive tract, and skin.

cystic fibrosis (3)
- The CF gene CFTR has a 3-bp deletion leading to Del508 (Phe) in the 1480 aa protein (epithelial Cl- channel)
- The protein is degraded in the endoplasmic reticulum (ER) instead of being inserted into the cell membrane
- The deltaF508 deletion is the most common cause of cystic fibrosis. The isoleucine (Ile) at amino acid position 507 remains unchanged because both ATC and ATT code for isoleucine
Figures: diagram depicting the five domains of the CFTR membrane protein (Sheppard 1999); theoretical model of NBD1, PDB identifier 1NBD, as viewed in Protein Explorer
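
Why Ile507 survives the deletion is easiest to see on the codons themselves. The sketch below is an illustration under stated assumptions: the 15-nucleotide window `ATC ATC TTT GGT GTT` (codons 506 to 510) is the sequence commonly quoted for this region of CFTR and is not given on the slide, and the names `WILD_TYPE`, `MUTANT` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code keyed by DNA codons ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Assumed window around the deletion: Ile506 Ile507 Phe508 Gly509 Val510.
WILD_TYPE = "ATCATCTTTGGTGTT"

# The 3-bp deletion removes the C of codon 507 and the first two Ts of codon 508.
deleted = WILD_TYPE[5:8]
MUTANT = WILD_TYPE[:5] + WILD_TYPE[8:]

print(deleted)               # CTT
print(translate(WILD_TYPE))  # IIFGV
print(translate(MUTANT))     # IIGV
```

Codon 507 changes from ATC to ATT, both of which code for isoleucine, so the only change in the protein is the missing phenylalanine, and the reading frame downstream is preserved.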

Let’s return to DNA and RNA structure …
- Unlike the three-dimensional structures of proteins, DNA molecules assume simple double-helical structures independent of their sequences.
- Three kinds of double helix have been observed in DNA: type A, type B, and type Z, which differ in their geometries.
- RNA, on the other hand, can have structures as diverse as those of proteins, as well as a simple double helix of type A.
- The ability to be both informational and diverse in structure suggests that RNA was the prebiotic molecule that could function in both replication and catalysis (the RNA World Hypothesis).
- In fact, some viruses encode their genetic material in RNA (retroviruses)

Three dimensional structures of double helices
- Side view: A-DNA, B-DNA, Z-DNA
- Top view: A-DNA, B-DNA, Z-DNA
- Space-filling models of A-, B- and Z-DNA

Major and minor grooves

Forces that stabilize nucleic acid double helix
There are two major forces that contribute to stability of helix formation:
- Hydrogen bonding in base-pairing
- Hydrophobic interactions in base stacking (same-strand stacking and cross-strand stacking)

Types of DNA double helix
- Type A: major conformation of RNA, minor conformation of DNA; right-handed helix; short and broad
- Type B: major conformation of DNA; right-handed helix; long and thin
- Type Z: minor conformation of DNA; left-handed helix; longer and thinner

Secondary structures of Nucleic acids
- DNA is primarily in duplex form
- RNA is normally single-stranded and can adopt diverse secondary structures other than the duplex.

Non B-DNA Secondary structures
- Cruciform DNA
- Triple helical DNA (Hoogsteen basepairs)
- Slipped DNA
Source: Van Dongen et al. (1999), Nature Structural Biology 6,

More Secondary structures
- RNA pseudoknots
- Cloverleaf rRNA structure
Source: Cornelis W. A. Pleij in Gesteland, R. F. and Atkins, J. F. (1993) THE RNA WORLD. Cold Spring Harbor Laboratory Press.
Figure: 16S rRNA Secondary Structure Based on Phylogenetic Data

3D structures of RNA: transfer-RNA structures
- Secondary structure of tRNA (cloverleaf)
- Tertiary structure of tRNA

3D structures of RNA: ribosomal-RNA structures
- Secondary structure of large rRNA (16S)
- Tertiary structure of large rRNA subunit
Source: Ban et al., Science 289 ( ), 2000

3D structures of RNA: Catalytic RNA
- Secondary structure of self-splicing RNA
- Tertiary structure of self-splicing RNA

Some structural rules …
- Base-pairing is stabilizing
- Un-paired sections (loops) destabilize
- 3D conformation with interactions makes up for this

Three main principles
- DNA makes RNA makes Protein
- Structure more conserved than sequence
- Sequence -> Structure -> Function

How to go from DNA to protein sequence
A piece of double stranded DNA:
5’ attcgttggcaaatcgcccctatccggc 3’
3’ taagcaaccgtttagcggggataggccg 5’
DNA direction is from 5’ to 3’

How to go from DNA to protein sequence
6-frame conceptual translation using the codon table:
5’ attcgttggcaaatcgcccctatccggc 3’
3’ taagcaaccgtttagcggggataggccg 5’
So, there are six possibilities to make a protein from an unknown piece of DNA, only one of which might be a natural protein
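
The six reading frames are the three codon offsets on the given strand plus the three offsets on its reverse complement. A minimal Python sketch follows, assuming the standard genetic code; the function names (`reverse_complement`, `translate_frame`, `six_frame_translation`) and the frame labels `+1` to `-3` are illustrative choices, not something defined in the lecture.

```python
from itertools import product

# Standard genetic code keyed by DNA codons ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate every complete codon starting at `offset` (0, 1 or 2)."""
    dna = dna.upper()
    # Codons containing ambiguous symbols such as N are reported as "X".
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    forward = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(forward, offset)
        frames[f"-{offset + 1}"] = translate_frame(reverse, offset)
    return frames


for label, protein in six_frame_translation("attcgttggcaaatcgcccctatccggc").items():
    print(label, protein)
```

Stop codons are kept as `*` rather than ending the translation, because in a conceptual translation of unknown DNA the position of the stops is itself evidence for or against a frame being coding.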

Remark
Identifying (annotating) human genes, i.e. finding what they are and what they do, is a difficult problem
- First, the gene should be delineated on the genome
  - Gene finding methods should be able to tell a gene region from a non-gene region
  - Start, stop codons, further compositional differences
- Then, a putative function should be found for the gene located
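
The simplest signal mentioned above, start and stop codons, can be turned into a toy open reading frame (ORF) scanner. This is only a sketch under strong assumptions: it scans the forward strand only, ignores introns and compositional statistics, and so is far from a real gene finder for spliced human genes. The name `find_orfs` and its `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    `start` and `end` are 0-based, end-exclusive, and include the stop codon.
    `min_codons` is the minimum number of codons before the stop (ATG included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


sequence = "CCATGAAATTTTAGCC"
for start, end, frame in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```

Scanning the reverse complement in the same way would cover the other three frames; telling true genes from chance ORFs is where the compositional differences mentioned on the slide come in.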

Evolution and three-dimensional protein structure information
Isocitrate dehydrogenase: the distance from the active site (in yellow) determines the rate of evolution (red = fast evolution, blue = slow evolution)
Source: Dean, A. M. and G. B. Golding, Pacific Symposium on Biocomputing 2000

Genomic Data Sources
- DNA/protein sequence
- Expression (microarray)
- Proteome (xray, NMR, mass spectrometry)
- Metabolome
- Physiome (spatial, temporal)
Integrative bioinformatics

Dinner discussion: Integrative Bioinformatics & Genomics VU
Genomic Data Sources (Vertical Genomics): genome, transcriptome, proteome, metabolome, physiome

DNA makes RNA makes Protein (reminder)

DNA makes RNA makes Protein: Expression data
- More copies of mRNA for a gene lead to more protein
- mRNA can now be measured for all the genes in a cell at once through microarray technology
- Can have 60,000 spots (genes) on a single gene chip
- Colour change gives intensity of gene expression (over- or under-expression)

Proteomics
- Elucidating all 3D structures of proteins in the cell. This is also called Structural Genomics
- Finding out what these proteins do. This is also called Functional Genomics

Protein-protein interaction networks

Metabolic networks
Glycolysis and Gluconeogenesis, KEGG database (Japan)

High-throughput Biological Data
Enormous amounts of biological data are being generated by high-throughput capabilities; even more are coming
- genomic sequences
- arrayCGH (Comparative Genomic Hybridization) data, gene expression data
- mass spectrometry data
- protein-protein interaction data
- protein structures
- ......

Protein structural data explosion Protein Data Bank (PDB): Structures (6 March 2001) x-ray crystallography, 1810 NMR, 278 theoretical models, others...

Dickerson’s formula: equivalent to Moore’s law
n = e^(0.19 (y - 1960)), with y the year.
On 27 March 2001 there were 12,123 3D protein structures in the PDB: Dickerson’s formula predicts 12,066 (within 0.5%)!

Sequence versus structural data
- Structural genomics initiatives are now in full swing and growth is still exponential.
- However, growth of sequence data is even more rapid. There are now more than 500 completely sequenced genomes publicly available.
- Increasing gap between structural and sequence data (“Mind the gap”)

Bioinformatics
Diagram placing disciplines on a scale from large, external (integrative) to small, internal (individual), under the headings Science and Human: Planetary Science, Cultural Anthropology, Population Biology, Sociology, Sociobiology, Psychology, Systems Biology, Biology, Medicine, Molecular Biology, Chemistry, Physics

Offers an ever more essential input to
- Molecular Biology
- Pharmacology (drug design)
- Agriculture
- Biotechnology
- Clinical medicine
- Anthropology
- Forensic science
- Chemical industries (detergent industries, etc.)