Introduction to bioinformatics Lecture 2 Genes and Genomes.


.....acctc ctgtgcaaga acatgaaaca cctgtggttc ttccttctcc tggtggcagc tcccagatgg gtcctgtccc aggtgcacct gcaggagtcg ggcccaggac tggggaagcc tccagagctc aaaaccccac ttggtgacac aactcacaca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc acggtgccca gagcccaaat cttgtgacac acctccccca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc ccggtgccca gcacctgaac tcttgggagg accgtcagtc ttcctcttcc ccccaaaacc caaggatacc cttatgattt cccggacccc tgaggtcacg tgcgtggtgg tggacgtgag ccacgaagac cccgaggtcc agttcaagtg gtacgtggac ggcgtggagg tgcataatgc caagacaaag ctgcgggagg agcagtacaa cagcacgttc cgtgtggtca gcgtcctcac cgtcctgcac caggactggc tgaacggcaa ggagtacaag tgcaaggtct ccaacaaagc aaccaagtca gcctgacctg cctggtcaaa ggcttctacc ccagcgacat cgccgtggag tgggagagca atgggcagcc ggagaacaac tacaacacca cgcctcccat gctggactcc gacggctcct tcttcctcta cagcaagctc accgtggaca agagcaggtg gcagcagggg aacatcttct catgctccgt gatgcatgag gctctgcaca accgctacac gcagaagagc ctctc..... DNA sequence

Four DNA nucleotide building blocks: adenine (A), cytosine (C), guanine (G) and thymine (T)

DNA compositional biases
Base composition of genomes:
–E. coli: 25% A, 25% C, 25% G, 25% T
–P. falciparum (malaria parasite): 82% A+T
Translation initiation:
–ATG (AUG in mRNA) is the near-universal motif indicating the start of translation in a DNA coding sequence.
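The base-composition figures above can be reproduced for any sequence by counting nucleotides. The following is a minimal sketch, not part of the original slides; the function names `base_composition` and `at_content` are hypothetical, and the input is assumed to contain only the letters A, C, G and T.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the fraction of each of A, C, G, T in a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def at_content(seq: str) -> float:
    """Return the combined fraction of A and T (the A+T content)."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# A toy sequence with a uniform composition, like the E. coli figures above.
print(base_composition("ATGCATGC"))
print(at_content("AATTAATTGC"))
```

A strongly biased genome such as P. falciparum would give an `at_content` value close to 0.82 when the function is run over the whole genome sequence.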

Amino acid      SLC   DNA codons
Isoleucine      I     ATT, ATC, ATA
Leucine         L     CTT, CTC, CTA, CTG, TTA, TTG
Valine          V     GTT, GTC, GTA, GTG
Phenylalanine   F     TTT, TTC
Methionine      M     ATG
Cysteine        C     TGT, TGC
Alanine         A     GCT, GCC, GCA, GCG
Glycine         G     GGT, GGC, GGA, GGG
Proline         P     CCT, CCC, CCA, CCG
Threonine       T     ACT, ACC, ACA, ACG
Serine          S     TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine        Y     TAT, TAC
Tryptophan      W     TGG
Glutamine       Q     CAA, CAG
Asparagine      N     AAT, AAC
Histidine       H     CAT, CAC
Glutamic acid   E     GAA, GAG
Aspartic acid   D     GAT, GAC
Lysine          K     AAA, AAG
Arginine        R     CGT, CGC, CGA, CGG, AGA, AGG
Stop codons     Stop  TAA, TAG, TGA
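The codon table above can be held in a dictionary and inverted to show how many codons encode each amino acid. This is a sketch under stated assumptions: the 64-letter string is the standard genetic code written in TCAG order, `*` is used here for stop codons, and the names `CODON_TABLE` and `codons_by_amino_acid` are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each DNA codon (e.g. "ATG") to its single-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_by_amino_acid(table=CODON_TABLE):
    """Invert the codon table: amino acid letter -> list of codons."""
    grouped = defaultdict(list)
    for codon, aa in table.items():
        grouped[aa].append(codon)
    return dict(grouped)


# Print each amino acid with its codons, most degenerate first.
for aa, codons in sorted(codons_by_amino_acid().items(), key=lambda kv: -len(kv[1])):
    print(aa, len(codons), ", ".join(sorted(codons)))
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one, matching the table.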

A gene codes for a protein
DNA:     CCTGAGCCAACTATTGATGAA
  (transcription)
mRNA:    CCUGAGCCAACUAUUGAUGAA
  (translation)
Protein: PEPTIDE
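The example on this slide can be checked in a few lines. The sketch below is an illustration rather than a full model of gene expression: it assumes the DNA shown is the coding strand, so transcription is simplified to replacing T with U, and it translates in a single reading frame starting at the first base. The function names `transcribe` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Simplified transcription: coding-strand DNA to mRNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


dna = "CCTGAGCCAACTATTGATGAA"
mrna = transcribe(dna)
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```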

Humans have spliced genes…

DNA makes RNA makes Protein

Some facts about human genes
–Comprise about 3% of the genome
–Average gene length: ~8,000 bp
–Average of 5-6 exons/gene
–Average exon length: ~200 bp
–Average intron length: ~2,000 bp
–~8% of genes have a single exon
–Some exons can be as small as 1 or 3 bp
–HUMFMR1S is not atypical: 17 exons bp long, comprising 3% of a 67,000 bp gene

Genetic diseases
Many diseases run in families because inherited genes predispose family members to these illnesses. Examples are Alzheimer’s disease, cystic fibrosis (CF), breast or colon cancer, and heart disease. Some of these diseases can be caused by a problem within a single gene, as with CF.

Genetic diseases (Cont.)
For other illnesses, like heart disease, at least genes are thought to play a part, and it is still unknown which combination of problems within which genes is responsible. A “problem” within a gene means that a single nucleotide, or a combination of nucleotides, within the gene causes the disease (or leaves the body unable to fight the disease sufficiently). People with different combinations of these nucleotides could then be unaffected by these diseases.

Genetic diseases (Cont.)
Cystic Fibrosis
–Known since very early on (“Celtic gene”).
–One in 10,000 people displays the disease; 1 in 20 is an unaffected carrier of an abnormal CF gene. These people are usually unaware that they are carriers.
–About 30,000 Americans, 3,000 Canadians, and 20,000 Europeans have CF.
–Inherited autosomal recessive condition (Chr. 7)
Symptoms:
–Clogging and infection of lungs (early death)
–Intestinal obstruction
–Reduced fertility and (male) anatomical anomalies

Genetic diseases (Cont.)
Cystic Fibrosis
Name of gene product: cystic fibrosis transmembrane conductance regulator (CFTR)
CFTR is an ABC (ATP-binding cassette) transporter, or traffic ATPase. These proteins transport molecules such as sugars, peptides, inorganic phosphate, chloride, and metal cations across the cellular membrane. CFTR transports chloride ions (Cl-) across the membranes of cells in the lungs, liver, pancreas, digestive tract, reproductive tract, and skin.

Genetic diseases (Cont.)
Cystic Fibrosis
The CF gene CFTR has a 3-bp deletion leading to Del508 (Phe) in the 1480 aa protein (an epithelial Cl- channel); the protein is degraded in the endoplasmic reticulum (ER) instead of being inserted into the cell membrane.
Figure: diagram depicting the five domains of the CFTR membrane protein (Sheppard 1999).
Figure: theoretical model of NBD1, PDB identifier 1NBD, as viewed in Protein Explorer.
The deltaF508 deletion is the most common cause of cystic fibrosis. The isoleucine (Ile) at amino acid position 507 remains unchanged because both ATC and ATT code for isoleucine.
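Why Ile507 survives the 3-bp deletion can be seen by deleting three bases that straddle two codons. The sketch below is illustrative only: the 15-nucleotide fragment is an assumption (the commonly cited local sequence ATC ATC TTT GGT GTT for residues 506 to 510, not taken from the slides), and the helper name `translate` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate coding-strand DNA in frame 0, ignoring a trailing partial codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3)
    )


# Assumed wild-type fragment covering residues 506-510: Ile Ile Phe Gly Val.
wild_type = "ATCATCTTTGGTGTT"

# Delete the three bases "CTT": the last base of the Ile507 codon (ATC)
# plus the first two bases of the Phe508 codon (TTT).
deleted = wild_type[5:8]
mutant = wild_type[:5] + wild_type[8:]

print(deleted)               # CTT
print(translate(wild_type))  # IIFGV
print(translate(mutant))     # IIGV
```

The remaining bases rejoin as ATC ATT GGT GTT, so position 507 still reads as isoleucine and only the phenylalanine is lost, without shifting the reading frame.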

Genomic Data Sources
–DNA/protein sequence
–Expression (microarray)
–Proteome (X-ray, NMR, mass spectrometry)
–Metabolome
–Physiome (spatial, temporal)
Integrative bioinformatics

Dinner discussion: Integrative Bioinformatics & Genomics VU
Figure: Genomic Data Sources (Vertical Genomics), with layers genome, transcriptome, proteome, metabolome, physiome.

Remark
Identifying (annotating) human genes, i.e. finding what they are and what they do, is a difficult problem. It is considerably harder than the early success story for β-globin might suggest (see Lesk’s “Introduction to Bioinformatics”).
The human factor VIII gene (whose mutations cause hemophilia A) is spread over ~186,000 bp. It consists of 26 exons ranging in size from 69 to 3,106 bp, and its 25 introns range in size from 207 to 32,400 bp. The complete gene comprises ~9 kb of exon and ~177 kb of intron.
The largest human gene known so far is the dystrophin gene. It has >30 exons and is spread over 2.4 million bp.
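The factor VIII figures make the exon/intron imbalance easy to quantify. The short calculation below uses only the approximate numbers quoted in the remark; the variable names are hypothetical.

```python
# Approximate figures for the human factor VIII gene, as quoted above.
exon_bp = 9_000      # ~9 kb of exon
intron_bp = 177_000  # ~177 kb of intron
n_exons = 26

gene_bp = exon_bp + intron_bp                 # total gene span
exon_percent = 100 * exon_bp / gene_bp        # share of the gene that is exon
mean_exon_bp = exon_bp / n_exons              # average exon length

print(gene_bp)                  # 186000
print(round(exon_percent, 1))   # 4.8
print(round(mean_exon_bp))      # 346
```

Under these figures, less than 5% of the gene ends up in the mature mRNA, which is part of why locating genes in raw genomic sequence is hard.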

DNA makes RNA makes Protein (reminder)

DNA makes RNA makes Protein: Expression data
–More copies of mRNA for a gene lead to more protein
–mRNA can now be measured for all the genes in a cell at once through microarray technology
–A single gene chip can have 60,000 spots (genes)
–Colour change gives intensity of gene expression (over- or under-expression)
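One common way to turn spot intensities into over- or under-expression calls is a log2 ratio between a sample and a reference channel. The sketch below assumes a two-channel design, which the slides do not specify; the function names, the intensity values and the two-fold threshold are all hypothetical choices for illustration.

```python
import math


def log2_ratio(sample: float, reference: float) -> float:
    """Log2 of sample intensity over reference intensity for one spot."""
    if sample <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample / reference)


def classify(sample: float, reference: float, threshold: float = 1.0) -> str:
    """Label a spot using an assumed two-fold (log2 = 1) cutoff."""
    ratio = log2_ratio(sample, reference)
    if ratio >= threshold:
        return "over-expressed"
    if ratio <= -threshold:
        return "under-expressed"
    return "unchanged"


# Hypothetical intensities for three spots (sample, reference).
spots = {"geneA": (400.0, 100.0), "geneB": (100.0, 400.0), "geneC": (120.0, 100.0)}
for name, (sample, reference) in spots.items():
    print(name, round(log2_ratio(sample, reference), 2), classify(sample, reference))
```

The log scale makes a four-fold increase (+2) and a four-fold decrease (-2) symmetric around zero, which a raw ratio would not.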

Proteomics
–Elucidating all 3D structures of proteins in the cell. This is also called Structural Genomics.
–Finding out what these proteins do. This is also called Functional Genomics.

Protein-protein interaction networks

Metabolic networks
Glycolysis and Gluconeogenesis (KEGG database, Japan)

High-throughput Biological Data
Enormous amounts of biological data are being generated by high-throughput capabilities; even more are coming
–genomic sequences
–gene expression data
–mass spec. data
–protein-protein interaction
–protein structures
–......

Protein structural data explosion Protein Data Bank (PDB): Structures (6 March 2001) x-ray crystallography, 1810 NMR, 278 theoretical models, others...

Dickerson’s formula: equivalent to Moore’s law
$n = e^{0.19(y-1960)}$, with $y$ the year.
On 27 March 2001 there were 12,123 3D protein structures in the PDB: Dickerson’s formula predicts 12,066 (within 0.5%)!

Sequence versus structural data
Structural genomics initiatives are now in full swing and growth is still exponential. However, growth of sequence data is even more rapid. There are now more than 300 completely sequenced genomes publicly available. The gap between structural and sequence data is increasing (“Mind the gap”).

Bioinformatics
Figure: disciplines arranged by scale, from large/external (integrative) to small/internal (individual), under the headings Science and Human: Planetary Science, Cultural Anthropology, Population Biology, Sociology, Sociobiology, Psychology, Systems Biology, Biology, Medicine, Molecular Biology, Chemistry, Physics.

Bioinformatics offers an ever more essential input to
–Molecular Biology
–Pharmacology (drug design)
–Agriculture
–Biotechnology
–Clinical medicine
–Anthropology
–Forensic science
–Chemical industries (detergent industries, etc.)