Outline: What is an amino acid / protein; 20 naturally occurring amino acids; Codon – triplet coding for an amino acid; How are proteins synthesized; Transcription & translation; DNA, chromosomes and base-pairing; Genes, introns and exons; Reading frames
Amino Acids: Functional groups: amine and carboxyl groups. Sidechain ‘R’ is attached to the C-alpha carbon. The amino acids found in living organisms are L-amino acids.
Amino Acids - peptide bond Send around a model of a di-peptide N-terminal C-terminal
Examples of protein 3D-structure: Human proteins typically have a length of approx 220 aa. Small protein: one domain. Bigger protein: two domains.
The 20 amino acids Thr (T) Phe (F) Val (V) Ala (A) His (H) Arg (R) Ser (S) Leu (L) Cys (C) Met (M) Asp (D) Lys (K) Asn (N) Ile (I) Trp (W) Gln (Q) Glu (E) Tyr (Y) Pro (P) Gly (G)
Sidechain determines physical property: Positively charged (basic) amino acids are R, K and H; negatively charged (acidic) are D and E. Arg - R, Lys - K, His - H, Asp - D, Glu - E. What is the charge? R, K: (+) at physiological pH. D, E: (-) at physiological pH. H: sometimes (+) at physiological pH. These amino acids are also polar.
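A minimal sketch of how the charge rules on this slide could be applied to a whole sequence. The function name `net_sidechain_charge` is hypothetical, His is counted as neutral (an assumption, since it is only sometimes protonated), and the charges of the N- and C-terminus are ignored.

```python
# Side-chain charges at physiological pH, as listed on the slide.
# Assumption: His (H) is treated as uncharged because it is only sometimes (+).
SIDECHAIN_CHARGE = {"R": +1, "K": +1, "D": -1, "E": -1}


def net_sidechain_charge(sequence: str) -> int:
    """Sum the side-chain charges of a protein written in one-letter codes."""
    return sum(SIDECHAIN_CHARGE.get(aa, 0) for aa in sequence.upper())


# K and R contribute +1 each, D and E contribute -1 each, M and H contribute 0.
print(net_sidechain_charge("MKRDEH"))
```

This is only a rough count; a real estimate would use pKa values and the actual pH.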
Amino acids (Livingstone & Barton, CABIOS, 9, 745-756, 1993): A - Ala, C - Cys, D - Asp, E - Glu, F - Phe, G - Gly, H - His, I - Ile, K - Lys, L - Leu, M - Met, N - Asn, P - Pro, Q - Gln, R - Arg, S - Ser, T - Thr, V - Val, W - Trp, Y - Tyr. What is the charge? R, K: (+) at physiological pH. D, E: (-) at physiological pH. H: sometimes (+) at physiological pH. These amino acids are also polar.
tRNA – amino acids and codons Anti-codon Codon for Phenylalanine is TTC
Transcription & translation: DNA -> (transcription) -> mRNA -> (translation) -> protein. In higher organisms the picture is a bit more complex: DNA -> pre-mRNA -> mRNA -> protein. A protein might need a chaperone in order to fold correctly.
DNA - a double helix: 5’ - A T T G C C - 3’ pairs with 3’ - T A A C G G - 5’. James Watson and Francis Crick with their model of the structure of the DNA molecule, 1953.
DNA - Base pairing of nucleotides: T (thymine) in DNA is replaced by U (uracil) in RNA. The -CH3 group in thymine is replaced with an -H in uracil.
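As a small illustration of the T -> U replacement, the coding strand of a gene can be written in the RNA alphabet with a single string substitution. The function name `dna_to_rna` is hypothetical, and the sketch assumes the input is the coding (plus) strand read 5’ -> 3’.

```python
def dna_to_rna(coding_strand: str) -> str:
    """Write a coding-strand DNA sequence in the RNA alphabet (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


print(dna_to_rna("ATTGCC"))  # AUUGCC
```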
Genes, chromosomes and base pairs: Genes are located on the chromosomes. Approx 3.000.000.000 bp in the human genome; diploid => approx 6.000.000.000 bp.
Gene structure - start, stop and UTR: The coding sequence of a gene starts at the 5’ end with ATG and stops at the 3’ end with a stop codon (TAG, TGA or TAA). Transcript: the RNA copy of the piece of DNA that is transcribed. Introns are spliced out of the transcript (pre-mRNA) => mRNA. The mRNA includes the UTR regions.
ARTN_HUMAN chr1:44401329-44402434
Single Nucleotide Polymorphism (SNP): SNPs can be located anywhere in the genome. Non-synonymous SNP (nsSNP): the amino acid is changed. Synonymous SNP: does not affect the protein sequence. An amino acid is coded by 3 nucleotides, e.g. Leu: TTG.
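The difference between a synonymous and a non-synonymous SNP can be sketched by changing one base of a codon and comparing the encoded amino acid before and after. The code below assumes the standard genetic code (written compactly in TCAG order); the helper name `classify_snp` is hypothetical, and a change that creates a stop codon is simply reported as non-synonymous.

```python
# Standard genetic code, one letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change in a codon (position is 0, 1 or 2)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "non-synonymous"
    return f"{codon}->{mutated}: {before}->{after} ({kind})"


print(classify_snp("TTG", 2, "A"))  # third base changed, still Leu
print(classify_snp("TTG", 1, "C"))  # second base changed, Leu becomes Ser
```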
RNA/DNA translation table - codon
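A sketch of the translation table as a lookup structure, assuming the standard genetic code in the DNA alphabet (replace T with U for the RNA version). The 64-letter string lists one amino acid per codon in TCAG order, with `*` marking a stop codon; the names `CODON_TABLE` and `codons_for` are chosen here for illustration.

```python
from collections import defaultdict

# Standard genetic code, one letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Invert the table: which codons code for each amino acid?
codons_for = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    codons_for[amino_acid].append(codon)

print(len(CODON_TABLE))          # number of codons
print(len(codons_for) - 1)       # number of amino acids (stop excluded)
print(sorted(codons_for["*"]))   # the stop codons
print(codons_for["F"])           # codons for phenylalanine
```

The inverted table makes the redundancy of the code visible: several codons map to the same amino acid.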
Identify possible start codons how many ? GATAATGGGGCATTCAGTACAAAAATCCCGTACGGAGCTA GGCAGCTAACCCGATGCCATGCATAGCCCCTGCCATATCT TTCGATCATTCATTGTCATGGGTAATGCCATGGTATAGCA TGATAATGGGGCATTCAGTACAAAAATCCCGTACGTAGCT GGTAGCTAGCCCGATGCCATGCATAGCCCCTGCCATATCT TTCGATCATTCATTGTCAGTGGGTAAGTGCCATGGTATAG GATAATGGGGCATTCAGTACAAAAATCCCGTACGGAGCTA GGTAGCTAGCCCGATGCCATGCATAGCCCCTGCCATATCT TTCGATCATTCATTGTCATGGGTAATGCCATGGTATAGCA TGATAATGGGGCATTCAGTACAAAAATCCCGTACGTAGCT TTCGATCATTCATTGTCAGTGGGTAAGTGCCATGGTATAG
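The exercise can be checked with a few lines of code that scan a strand for every ATG. This is a sketch only: the function name `find_start_codons` is hypothetical, only the strand as written is scanned (not the reverse strand), and the lines of the slide would have to be joined into one string first so that an ATG spanning a line break is not missed.

```python
def find_start_codons(dna: str) -> list[int]:
    """Return the 0-based position of every ATG on the given strand."""
    dna = dna.upper()
    positions = []
    index = dna.find("ATG")
    while index != -1:
        positions.append(index)
        # Continue one base further so overlapping matches are not skipped.
        index = dna.find("ATG", index + 1)
    return positions


first_line = "GATAATGGGGCATTCAGTACAAAAATCCCGTACGGAGCTA"
print(find_start_codons(first_line))  # [4]
```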
Reading frame: An open reading frame (ORF) is a piece of DNA from start to stop: ATG (start codon) -> TAG, TGA or TAA (stop codons)
GATAATGGGGCATTCAGTACAAAAATCCCGTACGGAGCTA
GGCAGCTAACCCGATGCCATGCATAGCCCCTGCCATATCT
TTCGATCATTCATTGTCATGGTAAGTGCCATGGTATAGCA
TGATAATGGGGCATTCAGTACAAAAATCCCGTACGTAGCT
GGTAGCTAGCCCGATGCCATGCATAGCCCCTGCCATATCT
TTCGATCATTCATTGTCAGTGGGTAAGTGCCATGGTATAG
Does the gene stop at that TAG? Only if the TAG is in the same reading frame as the start codon:
    123 123 123 123 123 123 123 123 123
... ATG CCA TGC ATA GCC CCT GCC ATA TCT ...
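A minimal sketch of an ORF search under the definition on this slide: start at an ATG, step forward three bases at a time, and stop at the first in-frame TAG, TGA or TAA. The function name `find_orfs` is hypothetical; only the given strand is searched, and an ATG without an in-frame stop is skipped.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_orfs(dna: str) -> list[str]:
    """Return every stretch from an ATG to the first in-frame stop codon."""
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                orfs.append(dna[start:pos + 3])
                break
    return orfs


# The TAG at positions 4-6 is out of frame, so the ORF runs on to the TAA.
print(find_orfs("ATGCTAGCCTAA"))
```

Stepping in threes is what answers the question on the slide: a TAG that is not in the frame of the start codon does not end the gene.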
Reading frame - II
GATAATGGGGCATTCAGTACAAAAATCCCGTACGGAGCTA
GGCAGCTAACCCGATGCCATGCATAGCCCCTGCCATATCT
TTCGATCATTCATTGTCATGGTAAGTGCCATGGTATAGCA
TGATAATGGGGCATTCAGTACAAAAATCCCGTACGTAGCT
GGTAGCTAGCCCGATGCCATGCATAGCCCCTGCCATATCT
TTCGATCATTCATTGTCAGTGGGTAAGTGCCATGGTATAG
What is the result of translating the mRNA into protein? (only first 9 codons - use handout)
    123 123 123 123 123 123 123 123 123
... ATG CCA TGC ATA GCC CCT GCC ATA TCT ...
     M   P   C   I   A   P   A   I   S
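The handout lookup can be sketched in code: split the sequence into codons and look each one up in the translation table. The sketch assumes the standard genetic code (written compactly in TCAG order) and a coding-strand DNA sequence that is already in frame; the function name `translate` is chosen here for illustration.

```python
# Standard genetic code, one letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# The nine codons from the slide: ATG CCA TGC ATA GCC CCT GCC ATA TCT
print(translate("ATGCCATGCATAGCCCCTGCCATATCT"))  # MPCIAPAIS
```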
Forward and reverse strand GATAATGGGGCATTCAGTACAAAAATCCCGTACGGAGCTA GGCAGCTAACCCGATGCCATGCATAGCCCCTGCCATATCT TTCGATCATTCATTGTCATGGTAAGTGCCATGGTATAGCA TGATAATGGGGCATTCAGTACAAAAATCCCGTACGTAGCT GGTAGCTAGCCCGATGCCATGCATAGCCCCTGCCATATCT TTCGATCATTCATTGTCAGTGGGTAAGTGCCATGGTATAG 5’-ATGCCATGCATAGCCC-3’ (forward or plus strand) 3’-TACGGTACGTATCGGG-5’ (reverse or negative strand)
Reading frame and reverse complement
Having a piece of DNA like: TGCCATGCATAGCCCCTGCCATATCT
Forward strings & reading frames
 1: TGCCATGCATAGCCCCTGCCATATCT
 2: GCCATGCATAGCCCCTGCCATATCT
 3: CCATGCATAGCCCCTGCCATATCT
Reverse complement strings & reading frames
-1: AGATATGGCAGGGGCTATGCATGGCA
-2: GATATGGCAGGGGCTATGCATGGCA
-3: ATATGGCAGGGGCTATGCATGGCA
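A sketch of how the six reading frames can be generated: complement every base (A-T, C-G), reverse the result so it reads 5’ -> 3’, and then drop 0, 1 or 2 leading bases on each strand. The function names `reverse_complement` and `six_frames` are hypothetical, and the sketch assumes the sequence contains only A, C, G and T.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5' -> 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> dict[int, str]:
    """Map frame numbers 1, 2, 3 (forward) and -1, -2, -3 (reverse) to strings."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = forward[offset:]
        frames[-(offset + 1)] = reverse[offset:]
    return frames


frames = six_frames("TGCCATGCATAGCCCCTGCCATATCT")
for number in (1, 2, 3, -1, -2, -3):
    print(f"{number:+d}: {frames[number]}")
```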
Summary - protein: 20 naturally occurring amino acids. L-amino acids. Each amino acid is encoded by one or more codons. One- and three-letter codes (important). A protein reads from N -> C terminal.
Summary – DNA/RNA transcription translation DNA -> mRNA -> Protein DNA: A-T, C-G RNA: A-U, C-G DNA/RNA strand reads from 5’ -> 3’ Gene starts with ATG until stop codon 64 codons, but only 20 amino acids
Subcellular location: An animal cell has many organelles. Mention: nucleus, cytoplasm, membrane, ER.
Proteins - where are they found Proteins are found in all living organisms In humans there are approx 25.000 proteins Each protein has a specific function Making up the human tissue - skin, hair, heart ... Degrading the food we eat Immune system Transportation of Oxygen in blood Triggering the growth of cells The brain - neural signalling Typically a protein is approx 220 aa in human Proteins talk, i.e. Protein-protein interactions To relay a signal across the cell-membrane
1 and 3-letter codes There are 20 naturally occurring amino acids Normally the one/three codes are used Met - M Asn - N Pro - P Gln - Q Arg - R Ser - S Thr - T Val - V Trp - W Tyr - Y Ala - A Cys - C Asp - D Glu - E Phe - F Gly - G His - H Ile - I Lys - K Leu - L
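Since both code systems are used side by side, a small lookup table is a convenient way to convert between them. The sketch below uses the 20 pairs listed on this slide; the function names and the hyphen-separated format for three-letter sequences are assumptions made for illustration.

```python
# The 20 naturally occurring amino acids: three-letter code -> one-letter code.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_three_letter(sequence: str) -> str:
    """Convert 'MPC' to 'Met-Pro-Cys'."""
    return "-".join(ONE_TO_THREE[aa] for aa in sequence.upper())


def to_one_letter(sequence: str) -> str:
    """Convert 'Met-Pro-Cys' to 'MPC'."""
    return "".join(THREE_TO_ONE[name.capitalize()] for name in sequence.split("-"))


print(to_three_letter("MPCIAPAIS"))
print(to_one_letter("Met-Pro-Cys"))
```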
Graphic stick representation Different aa, different property Ile - Hydrophobic Phe - Hydrophobic & aromatic How many carbons are there in the Ile aa ? Ans: 6 Are they L-amino acids ? Ans: Yes
The 20 amino acids Just an overview - picture trying to show that the C-alpha atom is chiral
Charged amino acids (sidechain in red) Arg - R Lys - K Asp - D Glu - E What is the charge ? R,K (+) at physiological pH D,E (-) at physiological pH H sometimes (+) at physiological pH These amino acids are also polar His - H
Neutral amino acids: Ile - I, Leu - L, Met - M, Phe - F, Pro - P. Property?: I - neutral, L - neutral, M - neutral, F - neutral, P - neutral.
Property of amino acids (I) - neutral, polar or charged?: Ala - A, Asn - N, Cys - C, Gln - Q, Gly - G. Property?: A - neutral, N - polar, C - slightly polar, Q - polar, G - neutral.
Property of amino acids (II) - neutral, polar or charged?: Ser - S, Thr - T, Trp - W, Tyr - Y, Val - V. Property?: S - polar, T - polar, W - slightly polar, Y - neutral, V - neutral.
Protein structure: Primary structure: amino acid sequence. Secondary structure: helix / beta sheet. Tertiary structure: fold, 3D coordinates.
Protein structure - α-helix: 3₁₀-helix: 3 residues/turn - few, but not uncommon. α-helix: 3.6 residues/turn - by far the most common helix. Pi-helix: 4.1 residues/turn - very rare.
Protein structure - β-strand/sheet
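The residues-per-turn figures translate directly into how far each residue rotates around the helix axis: one full turn is 360 degrees, so the rotation per residue is 360 divided by the residues per turn. In this sketch the helix with 3 residues per turn is labelled "3-10 helix", which is an assumption about the name lost from the slide; the function name is hypothetical.

```python
# Residues per turn as given on the slide.
RESIDUES_PER_TURN = {"3-10 helix": 3.0, "alpha-helix": 3.6, "pi-helix": 4.1}


def rotation_per_residue(residues_per_turn: float) -> float:
    """Degrees of rotation around the helix axis contributed by one residue."""
    return 360.0 / residues_per_turn


for name, per_turn in RESIDUES_PER_TURN.items():
    print(f"{name}: {rotation_per_residue(per_turn):.1f} degrees per residue")
```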
Protein structure Ribbon representation - easy to see the secondary structure elements
Protein structure: Too many atoms - information is lost. However, notice that proteins are surrounded by water.
Protein structure Hydrophilic/hydrophobic & stacking
Protein folds: Class: the 4th class is ‘few secondary structures’. Architecture: overall shape of a domain. Topology: shared secondary structure connectivity.
Summary: Transcription, translation: DNA -> mRNA -> protein. 20 naturally occurring amino acids. Each amino acid has different properties, but they can be grouped into charged (basic, acidic), polar and neutral. Secondary structure: α-helix & β-strand/β-sheet. Proteins are hydrophobic on the inside, polar on the outside. Folds or classes: all α, all β, α+β, few secondary structure elements.
Summary - The amino acids: A - Ala, C - Cys, D - Asp, E - Glu, F - Phe, G - Gly, H - His, I - Ile, K - Lys, L - Leu, M - Met, N - Asn, P - Pro, Q - Gln, R - Arg, S - Ser, T - Thr, V - Val, W - Trp, Y - Tyr