Lecture 10, CS567: Neural Network Applications (Problems, Input transformation, Network Architectures, Assessing Performance)

Presentation transcript:

Slide 1: Neural Network Applications
- Problems
- Input transformation
- Network architectures
- Assessing performance

Slide 2: Problems
- Deducing the genetic code
- Predicting genes
- Predicting signal peptide cleavage sites

Slide 3: Deducing the genetic code
- Problem: given a codon, predict the corresponding amino acid
- Of didactic value
  - Trivial mapping table, after the fact
- Perfect classification problem, rather than prediction
  - Solvable with a minimal network
- Learning issues
  - 'Similar' codons code for 'similar' amino acids
  - Abundance of amino acids is proportional to code redundancy (this and the previous point reduce the impact of mutations)
  - Third base 'wobble'
  - N:1 mapping between codon and amino acid

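Slide 4: The genetic code
(Blocks: first base. Columns: second base T, C, A, G. Rows within a block: third base T, C, A, G. Ter = termination codon.)

First base T
  TTT Phe (F)    TCT Ser (S)    TAT Tyr (Y)    TGT Cys (C)
  TTC Phe (F)    TCC Ser (S)    TAC Tyr (Y)    TGC Cys (C)
  TTA Leu (L)    TCA Ser (S)    TAA Ter        TGA Ter
  TTG Leu (L)    TCG Ser (S)    TAG Ter        TGG Trp (W)

First base C
  CTT Leu (L)    CCT Pro (P)    CAT His (H)    CGT Arg (R)
  CTC Leu (L)    CCC Pro (P)    CAC His (H)    CGC Arg (R)
  CTA Leu (L)    CCA Pro (P)    CAA Gln (Q)    CGA Arg (R)
  CTG Leu (L)    CCG Pro (P)    CAG Gln (Q)    CGG Arg (R)

First base A
  ATT Ile (I)    ACT Thr (T)    AAT Asn (N)    AGT Ser (S)
  ATC Ile (I)    ACC Thr (T)    AAC Asn (N)    AGC Ser (S)
  ATA Ile (I)    ACA Thr (T)    AAA Lys (K)    AGA Arg (R)
  ATG Met (M)    ACG Thr (T)    AAG Lys (K)    AGG Arg (R)

First base G
  GTT Val (V)    GCT Ala (A)    GAT Asp (D)    GGT Gly (G)
  GTC Val (V)    GCC Ala (A)    GAC Asp (D)    GGC Gly (G)
  GTA Val (V)    GCA Ala (A)    GAA Glu (E)    GGA Gly (G)
  GTG Val (V)    GCG Ala (A)    GAG Glu (E)    GGG Gly (G)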
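The genetic code table can be written as a lookup dictionary, which makes the N:1 mapping and the code redundancy easy to inspect. This is a minimal sketch: the names `BASES`, `AMINO_ACIDS`, `CODON_TABLE` and `redundancy` are illustrative and not from the lecture, and `*` is used here to stand for Ter.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"  # same base order as the genetic code table

# One letter per codon, in TCAG x TCAG x TCAG order; '*' marks Ter (stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

# product() varies the third base fastest, matching the string layout above.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Redundancy: how many codons map to each amino acid (the N in N:1).
redundancy = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    print(CODON_TABLE["ATG"])
    print(sorted(redundancy.items(), key=lambda kv: -kv[1]))
```

Leu, Ser and Arg each have six codons, while Met and Trp have one, so a training set that lists every codon once is unbalanced with respect to amino acids.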

Slide 5: Network Architecture
- Orthogonal coding (4 x 3 = 12 binary inputs)
- ≥ 2 hidden neurons (Is this a linear or non-linear problem?)
- 20 output neurons
  - Winner takes all
- Total of ≥ 86 parameters (How?)
- FFBP (feed-forward network trained with backpropagation)
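One way to arrive at 86 parameters, assuming a fully connected 12-2-20 network with one bias per hidden and per output neuron (the slide leaves this as a question, so the breakdown is an assumption). The function names are illustrative.

```python
BASES = "TCAG"

def one_hot_codon(codon):
    """Orthogonal (one-hot) coding: 4 inputs per base, 12 inputs per codon."""
    vec = []
    for base in codon:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec

def count_parameters(n_in, n_hidden, n_out):
    """Weights plus one bias per neuron, assuming full connectivity between layers."""
    hidden_params = n_in * n_hidden + n_hidden   # 12*2 + 2 = 26
    output_params = n_hidden * n_out + n_out     # 2*20 + 20 = 60
    return hidden_params + output_params

if __name__ == "__main__":
    print(one_hot_codon("ATG"))
    print(count_parameters(12, 2, 20))
```

Under the same assumption, each additional hidden neuron adds 12 + 1 + 20 = 33 parameters.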

Slide 6: Deducing the genetic code (Fig 6.7)

Slide 7: Deducing the genetic code (Fig 6.8)

Slide 8: Improving classification error
- Training rate high for misclassified codons, low otherwise (in addition to iteration dependence)
- Balanced cycles (balanced in terms of amino acids, not codons)
- Adaptive training
  - Present misclassified examples more often
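The balanced and adaptive presentation schemes can be sketched as follows. This is an illustration only: the function names, the `extra` parameter and the toy codon grouping are assumptions, and the slide does not say how many extra presentations were used.

```python
import random

def balanced_cycle(codons_by_aa, rng):
    """One training cycle with exactly one codon per amino acid,
    so classes with many codons (e.g. Leu) are not over-represented."""
    return [rng.choice(codons) for _, codons in sorted(codons_by_aa.items())]

def adaptive_cycle(codons_by_aa, misclassified, rng, extra=2):
    """Balanced cycle plus `extra` additional presentations of each
    codon the network currently misclassifies (hypothetical setting)."""
    cycle = balanced_cycle(codons_by_aa, rng)
    for codon in sorted(misclassified):
        cycle.extend([codon] * extra)
    rng.shuffle(cycle)
    return cycle

if __name__ == "__main__":
    toy = {
        "M": ["ATG"],
        "W": ["TGG"],
        "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    }
    rng = random.Random(0)
    print(balanced_cycle(toy, rng))
    print(adaptive_cycle(toy, {"ATG"}, rng))
```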

Slide 9: Is it a gene or not a gene?
Approaches depend on
- Bias at junctions of coding and non-coding regions
  - Donor (5' end of intron) and acceptor sites (3' end of intron) have biases in composition (GT [junk]+ C/U+ AG)
- Bias in composition of coding regions (but not of non-coding regions, e.g., introns)
  - Exons are "regular guys", introns are "freshman dorm rooms"
  - Seen as GC bias, codon usage frequency and codon bias
- Inverse relationship between the two (splice site strength and regularity within exons)
  - "Food exit sign on highway doesn't need prominent restaurant signs"
  - "Stretch of prominent restaurant signs doesn't need a sign indicating food"

Slide 10: Regularity within coding regions (Fig 6.11; panels: Bacteria, Mammals, C. elegans, A. thaliana)

Slide 11: Predicting Exons: The holy GRAIL
Neural networks for gene prediction
- Input representation/transformation is key
- NN per se is trivial: MLP with a single hidden layer and a single output neuron
- Input = coding region candidate, transformed to
  - 6mer (di-codon) score of candidate region
  - 6mer (di-codon) score of flanking regions
  - GC composition of candidate region
  - GC composition of flanking region
  - Markov model score
  - Length of candidate
  - Splice site score
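The input transformation can be illustrated with a rough sketch that turns a candidate region and its flanks into a short feature vector. This is not GRAIL's actual implementation: the log-odds form of the 6mer score, the in-frame step of 3, the `floor` value and the frequency tables are all assumptions, and the Markov model and splice site scores are left out.

```python
import math

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def hexamer_score(seq, coding_freq, noncoding_freq, floor=1e-6):
    """Mean log-odds (coding vs non-coding) over in-frame 6mers (di-codons).
    Frequencies missing from a table fall back to `floor` (hypothetical)."""
    seq = seq.upper()
    hexamers = [seq[i:i + 6] for i in range(0, len(seq) - 5, 3)]
    if not hexamers:
        return 0.0
    total = sum(
        math.log(coding_freq.get(h, floor) / noncoding_freq.get(h, floor))
        for h in hexamers
    )
    return total / len(hexamers)

def candidate_features(left, candidate, right, coding_freq, noncoding_freq):
    """Feature vector for one candidate region and its two flanks."""
    flank_score = 0.5 * (
        hexamer_score(left, coding_freq, noncoding_freq)
        + hexamer_score(right, coding_freq, noncoding_freq)
    )
    return [
        hexamer_score(candidate, coding_freq, noncoding_freq),
        flank_score,
        gc_fraction(candidate),
        gc_fraction(left + right),
        len(candidate),
    ]

if __name__ == "__main__":
    coding = {"ATGGCC": 0.02}      # toy frequencies, not real data
    noncoding = {"ATGGCC": 0.01}
    print(candidate_features("TTTTAA", "ATGGCC", "AATTTT", coding, noncoding))
```

A vector of this kind would then be fed to the single-hidden-layer MLP, which is why the slide calls the network itself trivial.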

Slide 12: Signal peptide (SignalP) prediction
- Signal peptides are N-terminal subsequences in proteins that act as "export tags", including a "dotted line" (cleavage site) that marks the point of detachment
  - Coding is species specific
- Problem analogous to exon/intron delineation
  - Distinguish between the signal peptide and the rest of the protein
  - Find the junction between the signal peptide and the rest of the protein

Slide 13: Signal peptide (SignalP) prediction
- Two kinds of network that output, for each position,
  - S-score: probability of classification as signal peptide
  - C-score: probability of being the junction
- Key is post-processing: using S and C scores to come up with the final prediction
  - C-score prediction: based on asymmetric windows (why?)
  - S-score prediction: based on symmetric windows (why?)
  - Y-score: $Y_i = (C_i \, \Delta_d S_i)^{1/2}$, where $\Delta_d S_i$ is the average difference in $S$ between the windows of size $d$ flanking position $i$
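The Y-score post-processing can be sketched as below. Two points are assumptions rather than statements from the slide: the difference is taken as the mean S over the d positions before i minus the mean S over the d positions starting at i (S should drop after the cleavage site), and a negative product is clamped to zero before taking the square root. Positions without two full windows are left at 0.

```python
import math

def delta_s(s, i, d):
    """Mean S over the d positions before i minus mean S over the d positions from i on."""
    before = s[i - d:i]
    after = s[i:i + d]
    return sum(before) / d - sum(after) / d

def y_scores(c, s, d):
    """Y_i = sqrt(C_i * delta_d S_i) for every position with two full windows."""
    n = len(s)
    y = [0.0] * n
    for i in range(d, n - d + 1):
        # Clamp at zero so the square root is defined (assumption).
        y[i] = math.sqrt(max(0.0, c[i] * delta_s(s, i, d)))
    return y

if __name__ == "__main__":
    # Toy scores: S is high inside the signal peptide and drops at position 4.
    s = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    c = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    y = y_scores(c, s, d=2)
    print(y.index(max(y)), y)
```

The Y-score peaks where a high C-score coincides with a steep drop in S, which is how the two network outputs are combined into one cleavage site prediction.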

Slide 14: Signal peptide (SignalP) prediction (Fig 6.5)