FORNAVIS
A tool that helps to optimize the Forna Container's RNA secondary structure graph generation
Max Haberbusch
BSc presentation, 28 June 2016
Overview
- RNA Secondary Structure
- RNA Secondary Structure Graph Plotting
- Forna Container
- Metrics
- Simulation
- Visualization Prototype
- Advantages & Disadvantages
RNA Secondary Structure
RNA Secondary Structure (1)
- RNA consists of nucleotides
- RNA encodes the blueprint for proteins
- Secondary structure describes base-pairing interactions between nucleotides
- Important for predicting and determining the function of RNA molecules
- FASTA format with dot-bracket notation: a common format for storing RNA secondary structure

    > Somesequence
    ; this is a random RNA sequence and its
    ; secondary structure
    AGAUAUGUGCCGGCCUAACUCUAACGGUAAUCUUCUGUCACGACCUACGCGCCGAGGUGACCUAUAAUAGCACGCACUACGCCGUCAACUACAGAGCAUU
    ((((...(((((............)))))))))(((((...(((....(((...(((((..(((...)))..))).)).))).)))....))))).....
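
The structure line above uses dot-bracket notation: each matching pair of parentheses marks a base pair, and each dot marks an unpaired nucleotide. A minimal sketch of how such a string could be turned into a list of base pairs follows; the function name `parse_dot_bracket` and the 0-based indexing are assumptions made for illustration and are not taken from the presentation.

```python
def parse_dot_bracket(structure):
    """Return a sorted list of (i, j) base pairs (0-based) from a dot-bracket string."""
    stack = []  # positions of '(' that have not been closed yet
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            # The most recent open bracket pairs with this closing bracket.
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(parse_dot_bracket("((..))"))  # [(0, 5), (1, 4)]
```

A stack is used because base pairs in this notation are nested, so the most recently opened bracket is always the one that closes next.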
RNA Secondary Structure (2)
Different RNA secondary structure representations, depending on the purpose:
- Undirected graph
- Mountain plot
- Bonding graph
- FASTA format
[Figure: example representations, panels (a), (b), (c)]
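
As an illustration of how one representation can be derived from another, the heights of a mountain plot can be computed directly from a dot-bracket string. The sketch below assumes one common convention, in which the height at a position is the number of opening brackets minus the number of closing brackets up to and including that position; other conventions shift the values by one position. The function name `mountain_heights` is hypothetical.

```python
def mountain_heights(structure):
    """Return the mountain-plot height for every position of a dot-bracket string."""
    heights = []
    level = 0  # number of base pairs currently open
    for symbol in structure:
        if symbol == "(":
            level += 1
        elif symbol == ")":
            level -= 1
        heights.append(level)
    return heights


print(mountain_heights("((..))"))  # [1, 2, 2, 2, 1, 0]
```

Plateaus in the resulting list correspond to unpaired stretches, and rising or falling slopes correspond to the two strands of a stem.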
RNA Secondary Structure as Undirected Graph
- Important for identifying substructures
- Substructures: loops, hairpins, stems, etc.
- Visualization of canonical and non-canonical interactions between nucleotides
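
A minimal sketch of how a sequence and its dot-bracket structure could be turned into an undirected graph: every nucleotide becomes a node, consecutive nucleotides are joined by a backbone edge, and paired nucleotides are joined by a base-pair edge. The function name `structure_to_graph` and the edge labels `"backbone"` and `"basepair"` are assumptions for illustration; this is not how the Forna Container represents its graph internally.

```python
def structure_to_graph(sequence, structure):
    """Build (nodes, edges) for an RNA sequence and its dot-bracket structure."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    # One node per nucleotide: (index, base).
    nodes = list(enumerate(sequence))

    # Backbone edges connect neighbouring nucleotides along the chain.
    edges = [(i, i + 1, "backbone") for i in range(len(sequence) - 1)]

    # Base-pair edges come from matching brackets.
    stack = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            edges.append((stack.pop(), pos, "basepair"))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return nodes, edges


nodes, edges = structure_to_graph("GCAAAGC", "((...))")
print(len(nodes), len(edges))  # 7 8
```

In this small hairpin, the six backbone edges plus the two base-pair edges close a cycle through the unpaired nucleotides, which is the loop that a layout algorithm then has to draw.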
Examples from Google Search
Forna Container
Forna Container (1)
- Force-directed graph layout algorithm for plotting RNA secondary structure graphs in the web browser
- Implemented by the Theoretical Biochemistry Group at the Institute for Theoretical Chemistry, University of Vienna
- P. Kerpedjiev, S. Hammer, I. Hofacker. "Forna (force-directed RNA): simple and effective online RNA secondary structure diagrams." Bioinformatics (2015): btv372.
Forna Container (3)
Example Outputs
- Friction: 0.65, Charge: -40, Chargedistance: 80
- Friction: 0.65, Charge: -140, Chargedistance: 40
- Friction: 0.95, Charge: -200, Chargedistance: 150
Metrics
Nodecollisions & Linkcollisions
- Backbone link overlaps
- Node overlaps
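
The slides do not spell out how the two collision counts are computed. One plausible reading, sketched below under stated assumptions, is that nodes are circles of equal radius and collide when their circles overlap, and that two links collide when their line segments cross and they do not share a node. All function names, the equal-radius assumption, and the strict crossing test are hypothetical.

```python
from itertools import combinations
from math import hypot


def count_node_collisions(positions, radius):
    """Count pairs of nodes whose circles (all with the same radius) overlap."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(positions, 2)
        if hypot(x2 - x1, y2 - y1) < 2 * radius
    )


def _orientation(p, q, r):
    """Sign (-1, 0, 1) of the cross product (q - p) x (r - p)."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segments a-b and c-d cross at a single interior point.

    Touching endpoints and collinear overlaps are deliberately not counted.
    """
    return (
        _orientation(a, b, c) * _orientation(a, b, d) < 0
        and _orientation(c, d, a) * _orientation(c, d, b) < 0
    )


def count_link_collisions(positions, links):
    """Count pairs of links (index pairs into positions) that cross each other."""
    count = 0
    for (i, j), (k, m) in combinations(links, 2):
        if len({i, j, k, m}) < 4:
            continue  # links that share a node touch by construction
        if segments_cross(positions[i], positions[j], positions[k], positions[m]):
            count += 1
    return count
```

For example, `count_link_collisions([(0, 0), (2, 2), (0, 2), (2, 0)], [(0, 1), (2, 3)])` counts one collision, because the two diagonals of the square cross in its centre. Both functions compare all pairs, which is quadratic in the number of nodes or links.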
Linklength Deviation & Looproundness
Friction: 0.95, Charge: -190, Chargedistance: 150
- Linkcollisions: 0
- Nodecollisions: 0
- Linklength Deviations: 6.5
- Loop Roundness: 0.77
vs.
Friction: 0.65, Charge: -40, Chargedistance: 80
- Linkcollisions: 0
- Nodecollisions: 0
- Linklength Deviations: 1.48
- Loop Roundness: 0.22
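
The presentation reports link length deviation values but does not define the metric. As a rough illustration only, one possible definition is the mean absolute difference between each drawn link length and a desired target length. The function name `link_length_deviation`, the `target_length` parameter, and the choice of the mean absolute difference are all assumptions, so values from this sketch are not comparable to the numbers on the slide.

```python
from math import hypot


def link_length_deviation(positions, links, target_length):
    """Mean absolute difference between drawn link lengths and a target length.

    positions: list of (x, y) node coordinates
    links: list of (i, j) index pairs into positions
    """
    if not links:
        raise ValueError("at least one link is required")
    total = 0.0
    for i, j in links:
        (x1, y1), (x2, y2) = positions[i], positions[j]
        total += abs(hypot(x2 - x1, y2 - y1) - target_length)
    return total / len(links)


# Two links of length 5 and 10 against a target length of 5.
print(link_length_deviation([(0, 0), (3, 4), (3, 14)], [(0, 1), (1, 2)], 5))  # 2.5
```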
Looproundness
Simulation
Sampling
- Friction: [0.3 ; 0.95] with a step size of 0.05: {0.3, 0.35, 0.4, …, 0.95}
- Charge: [-30 ; -200] with a step size of 10: {-30, -40, -50, …, -200}
- Chargedistance: [30 ; 150] with a step size of 10: {30, 40, 50, …, 150}
- Basic quantity (the full set of combinations): BQ = friction × charge × chargedistance, |BQ| = 14 × 18 × 13 = 3276 combinations
- Random sampling: 1000 combinations, randomly picked out of the basic quantity
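
The sampling scheme above can be sketched in a few lines: build the full grid of parameter combinations as a Cartesian product, then draw 1000 of them at random. The function names, the optional `seed` argument, and drawing without replacement are assumptions for illustration; the presentation itself uses `rsampling.php` for this step, whose implementation is not shown.

```python
import itertools
import random


def parameter_grid():
    """Return all (friction, charge, chargedistance) combinations from the slide."""
    # Integer steps avoid accumulating floating-point error in the friction values.
    frictions = [round(0.30 + 0.05 * k, 2) for k in range(14)]  # 0.3 ... 0.95
    charges = list(range(-30, -201, -10))                       # -30 ... -200
    charge_distances = list(range(30, 151, 10))                 # 30 ... 150
    return list(itertools.product(frictions, charges, charge_distances))


def draw_sample(grid, size=1000, seed=None):
    """Draw `size` distinct combinations from the grid (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(grid, size)


grid = parameter_grid()
sample = draw_sample(grid, seed=42)
print(len(grid), len(sample))  # 3276 1000
```

Fixing the seed makes the sample reproducible, which helps when the same parameter combinations have to be reused across all structures.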
Random Sampling
Why?
- Clutter reduction (visualization): 1000 points vs. 3276 points per structure
- Runtime reduction (simulation):
  - Basic quantity: 3276 × 100 × 5 s ≈ 19 days
  - Random sample: 1000 × 100 × 5 s ≈ 6 days
- File size reduction (simulation results)
Literature
- G. Ellis, Random Sampling as a Clutter Reduction Technique to Facilitate Interactive Visualisation of Large Datasets (2008)
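
The runtime estimates can be checked with a short calculation. The slide does not say what the factor 100 stands for, so the sketch below simply calls it `multiplier`; the function name is likewise hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def runtime_days(combinations, multiplier, seconds_per_run):
    """Total runtime in days for combinations x multiplier runs of fixed duration."""
    return combinations * multiplier * seconds_per_run / SECONDS_PER_DAY


full = runtime_days(3276, 100, 5)     # 1,638,000 s
sampled = runtime_days(1000, 100, 5)  # 500,000 s
print(f"{full:.1f} days vs {sampled:.1f} days")  # 19.0 days vs 5.8 days
```

The sample therefore cuts the runtime to roughly 30% of the full grid (1000 / 3276).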
Simulation
- 1000 parameter combinations per structure
- Currently 74 structures
Visualization Prototype
Advantages & Disadvantages
Advantages
- Quickly determining optima via the Pareto view
- Identifying the distribution of simulation results
- Understanding the performance of specific parameter combinations on different structure sizes
- Identifying input parameter trends
- Comparing the quality of the graph with regard to two metrics
- Testing and comparing parameter combinations directly via side-by-side comparison of the drawn structures
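
The Pareto view rests on the idea of non-dominated results: a parameter combination is Pareto-optimal if no other combination is at least as good in both metrics and strictly better in one. A minimal sketch follows, assuming that both metrics are to be minimized; whether that holds for every metric in the prototype (for example loop roundness) is not stated in the slides. The function name `pareto_front` is hypothetical.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (both metrics minimized).

    points: list of (metric_a, metric_b) tuples
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


results = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
print(pareto_front(results))  # [(1, 5), (2, 2), (5, 1)]
```

This pairwise comparison is quadratic in the number of points, which is acceptable for 1000 sampled combinations per structure; a metric that should be maximized can be negated before calling the function.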
Disadvantages
- Only two metrics in the visualization
- Only three input parameters in the visualization
- No multiple-point selection for better comparison
- No statement on the stability of the algorithm
- Performance (load times)
- Difficult or impossible to identify combinations in the input parameter scatter plot
- Red-green-blue color coding is problematic with regard to color blindness
Improvements
- Parallel coordinates instead of, or in addition to, the input parameter scatter plots
- Possibly color coding of points in the input parameter scatter plots
- Performance tuning (fine-tuning of caching)
- Highlighted points placed on top of others
- Input parameter trends depending on structure size
- Two Pareto scatter plots side by side to compare outcomes for two different structure sizes
Thank you for your attention!
Any questions?
Simulation Workflow (1)
- rsampling.php generates N random parameter combinations
- rstructuress.php generates N random RNA sequences of different lengths in FASTA format
- RNAfold from the ViennaRNA Package generates the secondary structures
- startsimulation.sh runs the simulation
Simulation Workflow (2)
Input files: combinations.json and structures.struct
Example content of structures.struct:
    CGCUUCAUAUAAUCCUAAUGACCUAU
    ((..((....)).(((....))).))
    […]
Running the simulation:
    ./startsimulation.sh combinations.json structures.struct simulationresults
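
Based on the excerpt shown above, a structures file appears to alternate between a sequence line and a dot-bracket structure line. A minimal sketch of a reader for that layout follows; the layout itself is inferred from the single example on the slide, and the function name `parse_struct_text` is hypothetical. Lines carrying anything beyond the sequence or the structure would need extra handling.

```python
def parse_struct_text(text):
    """Parse alternating sequence / dot-bracket lines into (sequence, structure) tuples."""
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected pairs of sequence and structure lines")

    records = []
    for sequence, structure in zip(lines[0::2], lines[1::2]):
        if len(sequence) != len(structure):
            raise ValueError(f"length mismatch for sequence {sequence!r}")
        records.append((sequence, structure))
    return records


example = """CGCUUCAUAUAAUCCUAAUGACCUAU
((..((....)).(((....))).))
"""
for sequence, structure in parse_struct_text(example):
    print(len(sequence), structure.count("("))  # 26 7
```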
Simulation Workflow (3)
Simulation output:
- File format: JSON
- One file per structure, containing all iterations
Simulation Dataset
Structure of the dataset to load in the simulation