28 June 2016, BSc presentation, M. Haberbusch

FORNAVIS: a tool that helps optimize the Forna container's RNA secondary structure graph generation
Max Haberbusch

Overview
- RNA Secondary Structure
- RNA Secondary Structure Graph Plotting
- Forna Container
- Metrics
- Simulation
- Visualization Prototype
- Advantages & Disadvantages

RNA Secondary Structure

RNA Secondary Structure (1)
- RNA consists of nucleotides.
- RNA encodes the blueprint for proteins.
- The secondary structure describes the base-pairing interactions between nucleotides.
- It is important for predicting and determining the function of RNA molecules.
- A FASTA-style record with an additional dot-bracket line is a common format for storing an RNA secondary structure:

> Somesequence
; this is a random RNA sequence and its
; secondary structure
AGAUAUGUGCCGGCCUAACUCUAACGGUAAUCUUCUGUCACGACCUACGCGCCGAGGUGACCUAUAAUAGCACGCACUACGCCGUCAACUACAGAGCAUU
((((...(((((............)))))))))(((((...(((....(((...(((((..(((...)))..))).)).))).)))....))))).....
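
The dot-bracket notation can be made concrete with a short sketch. The minimal Python example below is illustrative only and not part of FORNAVIS; the helper names `read_record` and `parse_dot_bracket` are hypothetical. It reads a record like the one above and recovers the base pairs with a stack: each `(` is pushed, and each `)` is paired with the most recent unmatched `(`.

```python
def read_record(text: str) -> tuple[str, str, str]:
    """Split a FASTA-style record into (name, sequence, structure), skipping ';' comments."""
    lines = [ln.strip() for ln in text.splitlines()]
    body = [ln for ln in lines if ln and not ln.startswith(";")]
    name = body[0].lstrip(">").strip()
    sequence, structure = body[1], body[2]
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure differ in length")
    return name, sequence, structure


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j), 0-based with i < j, encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)  # remember the opening nucleotide
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # pair with the most recent open '('
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For example, `parse_dot_bracket("((..))")` yields the pairs `(0, 5)` and `(1, 4)`.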

RNA Secondary Structure (2)
Different representations of RNA secondary structure suit different purposes: undirected graph, mountain plot, bonding graph and FASTA (dot-bracket) format.
[Figure, panels (a) to (c): example representations]
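
A mountain plot can be derived directly from the dot-bracket string. The sketch below uses one common convention: the height at a position is the number of base pairs that enclose or are formed by that position. It also renders the result as ASCII art. The function names are hypothetical.

```python
def mountain_heights(structure: str) -> list[int]:
    """Height per position: number of base pairs enclosing (or formed at) that position."""
    heights: list[int] = []
    level = 0
    for char in structure:
        if char == "(":
            level += 1             # a new pair opens here
            heights.append(level)
        elif char == ")":
            heights.append(level)  # the closing base still belongs to the pair
            level -= 1
        else:
            heights.append(level)  # unpaired base at the current nesting depth
    return heights


def ascii_mountain(structure: str) -> str:
    """Draw the mountain plot with '#' columns above the dot-bracket string."""
    heights = mountain_heights(structure)
    top = max(heights, default=0)
    rows = ["".join("#" if h >= level else " " for h in heights)
            for level in range(top, 0, -1)]
    return "\n".join(rows + [structure])
```

`print(ascii_mountain("((..))"))` draws two levels: a plateau of four `#` on top and a full row of six below.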

RNA Secondary Structure as Undirected Graph

RNA Secondary Structure as Undirected Graph
- Important for identifying substructures such as loops, hairpins and stems
- Visualizes canonical and non-canonical interactions between nucleotides
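
Turning a dot-bracket string into an undirected graph like the one Forna draws needs two kinds of edges: backbone links between neighbouring nucleotides and base-pair links. The sketch below builds that graph. It also finds hairpin loops, one of the substructures listed above, by looking for a closing pair whose interior is entirely unpaired. All names are hypothetical. Plain dot-bracket notation does not record whether a pair is canonical, so this sketch cannot tell canonical from non-canonical interactions.

```python
import re


def structure_graph(structure: str) -> dict[int, set[int]]:
    """Adjacency sets: backbone links (i, i + 1) plus one link per base pair."""
    adjacency: dict[int, set[int]] = {i: set() for i in range(len(structure))}

    def connect(a: int, b: int) -> None:
        adjacency[a].add(b)
        adjacency[b].add(a)

    for i in range(len(structure) - 1):
        connect(i, i + 1)            # backbone link
    stack: list[int] = []
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            connect(stack.pop(), j)  # base-pair link
    return adjacency


def hairpin_loops(structure: str) -> list[tuple[int, int]]:
    """Closing pairs (i, j) whose enclosed nucleotides are all unpaired."""
    # '(' followed only by dots and then ')' must be a matched pair closing a hairpin.
    return [(m.start(), m.end() - 1) for m in re.finditer(r"\(\.+\)", structure)]
```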

Examples from Google Search

Forna Container

Forna Container (1)
- Force-directed graph layout algorithm for plotting RNA secondary structure graphs in the web browser
- Implemented by the Theoretical Biochemistry Group at the Institute for Theoretical Chemistry, University of Vienna
- P. Kerpedjiev, S. Hammer, I. Hofacker. "Forna (force-directed RNA): simple and effective online RNA secondary structure diagrams." Bioinformatics (2015): btv372.
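
The three parameters varied throughout this presentation (friction, charge, charge distance) share their names with settings of the D3.js v3 force layout. The toy simulation below illustrates what they control; it is a simplified stand-in and not Forna's or D3's actual code. Charge acts as a pairwise repulsion (negative values repel) that only applies within `charge_distance`. Links behave as springs. Friction scales every node's velocity on each tick, where 1 means frictionless and 0 means frozen, following D3 v3's convention. The following are assumptions for illustration: `link_distance`, the distance floor, the cooling schedule and the naive all-pairs loop.

```python
import math
import random


def force_layout(n_nodes, links, friction=0.9, charge=-30.0,
                 charge_distance=math.inf, link_distance=20.0,
                 iterations=300, seed=0):
    """Toy force-directed layout; returns a list of [x, y] node positions."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-50.0, 50.0), rng.uniform(-50.0, 50.0)] for _ in range(n_nodes)]
    vel = [[0.0, 0.0] for _ in range(n_nodes)]
    alpha = 0.1  # "temperature" that cools so the layout settles
    for _ in range(iterations):
        force = [[0.0, 0.0] for _ in range(n_nodes)]
        # Charge: node pairs within charge_distance repel (charge < 0), ~ charge / distance.
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d2 = dx * dx + dy * dy
                if d2 > charge_distance ** 2:
                    continue
                k = alpha * charge / max(d2, 1.0)  # floor avoids huge kicks at tiny distances
                force[i][0] += dx * k
                force[i][1] += dy * k
                force[j][0] -= dx * k
                force[j][1] -= dy * k
        # Links: springs pulling each linked pair toward link_distance.
        for a, b in links:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue
            k = alpha * (d - link_distance) / d
            force[a][0] += dx * k / 2
            force[a][1] += dy * k / 2
            force[b][0] -= dx * k / 2
            force[b][1] -= dy * k / 2
        # Friction scales velocity: 1 keeps all momentum, 0 stops every node.
        for i in range(n_nodes):
            vel[i][0] = (vel[i][0] + force[i][0]) * friction
            vel[i][1] = (vel[i][1] + force[i][1]) * friction
            pos[i][0] += vel[i][0]
            pos[i][1] += vel[i][1]
        alpha *= 0.99
    return pos
```

With `friction=0.0`, every velocity is zeroed, so the nodes never leave their random start positions. Values closer to 1 let the layout keep more momentum between ticks.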

Forna Container (3)

Example Outputs
- Friction: 0.65, charge: -40, charge distance: 80
- Friction: 0.65, charge: -140, charge distance: 40
- Friction: 0.95, charge: -200, charge distance: 150

Metrics

Node Collisions & Link Collisions
- Backbone link overlaps
- Node overlaps
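
The slide does not define the collision metrics precisely. The sketch below follows one plausible reading, which is an assumption. A node collision is a pair of nodes whose circles overlap; `radius` is a hypothetical parameter. A link collision is a pair of links that cross at an interior point; links that share a node are skipped, since they always touch. To count only backbone-link overlaps, pass just the backbone links `(i, i + 1)`.

```python
import math


def node_collisions(pos, radius):
    """Count node pairs whose circles of the given radius overlap."""
    count = 0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            if math.dist(pos[i], pos[j]) < 2 * radius:
                count += 1
    return count


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): >0 left turn, <0 right turn."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly cross each other."""
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def link_collisions(pos, links):
    """Count crossing link pairs, ignoring links that share an endpoint."""
    count = 0
    for a in range(len(links)):
        for b in range(a + 1, len(links)):
            (i, j), (k, l) = links[a], links[b]
            if {i, j} & {k, l}:
                continue  # adjacent links meet at their shared node
            if segments_cross(pos[i], pos[j], pos[k], pos[l]):
                count += 1
    return count
```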

Link Length Deviation & Loop Roundness
Comparison of two layouts:
- Friction 0.95, charge -190, charge distance 150: link collisions 0, node collisions 0, link length deviation 6.5, loop roundness 0.77
- Friction 0.65, charge -40, charge distance 80: link collisions 0, node collisions 0, link length deviation 1.48, loop roundness 0.22
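
The slides do not say how the prototype computes link length deviation. The sketch below assumes it measures the spread of the drawn link lengths and offers two hypothetical variants. The first is the population standard deviation of the lengths. The second is the mean absolute deviation from a target link length.

```python
import math
import statistics


def link_length_deviation(pos, links):
    """Population standard deviation of the drawn link lengths (0 if there are no links)."""
    lengths = [math.dist(pos[i], pos[j]) for i, j in links]
    return statistics.pstdev(lengths) if lengths else 0.0


def mean_deviation_from_target(pos, links, target):
    """Mean absolute difference between each drawn link length and a target length."""
    return statistics.fmean(abs(math.dist(pos[i], pos[j]) - target) for i, j in links)
```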

Loop Roundness
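
Loop roundness is likewise only shown as a figure. The sketch below assumes, for illustration, that a loop is scored by the coefficient of variation of its nodes' distances to their centroid. Under this definition, 0 means perfectly circular and larger values mean a more distorted loop. This hypothetical definition need not match the prototype's, whose values (0.77 and 0.22 on the previous slide) may use a different scale or direction.

```python
import math
import statistics


def loop_roundness(points):
    """Coefficient of variation of the distances from the loop's nodes to their centroid."""
    cx = statistics.fmean(p[0] for p in points)
    cy = statistics.fmean(p[1] for p in points)
    radii = [math.dist(p, (cx, cy)) for p in points]
    mean_r = statistics.fmean(radii)
    return statistics.pstdev(radii) / mean_r if mean_r > 0 else 0.0
```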

Simulation

Simulation

Sampling
- Friction: [0.3; 0.95], step size 0.05: {0.3, 0.35, 0.4, …, 0.95} (14 values)
- Charge: [-30; -200], step size 10: {-30, -40, -50, …, -200} (18 values)
- Charge distance: [30; 150], step size 10: {30, 40, 50, …, 150} (13 values)
- Base set: BQ = friction × charge × charge distance (Cartesian product), |BQ| = 14 × 18 × 13 = 3276 combinations
- Random sampling: 1000 combinations picked at random from the base set
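
The base set and the random sample can both be reproduced with the standard library. This Python sketch is not the author's rsampling.php. It enumerates the Cartesian product of the three ranges above and draws 1000 distinct combinations without replacement. The friction values are rounded to avoid floating-point drift.

```python
import itertools
import random

# Parameter ranges from the slide.
FRICTION = [round(0.30 + 0.05 * i, 2) for i in range(14)]  # 0.3, 0.35, ..., 0.95
CHARGE = list(range(-30, -201, -10))                       # -30, -40, ..., -200
CHARGE_DISTANCE = list(range(30, 151, 10))                 # 30, 40, ..., 150


def full_grid():
    """All (friction, charge, charge_distance) combinations of the base set."""
    return list(itertools.product(FRICTION, CHARGE, CHARGE_DISTANCE))


def random_sample(k=1000, seed=None):
    """k distinct combinations drawn uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(full_grid(), k)
```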

Random Sampling: Why?
- Clutter reduction (visualization): 1000 instead of 3276 points per structure
- Runtime reduction (simulation):
  - Base set: 3276 × 100 × 5 s ≈ 19 days
  - Random sample: 1000 × 100 × 5 s ≈ 6 days
- File size reduction (simulation results)
- Literature: G. Ellis, Random Sampling as a Clutter Reduction Technique to Facilitate Interactive Visualisation of Large Datasets (2008)
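
The runtime figures follow from simple arithmetic with 86 400 s per day. The factor 100 is read here as the number of structures and 5 s as the time per layout run; both readings are interpretations of the slide.

```latex
\begin{align*}
t_{\text{base set}} &= 3276 \times 100 \times 5\,\text{s} = 1\,638\,000\,\text{s} \approx 18.96\,\text{days} \approx 19\,\text{days}\\
t_{\text{sample}}   &= 1000 \times 100 \times 5\,\text{s} = 500\,000\,\text{s} \approx 5.79\,\text{days} \approx 6\,\text{days}
\end{align*}
```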

Simulation
- 1000 parameter combinations per structure
- Currently 74 structures

Visualization Prototype

Advantages & Disadvantages

Advantages
- Optima can be determined quickly via the Pareto view
- Shows the distribution of the simulation results
- Helps understand how specific parameter combinations perform on different structure sizes
- Reveals trends in the input parameters
- Compares graph quality with respect to two metrics
- Parameter combinations can be tested and compared directly via side-by-side drawings of the structures
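
Finding the optima in a Pareto view amounts to keeping the non-dominated points. The minimal sketch below assumes both metrics are to be minimized, as with collision counts. The function name is hypothetical.

```python
def pareto_front(points):
    """Points not dominated by any other point, with both metrics minimized.

    q dominates p if q is no worse in both metrics and strictly better in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

For example, among `(1, 5)`, `(2, 3)`, `(3, 4)` and `(4, 1)`, only `(3, 4)` is dropped, because `(2, 3)` beats it in both metrics.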

Disadvantages
- Only two metrics in the visualization
- Only three input parameters in the visualization
- No multiple-point selection for easier comparison
- No statement on the stability of the algorithm
- Performance (load times)
- Combinations are difficult or impossible to identify in the input parameter scatter plot
- Red/green/blue color coding is problematic for color-blind users

Improvements
- Parallel coordinates instead of, or in addition to, the input parameter scatter plots
- Possibly color-code the points in the input parameter scatter plots
- Performance tuning (fine-tuning of caching)
- Draw highlighted points on top of the others
- Input parameter trends depending on structure size
- Two Pareto scatter plots side by side to compare outcomes for two different structure sizes

Thank you for your attention!

Any questions?

Simulation Workflow (1)
1. rsampling.php generates N random parameter combinations.
2. rstructuress.php generates N random RNA sequences of different lengths in FASTA format.
3. RNAfold from the ViennaRNA Package predicts their secondary structures.
4. startsimulation.sh runs the simulation.
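
Step 2 can be approximated in a few lines. The Python sketch below is a hypothetical stand-in for rstructuress.php, not its actual code; the length bounds and the `>seqN` headers are assumptions.

```python
import random


def random_fasta(n, min_len=20, max_len=200, seed=None):
    """Return n random RNA sequences of random length as FASTA text."""
    rng = random.Random(seed)
    records = []
    for k in range(n):
        length = rng.randint(min_len, max_len)  # inclusive bounds
        seq = "".join(rng.choice("ACGU") for _ in range(length))
        records.append(f">seq{k + 1}\n{seq}")
    return "\n".join(records) + "\n"
```

The resulting text could then be passed to RNAfold, for example `RNAfold --noPS < sequences.fa`, where `--noPS` suppresses the PostScript drawings. RNAfold appends the minimum free energy to each structure line, so that value would have to be stripped to obtain a plain sequence/structure pair.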

Simulation Workflow (2)
Input files:
- combinations.json
- structures.struct, e.g.:

CGCUUCAUAUAAUCCUAAUGACCUAU
((..((....)).(((....))).))
[…]

Command:
./startsimulation.sh combinations.json structures.struct simulationresults
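
If structures.struct simply alternates sequence and dot-bracket lines, as in the excerpt above, a loader can validate each record before the simulation starts. This is an illustrative sketch under that assumption; the slides do not show the actual file format or the parser used by startsimulation.sh.

```python
def read_struct_file(text):
    """Parse alternating sequence / dot-bracket lines into (sequence, structure) pairs."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2:
        raise ValueError("expected sequence/structure line pairs")
    records = []
    for seq, struct in zip(lines[0::2], lines[1::2]):
        if len(seq) != len(struct):
            raise ValueError(f"length mismatch for record starting {seq[:10]}")
        depth = 0
        for ch in struct:
            depth += {"(": 1, ")": -1}.get(ch, 0)
            if depth < 0:
                raise ValueError("unbalanced brackets")  # ')' before its '('
        if depth:
            raise ValueError("unbalanced brackets")      # unclosed '('
        records.append((seq, struct))
    return records
```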

Simulation Workflow (3)
Simulation output:
- File format: JSON
- One file per structure containing all iterations
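
The output side can be sketched as follows. The sketch assumes each iteration is a JSON-serializable dict such as `{"friction": ..., "charge": ..., "linkCollisions": ...}` (hypothetical field names) and that each file is named after a structure identifier. It illustrates the "one file per structure" layout, not the simulation's actual writer.

```python
import json
from pathlib import Path


def results_document(structure, iterations):
    """Serialize one structure together with all of its iterations as JSON text."""
    return json.dumps({"structure": structure, "iterations": iterations}, indent=2)


def write_results(out_dir, structure_id, structure, iterations):
    """Write <out_dir>/<structure_id>.json (one file per structure) and return its path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{structure_id}.json"
    path.write_text(results_document(structure, iterations), encoding="utf-8")
    return path
```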

Simulation Dataset
Structure of the dataset loaded in the simulation