Modelling proteins and proteomes using Linux clusters Ram Samudrala University of Washington.



Examples of biological problems
- Protein structure prediction/docking simulations: need to run different trajectories that sometimes talk with each other
- Molecular dynamics simulations: need more cohesive parallelisation
- Polarisable force fields: need true parallelisation
- Bioinformatics searches/exploration: trivially parallelisable
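
As an illustration of trajectories that "sometimes talk with each other", the sketch below lets two trajectories swap their current state over a socket and keep the lower-energy one. It is a minimal sketch under stated assumptions, not the group's actual code: the JSON message format, the `energy` field, and the helper names `send_msg`, `recv_msg`, and `exchange_best` are all hypothetical, and a local socket pair stands in for two CPUs.

```python
import json
import socket


def send_msg(sock, payload):
    """Send one JSON message terminated by a newline."""
    sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")


def recv_msg(sock):
    """Read bytes until a newline arrives, then decode one JSON message.

    Assumes at most one message is in flight per direction (true below).
    """
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return json.loads(buf.decode("utf-8"))


def exchange_best(sock, my_state):
    """Send our state, read the peer's, and return the lower-energy one."""
    send_msg(sock, my_state)
    peer_state = recv_msg(sock)
    return min(my_state, peer_state, key=lambda s: s["energy"])


# Two ends of a connected socket pair stand in for two CPUs.
a, b = socket.socketpair()
state_a = {"trajectory": "A", "energy": -12.5}  # hypothetical values
state_b = {"trajectory": "B", "energy": -17.0}

send_msg(b, state_b)                    # B posts its state
best_for_a = exchange_best(a, state_a)  # A sends its own and reads B's
state_from_a = recv_msg(b)              # B reads what A sent
a.close()
b.close()
```

On a real cluster the two ends would be TCP connections between nodes; the exchange logic would stay the same.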

Computational issues
- Need efficient methods to start/stop jobs
- Need a load-balancing queuing system
- Need fast communications at times
- Need stability (uptimes of months to years)
- Need low maintenance/management overhead
- Need low installation overhead
- Needs to be cheap!
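
The load-balancing requirement can be pictured with a small scheduler sketch. The slides do not say which queuing policy was used, so this is an assumption: a greedy "longest job first, least-loaded node next" rule. The job names and costs are hypothetical.

```python
import heapq


def assign_jobs(job_costs, n_nodes):
    """Greedily give each job (longest first) to the least-loaded node.

    job_costs maps a job name to an estimated run time (arbitrary units).
    Returns (assignment, loads), both keyed by node index.
    """
    heap = [(0.0, node) for node in range(n_nodes)]  # (current load, node)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(n_nodes)}
    # Sort by descending cost; break ties by name so the result is deterministic.
    for job, cost in sorted(job_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[node].append(job)
        heapq.heappush(heap, (load + cost, node))
    loads = {
        node: sum(job_costs[j] for j in jobs) for node, jobs in assignment.items()
    }
    return assignment, loads


# Hypothetical job mix: two folding runs, two docking runs, one search.
jobs = {"fold_1": 8.0, "fold_2": 7.0, "dock_1": 6.0, "dock_2": 5.0, "blast_1": 4.0}
assignment, loads = assign_jobs(jobs, n_nodes=2)
```

A greedy rule like this needs no central knowledge beyond per-node load, which suits the low-management-overhead goal, although it does not guarantee an optimal schedule.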

Hardware and operating system
- 256 AMD and Intel CPUs (1-2.5 GHz)
- GB RAM, GB HD, dual-processor motherboards
- 100 Mbps ethernet connectivity for 64 processor sets
- White boxes are good but use up space; 1U racks are ideal
- Minimal Linux installation: create a clone "CD" and copy it onto all machines

Our solution
- No single solution: users implement their own
- Completely decentralised
- Analyse the problem and determine the parallelisable parts
- Implementation specific to the problem
- Use local scratch space for computation
- Redundant storage of data for faster access
- Limit problem space to specific problems
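
The "local scratch space" point can be sketched as a stage-in, compute, stage-out pattern. This is a minimal sketch under stated assumptions: the function names `run_on_scratch` and `count_residues`, the `.out` suffix, and the directory layout are hypothetical, and a temporary directory stands in for both the shared filesystem and the node-local disk.

```python
import shutil
import tempfile
from pathlib import Path


def run_on_scratch(input_path, shared_out_dir, compute, scratch_root=None):
    """Copy the input to local scratch, compute there, copy the result back.

    compute(local_in, local_out) does the actual work on local files, so the
    shared filesystem is touched only once for input and once for output.
    """
    input_path = Path(input_path)
    shared_out_dir = Path(shared_out_dir)
    shared_out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        local_in = Path(scratch) / input_path.name
        shutil.copy2(input_path, local_in)                # stage in
        local_out = Path(scratch) / (input_path.stem + ".out")
        compute(local_in, local_out)                      # work on local disk
        final = shared_out_dir / local_out.name
        shutil.copy2(local_out, final)                    # stage out
    return final                                          # scratch is gone here


def count_residues(src, dst):
    """Stand-in computation: write the sequence length to the output file."""
    dst.write_text(str(len(src.read_text().strip())))


# Demo: a temporary directory plays the role of the shared filesystem.
shared = Path(tempfile.mkdtemp())
(shared / "query.seq").write_text("EFDVILKAAG\n")
result = run_on_scratch(shared / "query.seq", shared / "results", count_residues)
```

On a cluster, `scratch_root` would point at a node-local disk so that heavy intermediate I/O never crosses the network.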

Problem-specific implementation
- MCSA/GA: socket-based communication of trajectories; multiple trajectories on different CPUs
- Docking: sample different ligands/regions of the protein on different CPUs
- MD: pairwise force fields are additive
- PFF: ?
- Bioinformatics: trivial parallelisation; communication by disk
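
Because pairwise terms are additive, the list of atom pairs can be split across CPUs and the partial sums added afterwards. The sketch below shows only that property. The Lennard-Jones potential, the coordinates, and the round-robin split are assumptions chosen for illustration; the slide names no specific force field or partitioning scheme.

```python
import itertools
import math


def lj_pair_energy(p, q, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of one pair (a stand-in pairwise potential)."""
    r = math.dist(p, q)
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def partial_energy(coords, pairs):
    """Energy summed over the subset of pairs owned by one worker."""
    return sum(lj_pair_energy(coords[i], coords[j]) for i, j in pairs)


def split_pairs(n_atoms, n_workers):
    """Deal all unique atom pairs round-robin into n_workers chunks."""
    pairs = list(itertools.combinations(range(n_atoms), 2))
    return [pairs[w::n_workers] for w in range(n_workers)]


# Hypothetical coordinates for five atoms.
coords = [
    (0.0, 0.0, 0.0),
    (1.1, 0.0, 0.0),
    (0.0, 1.2, 0.0),
    (0.0, 0.0, 1.3),
    (1.0, 1.0, 1.0),
]
chunks = split_pairs(len(coords), n_workers=3)

# Each chunk could run on a different CPU; here they run one after another.
total_parallel = sum(partial_energy(coords, chunk) for chunk in chunks)
all_pairs = list(itertools.combinations(range(len(coords)), 2))
total_serial = partial_energy(coords, all_pairs)
```

The two totals agree up to floating-point rounding, which is why only a single number per worker has to be communicated at each step.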

Semi-exhaustive segment-based folding
EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
- Generate: fragments from database; 14-state φ, ψ model
- Minimise: Monte Carlo with simulated annealing; conformational space annealing, GA
- Filter: all-atom pairwise interactions, bad contacts; compactness, secondary structure
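
The minimisation step names Monte Carlo with simulated annealing. The sketch below is a generic Metropolis annealer over a discrete 14-state-per-residue representation, not the actual folding code: the toy energy function, the target conformation, the geometric cooling schedule, and all parameter values are assumptions made for illustration.

```python
import math
import random

N_STATES = 14  # matches "14-state" on the slide; the states here are placeholders


def toy_energy(conf, target):
    """Hypothetical energy: squared distance from an arbitrary target."""
    return sum((a - b) ** 2 for a, b in zip(conf, target))


def anneal(target, steps=5000, t_start=10.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    conf = [rng.randrange(N_STATES) for _ in target]
    energy = toy_energy(conf, target)
    best_conf, best_energy = list(conf), energy
    for step in range(steps):
        # Temperature falls smoothly from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        pos = rng.randrange(len(conf))
        old_state = conf[pos]
        conf[pos] = rng.randrange(N_STATES)  # propose a single-residue move
        new_energy = toy_energy(conf, target)
        delta = new_energy - energy
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_conf, best_energy = list(conf), energy
        else:
            conf[pos] = old_state  # reject: restore the previous state
    return best_conf, best_energy


target = [3, 7, 1, 12, 5, 9, 0, 13]  # arbitrary toy target
best_conf, best_energy = anneal(target)
```

Independent runs with different seeds are what make this step easy to spread over many CPUs: each trajectory needs no information from the others unless an exchange is wanted.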

Ab initio prediction at CASP
T170/sfrp3: 4.8 Å for all 69 aa

Comparative modelling at CASP
T182: 1.0 Å (249 aa; 41% id)
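
The Å figures on the CASP slides are not labelled. Assuming they are root-mean-square deviations between corresponding atoms of the model and the experimental structure (the usual convention, but an assumption here), the measure can be sketched as follows. The sketch assumes the two structures are already superposed and uses made-up coordinates.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation between two pre-superposed coordinate sets."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, reference))
    return math.sqrt(squared / len(model))


# Made-up coordinates: every model atom sits 1 Å from its reference atom.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (3.8, 1.0, 0.0), (7.6, 0.0, -1.0)]
deviation = rmsd(model, reference)
```

A full evaluation would first find the optimal rigid-body superposition of the model onto the reference, which this sketch deliberately leaves out.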

Prediction of SARS CoV proteinase inhibitors
Ekachai Jenwitheesuk

Bioverse: S. typhimurium protein-protein interaction network
Jason McDermott

Bioverse: H. sapiens protein-protein interaction network
Jason McDermott

Future directions
- Network connection with multiple ethernet cards based on traffic analysis
- Gigabit ethernet (switches are still expensive)
- Better network filesystems