Download presentation
Presentation is loading. Please wait.
1
Cloud Computing for Chemical Property Prediction Paul Watson School of Computing Science Newcastle University, UK Paul.Watson@ncl.ac.uk Microsoft Cloud Futures Conference, Redmond, 8 th April 2010 The team: David Leahy, Jacek Cala, Hugo Hiden, Dominic Searson, Vladimir Sykora, Martyn Taylor, Simon Woodman With thanks to: -Microsoft External Research for their financial support for Project Junior -Christophe Poulain, Savas Parastatidis
2
Chemists want to know: Q1. What are the properties of this molecule? Toxicity Solubility Biological Activity Q2. What molecule would have aqueous solubility of 0.1 μg/mL?
3
Answering the Question by performing experiments..... time consuming, expensive, ethical Issues
4
Quantitative Structure Activity Relationship - predict properties based on similar molecules An alternative to experimentation: QSAR Activity ≈ f( ) quantifiable structural attributes, e.g. #atoms logp shape.....
5
New/ Improved Models New Data or Model-Builders Data Model- Builders Model Generation Models Generating the models - Discovery Bus (Leahy et al) www.openqsar.com
6
Selected Descriptors + Responses Separate Training & Test Data Chemical Structures & their Activities Calculate Descriptors from Structures Training Data Combine Descriptors Descriptors + Responses Filter Descriptors Combined Descriptors + Responses Multiple Linear Regression Neural Network Partial Least Squares Classification Trees..... Build & Test Models Independently Select Best Models Add to Model Database Test Data
7
Increasing amounts of data for model building... CHEMBL : data on 622,824 compounds, collected from 33,956 publications WOMBAT-PK:data on 1230 compounds, for over 13,000 clinical measurements WOMBAT :data on 251,560 structures, for over 1,966 targets All contain structure information & numerical activity data More models Better models Computationally expensive: 5 years for new datasets on existing server
8
JUNIOR Project Aim Use Azure to generate models in weeks not years.... using as much of the available data as possible.... make models available on www.openqsar.com... so that researchers can generate predictions for their own molecules
9
Selected Descriptors + Responses Separate Training & Test Data Chemical Structures & their Activities Calculate Descriptors from Structures Training Data Combine Descriptors Descriptors + Responses Filter Descriptors Combined Descriptors + Responses Multiple Linear Regression Neural Network Partial Least Squares Classification Trees..... Build & Test Models Independently Select Best Models Add to Model Database Test Data Potential for concurrency...
10
Approach avoid rewriting all existing Discovery Bus software move existing Discovery Bus to Amazon Cloud – without parallelisation move critical tasks to run concurrently on Azure base solution around e-Science Central....
11
Clouds to the rescue? Building scalable, dependable, science applications is still hard..... e-Science Central – Science Cloud Platform
12
Science Cloud Options Cloud Infrastructure: Storage & Compute Science Cloud Platform Science App 1 Science App 1.... Science App n Science App n Users Cloud Infrastructure: Storage & Compute Cloud Infrastructure: Storage & Compute Science App 1 Science App 1.... Science App n Science App n Users
13
Cloud Infrastructure: Storage & Compute Science Cloud Platform Science App 1 Science App 1.... Science App n Science App n Users e-Science Central Science Cloud Platform for developers Science as a Service for users
14
What should the Science Cloud Platform Include? Store Analyse Automate Share Data (instruments, experimental data, sensors...) Identify the common needs of our e-Science Users
15
Science Cloud Platform App.... Workflow Enactment App API Social Networking Security Processing e-Science Central Storage App Analysis Services Cloud Infrastructure Provenance Metadata
19
Workflow Enactment App API Social Networking Security Processing e-Science Central Storage Analysis Services Azure Provenance Metadata Discovery Bus Planner Amazon
20
Discovery Bus invokes e- Science Central Workflow via API Workflow decomposed to Message Plan 1 1 2 2 3 3 Temporary workflow storage assigned, Message Plan queued for execution. Workflow temporary storage 4 4 Messages sent in sequence Call Message Response Message Internal Service Azure Service NFS HTTP Call Message Response Message RMI / JMS HTTP Post 5 5 Results data stored in e- Science Central folder Workflow Execution Completes 5 5 Discovery Bus notified with results Message Plan
21
Task Web Node Worker Node Queue e-Science Central Azure Blob Storage Results
22
Running across up to 100 Azure nodes Azure utilisation increasing (average ~60% over runs) Moving more admin tasks to Azure / e-Science Central – model validation – co-ordination Current Status – Work in Progress
23
CPU Utilization
24
Summary Discovery Bus exemplifies a good Cloud pattern –large, variable, bursty requirements –proposal to apply to software verification clouds do NOT make it easier to build complex, scalable, dependable distributed systems –we need higher-level “Science Cloud Platforms” –e-Science Central is our attempt at this using the Azure Cloud we have a scalable system that can handle large new datasets the models are being made freely available –www.openqsar.com
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.