Serono Science Scientific computing and high performance applications

Research Computing in Serono
Hardware environment
High performance computing applications

Drug development pipeline
Pipeline stages: Target discovery, Target validation, Screening + H2L, PCD, Phase I/II, Phase III/IV, Marketing
Sciences: Proteomics, Genomics, Chemistry, Human genetics, Biostatistics, Transcriptomics, Mouse genetics, Cell biology, Pharmacology
Technologies: WGS, microarrays, protein arrays, siRNA, Caliper, combichem, MS, TaqMan, imaging, cellomics, X-ray, NMR, Y2H, HTS
Data types: genomes, transcript map, SNPs, protein structure, patient data, protein map, interactions, screening data, phenotype, images, haplotype, pathways, structure/activity

Research Computing Vision & Missions
Vision: Use in-silico technologies to help identify and progress therapeutic proteins and small molecules that will successfully feed the development pipeline.
By:
Research Knowledge Management: delivering across Research an integrated information environment that puts scientific data, information and knowledge at the scientist's fingertips.
Computational Life Science: developing cutting-edge scientific applications enabling in-silico drug discovery and driving Serono's competitive advantage.
Advanced Data Analysis: providing advanced and pervasive data analysis competencies to make sense of high-throughput and complex data.
Research IT environment: providing the computing and communication infrastructure to deliver the vision.

Research computing activities
1. Data processing
2. Predictions and simulation
3. Data analysis
4. Data management
Figure axis labels: Technology driven, Interpretation

Advanced Data Analysis - Issues
Analysis cannot be performed in silos: we need information systems able to correlate data from all sorts of experiments (genome scans, DGE, RNAi, cell assays, proteomics, interactions, phenotypes).
Figure: amount of data versus data complexity over time (axis ticks 2000, 2002, 2004; periods labelled 2001-3 and 2004-7). Items shown:
High-content cell assays, genomic sequence, QSAR
High-density microarrays as a discovery platform
Genome scan data: 100,000 SNPs, hundreds of patients, several diseases
Biomarker identification through proteomics and transcriptomics
Compendium, Virtual Combinatorial Library
Multidimensional decision making
Microarrays: complex data (time series, complex tissue deconvolution, disease models, full transcriptome)

Grid for the life sciences – differences
                                Physics         Biology
Theory                          « complete »    Nonexistent or imprecise
Level of abstraction (model)    Single          Multiple
Volume                          Very high       Low-medium
Data complexity                 Low

User-friendly interfaces (Web based)
Diagram labels:
Generic end-user access
In-silico generation tools, e.g. text mining, data analysis
Corporate database
Core integrated Oracle-based systems
Drill-down
E-notebook
Publish
LIMS QC (three instances)
Specialized, complex power-user interfaces

HPC Hardware environment
Systems:
SGI Origin proc (cc-numa), IRIX, 128 GB
SGI Altix 3700 BX2, 16 proc
TimeLogic DeCypher FPGA bioinformatics accelerator x 4
SGI Origin proc
Linux Xeon cluster, 50 proc
10 TB CXFS SAN (Geneva only)
Uses:
Computational chemistry (docking, combichem, compendium, pharmacophore, structure resolution)
Bioinformatics (public domain tools, sequence databases, peptide identification, in-silico modeling)
BLAST, SW, profiles
Same as above, Boston, Paris
Distributed data storage

High performance computing applications in Serono today
Large-scale sequence-to-sequence comparisons
Genome-wide analysis (microRNA, focused gene prediction, gw profiles, etc.)
Sequence database monitoring
Gene index and data mapping
Large-scale proteomics (peptide identification)
Virtual screening
In-silico biology

Smart is better than More
Virtual Combinatorial Database:
In combinatorial chemistry design, one scaffold and 4 groups of 800 reagents each generate a library of 320 billion virtual compounds
An enumerated substructure search would take years of CPU time and 1 petabyte of storage
A proprietary non-enumerated search retrieves hits in just a few seconds
Virtual Screening:
Fast pre-filtering of compounds reduces the number of compounds sent to time-consuming docking studies
Useful for new compound acquisition and for targets with a known protein structure, not for primary screens (replating)
Usual size of virtual screens in Serono: ~1000 compounds
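The gap between enumerated and non-enumerated search can be illustrated with a little arithmetic. The sketch below is not Serono's proprietary method, which the slides do not describe. It is a toy model, assuming the query constrains a single reagent position, so that reagent groups can be tested independently and hit counts multiplied out instead of building every product. The reagent names, the `count_hits_non_enumerated` helper and the predicate are all hypothetical. Note that exactly 800 reagents in each of 4 groups gives 800^4, about 410 billion products, the same order of magnitude as the 320 billion quoted on the slide, whose exact group sizes are presumably only approximately 800.

```python
from math import prod

def library_size(group_sizes):
    """Number of virtual products for one scaffold: the product of the group sizes."""
    return prod(group_sizes)

def count_hits_non_enumerated(groups, position, predicate):
    """Count products whose reagent at `position` satisfies `predicate`,
    testing each reagent once instead of enumerating every product."""
    matching = sum(1 for reagent in groups[position] if predicate(reagent))
    others = prod(len(g) for i, g in enumerate(groups) if i != position)
    return matching * others

# Hypothetical reagent identifiers: 4 groups of 800 reagents each.
groups = [[f"R{g}_{i}" for i in range(800)] for g in range(4)]
sizes = [len(g) for g in groups]

total = library_size(sizes)   # products an enumerated search would visit
checks = sum(sizes)           # reagent tests the non-enumerated count needs
hits = count_hits_non_enumerated(groups, 0, lambda r: r.endswith("_7"))

# Storage implied by the slide's own figures: 1 petabyte for 320 billion compounds.
bytes_per_compound = 10**15 // (320 * 10**9)

print(f"enumerated products: {total:,}")
print(f"reagent checks:      {checks:,}")
print(f"matching products:   {hits:,}")
print(f"bytes per compound:  {bytes_per_compound:,}")
```

The point of the sketch is the scaling: the enumerated approach grows with the product of the group sizes, while the per-group test grows with their sum.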

Future grid applications
Large-scale in-silico modeling
Protein-protein interaction
QM-based, dynamic virtual screening
Data grids
Imaging

Past grid evaluations (corporate PC idle cycles)
High deployment costs (IT resources)
Concern about availability of PC resources (habits and procedures)
Foreseen replacement of desktops by even less available laptops
Software must be modified to run effectively on the grid
Previous studies show that a large corporate grid of 1000 desktops is not more efficient than a 64-processor dedicated cluster (Novartis)
The in-house idle-cycle grid model is not efficient
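The comparison of 1000 desktops with a 64-processor cluster can be turned into a per-machine figure. The sketch below is only a back-of-the-envelope reading of the numbers quoted above, assuming one processor per desktop and comparable processor speed on both sides. Neither assumption is stated in the slides, and the helper name is hypothetical.

```python
def per_node_equivalent(dedicated_procs: int, grid_nodes: int) -> float:
    """Fraction of one dedicated processor each grid node effectively delivers
    if the whole grid only matches the dedicated cluster's throughput."""
    return dedicated_procs / grid_nodes

# Figures quoted on the slide: 1000 desktops versus a 64-processor cluster.
ratio = per_node_equivalent(64, 1000)
print(f"each desktop contributes at most about {ratio:.1%} of a dedicated processor")
```

Under these assumptions each desktop delivers at most a few percent of a dedicated processor, which is consistent with the slide's conclusion that idle-cycle harvesting does not repay its deployment cost.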

Issues in the pharma industry
IP considerations
Competitive intelligence
Security policies
Obsession with proprietary data and know-how
Is the current « all in-house » model sustainable?
Distributed (grid-enabled) public domain bioinformatics services will become pervasive anyway and will supersede the capabilities available in-house
