Serono Science Scientific computing and high performance applications

Research Computing in Serono
Hardware environment
High performance computing applications

Drug development pipeline
Stages: Target discovery, Target validation, Screening + H2L, PCD, Phase I/II, Phase III/IV, Marketing
Sciences: Proteomics, Genomics, Chemistry, Human Genetics, Biostatistics, Transcriptomics, Mouse genetics, Cell biology, Pharmacology
Technologies: WGS, uArrays, Protein arrays, siRNA, caliper, combichem, MS, Taqman, imaging, cellomics, x-ray, NMR, Y2H, HTS
Data types: genomes, Transcript map, SNPs, Protein structure, Patient data, protein map, interactions, screening data, phenotype, images, haplotype, pathways, Structure/activity

Research Computing Vision & Missions
Use in-silico technologies to help identify and progress therapeutic proteins and small molecules that will successfully feed the development pipeline
By:
Research Knowledge Management: Delivering across Research an integrated information environment that puts scientific data, information and knowledge at the scientist’s fingertips
Computational Life Science: Developing cutting-edge scientific applications enabling in-silico drug discovery and driving Serono’s competitive advantage
Advanced Data Analysis: Providing advanced and pervasive data analysis competencies to make sense of high-throughput and complex data
Research IT environment: Providing the computing and communication infrastructure to deliver the vision

Research computing activities
1. Data processing
Technology driven
2. Predictions and simulation
3. Data analysis
Interpretation
4. Data management

Advanced Data Analysis - Issues
Data complexity
Amount of data
Analysis cannot be performed in silos: we need information systems able to correlate data from all sorts of experimental sources (genome scans, DGE, RNAi, cell assays, proteomics, interactions, phenotypes)
Timeline labels: 2000, 2002, 2004, 2001-3, 2004-7
High content cell assays, genomic sequence, QSAR
High density microarrays as a discovery platform
Genome scan data: 100,000 SNPs, hundreds of patients, several diseases
Biomarker identification through proteomics and transcriptomics
Compendium, Virtual Combinatorial Library
Multidimensional decision making
Microarrays: complex data (time series, complex tissue deconvolution, disease models, full transcriptome)

Grid for the life sciences: differences

                                Physics        Biology
Theory                          « complete »   Inexistent or imprecise
Level of abstraction (model)    Single         Multiple
Volume                          Very high      Low-medium
Data complexity                 Low

User-friendly interfaces (Web based)
Generic End-user Access
in silico generation tools, e.g. Text mining, Data Analysis
Corporate Database
Core Integrated Oracle-Based Systems
Drill-down
E-notebook
Publish
LIMS QC, LIMS QC, LIMS QC
Specialized, complex power-user interfaces

HPC Hardware environment
Systems:
SGI Origin 3900, 64 proc (cc-NUMA), IRIX, 128 GB
SGI Altix 3700 BX2, 16 proc
Timelogic Decypher FPGA bioinformatics accelerator x 4
SGI Origin 3900, 32 proc
Linux Xeon cluster, 50 proc
10 TB CXFS SAN (Geneva only)
Uses:
Computational chemistry (docking, combichem, compendium, pharmacophore, structure resolution)
Bioinformatics (public domain tools, sequence databases, peptide identification, in-silico modeling)
Blast, SW, profiles
Same as above, Boston, Paris
Distributed data storage

High performance computing applications in Serono today
Large scale sequence to sequence comparisons
Genome wide analysis (microRNA, focused gene prediction, gw profiles, etc.)
Sequence database monitoring
Gene index and data mapping
Large scale proteomics (peptide identification)
Virtual screening
In-silico biology

Smart is better than More
Virtual Combinatorial Database
In combinatorial chemistry design, one scaffold and 4 groups of 800 reagents each generate a library of 320 billion virtual compounds
An enumerated substructure search would take years of CPU time and 1 petabyte of storage
A proprietary non-enumerated search retrieves hits in just a few seconds
Virtual Screening
Fast pre-filtering reduces the number of compounds passed to time-consuming docking studies
Useful for new compound acquisition and for targets with a known protein structure, not for primary screening (replating)
Usual size of virtual screens in Serono: ~1000 compounds
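The arithmetic behind this slide can be sketched in a few lines of Python. This is a minimal illustration, not Serono's proprietary search: it assumes exactly 800 reagents in every group (which gives 409.6 billion combinations, the same order of magnitude as the slide's 320 billion; the exact group sizes are not given), and it assumes a query that can be tested on each reagent position independently. The function names, the integer reagent IDs and the even-ID predicate are all hypothetical stand-ins.

```python
from math import prod

def library_size(group_sizes):
    """Number of virtual compounds: one reagent chosen from each group."""
    return prod(group_sizes)

def per_position_hits(groups, predicate):
    """Filter each reagent group on its own, without enumerating products."""
    return [[r for r in group if predicate(r)] for group in groups]

# Assumption: exactly 800 reagents in each of 4 groups, as the slide states.
sizes = [800, 800, 800, 800]
total = library_size(sizes)        # enumerated library: 800**4 compounds
reagents_examined = sum(sizes)     # non-enumerated view: 4 * 800 reagents

# Storage per compound implied by the slide's own figures
# (1 petabyte for roughly 320 billion enumerated compounds).
PETABYTE = 10**15
bytes_per_compound = PETABYTE / 320e9

# Toy per-position filter: reagents are hypothetical integer IDs, and the
# predicate (even IDs only) stands in for a real substructure test.
groups = [list(range(n)) for n in sizes]
kept = per_position_hits(groups, lambda r: r % 2 == 0)
hits = library_size(len(k) for k in kept)

print(f"enumerated compounds: {total:,}")
print(f"reagents examined:    {reagents_examined:,}")
print(f"bytes per compound:   {bytes_per_compound:,.0f}")
print(f"matching compounds:   {hits:,}")
```

The point of the sketch is the gap between the two counts: the filter touches a few thousand reagents, yet it identifies the matching subset of a library with hundreds of billions of members. Queries that span several reagent positions do not decompose this cleanly and would need a more elaborate scheme.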

Future grid applications
Large scale in-silico modeling
Protein-protein interaction
QM-based, dynamic virtual screening
Data grids
Imaging

Past grid evaluations (corporate PC idle cycles)
High deployment costs: IT resources
Concern about availability of PC resources: habits and procedures
Foreseen replacement of desktops by even less available laptops
Modification of software to run effectively on the grid
Previous studies show that a large corporate grid of 1000 desktops is not more efficient than a 64 proc dedicated cluster (Novartis)
The in-house idle-cycle grid model is not efficient
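The 1000-desktop versus 64-processor comparison can be read as a break-even calculation. The sketch below is an illustration under stated assumptions, not a reconstruction of the cited Novartis study: it assumes a desktop processor is as fast as a cluster processor, and the availability and efficiency values are hypothetical placeholders rather than measurements.

```python
def break_even_fraction(dedicated_procs, desktops):
    """Average fraction of a dedicated processor that each desktop must
    deliver for the desktop grid to match the cluster's throughput."""
    return dedicated_procs / desktops

def effective_procs(desktops, availability, efficiency):
    """Dedicated-processor equivalents delivered by an idle-cycle grid."""
    return desktops * availability * efficiency

# Figures from the slide: 64 dedicated processors versus 1000 desktops.
fraction = break_even_fraction(64, 1000)

# Hypothetical values for illustration only: desktops usable 20% of the
# time, and 30% efficiency once scheduling and data transfer are paid for.
example = effective_procs(1000, availability=0.20, efficiency=0.30)

print(f"break-even share per desktop: {fraction:.1%}")
print(f"example grid throughput: about {example:.0f} processor equivalents")
```

Under these assumptions, the grid only matches the cluster if every desktop contributes, on average, more than about 6.4% of a dedicated processor after all overheads.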

Issues in the pharma industry
IP considerations
Competitive intelligence
Security policies
Obsession with proprietary data and know-how
Is the current model of « all in-house » sustainable?
Distributed (grid-enabled) public domain bioinformatics services will become pervasive anyway and will supersede the capabilities available in-house