4th Annual EPSRC e-Science meeting. The need for e-Science: An industrial perspective. Stephen Calvert – VP Cheminformatics, GSK; Yike Guo – Imperial College.

Presentation transcript:


What is the “industrial” world like?
Historically
– Low volume
  cmpds/yr/chemist: 10,000s assay wells/yr
– Low information diversity
  scientists generally dealt with limited types of data
– Reductionist approach
  limited information per experiment
– Interpretation critical for the next step
Scientists required:
– simple systems to assist in information monitoring
– decision making resides with the scientist

What is the “industrial” world like?
What happened in the last 5 years?
– “industrialisation”: application of “principles of industrialisation” to drug discovery
  high volume
– 10,000 cmpd/yr/chemist; 100+ million wells/yr
– biology revolution
  Human genome
– “system biology”: holistic view and interpretation
– high content data: images
– multiple result types from each experiment: bio-markers, pathways
– knowledge integration
  scientific discipline integration
– scientists required:
  complex systems, algorithms, statistics…
  decision making shared between systems and scientists
“Informatics” essential: partnership not service

How have we (IT) tackled the transition?
Business as usual
– problem centric view
  build applications
  integrate applications
Educate scientists in the realms of IT
– “Now I need to be an IT expert alongside chemistry, biology, genetics, robotics, engineering…”
– interesting time scale: generations
Technology is our saviour!
– client server, web services, Java, C#, CORBA, OO programming, extreme programming, grid computing, …

What are the results?
“islands” of process & data
– complex integration problem
  “spaghetti” joins our worlds - unsustainable - cost control with “IT”
– mismatch in cycle time to change
– engineered out serendipity
– service role reversed
Diagram labels: chemistry; infrastructure; samples; screening; “library” design; data

How could we do it differently?
result in:
– handing control of science back to the scientist
– match cycle times to change
– simplify
how can we merge the 2 worlds?
– physical, information

Doodling in knowledge and experiment space
no predefined steps
capture what was done
don’t restrict what can be done
don’t restrict the non-obvious
Diagram labels: Information Resources; IC50 Assay; Exclusion Lists; Structure Validation; Other Assay...
Q: are these results real?
Q: what do I know about these compounds?
Q: what other data can I acquire?
this is workflow, isn’t it?
physical & information worlds merge
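The idea of having no predefined steps while still capturing what was done can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Session` class, the step names, the compound identifiers and the IC50 values are hypothetical and do not come from the presentation or from Discovery Net.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Session:
    """Hypothetical helper that records every ad hoc step applied to the data."""
    value: Any
    history: List[Tuple[str, Any]] = field(default_factory=list)

    def apply(self, name: str, step: Callable[[Any], Any]) -> "Session":
        # Any callable is accepted, so the order of steps is never predefined.
        self.value = step(self.value)
        # Keep the step name and its result so the path can be reviewed later.
        self.history.append((name, self.value))
        return self


# Made-up assay readings: compound id -> IC50 in micromolar.
ic50 = {"cmpd-1": 0.4, "cmpd-2": 12.0, "cmpd-3": 0.9}
exclusion_list = {"cmpd-3"}

session = Session(ic50)
session.apply("potent only", lambda d: {k: v for k, v in d.items() if v < 1.0})
session.apply("drop excluded", lambda d: {k: v for k, v in d.items() if k not in exclusion_list})

# The recorded step names describe how the scientist reached the result.
steps_taken = [name for name, _ in session.history]
```

The scientist chooses the steps freely, and the history doubles as a record that could later be replayed or shared.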

Doodling in knowledge & experiment space
Need access to world-class scientific algorithms and tools
Need access to disparate data sources from multiple locations
Intuitive & flexible GUI design/analysis
Framework needs to be very generic
Ability to construct a “just-in-time” application
Need to serve the requirements of a varied user community
– both in terms of scientific and technical know-how
Capture and dissemination of “best practice” within a creative environment to enhance efficiency company wide

Discovery Net Overview
Funding:
– One of the Eight UK National e-Science Projects (£2.4M)
Key Features:
– Allow Scientists to Construct, Share and Execute Complex Knowledge Discovery Processes & Services
– Allow Institutions to Manage and Utilise the Compositional Services as their Intellectual Property
Applications:
– Life Science
– Environmental Modelling
– Geo-hazard Prediction
Achievement:
– For the First Time, Discovery Net Realises the Dynamic Construction of Compositional Services on GRID for Real Time Knowledge Discovery and Decision Making
Goal: Constructing the World’s First Infrastructure for Global Wide Knowledge Discovery on the Grid of Web Services
Diagram labels: Using GRID Resources; Scientific Information; Scientific Discovery in Real Time; Literature; Databases; Operational Data; Images; Instrument Data; Real Time Data Integration; Dynamic Application Integration; Discovery Services; Process Knowledge Management; Workflow = Compositional Service

Enterprise Wide Integrative Scientific Decision Making Platform with Discovery Net Workflow
Constructing a ubiquitous workflow: by scientists
– Integrate information resources/software applications cross-domain
– Support innovation and capture the best practice of your scientific research
Warehousing workflows: for scientists
– Manage discovery processes within an organisation
– Construct an enterprise process knowledge bank
Deployment workflow: to scientists
– Turn workflows into reusable applications/services
– Turn every scientist into a solution builder
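The three stages on this slide (construct, warehouse, deploy) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Discovery Net implementation: `Workflow`, `warehouse`, `publish`, `deploy` and the two example steps are hypothetical names, and a plain callable stands in for a deployed service.

```python
from typing import Any, Callable, Dict, List

Step = Callable[[Any], Any]


class Workflow:
    """Hypothetical named chain of steps; each step's output feeds the next."""

    def __init__(self, name: str, steps: List[Step]) -> None:
        self.name = name
        self.steps = list(steps)

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step(data)
        return data


# Warehousing: a shared store of workflows, keyed by name (an in-memory dict here).
warehouse: Dict[str, Workflow] = {}


def publish(workflow: Workflow) -> None:
    """Store a workflow so that other users can find and reuse it."""
    warehouse[workflow.name] = workflow


def deploy(name: str) -> Callable[[Any], Any]:
    """Expose a stored workflow as a plain callable, a stand-in for a service."""
    return warehouse[name].run


# Construction: made-up steps standing in for real scientific tools.
def normalise(values: List[float]) -> List[float]:
    top = max(values)
    return [v / top for v in values]


def keep_hits(values: List[float]) -> List[float]:
    return [v for v in values if v >= 0.5]


publish(Workflow("hit-selection", [normalise, keep_hits]))
select_hits = deploy("hit-selection")
result = select_hits([2.0, 8.0, 4.0])
```

A user of `select_hits` needs no knowledge of the steps inside it, which is the sense in which a workflow becomes a reusable application.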

An Integrative Analysis Example: Interactive Scientific Discovery with Workflow
Figure panel labels: Relational data mining; Text mining; Spectrum data mining; Chemical sequence data model; Chemical data model; Visualizing relational data clusters; Visualizing multidimensional data; Visualizing sequence data; Visualizing pathway data; Text mining visualization; Visualizing cluster statistics; Visualizing serial/spectrum data; Decision tree model of metabonomic profile; Chemical structure visualization

Discovery Net Commercialisation
Discovery Net Research
– CS: Workflow for Informatics on SOA
– Sensor: Sensor Data Processing and Mining
– Application: Life, Environmental and Geo-physical Sciences
Diagram labels: DeltaDot; Research; Commercialisation (Imperial College Spin Out Companies); Workflow technology; HT sensor processing; KDE Informatics Platform; Label Free HT bioSensors; Life Science Industry

library design - GSK
Process of selecting the molecules I want to make from the universe of molecules
Toolbox: scientific models, chemical handling, chemical properties, data access, statistics, data visualisation, …
Scientists can doodle in chemical space
– Capture how scientists made decisions
New algorithms, data sources added in < 1 hour

The 2003 SARS outbreak
KDE Example 2: SARS Genome Annotation
China SARS Virtual Lab based on Discovery Net
D-Net: Integration, interpretation, and discovery
Diagram labels: Relationship between SARS and other virus; Mutual regions identification; Homology search against viral genome DB; Annotation using Artemis and GenSense; Gene prediction; Phylogenetic analysis; Exon prediction; Splice site prediction; Immunogenetics; Multiple sequence alignment; Microarray analysis; Bibliographic databases; Key word search; GeneSense Ontology; Epidemiological analysis; Predicted genes; SARS patients diagnosis; Homology search against protein DB; Homology search against motif DB; Protein localization site prediction; Protein interaction prediction; Relationship between SARS virus and human receptors prediction; Classification and secondary structure prediction; Genbank
Achievement: Dynamic Construction of Compositional Services:
– Rapid construction of applications via composition of existing web services using workflow.
– Instant deployment of analytical workflows as new web services with resource mapping.
– Integrated workflow, provenance and service management
– Collaborative construction of workflows by large numbers of researchers
Requirements:
– Rapid construction and sharing of mission-critical discovery services
– Integration of diverse bioinformatics applications
– Support collaborative research between geographically distributed researchers
– Deploying services as easy-to-use tools for real time decision making

Compositional Services for SARS Mutation Analysis
– 50 data resources
– > 200 software applications and services
Designed on top of the web service environment
Used by more than 200 scientists
Result published in >

Future Challenge: GSK-InforSense & IC e-Science Collaboration
Workflow Fusion: Applying advanced performance programming technology for dynamic optimization of workflow execution
Workflow Abstraction: Investigating abstraction mechanisms for building workflow hierarchy and higher order composition forms
Dynamic Service Composition: Investigating service ontology for dynamically composing services with workflow
Workflow Metadata Model: Building up a generic metadata model for scientific workflow management and workflow warehousing
Man-machine interface: free scientists from IT speak

How can you help?
encourage focused research on the key issues SCIENTISTS face in industry
catalyse joint work in these focused fields between academics, industry and commercial software vendors
facilitate solution-oriented communication between computer scientists and domain scientists in both academia and industry

e-Science
A politician’s view:
‘[The e-Science platform] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’ Tony Blair
A scientist’s view:
[The e-Science platform] should help me to do my scientific research free from the complexity of IT