From Bio-Informatics towards e-BioScience. L.O. (Bob) Hertzberger, Computer Architecture and Parallel Systems Group, Department of Computer Science, Universiteit van Amsterdam.

Presentation transcript:

From Bio-Informatics towards e-BioScience L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit van Amsterdam

Background information: experimental sciences. There is a tendency to look ever deeper into matter (e.g. physics), the universe (e.g. astronomy), and life (e.g. life sciences). The instrumental consequences are increases in detector resolution and sensitivity, and in automation and robotization. Experiments therefore change in nature and become increasingly complex.

Impact in the life sciences: impact of high-throughput methods, e.g. omics experimentation (genome ===> genomics).

New technologies in life sciences research (University of Amsterdam). [Figure: within the cell, DNA, RNA, protein and metabolites are studied through genomics, transcriptomics, proteomics and metabolomics respectively; axis label: methodology/technology.]

Omics impact

Impact in the life sciences: impact of high-throughput methods, e.g. omics experimentation (genome ===> genomics). Instrumentation used in omics experimentation: transcriptomics via, among others, micro-arrays; proteomics via, among others, mass spectroscopy (MS); metabolomics via, among others, MS and nuclear magnetic resonance (NMR).

Results in a paradigm shift in the life sciences. Past experiments were hypothesis driven: evaluate a hypothesis and complement existing knowledge. Present experiments are data driven: discover knowledge from large amounts of data.

Life sciences research: from gene to function. [Figure: in the cell nucleus, genes (DNA) are expressed by RNA synthesis; mRNA is translated by protein synthesis into proteins with functions 1 to n. Labels: whole-genome sequence projects; genome-wide micro-array analysis; “high-throughput” protein analysis.] Protein function: prediction by bioinformatics, proof by laboratory research.

Developments towards bio-informatics & e-Science: Experiments become increasingly complex, driven by the increase in detector developments. This results in an increase in the amount and complexity of data. Something has to be done to harness this development: bio-informatics, to translate data into useful biological, medical, pharmaceutical & agricultural knowledge.

The what of bioinformatics: Bioinformatics is redefining rules and scientific approaches, resulting in the ‘new biology’. Within this new paradigm the traditional scientific boundaries are blurred, leaving no clear line between ‘dry or computational’ and ‘wet-based’ approaches.

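Role of bioinformatics. [Figure: in the cell, DNA, RNA, protein and metabolites are studied through genomics, transcriptomics, proteomics and metabolomics, leading to integrative/systems biology. Bioinformatics is shown as the methodology across these, with three roles: data generation/validation, data integration/fusion, and data usage/user interfacing.]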

Two sides of bioinformatics: (1) the scientific responsibility to develop the underlying computational concepts and models to convert complex biological data into useful biological and chemical knowledge; (2) the technological responsibility to manage and integrate huge amounts of heterogeneous data from high-throughput experimentation. Hence the need for e-Science support.

Developments towards bio-informatics & e-Science: Experiments become increasingly complex, driven by the increase in detector developments. This results in an increase in the amount and complexity of data. Something has to be done to harness this development: bio-informatics, to translate data into useful biological, medical, pharmaceutical & agricultural knowledge, and virtualization of experimental resources, enabling sharing & leading to e-BioScience.

[Figure: the e-Bioscience and life science innovation domain. Labels: life science/genomics research consortia and industry; life science application areas; bioinformatics e-Science & research infrastructure; e-Bioscience & research infrastructure; generic e-Science ICT development and support; Grid infrastructure; network infrastructure and computing capacity.]

Why e-BioScience: there is an increasing necessity to use results from other scientists, e.g. to share data & information.

Re-use and sharing of biological data (2). The information content of omics data is extremely high; however, the data are subject to noise and to biological and technical variation. How to induce biological principles from these genome-wide data sets? Approach: develop methodology for “reverse engineering” of biological mechanisms. This is the biggest challenge in bioinformatics today, and it creates a need for external data sources for in-silico experimentation. Two practices for re-use and sharing of data: (1) collectively compile huge amounts of relevant data and make these available to the community, for example bio-banking and compendia (e.g. NIH’s Affymetrix SNP repository); (2) re-use information from different and diverse experiments to discover phenomena.

Re-use and sharing of biological data (2). Compendium example: re-use and sharing of Huntington data. Datasets: 404 Affymetrix gene chips of measurements on extremely rare human brain samples (Hodges et al., Hum. Mol. Genetics, 2006), available from the NCBI GEO database (MIAME). Goal: find genes involved in Huntington’s disease. Approach: reanalyze the gene expression data; combine genotype data and clinical data (e.g. using SigWin); extend the experiments with own ChIP-on-chip data.

Resource identification software: a repository of relevant meta-information from data warehouses (e.g. GEO, ArrayExpress, Protein Interaction database); literature (mining of PubMed using Collexis); and information resources specialized on diseases, genes and proteins (e.g. OMIM, GenBank, Ensembl).

Why e-BioScience: there is an increasing necessity to use results from other scientists, e.g. to share data & information. Data repositories: cohort studies in bio-banking; biodiversity. Expensive and complex equipment: mass spectroscopy; MRI; other.

Problems for the realization of e-BioScience. The life science field is still in an early stage of development: first principles are not understood at all; as a consequence, experimental methods are not well established and will not be for some time to come; because of the new forms of omics instrumentation there is a need for methods for the design of experimentation; correct logging of the conditions under which experiments are done is lacking; and large amounts of data are produced that require, among others, statistical techniques for interpretation. As a consequence, results are open to multiple interpretations.

Problems for the realization of e-BioScience. Problems for bioinformatics & e-Bioscience: rationalisation at this early stage is almost impossible; pre-standardization & standardization are almost non-existent; where there are standards, they are inadequate because they are open to multiple interpretations (like MIAME for micro-arrays); in addition, there are commercial end-user products that are difficult to integrate; and users lack the training necessary to handle these complex experimental situations. The only possible solution is to create a flexible experimentation environment for the end-users.

Role of ICT in e-BioScience. e-Science is a new form of science methodology complementing theoretical and experimental sciences. It uses generic methods and an ICT infrastructure to support this methodology: web services as a paradigm for using and accessing information, and the Grid as a method of accessing & sharing computing resources by virtualization. What is missing in e-BioScience: a connection between the biological problem & e-Bioscience; user-oriented tools that can be re-used and extended; a general model of ICT-based integration; and semantic support, i.e. ontologies and semantic support for workflows to make user knowledge explicit.

Consequences for bioinformatics & e-BioScience: a considerable amount of experimentation is necessary before a well-established methodology will emerge. The VL-e approach might be a good model, and it produces an environment in which the necessary experimentation can be realized.

Enhancing the scientific process: e-BioLab. Basic concept of e-BioLab: an actual laboratory in which problem domain experts (biologists, medical doctors) and scientists from enabling disciplines jointly and in a creative manner work on the analysis and design of omics experiments; a tangible space in which biologists, aided by e-scientists, will have the full potential of VL-e at their disposal. Motivation: interacting with the problem domain requires an environment in which the domain can be opened up and in which ideas, hunches and notions on the data, and crude models of the biology, can be visualized. In the e-BioLab, problem domain experts can focus on the biology because they are shielded from technical details by e-scientists; viewpoints on the research question and the data can be expressed and visualized semi-instantaneously; ideas and analyses can be retained and documented; and facilities for remote collaboration are present*. [Figure labels: readily accessible data + models; data mining; small integration experiments + integration methods; easy visualization; vague results; basic model of problem area; e-BioOperator; biologists; e-BioScientist.] * Rauwerda et al., 2nd IEEE International Conference on e-Science and Grid Computing (submitted).

Enhancing the scientific process: e-BioLab (2). Realization: a large high-resolution display (26.2 Mpixel) with a high-bandwidth (10 Gbit/s) connection to the render cluster; full access to the computational facilities and GRID middleware of VL-e; e-whiteboards and tablet PCs to share and store ideas; high-definition video cameras for remote collaboration; a highly adaptable lab configuration. Research into: problem solving environments for biology under study; formulation of scientific workflows that allow for sufficient interactivity and guarantee reproducibility; maintaining an electronic lab journal for e-science experimentation. Methods for: information management of omics data; biological domain interaction / resource identification; modeling of biological information and knowledge; remote scientific co-operation; man-machine interaction.
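As a rough back-of-the-envelope check on the display figures quoted for the e-BioLab, the sketch below estimates how many uncompressed full-display frames per second a 10 Gbit/s link could carry. The 24 bits per pixel, and the absence of compression and protocol overhead, are assumptions made for illustration; they are not figures from the slides.

```python
def max_uncompressed_fps(megapixels, link_gbit_per_s, bits_per_pixel=24):
    """Upper bound on full-frame updates per second over the link.

    Assumes every pixel is sent uncompressed on every frame and that the
    whole link capacity is available for pixel data (both are assumptions).
    """
    bits_per_frame = megapixels * 1e6 * bits_per_pixel
    return link_gbit_per_s * 1e9 / bits_per_frame


# Figures from the slide: 26.2 Mpixel display, 10 Gbit/s connection.
fps = max_uncompressed_fps(26.2, 10)
print(f"{fps:.1f} frames/s")
```

Under these assumptions the link sustains on the order of 16 full-display updates per second, which suggests why a dedicated high-bandwidth connection to the render cluster is listed alongside the display itself.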

High resolution displays in e-bioscience. Example: concurrently display, in a discussion with a remote partner: clustering results of microarray experiments; interesting pathways that are predominant in certain clusters; Gene Ontology categories; results from literature mining; Gene Set Enrichment of categories identified in literature mining; notions depicted on the e-whiteboards. [Figure panel labels: clustering; video remote collaboration; gene lists; remote whiteboard; SOM; interesting pathways; GO categories; literature mining; GSEA.]

Virtual Lab for e-Science research philosophy: multidisciplinary research and development of related ICT infrastructure; generic application support. Application cases are drivers for computer & computational science and engineering research. Problem solving is partly generic and partly specific; components are re-used via generic solutions whenever possible.

Generic e-Science services. [Figure: Grid services harness multi-domain distributed resources (technology push); domain-specific tools represent application pull. Domain-generic e-BioScience services shown: microarray pipeline, mass spectroscopy pipeline, pathway visualization, protein annotation.]

Generic e-Science services. [Figure: Grid services harness multi-domain distributed resources (technology push); domain-generic e-Science services; domain-specific tools (application pull). Examples shown: micro-array transcriptomics pipeline, mass spectroscopy proteomics pipeline.]

Bioinformatics methods in VL-e (1). Example 1, an application-specific method modified by e-science into a generic one: SigWin*. Starting point: an application-specific method for detecting windows of increased gene expression on chromosomes** (implemented in C and Perl for SAGE technology). Motivation: broad interest from molecular biology in the positional behaviour of any measurement data that can be mapped onto DNA sequences. SigWin e-Science version: a GRID-based modular workflow for detecting windows of significance in any sequence of values; widely applicable, from gene expression to meteorology data; modules reusable for alternative workflows, e.g. protein modification; scalable to very large datasets. * Inda et al., 2nd IEEE International Conference on e-Science and Grid Computing (submitted). ** Versteeg et al., Genome Research, 2003.

Bioinformatics methods: SigWin. Significant window detector; generalisation of the RIDGE method. [Figure panels: human gene expression; temperature in Amsterdam; DNA curvature of the Escherichia coli chromosome.]
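The idea of detecting windows of significance in any sequence of values can be illustrated with a small sketch. This is not the SigWin or RIDGE implementation: the moving median, the permutation-based threshold, and all names and parameter values below are assumptions chosen only to show the general shape of such a detector.

```python
import random
from statistics import median


def moving_medians(values, window):
    """Median of every contiguous window of the given size."""
    return [median(values[i:i + window])
            for i in range(len(values) - window + 1)]


def significant_windows(values, window=5, n_permutations=200,
                        quantile=0.99, seed=0):
    """Start indices of windows whose median exceeds a null threshold.

    The null distribution is built by shuffling the sequence, which keeps
    the values but destroys their positional structure (an illustrative
    choice, not necessarily what SigWin does).
    """
    rng = random.Random(seed)
    shuffled = list(values)
    null = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null.extend(moving_medians(shuffled, window))
    null.sort()
    threshold = null[min(len(null) - 1, int(quantile * len(null)))]
    observed = moving_medians(values, window)
    return [i for i, m in enumerate(observed) if m > threshold]


# Hypothetical sequence: a flat background with one block of raised values.
values = [1.0, 2.0] * 25 + [10.0] * 5 + [1.0, 2.0] * 25
hits = significant_windows(values)
print(hits)
```

Because the detector only needs an ordered list of numbers, the same function could be applied to gene expression along a chromosome or to a temperature series, which is the kind of generality the slide attributes to the e-Science version.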

Bioinformatics methods in VL-e (2). Example 2, an application-specific method composed of generic and specific modules in a workflow: OligoRAP*. Purpose: a re-annotation workflow for oligo libraries. Motivation: rapidly evolving knowledge in genome analysis requires frequent re-assessment of the molecules that are used to measure gene expression. OligoRAP uses a set of application-generic (BIOMOBY) BLAT and BLAST sequence alignment (web) services and an application-specific (BIOMOBY) annotation analysis service. BIOMOBY is a de facto standard for bio-informatics web services. Joint work of the sequence analysis lab and the micro-array lab. Workflow: adjustable filtering criteria make the quality level of oligos explicit; workflow provenance makes re-annotation reproducible. * P. Neerincx, H. Rauwerda, F. Verster, A. Kommadath, T.M. Breit, J.A.M. Leunissen, Poster ISMB 2006.
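The two workflow properties named for OligoRAP, explicit filtering criteria and provenance, can be sketched as follows. The data layout, the field names `min_identity` and `max_hits`, and their default values are hypothetical; the sketch does not call BIOMOBY, BLAT or BLAST and is not the OligoRAP implementation.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FilterCriteria:
    """Adjustable quality thresholds (hypothetical names and defaults)."""
    min_identity: float = 0.95  # minimum alignment identity to count a hit
    max_hits: int = 1           # maximum number of targets an oligo may hit


def reannotate(oligo_hits, criteria):
    """Classify oligos and record the criteria used alongside the result.

    oligo_hits maps an oligo id to a list of (target, identity) pairs,
    standing in for the output of a sequence alignment service.
    """
    report = {"criteria": asdict(criteria), "accepted": {}, "rejected": []}
    for oligo_id, hits in sorted(oligo_hits.items()):
        good = [target for target, identity in hits
                if identity >= criteria.min_identity]
        if 1 <= len(good) <= criteria.max_hits:
            report["accepted"][oligo_id] = good
        else:
            report["rejected"].append(oligo_id)
    return report


# Hypothetical alignment results for three oligos.
hits = {
    "oligo_1": [("geneA", 1.00)],
    "oligo_2": [("geneB", 0.99), ("geneC", 0.97)],  # cross-hybridizes
    "oligo_3": [("geneD", 0.80)],                   # poor match
}
report = reannotate(hits, FilterCriteria())
```

Storing the criteria inside the report is one simple way to make a re-annotation run reproducible: rerunning with the recorded thresholds on the same alignment results yields the same classification.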

Virtual Lab for e-Science research philosophy: multidisciplinary research and development of related ICT infrastructure; generic application support. Application cases are drivers for computer & computational science and engineering research. Problem solving is partly generic and partly specific; components are re-used via generic solutions whenever possible. Rationalization of the experimental process: reproducible & comparable. Two research experimentation environments: proof of concept for application experimentation; rapid prototyping for computer & computational science experimentation.

Medical Diagnosis and Imaging Problem Solving Environment. Objective: to study the design and implementation of a PSE for medical diagnosis and imaging to support and enhance the clinical diagnostic and therapeutic decision process. Partners: Universiteit van Amsterdam (UvA), Academisch Medisch Centrum (AMC), Vrije Universiteit Medisch Centrum (VUMC), Philips Research, Philips Medical Systems, TU Delft, IBM. Applications: 1. Eddy current reduction; 2. Matched Masked Bone Elimination; 3. Functional brain imaging, DWI and fiber tracking; 4. MR virtual colonoscopy; 5. Parallel MEG data analyses; 6. Grid-based data storage, retrieval and sharing; 7. Interactive 3D medical visualization.

Brain imaging and fiber tractography. Diffusion Weighted Imaging (DWI): restricted Brownian motion results in anisotropy that can be measured; >= 6 measurements are reduced to a tensor per voxel; the eigenvector with the largest eigenvalue gives the diffusion direction. Whole-volume fiber tracking can take many hours, depending on the size of the volume and the number of measurements per voxel, and is suitable for parallelization. Visualization techniques.
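The per-voxel step described here, taking the dominant eigenvector of a diffusion tensor, can be sketched with plain power iteration. The tensor values and the function name are hypothetical, and a real pipeline would first have to fit the tensor from the six or more measurements, which is not shown.

```python
import math


def principal_direction(tensor, iterations=100):
    """Largest eigenvalue and its unit eigenvector of a symmetric 3x3 tensor.

    Uses power iteration, which assumes the largest eigenvalue is strictly
    dominant and that the start vector is not orthogonal to its eigenvector.
    """
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(tensor[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(v[i] * sum(tensor[i][j] * v[j] for j in range(3))
                     for i in range(3))
    return eigenvalue, v


# Hypothetical voxel in which diffusion is strongest along the x axis.
voxel_tensor = [
    [1.7, 0.0, 0.0],
    [0.0, 0.3, 0.0],
    [0.0, 0.0, 0.2],
]
eigenvalue, direction = principal_direction(voxel_tensor)
```

Each voxel is processed independently in this step, which is one reason the slide can describe whole-volume analysis as suitable for parallelization; the tracking that follows, which links directions between neighbouring voxels, is not covered by this sketch.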

Medical Diagnosis and Imaging Problem Solving Environment. VL-e generic services provide scientific visualization techniques and image processing algorithms, and use the experiment editor and parallel processing techniques. Application-specific services: access to PACS and DICOM; interfaces to medical scanners (MRI); in-house developed algorithms (Eddy Current Reduction, Matched Masked Bone Elimination); patient privacy. Grid services: storage facilities (SRB), high performance computing platforms, high performance visualization platforms. [Figure labels: Surfnet; Grid Middleware; Virtual Laboratory; VL-e Environment; Medical Applications.]

Eddy current reduction. Residual currents in DWI cause shear, magnification and translation, which are corrected by 2D matching. This is computationally expensive; it is parallelized through domain decomposition, with computing cycles obtained via the Grid, and delivered as an integrated PACS solution. [Figure: effects of residual eddy currents on a Philips 3T Intera with DWI. Figure by Erik-Jan Vlieger, AMC.]
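Parallelization through domain decomposition can be sketched by treating each image slice as an independent unit of work. The sketch below is heavily simplified and rests on several assumptions: slices are reduced to 1D intensity profiles, the matching step estimates only an integer translation (not shear or magnification), and a local thread pool stands in for Grid resources. It is not the AMC algorithm.

```python
from concurrent.futures import ThreadPoolExecutor


def best_shift(reference, distorted, max_shift=3):
    """Integer shift s that best aligns distorted[i + s] with reference[i].

    Alignment quality is the mean squared difference over the overlap.
    """
    def cost(shift):
        pairs = [(reference[i], distorted[i + shift])
                 for i in range(len(reference))
                 if 0 <= i + shift < len(distorted)]
        return sum((a - b) ** 2 for a, b in pairs) / len(pairs)

    return min(range(-max_shift, max_shift + 1), key=cost)


def match_slices(reference_slices, distorted_slices, workers=4):
    """Decompose the volume by slice and match every slice independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(best_shift, reference_slices, distorted_slices))


# Hypothetical profiles: slice 0 is displaced by +2 samples, slice 1 by -1.
profile = [0, 0, 1, 5, 1, 0, 0, 0]
reference_slices = [profile, profile]
distorted_slices = [
    [0, 0, 0, 0, 1, 5, 1, 0],
    [0, 1, 5, 1, 0, 0, 0, 0],
]
shifts = match_slices(reference_slices, distorted_slices)
```

Since no slice depends on the result of another, the same decomposition could be distributed over separate machines; only the per-slice parameters need to be gathered at the end.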

Medical Diagnosis and Imaging Problem Solving Environment. [Figure: VL experiment topology. Labels: data retrieval, acquisition; image processing, data storage; filtering, analyses, simulation; 2D/3D visualization.]

The situation in the Netherlands. The Netherlands Bio-Informatics Center (NBIC) was set up as part of the Dutch genomics initiative, the Netherlands Genomics Initiative (NGI). Its aim was to organize bio-informatics in the Netherlands and to generate sufficient critical mass, also to support the other genomics initiatives as a technology center. Organizational structure: Board of directors (Dr van Kampen, scientific director; Drs R. Kok, executive director; Prof. Dr. Hertzberger, adjunct scientific director); Board of overseeing; International Advisory board; Scientific Committee; Program Steering Group.

Current NBIC activities. Currently NBIC runs three programs, and it took the initiative for and participates in another three joint activities, besides collaborations such as those with SURF (networking) and VL-e (e-Science). NBIC programs: BioRange, a bio-informatics research program of 25 M$ & 25 M$ matching; BioAssist, a 10 M$ support program; BioWise, a 3 M$ education program. Participation in: Computational life sciences, a 5 M$ program with, among others, physics, chemistry and computational science; Pilot grid roll-out, a 3 M$ Grid roll-out & support with the Dutch Foundation for computing (NCF) and others; BIG GRID, a 35 M$ GRID and e-Science program in the Netherlands together with NCF, physics, VL-e and others.

Program activities. BioRange has four program lines: micro-array related bio-informatics; proteomics related bio-informatics; integrated bio-informatics; informatics research for bio-informatics. All program lines comprise a number of collaborative projects with participation of groups all over the Netherlands. BioAssist runs two program lines: establishment of an e-bioscience support environment; establishment of a generic e-science infrastructure. In future also an extension towards biomedical applications, as was illustrated.

The VL-e infrastructure. [Figure labels: Surfnet; network service (lambda networking); Grid middleware; additional Grid services (OGSA services); Virtual Laboratory; Virtual Lab rapid prototyping (interactive simulation); telescience; medical application; bio-informatics applications. Environments: VL-e Proof of Concept Environment; VL-e Experimental Environment; VL-e Certification Environment (test & cert. compatibility; test & cert. Grid middleware; test & cert. VL-software). Legend: application-specific service (application potential); generic service & Virtual Lab services; Grid & network services.]

[Figure: e-Science roll-out. Labels: Surfnet; network service (lambda networking); Grid middleware; additional Grid services (OGSA services); Virtual Laboratory; rapid prototyping (interactive simulation); VL-e Experimental Environment; VL-e Proof of Concept Environment; telescience; medical application; bio applications; application feedback; stable application & VL-e component; unstable application & VL-e component. Funding labels: Big Grid; xxxx; BioAssist; total 25 M$ support + 25 M$ matching; total 35 M$ support.]

Conclusions: Omics experiments change the face of the life sciences. Bioinformatics can be considered an essential enabler and is a form of e-Science; it will help to realize the necessary paradigm shift in life science experimentation. Better support of experimentation & optimal use of ICT infrastructure require rationalization of the experimentation process. Information management is an essential technology. Bioinformatics cannot be decoupled from e-Bioscience applications. e-Bioscience also has to comprise biomedical applications.