Yike Guo/Jiancheng Lin InforSense Ltd. 15 September 2015 Bioinformatics workflow integration.

Life Science Challenges
- Information resides on different:
  - Granularity levels (individual records vs. massive repositories)
  - Abstraction levels (models ranging from entire systems to compound patterns)
  - Domain levels (clinical, sequence, instrument…)
- Researchers
  - Grouped in Virtual Organizations (VOs)
  - Working on the Grid
  - Need to communicate across physical and scientific/cultural barriers
- Tools
  - Legacy, well-established in the process
  - Novel, essential to innovation
  - In need of a consistent infrastructure to connect the two groups

Discovery Informatics in Post-Genome Era
[Figure: data types in post-genome discovery informatics. Labels: sequences (ATGCAAGTCCCT AAGATTGCATAA GCTCGCTCAGTT), alignments, polymorphism, linkage maps, cytogenetic maps, physical maps, expression patterns, secondary structure, tertiary structure, receptors, signals, pathways, physiology, patient records, epidemiology.]

Integrative Analytics Workflow Environment
[Figure: architecture diagram. Labels: Data (Oracle, Files, DB, Warehouse); Data Preprocess; Inbuilt Analytics; Applications and Components; analytics tools (Oracle DM, Matlab, R, KXEN, WEKA, S-Plus, SAS); 3rd Party & Custom Apps (MDL, Spotfire, Daylight, Healthcare, Web Services, BioTeam iNquiry); Workflow; Informatician; Data Analysis Group; Portal; Deployed Web App for End Users.]

InforSense Workflow Life Cycle
- Constructing a ubiquitous workflow: by scientists
  - Integrate your information resources/software applications cross-domain
  - Support innovation and capture the best practice of your scientific research
- Warehousing workflows: for scientists
  - Manage discovery processes in your organisation
  - Construct an enterprise process knowledge bank
- Deployment workflow: to scientists
  - Turn your workflows into reusable applications
  - Turn every scientist into a solution builder

Workflow Creation, Integration, and Deployment
1. Select
  - Data Sources
  - Data Mining / Statistics
  - Data Processing / Transformation
  - 3rd Party applications (e.g. Haploview)
  - Interactive data visualization / reporting
2. Connect
  - Connect data and components in GUI
  - Workflow describes complex data processing and analysis
3. Execute
  - "In database" processing & analytics
  - "Cluster / Grid" execution
4. Deploy
  - Define parameters of workflow to expose
  - Publish as: portlet, web application, SOAP service, command line app
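The select, connect, execute, deploy cycle above can be illustrated with a minimal Python sketch. It is not InforSense's API: the `Workflow` class, the node functions, and the `threshold` parameter are all hypothetical names chosen for illustration. Nodes are plain functions chained in order, and only the parameters declared as exposed can be overridden when the workflow is run.

```python
from typing import Any, Callable, Dict, List

# A node takes the incoming data plus the workflow parameters and returns new data.
Node = Callable[[Any, Dict[str, Any]], Any]


class Workflow:
    """Hypothetical linear workflow: an ordered chain of nodes with exposed parameters."""

    def __init__(self, exposed_defaults: Dict[str, Any]) -> None:
        self.nodes: List[Node] = []
        self.defaults = dict(exposed_defaults)  # parameters a deployed app may change

    def connect(self, node: Node) -> "Workflow":
        """Append a node to the chain (the 'Connect' step)."""
        self.nodes.append(node)
        return self

    def execute(self, data: Any, **overrides: Any) -> Any:
        """Run every node in order (the 'Execute' step)."""
        unknown = set(overrides) - set(self.defaults)
        if unknown:
            raise ValueError(f"parameters not exposed: {sorted(unknown)}")
        params = {**self.defaults, **overrides}
        for node in self.nodes:
            data = node(data, params)
        return data


# 'Select' step: pick the components to use (all hypothetical).
def drop_missing(rows, params):
    return [r for r in rows if r is not None]


def keep_above(rows, params):
    return [r for r in rows if r >= params["threshold"]]


def summarise(rows, params):
    return {"count": len(rows), "mean": sum(rows) / len(rows) if rows else None}


workflow = (
    Workflow({"threshold": 2.0})
    .connect(drop_missing)
    .connect(keep_above)
    .connect(summarise)
)

# 'Deploy' step: end users call the workflow and change only the exposed parameter.
result = workflow.execute([1.0, None, 2.5, 3.5])
strict = workflow.execute([1.0, None, 2.5, 3.5], threshold=3.0)
```

Restricting overrides to the declared defaults mirrors the idea of defining which workflow parameters to expose before publishing.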

Biology to Chemistry
- Novel sequences are compared to known protein structures
- The resulting set of ligands on these matching structures is used to search small molecule databases for similar compounds
- Compounds are then analyzed using KDE tools such as PCA and clustering to provide a diverse, representative subset for further assays

Navigating KEGG pathways
- Gene names from EMBL are used to query KEGG via their Webservice API for appropriate pathways
- Further Webservice API calls allow navigation of the data to find:
  - Pathway compounds
  - Other genes in the pathways
  - Visualization of query genes on their pathways
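The slide does not say which KEGG interface the workflow called. As a rough sketch, the gene-to-pathway lookup could be expressed against KEGG's public REST `link` operation, which returns one tab-separated gene/pathway pair per line. The helper names and the sample response text below are illustrative assumptions, and no network request is made.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

KEGG_REST = "https://rest.kegg.jp"


def link_pathway_url(gene_ids: Iterable[str]) -> str:
    """Build the URL that asks KEGG which pathways contain the given genes."""
    return f"{KEGG_REST}/link/pathway/{'+'.join(gene_ids)}"


def parse_link_response(text: str) -> Dict[str, List[str]]:
    """Turn tab-separated 'gene<TAB>pathway' lines into a gene -> pathways map."""
    pathways: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene, pathway = line.split("\t")
        pathways[gene].append(pathway)
    return dict(pathways)


url = link_pathway_url(["hsa:7157", "hsa:672"])

# Illustrative response text in the expected format (not fetched from KEGG).
sample = "hsa:7157\tpath:hsa04115\nhsa:7157\tpath:hsa04110\nhsa:672\tpath:hsa03440\n"
gene_to_pathways = parse_link_response(sample)
```

The text for `sample` would normally come from fetching `url`, for instance with `urllib.request.urlopen`. Inverting the resulting map gives the other genes that share a pathway with the query genes.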

cDNA sequence annotation and alignment
- A novel cDNA is annotated using EMBOSS tools, and a BLAST similarity search is performed against human proteins
- Annotations are used to aid identification of predicted proteins derived from the cDNA

Ortholog analysis using BLAST
- Sequence libraries from 2 organisms are cross-compared using BLAST to determine the best bi-directional matches of sufficient quality
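The best bi-directional match rule can be sketched in a few lines of Python. This is an illustration rather than the workflow's actual implementation: it assumes each BLAST run has already been reduced to (query, subject, bit score) tuples, and the `min_score` cutoff stands in for "sufficient quality". The identifiers and scores are made up.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score)


def best_hits(hits: Iterable[Hit], min_score: float) -> Dict[str, str]:
    """Map each query to its highest-scoring subject, ignoring weak hits."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if score < min_score:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], min_score: float = 50.0
) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, min_score)
    backward = best_hits(b_vs_a, min_score)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Made-up hits: organism A searched against B, and B against A.
a_vs_b = [
    ("a1", "b1", 200.0),
    ("a1", "b2", 90.0),
    ("a2", "b2", 150.0),
    ("a3", "b1", 60.0),
    ("a4", "b3", 30.0),  # below the cutoff, ignored
]
b_vs_a = [
    ("b1", "a1", 195.0),
    ("b2", "a2", 140.0),
    ("b2", "a1", 85.0),
    ("b3", "a4", 30.0),  # below the cutoff, ignored
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here `a3` is dropped because its best hit `b1` prefers `a1`, which is exactly the case the bi-directional check is meant to exclude.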

Clustering of Affymetrix data with R
- Native Affymetrix CEL files are loaded using R/Bioconductor
- Differentially expressed genes are calculated using KDE statistical nodes
- The resulting list of genes is then clustered using HCLUST in R
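The final clustering step can be illustrated with a small standard-library Python sketch. The workflow itself uses R's `hclust`; this version only mimics the idea with agglomerative clustering on Euclidean distance and complete linkage, cut at a chosen number of clusters. It assumes the differentially expressed genes have already been selected, and the gene names and expression values are invented.

```python
import math
from typing import Dict, List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def complete_linkage_clusters(
    profiles: Dict[str, Sequence[float]], k: int
) -> List[List[str]]:
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[name] for name in sorted(profiles)]

    def linkage(c1: List[str], c2: List[str]) -> float:
        # Complete linkage: distance between the two farthest members.
        return max(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Invented expression profiles across four arrays.
profiles = {
    "geneA": [1.0, 1.1, 5.0, 5.2],
    "geneB": [1.1, 1.0, 5.1, 5.0],
    "geneC": [5.0, 5.1, 1.0, 1.2],
    "geneD": [5.2, 5.0, 1.1, 1.0],
}

groups = complete_linkage_clusters(profiles, k=2)
```

The pairwise search is quadratic per merge, which is fine for a short gene list but not for a full array; a real analysis would rely on `hclust` or an equivalent library routine.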

Microarray analysis using text mining
- Microarray data are normalized in KDE
- Upregulated genes are annotated from PubMed to obtain a set of related scientific papers
- Text mining is used to mine the paper collection and extract the information most relevant to the researcher

BAIR project: Biological Atlas of Insulin Resistance
[Figure: data recorded over time in the study. Recoverable labels:]
- Genetic data
- Mouse ID, Cage ID
- Environmental conditions, Management records
- Diet: Normal Diet, Fat Fed
- Physiological data prior to change in diet: Weight, Blood analysis, Urine analysis
- Physiological data after change in diet (one time point in end-point experiment, several time points in longitudinal study)
  - Weight
  - Blood analysis: physiological parameters, metabonomics
  - Urine analysis: physiological parameters, metabonomics
- Tissue sampling (Liver, Fat, Muscle, Kidney)
  - Metabonomics
  - Proteomics (general, glyco-, phospho-proteomics)
  - Transcriptomics
- Culling conditions; Endpoint: culling or death; 6 to 10 animals
- Sampling conditions, Sample storage conditions
- Ref of biological assays used across the study
- Data formats: Affymetrix, XLS files, Chromatograms, Filemaker Pro, Metabonomics NMR spectra, cDNA arrays (ATF, GAL files)
- Raw Data, Normalised Data, Processed Data
- Similar data will be recorded regarding experiments performed with cell lines

Collaborative Visualisation

Literature mining and compound analysis

Grid Computing

BAIR Portal

Integrative support
- Information
  - Data models to support individual domains (sequences, NMR profiles…) and methods to map them into generic analysis (tables, text)
  - Annotation databases integrated through Web Service APIs
- Researchers
  - Sharing of work and knowledge through reusable workflow components
  - Aim for minimum technical overhead when linking new resources
- Tools
  - Focus on integration methods rather than one-off tool linkage
  - Researchers able to link to standard tools without the need for an IT specialist
  - Databases accessed through aggregators (SRS, BioMart…)