Analysis of Affymetrix expression data using R on Azure Cloud Anne Owen Department of Mathematical Sciences University of Essex 15/16 March, 2012 SAICG.

Presentation transcript:

Analysis of Affymetrix expression data using R on Azure Cloud
Anne Owen, Department of Mathematical Sciences, University of Essex
15/16 March, 2012, SAICG Workshop, Oxford
Dr Andrew Harrison, University of Essex
Dr Hugh Shanahan, Royal Holloway, University of London

Introduction
- The Affymetrix GeneChip
- Micro-array data
- Venus-C pilot project
- R scripts on Azure Cloud
- Results to date
- Our Experience

We are developing informatics tools to aid the analysis of Affymetrix chips (GeneChips, Exon arrays).
- Micro-arrays are the data read from GeneChips
- ArrayExpress is an example of a public database containing microarrays and other data from biological experiments
[Image: Affymetrix GeneChip]

DNA and RNA

Probe cells of an Affymetrix GeneChip contain millions of identical 25-mers

Affymetrix GeneChip Hybridization – fragments of RNA stick to the probes
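The "sticking" on this slide is base-pairing between a probe and a complementary RNA fragment. As a minimal sketch, assuming standard Watson-Crick pairing and a made-up 25-mer (the sequence and the function name `complementary_rna` are illustrative, not from the talk), the fragment that would pair with a probe can be worked out like this. Python is used here only for illustration; the talk's own scripts are in R.

```python
# Watson-Crick pairing between a DNA probe and an RNA fragment (A pairs with U in RNA).
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_rna(probe: str) -> str:
    """Return the RNA sequence (5'->3') that base-pairs with a DNA probe (5'->3')."""
    # The two strands run antiparallel, so read the probe backwards
    # and swap each base for its pairing partner.
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(probe.upper()))


# Hypothetical 25-mer, invented for this example.
probe = "ATGCGTACGTTAGCCTAGGATCCAT"
assert len(probe) == 25

target = complementary_rna(probe)
print(target)
```

The result is also 25 bases long, one pairing partner per probe base.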

Affymetrix GeneChip Fluorescence

Micro-array datasets
- Fluorescence data put into .cel files
- Many 1000s of experiments
- Many 100s of micro-arrays for each GeneChip
- >1 TB of data to analyse
- 1000s of published papers using Affymetrix GeneChips
- This data is a free resource to researchers

Going Forward...
- Currently we analyse flaws in GeneChip data
- The future is a new genomic technology known as next generation sequencing
- Petabytes of data are being generated faster than they can be analysed
- Cloud solutions are needed for storage of and access to this data

Venus-C Pilot Project
- VENUS-C is a project funded under the European Commission's 7th Framework Programme, with computing resources from Microsoft
- Joint co-operation between computing service providers and scientific user communities
- Aim: to develop, test and deploy a large Cloud computing infrastructure for science and SMEs (small and medium-sized enterprises) in Europe

Venus-C Infrastructure
- 3 main areas dealing with standards:
  - VM management (OCCI and OVF)
  - Job submission (BES)
  - Cloud data storage (CDMI)
- Other specifications, such as:
  - WS-Security
- Programming model:
  - Task based submission: Generic Worker role

cTQm Project Overview
[Diagram labels: BLOB Storage; Public database; Scripts, R libs and key data uploaded via Azure webpage]

Cloud / Grid Interfaces
- Amazon EC2: Command line interface into Linux terminal
- NGS: Portal or Command Line to Linux machine
- Azure: Webpage interface to a Windows machine, Visual Studio 2010, C#

Bioinformatics Results to date
- Uploading of datasets into Cloud storage is underway
- Success with R scripts on Azure to confirm results in published paper*
- Minor problems with ArrayExpress to solve
- Work is extending to more GeneChip types
- Still need user authentication / accounting

* Hugh P. Shanahan, Farhat N. Memon, Graham J. G. Upton and Andrew P. Harrison, "Normalised Affymetrix expression data are biased by G-quadruplex formation", Nucleic Acids Research, 2011, 1-9.
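The cited paper concerns bias linked to G-quadruplex formation. The slides do not say how affected probes are identified, so the following is only a rough sketch under a stated assumption: a probe is flagged when it contains a run of four or more consecutive guanines. The threshold, the probe names and the sequences are all hypothetical, and Python stands in for the R scripts mentioned in the talk.

```python
import re

# Assumption for illustration only: a run of four or more consecutive
# guanines marks a probe as potentially prone to G-quadruplex formation.
G_RUN = re.compile(r"G{4,}")


def has_g_run(probe: str) -> bool:
    """Return True if the probe sequence contains an uninterrupted run of >= 4 G's."""
    return G_RUN.search(probe.upper()) is not None


# Hypothetical 25-mer probes, invented for this example.
probes = {
    "probe_a": "ATGCGTACGTTAGCCTAGGATCCAT",  # longest G run is 2
    "probe_b": "ATGGGGTACGTTAGCCTAGGATCCA",  # contains GGGG
}

flagged = sorted(name for name, seq in probes.items() if has_g_run(seq))
print(flagged)
```

A filter of this kind could be run before any downstream analysis so that flagged probes are examined separately; whether the authors' scripts work this way is not stated in the slides.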

Our Experience
- Azure Cloud has a steep learning curve for a Linux-based scientist
- Vast datasets can be made available
- Applications can be user-friendly
- Scalability makes the Cloud approach attractive
- Costs need to be assessed
- Enables scientists in developing countries to perform genome analysis

Acknowledgements and thanks to:
- Dr Andrew Harrison, University of Essex
- Dr Hugh Shanahan, Royal Holloway, University of London
- Department of Mathematical Sciences, University of Essex
- European Commission's 7th Framework Programme, Venus-C
- Microsoft and Venus-C project organisers