WV-INBRE West Virginia IDeA Network of Biomedical Research Excellence Managing the NextGen data pipeline Jim Denvir, Ph.D.

NextGen data challenges
NextGen Sequencing produces very large data sets
  – Order of Terabytes (10^12 bytes) per run
Data analysis requires considerable computing power and specialist management
Main challenge is in distilling useful information from raw data

Core Facility support
Bioinformatics and Genomics core facilities provide support for investigators needing to have NextGen Sequencing data analyzed
  – Perform analysis from early part of pipeline
  – Perform downstream analysis, or provide support and software for individual investigators
      Depending on needs and expertise of investigator

NextGen Analysis Pipeline
Stages, in order: Image Analysis, Base Calling, Demultiplexing*, Alignment, SNP calling or RNA Read Counting, Statistical Analysis, Functional Analysis
Tools:
  – Real Time Analysis performed by RTA software on sequencer
  – CASAVA (Illumina) or open source (Tuxedo Suite, R/Bioconductor)
  – Partek or R/Bioconductor
  – IPA
Responsibility labels on the diagram: Automated, Core Facility, Investigator
* May require custom scripts
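The slide notes that demultiplexing may require custom scripts. A minimal sketch of what such a script might look like follows. It assumes each read begins with an inline barcode, which is a simplification; the barcode sequences, sample names, and the `demultiplex` function are all hypothetical and are not taken from the presentation.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample mapping; real values would come from
# the sample sheet for the sequencing run.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes, barcode_len=4):
    """Group reads by their leading barcode and trim the barcode off.

    Reads whose barcode is not recognised go to an 'undetermined' bin.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


# Toy reads for illustration only.
reads = ["ACGTTTGACC", "TGCAGGATCC", "NNNNACGTAC", "ACGTCCCGGG"]
result = demultiplex(reads, BARCODES)
for sample, seqs in sorted(result.items()):
    print(sample, len(seqs))
```

A production script would also need to tolerate sequencing errors in the barcode and stream reads from FASTQ files rather than hold them in memory.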

Commercial Tools
Examples: RTA, CASAVA, Partek, IPA
Pros:
  – Short learning curve
      Potentially can be used by individual investigators
  – Usually come with technical support and training
Cons:
  – Expensive
  – Closed, proprietary source code

Open Source
Examples: R/Bioconductor, Tuxedo suite
Pros:
  – Free
  – Open source
      Enables rapid, community-led improvement
      Potentially more academically reviewable
Cons:
  – Steeper learning curve
      Typically prohibitive for individual investigators
  – Sparse technical support

Tools developed on site
Pros:
  – Can fill in missing functionality from available tools
  – Customized exactly to our needs
  – Potential for a revenue source
Cons:
  – Development is very time consuming

Roadmap
Experience from microarray data analysis suggests:
  – Start with commercial tools
      Rapid start-up enables us to focus on learning scientific basis for the analyses
  – Transition to open-source tools for some parts of pipeline
      Probably mid-2012 to mid-2014
      Provides for financial saving further down the road
      Sometimes better received by journal reviewers
      Initial steps of analysis pipeline and functional analysis will still be managed by commercial software
  – Develop custom solutions only when needed

Storing Data
Archiving data from NextGen experiments requires a large amount of disc space
Once analysis is complete, some raw image data will be deleted
  – Storage of data is more expensive than re-running an experiment!
  – Will consider exceptions for experiments which cannot be repeated
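The retention rule on this slide can be sketched as a small decision function. This is an illustration only: the function name `keep_raw_images` and the cost figures are hypothetical placeholders, not numbers from the presentation, and the slide says only that exceptions will be considered, not that they are automatic.

```python
def keep_raw_images(storage_cost, rerun_cost, repeatable):
    """Decide whether raw image data is worth archiving.

    Non-repeatable experiments are treated as candidates for an
    exception. Otherwise raw images are kept only when storing them
    costs no more than re-running the experiment would.
    """
    if not repeatable:
        return True
    return storage_cost <= rerun_cost


# Placeholder costs in arbitrary currency units (assumptions).
print(keep_raw_images(5000.0, 3000.0, repeatable=True))
print(keep_raw_images(5000.0, 3000.0, repeatable=False))
```

With storage more expensive than a re-run, as the slide describes, the repeatable experiment's raw images are deleted and the non-repeatable one is flagged for retention.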

NextGen analysis server
Genomics Core has a Linux server for managing analysis and storing data
  – Housed in Drinko Library and managed by central campus IT staff
Has 42 Terabytes of usable disc space
  – Uses redundant system to allow for potential of drive failures without losing data
  – Additionally, IT will back up data off site
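To put the 42 Terabyte figure in context, a rough capacity estimate can be worked out from the "order of Terabytes per run" figure given earlier in the presentation. The per-run sizes below are assumptions chosen only to span that order of magnitude, and `runs_that_fit` is a hypothetical helper.

```python
TB = 10**12  # one Terabyte in bytes, as defined in the presentation

USABLE_BYTES = 42 * TB  # usable disc space on the analysis server


def runs_that_fit(usable_bytes, bytes_per_run):
    """Whole number of runs whose full data set fits in the given space."""
    return usable_bytes // bytes_per_run


# Assumed per-run sizes; the slides say only "order of Terabytes".
for size_tb in (1, 2, 5):
    print(size_tb, "TB per run:", runs_that_fit(USABLE_BYTES, size_tb * TB), "runs")
```

Under these assumptions the server holds at most a few dozen complete runs, which is consistent with the stated plan to delete some raw image data once analysis is complete.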

Things to remember
Core facilities are there to help!
At experimental design stage, be sure you understand what analysis the core facility will perform
  – Would you prefer to have IPA done by the core, or would you prefer control over that stage?
      If so, do you need training and/or support?

Questions?
Presentation available at