The Center for Computational Genomics and Bioinformatics Christopher Dwan Mike Karo Tim Kunau.



Outline
– Perspective
– Processing tasks & requirements
– Computational solutions
– Interesting issues

Funding chart

The “bioinformatics” component
“Pipeline” data processing and storage
– 100Kb data
– <5sec processing time
– 10,000+ / month
– The problem: Interface (batch & dependency management)
Similarity search
– Search against one or more ~10GB databases
– The problem: Data movement & memory
» (much easier on dedicated resources)
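The batch and dependency management problem named above can be illustrated with a minimal sketch. It assumes each pipeline step is a plain Python callable and that dependencies are given as a mapping from a step to the steps it needs first. The step names (`trim`, `search`, `store`) are hypothetical and not taken from the slides, and this is not the Center's actual interface. It relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, depends_on):
    """Run each step once, only after every step it depends on has finished.

    steps: dict mapping a step name to a zero-argument callable.
    depends_on: dict mapping a step name to the set of steps it needs first.
    Returns the order in which the steps were run.
    """
    # Give every step an entry so steps without prerequisites are included.
    graph = {name: set(depends_on.get(name, ())) for name in steps}
    order = []
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    for name in TopologicalSorter(graph).static_order():
        steps[name]()
        order.append(name)
    return order


log = []
# Hypothetical step names, for illustration only.
steps = {
    "trim": lambda: log.append("trim"),
    "search": lambda: log.append("search"),
    "store": lambda: log.append("store"),
}
depends_on = {"search": {"trim"}, "store": {"search"}}
order = run_pipeline(steps, depends_on)
```

A real batch system would submit each ready step as a job rather than call it in-process; the ordering logic is the part this sketch is meant to show.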

The “bioinformatics” component
“Unigene” assembly
– Traditional long run, big memory compute problem
– Comes at the end of the other two types
– The problem: algorithms
Clustering / Pattern Discovery
– Conference driven
– Causes us to redo the other tasks

The “bioinformatics” component
“Data warehouses”
– Mirroring and cross checking other public resources
– Local Oracle implementation of public databases for local users (GenBank / Swiss-Prot / Medicago …)

The “bioinformatics” component
Microarray data
– Image data (~1MB per image) requires processing and storage
– Unknown normalization, errors, etc. require that we simply keep all the raw data.
– Web based display of results
– Visualization…

Computational resources
~100 CPU Opportunistic Condor “Flock”
– Not dedicated
– Configuration can change without warning
– No permanent local data storage
– Machines sit on desks.
– “Flocking” with Madison, CS dept, other labs
– Reciprocity can hurt a LOT.
Server farms
– Intel / Alpha
– Hard to find money to buy dedicated machines, esp. on single organism projects.
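Because the borrowed machines offer no permanent local data storage, each job has to bring its inputs with it and leave nothing behind. The sketch below shows one way to express that pattern, assuming a job is a Python callable that works inside a scratch directory. The names `run_in_scratch` and `count_sequences` and the file `reads.fasta` are hypothetical; this is not how Condor itself transfers files.

```python
import tempfile
from pathlib import Path


def run_in_scratch(inputs, work):
    """Stage inputs into a temporary directory, run work there, then clean up.

    inputs: dict mapping a file name to its contents as bytes.
    work: callable that takes the scratch directory (a Path) and returns
          the results to ship back to the submitter.
    Returns (result, scratch_dir); the directory is deleted before returning.
    """
    with tempfile.TemporaryDirectory() as scratch:
        scratch_dir = Path(scratch)
        for name, data in inputs.items():
            (scratch_dir / name).write_bytes(data)
        result = work(scratch_dir)
    return result, scratch_dir


def count_sequences(scratch_dir):
    """Hypothetical job: count FASTA records in the staged input file."""
    text = (scratch_dir / "reads.fasta").read_text()
    return sum(1 for line in text.splitlines() if line.startswith(">"))


result, used_dir = run_in_scratch(
    {"reads.fasta": b">r1\nACGT\n>r2\nGGCC\n"}, count_sequences
)
```

With ~10GB databases the staging step itself becomes the cost, which is the data movement problem raised for similarity search.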

Software and user issues
– An intuitive interface to parallel and batch systems gives uninformed users a great deal of power.
– Tools from outside: Poor scalability
– Tools from inside: Poor portability

Heuristic algorithms
– Many bioinformatics tools are heuristic rather than complete searches.
– These searches can return different results on different machines (dynamic thresholds, 32 vs. 64 bit math, …)
– How do we tell “different” from “erroneous”?
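The 32 vs. 64 bit effect can be reproduced in a few lines. The sketch below rounds a running sum to 32-bit precision after every addition and shows a score that lands on different sides of a cutoff depending on the precision used. The `compare_runs` function is only one possible way to triage differences between two machines, not an answer to the question on the slide. The hit names, the scores, and the `rel_tol` default are illustrative assumptions.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def score(values, single_precision=False):
    """Sum the values, optionally rounding to 32 bits after every addition."""
    total = 0.0
    for v in values:
        total += v
        if single_precision:
            total = to_float32(total)
    return total


def compare_runs(run_a, run_b, rel_tol=1e-6):
    """Label each hit 'same', 'close', or 'review' across two result sets."""
    labels = {}
    for hit in sorted(set(run_a) | set(run_b)):
        if hit not in run_a or hit not in run_b:
            labels[hit] = "review"  # reported by one machine only
        elif run_a[hit] == run_b[hit]:
            labels[hit] = "same"
        elif math.isclose(run_a[hit], run_b[hit], rel_tol=rel_tol):
            labels[hit] = "close"  # small enough to be rounding
        else:
            labels[hit] = "review"
    return labels


values = [16777216.0, 1.0]  # 2**24 + 1 cannot be represented in 32 bits
score64 = score(values)
score32 = score(values, single_precision=True)
cutoff = 16777216.0
passes64 = score64 > cutoff  # the 64-bit score clears the cutoff
passes32 = score32 > cutoff  # the 32-bit score does not

# Illustrative result sets from two hypothetical machines.
run64 = {"hitA": score64, "hitB": 5.0}
run32 = {"hitA": score32, "hitB": 5.0, "hitC": 1.0}
labels = compare_runs(run64, run32)
```

Here `hitA` differs only by rounding, yet a strict cutoff would keep it on one machine and drop it on the other, so a tolerance on scores alone does not settle whether a missing hit is an error.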

Thank you:
– The Condor team at Madison
– Sanger Center

Collaborations are the key Christopher Dwan Mike Karo Tim Kunau