HTCondor for machine learning in biology

Slides:

Advertisements

Similar presentations

Creating NCBI The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research.

Advertisements

High Throughput Computing and Protein Structure Stephen E. Hamby.

Cyberinfrastructure and Scientific Collaboration: Application of a Virtual Team Performance Framework to GLOW II Teams Sara Kraemer Chris Thorn Value-Added.

Cheminformatics II Apr 2010 Postgrad course on Comp Chem Noel M. O’Boyle.

Jeffery Loo NLM Associate Fellow ’03 – ’05 chemicalinformaticsforlibraries.

Epistasis Analysis Using Microarrays Chris Workman.

Overview of Bioinformatics A/P Shoba Ranganathan Justin Choo National University of Singapore A Tutorial on Bioinformatics.

Welcome to HTCondor Week #14 (year #29 for our project)

Miron Livny Computer Sciences Department University of Wisconsin-Madison Harnessing the Capacity of Computational.

Expanding Access to OpenClinica Data via LabKey Server Peter Hussey 2013 OpenClinica Global Conference Founding Partner, LabKey Software June 21, 2013.

Hideko Mills, Manager of IT Research Infrastructure

Introduction to Chemoinformatics Irene Kouskoumvekaki Associate Professor December 12th, 2012 Biological Sequence Analysis course.

Cry1wild type. A multicamera image acquisition platform In different rooms for different purposes we have approximately 20 of these CCD cameras equipped.

Interpreting Microarray Expression Data Using Text Annotating the Genes Michael Molla, Peter Andreae, Jeremy Glasner, Frederick Blattner, Jude Shavlik.

Custom Spotfire Applications for use in Drug Discovery Chris Louer Team Leader, Cheminformatics © 2001, GlaxoSmithKline, Inc. - All Rights Reserved.

Computing and Communications and Biology Molecular Communication; Biological Communications Technology Workshop Arlington, VA 20 February 2008 Jeannette.

Miron Livny Center for High Throughput Computing Computer Sciences Department University of Wisconsin-Madison Open Science Grid (OSG)

Copyright OpenHelix. No use or reproduction without express written consent1.

Function first: a powerful approach to post-genomic drug discovery Stephen F. Betz, Susan M. Baxter and Jacquelyn S. Fetrow GeneFormatics Presented by.

A quick introduction to Oncinfo Lab Dr. Habil Zare, PhD PI of Oncinfo Lab Department of Computer Science Texas State University 18 September 2015.

Integrating the Bioinformatic Technology Group into your research programme Introduction People and Skills Examples Integrating the BTG Contacts BHRC Away.

An Integrated Computational Framework for Systems Biology Ram Samudrala University of Washington How does the genome of an organism specify its behaviour.

Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.

BBN Technologies Copyright 2009 Slide 1 The S*QL Plugin for Cytoscape Visual Analytics on the Web of Linked Data Rusty (Robert J.) Bobrow Jeff Berliner,

Center for Predictive Computational Phenotyping (CPCP): Training Plans May 15, 2015 Debora Treu and Whitney Sweeney Center for Predictive Computational.

Why Write A Grant? Elaine M. Hylek, MD, MPH Professor of Medicine Associate Director, Education and Training Division BU CTSI Section of General Internal.

COMP24111: Machine Learning Ensemble Models Gavin Brown

Introduction to Chemoinformatics and Drug Discovery Irene Kouskoumvekaki Associate Professor February 15 th, 2013.

I NSTITUTE for G ENOMIC B IOLOGY Nathan D. Price Department of Chemical and Biomolecular Engineering Center for Biophysics and Computational Biology Institute.

Use of Machine Learning in Chemoinformatics

1 Modelling and Simulation EMBL – Beyond Molecular Biology Physics Computational Biology Chemistry Medicine.

신기술 접목에 의한 신약개발의 발전전망과 전략 LGCI 생명과학 기술원. Confidential LGCI Life Science R&D 새 시대 – Post Genomic Era Genome count ‘The genomes of various species including.

Multi-scale network biology model & the model library 多尺度网络生物学模型 -- 兼论模型库的建立与应用 Jianghui Xiong 熊江辉

Page 1 Computer-aided Drug Design —Profacgen. Page 2 The most fundamental goal in the drug design process is to determine whether a given compound will.

Comparing TensorFlow Deep Learning Performance Using CPUs, GPUs, Local PCs and Cloud Pace University, Research Day, May 5, 2017 John Lawrence, Jonas Malmsten,

Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.

The CompTox Chemistry Dashboard: an informational data hub at the

High-Throughput Machine Learning from EHR Data

Python for data analysis Prakhar Amlathe Utah State University

Journal club Jun , Zhen.

Licenses and Interpreted Languages for DHTC Thursday morning, 10:45 am

Matt Link Associate Vice President (Acting) Director, Systems

Computer Science Department, University of Missouri, Columbia

An Artificial Intelligence Approach to Precision Oncology

MODELLING INTERACTOMES

Miron Livny John P. Morgridge Professor of Computer Science

APPLICATIONS OF BIOINFORMATICS IN DRUG DISCOVERY

Condor: Job Management

Learning Sequence Motif Models Using Expectation Maximization (EM)

Inferring Models of cis-Regulatory Modules using Information Theory

Basic machine learning background with Python scikit-learn

Advanced Bioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2018 Anthony Gitter

Machine Learning Ali Ghodsi Department of Statistics

Identifying Signaling Pathways

Eukaryotic Gene Finding

Dean Martin Cadwallader Dean of the Graduate School

PABIO 590B Advanced Topics in Bioinformatics

Genomes and Their Evolution

A Short Tutorial on Causal Network Modeling and Discovery

Linking Genetic Variation to Important Phenotypes

From Data to Therapies Research in Xinghua Lu’s Lab

Graph Neural Networks Amog Kamsetty January 30, 2019.

Adam P. Deveau, Victoria L. Bentley, Jason N. Berman

A Suppression Strategy for Antibiotic Discovery

Harnessing the Antioxidant Power with ARE-Inducing Compounds

Satisfying the Ever Growing Appetite of HEP for HTC

Adam P. Deveau, Victoria L. Bentley, Jason N. Berman

SYGNALing a Red Light for Glioblastoma

Artificial Intelligence and Trusted Microelectronics

Bioinformatic analyses suggest that PI3K/AKT signaling may be a key downstream pathway of tazarotene signaling. Bioinformatic analyses suggest that PI3K/AKT.

Presentation transcript:

HTCondor for machine learning in biology Anthony Gitter University of Wisconsin-Madison Morgridge Institute for Research May 22, 2018 gitter@biostat.wisc.edu www.biostat.wisc.edu/~gitter @anthonygitter These slides are licensed under CC BY 4.0 by Anthony Gitter. See links for third-party image attribution.

Drug discovery and GPU computing Image: Morgridge Moayad Alnammi Shengchao Liu Sam Gelman

High-throughput chemical screening Given a protein of interest, identify chemicals that may have the ability to control the protein Image: Norah Trent

> 1 million compounds Computational chemical prioritization New protein target > 1 million compounds Suggested test order 5* 30390 201 980000 2* 458 … Chemical compound ≈ key Protein ≈ lock

Chemical representations for machine learning Chemception 2-D searching tutorial Wikipedia SMILES PubChem See Ching et al. 2018

Training neural networks We’ve seen 2-100x speedups on GPUs Some jobs still require days on GPUs

Expanding our GPU capacity UW grid Cooley condor_annex

Software dependencies Chemical screening software System libraries NVIDIA driver XXX.YY

Tradeoffs of conda Pros: Easy to install Anaconda Environments are shareable Easy to update packages in environment Cons: Still depend on system libraries conda_submit gpu-job.sub condor install numpy

Predictive models perform well in experimental tests Train on 75k chemicals, PriA-SSB inhibition Choose among many possible models Models select 250 of 25k new chemicals 64 active chemicals in the 25k Model we selected is the best, finds 40 of the 62 Random forest outperforms the neural networks Now testing on much larger chemical libraries

Gene networks and single cell expression Atul Deshpande

What are the relationships among genes inside a cell? Measure snapshots of gene abundance Gene C “Pseudo” time Abundance Gene D Gene A Gene B Gene C Gene D

For each gene, learn which genes control it Gene A A B C D X Gene B Gene C Gene D

Divide network inference into many small computational jobs Gene C “Pseudo” time Abundance Gene D D A ? B C 100 batches of genes X 10 random bootstraps X 100 parameter combinations

Acknowledgments UW-Madison, CHTC Researchers, collaborators Lauren Michael Christina Koch Miron Livny Todd Miller Aaron Moate Tim Cartwright Jaime Frey Greg Thain Neil Van Lysel Dakota Chambers Researchers, collaborators Moayad Alnammi Atul Deshpande Sam Gelman Shengchao Liu Other group members Spencer Ericksen Scott Wildman Michael Hoffmann Andrew Voter James Keck Ron Stewart

Funding and computing resources UW-Madison Center for High Throughput Computing NVIDIA NIH Commons Credits Center for Predictive Computational Phenotyping NIH U54 AI117924 UW Carbone Cancer Center NIH P30 CA014520 UW-Madison VCRGE with funding from WARF PhRMA Foundation Morgridge Institute for Research NSF CAREER 1553206