Parallel Computing using Condor on Windows PCs. Peng Wang and Corey Shields, Research and Academic Computing Division, University Information Technology Services.

Presentation transcript:

Parallel Computing using Condor on Windows PCs. Peng Wang and Corey Shields, Research and Academic Computing Division, University Information Technology Services, Indiana University.

Problem Description: Turn the Windows desktop systems in the STC labs (around 2000 machines) into a parallel scientific computer.

Discussion: Parallel applications need coordination through message passing; MPI does not handle ephemeral processes well; multiplexing communication among processes; ports brokered among multiple parallel sessions.

What do we have? Condor NT, vanilla universe: match-making, file transfer, fair sharing, job submission, suspension, preemption, restart, security. Test application, fastDNAml-p: parallel application, master-worker model, small granularity of work.
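The master-worker model with small work units suits desktops that can be reclaimed at any time, because a lost worker costs only the one task it held. The following is a minimal sketch of that idea, not code from fastDNAml-p: the function name `run_master`, the `workers` dictionary, and the "budget" that simulates a desktop being preempted are all hypothetical.

```python
from collections import deque


def run_master(tasks, workers, do_work):
    """Hand out tasks one at a time and requeue a task if its worker vanishes.

    `workers` maps a worker name to the number of tasks it completes before
    disappearing (None means it never disappears). This budget is a
    hypothetical stand-in for a desktop being reclaimed by its owner.
    """
    pending = deque(tasks)
    results = {}
    alive = dict(workers)
    while pending:
        if not alive:
            raise RuntimeError("no workers left")
        for name in list(alive):
            if not pending:
                break
            task = pending.popleft()
            budget = alive[name]
            if budget == 0:
                # The worker was preempted: forget it and put the task back.
                del alive[name]
                pending.appendleft(task)
                continue
            results[task] = do_work(task)
            if budget is not None:
                alive[name] = budget - 1
    return results


if __name__ == "__main__":
    # Worker "a" disappears after one task; worker "b" finishes the rest.
    print(run_master(range(6), {"a": 1, "b": None}, lambda t: t * t))
```

Because each task is small, requeueing is cheap, which is the property the slide points to with "small granularity of work".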

How we did it: Simple Message Brokering Library (SMBL); Process and Port Manager (PPM); a mechanism for users to submit jobs (web portal).

SMBL: An I/O multiplexing server is in charge of message delivery for each parallel session (it serializes communication). The SMBL client library implements selected MPI-like calls. Both the server and the client library are based on a TCP socket abstraction library.
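To make the brokering idea concrete, here is a minimal sketch of an I/O multiplexing message broker in Python. It is not the SMBL implementation or its API: the wire format (a destination rank and a payload length), the `Broker` class, and the helper names are assumptions made for illustration. Clients never talk to each other directly; every message goes through the broker, which also notices when a client has disappeared.

```python
import selectors
import socket
import struct

# Hypothetical wire format: rank (destination on send, source on delivery)
# followed by the payload length, both unsigned 32-bit integers.
HEADER = struct.Struct("!II")


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer has gone away."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def send_msg(sock, rank, payload):
    sock.sendall(HEADER.pack(rank, len(payload)) + payload)


def recv_msg(sock):
    rank, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return rank, recv_exact(sock, length)


class Broker:
    """Forwards messages between clients that are identified by rank."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.by_rank = {}

    def add_client(self, rank, sock):
        self.by_rank[rank] = sock
        self.sel.register(sock, selectors.EVENT_READ, rank)

    def pump(self, timeout=1.0):
        """Forward each ready message; return the number delivered."""
        delivered = 0
        for key, _ in self.sel.select(timeout):
            src = key.data
            try:
                dest, payload = recv_msg(key.fileobj)
            except ConnectionError:
                # An ephemeral client vanished: drop it, keep serving others.
                self.sel.unregister(key.fileobj)
                self.by_rank.pop(src, None)
                continue
            target = self.by_rank.get(dest)
            if target is not None:
                # On delivery the header carries the sender's rank.
                send_msg(target, src, payload)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = Broker()
    c0, s0 = socket.socketpair()
    c1, s1 = socket.socketpair()
    broker.add_client(0, s0)
    broker.add_client(1, s1)
    send_msg(c0, 1, b"hello")
    broker.pump()
    print(recv_msg(c1))
```

The sketch handles one message per ready socket per call, which is one simple way to serialize communication; a production broker would also need to buffer partial reads and writes.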

Process and Port Manager: Assigns a port to each SMBL server process; starts the SMBL server and application processes on demand; directs workers to their servers.
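The port-brokering part of this role can be sketched in a few lines of Python. This is an illustration under assumptions, not the PPM itself: the `PortManager` class, its method names, and the port range are hypothetical, and starting the actual server and application processes is left out.

```python
class PortManager:
    """Hands out one port per parallel session from a fixed range."""

    def __init__(self, first_port, last_port):
        # Stored in descending order so that pop() returns the lowest free port.
        self.free = list(range(last_port, first_port - 1, -1))
        self.port_of = {}

    def start_session(self, session_id):
        """Assign a port to a new session, or return the one it already has."""
        if session_id in self.port_of:
            return self.port_of[session_id]
        if not self.free:
            raise RuntimeError("no free ports")
        port = self.free.pop()
        self.port_of[session_id] = port
        return port

    def lookup(self, session_id):
        """Tell a worker which port its session's server listens on."""
        return self.port_of[session_id]

    def end_session(self, session_id):
        """Return the session's port to the free pool."""
        self.free.append(self.port_of.pop(session_id))


if __name__ == "__main__":
    pm = PortManager(9000, 9001)  # hypothetical port range
    print(pm.start_session("run-a"), pm.start_session("run-b"))
    pm.end_session("run-a")
    print(pm.start_session("run-c"))
```

Keeping the session-to-port table in one place is what lets several parallel sessions share the same server host without colliding.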

The Portal: Apache-based PHP web interface; creates and submits the Condor submit files.
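As a rough illustration of what generating a submit file involves, the sketch below builds a vanilla-universe submit description as text. It is written in Python rather than the portal's PHP, and it does not reproduce the portal's actual output: the function name `make_submit_file` and the output, error, and log file names are hypothetical. The submit commands themselves (`universe`, `executable`, `arguments`, `should_transfer_files`, `when_to_transfer_output`, `transfer_input_files`, `queue`) are standard Condor submit commands.

```python
def make_submit_file(executable, arguments, n_workers, input_files=()):
    """Return the text of a vanilla-universe Condor submit description."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # Desktop pools usually have no shared file system, so let Condor
        # move the files to and from the execute machine.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    if input_files:
        lines.append("transfer_input_files = " + ", ".join(input_files))
    lines += [
        # $(Process) expands to 0, 1, 2, ... for each queued job.
        "output = worker.$(Process).out",
        "error = worker.$(Process).err",
        "log = session.log",
        f"queue {n_workers}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical executable name and arguments.
    print(make_submit_file("worker.exe", "server-host 9000", 4, ["data.phy"]))
```

The resulting text would then be written to a file and handed to `condor_submit`.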

The Big Picture: The shaded box indicates components hosted on multiple desktop computers.

Statistics (plot legend): red, total owner; blue, total idle; green, total Condor.

Scalability Issues: Needed a big server. Adjusted condor_config: MAX_JOBS_RUNNING 1000, SHADOW_SIZE_ESTIMATE 900KB, MAX_STARTD_LOG 640KB. Lost workers because of the per-process file descriptor limit (1024).
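A little arithmetic shows why these numbers matter. The sketch below assumes one condor_shadow process per running job on the submit machine, and, as a further assumption not stated in the slides, that each connected worker costs the message server one file descriptor. The function names and the count of 16 reserved descriptors are hypothetical.

```python
def shadow_memory_kb(max_jobs_running, shadow_size_estimate_kb):
    """Estimated memory for all shadows if every allowed job is running."""
    return max_jobs_running * shadow_size_estimate_kb


def max_worker_connections(fd_limit, reserved_fds):
    """Connections one process can hold after setting aside descriptors
    for its own use (listening socket, log files, standard streams)."""
    return max(fd_limit - reserved_fds, 0)


if __name__ == "__main__":
    # 1000 running jobs at an estimated 900 KB per shadow.
    print(shadow_memory_kb(1000, 900))       # 900000 KB, roughly 0.9 GB
    # A 1024-descriptor limit with 16 descriptors reserved (hypothetical).
    print(max_worker_connections(1024, 16))  # 1008
```

Under these assumptions, a single session with more workers than the descriptor limit allows would start losing connections, which is consistent with the lost workers reported on the slide.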

Summary: Built a large parallel scientific computing facility using Condor. Built a parallel message passing library to deal with ephemeral resources. Built a port broker to handle multiple parallel sessions. Built a web portal. It is open source, visit: