TORQUE
Kerry Chang, CCLS
December 13, 2010

Outline
- Torque
- How does it work?
- Architecture
- MADA
- Demo
- Results
- Problems
- Future Improvements

Torque – What is it?
- Open source project by Cluster Resources Inc.
- Cluster resource manager
- Manages batch jobs: a series of programs to be executed without manual intervention (a minimal job script is sketched below)
- Manages distributed compute nodes: distributed servers on which batch jobs execute
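
To make "batch job" concrete, here is a minimal sketch of a Torque job script; the job name, resource limits, and commands are illustrative, not anything from the presentation.

# hello.pbs - a minimal Torque batch job script (all values illustrative)
#PBS -N hello
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -j oe

cd "$PBS_O_WORKDIR"            # Torque starts jobs in the home directory; return to where qsub was run
echo "Running on $(hostname)"

The script would be submitted with qsub hello.pbs, queued until a compute node is free, executed there without manual intervention, and its combined output copied back to the submission directory as hello.o<jobid>; qstat shows its progress in the queue.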

Torque Architecture

Torque Scheduler
- Currently using the standard built-in scheduler (FIFO)
- MOAB – a more advanced scheduler

What have I done?
- Used MADA as an application of Torque
- Treated the application as a black box
- Parallelized the text input
- Created a series of scripts for text manipulation and job submission to the Torque queue
- Achieved a linear improvement in processing time by using Torque

MADA
- System for Morphological Analysis and Disambiguation for Arabic
- The input file is separated line by line (a splitting sketch follows below)
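
Because MADA handles each line independently, the input can be divided into pieces of roughly equal line counts before submission. A minimal sketch, assuming GNU coreutils and a hypothetical input file named input.txt:

njobs=5
total=$(wc -l < input.txt)
per_job=$(( (total + njobs - 1) / njobs ))   # round up so no lines are lost
split -d -l "$per_job" input.txt chunk.      # produces chunk.00, chunk.01, ...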

MADA Architecture

How do the scripts work?
1) Split the text file evenly across the number of jobs to be submitted.
2) Create a job script for each newly split text file (e.g. to run 5 jobs, split the text into 5 files and create a script to run each of them; see the sketch below).
3) Submit each script to Torque.
4) Concatenate the output of each script.
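
A sketch of the four steps above, reusing the split command from the previous sketch. It assumes a MADA 3.x-style driver invoked as MADA+TOKAN.pl with config= and file= arguments; the paths, chunk names, and resource requests are assumptions, not the presenter's actual scripts.

#!/bin/bash
set -e
INPUT=input.txt
NJOBS=5
MADA_HOME=/path/to/MADA                        # hypothetical install location
CONFIG=$MADA_HOME/config/template.madaconfig   # hypothetical config file

# 1) Split the input evenly by whole lines.
total=$(wc -l < "$INPUT")
per_job=$(( (total + NJOBS - 1) / NJOBS ))
split -d -l "$per_job" "$INPUT" chunk.

# 2) + 3) Write one Torque job script per chunk and submit it.
for chunk in chunk.*; do
    tag=${chunk##*.}                           # e.g. chunk.00 -> 00
    cat > "mada_$tag.pbs" <<EOF
#PBS -N mada_$tag
#PBS -l nodes=1:ppn=1
#PBS -j oe
cd "\$PBS_O_WORKDIR"
perl $MADA_HOME/MADA+TOKAN.pl config=$CONFIG file=$chunk
EOF
    qsub "mada_$tag.pbs"
done

# 4) Once every job has finished, concatenate the per-chunk outputs
#    (the demo shows outputs named *.bw, *.bw.mada and *.bw.mada.tok):
#    cat chunk.*.bw.mada     > "$INPUT.mada"
#    cat chunk.*.bw.mada.tok > "$INPUT.mada.tok"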

Demo
- Demonstration of Torque and MADA
- 3 output files: file.bw, file.bw.mada, file.bw.mada.tok

Results – 30 lines

Results – 300 lines

Results – 3,000 lines

Results – 30,000 lines

Results – Network vs. Local Temp comparison (seconds): table with columns Network, Local Temp, and Improvement

Problems
- How do we know when MADA has finished so that we can concatenate the results? (one approach is sketched below)
- Where do we run MADA, and where should the results be written?
- Submission to a compute node hangs
  - Use a smarter scheduler
  - Supply machines dedicated to running Torque jobs
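
One possible answer to the first question, offered as an assumption rather than what the presenter actually used: make the concatenation itself a Torque job and hold it with a job dependency until every MADA job exits successfully. The script names are hypothetical; qsub -W depend=afterok is standard Torque/PBS syntax.

jobids=""
for chunk in chunk.*; do
    tag=${chunk##*.}
    id=$(qsub "mada_$tag.pbs")     # qsub prints the new job's ID
    jobids="$jobids:$id"
done

# concatenate.pbs (hypothetical) would simply cat the per-chunk outputs together.
# afterok: start only after all listed jobs have finished with exit status 0.
qsub -W depend=afterok$jobids concatenate.pbs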

Future Improvements
- Pipeline many jobs to Torque
- Work from local temp folders instead of over the network (see the staging sketch below)
- Split and rebuild certain output files by looking at the provided testing.madaconfig file
  - MADA, TOKAN, Preprocessor
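
A sketch of the "local temp" idea as a Torque job fragment. It assumes each compute node exposes local scratch space (via a Torque-provided $TMPDIR, or /tmp as a fallback) and a hypothetical chunk named chunk.00; it illustrates the staging pattern, not the presenter's implementation.

#PBS -N mada_local
#PBS -l nodes=1:ppn=1
#PBS -j oe

cd "$PBS_O_WORKDIR"
scratch=${TMPDIR:-/tmp/$PBS_JOBID}        # node-local scratch directory
mkdir -p "$scratch"
cp chunk.00 "$scratch/"                   # copy the input over the network once
cd "$scratch"
# ... run MADA here against the local copy (invocation as in the earlier sketch) ...
cp chunk.00.bw chunk.00.bw.mada chunk.00.bw.mada.tok "$PBS_O_WORKDIR/"   # copy results back once
cd "$PBS_O_WORKDIR"
rm -f "$scratch"/chunk.00*                # clean up the copies (a Torque-provided $TMPDIR is typically removed automatically)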

Questions?