Parallelization with the Matlab® Distributed Computing Server (MDCS) @ the CBI Cluster (December 3, 2013)


CBI Workshop Announcement
Tuesday, Dec. 3rd, 2013: "Matlab Parallelization with the Matlab Distributed Computing Server at the CBI Cluster"
The CBI has a 64-worker Matlab Distributed Computing Server (MDCS) installed on the cluster. The MDCS can not only speed up your computations, but can also reduce standard Matlab license usage. This workshop will teach you how to write parallel Matlab code and how to run it on the MDCS from your desktop or laptop.
Agenda:
- Basic parallelization concepts presentation
- Utilizing the Distributed Computing Server on the CBI Cheetah cluster with Matlab
- Hands-on demos
When: 10:00am - 12:00pm, Tuesday, Dec. 3rd, 2013
Speakers: Zhiwei Wang, Director of the CBI; Nelson Ramirez, Senior Software Developer at the CBI
Where: CBI, UTSA, BSE 3.114
Acknowledgements: David Noriega, CBI System Administrator

Overview
- Parallelization with Matlab using the Parallel Computing Toolbox (PCT)
- Matlab Distributed Computing Server introduction
- Benefits of using the MDCS
- Hardware/Software/Utilization @ CBI
- MDCS usage scenarios
- Hands-on training

Parallelization with Matlab PCT
The Matlab Parallel Computing Toolbox provides access to multi-core, multi-system (via the MDCS), and GPU parallelism:
- Many built-in Matlab functions (e.g. fft) support parallelism transparently.
- Parallel constructs such as converting for loops to parfor loops.
- Handles many different types of parallel software development challenges.
- Up to 12 workers locally on a multicore desktop.
- Interactive (pmode) and batch modes.
- Distributed arrays (spmd).
- The MDCS allows scaling of locally developed parallel-enabled Matlab applications.
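A minimal sketch of the for-to-parfor conversion mentioned above, using the R2012a-era matlabpool syntax; the workload inside the loop is a hypothetical stand-in for a long-running computation:

```matlab
% Assumes Matlab + Parallel Computing Toolbox; start 4 local workers.
matlabpool open 4

n = 8;
results = zeros(1, n);
parfor i = 1:n
    % Each iteration is independent, so PCT distributes them across workers.
    results(i) = sum(svd(rand(200)));   % stand-in for a long-running task
end

matlabpool close
```

Because the iterations share no state, the same loop runs unchanged as a serial for loop, with local workers, or on the MDCS cluster profile.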

Parallelization with Matlab PCT
Distributed/parallel algorithm characteristics:
- Memory usage & CPU usage (e.g. load a 4-gigabyte file into memory, then calculate averages).
- Communication / data I/O patterns (e.g. read a 10-gigabyte file, then run a function; or worker B sends data to worker A, A runs a function, then returns data to worker B).
- Dependencies between functions (function 1 -> function 2 -> function 3).
- Hardware resource contention (e.g. 16 cores each trying to read/write a set of files; RAM bandwidth limitations; managing large numbers of small files leads to filesystem contention).
Key: focus on dependency analysis.
- How much of your program is independent determines the potential parallelism at a fixed data size (Amdahl's law): S(N) = 1 / ((1-P) + P/N); as N -> infinity, speedup -> 1/(1-P), where P is the parallel fraction and N is the number of workers.
- Gustafson's law: you may not be able to remove certain serial dependencies, but you may be able to tackle larger problems with more workers in the parallel sections: S(P) = P - alpha*(P-1), where P = number of processors, S = speedup, and alpha = sequential fraction of the parallel runtime.
- Car average-mph analogy: under Amdahl's law the total trip distance is fixed; under Gustafson's law the distance can expand as you get more "fuel" (parallel compute power).
Data transfer vs. compute (arithmetic intensity):
- The cost of moving data from CPU to GPU must be taken into account; a GPU may provide a large benefit when compute >> data I/O.
- Going to the store for 100 items with 10 workers: ideally you make 1 trip for all 100 items. Even if all 10 workers fetch their items in parallel, there is little benefit if you make 10 round trips.
Resource contention and limits:
- Data transfer bandwidth (memory bandwidth, network bandwidth).
- Resource limits (memory, disk) and hardware limits (cache line sizes, memory alignment issues, disk block sizes, cache sizes, queue depths, etc.).
- Physical data organization (e.g. row-major vs. column-major).
Conditional (if-else) minimization: ideally a hot function would have zero if statements, but this is not always feasible for algorithm correctness.
Synchronization: algorithm correctness often requires some type of synchronization.
Many more variables affect function-, program-, and system-level parallelism: a function may be highly parallelizable, yet achieving a good overall solution may require examining parallelism at several levels of the system.
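The two scaling laws above are easy to compare numerically; a small sketch with an illustrative parallel fraction (the value P = 0.95 is chosen for illustration, not taken from the slides):

```matlab
% Compare Amdahl's and Gustafson's laws for parallel fraction P = 0.95.
P = 0.95; alpha = 1 - P;
N = [1 2 4 8 16 32 64];

amdahl    = 1 ./ ((1 - P) + P ./ N);   % fixed problem size
gustafson = N - alpha .* (N - 1);      % problem grows with worker count

fprintf('%4s %10s %12s\n', 'N', 'Amdahl', 'Gustafson');
fprintf('%4d %10.2f %12.2f\n', [N; amdahl; gustafson]);
% Amdahl's speedup saturates near 1/(1-P) = 20; Gustafson's keeps growing.
```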

Parallelization with Matlab PCT
Applications have layers of parallelism: clusters; MDCS worker processes (a.k.a. "labs"); CPUs / multi-cores; GPU cards / external accelerator cards. For an optimal solution, you must look at the application as a whole. The Matlab PCT + MDCS framework addresses all of these layers and automates much of the complexity of developing parallel & distributed applications.
Scalability: use as many workers as possible, in an efficient manner.
Licensing: MDCS workers never request regular Matlab or toolbox licenses; the only license an MDCS worker ever uses is an MDCS worker license. Toolboxes are unlocked for an MDCS worker based on the licenses owned by the client during the job submission process.

Parallelization with Matlab PCT & MDCS
- Distributed loops: parfor
- Interactive development mode (matlabpool/pmode)
- Distributed arrays (spmd)
- Scale out on the MDCS cluster in batch job submission mode
Develop algorithms on your local system (CPUs, multi-cores), then seamlessly scale out on the MDCS cluster @ CBI.

MDCS Benefits
The Matlab Distributed Computing Server is a cluster software infrastructure built over the Message Passing Interface (MPI) that scales Parallel Computing Toolbox-enabled codes.
Performance: scaling in both compute and memory.
- With a local profile there is a limit of 12 labs (R2012a), plus single-machine memory and IPC limits; the CBI MDCS cluster supports up to 64 labs (limited by MDCS worker licenses). Some simulations can go from years of runtime to days on 64 MDCS labs.
- Memory scaling with co-distributed arrays minimizes single-node memory utilization and can enable processing larger datasets in a distributed manner.
Licensing: workers only need their MDCS worker license (64 @ the CBI Lab), leaving regular Matlab and toolbox licenses (e.g. Statistics Toolbox) available for others. A regular Matlab license + a Parallel Computing Toolbox license is only needed during the job submission process. This also allows running code that requires non-compilable toolboxes (SimBiology and others) without tying up licenses, and enables scaling of codes that cannot be compiled with the Matlab Compiler Toolbox.
Productivity:
- A wonderful parallel algorithm development environment, with the superior visualization & profiling capabilities of Matlab.
- Many built-in functions are parallel-enabled: fft, lu, svd, and many more with spmd / co-distributed arrays.
- Distributed arrays allow development of data-parallel algorithms.
- Rapid prototyping of parallel algorithms: Matlab + PCT + MDCS instead of C/C++/Fortran with OpenMP/MPI directly.
- PCT constructs (parfor ~ OpenMP; spmd and co-distributed arrays ~ MPI) scale seamlessly from a local system to the distributed server: develop on a workstation or laptop, then use the MDCS to scale the same code in both memory and compute dimensions, on up to 64 MDCS labs.
- Job queues allow scaling to large numbers of jobs, e.g. a parameter scan of a time-consuming parallel-enabled simulation: submit the jobs and the MDCS scheduler manages the rest.
Multiple levels of parallelism can be implemented using the MDCS: independent jobs (distributed jobs ~ task computing) and complex, fully parallel algorithms that require inter-process communication and synchronization (parallel jobs ~ parfor, spmd, labSend, labReceive, co-distributed arrays).

MDCS Structure
The MDCS cluster is accessible via the Cheetah system @ the CBI Laboratory:
ssh -Y username@cheetah.cbi.utsa.edu
qlogin
This takes you to an interactive development node, where you can set up your connection to the MDCS cluster. (An allocation needs to be created on a per-project basis, during a consulting meeting.)

Hardware/Software/Utilization @ CBI
MDCS worker processes run on 4 physical servers (Dell PowerEdge M910):
- Four 16-core systems, each with 2 x Intel Xeon 2.26 GHz processors (8 cores per processor) and 64 GB RAM.
- Total of 64 cores, with 256 GB of RAM distributed among the systems.
- A maximum of 64 MDCS worker licenses is available; subsets of MDCS workers can be allocated based on project needs.

Usage Scenarios
Local system, interactive use (matlabpool / spmd / pmode / mpiprofile): a local system (e.g. one of the workstations @ CBI) is used for initial algorithm development.
MDCS, non-interactive use: job & task based, with two main types: independent vs. communicating jobs.
- Both types can be used with either the local profile (on a non-cluster workstation) or the MDCS profile.
- Local mode should be used for design/development; for performance testing, the MDCS should be used. In local mode, each worker ("lab") is mapped to an OS process running a Matlab worker, and starting workers locally incurs overhead.
- Up to 12 local workers can be used (e.g. on a workstation or laptop with Matlab + PCT).
The key point is that exactly the same code that is developed locally can be run on the MDCS.

MDCS Workloads
Two main types of workloads can be implemented with the MDCS. A job is logically decomposed into a set of tasks; a job may have 1 or more tasks, and each task may or may not have additional parallelism within it.
CASE 1: Independent. Within a job the parallelism is fully independent, so MDCS workers can be used to offload independent work units. The code does not use parallel language features such as parfor or spmd. Each task is completely independent, just as with a grid scheduler; this is well suited for parameter-scanning workloads. Note: in many cases a parfor loop can be transformed into a set of independent tasks and submitted to the MDCS.
createJob() + createTask(), createTask(), ... createTask()
CASE 2: Communicating. Within a single job the parallelism is more complex, requiring the workers to communicate, or parfor, spmd, or co-distributed arrays (PCT language features) are used. For example, parfor requires communication among the workers involved, since the work of a single loop must be partitioned, data transferred to the workers, and results gathered on the main node; all of this is handled automatically by the Parallel Computing Toolbox. Only 1 task can be created within a communicating job.
createCommunicatingJob() + createTask()
Note: interactive use of the matlabpool/spmd commands is not recommended on the MDCS, since it locks up workers and bypasses the Matlab scheduler. Using them with the local configuration on a workstation is, however, a useful way to develop and test your algorithm. The matlabpool command should never appear in source code submitted to the MDCS.
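A sketch of both workload cases using the PCT job API; the profile name 'MDCSProfile' is a hypothetical placeholder for whatever cluster profile your site configures:

```matlab
c = parcluster('MDCSProfile');   % hypothetical profile name

% CASE 1: independent job -- one task per parameter value.
job1 = createJob(c);
for p = 1:4
    createTask(job1, @(x) sum(svd(rand(100))) + x, 1, {p});
end
submit(job1);
wait(job1);
out = fetchOutputs(job1);        % one result cell per task

% CASE 2: communicating job -- a single task, run across several labs.
job2 = createCommunicatingJob(c, 'Type', 'SPMD');
job2.NumWorkersRange = [2 4];
createTask(job2, @() labindex, 1, {});   % each lab returns its own index
submit(job2);
wait(job2);
labIds = fetchOutputs(job2);
```

In Case 1 the scheduler is free to run the four tasks on any available workers; in Case 2 all participating labs start together and may exchange data with labSend/labReceive.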

MDCS Working Environment
Click on the "Parallel" button.

MDCS Working Environment
The Cluster Profile Manager allows you to run tests to ensure your connection to the MDCS is working properly.

Interactive Mode Sample (parfor)
For well-mapping workloads, parfor can yield exceptional performance improvements: from years to days, or days to hours, for certain workloads. The ideal cases are long-running computations with little or no inter-iteration communication.
In interactive mode (matlabpool), parfor automatically distributes the set of loop iterations among the MDCS workers. It is important to run in interactive mode only if you have exclusive access to a reserved set of MDCS workers, as interactive sessions impede others running on the MDCS cluster.
% Matlab single-threaded baseline
tic;
for i = 1:64
    for j = 1:100
        % some very long-running process
        dataaverage(i) = mean(mean(fft2(rand(1000,1000))));
    end
end
toc;
[484.43 seconds running on compute-5-1, single-threaded]
[206.53 seconds running on compute-5-1, 16 cores, Matlab implicit multi-threading; by default Matlab tries to use as many cores as are available]
% parfor demo on the MDCS
matlabpool open 2
tic;
parfor i = 1:64
    for j = 1:100
        dataaverage(i) = mean(mean(fft2(rand(1000,1000))));
    end
end
toc;
matlabpool close
Timings with parfor enabled on the MDCS:
2 workers: 260.19 s; 4 workers: 152.24 s; 8 workers: 84.01 s; 16 workers: 48.18 s; 20 workers: 41.12 s; 24 workers: 37.60 s; 28 workers: 35.19 s; 32 workers: 30.99 s; 48 workers: 30.00 s; 64 workers: 22.79 s.

MDCS Scaling (Batch Mode)
Processing many images in batch mode, with 1 job + an independent task for each image to be processed. Better scaling is often achieved by moving up a level on the parallelism ladder: instead of assigning more and more workers to process a single image, process more and more images with 1 worker per image.
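The 1-worker-per-image pattern above maps naturally onto an independent job; in this sketch the profile name, the file list, and the processImage function are all hypothetical stand-ins:

```matlab
c = parcluster('MDCSProfile');                  % hypothetical profile name
files = {'img1.tif', 'img2.tif', 'img3.tif'};   % images to process

job = createJob(c);
for k = 1:numel(files)
    % One independent task per image; the scheduler spreads the tasks
    % across however many MDCS workers are available.
    createTask(job, @processImage, 1, {files{k}});   % hypothetical function
end
submit(job);
wait(job);
results = fetchOutputs(job);   % one result cell per image
```

Adding more images simply adds tasks; no change to the per-image code is needed as the worker count grows.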

MDCS Scaling (Batch Mode)
MDCS job display panel.

MDCS Scaling (Batch Mode)
Results of a batch parameter scan using MDCS workers, with 1 image per worker.

Summary
Applied examples of using the MDCS in batch mode are available as part of the hands-on section, or via a consulting appointment for more in-depth MDCS usage information. We can allocate a subset of MDCS workers on a per-project basis.

Summary
- A wonderful parallel algorithm design & development environment.
- Scale codes out to up to 64 Matlab MDCS workers, in both distributed compute & memory.
- Minimizes standard Matlab + toolbox license usage.
- Many options for parallelizing computational workloads: parfor, spmd, distributed arrays, communicating jobs, batch independent jobs, and more.

Acknowledgements
This project received computational, research & development, and software design/development support from the Computational System Biology Core / Computational Biology Initiative, funded by the National Institute on Minority Health and Health Disparities (G12MD007591) of the National Institutes of Health. URL: http://www.cbi.utsa.edu

Contact Us http://cbi.utsa.edu

Appendix A
See the parforFFTdemo.m file for the full source code.

Local Mode: Matlab Worker Process/Thread Structure
Parallel Computing Toolbox constructs can be tested in local mode: the "lab" abstraction allows the actual process backing a lab to reside either locally or on a distributed server node. MPI is used for inter-process communication between labs (Matlab worker processes).
Note: by default, Matlab uses as many computational threads as there are physical cores on a system (http://www.mathworks.com/help/matlab/ref/maxnumcompthreads.html). The -singleCompThread option can be added to the command line when starting Matlab to force a single computational thread, and the maxNumCompThreads function returns the current maximum number of computational threads in use.
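A minimal sketch of inspecting and limiting the computational thread count, e.g. to obtain a fair single-threaded baseline before a parfor comparison (the matrix size is an arbitrary illustration):

```matlab
% Current limit (defaults to the number of physical cores).
n = maxNumCompThreads;
fprintf('Matlab is using up to %d computational threads\n', n);

% Temporarily force single-threaded execution for a serial baseline.
oldN = maxNumCompThreads(1);
tic; fft2(rand(2000)); t1 = toc;

maxNumCompThreads(oldN);     % restore the previous limit
tic; fft2(rand(2000)); tN = toc;
fprintf('single-thread: %.3f s, multi-thread: %.3f s\n', t1, tN);
```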

Local Mode Scaling Sample (parfor)
Using more workers ("labs") than available physical CPU cores will not improve performance.

Interactive Mode Sample (pmode/spmd)
In interactive mode (pmode) you have direct command-line access to multiple labs, where each lab is identified by the variable labindex. This allows you to create distributed arrays and have each worker process a different section of the matrix: each lab handles a piece of the data, results are gathered on lab 1, and the client session can request the complete data set with lab2client.
http://www.mathworks.com/help/distcomp/spmd.html
http://www.mathworks.com/help/distcomp/gather.html
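The same each-lab-handles-a-piece idea can be written as an spmd block in a script; a small sketch (the pool size and matrix size are arbitrary choices for illustration):

```matlab
% Run inside an interactive pool, e.g. on a local workstation.
matlabpool open 4
spmd
    % The 1000x1000 matrix is distributed: each lab stores only its slice.
    A = codistributed(magic(1000));
    localSum = sum(sum(getLocalPart(A)));   % partial sum on this lab
    fprintf('lab %d partial sum: %g\n', labindex, localSum);
    total = gplus(localSum);                % global sum across all labs
end
% Outside spmd, per-lab variables come back as a Composite.
fprintf('total = %g\n', total{1});
matlabpool close
```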

Local vs. MDCS Mode Comparison (parfor)
Shows the effect of adding more workers to process a single image. There are many options when choosing a parallelization strategy; it might be better to use a single worker per image and run the images through a set of batch jobs, with 1 job per image.

Appendix B: MDCS Access
Access to the MDCS is provided via the Cheetah cluster, from both Windows & Linux systems. On Linux:
ssh -Y username@cheetah.cbi.utsa.edu
qlogin
matlab &

Appendix B: MDCS Access
Access to the MDCS is provided via the Cheetah cluster. On Windows, use PuTTY + Xming with X11 forwarding, then:
qlogin
matlab &
Refer to the CBI X forwarding guide: https://www.cbi.utsa.edu/faq/xforwarding

References
[1] http://www.mathworks.com/products/parallel-computing/ (Parallel Computing Toolbox reference)
[2] http://www.mathworks.com/help/toolbox/distcomp/f1-6010.html#brqxnfb-1 (Parallel Computing Toolbox)
[3] http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html (Parallel Computing Toolbox)
[4] http://www.mathworks.com/products/distriben/supported/license-management.html (MDCS license management)
[5] http://www.mathworks.com/cmsimages/dm_interact_wl_11322.jpg (MDCS architecture overview)
[6] http://www.mathworks.com/cmsimages/62006_wl_mdcs_fig1_wl.jpg (MDCS architecture overview: scalability)
[7] http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html (Built-in MDCS support)
[8] http://www.mathworks.com/products/datasheets/pdf/matlab-distributed-computing-server.pdf (MDCS licensing)
[9] http://www.psc.edu/index.php/matlab (MDCS @ PSC)
[10] http://www.mathworks.com/products/compiler/supported/compiler_support.html (Compiler support for MATLAB and toolboxes)
[11] http://www.mathworks.com/support/solutions/en/data/1-2MC1RY/?solution=1-2MC1RY (SGE integration)
[12] http://www.mathworks.com/company/events/webinars/wbnr30965.html?id=30965&p1=70413&p2=70415 (MDCS administration)
[13] http://www.mathworks.com/help/toolbox/mdce/f4-10664.html (General MDCE workflow)
[14] http://www.mathworks.com/help/toolbox/distcomp/f3-10664.html (Independent jobs with MDCS)
[15] http://cac.engin.umich.edu/swafs/training/pdfs/matlab.pdf (MDCS @ UMich)
[16] http://www.mathworks.com/products/optimization/examples.html?file=/products/demos/shipping/optim/optimparfor.html (Optimization Toolbox example)
[17] http://www.mathworks.com/products/distriben/examples.html (MDCS examples)
[18] http://www.mathworks.com/support/product/DM/installation/ver_current/ (MDCS installation guide R2012a)
[19] http://www.psc.edu/index.php/matlab (MDCS @ PSC)
[20] http://rcc.its.psu.edu/resources/software/dmatlab/ (MDCS @ Penn State)
[21] http://ccr.buffalo.edu/support/software-resources/compilers-programming-languages/matlab/mdcs.html (MDCS @ U of Buffalo)
[22] http://www.cac.cornell.edu/wiki/index.php?title=Running_MDCS_Jobs_on_the_ATLAS_cluster (MDCS @ Cornell)
[23] http://www.mathworks.com/products/distriben/description3.html (MDCS licensing)
[24] http://www.mathworks.com/cmsimages/dm_interact_wl_11322.jpg (MDCS architecture)

References
[25] http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/bqxooam-1.html (Built-in functions that work with distributed arrays)
[26] http://www.rz.rwth-aachen.de/aw/cms/rz/Themen/hochleistungsrechnen/nutzung/nutzung_des_rechners_unter_windows/~sxm/MATLAB_Parallel_Computing_Toolbox/?lang=de (MDCS @ RWTH Aachen University)
[27] http://www.mathworks.com/support/solutions/en/data/1-9D3XVH/index.html?solution=1-9D3XVH (Compiled Matlab applications using PCT + MDCS)
[28] http://www.hpc.maths.unsw.edu.au/tensor/matlab (MDCS @ UNSW)
[29] http://blogs.mathworks.com/loren/2012/04/20/running-scripts-on-a-cluster-using-the-batch-command-in-parallel-computing-toolbox/ (Batch command)
[30] http://www.rcac.purdue.edu/userinfo/resources/peregrine1/userguide.cfm#run_pbs_examples_app_matlab_licenses_strategies (MDCS @ Purdue)
[31] http://www.mathworks.com/help/pdf_doc/distcomp/distcomp.pdf (Parallel Computing Toolbox R2012a)
[32] http://www.nccs.nasa.gov/matlab_instructions.html (MDCS @ NASA)
[33] http://www.mathworks.com/help/toolbox/distcomp/rn/bs8h9g9-1.html (PCT, MDCS R2012a interface changes)
[34] http://www.mathworks.com/help/toolbox/distcomp/createcommunicatingjob.html (Communicating jobs)
[35] http://www.mathworks.com/products/parallel-computing/examples.html?file=/products/demos/shipping/distcomp/paralleltutorial_dividing_tasks.html (Moving parfor loops to jobs + tasks)
[36] http://people.sc.fsu.edu/~jburkardt/presentations/fsu_2011_matlab_tasks.pdf (MDCS @ FSU: task-based parallelism)
[37] http://www.icam.vt.edu/Computing/fdi_2012_parfor.pdf (MDCS @ Virginia Tech: parfor parallelism)
[38] http://www.hpc.fsu.edu/ (MDCS @ FSU, HPC main site)
[39] http://www.mathworks.com/help/toolbox/distcomp/rn/bs8h9g9-1.html (PCT updates in R2012a)
[40] http://www.mathworks.com/help/distcomp/using-matlab-functions-on-codistributed-arrays.html (Built-in functions available for co-distributed arrays)
[41] http://scv.bu.edu/~kadin/Tutorials/PCT/matlab-pct.html (Matlab PCT @ Boston University)
[42] http://www.circ.rochester.edu/wiki/index.php/MatlabWorkshop#Example_using_distributed_arrays_for_FFT
[43] http://www.advancedlinuxprogramming.com/alp-folder/alp-ch04-threads.pdf
[44] http://www.mathworks.com/products/distriben/parallel/accelerate.html
[45] http://www.mathworks.com/products/distriben/examples.html?file=/products/parallel-computing/includes/parallel.html
[46] http://en.wikipedia.org/wiki/Gustafson%27s_law
[47] http://www.mathworks.com/help/distcomp/index.html
[48] http://www.mathworks.com/cmsimages/43623_wl_dm_using_paralles_forloops_wl.jpg
[49] http://www.mathworks.com/help/distcomp/mpiprofile.html