Slide 1: Deployment of NMI Components on the UT Grid
Shyamal Mitra, Texas Advanced Computing Center
Slide 2: Outline
TACC Grid Program
NMI Testbed Activities
Synergistic Activities
– Computations on the Grid
– Grid Portals
Slide 3: TACC Grid Program
Building grids
– UT Campus Grid
– State Grid (TIGRE)
Grid resources
– NMI components
– United Devices
– LSF MultiCluster
Significantly leveraging NMI components and experience
Slide 4: Resources at TACC
IBM Power4 system (224 processors, 512 GB memory, 1.16 TF)
IBM IA-64 cluster (40 processors, 80 GB memory, 128 GF)
IBM IA-32 cluster (64 processors, 32 GB memory, 64 GF)
Cray SV1 (16 processors, 16 GB memory, 19.2 GF)
SGI Origin 2000 (4 processors, 2 GB memory, 1 TB storage)
SGI Onyx 2 (24 processors, 25 GB memory, 6 Infinite Reality-2 graphics pipes)
The NMI components Globus and NWS are installed on all systems except the Cray SV1
Slide 5: Resources at UT Campus
Individual clusters belonging to professors in
– engineering
– computer sciences
– NMI components Globus and NWS installed on several machines on campus
Computer laboratories with hundreds of PCs in the engineering and computer sciences departments
Slide 6: Campus Grid Model
“Hub and spoke” model
Researchers build programs on their clusters and migrate larger jobs to TACC resources (a data-staging sketch follows this slide)
– use GSI for authentication
– use GridFTP for data migration
– use LSF MultiCluster for job migration
Reclaim unused computing cycles on PCs through the United Devices infrastructure
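A minimal sketch of the “spoke to hub” data-migration step, assuming the Globus Toolkit 2.x command-line clients (grid-proxy-init for GSI, globus-url-copy for GridFTP) are installed on the researcher's cluster; the host names and paths are hypothetical.

```python
"""Sketch: obtain a GSI proxy, then stage input data from a campus
cluster to a TACC system over GridFTP.  Host names and paths are
illustrative, not real endpoints."""
import subprocess

def make_proxy() -> None:
    # GSI authentication: derive a short-lived proxy credential from the
    # user's certificate (prompts for the private-key passphrase).
    subprocess.run(["grid-proxy-init"], check=True)

def stage_data(src_url: str, dst_url: str) -> None:
    # GridFTP transfer via the globus-url-copy client.
    subprocess.run(["globus-url-copy", src_url, dst_url], check=True)

if __name__ == "__main__":
    make_proxy()
    stage_data("gsiftp://cluster.cs.utexas.edu/scratch/user/input.dat",
               "gsiftp://longhorn.tacc.utexas.edu/work/user/input.dat")
```

Job migration itself would go through LSF MultiCluster rather than these clients.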
Slide 7: UT Campus Grid Overview (architecture diagram showing LSF)
Slide 8: NMI Testbed Activities
Globus 2.2.2 – GSI, GRAM, MDS, GridFTP
– robust software
– standard grid middleware
– needs to be installed from source code to link to other components such as MPICH-G2 and Simple CA
Condor-G 6.4.4 – submits jobs through GRAM, monitors queues, receives notification, and maintains Globus credentials (a submit-file sketch follows this slide). Lacks
– the scheduling capability of Condor
– checkpointing
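To show how Condor-G front-ends GRAM, here is a hedged sketch of a submit-description file in the Condor-G 6.4-era style (universe = globus with a globusscheduler contact string), driven from Python; the gatekeeper address and job details are hypothetical.

```python
"""Sketch: write a Condor-G submit-description file and hand it to
condor_submit.  The gatekeeper contact string and job details are
illustrative only."""
import subprocess

SUBMIT_FILE = "pi.submit"

SUBMIT_DESCRIPTION = """\
universe        = globus
globusscheduler = gatekeeper.tacc.utexas.edu/jobmanager-lsf
executable      = pi
arguments       = 1000000
output          = pi.out
error           = pi.err
log             = pi.log
queue
"""

def submit() -> None:
    with open(SUBMIT_FILE, "w") as f:
        f.write(SUBMIT_DESCRIPTION)
    # condor_submit forwards the job to GRAM on the remote gatekeeper;
    # condor_q then shows it in the local Condor-G queue.
    subprocess.run(["condor_submit", SUBMIT_FILE], check=True)

if __name__ == "__main__":
    submit()
```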
Slide 9: NMI Testbed Activities (continued)
Network Weather Service 2.2.1
– name server for directory services
– memory server for storage of data
– sensors to gather performance measurements
– useful for predicting performance, which can feed a scheduler or “virtual grid”
GSI-enabled OpenSSH 1.7 (usage sketch follows this slide)
– a modified version of OpenSSH that allows logging in to remote systems and transferring files between them without entering a password
– requires replacing the native sshd with the GSI-enabled version
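A short usage sketch of GSI-enabled OpenSSH: once grid-proxy-init has created a proxy, gsissh and gsiscp authenticate with it instead of a password. The host name and paths below are hypothetical.

```python
"""Sketch: password-less remote command execution and file retrieval
with GSI-enabled OpenSSH (gsissh/gsiscp), assuming a valid GSI proxy
already exists.  Host name and paths are illustrative."""
import subprocess

REMOTE = "longhorn.tacc.utexas.edu"

def remote_command(cmd: str) -> None:
    # gsissh uses the GSI proxy for authentication, so no password prompt.
    subprocess.run(["gsissh", REMOTE, cmd], check=True)

def fetch(remote_path: str, local_path: str) -> None:
    # gsiscp is the GSI-authenticated counterpart of scp.
    subprocess.run(["gsiscp", f"{REMOTE}:{remote_path}", local_path],
                   check=True)

if __name__ == "__main__":
    remote_command("uname -a")
    fetch("/work/user/results.tar.gz", "./results.tar.gz")
```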
Slide 10: Computations on the UT Grid
Components used – GRAM, GSI, GridFTP, MPICH-G2
Machines involved – Linux Red Hat (2), Sun (2), Linux Debian (2), Alpha cluster (16 processors)
Applications run – PI, Ring, Seismic (a PI sketch follows this slide)
Successfully ran a demo at SC02 using NMI R2 components
Relevance to NMI
– must build from source to link to MPICH-G2
– should be easily configurable to submit jobs to schedulers such as PBS, LSF, or LoadLeveler
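The PI application is the usual parallel quadrature of 4/(1+x^2) over [0,1]. The demo ran it as an MPICH-G2 code over Globus; the sketch below restates the same decomposition with the modern mpi4py bindings purely for illustration.

```python
"""Sketch of the PI test application: each rank integrates its share of
4/(1+x^2) on [0,1] and the partial sums are reduced to rank 0.
Uses mpi4py in place of the original MPICH-G2 C code."""
from mpi4py import MPI

def estimate_pi(intervals: int = 1_000_000) -> float:
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    h = 1.0 / intervals
    # Each rank handles every size-th interval, offset by its rank.
    local = sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2)
                for i in range(rank, intervals, size)) * h
    return comm.reduce(local, op=MPI.SUM, root=0)

if __name__ == "__main__":
    pi = estimate_pi()
    if MPI.COMM_WORLD.Get_rank() == 0:
        print(f"pi ~= {pi:.10f}")
```

Run with, for example, mpirun -np 4 python pi.py; under MPICH-G2 the same decomposition ran across Globus-managed machines.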
Slide 11: Computations on the UT Grid (continued)
Issues to be addressed on clusters
– must submit to the local scheduler: PBS, LSF, or LoadLeveler
– compute nodes sit on a private subnet and cannot communicate with compute nodes on another cluster
– ports must be opened through the firewall for communication (a port-range sketch follows this slide)
– version incompatibility affects source code linked against shared libraries
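One common fix for the firewall issue is to confine Globus callback ports to a range the firewall explicitly opens. The sketch below sets the standard GLOBUS_TCP_PORT_RANGE environment variable before invoking globus-job-run; the port range and gatekeeper host are illustrative.

```python
"""Sketch: restrict Globus listening ports to a firewall-approved
window via GLOBUS_TCP_PORT_RANGE, then run a trivial GRAM job.
The port range and host name are illustrative."""
import os
import subprocess

def run_through_firewall() -> None:
    env = dict(os.environ)
    # Globus clients and job managers bind listening sockets only within
    # this range, so the firewall needs to open just these ports.
    env["GLOBUS_TCP_PORT_RANGE"] = "40000,40100"
    subprocess.run(
        ["globus-job-run", "gatekeeper.tacc.utexas.edu", "/bin/hostname"],
        env=env, check=True)

if __name__ == "__main__":
    run_through_firewall()
```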
Slide 12: Grid Portals
HotPage – web page for obtaining status information on grid resources
– NPACI HotPage (https://hotpage.npaci.edu)
– TIGRE Testbed portal (http://tigre.hipcat.net)
Grid technologies employed
– Security: GSI, SSH, MyProxy for remote proxies
– Job execution: GRAM gatekeeper
– Information services: MDS (GRIS + GIIS), NWS, custom information scripts (an MDS query sketch follows this slide)
– File management: GridFTP
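The custom information scripts behind a portal status page typically just query MDS. Since MDS 2.x is an LDAP service (a GRIS conventionally listens on port 2135 under the base DN "Mds-Vo-name=local, o=grid"), here is a hedged sketch using the python-ldap library; the host name is hypothetical.

```python
"""Sketch: query a GT2 MDS GRIS over LDAP, as a portal information
script might.  Port, base DN, and host are conventional GT2 defaults
or illustrative values."""
import ldap  # python-ldap

def query_gris(host: str) -> None:
    conn = ldap.initialize(f"ldap://{host}:2135")
    conn.simple_bind_s("", "")  # anonymous bind
    entries = conn.search_s("Mds-Vo-name=local, o=grid",
                            ldap.SCOPE_SUBTREE, "(objectclass=*)")
    for dn, _attrs in entries[:10]:  # show a sample of the entries
        print(dn)
    conn.unbind_s()

if __name__ == "__main__":
    query_gris("gatekeeper.tacc.utexas.edu")
```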
Slide 14: GridPort 2.0 Multi-Application Architecture (Using Globus as Middleware)
Slide 15: Future Work
Use NMI components where possible in building grids
Use the Lightweight Campus Certificate Policy to instantiate a certificate authority at TACC (a generic CA sketch follows this slide)
Build portals and deploy applications on the UT Grid
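The core of instantiating a certificate authority is generating a self-signed CA certificate and key. The sketch below does this with plain OpenSSL for illustration only; an actual TACC CA would follow the NMI Lightweight Campus Certificate Policy and its associated tooling, and the subject fields and lifetime are hypothetical.

```python
"""Sketch: create a self-signed CA certificate and key with OpenSSL.
Illustrative only -- not the NMI Lightweight Campus CA procedure.
Subject fields and lifetime are placeholders."""
import subprocess

def create_ca() -> None:
    subprocess.run([
        "openssl", "req", "-new", "-x509",
        "-newkey", "rsa:2048",
        "-keyout", "campus-ca-key.pem",   # prompts for a key passphrase
        "-out", "campus-ca-cert.pem",
        "-days", "1825",
        "-subj", "/O=Grid/OU=UT Campus Grid/CN=Example Campus CA",
    ], check=True)

if __name__ == "__main__":
    create_ca()
```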
Slide 16: Collaborators
Mary Thomas
Dr. John Boisseau
Rich Toscano
Jeson Martajaya
Eric Roberts
Maytal Dahan
Tom Urban