Some Design Notes (Iteration 2)

Method 1

Extractor main program
- Runs from an external VM
- Listens for RabbitMQ messages
- Starts a lightweight database engine (SQLite). Does this add overhead?
- Creates an HPC job per file and sends an ACK
- Stores the job ID and file ID in a database table
- Asynchronously checks job status and, once a job completes with exit status 0, sends the ACK for the corresponding file ID. Can this be done?
- Another option is to acknowledge immediately and, if the job then fails, resend the message. Can this be done?

Extractor processing logic
- Resides in the HPC environment
- Is called by the extractor main program
- Downloads the file using the Clowder APIs
- Processes the file and uploads previews and metadata back using the APIs

Method 1A
- Includes all of the steps above
- Staging in and staging out: SFTP the files from the main extractor to the HPC file system
- Helps optimize HPC time usage

Method 2
- An elasticity control script listens to RabbitMQ messages
- Once the number of messages increases, it creates multiple instances of a special extractor
- This is more of a manual approach
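The job-tracking table described under Method 1 (job ID and file ID stored in SQLite, ACK sent only after the job exits with status 0) could be sketched roughly as follows. This is a minimal illustration, not the pyClowder implementation: the table name `hpc_jobs`, its columns, the status strings, and all function names are assumptions made for this example. The `ack` argument is any callable that takes a delivery tag, so the sketch runs without a broker.

```python
import sqlite3


def open_job_store(path=":memory:"):
    """Open (or create) the hypothetical job-tracking table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hpc_jobs (
               job_id       TEXT PRIMARY KEY,
               file_id      TEXT NOT NULL,
               delivery_tag INTEGER NOT NULL,
               status       TEXT NOT NULL DEFAULT 'QUEUED',
               exit_status  INTEGER
           )"""
    )
    conn.commit()
    return conn


def record_job(conn, job_id, file_id, delivery_tag):
    """Store a newly submitted HPC job together with the message it came from."""
    conn.execute(
        "INSERT INTO hpc_jobs (job_id, file_id, delivery_tag) VALUES (?, ?, ?)",
        (job_id, file_id, delivery_tag),
    )
    conn.commit()


def mark_finished(conn, job_id, exit_status):
    """Record the exit status reported by the scheduler for a finished job."""
    status = "COMPLETED" if exit_status == 0 else "FAILED"
    conn.execute(
        "UPDATE hpc_jobs SET status = ?, exit_status = ? WHERE job_id = ?",
        (status, exit_status, job_id),
    )
    conn.commit()


def ack_completed(conn, ack):
    """Call ack(delivery_tag) once for every job that exited with status 0.

    Returns the file IDs that were acknowledged in this pass.
    """
    rows = conn.execute(
        "SELECT job_id, file_id, delivery_tag FROM hpc_jobs "
        "WHERE status = 'COMPLETED' ORDER BY delivery_tag"
    ).fetchall()
    acked = []
    for job_id, file_id, delivery_tag in rows:
        ack(delivery_tag)
        conn.execute(
            "UPDATE hpc_jobs SET status = 'ACKED' WHERE job_id = ?", (job_id,)
        )
        acked.append(file_id)
    conn.commit()
    return acked
```

With pika, the `ack` callable could be something like `lambda tag: channel.basic_ack(delivery_tag=tag)`. On the "Can this be done?" question: RabbitMQ delivery tags are scoped to the channel that delivered the message, so a delayed ACK of this kind only works while that channel stays open. If the channel closes first, the unacknowledged message is requeued and delivered again.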
[Figure: pyClowder HPC Flowchart (Iteration 2)]
Legend: synchronous steps, asynchronous steps, steps inside the HPC environment.
Node labels recovered from the chart:
- Start / End
- Read settings from config file
- Connect to RabbitMQ
- Wait for RabbitMQ message
- Message received? (Yes / No)
- Use HPC? (Yes / No)
- Create PBS script from HPC XML file and config file
- SSH into login node
- Run PBS script to submit job to HPC queue
- Database: store HPC job ID, file ID, and status
- Exit from login node
- Wait in HPC queue
- Job picked up by HPC? (Yes / No)
- Process File, then Upload Preview / Metadata (this pair appears twice in the chart)
- Get record
- Job Status? (Queued / Running, Completed with Exit Status 0, Failed)
- Update records
- Send ACK to RabbitMQ
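Two steps of the Iteration 2 flowchart, "Create PBS Script" and the "Job Status?" decision, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the settings keys, the function names, the Torque-style `nodes=...:ppn=...` resource request, and the single-letter job states as printed by `qstat` on Torque-style schedulers. The actual HPC XML file format is not shown in these slides, so the settings are passed as a plain dictionary.

```python
def build_pbs_script(settings, command):
    """Render a PBS job script from a settings dictionary (hypothetical keys)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {settings['job_name']}",
        f"#PBS -q {settings['queue']}",
        f"#PBS -l walltime={settings['walltime']}",
        f"#PBS -l nodes={settings['nodes']}:ppn={settings['ppn']}",
        # PBS sets PBS_O_WORKDIR to the directory the job was submitted from.
        "cd $PBS_O_WORKDIR",
        command,
    ]
    return "\n".join(lines) + "\n"


# Job states in which the job has not finished yet (Torque-style letters).
WAITING_STATES = {"Q", "H", "W", "T", "S", "R", "E"}
# "C" is Torque's completed state; "F" is the finished state in PBS Pro.
FINISHED_STATES = {"C", "F"}


def next_action(job_state, exit_status=None):
    """Map a scheduler job state to the flowchart branch to take.

    Returns "wait" (Queued / Running), "ack" (Completed with Exit Status 0),
    or "failed" (any other exit status).
    """
    if job_state in WAITING_STATES:
        return "wait"
    if job_state in FINISHED_STATES:
        return "ack" if exit_status == 0 else "failed"
    raise ValueError(f"unknown job state: {job_state!r}")
```

What happens on the "failed" branch (resend the message, reject it, or only update the record) is left to the caller, since the flowchart does not settle it.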
[Figure: pyClowder HPC Architecture Diagram (Iteration 2)]
Components shown:
- Web Browser Client
- Clowder VM: Clowder Web Application, Data / Metadata (MongoDB), Extraction Bus (RabbitMQ)
- Extractor VM: Main Extractor
- HPC Compute Nodes: one job each on GCN-51, GCN-65, and GCN-40 (each job icon shows an excerpt of a Python extractor script)
- Clowder APIs
- pyClowder
- HPC XML File
Some Design / Implementation Questions (from JIRA), Iteration 1

- Do the extractor program files need to be copied to the login node programmatically? If code compilation is needed, this might create additional overhead.
  – Assume that the program is already present in the HPC environment in compiled form.
- Another option is to assume that the extractor will run from within the HPC environment, i.e. the code is already present in the HPC. Is this a safe assumption to make?
  – Yes.
- What are the exceptions that need to be handled?
  – Exception in the main extractor
  – Exception in the extraction job in the HPC
  – Job aborts due to reasons on the HPC side (requested wall-time or memory exceeded)
  – The VM from which the main extractor is running crashes?
- What is expected of the user who sets up the HPC extractor? Or, what shall be provided by pyClowder and what shall be done by whoever writes the extractor?
  – Try to make this as generic as possible.
- Can the extractor logic be put in a separate file? Otherwise, how will the HPC machine pick up the job file?
  – Need to find a workaround. Need to keep the extractor structure unchanged.
[Figure: Flowchart (Iteration 1)]
Node labels recovered from the chart:
- Start / End
- Get RabbitMQ message
- Use HPC? (Yes / No)
- SSH into login node
- Transfer extractor program to login node via SFTP
- Submit job to HPC queue
- Job Status? (Queued / Running, Completed with Exit Status 0, Failed)
- Process File
- Extraction Successful? (Yes / No)
- Send ACK to RabbitMQ
[Figure: Architecture Diagram (Iteration 1)]
Components shown:
- Web Browser Client
- Clowder VM: Clowder Web Application, Data / Metadata (MongoDB), Extraction Bus (RabbitMQ)
- Extractor VM: Main Extractor
- HPC Compute Nodes: one job each on GCN-51, GCN-65, and GCN-40 (each job icon shows an excerpt of a Python extractor script)
- Clowder APIs
- pyClowder
- HPC XML File