Python on AWS Lambda: Practical Applications
Brian Morton
Chief Information Officer - DVIDS
- Online, searchable archive for DoD public media
- 1.5 million images
- 300K+ videos
- Fully open source stack*
- All content copyright free
- Publicly accessible API
*Classic City Computing Partnership
AWS Basic Glossary
- AWS = Amazon Web Services (the cloud)
- EC2 = Elastic Compute Cloud (virtual hosts)
- S3 = Simple Storage Service (storage in the cloud)
- ELB = Elastic Load Balancer (HAProxy in the cloud)
- SES = Simple Email Service (email in the cloud)
- SQS = Simple Queue Service (RabbitMQ in the cloud)
- RDS = Relational Database Service (MySQL in the cloud)
- Lambda = “Serverless” compute
AWS Lambda
- “Function” as infrastructure
- Uses spare cycles on AWS compute nodes (EC2 virtualization hosts)
- Bills in 100 ms increments of execution time, priced according to the memory allocated
- Runs in response to many types of events (S3 file delivery, SNS/SQS, SES)
- Limited to 100 concurrent executions by default (the limit can be raised)
- Languages currently supported: Java, Node.js, Python
- More info: https://aws.amazon.com/lambda/faqs/
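As a minimal sketch of what a Python "function" looks like: Lambda calls a handler with an `event` (a dict for JSON payloads) and a `context` object. The handler name `lambda_handler` and the sample bucket and key below are illustrative assumptions; the `Records` layout follows the S3 notification format.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda invokes once per event."""
    # S3 notifications carry a "Records" list; other event sources use other shapes.
    records = event.get("Records", [])
    keys = [r["s3"]["object"]["key"] for r in records if "s3" in r]
    return {"statusCode": 200, "body": json.dumps({"keys": keys})}


if __name__ == "__main__":
    # Hypothetical sample event for a local run; not real bucket or key names.
    sample = {"Records": [{"s3": {"bucket": {"name": "example-bucket"},
                                  "object": {"key": "images/photo.jpg"}}}]}
    print(lambda_handler(sample, None))
```

Because the handler is an ordinary function, it can be exercised locally with a hand-built event before it is deployed.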
Lambda Example
- virtualenv
- zip
- import MySQLdb
- import PIL
- import django
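One way to read this slide: dependencies are installed into a virtualenv, then zipped together with the handler so that `import MySQLdb`, `import PIL`, and `import django` resolve inside Lambda. The sketch below assumes that workflow; `build_package` and its arguments are hypothetical names, and packages with compiled extensions would need to be built on a Lambda-compatible Linux host.

```python
import os
import zipfile


def build_package(site_packages, handler_path, zip_path):
    """Zip installed dependencies plus the handler file at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(site_packages):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to site-packages so imports resolve
                # from the root of the zip.
                zf.write(full, os.path.relpath(full, site_packages))
        # The handler module sits next to the dependencies.
        zf.write(handler_path, os.path.basename(handler_path))
    return zip_path
```

A call might look like `build_package("venv/lib/python2.7/site-packages", "handler.py", "function.zip")`, where all three paths are placeholders for your own layout.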
Practical Application 1: Image Resizing
https://bitbucket.org/dvids/awslambda-python-resize-image
- Performs read/write with S3 bucket
- Reads original image from bucket path
- Performs resizing according to JSON config in package
- Config can be overridden through event arguments
- Writes resized images back to bucket
- Could be called from S3 upload, API Gateway, Simple Email Service and others
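The flow above can be sketched roughly as follows. This is not the code from the linked repository: the config contents, the event fields (`bucket`, `key`, `sizes`), and the output key naming are all assumptions made for illustration. `boto3` and Pillow are imported inside the handler so the helper functions can be tried without them.

```python
import io
import json

# Hypothetical stand-in for the JSON config bundled in the package.
DEFAULT_CONFIG = json.loads('{"sizes": [[1024, 768], [320, 240]]}')


def pick_sizes(event, config=DEFAULT_CONFIG):
    """Return target sizes; event arguments override the packaged config."""
    return [tuple(s) for s in event.get("sizes", config["sizes"])]


def resized_key(key, size):
    """Build an output key such as photos/a_320x240.jpg (assumed naming scheme)."""
    stem, dot, ext = key.rpartition(".")
    if not dot:
        return "%s_%dx%d" % (key, size[0], size[1])
    return "%s_%dx%d.%s" % (stem, size[0], size[1], ext)


def lambda_handler(event, context):
    # Imported here so the helpers above run without these packages installed.
    import boto3
    from PIL import Image

    s3 = boto3.client("s3")
    bucket, key = event["bucket"], event["key"]  # assumed event fields
    original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    written = []
    for size in pick_sizes(event):
        img = Image.open(io.BytesIO(original))
        fmt = img.format or "JPEG"
        img.thumbnail(size)  # shrinks in place, preserving aspect ratio
        out = io.BytesIO()
        img.save(out, format=fmt)
        new_key = resized_key(key, size)
        s3.put_object(Bucket=bucket, Key=new_key, Body=out.getvalue())
        written.append(new_key)
    return {"written": written}
```

Keeping the size selection and key naming in small pure functions makes the override behavior easy to check without touching S3.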
Image Size: 2604482
Asset count: 2400000

Allocated Memory (MB)  Price per 100ms ($)  Used Memory (MB)  Time (ms)  Cost ($)     Total Cost ($)
128                    0.000000208          n/a
192                    0.000000313
256                    0.000000417          210               17000      0.00007089   170.136
320                    0.000000521                            13000      0.00006773   162.552
384                    0.000000625                            10900      0.000068125  163.5
448                    0.000000729                            9500       0.000069255  166.212
512                    0.000000834                            8500
576                    0.000000938          229               7800       0.000073164  175.5936
640                    0.000001042                            7200       0.000075024  180.0576
1152                   0.000001875                            4200       0.00007875   189
1216                   0.00000198                             5200       0.00010296   247.104
1280                   0.000002084                            3800       0.000079192  190.0608
1344                   0.000002188                            3400       0.000074392  178.5408
1408                   0.000002292                            3600       0.000082512  198.0288
1472                   0.000002396                            3300       0.000079068  189.7632
1536                   0.000002501                            3200       0.000080032  192.0768
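The cost figures in this table appear to follow from the other numbers: run time counted in 100 ms increments, multiplied by the price for the allocated memory, then by the asset count. A small sketch of that arithmetic, with function names chosen here for illustration:

```python
import math

ASSET_COUNT = 2400000  # asset count given with the table


def invocation_cost(duration_ms, price_per_100ms):
    """Cost of one run, with duration rounded up to the next 100 ms increment."""
    return math.ceil(duration_ms / 100) * price_per_100ms


def total_cost(duration_ms, price_per_100ms, assets=ASSET_COUNT):
    """Cost of resizing every asset once at the given speed and price."""
    return invocation_cost(duration_ms, price_per_100ms) * assets


if __name__ == "__main__":
    # 256 MB measurement: 17000 ms at $0.000000417 per 100 ms.
    print(invocation_cost(17000, 0.000000417))
    print(total_cost(17000, 0.000000417))
```

Calling `total_cost(13000, 0.000000521)` gives approximately 162.552, which corresponds to the 320 MB measurement. The pattern in the data is that more memory shortens the run but raises the unit price, so total cost stays within a fairly narrow band across most configurations.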
Practical Application 2: Social Data Mining
https://github.com/rokclimb15/aws-lambda-python-twitter-search
- Searches Twitter for posts containing words or hashtags
- Topic and item count can be overridden through event args
- Returns number of Tweets, input topic, and length of time to search
- Wired to API Gateway with public endpoint
- Next logical step might be storage in Solr or RDS for analysis
- Could be called from API endpoint, or perhaps SES?
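A rough sketch of the handler shape described above, not the code from the linked repository. The default topic and count, the event field names (`topic`, `count`), and the `search_tweets` placeholder are assumptions; the actual Twitter client call is deliberately left out.

```python
import time

DEFAULT_TOPIC = "#python"  # hypothetical default topic
DEFAULT_COUNT = 100        # hypothetical default item count


def search_tweets(topic, count):
    """Placeholder for a real Twitter search call; should return a list of tweets."""
    raise NotImplementedError("plug in a Twitter client here")


def lambda_handler(event, context, search=search_tweets):
    # Event arguments override the defaults.
    topic = event.get("topic", DEFAULT_TOPIC)
    count = int(event.get("count", DEFAULT_COUNT))

    started = time.time()
    tweets = search(topic, count)
    elapsed = time.time() - started

    # Report number of tweets, the input topic, and how long the search took.
    return {"topic": topic, "tweets": len(tweets), "seconds": round(elapsed, 3)}
```

Passing the search function as a default argument keeps the handler signature compatible with Lambda while allowing a stub to be swapped in for local testing.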
Sample Application: ETL Data Processing
- The following diagram shows an ETL data processing pipeline
- S3 file upload triggers Lambda
- Initial function parses the file and writes records to SQS
- Each SNS notification for a queued record instantiates a new Lambda
- Processing function does the work (validation, lookup, soundex, etc.)
- Data is written to RDS or Redshift
1. File upload to S3 bucket
2. Parsing/validation Lambda invoked
3. Records queued to SQS
4. SNS notifications instantiate processing Lambdas
5. Processed data written to persistent storage
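Steps 1 through 3 of the pipeline could be sketched like this. The CSV-with-header file format, the `QUEUE_URL` placeholder, and the helper names are assumptions; the only SQS detail relied on is that `send_message_batch` accepts at most 10 entries per call.

```python
import csv
import io
import json

QUEUE_URL = "https://sqs.example.invalid/etl-records"  # hypothetical placeholder


def parse_records(text):
    """Parse CSV text with a header row into a list of dicts (assumed file format)."""
    return list(csv.DictReader(io.StringIO(text)))


def batches(items, size=10):
    """Yield chunks of at most `size` items (SQS batches hold up to 10 entries)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def enqueue(records, queue_url, sqs):
    """Send each record as a JSON message body; returns the number sent."""
    sent = 0
    for chunk in batches(records):
        entries = [{"Id": str(sent + n), "MessageBody": json.dumps(rec)}
                   for n, rec in enumerate(chunk)]
        sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(entries)
    return sent


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers run without it installed

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    rec = event["Records"][0]["s3"]  # S3 upload notification
    obj = s3.get_object(Bucket=rec["bucket"]["name"], Key=rec["object"]["key"])
    records = parse_records(obj["Body"].read().decode("utf-8"))
    return {"queued": enqueue(records, QUEUE_URL, sqs)}
```

The processing Lambdas in steps 4 and 5 would follow the same pattern: a small handler that reads one record from its event, does the validation or lookup, and writes the result to the database.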
Questions: bmorton@dvidshub.net