ISDA + OpenStack
Rob Kooper
BrownDog
NSF ACI Data Program
- Geoffrey Fox, $5,000,000, 2014-2019: Middleware and High Performance Analytics Libraries for Scalable Data Science
- Ken Koedinger, $4,830,819, 2014-2019: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education
- Kenton McHenry, $10,519,716, 2013-2018
- Alex Szalay, $7,603,723, 2013-2018: Long Term Access to Large Scientific Data Sets: The SkyServer and Beyond
- Michael Levine, $4,902,601, 2013-2018: The Data Exacell
- Xiaohui Carol Song, $3,409,029, 2013-2018: Integrating Geospatial Capabilities into HUBzero
- Reagan Moore, $8,300,992, 2011-2016
- Steven Ruggles, $7,993,266, 2011-2016
- Margaret Hedstrom, $8,000,000, 2011-2016
- Bill Michener, $21,194,548, 2009-2014
- Golam Choudhury, $10,085,120, 2009-2014
“Big Data”
- At least two big components:
  - Large quantities of data
  - Large varieties of data
- “Long-Tail”
[Figure: axes labeled "Number of grants" and "Dollars"]
http://www.slideshare.net/rheimann04/big-social-data-the-social-turn-in-big-data
The Problem Addressed by Brown Dog
- Large collections of uncurated and/or unstructured digital data (“long-tail” data)
  - Many file formats
  - No metadata
  - No useful filenames
  - No useful directory structure
  - No textual contents
What Is Needed
- A means of deciphering the bytes that make up digital data so that one can retrieve its contents
  - Data structures (e.g. images, 3D points, sound waves, strings, fields, matrices)
- A means of indexing data contents so that large collections of data can be searched and the desired data found
- An ability to compare data
What Is Typically Needed To Do This
- The file format specifications describing how contents are represented within the file’s bytes, the software used to create and view the data, software to convert it to an accessible format, and the execution environment (platform, operating system, libraries, other software, etc.).
- Metadata describing the data (possibly as simple as useful file and directory names), so that the data can be searched and indexed.
Clowder
Manage Raw Files and Derived Metadata
[Figure: image taken from camera]
File Uploaded to DTS
Extracted Metadata in Web Interface
Extracted Metadata from Service API
RabbitMQ
[Diagram: RabbitMQ vhost (clowder) connecting Clowder instances to extractors through exchanges, bindings, and queues]
- Clowder instances: Clowder, DTS, IMLCZO
- Exchanges: clowder.ncsa, dts.ncsa, imlczo.ncsa
- Bindings: *.file.image.#, *.file.image.tiff, *.file.composed.zip
- Queues (as labeled): ncsa.image.preview, Faces, Shape File, GeoTiff File
- Extractors: Image Extractor, Face Extractor, GeoExtractor
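The binding keys on this slide use RabbitMQ topic-exchange patterns, where `*` stands for exactly one dot-separated word and `#` for zero or more words. The following is a minimal sketch of that matching rule in plain Python, meant only to show which routing keys a binding would accept. The routing keys used here (such as `clowder.file.image.png`) are hypothetical examples, not keys taken from the deployment.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if routing_key matches an AMQP topic binding pattern.

    '*' matches exactly one word, '#' matches zero or more words.
    """
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i: int, j: int) -> bool:
        if i == len(p):
            # Pattern exhausted: match only if the key is exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(i + 1, t) for t in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


if __name__ == "__main__":
    # Hypothetical routing keys, for illustration only.
    for key in ["clowder.file.image.png",
                "clowder.file.image.tiff",
                "clowder.file.composed.zip"]:
        hits = [b for b in ("*.file.image.#",
                            "*.file.image.tiff",
                            "*.file.composed.zip")
                if topic_matches(b, key)]
        print(key, "->", hits)
```

Under this rule a TIFF image would reach both a generic image binding and the more specific `*.file.image.tiff` binding, which is how one upload can fan out to several extractors.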
OpenStack
Projects
- 5 projects + 1 generalized project
- ISDA project for our group
  - Allows members to start/stop test instances
  - General instances
- BrownDog
  - Compute nodes + data nodes
- Other projects
  - 1 compute node + 1 data node split into smaller pieces
Servers
- No more ISDA VM servers
- Use OpenStack to host servers
- Volumes store server information
- Use Puppet to manage servers
- Easy to create instances
- Command line access to create servers
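As an illustration of command line server creation, the sketch below assembles an `openstack server create` invocation as an argument list. It only builds and prints the command and does not run it. The server, image, flavor, and key names are placeholders, not values from the ISDA deployment.

```python
import shlex
from typing import List, Optional


def server_create_command(name: str, image: str, flavor: str,
                          key_name: str,
                          user_data: Optional[str] = None) -> List[str]:
    """Build the argv for `openstack server create` (not executed here)."""
    cmd = ["openstack", "server", "create",
           "--image", image,
           "--flavor", flavor,
           "--key-name", key_name]
    if user_data:
        # Path to a cloud-init file passed to the instance at boot.
        cmd += ["--user-data", user_data]
    cmd.append(name)
    return cmd


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    cmd = server_create_command("test-instance", "example-image",
                                "m1.small", "example-key",
                                user_data="cloud-config.yml")
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` would launch the instance, assuming the OpenStack client is installed and credentials are loaded in the environment.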
Servers @ ISDA
- Currently using 134 monitored machines
  - 12 physical machines
  - 102 VMs on Xen + ESXi
  - 6 VMs on OpenStack (will only increase)
Elasticity and BrownDog
- RabbitMQ used for messages
  - Every operation is a message
  - A message queue for each operation
- Elasticity code monitors RabbitMQ
  - When the number of queued messages rises above a threshold, start new instances
  - When the load falls below a threshold number of messages, stop instances
- Elasticity code can start
  - VM images
    - Multiple instances of code in a VM image
  - Docker images
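The scaling rule on this slide can be sketched as a small decision function: given the number of messages waiting in an operation's queue and the number of instances already running, decide how many instances should run next. This is only an illustration of the idea, not the BrownDog elasticity code. The thresholds, instance limits, and one-step-at-a-time policy are all assumptions, and the queue depth is taken as an input (in practice it could come from RabbitMQ's management interface).

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values are hypothetical, chosen only for illustration.
    scale_up_above: int = 100    # start an instance above this queue depth
    scale_down_below: int = 10   # stop an instance below this queue depth
    min_instances: int = 1
    max_instances: int = 8


def desired_instances(queue_depth: int, running: int,
                      policy: ScalingPolicy) -> int:
    """Return the target instance count for one operation's queue."""
    if queue_depth > policy.scale_up_above:
        target = running + 1     # backlog is growing: add one instance
    elif queue_depth < policy.scale_down_below:
        target = running - 1     # queue is nearly empty: remove one
    else:
        target = running         # load is within bounds: no change
    # Never go outside the configured limits.
    return max(policy.min_instances, min(policy.max_instances, target))


if __name__ == "__main__":
    policy = ScalingPolicy()
    for depth, running in [(250, 2), (50, 2), (3, 2), (3, 1)]:
        print(f"depth={depth} running={running} -> "
              f"{desired_instances(depth, running, policy)}")
```

Keeping a gap between the two thresholds avoids starting and stopping instances repeatedly when the queue depth hovers around a single cutoff.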
Throw Away Instances
- Many of the same instances
  - Used for running the same software many times
  - Clowder extractors
  - Clowder tool instances
- Use CoreOS with Docker
  - Pass in cloud-init to initialize the instance
  - At boot time, download the Docker container and start it
- All instances can be turned off and restarted
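The boot sequence above (pass in cloud-init, pull the container, start it) can be sketched as a function that renders a cloud-config document for one container. The layout assumes the legacy CoreOS cloud-config format with a `coreos.units` section. The service name and Docker image are hypothetical, and the actual BrownDog configuration may differ.

```python
def coreos_cloud_config(service_name: str, docker_image: str) -> str:
    """Render a cloud-config that pulls and runs one Docker container at boot."""
    lines = [
        "#cloud-config",
        "coreos:",
        "  units:",
        f"    - name: {service_name}.service",
        "      command: start",
        "      content: |",
        "        [Unit]",
        f"        Description={service_name} container",
        "        After=docker.service",
        "        Requires=docker.service",
        "",
        "        [Service]",
        # Download the image on every boot so the instance stays disposable.
        f"        ExecStartPre=/usr/bin/docker pull {docker_image}",
        f"        ExecStart=/usr/bin/docker run --rm --name {service_name} {docker_image}",
        f"        ExecStop=/usr/bin/docker stop {service_name}",
        "        Restart=always",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical service and image names, for illustration only.
    print(coreos_cloud_config("image-extractor",
                              "example/image-extractor:latest"))
```

Because everything the instance needs is fetched at boot, no state lives on the instance itself, which is what allows any instance to be turned off and recreated freely.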