ISDA + OpenStack Rob Kooper.

Presentation transcript:

ISDA + OpenStack Rob Kooper

BrownDog

NSF ACI Data Program
PI, amount, period: project title (where given)
Geoffrey Fox, $5,000,000, 2014-2019: Middleware and High Performance Analytics Libraries for Scalable Data Science
Ken Koedinger, $4,830,819, 2014-2019: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education
Kenton McHenry, $10,519,716, 2013-2018
Alex Szalay, $7,603,723, 2013-2018: Long Term Access to Large Scientific Data Sets: The SkyServer and Beyond
Michael Levine, $4,902,601, 2013-2018: The Data Exacell
Xiaohui Carol Song, $3,409,029, 2013-2018: Integrating Geospatial Capabilities into HUBzero
Reagan Moore, $8,300,992, 2011-2016
Steven Ruggles, $7,993,266, 2011-2016
Margaret Hedstrom, $8,000,000, 2011-2016
Bill Michener, $21,194,548, 2009-2014
Golam Choudhury, $10,085,120, 2009-2014

“Big Data”
At least two big components:
- Large quantities of data
- Large varieties of data
“Long-Tail” [Figure labels: Number of grants, Dollars]
Source: http://www.slideshare.net/rheimann04/big-social-data-the-social-turn-in-big-data

The Problem Addressed by Brown Dog
Large collections of un-curated and/or unstructured digital data (“long-tail” data):
- Many file formats
- No metadata
- No useful filenames
- No useful directory structure
- No textual contents

What Is Needed
- A means of deciphering the bytes that make up digital data so that one can retrieve its contents
  - Data structures (e.g. images, 3D points, sound waves, strings, fields, matrices, etc.)
- A means of indexing data contents so that large collections of data can be searched and the desired data found
- An ability to compare data

What Is Typically Needed To Do This
- The file format specifications describing how contents are represented within the file’s bytes, the software used to create and view the data, software to convert to an accessible format, and the execution environment (platform, operating system, libraries, other software, etc.).
- Metadata describing the data (possibly as simple as useful file/directory names), in order to search/index the data.

Clowder

Manage Raw Files and Derived Metadata
[Image taken from camera]

File Uploaded to DTS

Extracted Metadata in Web Interface

Extracted Metadata from Service API

RabbitMQ
[Diagram: RabbitMQ vhost (clowder), laid out in columns labeled Clowder instances, Exchanges, Bindings, Queues, Extractors.]
Labels recovered from the diagram:
- Clowder instances: Clowder (clowder.ncsa), DTS (dts.ncsa), IMLCZO (imlczo.ncsa)
- Binding patterns: *.file.image.#, *.file.image.tiff, *.file.composed.zip
- Queue-side labels: ncsa.image.preview, Faces, Shape File, GeoTiff File
- Extractors: Image Extractor, Face Extractor, GeoExtractor
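The binding patterns on this slide (for example *.file.image.#) use RabbitMQ topic-exchange wildcards, where * stands for exactly one dot-separated word and # for zero or more words. The following is a minimal sketch of that matching rule in plain Python, to illustrate how a message could be routed to an extractor queue. The function name and the example routing keys are hypothetical and are not taken from Clowder itself.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic binding pattern matches a routing key.

    '*' matches exactly one word, '#' matches zero or more words;
    words are separated by dots.
    """
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i: int, j: int) -> bool:
        if i == len(p):
            # Pattern exhausted: succeed only if the key is exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words: try every possible split.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


if __name__ == "__main__":
    # Hypothetical routing keys, for illustration only.
    for key in ("clowder.file.image.png", "dts.file.composed.zip"):
        print(key, topic_matches("*.file.image.#", key))
```

Under this rule, a queue bound with *.file.image.# would receive every image upload from a given instance, while a binding such as *.file.image.tiff narrows that to a single subtype.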

OpenStack

Projects
- 5 projects + 1 generalized project
- ISDA project for our group
  - Allows members to start/stop test instances
  - General instances
- BrownDog
  - Compute nodes + data nodes
- Other projects
  - 1 compute node + 1 data node split into smaller pieces

Servers
- No more ISDA VM servers
- Use OpenStack to host servers
- Volumes store server information
- Use Puppet to manage servers
- Easy to create instances
- Command-line access to create a server
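The slide does not say which command-line client is used to create a server. As an illustration, the sketch below assembles an `openstack server create` invocation (the unified OpenStack client) as an argument list without running it. The helper name and the server, image, flavor, and key names are hypothetical placeholders.

```python
import shlex


def server_create_command(name, image, flavor, key_name=None, user_data=None):
    """Build the argv for `openstack server create` without executing it."""
    cmd = ["openstack", "server", "create",
           "--image", image,
           "--flavor", flavor]
    if key_name:
        # SSH key pair to inject into the new instance.
        cmd += ["--key-name", key_name]
    if user_data:
        # Path to a user-data file (for example a cloud-init config).
        cmd += ["--user-data", user_data]
    cmd.append(name)
    return cmd


if __name__ == "__main__":
    # All values below are placeholders, not names from the ISDA setup.
    argv = server_create_command("test-instance", "ubuntu-16.04",
                                 "m1.small", key_name="my-key")
    print(shlex.join(argv))
```

Returning a list rather than a shell string keeps quoting safe if the command is later handed to `subprocess.run`.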

Servers @ ISDA
Currently using 134 monitored machines:
- 12 physical machines
- 102 VMs on Xen + ESXi
- 6 VMs on OpenStack (will only increase)

Elasticity and BrownDog
- RabbitMQ used for messages
  - Every operation is a message
  - A message queue for each operation
- Elasticity code monitors RabbitMQ
  - Based on the number of messages, start new instances
  - If the load falls below the message threshold, stop instances
- Elasticity code can start
  - VM images
  - Multiple instances of code in a VM image
  - Docker images
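The start/stop rule described on this slide can be sketched as a small pure function. This is not the Brown Dog elasticity code. The policy fields, their default values, and the function names are assumptions chosen for illustration, and the queue depth is expected to be supplied by whatever monitors RabbitMQ.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical tuning knobs; the slides do not give actual values.
    messages_per_instance: int = 100
    min_instances: int = 0
    max_instances: int = 10


def desired_instances(queue_depth: int, policy: ScalingPolicy) -> int:
    """Number of instances needed to cover the queued messages."""
    # Ceiling division without floating point.
    needed = -(-queue_depth // policy.messages_per_instance)
    return max(policy.min_instances, min(policy.max_instances, needed))


def scaling_action(queue_depth: int, running: int, policy: ScalingPolicy):
    """Return ("start" | "stop" | "hold", count) for one queue."""
    target = desired_instances(queue_depth, policy)
    if target > running:
        return ("start", target - running)
    if target < running:
        return ("stop", running - target)
    return ("hold", 0)


if __name__ == "__main__":
    policy = ScalingPolicy()
    print(scaling_action(250, 1, policy))
    print(scaling_action(0, 3, policy))
```

Because there is one queue per operation, a loop over the queues could call `scaling_action` once per extractor type and start or stop instances of just that type.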

Throw Away Instances
- Many of the same instances
- Used for running the same software many times
  - Clowder extractors
  - Clowder tool instances
- Use CoreOS with Docker
  - Pass in cloud-init to initialize the instance
  - At boot time, download the Docker container and start it
- All instances can be turned off and restarted
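As a rough illustration of the cloud-init step, the sketch below generates a CoreOS-style cloud-config that defines a systemd unit which pulls a Docker image and runs it at boot. It assumes the `coreos.units` layout of CoreOS cloud-config. The unit name, image name, and helper function are hypothetical, and the actual configuration used for the Clowder extractors is not shown in the slides.

```python
def coreos_cloud_config(image: str, unit_name: str = "extractor.service") -> str:
    """Render a cloud-config that starts one Docker container at boot."""
    lines = [
        "#cloud-config",
        "coreos:",
        "  units:",
        f"    - name: {unit_name}",
        "      command: start",
        "      content: |",
        "        [Unit]",
        "        Description=Throw-away container",
        "        After=docker.service",
        "        Requires=docker.service",
        "",
        "        [Service]",
        # Download the container image first, then run it in the foreground.
        f"        ExecStartPre=/usr/bin/docker pull {image}",
        f"        ExecStart=/usr/bin/docker run --rm {image}",
        "        Restart=always",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder image name, for illustration only.
    print(coreos_cloud_config("example/clowder-extractor:latest"))
```

Since everything the instance needs comes from the image and this user data, the instance keeps no state of its own, which is what makes it safe to turn off and recreate.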