Cloud Computing – Where ISR Data Will Go for Exploitation
22 September 2009
Albert Reuther, Jeremy Kepner, Peter Michaleas, William Smith

This work is sponsored by the Department of the Air Force under Air Force contract FA C. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Outline
• Introduction
  – Persistent surveillance requirements
  – Data intensive cloud computing
• Cloud Supercomputing
• Integration with Supercomputing System
• Preliminary Results
• Summary

Persistent Surveillance: The “New” Knowledge Pyramid
DoD missions must exploit:
– High resolution sensors
– Integrated multi-modal data
– Short reaction times
– Many net-centric users
[Figure: knowledge pyramid rising from sensor capability (Global Hawk, Predator, Shadow-200) through raw data, images, detects, tracks, and risk to tip/queue and act for net-centric users]
Requires massive computing power

Bluegrass Dataset (detection/tracking)
[Figure: wide-area EO GMTI and high-res EO (Twin Otter) imagery with vehicle ground truth and cues]
– Terabytes of data; multiple classification levels; multiple teams
– Enormous computation to test new detection and tracking algorithms

Persistent Surveillance Data Rates
– Persistent surveillance requires watching large areas to be most effective
– Surveilling large areas produces enormous data streams
– Must use distributed storage and exploitation

Cloud Computing Concepts

Data Intensive Computing
– Compute architecture for large scale data analysis: billions of records/day, trillions of stored records, petabytes of storage
  o Google File System 2003
  o Google MapReduce 2004
  o Google BigTable 2006
– Design parameters: performance and scale; optimized for ingest, query, and analysis; co-mingled data; relaxed data model; simplified programming

Utility Computing
– Compute services for outsourcing IT: concurrent, independent users operating across millions of records and terabytes of data
  o IT as a Service
  o Infrastructure as a Service (IaaS)
  o Platform as a Service (PaaS)
  o Software as a Service (SaaS)
– Design parameters: isolation of user data and computation; portability of data with applications; hosting traditional applications; lower cost of ownership; capacity on demand

Advantages of Data Intensive Cloud: Disk Bandwidth
– Cloud computing moves computation to the data: good for applications whose run time is dominated by reading from disk
– Replaces expensive shared memory hardware and proprietary database software with cheap clusters and open source software, scalable to hundreds of nodes
– Traditional: data moves from a central store to the compute nodes
– Cloud: data is replicated on the nodes and computation is sent to the nodes

Outline
• Introduction
• Cloud Supercomputing
  – Cloud stack
  – Distributed file systems
  – Distributed database
  – Distributed execution
• Integration with Supercomputing System
• Preliminary Results
• Summary

Cloud Software: Hybrid Software Stacks
– Cloud implementations can be developed from a large variety of software components; many packages provide overlapping functionality
– Effective migration of DoD to a cloud architecture will require mapping core functions to the cloud software stack, most likely a hybrid stack with many component packages
– MIT-LL has developed a dynamic cloud deployment architecture on its computing infrastructure and is examining performance trades across software components (a minimal configuration sketch follows below)
– Distributed file systems: file-based (Sector) and block-based (Hadoop DFS)
– Distributed database: HBase
– Compute environment: Hadoop MapReduce
[Figure: hybrid software stack from hardware, Linux OS, and cloud storage (HDFS, Sector) through cloud services (MapReduce, HBase), app services, and job control up to the application framework and applications]
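Hadoop's side of such a hybrid stack is driven by a pair of site configuration files. As a minimal sketch, assuming the Hadoop 0.20-era property names that were current in 2009 (the hostname node001 and the port are hypothetical, not from the MIT-LL deployment):

    <?xml version="1.0"?>
    <!-- core-site.xml: where the HDFS namenode lives -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://node001:9000</value>
      </property>
    </configuration>

    <?xml version="1.0"?>
    <!-- hdfs-site.xml: how many datanodes hold a copy of each block -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>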

P2P File System (e.g., Sector)
– Low-cost, file-based, “read-only”, replicating, distributed file system
– A manager maintains the metadata of the distributed file system
– A security server maintains the permissions of the file system
– Good for mid-sized files (megabytes), e.g., data files from sensors
[Figure: Sector architecture; the client authenticates with the security server over SSL, contacts the manager, and exchanges data directly with the workers]

Parallel File System (e.g., Hadoop DFS)
– Low-cost, block-based, “read-only”, replicating, distributed file system
– A namenode maintains the metadata of the distributed file system
– Good for very large files (gigabytes): tar balls of many small files (e.g., HTML), distributed databases (e.g., HBase)
[Figure: HDFS architecture; the client obtains block metadata from the namenode and reads data directly from the datanodes]
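For a sense of what the block-based design looks like from client code, here is a minimal sketch of reading a file back out of HDFS with the standard org.apache.hadoop.fs Java API (the path is hypothetical):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();     // picks up core-site.xml
            FileSystem fs = FileSystem.get(conf);         // talks to the namenode
            Path p = new Path("/ingest/frame_0001.dat");  // hypothetical sensor file
            // The namenode supplies only block locations; the bytes themselves
            // stream directly from the datanodes that hold the replicas.
            BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(p)));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
            in.close();
        }
    }

Note that the client never funnels data through the namenode, which is what lets the design scale to hundreds of nodes.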

Distributed Database (e.g., HBase)
– Database tablet components spread over a distributed block-based file system
– Optimized for insertions and queries
– Stores metadata harvested from sensor data (e.g., keywords, locations, file handle, …)
[Figure: HBase tablets layered over the HDFS namenode and datanodes]
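As a sketch of the kind of insert and query this enables, using roughly the 0.20-era HBase client API that was current when this talk was given (newer releases use Connection/Table instead); the table, column, and value names here are hypothetical, not from the original system:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MetadataStoreSketch {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "sensor_metadata");
            byte[] row = Bytes.toBytes("frame_0001.dat");  // row key: file handle
            // Insert harvested metadata into a "meta" column family
            Put put = new Put(row);
            put.add(Bytes.toBytes("meta"), Bytes.toBytes("location"),
                    Bytes.toBytes("42.36N,71.09W"));
            put.add(Bytes.toBytes("meta"), Bytes.toBytes("keywords"),
                    Bytes.toBytes("vehicle,track"));
            table.put(put);
            // Query the same row back by key
            Result r = table.get(new Get(row));
            System.out.println(Bytes.toString(
                    r.getValue(Bytes.toBytes("meta"), Bytes.toBytes("keywords"))));
            table.close();
        }
    }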

Distributed Execution (e.g., Hadoop MapReduce, Sphere)
– Each Map instance executes locally on a block of the specified files
– Each Reduce instance collects and combines results from the Map instances
– No communication between Map instances
– All intermediate results are passed through Hadoop DFS
– Used to process ingested data (metadata extraction, etc.)
[Figure: Map tasks running on the datanodes feed Reduce tasks, coordinated through the namenode]
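A minimal sketch of a MapReduce-style metadata ingester of the sort described above, written against the standard org.apache.hadoop.mapreduce Java API (the keyword-counting logic is an illustrative stand-in, not the actual ingester):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class KeywordIngestSketch {
        // Each Map instance runs locally against one block of the input files
        public static class KeywordMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            public void map(LongWritable offset, Text record, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : record.toString().split("\\s+")) {
                    ctx.write(new Text(token), ONE);  // emit (keyword, 1)
                }
            }
        }
        // Reduce instances collect the per-block results into global counts;
        // the Map instances never communicate with one another
        public static class CountReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text keyword, Iterable<IntWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable c : counts) sum += c.get();
                ctx.write(keyword, new IntWritable(sum));
            }
        }
    }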

Hadoop Cloud Computing Architecture
Sequence of actions on the LLGrid cluster (Hadoop namenode, Sector manager, and Sphere JobMaster coordinating the Sector-Sphere and Hadoop datanodes):
1. Active folders register intent to write data to Sector; the manager replies with the Sector worker addresses to which the data should be written.
2. Active folders write data to the Sector workers.
3. The manager launches the Sphere MapReduce-coded metadata ingester onto the Sector data files.
4. The MapReduce-coded ingesters insert metadata into the Hadoop HBase database.
5. The client submits queries on HBase metadata entries.
6. The client fetches data products from the Sector workers.

Examples
Compare accessing data from:
– A central parallel file system (500 MB/s effective bandwidth)
– A local RAID file system (100 MB/s effective bandwidth)
Assumptions: in the data intensive case, each data file is stored on local disk in its entirety; only disk access time is considered; no network bottlenecks; simple file system accesses

E/O Photo Processing App Model
Two stages:
– Determine features in each photo
– Correlate features between the current photo and every other photo
Photo size: 4.0 MB each; feature results file size: 4.0 MB each; total photos: 30,000
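To make the disk-bandwidth comparison concrete, a back-of-envelope estimate under the assumptions on the previous slide: one pass over the photos reads 30,000 × 4.0 MB = 120 GB. Pulled from the central store at 500 MB/s, that pass takes about 240 s no matter how many compute nodes wait on it; in the data intensive layout each of N nodes reads its local share at 100 MB/s, for an aggregate N × 100 MB/s, so 100 nodes finish the same pass in roughly 12 s.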

Persistent Surveillance Tracking App Model
– Each processor tracks a region of ground in a series of images
– Results are saved in the distributed file system
Image size: 16 MB; track results: 100 kB; number of images: 12,000
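The same back-of-envelope arithmetic applies here: one pass over the images reads 12,000 × 16 MB = 192 GB (about 384 s at 500 MB/s from a central store, versus roughly 19 s aggregate for 100 nodes each reading locally at 100 MB/s), while the track results written back total only 12,000 × 100 kB = 1.2 GB, so this workload is dominated by reads.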

Outline
• Introduction
• Cloud Supercomputing
• Integration with Supercomputing System
  – Cloud scheduling environment
  – Dynamic Distributed Dimensional Data Model (D4M)
• Preliminary Results
• Summary

Cloud Scheduling
Two layers of cloud scheduling:
– Scheduling the entire cloud environment onto compute nodes:
  o Cloud environment on a single node as a single process
  o Cloud environment on a single node as multiple processes
  o Cloud environment on multiple nodes (static node list)
  o Cloud environment instantiated through a scheduler such as Torque/PBS/Maui, SGE, or LSF (dynamic node list; see the sketch below)
– Scheduling MapReduce jobs onto nodes in the cloud environment:
  o First come, first served
  o Priority scheduling
  o No scheduling for non-MapReduce clients
  o No scheduling of parallel jobs
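The scheduler-instantiated option is how a cloud environment can coexist with LSF/PBS-style resource managers. As a rough sketch of the idea for Torque/PBS with a Hadoop 0.20-era layout (the node counts, paths, and shared-home assumption are hypothetical, not the actual LLGrid mechanism):

    #!/bin/bash
    #PBS -N hadoop-cloud
    #PBS -l nodes=8:ppn=2
    # PBS hands us a dynamic node list; use the first node as namenode/jobtracker
    MASTER=$(head -n 1 "$PBS_NODEFILE")
    sort -u "$PBS_NODEFILE" > "$HADOOP_HOME/conf/slaves"   # datanodes/tasktrackers
    echo "$MASTER" > "$HADOOP_HOME/conf/masters"
    # Point the file system at the dynamically chosen master, then bring up the cloud
    sed -i "s|hdfs://[^<]*|hdfs://$MASTER:9000|" "$HADOOP_HOME/conf/core-site.xml"
    "$HADOOP_HOME/bin/start-dfs.sh"
    "$HADOOP_HOME/bin/start-mapred.sh"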

Cloud vs Parallel Computing
– Parallel computing APIs assume all compute nodes are aware of each other (e.g., MPI, PGAS, …)
– Cloud computing APIs assume a distributed computing programming model (compute nodes only know about the manager)
– However, cloud infrastructure assumes parallel computing hardware (e.g., Hadoop DFS allows direct communication between nodes for file block replication)
– Challenge: how to get the best of both worlds?

D4M: Parallel Computing on the Cloud
– D4M launches traditional parallel jobs (e.g., pMatlab) onto the cloud environment
– Each process of the parallel job is launched to process one or more documents in the DFS
– Launches jobs through a scheduler such as LSF, PBS/Maui, or SGE
– Enables more tightly-coupled analytics

Outline
• Introduction
• Cloud Supercomputing
• Integration with Supercomputing System
• Preliminary Results
  – Distributed file systems
  – D4M progress
• Summary

Distributed Cloud File Systems on TX-2500 Cluster
– Cluster: PowerEdge nodes in 28 racks with 3.4 TB RAM and 0.78 PB of disk; each node has dual 3.2 GHz EM64-T Xeon (P4) CPUs, 8 GB RAM, two Gig-E Intel interfaces, an Infiniband interface, and six 300-GB disk drives
– Shared network storage; service nodes (Rocks management, 411, web server, Ganglia); LSF-HPC resource manager/scheduler; connected to the LLAN
[Figure: distributed file system metadata and data nodes mapped onto the cluster]
MIT-LL Cloud deployment (Hadoop DFS and Sector):
– Number of nodes used: 350
– File system size: 452.7 TB
– Replication factor: 3 (Hadoop DFS), 2 (Sector)

D4M on LLGrid
– Demonstrated D4M on Hadoop DFS
– Demonstrated D4M on Sector DFS
– D4M on HBase (in progress)

Summary
– Persistent surveillance applications will over-burden our current computing architectures: very high data rates and highly parallel, disk-intensive analytics make them good candidates for data intensive cloud computing
– Components of data intensive cloud computing: file- and block-based distributed file systems, distributed databases, and distributed execution
– Lincoln has cloud experimentation infrastructure: a >400 TB DFS has been created, and D4M is being developed to launch traditional parallel jobs on the cloud environment

Backups

Outline
• Introduction
  – Persistent surveillance requirements
  – Data intensive cloud computing
• Cloud Supercomputing
  – Cloud stack
  – Distributed file systems
  – Computational paradigms
  – Distributed database-like hash stores
• Integration with supercomputing system
  – Scheduling cloud environment
  – Dynamic Distributed Dimensional Data Model (D4M)
• Preliminary results
• Summary

What is LLGrid?
[Figure: LLGrid architecture; users connect over the Lincoln LAN (LLAN) to service nodes, compute nodes, network storage, a resource manager, and a configuration server, with FAQs on a web site]
– LLGrid is a ~300 user, ~1700 processor system
– World’s only desktop interactive supercomputer: dramatically easier to use than any other supercomputer, with the highest fraction of staff (20%) using supercomputing of any organization on the planet
– Foundation of Lincoln and MIT campus joint vision for “Engaging Supercomputing”

Diverse Computing Requirements
[Figure: exploitation pipeline from SAR/GMTI and EO, IR, hyperspectral, and ladar sensors through detection & tracking and SIGINT to decision support; algorithm prototyping spans front end, back end, and exploitation, with processor prototyping across embedded, cloud/grid, and graph architectures]

Stage:        Signal & Image Processing /  Detection & Tracking  Exploitation
              Calibration & Registration
Algorithms:   Front-end signal &           Back-end signal &     Graph analysis / data mining /
              image processing             image processing      knowledge extraction
Data:         Sensor inputs                Dense arrays          Graphs
Kernels:      FFT, FIR, SVD, …             Kalman, MHT, …        BFS, DFS, SSSP, …
Architecture: Embedded                     Cloud/Grid            Cloud/Grid/Graph
Efficiency:   25% – 100%                   10% – 25%             < 0.1%

Elements of Data Intensive Computing
– Distributed file system: Hadoop HDFS (block-based data storage); Sector FS (file-based data storage)
– Distributed execution: Hadoop MapReduce (independently parallel compute model); Sphere (MapReduce for Sector FS); D4M (Dynamic Distributed Dimensional Data Model)
– Lightly-structured data store: Hadoop HBase (distributed, hashed data tables)