Distribution Statement A. Approved for public release; distribution is unlimited. Test and Evaluation/Science and Technology Program Rapid Data Analyzer.

Slides:



Advertisements
Similar presentations
Chapter 9. Performance Management Enterprise wide endeavor Research and ascertain all performance problems – not just DBMS Five factors influence DB performance.
Advertisements

Module 13: Performance Tuning. Overview Performance tuning methodologies Instance level Database level Application level Overview of tools and techniques.
Performance Testing - Kanwalpreet Singh.
MapReduce Online Created by: Rajesh Gadipuuri Modified by: Ying Lu.
Chapter 19: Network Management Business Data Communications, 5e.
Telecommunications Management /635 Network Management.
4.1.5 System Management Background What is in System Management Resource control and scheduling Booting, reconfiguration, defining limits for resource.
Chapter 19: Network Management Business Data Communications, 4e.
Information Retrieval in Practice
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
1 ITC242 – Introduction to Data Communications Week 12 Topic 18 Chapter 19 Network Management.
Chapter 10: Stream-based Data Management Title: Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core Authors:
MSIS 110: Introduction to Computers; Instructor: S. Mathiyalakan1 Systems Design, Implementation, Maintenance, and Review Chapter 13.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
SDLC. Information Systems Development Terms SDLC - the development method used by most organizations today for large, complex systems Systems Analysts.
Panel Summary Andrew Hanushevsky Stanford Linear Accelerator Center Stanford University XLDB 23-October-07.
Database Systems.
1 1 File Systems and Databases Chapter 1 Prof. Sin-Min Lee Dept. of Computer Science.
Overview of Search Engines
Passage Three Introduction to Microsoft SQL Server 2000.
Copyright © 2012 Cleversafe, Inc. All rights reserved. 1 Combining the Power of Hadoop with Object-Based Dispersed Storage.
The big Data security Analytics Era Is Here Reporter : Ximeng Liu Supervisor: Rongxing Lu School of EEE, NTU
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
Advanced Topics: MapReduce ECE 454 Computer Systems Programming Topics: Reductions Implemented in Distributed Frameworks Distributed Key-Value Stores Hadoop.
U.S. Department of the Interior U.S. Geological Survey David V. Hill, Information Dynamics, Contractor to USGS/EROS 12/08/2011 Satellite Image Processing.
Data Mining on the Web via Cloud Computing COMS E6125 Web Enhanced Information Management Presented By Hemanth Murthy.
Tyson Condie.
1 BTEC HNC Systems Support Castle College 2007/8 Systems Analysis Lecture 9 Introduction to Design.
UPC/SHMEM PAT High-level Design v.1.1 Hung-Hsun Su UPC Group, HCS lab 6/21/2005.
Copyright 2002 Prentice-Hall, Inc. Chapter 1 The Systems Development Environment 1.1 Modern Systems Analysis and Design.
Map Reduce for data-intensive computing (Some of the content is adapted from the original authors’ talk at OSDI 04)
DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Test & Evaluation/Science & Technology Program Net-Centric Systems Test.
What is a life cycle model? Framework under which a software product is going to be developed. – Defines the phases that the product under development.
material assembled from the web pages at
Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.
Ohio State University Department of Computer Science and Engineering 1 Cyberinfrastructure for Coastal Forecasting and Change Analysis Gagan Agrawal Hakan.
Principles of Information Systems, Sixth Edition Systems Design, Implementation, Maintenance, and Review Chapter 13.
Chapter 4 Realtime Widely Distributed Instrumention System.
4.2.1 Programming Models Technology drivers – Node count, scale of parallelism within the node – Heterogeneity – Complex memory hierarchies – Failure rates.
Benchmarking MapReduce-Style Parallel Computing Randal E. Bryant Carnegie Mellon University.
INTERACTIVE ANALYSIS OF COMPUTER CRIMES PRESENTED FOR CS-689 ON 10/12/2000 BY NAGAKALYANA ESKALA.
(C) 2008 Clusterpoint(C) 2008 ClusterPoint Ltd. Empowering You to Manage and Drive Down Database Costs April 17, 2009 Gints Ernestsons, CEO © 2009 Clusterpoint.
Event-Based Hybrid Consistency Framework (EBHCF) for Distributed Annotation Records Ahmet Fatih Mustacoglu Advisor: Prof. Geoffrey.
- Ahmad Al-Ghoul Data design. 2 learning Objectives Explain data design concepts and data structures Explain data design concepts and data structures.
Big Data Analytics Large-Scale Data Management Big Data Analytics Data Science and Analytics How to manage very large amounts of data and extract value.
Software Development Life Cycle by A.Surasit Samaisut Copyrights : All Rights Reserved.
A radiologist analyzes an X-ray image, and writes his observations on papers  Image Tagging improves the quality, consistency.  Usefulness of the data.
Principles of Information Systems, Sixth Edition 1 Systems Design, Implementation, Maintenance, and Review Chapter 13.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
HNDIT23082 Lecture 06:Software Maintenance. Reasons for changes Errors in the existing system Changes in requirements Technological advances Legislation.
Web Log Data Analytics with Hadoop
+ Logentries Is a Real-Time Log Analytics Service for Aggregating, Analyzing, and Alerting on Log Data from Microsoft Azure Apps and Systems MICROSOFT.
Learning Objectives Understand the concepts of Information systems.
{ Tanya Chaturvedi MBA(ISM) Hadoop is a software framework for distributed processing of large datasets across large clusters of computers.
Chapter 7 Preliminary Construction The broad scope of implementation Preliminary construction in the SDLC Preliminary construction activities Preliminary.
Cloud Distributed Computing Environment Hadoop. Hadoop is an open-source software system that provides a distributed computing environment on cloud (data.
APRIL 10, Meeting Agenda  Prototype 2 Goals  Robust Connections Demo  System Diagnostics Tool Demo  Final Prototype Risk Mitigation  Final.
PARALLEL AND DISTRIBUTED PROGRAMMING MODELS U. Jhashuva 1 Asst. Prof Dept. of CSE om.
VIEWS b.ppt-1 Managing Intelligent Decision Support Networks in Biosurveillance PHIN 2008, Session G1, August 27, 2008 Mohammad Hashemian, MS, Zaruhi.
B ig D ata Analysis for Page Ranking using Map/Reduce R.Renuka, R.Vidhya Priya, III B.Sc., IT, The S.F.R.College for Women, Sivakasi.
York Laboratory Information Systems Restructuring Proposal IEM 5303 Term Paper Fall 2000 Chris M. Forth Senior Project Engineer York International.
Chapter 19: Network Management
MapReduce Compiler RHadoop
University of Maryland College Park
Hadoop Aakash Kag What Why How 1.
Search Engine Architecture
Applying Control Theory to Stream Processing Systems
Joseph JaJa, Mike Smorul, and Sangchul Song
Ch 4. The Evolution of Analytic Scalability
Overview of big data tools
Presentation transcript:

Distribution Statement A. Approved for public release; distribution is unlimited. Test and Evaluation/Science and Technology Program Rapid Data Analyzer for NST (RDAN) Institute for Networking and Security Research Industry Day The Pennsylvania State University April 3, 2014 Mr. Bruce Einfalt Applied Research Laboratory, The Pennsylvania State University UNCLASSIFIED

Distribution Statement A. Approved for public release; distribution is unlimited. 2 TRMC T&E/S&T Acknowledgement This project is funded by the Test Resource Management Center (TRMC) Test and Evaluation/Science & Technology (T&E/S&T) Program through the U.S. Army Program Executive Office for Simulation, Training, and Instrumentation (PEO STRI) under Contract No. W900KK-13-C-0015.

Distribution Statement A. Approved for public release; distribution is unlimited. 3 Outline T&E Need RDAN System Overview Cloud Computing Background RDAN Functional Diagram Key Science and Technology Innovations Potential RDAN Use Cases Summary and Future Work

Distribution Statement A. Approved for public release; distribution is unlimited. 4 Diverse data formats collected using separate T&E systems Modern distributed T&E events generate large amounts of unstructured data that are hard to analyze Current analysis techniques Data to Decision extremely slow (hrs/days/wks) High volume data Decision High Volume Data Collection –Unable to efficiently collect and analyze large unstructured data (i.e. text, voice, chat) and structured data across multiple sources and environments –Lack of capability to quickly review past historical data limits value of collected test records Adaptability and Scalability –Difficult to scale T&E system as data grows Unstruct Text Voice Struct Data Analysis tools –Data is often manually processed with slow response time from data to decision –Manual processing introduces human error –T&E systems lack automation tools to analyze large volumes of unstructured text, voice, and structured data –T&E systems lack ability to perform deep analysis on test event as it occurs Slow response time from Data to Decision RDAN T&E Need Rapid Analysis of High Volume Data (“Data to Decision”)

Distribution Statement A. Approved for public release; distribution is unlimited. 5 RDAN System Overview RDAN applies automated analysis using cloud computing technologies to reduce the time from Data to Decision Collect and analyze high-volume data from multiple data sources - Unstructured text data (Phase 1) - Voice data (Phase 2) - Structured data & TENA (Phase 3) Automatically analyze and index data to support rapid search and analysis operations on large data sets using cloud technologies Provide custom parallel algorithms and architecture to reduce time from Data to Decision from hours/days/weeks to seconds

Distribution Statement A. Approved for public release; distribution is unlimited. 6 Cloud Background: Private Big Data Processing Cloud New cutting edge technology - Hadoop publicly available in Apache Accumulo available in 2012 Optimized for processing big data - Software framework designed for scalability & fault tolerance - Storage of data on compute nodes eliminates I/O bottlenecks - Large clusters with inexpensive commodity components support massive aggregate I/O, CPU, and network capacity - System hardware can be precisely tuned for workload Private big data processing cloud architecture is optimized for processing large volumes of data ARL Test Cloud System

Distribution Statement A. Approved for public release; distribution is unlimited. 7 RDAN Functional Diagram Results Input Data Files Semantics Library Index Files Data Store Query Rule Mgt Query Filter Mgt Filter Library Indexing EngineData Conversion Engine Query & Analysis EngineUser Interface Query Results Tester / Analyst TENATENA Receive input and associated metadata Convert unstructured text, voice, and structured data to canonical format Apply natural language processing Perform user-defined automated analysis Ingest raw input and all derived data products into data store Generate statistics and indexes needed to support user queries and analysis Store index information in index files Execute queries and analysis including application of semantics rules Accept queries & return results Manage system configuration Legend Input data Data store / files Processing engine

Distribution Statement A. Approved for public release; distribution is unlimited. 8 Key S&T Innovations Diverse data formats collected using a single system New automated analysis and data fusion Data to Decision is rapid (seconds) High volume data Novel data ingestion architecture –Custom parallel algorithms provide scalability, high throughput, and low latency for storage, indexing, and analysis using the latest cloud technologies Pipelined indexing architecture –New data structures and algorithms significantly improve indexing throughput while maintaining low query latency Unstruct Text Voice Struct Data Extensible Canonical Data Format –Data-defined key/value store allows new sensors and analysis modules to be added to system without modifying system architecture Flexible Search and Analysis Tools –New semantics rules allow analysts to search and analyze data using high-level constructs –Custom entity extraction algorithms extend search to include natural language processing Rapid response time from Data to Decision Decision RDAN allows testers and analysts to perform near real-time analysis of complex distributed T&E events

Distribution Statement A. Approved for public release; distribution is unlimited. 9 Potential RDAN Applications PACER RDAN IA/Cyber T&E Exercises Distributed T&E Events Image & Video Analyze large unstructured and structured audit logs to rapidly detect cyber vulnerabilities Provide near real-time feedback about test events to improve test range utilization Rapidly collect, filter, store, and analyze diverse high volume data collected during T&E events Automate analysis of large volumes of image and video data collected during T&E events RDAN can support multiple T&E needs

Distribution Statement A. Approved for public release; distribution is unlimited. 10 Summary and Future Work RDAN is a prototype end-to-end system that utilizes the latest breakthroughs in cloud technologies to automatically analyze large volumes of unstructured, voice, and structured test data –Reduces the time from Data to Decision from hours/days/weeks to seconds –Offers interactive and automated analysis of both live and recorded data –Supports near real-time analysis of T&E events as they are taking place –Scalable and highly configurable to support multiple Test and Evaluation programs RDAN mitigates risk during T&E events by providing near real-time analysis of data of different types, structures, and sizes –Near real-time analysis of test event saves time and money –Automated processing of data minimizes human errors –Supports review of collected historical data during test Proposed extensions to RDAN system −Add support for high data rate image and video processing −Add processing support for languages other than English

Distribution Statement A. Approved for public release; distribution is unlimited. 11 Points of Contact Mr. Gil Torres Net-Centric System Test Executing Agent Ms. Kathy Smith Net-Centric System Test Deputy Executing Agent Mr. Bruce Einfalt Principal Investigator (PI) Mr. Greg Babich Co-PI Mr. Andrew Shaffer Technical Lead

Distribution Statement A. Approved for public release; distribution is unlimited. 12 Questions?