An Architecture-based Framework For Understanding Large-Volume Data Distribution Chris A. Mattmann USC CSSE Annual Research Review March 17, 2009.

Presentation transcript:

An Architecture-based Framework For Understanding Large-Volume Data Distribution Chris A. Mattmann USC CSSE Annual Research Review March 17, 2009

Agenda
Research Problem and Importance
Our Approach
– Classification
– Selection
– Analysis
Evaluation
– Precision, Recall, Accuracy Measurements
– Speed
Conclusion & Future Work

Research Problem and Importance
Content repositories are growing rapidly in size
At the same time, we expect more immediate dissemination of this data
How do we distribute it…
– In a performant manner?
– Fulfilling system requirements?

Data Distribution Scenarios
A medium-sized volume of data, e.g., on the order of a gigabyte, needs to be delivered across a LAN to a single user, using multiple delivery intervals of 10 megabytes of data per interval.
A Backup Site periodically connects across the WAN to the Digital Movie Repository to back up its entire catalog and archive of over 20 terabytes of movie data and metadata.

Data Distribution Problem Space

Insight: Software Architecture
The definition of a system in the form of its canonical building blocks
– Software Components: the computational units in the system
– Software Connectors: the communications and interactions between software components
– Software Configurations: arrangements of components and connectors and the rules that guide their composition

Data Distribution Systems
[Diagram: a Data Producer component sends data through a connector, marked “???”, to a Data Consumer component]
Insight: Use Software Connectors to model data distribution technologies

Impact of Data Distribution Technologies
Broad variety of data distribution technologies
Some are highly efficient, some more reliable
P2P, Grid, Client/Server, and Event-based
Some are entirely appropriate to use, some are not appropriate

Data Movement Technologies
Wide array of available OTS “large-scale” connector technologies
– GridFTP, Aspera, HTTP/REST, RMI, CORBA, SOAP, XML-RPC, Bittorrent, JXTA, UFTP, FTP, SFTP, SCP, Siena, GLIDE/PRISM-MW, and more
Which one is the best one?
How do we compare them
– Given our current architecture?
– Given our distribution scenarios & requirements?

Research Question
What types of software connectors are best suited to delivering vast amounts of data to users in these hugely distributed data systems, in a manner that satisfies their particular scenarios and is performant and scalable?

Broad variety of distribution connector families
P2P, Grid, Client/Server, and Event-based
Though each connector family varies slightly in some form or fashion
– They all share 3 common atomic connector constituents
Data Access, Stream, Distributor
Adapted from our group’s ICSE2000 Connector Taxonomy

Connector Tradeoff Space
Surveyed properties of 13 representative distribution connectors, across all 4 distribution connector families, and classified them
– Client/Server: SOAP, RMI, CORBA, HTTP/REST, FTP, UFTP, SCP, Commercial UDP Technology
– Peer to Peer: Bittorrent
– Grid: GridFTP, bbFTP
– Event-based: GLIDE, Siena
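The classification above can be written down as a small lookup table. This is only an illustrative sketch: the family and connector names come from the slide, while `CONNECTOR_FAMILIES`, `family_of`, and `ALL_CONNECTORS` are hypothetical names, and the event-based system is spelled "Siena" as on the Data Movement Technologies slide.

```python
# Connector families and their members, as listed on the slide.
# The dictionary layout and all identifiers are assumptions for illustration.
CONNECTOR_FAMILIES = {
    "Client/Server": [
        "SOAP", "RMI", "CORBA", "HTTP/REST", "FTP", "UFTP", "SCP",
        "Commercial UDP Technology",
    ],
    "Peer to Peer": ["Bittorrent"],
    "Grid": ["GridFTP", "bbFTP"],
    "Event-based": ["GLIDE", "Siena"],
}


def family_of(connector: str) -> str:
    """Return the family a surveyed connector was classified under."""
    for family, members in CONNECTOR_FAMILIES.items():
        if connector in members:
            return family
    raise KeyError(f"connector not in the survey: {connector}")


# Flat list of every surveyed connector, across all families.
ALL_CONNECTORS = [
    name for members in CONNECTOR_FAMILIES.values() for name in members
]
```

Counting the entries gives the 13 connectors and 4 families the slide mentions.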

Large Heterogeneity in Connector Properties

How do experts make these decisions?
Performed survey of 33 “experts”
Experts defined to be
– Practitioners in industry, building data-intensive systems
– Researchers in data distribution
– Admitted architects of data distribution technologies
General consensus?
– They don’t know the how and the why about which connector(s) are appropriate
– They rely on anecdotal evidence and “intuition”
45% of respondents claimed to be uncomfortable being addressed as a data distribution expert.

Why is it bad to have these types of experts?
Employ a small set of COTS and/or pervasive distribution technologies, and stick to them
– Regardless of the scenario requirements
– Regardless of the capabilities at users’ institutions
Lack a comprehensive understanding of benefits/tradeoffs amongst available distribution technologies
– They have “pet technologies” that they have used in similar situations
– These technologies are not always applicable and frequently only satisfy one or two scenario requirements and ignore the rest

Our Approach: DISCO
Develop a software framework for:
– Connector Classification: build metadata profiles of connector technologies, describing their intrinsic properties (DCPs)
– Connector Selection: adaptable, extensible algorithm development framework for selecting the “right” connectors (and identifying wrong ones)
– Connector Selection Analysis: measurement of accuracy of results
– Connector Performance Analysis

DISCO in a Nutshell

Scenario Language
Describes distribution scenarios
Value types annotated on the scenario dimensions in the slide figure:
– e.g., 10 MB, 100 GB, etc.: int + higher order unit
– e.g., 1, 10: int
– e.g., SSL/HTTP 1.0, Linux File System Perms: string from controlled value range
– 1-10: computed scale
– e.g., 1, 10: int
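As a rough illustration of the "int + higher order unit" value type, the sketch below parses a volume such as "10 MB" into bytes and counts delivery intervals. The function names and the unit table are assumptions, not part of the scenario language itself, and decimal units (1 MB = 10^6 bytes) are assumed.

```python
import re

# Assumed decimal unit table; the slide does not say whether units are
# decimal or binary.
UNIT_BYTES = {
    "B": 1,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
}


def parse_volume(text: str) -> int:
    """Parse an 'int + higher order unit' value such as '10 MB' into bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"not a volume: {text!r}")
    amount = int(match.group(1))
    unit = match.group(2).upper()
    if unit not in UNIT_BYTES:
        raise ValueError(f"unknown unit: {unit}")
    return amount * UNIT_BYTES[unit]


def intervals_needed(total: str, per_interval: str) -> int:
    """Number of delivery intervals needed to move `total`, rounded up."""
    total_bytes = parse_volume(total)
    interval_bytes = parse_volume(per_interval)
    return -(-total_bytes // interval_bytes)  # ceiling division
```

Under these assumptions, the LAN scenario from the Data Distribution Scenarios slide (about a gigabyte, 10 megabytes per interval) works out to `intervals_needed("1 GB", "10 MB")`, i.e. 100 intervals.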

Distribution Connector Model
Developed model for distribution connectors
Identified combination of primitive connectors that a distribution connector is made from

Distribution Connector Model
Model defines important properties of each of the important “modules” within a distribution connector
– Defines value space for each property
– Defines each property
Properties are based on the combination of underlying “primitive” connector constituents
Model forms the basis for a metadata description (or profile) of a distribution connector
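One way to picture a connector profile with a value space per property is sketched below. The constituent names (data access, stream, distributor) follow the earlier slide, but every property name, every value space, and the class itself are hypothetical placeholders rather than the model's actual contents.

```python
from dataclasses import dataclass, field

# Hypothetical value spaces, keyed by (constituent, property). The real
# model's properties and allowed values are not given in the slides.
VALUE_SPACES = {
    ("data_access", "locality"): {"local", "remote"},
    ("stream", "delivery"): {"best_effort", "exactly_once"},
    ("distributor", "routing"): {"unicast", "multicast"},
}


@dataclass
class ConnectorProfile:
    """Metadata description (profile) of one distribution connector."""

    name: str
    properties: dict = field(default_factory=dict)

    def set_property(self, constituent: str, prop: str, value: str) -> None:
        """Record a property value after checking it against its value space."""
        allowed = VALUE_SPACES.get((constituent, prop))
        if allowed is None:
            raise KeyError(f"undefined property: {constituent}.{prop}")
        if value not in allowed:
            raise ValueError(f"{value!r} is outside the value space {sorted(allowed)}")
        self.properties[(constituent, prop)] = value


# Usage with a made-up connector name.
profile = ConnectorProfile("ExampleConnector")
profile.set_property("stream", "delivery", "exactly_once")
```

Keeping the value spaces separate from the profiles means a new connector can be described without touching the model definition.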

Selection Algorithms
So far
– Let data system architects encode the data distribution scenarios within their system using scenario language
– Let connector gurus describe important properties of connectors using architectural metadata (connector model)
Selection Algorithms
– Use scenario(s) and connector properties to identify the “best” connectors for the given scenario(s)

Selection Algorithms Formal Statement of the problem

Selection Algorithms
Selection algorithm interface: a scenario and the Connector KB go into the selection algorithm (shown as “?”), which returns a rank list
(bbFTP, 0.157) (FTP, 0.157) (GridFTP, 0.157) (HTTP/REST, 0.157) (SCP, 0.157) (UFTP, 0.157) (Bittorrent, 0.021) (CORBA, 0.005) (Commercial UDP Technology, 0.005) (GLIDE, 0.005) (RMI, 0.005) (Siena, 0.005) (SOAP, 0.005)
This interface is desirable because it allows a user to rank and compare how “appropriate” each connector is, rather than having a binary decision
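The interface on this slide can be sketched as a function from a scenario and a connector knowledge base to a ranked list of (connector, rank) pairs. The type aliases, the `rank_connectors` name, and the pluggable `score` callable are assumptions made for illustration; the slides do not show the framework's actual signatures.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shapes: a scenario and a connector profile are plain
# dictionaries, and the knowledge base maps connector names to profiles.
Scenario = Dict[str, object]
Profile = Dict[str, object]
ConnectorKB = Dict[str, Profile]
Ranking = List[Tuple[str, float]]


def rank_connectors(
    scenario: Scenario,
    kb: ConnectorKB,
    score: Callable[[Scenario, Profile], float],
) -> Ranking:
    """Score every connector in the KB and return them best-first.

    Ties are broken by connector name so the output is deterministic.
    """
    scored = [(name, score(scenario, profile)) for name, profile in kb.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Returning every connector with its score, instead of a single winner, is what lets a later step compare how close the candidates are.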

Selection Algorithm Approach
White box
– Consider the internal properties of a connector (e.g., its internal architecture) when selecting it for a distribution scenario
Black box
– Consider the external (observable) properties of the connector (such as performance) when selecting it for a distribution scenario

Develop complementary selection algorithms
Users familiar with connector technologies develop score functions
– Relating observable properties (performance reqs) of connector to scenario dimensions
Software architects fill out Bayesian domain profiles containing conditional probabilities
– Likelihood that a connector, given attribute A and its value, and given a scenario requirement, is appropriate for scenario S
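The two complementary approaches might look roughly like the sketch below. How scores and probabilities are actually combined is not stated in the slides, so the averaging of score functions, the product of conditional probabilities, the normalisation, and the 0.5 default are all assumptions, as are the function and parameter names.

```python
from math import prod


def black_box_rank(observations, score_functions, scenario):
    """Score-function ranking (assumed form).

    observations:    connector -> observed properties, e.g. {"mbps": 100}
    score_functions: scenario dimension -> fn(observed, required) in [0, 1]
    scenario:        scenario dimension -> required value
    """
    ranking = []
    for name, observed in observations.items():
        scores = [fn(observed, scenario[dim]) for dim, fn in score_functions.items()]
        ranking.append((name, sum(scores) / len(scores)))  # assumed: plain average
    return sorted(ranking, key=lambda pair: (-pair[1], pair[0]))


def white_box_rank(kb, domain_profile, requirements, default=0.5):
    """Bayesian-profile ranking (assumed form).

    kb:             connector -> {attribute: value}
    domain_profile: (attribute, value, requirement) -> probability that a
                    connector with that attribute value is appropriate
    requirements:   scenario requirements to check against
    """
    raw = {}
    for name, attributes in kb.items():
        factors = [
            domain_profile.get((attr, value, req), default)
            for attr, value in attributes.items()
            for req in requirements
        ]
        raw[name] = prod(factors)  # assumed: independent factors multiply
    total = sum(raw.values())
    if total == 0:
        raise ValueError("every connector scored zero; nothing to normalise")
    ranking = [(name, score / total) for name, score in raw.items()]
    return sorted(ranking, key=lambda pair: (-pair[1], pair[0]))
```

Both functions return the same kind of rank list, so either can sit behind the selection interface.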

Selection Analysis
How do we make decisions based on a rank list?
Insight: looking at the rank list, it is apparent that many connectors are similarly ranked, while many are not
– Appropriate versus Inappropriate?

Selection Analysis
[Figure: the rank list (bbFTP, FTP, GridFTP, HTTP/REST, SCP, UFTP, Bittorrent, CORBA, Commercial UDP Technology, GLIDE, RMI, Siena, SOAP), rank values omitted, divided into “appropriate” and “inappropriate” groups]

Selection Analysis

Employed k-means data clustering algorithm
– k parameter defines how many sets the data is partitioned into
Allows for clustering of data points (x, y) around a “centroid” or mean value
We developed an exhaustive connector clustering algorithm based on k-means
– clusters connectors into 2 groups, appropriate and inappropriate
– uses connector rank value as y parameter (x is the connector name)
– exhaustive in the sense that it iterates over all possible connector clusters (vanilla k-means is heuristic & possibly incomplete)
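A minimal sketch of an exhaustive two-group clustering in this spirit: it tries every split of the rank list into two non-empty groups and keeps the one with the smallest within-group squared error (the k-means objective for k = 2). The function names and the choice of objective are assumptions; the rank values are the ones shown on the selection interface slide.

```python
# Rank list from the selection algorithm interface slide.
RANKING = [
    ("bbFTP", 0.157), ("FTP", 0.157), ("GridFTP", 0.157),
    ("HTTP/REST", 0.157), ("SCP", 0.157), ("UFTP", 0.157),
    ("Bittorrent", 0.021), ("CORBA", 0.005),
    ("Commercial UDP Technology", 0.005), ("GLIDE", 0.005),
    ("RMI", 0.005), ("Siena", 0.005), ("SOAP", 0.005),
]


def squared_error(values):
    """Sum of squared distances from the group's mean (its centroid)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def cluster_exhaustive(ranking):
    """Split connectors into (appropriate, inappropriate) by trying all splits."""
    names = [name for name, _ in ranking]
    scores = [score for _, score in ranking]
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two connectors to form two groups")
    best = None
    # The last connector always stays in group B, so each split is visited
    # once and neither group is ever empty.
    for mask in range(1, 2 ** (n - 1)):
        group_a = [i for i in range(n) if (mask >> i) & 1]
        group_b = [i for i in range(n) if not (mask >> i) & 1]
        cost = (squared_error([scores[i] for i in group_a])
                + squared_error([scores[i] for i in group_b]))
        if best is None or cost < best[0]:
            best = (cost, group_a, group_b)
    _, group_a, group_b = best

    def mean_of(group):
        return sum(scores[i] for i in group) / len(group)

    # The group with the higher mean rank is labelled "appropriate".
    high, low = (group_a, group_b) if mean_of(group_a) >= mean_of(group_b) else (group_b, group_a)
    return [names[i] for i in high], [names[i] for i in low]


appropriate, inappropriate = cluster_exhaustive(RANKING)
```

For 13 connectors this checks 4,095 splits, which is cheap; the enumeration grows exponentially with the number of connectors, so it only suits small rank lists.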

Tool Support
Allows a user to utilize different connector knowledge bases, configure selection algorithms, execute them, and visualize their results

Decision Process
[Chart values: 87%, 80.5%]
Precision: the fraction of connectors correctly identified as appropriate for a scenario
Accuracy: the fraction of connectors correctly identified as appropriate or inappropriate for a scenario
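The measurements named here and in the agenda can be computed from an algorithm's "appropriate" set and an expert-provided answer key. The sketch below assumes the standard information-retrieval definitions of precision and recall, which may differ in detail from the ones used in the evaluation; the function and parameter names are illustrative.

```python
def selection_metrics(predicted_appropriate, truly_appropriate, all_connectors):
    """Precision, recall, and accuracy of an appropriate/inappropriate split.

    Both input sets are assumed to be subsets of `all_connectors`.
    """
    predicted = set(predicted_appropriate)
    truth = set(truly_appropriate)
    universe = set(all_connectors)

    true_pos = len(predicted & truth)             # correctly called appropriate
    false_pos = len(predicted - truth)            # wrongly called appropriate
    false_neg = len(truth - predicted)            # appropriate but missed
    true_neg = len(universe - predicted - truth)  # correctly called inappropriate

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    accuracy = (true_pos + true_neg) / len(universe) if universe else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}
```

Accuracy counts both kinds of correct decisions, so it can stay high even when precision is low if most connectors are inappropriate for the scenario.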

Decision Process: Speed

Conclusions & Future Work
Conclusions
– Domain experts (gurus) rely on tacit knowledge and often cannot explain design rationale
– DISCO provides a quantification of & framework for understanding an ad hoc process
– Bayesian algorithm has a higher precision rate
Future Work
– Explore the tradeoffs between white-box and black-box approaches
– Investigate the role of architectural mismatch in connectors for data system architectures

Thank You! Questions?

Backup

Related Work
Software Connectors
– Mehta00 (Taxonomy), Spitznagel01, Spitznagel03, Arbab04, Lau05
Data Distribution/Grid Computing
– Crichton01, Chervenak00, Kesselman01
COTS Component/Connector selection
– Bhuta07, Mancebo05, Finkelstein05
Data Dissemination
– Franklin/Zdonik97