The Top 10 Reasons Why Federated Can’t Succeed

The Top 10 Reasons Why Federated Can’t Succeed And Why it Will Anyway

But First… What is our purpose as a community?
- Produce (wonderful) new ideas
- Structure the field
- Educate the workforce

A Brief History of Federation
- Multibase, c. 1980
- Many attempts since:
  - Functional
  - Relational
  - Object-oriented
  - Logic-based
  - XML
- Still not solved (think of last night)
- And never will be?

Number 10: Robustness
- Systems fail
  - Sources are slow or unavailable
  - In a distributed system, more pieces => more failures
- Users don’t like failures
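The "more pieces => more failures" point can be made concrete with a back-of-the-envelope calculation. This is a minimal sketch, assuming each source is independently available with some probability and that a query must reach every source it touches; the function name and the 99% figure are illustrative, not from the talk.

```python
import math

def query_availability(source_availabilities):
    """Probability that a federated query succeeds when it must reach
    every listed source, assuming sources fail independently."""
    return math.prod(source_availabilities)

# Illustrative numbers: ten sources, each up 99% of the time.
print(round(query_availability([0.99] * 10), 3))
# prints 0.904
```

Even with highly available sources, a query spanning ten of them fails roughly one time in ten under these assumptions.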

Number 9: Security
- Different systems have different security mechanisms
  - Hard to create a single coherent view of permissions
- Distributed systems are more vulnerable
  - More points of failure
  - Hard to make security guarantees
- Data is often the corporate jewels
  - It must be protected
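One conservative way to get a single view of permissions is to allow an action on a federated object only if every underlying source allows it. This is a minimal sketch, assuming each source's grants can be exported as a set of (user, action) pairs; the data layout and names are hypothetical, and real security mechanisms rarely map this cleanly.

```python
def effective_permissions(source_grants):
    """Intersect the grants exported by each source.

    source_grants: dict mapping source name -> set of (user, action) pairs.
    A pair survives only if every source grants it (deny by default).
    """
    grant_sets = list(source_grants.values())
    if not grant_sets:
        return set()
    result = set(grant_sets[0])
    for grants in grant_sets[1:]:
        result &= grants
    return result

# Hypothetical grants exported by two sources.
grants = {
    "orders_db": {("alice", "read"), ("alice", "write"), ("bob", "read")},
    "crm_db": {("alice", "read"), ("bob", "read"), ("bob", "write")},
}
print(sorted(effective_permissions(grants)))
# [('alice', 'read'), ('bob', 'read')]
```

Intersection is safe but can be far too restrictive; choosing between that and a more permissive merge is part of why a coherent view is hard.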

Number 8: Updates
- Recording a change isn’t always an UPDATE
  - Application semantics must be accounted for
  - Application APIs must be reckoned with
- ACIDity isn’t always achievable
  - Not all data sources display ACID properties
  - Varying degrees of support
- Strong transaction semantics are not always possible or appropriate
  - And always painful
  - Changes to multiple sources must be coordinated
  - Requirements for consistency vary
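Coordinating changes across multiple sources is classically handled by a commit protocol such as two-phase commit (2PC). This sketch shows only the voting logic, assuming hypothetical in-memory participants; it ignores timeouts, crash recovery, and the durable logging a real coordinator needs, and it presumes every source can vote, which the slide notes is not always true.

```python
class Participant:
    """Hypothetical data source that can take part in two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local change can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if all participants committed, False if all aborted."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


sources = [Participant("orders"), Participant("inventory", can_commit=False)]
print(two_phase_commit(sources), [p.state for p in sources])
# False ['aborted', 'aborted']
```

A single "no" vote aborts the change at every source, which is the coordination the slide calls painful when sources lack the prepare step.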

Number 7: Configurability
- Many architectures possible
  - Even with pre-existing sources, many choices
  - Little or no guidance on tradeoffs
- Lots of code to install
  - Federation engine, data source clients
  - Often choices here
- Lots of connections to define
  - Need tooling to support them

Number 6: Administration
- Monitoring is hard
  - Not all sources have facilities to track events
  - Variety of mechanisms for different events and different sources
  - Not always APIs
- Tuning is difficult
  - Need to understand what must change
  - Need to take appropriate actions
- Repairing is painful
  - Distributed debugging
  - Different vendors to deal with for fixes

Number 5: Semantic heterogeneity
- Hard to identify commonalities
  - Same terms, different meanings
  - Different terms, same meaning
  - Different structures representing different interpretations
- Can’t integrate data effectively without them
- Can’t make sensible queries
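Resolving these differences usually means writing explicit mappings from each source's schema into a shared global schema. This is a minimal sketch, assuming two hypothetical sources: one calls the customer `cust_name`, the other `client` (different terms, same meaning), and both have a `price` field, but one stores cents and the other dollars (same term, different meaning). All field names, units, and functions are invented for illustration.

```python
# Hypothetical per-source mappings into a global schema {customer, price_usd}.
def from_source_a(row):
    # Source A: customer is "cust_name"; "price" is stored in cents.
    return {"customer": row["cust_name"], "price_usd": row["price"] / 100}

def from_source_b(row):
    # Source B: customer is "client"; "price" is stored in dollars.
    return {"customer": row["client"], "price_usd": float(row["price"])}

MAPPINGS = {"a": from_source_a, "b": from_source_b}

def integrate(rows_by_source):
    """Translate each source's rows into the global schema and union them."""
    return [MAPPINGS[src](row) for src, rows in rows_by_source.items() for row in rows]

rows = {
    "a": [{"cust_name": "Acme", "price": 1250}],
    "b": [{"client": "Acme", "price": 12.5}],
}
print(integrate(rows))
# [{'customer': 'Acme', 'price_usd': 12.5}, {'customer': 'Acme', 'price_usd': 12.5}]
```

Without the mappings, a naive union would report one price of 1250 and another of 12.5, and any query comparing them would be meaningless. Writing the mappings still requires knowing what each field means.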

Number 4: Insufficient Metadata
- Need metadata to integrate, configure, administer and query
- Every data source has different metadata
  - No uniform standard
  - Not always collected
- Tools to examine and exploit it are missing

Number 3: Performance (Data Movement)
- Distributed queries involve moving data
  - Geographic distribution is common
  - WAN is slow
- Large data volumes are common
  - Large numbers of objects
  - Large objects
- Caching isn’t a complete answer
  - Changes can be frequent and hard to track
  - Storage is not unlimited
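A rough calculation shows why WAN data movement dominates. This is a minimal sketch, assuming transfer time is simply data volume divided by sustained bandwidth (ignoring latency, protocol overhead, and compression); the 10 GB and 100 Mbit/s figures are illustrative, not measurements from the talk.

```python
def transfer_seconds(num_bytes, bandwidth_bits_per_sec):
    """Time to ship num_bytes over a link at the given sustained bandwidth."""
    return num_bytes * 8 / bandwidth_bits_per_sec

# Illustrative: 10 GB of intermediate results over a 100 Mbit/s WAN link.
seconds = transfer_seconds(10e9, 100e6)
print(f"{seconds:.0f} s (about {seconds / 60:.0f} minutes)")
# 800 s (about 13 minutes)
```

The same volume over a 10 Gbit/s LAN takes 8 seconds, so a plan that looks fine within one site can be unusable across sites.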

Number 2: Performance (Complexity)
- Decision-support applications issue complex queries
  - Many choices for how to execute
  - Big differences in performance among choices
- Need data from diverse sources
  - A source may not have enough processing power
  - Performance at sources may vary
- Need expensive functions of data
  - A function may not be implemented everywhere
  - Flowing the data to the function is expensive
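The big differences among execution choices show up even for a single join between two sites. This is a minimal sketch comparing two standard strategies by rows shipped: sending the whole remote table, versus a semijoin that first sends only the distinct join keys and gets back just the matching rows. The tables, tuple layout, and rows-shipped cost measure are simplifying assumptions; real optimizers also weigh row width, latency, and source capabilities.

```python
def ship_whole(local, remote):
    """Plan A: ship every remote row to the local site, then join there."""
    shipped = len(remote)
    result = [(l, r) for l in local for r in remote if l[0] == r[0]]
    return result, shipped

def semijoin(local, remote):
    """Plan B: ship distinct local keys to the remote site, filter there,
    and ship back only the matching remote rows."""
    keys = {l[0] for l in local}
    matches = [r for r in remote if r[0] in keys]
    shipped = len(keys) + len(matches)
    result = [(l, r) for l in local for r in matches if l[0] == r[0]]
    return result, shipped

# Hypothetical data: a few local customers, many remote orders keyed by customer id.
customers = [(1, "Acme"), (2, "Globex")]
orders = [(i % 1000, f"order-{i}") for i in range(10_000)]

for plan in (ship_whole, semijoin):
    result, shipped = plan(customers, orders)
    print(plan.__name__, len(result), shipped)
# ship_whole 20 10000
# semijoin 20 22
```

Both plans return the same 20 rows, but shipping costs differ by orders of magnitude. If most remote rows matched, the semijoin's extra round trip would buy little, so neither plan wins everywhere.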

Number 1: Performance (Pathlength)
- Simple (OLTP-like) queries incur huge overheads
  - Processing and networking costs
- Simple queries are common
  - Easier to write
  - Automatically produced
  - Workflows
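For simple queries, a fixed per-query cost can swamp the useful work. This is a minimal sketch, assuming latency is a fixed federation overhead (parsing, planning, wrapper calls, network round trip) plus the source's own execution time; the millisecond figures are hypothetical.

```python
def overhead_fraction(overhead_ms, work_ms):
    """Share of a query's latency spent on fixed federation overhead."""
    return overhead_ms / (overhead_ms + work_ms)

# Hypothetical: a 0.1 ms key lookup at the source behind 2 ms of overhead.
print(f"{overhead_fraction(2.0, 0.1):.0%}")
# 95%
```

The same 2 ms barely matters for a decision-support query that runs for minutes, which is why pathlength hurts mainly the common, simple, often machine-generated queries.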

So Why Will Federated Succeed?
- It has to
  - Integration is one of the top IT issues
  - And it’s not going away
- Alternatives are expensive and/or painful
  - Write it by hand
  - EAI/Workflow
  - Consolidation (warehouse, data marts…)

So Why Will Federated Succeed? (2)
- Simple scenarios exist
  - Not every application needs OLTP performance, high security, great robustness, …
- Customers know their data, or must learn it anyway
- Needs are so great that compromise is possible

So Why Will Federated Succeed? (3)
- Progress on technology is being made
  - 20 years of distributed query processing
- Plumbing in place
  - Commit protocols
  - Reliable messaging
  - Connectivity infrastructure
- XML (basic community agreement)
  - XML data format
  - XML schema
  - Web services
- We’re getting closer

What would we do if it ever did work?
- Retire
- Integrate the web?
  - Data grids
  - Data Google
  - P2P database?

For Discussion
- Is research in this area warranted?
- What are the most important research topics?
- Did we miss any?