Digital Object Architecture: Building Information Management Infrastructure for Networks 20 September 2010 Larry Lannom Corporation for National Research.

Presentation transcript:

Digital Object Architecture: Building Information Management Infrastructure for Networks 20 September 2010 Larry Lannom Corporation for National Research Initiatives

Three Initial Networks
About 30 to 35 years ago, DARPA funded the creation of three seminal packet networks: ARPANET, Packet Radio, and Packet Satellite.
The Internet came about from a desire to link the three of them.
Ethernet occurred in parallel, led by Xerox PARC researchers, and other network types followed.
The resulting architecture was independent of the number and type of networks or who ran them.

Key Decisions
The Internet would be a global information system.
An open architecture would be used to combine different networks based on open and well-known interfaces, protocols & objects.
A new communications-oriented host protocol (TCP/IP) would be created to replace the original ARPANET host protocol (NCP).
The concept of global addressing and IP addresses would be introduced to identify individual machines anywhere on the global Internet.

Comments on the Key Decisions
The architecture is robust in the presence of many different network types and many outages.
Gateways provided IP routing and network "impedance matching".
TCP, as the end-to-end protocol, accommodated:
– different packet sizes, duplicates, error detection, losses due to tunnels, mountains, jamming, etc.
Separate network administrations were permitted, which allowed the Net to grow.
DNS was not technically critical, but helped users.

Understanding the Big Picture
Many things were done well from the outset; with 20/20 hindsight, some could have been done better.
The context was critical:
– Mostly mainframes, few time-sharing systems
– No PCs, workstations, LANs
– One dominant carrier in the US
– Government facility initially
What is important at the time may be only apparent with hindsight; but also what seems important at the time may not turn out to be so important later on.

Infrastructure Development
What is so hard about it?
– Making it scalable over platforms, size and time
– Achieving critical mass
Getting buy-in:
– Pleasing many essential participants
– Displacing prior capabilities
– Structuring matters to deal with concerns about empire building
It’s a lot easier to create brand new capabilities than to affect existing means of operation.

Infrastructure Creation is a Subtractive Process
Infrastructure reduces a common, shared capability to its basic and essential attributes.
These attributes are not always recognized or understood up front.
Upon further scrutiny, capabilities are usually deleted from a well-conceived architecture over time.
Consensus develops when no more can be removed without disabling the infrastructure.

What is the Information Management Problem?
Managing information in the Net over very long periods of time, e.g., centuries or more.
Dealing with very large amounts of information in the Net over time.
Doing so when information, its location(s) and even the underlying systems may change dramatically over time.
Respecting and protecting rights, interests and value.

A Meta-level Architecture
Allows for arbitrary types of information systems.
Allows for dynamic formatting and data typing.
Can accommodate interoperability between multiple different information systems.
Allows metadata schema to be identified and typed.

Digital Object Architecture: Motivation
To reformulate the Internet architecture around the notion of uniquely identifiable data structures.
Enabling existing and new types of information to be reliably managed and accessed in the Internet environment over long periods of time.
Providing mechanisms to stimulate innovation, the creation of dynamic new forms of expression, and to manifest older forms.
While supporting intellectual property protection and fine-grained access control, and enabling well-formed business practices to emerge.

Digital Object Architecture: Technical Components
Digital Objects (DOs)
– Structured data, independent of the platform on which it was created
– Consisting of "elements" of the form
– One of which is its unique, persistent identifier
Resolution of Unique Identifiers
– Maps an identifier into "state information" about the DO
– Handle System is a general purpose resolution system
Repositories from which DOs may be accessed
– And into which they may be deposited
Metadata Registries
– Repositories that contain general information about DOs
– Support multiple metadata schemes
– Can map queries into unique DO specifications (via handles)

What is a Digital Object?
Defined data structure, machine independent.
Consisting of a set of elements:
– Each of the form
– One of which is the unique identifier
Identifiers are known as "Handles":
– Format is "prefix/suffix"
– Prefix is unique to a naming authority
– Suffix can be any string of bits assigned by that authority
Data structure can be parsed; types can be resolved within the architecture.
Associated properties record, and transaction record, contain metadata and usage information.
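The "prefix/suffix" handle format described above can be illustrated with a minimal Python sketch. The names `Handle` and `parse_handle` and the example handle are hypothetical, and the sketch simplifies by treating the suffix as a text string rather than an arbitrary string of bits. It assumes the prefix ends at the first slash, so the suffix may itself contain slashes.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    """Illustrative (hypothetical) in-memory form of a handle."""
    prefix: str  # identifies the naming authority
    suffix: str  # local name assigned by that authority


def parse_handle(text: str) -> Handle:
    """Split 'prefix/suffix' at the first slash.

    Assumption: the prefix never contains a slash, while the suffix
    may contain any characters, including further slashes.
    """
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {text!r}")
    return Handle(prefix, suffix)


# Made-up example value, not a registered handle.
h = parse_handle("12345/report/v2")
print(h.prefix, h.suffix)  # prints: 12345 report/v2
```

Keeping the prefix separate makes it straightforward to route a lookup to whichever service is responsible for that naming authority.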

Interoperability & Federated Repositories
Create a cohesive interoperable collection of repository-based systems.
– Initially, perhaps, around a core set of projects, content, applications and/or organizations
Demonstrate interoperability between different repository collections.
Develop procedures to ensure continued accessibility to key archival information.

Repository Notion
Diagram labels: Any Hardware & Software Configuration; Logical External Interface; DOP (Digital Object Protocol).

Repositories & Digital Objects
Each Digital Object has its own unique & persistent ID.
Content Providers assign IDs.
Could be upwards of trillions of DOs per Repository.
Objects may be replicated in multiple Repositories.

The Handle System
Distributed identifier service on the Internet.
First general purpose resolution system.
Can be used to locate repositories that contain digital objects given their handles, and more:
– Other indirect references
– Public keys, authentication information for DOs
Accommodates interoperability between many different information systems.

Attributes of the Handle System
The basic architecture of the Handle System is flat, scalable, and extensible.
Logically central, but physically decentralized.
Supports Local Handle Services, if desired.
Handle resolutions return entire "handle records" or portions thereof.
Handle Records are also:
– digital objects
– signed by the servers
– doubly certificated by the system.
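The point that a resolution can return an entire handle record or only a portion of it can be sketched in Python as follows. This is a toy in-memory lookup, not the Handle System protocol: the `HANDLE_RECORDS` table, the `resolve` function, the type names, and the example URLs are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a handle service:
# each handle maps to a record, modeled here as a list of typed values.
HANDLE_RECORDS = {
    "12345/abc": [
        {"type": "URL", "value": "https://repo-a.example/objects/abc"},
        {"type": "URL", "value": "https://repo-b.example/objects/abc"},
        {"type": "PUBLIC_KEY", "value": "placeholder-key-material"},
    ],
}


def resolve(handle: str, types: Optional[set] = None) -> list:
    """Return the whole record, or only the values of the requested types."""
    record = HANDLE_RECORDS.get(handle)
    if record is None:
        raise KeyError(f"handle not found: {handle}")
    if types is None:
        return list(record)  # entire handle record
    return [v for v in record if v["type"] in types]  # portion of the record


everything = resolve("12345/abc")
locations = resolve("12345/abc", types={"URL"})
print(len(everything), len(locations))  # prints: 3 2
```

Here the two location values stand for the same object replicated in two repositories, while the key entry stands for the kind of indirect reference a record may also carry.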

Resolution Mechanism
Diagram labels: Handle System; Multiple Sites; Multiple Servers; Handle Record.
Handle System is non-nodal.
Scalable & distributed.
Supports global (and local) resolution.
Has backup for reliability, mirroring for efficiency.

Conclusions
Managing Digital Objects for long-term access is a key challenge.
Initial technology components are available; industry is expected to generate more over time.
Third-party value-added providers in the private sector will ultimately shape the long-term evolution.
Interoperability and reliable information access are critical objectives.
A diversity of applications (with user-friendly interfaces) needs to be developed & deployed.

Phone Guy Perspective

Purpose of Digital Object
Today's architectures and paradigms, including leading edge technology, operate on the circuit switched telephone equivalent of data storage.
– A "dumb" system for payload data storage ("the circuits").
– A separate system for management, control, and metadata ("the signaling network").
As a consequence, these systems are limited in robustness, security, interoperability, extensibility, cost effectiveness, vendor independence, and functionality.
Create the foundation for data storage and retrieval, equivalent to what packet data did for communication.
Urs Muller, Net-Scale

Today's Paradigms
Diagram labels: User; Request; Authentication; Data management (Data access control, Key management, Provenance infrastructure, Version control, Metadata); Data storage; Data.
Examples: Documentum (EMC); SharePoint, MOSS 2007 (Microsoft); FileNet (IBM); 10g, Stellent (Oracle); LiveLink (OpenText); Alfresco (open source).
Urs Muller, Net-Scale

What Happens When Data Is Moved
Diagram labels: Data management; Data storage; Data.
– Loss of access control
– Loss of key management
– Loss of provenance infrastructure
– Loss of version control
– Loss of metadata
Urs Muller, Net-Scale

Limitations of Today's Paradigms
Use of separate and different systems for storage of the (payload) data and the data management.
– Creates a centralized system.
– Poor interoperability.
– Heavy vendor and product dependence.
The data management system is a huge, fragile single point of failure which requires heavy protection to make a solution usable.
– This is similar to the signaling network and out-of-band data in a traditional circuit-switched telephone network.
Poorly suited to meet these key requirements for the DoD:
– High degree of global data distribution and replication (a super robust network, data is available where needed).
– Vendor independence.
– Interoperability among vendors and multiple technology generations (like the Internet).
– Access control "travels" with the data and does not need to be replicated each time the data is copied onto a different system (e.g., a laptop).
Urs Muller, Net-Scale

Digital Object Architecture
Diagram labels: Digital Object (Data, Access control, Key management, Provenance infrastructure, Version control, Metadata); Repository.
Urs Muller, Net-Scale

A Digital Object Is Moved
Diagram labels: Digital Object; Repository; Data.
Data management remains intact:
– Access control
– Key management
– Provenance infrastructure
– Version control
– Metadata
Urs Muller, Net-Scale
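The idea that management information travels with a digital object when it is moved can be sketched in Python. This is a toy model under stated assumptions: `DigitalObject`, `Repository`, `move`, the `readers` access-control key, and the example identifier are hypothetical names, and a real repository would enforce access control with far more care than a membership check.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Payload bundled with its own management information (illustrative)."""
    identifier: str
    payload: bytes
    access_control: dict = field(default_factory=dict)  # e.g. {"readers": {...}}
    metadata: dict = field(default_factory=dict)
    versions: list = field(default_factory=list)


class Repository:
    """Toy repository: stores objects and enforces each object's own policy."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects = {}

    def deposit(self, obj: DigitalObject) -> None:
        self._objects[obj.identifier] = obj

    def remove(self, identifier: str) -> DigitalObject:
        return self._objects.pop(identifier)

    def access(self, identifier: str, user: str) -> bytes:
        obj = self._objects[identifier]
        # The policy is read from the object, not from the repository.
        if user not in obj.access_control.get("readers", ()):
            raise PermissionError(f"{user} may not read {identifier}")
        return obj.payload


def move(identifier: str, source: Repository, target: Repository) -> None:
    """Move the whole object; its management information moves with it."""
    target.deposit(source.remove(identifier))


repo_a, repo_b = Repository("a"), Repository("b")
repo_a.deposit(DigitalObject(
    identifier="12345/abc",
    payload=b"report body",
    access_control={"readers": {"alice"}},
    metadata={"title": "Report"},
    versions=["v1"],
))
move("12345/abc", repo_a, repo_b)
print(repo_b.access("12345/abc", "alice"))  # prints: b'report body'
```

Because the policy is stored on the object, the second repository needs no copy of the first repository's management database to keep enforcing it.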

A Solid Foundation
The Digital Object Architecture provides a solid foundation for the creation of:
A highly distributed, robust, and scalable data storage and retrieval infrastructure.
– Digital Objects are self-contained and don't depend on a separate centralized data management subsystem. This dramatically improves scalability.
A highly secure data storage and retrieval infrastructure.
– By eliminating a centralized security paradigm which is a single point of failure and greatly vulnerable to attacks.
– Security is distributed. A successful attack reveals very little reward (each digital object has to be attacked separately).
A highly "future proof", extensible, interoperable, and vendor independent data storage and retrieval infrastructure.
– By greatly reducing the complexity for exchanging data without breaking access control, provenance, version control, etc.
The Digital Object Architecture provides a far superior foundation for realizing these essential properties compared to today's paradigms.
Urs Muller, Net-Scale

Comparison to Data Communication
Circuit Switched (old phone) (~ traditional architectures)
– Data has no "intelligence" and is managed by a large central system (signaling network).
Packet Based (Internet) (~ Digital Object Architecture)
– Data management information is embedded with the data itself (packet header).
– The packet itself knows what it is, where it is coming from and where it is going to.
– The network can be simpler, far more flexible and robust.
Today, few people dispute that packet routing is superior to circuit switching for data communication.
– A few decades ago the differences were not so clear. After all, data can easily be exchanged over a circuit-switched network.
Compared with today's paradigms, the Digital Object Architecture will lead to far more flexibility, diversity, technology independence, and overall usage for data storage and retrieval.
Urs Muller, Net-Scale

Example From the Real World
Circuit switched past: When a 5ESS switch was down, all calls to the affected area were out, leaving a whole region without communication.
Current Internet: On December 19, 2008, three undersea cables were cut between the Middle East and Europe. Data traffic was severely impacted but communication remained intact.
We expect the Digital Object Architecture to create a paradigm shift for data storage and retrieval similar to the impact the Internet had on data communication.
Urs Muller, Net-Scale

Digital Object Architecture: Where Are We?
Handle System
– Up and running since the early 90s
– Core architecture stable from the late 90s
Digital Object Repository
– In daily use in multiple projects
– Available open-source since the start of
– Introductory article in Jan/Feb D-Lib Magazine
Digital Object Registry
– In daily use in multiple projects
– Available open-source since May,

Information Management on Networks
Diagram labels: Resolution Client; Resource Discovery (Search Engines, Metadata Databases, Catalogues, Guides, etc.); Repositories / Collections; Identifier Resolution System.

Information Management on Networks
Diagram labels: Administrative Client; Resource Discovery (Search Engines, Metadata Databases, Catalogues, Guides, etc.); Repositories / Collections; Identifier Resolution System.

Federation
Federation in information systems makes sense when
– a set of varying features exists across the federates, which is the reason for multiplicity (includes organizational boundaries, locations, content types, etc.)
– a set of common features exists across federates, which is usually the reason to perform federation (shared topics, common audience, etc.)

Challenges - Conceptual
Identifying the type of aggregation:
– Aggregate objects ahead of time, before query?
– Merge search responses from federates by issuing a distributed query?
– Or, anything in between?
Identifying the level of semantic interoperability:
– Enforce complete semantic interoperability across all the data stored in the federates?
– Use only the least common denominator (from a data semantics point of view) among the federates?
Federate topology:
– Are all federates directly connected to each other? (fully-connected mode)
– Is each federate connected to only its neighbor? (peer-peer mode)
These criteria can be visualized as a Federation Spectrum.

Federation Spectrum (diagram)
Level of Aggregation: from No Aggregation of Objects (Distributed Query) to Complete Aggregation of Objects.
Format of Participation: from Disconnected Federates to Fully Connected Federates.
Level of Data Interoperability: from No Semantic Interoperability (Ad Hoc Mix) to Complete Semantic Interoperability.

Challenges - Technical
Depending on the criteria chosen for federation, various technical requirements arise. These may include:
– Designing a storage model to aggregate objects into a common store that identifies the relationship between multiple metadata instances describing a single object
– Designing cross-walking algorithms to translate and map heterogeneous data into a common model
– Designing a query model to gather and rank search results from multiple federates
– Ensuring scalability, reliability, and security without compromising performance
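Two of the requirements above, cross-walking heterogeneous metadata into a common model and gathering and ranking results from several federates, can be sketched together in Python. Everything here is an assumption for illustration: the federate names, field names, `CROSSWALKS` table, and the `to_common` and `federated_search` functions are hypothetical, and the sketch assumes relevance scores from different federates are directly comparable, which a real system would have to normalize.

```python
# Hypothetical crosswalk tables: local field name -> common-model field name.
CROSSWALKS = {
    "library": {"title": "title", "creator": "author"},
    "archive": {"name": "title", "made_by": "author"},
}


def to_common(federate: str, record: dict) -> dict:
    """Translate one federate's metadata record into the common model."""
    mapping = CROSSWALKS[federate]
    return {common: record[local]
            for local, common in mapping.items() if local in record}


def federated_search(results_by_federate: dict) -> list:
    """Merge per-federate hits of the form (handle, score, record).

    Hits describing the same handle are collapsed to the best-scoring one,
    then all hits are ranked by score (highest first), ties broken by handle.
    """
    best = {}
    for federate, hits in results_by_federate.items():
        for handle, score, record in hits:
            if handle not in best or score > best[handle]["score"]:
                best[handle] = {"handle": handle, "score": score,
                                "source": federate,
                                **to_common(federate, record)}
    return sorted(best.values(), key=lambda h: (-h["score"], h["handle"]))


# Made-up search responses from two federates.
responses = {
    "library": [("12345/a", 0.9, {"title": "T1", "creator": "Ann"})],
    "archive": [("12345/a", 0.5, {"name": "T1 alt", "made_by": "Ann"}),
                ("67890/b", 0.7, {"name": "T2", "made_by": "Bob"})],
}
ranked = federated_search(responses)
print([hit["handle"] for hit in ranked])  # prints: ['12345/a', '67890/b']
```

Keying the merge on the handle is what lets two metadata instances from different federates be recognized as describing the same object.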

Existing technologies
Digital Object Registry (basis for ADL-R)
– Provides a data model to encapsulate related metadata instances together
– Enables aggregation of objects from fully-connected mode to peer-peer mode
– Uses the Handle System to uniquely identify objects and metadata instances across all federates