
1 About the Instructor
Name: Gong Zhiguo
Office: N512
Phone: 83974962
E-Mail: fstzgg@umac.mo
Remark: Some of the slides are tailored from the slides by Prof. Hector Garcia-Molina

2 From File Processing to DBMS
[Figure: file processing (Programs 1 to 4 working on separate files of current accounts, saving accounts, and customers) versus a DBMS (the same programs working through the DBMS on one bank database). Program 1: deposit, withdraw. Program 2: transfer. Program 3: printing statements. Program 4: customer information.]

3 DDBS = Database + Networking
- The technology of computer networks promotes a mode of work that goes against all centralization efforts and facilitates distributed computing.
- Distributed database system technology is the union of what appear to be diametrically opposed approaches to data processing: database system and computer network technologies.
- A database system aims to integrate the operational data of an enterprise and to provide centralized, controlled access to that data.

4 Distributed Computing System
- A distributed computing system consists of
  - a number of autonomous processing elements (not necessarily homogeneous)
  - that are interconnected by a computer network
  - and that cooperate in performing their assigned tasks
- What is distributed?
  - Processing logic
  - Function
  - Data
  - Control
All of these are necessary and important for distributed database technology.

5 Distributed DBMS Environment
[Figure: Sites 1 to 6 connected by a communication network.]

6 Distributed Database System
- A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. It stores data on multiple computers (nodes) over the network and permits access from any node to the joint data.
- A distributed database management system (DDBMS) is a software system that permits the management of the distributed database and makes the distribution transparent to the users.

7 What is not a Distributed Database System?
- A DDBS is not a "collection of files" that can be individually stored at each node of a computer network
  - files are not logically related
  - no access via a common interface

8 Centralized DBMS on a Network
- Data resides at only one node
- The database management is no different from a centralized DBMS
- Remote processing: single server, multiple clients
[Figure: Sites 1 to 6 connected by a communication network.]

9 Client-Server Systems
- (or how to partition software)
[Figure: software layers Application, Front End, Query Processor, Transaction Processing, and File Access, divided between client and server.]

12 Transaction Servers
- Clients ship transactions consisting of one or more SQL commands
- E.g., Open DataBase Connectivity (ODBC), a standard API
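A minimal sketch of a client shipping one transaction made of several SQL commands. It uses Python's built-in sqlite3 module as a stand-in, not ODBC, and the `account` table and `transfer` function are assumptions made up for illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Run several SQL commands as one transaction (all or nothing)."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")

# Hypothetical sample data: two accounts in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

If any statement fails, or the balance check raises, the whole transfer is rolled back, which is the property a transaction server is expected to give its clients.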

13 Data Servers
- Client requests pages or records
- Popular for OODB systems

14 Multiprocessor Systems (Parallel Server)
- Shared Memory (SMP): Sequent, SGI, Sun
- Shared Disk: VMScluster, Sysplex
- Shared Nothing (network): Tandem, Teradata, SP2
[Figure: clients, processors, and memory in each architecture.]

15 Parallel or distributed DB system?
- More similarities than differences!

16 Typically, parallel DBs:
- Fast interconnect
- Homogeneous software
- High performance is the goal
- Transparency is the goal

17 Typically, distributed DBs:
- Geographically distributed
- Data sharing is the goal (may run into heterogeneity, autonomy)
- Disconnected operation possible

18 Query processing in parallel DBs:
- Typically: we can distribute, partition, or sort the data to make certain DB operations (e.g., join) fast
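A minimal sketch of how partitioning can speed up a join: both relations are hash-partitioned on the join key, so each pair of partitions can be joined independently. The loop here runs sequentially in one process; in a parallel DB each pair would go to its own node. All function names and sample rows are assumptions for illustration.

```python
from collections import defaultdict

def hash_partition(rows, key, n):
    """Split rows into n partitions by hashing the join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def local_hash_join(left, right, key):
    """Join two partitions: build a hash index on left, probe with right."""
    index = defaultdict(list)
    for l in left:
        index[l[key]].append(l)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]

def partitioned_join(left, right, key, n=3):
    """Rows with equal keys land in the same partition number on both sides."""
    out = []
    for lp, rp in zip(hash_partition(left, key, n), hash_partition(right, key, n)):
        out.extend(local_hash_join(lp, rp, key))  # each pair could run on its own node
    return out

customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bo"}]
accounts = [{"cid": 1, "balance": 10}, {"cid": 1, "balance": 20}, {"cid": 3, "balance": 5}]
joined = partitioned_join(customers, accounts, "cid")
```

Because matching rows always share a partition, no node needs to see another node's data to produce its part of the result.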

19 Query processing in distributed DBs:
- Typically: we are given the data distribution; we need to find a query processing strategy that minimizes cost (e.g., communication cost)
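A minimal sketch of choosing a strategy by communication cost. It assumes a deliberately simple cost model (cost equals the amount of data shipped), with relation R stored at site A and relation S stored at site B. The sizes, site names, and function names are hypothetical.

```python
def shipping_costs(size_r, size_s, size_result, query_site):
    """Data shipped for each join site; R is stored at A, S is stored at B."""
    costs = {}
    for join_site, shipped_input in (("A", size_s), ("B", size_r)):
        # The result must travel only if the join runs away from the query site.
        ship_back = 0 if join_site == query_site else size_result
        costs[f"join at {join_site}"] = shipped_input + ship_back
    return costs

def cheapest(costs):
    """Pick the strategy with the smallest shipping cost."""
    return min(costs, key=costs.get)

costs = shipping_costs(size_r=1000, size_s=100, size_result=50, query_site="B")
plan = cheapest(costs)
```

With these numbers, shipping the small relation S to site A and sending the result back (150 units) beats shipping the large relation R to the query site (1000 units), even though the query was issued at B.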

20 Cloud Computing
- Is CC just a marketing term?
  - utility (like power)
  - data or CPU cycles?
  - many processors, many storage units
  - business model

21 Cloud Computing (M. Armbrust et al., "A View of Cloud Computing", Communications of the ACM)
- Larry Ellison (Oracle CEO)
  - “The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do….”
- Cloud computing:
  - both the applications delivered as services over the Internet and the hardware and systems software in the data centers that provide those services.
- Grid computing
  - Protocols to offer shared computation and storage over long distances, but within a community.

22 Is CC a subset of, a superset of, disjoint from, or overlapping with:
- grid computing
- distributed computing
- Web 2.0
- cluster computing
- peer-to-peer computing
- software as a service
- client-server computing
- data center as a computer
- massively parallel computing
[Figure: four diagrams labeled (A) to (D), each containing CC.]

23 Distributed Database System Technology
- The key is integration, not centralization
- Distributed database technology attempts to achieve integration without centralization
[Figure: Database Technology (integration) and Computer Networks (distributed computing) combine into Distributed Database Systems (integration without centralization).]

24 Example
- Multinational manufacturing company:
  - headquarters in Macau
  - manufacturing plants in Nanning and Kunming
  - warehouses in Zhongshan and Dongguan
  - R&D facilities in Beijing
- Data and information:
  - employee records (working location)
  - projects (R&D)
  - engineering data (manufacturing plants, R&D)
  - inventory (manufacturing, warehouse)

25 Promises of Distributed DBMS
- transparent management of distributed, fragmented, and replicated data
- improved reliability and availability through distributed transactions
- improved performance
- higher system extendibility

26 Transparency
- Transparency refers to the separation of the higher-level semantics of a system from lower-level implementation details.
- From data independence in centralized DBMS to fragmentation transparency in DDBMS.
- Issues
  - Who should provide transparency?
  - What is the state of the art in the industry?

27 Improved Reliability
- A distributed DBMS can use replicated components to eliminate single points of failure.
- The users can still access part of the distributed database with “proper care” even though some of the data is unreachable.
- Distributed transactions facilitate maintenance of a consistent database state even when failures occur.

28 Improved Performance
- Since each site handles only a portion of a database, the contention for CPU and I/O resources is not that severe. Data localization reduces communication overheads.
- Inherent parallelism of distributed systems may be exploited
  - inter-query parallelism
  - intra-query parallelism
- Performance models are not sufficiently developed.

29 Easier System Expansion
- Ability to add new sites, data, and users over time without major restructuring.
- Huge centralized database systems (mainframes) are history (almost!).
- The PC revolution (Compaq buying Digital, 1998) will create natural distributed processing environments.
- New applications (such as supply chain) are naturally distributed; centralized systems will just not work.

30 Disadvantages of DDBSs
- Lack of experience
  - No true distributed database systems are in operation
- Complexity
  - DDBS problems are inherently more complex than centralized DBMS ones
- Cost
  - More hardware, software, and people costs
- Distribution of control
  - Problems of synchronization and coordination to maintain data consistency
- Security
  - Database security + network security
- Difficult to convert
  - No tools to convert centralized DBMSs to DDBSs

31 Complicating Factors
- Data may be replicated in a distributed environment. Consequently, the DDBS is responsible for
  - choosing one of the stored copies of the requested data for access in case of retrievals
  - making sure that the effect of an update is reflected on each and every copy of that data item
- If there is a site or link failure while an update is being executed, the DDBS must make sure that the effects will be reflected on the data residing at the failing or unreachable sites as soon as the system recovers from the failure.
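A minimal sketch of the responsibilities above for a single replicated data item: a retrieval picks one reachable copy, an update is applied to every reachable copy, and updates missed by a failed site are replayed when it recovers. This is a single-process toy with no concurrency control. The class and its methods are hypothetical and do not describe any particular DDBS.

```python
class ReplicatedItem:
    """One data item with a copy at each site (toy model, single process)."""

    def __init__(self, sites):
        self.copies = {s: None for s in sites}
        self.up = set(sites)                    # sites currently reachable
        self.pending = {s: [] for s in sites}   # updates missed while down

    def read(self):
        # Retrieval: choose one of the reachable stored copies.
        for site in self.copies:
            if site in self.up:
                return self.copies[site]
        raise RuntimeError("no replica reachable")

    def write(self, value):
        # Update: reflect the effect on every copy; queue it for failed sites.
        for site in self.copies:
            if site in self.up:
                self.copies[site] = value
            else:
                self.pending[site].append(value)

    def fail(self, site):
        self.up.discard(site)

    def recover(self, site):
        # Replay missed updates in order before the site serves reads again.
        for value in self.pending[site]:
            self.copies[site] = value
        self.pending[site].clear()
        self.up.add(site)

item = ReplicatedItem(["s1", "s2", "s3"])
item.write(1)
item.fail("s2")
item.write(2)                      # s2 misses this update
stale = item.copies["s2"]
item.recover("s2")                 # s2 catches up on recovery
```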

32 Complicating Factors
- Maintaining consistency of distributed or replicated data.
- Since each site cannot have instantaneous information on the actions currently carried out at other sites, the synchronization of transactions at multiple sites is harder than in a centralized system.

33 Distributed DBMS Issues
- Distributed Database Design
- Distributed Query Processing
- Distributed Directory Management
- Distributed Concurrency Control
- Distributed Deadlock Management
- Reliability of Distributed Databases
- Operating Systems Support
- Heterogeneous Databases

34 Distributed Database Design
- The problem is how the database and the applications that run against it should be placed across the sites.
- The two fundamental design issues are fragmentation (the separation of the database into partitions called fragments) and allocation (the optimum distribution of fragments across sites). The general problem is NP-hard.
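A minimal sketch of the two design steps. Fragmentation here is horizontal (rows are split by the value of one attribute), and allocation uses a simple greedy rule that places each fragment at the site that accesses it most. The greedy rule is a stand-in heuristic, not an optimal solution to the NP-hard problem. All names, rows, and access counts are made-up sample data.

```python
def horizontal_fragments(rows, attr):
    """Split a relation into fragments, one per distinct value of attr."""
    frags = {}
    for row in rows:
        frags.setdefault(row[attr], []).append(row)
    return frags

def reconstruct(frags):
    """The union of all fragments gives back the original relation."""
    return [row for frag in frags.values() for row in frag]

def allocate(access_counts):
    """Greedy heuristic: put each fragment at the site that accesses it most."""
    return {frag: max(by_site, key=by_site.get)
            for frag, by_site in access_counts.items()}

employees = [
    {"name": "Chan", "location": "Macau"},
    {"name": "Li", "location": "Nanning"},
    {"name": "Wong", "location": "Macau"},
    {"name": "Zhao", "location": "Beijing"},
]
fragments = horizontal_fragments(employees, "location")

# access_counts[fragment][site] = how often that site uses the fragment
access_counts = {
    "Macau": {"site_macau": 90, "site_beijing": 10},
    "Nanning": {"site_macau": 20, "site_nanning": 70},
    "Beijing": {"site_beijing": 40, "site_macau": 5},
}
placement = allocate(access_counts)
```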

35 Distributed Query Processing
- Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations.
- The problem is how to decide on a strategy for executing each query over the network in the most cost-effective way, however the cost is defined. The objective is to optimize, using the inherent parallelism to improve the performance of executing the transaction.

36 Distributed Directory Management
- A directory contains information (such as descriptions and locations) about data items in the database.
- A directory may be global to the entire DDBS or local to each site; it may also be distributed, kept in multiple copies, etc.
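A minimal sketch of a global directory held as one in-memory table: each entry records a description of a data item and the sites that store it. The class, its methods, and the sample entry are hypothetical. A real directory could itself be local, distributed, or replicated, as noted above.

```python
class Directory:
    """Maps each data item to its description and the sites holding it."""

    def __init__(self):
        self._entries = {}

    def register(self, item, description, sites):
        self._entries[item] = {"description": description, "sites": list(sites)}

    def describe(self, item):
        return self._entries[item]["description"]

    def locate(self, item, local_site=None):
        """Prefer the local copy if one exists, else the first listed site."""
        sites = self._entries[item]["sites"]
        return local_site if local_site in sites else sites[0]

directory = Directory()
directory.register("inventory", "stock levels per warehouse",
                   ["zhongshan", "dongguan"])
local = directory.locate("inventory", local_site="dongguan")
remote = directory.locate("inventory", local_site="macau")
```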

37 Distributed Concurrency Control
- Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of the database is maintained.
- One has to worry not only about the integrity of a single database, but also about the consistency of multiple copies of the database (mutual consistency).

38 Reliability of Distributed DBMS
- It is important that mechanisms be provided to ensure the consistency of the database as well as to detect failures and recover from them.
- This may be extremely difficult in the case of network partitioning, where the sites are divided into two or more groups with no communication among them.

39 Relationship among Topics
[Figure: diagram relating Distributed DB Design, Query Processing, Directory Management, Concurrency Control, Deadlock Management, and Reliability.]

