Distributed DBMSs
A distributed database is a single logical database that is physically distributed across computers on a network.
A homogeneous DDBMS has the same local DBMS at each site.
A heterogeneous DDBMS has at least two sites where the local DBMSs are different.
Characteristics of Distributed DBMSs
Location transparency: the user feels as though the entire database is at their location.
Replication transparency: the user is unaware of the behind-the-scenes replication of the data.
Fragmentation transparency: a logical object can be divided among the various locations on the network without the user being aware of it.
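To make location and fragmentation transparency concrete, here is a minimal sketch in Python. The site names, the customer rows, and the helper functions are all hypothetical and purely illustrative; a real DDBMS does this work inside the engine, not in application code.

```python
# Hypothetical sites, each holding a horizontal fragment of one logical
# Customer table. Names and rows are illustrative assumptions.
SITES = {
    "site_a": [{"id": 1, "name": "Ada", "region": "east"}],
    "site_b": [
        {"id": 2, "name": "Grace", "region": "west"},
        {"id": 3, "name": "Edgar", "region": "west"},
    ],
}


def all_customers():
    """Reassemble the logical table as the union of every fragment."""
    rows = []
    for fragment in SITES.values():
        rows.extend(fragment)
    return sorted(rows, key=lambda row: row["id"])


def find_customer(customer_id):
    """The caller names no site, so location and fragmentation stay hidden."""
    for row in all_customers():
        if row["id"] == customer_id:
            return row
    return None
```

A caller asks for `find_customer(2)` and gets a row back without knowing which site stored it or that the table was split at all.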
Advantages of Distributed Databases
Local control of data
Increased database capacity
System availability
Added efficiency
Disadvantages of Distributed Databases
Update of replicated data
More complex query processing
More complex treatment of shared update
More complex recovery measures
More difficult management of the data dictionary
More complex database design
File Servers
A file server contains the files required by the individual workstations on the network.
Client/Server Systems
In a client/server system, the DBMS runs on the server, and the user sends requests for specific data rather than for entire files.
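The difference between the two models can be sketched as follows. This is an illustration only: the `PARTS` data and both fetch functions are hypothetical stand-ins for what travels over the network, not a real server API.

```python
# Hypothetical inventory data held on the server.
PARTS = [
    {"part": "AX12", "on_hand": 104},
    {"part": "BT04", "on_hand": 11},
    {"part": "CD52", "on_hand": 9},
]


def file_server_fetch():
    """File server model: ship the whole file; the workstation filters it."""
    return list(PARTS)


def client_server_fetch(max_on_hand):
    """Client/server model: the server runs the request and ships only matches."""
    return [row for row in PARTS if row["on_hand"] <= max_on_hand]


# File server: every row crosses the network, then the workstation filters.
shipped_as_file = file_server_fetch()
low_stock_at_workstation = [r for r in shipped_as_file if r["on_hand"] <= 10]

# Client/server: only the requested rows cross the network.
low_stock_from_server = client_server_fetch(10)
```

Both approaches produce the same answer, but the file server ships all three rows while the client/server request ships one.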
Advantages of Client/Server Systems
More efficient than file server systems.
Possibility of distributing work among several processors.
Workstations need not be as powerful.
The user doesn’t need to learn any special commands or techniques.
Advantages of Client/Server Systems (continued)
Easier for users to access data from a variety of sources.
Provides a greater level of security than file server systems.
Powerful enough to replace expensive mainframe applications.
Data Warehouses
A subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.
Data Warehouse Architecture
Data Warehouse Structure
Why build a Data Warehouse?
To speed up the writing and maintaining of queries and reports by technical personnel
To more easily query and report data, on a regular basis, from multiple transaction processing systems and/or from external data sources
To provide a repository of transaction processing system data that contains data over a span of time
Why build a Data Warehouse? (continued)
To address security concerns
To provide a repository of "cleaned up" transaction processing system data that can be reported against and that does not necessarily require fixing the transaction processing systems
Data Errors
Incomplete
–Missing records/fields
Incorrect
–Wrong codes (or incorrect pairing of codes)
Incomprehensible
–Multiple fields in one field
–Many-to-many relationships
–Spreadsheet and word-processing files
Data Errors (continued)
Inconsistent
–Use / meaning of codes
–Business rules
–Timing
–Use of attributes
–Use of nulls/spaces
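A few of these error categories can be detected mechanically before data is loaded into a warehouse. The sketch below is a hedged illustration: the required fields, the `VALID_STATUS` code set, and the `find_errors` helper are assumptions made for the example, not part of any particular tool.

```python
REQUIRED_FIELDS = ("id", "status", "city")  # hypothetical record layout
VALID_STATUS = {"A", "I"}                   # hypothetical code set


def find_errors(record):
    """Return a list of problems found in one source record."""
    errors = []

    # Incomplete: a required field is absent from the record.
    for field in REQUIRED_FIELDS:
        if field not in record:
            errors.append(f"incomplete: missing {field}")

    # Inconsistent use of nulls/spaces: None, "" and blanks all mean "empty".
    for field, value in record.items():
        if value is None or (isinstance(value, str) and value.strip() == ""):
            errors.append(f"inconsistent: empty {field}")

    # Incorrect: a code that is not in the agreed code set.
    status = record.get("status")
    if isinstance(status, str) and status.strip() and status not in VALID_STATUS:
        errors.append(f"incorrect: unknown status {status!r}")

    return errors
```

For example, `find_errors({"id": 2, "status": "X", "city": " "})` reports an empty `city` and an unknown status code, while a clean record returns an empty list.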
Data Mining
Identify the goal
Assemble the relevant data
Choose your analysis methods
Decide which software tool is best for implementing the method
Run the analysis
Decide how to implement the results
Organizational Databases
Operational Database
–organized around a transaction
–supports OLTP (record keeping)
–thousands of users
–accesses few records at a time
–response time in seconds
Data Warehouses
–organized around a subject
–supports OLAP (decision support)
–few hundred users
–accesses many records at a time
–response times in minutes
Organizational Databases (continued)
Operational Database
–primitive & detailed
–smaller (current)
–highly normalized (many tables with few columns)
–dynamic (continuous updates online)
Data Warehouses
–derived & summarized
–larger (historical)
–de-normalized (few tables with many columns)
–periodic (batch update)
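The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. The `orders` and `sales_summary` tables and their rows are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational side: detailed, current rows, updated one transaction at a time.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 20.0), (2, "west", 35.0), (3, "east", 5.0)],
)

# OLTP-style access: fetch one record by its key.
one_order = conn.execute(
    "SELECT region, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# Warehouse side: a periodic batch load of derived, summarized data.
conn.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount "
    "FROM orders GROUP BY region"
)

# OLAP-style access: read the summary across many underlying records.
summary = conn.execute(
    "SELECT region, order_count, total_amount FROM sales_summary ORDER BY region"
).fetchall()
```

The key lookup touches one detailed row, while the summary table holds one derived row per region that is only refreshed when the batch load runs again.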