How to effectively store the history of data in a relational DBMS
Database systems, MSE-Seminar, 14.12.2008
© Raphael Gfeller



Agenda
- Time
  - Definition
  - Storing within a DBMS
  - General problems
- History
  - Motivation
  - On data warehouses (Online Analytical Processing, OLAP, systems)
    - Common patterns available: slowly changing dimensions (SCD)
  - On Online Transaction Processing (OLTP) systems
    - No common patterns available
    - Analyzed data models
  - Performed tests
  - Results
  - Conclusion
- Questions
- References

Time \ Definition
Time is measured in seconds. One second is defined as: “the duration of 9 192 631 770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom.”

Storing within a DBMS
- A date is represented by an offset, with a defined accuracy, from a reference point.
- An interval is represented by a value with a defined accuracy.
- A duration is represented by either
  - a composite value of two dates, or
  - a date value and an interval value.

Storing within a DBMS \ Example
- Value to store: :00
- Date type used: smalldatetime (SQL 2), resolution 1 min, reference point: , based on the Gregorian calendar
- Internally stored value: ("Value to store" - reference point), in minutes
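The offset representation can be sketched in a few lines of Python. This is only an illustration of the idea "stored value = (value to store - reference point) in units of the resolution"; the reference point, the sample date, and the function names are assumptions chosen for the example, not values taken from the slide or from any specific DBMS.

```python
from datetime import datetime, timedelta

# Assumed values for illustration only.
REFERENCE_POINT = datetime(1900, 1, 1)   # hypothetical reference point
RESOLUTION = timedelta(minutes=1)        # one-minute resolution

def to_stored_value(value: datetime) -> int:
    """Number of whole resolution steps between the reference point and value."""
    return (value - REFERENCE_POINT) // RESOLUTION

def from_stored_value(stored: int) -> datetime:
    """Rebuild the date; anything finer than the resolution is lost."""
    return REFERENCE_POINT + stored * RESOLUTION

original = datetime(2008, 12, 14, 10, 30, 45)
stored = to_stored_value(original)       # an integer count of minutes
restored = from_stored_value(stored)     # the seconds of `original` are gone
```

The round trip drops the 45 seconds, which is the practical meaning of "defined accuracy": the resolution of the chosen type bounds what can be recovered later.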

Storing within a DBMS \ Date Types
Available date types on Microsoft SQL Server

Time \ General Problems
- Different time zones
- Different implementations of data types
- Different calendars
- Time synchronization
  - local
  - networked
- Daylight saving time (summer and winter time) transitions
- Choosing the date type
  - Resolution is too low
  - Resolution is too high
  - Range is too small

History \ Motivation
Why do we need to know the history of our data? For example, because of:
- Legal requirements
  - A bank has to know, for any point in time, what the exact balance of a customer was
  - An internet provider has to be able to store all traffic of a user over a given time
- Business requirements
  - A version control system has to be able to manage multiple revisions of the same unit of information
  - Customer relationship management (CRM) software has to be able to present the sales volume of a customer over time
- Entertainment requirements
  - A chat program has to be able to present the conversation between two persons over time
- Other requirements
  - The “Time Machine” function in Mac OS X has to be able to go back in time to locate older versions of your files

Main approaches to store data

Data warehouses (Online Analytical Processing, OLAP, systems)
- Designed for
  - Reporting
  - Analysis
  - Speed of data retrieval
- Use the following approaches
  - Data are stored denormalized, based on a dimensional model (logically related data grouped together)
- Often include business intelligence tools to retrieve and analyze data
- History
  - A common pattern exists for storing historical data: Slowly Changing Dimensions (SCD)

Online Transaction Processing (OLTP) systems
- Designed for
  - Performing day-to-day transaction processing
  - Preservation of data integrity
  - Speed of recording business transactions
- Use the following approaches
  - Database normalization (Codd's rules of data normalization)
  - Entity-relationship model
- History
  - No common pattern is available for storing historical data

History \ Slowly Changing Dimensions [0/2]
- Type 0: an attribute of a dimension is fixed; no history is available; not frequently used yet
- Type 1: overwrites the old data with the new data; no history is available
- Type 2: tracks historical data by creating multiple records with a separate key; unlimited history is possible
- Type 3: additional columns in the table track changes; limited history is available
- Type 4: creates separate history tables that store the historical data
- Type 6: a hybrid approach that combines SCD 1, 2 and 3; unlimited history is possible; not frequently used yet
→ Types 1, 2 and 3 are the most common

History \ Slowly Changing Dimensions [1/2]
- SCD 1: overwrites the old data with the new data; no history is available
- SCD 2: tracks historical data by creating multiple records with a separate key; unlimited history is possible
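The difference between SCD 1 and SCD 2 can be sketched with an in-memory SQLite database. The table and column names (`customer_scd1`, `customer_scd2`, `valid_from`, `valid_to`) and the sample cities are assumptions for this example; validity columns are one common way to tell SCD 2 versions apart, not something the slide prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_scd1 (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE customer_scd2 (
    row_key     INTEGER PRIMARY KEY AUTOINCREMENT,  -- separate surrogate key
    customer_id INTEGER NOT NULL,                   -- business key
    city        TEXT NOT NULL,
    valid_from  TEXT NOT NULL,                      -- ISO date string
    valid_to    TEXT                                -- NULL marks the current row
);
""")

def scd1_set(customer_id, city):
    # Type 1: the old value is overwritten, so no history survives.
    conn.execute("INSERT OR REPLACE INTO customer_scd1 VALUES (?, ?)",
                 (customer_id, city))

def scd2_set(customer_id, city, changed_at):
    # Type 2: close the current row, then add a new row with its own key.
    conn.execute("UPDATE customer_scd2 SET valid_to = ? "
                 "WHERE customer_id = ? AND valid_to IS NULL",
                 (changed_at, customer_id))
    conn.execute("INSERT INTO customer_scd2 (customer_id, city, valid_from) "
                 "VALUES (?, ?, ?)", (customer_id, city, changed_at))

def scd2_city_at(customer_id, at):
    # Return the city that was valid at the given date, or None.
    row = conn.execute(
        "SELECT city FROM customer_scd2 WHERE customer_id = ? "
        "AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (customer_id, at, at)).fetchone()
    return row[0] if row else None

scd1_set(1, "Zurich")
scd1_set(1, "Bern")
scd2_set(1, "Zurich", "2008-01-01")
scd2_set(1, "Bern", "2008-06-01")
```

After these calls the SCD 1 table holds a single row for the customer, while the SCD 2 table holds two rows, and `scd2_city_at(1, "2008-03-01")` can still answer with the earlier city.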

History \ Slowly Changing Dimensions [2/2]
- SCD 3: additional columns in the table track changes; limited history is available
- SCD 4: creates separate history tables that store the historical data
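A minimal sketch of SCD 3 and SCD 4, again with SQLite. All table and column names (`previous_city`, `customer_scd4_history`, `replaced_at`) are hypothetical and only serve to show where the old value ends up in each type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Type 3: one extra column keeps exactly one previous value.
CREATE TABLE customer_scd3 (customer_id INTEGER PRIMARY KEY,
                            city TEXT, previous_city TEXT);
-- Type 4: the main table stays current, a second table collects old rows.
CREATE TABLE customer_scd4 (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE customer_scd4_history (customer_id INTEGER, city TEXT,
                                    replaced_at TEXT);
""")

def scd3_update(customer_id, city):
    # The right-hand sides see the old row, so previous_city gets the old city.
    conn.execute("UPDATE customer_scd3 SET previous_city = city, city = ? "
                 "WHERE customer_id = ?", (city, customer_id))

def scd4_update(customer_id, city, changed_at):
    # Copy the current row into the history table, then overwrite it.
    conn.execute("INSERT INTO customer_scd4_history "
                 "SELECT customer_id, city, ? FROM customer_scd4 "
                 "WHERE customer_id = ?", (changed_at, customer_id))
    conn.execute("UPDATE customer_scd4 SET city = ? WHERE customer_id = ?",
                 (city, customer_id))

conn.execute("INSERT INTO customer_scd3 VALUES (1, 'Zurich', NULL)")
conn.execute("INSERT INTO customer_scd4 VALUES (1, 'Zurich')")
for new_city, day in [("Bern", "2008-06-01"), ("Basel", "2008-09-01")]:
    scd3_update(1, new_city)
    scd4_update(1, new_city, day)
```

After two changes the SCD 3 row only remembers the immediately preceding city, which is the "limited history" of that type, while the SCD 4 history table keeps both replaced values.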

History \ Online Transaction Processing (OLTP) systems
No common patterns are available for storing historical data.
→ Commonly used relational data models are analyzed (all based on SCD type 2, so unlimited history is possible):
- Method “Duplication”
- Method “Transaction”
- Method “Linked history items”
- Method “Bidirectional linked history items”
All methods are based on the same relational data model: a Person table (ID, Name, Salary, ID_Company) and a Company table (ID, Name), as used in the example tables of the method slides.

History \ OLTP \ Duplication
Method “Duplication”
Focused on
- Fast access to historical data
- Easy implementation
- Data integrity
[Example tables: Person (ID, Name, Salary, ID_Change, ID_Company); Change (ID, DateTime); Company (ID, Name, ID_Change). Every change duplicates the rows, for example company 1 "UBS" and company 2 "HSR" appear once per change ID.]
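A rough sketch of how the duplication method might look in SQLite: every change gets its own ID, and the complete previous state is copied under that ID before anything is modified. The table name `change_set`, the helper names, and the salary figures are assumptions; the sketch also assumes that changes are recorded in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE change_set (id INTEGER PRIMARY KEY, date_time TEXT NOT NULL);
CREATE TABLE company (id INTEGER, name TEXT, id_change INTEGER,
                      PRIMARY KEY (id, id_change));
CREATE TABLE person  (id INTEGER, name TEXT, salary INTEGER,
                      id_change INTEGER, id_company INTEGER,
                      PRIMARY KEY (id, id_change));
""")

def begin_change(date_time):
    """Create a new change and duplicate the latest state under its ID."""
    previous = conn.execute("SELECT MAX(id) FROM change_set").fetchone()[0]
    change_id = conn.execute("INSERT INTO change_set (date_time) VALUES (?)",
                             (date_time,)).lastrowid
    if previous is not None:
        conn.execute("INSERT INTO company SELECT id, name, ? FROM company "
                     "WHERE id_change = ?", (change_id, previous))
        conn.execute("INSERT INTO person SELECT id, name, salary, ?, id_company "
                     "FROM person WHERE id_change = ?", (change_id, previous))
    return change_id

def persons_at(date_time):
    """All persons as they were at date_time: one lookup, no reconstruction."""
    change_id = conn.execute("SELECT MAX(id) FROM change_set "
                             "WHERE date_time <= ?", (date_time,)).fetchone()[0]
    return conn.execute("SELECT id, name, salary, id_company FROM person "
                        "WHERE id_change = ? ORDER BY id",
                        (change_id,)).fetchall()

first = begin_change("2008-01-01 08:00")
conn.execute("INSERT INTO company VALUES (1, 'UBS', ?)", (first,))
conn.execute("INSERT INTO person VALUES (1, 'Gfeller Raphael', 2500, ?, 1)",
             (first,))
second = begin_change("2008-06-01 08:00")
conn.execute("UPDATE person SET salary = 3000 WHERE id = 1 AND id_change = ?",
             (second,))
```

Reading a past state is a single filtered query, which matches the "fast access" focus, but the unchanged company row is now stored twice, which is where the storage cost comes from.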

History \ OLTP \ Transaction
Method “Transaction”
Focused on
- Low storage usage
- Precise information about the history at every point in time
[Example tables: Person (ID, Name, Salary, ID_Company) holding the current state; Transaction (ID, DateTime, Entry ID, NewStringValue, Action) with entries such as Action "Person.Create" with value "Gfeller Raphal, 2500, 1" and Action "Person.SetName" with value "Gfeller Raphael"; Company (ID, Name) with "UBS" and "HSR".]
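One way to read the transaction method is as a log that is replayed to rebuild a past state. The sketch below uses plain Python instead of a database to keep the replay logic visible. The `Transaction` class, the `person_at` function, the dates, and the `Person.SetSalary` action are assumptions; only `Person.Create` and `Person.SetName` and their values appear on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    date_time: str      # ISO string, so string comparison orders correctly
    entry_id: int
    action: str
    new_value: str

def person_at(log, entry_id, at):
    """Replay all transactions for one person up to `at` and return the state."""
    state = None
    for tx in sorted(log, key=lambda t: t.date_time):
        if tx.entry_id != entry_id or tx.date_time > at:
            continue
        if tx.action == "Person.Create":
            name, salary, company = [part.strip() for part in tx.new_value.split(",")]
            state = {"name": name, "salary": int(salary), "id_company": int(company)}
        elif state is not None and tx.action == "Person.SetName":
            state["name"] = tx.new_value
        elif state is not None and tx.action == "Person.SetSalary":  # hypothetical action
            state["salary"] = int(tx.new_value)
    return state

log = [
    Transaction("2008-01-01", 1, "Person.Create", "Gfeller Raphal, 2500, 1"),
    Transaction("2008-02-01", 1, "Person.SetName", "Gfeller Raphael"),
]
```

Only the changed value is stored per transaction, which keeps storage low, but reading a past state costs one replay step per recorded change.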

History \ OLTP \ Linked history items
Method “Linked history items”
Focused on
- Avoiding huge changes to the underlying database
- Easy implementation
- Fast insertion of new entries
[Example table: Person (ID, Name, Salary, ID_Company, FK_Old_ID). Several rows for "Gfeller Raphael" are chained through FK_Old_ID; rows without an older version have FK_Old_ID = NULL.]
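The linked-history idea can be sketched as a singly linked list inside the Person table. This sketch assumes one plausible reading of the slide: the current row keeps its ID (so existing foreign keys stay valid), and each update copies the old values into a new row that the current row then points to through `fk_old_id`. The function names and salary values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE person (
    id         INTEGER PRIMARY KEY,
    name       TEXT,
    salary     INTEGER,
    id_company INTEGER,
    fk_old_id  INTEGER REFERENCES person(id)  -- NULL: no older version
)""")

def update_salary(person_id, salary):
    # Copy the current values into a new history row; it inherits the back-link.
    old_row = conn.execute(
        "INSERT INTO person (name, salary, id_company, fk_old_id) "
        "SELECT name, salary, id_company, fk_old_id FROM person WHERE id = ?",
        (person_id,)).lastrowid
    # Overwrite the current row and point it at the copy just created.
    conn.execute("UPDATE person SET salary = ?, fk_old_id = ? WHERE id = ?",
                 (salary, old_row, person_id))

def salary_history(person_id):
    """Walk the chain from the current row backwards; newest salary first."""
    salaries, next_id = [], person_id
    while next_id is not None:
        salary, next_id = conn.execute(
            "SELECT salary, fk_old_id FROM person WHERE id = ?",
            (next_id,)).fetchone()
        salaries.append(salary)
    return salaries

conn.execute("INSERT INTO person (id, name, salary, id_company) "
             "VALUES (1, 'Gfeller Raphael', 1500, 1)")
update_salary(1, 2000)
update_salary(1, 2500)
```

Only one extra column is needed, but reaching the version n steps back takes n lookups, and queries for current data have to skip the history rows that now live in the same table.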

History \ OLTP \ Bidirectional linked history items
Method “Bidirectional linked history items”
Focused on
- Avoiding huge changes to the underlying database
- Extensibility, by adding additional metadata to the separate table
- Fast insertion of new entries
- Providing additional backward and forward navigation
[Example tables: Person (ID, Name, Salary, ID_Company), unchanged; a separate link table (Old Person, New Person, DateTime, User) that connects each version to its successor.]
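A possible SQLite sketch of the bidirectional variant: the Person table stays as it is, and a separate link table records which row replaced which. The table name `person_history`, the column `user_name`, the helper functions, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                     salary INTEGER, id_company INTEGER);
-- Separate link table; further metadata columns could be added here.
CREATE TABLE person_history (old_person INTEGER REFERENCES person(id),
                             new_person INTEGER REFERENCES person(id),
                             date_time  TEXT,
                             user_name  TEXT);
""")

def new_version(old_id, salary, date_time, user_name):
    """Insert a new person row and link it to the version it replaces."""
    new_id = conn.execute(
        "INSERT INTO person (name, salary, id_company) "
        "SELECT name, ?, id_company FROM person WHERE id = ?",
        (salary, old_id)).lastrowid
    conn.execute("INSERT INTO person_history VALUES (?, ?, ?, ?)",
                 (old_id, new_id, date_time, user_name))
    return new_id

def previous_version(person_id):
    row = conn.execute("SELECT old_person FROM person_history "
                       "WHERE new_person = ?", (person_id,)).fetchone()
    return row[0] if row else None

def next_version(person_id):
    row = conn.execute("SELECT new_person FROM person_history "
                       "WHERE old_person = ?", (person_id,)).fetchone()
    return row[0] if row else None

v1 = conn.execute("INSERT INTO person (name, salary, id_company) "
                  "VALUES ('Gfeller Raphael', 1500, 1)").lastrowid
v2 = new_version(v1, 2000, "2008-06-01 09:00", "User 1")
v3 = new_version(v2, 2500, "2008-09-01 09:00", "User 2")
```

Because each link row names both ends, the same table answers "what came before" and "what came after", and the timestamp and user columns show how metadata can be attached without touching the Person table.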

History \ OLTP \ Analyzed criteria
- Inserting an entry
- Updating an entry
- Storage cost
- Getting an entry at (Time - 1)
- Getting an entry at (Time - n)
- Getting the entry at time x
- Getting a consistent state over all entries
- Getting the next entry, starting from an entry in the past
- Getting the previous entry, starting from an entry in the past
- Getting a person by a company in the past

History \ OLTP \ Theory

History \ OLTP \ Test
Test environment
- CPU: Intel Core 2, 2 GHz
- Memory: 2 GB
- Operating system: Windows XP, SP3
- Database: Microsoft SQL Server 2005, Express Edition with SP1
- Benchmark written in C#
Benchmark input
- Number of companies to insert
- Number of persons to insert
- Number of companies to change
- Number of persons to change
Benchmark steps
0. Insert companies
1. Insert persons
2. Change companies
3. Change persons
4. Find a person by its parent person
5. Collect all persons and companies that are valid at a specific time
6. Find a person in the past by a datetime value
7. Find a person by a company by a datetime value

History \ OLTP \ Results
→ The measurements confirm the theory.
Possible optimizations
- Method “Change set based on duplication”
  - Only changed entries are duplicated
  → Acceptable overhead in reading
  → Less data storage used
- Method “Transaction with anchors”
  - “Anchor transactions” re-save the entire state of the entries
  → Less network traffic
  → Restoring an entry only needs the changes since the last anchor, O(maxChangesBetweenTwoAnchors), instead of all changes, O(nChanges)
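The effect of anchors on restore cost can be sketched in plain Python. The log is a list of `("anchor", full_state)` or `("delta", changed_fields)` tuples; this layout, the names `append_change` and `restore`, and the anchor interval of 3 are all assumptions made for the illustration.

```python
ANCHOR_EVERY = 3  # assumed interval: every third log entry is an anchor

def restore(log, index):
    """Rebuild the state after log[index]; also return how many entries were replayed."""
    start = index
    while start > 0 and log[start][0] != "anchor":
        start -= 1                      # walk back to the nearest anchor
    state, replayed = {}, 0
    for kind, payload in log[start:index + 1]:
        if kind == "anchor":
            state = dict(payload)       # an anchor carries the entire state
        else:
            state.update(payload)       # a delta carries only changed fields
        replayed += 1
    return state, replayed

def append_change(log, change):
    """Append a change, writing a full anchor at regular intervals."""
    state = restore(log, len(log) - 1)[0] if log else {}
    state.update(change)
    if len(log) % ANCHOR_EVERY == 0:
        log.append(("anchor", state))
    else:
        log.append(("delta", dict(change)))

log = []
append_change(log, {"name": "Gfeller Raphael", "salary": 1000})
for step in range(1, 7):
    append_change(log, {"salary": 1000 + 100 * step})
```

With seven entries in the log, `restore(log, 5)` replays three entries (the anchor at position 3 plus two deltas) rather than six, so the work per restore is bounded by the anchor interval and not by the total number of changes.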

History \ OLTP \ Result \ Conclusion
Advice based on these tests

If storage is limited, use the methods in the following order:
1. Transaction
2. Linked history items
3. Bidirectional linked history items
4. Transaction with anchors
5. Change set based on duplication
6. Duplication

If network bandwidth is limited, use the methods in the following order:
1. Change set based on duplication
2. Duplication
3. Linked history items
4. Bidirectional linked history items
5. Transaction with anchors
6. Transaction

If the developers have little experience, use either the method “Duplication” or the method “Linked history items”.

History \ OLTP \ Result \ Conclusion
Advice based on these tests

If the data volume is high, use the methods in the following order:
1. Transaction
2. Linked history items
3. Bidirectional linked history items
4. Transaction with anchors
5. Change set based on duplication
6. Duplication

If the change frequency of the data is high, use the methods in the following order:
1. Transaction with anchors
2. Transaction
3. Linked history items
4. Bidirectional linked history items
5. Change set based on duplication
6. Duplication

History \ OLTP \ Result \ Conclusion
Practical example of the usage of the methods

Questions?

References
Gfeller Raphael, How to effectively store the history of data in a relational DBMS, [Online] [Cited ]