Download presentation
Presentation is loading. Please wait.
1
How to effectively store the history of data in a relational DBMS Database systems MSE-Seminar 14.12.20081© Raphael Gfeller, RaphaelGfeller@sunrise.ch
2
Agenda Time – Definition – Storing within a DMBS – General problems History – Motivation – On Data warehouses, Online Analytical Processing system, OLAP systems Common patterns available: slowly changing dimensions, SDC – On Online transaction Processing system, OLTP systems No common patterns available Analyzed data models – Performed tests – Results – Conclusion Questions References 14.12.20082© Raphael Gfeller, RaphaelGfeller@sunrise.ch
3
Time is defined as one second: “the duration of 9 192 631 770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom.” 14.12.20083© Raphael Gfeller, RaphaelGfeller@sunrise.ch
4
Storing within a DMBS a date is represented by an offset with a defined accuracy on a reference point an interval is represented by a value with a defined accuracy a duration – is represented by a composite value of two dates or – a date value and an interval value 14.12.20084© Raphael Gfeller, RaphaelGfeller@sunrise.ch
5
Storing within a DMBS \ Example Value to store: 2008-12-02 10:00 Used date type: smalldatetime (SQL 2), resolution 1 min, reference point: 1900-01-01, based on the Gregorian calendar Internal stored Value: 57286680 („Value to store“ - reference point [min]) 14.12.20085© Raphael Gfeller, RaphaelGfeller@sunrise.ch
6
Storing within a DMBS \ Date Types Available date types on Microsoft SQL Server 2008 14.12.20086© Raphael Gfeller, RaphaelGfeller@sunrise.ch
7
Time \ General Problems Different time zones Different implementation of data types Different calendars Time synchronization – local – networked Summer, winter time problematic Choosing the date type – Resolution is to small – Resolution is to high – Range is to small 14.12.20087© Raphael Gfeller, RaphaelGfeller@sunrise.ch
8
History \ Motivation Why do we need to know the history of our data? For example for: Legal requirements – A bank has to know at each time what the exact balance of the customer was – A internet provider has to be able to store all traffic from a user over a given time Business requirements – A version control has to be able to manage multiple revisions of the same unit of information – A customer relationship management (CRM) software has to be able to present the volume of sales of a costumer over time Entertainment requirements – A chat program has to be able to present the conversation between two person over time Other requirements – The “Time Machine” function in the Mac OS X has to be able to go back in time for locating older version of your files 14.12.20088© Raphael Gfeller, RaphaelGfeller@sunrise.ch
9
Main approaches to store data Data warehouses (Online Analytical Processing system, OLAP) Designed for – Reporting – Analysis – Speed of data retrieval Uses to following approaches – data are stored denormalised based on a dimension-based model (logical data grouped together) Include often business intelligence tools to retrieve and analyze data History – Common pattern exits for storing historical data (Slow Changing Dimensions (SCD)) Online transaction Processing system, OLTP Designed for – Perform day-to-day transaction processing – Preservation of data integrity – Speed of recording of business transactions Uses to following approaches – Database normalization Codd rules of data normalization – Entity-relationship model History – No common pattern available for storing historical data 14.12.20089© Raphael Gfeller, RaphaelGfeller@sunrise.ch
10
History \ Slow Changing Dimensions [0/2] Type 0: an attribute of a dimension is fixed, no history available, is not frequently used yet Type 1: overwrites the old data with the new data, no history available Type 2: tracks historical data by creating multiple records with a separated key, unlimited history is possible Type 3: additional columns in the tables track changes, limited history is available Type 4: creates separate historical tables that stores the historical data Type 6: is a hybrid approach that combines SCD 1, 2 and 3, unlimited history is possible, is not frequently used yet Type 1,2 and 3 are the most common 14.12.200810© Raphael Gfeller, RaphaelGfeller@sunrise.ch
11
History \ Slow Changing Dimensions [1/2] SCD 1, overwrites the old data with the new data, no history available SCD 2, tracks historical data by creating multiple records with a separated key, unlimited history is possible Results in 14.12.200811© Raphael Gfeller, RaphaelGfeller@sunrise.ch
12
History \ Slow Changing Dimensions [2/2] SCD 3, Additional columns in the tables track changes, limited history is available SCD 4, creates separate historical tables that stores the historical data 14.12.200812© Raphael Gfeller, RaphaelGfeller@sunrise.ch
13
History \ OnLine Transaction Processing system, OLTP No common patterns available for storing historical data Common used rational data models are analyzed (all based on SCD type 2 (unlimited history is possible)) – Method “Duplication” – Method “Transaction” – Method “Linked history items” – Method “Bidirectional linked history items” Based on the following rational data model: 14.12.200813© Raphael Gfeller, RaphaelGfeller@sunrise.ch
14
History \ OLTP \ Duplication Method “Duplication” Focused on Fast access of historical data Easy implementation Data integrity IDNameSalaryID_ChangeID_Company 1Gfeller Raphael100012 1Gfeller Raphael250021 2Hans Meier50011 2Hans Meier55021 3Fritz Müller175022 IDDateTime 12008-12-2 22008-12-3 IDNameID_Change 1UBS1 2HSR1 1UBS2 2HSR2 14.12.200814© Raphael Gfeller, RaphaelGfeller@sunrise.ch
15
History \ OLTP \ Transaction Method “Transaction” Focused on Less used data storage Precious information about history at every point on time IDNameSalaryID_Company 1Gfeller Raphael25001 2Hans Meier5001 3Fritz Müller17502 IDDateTimeEntry IDNewStringValueAction 22008-12-21Gfeller RaphaelPerson.SetName 12008-12-11Gfeller Raphal, 2500, 1Person.Create IDName 2HSR 1UBS 14.12.200815© Raphael Gfeller, RaphaelGfeller@sunrise.ch
16
History \ OLTP \ Linked history items Method “Linked history items” Focused on Avoid huge changes to the underlying database Easy to implement Fast insertion of new entries IDNameSalaryID_CompanyFK_Old_ID 1Gfeller Raphael 250015 2Hans Meier5001NULL 3Gfeller Raphael 15001NULL 4Gfeller Raphael 100023 5Fritz Müller17502NULL 14.12.200816© Raphael Gfeller, RaphaelGfeller@sunrise.ch
17
History \ OLTP \ Bidirectional Linked history items Method “Bidirectional Linked history items” Focused on Avoid huge changes to the underlying database Extendibility by adding additional metadata to the separated table Fast insertion of new entries Providing additional backward and forward navigation IDNameSalaryID_Company 1Gfeller Raphael 25001 2Hans Meier5001 3Gfeller Raphael 15001 4Gfeller Raphael 10002 5Fritz Müller17502 Old PersonNew PersonDateTimeUser 342008-12-3User 1 412008-12-4User 2 14.12.200817© Raphael Gfeller, RaphaelGfeller@sunrise.ch
18
History \ OLTP \ Analyzed criteria’s Analyzing criteria's Insert an entry Updating an entry Storage cost Get an entry at (Time – 1) Get en entry at (Time – n) Entry at time x Get an integrity state over all entries Get the next entry by a entry at the past Get the previous entry by a entry at the past A person by a company at the past 14.12.200818© Raphael Gfeller, RaphaelGfeller@sunrise.ch
19
History \ OLTP \ Theory 14.12.200819© Raphael Gfeller, RaphaelGfeller@sunrise.ch
20
History \ OLTP \ Test Test environment CPU: Intel Core 2, 2Ghz Memory: 2 Gb Operating System: Windows XP, Sp3, Database: Microsoft SQL Server 2005, Express Edition with SP1, Benchmark written in C# Benchmark input Count inserted companies Count inserted persons Count companies to change Count persons to change Benchmark steps 0.Insert companies 1.Insert persons 2.Change companies 3.Change persons 4.Find a person by its parent person 5.Collect all persons and companies that are valid at a specific time. 6.Find a person in the past by a datetime value 7.Find a person by a company by a datetime value. 14.12.200820© Raphael Gfeller, RaphaelGfeller@sunrise.ch
21
History \ OLTP \ Results The measurements confirm the theory Possible optimizations Method Change Set based on Duplication – Only changed entries are duplicated acceptable overhead in reading fewer data storage used Method “Transaction with anchors “ – Using “anchors transaction”, they resave the entire state of the entries Fewer network traffic Restoring an entry becomes linear O(maxChangesBetweenTwoAnchors) instead of O(nChanges) 14.12.200821© Raphael Gfeller, RaphaelGfeller@sunrise.ch
22
History \ OLTP \ Result \ Conclusion Advices based on these tests If storage is limited use the methods in the following order: – transaction mechanism – Linked history items – Bidirectional linked history items – Transaction with anchors – Change Set based on Duplication – Duplication If network bandwidth is limited, use the methods in the following order: – Change Set based on Duplication – Duplication – Linked history items – Bidirectional Linked history items – Transaction with anchors – transaction mechanism If the knowledge of the developers is low use either method duplication or Linked history items 14.12.200822© Raphael Gfeller, RaphaelGfeller@sunrise.ch
23
History \ OLTP \ Result \ Conclusion Advices based on these tests If data volume is high, use the methods in the following order: – transaction mechanism – Linked history items – Bidirectional linked history items – Transaction with anchors – Change Set based on Duplication – Duplication If change frequency of the data is high, use the methods in the following order: – Transaction with anchors – transaction mechanism – Linked history items – Bidirectional linked history items – Change Set based on Duplication – Duplication 14.12.200823© Raphael Gfeller, RaphaelGfeller@sunrise.ch
24
History \ OLTP \ Result \ Conclusion Practical example of the usage of the methods 14.12.200824© Raphael Gfeller, RaphaelGfeller@sunrise.ch
25
Questions? 14.12.200825© Raphael Gfeller, RaphaelGfeller@sunrise.ch
26
References Gfeller Raphael, How to effectively store the history of data in a ration DBMS, [Online] 2.12.2008. [Citied 2.12.2009] http://wiki.hsr.ch/Datenbanken/wiki.cgi?HistorieUndZeitInDatenbanken. http://wiki.hsr.ch/Datenbanken/wiki.cgi?HistorieUndZeitInDatenbanken 14.12.200826© Raphael Gfeller, RaphaelGfeller@sunrise.ch
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.