Blog: http://Lloyd.TheAlbins.com/ViewingNewRecords Slides: http://Lloyd.TheAlbins.com/ViewingNewRecordsSlides

Viewing New Records By Lloyd Albin

In this presentation we will cover: What you thought you knew about copying new records from a table with a serial id is all wrong once you have multiple writes happening to your database inside multiple transactions. We ran into a situation with MS SQL where we missed some records from an append-only table. The table has a serial primary key, and we would grab any new serials since the last grab of data. We found that we missed some records because the inserts were being performed by more than one thread. Our developers asked how we could prevent this problem in PostgreSQL. This presentation is my response to our developers on what caused this problem and how to avoid it in PostgreSQL.
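For context, the failing pattern looked roughly like this (a minimal sketch; audit_source, audit_copy and copy_bookmark are illustrative names, not our production schema): remember the highest id copied so far and grab everything above it on the next pass.

-- Hypothetical bookmark table holding the highest id copied so far.
-- If a row with a lower id is still sitting in an uncommitted transaction
-- when this runs, no later pass will ever pick it up.
INSERT INTO audit_copy
SELECT *
FROM audit_source
WHERE id > (SELECT last_id FROM copy_bookmark);

UPDATE copy_bookmark
SET last_id = (SELECT coalesce(max(id), copy_bookmark.last_id) FROM audit_copy);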

Creating the Problem

Normal aka Single-Threaded Inserts

Here we demo single-threaded, non-transactional inserts. This is exactly what most developers expect.

CREATE TABLE public.testing (
  id SERIAL,
  val TEXT,
  PRIMARY KEY(id)
) WITH (oids = true);

INSERT INTO testing (val) VALUES ('test');
INSERT INTO testing (val) VALUES ('testing');

CREATE TABLE test_copy AS SELECT * FROM testing;

INSERT INTO testing (val) VALUES ('tested');

INSERT INTO test_copy
SELECT * FROM testing
WHERE id > (SELECT max(id) FROM test_copy);

SELECT * FROM testing;
SELECT * FROM test_copy;

public.testing:
 id | val
  1 | test
  2 | testing
  3 | tested

public.test_copy:
 id | val
  1 | test
  2 | testing
  3 | tested

Multi-Threaded Inserts

Here we demo multi-threaded, multi-transactional inserts. The results are not what most developers expect. Let's create our table. Then insert some data via two transactions. Then make a copy of the data. Insert some more data and finish our transaction. Then copy the rest of the data.

-- Thread 3
CREATE TABLE public.testing (
  id SERIAL,
  val TEXT,
  PRIMARY KEY(id)
) WITH (oids = true);

-- Thread 1
BEGIN;
INSERT INTO testing (val) VALUES ('test');

-- Thread 2
INSERT INTO testing (val) VALUES ('testing');
COMMIT;

-- Thread 3
CREATE TABLE test_copy AS SELECT * FROM testing;

-- Thread 1
INSERT INTO testing (val) VALUES ('tested');
COMMIT;

-- Thread 3
INSERT INTO test_copy
SELECT * FROM testing
WHERE id > (SELECT max(id) FROM test_copy);

Multi-Threaded Inserts

When we compare the two tables, we can see that they are not the same. This is because record 1 was not visible at the time we copied record 2, since its transaction had not yet committed.

-- Thread 3
SELECT * FROM testing;
SELECT * FROM test_copy;

public.testing:
 id | val
  1 | test
  2 | testing
  3 | tested

public.test_copy:
 id | val
  2 | testing
  3 | tested

Looking for the Solution

This walks through what I did to find a solution.

System Columns

Let's take a look at the system columns to see if they can help. If you turned oids on, the oid has the same issue as our serial id and transaction id; it just hits the problem faster, since the counter is used across many tables. The tableoid may be joined to pg_class.oid to find the table name and schema oid, so it is also of no help to us. The xmin is our transaction id. We can't use the transaction id, because record 1's xid, 293897773, is numerically lower than record 2's, 293897776, even though it committed later, so filtering on xmin has the same problem as filtering on the serial id.

-- Thread 3
SELECT oid, tableoid, xmin, cmin, xmax, cmax, ctid, * FROM public.testing;

public.testing (blank cells were left empty on the slide):
 oid       | tableoid  | xmin      | cmin | xmax | cmax | ctid  | id | val
 494969335 | 494969326 | 293897773 |      |      |      | (0,1) | 1  | test
 494969336 |           | 293897776 |      |      |      | (0,2) | 2  | testing
 494969337 |           |           |      |      |      | (0,3) | 3  | tested

System Columns: https://www.postgresql.org/docs/current/ddl-system-columns.html

System Columns

The cmin lets us know the data line within the transaction, but that is no help either. xmax and cmax are the deleting transaction and deleting data row respectively, and also no help to us. The ctid is our unique record indicator, but it is in the format (page, line on page), and since record 1 was written before record 2, this is also no help to us. This means that there is nothing in the system columns that will help us figure out that we need to grab record 1.

-- Thread 3
SELECT oid, tableoid, xmin, cmin, xmax, cmax, ctid, * FROM public.testing;
(Same query and output as the previous slide.)

System Columns: https://www.postgresql.org/docs/current/ddl-system-columns.html
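As an aside, the tableoid join to pg_class mentioned above looks roughly like this (a minimal sketch against the demo table; it only tells you which table a row came from, not whether the row is new):

-- Resolve tableoid to a schema-qualified table name.
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       t.xmin, t.ctid, t.id, t.val
FROM public.testing AS t
JOIN pg_class     AS c ON c.oid = t.tableoid
JOIN pg_namespace AS n ON n.oid = c.relnamespace;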

Page Item Attributes

This requires the pageinspect extension to be installed and my heap_page_item_attrs_details function, which can be obtained from my GitHub page or from the SQL file on the slide download page.

-- Thread 3
SELECT * FROM public.heap_page_item_attrs_details('public.testing');

Output columns shown on the slide (values abridged on the slide): p, lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, heap_hasnull, heap_hasvarwidth, heap_hasexternal, heap_hasoid, plus the decoded lock flags heap_xmax_keyshr_lock, heap_combocid, heap_xmax_excl_lock, heap_xmax_lock_only, heap_xmax_shr_lock, heap_lock_mask. The three rows correspond to line pointers 1-3 at offsets 8152, 8112 and 8072 with lengths 41, 44 and 43, t_xmin values 293897773 and 293897776 matching the xmin values seen earlier, and t_ctid (0,1) through (0,3).

heap_page_item_attrs_details code: https://github.com/LloydAlbin/SCHARP-PG-DBA-Debugging-Tools
Pageinspect Extension: https://www.postgresql.org/docs/current/pageinspect.html

Page Item Attributes (continued)

-- Thread 3
SELECT * FROM public.heap_page_item_attrs_details('public.testing');

Further decoded infomask flags returned by the function (values abridged on the slide): heap_xmin_committed, heap_xmin_invalid, heap_xmax_committed, heap_xmax_invalid, heap_xmin_frozen, heap_xmax_is_multi, heap_updated, heap_moved_off, heap_moved_in, heap_moved, heap_xact_mask, heap_natts_mask, heap_keys_updated, heap_hot_updated.

Page Item Attributes (continued)

While this is a lot of great information for debugging issues with bloat, etc., none of it is actually useful in this case.

-- Thread 3
SELECT * FROM public.heap_page_item_attrs_details('public.testing');

Remaining columns shown on the slide (heap_only_tuple = False, t_hoff = 32, t_bits = Null):

 t_oid     | t_attrs
 494969335 | {"\x01000000","\x74657374"}
 494969336 | {"\x02000000","\x74657374696e67"}
 494969337 | {"\x03000000","\x746573746564"}
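If you don't want to install the helper function, the raw data it decodes can be pulled with the stock pageinspect functions; a minimal sketch against page 0 of the demo table (requires superuser):

-- Install the extension and look at the raw line pointers and tuple headers.
CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_ctid, t_infomask, t_infomask2
FROM heap_page_items(get_raw_page('public.testing', 0));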

Transaction Commit Timestamps

Let's take a look at our transaction commit timestamps. Oops, we don't have that turned on in our config. This means we need to change "track_commit_timestamp" from off to on in postgresql.conf (or via ALTER SYSTEM, as below). Once you have done this, you will need to restart PostgreSQL. For any new transaction, there will now be a timestamp for each commit. There is a disk space price to pay for this feature, but for most people it is a small price to pay.

SELECT pg_catalog.pg_xact_commit_timestamp(xmin), * FROM public.testing;
-- ERROR: could not get commit timestamp data
-- HINT: Make sure the configuration parameter "track_commit_timestamp" is set.

ALTER SYSTEM SET track_commit_timestamp TO 'on';
-- Now restart postgres

DROP FUNCTION public.heap_page_item_attrs_details(table_name regclass);
DROP EXTENSION pageinspect;
DROP TABLE public.test_copy;
DROP TABLE public.testing;

-- Go to the slide "Multi-Threaded Inserts" and start over with the testing.

track_commit_timestamp: https://www.postgresql.org/docs/current/runtime-config-replication.html
pg_catalog.pg_xact_commit_timestamp(xid): https://www.postgresql.org/docs/current/functions-info.html
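After the restart, it is worth confirming that the setting actually took effect before re-running the test; a quick check (nothing here is specific to this demo):

-- Should return 'on' after the restart.
SHOW track_commit_timestamp;

-- Or query the catalog, which also shows where the value came from.
SELECT name, setting, source
FROM pg_settings
WHERE name = 'track_commit_timestamp';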

Transaction Commit Timestamps

Now we should get results that look like this. Since the pg_xact_commit_timestamp for record 2 is before the pg_xact_commit_timestamp of record 1, a solution is possible. Let's reset our tests, starting at the slide "Multi-Threaded Inserts", and then test our solution.

SELECT pg_catalog.pg_xact_commit_timestamp(xmin), * FROM public.testing;

public.testing:
 pg_xact_commit_timestamp      | id | val
 2019-01-07 13:22:59.257975-08 | 1  | test
 2019-01-07 13:21:55.770918-08 | 2  | testing
                               | 3  | tested

-- Reset the test
DROP TABLE public.test_copy;
TRUNCATE TABLE public.testing;

pg_catalog.pg_xact_commit_timestamp(xid): https://www.postgresql.org/docs/current/functions-info.html

Solution #1 Transaction Commit Timestamps

Solution #1 - Transaction Commit Timestamps

Let's run our test again. This time we will create two tables: one that is our copy of the new records, and one that lets us know what to grab.

-- Thread 1
BEGIN;
INSERT INTO testing (val) VALUES ('test');

-- Thread 2
INSERT INTO testing (val) VALUES ('testing');
COMMIT;

-- Thread 3
-- Prepping for our copy
CREATE TABLE test_copy (LIKE public.testing);

CREATE TABLE test_copy_last_record (
  t_type TEXT,
  t_time TIMESTAMP WITH TIME ZONE,
  PRIMARY KEY(t_type)
);

INSERT INTO test_copy_last_record VALUES ('next', NULL), ('last', NULL);

Solution #1 - Transaction Commit Timestamps

First we get the latest transaction's commit timestamp and store it as the next timestamp we want. Then we do our data copy, reading everything from the last timestamp up to and including the next timestamp. The IS NULL is to catch the first-use case. Then we update the last timestamp from the next timestamp. Note: Because pg_xact_commit_timestamp is a stable function, any time you call it with the same transaction id within a transaction it returns the same result without having to re-compute it. This means that if many rows were inserted by a single transaction, the value only has to be looked up once.

-- Thread 3
-- Grabbing the latest transaction timestamp
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'next', max(pg_catalog.pg_xact_commit_timestamp(xmin)) FROM public.testing
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

-- Insert the new data after the last timestamp, up to and including the next timestamp
INSERT INTO test_copy
SELECT * FROM public.testing
WHERE pg_catalog.pg_xact_commit_timestamp(xmin) <= (
        SELECT t_time FROM test_copy_last_record WHERE t_type = 'next')
  AND ( pg_catalog.pg_xact_commit_timestamp(xmin) > (
          SELECT t_time FROM test_copy_last_record WHERE t_type = 'last')
     OR (SELECT t_time FROM test_copy_last_record WHERE t_type = 'last') IS NULL);

-- Update the last timestamp
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'last', t_time FROM public.test_copy_last_record WHERE t_type = 'next'
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

pg_catalog.pg_xact_commit_timestamp(xid): https://www.postgresql.org/docs/current/functions-info.html
ON CONFLICT DO UPDATE: https://www.postgresql.org/docs/11/sql-insert.html
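One practical note not on the slide: if the copy succeeds but the final bookmark update fails, the next run would re-copy the same rows. Wrapping the three statements in a single function keeps them in one transaction so they succeed or fail together. A sketch under the same table names (copy_new_records is my name, not the author's):

CREATE OR REPLACE FUNCTION public.copy_new_records() RETURNS bigint AS $$
DECLARE
  copied bigint;
BEGIN
  -- Step 1: record the newest visible commit timestamp as 'next'.
  INSERT INTO test_copy_last_record (t_type, t_time)
  SELECT 'next', max(pg_catalog.pg_xact_commit_timestamp(xmin)) FROM public.testing
  ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

  -- Step 2: copy rows whose commit timestamp falls in ('last', 'next'].
  INSERT INTO test_copy
  SELECT * FROM public.testing
  WHERE pg_catalog.pg_xact_commit_timestamp(xmin) <=
          (SELECT t_time FROM test_copy_last_record WHERE t_type = 'next')
    AND ( pg_catalog.pg_xact_commit_timestamp(xmin) >
            (SELECT t_time FROM test_copy_last_record WHERE t_type = 'last')
       OR (SELECT t_time FROM test_copy_last_record WHERE t_type = 'last') IS NULL);
  GET DIAGNOSTICS copied = ROW_COUNT;

  -- Step 3: promote 'next' to 'last' for the following run.
  INSERT INTO test_copy_last_record (t_type, t_time)
  SELECT 'last', t_time FROM test_copy_last_record WHERE t_type = 'next'
  ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

  RETURN copied;
END;
$$ LANGUAGE plpgsql;

-- Usage: SELECT public.copy_new_records();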

Solution #1 - Transaction Commit Timestamps

Now we finish our longer-running transaction and run the update routines again. EXCLUDED is a special table name that references the values originally proposed for the insert.

-- Thread 1
INSERT INTO testing (val) VALUES ('tested');
COMMIT;

-- Thread 3
-- Re-run the three statements from the previous slide:
-- grab the latest commit timestamp into 'next', copy the rows between
-- 'last' and 'next', then update 'last' from 'next'.

Solution #1 - Transaction Commit Timestamps

After finishing the longer-running transaction and running the update routines again, the copy now contains all three records.

-- Thread 3
SELECT * FROM test_copy;

public.test_copy:
 id | val
  1 | test
  2 | testing
  3 | tested

Solution #1 - Transaction Commit Timestamps

Here we can see the original code and an alternate version that runs faster.

-- Thread 3
-- Original code
-- Grabbing the latest transaction timestamp on the table
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'next', max(pg_catalog.pg_xact_commit_timestamp(xmin)) FROM public.testing
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

-- Alternate faster code
-- Grabbing the latest committed transaction timestamp from the server
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'next', "timestamp" FROM pg_catalog.pg_last_committed_xact()
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

pg_catalog.pg_xact_commit_timestamp(xid), pg_catalog.pg_last_committed_xact(): https://www.postgresql.org/docs/current/functions-info.html

Solution #2 Universal SQL Solution

Solution #2 - Universal SQL Solution

Let's reset our test again and then insert our first two records.

-- Thread 3
DROP TABLE public.test_copy;
TRUNCATE TABLE public.testing;

-- Thread 1
BEGIN;
INSERT INTO testing (val) VALUES ('test');

-- Thread 2
INSERT INTO testing (val) VALUES ('testing');
COMMIT;

Solution #2 - Universal SQL Solution

This time we will create a table to store all our copied ids. Add the records to test_copy that do not appear in copied_ids, then update copied_ids with the new ids from test_copy. The primary key could instead be a unique key; this is especially important if you are using a composite key and one or more of the fields can be null.

-- Thread 3
BEGIN;

-- Setup our two tables
CREATE TABLE public.copied_ids (
  id INTEGER,
  CONSTRAINT copied_ids_idx PRIMARY KEY(id)
);

CREATE TABLE test_copy (LIKE public.testing);

-- Add new records
INSERT INTO test_copy
SELECT testing.*
FROM public.testing
LEFT JOIN copied_ids ON testing.id = copied_ids.id
WHERE copied_ids.id IS NULL;

-- Add the new ids that were copied, so that we don't copy them again.
INSERT INTO copied_ids
SELECT test_copy.id FROM public.test_copy;

COMMIT;

CREATE TABLE LIKE: https://www.postgresql.org/docs/current/sql-createtable.html
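A small variation not on the slide: since copied_ids has a primary key, the bookkeeping insert on later passes can also rely on ON CONFLICT DO NOTHING instead of an anti-join, which keeps the statement identical on every run; a sketch:

-- Record every id present in the copy, silently skipping ids already recorded.
INSERT INTO copied_ids
SELECT test_copy.id FROM public.test_copy
ON CONFLICT (id) DO NOTHING;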

Solution #2 - Universal SQL Solution

Now we can commit our last record and then copy the new records. Then we can view our test_copy table to make sure it has the correct data.

-- Thread 1
INSERT INTO testing (val) VALUES ('tested');
COMMIT;

-- Thread 3
-- Add new records
INSERT INTO test_copy
SELECT testing.*
FROM public.testing
LEFT JOIN copied_ids ON testing.id = copied_ids.id
WHERE copied_ids.id IS NULL;

-- Add the new ids that were copied, so that we don't copy them again
-- (only the ids not already recorded, to avoid violating the primary key).
INSERT INTO copied_ids
SELECT test_copy.id
FROM public.test_copy
LEFT JOIN copied_ids ON test_copy.id = copied_ids.id
WHERE copied_ids.id IS NULL;

SELECT * FROM test_copy;

public.test_copy:
 id | val
  1 | test
  2 | testing
  3 | tested

Solution #2 - Universal SQL Solution

While the previous code was universally usable, here are some more PostgreSQL-specific versions. Version 1 uses USING instead of ON. Version 2 uses EXCEPT instead of a LEFT JOIN.

-- Thread 3

-- Version 1: USING instead of ON
-- Add new records
INSERT INTO test_copy
SELECT testing.*
FROM public.testing
LEFT JOIN copied_ids USING (id)
WHERE copied_ids.id IS NULL;

-- Add the new ids that were copied, so that we don't copy them again.
INSERT INTO copied_ids
SELECT test_copy.id
FROM public.test_copy
LEFT JOIN copied_ids USING (id)
WHERE copied_ids.id IS NULL;

-- Version 2: EXCEPT instead of LEFT JOIN
-- (used in place of the "Add new records" SELECT above)
SELECT testing.*
FROM (
  SELECT id FROM public.testing
  EXCEPT
  SELECT id FROM public.copied_ids
) AS new_ids
LEFT JOIN public.testing USING (id);

MS SQL MS SQL Solution

MS SQL – MS SQL Solution

Use one of the views below to see commit_ts, which is assigned when the transaction commits and can be used to tell in what order the transactions were committed. The xdes_id is the transaction id, which you will need to match up to the transaction that committed each record. When I tested these views today, they were empty, but this is where Microsoft says the information should be; I may need to turn on a feature that I have not turned on.

-- SQL Server (starting with 2008)
-- Azure SQL Database
SELECT * FROM sys.dm_tran_commit_table;

-- Azure SQL Data Warehouse
-- Parallel Data Warehouse
SELECT * FROM sys.dm_pdw_nodes_tran_commit_table;

Columns shown on the slide for sys.dm_tran_commit_table (the view was empty): commit_ts, xdes_id, commit_lbn, commit_csn, commit_time, pdw_node_id.

sys.dm_tran_commit_table: https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/change-tracking-sys-dm-tran-commit-table?view=sql-server-2017
sys.dm_pdw_nodes_tran_commit_table: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-reference-tsql-system-views
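The missing feature is most likely Change Tracking: sys.dm_tran_commit_table is documented under SQL Server's change tracking views and is only populated once change tracking is enabled. A hedged sketch of turning it on (database and table names are placeholders, not from the slides):

-- Enable change tracking at the database level (placeholder database name).
ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

-- Then enable it per table that should be tracked (placeholder table name).
ALTER TABLE dbo.testing
ENABLE CHANGE_TRACKING;

-- After some tracked changes commit, the commit table should have rows.
SELECT commit_ts, xdes_id, commit_time
FROM sys.dm_tran_commit_table
ORDER BY commit_ts;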