1
SQL Server In-Memory Internals
Copyright © 2016 Fard Solutions Sdn Bhd. All rights reserved.
2
About Me
Hamid J. Fard: I am a SQL Server Data Platform Expert with more than 9 years of professional experience. I am currently a Microsoft Certified Master: SQL Server 2008, a Microsoft Certified Solutions Master: Charter-Data Platform, a Microsoft Data Platform MVP, and a CIW Database Design Specialist. After a few years as a production database administrator, I moved into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve their SQL Server database issues.
3
Agenda
• What is In-Memory OLTP?
• In-Memory OLTP Components
• Memory-Optimized Table Requirements
• Memory-Optimized Tables
• Memory-Optimized Indexes
• In-Memory Query Processing
• In-Memory Transactions
• In-Memory Memory Estimation
• In-Memory Filegroup Configuration
• In-Memory Data and Delta File Population
4
What is In-Memory OLTP?
Benefits:
• Eliminate contention.
• Reduce logging.
• Lower-latency data retrieval.
• Minimize code execution time.
• Efficient data retrieval.
• Optional IO reduction or removal, when using non-durable tables.
In-Memory Table Implementation Scenarios:
• High data insertion rate from multiple concurrent connections.
• Read performance and scale with periodic batch inserts and updates.
• Intensive business logic processing in the database server.
• Low latency.
• Session state management.
[Figure label: Disk-Based Table]
5
In-Memory OLTP Components
[Figure: In-Memory OLTP components diagram; recoverable label: Performance]
6
Memory-Optimized Table Requirements
Hardware:
• x64 architecture processor.
• Fewer than 60 processor cores.
• Enough memory (2× table size).
• Enough storage (2× table size).
• Processor must support cmpxchg16b.
Software:
• x64 architecture edition.
• Enterprise or Developer edition.
• Hyper-V to support cmpxchg16b (if needed).
• Instant File Initialization enabled.
Notes:
• 64-bit Enterprise, Developer, or Evaluation edition of SQL Server 2014.
• SQL Server needs enough memory to hold the data in memory-optimized tables and indexes. To account for row versions, provide an amount of memory that is two times the expected size of memory-optimized tables and indexes. The actual amount of memory needed depends on your workload, so monitor memory usage and adjust as needed. The size of data in memory-optimized tables must not exceed the allowed percentage of the resource pool. To discover the size of a memory-optimized table, see sys.dm_db_xtp_table_memory_stats (Transact-SQL). If you have disk-based tables in the database, you also need to provide enough memory for the buffer pool and query processing on those tables. It is important to know how much memory your In-Memory OLTP application will require; see Estimate Memory Requirements for Memory-Optimized Tables for more information.
• Free disk space that is two times the size of your durable memory-optimized tables.
• The processor must support the cmpxchg16b instruction to use In-Memory OLTP. All modern 64-bit processors support cmpxchg16b. If you are using a VM host application and SQL Server displays an error caused by an older processor, check whether the application has a configuration option to allow cmpxchg16b. If not, you could use Hyper-V, which supports cmpxchg16b without needing a configuration change.
• To install In-Memory OLTP, select Database Engine Services when you install SQL Server 2014.
• To install report generation (Determining if a Table or Stored Procedure Should Be Ported to In-Memory OLTP) and SQL Server Management Studio (to manage In-Memory OLTP via SQL Server Management Studio Object Explorer), select Management Tools - Basic or Management Tools - Advanced when you install SQL Server 2014.
Important Notes on Using In-Memory OLTP
• The total in-memory size of all durable tables in a database should not exceed 250 GB. For more information, see Durability for Memory-Optimized Tables.
• This release of In-Memory OLTP is targeted to perform optimally on systems with 2 or 4 sockets and fewer than 60 cores.
• Checkpoint files must not be deleted manually. SQL Server automatically performs garbage collection on unneeded checkpoint files. For more information, see the discussion on merging data and delta files in Durability for Memory-Optimized Tables.
• In this first release of In-Memory OLTP (in SQL Server 2014), the only way to remove a memory-optimized filegroup is to drop the database.
• If you attempt to delete a large batch of rows while a concurrent insert or update workload affects the range of rows you are trying to delete, the delete will likely fail. The workaround is to stop the insert or update workload before doing the delete. Alternatively, you could split the transaction into smaller transactions, which are less likely to be disrupted by a concurrent workload. As with all write operations on memory-optimized tables, use retry logic (see Guidelines for Retry Logic for Transactions on Memory-Optimized Tables).
• If you create one or more databases with memory-optimized tables, enable Instant File Initialization for the SQL Server instance (grant the SQL Server service startup account the SE_MANAGE_VOLUME_NAME user right). Without Instant File Initialization, memory-optimized storage files (data and delta files) are initialized upon creation, which can have a negative impact on workload performance. For more information about Instant File Initialization, see Database File Initialization. For information on how to enable it, see How and Why to Enable Instant File Initialization.
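The sizing rules on this slide reduce to a few comparisons. The following Python sketch checks a planned deployment against them; the `Plan` fields, the `check_plan` helper, and the use of binary gigabytes are assumptions for illustration, and the thresholds are only the rules of thumb quoted above, not a substitute for monitoring real memory usage.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # assumption: binary gigabytes

@dataclass
class Plan:
    durable_bytes: int       # expected size of durable memory-optimized tables + indexes
    non_durable_bytes: int   # expected size of non-durable memory-optimized tables + indexes
    memory_bytes: int        # memory available for memory-optimized data (hypothetical input)
    free_disk_bytes: int     # free disk space for checkpoint files
    cores: int

def check_plan(p: Plan) -> list:
    """Return a warning for every slide rule of thumb the plan violates (empty list = passes)."""
    warnings = []
    in_memory = p.durable_bytes + p.non_durable_bytes
    if p.memory_bytes < 2 * in_memory:           # 2x to account for row versions
        warnings.append("memory below 2x table and index size")
    if p.free_disk_bytes < 2 * p.durable_bytes:  # only durable tables are persisted
        warnings.append("free disk below 2x durable table size")
    if p.durable_bytes > 250 * GB:
        warnings.append("durable data above 250 GB per database")
    if p.cores >= 60:
        warnings.append("60 or more cores; release targets fewer than 60")
    return warnings

print(check_plan(Plan(10 * GB, 2 * GB, 32 * GB, 40 * GB, 16)))     # []
print(len(check_plan(Plan(300 * GB, 0, 500 * GB, 100 * GB, 64))))  # 4
```

Memory is checked against all memory-optimized data, while disk is checked only against durable tables, because non-durable tables are never written to checkpoint files.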
7
Memory-Optimized Tables
In-Memory Table
[Figure: a memory-optimized table as a collection of row versions] The table has three rows: r1, r2, and r3. r1 has three versions, r2 has two versions, and r3 has four versions. Note that different versions of the same row do not necessarily occupy consecutive memory locations; they can be dispersed throughout the table data structure. The memory-optimized table data structure can therefore be seen as a collection of row versions. Whereas rows in disk-based tables are organized in pages and extents, and individual rows are addressed by page number and page offset, row versions in memory-optimized tables are addressed using 8-byte memory pointers.
Durability
Memory-optimized tables are fully durable by default and, like transactions on (traditional) disk-based tables, fully durable transactions on memory-optimized tables are fully atomic, consistent, isolated, and durable (ACID). Memory-optimized tables and natively compiled stored procedures support a subset of Transact-SQL. In-Memory OLTP also supports durable tables with delayed transaction durability. Delayed durable transactions are saved to disk soon after the transaction has committed. In exchange for the increased performance, committed transactions that have not yet been saved to disk are lost in a server crash or failover. Besides the default durable memory-optimized tables, SQL Server also supports non-durable memory-optimized tables, whose changes are not logged and whose data is not persisted to disk. Transactions on these tables therefore do not require any disk IO, but the data will not be recovered after a server crash or failover.
SQL Server CLR
You cannot access a memory-optimized table or natively compiled stored procedure from the context connection (the connection from SQL Server when executing a CLR module). You can, however, create and open another connection from which you can access memory-optimized tables and natively compiled stored procedures.
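To make "a collection of row versions" concrete, here is a simplified Python model of the r1/r2/r3 example. It assumes each version carries a begin and end timestamp and that a reader with logical read time `read_ts` sees a version when `begin_ts <= read_ts < end_ts`; the class name, the timestamps, and that exact visibility rule are illustrative assumptions, not SQL Server's internal code.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RowVersion:
    row: str        # logical row the version belongs to, e.g. "r1"
    begin_ts: int   # commit timestamp of the transaction that created this version
    end_ts: float   # commit timestamp of the transaction that replaced/deleted it (inf = current)
    payload: str

def visible(table, read_ts):
    """Simplified snapshot rule: a version is visible if begin_ts <= read_ts < end_ts."""
    return [v for v in table if v.begin_ts <= read_ts < v.end_ts]

# Versions are deliberately interleaved: versions of one row need not be adjacent.
table = [
    RowVersion("r1", 10, 20, "r1 v1"),
    RowVersion("r3", 5, 15, "r3 v1"),
    RowVersion("r2", 10, 30, "r2 v1"),
    RowVersion("r1", 20, 40, "r1 v2"),
    RowVersion("r3", 15, 25, "r3 v2"),
    RowVersion("r3", 25, 35, "r3 v3"),
    RowVersion("r2", 30, math.inf, "r2 v2"),
    RowVersion("r1", 40, math.inf, "r1 v3"),
    RowVersion("r3", 35, math.inf, "r3 v4"),
]

print(sorted(v.payload for v in visible(table, 22)))  # ['r1 v2', 'r2 v1', 'r3 v2']
```

Nine versions of three rows sit in one list, yet every snapshot sees exactly one version per row; this is why old versions can stay in memory until no transaction can still see them.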
8
Memory-Optimized Indexes
Index Count
A memory-optimized table can have up to 8 indexes, including the index created with the primary key. Regarding the number of indexes created on a memory-optimized table, consider the following:
• Specify the indexes you need when you create the table. You cannot create an index on a memory-optimized table after the table is created; to add an index, drop and recreate the table.
• Do not create an index that you rarely use. Garbage collection works best if all indexes on the table are frequently used; rarely-used indexes may prevent the garbage collection system from performing optimally for old row versions.
Determining Which Indexes to Use for a Memory-Optimized Table
Each memory-optimized table must have at least one index. Note that each PRIMARY KEY constraint implicitly creates an index, so a table with a primary key has an index. A primary key is a requirement for a durable memory-optimized table. When querying a memory-optimized table, hash indexes perform better when the predicate clause contains only equality predicates, and the predicate must include all columns in the hash index key. A hash index reverts to a scan given an inequality predicate. A column in a memory-optimized table can be part of both a hash index and a nonclustered index. When querying a memory-optimized table with inequality predicates, nonclustered indexes perform better than nonclustered hash indexes. The hash index requires a complete key (to hash) to seek into the index: if an index key consists of two columns and you only provide the first column, SQL Server does not have a complete key to hash, and the result is an index scan query plan. Usage determines which columns should be indexed. When a column in a nonclustered index has the same value in many rows (index key columns have a lot of duplicate values), performance can degrade for updates, inserts, and deletions.
One way to improve performance in this situation is to add another column to the nonclustered index. The hashing function used for hash indexes has the following characteristics:
• SQL Server has one hash function that is used for all hash indexes.
• The hash function is deterministic: the same index key is always mapped to the same bucket in the hash index.
• Multiple index keys may be mapped to the same hash bucket.
• The hash function is balanced, meaning that the distribution of index key values over hash buckets typically follows a Poisson distribution. A Poisson distribution is not an even distribution, so index key values are not evenly distributed in the hash buckets. For example, a Poisson distribution of n distinct index keys over n hash buckets results in approximately one third (about 37%) empty buckets, one third (about 37%) containing one index key, and about 18% containing two index keys; a small number of buckets (about 8%) contain more than two keys.
If two index keys are mapped to the same hash bucket, there is a hash collision. A large number of hash collisions can have a performance impact on read operations.
[Figure: hash index, made for point lookups; nonclustered index, made for range scans and ordered scans]
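The bucket-occupancy percentages follow from the Poisson formula with mean λ = keys / buckets. This Python sketch computes the expected fractions and cross-checks them by placing keys into buckets uniformly at random; the random placement is an assumed stand-in for a well-balanced hash function, not SQL Server's actual hash.

```python
import math
import random

def poisson_fractions(keys, buckets, max_k=3):
    """Expected fraction of buckets holding 0, 1, ..., max_k-1 keys, plus the tail (>= max_k)."""
    lam = keys / buckets
    fractions = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k)]
    fractions.append(1.0 - sum(fractions))
    return fractions

def simulated_fractions(keys, buckets, max_k=3, seed=42):
    """Place keys uniformly at random (stand-in for a balanced hash) and histogram the buckets."""
    rng = random.Random(seed)
    counts = [0] * buckets
    for _ in range(keys):
        counts[rng.randrange(buckets)] += 1
    hist = [0] * (max_k + 1)
    for c in counts:
        hist[min(c, max_k)] += 1
    return [h / buckets for h in hist]

n = 100_000
print([round(f, 3) for f in poisson_fractions(n, n)])    # [0.368, 0.368, 0.184, 0.08]
print([round(f, 3) for f in simulated_fractions(n, n)])  # close to the line above
```

Calling `poisson_fractions(n, 2 * n)` shows the effect of a larger bucket count: the mean drops to 0.5, so more buckets are empty and fewer keys collide.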
9
In-Memory Query Processing
SQL Server Query Processing for Disk-Based Tables
[Figure 1: Query plan for join of disk-based tables]
About this query plan:
• The rows from the Customer table are retrieved from the clustered index, which is the primary data structure and has the full table data.
• Data from the Order table is retrieved using the non-clustered index on the CustomerID column. This index contains both the CustomerID column, which is used for the join, and the primary key column OrderID, which is returned to the user. Returning additional columns from the Order table would require lookups in the clustered index for the Order table.
• The logical operator Inner Join is implemented by the physical operator Merge Join. The other physical join types are Nested Loops and Hash Join. The Merge Join operator takes advantage of the fact that both indexes are sorted on the join column CustomerID.
[Figure 2: SQL Server query processing pipeline]
In this scenario:
1. The user issues a query.
2. The parser and algebrizer construct a query tree with logical operators based on the Transact-SQL text submitted by the user.
3. The optimizer creates an optimized query plan containing physical operators (for example, nested-loops join). After optimization, the plan may be stored in the plan cache. This step is bypassed if the plan cache already contains a plan for this query.
4. The query execution engine processes an interpretation of the query plan.
5. For each index seek, index scan, and table scan operator, the execution engine requests rows from the respective index and table structures from Access Methods.
6. Access Methods retrieves the rows from the index and data pages in the buffer pool and loads pages from disk into the buffer pool as needed.
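The Merge Join in the disk-based plan relies on both inputs arriving sorted on CustomerID. The following Python sketch shows that idea on lists of dict rows; it assumes the left input is unique on the key (as Customer is on its primary key), and the sample rows are hypothetical. It is an illustration of the join technique, not SQL Server's operator.

```python
def merge_join(left, right, key):
    """Inner merge join of two row lists already sorted on `key`.
    Assumes `left` is unique on `key` (like Customer); `right` may repeat keys (like Order)."""
    for rows in (left, right):
        if any(a[key] > b[key] for a, b in zip(rows, rows[1:])):
            raise ValueError("merge join needs inputs sorted on the join column")
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # advance the side with the smaller key
        elif lk > rk:
            j += 1
        else:
            out.append({**left[i], **right[j]})
            j += 1          # left key is unique: keep it and move along the right side
    return out

customers = [{"CustomerID": 1, "Name": "Ann"}, {"CustomerID": 2, "Name": "Bo"}, {"CustomerID": 3, "Name": "Cy"}]
orders = [{"CustomerID": 1, "OrderID": 10}, {"CustomerID": 1, "OrderID": 11}, {"CustomerID": 3, "OrderID": 12}]
print([(r["Name"], r["OrderID"]) for r in merge_join(customers, orders, "CustomerID")])
# [('Ann', 10), ('Ann', 11), ('Cy', 12)]
```

Each input is read once, front to back, which is why the optimizer favors this operator when both indexes already deliver rows in join-key order.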
For the first example query, the execution engine requests rows in the clustered index on Customer and the non-clustered index on Order from Access Methods. Access Methods traverses the B-tree index structures to retrieve the requested rows. In this case all rows are retrieved, as the plan calls for full index scans.
Interpreted Transact-SQL Access to Memory-Optimized Tables
Transact-SQL ad hoc batches and stored procedures are also referred to as interpreted Transact-SQL. "Interpreted" refers to the fact that the query plan is interpreted by the query execution engine for each operator in the query plan: the execution engine reads the operator and its parameters and performs the operation. Interpreted Transact-SQL can be used to access both memory-optimized and disk-based tables. The following figure illustrates query processing for interpreted Transact-SQL access to memory-optimized tables:
[Figure: Query processing pipeline for interpreted Transact-SQL access to memory-optimized tables]
As illustrated by the figure, the query processing pipeline remains mostly unchanged:
• The parser and algebrizer construct the query tree.
• The optimizer creates the execution plan.
• The query execution engine interprets the execution plan.
The main difference from the traditional query processing pipeline (figure 2) is that rows for memory-optimized tables are not retrieved from the buffer pool using Access Methods. Instead, rows are retrieved from the in-memory data structures through the In-Memory OLTP engine. Differences in data structures cause the optimizer to pick different plans in some cases, as illustrated by the following example.
[Figure: Query plan for join of memory-optimized tables]
Observe the following differences from the plan for the same query on disk-based tables (figure 1):
• This plan contains a table scan rather than a clustered index scan for the table Customer, because the definition of the table does not contain a clustered index. Clustered indexes are not supported with memory-optimized tables. Instead, every memory-optimized table must have at least one nonclustered index, and all indexes on memory-optimized tables can efficiently access all columns in the table without having to store them in the index or refer to a clustered index.
• This plan contains a Hash Match rather than a Merge Join. The indexes on both the Order and the Customer table are hash indexes, and are thus not ordered. A Merge Join would require sort operators that would decrease performance.
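Because hash indexes return rows in no particular order, the memory-optimized plan uses a Hash Match instead. This Python sketch shows a simple in-memory hash join (build a hash table on one input, probe it with the other) on deliberately unordered, hypothetical rows; it illustrates the technique only and makes no claim about how SQL Server's Hash Match operator is implemented.

```python
from collections import defaultdict

def hash_join(build, probe, key):
    """Inner hash join: hash the build input, then look up each probe row. No ordering needed."""
    buckets = defaultdict(list)
    for row in build:                          # build phase
        buckets[row[key]].append(row)
    out = []
    for row in probe:                          # probe phase
        for match in buckets.get(row[key], ()):
            out.append({**match, **row})
    return out

# Inputs deliberately unordered, as rows reached through a hash index would be.
customers = [{"CustomerID": 3, "Name": "Cy"}, {"CustomerID": 1, "Name": "Ann"}, {"CustomerID": 2, "Name": "Bo"}]
orders = [{"CustomerID": 1, "OrderID": 11}, {"CustomerID": 3, "OrderID": 12}, {"CustomerID": 1, "OrderID": 10}]
print(sorted((r["Name"], r["OrderID"]) for r in hash_join(customers, orders, "CustomerID")))
# [('Ann', 10), ('Ann', 11), ('Cy', 12)]
```

The result matches the merge-join result for the same data, but no sort was needed; the cost is the memory for the build-side hash table.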
10
In-Memory Query Processing (Cont.)
Native compilation of stored procedures
[Figure: Native compilation of stored procedures]
The process is as follows:
1. The user issues a CREATE PROCEDURE statement to SQL Server.
2. The parser and algebrizer create the processing flow for the procedure, as well as query trees for the Transact-SQL queries in the stored procedure.
3. The optimizer creates optimized query execution plans for all the queries in the stored procedure.
4. The In-Memory OLTP compiler takes the processing flow with the embedded optimized query plans and generates a DLL that contains the machine code for executing the stored procedure.
5. The generated DLL is loaded into memory.
Invocation of a natively compiled stored procedure proceeds as follows:
1. The user issues an EXEC usp_myproc statement.
2. The parser extracts the name and stored procedure parameters. If the statement was prepared, for example using sp_prepexec, the parser does not need to extract the procedure name and parameters at execution time.
3. The In-Memory OLTP runtime locates the DLL entry point for the stored procedure.
4. The machine code in the DLL is executed and the results are returned to the client.
[Figure: Execution of natively compiled stored procedures]
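As a loose analogy only, the difference between interpreting a plan and running natively compiled code can be shown in Python: `interpret` inspects each operator on every run, while `compile_plan` does that work once and returns nested functions that later calls simply execute. The tuple plan format, both function names, and the sample rows are assumptions for illustration; SQL Server itself produces a DLL of machine code, not Python closures.

```python
# Plans are tiny operator trees: ("scan", table) or ("filter", column, value, child).

def interpret(plan, tables):
    """Interpreted execution: inspect each operator and its parameters on every run."""
    op = plan[0]
    if op == "scan":
        return list(tables[plan[1]])
    if op == "filter":
        _, column, value, child = plan
        return [row for row in interpret(child, tables) if row[column] == value]
    raise ValueError(f"unknown operator {op!r}")

def compile_plan(plan):
    """'Compile' once (analogous to CREATE PROCEDURE): turn the tree into nested functions,
    so later calls (analogous to EXEC) run without re-inspecting the plan."""
    op = plan[0]
    if op == "scan":
        name = plan[1]
        return lambda tables: list(tables[name])
    if op == "filter":
        _, column, value, child = plan
        run_child = compile_plan(child)
        return lambda tables: [row for row in run_child(tables) if row[column] == value]
    raise ValueError(f"unknown operator {op!r}")

tables = {"Order": [{"OrderID": 10, "CustomerID": 1}, {"OrderID": 12, "CustomerID": 3}]}
plan = ("filter", "CustomerID", 1, ("scan", "Order"))
usp_myproc = compile_plan(plan)  # hypothetical name echoing the slide's EXEC example
print(interpret(plan, tables) == usp_myproc(tables))  # True
```

Both paths return the same rows; the saving in the compiled path comes from doing the operator dispatch once at creation time instead of on every execution.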
11
In-Memory Transactions
[Figure: Lifetime of a transaction that accesses memory-optimized tables]
Regular Processing
During this phase, the user-issued Transact-SQL statements are executed. Rows are read from the tables, and new row versions are written to the database. The transaction is isolated from all other concurrent transactions: it uses the snapshot of the memory-optimized tables that exists at the start of the transaction. Writes to the tables in this phase are not yet visible to other transactions, with one exception: row updates and deletes are visible to update and delete operations in other transactions, in order to detect write conflicts. If an update or delete operation sees that a row has been updated or deleted since the logical start of the transaction, the operation fails with the error "The current transaction attempted to update a record in table X that has been updated since this transaction started. The transaction was aborted." This error dooms the transaction (even if XACT_ABORT is OFF), meaning that the transaction will be rolled back when the user session ends. Doomed transactions cannot be committed and only support read operations that do not write to the log and do not access memory-optimized tables.
Commit Dependencies
During regular processing, a transaction can read rows written by other transactions that are in the validation or commit phase but have not yet committed. The rows are visible because the logical end time of those transactions was assigned at the start of their validation phase. If a transaction reads such uncommitted rows, it takes a commit dependency on that transaction. This has two main implications:
• A transaction cannot commit until the transactions it depends on have committed. In other words, it cannot enter the commit phase until all dependencies have cleared.
• In addition, result sets are not returned to the client until all dependencies have cleared. This prevents the client from observing uncommitted data.
If any of the dependent transactions fails to commit, there is a commit dependency failure, and the transaction fails to commit with the error "A previous transaction that the current transaction took a dependency on has aborted, and the current transaction can no longer commit."
Validation Phase
During the validation phase, the system validates that the assumptions necessary for the requested transaction isolation level were true between the logical start and logical end of the transaction. At the start of the validation phase, the transaction is assigned a logical end time. The row versions written in the database become visible to other transactions at the logical end time.
Repeatable Read Validation
If the isolation level of the transaction is REPEATABLE READ or SERIALIZABLE, or if tables are accessed under REPEATABLE READ or SERIALIZABLE isolation (for more information, see the section on Isolation of Individual Operations in Transaction Isolation Levels), the system validates that the reads are repeatable. This means it validates that the versions of the rows read by the transaction are still valid row versions at the logical end time of the transaction. If any of the rows have been updated or changed, the transaction fails to commit with the error "The current transaction failed to commit due to a repeatable read validation failure." This error can also occur if a table is dropped after an insert, update, or delete operation and before the transaction commits. This applies only to insert, update, or delete operations in natively compiled stored procedures; such write operations performed through interpreted Transact-SQL cause the DROP TABLE statement to block and wait until the transaction commits.
Serializable Validation
Serializable validation is performed in two cases:
• If the isolation level of the transaction is SERIALIZABLE or tables are accessed under SERIALIZABLE isolation.
• If rows are inserted in a unique index, such as the index created for a PRIMARY KEY constraint. The system validates that no rows with the same key have been concurrently inserted.
The system validates that no phantom rows have been written to the database. The read operations performed by the transaction are evaluated to determine that no new rows were inserted in the scan ranges of these read operations. Insertion of a key in a unique index includes an implicit read operation, to determine that the key is not a duplicate. Serializable validation for unique indexes ensures these indexes cannot have duplicates when two transactions concurrently insert the same key. If phantom rows are detected, the transaction fails to commit with the error "The current transaction failed to commit due to a serializable validation failure."
Commit Processing
If validation succeeds and all transaction dependencies clear, the transaction enters the commit processing phase. During this phase, the changes to durable tables are written to the log, and the log is written to disk, to ensure durability. Once the log record for the transaction has been written to disk, control is returned to the client. All commit dependencies on this transaction are cleared, and all transactions that had been waiting for this transaction to commit can proceed.
Limitations
• Cross-database transactions are not supported with memory-optimized tables. A transaction that accesses memory-optimized tables cannot access more than one database, with the exception of read-write access to tempdb and read-only access to the system database master.
• Distributed transactions are not supported with memory-optimized tables. Distributed transactions started with BEGIN DISTRIBUTED TRANSACTION cannot access memory-optimized tables.
• Memory-optimized tables do not support locking. Explicit locks through locking hints (such as TABLOCK, XLOCK, ROWLOCK) are not supported with memory-optimized tables.
[Figure labels: Regular Processing, Commit Dependency, Validation Phase, Repeatable Read Validation, Serialization Validation, Commit Processing]
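The write-conflict rule from Regular Processing, and the retry logic recommended for all writes to memory-optimized tables, can be sketched with a toy multiversion store in Python. Everything here (`ToyStore`, `UpdateConflict`, `run_with_retry`) is hypothetical and deliberately simplified: it models snapshot reads and first-writer-wins update conflicts only, with no validation phase and no commit dependencies.

```python
class UpdateConflict(Exception):
    """Hypothetical stand-in for the 'updated since this transaction started' error."""

class ToyStore:
    """Simplified multiversion store: snapshot reads plus first-writer-wins update conflicts."""

    def __init__(self):
        self.versions = {}   # key -> [(commit_ts, value), ...], oldest first
        self.pending = {}    # key -> id of the transaction holding an uncommitted write
        self.now = 0         # last assigned commit timestamp
        self._last_tx = 0

    def begin(self):
        self._last_tx += 1
        return {"id": self._last_tx, "start": self.now, "writes": {}}

    def read(self, tx, key):
        if key in tx["writes"]:
            return tx["writes"][key]              # read your own write
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= tx["start"]:
                return value                      # newest version in the snapshot
        return None

    def update(self, tx, key, value):
        owner = self.pending.get(key)
        history = self.versions.get(key, [])
        changed_since_start = bool(history) and history[-1][0] > tx["start"]
        if (owner is not None and owner != tx["id"]) or changed_since_start:
            self.abort(tx)                        # the error dooms the transaction
            raise UpdateConflict(key)
        self.pending[key] = tx["id"]
        tx["writes"][key] = value

    def commit(self, tx):
        self.now += 1
        for key, value in tx["writes"].items():
            self.versions.setdefault(key, []).append((self.now, value))
        self._release(tx)

    def abort(self, tx):
        tx["writes"].clear()
        self._release(tx)

    def _release(self, tx):
        for key in [k for k, owner in self.pending.items() if owner == tx["id"]]:
            del self.pending[key]

def run_with_retry(store, work, max_attempts=3):
    """Re-run the whole transaction from a fresh snapshot after a conflict."""
    for attempt in range(1, max_attempts + 1):
        tx = store.begin()
        try:
            work(store, tx)
            store.commit(tx)
            return attempt
        except UpdateConflict:
            if attempt == max_attempts:
                raise

store = ToyStore()
setup = store.begin(); store.update(setup, "r1", 100); store.commit(setup)

competitor_ran = []
def add_one(store, tx):
    if not competitor_ran:                        # simulate a concurrent writer committing first
        competitor_ran.append(True)
        other = store.begin(); store.update(other, "r1", 0); store.commit(other)
    store.update(tx, "r1", store.read(tx, "r1") + 1)

print(run_with_retry(store, add_one))   # 2: the first attempt hit an update conflict
print(store.read(store.begin(), "r1"))  # 1
```

The retry re-runs the entire unit of work rather than just the failed statement, because the failed transaction is doomed and its snapshot is stale.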
12
In-Memory Memory Estimation
```sql
CREATE TABLE IMOLTP
(
    col1 INT NOT NULL PRIMARY KEY NONCLUSTERED,
    col2 INT NOT NULL INDEX INDEX_1 HASH WITH (BUCKET_COUNT = 5000000),
    col3 INT NOT NULL INDEX INDEX_2 HASH WITH (BUCKET_COUNT = 5000000),
    col4 INT NOT NULL INDEX INDEX_3 HASH WITH (BUCKET_COUNT = 5000000),
    col5 INT NOT NULL INDEX INDEX_4 NONCLUSTERED,
    col6 CHAR (50) NOT NULL,
    col7 CHAR (50) NOT NULL,
    col8 CHAR (30) NOT NULL,
    col9 CHAR (50) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON);
```
Memory for Table: Timestamps + Index Pointers + Row Data:
$24 + (4 \times 8) + (5 \times 4 + 3 \times 50 + 30) = 256$ bytes; $256 \times 5{,}000{,}000 = 1{,}280{,}000{,}000$ bytes $\approx 1.28$ GB
Memory for Hash Indexes: Rounded-Up Bucket Count × Index Pointer:
$2^{23} \times 8 = 67{,}108{,}864$ bytes $= 64$ MB per index; $3 \times 64 = 192$ MB
Memory for the table
A memory-optimized table row consists of three parts:
• Timestamps: row header/timestamps = 24 bytes.
• Index pointers: for each hash index in the table, each row has an 8-byte address pointer to the next row in the index. Since there are 4 indexes, each row allocates 32 bytes for index pointers (an 8-byte pointer for each index).
• Data: the size of the data portion of the row is determined by summing the type size for each data column. In our table we have five 4-byte integers, three 50-byte character columns, and one 30-byte character column. Therefore the data portion of each row is 4 + 4 + 4 + 4 + 4 + 50 + 50 + 30 + 50, or 200 bytes.
The following is a size computation for 5,000,000 (5 million) rows in a memory-optimized table. The total memory used by data rows is estimated as follows:
Memory for the table's rows
From the above calculations, the size of each row in the memory-optimized table is 24 + 32 + 200, or 256 bytes. Since we have 5 million rows, the table will consume 5,000,000 × 256 bytes, or 1,280,000,000 bytes (approximately 1.28 GB).
Setting the hash index array size
The hash array size is set by (bucket_count = <value>), where <value> is an integer value greater than zero. If <value> is not a power of 2, the actual bucket_count is rounded up to the next closest power of 2.
In our example table (bucket_count = 5000000), since 5,000,000 is not a power of 2, the actual bucket count rounds up to 8,388,608 ($2^{23}$). You must use this number, not 5,000,000, when calculating the memory needed by the hash array. Thus, in our example, the memory needed for each hash array is $8{,}388{,}608 \times 8 = 2^{23} \times 2^{3} = 2^{26} = 67{,}108{,}864$ bytes, or approximately 64 MB. Since we have three hash indexes, the memory needed for the hash indexes is 3 × 64 MB = 192 MB.
Memory for non-clustered indexes
Non-clustered indexes are implemented as B-trees, with the inner nodes containing the index value and pointers to subsequent nodes. Leaf nodes contain the index value and a pointer to the table row in memory. Unlike hash indexes, non-clustered indexes do not have a fixed bucket size; the index grows and shrinks dynamically with the data. Memory needed by non-clustered indexes can be computed as follows:
• Memory allocated to non-leaf nodes: for a typical configuration, the memory allocated to non-leaf nodes is a small percentage of the overall memory taken by the index, so small that it can safely be ignored.
• Memory for leaf nodes: the leaf nodes have one row for each unique key in the table, pointing to the data rows with that unique key. If you have multiple rows with the same key (i.e., a non-unique non-clustered index), there is only one row in the index leaf node, pointing to one of the rows, and the other rows are linked to each other.
Thus, the total memory required can be approximated by:
memoryForNonClusteredIndex = (pointerSize + sum(keyColumnDataTypeSizes)) * rowsWithUniqueKeys
Memory for row versioning
To avoid locks, In-Memory OLTP uses optimistic concurrency when updating or deleting rows. This means that when a row is updated, an additional version of the row is created. The system keeps the previous versions available until all transactions that could possibly use the version have finished execution.
When a row is deleted, the system acts in a similar way to an update, keeping the version available until it is no longer necessary. Reads and inserts do not create additional row versions. Because there may be a number of additional rows in memory at any time waiting for the garbage collection cycle to release their memory, you must have sufficient memory to accommodate these additional rows. The number of additional rows can be estimated by computing the peak number of row updates and deletions per second, then multiplying that by the number of seconds the longest transaction takes (minimum of 1). That value is then multiplied by the row size to get the number of bytes you need for row versioning.
rowVersions = durationOfLongestTransactionInSeconds * peakNumberOfRowUpdatesOrDeletesPerSecond
Memory needed for stale rows is then estimated by multiplying the number of stale rows by the size of a memory-optimized table row (see Memory for the table above).
memoryForRowVersions = rowVersions * rowSize
Memory for growth
The above calculations estimate your memory needs for the table as it currently exists. In addition to this memory, you need to estimate the growth of the table and provide sufficient memory to accommodate that growth. For example, if you anticipate 10% growth, multiply the results from above by 1.1 to get the total memory needed for your table.
Memory for Non-Clustered Index: (Index Pointer + SUM(Key Column Data Type Size)) × Unique Rows:
$(8 + 4) \times 5{,}000{,}000 = 60{,}000{,}000$ bytes $\approx 57$ MB
Memory for Row Versioning: (Longest Transaction Duration in Seconds × Peak Number of Row Updates and Deletes per Second) × Row Size:
$(1 \times 10) \times 256 = 2560$ bytes; $2560 \times (0.3 \times \ldots) \approx 3.66$ GB
Visit the following blog post to get SP_InMemTableSizeEst
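The estimation steps on this slide can be collected into one small calculator. This Python sketch reproduces the slide's per-row, hash-index, non-clustered-index, row-versioning, and growth formulas for the IMOLTP example; the function and parameter names are hypothetical. It follows the slide in counting four index pointers per row (pass a different `pointers_per_row` if you count every index on the table), and it does not model the slide's final row-versioning scaling step, whose factor is incomplete in the source.

```python
POINTER = 8      # bytes per index pointer and per hash bucket
ROW_HEADER = 24  # row header / timestamps

def round_up_pow2(n):
    """Bucket count actually allocated: the next power of two >= n (n > 0)."""
    return 1 << (n - 1).bit_length()

def estimate(rows, pointers_per_row, column_sizes, hash_buckets, nc_indexes,
             longest_tx_seconds, peak_updates_deletes_per_second, growth=0.0):
    """Slide formulas; nc_indexes is a list of (key_size_bytes, rows_with_unique_keys)."""
    row_size = ROW_HEADER + POINTER * pointers_per_row + sum(column_sizes)
    parts = {
        "table": rows * row_size,
        "hash_indexes": sum(round_up_pow2(b) * POINTER for b in hash_buckets),
        "nonclustered": sum((POINTER + key) * unique for key, unique in nc_indexes),
        "row_versions": max(1, longest_tx_seconds) * peak_updates_deletes_per_second * row_size,
    }
    total = sum(parts.values()) * (1 + growth)
    return row_size, parts, total

row_size, parts, total = estimate(
    rows=5_000_000,
    pointers_per_row=4,                        # the slide's count
    column_sizes=[4] * 5 + [50, 50, 30, 50],   # five INT, three CHAR(50), one CHAR(30)
    hash_buckets=[5_000_000] * 3,              # INDEX_1..INDEX_3
    nc_indexes=[(4, 5_000_000)],               # INDEX_4 on an INT column
    longest_tx_seconds=1,
    peak_updates_deletes_per_second=10,
    growth=0.10,                               # 10% growth, as in the example above
)
print(row_size, parts)
# 256 {'table': 1280000000, 'hash_indexes': 201326592, 'nonclustered': 60000000, 'row_versions': 2560}
```

`round_up_pow2(5_000_000)` returns 8,388,608, which is why the hash-index line uses $2^{23}$ buckets rather than the requested 5,000,000.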
13
In-Memory Filegroup Configuration
Configuring a Memory-Optimized Filegroup
You should consider creating multiple containers in the memory-optimized filegroup and distributing them on different drives to achieve more bandwidth to stream the data into memory. When configuring storage, you must provide free disk space that is four times the size of durable memory-optimized tables. You must also ensure that your IO subsystem supports the required IOPS for your workload. If data and delta file pairs are populated at a given IOPS, you need 3 times that IOPS to account for storing and merge operations. You can add storage capacity and IOPS by adding one or more containers to the memory-optimized filegroup. In a multiple-container, multiple-drive scenario, data and delta files are allocated into containers in a round-robin fashion: the first data file is allocated from the first container, the delta file from the next container, and this allocation pattern repeats. This allocation scheme distributes data and delta files evenly across containers if you have an odd number of drives, each mapped to one container. However, if you have an even number of drives, each mapped to a container, it can result in imbalanced storage, with data files mapped to odd drives and delta files mapped to even drives. To obtain a balanced stream of IO on recovery, consider placing pairs of data and delta files on the same spindles/storage, as described in the following example.
Example: consider a memory-optimized filegroup with two containers: container 1 on drive X and container 2 on drive Y. Since the allocation of data and delta files is done in round-robin fashion, container 1 will only have data files and container 2 will only have delta files, which leads to imbalanced persistence for both storage and input/output operations per second (IOPS), as data files are significantly larger than delta files.
To distribute data and delta files uniformly across drives X and Y, create four containers instead of two, mapping the first two containers to drive X and the next two containers to drive Y. With round-robin allocation, the first data file and first delta file will be allocated from container 1 and container 2 respectively, which are mapped to drive X. Similarly, the next data and delta files will be allocated from container 3 and container 4, which are mapped to drive Y. This distributes data and delta files uniformly across the two drives.
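The round-robin placement described above is easy to simulate. This Python sketch hands out files in the order data, delta, data, delta, ... to containers in turn and counts how many of each kind land on each drive; the function names and the list-of-drives representation are assumptions, and the model covers only the allocation pattern the slide describes, not every detail of SQL Server's allocator.

```python
from collections import Counter

def allocate(container_drives, pairs):
    """Hand out files data, delta, data, delta, ... to containers in round-robin order.
    container_drives[i] is the drive that container i is mapped to."""
    placements = []
    for i in range(2 * pairs):
        kind = "data" if i % 2 == 0 else "delta"
        container = i % len(container_drives)
        placements.append((kind, container, container_drives[container]))
    return placements

def files_per_drive(placements):
    return Counter((drive, kind) for kind, _container, drive in placements)

print(files_per_drive(allocate(["X", "Y"], 4)))
# Counter({('X', 'data'): 4, ('Y', 'delta'): 4})  -> all data on X, all deltas on Y
print(files_per_drive(allocate(["X", "X", "Y", "Y"], 4)))
# Counter({('X', 'data'): 2, ('X', 'delta'): 2, ('Y', 'data'): 2, ('Y', 'delta'): 2})
```

With four containers mapped X, X, Y, Y, each data file and its delta file also land on the same drive, which is the balanced layout the example recommends.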
14
In-Memory Data & Delta File Population
Data and Delta Files
The data in memory-optimized tables is stored in memory as free-form data rows that are linked through one or more in-memory indexes. There are no page structures for data rows, such as those used for disk-based tables. When the application is ready to commit the transaction, In-Memory OLTP generates the log records for the transaction. The persistence of memory-optimized tables is done with a set of data and delta files using a background thread. The data and delta files are located in one or more containers (using the same mechanism used for FILESTREAM data). These containers are mapped to a new type of filegroup, called a memory-optimized filegroup. Data is written to these files in a strictly sequential fashion, which minimizes disk latency for spinning media. You can use multiple containers on different disks to distribute the I/O activity. Data and delta files in multiple containers on different disks will increase recovery performance when data is read from the data and delta files on disk into memory. An application does not directly access data and delta files; all data reads and writes use in-memory data.
The Data File
A data file contains rows from one or more memory-optimized tables that were inserted by multiple transactions as part of INSERT or UPDATE operations. For example, one row can be from memory-optimized table T1 and the next row from memory-optimized table T2. The rows are appended to the data file in the order of transactions in the transaction log, making data access sequential. This enables an order of magnitude better I/O throughput compared to random I/O. Each data file is sized at approximately 128 MB for computers with more than 16 GB of memory, and 16 MB for computers with 16 GB or less. Once the data file is full, the rows inserted by new transactions are stored in another data file.
Over time, the rows from durable memory-optimized tables are stored in one or more data files, with each data file containing rows from a disjoint but contiguous range of transactions. For example, a data file with transaction commit timestamps in the range (100, 200] contains all the rows inserted by transactions with a commit timestamp greater than 100 and less than or equal to 200. The commit timestamp is a monotonically increasing number assigned to a transaction when it is ready to commit; each transaction has a unique commit timestamp. When a row is deleted or updated, it is not removed or changed in place in the data file; instead, the deleted rows are tracked in another type of file, the delta file. Update operations are processed as a pair of delete and insert operations for each row. This eliminates random I/O on the data file.
The Delta File
Each data file is paired with a delta file that has the same transaction range and tracks the deleted rows that were inserted by transactions in that range. This pair of data and delta files is referred to as a Checkpoint File Pair (CFP), and it is the unit of allocation and deallocation as well as the unit for merge operations. For example, the delta file corresponding to transaction range (100, 200] stores the deleted rows that were inserted by transactions in the range (100, 200]. Like data files, the delta file is accessed sequentially. When a row is deleted, it is not removed from the data file; instead, a reference to the row is appended to the delta file associated with the transaction range in which the row was inserted. Because the row to be deleted already exists in the data file, the delta file stores only the reference information {inserting_tx_id, row_id, deleting_tx_id}, and it follows the transaction log order of the originating delete or update operations.
Populating Data and Delta Files
Data and delta files are populated by a background thread called the offline checkpoint.
This thread reads the transaction log records generated by committed transactions on memory-optimized tables and appends information about the inserted and deleted rows to the appropriate data and delta files. Unlike disk-based tables, where data and index pages are flushed with random I/O when a checkpoint is done, the persistence of memory-optimized tables is a continuous background operation. Multiple delta files are accessed because a transaction can delete or update any row that was inserted by any previous transaction. Deletion information is always appended at the end of the delta file. For example, a transaction with a commit timestamp of 600 inserts one new row and deletes rows inserted by transactions with commit timestamps of 150, 250 and 450, as shown in the picture below. All four file I/O operations (three for the deleted rows and one for the newly inserted row) are append-only operations to the corresponding delta and data files.
Accessing Data and Delta Files
Data and delta file pairs are accessed in the following cases:
•Offline checkpoint thread. This thread appends inserts and deletes of memory-optimized data rows to the corresponding data and delta file pairs.
•Merge operation. The operation merges one or more data and delta file pairs and creates a new data and delta file pair.
•During crash recovery. When SQL Server is restarted or the database is brought back online, the memory-optimized data is populated using the data and delta file pairs. The delta file acts as a filter for the deleted rows when reading the rows from the corresponding data file. Because each data and delta file pair is independent, these files are loaded in parallel to reduce the time taken to populate data into memory. Once the data has been loaded into memory, the In-Memory OLTP engine applies the active transaction log records not yet covered by the checkpoint files so that the memory-optimized data is complete.
•During restore operation. The In-Memory OLTP checkpoint files are created from the database backup, and then one or more transaction log backups are applied. As with crash recovery, the In-Memory OLTP engine loads data into memory in parallel to minimize the impact on recovery time.
Merging Data and Delta Files
The data for memory-optimized tables is stored in one or more data and delta file pairs (also called checkpoint file pairs, or CFPs). Data files store inserted rows, and delta files reference deleted rows. During the execution of an OLTP workload, as DML operations insert, update and delete rows, new CFPs are created to persist the new rows, and references to the deleted rows are appended to delta files. The metadata of all previously closed and currently active CFPs is stored in an internal array structure referred to as the storage array. It is a fixed-size array of 8,192 CFP entries, ordered by transaction range. The CFPs in the storage array (along with the tail of the log) represent all the on-disk state required to recover a database with memory-optimized tables. Over time, with DML operations, the number of CFPs grows, causing the storage array to reach capacity, which introduces the following challenges:
•Deleted rows. Deleted rows remain in the data file but are marked as deleted in the corresponding delta file. These rows are no longer needed and will be removed from storage. If deleted rows were not removed from CFPs, they would use space unnecessarily and slow down recovery.
•Storage array full. When 8,000 entries in the storage array are allocated (192 entries are reserved for existing merges to complete or to allow you to perform manual merges), no new DML transactions can be executed on durable memory-optimized tables. Only checkpoint and merge operations are allowed to consume the remaining entries.
This ensures that DML transactions do not fill the array and that some entries in the array are reserved for merging existing files and reclaiming space in the array.
•Storage array manipulation overhead. Internal processes search the storage array for operations such as finding the delta file to which information about a deleted row is appended. The cost of these operations increases with the number of entries.
To help prevent these inefficiencies, the older closed CFPs are merged, based on an internally defined merge policy, so that the storage array is compacted to represent the same set of data with a reduced number of CFPs.
The total in-memory size of all durable tables in a database should not exceed 250 GB. Durable tables that use up to 250 GB of memory will, assuming insert, delete and update operations, require on average 500 GB of storage space, and 4,000 data and delta file pairs in the memory-optimized filegroup are required to support those 500 GB. Short-term surges in database activity may cause checkpoint and merge operations to lag, which increases the number of required data and delta file pairs. To accommodate such short-term spikes, the storage system can allocate up to 8,000 data and delta file pairs, up to a total of 1 TB of storage. When that limit is reached, no new transactions are allowed on the database until checkpoint operations catch up. If the size of durable tables in memory exceeds 250 GB for long periods of time, there is a chance of reaching the 8,000 file pair limit.
The merge operation takes as input one or more adjacent closed CFPs (called merge sources), chosen according to an internally defined merge policy, and produces one resultant CFP, called the merge target. The entries in each delta file of the source CFPs are used to filter rows from the corresponding data file, removing the data rows that are no longer needed. The remaining rows in the source CFPs are consolidated into one target CFP.
After the merge is complete, the resultant merge-target CFP replaces the source CFPs (merge sources). The merge-source CFPs go through a transition phase before they are removed from storage.
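The life cycle of checkpoint file pairs described on this slide can be tied together in a toy model. This is a sketch, not the engine's implementation: the CFP and StorageArray classes and their method names are hypothetical, transaction ids are taken to be commit timestamps, CFPs are pre-created rather than allocated as files fill, a thread pool stands in for the engine's parallel load, and the tail-of-log replay and the transition phase of merge sources are omitted. It reproduces the commit-timestamp-600 example, delta-file filtering during crash recovery, the 8,000-entry DML limit, and a merge of adjacent CFPs.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

STORAGE_ARRAY_SIZE = 8192
RESERVED_ENTRIES = 192                             # kept for automatic and manual merges
DML_LIMIT = STORAGE_ARRAY_SIZE - RESERVED_ENTRIES  # 8,000 entries usable before DML stops


@dataclass
class CFP:
    """Hypothetical checkpoint file pair covering commit timestamps in (low, high]."""
    low: int
    high: int
    data: list = field(default_factory=list)   # (inserting_ts, row_id), append-only
    delta: list = field(default_factory=list)  # (inserting_ts, row_id, deleting_ts), append-only

    def live_rows(self):
        # The delta file acts as a filter for deleted rows in the data file.
        deleted = {(ts, rid) for ts, rid, _ in self.delta}
        return [row for row in self.data if row not in deleted]


class StorageArray:
    """Hypothetical storage array: CFP metadata ordered by transaction range."""

    def __init__(self, cfps):
        self.cfps = sorted(cfps, key=lambda c: c.low)

    def cfp_for(self, ts):
        """Return the CFP whose (low, high] range contains commit timestamp ts."""
        i = bisect_left([c.high for c in self.cfps], ts)
        if i == len(self.cfps) or not self.cfps[i].low < ts <= self.cfps[i].high:
            raise ValueError(f"no CFP covers timestamp {ts}")
        return self.cfps[i]

    def dml_allowed(self):
        return len(self.cfps) < DML_LIMIT

    def checkpoint(self, tx_ts, inserted, deleted):
        """Offline checkpoint for one committed transaction; every write is an append.

        inserted: row ids inserted by the transaction (appended to one data file).
        deleted: (inserting_ts, row_id) pairs, each appended to the delta file of
        the CFP whose range contains inserting_ts. Returns the number of appends.
        """
        appends = 0
        if inserted:
            self.cfp_for(tx_ts).data.extend((tx_ts, rid) for rid in inserted)
            appends += 1
        for ins_ts, rid in deleted:
            self.cfp_for(ins_ts).delta.append((ins_ts, rid, tx_ts))
            appends += 1
        return appends

    def recover(self):
        """Crash recovery: pairs are independent, so they are loaded concurrently."""
        with ThreadPoolExecutor() as pool:
            return [row for rows in pool.map(CFP.live_rows, self.cfps) for row in rows]

    def merge(self, start, end):
        """Replace adjacent CFPs [start, end) with one merge target holding only live rows."""
        sources = self.cfps[start:end]
        target = CFP(sources[0].low, sources[-1].high)
        for src in sources:
            target.data.extend(src.live_rows())
        self.cfps[start:end] = [target]
        return target


# CFPs for (100, 200], (200, 300], (300, 400], (400, 500], (500, 600].
array = StorageArray([CFP(low, low + 100) for low in range(100, 600, 100)])
for ts, rid in ((150, "a"), (250, "b"), (350, "e"), (450, "c")):
    array.checkpoint(ts, [rid], [])

# Commit timestamp 600 inserts "d" and deletes rows inserted at 150, 250 and 450.
io_ops = array.checkpoint(600, ["d"], [(150, "a"), (250, "b"), (450, "c")])
recovered = array.recover()
target = array.merge(0, 3)  # merge (100, 200], (200, 300], (300, 400]
```

The transaction at 600 produces four appends (one to a data file, three to delta files). After the merge, the target covers (100, 400] but keeps only the single surviving row, which is how merging both reclaims disk space and frees storage-array entries.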
15
Questions and Answers