SQLUG.be case study: Redesigning CDR archiving on SQL Server 2012
By Ludo Bernaerts, April 16, 2012
Agenda
- Purpose of the CDR archiving
- Why the redesign
- The current environment
- The new environment
- Columnstore index, what is it? Brief CSIX explanation
Agenda (continued)
- Initial loading process
- Treating the loaded data
- How to deal with CSIX
- Partition layout
- Database layout
- Process flow
- Q & A
What is the purpose of CDR archiving?
- Creation of an archive for Call Data Records (CDRs) with a retention period of 36 months
- Extraction of call information from this archive system on a daily basis
- Extension of an existing data flow
(Diagram: data flow timeline from 1 month to 3 years)
What is the purpose of CDR archiving?
- The first goal is to let the financial department handle disputes. When a dispute occurs, finance needs to be able to prove that the customer sent us the traffic.
- CDRs for specific customers are usually retrieved for a 15-day or one-month period, depending on the billing cycle run. There is also the option to filter the request for customers who send a lot of traffic.
- Last but not least, network engineers run requests on the system to track down issues on the routes used by the calls.
Why upgrade?
- To implement new features that improve query performance and manageability
  - Up to 15,000 partitions, so we can partition on a daily basis (only 1,000 partitions possible with SQL Server 2008)
  - Columnstore index
- Ready to accept more load
  - Currently only about 20 queries a day are possible (the old system was designed for 15)
  - The goal is to handle more than 50 queries a day and beyond
- Ready for the future
  - Currently only CDRs are stored; in the future XDR archiving is needed (a lot more data volume)
- Make use of the same hardware infrastructure in a more efficient way
The current system (1)
- SQL Server 2008 used
- Size: 11 TB allocated; uncompressed data > 10 TB, compressed data > 6 TB (page compression activated)
- 3 filegroups created on the database:
  - FG_CDR_CALL: voice call fact data, 10 database files (1 file per disk)
  - FG_CDR_DATA: data call fact data, 1 database file
  - FS_DATA: staging & dimension tables, 1 database file
The current system (2)
- Storage on SAN: transaction log files & staging data on FC disks, the rest on a mix of SATA / FC
- Staging table unused: the daily import goes directly into the partitioned tables
- Import is transaction based, no bulk load
- 3 years of data in the database, partitioned at week level
- Only one clustered index per table, no other indexes implemented
The current system (3)
The new system (1)
- Keep the cost low
- A redesigned database layout for better I/O performance
- More filegroups, with one file per filegroup, to make sure reads are sequential rather than random
- Also make use of page compression (see the example below)
- Make use of the new index type available in SQL Server 2012: the columnstore index
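As a hedged illustration of the page compression point, assuming the secondary, row-store fact table name used later in this deck:

-- Rebuild all partitions of the row-store fact table with page compression.
ALTER TABLE CDR_Schema.Tbl_ibis_Call_reduced_Sec
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = PAGE);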
The new system (2)
- More partitions possible, max 15,000
- Partitions on a daily basis instead of weekly: 3 years = 1,100 partitions
- Easier to clean up out-of-date CDRs
- Smaller datasets to query
- Make use of staging tables for faster loads (bulk load, see the sketch below)
- Run 6 parallel load streams for the daily load
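A hedged sketch of one of the parallel bulk-load streams into a staging table; the staging table name, file path and delimiters are assumptions:

-- Bulk load one daily extract file into a staging table; TABLOCK enables a
-- minimally logged bulk load.
BULK INSERT CDR_Schema.Stg_ibis_Call_reduced
FROM '\\etlserver\cdr\20120416\calls_stream1.dat'
WITH (TABLOCK, FIELDTERMINATOR = ';', ROWTERMINATOR = '\n', BATCHSIZE = 1000000);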
ColumnStore index, what is it?
- Result of Project Apollo
- An index that stores data column-wise instead of row-wise
- Created for data warehouse purposes: fact tables and large dimension tables (> 10 million rows)
- Index optimized for scans
- Only one per table, and currently only non-clustered
- Must contain the columns from the clustered index key
- No concept of key columns, so the 16-key-column limit doesn't apply
- If the table is partitioned, the index must contain the partitioning column
- No 900-byte key size limit
- It is read-only
ColumnStore index (1)
Data stored in the traditional row-store format versus the same data in columnstore format:

Row store (each page holds whole rows):
1, Ross, San Francisco, CA
2, Sherry, New York, NY
3, Gus, Seattle, WA
4, Stan, San Jose, CA
5, Lijon, Sacramento, CA

Column store (each column stored together):
IDs: 1, 2, 3, 4, 5
Names: Ross, Sherry, Gus, Stan, Lijon
Cities: San Francisco, New York, Seattle, San Jose, Sacramento
States: CA, NY, WA, CA, CA
ColumnStore index (2) Each page stores data from a single column
Columnstore index (3)
- The base table is divided into row groups of roughly 1 million rows each
- New system table: sys.column_store_segments, which includes segment metadata (size, min, max, ...)
(Diagram: row groups of the base table mapping to column segments stored as blobs, tracked in the segment directory of the columnstore index)
Columnstore index limitations (1)
- Limited support for data types. The following data types are not supported:
  - decimal and numeric with precision > 18 digits
  - binary and varbinary (including BLOBs)
  - (n)text and image
  - (n)varchar(max)
  - uniqueidentifier
  - rowversion (and timestamp)
  - sql_variant
  - CLR types (hierarchyid and spatial types)
  - datetimeoffset with scale > 2
  - xml
Columnstore index limitations (2)
- Other limitations:
  - No page or row compression on a CSIX
  - Cannot be unique
  - Cannot act as a primary or foreign key
  - Not on tables or columns using Change Data Capture
  - No columns with FILESTREAM support
  - No replication
  - No computed columns or sparse columns
  - No filtered CSIX
  - Max 1,024 columns
  - Not on an indexed view
  - No INCLUDE, ASC or DESC keywords
ColumnStore index performance (1)
Higher query speed, why?
- Data organized in a column shares many more similar characteristics than data organized across rows, which results in a higher level of compression
- Uses the VertiPaq compression technology, which is also superior to SQL Server's compression algorithm (available in Analysis Services for PowerPivot since SQL Server 2008 R2)
- Less I/O transferred from disk to memory: data is fetched only for the columns needed by the query (bitmap filter optimization)
- Algorithms are optimized to take better advantage of modern hardware (more cores, more RAM, ...)
- Batch mode processing
ColumnStore index performance (2)
ColumnStore compression:
- Encoding: convert values to integers
  - Value-based encoding
  - Dictionary (hash) encoding
- Row reordering: find the optimal ordering of rows (proprietary VertiPaq algorithm)
- Compression: run-length encoding and bit packing
- Roughly 1.8x better compression than SQL Server's page compression
ColumnStore index performance (3)
- Batch-mode processing is a new, highly efficient vector-based execution technology that works with columnstore indexes. Check the query plan for the actual execution mode.
- A batch is stored as a vector in a separate area of memory and represents roughly 1,000 rows of data.
- Operators that can run in batch mode: scan, filter, hash aggregate, hash join, batch hash table build.
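To make this concrete, here is a hedged example of the kind of aggregate query that benefits from batch mode; the duration_sec column is an assumption, the fact table name is the one used later in this deck, and the execution mode shows up as the "Actual Execution Mode" property of the scan and hash aggregate operators in the actual execution plan:

-- Aggregate over the columnstore-indexed fact table; with the index in place,
-- the columnstore scan and the hash aggregate can run in batch mode.
SELECT  call_dt,
        COUNT_BIG(*)      AS nr_of_calls,
        SUM(duration_sec) AS total_duration
FROM    CDR_Schema.Tbl_ibis_Call_reduced_Prim
WHERE   call_dt >= '20120301' AND call_dt < '20120401'
GROUP BY call_dt
ORDER BY call_dt;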
ColumnStore index performance (4)
I/O and caching:
- New large object cache: a cache for column segments and dictionaries
- Aggressive read-ahead, both at segment level and at page level within a segment
- Early segment elimination based on segment metadata (min and max values are stored per segment)
ColumnStore index additional info (1)
Some considerations:
- Creation takes roughly 1.5 times as long as a normal index
- More memory is needed; the memory grant request (MB) = [(4.2 * number of columns in the CS index) + 68] * DOP + (number of string columns * 34)
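As a hedged worked example with assumed numbers (not from the presentation): for a columnstore index over 16 columns, 2 of them strings, built at DOP 8, the formula gives [(4.2 * 16) + 68] * 8 + (2 * 34) = 135.2 * 8 + 68 = 1,149.6 MB, so roughly 1.1 GB of memory grant is requested for the index build.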
ColumnStore index additional info (2)
The command: the same as other CREATE INDEX statements, but with fewer options

CREATE [ NONCLUSTERED ] COLUMNSTORE INDEX index_name
    ON table_name ( column [ ,...n ] )
    [ WITH ( DROP_EXISTING = { ON | OFF } | MAXDOP = max_degree_of_parallelism ) ]
    [ ON { partition_scheme_name ( column_name ) | filegroup_name | "default" } ]
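A minimal concrete instance of the command, assuming the main fact table name used later in this deck, illustrative column names, and a daily partition scheme ps_cdr_daily; note that the partitioning column (call_dt) has to be part of the column list:

-- Build the nonclustered columnstore index on the main fact table, aligned with
-- the daily partition scheme (names are illustrative).
CREATE NONCLUSTERED COLUMNSTORE INDEX csix_ibis_call_reduced
    ON CDR_Schema.Tbl_ibis_Call_reduced_Prim (call_dt, customer_id, route_id, duration_sec)
    WITH (MAXDOP = 4)
    ON ps_cdr_daily (call_dt);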
ColumnStore index additional info (3)
Some catalog views / DMVs:
- sys.column_store_index_stats
- sys.column_store_segments
- sys.column_store_dictionaries
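A hedged example of reading segment metadata for one fact table through sys.column_store_segments; the fact table name is the one used later in this deck, and the columns shown (row_count, on_disk_size, min_data_id, max_data_id) are the SQL Server 2012 catalog view columns:

-- Per-column segment row counts, sizes and min/max data ids for one fact table.
SELECT  OBJECT_NAME(p.object_id) AS table_name,
        p.partition_number,
        s.column_id,
        s.segment_id,
        s.row_count,
        s.on_disk_size,
        s.min_data_id,
        s.max_data_id
FROM    sys.column_store_segments AS s
JOIN    sys.partitions            AS p ON p.hobt_id = s.hobt_id
WHERE   p.object_id = OBJECT_ID('CDR_Schema.Tbl_ibis_Call_reduced_Prim')
ORDER BY p.partition_number, s.column_id, s.segment_id;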
How to deal with CSIX during CDR load
- Conflict: a columnstore index is read-only, but the data needs daily updates
- Solution: split the data into a fixed part and an updatable part
- Remark: > 99% of the data for a partition is loaded during the first 3 days
- After 3 days we can switch the data to the fixed part and make use of the columnstore index
The loading
How loaded data is treated.
How to deal with CS index limitations (1)
- The split is done on the fact tables:
  - 3 main fact tables with a columnstore index and 1,100 partitions
  - 3 secondary fact tables without a columnstore index but with 1,100 partitions
- Loading happens into 6 different staging tables:
  - Main staging tables where segment date = call date, one partition
  - Secondary staging tables where segment date <> call date, one partition
How to deal with CS index limitations (2)
- By default, all loaded data is transferred to the secondary fact tables: the main staging tables by switch-in, the secondary staging tables by insert
- The partition of the secondary fact table with call date = today - 3 days is moved to the main table (see the sketch below)
- A clean-up job runs to check the secondary fact tables: rows are moved to the main table partitions, starting with the partitions containing the most data
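A hedged sketch of that daily move, assuming illustrative object names (partition function pf_cdr_daily, an empty work table on the same partition scheme, and the column list) on top of the fact table names used elsewhere in this deck; the real job adds error handling and repeats this per fact table:

-- Partition number for call date = today - 3 days (assumed partition function name).
DECLARE @p int = $PARTITION.pf_cdr_daily(DATEADD(DAY, -3, CAST(GETDATE() AS date)));

-- 1. Switch the partition out of the secondary (row-store) fact table into an
--    empty work table partitioned on the same scheme (metadata-only operation).
ALTER TABLE CDR_Schema.Tbl_ibis_Call_reduced_Sec
    SWITCH PARTITION @p TO CDR_Schema.Tbl_ibis_Call_reduced_Work PARTITION @p;

-- 2. Build the same nonclustered columnstore index the main fact table has,
--    so both structures match for the next switch.
CREATE NONCLUSTERED COLUMNSTORE INDEX csix_call_reduced_work
    ON CDR_Schema.Tbl_ibis_Call_reduced_Work (call_dt, customer_id, route_id, duration_sec);

-- 3. Switch the now read-only partition into the main (columnstore) fact table.
ALTER TABLE CDR_Schema.Tbl_ibis_Call_reduced_Work
    SWITCH PARTITION @p TO CDR_Schema.Tbl_ibis_Call_reduced_Prim PARTITION @p;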
How to deal with CS index limitations (3)
Views are used to query both tables:

WITH call_reduced AS (
    SELECT field1, field2, ...
    FROM CDR_Schema.Tbl_ibis_Call_reduced_Prim
    INNER JOIN ... ON ...
    WHERE a AND b
    UNION ALL
    SELECT field1, field2, ...
    FROM CDR_Schema.Tbl_ibis_Call_reduced_Sec
    INNER JOIN ... ON ...
    WHERE a AND b
)
SELECT field1, field2, ...
FROM call_reduced
Partition & filegroup layout
- Partitions based on call_dt, on a daily basis (see the sketch below)
- 1,100 partitions (3 years of data)
- Each fact table uses the same partition function but its own partition scheme
- Partitions are divided over multiple filegroups in a round-robin way
- One data file per filegroup
- 10 filegroups created per main fact table
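A hedged sketch of what the daily partition function and round-robin partition scheme could look like; the names, boundary dates and filegroup names are illustrative, and in reality the roughly 1,100 boundaries would be generated in a loop rather than written out:

-- Daily RANGE RIGHT partition function on the call date.
CREATE PARTITION FUNCTION pf_cdr_daily (date)
    AS RANGE RIGHT FOR VALUES ('2012-01-01', '2012-01-02', '2012-01-03' /* ... */);

-- Partition scheme spreading partitions round-robin over the 10 filegroups,
-- each filegroup holding one data file.
CREATE PARTITION SCHEME ps_cdr_daily
    AS PARTITION pf_cdr_daily
    TO (FG_CDR_01, FG_CDR_02, FG_CDR_03, FG_CDR_04, FG_CDR_05,
        FG_CDR_06, FG_CDR_07, FG_CDR_08, FG_CDR_09, FG_CDR_10
        /* ...repeat the filegroup list round-robin for all partitions... */);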
Filegroup layout
- Possibility to extend the number of filegroups, if needed
Database layout
Daily data flow
Clean up process
Clean up process.
What about the performance? Some results from 30 queries executed.
Results in detail
A sample of the data used
Is there something you want to ask? Q & A