SQLUG.be case study: Redesigning CDR archiving on SQL Server 2012
By Ludo Bernaerts, April 16, 2012
Agenda
- Purpose of the CDR archiving
- Why the redesign
- The current environment
- The new environment
- Columnstore index, what is it? Brief CSIX explanation
Agenda (continued)
- Initial loading process
- Treating the loaded data
- How to deal with CSIX
- Partition layout
- Database layout
- Process flow
- Q & A
What is the purpose of CDR archiving?
- Creation of an archive for Call Data Records (CDRs) with a retention period of 36 months
- Extraction of call information from this archive system on a daily basis
- Extension of an existing data flow
(Diagram: data flow timeline from 1 month to 3 years)
What is the purpose of CDR archiving?
- The first goal is to let the financial department handle disputes. When a dispute occurs, finance needs to be able to prove that the customer sent us the traffic.
- CDRs for specific customers are usually retrieved for a 15-day or one-month period, depending on the billing cycle run. There is also the option to filter the request for customers who send a lot of traffic.
- Last but not least, network engineers run requests on the system to track down issues on the routes used by the calls.
Why upgrade?
- To implement new features that improve query performance and manageability
  - Up to 15,000 partitions, so we can partition on a daily basis (only 1,000 partitions possible with SQL Server 2008)
  - Columnstore index
- Ready to accept more load
  - Currently only about 20 queries a day are possible (the old system was designed for 15)
  - The goal is to handle more than 50 queries a day and beyond
- Ready for the future
  - Currently only CDRs are stored; in the future XDR archiving is needed (a lot more data volume)
- Make use of the same hardware infrastructure in a more efficient way
The current system (1)
- SQL Server 2008 used
- Size: 11 TB allocated; uncompressed data > 10 TB, compressed data > 6 TB (page compression activated)
- 3 filegroups created on the database:
  - FG_CDR_CALL: voice call fact data, 10 database files (1 file per disk)
  - FG_CDR_DATA: data call fact data, 1 database file
  - FS_DATA: staging & dimension tables, 1 database file
The current system (2)
- Storage on SAN: transaction log files & staging data on FC disks, the rest on a mix of SATA / FC
- Staging table unused: the daily import goes directly into the partitioned tables
- Import is transaction based, no bulk load
- 3 years of data in the database, partitioned at week level
- Only one clustered index per table, no other indexes implemented
The current system (3)
The new system (1)
- Keep the cost low
- A redesigned database layout for better I/O performance
- More filegroups, with one file per filegroup, to make sure reads are sequential rather than random
- Also make use of page compression (see the example below)
- Make use of the new index type available in SQL Server 2012: the columnstore index
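As a hedged illustration of the page compression point, assuming the secondary, row-store fact table name used later in this deck:

-- Rebuild all partitions of the row-store fact table with page compression.
ALTER TABLE CDR_Schema.Tbl_ibis_Call_reduced_Sec
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = PAGE);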
The new system (2)
- More partitions possible, max 15,000
- Partitions on a daily basis instead of weekly: 3 years = 1,100 partitions
- Easier to clean up out-of-date CDRs
- Smaller datasets to query
- Make use of staging tables for faster loads (bulk load, see the sketch below)
- Run 6 parallel load streams for the daily load
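A hedged sketch of one of the parallel bulk-load streams into a staging table; the staging table name, file path and delimiters are assumptions:

-- Bulk load one daily extract file into a staging table; TABLOCK enables a
-- minimally logged bulk load.
BULK INSERT CDR_Schema.Stg_ibis_Call_reduced
FROM '\\etlserver\cdr\20120416\calls_stream1.dat'
WITH (TABLOCK, FIELDTERMINATOR = ';', ROWTERMINATOR = '\n', BATCHSIZE = 1000000);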
ColumnStore index, what is it?
- Result of Project Apollo
- An index that stores data column-wise instead of row-wise
- Created for data warehouse purposes: fact tables and large dimension tables (> 10 million rows)
- Index optimized for scans
- Only one per table, and currently only non-clustered
- Must contain the columns from the clustered index key
- No concept of key columns, so the 16-key-column limit doesn't apply
- If the table is partitioned, the index must contain the partitioning column
- No 900-byte key size limit
- It is read-only
ColumnStore index (1)
Data stored in the traditional row-store format versus the same data in columnstore format:

Row store (each page holds whole rows):
1, Ross, San Francisco, CA
2, Sherry, New York, NY
3, Gus, Seattle, WA
4, Stan, San Jose, CA
5, Lijon, Sacramento, CA

Column store (each column stored together):
IDs: 1, 2, 3, 4, 5
Names: Ross, Sherry, Gus, Stan, Lijon
Cities: San Francisco, New York, Seattle, San Jose, Sacramento
States: CA, NY, WA, CA, CA
ColumnStore index (2) Each page stores data from a single column
Columnstore index (3)
- The base table is divided into row groups of roughly 1 million rows each
- New system table: sys.column_store_segments, which includes segment metadata (size, min, max, ...)
(Diagram: row groups of the base table mapping to column segments stored as blobs, tracked in the segment directory of the columnstore index)
Columnstore index limitations (1)
- Limited support for data types. The following data types are not supported:
  - decimal and numeric with precision > 18 digits
  - binary and varbinary (including BLOBs)
  - (n)text and image
  - (n)varchar(max)
  - uniqueidentifier
  - rowversion (and timestamp)
  - sql_variant
  - CLR types (hierarchyid and spatial types)
  - datetimeoffset with scale > 2
  - xml
Columnstore index limitations (2)
- Other limitations:
  - No page or row compression on a CSIX
  - Cannot be unique
  - Cannot act as a primary or foreign key
  - Not on tables or columns using Change Data Capture
  - No columns with FILESTREAM support
  - No replication
  - No computed columns or sparse columns
  - No filtered CSIX
  - Max 1,024 columns
  - Not on an indexed view
  - No INCLUDE, ASC or DESC keywords
ColumnStore index performance (1)
Higher query speed, why?
- Data organized in a column shares many more similar characteristics than data organized across rows, which results in a higher level of compression
- Uses the VertiPaq compression technology, which is also superior to SQL Server's compression algorithm (available in Analysis Services for PowerPivot since SQL Server 2008 R2)
- Less I/O transferred from disk to memory: data is fetched only for the columns needed by the query (bitmap filter optimization)
- Algorithms are optimized to take better advantage of modern hardware (more cores, more RAM, ...)
- Batch mode processing
ColumnStore index performance (2)
ColumnStore compression:
- Encoding: convert values to integers
  - Value-based encoding
  - Dictionary (hash) encoding
- Row reordering: find the optimal ordering of rows (proprietary VertiPaq algorithm)
- Compression: run-length encoding and bit packing
- Roughly 1.8x better compression than SQL Server's page compression
ColumnStore index performance (3)
- Batch-mode processing is a new, highly efficient vector-based execution technology that works with columnstore indexes. Check the query plan for the actual execution mode.
- A batch is stored as a vector in a separate area of memory and represents roughly 1,000 rows of data.
- Operators that can run in batch mode: scan, filter, hash aggregate, hash join, batch hash table build.
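To make this concrete, here is a hedged example of the kind of aggregate query that benefits from batch mode; the duration_sec column is an assumption, the fact table name is the one used later in this deck, and the execution mode shows up as the "Actual Execution Mode" property of the scan and hash aggregate operators in the actual execution plan:

-- Aggregate over the columnstore-indexed fact table; with the index in place,
-- the columnstore scan and the hash aggregate can run in batch mode.
SELECT  call_dt,
        COUNT_BIG(*)      AS nr_of_calls,
        SUM(duration_sec) AS total_duration
FROM    CDR_Schema.Tbl_ibis_Call_reduced_Prim
WHERE   call_dt >= '20120301' AND call_dt < '20120401'
GROUP BY call_dt
ORDER BY call_dt;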
ColumnStore index performance (4)
I/O and caching:
- New large object cache: a cache for column segments and dictionaries
- Aggressive read-ahead, both at segment level and at page level within a segment
- Early segment elimination based on segment metadata (min and max values are stored per segment)
ColumnStore index additional info (1)
Some considerations:
- Creation takes roughly 1.5 times as long as a normal index
- More memory is needed; the memory grant request (MB) = [(4.2 * number of columns in the CS index) + 68] * DOP + (number of string columns * 34)
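As a hedged worked example with assumed numbers (not from the presentation): for a columnstore index over 16 columns, 2 of them strings, built at DOP 8, the formula gives [(4.2 * 16) + 68] * 8 + (2 * 34) = 135.2 * 8 + 68 = 1,149.6 MB, so roughly 1.1 GB of memory grant is requested for the index build.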
ColumnStore index additional info (2)
The command: the same as other CREATE INDEX statements, but with fewer options

CREATE [ NONCLUSTERED ] COLUMNSTORE INDEX index_name
    ON table_name ( column [ ,...n ] )
    [ WITH ( DROP_EXISTING = { ON | OFF } | MAXDOP = max_degree_of_parallelism ) ]
    [ ON { partition_scheme_name ( column_name ) | filegroup_name | "default" } ]
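A minimal concrete instance of the command, assuming the main fact table name used later in this deck, illustrative column names, and a daily partition scheme ps_cdr_daily; note that the partitioning column (call_dt) has to be part of the column list:

-- Build the nonclustered columnstore index on the main fact table, aligned with
-- the daily partition scheme (names are illustrative).
CREATE NONCLUSTERED COLUMNSTORE INDEX csix_ibis_call_reduced
    ON CDR_Schema.Tbl_ibis_Call_reduced_Prim (call_dt, customer_id, route_id, duration_sec)
    WITH (MAXDOP = 4)
    ON ps_cdr_daily (call_dt);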
ColumnStore index additional info (3)
Some catalog views / DMVs:
- sys.column_store_index_stats
- sys.column_store_segments
- sys.column_store_dictionaries
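A hedged example of reading segment metadata for one fact table through sys.column_store_segments; the fact table name is the one used later in this deck, and the columns shown (row_count, on_disk_size, min_data_id, max_data_id) are the SQL Server 2012 catalog view columns:

-- Per-column segment row counts, sizes and min/max data ids for one fact table.
SELECT  OBJECT_NAME(p.object_id) AS table_name,
        p.partition_number,
        s.column_id,
        s.segment_id,
        s.row_count,
        s.on_disk_size,
        s.min_data_id,
        s.max_data_id
FROM    sys.column_store_segments AS s
JOIN    sys.partitions            AS p ON p.hobt_id = s.hobt_id
WHERE   p.object_id = OBJECT_ID('CDR_Schema.Tbl_ibis_Call_reduced_Prim')
ORDER BY p.partition_number, s.column_id, s.segment_id;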
How to deal with CSIX during CDR load
- Conflict: a columnstore index is read-only, but the data needs daily updates
- Solution: split the data into a fixed part and an updatable part
- Remark: > 99% of the data for a partition is loaded during the first 3 days
- After 3 days we can switch the data to the fixed part and make use of the columnstore index
The loading
How loaded data is treated.
How to deal with CS index limitations (1)
- The split is done on the fact tables:
  - 3 main fact tables with a columnstore index and 1,100 partitions
  - 3 secondary fact tables without a columnstore index but with 1,100 partitions
- Loading happens into 6 different staging tables:
  - Main staging tables where segment date = call date, one partition
  - Secondary staging tables where segment date <> call date, one partition
How to deal with CS index limitations (2)
- By default, all loaded data is transferred to the secondary fact tables: the main staging tables by switch-in, the secondary staging tables by insert
- The partition of the secondary fact table with call date = today - 3 days is moved to the main table (see the sketch below)
- A clean-up job runs to check the secondary fact tables: rows are moved to the main table partitions, starting with the partitions containing the most data
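A hedged sketch of that daily move, assuming illustrative object names (partition function pf_cdr_daily, an empty work table on the same partition scheme, and the column list) on top of the fact table names used elsewhere in this deck; the real job adds error handling and repeats this per fact table:

-- Partition number for call date = today - 3 days (assumed partition function name).
DECLARE @p int = $PARTITION.pf_cdr_daily(DATEADD(DAY, -3, CAST(GETDATE() AS date)));

-- 1. Switch the partition out of the secondary (row-store) fact table into an
--    empty work table partitioned on the same scheme (metadata-only operation).
ALTER TABLE CDR_Schema.Tbl_ibis_Call_reduced_Sec
    SWITCH PARTITION @p TO CDR_Schema.Tbl_ibis_Call_reduced_Work PARTITION @p;

-- 2. Build the same nonclustered columnstore index the main fact table has,
--    so both structures match for the next switch.
CREATE NONCLUSTERED COLUMNSTORE INDEX csix_call_reduced_work
    ON CDR_Schema.Tbl_ibis_Call_reduced_Work (call_dt, customer_id, route_id, duration_sec);

-- 3. Switch the now read-only partition into the main (columnstore) fact table.
ALTER TABLE CDR_Schema.Tbl_ibis_Call_reduced_Work
    SWITCH PARTITION @p TO CDR_Schema.Tbl_ibis_Call_reduced_Prim PARTITION @p;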
How to deal with CS index limitations (3)
Views are used to query both tables:

WITH call_reduced AS (
    SELECT field1, field2, ...
    FROM CDR_Schema.Tbl_ibis_Call_reduced_Prim
    INNER JOIN ... ON ...
    WHERE a AND b
    UNION ALL
    SELECT field1, field2, ...
    FROM CDR_Schema.Tbl_ibis_Call_reduced_Sec
    INNER JOIN ... ON ...
    WHERE a AND b
)
SELECT field1, field2, ...
FROM call_reduced
Partition & filegroup layout
- Partitions based on call_dt, on a daily basis (see the sketch below)
- 1,100 partitions (3 years of data)
- Each fact table uses the same partition function but its own partition scheme
- Partitions are divided over multiple filegroups in a round-robin way
- One data file per filegroup
- 10 filegroups created per main fact table
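A hedged sketch of what the daily partition function and round-robin partition scheme could look like; the names, boundary dates and filegroup names are illustrative, and in reality the roughly 1,100 boundaries would be generated in a loop rather than written out:

-- Daily RANGE RIGHT partition function on the call date.
CREATE PARTITION FUNCTION pf_cdr_daily (date)
    AS RANGE RIGHT FOR VALUES ('2012-01-01', '2012-01-02', '2012-01-03' /* ... */);

-- Partition scheme spreading partitions round-robin over the 10 filegroups,
-- each filegroup holding one data file.
CREATE PARTITION SCHEME ps_cdr_daily
    AS PARTITION pf_cdr_daily
    TO (FG_CDR_01, FG_CDR_02, FG_CDR_03, FG_CDR_04, FG_CDR_05,
        FG_CDR_06, FG_CDR_07, FG_CDR_08, FG_CDR_09, FG_CDR_10
        /* ...repeat the filegroup list round-robin for all partitions... */);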
Filegroup layout
- Possibility to extend the number of filegroups, if needed
Database layout
Daily data flow
Clean up process
Clean up process.
What about the performance? Some results from 30 queries executed.
Results in detail
A sample of the data used
Is there something you want to ask? Q & A