Published by Cleopatra Rose. Modified over 9 years ago.
2
CapEx + OpEx; OpEx
3
Pipelines: Sources (SQL Server); Transformations (Lookups, Full Blockers); Destinations (Partitioned Tables)
4
Asynchronous; Synchronous; Blocking; Buffers
6
I've always believed in numbers and the equations and logics that lead to reason. But after a lifetime of such pursuits, I ask, “What truly is logic?” “Who decides reason?” My quest has taken me through the physical, the metaphysical, the delusional -- and back.
7
Late arriving dimensions; Denormalised data; Alternatives; Slowly changing dimensions; One to many lookups
8
Owner: FirstName, MiddleNames, Surname; Item: Name; Collection: Name; ValidFrom; ValidTo
10
4,432,277 rows
11
01:24.672
12
31.547 sec
13
52.141 sec
14
Foreach Row; Send Row on; Dimension partial-cache lookup successful?; Dimension full-cache lookup successful?; Set resolution from full-cache; Set resolution with value in dimension; Add value to dimension; Set resolution from partial-cache; BS6224:1987
15
> 4 MINUTES! Row counts: 4,432,277; 0; 1,401,849; 3,030,428. But there are only 100 Owners!
16
Full cache lookup; Partial cache lookup; OLEDB To Write Record; No match; Match
20
Cache lookup not successful?; Update dimension & cache; Preload cache; Foreach Row; Send Row on; Update Row from cache; BS6224:1987
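The preload-then-update flow on this slide can be sketched roughly as below. This is a minimal illustration, not the presenter's component: `DimensionCache`, `Resolve`, `Misses` and the `addToDimension` callback are hypothetical names, and the callback stands in for whatever database call inserts a new dimension member.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a dimension: maps a natural key to a surrogate id.
public sealed class DimensionCache
{
    private readonly Dictionary<string, int> cache = new Dictionary<string, int>();
    private readonly Func<string, int> addToDimension;

    // Number of rows that were not found in the cache (for illustration only).
    public int Misses { get; private set; }

    // preload: every (natural key, surrogate id) pair already in the dimension.
    // addToDimension: inserts a new member and returns its surrogate id
    // (assumed to be a database call in a real package).
    public DimensionCache(IEnumerable<KeyValuePair<string, int>> preload,
                          Func<string, int> addToDimension)
    {
        foreach (var pair in preload)
        {
            cache[pair.Key] = pair.Value;
        }
        this.addToDimension = addToDimension;
    }

    // Called once per row: a miss updates the dimension and the cache,
    // so each new member costs a single round trip.
    public int Resolve(string naturalKey)
    {
        int id;
        if (!cache.TryGetValue(naturalKey, out id))
        {
            Misses++;
            id = addToDimension(naturalKey);
            cache.Add(naturalKey, id);
        }
        return id;
    }
}
```

With only 100 owners, a cache like this would be expected to miss at most 100 times across the 4,432,277 rows, since every later row for the same owner is served from memory.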
22
Dictionary; Immutable Object; Configuration
23
Inputs
24
Outputs
25
“Clean Code” Robert C. Martin (Uncle Bob)
26
Connection Managers
27
Pre Execute: private readonly Dictionary cache = new Dictionary (); Key's properties set through constructor. Key's properties are READ ONLY {get; private set;}. Need to override GetHashCode() & Equals() of Key.
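A key class following the three rules on this slide might look like the sketch below. The class name `OwnerKey` and its two properties are assumptions made for illustration; the slide text does not show the actual key or the dictionary's type arguments.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical composite key; the property names are illustrative only.
public sealed class OwnerKey : IEquatable<OwnerKey>
{
    // Read-only from the outside: set once, through the constructor.
    public string FirstName { get; private set; }
    public string Surname { get; private set; }

    public OwnerKey(string firstName, string surname)
    {
        FirstName = firstName;
        Surname = surname;
    }

    // Value equality: two keys match when all their properties match.
    public bool Equals(OwnerKey other)
    {
        return other != null
            && FirstName == other.FirstName
            && Surname == other.Surname;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as OwnerKey);
    }

    // Must agree with Equals, otherwise Dictionary lookups silently miss.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (FirstName == null ? 0 : FirstName.GetHashCode());
            hash = hash * 31 + (Surname == null ? 0 : Surname.GetHashCode());
            return hash;
        }
    }
}
```

Because the properties cannot change after construction, the hash code of a key already stored in the dictionary stays stable, which is why immutability matters here.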
33
Pre Execute
34
24.547 sec 22%
35
Pre Execute: private readonly Dictionary > cache = new Dictionary >(); Output is Asynchronous. Output row count per input row = Number of values retrieved from dictionary. Input columns copied to output.
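The one-to-many behaviour described here (one output row per value found for the input key) could be sketched as follows. The `int` key, `string` value and the names `OneToManyLookup`, `Build` and `Expand` are assumptions for illustration; the slide's own type arguments were not preserved.

```csharp
using System.Collections.Generic;

public static class OneToManyLookup
{
    // Build phase (the slide's Pre Execute step): group reference values by key.
    public static Dictionary<int, List<string>> Build(
        IEnumerable<KeyValuePair<int, string>> referenceRows)
    {
        var cache = new Dictionary<int, List<string>>();
        foreach (var row in referenceRows)
        {
            List<string> values;
            if (!cache.TryGetValue(row.Key, out values))
            {
                values = new List<string>();
                cache.Add(row.Key, values);
            }
            values.Add(row.Value);
        }
        return cache;
    }

    // Probe phase: one output row per cached value, carrying the input key along.
    // Keys with no match produce no output rows in this sketch.
    public static IEnumerable<KeyValuePair<int, string>> Expand(
        IEnumerable<int> inputKeys, Dictionary<int, List<string>> cache)
    {
        foreach (var key in inputKeys)
        {
            List<string> values;
            if (cache.TryGetValue(key, out values))
            {
                foreach (var value in values)
                {
                    yield return new KeyValuePair<int, string>(key, value);
                }
            }
        }
    }
}
```

The output has to be asynchronous in SSIS terms because the number of rows leaving the component differs from the number entering it.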
36
Lookup 1 Lookup 2 Union All Default value fx Matched Not matched Matched Not matched Lookup 1 Lookup 2 Default value fx Not matched Matched Merge
37
Lookup 1; Lookup 2; Lookup 3; Coalesce. Each lookup adds its own value. Conditionally apply slow lookups. Select most appropriate result.
38
Distinct; Sort; Aggregate
39
Cache lookup not successful?; Add to cache; Write cache out; Start; End; Create empty cache for rows; Foreach row; Create copy of row; BS6224:1987
40
Cache lookup not successful?; Add to cache; Write cache out; Start; End; Create empty cache for rows; Foreach row; Create copy of row; BS6224:1987. 28%
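A hash-based distinct along the lines of this flowchart might be sketched as below. The name `RowDistinct` is hypothetical, and rows are modelled as any type with value equality; in a real script component each buffer row would first be copied into such an object, since the buffer itself is reused.

```csharp
using System.Collections.Generic;

public static class RowDistinct
{
    public static ICollection<T> Distinct<T>(IEnumerable<T> rows)
    {
        // Create empty cache for rows.
        var cache = new HashSet<T>();

        foreach (var row in rows)
        {
            // Add returns false and leaves the set unchanged when an equal
            // row is already cached, so the lookup and the add are one call.
            cache.Add(row);
        }

        // Write cache out: the set now holds one copy of each distinct row.
        return cache;
    }
}
```

Unlike a sort-based distinct, this holds only the distinct rows in memory, which helps most when duplicates are common.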
41
Is all of the data required? Partitioning is your friend! 4,432,277 rows; 44,323 rows x100
42
Owner Id; Collection Id; Item Id; Valid From; Valid To. Partition Key; Sort Key; Sort Data. SortKey implements IComparable & IComparable. Dictionary points to IEnumerable (1 to Many).
43
BS6224:1987. Start; End; Foreach row; Create Partition Key for row; Create Sort Key for row; Create Sort Data for row; Is partition key of row different to current key?; Write out then clear sorter; Set current key to partition key of row; Add sort key & data to sorter; Write out then clear sorter
44
Pre Execute
public override void Input0_ProcessInput(Input0Buffer buffer)
{
    while (buffer.NextRow()) { ProcessRow(buffer); }
    if (buffer.EndOfRowset()) { WriteoutSorter(); }
}
private readonly SortedDictionary<SortKey, IEnumerable<SortData>> sorter = new SortedDictionary<SortKey, IEnumerable<SortData>>();
private PartitionKey currentPartitionKey = null;
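The slide does not show `ProcessRow` or `WriteoutSorter`, so the following is only a sketch of the partitioned sort idea under stated assumptions: rows are modelled as (partition key, sort key, data) tuples, they arrive already grouped by partition key, and `PartitionedSort`, `Sort` and `Flush` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionedSort
{
    // rows: (partition key, sort key, data), assumed to arrive grouped by partition key.
    public static IEnumerable<Tuple<string, int, string>> Sort(
        IEnumerable<Tuple<string, int, string>> rows)
    {
        // One sort key can map to many data items, hence the list per key.
        var sorter = new SortedDictionary<int, List<string>>();
        string currentKey = null;
        bool hasCurrent = false;

        foreach (var row in rows)
        {
            // Partition changed: write out what has been sorted, then clear.
            if (hasCurrent && row.Item1 != currentKey)
            {
                foreach (var sorted in Flush(currentKey, sorter))
                {
                    yield return sorted;
                }
            }
            currentKey = row.Item1;
            hasCurrent = true;

            List<string> bucket;
            if (!sorter.TryGetValue(row.Item2, out bucket))
            {
                bucket = new List<string>();
                sorter.Add(row.Item2, bucket);
            }
            bucket.Add(row.Item3);
        }

        // End of rowset: write out the last partition.
        foreach (var sorted in Flush(currentKey, sorter))
        {
            yield return sorted;
        }
    }

    private static List<Tuple<string, int, string>> Flush(
        string partitionKey, SortedDictionary<int, List<string>> sorter)
    {
        var output = new List<Tuple<string, int, string>>();
        foreach (var entry in sorter)   // enumerates in ascending key order
        {
            foreach (var data in entry.Value)
            {
                output.Add(Tuple.Create(partitionKey, entry.Key, data));
            }
        }
        sorter.Clear();
        return output;
    }
}
```

Only one partition is held in memory at a time, so the component blocks per partition rather than for the whole input.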
45
Pre Execute: Structure identical to sort except: One to one map between key and data. Data is now MUTABLE - Updated as aggregation progresses.
47
38.890 sec (From 01:24.672) 54%
48
Helping out SQL Server
50
25 HOURS! (250 million rows)
51
Read Keys
52
Partitioned Left Hash Lookup
53
Read Keys; Partitioned Left Hash Lookup; Partitioned Right Hash Lookup
54
Read Keys; Partitioned Left Hash Lookup; Partitioned Right Hash Lookup; Partitioned Sort (to cluster key for data); Read Data (via cluster key); Merge Join. 90 minutes! (250 million rows)
55
Bulk Loading Partitioned Tables P Partitioned Table Data Source Transform
56
Data Source; Switch-in Tables; Physical partitions (sort order); Time based partitioning
59
Data Source; Batch Manager; ActionBlock (multi threaded wrapper around a table writer utilising SqlBulkCopy); BatchBlocks. Configure: Size of batches; Number of writer threads. Legend: TPL component; Custom code
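The BatchBlock and ActionBlock arrangement on this slide could be wired up roughly as below, using the TPL Dataflow library (`System.Threading.Tasks.Dataflow`). This is a sketch, not the presenter's batch manager: `BatchedWriter` and `RunAsync` are hypothetical names, and the `writeBatch` delegate stands in for the table writer, which in the talk wraps SqlBulkCopy.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BatchedWriter
{
    // batchSize and writerThreads are the two settings the slide lists under "Configure".
    // writeBatch may run on several threads at once, so it must be thread safe.
    public static async Task RunAsync<T>(
        IEnumerable<T> rows, int batchSize, int writerThreads, Action<T[]> writeBatch)
    {
        // Groups incoming rows into arrays of batchSize.
        var batcher = new BatchBlock<T>(batchSize);

        // Runs the writer on up to writerThreads batches concurrently.
        var writer = new ActionBlock<T[]>(
            writeBatch,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = writerThreads });

        batcher.LinkTo(writer, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var row in rows)
        {
            await batcher.SendAsync(row);
        }

        batcher.Complete();        // flushes the final, possibly short, batch
        await writer.Completion;   // waits until every batch has been written
    }
}
```

Batch size and writer thread count pull against each other: larger batches mean fewer bulk-copy calls, while more writer threads only help if the target partitions can absorb concurrent loads.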
63
KISS: Keep It Short, Simple. New tools available but one size does not fit all. Utilise partitioning. You have the power.
65
So, it is possible to move from the Asynchronous to(wards) the Synchronous