1
Quantitative Performance Analysis Joe Chang jchang6@yahoo.com www.sql-server-performance.com/joe_chang.asp
2
Objectives
Estimate DB performance early
  What design/architecture decisions impact performance?
  Is the project/architecture feasible?
Production database performance tuning
  Reduces guesswork, but there are easier ways
Server Performance Characteristics
  Processor Architecture: Pentium III – Xeon – Itanium 2 – Opteron
  System Architecture: 2, 4, 8, 16-way, etc.
3
SPEC CPU 2000 Integer
Pentium M 2.0GHz is 90nm; all others are 130nm
Xeon 2.4GHz/512K base: 913 (used later in this presentation)
Pentium III Xeon 700MHz/2M: 431
4
TPC-C Performance – SQL Server
Xeon 3.2GHz/2M & Xeon MP 3.0GHz/4M
# CPUs  System          tpm-C    $/tpm-C  Mem(GB)  # Disks
1       HP (3.2GHz/2M)  35,030   $1.88    1243+4
2       HP (3.2GHz/2M)  60,364   $3.51    12       280+10
4       IBM x365        102,667  $3.52    32       266
8       IBM x445        156,195  $4.31    64       616
16      Unisys ES7000   237,869  $5.08    64       700+10
32      Unisys ES7000   304,148  $6.18    64       1092+12
Opteron 2.2GHz/1M
# CPUs  System          tpm-C    $/tpm-C  Mem(GB)  # Disks
4       HP DL585        115,110  2.62     32       295+8
IA-32 limited max memory (64GB), AWE overhead, bus architecture
5
Itanium 2 (SQL Server) vs IBM Power 5
IBM Power 4+/5 1.9GHz
# CPUs  System            tpm-C      $/tpm-C  Mem(GB)  # Disks
4       Pwr5 570 /Oracle  194,391    $5.62    128      432+16
8       Pwr5 570 /UDB     429,900    $4.99    256      880+40
16      Pwr5 570 /UDB     809,144    $4.95    512      1600+40
32      Pwr5 695 /Oracle  1,601,785  $5.27    1024     3200+96
64      Pwr5 695 /UDB     3,210,540  $5.19    2048     6400+140
Itanium 2 1.5GHz/6M
# CPUs  System            tpm-C      $/tpm-C  Mem(GB)  # Disks
4       HP rx5670         121,065    $4.49    64       448+20
8       Bull              175,366    $4.54    128      225
16      Unisys ES7000     309,037    $4.49    128      770+24
32      NEC Express       577,531    $7.74    512      1150
64      HP Superdome      786,646    $6.49    512      1792+60
6
Scaling
$P_n / P_1 = S^{\log_2 n}$
$P_n$: performance with n processors
$P_1$: performance with 1 processor
$S$: scale factor
$n$: number of processors
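The scaling relation can be evaluated directly. The following is a minimal sketch, not part of the original slides: the function names and the scale factor of 1.7 are illustrative assumptions, chosen only to show how performance relative to one processor grows each time the processor count doubles.

```python
import math


def speedup(scale_factor: float, n: int) -> float:
    """Return P_n / P_1 = S ** log2(n) for n processors."""
    return scale_factor ** math.log2(n)


def fitted_scale_factor(p_n: float, p_1: float, n: int) -> float:
    """Invert the relation: S = (P_n / P_1) ** (1 / log2(n)), valid for n > 1."""
    return (p_n / p_1) ** (1.0 / math.log2(n))


if __name__ == "__main__":
    # S = 1.7 is a hypothetical value, not a measurement from the slides.
    for n in (1, 2, 4, 8, 16):
        print(n, round(speedup(1.7, n), 2))
```

With S = 1.7, each doubling of the processor count multiplies throughput by 1.7, so four processors give 1.7 squared, roughly 2.89 times the single-processor result.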
7
Unit of Measure – CPU-Cycles
Query costs are measured in CPU-cycles (alternative: CPU-sec)
Cost = Runtime (sec) × CPU Util. × Available CPU-cycles ÷ Iterations
Available CPU-cycles = Number of CPUs × Frequency
  Example: 4 × 700MHz = 2.8B cycles/sec
CPU-cycles does not imply CPU instructions; the unit of time is one CPU clock period
  1GHz CPU: time unit = 1ns
  2GHz CPU: time unit = 0.5ns
All tests on Windows 2000/2003, SQL Server 2000 SP1-3
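The cost formula on this slide translates into a few lines of arithmetic. The sketch below is illustrative only: the function names and the sample run (10 seconds at 90% CPU utilization, 100,000 calls) are hypothetical, while the 4 × 700MHz system matches the slide's example.

```python
def available_cycles_per_sec(num_cpus: int, frequency_hz: float) -> float:
    """Available CPU-cycles per second = number of CPUs x clock frequency."""
    return num_cpus * frequency_hz


def cost_in_cycles(runtime_sec: float, cpu_util: float, num_cpus: int,
                   frequency_hz: float, iterations: int) -> float:
    """Cost = runtime x CPU utilization x available CPU-cycles / iterations."""
    available = available_cycles_per_sec(num_cpus, frequency_hz)
    return runtime_sec * cpu_util * available / iterations


if __name__ == "__main__":
    # 4 x 700MHz gives 2.8 billion cycles per second, as on the slide.
    print(available_cycles_per_sec(4, 700e6))
    # Hypothetical load test: 10 s at 90% utilization, 100,000 calls.
    print(cost_in_cycles(10.0, 0.9, 4, 700e6, 100_000))
```

For the hypothetical run above, the result is about 252,000 CPU-cycles per call.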
8
CPU-Cycles dependencies
CPU-cycles on one processor architecture have no relation to those on another (e.g., Pentium III, Pentium 4, Itanium, Opteron)
Some platform dependencies – cache size, bus speed, SMP
Processor  System/Cache/Mem     Performance  Cost / trans.
Itanium 2  4 x 1.5GHz/6M /64G   121,065      2.974M
Xeon       4 x 3.0GHz/4M /32G   102,667      7.013M
Opteron    4 x 2.2GHz/1M /32G   105,687      4.996M
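The Cost / trans. figures appear consistent with dividing the CPU-cycles available per minute by the tpm-C result. The slide does not state this derivation, so the sketch below is an assumption: it takes the Performance column to be tpm-C and assumes the processors are fully utilized. The function name is hypothetical.

```python
def cycles_per_transaction(num_cpus: int, frequency_hz: float,
                           tpm_c: float, cpu_util: float = 1.0) -> float:
    """CPU-cycles per transaction = cycles available per minute / tpm-C.

    Assumes (not stated on the slide) that throughput is in transactions
    per minute and that all CPUs run at the given utilization.
    """
    cycles_per_minute = num_cpus * frequency_hz * 60.0 * cpu_util
    return cycles_per_minute / tpm_c


if __name__ == "__main__":
    systems = {
        "Itanium 2": (4, 1.5e9, 121_065),
        "Xeon": (4, 3.0e9, 102_667),
        "Opteron": (4, 2.2e9, 105_687),
    }
    for name, (cpus, freq, tpm) in systems.items():
        cost = cycles_per_transaction(cpus, freq, tpm)
        print(f"{name}: {cost / 1e6:.3f}M cycles per transaction")
```

Under those assumptions the three rows of the table are reproduced to three decimal places.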
9
Cost Structure - Model
Stored Procedure Call Cost = RPC cost (once per procedure) + Type cost (once per procedure?) + Query costs (one or more per procedure)
Query – one or more components
Component Cost = Cost for component operation base + Cost per additional row or page
Only stored procedures are examined
10
RPC Cost
Cost of RPC to SQL Server includes: 1) Network roundtrip 2) SQL Server handling costs
Calls to SQL Server are made with RPC (not SQL Batch)
  Profiler -> Event Class: RPC
  ADO.NET Command Type Stored Procedure, or Text with parameters
  Command Text without parameters: SQL Batch
11
Type Cost?
Blank Procedure: ~250,000 CPU-cycles
  CREATE PROC p_ProcBlank AS RETURN
Proc with Single Query: ~320K CPU-cycles
  CREATE PROC p_ProcSingleQuery AS SELECT … FROM TableA WHERE ID = @ID
Proc with Two Queries: ~360K CPU-cycles
  CREATE PROC p_ProcTwoQueries AS SELECT … FROM TableA WHERE ID = @ID SELECT … FROM TableB WHERE ID = @ID
RPC Cost ~250K CPU-cycles, Type Cost ~30K CPU-cycles, Query Cost ~40K CPU-cycles
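The three measurements on this slide are enough to separate the RPC, type, and query costs by subtraction. The sketch below shows one way to do that; it assumes the two queries cost the same and that the type cost is charged once whenever a procedure contains at least one query, which the slide itself marks as uncertain. The function names are hypothetical.

```python
def decompose(blank: int, single: int, double: int) -> tuple[int, int, int]:
    """Split measured procedure costs into (rpc, type_cost, query) components.

    blank  -- cost of a procedure with no query
    single -- cost of a procedure with one query
    double -- cost of a procedure with two similar queries
    """
    rpc = blank                          # a blank procedure is pure RPC overhead
    query = double - single              # the second query adds only its own cost
    type_cost = single - blank - query   # remainder is the one-time type cost
    return rpc, type_cost, query


def procedure_cost(n_queries: int, rpc: int, type_cost: int, query: int) -> int:
    """Model: RPC cost + type cost (once, if any query) + cost per query."""
    if n_queries == 0:
        return rpc
    return rpc + type_cost + n_queries * query


if __name__ == "__main__":
    rpc, type_cost, query = decompose(250_000, 320_000, 360_000)
    print(rpc, type_cost, query)
    print(procedure_cost(2, rpc, type_cost, query))
```

Applied to the slide's figures, the subtraction gives 250K, 30K, and 40K CPU-cycles, matching the summary line.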
12
Blank RPC Performance
All systems with 2 processors, Xeon 3.2GHz with 800MHz bus
RPCs are expensive relative to a simple query
Chart values (costs in CPU-cycles): 120K, 270K, 240K, 260K, 195K, 154K, 140K
13
RPC Cost (costs in CPU-cycles)
Processor  PIII  PIII X  Xeon  Opteron  Itanium 2*  It2   It2
CPUs       2     4       2     2        2           4     8
RPC cost   140K  200K    250K  140K     155K        290K  350K
270K (2.xx driver)
Type Cost, Select: 20-30K  ~5K  35-55K  ~20K  ~8K
Systems:
  Pentium III: 2x 600MHz/256K, 2x 733MHz/256K
  PIII Xeon: 2x 500MHz/2M, 4x 700MHz/2M, 4x 900/2M
  Xeon (P4): 2x 2.0GHz/512K, 2x 2.4GHz/512K
  Opteron: 2x 2.2GHz/1M
  Itanium 2: 2x 900MHz/1.5M, 8x 1.5GHz/6M
OS: W2K, W2K3, various sp; SQL Server 2000, various sp
PIII: Intel PRO/100+; others: Broadcom Gigabit Ethernet driver 5.xx+
*Itanium 2 system booted with 2, 4 or 8 processors (4P config may have had procs from more than 1 cell)
14
RPC Cost – Fiber versus Threads (costs in CPU-cycles)
PIII Xeon - TCP   1P    2P    4P
FE-Thread         105K  150K  200K
FE-Fiber          95K   120K  170K
Xeon - TCP
GE-Thread         210K  250K
GE-Fiber          200K  230K
Xeon - VI
VI Thread         190K
VI Fiber          160K  180K
Itanium 2 - TCP   1P    2P    4P    8P
Thread            105K  155K  290K  350K
Fiber             95K   145K  260K  300K
Broadcom Gigabit Ethernet driver 5.xx, 6.xx, 7.xx (270K for 2P 2.xx driver)
VI: QLogic QLA2350, drivers: qla2300 8.2.2.10, qlvika 1.1.3
15
RPC Cost TCP vs Named Pipes (costs in CPU-cycles)
PIII Xeon 4P      TCP   Named Pipes
FE-Thread         200K  315K
FE-Fiber          170K  370K
Xeon, Thread      1P    2P
GE, TCP           210K  250K
GE, Named Pipes   320K  360K
Broadcom Gigabit Ethernet driver 5.xx, 6.xx, 7.xx (270K for 2P 2.xx driver)
VI: QLogic QLA2350, drivers: qla2300 8.2.2.10, qlvika 1.1.3
16
RPC Costs – owner, case (costs in CPU-cycles)
           PIII  PIII X  P4/Xeon
RPC cost   140K  140K?   250K
sp_executesql: 210K
Unspecified owner, e.g., user1 calls a procedure owned by dbo
  +100K on 4P PIII, +100K on 2P Xeon, +300K on 8P Itanium 2
Case mismatch, e.g., actual procedure p_Get_Rows called as p_get_rows
  +100K on 4P PIII, +150K on 2P Xeon, +300K on 8P Itanium 2
17
Single Row Select Costs Clustered Index Seek Does cost depend on index depth? Role of I/O count in cost Index Seek with Bookmark Lookup Does cost depend on table type? Heap versus clustered index Table Scan
18
Single Row Index Seek vs Index Depth
Index depth: 2 for both indexes at 50K rows; 3 clustered and 2 nonclustered at 200K rows; 3 at 500K rows
Pentium III 2x733MHz
19
Batching Queries into a Single RPC Each query: clustered index seek returning 1 row
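The benefit of batching follows from the fixed costs identified earlier in the deck. The sketch below compares issuing n single-row queries as n separate stored procedure calls against one call containing all n queries. It reuses the approximate 2P Xeon figures from the Type Cost slide (RPC ~250K, type ~30K, query ~40K CPU-cycles) and assumes each query costs the same and the type cost is paid once per procedure; the function names are hypothetical.

```python
RPC_COST = 250_000    # approximate cost per RPC, from the Type Cost slide
TYPE_COST = 30_000    # assumed to be charged once per procedure
QUERY_COST = 40_000   # approximate cost per single-row index seek query


def separate_calls_cost(n_queries: int) -> int:
    """Each query is sent in its own RPC and pays every fixed cost again."""
    return n_queries * (RPC_COST + TYPE_COST + QUERY_COST)


def batched_call_cost(n_queries: int) -> int:
    """All queries share one RPC, so fixed costs are paid only once."""
    return RPC_COST + TYPE_COST + n_queries * QUERY_COST


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, separate_calls_cost(n), batched_call_cost(n))
```

Under these assumptions, five queries cost about 1.6M CPU-cycles as separate calls but about 480K when batched into a single RPC.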
20
Single Row Cost Summary
Index depth
  Plan cost – no; true cost – yes
  Cost versus index depth not fully examined
  Fill factor dependence not tested
Bookmark Lookup – table type
  Plan cost – no
  True cost higher for clustered index than heap
21
Multi-row Select Queries
Queries return aggregates - not a test of network bandwidth
Single Table Example
  SELECT @Count = count(*), @Value1 = AVG(randDecimal)
  FROM M3C_01 WHERE GroupID = @ID
  1 Count, 1 AVG on 5 byte decimal
Join Example
  SELECT count(*), AVG(m.randDecimal), min(n.randDecimal)
  FROM M2A_02 m INNER LOOP JOIN M2B_02 n ON n.ID = m.ID
  WHERE m.GroupID = @ID AND n.GroupID = @ID
  1 Count, 2 AVG on 5 byte decimal
Table Scan tests return either
  - a single row
  - 1 Count, 1 aggregate on 5 byte decimal
22
Index Seek – Aggregates Xeon 2x3.06GHz/512K
23
Multi-row Bookmark Lookups Table lock escalation at >5K rows ? 2x2.2GHz/1M Opteron 1 Count, 1 AVG on decimal
24
Table Scan Xeon 3.2GHz/800MHz FSB HT
25
Table Scan – Component Cost
Total cost = RPC + Type + Base + per page costs
Measured Table Scan cost formulas for 99 rows per page (costs in CPU-cycles)
Columns: PIII (X), P4/Xeon, Opteron, It2 2P, It2 8P
  Type + Base cost: 60K/40K  145K  35K  90K
  Cost per page
    NOLOCK: 24K  -16K  23-35K
    TABLOCK: 24K  25K  16K
    PAGLOCK: 26K  26K  20K  17K  23-35K
    ROWLOCK: 140K  250K  110K  100K  150K
Table Scan or Index Scan – Plan Formula
  I/O: 0.0375785 + 0.0007407 per page
  CPU: 0.0000785 + 0.0000011 per row
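The plan formula and the measured cost model on this slide can be compared side by side. The sketch below is a rough illustration with hypothetical function names. It applies the per-page and per-row terms to every page and row, which is an assumption because the slide does not say whether the first page and row are covered by the base terms. The default measured values are read as the 2P Xeon figures (RPC ~250K, Type + Base 145K, PAGLOCK 26K per page), and that column reading is also an assumption.

```python
def scan_plan_cost(pages: int, rows: int) -> float:
    """Optimizer plan cost for a table or index scan, per the slide's formula."""
    io_cost = 0.0375785 + 0.0007407 * pages    # I/O component
    cpu_cost = 0.0000785 + 0.0000011 * rows    # CPU component
    return io_cost + cpu_cost


def scan_measured_cycles(pages: int, rpc: int = 250_000,
                         type_and_base: int = 145_000,
                         per_page: int = 26_000) -> int:
    """Measured cost model: RPC + Type + Base + per-page cost, in CPU-cycles.

    Defaults are assumed 2P Xeon values with PAGLOCK.
    """
    return rpc + type_and_base + per_page * pages


if __name__ == "__main__":
    rows_per_page = 99  # as stated on the slide
    for pages in (1, 10, 100, 1000):
        rows = pages * rows_per_page
        print(pages, round(scan_plan_cost(pages, rows), 6),
              scan_measured_cycles(pages))
```

The plan cost is in the optimizer's own units, while the measured model is in CPU-cycles, so only the growth with page count is comparable between the two.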
26
Table Scan Cost per Page 2x733MHz/256KB Pentium III
27
Bookmark – Table Scan Cross-over Plan Costs: ≤1GB 2x733MHz/256K Pentium III
28
Loop, Hash and Merge Join Loop joins: Case 1, 2, 3 etc Hash joins: in-memory versus others Merge: regular versus many-to-many Merge join with sort operations Locks – page lock
29
Joins – Locking Granularity Xeon 2x2.0GHz/512K Count+2 AVG(decimal) Hash join spools to tempdb
30
Loop, Hash & Merge Joins Xeon 2x3.06GHz/512K(Count)
31
Processor Architecture Pentium III, Xeon Opteron, Itanium 2
32
Index Seek COUNT(*) + AVG(Decimal)
33
Bookmark Lookup Heap, Page Lock, sequential rows 1 Count, 1 AVG on decimal
34
Table Scan NOLOCK
35
Hyper-Threading Xeon 3.06GHz/512K (130nm) vs. Xeon 3.2GHz/1M & 800MHz FSB (90nm)
36
Hyper-Threading CPU linear with throughput CPU non-linear with throughput Exercise caution in extrapolating system performance based on low CPU measurements
37
Index Seek – Single Row
38
Index Seek - Multiple Rows
39
Table Scan - NOLOCK Xeon, Xeon/800 NOLOCK
40
Bookmark Lookup – Cl. IX, Seq, PL 1 Count, Clustered Index, Sequential rows, Page Lock
41
Bookmark Lookup – Heap, Seq, PL 1 Count, Heap, Sequential rows, Page Lock
42
Loop Join
43
Hash Join
44
Merge Join
45
Hyper-Threading Summary
Lower cost RPC
Improves performance of low cost queries
May improve performance on some ops
Degrades performance on others
Xeon 800MHz FSB with 90nm core better than 130nm
46
Scaling Itanium 2, 1.5GHz/6M, 1, 2, 4, 8, 16 CPUs
47
Scaling
First System: HP rx7620, 8-way Itanium 2 1.5GHz/6M
  Boot with /NUMPROC = xx
  Query: Count + 1 or 2 AVG(Decimal)
Second System: Unisys ES7000 Aries 420, 16-way Itanium 2 1.5GHz/6M
  Booted with all 16 CPUs, Affinity Mask = 1, 2, …
  Query: Count only
SQL Server 2000, Service Pack 3 (build 760)
48
Index Seek 9X difference between 16P and 1P
49
Bookmark Lookup 1-way Minor differences between Table org & lock granularity
50
Bookmark Lookup – 16 way Major differences! 100X !!
51
Bookmark Lookup – Clustered Index Really bad news beyond 2P, especially bad ~5K rows
52
Bookmark Lookup – Cl. IX, PL No scaling beyond 2P, but no disaster at 5K rows
53
Bookmark Lookup – Heap Performance degradation at 5K rows
54
Bookmark Lookup – Heap Page Lock Excellent scaling, 8P anomaly?
55
Joins – 1P Up to 10X difference between loop, hash & merge
56
Joins – 16P Up to 100X difference between loop, hash & merge
57
Loop Join No scaling beyond 2P
58
Hash Join Scales to 8P, peaks at ~5K rows
59
Merge Join Scales well all around
60
Table Scans Questionable scaling, peaks at 300 pages, cache size ?
61
Scaling Summary
Avoid Bookmark Lookups on tables with clustered indexes
  Use heap tables
  Use NOLOCK, especially at ~5000 rows / query
Avoid Loop Joins
  Design for Merge or Hash Joins
Avoid table scans (Itanium only?)
62
Bookmark HP 32P, SP3 build 760/818
63
Bookmark HP 32P, SP4 build 2000?
64
Joins HP 32P, SP3 build 760/818
65
Joins HP 32P, SP4 build 2000?
66
Table Scan HP 32P,
67
Bookmark Lookup (PL Sequential)
68
Index Seek
69
Loop Join
70
Hash Join
71
Merge
72
Joins HP 32P, SP4 build 2000?
74
Database Design for Performance
Round-trip minimization – RPC cost
Row count management – cost per row
Indexes – quickly isolate queried rows to a limited number of adjacent pages, rather than putting the most selective columns first
Design for low cost operations
  Covered index instead of bookmark lookups
  Merge joins, indexed views
Avoid excessive logic
NOLOCK on non-transactional data
75
Statistics Accuracy & Relevance
More than keeping statistics up to date
Data queried needs to reflect data in table
Avoid populating the database with test data that has a different distribution than live data
76
Performance Issues
NC Index Seek + Bookmark Lookup vs. Table Scan
  Query optimizer switches to table scan too soon for in-memory data, too late for disk-bound data
  Watch table scan lock level
Row count: plan versus actual cost issues
  May be related to WHERE clause conditions
Lock hints
Merge and Hash joins vs. Loop joins
Fixed costs favor consolidation
  Both in RPC and queries
77
Summary
Query cost structure
  Fixed costs identified
  Costs applied once per RPC
Component operations examined
  Base cost and cost per row or page
Lock hints – Row, Page, Table, No Lock
78
Additional Information
www.sql-server-performance.com/joe_chang.asp
  SQL Server Quantitative Performance Analysis
  Server System Architecture
  Processor Performance
  Direct Connect Gigabit Networking
  Parallel Execution Plans
  Large Data Operations
  Transferring Statistics
  SQL Server Backup Performance with Imceda LiteSpeed
jchang6@yahoo.com
79
Quantitative Performance Analysis Backup Slides
80
Subjects
Cost measurement
  Test procedure, unit of measure
Query Costs
  Single row, table scan, multi-row, joins
Discrepancies with plan costs
  Logical I/O, Lock Hints, Conditions
Database design implications
  Design, coding and indexes
81
Test Procedure
Load generator
  Drive DB server CPU utilization as high as possible (80-100%)
Multiple load generators running the same query
  A single thread will not fully load the DB server
  Network propagation time is significant
  Most queries run on a single processor
  Server may have more than one processor
Component operation cost derived
  Comparing two queries differing in one op
82
Index Seek by Aggregates
SELECT COUNT(*)
SELECT COUNT(*), CONVERT int to bigint, etc
SELECT COUNT(*), SUM(int)
SELECT COUNT(*), SUM(Money)
83
Join Cost Linearity on 32-bit Hash join cost per row is somewhat linear up to ~5M rows Duration then jumps due to disk IO
84
Join Cost Linearity on Itanium 2 Loop & Merge join cost per row independent of row count Hash join cost per row is not (no spooling to temp)
85
Hash Join – row size Optimizer cost: depends on OS size, not IS size Actual cost: depends on # of columns and OS/IS
86
Type + Component Base Cost (costs in CPU-cycles)
Processor:     PIII  PIII X  Xeon  Opt  Itanium 2
# of CPUs:     2  4  2  2  2  4  8
Frequency:     ~733  700  2-2.4  2.2  900M  1.5G
Cache:         256K  2M  512K  1M  1.5M  6M
Aggregate:     100K  40K  135K  45K  35K  66K
Table Scan:    60K  40K  145K  50K  35K  40K  90K
Loop Join:     110K  24K  130K  50K  45K  10K  15K
Hash Join:     500K  250K  620K  250K  225K  210K  320K
Merge Join:    150K  54K  170K  80K  75K  55K  110K
Merge + Sort:  145K  320K  140K  230K
Many-to-Many:  250K  550K  200K  320K
Big cache lowers base cost?
87
Cost per Row (costs in CPU-cycles)
Processor:             PIII  Xeon  Opt  Itanium 2 (2P, 4P, 8P)
Index Seek:            1.3K  1.8K  1.1K  1.0K
Bookmark LU (CL):      16K  20K  12K  11K  16K  27K
BL (CL) page lock:     11K  15K  8K  13K  18K
Bookmark LU (Heap):    11K  14K  9K  10K  11K  16K
BL (Heap) page lock:   7K  9K  5K  6K  8K
Loop Join:             16K  25K  13K  11K  16K  23K
Loop Join, page lock:  13K  20K  11.5K  9K  14K  21K
Hash Join:             8.5K  11K  7.5K  7K
Hash Join, page lock:  6.5K  8K  6.0K  5K
Merge Join:            6.5K  10K  5.0K  6K
Merge Join, page lock: 3.0K  4K  2.5K  3K
Merge+Sort:            7.5K  9K  6K  7K
Many-To-Many PL:       32K  40K  18K  31K
88
IUD Cost Structure
                  2xXeon*   Notes
RPC cost          240,000   Higher for threads, owner m/m
Type Cost         130,000   once per procedure
IUD Base          170,000   once per IUD statement
Single row IUD    300,000   Range: 200,000-400,000
Multi-row Insert
  Cost per row    90,000    cost per additional row
INSERT, UPDATE & DELETE cost structure very similar
Multi-row UPDATE & DELETE not fully investigated
*Use Windows NT fibers on
89
INSERT Cost Structure
Index and Foreign Key not fully explored
Early measurements:
  50-70,000 per additional index
  50-70,000 per foreign key
Assumes inserts into “sequential” segments
  Cluster on Identity or Cluster Key + Identity
  Cluster on row GUID can cause substantial disk loading!