
1 OLTP on Hardware Islands Danica Porobic, Ippokratis Pandis*, Miguel Branco, Pınar Tözün, Anastasia Ailamaki Data-Intensive Application and Systems Lab, EPFL *IBM Research - Almaden

2 Hardware topologies have changed
- CMP: low core-to-core latency (~10ns)
- SMP of CMPs: variable latency (10-100ns)
- SMP: high latency (~100ns)
Variable latencies affect performance & predictability. Should we favor shared-nothing software configurations?

3 The Dreaded Distributed Transactions
Shared-nothing OLTP configurations need to execute distributed transactions, which incur:
- Extra lock holding time
- Extra system calls for communication
- Extra logging
- Extra 2PC logic/instructions
At least shared-everything provides robust performance, with some opportunities for constructive interference (e.g. Aether logging). Shared-nothing configurations are not a panacea!
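The per-transaction overheads listed above can be made concrete with a toy two-phase commit (2PC) sketch. This is a hedged illustration only: the `Participant` class and message counting are ours, not Shore-MT's API.

```python
# Toy two-phase commit, counting the extra messages and log records a
# shared-nothing system pays for each multisite transaction.

class Participant:
    def __init__(self, name):
        self.name = name
        self.log = []          # stand-in for the recovery log

    def prepare(self, xct_id):
        self.log.append(("prepare", xct_id))   # extra log record
        return True                            # vote yes

    def commit(self, xct_id):
        self.log.append(("commit", xct_id))    # second extra record

def two_phase_commit(xct_id, participants):
    """Return (committed, number_of_messages_exchanged)."""
    msgs = 0
    # Phase 1: coordinator asks every participant to prepare.
    votes = []
    for p in participants:
        msgs += 2                  # prepare request + vote reply
        votes.append(p.prepare(xct_id))
    if not all(votes):
        return False, msgs
    # Phase 2: coordinator broadcasts the decision.
    for p in participants:
        msgs += 2                  # decision + acknowledgement
        p.commit(xct_id)
    return True, msgs

sites = [Participant(f"island-{i}") for i in range(4)]
ok, msgs = two_phase_commit("xct-1", sites)
# A 4-site transaction already costs 16 messages and 8 extra log writes.
```

Even in this stripped-down model, the messaging and logging grow linearly with the number of participating instances, which is exactly the cost the slide attributes to shared-nothing configurations.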

4 Deploying OLTP on Hardware Islands
Hardware + workload -> optimal OLTP configuration
Shared-everything:
+ Single address space, easy to program
- Random read-write accesses
- Many synchronization points
- Sensitive to latency
Shared-nothing:
+ Thread-local accesses
- Distributed transactions
- Sensitive to skew

5 Outline
- Introduction
- Hardware Islands
- OLTP on Hardware Islands
- Conclusions and future work

6 Multisocket multicores
Communication latency varies significantly:
- Core to its own L1/L2: <10 cycles
- Between cores on a socket, via the shared L3: ~50 cycles
- Across sockets, via inter-socket links and memory controllers: ~500 cycles

7 OS placement of application threads
- Counter microbenchmark and TPC-C Payment, on 4-socket x 6-core and 8-socket x 10-core machines
- Default OS placement is unpredictable; "spread" and "island" placements differ by 39-47%
- Islands-aware placement matters
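A minimal sketch of what "islands-aware placement" could mean in practice, assuming a topology like the paper's 4-socket x 6-core machine. The helper and its slot numbering are hypothetical, not an OS API:

```python
# Pack each instance's worker threads onto cores of a single socket
# ("island") instead of letting them straddle sockets.

def island_placement(n_sockets, cores_per_socket, threads_per_instance):
    """Map instance id -> list of (socket, core) slots, never crossing
    a socket boundary (assumes an instance fits on one socket)."""
    assert threads_per_instance <= cores_per_socket
    slots = [(s, c) for s in range(n_sockets) for c in range(cores_per_socket)]
    placement, i, inst = {}, 0, 0
    while i + threads_per_instance <= len(slots):
        group = slots[i:i + threads_per_instance]
        if group[0][0] != group[-1][0]:
            # Group would straddle two sockets: restart at the next socket.
            i = group[-1][0] * cores_per_socket
            continue
        placement[inst] = group
        inst += 1
        i += threads_per_instance
    return placement

p = island_placement(4, 6, 6)   # one 6-thread instance per socket
```

On Linux, the resulting (socket, core) slots would then be applied with something like `os.sched_setaffinity`; the sketch only computes the mapping.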

8 Impact of sharing data among threads
- Counter microbenchmark and TPC-C Payment (local-only), on 4-socket x 6-core and 8-socket x 10-core machines (log scale)
- Speedups from reducing sharing: 18.7x and 516.8x on the counter microbenchmark, 4.5x on Payment
- Fewer sharers lead to higher performance

9 Outline
- Introduction
- Hardware Islands
- OLTP on Hardware Islands
  - Experimental setup
  - Read-only workloads
  - Update workloads
  - Impact of skew
- Conclusions and future work

10 Experimental setup
- Shore-MT
  - Top-of-the-line open-source storage manager
  - Enabled shared-nothing capability
- Multisocket servers
  - 4-socket, 6-core Intel Xeon E7530, 64GB RAM
  - 8-socket, 10-core Intel Xeon E7-L8867, 192GB RAM
  - Disabled hyper-threading
- OS: Red Hat Enterprise Linux 6.2, kernel 2.6.32
- Profiler: Intel VTune Amplifier XE 2011
- Workloads: TPC-C, microbenchmarks

11 Microbenchmark workload
- Single-site version
  - Probe/update N rows from the local partition
- Multi-site version
  - Probe/update 1 row from the local partition
  - Probe/update N-1 rows uniformly from any partition
  - Partitions may reside on the same instance
- Input size: 10,000 rows/core
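The access pattern above can be sketched as follows. This is our reconstruction of the workload's shape; the function name and interface are ours, not the paper's actual harness.

```python
# Generate the list of partitions a microbenchmark transaction touches.
import random

def make_xct(n_rows, local_part, n_parts, multisite, rng):
    if not multisite:
        # Single-site: all N rows come from the local partition.
        return [local_part] * n_rows
    # Multi-site: 1 local row, N-1 rows uniformly from any partition
    # (which may include the local instance again).
    return [local_part] + [rng.randrange(n_parts) for _ in range(n_rows - 1)]

rng = random.Random(42)
local = make_xct(10, local_part=0, n_parts=24, multisite=False, rng=rng)
multi = make_xct(10, local_part=0, n_parts=24, multisite=True, rng=rng)
```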

12 Software system configurations
- 1 Island: shared-everything
- 4 Islands
- 24 Islands: shared-nothing

13 Increasing % of multisite xcts: reads
- Shared-everything: contention for shared data
- Shared-nothing: no locks or latches, but messaging overhead
- Coarser-grained configurations need fewer messages per transaction
Takeaway: finer-grained configurations are more sensitive to distributed transactions

14 Where are the bottlenecks? Read case
- Configuration: 4 Islands, 10 rows per transaction
- Communication overhead dominates

15 Increasing size of multisite xct: read case
- Physical contention grows with more instances per transaction
- Costs flatten once all instances are involved in a transaction
Takeaway: communication costs rise until all instances are involved in every transaction
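The flattening can be seen with a back-of-the-envelope model (ours, not a result from the paper): drawing k rows uniformly over n partitions touches n * (1 - (1 - 1/n)^k) distinct instances in expectation, which saturates at n for large k.

```python
# Expected number of distinct instances a k-row uniform multisite
# transaction touches, out of n partitions.

def expected_instances(n_parts, k_rows):
    return n_parts * (1 - (1 - 1 / n_parts) ** k_rows)

few = expected_instances(4, 10)    # 4 Islands: ~3.8 of 4 instances already
many = expected_instances(24, 10)  # 24 Islands: ~8.3 of 24, still climbing
```

With 4 Islands, a 10-row transaction already involves nearly every instance, so growing the transaction further adds little communication; with 24 Islands the cost keeps rising much longer.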

16 Increasing % of multisite xcts: updates
- No latches (shared-nothing), but two rounds of messages
- Extra logging
- Locks held longer
Takeaway: distributed update transactions are more expensive

17 Where are the bottlenecks? Update case
- Configuration: 4 Islands, 10 rows per transaction
- Communication overhead dominates

18 Increasing size of multisite xct: update case
- Efficient logging with Aether*
- More instances per transaction increase contention
Takeaway: shared-everything exposes constructive interference
*R. Johnson, et al: Aether: a scalable approach to logging, VLDB 2010

19 Effects of skewed input
- Shared-nothing: few instances are highly loaded
- 4 Islands: still few hot instances, but larger instances can balance load
- Shared-everything: contention for hot data
Takeaway: 4 Islands effectively balance skew and contention
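The skew effect can be illustrated with a small simulation (ours, not the paper's experiment): hash-partition Zipf-skewed requests across instances and measure how overloaded the hottest instance is relative to a perfectly balanced split.

```python
# Overload factor of the hottest instance under Zipf-skewed requests:
# 1.0 means perfectly balanced; larger means hot spots.
import random

def overload_factor(n_instances, n_keys=240, n_reqs=10000, seed=7):
    rng = random.Random(seed)
    weights = [1.0 / (k + 1) for k in range(n_keys)]  # Zipf(1) popularity
    load = [0] * n_instances
    for key in rng.choices(range(n_keys), weights=weights, k=n_reqs):
        load[key % n_instances] += 1   # route each key to one instance
    return max(load) / (n_reqs / n_instances)

balanced = overload_factor(1)    # one big instance: perfectly balanced
islands = overload_factor(4)     # a few larger islands: mild imbalance
tiny = overload_factor(24)       # many tiny instances: hot keys pile up
```

Larger instances absorb the hot keys alongside many cold ones, which is the balancing effect the slide attributes to the 4-Island configuration.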

20 OLTP systems on Hardware Islands
- Shared-everything: stable, but non-optimal
- Shared-nothing: fast, but sensitive to workload
- OLTP Islands: a robust middle ground
  - Runs on close cores
  - Small instances limit contention between threads
  - Few instances simplify partitioning
- Future work:
  - Automatically choose and set up the optimal configuration
  - Dynamically adjust to workload changes
Thank you!

