Presenter: Alexander Kaganov Supervisor: Paul Chow

Name: Presenter: Alexander Kaganov Supervisor: Paul Chow
Uploaded: 2017-07-17T18:29:53+00:00
Duration: PTM15S38
Channel: Alice Wood
Description: Presenter: Alexander Kaganov Supervisor: Paul Chow

Presenter: Alexander Kaganov Supervisor: Paul Chow
Hardware Acceleration of Monte-Carlo Structural Financial Instrument Pricing Using a Gaussian Copula Model Presenter: Alexander Kaganov Supervisor: Paul Chow

Increasing Computational Requirements (1/3)
In recent years the financial industry has seen: 1. Increasing contract/model complexity Every year new models are developed Unavailability of closed-form solution Necessitate Monte-Carlo pricing

2. Increasing portfolio sizes Increase in simple to price instruments Increase in complex derivate security CDO issuance has increased from $157 billion in 2004 to $507 billion in 2007 (>3x)¹ 3xN instruments 3xY time (at least) N instruments Y time ¹ SIFMA

3. Ever-present need to make real-time decisions Market trends can change quickly Instruments traded electronically 1 ms in Latency is Worth $100 M in Stock Trading Business Value (AMD Analyst Day-26 july 2007)

Trends in Financial Monte-Carlo Algorithms
1. Computationally intensive Accuracy Time2 2. Highly repetitive A large portion of the calculation time is spent in a small portion of the code (~90% of the time is spent in ~10% of the code) 3. High degree of coarse and fine-grain parallelism Fine-Grain Coarse-Grain Typical MC Financial simulation

Collateralized Debt Obligation (CDO)

CDO Problem: Solution:
Banks typically hold portfolios with highly volatile assets. Solution: Sell assets to an outside entity (SPV), which combines the different assets together into one collateral pool Repackage the pool as CDO tranches. Sell tranches as form of protection to investors in return for premium payments

CDO Structure/Pricing (1/2)
Each tranche has attachment and detachment points Losses below attachment point → the tranche is unaffected Losses above the detachment point → the tranche becomes inactive Investor premium is paid based on the tranche width minus tranche losses Investors Borrowers Super Senior: 12%-100% Bonds Senior: 6% -12% Loans Collateral Pool (Credit Default Swap) Mezzanine: 3% -6% CDS SPV CDOs Equity: 0% -3% Sponsor (Bank) Tranches

CDO Structure/Pricing (2/2)
Mezzanine Tranche: Detachment (6%) Paid premium based on the full investment Investor Premium Payments Investor Premium Payments Losses 1/3 of the principal investment. Premium paid based on 2/3 of the original investment 4% Tranche Losses Attachment (3%) CDO Tranche Value = Premium Leg – Default Leg S =tranche thickness si= Premium di= Discount factor Li= Tranche loses at time interval i

Li’s Gaussian Copula Model
Monte-Carlo (MC) pricing model For each path: One-Factor Multi-Factor 1. Generate: or 2. Compare: 3. Record losses:

Implementation

Multi-Core Architecture
Three portions: Distributor, Pricing cores, and Collector. Coarse Grain Parallelism: MC paths divided among OFGC cores All cores have the same input data except for random generator seeds Data transfer occurs in parallel to calculations Double Buffering Minimal required data transfer rate with an external processor of: 11.3MBytes/sec 1-Lane PCI express- 250 MBytes/sec Data transfer latency can be hidden

Pricing Core Design Phase 1: Generate Yi
Phase 2: Compare Yi<Φ-1[P(τi<t)]. Record partial losses Phase 3: Combine the partial sums, L(ti)’s. Phase 4: Convert collateral pool losses to tranche losses Phase 5: Accumulate tranche losses

Phase 1 One-Factor Multi-Factor
Four factors accumulated per clock cycle Accumulation is performed in parallel to Phase 2 execution One-Factor Multi-Factor

Phase 2: Compare Yi<Φ-1[P(τi<t)]. Record partial losses Phase 3: Combine the partial sums, L(ti)’s. Phase 4: Convert collateral pool losses to tranche losses Phase 5: Accumulate tranche losses

Phase 2 Compare Yi<Φ-1[P(τi<t)]. Record Losses
Fine-grain parallelism: parallelize over time 8 replicas More replicas → higher speedup (potentially) However, large portions of the hardware become underutilized Pipelined adder latency creates multiple partial sums

Phase 2: Compare Yi<Φ-1[P(τi<t)]. Record partial losses Phase 3: Combine the partial sums, L(ti)’s. Phase 3: Combine the partial sums, L(ti)’s. Phase 4: Convert collateral pool losses to tranche losses Phase 4: Convert collateral pool losses to tranche losses Phase 5: Accumulate tranche losses Phase 5: Accumulate tranche losses

Results

One-Factor Utilization*
Single Precision Double Precision Hybrid “Integer” Flip-Flops 6530 9910 (+51.2%) 6721 (+2.9%) 4906 (-24.9%) LUTs 7052 13325 (+89.0%) 7599 (+7.8%) 5224 (-25.9%) BRAMs 15 31 (+106.7%) (0%) DSP48Es 29 40 (+37.9%) 30 (+3.4%) 7 (-75.9%) Frequency 248.8 190.9 (-23.3%) 244.8 (-1.6%) 268.2 (+7.8%) Average Error (%) 0.37 [1.07] 3.02E-5 [5.27E-5] *Utilization for Virtex 5 SX50T chip

Performance: Benchmarks
Credit rating and number of instruments are based on Dow Jones CDX Notionals obtained from Moody’s, range from $600,000 to $6.6 billion α: uniformly distributed in [0, 1] Recovery rate: Normally distributed, N (0.4,0.15) # of Time Steps: Normally distributed, N (20,10) # of Systemic Factors: varied from 1 to 16 # Based on Data From # of Assets # of Time Steps # of Default Curves 1 CDX.NA.HY 100 15 5 2 CDX.NA.IG 125 35 3 CDX.NA.IG.HVOL 30 19 4 CDX.NA.XO 22 CDX.EM 14 6 CDX.DIVERSIFIED 40 23 7 CDX.NA.HY.BB 37 13 8 CDX.NA.HY.B 46 26 9 Semi-homogenous 400 24

Processor vs. FPGA setup
3.4 GHz Intel Xeon Processor 3GB RAM C++ program 100,000 Monte-Carlo paths Virtex 5 SX50T speed grade -3 Connected to host through PCI express 100,000 Monte-Carlo paths

Performance: One-Factor Single Core Results (1/3)

Single Core Average Acceleration: Double Precision: 10.8 X Single Precision: 14.0 X Single/Double Hybrid: 13.8 X Integer: 15.8 X

Performance: One-Factor Multi-Core Results
Monte-Carlo paths independence allows for a linear speedup as more pricing cores are incorporated. Double Single Single/Double Hybrid Integer Single Core Acceleration 10.8X 14.0X 13.8X 15.8X Maximum # of Instantiations 2 4 5 Multi-Core Acceleration 15.9X 47.2X 47.5X 64.5X

Performance: Multi-Factor Results (1/3)
SFRF is the number of cycles before Phase 1 generates a new Yi TRF is the number of cycles before Phase 2 requires a new Yi Consumer: Phase 2 Ratec: 1/TRF Producer: Phase 1 Ratep: 1/SFRF

Ratec<= Ratep Ratec>Ratep

Ratep>Ratec Ratep= Ratec Ratep<Ratec Average Double Precision 1 Core 13.3X 15.9X 12.0X 13.5X 2 Cores 18.7X 22.5X 16.9X 19.0X Single Precision 16.7X 20.1X 15.1X 17.0X 4 Cores 52.8X 63.4X 47.7X 53.6X Hybrid 16.5X 19.9X 14.9X 16.8X 50.3X 60.4X 45.4X 51.1X Integer 17.7X 22.4X 18.6X 18.9X 5 Cores 67.0X 84.5X 70.1X 71.5X

Summary Presented a hardware architecture for pricing Collateralized Debt Obligations using Li’s models Demonstrating the flexibility of an FPGA in exploiting both coarse- and fine-grain parallelism Established that either a single/double hybrid or “integer” representations could be used to balance resource utilization and accuracy Achieved a maximal acceleration of over 64-fold and 71-fold, for the one-factor and multi-factor models, respectively

Future Work Expand to a multi-FPGA system
Attempt the algorithm on a different accelerator architectures GPU 3. Incorporate into a financial software suite

Thank You (Questions?)

Presenter: Alexander Kaganov Supervisor: Paul Chow

Similar presentations

Presentation on theme: "Presenter: Alexander Kaganov Supervisor: Paul Chow"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Presenter: Alexander Kaganov Supervisor: Paul Chow

Similar presentations

Presentation on theme: "Presenter: Alexander Kaganov Supervisor: Paul Chow"— Presentation transcript:

Similar presentations

About project

Feedback