Published by Lynn Houston
1
Welcome from Optima Systems
COSMOS performance improvements
Paul Grosvenor
Deerfield Beach 2013, Tuesday October 22nd
2
The Problem
- Lots and lots of data (568 Tb is the largest encountered so far)
- Even today, the traditional researcher works, thinks and reports in 2D
- Analysis is based on assumptions that hide meaning
- Outdated protocols
- Federated (composite) databases
3
What is COSMOS?
- Largely written in APL
- A data visualisation tool
- A top-down view of the data lake
- It has been described as a "thesis generator"
- Currently targeted at US electronic medical records (EMR data)
- Built-in "canned queries", e.g. survivability
4
COSMOS version 1
6
More Problems
- Scalability
- Security
- Performance
- Got to be sexy
7
COSMOS now
8
Some Solutions to the COSMOS Problem
- Much help from Dyalog (and APL, of course)
- Caching enquiries
- Mapped files
- Flash client-side interface
- Syncfusion
- Special casing vs generalisation
- Refactoring
9
A typical example

      drug←23
      patients←(23 26 28) (15 16 19 23) (34 35 124)
      drug=patients
 1 0 0  0 0 0 1  0 0 0
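In APL, the scalar `drug` is extended across the nested vector `patients`, which yields one Boolean vector per item. A rough Python analogue follows. It assumes each item of `patients` is one patient's list of drug codes; the slide does not say this. `equal_each` is a hypothetical helper.

```python
def equal_each(x, vectors):
    """Compare the scalar x with every item of every vector, like APL's drug=patients."""
    return [[int(item == x) for item in vec] for vec in vectors]


drug = 23
patients = [[23, 26, 28], [15, 16, 19, 23], [34, 35, 124]]  # assumed: drug codes per patient

mask = equal_each(drug, patients)
has_drug = [any(row) for row in mask]  # one flag per patient

if __name__ == "__main__":
    print(mask)      # [[1, 0, 0], [0, 0, 0, 1], [0, 0, 0]]
    print(has_drug)  # [True, True, False]
```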
10
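A simple test

    ⍝ nubs (number of vectors) and items (maximum vector length) are set beforehand
    seed←1000?1000          ⍝ a random permutation of ⍳1000
    counts←?nubs⍴items      ⍝ a random length from 1 to items for each vector
    vec←counts⍴¨⊂seed       ⍝ each vector is cut from (and cycles) the permutation
    :For x :In ⍳100         ⍝ repeat 100 times
        a←100=¨vec          ⍝ six equivalent ways to find 100 in each vector
        b←(⊂100)=¨vec
        c←100∘=¨vec
        d←100⍷¨vec
        e←(⊂100)⍷¨vec
        f←100∘⍷¨vec
        :If ∧/a∘≡¨b c d e f ⍝ all six results must match
            :Continue
        :Else
            ∘               ⍝ deliberate error: stop if any result differs
        :EndIf
    :EndFor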
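The test above builds random-length vectors, evaluates several equivalent expressions, and stops if any of them disagree. It is then timed. Below is a Python analogue of that harness. `make_vectors`, `FORMS` and `check_and_time` are hypothetical names. The two forms only loosely mirror `100=¨vec` and `100∘=¨vec`, and Python timings say nothing about Dyalog's.

```python
import random
import timeit
from functools import partial
from operator import eq


def make_vectors(n_vectors, items, seed=0):
    """Mimic seed←1000?1000, counts←?nubs⍴items, vec←counts⍴¨⊂seed."""
    rng = random.Random(seed)
    perm = rng.sample(range(1, 1001), 1000)                      # 1000?1000
    counts = [rng.randint(1, items) for _ in range(n_vectors)]   # ?nubs⍴items
    return [[perm[i % 1000] for i in range(c)] for c in counts]  # reshape cycles perm


# Two equivalent formulations of "where does each vector hold x?"
FORMS = {
    "comprehension": lambda x, vecs: [[int(v == x) for v in vec] for vec in vecs],  # ~ x=¨vec
    "bound_eq": lambda x, vecs: [list(map(int, map(partial(eq, x), vec))) for vec in vecs],  # ~ x∘=¨vec
}


def check_and_time(n_vectors, items, repeats=10):
    """Run every form, insist the results agree (the :If ∧/a∘≡¨… check), then time each one."""
    vecs = make_vectors(n_vectors, items)
    results = [form(100, vecs) for form in FORMS.values()]
    if any(r != results[0] for r in results[1:]):
        raise AssertionError("formulations disagree")
    return {
        name: timeit.timeit(lambda form=form: form(100, vecs), number=repeats)
        for name, form in FORMS.items()
    }


if __name__ == "__main__":
    for name, secs in check_and_time(10, 10_000).items():
        print(f"{name:14} {secs:.4f} s")
```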
11
[x=nVectors] timings

vectors    items  100=vec
     10       10      0.2
     10      100      0.3
     10     1000      0.8
     10    10000      5.5
     10   100000       49
     10  1000000      706

     10       10      0.2
    100       10      1.8
   1000       10       17
  10000       10      169
 100000       10     1705
1000000       10    17514
13
[x f nVectors] timings

      23=¨(21 22 23) (23 23 24 25) (12 13 14 123)
 0 0 1  1 1 0 0  0 0 0 0
      (⊂23)=¨(21 22 23) (23 23 24 25) (12 13 14 123)
 0 0 1  1 1 0 0  0 0 0 0
      23∘=¨(21 22 23) (23 23 24 25) (12 13 14 123)
 0 0 1  1 1 0 0  0 0 0 0
      23⍷¨(21 22 23) (23 23 24 25) (12 13 14 123)
 0 0 1  1 1 0 0  0 0 0 0
      (⊂23)⍷¨(21 22 23) (23 23 24 25) (12 13 14 123)
 0 0 1  1 1 0 0  0 0 0 0
      23∘⍷¨(21 22 23) (23 23 24 25) (12 13 14 123)
 0 0 1  1 1 0 0  0 0 0 0
14
vectors  items  100=¨vec  (⊂100)=¨vec  100∘=¨vec  100⍷¨vec  (⊂100)⍷¨vec  100∘⍷¨vec
     10     10       0.3          0.2        0.3         ?          0.4          ?
    100     10       1.9            ?        2.8       2.2            ?          3
   1000     10      17.6         17.7       27.4        21            ?       30.5
  10000     10     169.9        170.6        266     204.5        205.6      304.9
 100000     10      1846         1851       2905      2134         2155       3248
1000000     10     18447        17511      27589     21342        20870      30768
16
vectors    items  100=¨vec  (⊂100)=¨vec  100∘=¨vec  100⍷¨vec  (⊂100)⍷¨vec  100∘⍷¨vec
     10       10       0.3            ?        0.4       0.3            ?        0.4
     10      100       0.3            ?        0.4       0.6            ?        0.7
     10     1000       0.7            ?        0.9       3.3            ?        3.4
     10    10000       4.3          4.2        4.7        27            ?          ?
     10   100000        53            ?          ?       350            ?          ?
     10  1000000       341            ?        344      2243         2253       2241
17
[x f nVectors] timings
18
[x ⍳ y] Example

      23=(21 22 23) (23 23 24 25) (12 13 14 123)
 0 0 1  1 1 0 0  0 0 0 0
      1=(,23)∘⍳¨(21 22 23) (23 23 24 25) (12 13 14 123)
 0 0 1  1 1 0 0  0 0 0 0
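The index-of form gives the same mask as direct comparison. Looking an item up in the one-element vector `,23` returns 1 when the item is 23, and 2 (one past the end) otherwise. A Python sketch of that equivalence follows. It assumes `⎕IO=1`, and `index_of` is a hypothetical stand-in for dyadic `⍳`. In the timings that follow, the `⍳` form is consistently slower than `=` for a single key.

```python
def index_of(vector, item):
    """Dyadic ⍳ with ⎕IO=1: 1-based position of item, or len(vector)+1 if absent."""
    try:
        return vector.index(item) + 1
    except ValueError:
        return len(vector) + 1


def mask_via_equals(x, vectors):
    """23=vectors: compare every item directly."""
    return [[int(item == x) for item in vec] for vec in vectors]


def mask_via_index_of(x, vectors):
    """1=(,23)∘⍳¨vectors: look each item up in the one-element list [x]."""
    key = [x]
    return [[int(index_of(key, item) == 1) for item in vec] for vec in vectors]


vectors = [[21, 22, 23], [23, 23, 24, 25], [12, 13, 14, 123]]

if __name__ == "__main__":
    print(mask_via_equals(23, vectors))  # [[0, 0, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
    print(mask_via_index_of(23, vectors) == mask_via_equals(23, vectors))  # True
```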
19
vectors    items  100=vec    x⍳y
     10       10      0.2    0.7
     10      100      0.3    1.4
     10     1000      0.8      9
     10    10000      5.5     84
     10   100000       49    569
     10  1000000      706   6975

     10       10      0.2    0.7
    100       10      1.8    5.2
   1000       10       17     42
  10000       10      169    418
 100000       10     1705   4113
1000000       10    17514  43347

[x ⍳ y] Example
21
Index Assignment

    bool←1000000⍴0      ⍝ one million Booleans
    bool[index]←1       ⍝ index: the positions to set (defined beforehand)
    int←1000000⍴⍳10     ⍝ one million small integers
    int[index]←1
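Dyalog stores Boolean arrays one bit per element. One plausible reading of the timings that follow is that `bool[index]←1` has to update single bits inside bytes, whereas the small integers in `int` occupy at least a whole byte each and can simply be overwritten. The Python sketch below only illustrates this difference in work per store. `BitVector` and `set_ones_int` are hypothetical, indices are 0-based, and none of it reproduces Dyalog's implementation.

```python
class BitVector:
    """Hypothetical Boolean vector stored one bit per element, 8 elements per byte."""

    def __init__(self, n):
        self.n = n
        self.bits = bytearray((n + 7) // 8)

    def set_ones(self, indices):
        """Like bool[index]←1: each store reads a byte, ORs in one bit, writes it back."""
        for i in indices:
            byte, bit = divmod(i, 8)
            self.bits[byte] |= 1 << bit

    def __getitem__(self, i):
        byte, bit = divmod(i, 8)
        return (self.bits[byte] >> bit) & 1


def set_ones_int(values, indices):
    """Like int[index]←1 on one-slot-per-element storage: a plain store per index."""
    for i in indices:
        values[i] = 1


if __name__ == "__main__":
    flags = BitVector(20)
    flags.set_ones([0, 3, 9, 19])
    print([i for i in range(20) if flags[i]])  # [0, 3, 9, 19]
    ints = [(i % 10) + 1 for i in range(20)]   # like 20⍴⍳10
    set_ones_int(ints, [0, 3, 9, 19])
    print(ints)
```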
22
indices  bool[index]←1  int[index]←1
     10            0.1             ?
    100            0.2             ?
   1000            1.4           0.5
  10000             13           3.2
 100000            127          31.2
1000000           1267           335
23
Index Assignment
24
Boolean Operations

      bool←items⍴0 1 0 1
      bool=0
 1 0 1 0 1 0 1 0 1 0
      bool<1
 1 0 1 0 1 0 1 0 1 0
      bool≤0
 1 0 1 0 1 0 1 0 1 0
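For a Boolean argument, `bool=0`, `bool<1` and `bool≤0` all compute logical NOT. On bit-packed data, NOT can handle many elements per machine operation instead of one at a time. The Python sketch below uses an `int` as the bit vector to show that all three comparisons match a single XOR. `pack`, `unpack` and `not_packed` are hypothetical helpers, and the slide's timings do not reveal how Dyalog actually evaluates these expressions.

```python
def pack(bits):
    """Pack a list of 0/1 values into an int, element i in bit i."""
    value = 0
    for i, b in enumerate(bits):
        value |= b << i
    return value


def unpack(value, n):
    """Inverse of pack for an n-element vector."""
    return [(value >> i) & 1 for i in range(n)]


def not_packed(value, n):
    """Logical NOT of an n-bit vector in one operation: XOR with n one-bits."""
    return value ^ ((1 << n) - 1)


bool_vec = [0, 1] * 5                   # like bool←10⍴0 1 0 1
eq0 = [int(b == 0) for b in bool_vec]   # bool=0
lt1 = [int(b < 1) for b in bool_vec]    # bool<1
le0 = [int(b <= 0) for b in bool_vec]   # bool≤0
negated = unpack(not_packed(pack(bool_vec), len(bool_vec)), len(bool_vec))

if __name__ == "__main__":
    print(eq0 == lt1 == le0 == negated)  # True
```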
25
   items  bool=0  bool<1  bool≤0
      10       0       0       0
     100       0       0       0
    1000     0.2       ?       ?
   10000       2       2       2
  100000      16       ?       ?
 1000000     160       ?       ?
10000000    1590       ?       ?

Boolean Operations
26
So What?
- Generalisation or special casing: up to 10x speed-up; be aware of your data
- Caching of previous queries: lots faster
- Mapped files: much better memory handling, data shared across processes, up to 1.5x speed-up
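With a mapped file, the operating system pages data in on demand. Processes that map the same file also share the same physical pages, which helps explain both the memory and the sharing benefits listed above. Below is a minimal Python sketch using the standard `mmap` module. It assumes a hypothetical layout of one byte per record. It is not Dyalog's `⎕MAP` and not the file layout COSMOS uses.

```python
import mmap
import os
import tempfile


def write_column(path, flags):
    """Store a Boolean column as one byte per record (hypothetical layout)."""
    with open(path, "wb") as f:
        f.write(bytes(flags))


def read_flags(path, positions):
    """Map the file read-only and fetch records by offset; only touched pages are read in."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return [m[i] for i in positions]


def count_true(path, chunk=1 << 20):
    """Count 1-bytes chunk by chunk, so the whole file is never copied at once."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return sum(m[i:i + chunk].count(1) for i in range(0, len(m), chunk))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "survived.bin")
        write_column(path, [1, 0, 1, 1, 0])
        print(read_flags(path, [0, 1, 3]))  # [1, 0, 1]
        print(count_true(path))             # 3
```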
27
A Case in Point
- Version 1 analysis: 20 million records in 15 minutes (DCF files and integer pointers)
- Version 2 analysis: 50 million records in 3 minutes (mapped files and Boolean masks)
- Version 3 analysis: 150 million records in 45 seconds
- Latest version: more than 300 million records in circa 30 seconds
- n.b. SQL and a federated dataset pool: 2 weeks
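Moving from integer pointers (Version 1) to Boolean masks (Version 2) changes how query criteria are combined. Masks are fixed-length vectors combined by an elementwise AND. Pointer lists are sorted positions that must be intersected by a merge. The Python sketch below shows both approaches giving the same selection. The columns `AGE` and `ON_DRUG` and all helper names are hypothetical. The talk does not say why the change helped; cheap elementwise logic on bit-packable masks is one plausible factor.

```python
# Hypothetical columns, one entry per record.
AGE = [34, 71, 55, 80, 62, 45]
ON_DRUG = [1, 0, 1, 1, 0, 1]


def mask_where(column, predicate):
    """Boolean mask: one 0/1 per record, always len(column) long."""
    return [int(predicate(v)) for v in column]


def mask_and(a, b):
    """Combine two criteria element by element (APL a∧b)."""
    return [x & y for x, y in zip(a, b)]


def pointers_where(column, predicate):
    """Integer pointers: the sorted positions of matching records."""
    return [i for i, v in enumerate(column) if predicate(v)]


def pointers_and(a, b):
    """Combine two criteria by merging two sorted pointer lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


over_60 = mask_where(AGE, lambda age: age > 60)
treated = mask_where(ON_DRUG, lambda flag: flag == 1)
both_mask = mask_and(over_60, treated)
both_ptrs = pointers_and(pointers_where(AGE, lambda age: age > 60),
                         pointers_where(ON_DRUG, lambda flag: flag == 1))

if __name__ == "__main__":
    print(both_mask)  # [0, 0, 0, 1, 0, 0]
    print(both_ptrs)  # [3]
```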
28
Thank You and Questions
Contact us: Optima House, Mill Court, Spindle Way, Crawley, West Sussex RH10 1TT
Tel: 01293 562 700
Fax: 01293 562 699
info@optima-systems.co.uk
www.optima-systems.co.uk