Experiences with a SAS Grid. Ray Lindsay, ATO. ACT SAS Users Group, 21 May 2015.


1 Experiences with a SAS Grid
Ray Lindsay, ATO
ACT SAS Users Group, 21 May 2015

2 Background
ATO has had a SAS licence for its Analytics capability since the early 2000s
Initially a pair of Windows servers running SAS 9.1.3
– Included Enterprise Miner
Switched to Linux in about 2008, initially Debian, now Ubuntu
– Red Hat install failed in 2008
– Debian and Ubuntu are not officially supported, but everything worked

3 Linux environment
Initially a set of 5 Linux servers running SAS 9.1, then 9.2, 9.3 and 9.4
But also running many other Analytics capabilities
– R, Python etc.
Pluses and minuses
– Contention for resources
– Camp ‘A’ vs camp ‘B’

4 Advantage in staying within Linux
Although there are some subtle differences, most Linux/Unix commands are the same on Red Hat as on Ubuntu
But a much reduced set of commands is available on Red Hat

5 SAS Grid
Settled on a 42 core grid, with 2 metadata servers and 3 compute servers
24 compute cores (3 x 8)
Delivered Dec 2014
All users migrated

6 Overall architecture

7 Compute nodes

8 Logical separation
Discovery and Production
– But same physical hardware
75% and 25% of disk space
Different queues with different priorities
Most of our work is within the ‘Discovery’ environment
– only 2 queues

9 Speed of nodes
3.3 GHz versus 2.4 GHz on older hardware
Compute nodes have 128 GB and metadata servers 32 GB of memory
– This is notably less memory than is needed/specified for our R machines, typically 750 GB

10 Notes on Moore’s law
In conventional computers, the number of transistors doubles every two years
– Valid since 1965
– http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4567410 argues that it will reach its limit in 2036
– Others think sooner
But also note that chip speeds have not increased as rapidly in recent years
Hence the need for explicit and implicit parallelisation
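As a rough illustration of the doubling rule on this slide, the sketch below counts how many two-year doublings fit between two years and the growth factor that implies. The function names and the fixed two-year period are assumptions made for the example; only the 1965 start and the 2036 limit come from the slide.

```python
def doublings(start_year: int, end_year: int, period_years: float = 2.0) -> float:
    """Number of doubling periods between two years (assumed fixed period)."""
    return (end_year - start_year) / period_years


def growth_factor(start_year: int, end_year: int, period_years: float = 2.0) -> float:
    """Multiplicative growth in transistor count implied by repeated doubling."""
    return 2.0 ** doublings(start_year, end_year, period_years)


if __name__ == "__main__":
    # 1965 to the date of this talk (2015): 25 doublings.
    print(doublings(1965, 2015), growth_factor(1965, 2015))
    # 1965 to the 2036 limit argued in the cited article: 35.5 doublings.
    print(doublings(1965, 2036), growth_factor(1965, 2036))
```

Under these assumptions, the 50 years from 1965 to 2015 give 25 doublings, a factor of 2**25 = 33,554,432.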

11 Grid Workload Distribution
http://support.sas.com/rnd/scalability/grid/SGF09_SASHA.pdf

12 Notes on a supercomputer
Beacon Project, https://www.nics.tennessee.edu/beacon
768 conventional cores, 11520 accelerator cores
210 TeraFLOPS, 2.499 GigaFLOPS/Watt
12 TB of system memory
1.5 TB coprocessor memory
73 TB of SSD storage

13 Our Grid – run time in seconds
Task                 Anet    SAS Grid
HPForest             161     20
HPForest             149     16
GradientBoost        240     141
GradientBoost        149     84
DecisionTree         32      18
DecisionTree         24      14
Variable Selection   13      6.4
Variable Selection   30      8.9
Data processing: up to 15 times faster

14 Shared data
Shared file system /sasdata accessible to all
Each machine has its own copy of SAS installed
– Only compute nodes have SAS/ACCESS for Teradata installed
Also shared work area

15 SAS software
As before
– Base
– Stat
– ETS
– Graph
– Access for Teradata and ODBC
– Enterprise Miner, High Performance Suite
Plus: IML, Add-in for MS Office

16 Clients
Enterprise Guide 6.1
SAS Studio
IML Studio
Personal Login Manager
Management Console

17 Parallelisation of code
Not for the faint-hearted, at this stage
Only really useful for regular long-running tasks
Task A and Task B do not depend on each other
Task C depends on both A and B
– Two stage models
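The A/B/C pattern on this slide can be sketched in a few lines. This is not SAS Grid code: it is a minimal Python illustration using the standard-library `concurrent.futures` module, and the task bodies and return values are hypothetical placeholders standing in for long-running jobs.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def task_a() -> int:
    """Hypothetical stand-in for a long-running first-stage job."""
    time.sleep(0.1)
    return 2


def task_b() -> int:
    """Hypothetical second job, independent of task_a."""
    time.sleep(0.1)
    return 3


def task_c(a_result: int, b_result: int) -> int:
    """Second-stage job that needs the results of both A and B."""
    return a_result + b_result


def run_pipeline() -> int:
    # A and B are submitted together because neither depends on the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(task_a)
        future_b = pool.submit(task_b)
        # result() blocks, so C only starts once both A and B have finished.
        return task_c(future_a.result(), future_b.result())


if __name__ == "__main__":
    print(run_pipeline())
```

The gain comes only from overlapping A and B, which is why the approach pays off mainly for regular, long-running tasks.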

18 Proc SCAPROC
Can be used to identify task dependency within a large program
Requires the program to be run in order to output a new vectorised program
Same functionality exists within Enterprise Guide
– Analyse program for Grid Computing
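The idea of identifying task dependency can be illustrated outside SAS. The sketch below is not how PROC SCAPROC works internally: it is a simplified Python illustration that assumes each step's input and output datasets are already known, and it only tracks "reads something an earlier step wrote". The step and dataset names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_dependencies(steps):
    """Map each step to the earlier steps that produced its inputs.

    `steps` is a dict in program order: name -> (input datasets, output datasets).
    """
    producer = {}  # dataset name -> step that most recently wrote it
    deps = {}
    for name, (inputs, outputs) in steps.items():
        deps[name] = {producer[d] for d in inputs if d in producer}
        for dataset in outputs:
            producer[dataset] = name
    return deps


def parallel_batches(deps):
    """Group steps into batches; steps within a batch can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    # Hypothetical two-stage model: A and B are independent, C needs both.
    steps = {
        "A": ({"raw1"}, {"model_a"}),
        "B": ({"raw2"}, {"model_b"}),
        "C": ({"model_a", "model_b"}, {"final"}),
    }
    print(parallel_batches(build_dependencies(steps)))
```

A real analyser also has to handle cases this sketch ignores, such as a later step overwriting a dataset that an earlier step still reads.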

19 SCAPROC continued
Nearly all the code is marked ‘no changes below here’, so the original code must be kept in case changes are needed
Our efforts on actual problems so far have not been successful
Have had to sanitise code and logs and send them to Tech Support

20 What is still missing
X Window environment, with tools such as file browser, version control software etc.
Interface to R
– Enterprise Miner and IML both have interfaces to open source R
– (EM uses IML in fact)
– Also IML/Studio

21 Disclaimer
While based on my experiences using SAS at the ATO, all opinions expressed are my own
Ray Lindsay, ray.lindsay@ato.gov.au

22 Questions

