SAS Grid at Statistics Canada BY: Yves DeGuire Statistics Canada June 12, 2014.

Presentation transcript:

SAS Grid at Statistics Canada
By: Yves DeGuire, Statistics Canada
June 12, 2014

Agenda
- SAS at Statistics Canada
- What is the StatCan SAS Grid?
- Migration and Use Cases
- Lessons Learned
- Looking Forward

Statistics Canada
- Canada’s central statistical agency.
- Mandate to collect, compile, analyse and publish statistical information on the economic, social and general conditions of the country and its citizens.
- Mandate is fulfilled under the authority of the Statistics Act, which prohibits the disclosure of identifiable information.
- Crunching numbers is our business!

Where?
[Diagram: Survey Lifecycle. Stages: Collection, Processing, Analysis, Dissemination. Data stores: Input Database, Clean Microdata, Output Database.]

What?
- Data processing
- Application development
- Query and reporting
- Statistical analysis
- Exploratory data analysis
- “Specialised” computations (time-series, optimization, matrix operations, etc.)

How?
- Base SAS
- SAS/ACCESS
- SAS/AF
- SAS/CONNECT
- SAS/ETS
- SAS/GRAPH
- SAS/IML
- SAS/IntrNet
- SAS/OR
- SAS/SHARE
- SAS/STAT
- SAS/TOOLKIT
- Integration Technologies
- Enterprise Guide
- Enterprise Platform
- DI Server
- JMP
- Grid Manager

Some Numbers!
- 2,500,000 SAS jobs run every year
- 4,000 PC-SAS installations
- 2,500 active SAS users
- 450 production applications
- 80 Windows servers
- 25 Unix servers
- 20 platforms
- 3 versions of SAS: 9.1.3, 9.2 and grid!

More than 2,500 Users!

What is the StatCan SAS Grid?
- A complete SAS Platform deployment utilizing SAS Grid Manager 9.4.
- Available to the entire Agency via a hosting service.
- Part of the Network Transformation Initiative (NTI).
- 3 objectives:
  - Consolidate 100+ SAS servers (Phase 1)
  - Migrate processing from workstations to the grid (Phase 2)
  - Enable new computing initiatives/possibilities (Phase 1 & 2)

StatCan Grid Milestones
- Several “home-made” grids developed over the years using Base SAS and SAS/CONNECT
- 2011: first test grid based on Grid Manager
- 2013: enhanced test grid released
- May 2014: production grid released for IBSP (V1)
- Q3 2014: full production grid will be released for general availability (V2)

A Few Impressive Results while Testing the Grid
- Capital stock calculation: 89% improvement on elapsed time (2005)
- Audit module in G-Confid: over 90% improvement on elapsed time (2009)
- NHS-Tax Linkage project: from 59 hours to 50 minutes using G-Link V3 (2012)
- Simulations with CCHS data: hundreds of simulations run in a few hours compared to days on a workstation (2013)
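To put these figures on a common scale, the short Python sketch below converts between "percent improvement on elapsed time" and a speedup factor. It assumes that "improvement" means the percentage reduction in elapsed time, which the slide does not state explicitly; the function names are illustrative only.

```python
def speedup(before, after):
    """Speedup factor: how many times faster the new run is."""
    return before / after


def percent_improvement(before, after):
    """Percentage reduction in elapsed time (assumed meaning of 'improvement')."""
    return 100.0 * (before - after) / before


def speedup_from_improvement(pct):
    """Speedup factor implied by a given percentage reduction in elapsed time."""
    return 1.0 / (1.0 - pct / 100.0)


# NHS-Tax Linkage project: from 59 hours to 50 minutes.
before_minutes = 59 * 60  # 3540 minutes
after_minutes = 50

print(round(speedup(before_minutes, after_minutes), 1))              # 70.8
print(round(percent_improvement(before_minutes, after_minutes), 1))  # 98.6

# Capital stock calculation: 89% improvement on elapsed time.
print(round(speedup_from_improvement(89), 1))                        # 9.1
```

Under that assumption, the linkage project ran about 70 times faster, and an 89% improvement corresponds to a run roughly 9 times faster.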

Why the StatCan Grid?
- Reduced costs $ $ $
- Process higher volume of data.
- Process data in less time.
- Scalable
- Secure
- Centrally managed
- Usage metrics

Implementation Highlights (Phase 1)
[Architecture diagram. Components: SAS Platform Clients; Web Clients and Services; SAS Mid-Tier; SAS Metadata Server; Grid Nodes (Node1 to Node16), Intel x86_64, 16 cores, 256 GB RAM; Shared File System on clustered 2-tier storage, 80 TB.]

The Transparent Grid
One of the objectives of the grid is to make the user experience as transparent as possible.
- Single sign-on
- Samba shares
- Helpers (macros, stored processes)

SAS Grid Data Tier
- Data files (must “live” on the CFS):
  - Flat files / SAS files
  - PC files (Excel spreadsheets, etc.)
  - Exposed to Windows via Samba
- Databases:
  - SQL Server
  - Oracle
  - Sybase

Migration Requirements
The StatCan SAS grid is a “pure” SAS compute service!
- Platform clients only, such as Enterprise Guide
- No host commands available
- SAS/ACCESS to PC file formats, with limitations
- No direct access to Windows shares
- SAS 9.4 and SAS 9.3M1 supported

Use Cases
- Use Case #1: Ad hoc users
  - Users who need to process/analyze data “on-demand”
  - Large number of concurrent users
- Use Case #2: Batch jobs
  - SAS jobs that run unattended. A new mainframe!!!
- Use Case #3: Parallel processing
  - Jobs broken into smaller tasks and dispatched to the grid.
  - Myth: a SAS program will execute in parallel with no modifications!
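The point behind Use Case #3 is that a job only runs in parallel once it has been explicitly restructured into independent tasks. The Python sketch below illustrates that split, dispatch and combine pattern in general terms. It is not SAS code and does not use any SAS Grid Manager interface: a local thread pool stands in for the grid nodes, and the function names and the sum used as the per-task computation are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(records, n_tasks):
    """Split the input into at most n_tasks contiguous chunks."""
    size = max(1, -(-len(records) // n_tasks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_task(chunk):
    """Stand-in for the work done on one grid node (here, a simple sum)."""
    return sum(chunk)


def run_job_in_parallel(records, n_tasks=4):
    """Break the job into tasks, dispatch them, then combine the partial results."""
    tasks = split_into_tasks(records, n_tasks)
    # The thread pool is a hypothetical stand-in for dispatching to grid nodes.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        partial_results = list(pool.map(run_task, tasks))
    return sum(partial_results)


print(run_job_in_parallel(list(range(1, 101))))  # 5050
```

The splitting and combining steps are the modifications the slide alludes to: the original sequential program has to be rewritten so that each task can run independently of the others.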

Lessons Learned
- A SAS grid project is also an infrastructure project.
- Integrating Linux with a Windows environment poses some challenges.
- Managing users' expectations is critical.
- Resistance to change must be managed.
- Start simple and build on success.
- Be proactive: plan/think about your next SAS environment.

Looking Forward
- Phase 1: Consolidate 80 servers over the next 2 years.
- Phase 2: Introduce a new grid at the SSC Data Centre.
- Complete the server consolidation started in Phase 1.
- Migrate workstation processing to the grid.
- Are there opportunities to collaborate with other departments?

Thank You!
Yves DeGuire
Section Chief, System Engineering Division
Statistics Canada
R.-H.-Coats Building 14 A
100, Tunney’s Pasture Driveway
Ottawa, Ont., K1A 0T6
(613)