Published by Pierce Shaw. Modified over 9 years ago.
1
Delivering Enterprise Value with SAS® 9 Architecture: Grid Computing and SAS
Wayne Embry, Technical Account Manager
March 17, 2005
Copyright © 2004, SAS Institute Inc. All rights reserved.
2
Agenda
- Defining Grid
- Why is Grid Computing Important?
- Who’s Interested in Grid and Why?
- SAS Technology Behind Grid
- Packaging
- Architecture
- Supported Platforms
- Summary
3
Defining Grid in the IT World…
According to Gartner, "a grid is a collection of resources owned by multiple organizations that is coordinated to allow them to solve a common problem."
Gartner (and Wayne) further define three commonly recognized forms of grid:
- Computing Grid: multiple computers to solve one application problem
- Data Grid: multiple storage systems to host one very large data set
- Collaboration Grid: multiple collaboration systems for collaborating on a common issue
Other:
- Utility Grid: resources are chosen for you; ASPs
4
Why is Grid Computing Important to SAS?
- SAS believes that 2005 will be the year customers begin to view grid computing as a practical solution to their business problems, so the timing is right for it to be an important focus.
- Our ability to speak to our Grid capabilities will further position our solutions and toolsets as enterprise class and substantially differentiate our offerings from those of competitors. We will also be able to build additional enterprise credibility.
Proof:
- A recent IDC report projected that the grid computing market may exceed $12 billion by 2007.
- Gartner reported that 56% of large IT customers had not been contacted by a single vendor regarding Grid.
5
Oracle
6
Why is Grid Computing Important?
- Grid computing leverages under-utilized and untapped computing resources to drastically reduce processing times, which in turn saves money.
- Grid computing allows organizations to further leverage their current IT investment by harnessing the collective processing power of existing computers to solve complex problems more rapidly and to run increasingly data-intensive applications.
- IT spending continues to be substantially restricted while demands on the IT department continue to increase. Grid computing is a strategic alternative for resolving this dilemma, providing one of the biggest “bangs for the buck” in IT.
7
Reality Check: Who’s Interested and Why?…
Frugal Phyllis
Title: CIO of a business unit of a large corporation
Reports to: CEO of the business unit
Computing Skills: Advanced
Top ETL-related issues:
1. Faced with processing ever-increasing volumes of data
2. Challenged to provide usable results in ever-shorter time frames
3. Short on funds, especially for additional hardware
8
Reality Check: Who’s Interested and Why?…
Al the Architect
Title: Head Information Architect and “right hand” to the CIO
Reports to: CIO of a business unit of a large corporation
Computing Skills: Expert
Top ETL-related issues:
1. Charged with building fast and flexible architectures without spending much money
2. Needs to find ways to cope with more jobs, and larger jobs, all being squeezed into the same batch window
3. Would like his solutions to the above to inspire the Enterprise as a whole, or at least integrate with its existing tools
9
Reality Check: Who’s Interested and Why?…
Silo Sandy (somewhat similar to Frugal Phyllis)
Title: CEO (or Director) of a business unit of a large corporation
Reports to: CEO of the Enterprise
Computing Skills: Average
Top ETL-related issues:
1. Trying to build her own information organization because she is not satisfied with corporate IT
2. Needs to do so using only existing hardware resources
3. Needs solutions that are up and running quickly and that are reliable and maintainable
10
Reality Check: Who’s Interested and Why?…
And a user persona who influences the above buyers:
Forever Fred
Title: Business Analyst (a.k.a. Power User)
Reports to: Director or Sr. Manager of a business unit of a large corporation
Computing Skills: Power User
Top ETL-related issues:
1. Loading data for his job takes too long, so he misses batch windows
2. Constantly being admonished for monopolizing system resources
3. “Beaten up” for not delivering reports fast enough
11
Types of Applications Suitable for Grid
- Long-running jobs (batch window)
- Many repetitive iterations of a fundamental task
- Simulation
- BY GROUP processing
- Parallelism
- Independent tasks against large data sources
- Scoring, risk analysis
- Pipeline parallelism (piping)
- Both
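The BY GROUP and independent-task items can be illustrated with a small sketch. It is written in Python rather than SAS, and the thread pool only stands in for grid nodes. The names `summarize_group` and `run_by_group` and the sample rows are hypothetical and do not come from the presentation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def summarize_group(key, rows):
    """Process one BY group independently of all the others."""
    total = sum(amount for _, amount in rows)
    return key, total


def run_by_group(rows, workers=3):
    """Split rows by key, then hand each group to a separate worker."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    # Each worker stands in for one grid node (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(summarize_group, k, g) for k, g in groups.items()]
        return dict(f.result() for f in futures)


# Hypothetical (region, amount) rows.
orders = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
totals = run_by_group(orders)
```

Because no group needs data from another group, the work can be spread across however many workers are available without any coordination beyond collecting the results.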
12
RFID Complexity
[Diagram labels: four RFID Data Collectors; REALTIME; SAP/R3; DB/2; ORACLE; SYBASE]
13
SAS Technology Behind Grid – Today…
Analytics Scenario
[Diagram labels: Connect Client; %Distribute; SAS; n servers, each labeled "Base, Connect, …"]
14
SAS Technology Behind Grid – Today…
Data Integration Scenario
[Diagram labels: ETL Studio; SAS MC; Schedule Manager; SAS Servers; Metadata Server; Workspace Server; Connect Client; LSF Job Scheduler; n servers, each labeled "Base, Connect, …"]
15
SAS Technology Behind Grid – 2005…
Improving our Capabilities
[Diagram labels: Connect Client; LSF; SAS Server; n servers, each labeled "Base, Connect, …" and "LSF"]
16
SAS Grid – 2005…
[Diagram labels: ETL Studio; SAS MC; Schedule Manager; Grid Manager (new); SAS Servers; Metadata Server; Workspace Server; Connect Client; LSF Job Scheduler; Enterprise Miner; n servers, each labeled "Base, Connect, …" and "LSF"]
17
SAS 9 Packaging…
- Head start: SAS/Connect is already included in ETL Server and EETL Server
- Any solution including ETL Server
18
Supported Platforms…
- Good news: any platform that supports Base and Connect
- Heterogeneous architecture
19
Architecture Guidelines
There are guidelines to keep in mind when architecting SAS Grid environments:
- Permanent data
- SASWORK
- Data accessibility: where the data is and how each machine on the grid is attached to it (NFS, SAN) greatly affects performance.
For help architecting SAS Grids, please call your SAS Account Representative.
20
Example Grid Job 1
[Diagram labels:
- ETL Studio; SAS Server
- L8364, 1 CPU (1.6 GHz; 2 GB RAM): Workspace Server; Base, Connect
- Demo0505, 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
- Demo0507, 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
- Tables: Customer; Orders_grid; Order_item_grid]
21
Example Grid Job 2
[Diagram labels:
- ETL Studio; SAS Server
- L8364, 1 CPU (1.6 GHz; 2 GB RAM): Workspace Server; Base, Connect
- Demo0505, 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
- Demo0507, 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
- Tables: Orders_grid; Order_item_grid; Customer
- LXYZ; SASWORK]
22
An Example - The Scenario…
Single Platform Job - Local_Complicated
- Run locally on my laptop in sequential order
- Source data: 3 local SAS tables
  - Customer: 16 MB; 89,954 rows; 12 columns
  - Orders_grid: 214 MB; 5,710,014 rows; 8 columns
  - Order_item_grid: 315 MB; 4,487,718 rows; 7 columns
- Target: 1 local SAS table with 15 columns
23
Local_Complicated Job
[Diagram labels:
- ETL Studio; SAS Server
- L8364, 1 CPU (1.6 GHz; 2 GB RAM): Workspace Server; Data Quality; Base
- Tables: Order_item_grid; Orders_grid; Customer]
Elapsed Wall Clock Time: 4 minutes
24
LOCAL ETL PROCESS
25
LOCAL JOB STATS
26
Leveraging the Grid - The Scenario…
Enable Job to Run on a SAS Grid - Remote_Complicated
- Grid strategies:
  - Independent parallelism: independent data and processes
  - Pipeline parallelism
- Source data:
  - 2 remote SAS tables:
    - Orders_grid: 214 MB; 5,710,014 rows; 8 columns
    - Order_item_grid: 315 MB; 4,487,718 rows; 7 columns
  - 1 local SAS table:
    - Customer: 16 MB; 89,954 rows; 12 columns
- Target: 1 local SAS table with 15 columns
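Pipeline parallelism, the second strategy named on this slide, means a downstream step starts consuming rows before the upstream step has finished producing them. The sketch below shows the idea in Python with two threads and bounded queues. It is not SAS/Connect piping, and the stage names `extract`, `transform`, and `run_pipeline` are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def extract(rows, out_q):
    """Stage 1: emit rows one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(_DONE)


def transform(in_q, out_q):
    """Stage 2: start work as soon as the first row arrives."""
    while True:
        row = in_q.get()
        if row is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(row * 2)


def run_pipeline(rows):
    """Run both stages concurrently and collect the transformed rows."""
    q1, q2 = queue.Queue(maxsize=4), queue.Queue()
    stages = [
        threading.Thread(target=extract, args=(rows, q1)),
        threading.Thread(target=transform, args=(q1, q2)),
    ]
    for t in stages:
        t.start()
    results = []
    while True:
        item = q2.get()
        if item is _DONE:
            break
        results.append(item)
    for t in stages:
        t.join()
    return results


piped = run_pipeline([1, 2, 3, 4, 5])
```

The small `maxsize` on the first queue keeps the producer from running far ahead of the consumer, so intermediate results never have to be fully materialized between the two stages.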
27
Remote_Complicated Job
[Diagram labels:
- ETL Studio; SAS Server
- L7875, 1 CPU (1.6 GHz; 1 GB RAM): Workspace Server; Base, Connect
- Demo0505, 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
- Demo0507, 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
- Tables: Customer; Orders_grid; Order_item_grid]
Elapsed Wall Clock Time: 30 seconds
About a 90% improvement over the 4-minute local run!
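The improvement figure can be checked from the two elapsed times reported in the deck (4 minutes locally, 30 seconds on the grid). A minimal Python calculation, assuming both timings are exact as stated; the function name `improvement` is only for illustration:

```python
def improvement(before_s, after_s):
    """Return (fractional reduction in elapsed time, speedup factor)."""
    reduction = (before_s - after_s) / before_s
    speedup = before_s / after_s
    return reduction, speedup


# 4 minutes for the local job versus 30 seconds for the grid job.
reduction, speedup = improvement(4 * 60, 30)
print(f"{reduction:.1%} less elapsed time, {speedup:.0f}x speedup")
# prints "87.5% less elapsed time, 8x speedup"
```

The reduction of 87.5% rounds to the roughly 90% quoted on the slide, and the same result can be read as an 8x speedup.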
28
Performance Issues
The competition’s answer to performance issues:
- Buy a bigger server (e.g., go from a 32-way to a 64-way)
- Increase the number of RDBMS instances (e.g., Oracle)
- More $$$$
SAS’ answer:
- Grid computing leverages under-utilized and untapped heterogeneous computing resources to drastically reduce processing times
- Grid computing allows organizations to further leverage their current IT investment by harnessing the collective processing power of existing computers
- Save $$$$
29
Architecture Guidelines
There are guidelines to keep in mind when architecting SAS Grid environments:
- Permanent data
- SASWORK
- Data accessibility: where the data is and how each machine on the grid is attached to it (NFS, SAN) greatly affects performance.
30
How is it Set Up? The SAS Technology Behind the Scenario…
Components and considerations:
- Base, SAS/Connect
- ETL Studio
- Metadata Server
- Data Quality
31
GRID ETL JOB
32
GRID STATS
33
Metadata: SAS MC
34
Connect Servers and Spawners
35
Connect Servers and Spawners
36
Libraries
37
Logins
38
Closing Thoughts…
- Mileage may vary
- Next step in evolving the SAS9 Platform
- Enterprise credibility
- Competition
  - Buy more servers and license more DBMS instances
  - These 50 jobs will use this server, these 30 jobs run on this server….
- Manageability
- BI: Stored processes
- EMiner and LSF Integration
- ITMS: ITRM will have a generic collector to collect LSF performance data
39
Collateral…
White Papers
- SUGI29: http://support.sas.com/rnd/scalability/papers/sugi29_grid.pdf
- Connect Syntax: http://support.sas.com/rnd/scalability/papers/mpconnect0401.pdf
- %DISTRIBUTE: http://support.sas.com/rnd/scalability/papers/distConnect0401.pdf
Web Site
- http://support.sas.com/rnd/scalability/grid/index.html
Customer Reference Stories
- http://support.sas.com/rnd/scalability/grid/gridcust.html
40
Questions?