1
Overview of Grids, Services Oriented Architectures and Science Portals Sriram Krishnan, PhD sriram@sdsc.edu
2
Outline What is Grid Computing? What are Services Oriented Architectures? What are Science Portals? How are all the pieces tied together? Some case studies
3
Cluster Computing Independent computers combined into a unified system through software and networking Typical Setup –Collection of commodity computers (PCs) –Using a commodity network (Ethernet) –Typically running an open-source operating system (Linux) Interconnect –Gigabit Ethernet (commodity): high latency, cheap –Myrinet, Infiniband, … (non-commodity): low latency, OS-bypass, expensive Programming model –Message Passing History –Network Of Workstations (NOW) pioneered the vision for clusters of commodity processors –Beowulf popularized the notion and made it very affordable
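The message-passing model named on this slide can be illustrated with a small sketch. This is an in-process simulation using Python threads and queues, not MPI and not a real cluster: each "node" owns a private inbox, shares no data with the others, and communicates only by sending and receiving messages. The function names and the scatter pattern are illustrative assumptions.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Simulated node: receive one chunk, send back (rank, partial sum)."""
    chunk = inbox.get()               # blocking receive
    outbox.put((rank, sum(chunk)))    # send the result to the coordinator


def parallel_sum(data, num_workers=4):
    """Scatter `data` across workers, then gather and reduce partial sums."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()
    # Scatter: worker r receives every num_workers-th element starting at r.
    for rank, inbox in enumerate(inboxes):
        inbox.put(data[rank::num_workers])
    for t in threads:
        t.join()
    # Gather: one message per worker; reduce by summing the partial results.
    return sum(results.get()[1] for _ in range(num_workers))


total = parallel_sum(list(range(1, 101)))
print(total)
```

On a real cluster the queues would be replaced by network messages over the interconnect, which is where the latency differences between Gigabit Ethernet and Myrinet or Infiniband matter.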
4
High Performance Computing Cluster [Diagram labels: Front-end Node; Public Ethernet; Private Ethernet Network; Application Network (optional); Node; Power Distribution (net-addressable units as option)] © 2008 UC Regents
5
Clusters now Dominate High-End Computing http://www.top500.org/charts/list/29/archtype/
6
Grid Computing “Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke] –Coordinated - multiple resources working in concert, e.g. disk & CPU, or instruments & database, etc. –Resources - compute cycles, databases, files, application services, instruments. –Problem solving - focus on solving scientific problems –Dynamic - environments that are changing in unpredictable ways –Virtual Organization - resources spanning multiple organizations and administrative domains, security domains, and technical domains
7
Other Terms Cyberinfrastructures –Encompasses advanced scientific computing, as well as a more comprehensive infrastructure for research and education based upon distributed, federated networks of computers, information resources, on-line instruments, and human interfaces (Atkins Report, 2003) eScience –Computationally intensive science that is carried out in highly distributed network environments (e.g. in the context of the U.K. eScience program)
8
Grids are not the same as Clusters! Ian Foster’s 3 point checklist –Resources not subjected to centralized control –Use of standard, open, general-purpose protocols and interfaces –Delivery of non-trivial qualities of service Grids are typically made up of multiple clusters
9
Popular Misconception Misconception: Grids are all about CPU cycles –CPU cycles are just one aspect, others are: Data: For publishing and accessing large collections of data, e.g. Geosciences Network (GEON) Grid Collaboration: For sharing access to instruments (e.g. UCSD TeleScience Grid), and collaboration tools (e.g. Global MMCS at IU)
10
How do you build a “Grid”? Start with raw hardware, Add storage and networks, Mix in scientific datasets, Build collaboratory and visualization tools How do you manage, provision, schedule, authenticate, monitor, program, and access these resources?
11
SETI@Home Uses 1000s of internet connected PCs to help in search for extraterrestrial intelligence When the computer is idle, the software downloads ~ 1/2 MB chunk of data for analysis. Results of analysis sent back to the SETI team, combined with 1000s of other participants Largest distributed computation project in existence –Total CPU time: 2433979.781 years –Users: 5436301 Statistics from 2006
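To put the two 2006 figures in proportion, a quick calculation gives the average contribution per user. The numbers are taken from the slide; the 365.25-day year is an assumption made here for the conversion.

```python
TOTAL_CPU_YEARS = 2433979.781   # total CPU time, from the slide (2006)
USERS = 5436301                 # number of users, from the slide (2006)
DAYS_PER_YEAR = 365.25          # assumption: Julian year for the conversion

cpu_years_per_user = TOTAL_CPU_YEARS / USERS
cpu_days_per_user = cpu_years_per_user * DAYS_PER_YEAR

print(f"{cpu_years_per_user:.3f} CPU-years per user on average")
print(f"{cpu_days_per_user:.0f} CPU-days per user on average")
```

The average works out to a little under half a CPU-year per user, which shows how a very large aggregate can come from modest individual donations of idle time.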
12
NCMIR TeleScience Grid An ability to dynamically link resources together as an ensemble to support the execution of large-scale, resource-intensive, and distributed applications IMAGING INSTRUMENTS COMPUTATIONAL RESOURCES LARGE-SCALE DATABASES DATA ACQUISITION,ANALYSIS ADVANCED VISUALIZATION “Telescience Grid”
13
TeraGrid TeraGrid is a “top-down”, planned Grid PSC Extensible Terascale Facility Members: IU, ORNL, NCSA, PSC, Purdue, SDSC, TACC, ANL, NCAR 280 Tflops of computing capability 30 PB of distributed storage High performance networking between partner sites Linux-based software environment, uniform administration Focus is a national, production Grid
14
PRAGMA Grid Member Institutions 31 institutions in 15 countries/regions (+ 7 in preparation) UZurich Switzerland NECTEC ThaiGrid Thailand UoHyd India MIMOS USM Malaysia CUHK HongKong ASGC NCHC Taiwan HCMUT IOIT-HCM Vietnam AIST OsakaU UTsukuba TITech Japan BII IHPC NGO NTU Singapore MU Australia APAC QUT Australia KISTI Korea JLU China SDSC USA CICESE Mexico UNAM Mexico UCN Chile UChile Chile UUtah USA NCSA USA BU USA ITCR Costa Rica BESTGrid New Zealand CNIC GUCAS China LZU China UPRM Puerto Rico
15
Usability Issues Access to Grid resources is still very complicated –User account creation –Management of credentials (identities) –Installation and deployment of scientific software –Interaction with Grid schedulers –Data management
16
Technical Challenges Security –Grids traverse organizational boundaries Different administration domains have different authentication mechanisms Resources have different use agreements and sharing priorities –Need to provide Single Sign-On (SSO), Authentication, Authorization Resource Management –Resources loosely-coupled Higher network latencies Planned and unplanned disruptions –Requirements Seamless access to Grid resources QoS guarantees for jobs Scheduling/co-scheduling of resources Failure management
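The failure-management requirement can be sketched as a submission loop that retries and then falls back to another resource. This is a minimal illustration, not the behavior of any particular Grid scheduler: `ResourceUnavailable`, `submit_with_failover`, and the two example clusters are hypothetical names introduced here.

```python
class ResourceUnavailable(Exception):
    """Raised by a (hypothetical) resource that cannot accept a job."""


def submit_with_failover(job, resources, max_attempts_per_resource=2):
    """Try each resource in order and return (resource_name, result).

    `resources` is a list of (name, submit_callable) pairs. A resource is
    retried a few times before the next one is tried, since disruptions
    on loosely coupled resources may be transient.
    """
    errors = []
    for name, submit in resources:
        for attempt in range(1, max_attempts_per_resource + 1):
            try:
                return name, submit(job)
            except ResourceUnavailable as exc:
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all resources failed: " + "; ".join(errors))


def down_cluster(job):
    # Stand-in for a site in a planned maintenance window.
    raise ResourceUnavailable("maintenance window")


def healthy_cluster(job):
    # Stand-in for a site that accepts and runs the job.
    return f"ran {job}"


placement = submit_with_failover(
    "job-1", [("cluster-a", down_cluster), ("cluster-b", healthy_cluster)]
)
print(placement)
```

A production system would also need credentials valid at every site, which is one reason single sign-on appears alongside resource management on this slide.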
17
Technical Challenges Data Management –Data Transfer GridFTP: High-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks –Managing large-scale scientific data across different sites Storage Request Broker (SRB): Shared collections that can be distributed across multiple organizations and heterogeneous storage systems
18
Technical Challenges Interoperability –In the past, different projects used different protocols and APIs Legion, Condor, Globus, SGE, etc –Need to use standard, open mechanisms Current thrust towards the use of Service oriented architectures and Web service technologies for interoperability
19
Service Oriented Architectures (SOAs) “SOA represents a model in which functionality is decomposed into small, distinct units (services), which can be distributed over a network and can be combined together and reused to create applications” - Erl, Thomas (2005). Service-Oriented Architecture: Concepts, Technology, and Design.
20
Benefits of SOAs Reduce complexity by encapsulating the back-end implementation –Service interfaces can be published and used by a number of clients Enable interoperability across systems through the use of open standards –Web services (WSDL, SOAP, XML Schemas) are de facto standards –Lend themselves well to the creation of workflows Support a loosely-coupled model where clients can bind to services at run-time –Enables greater flexibility and fault tolerance
21
What are Web Services? Many different definitions are available IBM (Gottschalk, et al): A Web service is an interface that describes a collection of operations that are network accessible through standardized XML messaging. Microsoft (on MSDN): A programmable application logic accessible using standard Internet protocols. Simply put, a Web service is a network service that provides a programmatic interface to remote clients
22
Web Services: Features Independent of programming language and OS All information required to contact a service is captured by the Web Service Description –Web Services Description Language (WSDL) provides a way to encapsulate an interface definition, data types being used, and the protocol information Web services provide programmatic access to remote clients using standard internet protocols
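As a rough illustration of "standardized XML messaging", the sketch below builds and parses a SOAP 1.1 request envelope with Python's standard library. The envelope namespace is the SOAP 1.1 one; the application namespace, the `launchJob` operation, and its parameters are hypothetical and would in practice come from the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/hypothetical/app"          # hypothetical service namespace


def build_request(operation, params):
    """Serialize an operation call as a SOAP envelope (no header)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover (operation, params) from an envelope built as above."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]                                   # first child of Body
    operation = call.tag.split("}", 1)[1]            # strip the namespace
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("launchJob", {"argList": "-in input.pqr", "numProcs": 4})
print(message)
print(parse_request(message))
```

Because the message is plain XML sent over a standard protocol, the client and the service need not share a programming language or operating system; all values travel as text, so `numProcs` comes back as the string `"4"`.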
23
Web Services Lifecycle Service Registry Service Requestor Service Provider Lookup Publish Interact
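The publish, lookup, and interact steps of the lifecycle can be sketched in a few lines. This is an in-memory stand-in, not a real registry protocol: `ServiceRegistry` and the example `reverse_complement` service are hypothetical, and a real deployment would store a network endpoint and a service description rather than a Python callable.

```python
class ServiceRegistry:
    """In-memory stand-in for a service registry."""

    def __init__(self):
        self._services = {}

    def publish(self, name, endpoint):
        """Provider side: advertise a service under a name."""
        self._services[name] = endpoint

    def lookup(self, name):
        """Requestor side: resolve a name to an endpoint at run time."""
        try:
            return self._services[name]
        except KeyError:
            raise LookupError(f"no service published as {name!r}") from None


def reverse_complement(sequence):
    """Example provider operation: reverse complement of a DNA string."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(sequence))


registry = ServiceRegistry()
registry.publish("reverse-complement", reverse_complement)   # 1. publish
service = registry.lookup("reverse-complement")              # 2. lookup
answer = service("ATGC")                                     # 3. interact
print(answer)
```

The requestor only knows the service by name until lookup time, which is what allows a provider to be replaced or relocated without changing the client.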
24
Open Grid Services Architecture A standards-based distributed service system that supports the creation of sophisticated distributed services required in inter-organizational computing environments The standards are described by a set of specifications called the Web Services Resource Framework (WSRF)
25
Open Grid Services Architecture The evolution of the Grid to an architecture based on prior Grid and Web service technologies –Open: Extensibility, Vendor-neutrality, Committed to community standardization Use of WSDL to achieve self-describing, discoverable services & interoperable protocols Support for reliable & secure invocation, lifetime management, notification, policy & credential management, and virtualization
26
Open Grid Services Architecture
27
From Theory to Practice
28
SOAs in eScience Grid and scientific communities have been adopting SOAs over the past few years –Open Grid Services Architecture (OGSA) –Web Services Resource Framework (WSRF) However, in general, most past efforts have focused on middleware, and not science –For instance, the Globus Toolkit –More recently, there are several efforts to build infrastructures for Services Oriented Science * I. Foster. “Services Oriented Science”. Science, 2005
29
Application-level Services Traditional model: Services for middleware tools, e.g. job launch, data transfer, etc Current trend: “Services Oriented Science” –Scientific applications as first class services –Delegation of middleware management to the services back-end –End-users are presented with science-oriented, and not middleware-oriented interfaces
30
Enabling Multiple User Interfaces Gemstone: http://gemstone.mozdev.org ADT: http://mgltools.scripps.edu GridSphere: http://www.gridsphere.org Kepler: http://kepler-project.org
31
What is a Web Portal? Web portals aggregate information content from diverse sources, and present them in a unified way Traditional Model –Monolithic websites, all information content co-located on central server Current Trend –Information content geographically distributed, and implemented as an SOA –Portals provide a single point of entry, by aggregating geographically distributed resources
32
What is a Web Portal? “A portal is a web based application that commonly provides personalization, single sign on, content aggregation from different sources and hosts the presentation layer of Information Systems” (JSR 168) Grid/Science Portals build upon the familiar Web portal model, such as Yahoo or Amazon, to deliver the benefits of Grid computing to virtual communities of users, providing a single access point to Grid services and resources.
33
Portals: Pros & Cons Pros –Single point of entry to diverse information sources –Ubiquitous access to applications (browser based) –No need to install complex software Cons –Limited interaction with local desktop tools –Interfaces may not be rich enough for complex tasks such as visualization –Not very easy to make highly interactive interfaces
34
Portal Technology JSR 168 Portlet API –Similar to Servlet API in providing reusable Web applications –Ratified in August 2003 by vendors including BEA, Sun, IBM, Oracle, Plumtree, etc GridSphere: http://www.gridsphere.org –JSR 168 Compliant –Used by several projects at UCSD such as GEON, NEES, NBCR, CAMERA
35
What is a Portlet? Unit of composition for a portal - a portal is simply an aggregation of portlets Standardized packaging model to share applications among portal vendors Builds off Servlet API and specification so no major surprises for existing Java portal developers API provides useful methods for storing per user data and configuration settings Can be used as building blocks to aggregate content from disparate information sources
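The idea of a portal as an aggregation of portlets, each with per-user settings, can be sketched conceptually. This is a Python analogue for illustration only and does not reproduce the JSR 168 Java API; all class and method names here are hypothetical.

```python
class Portlet:
    """Unit of composition: renders one fragment of a portal page."""

    title = "Untitled"

    def render(self, user, preferences):
        raise NotImplementedError


class JobStatusPortlet(Portlet):
    title = "Job Status"

    def render(self, user, preferences):
        limit = preferences.get("max_jobs", 5)
        return f"Showing up to {limit} jobs for {user}"


class DataBrowserPortlet(Portlet):
    title = "Data Browser"

    def render(self, user, preferences):
        return f"Files under {preferences.get('home_dir', '/')}"


class Portal:
    """Aggregates portlet fragments into one page per user."""

    def __init__(self, portlets):
        self.portlets = list(portlets)
        self._prefs = {}   # (user, portlet title) -> preference dict

    def set_preference(self, user, portlet, key, value):
        self._prefs.setdefault((user, portlet.title), {})[key] = value

    def render_page(self, user):
        sections = []
        for portlet in self.portlets:
            prefs = self._prefs.get((user, portlet.title), {})
            sections.append(f"[{portlet.title}] {portlet.render(user, prefs)}")
        return "\n".join(sections)


jobs, data = JobStatusPortlet(), DataBrowserPortlet()
portal = Portal([jobs, data])
portal.set_preference("alice", jobs, "max_jobs", 10)
page = portal.render_page("alice")
print(page)
```

Each portlet is independent of the others, so a portal can mix fragments backed by different information sources without the portlets knowing about each other.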
36
Putting it all together: NEES Architecture
37
Case Study: The NBCR SOA Transparent access to distributed resources by grid-enabling biomedical codes and biological and biomedical databases –Researchers should be able to harness the computational and data resources without having to worry about the complexity of the back-end infrastructure Enable integration of applications across different scales (e.g. atomic to macro-molecular, to cellular and tissue, and so on) –With the help of commodity workflow tools and Problem Solving Environments (PSEs)
38
Approach Scientific applications wrapped as Web services –Provision of a SOAP API for programmatic access Clients interact with application Web services, instead of Grid resources
39
NBCR SOA: Big Picture [Diagram labels: Security Services (GAMA); Web Portals; ADT; Kepler; Continuity; Application Services; State Mgmt; Globus; DRMAA; Globus; Condor pool; SGE Cluster; PBS Cluster]
40
Scientific SOA: Benefits Applications are installed once, and used by all authorized users –No need to create accounts for all Grid users –Use of standards-based Grid security mechanisms Users are shielded from the complexities of Grid schedulers Data management for multiple concurrent job runs performed automatically by the Web service State management and persistence for long running jobs Accessibility via a multitude of clients
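The state and data management described here can be sketched as a small service object that gives each job its own working directory and tracks its state. This is a simplified illustration under stated assumptions, not the NBCR implementation: the class, its method names, and the state labels are hypothetical, jobs run synchronously instead of through a scheduler, and state is held in memory rather than persisted.

```python
import itertools
import tempfile
from pathlib import Path


class ApplicationService:
    """Wraps an application behind launch/query/fetch operations."""

    def __init__(self, run_application, base_dir):
        self._run = run_application          # callable standing in for the real code
        self._base = Path(base_dir)
        self._ids = itertools.count(1)
        self._jobs = {}                      # job id -> state record

    def launch_job(self, user, args):
        job_id = f"job-{next(self._ids)}"
        workdir = self._base / job_id        # one directory per job run
        workdir.mkdir(parents=True)
        record = {"user": user, "state": "RUNNING", "workdir": workdir}
        self._jobs[job_id] = record
        try:
            (workdir / "stdout.txt").write_text(self._run(args))
            record["state"] = "DONE"
        except Exception as exc:
            (workdir / "stderr.txt").write_text(str(exc))
            record["state"] = "FAILED"
        return job_id

    def query_status(self, job_id):
        return self._jobs[job_id]["state"]

    def get_output(self, job_id):
        return (self._jobs[job_id]["workdir"] / "stdout.txt").read_text()


with tempfile.TemporaryDirectory() as scratch:
    service = ApplicationService(lambda args: " ".join(args).upper(), scratch)
    first = service.launch_job("alice", ["fold", "protein.pdb"])
    second = service.launch_job("bob", ["dock", "ligand.mol2"])
    print(first, service.query_status(first), service.get_output(first))
    print(second, service.query_status(second), service.get_output(second))
```

Because outputs are keyed by job ID, concurrent runs by different users do not overwrite each other, and clients never see the scheduler or the file layout behind the service.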
41
Web Portal based Access
42
Scientific Workflows Need for automation of scientific processes –An end-to-end application is typically more than a single application run Must be reproducible and maintainable Should be easy to compose from individual components
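Composition from individual components, with a record that supports reproducibility, can be sketched as a linear pipeline. This is a deliberately minimal illustration and not how Kepler or Vision represent workflows; `run_workflow` and the three example steps are hypothetical.

```python
def run_workflow(steps, initial_input):
    """Run named steps in order, feeding each output to the next step.

    Returns the final result and a provenance log recording what each
    step consumed and produced, so a run can be inspected or repeated.
    """
    data = initial_input
    provenance = []
    for name, step in steps:
        result = step(data)
        provenance.append({"step": name, "input": data, "output": result})
        data = result
    return data, provenance


steps = [
    ("parse", lambda text: [float(x) for x in text.split()]),
    ("normalize", lambda values: [v / max(values) for v in values]),
    ("summarize", lambda values: round(sum(values) / len(values), 3)),
]

result, log = run_workflow(steps, "2 4 8")
print(result)
for entry in log:
    print(entry["step"], "->", entry["output"])
```

In a service-oriented setting each step would be a call to an application Web service, and the workflow tool would take care of passing data between them.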
43
Molecular Visualization Using the Vision Workflow Toolkit
44
Bioinformatics Workflows Using Kepler
45
Conclusions Grid computing provides coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations Service Oriented Architectures (SOAs) provide a model in which functionality is decomposed into small, distinct services, which can be distributed over a network and can be combined together and reused to create applications –Grid computing and eScience are moving towards SOAs Web portals aggregate information content from diverse sources that are implemented as SOAs, and present them in a unified way –Services can also be accessed via a multitude of other clients
46
Questions?