Introduction to Grid Computing


1 Introduction to Grid Computing
Chapter 8 Introduction to Grid Computing

2 Objectives
Grid computing. Software and middleware for the grid. Present and future grid applications.

3 Grid Computing
Definition: “Grid computing is distributed computing performed transparently across multiple administrative domains” (P.V. Coveney). Distributed high-performance computing. Large geographically distributed networks of computers. Provides a means for using distributed resources to solve large problems. “What the Web did for communication, grids endeavor to do for computation.”

4 Grid Computing (2)
Very general computing applications: Database searches and queries. Scientific applications. Weather prediction. Cryptography. Business applications. Transparency: Distributing computational resources among multiple and widely separated sources and users is a difficult algorithmic problem.

5 Characteristics of Grids
Grids coordinate resources that are not subject to centralized control. Grids use standard, open, general-purpose protocols and interfaces. Grids deliver high qualities of service.

6 Grid vs. Parallel Computing
Beowulf cluster SHARCNet – University of Western Ontario compneuro.uwaterloo.ca/beowulf.html

7 Grid vs. Parallel Computing (2)
Grid computing is distinguished from parallel computing on one or more multiprocessors: Parallel computing: locally “clustered” machines or large supercomputers. Grid computing: computation across different administrative domains.

8 Two Tenets of Grid Computing
Virtualization: Individual resources (such as computers, disks, information sources, and applications) are pooled together and made available through abstractions. This overcomes “hard-coded” connections between providers and consumers of resources. Provisioning: When a request for a resource is made, a specific resource is identified to fulfill the request. The system determines how to meet the need and optimizes system performance.
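The two tenets can be illustrated with a minimal Python sketch. It is not taken from any grid middleware: the class names, host names, and the best-fit selection rule are all assumptions chosen for illustration. The pool is the abstraction (virtualization), and `provision` maps an abstract request onto one concrete resource (provisioning).

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str          # hypothetical identifier, e.g. a host name
    kind: str          # "cpu", "disk", ...
    capacity: int      # abstract capacity units
    in_use: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.in_use


@dataclass
class ResourcePool:
    """Virtualization: consumers see one pool, not individual providers."""
    resources: list = field(default_factory=list)

    def add(self, resource: Resource) -> None:
        self.resources.append(resource)

    def provision(self, kind: str, amount: int) -> str:
        """Provisioning: pick a concrete resource for an abstract request.

        Uses a best-fit rule (smallest free capacity that still fits),
        which is one possible optimization policy, assumed here.
        """
        candidates = [r for r in self.resources
                      if r.kind == kind and r.free >= amount]
        if not candidates:
            raise LookupError(f"no {kind} resource with {amount} free units")
        chosen = min(candidates, key=lambda r: r.free)
        chosen.in_use += amount
        return chosen.name


pool = ResourcePool()
pool.add(Resource("site-a-node1", "cpu", capacity=8))
pool.add(Resource("site-b-node7", "cpu", capacity=4))
pool.add(Resource("site-b-store", "disk", capacity=1000))

first = pool.provision("cpu", 3)    # best fit is the 4-unit node
second = pool.provision("cpu", 3)   # now only the 8-unit node still fits
```

The caller never names a host; swapping the selection rule changes how the system optimizes without changing the request interface.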

9 Characteristics of Grid Applications
Data are acquired by scientific instruments. Data are stored in archives at separate, perhaps geographically separated, sites. Data are managed by teams belonging to different organizations. Large quantities of data (tera- or petabytes) are collected. Software is used to analyze and summarize the raw data.

10 The Importance of Standardization
Without standardization, grid computing practitioners would need to acquire accounts at many different computer centers, managed by different organizations. Different security and authentication protocols and accounting practices would have to be applied. Very heterogeneous software environment.

11 Objectives
Grid computing. Software and middleware for the grid. Grid applications.

12 Importance of Middleware
Middleware eases grid users’ experience and provides them with levels of abstraction. Middleware extends the Web’s information and database management capabilities, allowing remote deployment of computational resources.

13 Globus Toolkit
Most widely used middleware for grids. Open source toolkit for building computing grids. Provides a standard platform upon which other services build. Provides directory services, security, and resource management.

14 Objectives
Grid computing. Software and middleware for the grid. Grid applications.

15 CPU Scavenging
Unused PC resources worldwide are harnessed. Also known as shared computing. CPU-scavenging systems gain and lose machines at unpredictable times as users interact with their computers, or as network connections fail. CPU-scavengers can migrate jobs to allow smooth operation.
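A minimal Python sketch of the scheduling idea, under simplifying assumptions: machine and job names are hypothetical, and a job whose machine disappears is simply requeued and restarted elsewhere (real systems may checkpoint and resume instead).

```python
from collections import deque


class ScavengingScheduler:
    def __init__(self, jobs):
        self.pending = deque(jobs)   # jobs waiting for a machine
        self.running = {}            # machine name -> job
        self.done = []               # jobs in completion order

    def machine_joined(self, machine):
        """An idle PC becomes available; give it work if any is queued."""
        if self.pending and machine not in self.running:
            self.running[machine] = self.pending.popleft()

    def machine_left(self, machine):
        """The owner reclaimed the PC or the network failed: requeue its job."""
        job = self.running.pop(machine, None)
        if job is not None:
            self.pending.appendleft(job)

    def job_finished(self, machine):
        self.done.append(self.running.pop(machine))


sched = ScavengingScheduler(["job-1", "job-2"])
sched.machine_joined("pc-a")      # pc-a takes job-1
sched.machine_joined("pc-b")      # pc-b takes job-2
sched.machine_left("pc-a")        # job-1 goes back to the front of the queue
sched.machine_joined("pc-c")      # pc-c picks up job-1
sched.job_finished("pc-b")
sched.job_finished("pc-c")
```

Both jobs complete even though one machine vanished mid-run, which is the property the slide describes as smooth operation.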

16 SETI@home Search for Extraterrestrial Intelligence
Goal: to analyze vast amounts of data from the Arecibo radio telescope. Initiated by the Space Sciences Laboratory at the University of California, Berkeley

17 SETI@home (2)
Uses a free screen saver, available to the public. When activated, the screensaver program downloads time sequences of radio telescope data and searches them for radio sources. SETI@home has more than 5 million participants. Inspiration for other scientific applications in need of large computing resources.

18 SETI@home (3)
Main purpose: A program downloads and analyzes radio telescope data. Data is recorded at the Arecibo Observatory in Puerto Rico. The data is sent to Berkeley, where it is processed into units of 107 seconds of data. These work units are sent from the server over the Internet to participating computers around the world for analysis.
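The time-slicing step can be sketched in Python. Only the 107-second unit length comes from the slide; the function name and the one-hour recording are assumptions, and any further splitting the real project applies is ignored.

```python
WORK_UNIT_SECONDS = 107  # unit length stated on the slide


def split_into_work_units(total_seconds, unit_seconds=WORK_UNIT_SECONDS):
    """Return (start, end) second offsets that cover the whole recording.

    The final unit may be shorter than unit_seconds.
    """
    units = []
    start = 0
    while start < total_seconds:
        end = min(start + unit_seconds, total_seconds)
        units.append((start, end))
        start = end
    return units


# One hour of hypothetical recording time.
units = split_into_work_units(3600)
```

Each tuple is an independent piece of work, which is what lets the server hand units to volunteers in any order.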

19 SETI@home (4)
The analysis software can search for signals with about one-tenth the strength of those sought in previous surveys, because it makes use of a very computationally intensive algorithm. Data is merged into a database using computers in Berkeley. Various pattern-detection algorithms are applied to search for the most interesting signals.

20 User Client

21 BOINC Berkeley Open Infrastructure for Network Computing.
Funded by the National Science Foundation. Used in the SETI project. Client-server architecture: Client – Used by the computer supplying resources for one or more BOINC projects. Performs the computations. Server – System software, such as database services and project’s web site.

22 Remote Procedure Calls
Mechanism by which the server communicates with the client in BOINC. Similar to a regular function call or method invocation, except that the caller runs on one computer and the function executes on another.

23 Remote Procedure Calls - Examples
Return screensaver mode: get_screensaver_mode(int& status)
Get a list of results for jobs in progress: get_results(RESULTS&)
Get a list of file transfers in progress: get_file_transfers(FILE_TRANSFERS&)
Get the client’s current state: get_state(CC_STATE&)
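The general RPC pattern behind calls like these can be sketched in Python. This is not BOINC's wire protocol or API: the JSON encoding, the `ToyClient` class, and its return values are assumptions made for illustration, and the "network" is a plain function call so the example stays self-contained. Only the method names echo the slide.

```python
import json


class ToyClient:
    """Stands in for the remote side; return values are made up."""

    def get_screensaver_mode(self):
        return {"status": 0}

    def get_file_transfers(self):
        return {"file_transfers": []}


def handle_request(target, raw_request: bytes) -> bytes:
    """Remote side: decode the request, call the named method, encode the reply."""
    request = json.loads(raw_request.decode("utf-8"))
    name = request["method"]
    method = getattr(target, name, None)
    if method is None or name.startswith("_"):
        reply = {"error": f"unknown method {name}"}
    else:
        reply = {"result": method(*request.get("args", []))}
    return json.dumps(reply).encode("utf-8")


class RpcStub:
    """Caller side: makes a remote call look like a local method call."""

    def __init__(self, transport):
        self._transport = transport   # callable taking and returning bytes

    def __getattr__(self, name):
        def call(*args):
            raw = json.dumps({"method": name, "args": list(args)}).encode("utf-8")
            reply = json.loads(self._transport(raw).decode("utf-8"))
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return call


remote = ToyClient()
# In a real system the transport would send the bytes over a socket.
stub = RpcStub(lambda raw: handle_request(remote, raw))
mode = stub.get_screensaver_mode()
```

The caller writes `stub.get_screensaver_mode()` as if it were local; the stub and handler hide the marshalling on each side.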

24 Human Proteome Folding Project (HPFP)
Goal: to predict the structure of human proteins. Devised at the Institute for Systems Biology, University of Washington. Produces the likely structures for each of the proteins using a set of predefined rules. Improved knowledge of human proteins is important in developing new therapies. Officially completed on July 18, 2006. Second stage now underway.

25 Human Proteome Folding Project
WCG desktop console - users monitor progress on protein-folding project. Typical desktop screensaver setup for HPFP

26 Business Applications
Business application grid (BAG). Major focus is using existing grid computing technologies to unite all of an organization’s desktops, workstations, servers, printers, peripherals, etc., to perform useful work during idle time. Usually focused on well-defined problems: Calculating performance averages for a mutual fund. Reducing processing time in wealth management systems. Database applications.

27 Business Applications (2)
A large financial services company uses specialized grid software for new corporate banking applications. Oracle Corporation offers a grid database system.

28 Business Grid Middleware
Provides an IT-level infrastructure to support business applications. Middleware provides services for composing, submitting, and managing business applications. Business functions (e.g. credit card authorization and shipping-and-handling services) are not provided. Globus Toolkit 4 makes it easier to build an application that taps into existing distributed computing resources (e.g. servers, storage, databases).

29 Conclusions
Grid computing is an “enabling technology” that is rapidly gaining popularity in: Science. Medicine. Engineering. Business and financial applications. Many software vendors offer grid computing toolkits and middleware. In 2004, 20% of companies were seeking grid computing solutions (Evans Data Corp.).

30 Benefits of Grid Computing
Collaboration. Increased productivity. Efficient use of resources and storage. Cost-effectiveness. Heterogeneous environments. Failure tolerance. Transparency.

31 Challenges
Lack of control over resources and administration. Security. Middleware. Network failures. Cultural issues.

32 Thank you.

33 Open grid services architecture
OGSA is a standard for grid-based applications and a framework for meeting grid requirements. Layers, from top to bottom:
Application-specific grid services (e.g. astronomy, biomedical informatics, high-energy physics), exposed through application-specific interfaces.
OGSA services (directory, management, security), exposed through standard grid service interfaces.
OGSI (Open Grid Services Infrastructure) services: naming, service data (metadata), service creation and deletion, fault model, service groups; interfaces include e.g. GridService and Factory.
Web services.

34 Globus toolkit
Other non-GT3 services can run on top of the GT3 architecture. Replica management – keeps track of subsets of large data sets that are being worked on. Job management – checking status of jobs, pausing, stopping if necessary. Index services – helping to locate grid resources to meet specific needs. Reliable file transfer service (RFT) – performs large file transfers from a client to a grid service. Security – restricts access to grid services so that only authorized clients can use them; provides another layer of security on top of firewalls. Low-level functions.

35 Other grid tools
Resource management: Grid Resource Allocation and Management Protocol (GRAM). Information Services: Monitoring and Discovery Service (MDS). Security Services: Grid Security Infrastructure (GSI). Data Movement and Management: Global Access to Secondary Storage (GASS) and GridFTP.

36 World-Wide Telescope (2002)
Goal: deployment of data resources shared by astronomers. Data: Archives of observations over a particular period of time, part of the EM spectrum, and area of the sky. Observations collected at different sites around the world. Data on same celestial objects are combined over different periods of time and different parts of the EM spectrum.

37 World-Wide Telescope (2)
Data archives ( terabyte) managed locally by the teams that collect the data. As data is acquired, it is analyzed and stored as transformed data so that it can be used by remote astronomy sites. Librarian role of scientists. Metadata is required to describe: Time the data was collected. Part of the sky observed. Instruments used.
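A minimal Python sketch of such a metadata record and a query over it. The three described items (collection time, part of the sky, instrument) come from the slide; the field names, the use of right ascension and declination in degrees, the `archive_site` field, and all sample values are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ObservationMetadata:
    collected_at: datetime     # time the data was collected
    ra_deg: float              # part of the sky: right ascension, degrees
    dec_deg: float             # part of the sky: declination, degrees
    instrument: str            # instrument used
    archive_site: str          # hypothetical: which local archive holds the data


def find_observations(catalog, ra_range, dec_range, instrument=None):
    """Return metadata records inside a sky region, optionally for one instrument."""
    ra_lo, ra_hi = ra_range
    dec_lo, dec_hi = dec_range
    return [m for m in catalog
            if ra_lo <= m.ra_deg <= ra_hi
            and dec_lo <= m.dec_deg <= dec_hi
            and (instrument is None or m.instrument == instrument)]


# Made-up sample catalog spanning two archive sites.
catalog = [
    ObservationMetadata(datetime(2002, 3, 1, 22, 15), 83.8, -5.4, "optical-ccd", "site-a"),
    ObservationMetadata(datetime(2002, 6, 9, 1, 30), 83.6, -5.5, "radio-array", "site-b"),
    ObservationMetadata(datetime(2002, 7, 2, 3, 0), 201.4, -43.0, "optical-ccd", "site-a"),
]

hits = find_observations(catalog, (83.0, 84.0), (-6.0, -5.0))
sites = sorted({m.archive_site for m in hits})
radio_hits = find_observations(catalog, (83.0, 84.0), (-6.0, -5.0), "radio-array")
```

A query on the metadata alone tells a remote astronomer which sites hold observations of the same region, before any bulk data is moved.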

38 WCG ongoing projects
FightAIDS@Home: Launched by WCG in 2005. Each computer processes one potential drug molecule and tests how well it would dock with HIV protease, inhibiting viral reproduction. Human Proteome Folding Phase 2: Released in 2006. Extension of HPF1, focusing on human-secreted proteins. Better protein models, but more computationally intensive.

39 World Community Grid (WCG)
Goal: to create the world's largest public computing grid for humanitarian concerns. Administered and funded by IBM. Platforms: Windows, Linux, and Mac OS X. Uses the idle time of Internet-connected desktop computers. The agent works as a screen saver (like SETI@home), using a computer's resources only when it would otherwise be idle and returning them to the user when requested. Projects are approved by an advisory board: representatives of major research institutions, universities, UN, WHO.

40 WCG – Smallpox research
Completed project. WCG largely began due to the success of this project in shaving years off research time. Analysis of therapeutic candidates to fight the smallpox virus. About 35 million potential drug molecules were screened against several smallpox proteins, resulting in 44 new potential treatments.

41 WCG Ongoing projects (2)
Help Defeat Cancer (2006): Processes large numbers of tissue samples using tissue microarrays. Genome Comparison (2006): Compares gene sequences of different organisms to find similarities. Goal: determining the purpose of specific gene sequences by comparing them with similar sequences that have known functions in another organism.

42 Other grid projects
Description of the project (reference, where given):
1. Aircraft engine maintenance using fault histories and sensors for predictive diagnostics.
2. Telepresence for predicting the effects of earthquakes on buildings, using simulations and test sites.
3. Bio-medical informatics network providing researchers with access to experiments and visualizations of results. Reference: nbcr.sdsc.edu
4. Analysis of data from the CMS high energy particle detector at CERN by physicists world-wide over 15 years.
5. Testing candidate drug molecules for their effect on the activity of a protein, by performing parallel computations using idle desktop computers. Reference: [Taufer et al. 2003], [Chien 2004]
6. Use of the Sun Grid Engine to enhance aerial photographs by using spare capacity on a cluster of web servers.
7. The Butterfly Grid supports multiplayer games for very large numbers of players on the internet over the Globus toolkit.
8. The Access Grid supports the needs of small group collaboration, for example by providing shared workspaces.

43 Requirements of grid systems
Remote access to resources, specifically to archived data. Data processing at the site where the data is managed. Remote requests (queries) result in a visualization or in results drawn from a small quantity of data. The resource manager of a data archive creates instances of services when they are needed. This is similar to the distributed object model, where servant objects are created when needed.
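The on-demand creation of service instances can be sketched in Python. All class names, the dataset, and the summary fields are hypothetical; the sketch only shows a resource manager creating a service the first time it is requested and the service returning a small result computed next to the data.

```python
class QueryService:
    """Hypothetical service instance bound to one archived dataset."""

    def __init__(self, dataset_name, records):
        self.dataset_name = dataset_name
        self._records = records

    def summarize(self):
        # Processing happens where the data lives; only a small result leaves.
        return {
            "dataset": self.dataset_name,
            "count": len(self._records),
            "mean": sum(self._records) / len(self._records),
        }


class ArchiveResourceManager:
    def __init__(self, datasets):
        self._datasets = datasets     # dataset name -> list of numeric records
        self._instances = {}          # dataset name -> live service instance

    def get_service(self, dataset_name):
        """Create the service instance on first request, then reuse it."""
        if dataset_name not in self._instances:
            self._instances[dataset_name] = QueryService(
                dataset_name, self._datasets[dataset_name])
        return self._instances[dataset_name]

    def destroy_service(self, dataset_name):
        """Release an instance that is no longer needed."""
        self._instances.pop(dataset_name, None)


manager = ArchiveResourceManager({"sky-survey": [2.0, 4.0, 6.0]})
service = manager.get_service("sky-survey")
summary = service.summarize()
same_instance = manager.get_service("sky-survey") is service
```

Until a request arrives no service object exists, which mirrors the servant-on-demand behavior of the distributed object model mentioned above.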

44 Requirements of grid systems (2)
Metadata to describe characteristics of archived data. Directory services based on the metadata. Software for: Query management. Data transfer. Resource reservation.

