Grid Computing in a Commodity World KCCMG Fall Impact 2005 Lorin Olsen, Sprint Nextel
Our Earliest Introductions
How Grid Computing Has Evolved
Definitions
Grid computing uses the resources of many separate computers connected by a network (usually the Internet) to solve large-scale computation problems.
Grid computing offers a model for solving massive computational problems by making use of the unused resources (CPU cycles and/or disk storage) of large numbers of disparate, often desktop, computers treated as a virtual cluster embedded in a distributed telecommunications infrastructure.
Wikipedia,
DoD Global Information Grid (GIG)
– Storage
– Messaging
– Enterprise Service Management
– Discovery
– Mediation
– Information Assurance
– Application Hosting
– User Assistant
– Collaboration
DoD directive, Deputy Secretary of Defense, September 19, 2002
CERN
– The sharing of resources on a global scale is the very essence of the Grid.
– Security is a critical aspect of the Grid, since there must be a very high level of trust between resource providers and users.
– If the resources can be shared securely, then the Grid really starts to pay off when it can balance the load on the resources.
– Communications networks have to ensure that distance no longer matters.
– Open standards are needed in order to make sure that R&D worldwide can contribute in a constructive way to the development of the Grid.
“GridCafe: Building Blocks”; Francois Grey, Matti Heikkurinen, Rosy Mondardini, Robindra Prabhu;
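The load-balancing point above can be illustrated with a small sketch. It is not taken from GridCafe or from any grid middleware; the site names, job names, and costs are made up, and the greedy "largest job to the least-loaded site" rule is just one simple policy chosen for illustration.

```python
import heapq


def balance(jobs, sites):
    """Greedy balancing: place each job, largest first, on the least-loaded site.

    jobs  -- dict of hypothetical job name -> estimated cost (e.g. CPU-hours)
    sites -- list of hypothetical site names
    """
    # Min-heap of (current load, site name); the least-loaded site pops first.
    heap = [(0.0, name) for name in sites]
    heapq.heapify(heap)
    placement = {name: [] for name in sites}
    for job, cost in sorted(jobs.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        placement[name].append(job)
        heapq.heappush(heap, (load + cost, name))
    return placement


# Illustrative data only.
jobs = {"j1": 8, "j2": 5, "j3": 4, "j4": 3}
placement = balance(jobs, ["site-a", "site-b"])
for site, assigned in placement.items():
    print(site, assigned, sum(jobs[j] for j in assigned))
```

A real grid scheduler would also have to weigh data location, security policy, and network cost, which this sketch ignores.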
Types of Grids
– Computational grid: A computational grid is focused on setting aside resources specifically for computing power. In this type of grid, most of the machines are high-performance servers.
– Scavenging grid: A scavenging grid is most commonly used with large numbers of desktop machines. Machines are scavenged for available CPU cycles and other resources. Owners of the desktop machines are usually given control over when their resources are available to participate in the grid.
– Data grid: A data grid is responsible for housing and providing access to data across multiple organizations. Users are not concerned with where this data is located as long as they have access to the data. For example, you may have two universities doing life science research, each with unique data. A data grid would allow them to share their data, manage the data, and manage security issues such as who has access to what data.
Bart Jacob, IBM Corporation, “Grid computing: What are the key components?”,
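As a rough illustration of the scavenging model described above, the sketch below hands work units only to desktops whose owners have opted in and whose CPUs are mostly idle. The `Desktop` class, the 0.25 idle threshold, and the round-robin assignment are all assumptions made for this example; they are not drawn from the IBM article or from any particular grid product.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Desktop:
    """A hypothetical volunteer desktop machine."""
    name: str
    owner_allows: bool   # the owner's switch for participating in the grid
    cpu_load: float      # load from the owner's own work, 0.0 to 1.0
    assigned: list = field(default_factory=list)

    def is_available(self, threshold: float = 0.25) -> bool:
        # Scavenge only when the owner permits it and the CPU is mostly idle.
        return self.owner_allows and self.cpu_load < threshold


def scavenge(machines, work_units):
    """Assign work units round-robin to available machines.

    Returns the work units that could not be placed.
    """
    available = [m for m in machines if m.is_available()]
    pending = deque(work_units)
    if not available:
        return list(pending)
    i = 0
    while pending:
        available[i % len(available)].assigned.append(pending.popleft())
        i += 1
    return []


# Illustrative data only.
machines = [
    Desktop("desk-a", owner_allows=True, cpu_load=0.05),
    Desktop("desk-b", owner_allows=False, cpu_load=0.01),  # owner opted out
    Desktop("desk-c", owner_allows=True, cpu_load=0.90),   # busy with owner's work
    Desktop("desk-d", owner_allows=True, cpu_load=0.10),
]
leftover = scavenge(machines, ["wu-1", "wu-2", "wu-3"])
for m in machines:
    print(m.name, m.assigned)
```

The owner's opt-in flag is checked before the load test, reflecting the slide's point that owners control when their resources participate.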
Computational Grids
Solve Grand Challenges
Grand Challenge problems are a general category of unsolved problems. The definition of a Grand Challenge problem has a certain degree of inherent subjectivity surrounding what is, or is not, a Grand Challenge. A Grand Challenge problem exhibits at least the following characteristics:
– The problem is demonstrably hard to solve, requiring several orders-of-magnitude improvement in the capability required to solve it.
– The problem cannot be unsolvable. If it probably cannot be solved, then it can't be a Grand Challenge. Ideally, quantifiable measures that indicate progress toward a solution are also definable.
– The solution to a Grand Challenge problem must have a significant economic and/or social impact.
Grand Challenge Examples
– Applied Fluid Dynamics
– Meso- to Macro-Scale Environmental Modeling
– Ecosystem Simulations
– Biomedical Imaging and Biomechanics
– Molecular Biology
– Molecular Design and Process Optimization
– Cognition
– Fundamental Computational Sciences
– Grand-Challenge-Scale Applications
– Nuclear power and weapons simulations
Open Grid Computing Environment (OGCE)
EU Grid Imperatives
Enabling Grids for E-sciencE (EGEE)
LCG/EGEE Components
The principal components of the middleware package are:
– The Globus Toolkit (GT2), developed by the Globus Project.
– The Condor system, developed at the University of Wisconsin, Madison.
– The Globus and Condor components and some other tools from US projects are integrated and packaged as the Virtual Data Toolkit by the VDT project at the University of Wisconsin, Madison. VDT provides support for this package to LCG/EGEE.
– Tools developed by the DataGrid Project (EDG). The EU-funded DataGrid project ended in 2004, but the institutes that had developed the tools needed for the LCG/EGEE grid continue to support them until they are replaced by improved software.
– New middleware components developed as part of the gLite toolkit by the EGEE project. The first release of gLite will provide improved tools for workload scheduling, grid catalog, and a monitoring infrastructure. Future releases will add functionality and provide re-engineered components with the aim of satisfying the requirements of the main EGEE application domains: high energy physics, biology and medicine. This middleware activity of EGEE works very closely with the LCG project, and has a formal place in the management of LCG.
“LCG Middleware”;
LHC Computer Grid Monitoring
Scavenging Grids
Solve Not So Grand Challenges
Stanford:
IBM: World Community Grid
Berkeley Open Infrastructure for Network Computing (BOINC)
BOINC-Powered Projects
Climateprediction.net
Data Grids
Distributed Filesystems and Grids
– First came distributed filesystems (NFS)
– Then came commercial clustered filesystems (Veritas)
– Linux pioneered installable, distributed filesystems (Coda)
– Red Hat GFS
Database Grids
European Data Grid (EDG)