Object Web Architectures: Portals, P2P, XML. ERDC Gateway Tutorial. Geoffrey Fox, IPCRES Laboratory for Grid Technology, Computer Science, Informatics, Physics, Indiana University, Bloomington IN, fox@csit.fsu.edu 11/8/2018 erdcgridportalaug01
Computational Grids Survey: A brief introduction to computational grid projects and goals.
What Is a Computational Grid? Grids link distributed scientific resources. Resources can be geographically and politically distributed. Goal: provide a means for sharing resources between organizations. Example "high-end" resources: supercomputers and clusters; mass storage; advanced visualization (CAVEs) and collaboration (Access Grid); particle colliders, telescopes, earthquake detectors. See www.globus.org/research/papers/anatomy.pdf
What Does a Grid Need? Multi-institutional security (PKI or Kerberos). Information services: manage, store, and deliver information about resources, and use that information to make decisions. Scheduling and queuing: advance reservation, meta-queuing. Remote execution, file transfer, monitoring.
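To make "use information to make decisions" and meta-queuing concrete, here is a minimal Python sketch, assuming an information service that simply reports each resource's CPU count and current estimated queue wait. The `Resource` fields, the resource names, and the shortest-wait policy are illustrative assumptions, not the behavior of any particular grid scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    cpus: int
    queue_wait_hours: float  # estimated wait, as an information service might report it


def choose_resource(resources: List[Resource], cpus_needed: int) -> Optional[Resource]:
    """Meta-queuing sketch: among resources large enough for the job,
    pick the one with the shortest estimated queue wait."""
    candidates = [r for r in resources if r.cpus >= cpus_needed]
    if not candidates:
        return None  # no single resource can run this job
    return min(candidates, key=lambda r: r.queue_wait_hours)


# Hypothetical snapshot returned by an information service
info_service = [
    Resource("cluster-a", cpus=64, queue_wait_hours=3.0),
    Resource("mpp-b", cpus=512, queue_wait_hours=5.5),
    Resource("cluster-c", cpus=128, queue_wait_hours=1.0),
]

print(choose_resource(info_service, cpus_needed=100).name)  # cluster-c
```

A real meta-scheduler would also weigh advance reservations, data location, and policy; the single "shortest wait" rule is only the simplest possible decision.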
Example of a Grid Problem: CERN’s Large Hadron Collider. Goes on-line in 2005. Will generate petabytes of raw, distributed data and terabytes of event summary data. Computing resources for data analysis will be distributed between CERN and regional centers spread all over the world. 1500-2000 people will collaborate on experiments.
Grid Projects. Grid infrastructure: Condor (www.cs.wisc.edu), Globus (www.globus.org), Legion (www.cs.virginia.edu/~legion). Grid applications: Netsolve (www.cs.utk.edu/netsolve), Ninf (www.etl.go.jp). Global Grid Forum: www.gridforum.org. These are highlights of some grid projects. The first three are mostly concerned with some aspect of infrastructure (scheduling, security, information, resource management). The second two are examples of grid applications; both attempt to produce grid-enabled function calls for scientists. Many more projects exist. The GGF acts as a clearinghouse, community, and support group for projects and application users.
Examples of Deployed Grids. NASA’s Information Power Grid: links NASA’s Ames, Glenn, and Langley Centers; LaunchPad currently available (www.ipg.nasa.gov). DOE’s ASCI Distributed Resource Management: links classified computing resources at Lawrence Livermore, Los Alamos, and Sandia National Labs; full deployment scheduled by Nov 2001.
Latest Grid News. NSF will spend $53 million on the Distributed Terascale Facility (DTF): 13.6 teraflops, 600 terabytes, 40 Gigabit/sec. DTF sites: NCSA, SDSC, Argonne, Caltech. Industry partners: IBM, Intel, Qwest. See www.ncsa.uiuc.edu/News/Access/Releases for more information (August 9).
Distributed Objects. Examples of current object technologies: documents (URL); "general programs including database invocations" (old-style Web: CGI; new-style Web: XML); CORBA and COM (a special "interface definition language" (IDL) defines invocation in C++-like syntax); RMI (uses the Java language as its IDL). Benefits of distributed objects: objects written in different languages can communicate seamlessly via standardized messaging protocols embodied by middleware; higher levels of interoperability transparency; objects can be “self-managing” of resources; a flexible grain of decomposition for building complex systems.
Distributed Object Web Technology Model. Basic vision: merge the Web and distributed objects, e.g. abstract entities (Web pages, database entries, simulations) and services as objects with methods (interfaces). CORBA ... XML: XML is “just” CGI done right. COM (Microsoft) and CORBA (world) are competing cross-platform, cross-language object technologies. JavaBeans plus RMI, and perhaps Jini, is a 100% pure Java distributed object technology. W3C says you should use XML, which defines a better IDL, with XML Schema as an object specification model and SOAP as an object access model.
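As a hedged illustration of XML as an object access model, the sketch below encodes a remote method call as a SOAP 1.1-style envelope with Python's standard `xml.etree.ElementTree`, then decodes it on the "server" side. Only the SOAP 1.1 envelope namespace is real; the `urn:example:simulation` namespace and the `runSimulation` method are hypothetical, and the sketch omits the encoding styles, typing, headers, and faults a real SOAP toolkit handles.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:simulation"  # hypothetical application namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("app", APP_NS)


def _local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1]


def make_call(method, **params):
    """Client side: encode a method call on a remote object as an XML envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_call(xml_text):
    """Server side: recover the method name and its (string) parameters."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]  # first child of Body is the call
    return _local_name(call.tag), {_local_name(c.tag): c.text for c in call}


message = make_call("runSimulation", gridSize=128, steps=1000)
print(parse_call(message))  # ('runSimulation', {'gridSize': '128', 'steps': '1000'})
```

The point is that the "IDL" here is just agreed-upon XML: any language with an XML parser can produce or consume the same message.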
3-Tier Architecture and Different Object Models. There are several important object models: COM, CORBA, Java, Web, Oracle Database ... But it doesn’t matter! [Diagram: user requests, or export/import of information, pass through a middle tier ("business logic") to back ends such as an object repository or an XML file system (Web site).] The middle tier dissociates the user from the back-end database.
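A minimal sketch of why the object model "doesn't matter" to the user: the client talks only to the middle tier, which can sit in front of interchangeable back ends. The two back ends here (a Python dictionary standing in for a database, an XML document standing in for a Web site) and the `MiddleTier.describe` method are hypothetical.

```python
import xml.etree.ElementTree as ET


class DictBackend:
    """Stands in for a database back end."""

    def __init__(self, rows):
        self.rows = rows

    def get(self, key):
        return self.rows[key]


class XmlBackend:
    """Stands in for an XML file system / Web site back end."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def get(self, key):
        # ElementTree supports simple [@attr='value'] predicates
        return self.root.find(f"item[@key='{key}']").text


class MiddleTier:
    """'Business logic' tier: clients call this and never see which back end is used."""

    def __init__(self, backend):
        self.backend = backend  # any object with a get(key) method

    def describe(self, key):
        return f"{key}: {self.backend.get(key)}"


db = MiddleTier(DictBackend({"job42": "running"}))
web = MiddleTier(XmlBackend('<items><item key="job42">running</item></items>'))
print(db.describe("job42"))   # job42: running
print(web.describe("job42"))  # job42: running
```

Swapping the back end changes nothing the client sees, which is the sense in which the middle tier dissociates user and back end.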
Emerging Object Web Multi-Server Model: clients and their servers; middle-tier custom servers; back-end servers and their services.
Computational Science Grid: Multi-Server Web Computing System. Portals are user interfaces to a Grid. The World Wide Web is a big Grid. P2P networks include Grids. [Diagram: portals connected through the Grid to multidisciplinary control (WebFlow), portal control, a parallel DB proxy and database, NEOS control and optimization service, an Origin 2000 proxy (MPP), a NetSolve linear algebra server (matrix solver), agent-based choice of compute engine, an IBM SP2 proxy (MPP), and a data analysis server.]
Global Grid Forum
Computational Grids. Exploit the analogy with electricity: make using a computer as natural as plugging an appliance (PDA, PC) into a wall socket. Make the ensemble of computers, storage devices, and scientific instruments on the Web “seamlessly accessible”. Link components of the grid together to solve a single problem (clusters, metacomputers). There are computational grids, education grids, information grids, shopping grids, etc. The Web is an (information) grid. Everything is an object. Generic access implies standards for APIs, protocols, and services. USC (ISI, Carl Kesselman) and Argonne (Ian Foster) pioneered grids.
Issues for Grids and hence Portals. Are the grid components pretty much fixed, such as giant ASCI supercomputers? Or are they fleeting and mobile, such as Internet-connected cell phones? The set of IP-enabled home sensors, appliances, and controllers is a grid. What are the requirements? Anonymity, performance, security, ease of use ... Different components and requirements imply that there is not likely to be just one grid but a federation of interoperable grids. What are the “standards” and who sets them? How do universities build the grids they care about on graduate-student time while industry builds and abandons remarkable technologies on Internet time?
Foster’s Grid Architecture. What is the difference between a protocol (SOAP, HTTP) and an application interface (HTML, MIME)?
ASCI Grid. Link the multi-teraflop computers of ASCI together: today 12, 3 and 2 teraflops; by 2005, 100, 60 and 20 teraflops.
IPG Architecture
Information Power Grid: led by NASA Ames
Experimental Particle Physics Grid
Earthquake Engineering Grid: links experimental facilities, compute resources, people
Commodity Portals are Web Interfaces for Consumers. Yahoo, NetCenter, Amazon.com, eBay.com, etc. are portals for e-commerce, news, etc. We want to use these ideas in building computer interfaces.
Hierarchy of Portals and Their Technology. [Diagram labels: portal building tools and frameworks (XML, iPlanet, Portlets, www.desktop.com); generic portals (collaboration, universal access, security ...); generic services (user customization, component libraries, fixed channels); enterprise portals with information services (databases ...); grid services (visualization ..., compute services); education services (quizzes, grading ...); education and training portals (K-12, university); science portals (MathML etc.; Biology, Chem, Egy ...).]
Services in Any Grid Application: security; fault tolerance; object lookup and registration; object persistence and database support; event and transaction services; information services; collaboration among users, e.g. teachers and students (Centra), market lead and salespeople (WebEx).
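Object lookup and registration can be sketched as a registry that maps an interface name to the endpoints currently offering it. This minimal version assumes lease-based expiry of the kind Jini uses (a registration lapses unless renewed); the interface name, endpoints, and injectable clock are illustrative assumptions.

```python
import time


class Registry:
    """Minimal lookup service: providers register an endpoint under an
    interface name for a limited lease; expired entries are dropped on lookup."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # interface name -> list of (expiry_time, endpoint)

    def register(self, interface, endpoint, lease_seconds):
        expiry = self._clock() + lease_seconds
        self._entries.setdefault(interface, []).append((expiry, endpoint))

    def lookup(self, interface):
        now = self._clock()
        live = [(exp, ep) for exp, ep in self._entries.get(interface, []) if exp > now]
        if interface in self._entries:
            self._entries[interface] = live  # forget lapsed registrations
        return [ep for _, ep in live]


# A fake clock makes lease expiry easy to demonstrate
now = [0.0]
registry = Registry(clock=lambda: now[0])
registry.register("MatrixSolver", "host-a:9000", lease_seconds=60)
registry.register("MatrixSolver", "host-b:9000", lease_seconds=10)
print(registry.lookup("MatrixSolver"))  # ['host-a:9000', 'host-b:9000']
now[0] = 30.0
print(registry.lookup("MatrixSolver"))  # ['host-a:9000']
```

Leases suit a grid whose components may be "fleeting and mobile": a provider that vanishes without deregistering simply ages out.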
Further Services in Computational Grids: job status; file services (as in the NPACI Storage Resource Broker); support for (XML-based) computational-science-specific metadata such as MathML and XSIL; visualization; programming, debugging, performance monitoring; application integration (chaining services viewed as back-end compute filters), which can be called workflow; “seamless access” and integration of resources between different users and application domains; job scheduling (Condor) and special operating modes such as a multitude of parameter search jobs; a parameter specification service (get data from a Web form into a Fortran program wrapped as a back-end object); high performance for general services.
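Application integration as workflow amounts to composing back-end compute filters so that each stage's output becomes the next stage's input. In this Python sketch the three stages are hypothetical local functions standing in for remote file, compute, and analysis services; a real workflow system would add remote invocation, data staging, and error handling.

```python
from functools import reduce


def chain(*stages):
    """Compose back-end compute filters: each stage's output feeds the next stage."""
    def workflow(data):
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return workflow


# Hypothetical stages standing in for remote services
def read_input(path):
    return [1.0, 4.0, 9.0]  # pretend file service; ignores path in this sketch


def solve(values):
    return [v ** 0.5 for v in values]  # pretend compute service


def summarize(values):
    return {"n": len(values), "sum": sum(values)}  # pretend analysis service


pipeline = chain(read_input, solve, summarize)
print(pipeline("input.dat"))  # {'n': 3, 'sum': 6.0}
```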
Web Computing and P2P. Pleasingly (embarrassingly) parallel applications involve managing multiple jobs that run on separate, largely independent parts: some Monte Carlo calculations and parameter searches, and also fancy number-theoretic applications such as cracking RSA security. Here we see “use of idle cycles” and similar job scheduling issues. Many have noticed the value of the Web for this; it is sometimes called P2P or peer-to-peer computing because it involves peers on the edge of the Internet, not monster servers in the middle. Note that the total power of the Web is around one thousand times that of the most powerful supercomputer, but how much can be harnessed?
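Monte Carlo is the textbook pleasingly parallel case. This sketch estimates π from independent work units, each with its own seeded random generator, so units could run on any idle machine in any order and only the final sum needs all of them. The chunk count and sample sizes are arbitrary choices for illustration.

```python
import random


def chunk(seed, samples):
    """One independent work unit: count random points inside the unit quarter-circle."""
    rng = random.Random(seed)  # private generator, so chunks never interact
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_chunks, samples_per_chunk):
    # Each chunk could run on a different idle machine; only this sum needs them all.
    total_hits = sum(chunk(seed, samples_per_chunk) for seed in range(n_chunks))
    return 4.0 * total_hits / (n_chunks * samples_per_chunk)


print(estimate_pi(n_chunks=20, samples_per_chunk=10_000))  # close to 3.14
```

Seeding each unit by its index also makes any unit reproducible, which helps if a lost result must be recomputed elsewhere.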
P2P for Distributed Computing or Web Computing I. P2P applications are highlighted by the use of millions of Internet clients to analyze data looking for extraterrestrial life (SETI@home http://setiathome.ssl.berkeley.edu/) and by the newer project examining the folding of proteins (Folding@home http://www.stanford.edu/group/pandegroup/Cosm/). These build distributed computing solutions for a special class of pleasingly or embarrassingly parallel applications: those that can be divided into a huge number of essentially independent computations, with a central server system doling out separate work chunks to each participating client. This approach is called P2P because the computing is peer-based, even though it lacks the "peer-only communication" characteristic of P2P information systems like Gnutella and Napster. SETI@home and Folding@home are elegantly implemented as screen savers that you download.
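The central-server pattern described above (the server doles out independent work chunks and clients never talk to each other) can be sketched with Python threads standing in for volunteer machines. `WorkServer`, the "peak value" analysis, and the data are hypothetical; real systems such as SETI@home also deal with unreliable clients, result validation, and network transport, which this sketch omits.

```python
import queue
import threading


class WorkServer:
    """Central server that doles out independent work units and collects results."""

    def __init__(self, units):
        self._pending = queue.Queue()
        for unit_id, payload in enumerate(units):
            self._pending.put((unit_id, payload))
        self._results = {}
        self._lock = threading.Lock()

    def get_work(self):
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None  # nothing left to hand out

    def submit(self, unit_id, result):
        with self._lock:
            self._results[unit_id] = result

    def results(self):
        with self._lock:
            return dict(self._results)


def client(server, analyze):
    """A volunteer client: fetch a unit, analyze it locally, send the result back."""
    while (work := server.get_work()) is not None:
        unit_id, payload = work
        server.submit(unit_id, analyze(payload))


# Hypothetical analysis: find the peak value in each chunk of "signal" data
chunks = [[3, 9, 2], [7, 1, 8], [4, 6, 5]]
server = WorkServer(chunks)
workers = [threading.Thread(target=client, args=(server, max)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(server.results().items()))  # [(0, 9), (1, 8), (2, 6)]
```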
Parabon: pure Java model ensures security
Entropia Financial Modeling I
Entropia Financial Modeling II. Each basic financial instrument can be calculated independently. A central server interprets the total simulation. Make money, or learn what causes market swings, or ...
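A minimal sketch of the same pattern for finance, assuming each instrument is valued independently and the central server only aggregates. The per-instrument model here is plain present-value discounting of cash flows, chosen purely for illustration (it is not Entropia's model), and the portfolio is hypothetical. A thread pool stands in for the distributed clients.

```python
from concurrent.futures import ThreadPoolExecutor


def present_value(instrument):
    """Value one instrument independently: discount each (year, cash_flow) at its rate."""
    rate = instrument["rate"]
    return sum(cf / (1 + rate) ** year for year, cf in instrument["cash_flows"])


# Hypothetical portfolio; each entry is an independent work unit
portfolio = [
    {"name": "bond-1", "rate": 0.05, "cash_flows": [(1, 50.0), (2, 1050.0)]},
    {"name": "bond-2", "rate": 0.00, "cash_flows": [(1, 100.0), (2, 100.0)]},
]

# Workers value instruments in parallel; the "central server" only aggregates
with ThreadPoolExecutor(max_workers=4) as pool:
    values = list(pool.map(present_value, portfolio))

total = sum(values)
print(round(total, 2))  # 1200.0
```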
Drug Structure Simulations
United Devices also does Drug Simulation. Parameter study: do billions of simulations, each with different parameters. Search-engine-like interface to the simulations. This works because each calculation fits in a PC; a detailed molecular model usually would not.
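A parameter study with a "search engine" front end can be sketched as: enumerate every parameter combination, run each independent simulation, then let users filter and rank the stored results. The `simulate` scoring function and the parameter values are hypothetical placeholders for a real drug-binding calculation.

```python
import itertools


def simulate(temperature, concentration):
    """Hypothetical stand-in for one small, independent simulation returning a score."""
    return -(temperature - 310) ** 2 / 100 - (concentration - 0.5) ** 2 * 10


temperatures = [290, 300, 310, 320]
concentrations = [0.1, 0.3, 0.5, 0.7]

# Every parameter combination is an independent job; here they simply run in a loop.
results = [
    {"temperature": t, "concentration": c, "score": simulate(t, c)}
    for t, c in itertools.product(temperatures, concentrations)
]


def search(results, top=3, **filters):
    """'Search engine' view: filter on parameter values, then rank by score."""
    hits = [r for r in results if all(r[k] == v for k, v in filters.items())]
    return sorted(hits, key=lambda r: r["score"], reverse=True)[:top]


best = search(results, top=1)[0]
print(best["temperature"], best["concentration"])  # 310 0.5
```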
Performance of Entropia Network
P2P for Distributed Computing or Web Computing II. Other projects of this type include United Devices (http://www.ud.com/home.htm, based on SETI@home), AppliedMeta (http://www.appliedmeta.com, based on the well-known Legion project from the University of Virginia), Parabon Computation (http://www.parabon.com), Condor (from Wisconsin, http://www.cs.wisc.edu/condor/) and Entropia (http://www.entropia.com/). Other applications for this type of system include financial modeling, bioinformatics, measurement of Web server performance, and scheduling different jobs to use idle time on a network of workstations. Ian Foster has given a more detailed review of these activities at http://www.nature.com/nature/webmatters/grid/grid.html and related them to computational grids (http://www.gridforum.org).
Learning Management Grid from DoD ADL (ADL = Advanced Distributed Learning, www.adlnet.org). [Diagram labels: server side: “Learning Management System” (LMS) with learning server, content server(s), and external systems (HR, e-commerce, ERP ...) reached through a migration adapter; client side: browser, application, HTML+ services or adapter, API; course interchange: Structure Format (CSF), metadata; runtime environment: launch, API, data model; common grid services and objects span client and server.]
Properties of Educational Objects. Metadata from IEEE and IMS: roughly, properties of educational objects thought of as “documents” (author, title ...). Course packaging from ADL and IMS: how to form bigger (educational) objects from smaller objects. Enterprise properties from IMS: links to people (users) and organization databases (rather incomplete at present, but important, and probably an area where agreement is achievable). Tests and quizzes from IMS. Specialized descriptors from ADL, such as objectives, prerequisites, and completion requirements. These apply to all grids.
Education-Specific Portal Services: administrative structure (degrees, departments, lecturers, deans ...); performance (grading) information; homework submission; quizzes of various types (multiple choice, random parameters); assessment data and analysis; hierarchical curriculum structure from document fragment to page to lecture to course; a Napster/Gnutella-type P2P distributed information system with personalized dynamic collections (an analogy between a CD-ROM of pirated music and dynamic lectures or personal information resources, as in RealJukebox).
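The hierarchical curriculum structure (fragment to page to lecture to course) is naturally a tree of composite objects. This sketch, with a hypothetical `Node` class and course content, shows a course assembled from smaller objects and then walked to list its document fragments in order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One level of the curriculum hierarchy: course, lecture, page, or fragment."""
    kind: str
    title: str
    children: List["Node"] = field(default_factory=list)

    def fragments(self) -> Iterator[str]:
        """Walk the hierarchy depth-first and yield leaf fragments in teaching order."""
        if self.kind == "fragment":
            yield self.title
        for child in self.children:
            yield from child.fragments()

    def count(self, kind: str) -> int:
        """Count nodes of a given kind at or below this node."""
        return (self.kind == kind) + sum(child.count(kind) for child in self.children)


# Hypothetical course assembled from smaller objects
course = Node("course", "Grid Computing", [
    Node("lecture", "Introduction", [
        Node("page", "What is a Grid?", [
            Node("fragment", "Definition"),
            Node("fragment", "Examples"),
        ]),
    ]),
    Node("lecture", "Portals", [
        Node("page", "Portal services", [Node("fragment", "Education services")]),
    ]),
])

print(list(course.fragments()))  # ['Definition', 'Examples', 'Education services']
print(course.count("page"))      # 2
```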
Some Science Portals and Services: Gannon
Columns: project; supported grid services (X marks under JS, JM, IS, FM, AA, CT, SC, EJ); grid standards used or will use; needed services.
Gateway: X X X X X X X; standards: Kerb/GSI, MDS, Gram, CORBA, EJB; needed: events, collaboration, app resource management.
Mississippi: X X X X X X X X; standards: Kerb/GSI, Gram, GIS, CORBA; needed: event and data access services.
Unicore: standards: GSI, GIS; needed: scheduling, abstract job metadata.
Hot Page: X X X X X; standards: GSI, GIS, Gram, SRB; needed: grid accounting, portal-to-portal protocols, cert/key repository.
Indiana: X X X X X X; standards: GSI, MDS, Gram, GSIFTP, CoG; needed: events, app schema standards, RMI.
Nimrod: standards: Globus, Legion, Condor; needed: resource auctions and allocations.
Cactus: standards: Globus via GPDK, HDF5, MPI; needed: resource brokers.
Key: JS Job Submission; JM Job Management (e.g. file staging); IS Information Services; FM File Management; AA Authorization and Accounting; CT Composition; SC Scripting; EJ Job Journaling.
Some Science Portals and Services: Gannon (continued)
GPDK-LBL: X X X X X; standards: Globus via CoG.
CoG-ANL: standards: Globus, CORBA; needed: software installation.
JiPANG/Ninf: X X X X X X; standards: CoG, Jini, Ninf, Netsolve; needed: events.
ECCE+ELN: standards: GSI, GIS; needed: meta-scheduling.
IPG LaunchPad: X X X X X X; standards: GSI, GIS via GPDK.
Lattice: X X X X X X X; standards: OpenSSL, X.509.
Discover: standards: CORBA.