Federated Hierarchical Filter Grids

STTR-funded project with Indiana, Caltech and Deep Web Technologies
A Grid infrastructure for Data Analysis
Integrates with the LHC Tiered Computing Model
Directly supports general Scientific Analysis
In the HEP case, the Gridlet is instantiated as a Rootlet

The FHFG Architecture

Composed of Information Service Gridlets managed by general Grid system services, with a portlet-based portal user interface
[Figure: FHFG architecture diagram. Sensor Services (SS), Filter Services (FS) and Other Services (OS), together with a Database, MetaData (MD) and a Portal, exchange SOAP Messages and link to other Grids and other Services. The data progresses through the stages Raw Data, Data, Information, Knowledge, Wisdom, Decisions.]
Filter Grids

Three Features:
Information services present data through traditional interfaces
Filters sit between these interfaces: they accept data, transform it and re-present it
Streaming connections between all services:
–High performance
–Archiving
–Security
–Fault tolerance
–Narada Brokering

Filter Grids are built from Information resources wrapped as Web Services and Basic Filters that either transform or aggregate Information. Information Services and Filters support identical Service Interfaces.
[Figure: service interface diagrams.]
IS = Information Service: wraps an Information Resource
–Request/Select
–Status
–MultiResolution Get
BFS = Basic Filter Service: wraps a Filter Resource
–Request/Select
–Status
–MultiResolution Get
–MultiResolution Put
–Issue Queries
Filters either transform or aggregate Information
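The idea that Information Services and Filters share one interface can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the class and method names (`InformationService`, `ListResource`, `BasicFilter`, `select`, `status`, `get`, `put`) are hypothetical and not taken from the FHFG code, and "MultiResolution Get" is read simply as a downsampling stride, which is only one possible interpretation.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence


class InformationService(ABC):
    """Hypothetical common interface shared by resources and filters."""

    @abstractmethod
    def select(self, predicate: Callable[[float], bool]) -> None:
        """Request/Select: restrict which records are returned."""

    @abstractmethod
    def status(self) -> str:
        """Status: report whether the service can answer requests."""

    @abstractmethod
    def get(self, resolution: int = 1) -> List[float]:
        """MultiResolution Get, read here as 'every resolution-th record'."""


class ListResource(InformationService):
    """An information resource wrapped as a service (in-memory stand-in)."""

    def __init__(self, records: Sequence[float]) -> None:
        self._records = list(records)
        self._predicate: Callable[[float], bool] = lambda r: True

    def select(self, predicate):
        self._predicate = predicate

    def status(self):
        return "ready"

    def get(self, resolution=1):
        selected = [r for r in self._records if self._predicate(r)]
        return selected[::resolution]


class BasicFilter(InformationService):
    """A filter: aggregates its upstream services, then transforms each record."""

    def __init__(self, sources, transform):
        self._sources = list(sources)
        self._transform = transform
        self._pushed: List[float] = []
        self._predicate: Callable[[float], bool] = lambda r: True

    def put(self, records):
        # Put: accept records pushed to the filter directly.
        self._pushed.extend(records)

    def select(self, predicate):
        self._predicate = predicate

    def status(self):
        ok = all(s.status() == "ready" for s in self._sources)
        return "ready" if ok else "waiting"

    def get(self, resolution=1):
        # Issue Queries: pull from every upstream service and aggregate.
        gathered = list(self._pushed)
        for source in self._sources:
            gathered.extend(source.get())
        transformed = [self._transform(r) for r in gathered]
        selected = [r for r in transformed if self._predicate(r)]
        return selected[::resolution]


a = ListResource([1.0, 2.0, 3.0, 4.0])
b = ListResource([10.0, 20.0])
doubled = BasicFilter([a, b], transform=lambda r: 2 * r)
chained = BasicFilter([doubled], transform=lambda r: r + 1)
full = chained.get()
coarse = chained.get(resolution=2)
```

Because `BasicFilter` implements the same interface as `ListResource`, a filter can take another filter as its source, which is what allows the services to be composed into a hierarchy.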
HEP Event Analysis using Filter Grids

Analysis tool of choice is Root. Typical analysis activity is
–Loading many files containing event data
–Passing each event through a selection filter
–Subjecting each selected event to a set of algorithms
–Creating summary information in the form of histograms/tables/files
Analysis starts with small event samples and is then applied to much larger samples
Frequently these larger samples are remotely located in the Grid

Our HEP implementation is a Filter Grid consisting of Clarens-hosted “Rootlets”. Each Rootlet is a full instance of the Root application, but limited in scope:
–The user’s Root loads a Clarens plug-in
–The Clarens interface to the Dataset Location Service allows a list of remote datasets to be generated
–The client contacts each remote Grid node, connects to the Clarens server there, and instantiates a Rootlet
–The user’s analysis selection code is passed over the network to the Rootlet
–The list of event data files is passed to the Rootlet
–The Rootlet executes and terminates
–The output histograms/tables/files are then made available via the Clarens server, and fetched, aggregated and processed as required
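The select, analyse, summarise and aggregate pattern described above can be sketched in plain Python. This is not Root or Clarens code: events are modelled as dictionaries, "files" are in-memory lists, and the names `run_rootlet`, `fill_histogram` and `merge_histograms`, the cut on `pt` and the sample values are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, float]


def fill_histogram(values: Iterable[float], n_bins: int,
                   low: float, high: float) -> List[int]:
    """Count values into n_bins equal-width bins covering [low, high)."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        if low <= v < high:
            # min() guards against floating-point rounding at the upper edge.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


def run_rootlet(files: List[List[Event]],
                select: Callable[[Event], bool],
                quantity: Callable[[Event], float],
                n_bins: int, low: float, high: float) -> List[int]:
    """Stand-in for one remote Rootlet: loop over files, select, histogram."""
    values = [quantity(ev) for events in files for ev in events if select(ev)]
    return fill_histogram(values, n_bins, low, high)


def merge_histograms(histograms: List[List[int]]) -> List[int]:
    """Client-side aggregation: add the per-node histograms bin by bin."""
    return [sum(bins) for bins in zip(*histograms)]


# Two hypothetical Grid nodes, each holding its own list of event files.
node_a = [
    [{"mass": 120.0, "pt": 30.0}, {"mass": 95.0, "pt": 10.0}],
    [{"mass": 126.0, "pt": 45.0}],
]
node_b = [
    [{"mass": 124.0, "pt": 50.0}, {"mass": 131.0, "pt": 5.0}],
]


def select(ev: Event) -> bool:
    return ev["pt"] > 20.0  # the user's selection code


def quantity(ev: Event) -> float:
    return ev["mass"]  # the quantity to histogram


partial = [run_rootlet(files, select, quantity, 4, 110.0, 130.0)
           for files in (node_a, node_b)]
total = merge_histograms(partial)
```

The point of the design is that only the small summary objects (here, lists of bin counts) travel back to the client, while the event data stays on the node that holds it.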
Rootlets: Root embedded in a Clarens server

Physicist at Tier3 using Root on GBytes of ntuples
Loads Clarens Root plugin.
Connects to Clarens at Tier2.
Sends analysis code (.C/.h files).
Clarens creates Rootlet and passes it the .C/.h files.
Rootlet runs analysis code on TBytes of ntuples, creating high statistics output data.
Root at Tier3 receives and plots data.

[Figure: at Tier3, Root with the Clarens plugin works on GBytes of Root nTuples and sends Analysis.C and Analysis.h over XML/RPC to Tier2, where a Clarens server with embedded Root works on ~10 TBytes of Root nTuples.]
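To give a feel for what sending analysis code over XML-RPC involves, the sketch below uses Python's standard `xmlrpc.client` module to marshal a request and decode it again, without any network connection. The method name `rootlet.run_analysis`, the parameter layout and the ntuple file names are hypothetical; the actual Clarens method names and signatures are not given in the slides.

```python
import xmlrpc.client

# The user's analysis code, keyed by file name (contents are placeholders).
analysis_files = {
    "Analysis.C": "void Analysis() { /* user selection code */ }",
    "Analysis.h": "// user declarations",
}

# Hypothetical list of event data files held at the remote node.
ntuple_files = ["run1.root", "run2.root"]

# Marshal the call into an XML-RPC request body.
request_xml = xmlrpc.client.dumps(
    (analysis_files, ntuple_files),
    methodname="rootlet.run_analysis",  # hypothetical method name
)

# Decode it again, as a server would on receipt.
params, method = xmlrpc.client.loads(request_xml)
received_code, received_files = params
```

In a real deployment the request would be posted to the remote server rather than decoded locally; the round trip here only shows that source files and file lists map naturally onto XML-RPC structs and arrays.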
Higgs diphoton Analysis using Rootlets