Download presentation
Presentation is loading. Please wait.
Published byBrittney Ramsey Modified over 9 years ago
1
SAN DIEGO SUPERCOMPUTER CENTER An Intelligent Rule-Oriented Data Management System DataGrid Wayne Schroeder San Diego Supercomputer Center, University of California San Diego
2
SAN DIEGO SUPERCOMPUTER CENTER Talk Outline Background Brief Overview of the SDSC SRB Current Projects/Usage Activities/Plans Rule-Oriented Data Management System iRODS Requirements/Planning Architecture Infrastructure Development Collaborations/Plans
3
SAN DIEGO SUPERCOMPUTER CENTER Data Grid Using a Data Grid – in Abstract Ask for data User asks for data from the data grid Data delivered The data is found and returned Where & how details are hidden
4
SAN DIEGO SUPERCOMPUTER CENTER User asks for data Using a Data Grid - Details Storage Resource Broker Data request goes to SRB Server Storage Resource Broker Metadata Catalog DB Server looks up data in catalogCatalog tells which SRB server has data1 st server asks 2 nd for dataThe data is found and returned
5
SAN DIEGO SUPERCOMPUTER CENTER Using a Data Grid - Details SRB MCAT DB SRB Data Grid has arbitrary number of servers Complexity is hidden from users
6
SAN DIEGO SUPERCOMPUTER CENTER Storage Resource Broker A Data Grid Solution Collaborative client-server system that federates distributed heterogeneous resources using uniform interfaces and metadata Provides a simple tool to integrate data and metadata handling – attribute-based access Blends browsing and searching Developed at SDSC -Operational for 7+ years; -Under continual development since 1997; -Customer-driven
7
SAN DIEGO SUPERCOMPUTER CENTER Some SRB Features The SRB is an integrated solution which includes: a logical namespace, interfaces to a wide variety of storage systems, high performance data movement (including parallel I/O), fault-tolerance and fail-over, WAN-aware performance enhancements (bulk operations), storage-system-aware performance enhancements ('containers' to aggregate files), metadata ingestion and queries (a MetaData Catalog (MCAT)), user accounts, groups, access control, audit trails, GUI administration tool data management features, replication user tools (including a Windows GUI tool (inQ), a set of SRB Unix commands, and Web (mySRB)), and APIs (including C, C++, Java, and Python). SRB Scales Well (many millions of files, terabytes) Supports Multiple Administrative Domains / MCATs (srbZones) And includes SDSC Matrix: SRB-based data grid workflow management system to create, access and manage workflow process pipelines.
8
SAN DIEGO SUPERCOMPUTER CENTER Recent SRB Release, April 28 Any valid ASCII characters are now acceptable in SRB filenames, except a string of two quotes in a row Data integrity and vault management Quota System SRB Web Perl Portal SRB account management via grid-mapfile Real time data management New driver for NCAR MSS Completely reworked web site/documentation system (MediaWiki) Other new features Critical bug patches for in 3.4.0 included Other bugzilla fixes (about 35) MCAT Patch
9
SAN DIEGO SUPERCOMPUTER CENTER Recent SRB Releases 3.4.1 April 28, 2006 3.4 October 31, 2005 3.3.1 April 6, 2005 3.3 February 18, 2005 3.2.1 August 13, 2004 3.2 July 2, 2004 3.1 April 19, 2004 3.0.1 December 19, 2003 3.0 October 1, 2003 2.1.2 August 12, 2003 2.1.1 July 14, 2003 2.1 June 3, 2003 2.0.2 May 1, 2003 2.0.1 March 14, 2003 2.0 February 18, 2003
10
SAN DIEGO SUPERCOMPUTER CENTER SRB Projects Astronomy National Virtual Observatory Data Grids UK e-Science CCLRC Teragrid Digital Libraries and Archives National Archives and Records Administration National Science Digital Library Persistent Archive Testbed Ecological, Environmental, Oceanographic ROADnet Southern California Earthquake Center SIO Digital Libraries Molecular Sciences Synchrotron Data Repository Alliance for Cellular Signaling Neuro Sciences Biomedical Information Research Network Physics and Chemistry BaBar Many others Over 650 Tera Bytes in 106 million files
11
SAN DIEGO SUPERCOMPUTER CENTER SRB Scalability Over 2 Petabytes World-wide Major SRB instances in the UK, Australia, Taiwan, US United Kingdom - UK e-Science Australia - APAC Taiwan - Academia Sinica, NCHC Europe -IN2P3, Italy, Norway United States 660 Terabytes at SDSC 100 Million files SAM QFS, HPSS, Unix file system, SRB Bricks
12
SAN DIEGO SUPERCOMPUTER CENTER SDSC Hosted SRB Data
13
SAN DIEGO SUPERCOMPUTER CENTER Case Study: SRB in BIRN BIRN Toolkit Mediator Viewing/Visualization Queries/ResultsApplications Data Management File System MCAT HPSS Data Model Data Access Data Grid Computational Grid Collaboration NMI Grid Management Globus GridPort Scheduler Distributed Resources Database SRB Database
14
SAN DIEGO SUPERCOMPUTER CENTER SRB server SRB agent SRB server Federated SRB Operation MCAT Read Application in Boston SRB agent 1 2 3 4 6 5 5/6 Logical Name Or Attribute Condition 1.Logical-to-Physical mapping 2. Identification of Replicas 3.Access & Audit Control Peer-to-peer Brokering Server(s) Spawning Data Access Parallel Data Access R1 R2 San Diego Durham
15
SAN DIEGO SUPERCOMPUTER CENTER SDSC Storage Resource Broker & Meta-data Catalog SRB Archives HPSS, ADSM, UniTree, DMF Databases DB2, Oracle, Sybase File Systems Unix, NT, Mac OSX Application C, C++, Linux I/O Unix Shell Dublin Core Resource, User Defined Application Meta-data Remote Proxies DataCutter Third-party copy Java, NT Browsers Web Prolog Predicate MCAT HRM
16
SAN DIEGO SUPERCOMPUTER CENTER IRODS - the Next Generation of Data Grid Technology
17
17 Moving Forward, a Two-Prong Plan Maintain and Adapt SRB to New Usages: SRB has reached a Stable Plateau Bug Fixes Some New Features Merge Features Developed by others Continue Testing Improve Documentation Continue Application Support Existing and new Projects Continue Answering User Queries Chart New Areas Federation Research - ZoneSRB Collaborative Data Grids Real-time Data Grids - Virtual Object Ring Buffer Sensors and Video Streams Collaborating Observatories SRB Workflows - New UI for Admins and users Kepler actors, Matrix, etc iRODS - Adaptive Middleware Architecture MCAT1 MCAT2 MCAT3 Server1.1 Server1.2 Server2.1 Server2.2 Server3.1
18
SAN DIEGO SUPERCOMPUTER CENTER Continuing SRB Support 10 FTEs SRB 5 FTEs iRODS iRODS Developers Support SRB
19
19 Next generation Data Architecture SRB is quite complex – with too many functions and operations The intelligence is hard-coded extensions/modifications require extreme care But, the modules are fairly robust and reusable AIM: Can we make SRB more flexible Easy to customize at finer level Example: Higher authentication for a particular collection Example: Can we use stricter authorization for a collection Example: Can we treat a particular resource differently Currently- needs code changes Solution: Use rule-based architecture to provide flexibility
20
SAN DIEGO SUPERCOMPUTER CENTER iRODS A New Paradigm in Middleware Development Flexible Collection management Can be customized at user/collection-levels, … Language for Collection management As in stored procedures, triggers (RDB) Administrative ease Lot of potential beyond SRB adaptive middleware architectures This will be a fully Open Source effort
21
SAN DIEGO SUPERCOMPUTER CENTER Rule-Oriented Data Systems Framework Resources Client InterfaceAdmin Interface Metadata Modifier Module Config Modifier Module Rule Modifier Module Consistency Check Module Confs Rule Base Meta Data Base Engine Rule Curren t State Rule Invoker Micro Service Modules Resource-based Services Micro Service Modules Metadata-based Services Service Manager Consistency Check Module Consistency Check Module
22
SAN DIEGO SUPERCOMPUTER CENTER Rule Checking Establish State Data Movement CleanUp Condition checking, rule firing Backend Processing Micro Services Setup state and interact with RCAT – updates and modifications to persistent state Cleanup state and interact with RCAT – updates and modifications to persistent state Client Operation such as srbObjCreate Server-side Client-side Rule-oriented Data System (Phase I Operational Model)
23
23 Rules and Constraints Rule-based Lower-level Functions are composed of micro-services Higher-level Functions are composed of rules of lower-level micro- services Rules are interpreted using a rule engine Customizability Problems with rule composition Integrity checks to make sure rules do not break higher-level functionalities Declarative programming Rules define semantics Operational programming Rule invocation provides procedural interpretation Rules can be used as “checks and balances” to make sure that collections are self-consistent Example: Rule makes two copies of each files Constraint checking: can be used to see if the collection is consistent with this rule
24
24 Rule Scalability and Decidability Distinct Sets of Rules Applied in Different Ways Atomic Deferred (state flags) Compound Applied Using Micro-services Granularity User Input to Influence Rule Expression Administration Enforcement Collection Consistency Management Rule Properties Metadata Managing Execution (granularity, periodicity) Metadata Defining Result of Rule Execution
25
25 ingestInCollection(S) :- /* store & backup */ chkCond1(S), ingest(S), register(S) findBackUpRsrc(S.Coll, R), replicate(S,R). ingestInCollection(S) :- /*store & check */ chkCond2(S),computeClntChkSum(S,C1), ingest(S), register(S), computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S). ingestInCollection(S) :- /* store, chk, backup & chk */ chkCond3(S),computeClntChkSum(S,C1), ingest(S), register(S), computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S), findBackUpRsrc(S.Coll, R), replicate(S,R) computeSerChkSum(S,C3), checkAndRegisterChkSum(C2,C3,S). ingestInCollection(S) :- /*store,check, backup & extract metadata */ chkCond4(S),computeClntChkSum(S,C1), ingest(S), register(S), computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S), findBackUpRsrc(S.Coll, R), [replicate(S,R) || extractRegisterMetadata(S)]. ingestInCollection(S) :- /* just store */ ingest(S), register(S). Sample Rules chkCond1(S) :- user(S) == ‘adil@cclrc’. chkCond1(S) :- coll(S) like ‘*/scec.sdsc/img/*’. chkCond2(S) :- user(S) == ‘*@nara’. chkCond3(S) :- user(S) == ‘@salk’. chkCond4(S) :- user(S) == ‘@birn’, datatype(S) == ‘DICOM’. [OprList] implies delay for later or send to a CronJobManager Opr||Opr implies do them in parallel Opr, Opr implies do them serially
26
SAN DIEGO SUPERCOMPUTER CENTER New DataGrid Technology Next Generation SRB -- iRODS: Intelligent Rule-Oriented Data Systems Customizable and Flexible – User Configurable Administratively Simpler – Admin Configurable Build upon the experience of SRB Data Grid Transition from SRB to iRODS Client-level similarity Meta Catalog transition Current NSF Funding Information Technology Research 2 years ~ 2 FTEs Simple proto-type in a year Started September 2004 Rule-based architecture Follow-on funding NARA NSF
27
SAN DIEGO SUPERCOMPUTER CENTER iRODS Collaborations SRB/iRODS Developers Arcot Rajasekar Michael Wan Wayne Schroeder Other SRB Team Members Collaborative Development UK e-Science University of Queensland University of Maryland Others
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.