Published by Cameron Lane. Modified over 9 years ago.
1
Using SRB and iRODS with the Cheshire3 Information Framework
Building Data Grids with iRODS, 27-30 May 2008, National e-Science Centre, Edinburgh
Dr Robert Sanderson
Dept. of Computer Science, University of Liverpool
azaroth@liverpool.ac.uk
http://www.cheshire3.org/
Building Data Grids with iRODS, iRODS Workshop, May 27th 2008, Slide 1
2
Overview
- Cheshire3
  - Introduction
  - Architecture
- SRB Integration
  - Architecture
  - Grid
  - Usage
- iRODS Integration
  - Possible Architectures
3
Introduction
Cheshire3: Information Analysis Framework
- Digital Library/Information Retrieval engine with:
  - Data Mining/Machine Learning
  - Text Mining/Natural Language Processing
  - Computational Grid
  - Data Grid
- Standards based: Unicode, XML/XPath, MPI, Z39.50/SRU, ...
- Object-oriented architecture
  - Easy to develop and extend in Python, but heavy lifting is possible in imported C libraries
- Developed at the University of Liverpool, plus UC Berkeley
- Version: 0.9.10
  - Mostly stable; needs thorough testing and documentation
4
Context
5
Architecture
[Diagram: the Cheshire3 object architecture. Components shown: Server, ProtocolHandler, ConfigStore, UserStore, User, Object, Database, Query, ResultSet, DocumentFactory, Document, DocumentStore, PreParser, Parser, Record, RecordStore, Transformer, Index, IndexStore, Extractor, Tokenizer, TokenMerger, Normalizer. Flow labels: Ingest Process, Documents, Records, Terms.]
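The diagram names the objects involved in ingest. As a minimal sketch, assuming the ingest path reads Document -> PreParser(s) -> Parser -> Record -> RecordStore, the chain could look like the Python below. The class and method names (StripPreParser, XmlParser, MemoryRecordStore, process, parse, store) are hypothetical stand-ins, not the Cheshire3 API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical stand-ins for the ingest objects named in the diagram.
# These are NOT the Cheshire3 API; they only illustrate one plausible path:
# Document -> PreParser(s) -> Parser -> Record -> RecordStore.

@dataclass
class Document:
    text: str

@dataclass
class Record:
    id: int
    root: ET.Element

class StripPreParser:
    """PreParser role: turn a Document into a cleaned-up Document."""
    def process(self, doc: Document) -> Document:
        return Document(doc.text.strip())

class XmlParser:
    """Parser role: turn a Document into a Record (an XML element tree)."""
    def __init__(self):
        self._next_id = 0

    def parse(self, doc: Document) -> Record:
        record = Record(self._next_id, ET.fromstring(doc.text))
        self._next_id += 1
        return record

class MemoryRecordStore:
    """RecordStore role: persist Records (here, in a dict keyed by id)."""
    def __init__(self):
        self.records = {}

    def store(self, record: Record) -> int:
        self.records[record.id] = record
        return record.id

def ingest(documents, preparsers, parser, record_store):
    """Run every document through the preparsers, parse it, store the record."""
    ids = []
    for doc in documents:
        for preparser in preparsers:
            doc = preparser.process(doc)
        ids.append(record_store.store(parser.parse(doc)))
    return ids

store = MemoryRecordStore()
ids = ingest([Document("  <doc><title>Grid</title></doc>\n")],
             [StripPreParser()], XmlParser(), store)
print(ids, store.records[0].root.findtext("title"))  # [0] Grid
```

Keeping each stage as a separate object is what lets the same stages be rearranged or replaced without touching the others.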
6
Architecture 2
[Diagram: the internals of an Index, showing a Record, XPathObjects, Extractors, Tokenizers, TokenMergers and Normalizers (multiple instances of each, as parallel paths), and the IndexStore.]
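One possible reading of this diagram, sketched in Python under stated assumptions: XPath selection and extraction feed a tokenizer, then a token merger that builds term frequencies, then a normalizer, before terms reach an index store. The function names, the fixed stage order and the case-folding normalizer are illustrative assumptions, and ElementTree supports only a subset of XPath.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sketch of one index's processing chain (not the Cheshire3 API):
# select nodes (XPath subset) -> extract text -> tokenize -> merge tokens
# into term frequencies -> normalize terms -> add to an index store.

def extract(record: ET.Element, xpath: str) -> list[str]:
    # ElementTree supports only a limited XPath subset, e.g. ".//title".
    return ["".join(node.itertext()) for node in record.findall(xpath)]

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text)

def merge(token_lists) -> Counter:
    # TokenMerger role: combine tokens from all nodes into term -> frequency.
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts

def normalize(counts: Counter) -> Counter:
    # Normalizer role: case-fold; terms that collapse together have counts summed.
    out = Counter()
    for term, freq in counts.items():
        out[term.lower()] += freq
    return out

class IndexStore:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {record_id: freq}

    def add(self, record_id, terms: Counter):
        for term, freq in terms.items():
            self.postings[term][record_id] = freq

def index_record(store, record_id, record, xpath=".//title"):
    texts = extract(record, xpath)
    terms = normalize(merge(tokenize(t) for t in texts))
    store.add(record_id, terms)

store = IndexStore()
index_record(store, 1, ET.fromstring("<doc><title>Data Grid grid</title></doc>"))
print(dict(store.postings))  # {'data': {1: 1}, 'grid': {1: 2}}
```

Normalizing after merging means "Grid" and "grid" end up as one term with a combined frequency, which is why the normalizer re-sums counts.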
7
SRB Integration
[Diagram: a RecordStore / DocumentStore can keep record and document data in the filesystem, in Berkeley DB, in an SQL RDBMS (PostgreSQL), or in SRB.]
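The point of this slide is that storage is pluggable: SRB is one more backend behind the RecordStore/DocumentStore interface. A minimal Python sketch of that design, assuming a hypothetical two-method put/get backend contract (not Cheshire3's own interface); only a filesystem backend and an in-memory stand-in for a grid collection are shown.

```python
import os
import tempfile
from abc import ABC, abstractmethod

# Hypothetical sketch: a DocumentStore delegates raw byte storage to an
# interchangeable backend, so filesystem, Berkeley DB, PostgreSQL or SRB
# could be swapped without changing the store's callers. A real SRB/iRODS
# backend would wrap a grid client library; it is not shown here.

class Backend(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class FilesystemBackend(Backend):
    def __init__(self, root: str):
        self.root = root

    def put(self, key, data):
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()

class MemoryBackend(Backend):
    """Stand-in for a remote data grid collection (e.g. in SRB)."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

class DocumentStore:
    def __init__(self, backend: Backend):
        self.backend = backend

    def store(self, doc_id: str, text: str) -> None:
        self.backend.put(doc_id, text.encode("utf-8"))

    def fetch(self, doc_id: str) -> str:
        return self.backend.get(doc_id).decode("utf-8")

fetched = []
with tempfile.TemporaryDirectory() as tmp:
    for backend in (FilesystemBackend(tmp), MemoryBackend()):
        docs = DocumentStore(backend)
        docs.store("doc1", "<doc>SRB</doc>")
        fetched.append((type(backend).__name__, docs.fetch("doc1")))
print(fetched)
```

Callers of DocumentStore never see which backend is in use, which is what lets the same code run against local disk or a data grid.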
8
SRB Integration
[Diagram: an IndexStore keeps its terms in separate index databases stored in SRB, one per term range (a-b, c-d, e-f, g-h, ...); a query uses only the database containing the query term.]
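Splitting the index by term range means a query only has to retrieve one database from SRB rather than the whole index. A small Python sketch of the routing step, assuming two-letter ranges and a hypothetical naming scheme for the database files:

```python
import bisect

# Hypothetical sketch: the term space is split into alphabetical ranges
# (a-b, c-d, e-f, ...), each held in its own database, so a query only needs
# the database whose range contains the query term. Range boundaries and
# file names are illustrative assumptions.

RANGE_STARTS = ["a", "c", "e", "g", "i", "k", "m", "o", "q", "s", "u", "w", "y"]

def db_for_term(term: str) -> str:
    """Return the (hypothetical) database name covering `term`."""
    i = bisect.bisect_right(RANGE_STARTS, term.lower()) - 1
    i = max(i, 0)  # terms sorting before "a" (e.g. digits) go to the first db
    start = RANGE_STARTS[i]
    if i + 1 < len(RANGE_STARTS):
        end = chr(ord(RANGE_STARTS[i + 1]) - 1)
    else:
        end = "z"
    return f"index_{start}-{end}.db"

print(db_for_term("data"))   # index_c-d.db
print(db_for_term("grid"))   # index_g-h.db
print(db_for_term("zebra"))  # index_y-z.db
```

Binary search over the sorted range starts keeps lookup cheap even with many more, finer-grained ranges.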
9
Grid Implementation
- Focus on ingest, not discovery (yet)
- Instantiate the architecture on every node
- Assign one node as master, the rest as slaves
  - The master then divides the processing as appropriate
  - Calls between slaves are possible
- Calls are as small and simple as possible: (objectIdentifier, functionName, *arguments)
  - Typically: (workflow_id, 'process', document_id)
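Because every node instantiates the same architecture, a message only needs to name an object and a method. The Python sketch below illustrates that dispatch. The object identifier "ingest-workflow", the Workflow and Node classes, and the round-robin split are assumptions (the slide only says the master divides work "as appropriate"), and a real deployment would send the tuples between nodes, e.g. over MPI, rather than calling handle() directly.

```python
# Hypothetical sketch of the small messages passed between nodes:
# (objectIdentifier, functionName, *arguments). Every node builds the same
# objects from the same configuration, so the receiver looks the object up
# locally and calls the named method with the remaining arguments.

class Workflow:
    def __init__(self, name):
        self.name = name

    def process(self, document_id):
        return f"{self.name} processed {document_id}"

class Node:
    def __init__(self):
        # Same configuration on every node -> same identifiers everywhere.
        self.objects = {"ingest-workflow": Workflow("ingest-workflow")}

    def handle(self, message):
        object_id, function_name, *arguments = message
        return getattr(self.objects[object_id], function_name)(*arguments)

def divide(document_ids, n_slaves):
    """Master: split the work into one message list per slave (round-robin)."""
    queues = [[] for _ in range(n_slaves)]
    for i, doc_id in enumerate(document_ids):
        queues[i % n_slaves].append(("ingest-workflow", "process", doc_id))
    return queues

slaves = [Node(), Node()]
queues = divide(["doc1", "doc2", "doc3"], len(slaves))
results = [slave.handle(msg) for slave, queue in zip(slaves, queues) for msg in queue]
print(results)
```

Sending identifiers instead of objects keeps each message tiny; the heavy data (documents) is fetched by the slave itself.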
10
Grid Architecture
[Diagram, ingest phase: the Master Task sends (workflow, process, document) messages to Slave Tasks 1..N; each slave fetches its document from the Data Grid and writes the extracted data to GPFS temporary storage.]
11
Grid Architecture 2
[Diagram, load phase: the Master Task sends (index, load) messages to Slave Tasks 1..N; each slave fetches extracted data from GPFS temporary storage and stores the resulting index in the Data Grid.]
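The two grid slides describe a two-phase ingest. Below is a single-process Python sketch of that flow under loud assumptions: a dict stands in for the SRB data grid, a temporary directory for the GPFS scratch space, and threads for the slave tasks; the whitespace tokenizer and the split of the index load by initial letter are illustrative choices, not the original implementation.

```python
import json
import os
import tempfile
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical two-phase sketch. DATA_GRID stands in for SRB (documents in,
# index out); a temporary directory stands in for GPFS scratch space; thread
# workers play the slave tasks.

DATA_GRID = {"documents": {"d1": "grid data", "d2": "data mining"}, "index": {}}

def phase1_process(doc_id, scratch):
    """Slave: fetch a document, extract terms, write them to scratch."""
    text = DATA_GRID["documents"][doc_id]
    terms = text.lower().split()
    with open(os.path.join(scratch, f"{doc_id}.json"), "w") as f:
        json.dump(terms, f)

def phase2_load(scratch, letters):
    """Slave: build the index entries for terms starting with `letters`."""
    postings = defaultdict(list)
    for name in sorted(os.listdir(scratch)):
        doc_id = name[: -len(".json")]
        with open(os.path.join(scratch, name)) as f:
            for term in json.load(f):
                if term[0] in letters:
                    postings[term].append(doc_id)
    DATA_GRID["index"].update(postings)  # "store index" back in the grid

with tempfile.TemporaryDirectory() as scratch, ThreadPoolExecutor(2) as pool:
    # Phase 1: master sends (workflow, process, document) messages.
    list(pool.map(lambda d: phase1_process(d, scratch), DATA_GRID["documents"]))
    # Phase 2: master sends (index, load); each slave handles one term range.
    list(pool.map(lambda r: phase2_load(scratch, r), ["abcdefghijklm", "nopqrstuvwxyz"]))

print(DATA_GRID["index"])
```

Consuming each map() with list() acts as a barrier: no slave starts loading the index until every document has been processed into scratch storage.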
12
Usage
- NARA ERA Demonstrator
  - 20 GB of web-crawled data in SRB; indexes stored in SRB
  - Interface generated by an easily deployable Python layer
- Medline Dataset Experiments
  - 16.5 million abstracts plus associated metadata
  - Parsed data stored in SRB; indexes in the filesystem
- NSDL Grade Level Analysis
  - NSDL web crawl data (3 TB+)
  - Data already in SRB; analysis stored to SRB
13
iRODS Integration
- Simple integration (à la SRB) is possible:
  - Store data in iRODS for the storage classes
  - Requires a Python interface to iRODS
  - Doesn't really benefit from the rule capabilities
- Other (more interesting) options:
  - Cheshire3 as an external microservice platform
  - Cheshire3 as an internal microservice platform
  - Cheshire3 as a rules platform(?)
14
External Microservice Platform
[Diagram: a C3 Interface Microservice in iRODS sends data to a C3 Microservice running in Cheshire3 and receives processed data back.]
- Possible interfaces: MPI/PVM, RPC, SOAP, XML over HTTP, an arbitrary transport protocol, etc.
- Loose coupling via a client interface
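With loose coupling, only data and processed data cross the boundary. As a sketch of the XML over HTTP / RPC option, the Python below uses the standard library's xmlrpc.client.dumps and xmlrpc.client.loads to build and decode the messages that would travel over HTTP. The transport itself is omitted, and the method name c3.process and the c3_process function are hypothetical.

```python
import xmlrpc.client

# Hypothetical sketch of loose coupling: an iRODS-side microservice ships data
# to an external Cheshire3 service as XML-RPC messages. Only the payloads that
# would cross the wire are built and parsed; no HTTP server is started.

def c3_process(workflow_id: str, data: str) -> dict:
    """Stand-in for a Cheshire3 workflow run on the external side."""
    return {"workflow": workflow_id, "terms": sorted(set(data.lower().split()))}

SERVICES = {"c3.process": c3_process}

def microservice_request(workflow_id: str, data: str) -> str:
    """iRODS side: serialize the call as an XML-RPC request body."""
    return xmlrpc.client.dumps((workflow_id, data), methodname="c3.process")

def c3_handle(request_body: str) -> str:
    """Cheshire3 side: decode the call, run it, encode the response."""
    params, method = xmlrpc.client.loads(request_body)
    result = SERVICES[method](*params)
    return xmlrpc.client.dumps((result,), methodresponse=True)

def microservice_response(response_body: str) -> dict:
    """iRODS side: decode the processed data returned by Cheshire3."""
    (result,), _ = xmlrpc.client.loads(response_body)
    return result

body = microservice_request("index-workflow", "Data Grid data")
result = microservice_response(c3_handle(body))
print(result)
```

Because both sides agree only on the message format, either side could be replaced (e.g. SOAP instead of XML-RPC) without changing the other's internals; the cost is that all data must be serialized and shipped.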
15
Internal Microservice Platform
[Diagram: a C3 Microservice runs inside iRODS, operating on data via Cheshire3.]
- Requires iRODS to have a Python interpreter as an alternative microservice platform, rather than a Python client API
- Much tighter integration: Cheshire3 would have access to iRODS internal information, rather than just what was passed over the interface
- The microservice definition problem becomes a Cheshire3 workflow definition (an XML description)
- No bandwidth problems from transferring large amounts of data back and forth
- Tight coupling via Python integration
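If a microservice is defined as a Cheshire3 workflow, deploying a new one means shipping an XML description rather than compiled code. A minimal Python sketch of executing such a description, assuming an invented workflow/step schema and a registry of named steps; neither is the actual Cheshire3 workflow format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: a microservice defined as an XML workflow description.
# Element and attribute names are illustrative, not the Cheshire3 schema.
# Each <step> names a registered processing object; steps run in order,
# each one's output feeding the next.

WORKFLOW_XML = """
<workflow id="normalize-and-count">
  <step ref="strip"/>
  <step ref="lowercase"/>
  <step ref="tokenize"/>
  <step ref="count"/>
</workflow>
"""

REGISTRY = {
    "strip": str.strip,
    "lowercase": str.lower,
    "tokenize": str.split,
    "count": len,
}

def compile_workflow(xml_text: str):
    """Resolve each step name once, then return (workflow id, runner)."""
    root = ET.fromstring(xml_text.strip())
    steps = [REGISTRY[step.get("ref")] for step in root.findall("step")]

    def run(data):
        for step in steps:
            data = step(data)
        return data

    return root.get("id"), run

name, run = compile_workflow(WORKFLOW_XML)
print(name, run("  Data Grid DATA  "))  # normalize-and-count 3
```

Resolving step names at compile time means a misspelled step fails when the workflow is loaded, not halfway through processing data.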
16
Rules Platform?
[Diagram: within iRODS, Cheshire3 provides the rules layer as well as a C3 Microservice alongside other microservices, operating on data.]
- Requires a Python interpreter at the rules execution level, rather than (or as well as) at the microservice level
- More flexible in terms of rule design
- Easier to write rules than in the current rule language
- Event system rather than rules execution?
- Integration of the computational grid for rule/microservice execution?
17
Questions?
Thank you!
Website: http://www.cheshire3.org/
Me: azaroth@liverpool.ac.uk
Acknowledgements:
- SHAMAN: EU 7th Framework Programme
- Cheshire3: JISC, NSF