Feb 24-27, 2004, ICDL 2004, New Delhi

Improving Federated Service for Non-cooperating Digital Libraries
R. Shi, K. Maly, M. Zubair
Department of Computer Science, Old Dominion University

Overview
- Introduction
- Architecture & Design
- Experimentation & Implementation
- Conclusion & Future Work

Introduction
- Approaches for DL interoperation
  - Harvesting and distributed search
- Lightweight Federated Digital Library (LFDL)
  - Universal search interface for non-cooperating DLs
  - DL behavior specification in DLDL
- Architectural enhancements
  - Cache-based architecture
  - Better services by processing cached result sets

LFDL Services
- Registration service
  - Registration server
- Search service
  - Search engine
  - Result processing engine
- Management service
  - DL removal, verification, …
- Runtime info
  - DL availability
  - DL average response time
  - Most often used queries
  - System total hits

LFDL Design – DL specification
- DLDL in XML
- Structure
  - General info on a digital library
    - Search URL
    - Search method
  - Query mapping rules
    - Access methods of the digital library
    - Search interface definition
    - Mapped to LFDL universal interface
  - Results retrieval and parsing rules
    - Information to be retrieved from the digital library

DL Specification - sample
Specification for NEEDS
- Search form information: 2, POST, s/public/search/index_body.jhtml
- Search interface: /smete/forms/FindLearningObjects.keyword, UI_keyword, text, input

Query Mapping Samples

Sample query in UI:
  UI_keyword=computer&UI_creator=Smith&UI_hits=20

DL native query after mapping:
  ACM:       query=computer&coll=ACM&dl=ACM&whichdl=acm
  ARC:       formname=advance&archive=All&sets=All&creator=Smith&group=archive&sort=rank&boolean=and
  IEEE:      rq=0&col=allieee&qt=computer&qc=allieee&nh=20&ws=0&qm=0&st=1&lk=1&rf=0&rq2=0
  NEEDS:     /smete/forms/FindLearningObjects.keyword=computer=&/smete/forms/FindLearningObjects.author=Smith&…
  CogPrints: abstract/keywords/title=computer&abstract/keywords/title_srchtype=ALL&authors/editors=Smith&authors/editors_srchtype=ALL&_satisfyall=ALL&_order=bytitle
  LTRS:      abs=computer&au=Smith&sti=*&boolean=AND
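
The mapping from the universal query to a native query can be pictured as a per-DL table of field renames plus fixed parameters. The following is a minimal sketch under that assumption, not the actual DLDL rule format: the class `QueryMappingSketch`, the method `toNative`, and the two rule tables are hypothetical, and the LTRS rules are inferred from the LTRS sample row only. Values are passed through without URL encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: rewrites a universal (UI_*) query string into a native one.
class QueryMappingSketch {
    // Assumed LTRS rules, inferred from the sample row: field renames and constants.
    static final Map<String, String> LTRS_RENAMES = new LinkedHashMap<>();
    static final Map<String, String> LTRS_CONSTANTS = new LinkedHashMap<>();
    static {
        LTRS_RENAMES.put("UI_keyword", "abs");
        LTRS_RENAMES.put("UI_creator", "au");
        LTRS_CONSTANTS.put("sti", "*");
        LTRS_CONSTANTS.put("boolean", "AND");
    }

    static String toNative(String universalQuery,
                           Map<String, String> renames,
                           Map<String, String> constants) {
        StringBuilder out = new StringBuilder();
        for (String pair : universalQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue;                       // skip malformed pairs
            String nativeName = renames.get(pair.substring(0, eq));
            if (nativeName == null) continue;           // field has no rule for this DL
            append(out, nativeName, pair.substring(eq + 1));
        }
        // Fixed parameters the native search form always expects.
        for (Map.Entry<String, String> e : constants.entrySet()) {
            append(out, e.getKey(), e.getValue());
        }
        return out.toString();
    }

    private static void append(StringBuilder out, String name, String value) {
        if (out.length() > 0) out.append('&');
        out.append(name).append('=').append(value);
    }

    public static void main(String[] args) {
        // Prints: abs=computer&au=Smith&sti=*&boolean=AND
        System.out.println(toNative(
            "UI_keyword=computer&UI_creator=Smith&UI_hits=20",
            LTRS_RENAMES, LTRS_CONSTANTS));
    }
}
```

In this sketch a universal field with no rule (here `UI_hits`) is simply dropped; a DL such as IEEE, whose sample carries `nh=20`, would instead add a rename for it.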

Limitations and Issues
- Limited service usability
  - Search results presented in flat structure
  - Need richer metadata to present rich search results
- Performance
  - Need local metadata repository to generate intelligent cache
- Solution
  - Retrieve metadata from remote digital libraries
  - Intelligent cache based on retrieved metadata

LFDL Architecture - Enhancement

LFDL Architecture – data flows among modules
1) At initialization, the system reads all DL specifications, including query mapping rules and metadata parsing rules.
2) A resource discovery user submits a query using the universal search interface.
3) The front-end filter does pre-processing (query clean-up), and then the query is passed to the Search Engine.
4) The Search Engine uses the query mapping rules to transform the universal query into a DL's native local query.
5) A DL agent sends the transformed query to the remote DL and receives the search results.
6) The Result Process Engine parses the search result pages, extracts the metadata according to the metadata parsing rules, and stores it in the Local Repository.
7) All parsed results are merged by the Controller into an intermediate XML document.
8) The resulting XML document is displayed using an XSLT processor.
9) Once the Local Repository has been populated, the Search Engine executes searches against the Local Repository (cache) first instead of sending queries directly to remote DLs.

Local Metadata Repository
- All searches are served locally first
- A secondary in-memory metadata cache for better performance and system reliability
- Cache grouped by metadata instead of query string
- Cache-based distributed search
  - Display results from cache
  - At the same time, still send the query out to DLs to update the cache
  - Transparent to end users
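
The "display from cache, update in the background" behavior can be sketched as two separate operations: a synchronous lookup over cached records and an asynchronous refresh. This is only an illustration under stated assumptions: the class `RefreshingCacheSketch`, its methods, and the `remoteSearch` function are hypothetical, records are reduced to an id and a title, and matching is a plain case-insensitive substring test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: answer from cached metadata now, refresh from remote DLs later.
class RefreshingCacheSketch {
    // Cache keyed by metadata record id (not by query string): id -> title.
    final Map<String, String> recordsById = new ConcurrentHashMap<>();
    // Stand-in for "send native queries to remote DLs and parse the results".
    final Function<String, Map<String, String>> remoteSearch;

    RefreshingCacheSketch(Function<String, Map<String, String>> remoteSearch) {
        this.remoteSearch = remoteSearch;
    }

    // Served immediately from whatever the cache holds right now.
    List<String> searchCache(String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String title : recordsById.values()) {
            if (title.toLowerCase(Locale.ROOT).contains(needle)) hits.add(title);
        }
        Collections.sort(hits);
        return hits;
    }

    // Runs in the background; the user never waits on it.
    CompletableFuture<Void> refresh(String keyword) {
        return CompletableFuture.runAsync(
            () -> recordsById.putAll(remoteSearch.apply(keyword)));
    }
}
```

A caller would return `searchCache(keyword)` to the user and fire `refresh(keyword)` without waiting for it, so the next identical search sees the updated records.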

Local Metadata Search – detailed process
1) At system start, the most recently and most often used metadata is loaded from the database into the memory cache.
2) The user submits a query using the LFDL unified search interface.
3) The query is converted to a local SQL query using predefined translation rules.
4) The SQL query is sent to the local metadata database; the query results are the internal IDs of the matching metadata.
5) The in-memory cache is searched by ID. Metadata found in the cache is merged into the results; missing records are loaded from the database into the cache.
6) If the local db has no results, the original query string is transformed into the native query of the non-cooperating DL and sent to the remote DL. Results returned from the DL are parsed to extract metadata, which is saved to the local repository and loaded into the in-memory cache.
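
Steps 4 to 6 of this process can be sketched as a lookup with two fallbacks. The sketch below is an illustration, not the LFDL code: the class `LocalSearchSketch` and its three function fields are hypothetical stand-ins for the SQL ID query, the per-record database load, and the remote DL search with result parsing. Records are reduced to strings, and saving remote results to the local repository is only noted in a comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of steps 4-6: local IDs -> memory cache -> database -> remote DL.
class LocalSearchSketch {
    final Map<String, String> memoryCache = new HashMap<>();      // internal ID -> record
    final Function<String, List<String>> localIdSearch;          // step 4: query -> matching IDs
    final Function<String, String> loadFromDb;                   // step 5: ID -> record
    final Function<String, Map<String, String>> remoteSearch;    // step 6: query -> parsed records

    LocalSearchSketch(Function<String, List<String>> localIdSearch,
                      Function<String, String> loadFromDb,
                      Function<String, Map<String, String>> remoteSearch) {
        this.localIdSearch = localIdSearch;
        this.loadFromDb = loadFromDb;
        this.remoteSearch = remoteSearch;
    }

    List<String> search(String query) {
        List<String> ids = localIdSearch.apply(query);
        if (ids.isEmpty()) {
            // Step 6: nothing local, so ask the remote DL and cache what comes back.
            // (The real system would also save these records to the local repository.)
            Map<String, String> fetched = remoteSearch.apply(query);
            memoryCache.putAll(fetched);
            return new ArrayList<>(fetched.values());
        }
        // Step 5: take each record from memory, loading cache misses from the database.
        List<String> results = new ArrayList<>();
        for (String id : ids) {
            results.add(memoryCache.computeIfAbsent(id, loadFromDb));
        }
        return results;
    }
}
```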

Cache Replacement Algorithm
- Replacement algorithm: least used plus least recently used metadata
- Initial system-wide parameters: cache size, cache keep-safe size
- Runtime parameters per metadata record: date_last_used, total_usage
- Algorithm implementation
  - At first start: load from db ordered by date_last_used and total_usage, and pick based on cache size

      String orderBy = " ORDER BY total_usage desc, date_last_used desc";
      String selectMetadata = "SELECT internalID, identifier, archive, datestamp, title, creator, "
          + "subject, description, publisher, publication, keyword, category, contributor, "
          + "type, format, source, language, status, date_last_used, total_usage FROM dc"
          + orderBy;

  - Each time a user views a metadata record, update date_last_used and total_usage
  - If the cache is full, remove the least used record from the cache and save it to the db (first sort by date_last_used and set aside the keep-safe records, then sort by total_usage)
  - Cache size and keep-safe size can be changed at runtime
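
One way to read the eviction rule is: protect the keep-safe number of most recently used records, then evict the least used of the remainder. The sketch below implements that reading as an assumption; the slide does not spell out tie-breaking, so ties on total_usage are broken here by evicting the older record. The class `MetadataCacheSketch` is hypothetical, it uses a logical counter in place of real dates, keeps the two size parameters fixed for brevity, and omits the save to the db.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "least used plus least recently used" replacement.
class MetadataCacheSketch {
    static final class Entry {
        final String internalId;
        long dateLastUsed;   // logical timestamp of the last view
        int totalUsage;      // number of views so far
        Entry(String internalId, long now) {
            this.internalId = internalId;
            this.dateLastUsed = now;
            this.totalUsage = 1;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final int cacheSize;      // maximum number of cached records
    private final int keepSafeSize;   // most recently used records exempt from eviction
    private long clock = 0;

    MetadataCacheSketch(int cacheSize, int keepSafeSize) {
        if (cacheSize < 1) throw new IllegalArgumentException("cacheSize must be >= 1");
        this.cacheSize = cacheSize;
        this.keepSafeSize = Math.max(0, keepSafeSize);
    }

    // Called each time a user views a record: update usage data, evicting if needed.
    void view(String internalId) {
        Entry e = entries.get(internalId);
        if (e != null) {
            e.dateLastUsed = ++clock;
            e.totalUsage++;
            return;
        }
        if (entries.size() >= cacheSize) evictOne();
        entries.put(internalId, new Entry(internalId, ++clock));
    }

    boolean contains(String internalId) {
        return entries.containsKey(internalId);
    }

    private void evictOne() {
        // First sort by recency and skip the keep-safe records...
        List<Entry> byRecency = new ArrayList<>(entries.values());
        byRecency.sort(Comparator.comparingLong((Entry e) -> e.dateLastUsed).reversed());
        int firstCandidate = Math.min(keepSafeSize, byRecency.size() - 1);
        List<Entry> candidates = byRecency.subList(firstCandidate, byRecency.size());
        // ...then pick the least used of the rest (older record on ties).
        Entry victim = Collections.min(candidates,
            Comparator.comparingInt((Entry e) -> e.totalUsage)
                      .thenComparingLong(e -> e.dateLastUsed));
        entries.remove(victim.internalId);   // the real system would save it to the db here
    }
}
```

With a cache size of 3 and a keep-safe size of 1, viewing a, a, b, b, c and then d evicts a: c is the least used record but is protected as the most recent one, and a is older than b at equal usage.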

Results
- Results merging and presentation

Conclusion and Future Work
- Federation service for non-cooperating DLs is possible
- A local metadata repository improves service usability and performance
- Future work
  - Complex interface mapping, access control
  - Populate the metadata repository more efficiently
  - Cache maintenance: size, consistency, …
  - Automatic specification generation, discovery of DL behavior changes
  - Personalized portal: customized interface and results display; most often used searches and remembered search preferences; caching options for fresh data or fast results
  - …