Published by Cyrus Patchell. Modified over 10 years ago.
1
Setting Up Information Portal
Irwan Sampurna, C-CONTENT
23 May 2006
2
Presentation Outline
Background
Information Collection and Enrichment
CMS Storage
Publish
Assignments
3
Background
Information is available digitally
  – Exists on multiple sites
  – No single entry point
  – Few reference links between documents
Document collections have different structures
  – Different document formats
  – Minimal metadata
4
Information Collection and Enrichment: General Architecture
[Architecture diagram. Labels, in extraction order: Harvester; Harvested Data; Format Converter; Formatted Data; XML Files; Indexing Engine; Indexed Data; CMS; Legal Content Engineers; End Users; Link Recognizer; External Source; Information Portal Data]
Legend: Component; User; Data; External Component; Content Flow; Interaction
5
Harvester
Retrieve documents from multiple public web sites
Dedicated harvester for dynamic sites
  – Incremental update
Open source harvester for static sites
Preserve document format (html and pdf)
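
The slide does not say how the dedicated harvester detects what to update. One common approach, shown here only as a minimal sketch, is to keep a fingerprint of each fetched document and re-process only those whose content changed. The function names and the `seen` store are hypothetical, not part of the original system.

```python
from __future__ import annotations

import hashlib


def content_fingerprint(body: bytes) -> str:
    """Return a stable digest of the raw document bytes."""
    return hashlib.sha256(body).hexdigest()


def incremental_harvest(fetched: dict[str, bytes], seen: dict[str, str]) -> list[str]:
    """Return URLs whose content is new or changed since the last run.

    `fetched` maps URL -> raw bytes (html or pdf, kept in original format).
    `seen` maps URL -> digest from earlier runs and is updated in place
    (hypothetical store; a real harvester would persist it).
    """
    changed = []
    for url, body in fetched.items():
        digest = content_fingerprint(body)
        if seen.get(url) != digest:
            seen[url] = digest
            changed.append(url)
    return changed
```

On a second run with the same `seen` store, only documents whose bytes differ are returned, so downstream conversion work stays proportional to what actually changed.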
6
Format Converter
Transform multi-format documents (html, pdf, doc) into XML
Metadata identification
Dedicated programs for different document types
Use open source components as plug-ins:
  – To convert pdf to html
  – To validate xhtml structure
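
To illustrate the html-to-XML step with metadata identification, here is a rough sketch using only the Python standard library. The target XML layout (`document`, `metadata`, `content`) and the function name `html_to_xml` are assumptions made for the example; the slides do not define the actual schema or the plug-ins used.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class _TitleAndParagraphs(HTMLParser):
    """Collect the <title> text and the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def html_to_xml(html, source_url):
    """Convert an HTML page to a small XML document (hypothetical layout)."""
    parser = _TitleAndParagraphs()
    parser.feed(html)
    parser.close()
    doc = ET.Element("document")
    meta = ET.SubElement(doc, "metadata")
    ET.SubElement(meta, "title").text = parser.title.strip()
    ET.SubElement(meta, "source").text = source_url
    ET.SubElement(meta, "format").text = "html"
    body = ET.SubElement(doc, "content")
    for paragraph in parser.paragraphs:
        ET.SubElement(body, "p").text = paragraph
    return ET.tostring(doc, encoding="unicode")
```

A converter for pdf or doc input would follow the same shape, with a different front end feeding the same XML builder.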
7
CMS
Integrated documents collection
Metadata storage in CMS database
Table of Contents definition
Additional metadata identification
  – Links Extractor
      Identify reference links between documents
      Based on open source application
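
The slides do not describe how the Links Extractor recognizes references. As one possible illustration, the sketch below scans a document's text for mentions of the titles of other documents already in the collection. The function name, the title-matching strategy, and the sample titles are all assumptions.

```python
from __future__ import annotations

import re


def find_references(text: str, known_titles: dict[str, str]) -> list[tuple[int, str]]:
    """Return (offset, target_id) for each mention of a known title in `text`.

    `known_titles` maps a document id to its title (hypothetical input;
    the real extractor may rely on citation patterns instead).
    """
    hits = []
    for doc_id, title in known_titles.items():
        # Whole-phrase, case-insensitive match on the escaped title.
        pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.start(), doc_id))
    return sorted(hits)
```

Each hit gives the position where a link could be inserted and the id of the document it should point to.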
8
CMS Storage
The CMS
Import Module (Input)
  – Get XML Content
  – Incremental Import
  – Link Contents with TOC
  – Split Metadata from Content
Legal Content
Metadata
TOC Management
Links Generation
Export Module (Output)
  – Generate TOC as XML file
  – Generate Contents as XML Files (Incremental)
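
The "Split Metadata from Content" step of the import module can be sketched as follows. The XML layout (`document` with `metadata` and `content` children) and the function name are assumed for illustration, since the slides do not show the real file format.

```python
import xml.etree.ElementTree as ET


def split_metadata(xml_text):
    """Split an imported XML document into (metadata dict, content XML string).

    Assumes a hypothetical layout:
    <document><metadata>...</metadata><content>...</content></document>
    """
    root = ET.fromstring(xml_text)
    meta_el = root.find("metadata")
    content_el = root.find("content")
    # Each child of <metadata> becomes one name/value pair.
    metadata = {}
    if meta_el is not None:
        metadata = {child.tag: (child.text or "") for child in meta_el}
    content = ""
    if content_el is not None:
        content = ET.tostring(content_el, encoding="unicode")
    return metadata, content
```

The two parts can then be stored separately, which matches the slide's distinction between Legal Content and Metadata inside the CMS.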
9
CMS Storage - Database Design
nuke_ce_contentitems: mc_id (PK), mc_parent_id, mc_cat_id, mc_title, mc_text, mc_media_url, mc_weight, mc_notes, mc_status, mc_cc_exported_flag
nuke_ce_categories: mc_id (PK), mc_parent_id, mc_title, mc_language
nuke_ce_statuses: mc_id (PK), mc_parent_id, mc_title, mc_language
nuke_blocks: pn_bid (PK), pn_bkey, pn_title, pn_content, pn_mid, pn_weight, pn_active
nuke_me_menuitems: mc_id (PK), mc_parent_id, mc_block_id, mc_title, mc_uri, mc_status, mc_weight, mc_orig_doc_titledoc_title
nuke_modules: pn_id (PK), pn_name, pn_version
nuke_module_vars: pn_id (PK), pn_modname, pn_name, pn_value
nuke_cc_metaitems: mc_ceid (PK), mc_metaname, mc_metacontent, mc_metagroup
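
As a rough illustration of how the content and metadata tables might work together, the sketch below creates two of them in an in-memory SQLite database. Table and column names come from the slide; the column types, the composite key on `nuke_cc_metaitems`, the helper function names, and the reading of `mc_cc_exported_flag` as an "already exported" marker for incremental export are all assumptions.

```python
import sqlite3

# Column names follow the slide; types and key details are assumed.
SCHEMA = """
CREATE TABLE nuke_ce_contentitems (
    mc_id INTEGER PRIMARY KEY,
    mc_parent_id INTEGER,
    mc_cat_id INTEGER,
    mc_title TEXT,
    mc_text TEXT,
    mc_media_url TEXT,
    mc_weight INTEGER,
    mc_notes TEXT,
    mc_status INTEGER,
    mc_cc_exported_flag INTEGER DEFAULT 0
);
CREATE TABLE nuke_cc_metaitems (
    mc_ceid INTEGER,
    mc_metaname TEXT,
    mc_metacontent TEXT,
    mc_metagroup TEXT,
    PRIMARY KEY (mc_ceid, mc_metaname)
);
"""


def open_db():
    """Create an in-memory database with the two sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def import_item(conn, title, text, metadata):
    """Store one content item and its metadata rows; return the new mc_id."""
    cur = conn.execute(
        "INSERT INTO nuke_ce_contentitems (mc_title, mc_text) VALUES (?, ?)",
        (title, text),
    )
    item_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO nuke_cc_metaitems (mc_ceid, mc_metaname, mc_metacontent) "
        "VALUES (?, ?, ?)",
        [(item_id, name, value) for name, value in metadata.items()],
    )
    return item_id


def export_pending(conn):
    """Return items not yet exported and mark them as exported."""
    rows = conn.execute(
        "SELECT mc_id, mc_title FROM nuke_ce_contentitems "
        "WHERE mc_cc_exported_flag = 0 ORDER BY mc_id"
    ).fetchall()
    conn.executemany(
        "UPDATE nuke_ce_contentitems SET mc_cc_exported_flag = 1 WHERE mc_id = ?",
        [(row[0],) for row in rows],
    )
    return rows
```

Calling `export_pending` twice in a row returns an empty list the second time, which is the behaviour an incremental export needs.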
10
Publish
Content indexing
  – Use dX-index module from eXtrect®
  – Incremental index generation
  – Basis for structured document search
  – Basis for word lemmatization
Web Information Portal
  – Single integrated user interface
  – Access to complete documents collection
  – Structured search feature
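
The dX-index module is not documented in these slides, so the following is not its API. It is a generic sketch of incremental index generation: a small inverted index in which re-indexing one document touches only the terms that changed. The class and method names are hypothetical, and no lemmatization is attempted.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index supporting per-document incremental updates."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_terms = {}               # document id -> terms currently indexed

    @staticmethod
    def tokenize(text):
        """Lower-case the text and return its set of word tokens."""
        return set(re.findall(r"\w+", text.lower()))

    def add_or_update(self, doc_id, text):
        """Index a new document, or update only the terms that changed."""
        new_terms = self.tokenize(text)
        old_terms = self.doc_terms.get(doc_id, set())
        for term in old_terms - new_terms:
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]
        for term in new_terms - old_terms:
            self.postings[term].add(doc_id)
        self.doc_terms[doc_id] = new_terms

    def search(self, query):
        """Return ids of documents containing every term of the query."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            docs = self.postings.get(term, set())
            result = set(docs) if result is None else result & docs
        return result
```

A lemmatizing index would normalize tokens to their base forms inside `tokenize`, so that a query for one word form also finds documents containing its variants.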