Aspire Document Processing

1 Aspire Document Processing

2 Document Processing – “Aspire”
- Very High Performance
- Structured Document Processing Architecture
- Dynamic configuration and deployment
- Based on Open Source Technologies
- Well Supported (wiki, javadoc)
- Administration interface built-in
- Vendor Neutral (CMS and search engine)

3 Top-Level Overview
[Diagram labels: Aspire; Data Sources; Feeders; Document Processing Pipelines; Indexing; Index]
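
The labels on this slide read most naturally as a flow: data sources are read by feeders, documents pass through processing pipelines, and the results are indexed. The following is a minimal sketch of that idea only, assuming a document can be modeled as a map of fields. Every name in it (`PipelineSketch`, `runPipeline`, `extractHost`, `normalizeText`) is hypothetical and is not taken from the Aspire API.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these names come from the Aspire API.
final class PipelineSketch {

    /** Models a document as an ordered map of field name to value. */
    static Map<String, String> newDocument(String url, String body) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", url);
        doc.put("body", body);
        return doc;
    }

    /** Runs the stages in order; each stage receives the previous stage's output. */
    static Map<String, String> runPipeline(Map<String, String> doc,
                                           List<UnaryOperator<Map<String, String>>> stages) {
        Map<String, String> current = doc;
        for (UnaryOperator<Map<String, String>> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    /** Example stage: derives a "host" field from the "url" field. */
    static Map<String, String> extractHost(Map<String, String> doc) {
        doc.put("host", URI.create(doc.get("url")).getHost());
        return doc;
    }

    /** Example stage: trims the body and collapses runs of whitespace. */
    static Map<String, String> normalizeText(Map<String, String> doc) {
        doc.put("body", doc.get("body").trim().replaceAll("\\s+", " "));
        return doc;
    }

    public static void main(String[] args) {
        // Feeder stand-in: a hard-coded list of documents.
        List<Map<String, String>> fed = List.of(
            newDocument("http://example.com/a", "  Hello   world "));

        List<UnaryOperator<Map<String, String>>> stages =
            List.of(PipelineSketch::extractHost, PipelineSketch::normalizeText);

        // Index stand-in: an in-memory list instead of a search engine.
        List<Map<String, String>> index = new ArrayList<>();
        for (Map<String, String> doc : fed) {
            index.add(runPipeline(doc, stages));
        }
        System.out.println(index);
    }
}
```

Keeping each stage as a plain function over the document means stages can be reordered or reused across pipelines without changing the loop that drives them.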

4 Components In Aspire (today)
[Diagram labels: Aspire Common Resources; Content Control DB; SubJob Extractors; Unload ARC Files; Unload CSV; Component Manager; Pipeline Manager; Metadata Manipulation; Text Extraction; Date Chooser; Split Multi-valued data; Host to Domain; Groovy Scripting; JDBC Connection; Feeders; RSS; Hot Folder; Single Page; RDB; Enhancers; Get CCD Metadata; RDB Enhancer; Output; Push XML to REST; Error Job Handler; Debug Output; JMS; RDB Unloader; Feed One; Fetch URL; Category Tagger; Content Boost]

5 Functions Handled by Aspire
- Threading
- Collection Deployment
- Error handling and notification
  - Including individual sub-job notifications
- Collection Configuration
- Component Scripting
- Job Processing
- Admin I/F, performance, live system status
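
To make the threading and per-sub-job error notification items concrete, here is a minimal sketch using the standard `java.util.concurrent` classes. It assumes a job is split into sub-jobs identified by strings and that a failure in one sub-job should be reported without stopping the others. `JobRunnerSketch` and `runSubJobs` are hypothetical names; this is not how Aspire itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: illustrates the idea only, not the Aspire framework.
final class JobRunnerSketch {

    /**
     * Processes every sub-job on a fixed-size thread pool and returns the ids
     * of the sub-jobs whose processor threw, in submission order.
     */
    static List<String> runSubJobs(List<String> subJobIds,
                                   Consumer<String> processor,
                                   int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String id : subJobIds) {
                futures.add(pool.submit(() -> processor.accept(id)));
            }
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get(); // waits for this sub-job to finish
                } catch (ExecutionException e) {
                    // One failed sub-job is recorded; the others keep running.
                    failed.add(subJobIds.get(i));
                }
            }
            return failed;
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> failed = runSubJobs(
            List.of("doc-1", "doc-2", "doc-3"),
            id -> {
                if (id.equals("doc-2")) {
                    throw new IllegalStateException("cannot parse " + id);
                }
            },
            2);
        System.out.println("Failed sub-jobs: " + failed);
    }
}
```

The caller supplies only the per-document logic. The pool size, waiting, and failure collection live in one place, which is the kind of work the slide says a framework takes over from individual components.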

6 Benefits
- Much lower lifecycle cost
  - File processing no longer an ad-hoc collection of Java objects and methods
  - Encourages re-use of components
- New collections with no programming
  - Just re-configure existing components
- Flexibility: deploy collections individually
- Much better visibility into the file processing internals, performance, and queuing

7 Typical Installation Structure
[Diagram labels: Machine #1; Machine #2; Crawler; Aspire (other feeders and doc processing); Search Engine]

8 Aspire Architecture and Components Details

9 Top-Level Component Architecture

10 Aspire and OSGi Components
[Diagram labels: Aspire Component; Aspire Component Factory; OSGi Bundle; Java Jar File; Manufactured By; ISA]
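
The labels suggest that a component is manufactured by a component factory, and that the factory is packaged as an OSGi bundle, which is itself a Java jar file. The sketch below illustrates only the "manufactured by" relationship in plain Java, assuming a factory that looks up a creator by type name. It does not use the OSGi API, and `ComponentFactorySketch`, `Component`, `register`, and `create` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: these types are illustrative, not the Aspire or OSGi APIs.
final class ComponentFactorySketch {

    /** A processing component: takes a text value and returns the processed value. */
    interface Component {
        String process(String input);
    }

    /** Maps a component type name to the code that manufactures an instance. */
    private final Map<String, Supplier<Component>> creators = new HashMap<>();

    /** Registers the creator for a component type, replacing any earlier one. */
    void register(String typeName, Supplier<Component> creator) {
        creators.put(typeName, creator);
    }

    /** Manufactures a new component instance for the given type name. */
    Component create(String typeName) {
        Supplier<Component> creator = creators.get(typeName);
        if (creator == null) {
            throw new IllegalArgumentException("Unknown component type: " + typeName);
        }
        return creator.get();
    }

    public static void main(String[] args) {
        ComponentFactorySketch factory = new ComponentFactorySketch();
        factory.register("trim", () -> String::trim);

        Component trimmer = factory.create("trim");
        System.out.println("[" + trimmer.process("  padded  ") + "]");
    }
}
```

Because callers ask for components by name, a configuration file can select which components to build without any code referring to a concrete class.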

11 The Contents of a Bundle/Component Factory

12 Component and Factory Details

13

14 Aspire Sample Configurations

15 Web Site Crawler / Search

16 Processing CSV Files

17 RSS Feeds, Single Pages

18 Aspire Deployment

19 Deployment
- Architected to the latest deployment standards
  - Distribution Archetypes
  - Component Repositories
- Redeploy collections independently
  - In a live running system
- Redeploy and update components
  - In a live running system
- Ready for the cloud

20 Deployment Structure
[Diagram labels: Aspire; Resources; Collection Config (six instances); Feeders & Pipelines; Administrator; load/reload configuration; Configuration Control; reusable components; Component Repository]

21 Deployment Implications
- Collections are configured independently
- Collections use standard components
- Can be dynamically and remotely deployed
[Diagram labels: Remote System; Aspire (always running); Collection Config; load remote configurations; remote admin control]
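
The points above describe collections that are configured independently and loaded or reloaded while the system keeps running. As a rough illustration, the sketch below keeps each collection's configuration under its own key in a thread-safe map, so that replacing one entry leaves the others untouched. It assumes a configuration is just a list of component names. `CollectionRegistrySketch`, `load`, `unload`, and `components` are hypothetical names, not Aspire's administration interface.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: models independent load/reload of collection configs.
final class CollectionRegistrySketch {

    /** Collection name to the ordered component names it is configured with. */
    private final Map<String, List<String>> collections = new ConcurrentHashMap<>();

    /**
     * Loads a collection configuration, replacing any configuration already
     * loaded under the same name. Returns true if this was a reload.
     */
    boolean load(String name, List<String> componentNames) {
        return collections.put(name, List.copyOf(componentNames)) != null;
    }

    /** Removes a collection. Returns true if it had been loaded. */
    boolean unload(String name) {
        return collections.remove(name) != null;
    }

    /** Returns the configured component names, or an empty list if not loaded. */
    List<String> components(String name) {
        return collections.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        CollectionRegistrySketch registry = new CollectionRegistrySketch();
        registry.load("news", List.of("fetch-url", "extract-text"));
        registry.load("products", List.of("unload-csv"));

        // Reload only "news" with an extra component; "products" is unaffected.
        registry.load("news", List.of("fetch-url", "extract-text", "category-tagger"));

        System.out.println(registry.components("news"));
        System.out.println(registry.components("products"));
    }
}
```

A remote administrator in this model would only need a way to call `load` and `unload` over the network; the running process itself never restarts.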
