Published by Briana Clarke. Modified over 9 years ago.
1 Aspire Document Processing
2 Document Processing – “Aspire”
- Very High Performance
- Structured Document Processing Architecture
- Dynamic configuration and deployment
- Based on Open Source Technologies
- Well Supported (wiki, javadoc)
- Administration interface built-in
- Vendor Neutral (CMS and search engine)
3 Top-Level Overview
[Diagram labels: Aspire; Data Sources; Feeders; Document Processing Pipelines; Indexing; Index]
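The flow named on this slide (data sources, feeders, document processing pipelines, index) can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class PipelineSketch, its methods, and the map-based document are hypothetical and are not Aspire APIs. The two stages loosely mirror the "Text Extraction" and "Host to Domain" components listed on the next slide.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of feeder -> pipeline -> index; none of these names are Aspire APIs. */
public class PipelineSketch {

    /** Builds a document as an ordered field map (a stand-in for a real document type). */
    static Map<String, String> doc(String url, String content) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("url", url);
        d.put("content", content);
        return d;
    }

    /** Stage: crude text extraction that drops anything between '<' and '>'. */
    static Map<String, String> extractText(Map<String, String> d) {
        d.put("text", d.get("content").replaceAll("<[^>]*>", "").trim());
        return d;
    }

    /** Stage: naive host-to-domain that keeps the last two labels of the host name. */
    static Map<String, String> hostToDomain(Map<String, String> d) {
        String host = URI.create(d.get("url")).getHost();
        String[] labels = host.split("\\.");
        int n = labels.length;
        d.put("domain", n >= 2 ? labels[n - 2] + "." + labels[n - 1] : host);
        return d;
    }

    /** The two stages above, in the order they should run. */
    static List<UnaryOperator<Map<String, String>>> defaultStages() {
        List<UnaryOperator<Map<String, String>>> stages = new ArrayList<>();
        stages.add(PipelineSketch::extractText);
        stages.add(PipelineSketch::hostToDomain);
        return stages;
    }

    /** Sends each fed document through the stages in order; the returned list stands in for the index. */
    static List<Map<String, String>> run(List<Map<String, String>> feed,
                                         List<UnaryOperator<Map<String, String>>> stages) {
        List<Map<String, String>> index = new ArrayList<>();
        for (Map<String, String> d : feed) {
            Map<String, String> current = d;
            for (UnaryOperator<Map<String, String>> stage : stages) {
                current = stage.apply(current);
            }
            index.add(current);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Map<String, String>> feed = new ArrayList<>();
        feed.add(doc("http://www.example.com/page", "<p>Hello Aspire</p>"));
        System.out.println(run(feed, defaultStages()));
    }
}
```

Keeping each stage as a plain function over the document is only one possible design; it is used here because it makes the order of the pipeline explicit and keeps stages interchangeable.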
4 Components In Aspire (today)
[Diagram labels, in extraction order:]
- Aspire Common Resources
- Content Control DB
- SubJob Extractors
- Unload ARC Files
- Unload CSV
- Component Manager
- Pipeline Manager
- Metadata Manipulation
- Text Extraction
- Date Chooser
- Split Multi-valued data
- Host to Domain
- Groovy Scripting
- JDBC Connection
- Feeders
- RSS
- Hot Folder
- Single Page
- RDB
- Enhancers
- Get CCD Metadata
- RDB Enhancer
- Output
- Push XML to REST
- Error Job Handler
- Debug Output
- JMS
- RDB Unloader
- Feed One
- Fetch URL
- Category Tagger
- Content Boost
5 Functions Handled by Aspire
- Threading
- Collection Deployment
- Error handling and notification
  - Including individual sub-job notifications
- Collection Configuration
- Component Scripting
- Job Processing
- Admin I/F, performance, live system status
6 Benefits
- Much lower lifecycle cost
- File processing no longer an ad-hoc collection of Java objects and methods
- Encourages re-use of components
- New collections with no programming
  - Just re-configure existing components
- Flexibility: deploy collections individually
- Much better visibility into the file processing internals, performance, and queuing
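The idea of adding a collection by re-configuring existing components, rather than writing new code, can be sketched as follows. This is an illustration under stated assumptions, not Aspire's configuration mechanism: the class CollectionConfigSketch, the component names ("trim", "lowercase", "collapse-whitespace"), and the list-of-names configuration are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: a collection is a list of component names, not new code. */
public class CollectionConfigSketch {

    /** Registry of reusable components, keyed by the name a configuration refers to. */
    static final Map<String, UnaryOperator<String>> COMPONENTS = new HashMap<>();
    static {
        COMPONENTS.put("trim", String::trim);
        COMPONENTS.put("lowercase", String::toLowerCase);
        COMPONENTS.put("collapse-whitespace", s -> s.replaceAll("\\s+", " "));
    }

    /** Assembles a pipeline from component names; unknown names are rejected up front. */
    static UnaryOperator<String> build(List<String> componentNames) {
        List<UnaryOperator<String>> stages = new ArrayList<>();
        for (String name : componentNames) {
            UnaryOperator<String> component = COMPONENTS.get(name);
            if (component == null) {
                throw new IllegalArgumentException("Unknown component: " + name);
            }
            stages.add(component);
        }
        // The returned pipeline applies the stages in configuration order.
        return input -> {
            String value = input;
            for (UnaryOperator<String> stage : stages) {
                value = stage.apply(value);
            }
            return value;
        };
    }

    public static void main(String[] args) {
        // Two "collections" built from the same components, differing only in configuration.
        UnaryOperator<String> news = build(Arrays.asList("trim", "collapse-whitespace"));
        UnaryOperator<String> files = build(Arrays.asList("trim", "collapse-whitespace", "lowercase"));
        System.out.println(news.apply("  Hello   World "));
        System.out.println(files.apply("  Hello   World "));
    }
}
```

In this sketch a new collection costs one more list of names, which is the sense in which the slide's "no programming" claim is being illustrated.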
7 Typical Installation Structure
[Diagram labels: Machine #1; Machine #2; Crawler; Aspire (other feeders and doc processing); Search Engine]
8 Aspire Architecture and Components Details
9 Top-Level Component Architecture
10 Aspire and OSGi Components
[Diagram labels: Aspire Component; Aspire Component Factory; OSGi Bundle; Java Jar File. Relation labels: Manufactured By; ISA]
11 The Contents of a Bundle/Component Factory
12 Component and Factory Details
14 Aspire Sample Configurations
15 Web Site Crawler / Search
16 Processing CSV Files
17 RSS Feeds, Single Pages
18 Aspire Deployment
19 Deployment
- Architected to the latest deployment standards
  - Distribution Archetypes
  - Component Repositories
- Redeploy collections independently
  - In a live running system
- Redeploy and update components
  - In a live running system
- Ready for the cloud
20 Deployment Structure
[Diagram labels: Aspire; Resources; Collection Config (six instances); Feeders & Pipelines; Administrator; load/reload configuration; Configuration Control; re-useable components; Component Repository]
21 Deployment Implications
- Collections are configured independently
- Collections use standard components
- Can be dynamically and remotely deployed
[Diagram labels: Remote System; Aspire (always running); Collection Config; load remote configurations; remote admin control]
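Independent configuration and reload of collections in an always-running system can be pictured with a small registry. This is a sketch under stated assumptions, not how Aspire implements deployment: the class CollectionRegistrySketch, its methods, and the collection and component names in the usage example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: each collection's configuration is loaded and reloaded on its own. */
public class CollectionRegistrySketch {

    private final Map<String, List<String>> configs = new ConcurrentHashMap<>();
    private final Map<String, Integer> versions = new ConcurrentHashMap<>();

    /**
     * Loads or reloads one collection and returns its new version number.
     * Other collections are not touched. The two map updates are not atomic
     * as a pair, which a production system would need to address.
     */
    public int load(String collection, List<String> componentNames) {
        configs.put(collection, Collections.unmodifiableList(new ArrayList<>(componentNames)));
        return versions.merge(collection, 1, Integer::sum);
    }

    /** Returns the current configuration for a collection, or null if it was never loaded. */
    public List<String> config(String collection) {
        return configs.get(collection);
    }

    /** Returns how many times a collection has been loaded (0 if never). */
    public int version(String collection) {
        return versions.getOrDefault(collection, 0);
    }

    public static void main(String[] args) {
        CollectionRegistrySketch registry = new CollectionRegistrySketch();
        registry.load("news", Arrays.asList("rss-feeder", "fetch-url"));
        registry.load("files", Arrays.asList("hot-folder", "text-extraction"));
        // Reloading "news" leaves "files" at its original version and configuration.
        registry.load("news", Arrays.asList("rss-feeder", "fetch-url", "category-tagger"));
        System.out.println(registry.version("news") + " " + registry.version("files"));
    }
}
```

A remote administrator, as drawn on the slide, would call something like load over the network; the transport is left out of the sketch.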