Aspire Document Processing

Presentation transcript:

1 Aspire Document Processing

2 Document Processing – “Aspire”
  - Very high performance
  - Structured document processing architecture
  - Dynamic configuration and deployment
  - Based on open source technologies
  - Well supported (wiki, javadoc)
  - Administration interface built in
  - Vendor neutral (CMS and search engine)

3 Top-Level Overview
  [Diagram labels: Aspire; Data Sources; Feeders; Document Processing Pipelines; Indexing; Index]
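The flow named on this slide (data sources, feeders, document processing pipelines, index) can be sketched in a few lines. This is only an illustration and not Aspire's API: `feed_urls`, `fetch_text`, `add_host`, and `run_pipeline` are hypothetical names, and a plain list stands in for the search index.

```python
from typing import Callable, Dict, Iterable, List

Document = Dict[str, str]
Stage = Callable[[Document], Document]


def feed_urls(urls: Iterable[str]) -> Iterable[Document]:
    """Hypothetical feeder: turns each data-source entry into a document."""
    for url in urls:
        yield {"url": url}


def fetch_text(doc: Document) -> Document:
    """Stand-in processing stage: pretends to fetch and extract text."""
    return {**doc, "text": "text of " + doc["url"]}


def add_host(doc: Document) -> Document:
    """Stand-in processing stage: derives the host name from the URL."""
    host = doc["url"].split("//", 1)[-1].split("/", 1)[0]
    return {**doc, "host": host}


def run_pipeline(
    docs: Iterable[Document], stages: List[Stage], index: List[Document]
) -> List[Document]:
    """Pass every fed document through each stage in order, then index it."""
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        index.append(doc)
    return index


# A list plays the role of the index in this sketch.
index = run_pipeline(feed_urls(["http://example.com/a"]), [fetch_text, add_host], [])
```

Each stage takes a document and returns a new one, so stages can be reordered or reused without knowing about each other.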

4 Components In Aspire (today)
  [Diagram labels: Aspire Common Resources; Content Control DB; SubJob Extractors; Unload ARC Files; Unload CSV; Component Manager; Pipeline Manager; Metadata Manipulation; Text Extraction; Date Chooser; Split Multi-valued Data; Host to Domain; Groovy Scripting; JDBC Connection; Feeders; RSS; Hot Folder; Single Page; RDB; Enhancers; Get CCD Metadata; RDB Enhancer; Output; Push XML to REST; Error Job Handler; Debug Output; JMS; RDB Unloader; Feed One; Fetch URL; Category Tagger; Content Boost]

5 Functions Handled by Aspire
  - Threading
  - Collection deployment
  - Error handling and notification
      - Including individual sub-job notifications
  - Collection configuration
  - Component scripting
  - Job processing
  - Admin I/F, performance, live system status
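Two of the functions listed above, threading and per-sub-job error notification, can be illustrated together. The sketch below uses Python's standard `concurrent.futures` module; `process_sub_jobs` and `parse_row` are hypothetical names, and nothing here reflects how Aspire itself implements these functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def process_sub_jobs(
    sub_jobs: Dict[str, str],
    stage: Callable[[str], str],
    max_workers: int = 4,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run one stage over all sub-jobs on a thread pool.

    Returns (results, errors), both keyed by sub-job id, so that a failure
    in one sub-job is reported individually and does not abort the others.
    """
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            job_id: pool.submit(stage, payload)
            for job_id, payload in sub_jobs.items()
        }
        for job_id, future in futures.items():
            try:
                results[job_id] = future.result()
            except Exception as exc:  # record the failure for this sub-job only
                errors[job_id] = str(exc)
    return results, errors


def parse_row(row: str) -> str:
    """Hypothetical stage: rejects empty rows, upper-cases the rest."""
    if not row:
        raise ValueError("empty row")
    return row.upper()


results, errors = process_sub_jobs({"row-1": "a,b", "row-2": ""}, parse_row)
```

Keeping errors keyed by sub-job id is what makes an individual notification possible: the caller can see exactly which record failed and why.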

6 Benefits
  - Much lower lifecycle cost
      - File processing is no longer an ad hoc collection of Java objects and methods
      - Encourages reuse of components
  - New collections with no programming
      - Just reconfigure existing components
  - Flexibility: deploy collections individually
  - Much better visibility into the file processing internals, performance, and queuing
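The "new collections with no programming" point can be sketched as configuration that selects from a shared set of components. The component names, the `COMPONENTS` registry, and the config layout below are assumptions made for illustration; they are not Aspire's configuration format.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Stage = Callable[[Document], Document]

# Shared, reusable components (hypothetical examples).
COMPONENTS: Dict[str, Stage] = {
    "trim_title": lambda doc: {**doc, "title": doc["title"].strip()},
    "lowercase_title": lambda doc: {**doc, "title": doc["title"].lower()},
}


def build_pipeline(collection_config: Dict[str, object]) -> List[Stage]:
    """Assemble a pipeline purely from the stage names listed in the config."""
    return [COMPONENTS[name] for name in collection_config["stages"]]


def process(doc: Document, pipeline: List[Stage]) -> Document:
    """Apply each configured stage to the document in order."""
    for stage in pipeline:
        doc = stage(doc)
    return doc


# Two collections that differ only in configuration, not in code.
news_config = {"name": "news", "stages": ["trim_title", "lowercase_title"]}
archive_config = {"name": "archive", "stages": ["trim_title"]}

news_doc = process({"title": "  Hello "}, build_pipeline(news_config))
archive_doc = process({"title": "  Hello "}, build_pipeline(archive_config))
```

Adding a third collection here means writing another config dictionary; no new component code is needed unless the collection requires behavior that no existing component provides.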

7 Typical Installation Structure
  [Diagram labels: Machine #1; Machine #2; Crawler; Aspire (other feeders and doc processing); Search Engine]

8 Aspire Architecture and Components Details

9 Top-Level Component Architecture

10 Aspire and OSGi Components
  [Diagram labels: Aspire Component; Aspire Component Factory; OSGi Bundle; Java Jar File. Relationship labels: "manufactured by"; "is a"]
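One reading of this diagram is that a component factory, packaged as a bundle, manufactures configured component instances. The sketch below illustrates that factory relationship only, under that assumption. The class names and the behavior given to `DateChooser` are hypothetical, and no OSGi mechanics are modeled.

```python
from typing import Dict, Type


class Component:
    """Base class for a configured, named processing component."""

    def __init__(self, name: str, config: Dict[str, object]) -> None:
        self.name = name
        self.config = config

    def process(self, doc: Dict[str, str]) -> Dict[str, str]:
        raise NotImplementedError


class DateChooser(Component):
    """Hypothetical behavior: copy the first non-empty configured field to 'date'."""

    def process(self, doc: Dict[str, str]) -> Dict[str, str]:
        for field in self.config["fields"]:
            if doc.get(field):
                return {**doc, "date": doc[field]}
        return doc


class ComponentFactory:
    """Stands in for a bundle: holds implementations and manufactures instances."""

    def __init__(self, implementations: Dict[str, Type[Component]]) -> None:
        self._implementations = implementations

    def create(self, kind: str, name: str, config: Dict[str, object]) -> Component:
        """Manufacture a new component of the given kind with its own config."""
        return self._implementations[kind](name, config)


factory = ComponentFactory({"date-chooser": DateChooser})
news_dates = factory.create("date-chooser", "news-dates", {"fields": ["published", "modified"]})
blog_dates = factory.create("date-chooser", "blog-dates", {"fields": ["modified"]})

chosen = news_dates.process({"published": "", "modified": "2009-05-01"})
```

The same factory produces several independently configured instances, which is the property that lets one packaged implementation serve many collections.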

11 The Contents of a Bundle/Component Factory

12 Component and Factory Details

13

14 Aspire Sample Configurations

15 Web Site Crawler / Search

16 Processing CSV Files

17 RSS Feeds, Single Pages

18 Aspire Deployment

19 Deployment
  - Architected to the latest deployment standards
      - Distribution archetypes
      - Component repositories
  - Redeploy collections independently
      - In a live running system
  - Redeploy and update components
      - In a live running system
  - Ready for the cloud
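Redeploying one collection in a live running system, without touching the others, can be sketched as a manager that replaces a single collection's configuration in place. `CollectionManager` and its methods are hypothetical names used for illustration; they are not part of Aspire.

```python
from typing import Dict


class CollectionManager:
    """Holds the configuration of each loaded collection independently."""

    def __init__(self) -> None:
        self._configs: Dict[str, Dict[str, object]] = {}
        self._versions: Dict[str, int] = {}

    def load(self, name: str, config: Dict[str, object]) -> int:
        """Load or reload one collection; other collections are left untouched.

        Returns the deployment version of this collection (1 on first load).
        """
        self._configs[name] = dict(config)
        self._versions[name] = self._versions.get(name, 0) + 1
        return self._versions[name]

    def unload(self, name: str) -> None:
        """Remove one collection's configuration from the running system."""
        self._configs.pop(name, None)

    def config(self, name: str) -> Dict[str, object]:
        return self._configs[name]

    def version(self, name: str) -> int:
        return self._versions[name]


manager = CollectionManager()
manager.load("news", {"stages": ["fetch", "extract"]})
manager.load("blogs", {"stages": ["fetch"]})

# Redeploy only the "news" collection with an updated configuration.
manager.load("news", {"stages": ["fetch", "extract", "tag"]})
```

After the reload, "news" is at version 2 while "blogs" is still at version 1 with its original configuration.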

20 Deployment Structure
  [Diagram labels: Aspire; Resources; Collection Config (shown six times); Feeders & Pipelines; Administrator; load/reload configuration; Configuration Control; reusable components; Component Repository]

21 Deployment Implications
  - Collections are configured independently
  - Collections use standard components
  - Can be dynamically and remotely deployed
  [Diagram labels: Remote System; Aspire (always running); Collection Config; load remote configurations; remote admin control]