#SummitNow Super Size Your Search 14th November 2013 Fran Alvarez (Zaizi)



#SummitNow Agenda Myself & My company Background Our Solution Scenario Demo Conclusions

#SummitNow About me: Director of Zaizi Iberia and Lead Architect; Alfresco Certified Engineer; responsible for large Alfresco architectures; Semantic Consultant for Sensefy; Alfresco Meetups organizer

#SummitNow We are an Open Source Development Company that helps people work together more effectively HQ: London (UK) Seville (Spain) Colombo (Sri Lanka) Singapore

#SummitNow What we offer: Open Source System Integrator; specialist in ECM; Platinum Alfresco partner; Best Systems Integrator Partner EMEA 2012; Best Systems Integrator Partner EMEA 2013; Million $ Club, awarded in 2013; 24/7 support

#SummitNow Overview: How to build and manage your search server: 1. Scenario 2. Introducing Apache ManifoldCF 3. Zaizi Integrated Search Solution

#SummitNow Scenario: An overview of the typical complex search architecture

#SummitNow Scenario - Alfresco limitations: Alfresco supports these search engines: Apache Lucene (embedded) and Apache Solr (provided by Alfresco). Development is needed if other repositories must be involved; every other approach must be implemented by hand (ScheduledActions, WebScripts, etc.).

#SummitNow Scenario – Embedded: Simple search architecture. Alfresco is the only repository involved in the architecture and uses the embedded search engine: the repository must take care of the indexes and also manage index transactions. (Diagram: front-end applications; Alfresco with Apache Lucene; indexes.)

#SummitNow Scenario – Embedded – Cluster: It is not easy to scale out with embedded Lucene: 1. every cluster node must have its own search indexes 2. the cluster must synchronize the indexes. (Diagram: two Alfresco nodes, each with Apache Lucene and its own indexes; JGroups.)

#SummitNow Scenario – Simple Architecture: Simple search architecture. Alfresco is the only repository involved in the architecture, with an external search server: 1. The search server can be used to publish content in the front-end architecture 2. The repository stays in the logical back end. (Diagram: front-end applications; search engine with indexes; Alfresco.)

#SummitNow Scenario – Publish with search: A search engine can be used for: advanced management of search indexes; scaling out; executing complex searches on content; publishing content in the front-end architecture

#SummitNow Scenario – Publish with search: Publish-with-search architecture. Alfresco is the only repository involved in the architecture, with an external search server: 1. The search server can be used to publish content in the front-end architecture (HTML) 2. The repository stays in the logical back end. (Diagram: back end with Alfresco and Lucene / Solr indexes; front end with the search engine, its indexes and the front-end applications.)


#SummitNow Scenario – Complex Architecture: 1. Alfresco is only one of the platforms that must be involved in your search architecture 2. You don’t want to increase the development effort 3. You just want something to configure

#SummitNow Scenario – Complex Architecture: Architecture with different ECM systems. Alfresco is one of the content platforms that must be involved in the indexing process. (Diagram: search engine and indexes; Alfresco, SharePoint, FileNet, CMIS, JIRA, Google Drive, DropBox.)


#SummitNow Introducing Apache ManifoldCF

#SummitNow Apache ManifoldCF - History: The ManifoldCF code base was granted by MetaCarta to the Apache Software Foundation in December. The MetaCarta effort represented more than five years of successful development and testing in multiple, challenging enterprise environments. The project graduated as an Apache Top Level Project in July 2012.

#SummitNow Apache ManifoldCF – What is it? An open source crawler: crawling model (add, change, delete); schedules jobs to create indexes; gets content from repositories; pushes content to search servers

#SummitNow Apache ManifoldCF – What is it? (Diagram: Repositories 1 to 4; Apache ManifoldCF; Search Servers 1 to 4.)

#SummitNow Apache ManifoldCF – What is it? Out of the box: it is distributed as a webapp; REST API; Authority Service; ACL indexes; Crawler UI; can be embedded in any Java application

#SummitNow Apache ManifoldCF – Why? Reliability; Incremental; Flexible; Multi repositories; Security model; Monitoring

#SummitNow ManifoldCF – Why? - Reliability: Job scheduling and configuration are stored in the database to maintain the state of all the executions. (Diagram: Repositories 1 to 4; Apache ManifoldCF with Pull Agent Daemon and Database; Search Servers 1 to 4.)
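A minimal sketch of the reliability idea above, assuming a much simpler schema than ManifoldCF really uses: the table `job_queue` and the helpers `open_state`, `enqueue`, `run` and `pending` are hypothetical names for illustration. It only shows why keeping crawl state in a database lets an interrupted job pick up where it stopped.

```python
import sqlite3


def open_state(path=":memory:"):
    """Create (or reopen) the table that records per-document crawl state."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS job_queue ("
        " job TEXT, doc_id TEXT, status TEXT,"
        " PRIMARY KEY (job, doc_id))"
    )
    return db


def enqueue(db, job, doc_ids):
    # INSERT OR IGNORE keeps the stored status of documents already known.
    db.executemany(
        "INSERT OR IGNORE INTO job_queue VALUES (?, ?, 'pending')",
        [(job, doc_id) for doc_id in doc_ids],
    )
    db.commit()


def pending(db, job):
    """Return the documents of a job that still have to be processed."""
    rows = db.execute(
        "SELECT doc_id FROM job_queue"
        " WHERE job = ? AND status = 'pending' ORDER BY doc_id",
        (job,),
    )
    return [row[0] for row in rows]


def run(db, job, process, limit=None):
    """Process pending documents; each is marked done only after success."""
    for doc_id in pending(db, job)[:limit]:
        process(doc_id)
        db.execute(
            "UPDATE job_queue SET status = 'done' WHERE job = ? AND doc_id = ?",
            (job, doc_id),
        )
        db.commit()
```

Calling `run` with a `limit` simulates a crawl that stops early; a later `run` on the same database only touches the documents still marked pending.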

#SummitNow ManifoldCF – Why? - Incremental: Content changesets are obtained from the repository API. (Diagram: Repository 1; Apache ManifoldCF with Pull Agent Daemon and Database; query; complete changesets.)

#SummitNow ManifoldCF – Why? - Flexible: If the repository can't supply all the changes, ManifoldCF can discover them through crawling. (Diagram: Apache ManifoldCF with Pull Agent Daemon and Database; query; incomplete changesets; change discovery.)
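The add / change / delete crawling model can be sketched as a comparison of two snapshots. This is an illustration only, assuming each document exposes some version string (a hash or modification time); `discover_changes` is a hypothetical helper, not a ManifoldCF function.

```python
def discover_changes(previous, current):
    """Compare two {doc_id: version} snapshots and classify the differences."""
    added = sorted(d for d in current if d not in previous)
    deleted = sorted(d for d in previous if d not in current)
    changed = sorted(
        d for d in current if d in previous and current[d] != previous[d]
    )
    return {"add": added, "change": changed, "delete": deleted}
```

When the repository API already reports complete changesets, this comparison is unnecessary; it is the fallback for repositories that cannot say what changed.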

#SummitNow ManifoldCF – Why? – Multi repo: Jobs can retrieve content from the following repositories: Google Drive, Dropbox, HDFS, CMIS-compliant, Alfresco, IBM FileNet, EMC Documentum, Microsoft SharePoint, OpenText LiveLink, Autonomy Meridio, Memex Patriarch, Windows Share/DFS, generic JDBC, generic filesystem, generic RSS and Web

#SummitNow ManifoldCF – Why? – Multi repo: Jobs can ingest content into the following search servers: Apache Solr, ElasticSearch, OpenSearchServer, MetaCarta GTS

#SummitNow ManifoldCF – Why? - Security: Retrieve per-content ACLs. (Diagram: Repositories 1 to 4; Apache ManifoldCF; Search Servers 1 to 4; an Authority Service backed by Authority 1 and Authority 2 turns a user into access tokens, which lead to user-specific search results.)
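A simplified sketch of how access tokens produce user-specific results, assuming each indexed document carries `allow` and `deny` token lists and that the authority service has already turned the user into a list of tokens. The field names and the functions `is_visible` and `secure_search` are illustrative, not the real connector schema.

```python
def is_visible(doc, user_tokens):
    """Visible if the user holds at least one allow token and no deny token."""
    tokens = set(user_tokens)
    if tokens & set(doc.get("deny", ())):
        return False
    return bool(tokens & set(doc.get("allow", ())))


def secure_search(index, term, user_tokens):
    """Return the ids of matching documents the user is allowed to see."""
    return [
        doc["id"]
        for doc in index
        if term in doc["text"].lower() and is_visible(doc, user_tokens)
    ]


# Tiny in-memory index used only for the example.
INDEX = [
    {"id": "d1", "text": "Budget report", "allow": ["finance"]},
    {"id": "d2", "text": "Public report", "allow": ["everyone"]},
    {"id": "d3", "text": "Board report", "allow": ["everyone"], "deny": ["contractor"]},
]
```

With this data, a user holding `everyone` and `contractor` sees only `d2` for the term `report`, while a user holding `everyone` and `finance` sees all three documents.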

#SummitNow ManifoldCF – Why? – Monitoring: The Crawler UI allows you to: configure jobs and connectors; monitor job execution; monitor content ingestion. Status reports: document status, queue status. History reports: simple history, maximum activity, maximum bandwidth, result histogram

#SummitNow ManifoldCF – Architecture: Repository, Job, Search Server, ACLs. Repository Connector: query to retrieve contents. Output Connector: metadata mapping, content ingestion. Authority Connector: retrieve content ACEs. Job: verbal description, crawling model, scheduling
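The roles in this architecture can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `InMemoryRepository`, `InMemorySearchServer` and `run_job` are hypothetical names and do not mirror the real ManifoldCF connector interfaces.

```python
class InMemoryRepository:
    """Stand-in for a repository connector: returns documents with their ACLs."""

    def __init__(self, docs):
        self.docs = docs

    def fetch(self):
        return list(self.docs)


class InMemorySearchServer:
    """Stand-in for the target of an output connector."""

    def __init__(self):
        self.indexed = {}

    def ingest(self, doc):
        self.indexed[doc["id"]] = doc


def run_job(repository, search_server, field_map):
    """One crawl pass: query the repository, map metadata, ingest content and ACLs."""
    for doc in repository.fetch():
        # Rename repository metadata fields to the search server's field names.
        mapped = {field_map.get(k, k): v for k, v in doc["metadata"].items()}
        search_server.ingest(
            {"id": doc["id"], "content": doc["content"], "acl": doc["acl"], **mapped}
        )
    return len(search_server.indexed)
```

Keeping the metadata mapping in the job, rather than in either connector, is what lets the same repository feed search servers with different schemas.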

#SummitNow Who is using ManifoldCF?

#SummitNow ManifoldCF - Resources The project is available at From this website you can access the mailing lists, documentation and download links for binaries and source.

#SummitNow ManifoldCF – Resources - Book: ManifoldCF in Action by Karl Wright, published by Manning. Karl is the original developer and the principal committer of Apache ManifoldCF. The book is available at

#SummitNow Background Let’s put a bit of context

#SummitNow Those Old Days… Only Lucene in Alfresco 3.4 and earlier; indexes were managed within the Alfresco context; permissions were checked after Lucene returned all results

#SummitNow Alfresco 4 is… Common Enemies: find a single document; return large data sets; filter by permissions; be fast! “Sometimes one superhero is not enough”

#SummitNow Present: Solr as search subsystem; indexes are managed outside the Alfresco context; permissions are checked at query time; no in-transaction index
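The difference between checking permissions after the search and checking them at query time can be sketched as follows. This is a toy model, not Alfresco code: the `readers` field and both function names are assumptions made for the example.

```python
def filter_after_search(index, term, user):
    """Older approach: the engine returns every hit, then each hit is checked."""
    hits = [doc for doc in index if term in doc["text"]]
    visible = [doc["id"] for doc in hits if user in doc["readers"]]
    return visible, len(hits)  # second value: permission checks performed


def filter_at_query_time(index, term, user):
    """Query-time approach: the reader constraint is part of the query itself."""
    return [
        doc["id"]
        for doc in index
        if term in doc["text"] and user in doc["readers"]
    ]


# 1000 matching documents, of which the user "ana" may read only every 100th.
DOCS = [
    {"id": i, "text": "invoice", "readers": {"ana"} if i % 100 == 0 else {"bob"}}
    for i in range(1000)
]
```

Both functions return the same ten documents for `ana`, but the first one has to run a permission check on all 1000 hits before it can answer.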

#SummitNow Alfresco + Solr Approach: Quite a good architecture; takes care of both performance and usability; flexibility in deployment and installations. However… sometimes we just need to use something else

#SummitNow Future Don’t freak out dude! We can arrange something

#SummitNow Our solution: Uses Apache ManifoldCF; decoupled from Alfresco; can be integrated with either Alfresco or any other repository vendor; preserves security and permissions within results. It’s included in our Semantic solution: Sensefy! API to manage ManifoldCF services; API for searching, decoupled from the search engine chosen; simple bundled UI; lots of ManifoldCF customization

#SummitNow Apache ManifoldCF Open Source Apache SF Project Get content from repos Push content on search services Crawling model (add, change, delete) Respect permissions, bitch!

#SummitNow Our ManifoldCF Contribution: Alfresco Repository Connector: new implementation; Amazon Cloud Search Output Connector; Alfresco Authority Connector: design & development

#SummitNow Some of our most famous villains

#SummitNow Several Alfresco instances: Current: Alfresco instances don’t share indexes; indexes can’t be merged; can’t have federated search; no good approach for presenting results to users

#SummitNow Several Alfresco instances: Our solution: One index to rule them all; data origin is irrelevant (or not, if we don’t want it to be); single search across repositories; you choose your search engine!
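A small sketch of the single-index idea, under the assumption that documents from each instance are stored under a key prefixed with the instance name so that identical ids cannot collide. `build_unified_index`, `search` and the `source` field are hypothetical names for illustration.

```python
def build_unified_index(instances):
    """Merge {instance_name: [docs]} into one index keyed by 'instance:id'."""
    index = {}
    for name, docs in instances.items():
        for doc in docs:
            index[f"{name}:{doc['id']}"] = {**doc, "source": name}
    return index


def search(index, term, source=None):
    """Search every instance at once, or restrict to one when the origin matters."""
    return sorted(
        key
        for key, doc in index.items()
        if term in doc["title"].lower()
        and (source is None or doc["source"] == source)
    )
```

Leaving `source` unset gives one search across all instances; passing it covers the case where the data origin does matter.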

#SummitNow Alfresco + Other data providers: Current: Alfresco Search subsystem != other providers' search services; Alfresco can’t reach external data; no way to present merged results uniformly to end users

#SummitNow Alfresco + Other data providers Our solution Search engine is shared All of them speak ‘our language’ Alfresco can reach external data through Results are present and accessible between data providers

#SummitNow Alfresco + O(TB) data: Current: Alfresco Search subsystem with single or clustered Solr; every Solr has its own index; no chance to apply scaling techniques; huge servers are required and performance might be compromised

#SummitNow Alfresco + O(TB) data: Our solution: Alfresco uses our index; indexing techniques can be applied according to use cases (sharding, replication…); the search strategy can be adopted with the best suited search solution
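Sharding, one of the techniques named above, can be sketched as hash-based routing plus a fan-out query. This is a generic illustration with hypothetical helpers (`shard_for`, `index_documents`, `search_all`), not the routing used by any particular search server.

```python
import hashlib


def shard_for(doc_id, shard_count):
    """Pick a shard from a stable hash of the document id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def index_documents(docs, shard_count):
    """Distribute {doc_id: text} over shard_count independent indexes."""
    shards = [dict() for _ in range(shard_count)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, shard_count)][doc_id] = text
    return shards


def search_all(shards, term):
    """Send the query to every shard and merge the hits."""
    hits = []
    for shard in shards:
        hits.extend(doc_id for doc_id, text in shard.items() if term in text)
    return sorted(hits)
```

Each shard holds only part of the data, so it can live on a smaller server; replication would add copies of each shard for availability and read throughput.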

#SummitNow Other benefits: Extract, index and map information from any other sources, putting them together in a single index; permissions are checked just once; search capabilities: facets, highlighting… (Diagram: Alfresco; Apache ManifoldCF; Search Server; Authority Service; Alfresco Permissions; Red Link.)

#SummitNow Demo

#SummitNow Demo : Architecture

#SummitNow Demo: Who are these guys? Christian Bale, actor, Christopher Nolan’s Batman; Gareth Bale, footballer, Real Madrid's latest star

#SummitNow Conclusions: Searching and indexing in the most popular cloud search solutions; retrieving information from the most popular repositories and data providers altogether; managing permissions and security for data; fully supported by us!

#SummitNow Conclusions

#SummitNow What’s coming How can we improve it, dude? - Powerful UI - New connectors - Large data volume benchmarking - Share integration

#SummitNow We are not Batman But we can be your Superhero Zaizi Ltd. Fran Álvarez (+44) (+34)

#SummitNow Thank you! Would you like to help us with this one?