Audit & Reporting with Alfresco & NoSQL architecture Lucas Patingre Alfresco consultant and technical lead at Zaizi.


Summary
  Some context
  Alfresco audit
  Scaling the audit
  Benefits
  Expanding on the architecture
  Going forward

Why auditing? A bit of context

Zaizi for Alfresco
  Platinum Partner and Best System Integrator Partner of the Year 2012 & 2013
  Specialist Alfresco ECM partner
  Implemented the biggest and most complex Alfresco projects in the UK
  Helps global enterprises manage their growing information volumes by leveraging Alfresco's outstanding performance and scalability
  Alfresco Partner of the Year 2012 & 2013

Audit data from Alfresco
  Who  When  What  Action  Target  Details

  /alfresco-access
    /transaction
      /action=<actionName>
      /sub-actions=<sub action list>
      /path=<prefixPath>
      /type=<prefixType>
      /node=<nodeRef>
      /user=<user>
      /copy
        /from
          /node=<nodeRef>
          /path=<prefixPath>
          /type=<prefixType>
      /move
        /from
          /node=<nodeRef>
          /path=<prefixPath>
          /type=<prefixType>
      /properties
        /from=<mapOfValues>
          /<propertyName>=<propertyValue>
        /to=<mapOfValues>
          /<propertyName>=<propertyValue>
        /add=<mapOfValues>
          /<propertyName>=<propertyValue>
        /delete=<mapOfValues>
          /<propertyName>=<propertyValue>
      /aspects
        /add=<mapOfNames>
          /<aspectName>=null
        /delete=<mapOfNames>
          /<aspectName>=null
      /version-properties=<mapOfValues>
      /sub-action/<sequence>
        /action=<actionName>
        /copy
        /move
        /properties
        /aspects

Existing Alfresco audit Sharing Alfresco database

Components overview

Alfresco's audit dashlet

Alfresco's audit storage

SQL to retrieve audit entries

  SELECT entry.id 'Id',
         entry.audit_time 'Time',
         user_string.string_value 'User',
         act_string.string_value 'Application',
         sv.string_value 'Value'
  FROM alf_audit_entry entry
  INNER JOIN alf_prop_value user
          ON (entry.audit_user_id = user.id)
  INNER JOIN alf_prop_string_value user_string
          ON ((user.persisted_type = 3 OR user.persisted_type = 5)
              AND user.long_value = user_string.id)
  INNER JOIN alf_audit_app app
          ON (entry.audit_app_id = app.id)
  INNER JOIN alf_prop_value act
          ON (app.app_name_id = act.id)
  INNER JOIN alf_prop_string_value act_string
          ON ((act.persisted_type = 3 OR act.persisted_type = 5)
              AND act.long_value = act_string.id)
  INNER JOIN alf_prop_link pl
          ON (pl.root_prop_id = entry.audit_values_id)
  INNER JOIN alf_prop_value pv
          ON (pl.value_prop_id = pv.id)
  LEFT JOIN alf_prop_string_value sv
         ON (sv.id = pv.long_value
             AND (pv.persisted_type = 3 OR pv.persisted_type = 5))

Alfresco's RM audit

Scaling the audit Presentation of the components

The challenges
  Make the audit scale without hindering Alfresco
  Keep the audit queries fast
  Keep the delay before an audit entry is stored short
  Stay backward compatible with Alfresco's default audit
  Do not break the existing RM "view audit" feature
  Keep a similar look-and-feel

The Alfresco search approach
  Move from Lucene
    Embedded in Alfresco
    Limited inspection tools
  To SOLR
    Externalised
    Can be clustered
    Comes with an administration console

Components overview

Syslog
  Standard, efficient and well integrated in Java
  Easy to implement a file rotation
  Possibility to re-compute all the audit data from file
  Lightens the load on the database, which is no longer a bottleneck
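
The two file-related points above (rotation, and re-computing the audit data from file) can be illustrated with a minimal sketch. It is not the deck's Syslog setup: Python's RotatingFileHandler stands in for syslog-managed rotation, and the logger name, size limits and sample entries are all hypothetical.

```python
import json
import logging
import logging.handlers
import os
import tempfile


def build_audit_logger(path, max_bytes=1_000_000, backups=5):
    """Return a logger that appends one line per audit entry to a rotating file."""
    logger = logging.getLogger("audit-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    # Write the raw message only, so each line is a parseable JSON document.
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def replay(path):
    """Re-read every JSON entry from a log file, e.g. to rebuild an index."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Demo with made-up entries written to a temporary audit.log.
audit_path = os.path.join(tempfile.mkdtemp(), "audit.log")
audit_logger = build_audit_logger(audit_path)
audit_logger.info(json.dumps({"user": "admin", "action": "READ"}))
audit_logger.info(json.dumps({"user": "jdoe", "action": "UPDATE"}))
entries = replay(audit_path)
```

Because the file holds one self-contained JSON document per line, replaying it into a fresh index needs nothing from the Alfresco database.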

Logstash
  Open source
  Works well with log files
  Able to handle our audit.log, but potentially others too (OSSEC)
  Already has an Elasticsearch connector OOTB

Elasticsearch
  Open source
  Powerful indexing capabilities
  Easily scalable
  Can be queried from Alfresco
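
As an illustration of "can be queried", here is a hedged sketch of a search request body for the audit trail of one node. The field name `node` and the sample nodeRef are assumptions, not taken from the deck; `@timestamp` is the field Logstash adds by default. The `bool`/`filter` form is the one used by recent Elasticsearch versions (older 1.x releases used a `filtered` query instead), and the `term` filter assumes `node` is indexed as an exact, non-analysed value.

```python
import json


def node_audit_query(node_ref, since_iso, size=50):
    """Build a search body: newest audit entries first for a single node."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"node": node_ref}},  # hypothetical field name
                    {"range": {"@timestamp": {"gte": since_iso}}},
                ]
            }
        },
    }


# Demo: serialise the body that would be sent to the index's _search endpoint.
body = node_audit_query(
    "workspace://SpacesStore/hypothetical-id", "2015-01-01T00:00:00Z"
)
payload = json.dumps(body)
```

Using filters rather than scored queries suits audit lookups, where an entry either matches or does not and relevance ranking adds nothing.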

Kibana Open source Web UI for easy access

Scaling the audit (2) Overview of the implementation

Override the audit component

Specialise the audit component
  Default call:  auditDAO.createAuditEntry(applicationId, time, username, auditData);
  Replaced by:   logAudit.createAuditEntry(applicationId, time, username, auditData);
  Create JSON for action
  Add non-action-related parameters to JSON
  Configure Syslog
  Log the resulting JSON to audit.log
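
The "create JSON, add the other parameters, log it" steps might look roughly like the sketch below. It is written in Python for brevity rather than as the Java override itself; the JSON field names (`applicationId`, `time`, `user`, `action`), the logger name and the sample values are all assumptions. The in-memory stream stands in for the Syslog-backed audit.log.

```python
import io
import json
import logging
from datetime import datetime, timezone

ACTION_PREFIX = "/alfresco-access/transaction/"


def to_audit_json(application_id, time, username, audit_data):
    """Turn the arguments of createAuditEntry into one JSON line."""
    # Action-related values: everything under the transaction path.
    action = {
        key[len(ACTION_PREFIX):]: value
        for key, value in audit_data.items()
        if key.startswith(ACTION_PREFIX)
    }
    # Non-action-related parameters are added alongside the action.
    record = {
        "applicationId": application_id,
        "time": time.isoformat(),
        "user": username,
        "action": action,
    }
    return json.dumps(record, sort_keys=True, default=str)


# Demo: log one made-up entry to an in-memory stand-in for audit.log.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
log = logging.getLogger("audit.sketch")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)

line = to_audit_json(
    1,
    datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc),
    "admin",
    {
        "/alfresco-access/transaction/action": "READ",
        "/alfresco-access/transaction/type": "cm:content",
    },
)
log.info(line)
logged = json.loads(stream.getvalue())
```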

Quality-of-life improvements
  Poll Syslog availability
    If not available, switch system to read-only
    When back available, re-enable
  Toggle logging system
    Only file / only database / both
  Availability through JMX
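
The polling behaviour can be sketched as a small guard object. This is only an illustration of the idea under stated assumptions: the class and function names are hypothetical, the probe is a plain TCP connection attempt, and a real deployment would flip Alfresco's read-only mode rather than a Python attribute.

```python
import socket


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to the log endpoint can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class AuditAvailabilityGuard:
    """Keeps the system read-only for as long as the audit sink is unreachable."""

    def __init__(self, probe):
        self._probe = probe  # any zero-argument callable returning a bool
        self.read_only = False

    def poll(self):
        """Run one availability check and return the resulting read-only state."""
        self.read_only = not bool(self._probe())
        return self.read_only


# Demo with a scripted probe: available, then down, then back up.
states = iter([True, False, True])
guard = AuditAvailabilityGuard(lambda: next(states))
history = [guard.poll() for _ in range(3)]
```

In practice the guard would be driven by a scheduler, with something like `lambda: tcp_probe(host, port)` as the probe; the scripted probe above only makes the state changes easy to follow.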

Retrieve audit data

Replace audit webscript (audit.get.js)
  Before:
    var nodeRefAuditURI = '/api/node/' + model.nodeRef + '/rmauditlog';
    var auditURI = "/api/rma/admin/rmauditlog";
  After:
    var nodeRefAuditURI = '/api/node/' + model.nodeRef + '/esauditlog?appname=RM';
    var auditURI = "/api/es/admin/esauditlog";
  Switch at the Share level
  Enables us to handle a richer returned result
  Modifying the Alfresco webscript would be a viable approach too

Reap the benefit What this whole work was for

Performance / stability
  Asynchronous
    Processing after audit.log is non-blocking
  Independent
    The audit stack failing doesn't bring Alfresco down
    Alfresco failing doesn't prevent consulting ES
  No stress on the Alfresco database when querying audit data

Extend use of “view audit log”

Draw real time statistics out of it

Expanding on the architecture Non-alfresco-generated audit data

Auditing the logins

  <RecordValue key="user" dataExtractor="simpleValue"
               dataSource="/alfresco-access/loginUser"
               dataTrigger="/alfresco-access/login" />
  <RecordValue key="user" dataExtractor="simpleValue"
               dataSource="/alfresco-access/loginUser"
               dataTrigger="/alfresco-access/loginFailure" />

  /alfresco-access
    /login/user=
    /loginFailure/user=
    /logout/user=

Introduction to OSSEC
  Open source intrusion detection system
    Log analysis
    File integrity checking
    Rootkit detection
  Grabs data from most of our systems
    Software
    OS
  Injected into Elasticsearch via Logstash

Auditing the security

Going forward How does it fit with Alfresco 5?

Alfresco 5 analytics

Take-away Intellectual doggy bag

Conclusion
  A lot
    Of open source products
    Of scaling potential
  A reasonable amount
    Of Alfresco customisation
  A little
    Changed from the out-of-the-box Alfresco UI
  No
    Code change for non-Alfresco technologies