Audit & Reporting with Alfresco & NoSQL architecture Lucas Patingre Alfresco consultant and technical lead at Zaizi
Summary Some context Alfresco audit Scaling the audit Benefits Expanding on the architecture Going forward
Why auditing? A bit of context
Zaizi for Alfresco: Platinum Partner and Best System Integrator Partner of the Year 2012 & 2013. Specialist Alfresco ECM partner. Implemented the biggest and most complex Alfresco projects in the UK. Helps global enterprises manage their growing information volumes by leveraging Alfresco's outstanding performance and scalability.
Audit data from Alfresco
Who When What Action Target Details
/alfresco-access
  /transaction
    /action=<actionName>
    /sub-actions=<sub action list>
    /path=<prefixPath>
    /type=<prefixType>
    /node=<nodeRef>
    /user=<user>
    /copy
      /from
        /node=<nodeRef>
        /path=<prefixPath>
        /type=<prefixType>
    /move
      /from
        /node=<nodeRef>
        /path=<prefixPath>
        /type=<prefixType>
    /properties
      /from=<mapOfValues>
        /<propertyName>=<propertyValue>
      /to=<mapOfValues>
        /<propertyName>=<propertyValue>
      /add=<mapOfValues>
        /<propertyName>=<propertyValue>
      /delete=<mapOfValues>
        /<propertyName>=<propertyValue>
    /aspects
      /add=<mapOfNames>
        /<aspectName>=null
      /delete=<mapOfNames>
        /<aspectName>=null
    /version-properties=<mapOfValues>
    /sub-action/<sequence>
      /action=<actionName>
      /copy
      /move
      /properties
      /aspects
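To illustrate how the path/value pairs above could map onto the who / when / what columns, here is a minimal sketch. The function name `summariseAuditEntry`, the flat `values` object and the sample values are assumptions made for illustration; they are not Alfresco APIs.

```javascript
// Hypothetical helper: reduces a flat map of audit paths to a
// who / when / what summary. Not part of Alfresco.
const PREFIX = '/alfresco-access/transaction';

function summariseAuditEntry(time, values) {
  const get = (suffix) => values[PREFIX + suffix];
  const summarised = ['/user', '/action', '/path'];
  return {
    who: get('/user'),
    when: new Date(time).toISOString(),
    action: get('/action'),
    target: get('/path'),
    // Everything else under the prefix is kept as "details",
    // keyed by its path relative to the prefix.
    details: Object.fromEntries(
      Object.entries(values)
        .filter(([path]) => path.startsWith(PREFIX + '/'))
        .map(([path, value]) => [path.slice(PREFIX.length), value])
        .filter(([suffix]) => !summarised.includes(suffix))
    ),
  };
}

// Sample values are invented for the example.
const example = summariseAuditEntry(0, {
  '/alfresco-access/transaction/user': 'admin',
  '/alfresco-access/transaction/action': 'MOVE',
  '/alfresco-access/transaction/path': '/app:company_home/cm:report.pdf',
  '/alfresco-access/transaction/type': 'cm:content',
});
```

Keeping the unsummarised paths under `details` means no audit value is lost when the entry is flattened for display.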
Existing Alfresco audit Sharing Alfresco database
Components overview
Alfresco's audit dashlet
Alfresco's audit storage
SQL to retrieve audit entries
SELECT entry.id 'Id',
       entry.audit_time 'Time',
       user_string.string_value 'User',
       act_string.string_value 'Application',
       sv.string_value 'Value'
FROM alf_audit_entry entry
INNER JOIN alf_prop_value user
        ON (entry.audit_user_id = user.id)
INNER JOIN alf_prop_string_value user_string
        ON ((user.persisted_type = 3 OR user.persisted_type = 5) AND user.long_value = user_string.id)
INNER JOIN alf_audit_app app
        ON (entry.audit_app_id = app.id)
INNER JOIN alf_prop_value act
        ON (app.app_name_id = act.id)
INNER JOIN alf_prop_string_value act_string
        ON ((act.persisted_type = 3 OR act.persisted_type = 5) AND act.long_value = act_string.id)
INNER JOIN alf_prop_link pl
        ON (pl.root_prop_id = entry.audit_values_id)
INNER JOIN alf_prop_value pv
        ON (pl.value_prop_id = pv.id)
LEFT JOIN alf_prop_string_value sv
        ON (sv.id = pv.long_value AND (pv.persisted_type = 3 OR pv.persisted_type = 5))
Alfresco's RM audit
Scaling the audit Presentation of the components
The challenges: make the audit scale without hindering Alfresco; keep the audit queries fast; keep the delay before an entry is stored short; stay backward compatible with Alfresco's default audit; do not break the existing RM "view audit" feature; keep a similar look-and-feel.
The Alfresco search approach: move from Lucene (embedded in Alfresco, limited inspection tools) to SOLR (externalised, can be clustered, comes with an administration console).
Components overview
Syslog: standard, efficient and well integrated in Java. File rotation is easy to implement. All the audit data can be re-computed from the file. Lightens the load on the database, which is no longer a bottleneck.
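Re-computing the audit data from file could look like the following sketch. It assumes one JSON document per line of audit.log (the format the implementation slides describe) and that anything before the first `{` is a Syslog prefix; the function name `parseAuditLog` and the sample lines are hypothetical.

```javascript
// Hypothetical replay helper: turns the text of an audit.log file back
// into audit entries. Assumes one JSON document per line; anything before
// the first '{' (for example a syslog timestamp/host prefix) is ignored.
function parseAuditLog(text) {
  const entries = [];
  const rejected = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') continue; // skip blank lines
    const start = line.indexOf('{');
    try {
      if (start < 0) throw new SyntaxError('no JSON payload');
      entries.push(JSON.parse(line.slice(start)));
    } catch (err) {
      // Keep malformed lines for inspection instead of dropping them silently.
      rejected.push(line);
    }
  }
  return { entries, rejected };
}

// Sample input invented for the example: one valid line, one truncated line.
const replay = parseAuditLog(
  'Jan  1 00:00:00 host alfresco: {"user":"admin","action":"READ"}\n' +
  'truncated {"user":\n'
);
```

Collecting rejected lines separately matters for an audit trail: a replay that silently skips damaged lines would hide gaps in the record.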
Logstash: open source. Works well with log files. Able to handle our audit.log, and potentially others too (OSSEC). Already has an Elasticsearch connector OOTB.
Elasticsearch: open source. Powerful indexing capabilities. Easily scalable. Can be queried from Alfresco.
Kibana Open source Web UI for easy access
Scaling the audit (2) Overview of the implementation
Override the audit component
Specialise the audit component
auditDAO.createAuditEntry(applicationId, time, username, auditData);
logAudit.createAuditEntry(applicationId, time, username, auditData);
Create JSON for action. Add non-action-related parameters to JSON. Configure Syslog. Log the resulting JSON to audit.log.
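The "create JSON, add non-action-related parameters, log one line" steps can be sketched as follows. The real component is a Java override inside Alfresco; this JavaScript version only illustrates the data shaping. The function name `buildAuditLine` and the field names `applicationId`, `time` and `user` are assumptions, not the names used in the actual implementation.

```javascript
// Hypothetical sketch of the data shaping done before writing to audit.log.
function buildAuditLine(applicationId, time, username, auditData) {
  // 1. Start from the action-related values.
  const doc = { ...auditData };
  // 2. Add the non-action-related parameters. They are set after the copy
  //    so that audit values cannot overwrite them.
  doc.applicationId = applicationId;
  doc.time = new Date(time).toISOString();
  doc.user = username;
  // 3. Serialise without indentation: JSON.stringify escapes newlines inside
  //    strings, so the result always fits on a single log line.
  return JSON.stringify(doc);
}

// Sample values invented for the example.
const line = buildAuditLine(42, 0, 'admin', { action: 'CREATE', note: 'two\nlines' });
```

One document per line keeps the file easy to rotate and easy for a log shipper to consume line by line.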
Quality-of-life improvements. Poll Syslog availability: if it is not available, switch the system to read-only; when it is available again, re-enable writes. Toggle the logging system: only file / only database / both. Available through JMX.
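The toggle and the availability poll can be modelled as two small pure functions. This is a sketch under assumptions: the mode names, the function names, and the rule that only modes writing to Syslog depend on its availability are all hypothetical, and the real switches are exposed through JMX in Java.

```javascript
// Hypothetical model of the logging toggle and the read-only decision.
const MODES = { FILE: 'file', DATABASE: 'database', BOTH: 'both' };

// Which sinks an audit entry is written to for a given mode.
function sinksFor(mode) {
  switch (mode) {
    case MODES.FILE: return ['syslog'];
    case MODES.DATABASE: return ['database'];
    case MODES.BOTH: return ['syslog', 'database'];
    default: throw new Error('Unknown audit mode: ' + mode);
  }
}

// Decide, after each poll, whether the repository should be read-only.
// Assumption: only modes that write to Syslog depend on its availability.
function shouldBeReadOnly(mode, syslogAvailable) {
  return sinksFor(mode).includes('syslog') && !syslogAvailable;
}
```

Switching to read-only while Syslog is down trades availability for completeness: no write can happen without leaving an audit trace.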
Retrieve audit data
Replace audit webscript (audit.get.js)
Before:
var nodeRefAuditURI = '/api/node/' + model.nodeRef + '/rmauditlog';
var auditURI = "/api/rma/admin/rmauditlog";
After:
var nodeRefAuditURI = '/api/node/' + model.nodeRef + '/esauditlog?appname=RM';
var auditURI = "/api/es/admin/esauditlog";
Switch at the Share level. Enables us to handle a richer returned result. Modifying the Alfresco webscript would be a viable approach too.
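Behind an esauditlog endpoint, the web script has to turn its parameters into an Elasticsearch search request. A minimal sketch of such a request body follows. The function name `buildAuditQuery` and the index field names `application`, `nodeRef` and `time` are assumptions, since the deck does not show the mapping. The body uses the `bool`/`filter` form of the query DSL; older Elasticsearch releases may need a different filter syntax.

```javascript
// Hypothetical builder for the search body an esauditlog web script could send.
// Field names (application, nodeRef, time) are assumed, not taken from the deck.
function buildAuditQuery({ appName, nodeRef, size = 100 }) {
  // Always restrict to one audit application, e.g. "RM".
  const filters = [{ term: { application: appName } }];
  // Node-level audit log: additionally restrict to a single node.
  if (nodeRef) filters.push({ term: { nodeRef } });
  return {
    size,
    sort: [{ time: { order: 'desc' } }], // newest entries first
    query: { bool: { filter: filters } },
  };
}

// Sample nodeRef invented for the example.
const body = buildAuditQuery({ appName: 'RM', nodeRef: 'workspace://SpacesStore/abc' });
```

Filters are used rather than scored queries because an audit lookup is an exact match; relevance ranking adds nothing here.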
Reap the benefit What this whole work was for
Performance / stability. Asynchronous: processing after audit.log is non-blocking. Independent: the audit stack failing doesn't bring Alfresco down, and Alfresco failing doesn't prevent consulting ES. No stress on the Alfresco database when querying audit data.
Extend use of “view audit log”
Draw real time statistics out of it
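Statistics of this kind are typically obtained with Elasticsearch aggregations. The sketch below builds a terms aggregation that counts audit entries per action and reads the buckets back from the response. The field name `action`, the aggregation name `by_action` and both function names are assumptions for illustration.

```javascript
// Hypothetical request body: count audit entries per action.
function buildActionStatsRequest(topN = 10) {
  return {
    size: 0, // only the aggregation is needed, not the matching documents
    aggs: { by_action: { terms: { field: 'action', size: topN } } },
  };
}

// Turn the aggregation part of a search response into { action: count }.
function readActionStats(response) {
  const buckets = response.aggregations.by_action.buckets;
  return Object.fromEntries(buckets.map((b) => [b.key, b.doc_count]));
}

// Response fragment invented for the example; it follows the shape
// Elasticsearch uses for terms aggregation results.
const stats = readActionStats({
  aggregations: {
    by_action: { buckets: [{ key: 'READ', doc_count: 3 }, { key: 'MOVE', doc_count: 1 }] },
  },
});
```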
Expanding on the architecture Non-alfresco-generated audit data
Auditing the logins
<RecordValue key="user" dataExtractor="simpleValue" dataSource="/alfresco-access/loginUser" dataTrigger="/alfresco-access/login" />
<RecordValue key="user" dataExtractor="simpleValue" dataSource="/alfresco-access/loginUser" dataTrigger="/alfresco-access/loginFailure" />
/alfresco-access
  /login/user=
  /loginFailure/user=
  /logout/user=
Introduction to OSSEC: open source intrusion detection system (log analysis, file integrity checking, rootkit detection). Grabs data from most of our systems (software, OS). Injected into Elasticsearch via Logstash.
Auditing the security
Going forward How does it fit with Alfresco 5?
Alfresco5 analytics
Take-away Intellectual doggy bag
Conclusion. A lot: of open source products, of scaling potential. A reasonable amount: of Alfresco customisation. A little: changed from the out-of-the-box Alfresco UI. No: code change for non-Alfresco technologies.