Download presentation
Presentation is loading. Please wait.
Published byChristina Watts Modified over 8 years ago
1
Content Analytics - Gaining Insight from Your Content with NOSQL
2
Hey, I just met you...And this is crazy. But here's my info...So connect with me, maybe? Kyle Adams Americas Support Team Lead, Alfresco Twitter: @kylefadams LinkedIn: www.linkedin.com/in/kylefernandadamswww.linkedin.com/in/kylefernandadams Email Me: kyle.adams@alfresco.comkyle.adams@alfresco.com
3
Carly Rae Jepsen Anyone?...I apologize in advance
4
Agenda The NOSQL Universe Overview of Graph Databases and Neo4j Alfresco & Neo4j Use Cases
5
What is Content Analytics Applying Business Intelligence and Analytics to ECM Reporting != Analytics Identifying trends, patterns, and correlations Tell me what I don’t know!
6
Why NOSQL?
7
Driving Trends - Data Size Data size is increasing exponentially year after year
8
Driving Trends - Connectivity of Data Web 2.0 Flat Docs Hypertext “Web 3.0” Web 1.0.0 Feeds Hypertext Folksonomies Hypertext Tagging Hypertext Wikis Hypertext UGC Hypertext Blogs RDFa Hypertext Ontologies Hypertext GGG Hypertext
9
Driving Trend: Semi-Structured Information Content is becoming more unique We can blame generation Y for this! Before: Job Title - Software Engineer After: Job Title - ZOMG Awesome Core Repository Developer
10
RDBMS Performance Curve
11
NOSQL Database Types Column Family GraphDocument Key-Value
12
Key-Value Stores Most based on Dynamo white paper Dynamo: Amazon’s Highly Available Key Value Store (2007) Data Model Global key-value mapping Massively scalable HashMap Highly Fault Tolerant Examples Riak, Redis, Voldemort
13
Key Value Stores: Strengths and Weaknesses Strengths Simple data model Horizontally scalable Weaknesses Simple data model Poor at handling complex data
14
Column Family Based on Google’s BigTable white paper BigTable: Distributed Storage System for Structured Data (2006) Tuple of K-V where the key maps to a set of columns Map Reduce for querying and processing Examples Cassandra, HBase, Hypertable
15
Column Family: Strengths and Weaknesses Strengths Data model supports semi-structured data Naturally indexed (columns) Horizontally scalable Weaknesses Does not handle interconnected data well
16
Document-oriented Databases Data Model Collection of documents Document is a key-value collection Index-centric Examples MongoDB, CouchDB, Lotus Notes?
17
Document-oriented Databases: Strengths and Weaknesses Strengths Simple, but powerful data model Good scalability Weaknesses Does not handle interconnected data well Querying is limited to keys and indexes MapReduce for large queries
18
Graph Databases Data Model Nodes with properties Named relationships with properties Examples Neo4j, Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
19
Graph Databases: Strengths and Weaknesses Strengths Extremely powerful data model Performant when querying interconnected data Easily to query Weaknesses Sharding Rewiring your brain
20
Complexity vs Size
21
Typical Use Cases for Graph Databases Recommendations Business Intelligence Social Computing Master Data Management Geospatial Genealogy Time Series Data Web Analytics Bioinformatics Indexing RDBMS
22
Maturity of Data Models Most NOSQL: ~6 years Relational: 42 years Graph Theory: 276 years
23
Leonhard Euler Inventor of Graph Theory (1736) Swiss mathematician The original hipster
24
Graph Data Model Person City Event
25
Graph Data Model Person City Event Is Attending Hosted In Is Located In Rated
26
Graph Data Model Person City Event Is Attending Hosted In Is Located In firstName:kyle lastName:adams name:DevCon name:San Jose country:USA Rated score: 11 out of 10 comment: Amazing!!!
27
Neo4j (...finally)
28
What is Neo4j? Leading Open Source graph database Embeddable and Server ACID compliant White board friendly Stable Has been in 24/7 operation since 2003
29
More Reasons Why Neo4j is Great High performance graph operations Traverse 1,000,000+ relationships/sec on commodity hardware 32 billion nodes & relationships per Neo4j instance 64 billion properties per Neo4j instance Small footprint Standalone server is ~65mb
30
If NOSQL stands for Not Only SQL,....then how do we execute queries?!
34
Traversals!
35
Social Network Performance MySQL vs Neo4j
36
Social Network Performance First rule of fight club: Run a friends of friends query Second rule of fight club: 1,000 Users Third rule of fight club: Average of 50 friends per user Fourth rule of fight club: Limit the depth of 5 Fifth rule of fight club: Intel i7 commodity laptop w/8GB RAM The Experiment: Round 1
37
Social Network Performance RDBMS Schema
38
Social Network Performance select distinct uf3.* from t_user_friend uf1 inner join t_user_friend uf2 on uf1.user_1 = uf2.user_2 inner join t_user_friend uf3 on uf2.user_1 = uf3.user_2 where uf1.user_1 = ? SQL: Friends of friends at depth 3
39
Social Network Performance MySQL Results: Round 1- 1,000 Users Depth Execution Time (sec) Records Returned 20.028~900 30.213~999 410.273~999 592,613.150~999
40
Social Network Performance Social Graph
41
Social Network Performance Neo4j Traversal API TraversalDescription traversalDescription = Traversal.description().relationships("IS_FRIEND_OF",Direction.OUTGOING).evaluator(Evaluators.atDepth(2)).uniqueness(Uniqueness.NODE_GLOBAL); Iterable nodes = traversalDescription.traverse(nodeById).nodes();
42
Social Network Performance Neo4j Results: Round 1- 1,000 Users Depth Execution Time (sec) Records Returned 20.04~900 30.06~999 40.07~999 50.07~999
43
Social Network Performance First rule of fight club: Run a friends of friends query Second rule of fight club: 1,000,000 Users Third rule of fight club: Average of 50 friends per user Fourth rule of fight club: Limit the depth of 5 Fifth rule of fight club: Intel i7 commodity laptop w/8GB RAM The Experiment: Round 2
44
Social Network Performance MySQL Results: Round 1- 1,000,000 Users Depth Execution Time (sec) Records Returned 20.016~2,500 330.267~125,000 4 1,543.505 ~600,00 5 Did not finish after an hour N/A
45
Social Network Performance Neo4j Results: Round 1- 1,00,000 Users Depth Execution Time (sec) Records Returned 20.010~2,500 30.168~110,000 41.359~600,000 52.132~800,000
46
Social Network Performance Why is RDBMS performance horrible? To find all friends on depth 5,MySQL will create Cartesian product on t_user_friend table 5 times Resulting in 50,000^5 records return All except 1,000 are discarded Neo4j will simply traverse through the nodes in the database until there are no more nodes in which to traverse
47
Social Network Performance The power of traversals Graphs data structures are localized Count all of the people around you Adding more people in the room may only slightly impact your performance to count your neighbors
48
Alfresco + Neo4j Integration
49
CMIS Web Scripts Spring Data Alfresco Insight CMIS: Document & Folders Web Scripts: Users & Groups Spring Data: Persist to Neo4j OpenNLP: Parse text and leave relevant words for analysis
50
Alfresco + Neo4j Use Cases
51
Retail Analytics: ACME Mullet Group ‘Murican Mullets Euro Mullets Mullets 4 Musicians
52
CMIS Web Scripts Alfresco Insight Spring Data Creates Social ContentFollows Retail Analytics: ACME Mullet Group
53
Healthcare Analytics: Chronic Disease Management Congestive Heart Failure (CHF)
54
CHF patients have the highest number of 30-day readmissions in the United States 1 in 5 patients suffer from preventable readmissions Preventable 30-day readmissions account for ~$17 billion in total Medicare costs Under the Patient Protection and Affordable Care Act Centers for Medicare & Medicaid services will be penalizing providers for high rates of preventable 30-day readmissions Healthcare Analytics: Chronic Disease Management
55
Use comparative analytics to compare graphs of patients with CHF and family members with CHF Preventative Care via Identifying indicators for CHF Problem: structured data only represent ~20% of total available data (i.e. data from EMR’s) Solution: Import unstructured data such as nurses notes and lab results and provide the ability to analyze such data Healthcare Analytics: Chronic Disease Management
56
Alfresco + Neo4j Integration Progress
57
CMIS integration Web Script integration Users and Groups Alfresco + Neo4j: What have I Completed to Date
58
CMIS Integration Web Script Integration Workflow Auditing Activities UI Graph Visualization Charting Activiti integration Create a process instance to analyze data if pattern X exists in the graph Alfresco + Neo4j: What’s Next
59
https://github.com/kylefernandadams/alfresco-insight Soon to be on GitHub
60
Neo Technology Neo4j and Spring Data Communities Authors of Neo4j in Action Credits
61
Questions?!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.