Download presentation
Presentation is loading. Please wait.
1
Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com
Battle of the Giants Round 2 VS Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com
2
Copyright 2013 Sematext Group. Inc. All rights reserved
Ich bin ein… Sematext consultant & engineer Solr Cookbook series author „ElasticSearch Server” author „Mastering ElasticSearch” author Solr.pl co-founder Father and husband Copyright 2013 Sematext Group. Inc. All rights reserved
3
Copyright 2013 Sematext Group. Inc. All rights reserved
VS Copyright 2013 Sematext Group. Inc. All rights reserved
4
Copyright 2013 Sematext Group. Inc. All rights reserved
Under the Hood Lucene 4.3 Lucene 4.3 Copyright 2013 Sematext Group. Inc. All rights reserved
5
Copyright 2013 Sematext Group. Inc. All rights reserved
Expectations Scalability Fault toleranance High availablity Features Manageability Ease of installation Tools Support Copyright 2013 Sematext Group. Inc. All rights reserved
6
Expectations vs Reality
Solr + ZooKeeper Leader per shard Only ElasticSearch nodes Single leader Distributed Fault tolerant Automatic leader election Copyright 2013 Sematext Group. Inc. All rights reserved
7
All Time Top Committers
Copyright 2013 Sematext Group. Inc. All rights reserved
8
Copyright 2013 Sematext Group. Inc. All rights reserved
Active Contributors Copyright 2013 Sematext Group. Inc. All rights reserved
9
Copyright 2013 Sematext Group. Inc. All rights reserved
The Code Copyright 2013 Sematext Group. Inc. All rights reserved
10
Copyright 2013 Sematext Group. Inc. All rights reserved
The Mailing Lists Copyright 2013 Sematext Group. Inc. All rights reserved
11
Copyright 2013 Sematext Group. Inc. All rights reserved
Trends Copyright 2013 Sematext Group. Inc. All rights reserved
12
Copyright 2013 Sematext Group. Inc. All rights reserved
Collection vs Index Collection – main logical index Index – main logical structure Collections and Indices can be spread among different nodes in the cluster Copyright 2013 Sematext Group. Inc. All rights reserved
13
Apache Solr Index Structure
Field and types defined in schema Automatic value copying Dynamic fields Custom similarity Custom postings format Multiple document types require shared schema Can be read using API Copyright 2013 Sematext Group. Inc. All rights reserved
14
ElasticSearch Index Structure
Schema - less Fields and types defined with HTTP API Multi – field support Nested and parent – child documents Custom similarity Custom postings format Multiple document with different structure Can be read and written using API Copyright 2013 Sematext Group. Inc. All rights reserved
15
Copyright 2013 Sematext Group. Inc. All rights reserved
Shards and Replicas Many shards 0 or more replicas Replica can become leader Replicas can be created on live cluster Copyright 2013 Sematext Group. Inc. All rights reserved
16
Configuration Static in solrconfig.xml Can be reloaded with core reload Static in elasticsearch.yml Changable at runtime Copyright 2013 Sematext Group. Inc. All rights reserved
17
Copyright 2013 Sematext Group. Inc. All rights reserved
Discovery Apache Zookeeper Zen Discovery Copyright 2013 Sematext Group. Inc. All rights reserved
18
Copyright 2013 Sematext Group. Inc. All rights reserved
Solr & ZooKeeper Requires additional software Prevents split – brain situations Holds collections configurations ZooKeeper ensemble needed Copyright 2013 Sematext Group. Inc. All rights reserved
19
ElasticSearch Zen Discovery
Automatic node discovery Multicast and unicast discovery methods Automatic master detection Two - way failure detection Copyright 2013 Sematext Group. Inc. All rights reserved
20
Copyright 2013 Sematext Group. Inc. All rights reserved
HTTP FTW HTTP REST API in ElasticSearch or Query String for simple queries HTTP with Query String in Apache Solr Both provide specialized Java API Copyright 2013 Sematext Group. Inc. All rights reserved
21
Copyright 2013 Sematext Group. Inc. All rights reserved
Results Grouping Group on: field value query result function query Copyright 2013 Sematext Group. Inc. All rights reserved
22
Copyright 2013 Sematext Group. Inc. All rights reserved
Prospective Search Called Percolator Matches documents to stored queries Copyright 2013 Sematext Group. Inc. All rights reserved
23
Full Text Search Capabilities
Variety of queries Control score calculation Different query parsers Advanced Lucene queries Copyright 2013 Sematext Group. Inc. All rights reserved
24
Copyright 2013 Sematext Group. Inc. All rights reserved
Score Calculation Leverage Lucene scoring Control importance of: documents queries terms phrases Similiarity configuration Copyright 2013 Sematext Group. Inc. All rights reserved
25
Apache Solr and Score Influence
Index - time boosting Query - time Term boosts Field boosts Phrases boost Function queries Sub-queries used for boosting Copyright 2013 Sematext Group. Inc. All rights reserved
26
ElasticSearch and Score Influence
Index - time Query - time Different queries provide different boost controls Can calculate distributed term frequencies Negative and Positive boosting queries Custom score filters Scripts Copyright 2013 Sematext Group. Inc. All rights reserved
27
ElasticSearch Query Rescore
Reorders top N hits by using other query Executed on shards before results are returned to the node handling it Not executed with scan and count Copyright 2013 Sematext Group. Inc. All rights reserved
28
ElasticSearch Nested Objects
Indexed as separate documents Stored in the same part of index as root doc Hidden from standard queries and filters Need appropriate queries and filters (nested) Top level documents can be sorted on the basis of nested ones Copyright 2013 Sematext Group. Inc. All rights reserved
29
Solr Parent – Child Relationship
Used at query time Multi core joins possible select?q={!join from=parent to=id}color:Yellow Copyright 2013 Sematext Group. Inc. All rights reserved
30
ElasticSearch Parent – Child
Proper indexing required Indexed as separate documents Standard queries don’t return child documents Retrieve parent docs using queries and filters (has_child, has_parent, top_children) Copyright 2013 Sematext Group. Inc. All rights reserved
31
Copyright 2013 Sematext Group. Inc. All rights reserved
Filters Used to narrown down query results Good candidates for caching and reuse Addictive Can use different query parsers Can use local params Narrows down faceting results Defined using Query DSL Can be used for score calculation Doesn’t narrow down faceting results Copyright 2013 Sematext Group. Inc. All rights reserved
32
Copyright 2013 Sematext Group. Inc. All rights reserved
Faceting Pivot Histograms Terms Range & query Terms statistics Spatial distance Copyright 2013 Sematext Group. Inc. All rights reserved
33
Real Time Or Not ? Get not yet indexed docs from transaction log Don’t need searcher reopening Separate Realtime Get Handler Separate Get and Multi Get API Copyright 2013 Sematext Group. Inc. All rights reserved
34
Data Handling Single and batch indexing supported
Different formats allowed (XML, JSON, CSV, binary) JSON in / JSON out (and YAML) Single and batch indexing supported Copyright 2013 Sematext Group. Inc. All rights reserved
35
Partial Document Updates
Not based on LUCENE-3837 Server-side doc reindexing Both servers use versioning Decreases network traffic Copyright 2013 Sematext Group. Inc. All rights reserved
36
Apache Solr Partial Doc Update
Sent to the standard update handler Requires _version_ field curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[ { "id" : "12345", "enabled" : { "set" : true } } ]' Copyright 2013 Sematext Group. Inc. All rights reserved
37
ElasticSearch Partial Doc Update
Special end – point exposed - _update Supports parameters like routing, parent, replication, percolate, etc (similar to Index API) Uses scripts to perform document updates curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{ "script" : "ctx._source.enabled = enabled", "params" : { "enabled" : true } }' Copyright 2013 Sematext Group. Inc. All rights reserved
38
Copyright 2013 Sematext Group. Inc. All rights reserved
Solr Collections API Collection creation reload deletion shards splitting Copyright 2013 Sematext Group. Inc. All rights reserved
39
ElasticSearch Indices REST API
Index creation deletion closing and opening refreshing existence checking Copyright 2013 Sematext Group. Inc. All rights reserved
40
Apache Solr Shard Splitting
admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1 Copyright 2013 Sematext Group. Inc. All rights reserved
41
Cluster State Monitoring
Multiple MBeans exposed by JMX Multiple REST end – points exposed to get different statistics Copyright 2013 Sematext Group. Inc. All rights reserved
42
ElasticSearch Statistics API
Health and state check Nodes information Cache statistics Segments information Index information Mappings information SPM – „One to rule them all” Copyright 2013 Sematext Group. Inc. All rights reserved
43
ElasticSearch Cluster Settings Update
Control rebalancing recovery allocation Change cluster configuration properties Copyright 2013 Sematext Group. Inc. All rights reserved
44
ElasticSearch Custom Shard Allocation
Cluster level: Index level: curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.exclude._ip" : " " } }' curl -XPUT localhost:9200/sematext/_settings/ -d '{ "index.routing.allocation.include.tag" : "nodeOne,nodeTwo" }' Copyright 2013 Sematext Group. Inc. All rights reserved
45
Moving Shards and Replicas
Move shards between nodes on demand curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}} ] }' Copyright 2013 Sematext Group. Inc. All rights reserved
46
Copyright 2013 Sematext Group. Inc. All rights reserved
The Verdict Copyright 2013 Sematext Group. Inc. All rights reserved
47
Copyright 2013 Sematext Group. Inc. All rights reserved
And The Winner Is ? The Users Copyright 2013 Sematext Group. Inc. All rights reserved
48
Copyright 2013 Sematext Group. Inc. All rights reserved
We Are Hiring ! Dig Search ? Dig Analytics ? Dig Big Data ? Dig Performance ? Dig working with and in open – source ? We’re hiring world – wide ! Copyright 2013 Sematext Group. Inc. All rights reserved
49
Copyright 2013 Sematext Group. Inc. All rights reserved
Thank You ! Rafał Kuć @kucrafal Sematext @sematext ElasticSearch Server 25% off: MREESS25 Copyright 2013 Sematext Group. Inc. All rights reserved
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.