Presentation is loading. Please wait.

Presentation is loading. Please wait.

Rafał Kuć – Sematext sematext.com

Similar presentations


Presentation on theme: "Rafał Kuć – Sematext sematext.com"— Presentation transcript:

1 Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com
Battle of the Giants Round 2 VS Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com

2 Copyright 2013 Sematext Group. Inc. All rights reserved
Ich bin ein… Sematext consultant & engineer Solr Cookbook series author „ElasticSearch Server” author „Mastering ElasticSearch” author Solr.pl co-founder Father and husband  Copyright 2013 Sematext Group. Inc. All rights reserved

3 Copyright 2013 Sematext Group. Inc. All rights reserved
VS Copyright 2013 Sematext Group. Inc. All rights reserved

4 Copyright 2013 Sematext Group. Inc. All rights reserved
Under the Hood Lucene 4.3 Lucene 4.3 Copyright 2013 Sematext Group. Inc. All rights reserved

5 Copyright 2013 Sematext Group. Inc. All rights reserved
Expectations Scalability Fault toleranance High availablity Features Manageability Ease of installation Tools Support Copyright 2013 Sematext Group. Inc. All rights reserved

6 Expectations vs Reality
Solr + ZooKeeper Leader per shard Only ElasticSearch nodes Single leader Distributed Fault tolerant Automatic leader election Copyright 2013 Sematext Group. Inc. All rights reserved

7 All Time Top Committers
Copyright 2013 Sematext Group. Inc. All rights reserved

8 Copyright 2013 Sematext Group. Inc. All rights reserved
Active Contributors Copyright 2013 Sematext Group. Inc. All rights reserved

9 Copyright 2013 Sematext Group. Inc. All rights reserved
The Code Copyright 2013 Sematext Group. Inc. All rights reserved

10 Copyright 2013 Sematext Group. Inc. All rights reserved
The Mailing Lists Copyright 2013 Sematext Group. Inc. All rights reserved

11 Copyright 2013 Sematext Group. Inc. All rights reserved
Trends Copyright 2013 Sematext Group. Inc. All rights reserved

12 Copyright 2013 Sematext Group. Inc. All rights reserved
Collection vs Index Collection – main logical index Index – main logical structure Collections and Indices can be spread among different nodes in the cluster Copyright 2013 Sematext Group. Inc. All rights reserved

13 Apache Solr Index Structure
Field and types defined in schema Automatic value copying Dynamic fields Custom similarity Custom postings format Multiple document types require shared schema Can be read using API Copyright 2013 Sematext Group. Inc. All rights reserved

14 ElasticSearch Index Structure
Schema - less Fields and types defined with HTTP API Multi – field support Nested and parent – child documents Custom similarity Custom postings format Multiple document with different structure Can be read and written using API Copyright 2013 Sematext Group. Inc. All rights reserved

15 Copyright 2013 Sematext Group. Inc. All rights reserved
Shards and Replicas Many shards 0 or more replicas Replica can become leader Replicas can be created on live cluster Copyright 2013 Sematext Group. Inc. All rights reserved

16 Configuration Static in solrconfig.xml Can be reloaded with core reload Static in elasticsearch.yml Changable at runtime Copyright 2013 Sematext Group. Inc. All rights reserved

17 Copyright 2013 Sematext Group. Inc. All rights reserved
Discovery Apache Zookeeper Zen Discovery Copyright 2013 Sematext Group. Inc. All rights reserved

18 Copyright 2013 Sematext Group. Inc. All rights reserved
Solr & ZooKeeper Requires additional software Prevents split – brain situations Holds collections configurations ZooKeeper ensemble needed Copyright 2013 Sematext Group. Inc. All rights reserved

19 ElasticSearch Zen Discovery
Automatic node discovery Multicast and unicast discovery methods Automatic master detection Two - way failure detection Copyright 2013 Sematext Group. Inc. All rights reserved

20 Copyright 2013 Sematext Group. Inc. All rights reserved
HTTP FTW HTTP REST API in ElasticSearch or Query String for simple queries HTTP with Query String in Apache Solr Both provide specialized Java API Copyright 2013 Sematext Group. Inc. All rights reserved

21 Copyright 2013 Sematext Group. Inc. All rights reserved
Results Grouping Group on: field value query result function query Copyright 2013 Sematext Group. Inc. All rights reserved

22 Copyright 2013 Sematext Group. Inc. All rights reserved
Prospective Search Called Percolator Matches documents to stored queries Copyright 2013 Sematext Group. Inc. All rights reserved

23 Full Text Search Capabilities
Variety of queries Control score calculation Different query parsers Advanced Lucene queries Copyright 2013 Sematext Group. Inc. All rights reserved

24 Copyright 2013 Sematext Group. Inc. All rights reserved
Score Calculation Leverage Lucene scoring Control importance of: documents queries terms phrases Similiarity configuration Copyright 2013 Sematext Group. Inc. All rights reserved

25 Apache Solr and Score Influence
Index - time boosting Query - time Term boosts Field boosts Phrases boost Function queries Sub-queries used for boosting Copyright 2013 Sematext Group. Inc. All rights reserved

26 ElasticSearch and Score Influence
Index - time Query - time Different queries provide different boost controls Can calculate distributed term frequencies Negative and Positive boosting queries Custom score filters Scripts Copyright 2013 Sematext Group. Inc. All rights reserved

27 ElasticSearch Query Rescore
Reorders top N hits by using other query Executed on shards before results are returned to the node handling it Not executed with scan and count Copyright 2013 Sematext Group. Inc. All rights reserved

28 ElasticSearch Nested Objects
Indexed as separate documents Stored in the same part of index as root doc Hidden from standard queries and filters Need appropriate queries and filters (nested) Top level documents can be sorted on the basis of nested ones Copyright 2013 Sematext Group. Inc. All rights reserved

29 Solr Parent – Child Relationship
Used at query time Multi core joins possible select?q={!join from=parent to=id}color:Yellow Copyright 2013 Sematext Group. Inc. All rights reserved

30 ElasticSearch Parent – Child
Proper indexing required Indexed as separate documents Standard queries don’t return child documents Retrieve parent docs using queries and filters (has_child, has_parent, top_children) Copyright 2013 Sematext Group. Inc. All rights reserved

31 Copyright 2013 Sematext Group. Inc. All rights reserved
Filters Used to narrown down query results Good candidates for caching and reuse Addictive Can use different query parsers Can use local params Narrows down faceting results Defined using Query DSL Can be used for score calculation Doesn’t narrow down faceting results Copyright 2013 Sematext Group. Inc. All rights reserved

32 Copyright 2013 Sematext Group. Inc. All rights reserved
Faceting Pivot Histograms Terms Range & query Terms statistics Spatial distance Copyright 2013 Sematext Group. Inc. All rights reserved

33 Real Time Or Not ? Get not yet indexed docs from transaction log Don’t need searcher reopening Separate Realtime Get Handler Separate Get and Multi Get API Copyright 2013 Sematext Group. Inc. All rights reserved

34 Data Handling Single and batch indexing supported
Different formats allowed (XML, JSON, CSV, binary) JSON in / JSON out (and YAML) Single and batch indexing supported Copyright 2013 Sematext Group. Inc. All rights reserved

35 Partial Document Updates
Not based on LUCENE-3837 Server-side doc reindexing Both servers use versioning Decreases network traffic Copyright 2013 Sematext Group. Inc. All rights reserved

36 Apache Solr Partial Doc Update
Sent to the standard update handler Requires _version_ field curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[ { "id" : "12345", "enabled" : { "set" : true } } ]' Copyright 2013 Sematext Group. Inc. All rights reserved

37 ElasticSearch Partial Doc Update
Special end – point exposed - _update Supports parameters like routing, parent, replication, percolate, etc (similar to Index API) Uses scripts to perform document updates curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{ "script" : "ctx._source.enabled = enabled", "params" : { "enabled" : true } }' Copyright 2013 Sematext Group. Inc. All rights reserved

38 Copyright 2013 Sematext Group. Inc. All rights reserved
Solr Collections API Collection creation reload deletion shards splitting Copyright 2013 Sematext Group. Inc. All rights reserved

39 ElasticSearch Indices REST API
Index creation deletion closing and opening refreshing existence checking Copyright 2013 Sematext Group. Inc. All rights reserved

40 Apache Solr Shard Splitting
admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1 Copyright 2013 Sematext Group. Inc. All rights reserved

41 Cluster State Monitoring
Multiple MBeans exposed by JMX Multiple REST end – points exposed to get different statistics Copyright 2013 Sematext Group. Inc. All rights reserved

42 ElasticSearch Statistics API
Health and state check Nodes information Cache statistics Segments information Index information Mappings information SPM – „One to rule them all” Copyright 2013 Sematext Group. Inc. All rights reserved

43 ElasticSearch Cluster Settings Update
Control rebalancing recovery allocation Change cluster configuration properties Copyright 2013 Sematext Group. Inc. All rights reserved

44 ElasticSearch Custom Shard Allocation
Cluster level: Index level: curl -XPUT localhost:9200/_cluster/settings -d '{     "persistent" : {         "cluster.routing.allocation.exclude._ip" : " "     } }' curl -XPUT localhost:9200/sematext/_settings/ -d '{     "index.routing.allocation.include.tag" : "nodeOne,nodeTwo" }' Copyright 2013 Sematext Group. Inc. All rights reserved

45 Moving Shards and Replicas
Move shards between nodes on demand curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}} ] }' Copyright 2013 Sematext Group. Inc. All rights reserved

46 Copyright 2013 Sematext Group. Inc. All rights reserved
The Verdict Copyright 2013 Sematext Group. Inc. All rights reserved

47 Copyright 2013 Sematext Group. Inc. All rights reserved
And The Winner Is ? The Users Copyright 2013 Sematext Group. Inc. All rights reserved

48 Copyright 2013 Sematext Group. Inc. All rights reserved
We Are Hiring ! Dig Search ? Dig Analytics ? Dig Big Data ? Dig Performance ? Dig working with and in open – source ? We’re hiring world – wide ! Copyright 2013 Sematext Group. Inc. All rights reserved

49 Copyright 2013 Sematext Group. Inc. All rights reserved
Thank You ! Rafał Kuć @kucrafal Sematext @sematext ElasticSearch Server 25% off: MREESS25 Copyright 2013 Sematext Group. Inc. All rights reserved


Download ppt "Rafał Kuć – Sematext sematext.com"

Similar presentations


Ads by Google