Battle of the Giants, Round 2: Apache Solr vs ElasticSearch. Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com
Copyright 2013 Sematext Group, Inc. All rights reserved. Ich bin ein… (I am a…): Sematext consultant & engineer; Solr Cookbook series author; „ElasticSearch Server” author; „Mastering ElasticSearch” author; Solr.pl co-founder; father and husband.
Apache Solr vs ElasticSearch
Under the Hood. Apache Solr: Lucene 4.3. ElasticSearch: Lucene 4.3.
Expectations: scalability; fault tolerance; high availability; features; manageability; ease of installation; tools; support.
Expectations vs Reality. Apache Solr: Solr + ZooKeeper, leader per shard. ElasticSearch: only ElasticSearch nodes, single leader (the master node). Both: distributed, fault tolerant, automatic leader election.
All Time Top Committers
Active Contributors
The Code
The Mailing Lists
Trends
Collection vs Index. Apache Solr: the collection is the main logical index. ElasticSearch: the index is the main logical structure. Collections and indices can be spread among different nodes in the cluster.
Apache Solr Index Structure: fields and types defined in the schema; automatic value copying (copyField); dynamic fields; custom similarity; custom postings format; multiple document types require a shared schema; can be read using the API.
ElasticSearch Index Structure: schema-less; fields and types defined with the HTTP API; multi-field support; nested and parent-child documents; custom similarity; custom postings format; multiple documents with different structures; can be read and written using the API.
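As a rough illustration of defining fields and types over HTTP, the sketch below builds a Put Mapping request body that uses a multi-field. The index name `sematext` and type name `test` are borrowed from the update example in this deck; the field names `title`, `raw`, and `enabled` are assumptions made up for illustration. The `multi_field` type is the mapping syntax of the ElasticSearch 0.90 era and was replaced in later releases. Nothing is sent over the network; the request is only printed.

```python
import json


def build_mapping(doc_type):
    """Return a mapping body for `doc_type` (field names are hypothetical)."""
    return {
        doc_type: {
            "properties": {
                # One source value indexed two ways: analyzed for full-text
                # search, and not analyzed for sorting or faceting.
                "title": {
                    "type": "multi_field",
                    "fields": {
                        "title": {"type": "string"},
                        "raw": {"type": "string", "index": "not_analyzed"},
                    },
                },
                "enabled": {"type": "boolean"},
            }
        }
    }


mapping = build_mapping("test")
print("PUT /sematext/test/_mapping")
print(json.dumps(mapping, indent=2))
```

The same body could be passed to curl with `-d`, in the style of the other requests in this deck.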
Shards and Replicas: many shards; 0 or more replicas; a replica can become the leader; replicas can be created on a live cluster.
Configuration. Apache Solr: static in solrconfig.xml, can be reloaded with a core reload. ElasticSearch: static in elasticsearch.yml, changeable at runtime.
Discovery. Apache Solr: Apache ZooKeeper. ElasticSearch: Zen Discovery.
Solr & ZooKeeper: requires additional software; prevents split-brain situations; holds collection configurations; ZooKeeper ensemble needed.
ElasticSearch Zen Discovery: automatic node discovery; multicast and unicast discovery methods; automatic master detection; two-way failure detection.
HTTP FTW: HTTP REST API in ElasticSearch, or a query string for simple queries; HTTP with a query string in Apache Solr; both provide a specialized Java API.
Results Grouping. Group on: field value; query result; function query.
Prospective Search: called Percolator; matches documents to stored queries.
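Prospective search inverts the usual flow: queries are stored first, and each incoming document is checked against them. The toy sketch below shows only that idea in plain Python. It is not the Percolator API; the `ToyPercolator` class, the query names, and the "all terms must be present" matching rule are assumptions for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real analysis."""
    return set(text.lower().split())


class ToyPercolator:
    """Stores named queries and reports which ones a document matches."""

    def __init__(self):
        self.queries = {}

    def register(self, name, required_terms):
        # A stored "query" here is just a set of terms that must all occur.
        self.queries[name] = {term.lower() for term in required_terms}

    def percolate(self, doc_text):
        tokens = tokenize(doc_text)
        return sorted(
            name for name, terms in self.queries.items() if terms <= tokens
        )


p = ToyPercolator()
p.register("solr-news", ["solr"])
p.register("es-release", ["elasticsearch", "release"])

matches = p.percolate("New ElasticSearch release announced")
print(matches)  # ['es-release']
```

A real percolator runs full queries against an in-memory index of the single document, but the direction of matching is the same.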
Full Text Search Capabilities: variety of queries; control over score calculation; different query parsers; advanced Lucene queries.
Score Calculation: leverage Lucene scoring; control the importance of documents, queries, terms, and phrases; similarity configuration.
Apache Solr and Score Influence: index-time boosting; query-time: term boosts, field boosts, phrase boosts, function queries, sub-queries used for boosting.
ElasticSearch and Score Influence: index-time and query-time boosting; different queries provide different boost controls; can calculate distributed term frequencies; negative and positive boosting queries; custom score filters; scripts.
ElasticSearch Query Rescore: reorders the top N hits using another query; executed on the shards before results are returned to the node handling the request; not executed with the scan and count search types.
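The reordering step can be sketched in a few lines of plain Python. This is a simplified model, not ElasticSearch code: it assumes the final score is a weighted sum of the original score and the secondary score, and the function name, weights, and sample scores are all made up for illustration.

```python
def rescore_top_n(hits, rescore_fn, window_size,
                  query_weight=1.0, rescore_weight=1.0):
    """Reorder only the first `window_size` hits.

    hits: list of (doc_id, score) pairs, already sorted by score descending.
    rescore_fn: maps a doc_id to the secondary query's score.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        (doc_id, query_weight * score + rescore_weight * rescore_fn(doc_id))
        for doc_id, score in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    # Hits outside the window keep their original order and scores.
    return rescored + rest


hits = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
secondary = {"a": 0.0, "b": 2.0, "c": 5.0, "d": 9.0}

reordered = rescore_top_n(hits, lambda doc_id: secondary.get(doc_id, 0.0), 3)
print([doc_id for doc_id, _ in reordered])  # ['c', 'b', 'a', 'd']
```

Document `d` has the highest secondary score but stays last, because it falls outside the window of three hits. That is the trade-off of rescoring: the expensive query runs on few documents, at the cost of never promoting anything below the window.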
ElasticSearch Nested Objects: indexed as separate documents; stored in the same part of the index as the root doc; hidden from standard queries and filters; need appropriate queries and filters (nested); top-level documents can be sorted on the basis of nested ones.
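To make the "appropriate queries" point concrete, the sketch below builds a nested mapping and a matching `nested` query as plain Python dictionaries. The type name `product` and the fields `name`, `variants`, `color`, and `size` are hypothetical; the mapping uses the 0.90-era `string` type. The bodies are only printed, not sent.

```python
import json

# Mapping: each entry in `variants` is indexed as its own hidden document.
nested_mapping = {
    "product": {
        "properties": {
            "name": {"type": "string"},
            "variants": {
                "type": "nested",
                "properties": {
                    "color": {"type": "string", "index": "not_analyzed"},
                    "size": {"type": "string", "index": "not_analyzed"},
                },
            },
        }
    }
}


def nested_query(path, inner_query):
    """Wrap `inner_query` so it runs against the nested docs under `path`."""
    return {"query": {"nested": {"path": path, "query": inner_query}}}


# Both conditions must hold within the SAME variant, not across variants.
search = nested_query("variants", {
    "bool": {
        "must": [
            {"term": {"variants.color": "Yellow"}},
            {"term": {"variants.size": "XL"}},
        ]
    }
})

print(json.dumps(nested_mapping, indent=2))
print(json.dumps(search, indent=2))
```

Without the `nested` type, a product with a yellow M variant and a red XL variant would match a query for yellow and XL, because the inner fields would be flattened into one document.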
Solr Parent-Child Relationship: used at query time; multi-core joins possible.
select?q={!join from=parent to=id}color:Yellow
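Because the join query contains braces, spaces, and an exclamation mark, it usually needs URL encoding when sent over HTTP. The sketch below builds such a URL with the Python standard library. The base URL (host, port, and the `collection1` core) is an assumption pieced together from other examples in this deck, and the helper name is made up; `fromIndex` is the join parser's parameter for cross-core joins.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def solr_join_url(base, from_field, to_field, query, from_index=None):
    """Build a /select URL whose q parameter is a {!join} query."""
    local_params = "{!join from=%s to=%s" % (from_field, to_field)
    if from_index:
        # Join against documents living in another core.
        local_params += " fromIndex=%s" % from_index
    local_params += "}"
    return base + "/select?" + urlencode({"q": local_params + query})


url = solr_join_url(
    "http://localhost:8983/solr/collection1", "parent", "id", "color:Yellow"
)
print(url)

# Decoding the URL gives back the query exactly as written on the slide.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_q)  # {!join from=parent to=id}color:Yellow
```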
ElasticSearch Parent-Child: proper indexing required; indexed as separate documents; standard queries don't return child documents; retrieve parent docs (has_child, top_children) or child docs (has_parent) using dedicated queries and filters.
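The "proper indexing" requirement comes down to two things: the child type declares its parent type in the mapping, and each child is indexed with a `parent` parameter. The sketch below builds those pieces plus a `has_child` query. The index name `blog`, the types `post` and `comment`, and the field `author` are hypothetical; the helper names are made up for illustration, and nothing is sent.

```python
import json

# Mapping for the child type: declares which type acts as its parent.
child_mapping = {"comment": {"_parent": {"type": "post"}}}


def index_child_path(index, child_type, doc_id, parent_id):
    """Path for indexing a child doc; `parent` also drives shard routing."""
    return "/%s/%s/%s?parent=%s" % (index, child_type, doc_id, parent_id)


def has_child_query(child_type, inner_query):
    """Return parents that have at least one child matching `inner_query`."""
    return {"query": {"has_child": {"type": child_type, "query": inner_query}}}


path = index_child_path("blog", "comment", "c1", "p1")
search = has_child_query("comment", {"term": {"author": "rafal"}})

print("PUT " + path)
print(json.dumps(child_mapping))
print(json.dumps(search))
```

Unlike nested objects, children can be added or updated without reindexing the parent, which is the usual reason to accept the slower query.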
Filters. Both: used to narrow down query results; good candidates for caching and reuse. Apache Solr: additive; can use different query parsers; can use local params; narrow down faceting results. ElasticSearch: defined using the Query DSL; can be used for score calculation; don't narrow down faceting results.
Faceting: pivot; histograms; terms; range & query; terms statistics; spatial distance.
Real Time or Not? Both: get not-yet-indexed docs from the transaction log; no searcher reopening needed. Apache Solr: separate Realtime Get handler. ElasticSearch: separate Get and Multi Get APIs.
Data Handling. Apache Solr: single and batch indexing supported; different formats allowed (XML, JSON, CSV, binary). ElasticSearch: JSON in / JSON out (and YAML); single and batch indexing supported.
Partial Document Updates: not based on LUCENE-3837; server-side doc reindexing; both servers use versioning; decreases network traffic.
Apache Solr Partial Doc Update: sent to the standard update handler; requires the _version_ field.
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[ { "id" : "12345", "enabled" : { "set" : true } } ]'
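A small helper can make the shape of such update documents easier to see. The sketch below builds the JSON payload for the update handler using the `set`, `add`, and `inc` modifiers. The helper name and the `views` field are assumptions for illustration. Passing a `_version_` value inside the document is shown as an optional optimistic-concurrency check; treat the exact matching semantics as something to confirm against the Solr documentation for your release.

```python
import json

# Modifiers this sketch allows in a partial update.
ALLOWED_OPS = ("set", "add", "inc")


def atomic_update(doc_id, changes, version=None):
    """Build a one-document payload for a partial update.

    changes: {field_name: (modifier, value)}
    version: optional expected _version_ of the stored document.
    """
    doc = {"id": doc_id}
    for field, (op, value) in changes.items():
        if op not in ALLOWED_OPS:
            raise ValueError("unsupported modifier: %r" % op)
        doc[field] = {op: value}
    if version is not None:
        doc["_version_"] = version
    return [doc]


payload = atomic_update(
    "12345", {"enabled": ("set", True), "views": ("inc", 1)}
)
print(json.dumps(payload))
```

Only the changed fields travel over the network; the server reads the stored document, applies the modifiers, and reindexes the whole document.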
ElasticSearch Partial Doc Update: special end-point exposed (_update); supports parameters like routing, parent, replication, percolate, etc. (similar to the Index API); uses scripts to perform document updates.
curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{ "script" : "ctx._source.enabled = enabled", "params" : { "enabled" : true } }'
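The URL parameters mentioned above are appended to the `_update` path as an ordinary query string. The sketch below builds the path and body for the same update as the curl example. The helper name and the `routing` value `user1` are made up; `retry_on_conflict` is included on the assumption that the target release supports it, so check it against the Update API documentation for your version.

```python
import json
from urllib.parse import urlencode


def es_update_request(index, doc_type, doc_id, script, params, **url_params):
    """Return (path, body) for a scripted partial update."""
    path = "/%s/%s/%s/_update" % (index, doc_type, doc_id)
    if url_params:
        # Sorted so the generated path is deterministic.
        path += "?" + urlencode(sorted(url_params.items()))
    return path, {"script": script, "params": params}


path, body = es_update_request(
    "sematext", "test", "12345",
    "ctx._source.enabled = enabled", {"enabled": True},
    routing="user1", retry_on_conflict=3,
)
print("POST " + path)
print(json.dumps(body))
```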
Solr Collections API: collection creation, reload, and deletion; shard splitting.
ElasticSearch Indices REST API: index creation, deletion, closing and opening, refreshing, and existence checking.
Apache Solr Shard Splitting
admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
Cluster State Monitoring. Apache Solr: multiple MBeans exposed by JMX. ElasticSearch: multiple REST end-points exposed to get different statistics.
ElasticSearch Statistics API: health and state check; nodes information; cache statistics; segments information; index information; mappings information. SPM: „One to rule them all”.
ElasticSearch Cluster Settings Update: control rebalancing, recovery, and allocation; change cluster configuration properties.
ElasticSearch Custom Shard Allocation
Cluster level:
curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.exclude._ip" : "192.168.2.1" } }'
Index level:
curl -XPUT localhost:9200/sematext/_settings/ -d '{ "index.routing.allocation.include.tag" : "nodeOne,nodeTwo" }'
Moving Shards and Replicas: move shards between nodes on demand.
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [
    {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}},
    {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}}
  ]
}'
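When reroute commands are generated by a script rather than typed by hand, small builder functions help keep the JSON well formed. The sketch below reproduces the request body from the curl example; the function names are made up for illustration, and the body is only printed, not sent.

```python
import json


def move(index, shard, from_node, to_node):
    """Command: relocate a started shard from one node to another."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def allocate(index, shard, node):
    """Command: place an unassigned shard on the given node."""
    return {"allocate": {"index": index, "shard": shard, "node": node}}


def reroute_body(*commands):
    """Wrap commands in the envelope expected by /_cluster/reroute."""
    return {"commands": list(commands)}


body = reroute_body(
    move("sematext", 0, "node1", "node2"),
    allocate("sematext", 1, "node3"),
)
print("POST /_cluster/reroute")
print(json.dumps(body, indent=2))
```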
The Verdict
And the Winner Is? The users.
We Are Hiring! Dig search? Dig analytics? Dig big data? Dig performance? Dig working with and in open source? We're hiring worldwide! http://sematext.com/about/jobs.html
Thank You! Rafał Kuć: @kucrafal, rafal.kuc@sematext.com. Sematext: @sematext, http://sematext.com, http://blog.sematext.com. ElasticSearch Server 25% off: MREESS25