ElasticSearch
Amaze business, make your devs happy
curl -XGET http://localhost:9200/
A few words about myself
Sebastian Belczyk, @sbelczyk
25/03/2013 #EllerslieDNUG
ElasticSearch
- You know, for search
  - Structured data, such as well-defined JSON objects
  - Unstructured data, such as logs
  - Full-text search (PDFs, real-world documents)
- Real-time search and analytics engine
  - You basically feed in tons of data, then search it, and it's lightning fast
- NoSQL document database
  - Uses JSON
- Uses Lucene for indexing
  - Lucene is a Java library for creating full-text search indexes
- Horizontally and vertically scalable
  - Sharding
- Automatic cluster formation
  - By default it uses multicast; new nodes join the cluster with the same name
- Fault tolerant
  - Partition tolerant, shard replication, automatic data recovery
- Zero config (at the beginning)
  - Later you need to tune the configuration to your needs
- Nice RESTful API

Who's using it
- GitHub (migrated from Solr)
- Wikimedia
- Guardian
- LiveChat
- XING
- Fog Creek
- SoundCloud
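To give a feel for what "uses Lucene for indexing" buys us, here is a toy inverted index: every term points to the set of documents that contain it, so a search is a few set lookups rather than a scan over all documents. This is a minimal sketch for intuition, not Lucene's implementation; the tokenizer (lowercase, split on non-alphanumerics) and the AND-only search are simplifying assumptions.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each term to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc_id, ...}
        self.docs = {}                    # doc_id -> original text

    @staticmethod
    def tokenize(text):
        # Simplified analyzer (assumption): lowercase, then split on anything
        # that is not an ASCII letter or digit.
        return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return ids of documents containing every query term (AND semantics).
        terms = self.tokenize(query)
        if not terms:
            return []
        return sorted(set.intersection(*(self.postings.get(t, set()) for t in terms)))


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "AMD Phenom II X4, 4 cores")
    idx.add(2, "Intel Core i5, 4 cores")
    print(idx.search("4 cores"))  # [1, 2]
    print(idx.search("amd"))      # [1]
```

Real analyzers also handle stemming, stop words, n-grams and relevance scoring; the point here is only the term-to-documents lookup.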
[Diagram: an Application working with an SQL DB and with ElasticSearch; the application indexes data into ElasticSearch, then uses it to search and retrieve]
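In this setup the SQL database can stay the system of record while the application pushes copies of the data into ElasticSearch for searching. A minimal sketch of the "index data" step, assuming a hypothetical products table and using Python's built-in sqlite3 as a stand-in for the SQL DB: each row becomes a JSON document, and the SQL primary key becomes the document id.

```python
import json
import sqlite3


def rows_to_documents(conn):
    """Turn rows of a hypothetical `products` table into (doc_id, document) pairs."""
    conn.row_factory = sqlite3.Row  # lets us access columns by name
    cursor = conn.execute(
        "SELECT id, name, category, price, cores FROM products ORDER BY id"
    )
    for row in cursor:
        # Every column except the key goes into the document body.
        doc = {key: row[key] for key in row.keys() if key != "id"}
        # The SQL primary key is reused as the ElasticSearch document id.
        yield str(row["id"]), doc


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, "
        "category TEXT, price REAL, cores INTEGER)"
    )
    conn.execute("INSERT INTO products VALUES (1, 'AMD Phenom II X4', 'CPUs', 250.0, 4)")
    for doc_id, doc in rows_to_documents(conn):
        print(doc_id, json.dumps(doc))
```

Reusing the primary key as the document id means re-indexing a changed row overwrites the old document instead of creating a duplicate.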
Data storage
- ElasticSearch stores documents in indices
  - Something like an SQL database
- Each index can contain multiple types of documents
  - Something like a table
  - Each type has a type-specific schema (mapping), which defines the types of its fields
- An index is split into multiple shards
- Each shard may be stored on a different node
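In the REST API the index / type / id hierarchy appears directly in the URL. A minimal sketch using only the standard library, assuming a node on localhost:9200 (as in the curl call on the title slide) and a hypothetical shop index with a product type. It only builds the request; sending it requires a running node.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port


def index_request(index, doc_type, doc_id, doc):
    """Build a PUT /{index}/{type}/{id} request that stores `doc` as JSON."""
    path = "/".join(urllib.parse.quote(part, safe="") for part in (index, doc_type, doc_id))
    return urllib.request.Request(
        url=f"{ES_URL}/{path}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = index_request("shop", "product", "1", {"name": "AMD Phenom II X4", "category": "CPUs"})
    print(req.get_method(), req.full_url)
    # To actually send it (requires a running node):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```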
Shard allocation
[Diagram: one node (Node 1) holding primary shards P1, P2, P3; then two nodes (Node 1, Node 2) holding primaries P1, P2, P3 and replicas R1, R2, R3]
- When we create an index, we decide how many shards we want
  - By default it's 5, which means we can have up to 5 nodes, each holding one primary shard
  - A primary shard is one that is not a replica
  - Each primary shard maps 1:1 to a Lucene index
  - We overallocate shards to accommodate the index's future growth
- Depending on the configuration, a search is executed on the node we're connected to or on separate nodes (for example, if we require the search to run on primary shards)
- If we add a node, the shard distribution is rebalanced
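One reason the number of primary shards is fixed at index creation: a document is routed to a shard with shard = hash(routing) % number_of_primary_shards, where routing defaults to the document id. A minimal sketch of that formula; zlib.crc32 is a stand-in hash chosen for illustration, not the function ElasticSearch uses internally.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards=5):
    """Pick the primary shard for a document (5 mirrors the default shard count)."""
    # crc32 is a stable stand-in; ElasticSearch uses its own hash function.
    h = zlib.crc32(doc_id.encode("utf-8"))
    return h % number_of_primary_shards


if __name__ == "__main__":
    ids = [str(i) for i in range(10)]
    print({i: shard_for(i) for i in ids})
    moved = sum(shard_for(i, 5) != shard_for(i, 6) for i in ids)
    print(f"{moved} of {len(ids)} documents would map to a different shard with 6 shards")
```

Because the target shard depends on the primary count, changing that count would send many existing documents to a different shard. That is why the count is chosen up front, and why overallocating leaves room to grow by adding nodes rather than shards.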
Shard allocation
[Diagram: after adding Node 3, shards P1, P2, P3 are spread across Node 1, Node 2 and Node 3]
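Rebalancing follows one rule worth spelling out: a replica is never placed on the same node as its primary, so on a single node the replicas stay unassigned. A deliberately simplified round-robin allocator shows how shard copies spread out as nodes are added. This is an illustrative assumption; ElasticSearch's real allocator weighs many more factors.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Spread shard copies over nodes; returns ({node: [copies]}, [unassigned]).

    Copies are labelled P1..Pn (primaries) and R1..Rn (replicas).
    """
    layout = {node: [] for node in nodes}
    unassigned = []
    n = len(nodes)
    for s in range(num_primaries):
        for k in range(num_replicas + 1):  # k == 0 is the primary copy
            label = f"P{s + 1}" if k == 0 else f"R{s + 1}"
            if k < n:
                # Offsetting by k puts each copy of shard s on a different node.
                layout[nodes[(s + k) % n]].append(label)
            else:
                # Not enough nodes to keep this copy apart from the others.
                unassigned.append(label)
    return layout, unassigned


if __name__ == "__main__":
    for nodes in (["Node 1"], ["Node 1", "Node 2"], ["Node 1", "Node 2", "Node 3"]):
        layout, unassigned = allocate(3, 1, nodes)
        print(layout, "unassigned:", unassigned)
```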
Querying
- Search
  - Words and n-grams
  - Geo location: geo_distance, geo_bounding_box, geo_polygon
  - Date and time
  - Value ranges
  - Fuzzy matching
- Facets and aggregations (statistics)
  - Distinct values for a given field, with document counts
  - Statistics for numeric fields (average, min, max)
  - Time series
- Suggestions
  - Autocomplete
  - Did you mean
- More like this
  - Search based on a number of criteria
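A sketch of one search request that combines several of these features: a fuzzy match (fuzziness 1, i.e. one edit), a geo_distance filter, a terms aggregation (distinct values with document counts) and a stats aggregation (count, min, max, average, sum). The field names name, location, category and price are hypothetical, location is assumed to be mapped as a geo_point, and the syntax assumes the filtered query of this era.

```python
import json


def build_search(text, lat, lon, distance="10km"):
    """Return a search body: fuzzy text match, geo filter and two aggregations."""
    return {
        "query": {
            "filtered": {
                # Fuzzy matching tolerates a one-character typo, e.g. "amb" for "amd".
                "query": {"match": {"name": {"query": text, "fuzziness": 1}}},
                # Keep only documents within `distance` of the given point.
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
        "aggs": {
            # Distinct categories with their document counts.
            "categories": {"terms": {"field": "category"}},
            # Count, min, max, average and sum of the price field.
            "price_stats": {"stats": {"field": "price"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search("amd", -36.9, 174.8), indent=2))
```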
Query example
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "name": { "query": "amd" }
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "category": "CPUs" } },
            { "range": { "price": { "from": 200, "to": 300 } } },
            { "term": { "cores": "4" } }
          ]
        }
      }
    }
  }
}
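The same query can be built programmatically, so that the criteria come from user input. A minimal Python sketch that produces the query above from parameters; the function name and its arguments are hypothetical.

```python
import json


def cpu_query(name, category, min_price, max_price, cores):
    """Full-text match on `name`, filtered by exact category, price range and core count."""
    return {
        "query": {
            "filtered": {
                # Scored full-text part.
                "query": {"match": {"name": {"query": name}}},
                # Exact criteria go in the filter; all of them must match.
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"category": category}},
                            {"range": {"price": {"from": min_price, "to": max_price}}},
                            {"term": {"cores": str(cores)}},
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(cpu_query("amd", "CPUs", 200, 300, 4), indent=2))
```

Keeping the exact criteria in the filter part reflects the design of the filtered query: filters do not affect the score and can be cached, while the match query ranks results by relevance.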
.NET clients
- NEST: the most mature; static or dynamic
- PlainElastic.Net: no JSON generation
- ElasticSearch.NET: requires the Thrift plugin
Scoring
- Scoring functions
- Boost queries
- Boost filters
- Decay functions
- Custom score functions
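Decay functions score a document by how far a field value lies from an origin: values within offset of the origin get the full score, and a value that lies scale beyond the offset gets exactly the decay score. A sketch assuming the gauss, exp and linear formulas described in the function_score documentation; treat it as a way to build intuition, not as ElasticSearch's code.

```python
import math


def _distance(value, origin, offset):
    # Distance beyond the offset; values within `offset` of origin score 1.0.
    return max(0.0, abs(value - origin) - offset)


def gauss(value, origin, scale, offset=0.0, decay=0.5):
    # sigma^2 is chosen so that the score equals `decay` at distance `scale`.
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    return math.exp(-_distance(value, origin, offset) ** 2 / (2 * sigma_sq))


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, origin, offset))


def linear(value, origin, scale, offset=0.0, decay=0.5):
    s = scale / (1.0 - decay)
    # Unlike gauss and exp, linear reaches exactly 0 far from the origin.
    return max(0.0, (s - _distance(value, origin, offset)) / s)


if __name__ == "__main__":
    # A price of 250 when we prefer 200 and allow a "scale" of 50:
    for f in (gauss, exp_decay, linear):
        print(f.__name__, round(f(250, 200, 50), 3))  # 0.5 for all three
```

The three shapes agree at the scale point and differ in how fast they fall off around it, which is the main reason to pick one over another.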
Indexing
Client --index--> stored in transaction log --flush--> indexed in ES --refresh--> available for search
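The practical consequence of the refresh step is that a freshly indexed document is not visible to search until the next refresh (by default roughly once per second). A toy model of that behaviour, assuming the commonly described split of responsibilities: refresh makes recently indexed documents searchable, and flush persists them and clears the transaction log. The class and its methods are hypothetical; real shards are far more involved.

```python
class ToyShard:
    """Toy model of near-real-time search (a simplification, not ES internals)."""

    def __init__(self):
        self.translog = []    # every operation is appended here for durability
        self.buffer = {}      # indexed, but not yet visible to search
        self.searchable = {}  # visible to search after a refresh
        self.committed = {}   # persisted by the last flush

    def index(self, doc_id, doc):
        self.translog.append(("index", doc_id, doc))
        self.buffer[doc_id] = doc

    def refresh(self):
        # Make everything indexed so far visible to search.
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def flush(self):
        # In this toy, a flush first refreshes, then persists the searchable
        # documents and starts a new, empty transaction log.
        self.refresh()
        self.committed = dict(self.searchable)
        self.translog.clear()

    def search(self, field, value):
        return sorted(i for i, d in self.searchable.items() if d.get(field) == value)


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("1", {"category": "CPUs"})
    print(shard.search("category", "CPUs"))  # [] (not refreshed yet)
    shard.refresh()
    print(shard.search("category", "CPUs"))  # ['1']
```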
Indexing
When indexing a large number of documents, adjust:
- refresh_interval
- translog.flush_threshold_period
- translog.flush_threshold_ops
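A sketch of a bulk load along these lines: relax refresh_interval while loading (a value of -1 disables periodic refresh), send documents through the _bulk endpoint, then restore the interval. The newline-delimited action/source format is the bulk API's own; the shop index, product type and restoring 1s afterwards are assumptions for illustration. The translog thresholds are index settings too and, assuming they are dynamically updatable in your version, could be changed with the same settings call.

```python
import json


def settings_body(refresh_interval):
    """Body for PUT /{index}/_settings that changes the refresh interval."""
    return {"index": {"refresh_interval": refresh_interval}}


def bulk_body(index, doc_type, docs):
    """Build a _bulk request body from (doc_id, doc) pairs.

    Each document becomes two lines: an action line and the source line.
    The body must end with a newline.
    """
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [("1", {"name": "AMD Phenom II X4"}), ("2", {"name": "Intel Core i5"})]
    print(json.dumps(settings_body("-1")))              # before loading: PUT /shop/_settings
    print(bulk_body("shop", "product", docs), end="")  # the load itself: POST /_bulk
    print(json.dumps(settings_body("1s")))              # after loading: restore refresh
```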
Testing
Deployment
Requirements
- Java (Server JRE)
- A JAVA_HOME variable pointing to the JRE directory (not to its bin subdirectory)
Steps
- From the ElasticSearch directory, run bin/service install
- Change the service start mode to automatic and start the service
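Before running the service installer it can help to check the JAVA_HOME requirement. A small hypothetical helper, not part of ElasticSearch, that flags the mistakes mentioned above: an unset variable, a path that points at the bin directory instead of the JRE root, and a JRE with no Java executable where one is expected.

```python
import os


def check_java_home(java_home):
    """Return a list of problems with a JAVA_HOME value (an empty list means it looks fine)."""
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    # The variable should name the JRE root, not its bin subdirectory.
    last = java_home.replace("\\", "/").rstrip("/").split("/")[-1]
    if last.lower() == "bin":
        problems.append("JAVA_HOME points at the bin directory; use its parent")
    java_exe = os.path.join(java_home, "bin", "java.exe" if os.name == "nt" else "java")
    if not os.path.isfile(java_exe):
        problems.append(f"no Java executable found at {java_exe}")
    return problems


if __name__ == "__main__":
    print(check_java_home(os.environ.get("JAVA_HOME")) or "JAVA_HOME looks fine")
```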
Tools
- Sense
- Kibana
- Logstash
- Marvel
- Rivers
Learning materials
http://goo.gl/JUNWRZ
- Videos
- Articles
- Books
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/index.html