Safe by default, optimized for efficiency
RavenDB: Open source. Document database. Built with C#. Data saved as JSON. Uses Lucene.NET for indexing. Uses Esent for storage. Latest stable version: 3.5.3. Available only for Windows OS and Linux 64 bit.
RAVENDB users And much more…
Fundamentals: No schema. Documents are stored as JSON. The main focus of RavenDB is to allow developers to build high-performance, low-latency applications quickly and efficiently. Example document: { "Address": "Vinarska", "City": "I love Brno", "PostalCode": 60200 }
Queries and Indexes: Documents are stored in collections. Indexing for deep (nested) properties. MapReduce support. Both static and ad-hoc indexes are supported. Indexing is performed in the background. Map(k1, v1) → list(k2, v2); Reduce(k2, list(v2)) → list(v3).
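The Map and Reduce signatures above can be made concrete with a small, self-contained Python sketch. It is not RavenDB code: the document ids, the sample addresses, and the helper names `map_fn`, `reduce_fn`, and `run_map_reduce` are hypothetical, chosen only to show how Map(k1, v1) → list(k2, v2) and Reduce(k2, list(v2)) → list(v3) fit together.

```python
from collections import defaultdict

def map_fn(doc_id, doc):
    # Map(k1, v1) -> list(k2, v2): emit one (city, 1) pair per document.
    return [(doc["City"], 1)]

def reduce_fn(key, values):
    # Reduce(k2, list(v2)) -> list(v3): sum the counts collected for one city.
    return [sum(values)]

def run_map_reduce(docs):
    # Group every mapped value by its key, then reduce each group.
    groups = defaultdict(list)
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc_id, doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Hypothetical sample documents, shaped like the JSON example in this deck.
docs = {
    "addresses/1": {"Address": "Vinarska", "City": "Brno", "PostalCode": 60200},
    "addresses/2": {"Address": "Botanicka", "City": "Brno", "PostalCode": 60200},
    "addresses/3": {"Address": "Hlavni", "City": "Praha", "PostalCode": 11000},
}
counts = run_map_reduce(docs)
```

Here `counts` holds one reduced list per city, so the per-city document count is available without rescanning the documents.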
Scaling & Replication: Sharding support. Replication support. Full backup support.
MapReduce: MapReduce is done as indexes. In RavenDB, a MapReduce operation is defined as an index and is precalculated in the background. It doesn't support MapReduce pipelines.
Querying The basic operation on RavenDB
Querying: Insertion. Done on the server side.
Querying: Filtering. Returns the records that match the given condition.
Querying: Searching. Uses the WHERE clause to create conditions.
Querying: Paging. Splits the query results into pages and reads one page at a time.
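Filtering and paging can be illustrated together with a short Python sketch. This is an in-memory stand-in, not the RavenDB client API: `query_page`, its parameters, and the sample documents are hypothetical names used only to show the "filter, then skip and take" pattern.

```python
def query_page(docs, predicate, page, page_size):
    # Filter first (the WHERE condition), then skip the earlier pages
    # and take at most page_size matches.
    matches = [d for d in docs if predicate(d)]
    start = page * page_size
    return matches[start:start + page_size]

# Hypothetical data: ten documents, even ids in Brno, odd ids in Praha.
docs = [{"Id": i, "City": "Brno" if i % 2 == 0 else "Praha"} for i in range(10)]

def in_brno(doc):
    return doc["City"] == "Brno"

first_page = query_page(docs, in_brno, page=0, page_size=2)
second_page = query_page(docs, in_brno, page=1, page_size=2)
```

Each call returns only one page of matches, so the caller never has to hold the full result set at once.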
Indexing: Indexes are server-side functions that define which fields (and which values) documents can be searched on, and they are the only way to satisfy queries in RavenDB. The whole indexing process runs in the background and is triggered whenever data is added or changed. The core of every index is its mapping function, written in a LINQ-like syntax. The result of the mapping is converted to a Lucene index entry, which is persisted for future use to avoid re-indexing each time a query is issued and to achieve fast response times. Even when you do not create an index, RavenDB will use one to execute queries. In general there are no O(N) operations in RavenDB queries; using indexes, queries are O(log N) operations.
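To see why a persisted index turns a scan into an O(log N) lookup, consider the following minimal Python sketch. It is a toy model, not how Lucene stores its entries: the class `SortedFieldIndex` and the sample documents are hypothetical, and the binary search uses Python's standard `bisect` module.

```python
import bisect

class SortedFieldIndex:
    """Toy index: sorted (term, doc_id) entries, built once and queried many times."""

    def __init__(self, docs, field):
        # The mapping step: extract one field from every document.
        self.entries = sorted(
            (str(doc[field]).lower(), doc_id) for doc_id, doc in docs.items()
        )

    def lookup(self, term):
        term = term.lower()
        # Binary search finds the first entry for the term in O(log N).
        pos = bisect.bisect_left(self.entries, (term, ""))
        result = []
        while pos < len(self.entries) and self.entries[pos][0] == term:
            result.append(self.entries[pos][1])
            pos += 1
        return result

# Hypothetical documents.
docs = {
    "users/1": {"Name": "Alice"},
    "users/2": {"Name": "Bob"},
    "users/3": {"Name": "alice"},
}
index = SortedFieldIndex(docs, "Name")
found = index.lookup("ALICE")
```

The cost of sorting is paid once when the index is built; each later query only pays for the binary search plus the number of matches.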
Indexing: RavenDB is safe by default, and whenever you make a query, the query optimizer tries to select an appropriate index to use. If there is no appropriate index, the query optimizer creates one for you. Map indexes (sometimes referred to as simple indexes) contain one or more mapping functions that indicate which fields from the documents should be indexed (in other words, which documents can be searched by which fields). Multi-map indexes allow you to index data from multiple collections, e.g. polymorphic data.
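The "reuse an index if one fits, otherwise create one" behaviour can be sketched in a few lines of Python. This is only a conceptual model under strong simplifications: `ToyDatabase` is a hypothetical class, its "optimizer" matches on a single field name, and the real query optimizer is considerably more involved.

```python
class ToyDatabase:
    def __init__(self, docs):
        self.docs = docs
        self.indexes = {}  # field name -> {value: [doc ids]}

    def _build_index(self, field):
        # Map step: record which documents carry which value for the field.
        index = {}
        for doc_id, doc in self.docs.items():
            if field in doc:
                index.setdefault(doc[field], []).append(doc_id)
        return index

    def query(self, field, value):
        # Reuse a matching index; if none exists, create it on the fly.
        if field not in self.indexes:
            self.indexes[field] = self._build_index(field)
        return self.indexes[field].get(value, [])

# Hypothetical documents.
db = ToyDatabase({
    "addresses/1": {"City": "Brno"},
    "addresses/2": {"City": "Praha"},
    "addresses/3": {"City": "Brno"},
})
indexes_before = list(db.indexes)
brno_ids = db.query("City", "Brno")
indexes_after = list(db.indexes)
```

The first query on a field pays for building the index; later queries on the same field reuse it.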
Indexing, Map-Reduce Index: Map-Reduce indexes allow complex aggregations to be performed in a two-step process: first by selecting the appropriate records (using the Map function), then by applying the specified Reduce function to these records to produce a smaller set of results. In essence, it is a way to take a big task and divide it into discrete tasks that can be done in parallel. The notion of stale indexes comes from an observation deep in RavenDB's design: the user should never suffer from assigning the server big tasks. As far as RavenDB is concerned, it is better to be stale than offline, so it returns results to queries even if it knows they may not be as up to date as possible. A fanout index is an index that outputs multiple index entries per document.
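The "better stale than offline" idea can be modelled with a small Python sketch. It is an illustration under simplifying assumptions, not RavenDB's implementation: `StaleAwareIndex` is a hypothetical class, and the background indexing task is replaced by an explicit `index_batch` call so the example stays deterministic.

```python
from collections import deque

class StaleAwareIndex:
    def __init__(self):
        self.pending = deque()  # writes that have not been indexed yet
        self.totals = {}        # reduce results: key -> running sum

    def put(self, key, amount):
        # Writes return immediately; indexing is deferred.
        self.pending.append((key, amount))

    def index_batch(self, max_items):
        # Stand-in for the background indexing task.
        for _ in range(min(max_items, len(self.pending))):
            key, amount = self.pending.popleft()
            self.totals[key] = self.totals.get(key, 0) + amount

    def query(self, key):
        # Answer from what is indexed so far and report whether it is stale.
        return self.totals.get(key, 0), bool(self.pending)

idx = StaleAwareIndex()
idx.put("Brno", 10)
idx.put("Brno", 5)
idx.index_batch(1)                 # only part of the work is done
stale_answer = idx.query("Brno")
idx.index_batch(10)                # indexing catches up
fresh_answer = idx.query("Brno")
```

The query never blocks on pending work; it returns the best available total together with a staleness flag, and the caller decides whether that is good enough.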
Customizing using sort: Indexes in RavenDB are lexicographically sorted by default, so all queries return results ordered lexicographically. When putting a static index in RavenDB, you can specify custom sorting requirements to ensure results are sorted the way you want. Dates are written to the index in a form that preserves lexicographical order and is readable by both humans and machines (for example 2011-04-04T11:28:46.0404749+03:00), so they require no user intervention. Numerical values, on the other hand, are stored as text, so the user must explicitly specify the number type for the correct sorting mechanism to be applied. This is easily done by declaring the required sorting setup in SortOptions. With a numeric sort option, the values 1, 2, 3, 11 are ordered as 1, 2, 3, 11; with a string sort option they are ordered as 1, 11, 2, 3.
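The difference between lexicographic and numeric ordering is easy to reproduce in plain Python. The sketch below uses only the built-in `sorted`; the timestamps are hypothetical and deliberately share the same format and offset, which is the assumption under which text order matches chronological order.

```python
values = ["1", "2", "3", "11"]

# Text comparison: "11" sorts before "2" because "1" < "2" character by character.
as_strings = sorted(values)

# Numeric comparison: convert each value before comparing.
as_numbers = sorted(values, key=int)

# Timestamps in one fixed-width format sort correctly as plain text.
dates = [
    "2011-04-04T11:28:46",
    "2010-12-31T23:59:59",
    "2011-01-15T08:00:00",
]
dates_sorted = sorted(dates)
```

This is why numeric fields need an explicit sort setting while dates in a fixed-width format do not.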
Boosting & Analyzers: Boosting is another feature that the Lucene engine provides and RavenDB leverages. It gives the user the ability to manually tune the relevance level of matching documents when performing a query. From the index perspective, we can associate a boosting factor with an index entry; the higher its value, the more relevant the term will be. To do this we must use the Boost extension method from Raven.Client.Linq.Indexing. The indexes each RavenDB server instance uses to facilitate fast queries are powered by Lucene, the full-text search engine.
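The effect of a boosting factor on ranking can be shown with a deliberately simplified Python sketch. Lucene's real scoring formula has many more components; here the score is assumed to be just the number of matching terms multiplied by the entry's boost, and the function `score` and the sample entries are hypothetical.

```python
def score(entry, query_terms):
    # Count matching terms, then scale by the entry's boost factor.
    matches = sum(1 for term in query_terms if term in entry["terms"])
    return matches * entry.get("boost", 1.0)

# Two hypothetical index entries that match the same term with different boosts.
entries = [
    {"id": "users/1", "terms": {"john"}, "boost": 1.0},
    {"id": "users/2", "terms": {"john"}, "boost": 10.0},
]
ranked = sorted(entries, key=lambda e: score(e, ["john"]), reverse=True)
ranked_ids = [e["id"] for e in ranked]
```

Both entries match equally well, so the boost alone decides which one is returned first.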
Boosting & Analyzers: Lucene takes a Document, breaks it down into Fields, and then splits all the text in a Field into tokens (Terms) in a process called tokenization. Those tokens are what is stored in the index and later searched upon. After a successful indexing operation, RavenDB feeds Lucene each entity from the results as a Document and marks every property in it as a Field. Every property then goes through the tokenization process using an object called a "Lucene Analyzer", and is finally stored in the index.
Boosting & Analyzers: After the tokenization and analysis process is complete, the resulting tokens are stored in an index, which is then ready to be searched. Only fields in the final index projection can be used for searches, and the actual tokens stored for each field depend on how the selected analyzer processed the original text. Lucene allows storing the original token text for fields, and RavenDB exposes this feature in the index definition object via Stores. Lucene offers several out-of-the-box analyzers, and new ones can be created easily. The analyzers differ in the way they split the text stream ("tokenize") and in the way they process the tokens after tokenization.
Boosting & Analyzers: Lucene's out-of-the-box analyzers include StandardAnalyzer, StopAnalyzer, SimpleAnalyzer, WhitespaceAnalyzer, and KeywordAnalyzer. By default, RavenDB uses a custom analyzer called LowerCaseKeywordAnalyzer for all content. This implementation behaves like Lucene's KeywordAnalyzer, but it also performs case normalization by converting all characters to lower case. In other words, by default RavenDB stores the entire term as a single token, in lower-case form. So for a given sample text, LowerCaseKeywordAnalyzer produces a single token: the entire text in lower case.
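As a rough illustration of how analyzers differ, the following Python sketch imitates three of them on a hypothetical sample text. These functions are simplified stand-ins, not the Lucene classes: the letter-based splitter only handles ASCII letters, and stop-word removal and the StandardAnalyzer grammar are left out.

```python
import re

def whitespace_analyzer(text):
    # Split on whitespace only; case and punctuation are kept.
    return text.split()

def simple_analyzer(text):
    # Split at non-letters and lowercase every token (ASCII-only simplification).
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]

def lowercase_keyword_analyzer(text):
    # The whole input becomes a single lowercased token.
    return [text.lower()]

sample = "The Quick brown Fox"  # hypothetical sample text

whitespace_tokens = whitespace_analyzer(sample)
simple_tokens = simple_analyzer(sample)
keyword_tokens = lowercase_keyword_analyzer(sample)
```

With the single-token behaviour, a search matches only the complete lowercased value, whereas the tokenizing analyzers allow matching on individual words.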
Term Vectors & Dynamic fields: A term vector is a representation of a text document as a vector of identifiers that can be used for similarity searches, information filtering, information retrieval, and indexing. In RavenDB, features such as MoreLikeThis and text highlighting leverage term vectors. While strongly typed entities are processed well by LINQ expressions, some scenarios demand the use of dynamic properties. To support searching in object graphs that cannot have their entire structure declared upfront, RavenDB exposes a low-level API for creating fields from within index definitions.
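A similarity search over term vectors can be sketched in Python with term-frequency counts and cosine similarity. This is one common way to compare term vectors and is used here as an assumption for illustration; it is not a description of how MoreLikeThis is implemented. The document ids and texts are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    # Term frequencies of whitespace-separated, lowercased tokens.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents.
texts = {
    "posts/1": "ravendb document database",
    "posts/2": "ravendb document store",
    "posts/3": "cooking pasta recipes",
}
vectors = {doc_id: term_vector(text) for doc_id, text in texts.items()}

# Find the document most similar to posts/1, excluding posts/1 itself.
target = vectors["posts/1"]
most_similar = max(
    (doc_id for doc_id in vectors if doc_id != "posts/1"),
    key=lambda doc_id: cosine_similarity(target, vectors[doc_id]),
)
```

Documents that share more terms with the target get a higher similarity, which is the basic idea behind "more like this" style queries.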
Testing indexes and side-by-side indexes: A common problem, especially when the data set is big and indexing takes a very long time, is the need to change the index definition. Each change of the definition resets the index and restarts the indexing process (for this index) from scratch. In many cases this is fine, but not during development, when you are shaping the index and need immediate feedback from the server with the results (or at least partial results). To resolve this issue, RavenDB introduced the ability to test indexes on a limited data set. This way developers get index results immediately from a limited data set, so they can proceed with the index creation process without resetting the main index until the new definition is ready.
Testing indexes and side-by-side indexes: The side-by-side feature enables you to create an index that will replace an existing one after one of the following conditions is met: (1) the new index becomes non-stale (non-optional); (2) the new index reaches the last indexed etag (as of the moment the new side-by-side index was created) of the index that will be replaced (optional); (3) a particular date is reached (optional).