Safe by default, optimized for efficiency


RAVENDB
- Open source
- Document database
- Built with C#
- Data saved as JSON
- Uses Lucene.NET for indexing
- Uses Esent for storage
- Latest stable version: 3.5.3
- Available only for Windows and 64-bit Linux

RAVENDB users
And many more…

Fundamentals
- No schema
- Documents are stored as JSON
- The main focus of RavenDB is to allow developers to build high-performance, low-latency applications quickly and efficiently.
{ "Address": "Vinarska", "City": "I love Brno", "PostalCode": 60200 }

Queries and Indexes
- Documents are stored in collections
- Indexing of deeply nested properties
- MapReduce support
- Supports static and ad-hoc indexes
- Indexing is performed in the background
Map(k1, v1) → list(k2, v2)
Reduce(k2, list(v2)) → list(v3)
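The two signatures above can be sketched in plain Python. This is only an illustration of the Map/Reduce shape, not RavenDB's index syntax; the document ids, the `map_city` and `reduce_count` functions, and the `map_reduce` driver are all made up for the example.

```python
from collections import defaultdict

def map_city(doc_id, doc):
    # Map(k1, v1) -> list(k2, v2): emit one (city, 1) pair per document.
    return [(doc["City"], 1)]

def reduce_count(city, counts):
    # Reduce(k2, list(v2)) -> list(v3): collapse all values for one key.
    return [sum(counts)]

def map_reduce(docs, map_fn, reduce_fn):
    # Group the mapped pairs by key, then reduce each group.
    groups = defaultdict(list)
    for k1, v1 in docs.items():
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in groups.items()}

# Hypothetical documents, shaped like the JSON sample on the Fundamentals slide.
docs = {
    "addresses/1": {"City": "Brno", "PostalCode": 60200},
    "addresses/2": {"City": "Brno", "PostalCode": 61200},
    "addresses/3": {"City": "Praha", "PostalCode": 11000},
}
result = map_reduce(docs, map_city, reduce_count)
print(result)
```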

Scaling & Replication
- Sharding support
- Replication support
- Full backup support

MapReduce
- MapReduce is done as indexes
- In RavenDB, MapReduce is defined as an index and is precalculated in the background
- It does not support MapReduce pipelines

Querying
The basic operation in RavenDB

Querying: Insertion
Done on the server side.

Querying: Filtering
Returns records that match the given condition.

Querying: Searching
Uses the WHERE clause to create conditions.

Querying: Paging
Splits the query results into pages and reads one page at a time.
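Paging is usually expressed as "skip some results, then take one page". A minimal Python sketch of that idea follows; the `page` helper and the sample ids are assumptions for illustration, not part of the RavenDB client API.

```python
def page(results, page_number, page_size):
    """Return one page of results; page_number starts at 0."""
    start = page_number * page_size          # how many results to skip
    return results[start:start + page_size]  # how many results to take

# Seven hypothetical result ids, read three at a time.
results = [f"orders/{i}" for i in range(1, 8)]
first = page(results, 0, 3)
last = page(results, 2, 3)
print(first, last)
```

The last page may be shorter than the page size, and a page past the end is simply empty.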

Indexing
Indexes are server-side functions that define which fields (and which values) documents can be searched on, and they are the only way to satisfy queries in RavenDB. The whole indexing process runs in the background and is triggered whenever data is added or changed. The core of every index is its mapping function, written in a LINQ-like syntax. The result of the mapping is converted to a Lucene index entry, which is persisted for future use to avoid re-indexing each time a query is issued and to achieve fast response times. Even when you do not create an index, RavenDB will use one to execute queries. In general there are no O(N) operations in RavenDB queries: using indexes, queries are O(log N) operations.
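To see why an index turns an O(N) scan into an O(log N) lookup, here is a small Python sketch that keeps index entries sorted and finds a term by binary search. It illustrates the principle only; Lucene's on-disk structures are more elaborate, and `build_index`, `lookup` and the sample documents are hypothetical.

```python
import bisect

def build_index(docs, field):
    # One sorted (term, document id) entry per document that has the field.
    return sorted(
        (str(doc[field]).lower(), doc_id)
        for doc_id, doc in docs.items()
        if field in doc
    )

def lookup(index, term):
    # Binary search for the first entry with this term: O(log N).
    term = term.lower()
    pos = bisect.bisect_left(index, (term, ""))
    hits = []
    while pos < len(index) and index[pos][0] == term:
        hits.append(index[pos][1])
        pos += 1
    return hits

docs = {
    "addresses/1": {"City": "Brno"},
    "addresses/2": {"City": "Brno"},
    "addresses/3": {"City": "Praha"},
}
city_index = build_index(docs, "City")
print(lookup(city_index, "Brno"))
```

The index is built once, ahead of the query, which mirrors the background indexing described above.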

Indexing
RavenDB is safe by default: whenever you make a query, the query optimizer tries to select an appropriate index. If no appropriate index exists, the query optimizer creates one for you.
- Map indexes (sometimes referred to as simple indexes) contain one or more mapping functions that indicate which fields from documents should be indexed (in other words, which documents can be searched by which fields).
- Multi-map indexes allow you to index data from multiple collections, e.g. polymorphic data.
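The "select an index, or create one" behaviour can be sketched as follows. This is a toy model of the decision only: the `IndexRegistry` class, its matching rule (an index fits if it covers all queried fields) and the generated names are assumptions for illustration, not RavenDB's actual optimizer.

```python
class IndexRegistry:
    def __init__(self):
        # index name -> (collection, set of indexed fields)
        self.indexes = {}

    def find_or_create(self, collection, fields):
        wanted = frozenset(fields)
        # Reuse any existing index that covers every queried field.
        for name, (coll, covered) in self.indexes.items():
            if coll == collection and wanted <= covered:
                return name, False
        # Otherwise create a new index for exactly these fields.
        name = "Auto/" + collection + "/By" + "And".join(sorted(wanted))
        self.indexes[name] = (collection, wanted)
        return name, True

registry = IndexRegistry()
first = registry.find_or_create("Addresses", ["City"])
second = registry.find_or_create("Addresses", ["City"])
third = registry.find_or_create("Addresses", ["City", "PostalCode"])
print(first, second, third)
```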

Indexing: Map-Reduce Index
Map-Reduce indexes allow complex aggregations to be performed in a two-step process: first by selecting appropriate records (using the Map function), then by applying the specified Reduce function to these records to produce a smaller set of results. In essence, it is a way to take a big task and divide it into discrete tasks that can be done in parallel.
The notion of stale indexes comes from an observation deep in RavenDB's design: the user should never suffer from assigning the server big tasks. As far as RavenDB is concerned, it is better to be stale than offline, so it returns results to queries even if it knows they may not be as up to date as possible.
A fanout index is an index that outputs multiple index entries for each document.
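A fanout can be pictured as a map function that returns several entries for one document. The sketch below assumes a hypothetical order document with a `Lines` list; the field names and the `fanout_map` function are illustrative, not taken from the slides.

```python
def fanout_map(doc_id, order):
    # One index entry per order line, so one document fans out to many entries.
    return [
        {"DocumentId": doc_id, "Product": line["Product"], "Quantity": line["Quantity"]}
        for line in order["Lines"]
    ]

order = {
    "Customer": "customers/1",
    "Lines": [
        {"Product": "products/1", "Quantity": 2},
        {"Product": "products/2", "Quantity": 1},
    ],
}
entries = fanout_map("orders/1", order)
print(len(entries))
```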

Customizing using sort
Indexes in RavenDB are lexicographically sorted by default, so all queries return results ordered lexicographically. When putting a static index in RavenDB, you can specify custom sorting requirements to ensure results are sorted the way you want. Dates are written to the index in a form that preserves lexicographic order and is readable by both humans and machines (like so: 2011-04-04T11:28:46.0404749+03:00), so they require no user intervention. Numerical values, on the other hand, are stored as text, so the user must explicitly specify the number type for a correct sorting mechanism to be enforced. This is easily done by declaring the required sorting setup in SortOptions.
Numeric sort option: 1, 2, 3, 11
String sort option: 1, 11, 2, 3
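The difference between the two orderings is easy to reproduce in Python. The sketch below sorts the same values as text and as numbers, and shows that ISO-style timestamps (hypothetical values, all without a time zone offset) already sort chronologically as plain strings.

```python
values = ["1", "2", "3", "11"]

as_text = sorted(values)              # compared character by character
as_numbers = sorted(values, key=int)  # compared by numeric value

# Timestamps written most-significant-part first sort correctly as text.
dates = ["2011-04-04T11:28:46", "2010-12-31T23:59:59", "2011-01-15T08:00:00"]
chronological = sorted(dates)

print(as_text, as_numbers, chronological)
```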

Boosting & Analyzers
Boosting is another feature that the Lucene engine provides and RavenDB leverages. It gives the user the ability to manually tune the relevance level of matching documents when performing a query. From the index perspective, we can associate a boosting factor with an index entry: the higher its value, the more relevant the term will be. To do this we must use the Boost extension method from Raven.Client.Linq.Indexing.
The indexes each RavenDB server instance uses to facilitate fast queries are powered by Lucene, the full-text search engine.
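The effect of a boosting factor can be shown with a deliberately simplified scoring function. This is not Lucene's scoring formula: the sketch just adds up the boost of every field that contains the term, and the entries, field names and boost values are invented for the example.

```python
def score(entry, term):
    # entry maps a field name to (text, boost); a match contributes its boost.
    total = 0.0
    for text, boost in entry.values():
        if term.lower() in text.lower().split():
            total += boost
    return total

# Matches in Title are boosted ten times more than matches in Body.
entry_a = {"Title": ("RavenDB indexing", 10.0), "Body": ("notes on lucene", 1.0)}
entry_b = {"Title": ("Lucene notes", 10.0), "Body": ("ravendb uses lucene", 1.0)}

ranked = sorted(
    [("a", entry_a), ("b", entry_b)],
    key=lambda item: score(item[1], "ravendb"),
    reverse=True,
)
print([name for name, _ in ranked])
```

Both entries mention the term, but the one that matches in the boosted field ranks first.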

Boosting & Analyzers
Lucene takes a Document, breaks it down into Fields, and then splits all the text in a Field into tokens (Terms) in a process called tokenization. Those tokens are what is stored in the index and later searched. After a successful indexing operation, RavenDB feeds Lucene each entity from the results as a Document and marks every property in it as a Field. Every property then goes through tokenization using an object called a "Lucene Analyzer", and is finally stored in the index.
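The Document, Field, token pipeline can be sketched as a tiny inverted index in Python. The `tokenize` function here (lower-case, split on whitespace) stands in for an analyzer and is an assumption, as are the sample documents.

```python
from collections import defaultdict

def tokenize(text):
    # Stand-in for an analyzer: lower-case and split on whitespace.
    return text.lower().split()

def index_documents(docs):
    # Maps (field, token) to the set of document ids containing that token.
    inverted = defaultdict(set)
    for doc_id, doc in docs.items():          # each entity is a Document
        for field, value in doc.items():      # each property is a Field
            for token in tokenize(str(value)):
                inverted[(field, token)].add(doc_id)
    return inverted

docs = {
    "addresses/1": {"Address": "Vinarska", "City": "I love Brno"},
    "addresses/2": {"Address": "Kounicova", "City": "Brno"},
}
inverted = index_documents(docs)
print(sorted(inverted[("City", "brno")]))
```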

Boosting & Analyzers
After the tokenization and analysis process is complete, the resulting tokens are stored in an index, which is then ready to be searched. Only fields in the final index projection can be used for searches, and the actual tokens stored for each field depend on how the selected analyzer processed the original text. Lucene allows storing the original token text for fields, and RavenDB exposes this feature in the index definition object via Stores.
Lucene offers several out-of-the-box analyzers, and new ones can be created easily. Analyzers differ in the way they split the text stream ("tokenize") and in the way they process those tokens after tokenization.

Boosting & Analyzers
- StandardAnalyzer
- StopAnalyzer
- SimpleAnalyzer
- WhitespaceAnalyzer
- KeywordAnalyzer
By default, RavenDB uses a custom analyzer called LowerCaseKeywordAnalyzer for all content. This implementation behaves like Lucene's KeywordAnalyzer, but it also performs case normalization by converting all characters to lower case. In other words, by default, RavenDB stores the entire term as a single token, in lower-case form. So for a given sample text, LowerCaseKeywordAnalyzer produces a single token: the whole text in lower case.
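The differences between some of these analyzers can be approximated in a few lines of Python. These functions are rough stand-ins written for this illustration, not the Lucene classes themselves; StandardAnalyzer and StopAnalyzer are left out because their rules are more involved, and the sample text is hypothetical (the slide's own sample did not survive in the transcript).

```python
import re

def whitespace_analyzer(text):
    # Splits on whitespace only; case and punctuation are kept.
    return text.split()

def simple_analyzer(text):
    # Splits on anything that is not a letter, then lower-cases.
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]

def keyword_analyzer(text):
    # The whole input becomes one token, unchanged.
    return [text]

def lower_case_keyword_analyzer(text):
    # Like keyword_analyzer, but lower-cased.
    return [text.lower()]

sample = "The quick Fox, id 42"
for analyzer in (whitespace_analyzer, simple_analyzer,
                 keyword_analyzer, lower_case_keyword_analyzer):
    print(analyzer.__name__, analyzer(sample))
```

With a keyword-style default, a query matches only on the exact (case-insensitive) value, which is why full-text search needs a different analyzer on the relevant field.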

Term Vectors & Dynamic fields
A term vector is a representation of a text document as a vector of identifiers that can be used for similarity searches, information filtering, information retrieval, and indexing. In RavenDB, features like MoreLikeThis and text highlighting rely on term vectors.
While strongly typed entities are handled well by LINQ expressions, some scenarios demand the use of dynamic properties. To support searching in object graphs that cannot have their entire structure declared upfront, RavenDB exposes a low-level API for creating fields from within index definitions.
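A term vector and a similarity search over it can be sketched with term counts and cosine similarity. This is a generic illustration of the idea, not how MoreLikeThis is implemented; the `term_vector` and `cosine` helpers and the sample texts are assumptions.

```python
import math
from collections import Counter

def term_vector(text):
    # Term -> number of occurrences in the text.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine of the angle between two term vectors: 1.0 means same direction.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

a = term_vector("ravendb uses lucene for indexing")
b = term_vector("lucene indexing in ravendb")
c = term_vector("paging splits results")

print(round(cosine(a, b), 3), cosine(a, c))
```

Documents that share many terms score close to 1, while documents with no terms in common score 0.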

Testing Indexes and Side-by-Side Indexes
A common problem, especially when the data set is large and indexing takes a very long time, is the need to change the index definition. Each change of the definition resets the index and restarts the indexing process (for this index) from scratch. In many cases this is fine, but not during development, when you are shaping the index and want immediate feedback from the server with results (or at least partial results). To resolve this issue, RavenDB introduced the ability to test indexes on a limited data set. This way developers get index results immediately from a limited data set, so they can proceed with the index creation process without resetting the main index until the new definition is ready.

Testing Indexes and Side-by-Side Indexes
This feature lets you create an index that replaces another one after one of the following conditions is met:
- the new index becomes non-stale (required)
- the new index reaches the last indexed etag (as of the moment the side-by-side index was created) of the index that will be replaced (optional)
- a particular date is reached (optional)