Adam Koehler Index Speed Demons - How To Turbo-Charge Your Text-Based Queries Using Full-Text Indexing.

Thank you, sponsors!

About Me
Adam Koehler, Senior Database Administrator at ScriptPro LLC
15 years of progressive experience with SQL Server, from 7.0 to 2014
E-mail: ajkoehl@gmail.com
Twitter: @sql_geek
LinkedIn: https://www.linkedin.com/in/adam-j-koehler
Blog: https://sqlgeekery.wordpress.com

What are we going to cover?
Relational and full-text indexes: What are they? How are they implemented? What are the benefits? What are the downsides?
Other search products: Apache Lucene.NET

Relational Indexes – What are they?
A set of pages organized in a B-tree structure with a multi-level hierarchy (root, intermediate, and leaf levels). Can be defined as a clustered or non-clustered index. Built into the SQL Server engine itself. Can be created on tables or views.

Relational Indexes – Clustered Indexes
Sorts and stores the table's data rows in order of the index key values. The order is logical and maintained by the B-tree; it is not guaranteed to match the physical order of pages on disk. Only one is allowed per table, because the rows can be stored in only one order (ascending or descending on each key column). The leaf level of a clustered index is the data rows themselves.

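Relational Indexes – Non-clustered Indexes
Each index row holds the key values plus a row locator: the clustered index key if the table has a clustered index, or a pointer to the data row (RID) if the table is a heap. Up to 999 can be created on a single table. Non-key columns can be added as included columns at the leaf level of the index, so a fully covered query (one whose columns are all in the index) can be answered without touching the base table.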
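The following toy Python model makes the clustered/non-clustered distinction and the "covered query" idea concrete. It is a sketch, not how the SQL Server storage engine is implemented. The clustered index is a list of full rows kept sorted by the clustering key, so its leaf level is the data itself. The non-clustered index holds (key, row locator, included columns) entries. If a lookup needs a column the non-clustered index lacks, it must make an extra "key lookup" into the clustered index. The class, table, and column names are hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ClusteredIndex:
    """Toy clustered index: the sorted row list *is* the table (its leaf level)."""
    keys: list = field(default_factory=list)  # clustering key values, kept sorted
    rows: list = field(default_factory=list)  # full data rows, in the same order

    def insert(self, key, row):
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.rows.insert(pos, row)

    def seek(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.rows[pos]
        return None


@dataclass
class NonclusteredIndex:
    """Toy non-clustered index: sorted (key, row locator, included columns) entries."""
    key_column: str
    locator_column: str
    included: tuple = ()
    entries: list = field(default_factory=list)

    def build(self, clustered):
        # The row locator is the clustering key, as when the table has a clustered index.
        self.entries = sorted(
            ((row[self.key_column], row[self.locator_column],
              {c: row[c] for c in self.included}) for row in clustered.rows),
            key=lambda e: (e[0], e[1]),
        )

    def lookup(self, value, wanted, clustered):
        """Return wanted columns for rows whose key equals value, plus the key-lookup count."""
        keys = [e[0] for e in self.entries]
        lo, hi = bisect.bisect_left(keys, value), bisect.bisect_right(keys, value)
        results, key_lookups = [], 0
        for key, locator, inc in self.entries[lo:hi]:
            available = {self.key_column: key, self.locator_column: locator, **inc}
            if not all(c in available for c in wanted):
                key_lookups += 1                  # not covered: fetch the data row
                available = clustered.seek(locator)
            results.append({c: available[c] for c in wanted})
        return results, key_lookups


table = ClusteredIndex()
for pk, name, city in [(3, "Cara", "Boston"), (1, "Ann", "Denver"), (2, "Bob", "Boston")]:
    table.insert(pk, {"id": pk, "name": name, "city": city})

covering = NonclusteredIndex("city", "id", included=("name",))
covering.build(table)
bare = NonclusteredIndex("city", "id")
bare.build(table)

print(covering.lookup("Boston", ["city", "name"], table))  # 0 key lookups (covered)
print(bare.lookup("Boston", ["city", "name"], table))      # 2 key lookups
```

Both indexes return the same rows. The covering one avoids a trip to the data row for each match, which is the benefit of included columns.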

Relational Indexes – Benefits
They're easy to implement; no additional application code is required. They're commonly used, so there's a wealth of information available on indexing strategies. They aren't limited to character data the way full-text indexes are; most data types can be key columns.

Relational Indexes – Downsides
As a table grows, so do its indexes, and as the indexes grow, queries that rely on them can take longer. Fragmentation can occur in the indexes, which increases space usage and can slow down queries. Text searches that can't seek on an index (for example, LIKE '%term%' with a leading wildcard) are evaluated row by row, comparing strings character by character, which can be slow depending on how much data you're dealing with. Certain data types can't be index key columns: varchar(max), nvarchar(max), varbinary(max), and xml (as well as the deprecated text, ntext, and image types).
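The "row by row" cost of text searches shows up in a toy Python model of a sorted index. It is a simplification that ignores collations and SQL Server's actual plan choices. A prefix pattern such as LIKE 'data%' can binary-search to the first candidate and read only the matching range. A pattern with a leading wildcard, such as LIKE '%text%', gives the sort order nothing to work with, so every value must be compared. The function names are hypothetical.

```python
import bisect


def prefix_seek(sorted_values, prefix):
    """Like LIKE 'prefix%': binary-search to the first candidate, then read forward."""
    pos = bisect.bisect_left(sorted_values, prefix)   # O(log n) seek
    matches, examined = [], 0
    while pos < len(sorted_values):
        examined += 1
        if not sorted_values[pos].startswith(prefix):
            break                                     # sorted order: no later matches
        matches.append(sorted_values[pos])
        pos += 1
    return matches, examined


def substring_scan(values, fragment):
    """Like LIKE '%fragment%': ordering can't help, so every value is compared."""
    matches, examined = [], 0
    for v in values:
        examined += 1
        if fragment in v:
            matches.append(v)
    return matches, examined


titles = sorted(["database tuning", "data warehouse", "full-text search",
                 "index basics", "text mining"])
print(prefix_seek(titles, "data"))     # reads the 2 matches plus 1 row that ends the range
print(substring_scan(titles, "text"))  # reads all 5 rows
```

On five rows the difference is trivial. On millions of rows the scan's cost grows with the table, which is the gap full-text indexing is meant to close.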

Relational Indexes – Implementation
Created with the CREATE CLUSTERED INDEX and CREATE NONCLUSTERED INDEX statements. Index details are visible through the following dynamic management views/functions and catalog views: sys.dm_db_index_physical_stats, sys.dm_db_index_operational_stats, sys.dm_db_index_usage_stats, sys.dm_db_partition_stats, sys.allocation_units

DEMO

Full-Text Indexing – What is it?
A token-based (word-based) index that allows searching against character data and binary large object (BLOB) data, such as Excel and Word documents stored in the database. Has been part of SQL Server since SQL Server 7.0. Significantly updated in SQL Server 2008 to integrate fully into the SQL Server engine.
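The core idea of a token-based index is the inverted index. The Python sketch below is a conceptual model, not SQL Server's on-disk format. Each row is split into tokens, and each token maps to the set of row keys that contain it. A word query then becomes a dictionary lookup instead of a string comparison against every row. The tokenizer is a naive stand-in for a real word breaker, and the sample rows are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive word breaker: lowercase runs of letters/digits (real ones are language-aware)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(rows):
    """Map each token to the set of row keys (full-text keys) whose text contains it."""
    index = defaultdict(set)
    for key, text in rows.items():
        for token in tokenize(text):
            index[token].add(key)
    return index


docs = {1: "SQL Server full-text search",
        2: "Relational index tuning",
        3: "Full-text index population"}
idx = build_inverted_index(docs)
print(sorted(idx["index"]))                # [2, 3]
print(sorted(idx["full"] & idx["index"]))  # [3]: rows containing both words
```

Storing row keys in the postings is also why a full-text index needs a unique key on the table: the key is how a match is traced back to its row.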

Full-Text Indexes – Architecture
Consists of two parts:
Full-Text Engine, running in sqlservr.exe: responsible for query compilation and processing.
Filter daemon host process, fdhost.exe: responsible for loading the filters that the Full-Text Engine uses. It is started by the SQL Full-text Filter Daemon Launcher service (MSSQLFDLauncher).

Full-Text Indexes – Full-Text Engine
sqlservr.exe hosts the following components of Full-Text Search:
User tables: contain the data to be full-text indexed.
Full-text gatherer: works with the full-text crawl threads to schedule and drive the population of full-text indexes, and monitors full-text catalogs.
Thesaurus files: contain synonyms of search terms; stored in <sql instance directory>\MSSQL\FTData.
Stoplist objects: lists of common words (noise words) that are not useful to search on.
Query processor: if a query contains a full-text predicate, passes it to the Full-Text Engine for compilation and execution.
Full-Text Engine: compiles and executes full-text queries.
Index writer: builds the structure used to store the indexed tokens.
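Stoplists and thesaurus files play two different roles, sketched below in Python. Stopwords are dropped before indexing. A thesaurus "expansion set" makes a query for one term also match its synonyms. This is only a model of the behavior. The word lists are hypothetical stand-ins for a real stoplist object and a thesaurus XML file. Keeping the original positions of the remaining words is a choice made by this sketch.

```python
# Hypothetical stand-ins for a stoplist object and a thesaurus file's expansion sets.
STOPLIST = {"a", "an", "and", "of", "the"}
EXPANSIONS = [{"car", "automobile", "auto"}, {"db", "database"}]


def index_tokens(tokens, stoplist=STOPLIST):
    """Drop noise words before indexing, keeping each remaining word's original position."""
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in stoplist]


def expand_term(term, expansions=EXPANSIONS):
    """Thesaurus-style expansion: a term in an expansion set matches every member of the set."""
    for group in expansions:
        if term in group:
            return set(group)
    return {term}


tokens = "the history of the automobile".split()
print(index_tokens(tokens))        # [(1, 'history'), (4, 'automobile')]
print(sorted(expand_term("car")))  # ['auto', 'automobile', 'car']
print(expand_term("engine"))       # {'engine'}: no thesaurus entry, so no expansion
```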

Full-Text Indexes – FD Host Process
Responsible for accessing, filtering, and word breaking data from tables, and for word breaking and stemming the query input. Has the following components:
Protocol handler: pulls data from memory for further processing and accesses data from user tables.
Filters: data in varbinary, varbinary(max), image, or xml columns must be filtered before it can be indexed. The filter used depends on the document type; it extracts chunks of text from the document, removing formatting and keeping the text and position information.
Word breakers and stemmers: language-specific components. A word breaker finds word boundaries based on the lexical rules of a given language (word breaking); a stemmer conjugates verbs and performs inflectional expansions. At indexing time, the filter daemon host uses them to perform linguistic analysis on the text data in each column, based on the language specified for that column in the full-text index.
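Word breaking and stemming can be sketched as a small pipeline in Python. Real SQL Server stemmers are language-specific components. Here a small hypothetical inflection table stands in for one, which lets irregular forms such as "drove" map to "drive". The key point is that the same analysis runs on the indexed column and on the query input, so different forms of a word still meet at a common stem.

```python
import re

# Hypothetical inflection table standing in for a language-specific stemmer.
LEMMAS = {
    "drives": "drive", "drove": "drive", "driving": "drive", "driven": "drive",
    "indexes": "index", "indexed": "index", "indexing": "index",
}


def word_break(text):
    """Word breaker: split on anything that is not a letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def analyze(text):
    """Word-break, lowercase, and stem each word, keeping its position in the text."""
    return [(pos, LEMMAS.get(w.lower(), w.lower())) for pos, w in enumerate(word_break(text))]


# The same analysis runs on the indexed column and on the query input.
doc_terms = {term for _, term in analyze("He drove to the office")}
query_terms = {term for _, term in analyze("Driving")}
print(analyze("She was indexing tables"))  # [(0, 'she'), (1, 'was'), (2, 'index'), (3, 'tables')]
print(query_terms <= doc_terms)            # True: 'driving' and 'drove' share the stem 'drive'
```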

Full-Text Indexes – Search Processing
https://docs.microsoft.com/en-us/sql/relational-databases/search/full-text-search

Full-Text Indexes – Benefits
Allows linguistic searches (words, phrases, prefixes, inflectional forms, synonyms, and proximity) against text columns in the database. As long as automatic population (CHANGE_TRACKING AUTO) is turned on, full-text index maintenance is fairly simple. The full-text index on a table is usually smaller than a relational index on the same table.

Full-Text Indexes – Downsides
Requires modifying existing code so that searches use the full-text predicates. Only one full-text index is allowed per table. Can only be created on columns of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, and varbinary(max).

Full-Text Indexes – Implementation
The filter daemon launcher service (MSSQLFDLauncher) must be started. Named Pipes must be an enabled network protocol for SQL Server. You must first create a full-text catalog (CREATE FULLTEXT CATALOG) to group full-text indexes together; a database can have multiple catalogs. The table you're putting the full-text index on must have a unique key index (for example, a primary key or unique index) on a single, non-nullable column.

Full-Text Indexes – Implementation
Visibility into the full-text subsystem is available through the following catalog views, DMVs, and DMFs:
Database level: sys.fulltext_indexes, sys.fulltext_catalogs, sys.fulltext_stopwords, sys.fulltext_stoplists, sys.dm_fts_index_keywords, sys.dm_fts_index_keywords_by_document, sys.dm_fts_index_keywords_position_by_document
Instance level: sys.dm_fts_active_catalogs, sys.dm_fts_fdhosts, sys.dm_fts_index_population, sys.dm_fts_memory_buffers, sys.dm_fts_memory_pools, sys.dm_db_fts_index_physical_stats, sys.dm_fts_parser

Full-Text Indexes – Usage
To use the full-text index, a query must include one of the following: the CONTAINS or FREETEXT predicates (in the WHERE clause), or the CONTAINSTABLE or FREETEXTTABLE rowset functions (in the FROM clause).

Full-Text Indexes – CONTAINS
Used in the WHERE clause of a query. Searches for precise or fuzzy (less precise) matches to single words and phrases. Can search for:
The prefix of a word or phrase
A word near another word
A word inflectionally generated from another (e.g., drive, drives, drove, driving, driven)
Synonyms of a word, using a thesaurus
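Prefix and proximity searches both depend on the index recording where each word occurs. In T-SQL they are written as prefix terms like CONTAINS(col, '"data*"') and as NEAR terms like CONTAINS(col, 'NEAR((full, text), 1)'). The Python sketch below models them over a positional inverted index. It is only a model. Distance here means the difference in word positions, and SQL Server's exact rule for counting the maximum distance may differ. The function names and sample rows are hypothetical.

```python
import re
from collections import defaultdict


def positional_index(rows):
    """term -> {key: [positions]}; positions make proximity checks possible."""
    index = defaultdict(lambda: defaultdict(list))
    for key, text in rows.items():
        for pos, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[term][key].append(pos)
    return index


def prefix_match(index, prefix):
    """Prefix term: keys of rows containing any indexed term that starts with prefix."""
    return {key for term, postings in index.items() if term.startswith(prefix)
            for key in postings}


def near_match(index, a, b, max_distance):
    """Keys where a and b occur within max_distance positions of each other, in either order."""
    a_pos, b_pos = index.get(a, {}), index.get(b, {})
    return {
        key for key in a_pos.keys() & b_pos.keys()
        if any(abs(pa - pb) <= max_distance for pa in a_pos[key] for pb in b_pos[key])
    }


docs = {1: "full text search in sql server",
        2: "search the full database text",
        3: "database design"}
idx = positional_index(docs)
print(sorted(prefix_match(idx, "data")))          # [2, 3]
print(sorted(near_match(idx, "full", "text", 1)))  # [1]
print(sorted(near_match(idx, "full", "text", 2)))  # [1, 2]
```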

Full-Text Indexes – CONTAINSTABLE
Returns a table of zero, one, or more rows for the queried columns that contain precise or fuzzy matches to single words and phrases, words within a specified distance of one another, or weighted matches. Used in the FROM clause. Returns the full-text key (KEY) and a relevance ranking value (RANK) for each row.
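Because CONTAINSTABLE returns only KEY and RANK, the usual pattern is to join its output back to the base table on the key column and order by RANK. The Python sketch below models that shape. The ranking is a deliberately simple stand-in: an occurrence count scaled to the 0 to 1000 range that RANK values fall in. It is not SQL Server's ranking algorithm. The function name and data are hypothetical.

```python
import re


def containstable(rows, term, top_n=None):
    """Return (KEY, RANK) pairs for rows containing term, best first.

    RANK here is an occurrence count scaled to 0-1000, a stand-in only.
    """
    counts = {}
    for key, text in rows.items():
        n = re.findall(r"[a-z0-9]+", text.lower()).count(term)
        if n:
            counts[key] = n
    if not counts:
        return []
    best = max(counts.values())
    ranked = sorted(((k, round(1000 * n / best)) for k, n in counts.items()),
                    key=lambda kr: (-kr[1], kr[0]))
    return ranked if top_n is None else ranked[:top_n]


docs = {10: "index tuning and index design", 20: "full-text index", 30: "query plans"}
hits = containstable(docs, "index")
# Join on KEY, as in: ... INNER JOIN CONTAINSTABLE(...) AS ft ON t.id = ft.[KEY]
joined = [(key, docs[key], rank) for key, rank in hits]
print(joined)  # [(10, 'index tuning and index design', 1000), (20, 'full-text index', 500)]
```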

Full-Text Indexes – FREETEXT
Used in the WHERE clause of a query. Searches for values that match the meaning, and not just the exact wording, of the search criteria. Queries using FREETEXT are less precise than those using CONTAINS: a match is generated if any term, or any form of any term, is found.
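The precision difference between FREETEXT and a CONTAINS 'w1 AND w2' search comes down to "any term in any form" versus "every term exactly as written". The Python sketch below contrasts the two. It is a model under stated assumptions: the stopword list and the word-form table are hypothetical stand-ins for the real stoplist and stemmer, and the thesaurus step that FREETEXT also applies is left out.

```python
import re

STOPWORDS = {"a", "an", "the", "of"}                                              # hypothetical
FORMS = {"drove": "drive", "driving": "drive", "drives": "drive", "cars": "car"}  # hypothetical


def tokens(text):
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def freetext(rows, query):
    """FREETEXT-style: match if ANY query term, in any of its forms, appears in the row."""
    wanted = {FORMS.get(w, w) for w in tokens(query)}
    return sorted(k for k, text in rows.items()
                  if wanted & {FORMS.get(w, w) for w in tokens(text)})


def contains_all_exact(rows, query):
    """Stricter CONTAINS-style 'w1 AND w2': every query word must appear exactly as typed."""
    wanted = set(tokens(query))
    return sorted(k for k, text in rows.items() if wanted <= set(tokens(text)))


docs = {1: "She drove the car", 2: "Driving tips", 3: "Parking garages"}
print(freetext(docs, "driving cars"))            # [1, 2]
print(contains_all_exact(docs, "driving cars"))  # []
```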

Full-Text Indexes – FREETEXTTABLE
Uses the same search conditions as FREETEXT, but also returns a rank and key value for each row. Used in the FROM clause of a query, like CONTAINSTABLE.
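Microsoft's documentation describes FREETEXTTABLE ranking as based on the Okapi BM25 formula. The Python sketch below is a textbook BM25 with the common defaults k1 = 1.2 and b = 0.75, plus the log(1 + ...) IDF variant that keeps scores positive. It shows the shape of that ranking, not SQL Server's exact parameters or its scaling into RANK values. Rows with more occurrences of rarer query terms score higher, and long rows are penalized relative to the average length.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(rows, query, k1=1.2, b=0.75):
    """Score each row against the query terms; return (key, score) pairs, best first."""
    docs = {k: tokenize(t) for k, t in rows.items()}
    n = len(docs)
    avgdl = sum(len(d) for d in docs.values()) / n
    scores = {}
    for term in set(tokenize(query)):
        df = sum(1 for d in docs.values() if term in d)  # rows containing the term
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # rarer terms weigh more
        for key, d in docs.items():
            tf = d.count(term)
            if tf:
                norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
                scores[key] = scores.get(key, 0.0) + idf * norm
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {1: "full text index", 2: "full text search full text", 3: "relational index"}
ranked = bm25_scores(docs, "text search")
print([key for key, _ in ranked])  # [2, 1]: row 2 matches both terms
```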

DEMO

Apache Lucene.NET – What is it?
A port of the Java Lucene search library to .NET. Based on an inverted index: a mapping from content (terms) to its locations in files or in a database. Used in search engine indexing.

Apache Lucene.NET – Benefits
Allows C# developers to index documents and tables while having to learn only basic T-SQL constructs. Is a library that can be built into C# applications with minimal effort. Does not interact with the database except when the application runs the queries that build and maintain the index files on disk.

Apache Lucene.NET – Downsides
The index files exist separately on disk; they must be maintained and backed up with file backups (database backups don't include them) so that search keeps working. The queries against the index files can't be tuned without recompiling the application, unless the database queries that feed the index are in stored procedures, in which case you can tune those stored procedures.

Apache Lucene.NET – Implementation
Main components of Lucene.NET:
Analyzer: breaks text and search criteria down into individual words/terms.
IndexWriter: works with the Analyzer and writes the results to storage.
IndexSearcher: performs the actual search against the index files.
Document: the entity to be retrieved through the index (comparable to a row in a database table).
Field: the data that describes a document; this is what is searchable (comparable to the columns of a table).
Directory: the location where the index files are stored.
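To show how the Analyzer, IndexWriter, IndexSearcher, Document, Field, and Directory roles fit together, here is a small Python model. It is not the Lucene.NET API. The class names only echo the roles, and the method names, the JSON file format, and the "field:term" posting keys are hypothetical. Each table row becomes a document of fields, and the writer analyzes the searchable fields into an index file inside a directory. The searcher then reads that file back to answer queries, which is also why those files need their own backups.

```python
import json
import os
import re
import tempfile
from collections import defaultdict


class Analyzer:
    """Breaks field text or query text into lowercase terms (a stand-in for a real analyzer)."""
    def analyze(self, text):
        return re.findall(r"[a-z0-9]+", text.lower())


class IndexWriter:
    """Analyzes documents and writes an inverted index file into the directory."""
    def __init__(self, directory, analyzer):
        self.path = os.path.join(directory, "index.json")
        self.analyzer = analyzer
        self.postings = defaultdict(set)
        self.stored = {}

    def add_document(self, doc_id, fields, searchable):
        self.stored[doc_id] = fields                     # keep field values for results
        for name in searchable:                          # only these fields are searchable
            for term in self.analyzer.analyze(fields[name]):
                self.postings[f"{name}:{term}"].add(doc_id)

    def commit(self):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump({"postings": {t: sorted(ids) for t, ids in self.postings.items()},
                       "stored": self.stored}, f)


class IndexSearcher:
    """Loads the index file and returns documents containing every query term in a field."""
    def __init__(self, directory, analyzer):
        with open(os.path.join(directory, "index.json"), encoding="utf-8") as f:
            data = json.load(f)
        self.postings = data["postings"]
        self.stored = {int(k): v for k, v in data["stored"].items()}  # JSON keys are strings
        self.analyzer = analyzer

    def search(self, field, text):
        terms = self.analyzer.analyze(text)
        if not terms:
            return []
        ids = set(self.postings.get(f"{field}:{terms[0]}", []))
        for term in terms[1:]:
            ids &= set(self.postings.get(f"{field}:{term}", []))
        return [self.stored[i] for i in sorted(ids)]


rows = [(1, "Full-text indexing", "Speeds up text queries"),
        (2, "Clustered indexes", "B-tree storage")]
with tempfile.TemporaryDirectory() as index_dir:      # the Directory holding index files
    analyzer = Analyzer()
    writer = IndexWriter(index_dir, analyzer)
    for row_id, title, body in rows:                  # each table row becomes a Document
        writer.add_document(row_id, {"title": title, "body": body}, searchable=("title", "body"))
    writer.commit()
    searcher = IndexSearcher(index_dir, analyzer)
    results = searcher.search("body", "text queries")
print([doc["title"] for doc in results])  # ['Full-text indexing']
```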

DEMO

Summary
Relational indexes are the easiest to implement and give good performance boosts on your systems. Full-text indexes extend what can be indexed in your database, allow search-engine-like queries against SQL Server, and can speed up character-based queries dramatically. Apache Lucene.NET is a good fit for C# developers, but not something for DBAs to implement.

Links, and Thank You!
CREATE FULLTEXT INDEX: http://bit.ly/2wwBLhR
Query with Full-Text Search: http://bit.ly/2xcTpvB
Apache Lucene.NET: http://bit.ly/2fVhorU
Lucene.NET main concepts: http://bit.ly/2xXjPka
Lucene.NET sample application: http://bit.ly/2yFt85l
E-mail: ajkoehl@gmail.com
Twitter: @sql_geek
LinkedIn: http://www.linkedin.com/in/adam-j-koehler