Connect with life Nauzad Kapadia Quartz Systems
Session Objectives And Takeaways
Session Objective(s):
- Discover what we have learned from using the new Integrated Full Text Search (iFTS)
- Find out what works and what doesn’t work
Takeaways:
- iFTS is faster than SQL 2005 full text
- A great base for future improvements
Background
- Based on MSSearch
- Lack of integration limited query performance
- Limited the ability to integrate with high-availability and scalability functionality
New Architecture
[Architecture diagram. SQL Server process: T-SQL parser; algebrizer with SQL and FTS parts (parse, bind, language module, ranking function integration); query optimizer (QO) with FTLogicalOperator and cardinality; query execution (QE) with FTExecutionOperator; integrated SQL/FTS query tree; query execution plan; full-text index; results. FDHOST process: word breakers (WB), stemmer, iFilters, connected to the WB client through shared memory. Also shown: STOPLIST, THESAURUS.]
Demo – Using Full Text Search
StopLists
- New STOPLIST support
- Simplifies the use and management of noise words
- A database object associated with the full-text index
Syntax:
    CREATE FULLTEXT STOPLIST stoplist_name
        [ FROM { [ database_name. ] source_stoplist_name } | SYSTEM STOPLIST ]
        [ AUTHORIZATION owner_name ]
    ALTER FULLTEXT STOPLIST stoplist_name
        { ADD 'stopword' LANGUAGE language_term
        | DROP { 'stopword' LANGUAGE language_term
               | ALL LANGUAGE language_term
               | ALL } }
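As a minimal sketch of how these statements fit together, the following T-SQL creates a stoplist from the system stoplist, adjusts it, and attaches it to a new full-text index. The names DemoStoplist, DemoCatalog, dbo.Documents, Body and PK_Documents are hypothetical, and the table is assumed to already exist with a unique, single-column, non-nullable key index.

```sql
-- Hypothetical names throughout. Assumes dbo.Documents(DocId, Body) exists
-- with a unique, non-nullable, single-column index PK_Documents on DocId.

-- Start from a copy of the system stoplist.
CREATE FULLTEXT STOPLIST DemoStoplist FROM SYSTEM STOPLIST;

-- Add two stop words for English (LCID 1033).
ALTER FULLTEXT STOPLIST DemoStoplist ADD 'regards' LANGUAGE 1033;
ALTER FULLTEXT STOPLIST DemoStoplist ADD 'sincerely' LANGUAGE 1033;

-- Remove one of them again.
ALTER FULLTEXT STOPLIST DemoStoplist DROP 'sincerely' LANGUAGE 1033;

-- Create a catalog and a full-text index that uses the custom stoplist.
CREATE FULLTEXT CATALOG DemoCatalog;

CREATE FULLTEXT INDEX ON dbo.Documents (Body LANGUAGE 1033)
    KEY INDEX PK_Documents
    ON DemoCatalog
    WITH STOPLIST = DemoStoplist;
```

Because the stoplist is a database object, the same stoplist can be attached to several full-text indexes in that database.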
Demo – Stop Lists and Creating FullText Indexes
Thesaurus improvements
- Stored in internal tables (in tempdb) in XML form instead of being parsed from external files
- Instance-level thesaurus
- sys.sp_fulltext_load_thesaurus_file (lcid): loads all the data specified in the thesaurus XML for the language with the specified LCID
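A short sketch of the reload-and-query cycle follows. It assumes the English thesaurus (LCID 1033) has been edited to contain an entry for the word "run"; the table dbo.Documents and its columns are hypothetical.

```sql
-- Reload the thesaurus for English (LCID 1033) after editing it.
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Inspect how the thesaurus expands or replaces a term
-- (system stoplist id 0, accent-insensitive).
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser('FORMSOF(THESAURUS, run)', 1033, 0, 0);

-- Hypothetical table and column: match 'run' plus whatever the
-- thesaurus defines for it.
SELECT DocId
FROM dbo.Documents
WHERE CONTAINS(Body, 'FORMSOF(THESAURUS, run)');
```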
Demo - Thesaurus
New family of Word-Breakers (WB)
- WBs are the components responsible for parsing the textual data in a given language and passing the tokenized result to the full-text index.
- 51 languages/WBs out of the box
- Improved quality in many existing word breakers
WBs available in SQL Server 2008:
English, English UK, Simplified Chinese, Traditional Chinese, Chinese (Hong Kong), Chinese (Macau), Chinese (Singapore), Thai, Korean, French, German, Japanese, Italian, Spanish, Bengali, Bulgarian, Catalan, Croatian, Neutral, Punjabi, Romanian, Serbian Cyrillic, Serbian Latin, Slovak, Slovenian, Tamil, Telugu, Ukrainian, Urdu, Lithuanian, Malay, Icelandic, Indonesian, Hindi, Gujarati, Vietnamese, Arabic, Norwegian, Portuguese, Brazilian, Russian, Dutch, Malayalam, Marathi, Hebrew, Kannada, Latvian, Swedish, Danish, Polish, Turkish
Legend categories on the slide:
- Languages present but disabled by default
- New languages supported in SQL Server 2008
- Existing in SQL Server 2005, and being replaced by new WBs in SQL Server 2008
- Unchanged language/WB from SQL Server 2005
The indexing performance has improved in most scenarios

Workload                     2005 Crawl   2005 Total   iFTS Crawl   iFTS Total
20M rows, 1k text data       02:06        02:25        01:22        01:28
5M rows, 8k text data        02:10        02:41        02:22        02:32
20M rows, 1k nvarchar data   01:37        01:55        01:20        01:26

- Measured on a 4-processor AMD machine with 8 GB RAM. Times are in HH:MM format.
- Total time combines the time to crawl and the time to merge into the index.
- For some hardware configurations and data types, specific best practices are recommended to improve indexing performance (e.g. capping SQL Server’s memory).
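To put the totals in perspective, the following sketch converts the "Total" columns to minutes and computes how much less time iFTS needed for each workload. The column aliases are illustrative only; the figures are the ones quoted on the slide.

```sql
-- Total indexing time in minutes, taken from the slide (HH:MM converted by hand).
SELECT workload,
       total_2005_min,
       total_ifts_min,
       CAST(100.0 * (total_2005_min - total_ifts_min) / total_2005_min
            AS decimal(5, 1)) AS pct_less_time
FROM (VALUES
        ('20M rows, 1k text data',     2 * 60 + 25, 1 * 60 + 28),
        ('5M rows, 8k text data',      2 * 60 + 41, 2 * 60 + 32),
        ('20M rows, 1k nvarchar data', 1 * 60 + 55, 1 * 60 + 26)
     ) AS t (workload, total_2005_min, total_ifts_min);
```

On these numbers the total time drops by roughly 39%, 6% and 25% respectively. In the 8k-text case the iFTS crawl itself is slower (02:22 against 02:10) and the gain comes from the merge phase.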
To see the word frequency:
- sys.dm_fts_index_keywords()
- sys.dm_fts_index_keywords_by_document()
To get the number and size of fragments:
- sys.fulltext_index_fragments
To understand query behavior:
- sys.dm_fts_parser('"This is test" AND "This is also a test"', 1033, 0, 0)
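These views and functions can be queried as in the sketch below. It assumes a full-text indexed table named dbo.Documents in the current database, which is a hypothetical name; the column names are those exposed by the views themselves.

```sql
-- Ten keywords that appear in the most documents.
SELECT TOP (10) display_term, column_id, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('dbo.Documents'))
ORDER BY document_count DESC;

-- How often one term occurs in each document.
SELECT document_id, occurrence_count
FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('dbo.Documents'))
WHERE display_term = N'test';

-- Number and total size of index fragments per table.
SELECT OBJECT_NAME(table_id) AS table_name,
       COUNT(*)              AS fragment_count,
       SUM(data_size)        AS total_bytes
FROM sys.fulltext_index_fragments
GROUP BY table_id;

-- How the word breaker tokenizes a query string for English (LCID 1033),
-- using the system stoplist (id 0), accent-insensitive (0).
SELECT keyword, group_id, phrase_id, occurrence,
       special_term, display_term, expansion_type, source_term
FROM sys.dm_fts_parser('"This is test" AND "This is also a test"', 1033, 0, 0);
```

In the last result set, the special_term column shows which tokens were treated as noise words under the chosen stoplist.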
Demo – Understanding Indexes
- Due to the new architecture, full-text indexes have a new format. Indexes from earlier versions are not compatible with SQL Server 2008.
- Solution: Full-Text Catalog Upgrade Option
  - Import (default): the faster method, although performance and semantic implications are possible.
  - Rebuild: the slower method, but the ideal final state of the new full-text catalogs is guaranteed.
  - Reset: a faster upgrade method, but your search application will not have the full-text catalogs available afterwards. You need to rebuild them when possible.
- Possible upgrade methods:
  1. In-place upgrade: the user is prompted to choose an upgrade option for existing full-text catalogs.
  2. Restore/Attach: the instance-level setting is applied to full-text catalogs brought in with the older database.
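For the restore/attach path, the instance-level setting can be inspected and changed with sp_fulltext_service. The sketch below assumes a catalog named DemoCatalog, which is hypothetical.

```sql
-- Current instance-level upgrade option:
-- 0 = Rebuild, 1 = Reset, 2 = Import (default).
SELECT FULLTEXTSERVICEPROPERTY('UpgradeOption') AS upgrade_option;

-- Choose Rebuild for databases restored or attached from now on.
EXEC sp_fulltext_service @action = 'upgrade_option', @value = 0;

-- After a Reset (or an Import you later want to refresh),
-- rebuild the catalog explicitly. DemoCatalog is a hypothetical name.
ALTER FULLTEXT CATALOG DemoCatalog REBUILD;
```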
- Put the full-text index on a separate filegroup to avoid fragmenting the main data files.
- Use varchar(max) instead of text/image.
- If you see excessive blocking by FTGATHER, turn off auto change tracking:
  - Schedule a manual job to do the updates.
  - Watch the number of fragments to determine how often to schedule it.
- Don’t run full-text master merges at the same time as other index rebuilds or reorganizations.
- If you have large documents (>2 MB), you may need to reduce SQL Server memory a bit so that the FDHost daemon has memory to run.
- No fielded searches on XML or any other document format; search is still at the document level.
- No partitioned full-text indexes, so no support for SWITCH partition on tables that are full-text indexed.
- Some of the customer wish-list items: snippets, column weights, language detection, customizable word breakers, proximity operators, etc.
- Forms of words (e.g. -s, -ing, -ed) when searching secondary languages: in a document that was indexed with the Russian word breaker and that you know contains the English word ‘Books’, it will find ‘book’ but not ‘books’.
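The change-tracking and merge advice can be sketched in T-SQL as follows. The table dbo.Documents and the catalog DemoCatalog are hypothetical names, and how often to run each step is left to the job schedule.

```sql
-- Stop automatic change propagation; changes are still tracked.
ALTER FULLTEXT INDEX ON dbo.Documents SET CHANGE_TRACKING MANUAL;

-- Run from a scheduled job: apply the tracked changes to the index.
ALTER FULLTEXT INDEX ON dbo.Documents START UPDATE POPULATION;

-- Watch the fragment count to decide how often a merge is needed.
SELECT COUNT(*) AS fragment_count
FROM sys.fulltext_index_fragments
WHERE table_id = OBJECT_ID('dbo.Documents');

-- Master merge; schedule it away from other index rebuilds or reorganizations.
ALTER FULLTEXT CATALOG DemoCatalog REORGANIZE;
```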
Demo
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.