SharePoint 2010 Search Deep Dive Corey Erkes, Manager Consultant Sogeti USA
About Me
- Manager Consultant within Sogeti SharePoint Practice
- Worked with SharePoint since V2
- MCTS: Microsoft SharePoint 2010, Configuring
- Co-Leader of Omaha SharePoint User Group
- Coauthor of SharePoint 2010 Governance Book
- Member of UNO IS&T Alumni Board
Agenda
- SharePoint 2010 Search Versions
  - SharePoint 2010 Foundation
  - Search Server Express
  - Search Server
  - SharePoint 2010 Server
  - FAST Search 2010
- Architecture
  - How to Configure
  - Crawl Component
  - Query Component
  - Associated Databases
- How to Scale Out
SharePoint 2010 Search Versions
Wait, there are different flavors of Search?
- SharePoint Foundation 2010
- Search Server 2010 Express
- Search Server 2010
- SharePoint Server 2010
- FAST Search Server 2010 for SharePoint
Search Server 2010 Express is a separate product outside of SharePoint 2010, but when installed with SharePoint Foundation 2010 it can provide a lot of functionality.
Foundation
- Scoped at site collection level
- No external data sources
- Each search server uses a separate crawl database and property database for indexing and responding to queries.
Search Server Express
- Crawls external data stores, including SharePoint sites, Web sites, Windows file shares, Exchange Public Folders, Business Data Catalog connections, and Lotus Notes.
- Deployment is limited and cannot be scaled to multiple database or application servers for redundancy or to increase capacity or performance. However, if you install using the Advanced option, you can add Web servers.
Search Server
- Scales out across the farm
SharePoint 2010 Search Functionality Breakdown
Products compared: SharePoint Foundation 2010, Search Server Express, Search Server 2010, SharePoint Server 2010, FAST Search Server 2010 for SharePoint
Features compared:
- Visual Best Bets (Limited)
- Scopes
- Search enhancements based on user context
- Custom properties
- Property extraction
- Query federation
- Query suggestions
- Similar results
- Sort results on managed properties or rank profiles
- Relevancy tuning by document or site promotions
SharePoint 2010 Search Functionality Breakdown - Continued
Products compared: SharePoint Foundation 2010, Search Server Express, Search Server 2010, SharePoint Server 2010, FAST Search Server 2010 for SharePoint
Features compared:
- Shallow results refinement
- Deep results refinement
- Document preview and thumbnails
- Windows 7 federation
- People search
- Social search
- Taxonomy integration
- Multi-tenant hosting
- Rich Web indexing support
SharePoint 2010 Index Size Capabilities
SharePoint Foundation 2010 can be scaled out to roughly 10 million items by adding search servers and assigning each one to crawl different content databases.
Available Search Repositories
Products compared: SharePoint Foundation 2010, Search Server Express, Search Server 2010, SharePoint Server 2010, FAST Search Server 2010 for SharePoint
Repositories compared:
- SharePoint sites
- Windows file shares
- Exchange public folders
- Lotus Notes
- Web sites
- IFilters for additional systems
- Structured content in databases
Search Manageability
Products compared: SharePoint Foundation 2010, Search Server Express, Search Server 2010, SharePoint Server 2010, FAST Search Server 2010 for SharePoint
Manageability features compared:
- UI-based administration (Limited)
- Scriptable deployment and management via PowerShell
- Microsoft System Center Operations Manager Pack
- Health Monitoring
- Usage Reporting
So wait, Search Server Express is free?
Products compared: Search Server Express, SharePoint Server 2010
Features compared:
- Performance with sub-second response time: 10 million items* (Search Server Express), 100 million items (SharePoint Server 2010)
- Scriptable deployment and management via PowerShell
- User interface–based (UI-based) administration
- Relevancy tuning by document or site promotions
- Common connector framework for indexing and federation
- Search from Windows 7 and Windows Mobile
- Metadata-based refinement panel
- Metadata extraction on managed properties
- Relevance improves with social behavior
- Query suggestions, related searches, and improved “Did you mean?”
* Assumes SQL Server and not SQL Server Express
Really, Search Server Express is free?
Products compared: Search Server Express, SharePoint Server 2010
Features compared:
- People and expertise search
- Taxonomy and term store integration
- Phonetic and nickname search
- Integration with My Site
That’s a lot of goodness for free!
Unfortunately, FAST Search is not free!
SharePoint 2010 Search Architecture
Goodbye SSP, Hello SharePoint Search Service!
Search Service Application
- A Search Service Application and its proxy can be provisioned in one of three ways:
  - Central Administration, Manage Service Applications page
  - Central Administration, Farm Configuration Wizard
  - PowerShell (how the cool kids do it!)
- PowerShell walk-through for creating a Search Service Application: http://blogs.msdn.com/b/russmax/archive/2009/10/20/sharepoint-2010-configuring-search-service-application-using-powershell.aspx
SharePoint Search Roles
Four unique roles are involved in Search:
- Web server role: provides the interface for searching
- Query server role: serves search results to the web server(s)
- Crawl server role: responsible for crawling content
- Database server role: hosts the three databases associated with search
  - Property database
  - Crawl database
  - Search administration database
Search Components
[Architecture diagram. Labels shown: WCF Call; Property Store Database; Search Administration Database; Web Front End; Search Service Application Proxy; Query Server / Query Processor; Query Component; Index Propagation; Index; Index Server; Crawler; Crawl Database; Connector(s). Content data sources: SharePoint Web Sites, Shared Folders, External Custom Databases, Other Systems.]
Database Role
A minimum of three databases is required to support Search:
- Property database
  - Contains metadata and associated custom properties for all crawled items
- Crawl database
  - Contains the history of the crawl
  - Manages the start and stop points of crawls
  - A crawl database can have more than one crawler associated with it, but a single crawler can be associated with only one crawl database
- Search Administration database
  - Stores search configuration data such as scopes and refiners
  - Contains security information for the crawled content
Database Sizing
Calculations for sizing databases and database characteristics:
- Property databases
  - Size: 0.046 x (sum of content databases)
  - Write-heavy, 1:2 ratio (read:write)
- Crawl databases
  - Size: 0.015 x (sum of content databases)
  - Read-heavy, 3:1 ratio (read:write)
  - Should not be collocated with the Property DB
- Search Administration database
  - Size: allocate 10 GB
  - Equal read/write
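The sizing multipliers on this slide are easy to apply by hand, but a small calculation makes the arithmetic explicit. The following is a minimal sketch, not an official sizing tool: the function name and the sample content database sizes are hypothetical, and only the multipliers (0.046, 0.015) and the 10 GB allocation come from the slide.

```python
def size_search_databases(content_db_sizes_gb):
    """Estimate search database sizes in GB from content database sizes in GB.

    Multipliers are taken from the slide; everything else is illustrative.
    """
    total_content_gb = sum(content_db_sizes_gb)
    return {
        "property_db_gb": 0.046 * total_content_gb,  # property database estimate
        "crawl_db_gb": 0.015 * total_content_gb,     # crawl database estimate
        "admin_db_gb": 10.0,                         # fixed allocation from the slide
    }


# Hypothetical farm with three content databases totaling 500 GB.
sizes = size_search_databases([100, 250, 150])
print(sizes)
```

For the hypothetical 500 GB of content, this works out to about 23 GB for the property database and 7.5 GB for the crawl database.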
Crawl Role
- The purpose of the crawl server is to index content
- The crawl runs under mssearch.exe (the SharePoint Server Search 14 service)
- The crawl server does not keep a copy of the index; the index is streamed (propagated) to the Query server
- The crawl server is no longer a single point of failure
- Each Crawler component needs to be mapped to a SQL crawl database
- It is possible to create multiple Crawl databases and Crawler components
Crawl Architecture
[Architecture diagram. Labels shown: WCF Call; Property Store Database; Search Administration Database; Web Front End; Search Service Application Proxy; Query Server / Query Processor; Query Component; Index Propagation; Index; Index Server; Crawler; Crawl Database; Connector(s). Content data sources: SharePoint Web Sites, Shared Folders, External Custom Databases, Other Systems.]
Crawl Role – Fault Tolerance
- Fault tolerance can be achieved by provisioning a secondary crawl component on a secondary server
- The secondary crawl component can be mapped to the same SQL crawl database
- Having more crawl databases than crawl components doesn’t make sense and wastes system resources
- Crawl database fault tolerance should be handled through SQL mirroring
Crawl Role – Performance
- Performance improves when crawl components are added, because two or more components crawl content instead of one
- Load is distributed across the crawl components
- Overlap does not occur, because items are crawled in batches by the crawlers
Crawl Role – Distribution
Distribution can be accomplished by mapping each crawl component to its own crawl database:
- Crawl Component 1: Crawl DB 1
- Crawl Component 2: Crawl DB 2
Each web application host is assigned to a crawl component, and SharePoint attempts to distribute the load evenly across crawl databases:
- sales.company.com: Crawl Component 1, Crawl DB 1
- hr.company.com: Crawl Component 2, Crawl DB 2
Distribution is based on the number of items (doc IDs) stored in each crawl DB.
Crawl Role – Distribution Example
Let’s say you have two web applications:
- sales.company.com: Crawl Component 1, Crawl DB 1
- hr.company.com: Crawl Component 2, Crawl DB 2
Crawl DB 1 contains 3,000 items and Crawl DB 2 contains 10,000 items.
A new web application is provisioned: finance.company.com
- No need to create an additional crawl component or crawl DB
- Which crawl DB will the new host be associated with?
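One way to reason about the question on this slide is to model the stated rule directly. The sketch below assumes that "distribute load evenly, based on the number of items" means a new host lands on the crawl database that currently holds the fewest items. That is an interpretation of the slide, not SharePoint's actual host distribution code, and the function name is hypothetical.

```python
def pick_crawl_db(item_counts):
    """Return the name of the crawl database holding the fewest items.

    Assumption: a new host is assigned to the least-loaded crawl database.
    """
    return min(item_counts, key=item_counts.get)


# Item counts from the slide's example.
crawl_dbs = {"Crawl DB 1": 3000, "Crawl DB 2": 10000}
target = pick_crawl_db(crawl_dbs)
print("finance.company.com ->", target)
```

Under that assumption, finance.company.com would be associated with Crawl DB 1, since it holds fewer items than Crawl DB 2.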
Query Role
- The purpose of the query server is to serve query results to the WFE
- The index is stored on the Query server(s)
- Each Query server contains one or more Query Components
- A Query Component is mapped to only one Property Store DB
- The Query Component is where the index propagated from the Crawler resides
Query Architecture
[Architecture diagram. Labels shown: WCF Call; Property Store Database; Search Administration Database; Web Front End; Search Service Application Proxy; Query Server / Query Processor; Query Component; Index Propagation; Index; Index Server; Crawler; Crawl Database; Connector(s). Content data sources: SharePoint Web Sites, Shared Folders, External Custom Databases, Other Systems.]
Query Component – Fault Tolerance
- It is highly recommended to make the index fault tolerant by mirroring a Query component onto another server in the farm.
- Check “Fail-over Query Component” if you want only fault tolerance and not an increase in query performance.
Query Component – Sizing the Index
- The index will be approximately 3.5% of the content database size
- Don’t forget the space needed for the mirror
- Additional space is needed for master merge
Example: 100 GB content database
- Index partition: 100 GB x 3.5% = 3.5 GB
- Index partition mirror: 100 GB x 3.5% = 3.5 GB
- Space for master merge: all index partitions x 3
- Total space = (3.5 GB x 2) x 3 = 21 GB
Recommendation: have enough memory to fit 33% of the index in RAM.
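The worked example on this slide can be written as a short calculation. This is a minimal sketch under the slide's own figures (3.5% index ratio, a factor of 3 for master merge, 33% of the index in RAM). The function and parameter names are hypothetical, and the RAM figure is computed for a single copy of the index partition, which is an assumption.

```python
def index_sizing(content_gb, mirrored=True, index_ratio=0.035,
                 merge_factor=3, ram_fraction=0.33):
    """Estimate index disk and memory needs from content size in GB."""
    partition_gb = content_gb * index_ratio        # one index partition
    copies = 2 if mirrored else 1                  # partition plus optional mirror
    total_disk_gb = partition_gb * copies * merge_factor  # room for master merge
    ram_gb = partition_gb * ram_fraction           # assumption: per index copy
    return {
        "partition_gb": partition_gb,
        "total_disk_gb": total_disk_gb,
        "ram_gb_per_copy": ram_gb,
    }


# The slide's example: a 100 GB content database with a mirrored partition.
estimate = index_sizing(100)
print(estimate)
```

For 100 GB of content this reproduces the slide's total of about 21 GB of disk, and suggests a little over 1 GB of RAM per index copy.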
Query Component – Performance
- Index size is the main bottleneck for query performance
  - 10 million documents in the index: average of 2 seconds per query
  - 20 million documents in the index: average of 4 seconds per query
- Creating multiple index partitions is the key to reducing query times and bottlenecks.
- A new index partition can be added through Search Application Topology in Central Administration.
Property Store DB – Fault Tolerance & Performance
Fault Tolerance
- SQL mirroring should be used to achieve fault tolerance.
Performance
- Add an additional Property Store DB if bottlenecks occur
- First create the new Property Store DB, then create a new Query component and map it to the new Property Store DB
- If the goal is performance, the additional Query component should not be a mirror
- You will need to reset the index and re-crawl, because a new Query component (index partition) is created
Property Store DB – Add Query Component
The Property Store DB must be created before the Query Component is added, so that it appears in the dropdown.
Query Processor
- Runs under the w3wp.exe process
- Processes a query by retrieving results from the index (Query Components)
- Uses the Property Store DB and Search Administration DB to obtain metadata and perform security trimming
- Load balances requests if more than one (mirrored) Query Component exists within the same Index Partition
- Connects to every Property Store DB and Query Component to retrieve results
- Unlike MOSS 2007, where the Query Processor ran on the WFE, any server can run the Query Processor in SharePoint 2010
Query Processor – Fault Tolerance & Performance
- Add an additional Query Processor service to another machine in the farm
  - It doesn’t have to be a WFE
- Requests will be load balanced in a round-robin fashion across the Query Processors
- The Search Query and Site Settings Service can be found in Central Administration under Services on Server
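Round-robin load balancing, as mentioned on this slide, simply hands each new request to the next server in a fixed rotation. The sketch below illustrates the idea only; it is not how SharePoint implements it, and the server names are hypothetical.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that yields servers in rotation."""
    return cycle(servers)


# Hypothetical servers running the Query Processor service.
processors = round_robin(["APP1", "APP2", "WFE1"])

# Assign five incoming queries in order.
assignments = [next(processors) for _ in range(5)]
print(assignments)
```

After the last server in the list is used, the rotation wraps around to the first, so each Query Processor receives an even share of requests over time.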
Overall Search Architecture
[Architecture diagram. Labels shown: WCF Call; Property Store Database; Search Administration Database; Web Front End; Search Service Application Proxy; Query Server / Query Processor; Query Component; Index Propagation; Index; Index Server; Crawler; Crawl Database; Connector(s). Content data sources: SharePoint Web Sites, Shared Folders, External Custom Databases, Other Systems.]
Scale-out Decision Points
Number of items: Action
- 0 – 1 million: All Search roles can coexist on one or two servers.
- 1 – 10 million: Move crawl components to another server, while the query components remain on the Web servers.
- 10 – 20 million: Add a crawl server. Each crawl server has one crawler. Create another index partition with query components and distribute these across query servers.
- 20 – 40 million: Add index partitions with distributed query components. Add another crawl database, and then add a new associated crawler to each crawl server.
- 40 – 100 million: Isolate each topology layer into server groups in which each role is deployed to its own set of servers. Each server group can be scaled out to meet specific requirements for the components in that role.
Source: http://www.microsoft.com/download/en/details.aspx?id=20066
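The decision points above amount to a lookup by item count. The following sketch encodes them as a table, with abbreviated action text. Because the slide's ranges share their boundary values (for example, 1 million appears in two rows), the sketch assumes each upper bound is inclusive. The names are hypothetical.

```python
# (upper bound of the item range, recommended action), in ascending order.
SCALE_OUT_TIERS = [
    (1_000_000, "All Search roles can coexist on one or two servers."),
    (10_000_000, "Move crawl components to another server; "
                 "query components remain on the Web servers."),
    (20_000_000, "Add a crawl server and another index partition "
                 "with query components distributed across query servers."),
    (40_000_000, "Add index partitions and another crawl database, "
                 "with a new crawler on each crawl server."),
    (100_000_000, "Isolate each topology layer into its own server group."),
]


def scale_out_action(item_count):
    """Return the recommended action for a given number of indexed items."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    for upper_bound, action in SCALE_OUT_TIERS:
        if item_count <= upper_bound:  # assumption: upper bound is inclusive
            return action
    raise ValueError("item count is beyond the ranges covered by the slide")


print(scale_out_action(15_000_000))
```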
Performance Metrics Thoughts
To improve this metric: Full crawl time and result freshness
Take these actions:
- Add crawl servers, crawlers, and crawl databases. Each crawl database contains content from independent sources.
- Each crawl database can have several crawl components associated with it, and those crawl components can be distributed among many crawl servers.
- If you have several content sources, multiple crawl components and associated crawl databases allow you to crawl the content concurrently.
To improve this metric: Time required for results to be returned
Take these actions:
- If query latency is caused by high peak query load, add query servers and index partitions. Each index partition can contain up to ~10 million items.
- You can also add a mirror for each query component for a given index partition. Place the mirror copy on a different server. Query throughput increases when you add index partition instances.
- If query latency is caused by database load, isolate the property database from crawl databases by moving it to a separate database server.
Source: http://www.microsoft.com/download/en/details.aspx?id=20066
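The guidance that each index partition can contain up to ~10 million items gives a quick way to estimate partition counts. This is a rough sketch: the 10 million figure comes from the slide, while the function names and the rule of one mirror per partition are illustrative assumptions.

```python
import math

ITEMS_PER_PARTITION = 10_000_000  # approximate per-partition limit from the slide


def index_partitions_needed(total_items):
    """Return the minimum number of index partitions for an item count."""
    return max(1, math.ceil(total_items / ITEMS_PER_PARTITION))


def query_components_needed(total_items, mirrored=True):
    """Return the query component count, assuming one mirror per partition."""
    partitions = index_partitions_needed(total_items)
    return partitions * (2 if mirrored else 1)


# Hypothetical corpus of 25 million items.
print(index_partitions_needed(25_000_000), query_components_needed(25_000_000))
```

A 25 million item corpus would need three partitions under this rule, or six query components if each partition is mirrored onto a different server.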
Small Farm Topology [diagram; source: http://www.microsoft.com/download/en/details.aspx?id=20066]
Medium Farm Topology [diagram; source: http://www.microsoft.com/download/en/details.aspx?id=20066]
Medium Search Farm Topology [diagram; source: http://www.microsoft.com/download/en/details.aspx?id=20066]
Medium Dedicated Search Farm Topology [diagram; source: http://www.microsoft.com/download/en/details.aspx?id=20066]
Large Dedicated Search Farm Topology [diagram; source: http://www.microsoft.com/download/en/details.aspx?id=20066]
References
- Search Technologies for SharePoint 2010 Products: http://download.microsoft.com/download/0/0/0/00015E0A-67CD-490C-9C1B-DCFA8E9BAEFC/Search%20Model%201%20of%204%20-%20Search%20Technologies.pdf
- SharePoint Brew – Search 2010 Architecture and Scale, Part 1 Crawl: http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-1-crawl.aspx
- SharePoint Brew – Search 2010 Architecture and Scale, Part 2 Query: http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-2-query.aspx