Architecting Search in SharePoint 2016 Ajay Iyer Sr. Consultant (MCS)
Thank You Sponsors for participating in SPS St. Louis 2017! You can use the hashtag #SPSSTL & follow us @SPSStlouis Gold Sponsors Silver Sponsors
Ajay Iyer SharePoint Architect @shankarajay1 http://sharepointadminstuff.wordpress.com Solution Architecture, Capacity Planning, Search, Migrations, Enterprise Content Management, Document Imaging and….. InfoPath
Outline
- What’s new in SP2016 & Search?
- Review Search Components & Terminologies
- Gather Requirements for Search (What specifics should you ask?)
- Search Farm Architecture & Design
- Incremental Crawls or Continuous Crawls?
What’s new in SP2016
MinRole
- Auto-provisioning of SP services and service application endpoints
- Provisioning is based on server roles
- Ensures each server in the farm is running the services it needs
- Farm admins can manage services at farm level, not server level
- Improves farm reliability
- Simplified capacity planning
- Choose the “Custom” role to manage services at server level
What’s New in SP2016 Search
- Each Search Service Application now supports indexing up to 500 million items (SP2013 supported 250 million items)
- Searching for sensitive documents (e.g. documents containing SSNs, credit card numbers, passport numbers) using Data Loss Prevention (DLP)
- Hybrid Search is available in SharePoint 2016 and in SharePoint 2013 with the September 2015 CU
  - Can crawl Office 365 content as well as on-prem content and return unified results across both
Other Notable Changes
- Supports upload & download of files > 2GB
- Filenames can now include special characters
  - &, ~, {, } are allowed
  - # and % are still NOT allowed
- Search now supports indexing up to 500 million items (SP2013 supported 250 million items)
  - 10 million per index partition and up to 25 index partitions per search service application
- STSADM is deprecated! PowerShell is your best friend!
- SharePoint Foundation is NOT available for SP2016
- Easy User Profile Sync: Microsoft Identity Manager!
- Excel Services is now part of Excel Online
Upgrade Farm Requirements
- SharePoint Server 2013 + March 2013 PU (v15.0.4481.1005) or higher
- All content databases that need to be upgraded should also be on at least v15.0.4481.1005
- Use Database Attach Upgrade
- Make sure no site collections/subsites are in 2010 Compatibility Mode
- Custom code from SP2013 “should” work with SP2016
Source: https://technet.microsoft.com/en-us/library/mt422728(v=office.16).aspx
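The minimum-build requirement above is easy to get wrong if build numbers are compared as plain strings. The following is a minimal sketch, assuming you have already collected each content database's build number by some other means; the database names and the inventory are hypothetical and only the minimum build comes from the slide.

```python
from __future__ import annotations

# Minimum build from the slide: SharePoint Server 2013 + March 2013 PU.
MIN_BUILD = "15.0.4481.1005"


def parse_build(version: str) -> tuple[int, ...]:
    """Turn '15.0.4481.1005' into (15, 0, 4481, 1005) so parts compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(version: str, minimum: str = MIN_BUILD) -> bool:
    """True if the build is at or above the minimum build."""
    return parse_build(version) >= parse_build(minimum)


def databases_below_minimum(builds: dict[str, str], minimum: str = MIN_BUILD) -> list[str]:
    """Return the names of databases whose build is below the minimum, sorted."""
    return sorted(name for name, build in builds.items()
                  if not meets_minimum(build, minimum))


# Hypothetical inventory: names and builds are illustrative, not from the deck.
inventory = {
    "WSS_Content_Intranet": "15.0.4481.1005",
    "WSS_Content_Teams": "15.0.4420.1017",
    "WSS_Content_Projects": "15.0.4569.1000",
}

print(databases_below_minimum(inventory))
```

Comparing tuples of integers rather than strings matters because a string comparison would rank a build such as 15.0.10000.1 below 15.0.4481.1005.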
Review of Search Components
Search Roles: Search Administration, Crawl, Content Processing, Analytics Processing, Index, Query Processing
Search Databases: Search Administration DB, Crawl DB, Analytics Reporting DB, Link DB

Search Roles
- The Crawl component crawls the content and discovers its properties. It passes the crawled items to the Content Processing component (CPC), which parses the documents, extracts metadata and creates property mappings, then sends the processed items to the Index component.
- The CPC also extracts the links from the documents and stores them in the Link DB. These links are then processed by the Analytics Processing component (APC). The APC also processes usage information such as user clicks and referrers.
- The Query Processing component (QPC) performs word breaking, linguistics/stemming and query parsing. It runs in a noderunner.exe process. It no longer queries the search database for security trimming information; it gets that from the index.

Search Databases
- Search Administration: hosts the Search service application configuration and the access control list (ACL) for the crawl component.
- Crawl: stores the state of the crawled data and the crawl history.
- Link: stores the information that is extracted by the content processing component, plus the click-through information.
- Analytics Reporting: stores the results for usage analysis reports and extracts information from the Link database when needed.
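The content flow described in these notes can be pictured as a small directed graph. The sketch below is only an illustration of that flow, not a SharePoint API: the node names are labels from the slide, and the edge from Content Processing to the Link DB follows the Link database description above.

```python
from collections import deque

# Who feeds whom, as described in the notes (illustrative labels only).
FLOW = {
    "Crawl": ["Content Processing"],
    "Content Processing": ["Index", "Link DB"],
    "Link DB": ["Analytics Processing"],
    "Index": [],
    "Analytics Processing": [],
}


def downstream(start, flow=FLOW):
    """Breadth-first list of everything fed, directly or indirectly, by `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in flow.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(downstream("Crawl"))
# ['Content Processing', 'Index', 'Link DB', 'Analytics Processing']
```

Walking the graph this way is a quick check of which components are affected when an upstream one is slow or offline; for example, everything listed for "Crawl" stalls if crawling stops.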
Other Search-related Terminologies
- No. of items in Index
- Search Topology
- Index Partitions
- Content Sources
- Incremental Crawls/Continuous Crawls
- Full Crawls
- Crawled Properties
- Managed Properties
Review of Search Components (diagram)
Components: WFEs, Crawl, Content Processing, Index, Query Processing, Analytics Processing
Stores: Crawl DB, Links DB, Analytics DB, Search Index
Gathering Requirements for Search
Gathering Requirements for Search
Questions
- How many web applications & site collections are in the current environment?
- How many total documents?
- What are the different document file types? PDFs, TIFFs, DOCX, XLSX, etc.?
- Are the PDFs image-only or searchable?
- What’s the frequency of content additions and changes?
- How quickly do the users expect their documents to be searchable after uploading to SharePoint?
- Are there any Line-Of-Business external applications that interface with SharePoint?
Gathering Requirements for Search
Sample Response (Question: Response)
- No. of Web Apps: 2
- No. of Site Collections: 25
- Total No. of Documents: 3,980,272
- Document File Types: PDFs (mostly), approx. 25,000 Office documents, approx. 3,000 TIFFs
- PDF Types: Mostly searchable PDFs, and approx. 2,000 image-only PDFs
- Frequency of content change: Approx. 200 documents added per hour
- How soon should docs be retrievable from search: 10 minutes or less
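The sample responses translate into a few back-of-the-envelope numbers. This is a rough sketch, assuming the 10 million items per index partition figure quoted earlier in the deck (passed as a parameter so it can be changed) and assuming the 200 documents per hour rate holds around the clock; the function names are made up for illustration.

```python
import math


def partitions_needed(total_items, items_per_partition=10_000_000):
    """Index partitions required for a given item count (at least one)."""
    return max(1, math.ceil(total_items / items_per_partition))


def items_per_window(items_per_hour, window_minutes):
    """Average number of new items arriving within one freshness window."""
    return items_per_hour * window_minutes / 60


def hours_until_partition_full(total_items, items_per_hour,
                               items_per_partition=10_000_000):
    """Hours of growth left before the current partition count is exceeded."""
    capacity = partitions_needed(total_items, items_per_partition) * items_per_partition
    return (capacity - total_items) / items_per_hour


total_docs = 3_980_272   # from the sample response
added_per_hour = 200     # from the sample response
freshness_minutes = 10   # "10 minutes or less"

print(partitions_needed(total_docs))                                   # 1
print(round(items_per_window(added_per_hour, freshness_minutes), 1))   # 33.3
print(round(hours_until_partition_full(total_docs, added_per_hour) / 24))  # days of headroom
```

With these inputs a single index partition covers the current corpus, and each 10-minute freshness window only has to pick up about 33 new documents, which is the kind of figure that feeds the incremental versus continuous crawl decision.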
Search Farm Architecture & Design
Search Farm Architecture & Design
Sample Search Farm Design (diagram): WFE 1, WFE 2, APP 1, APP 2, Office Online Server, Workflow Server, DB Server
Search Farm Architecture & Design
Sample Search Farm Design (diagram, labels as extracted)
Servers: WFE 1, WFE 2, APP 1 (CRWLR 1), APP 2 (CRWLR 1), Office Web Apps, WAWS, DB Server
Component labels: Crawl/Search Admin components (x2), Query component (x2)
Search Farm Architecture & Design
Search Servers Specifications
Crawl & Search Admin Servers:
- 8 vCPUs
- 32GB RAM
- 80GB - System Drive
- 120GB - Data Drive (SP Binaries, Data, Logs)
- 40GB - Drive for Search Index
Search Farm Architecture & Design
Search Servers Specifications
Analytics & Content Processing Servers:
- 4 vCPUs
- 16GB RAM
- 80GB - System Drive
- 120GB - Data Drive (SP Binaries, Data, Logs)
Search Farm Architecture & Design
Search Servers Specifications
Query Servers:
- 4 vCPUs
- 16GB RAM
- 80GB - System Drive
- 120GB - Data Drive (SP Binaries, Data, Logs)
- 40GB - Drive for Search Index
Search Farm Architecture & Design
Incremental Crawls or Continuous Crawls?
Search Farm Architecture & Design
Possible Resolutions:
- Check disk I/O on the SQL Server data drive
  - Move the Search databases to a different LUN (if virtual) or disk (if physical)
- Change the Search Performance level from the default “Maximum” to “Partly Reduced” or “Reduced”
  - Here’s how you fix it using PowerShell
- Check memory utilization on the crawl & query servers
  - Look for RAM utilization of the mssearch.exe and noderunner.exe processes
Additional Questions?
Thank You Ajay Iyer Sr. SharePoint Consultant ajiyer@microsoft.com