SharePoint 2013 Enterprise Search Topology
Tyler Bithell
B2B TECHNOLOGIES | PRESENTATION
Who I Am
Tyler Bithell, Chief Technical Architect of Portals at B2B Technologies. SharePoint consultant; MCPD/MCITP (SharePoint); MS in Computer Science.
Follow me on Twitter: @B2B_Tech_TB
Mention me on Twitter using the hashtag #SPSATL. Scan the QR code to fill out a survey and potentially win prizes.
Session Topics
- Search topology overview
- Planning your search deployment
- Search scaling
- How to add, move, and delete components
- Demo
- Questions

First I'm going to discuss the search topology and architecture, explaining in detail the function of each component that makes up the Search service application. Next I'll discuss planning: limits, best practices, and so on. I'll cover search scaling, including when to add components and how to add, move, and delete them. I will then demo how to modify the search topology, and finish with questions.
Search Architecture
Pictured here is the logical architecture of the SharePoint 2013 Search service, illustrating the path of content from content source to query. If you are familiar with SharePoint 2010 search, you'll notice a few new components: the content processing component, which sits between the crawl and index components, and the analytics processing component, along with the link database, the analytics reporting database, and the event store.
Search Topology: Topology GUI
Crawl Component
- Responsible for crawling content sources.
- Invokes connectors or protocol handlers that interact with content sources to retrieve data.
- Uses one or more crawl databases to temporarily store information about crawled items and to track crawl history.
- Extracts crawled properties and metadata and sends them to the content processing component.

The crawl component is your crawler: it crawls content, SharePoint or otherwise, and extracts crawled properties and metadata to send to the content processing component. It talks to one or more crawl databases to temporarily store information about crawled items and to track crawl history. The more content you have, the more crawl components you need, so add them as your crawled content grows.
Crawl Database
- Contains detailed tracking and historical information about crawled items.
- Holds information such as the last crawl time, the last crawl ID, and the type of update during the last crawl.
- Can have one or more crawl components associated with it.

If your database IO is taking a big hit during crawl times, you need to add another crawl database, and you should put it on a new disk or spindle. You also need one crawl database per 20 million items.
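As a sketch of the one-crawl-database-per-20-million-items guideline, an additional crawl database can be provisioned with PowerShell. The database name here is a placeholder; run this in the SharePoint Management Shell on a farm server:

```powershell
# Get the Search service application (assumes a single SSA in the farm)
$ssa = Get-SPEnterpriseSearchServiceApplication

# Add a second crawl database; place its files on a different
# disk/spindle than the first to spread crawl-time database IO
New-SPEnterpriseSearchCrawlDatabase -SearchApplication $ssa `
    -DatabaseName "SP2013_Search_CrawlDB2"
```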
Content Processing Component
- Sits between the crawl and index components.
- Transforms crawled items into artifacts that can be included in the search index.
- Performs linguistics processing.
- Interacts with the analytics processing component; writes information about links and URLs to the link database.
- Maps crawled properties to managed properties.

The content processing component, new in SharePoint 2013, receives items from the crawl component, processes them, and sends them to the index component. It also performs linguistics processing such as language detection and entity extraction, writes link and URL information to the link database for the analytics processing component, and is responsible for mapping crawled properties to managed properties.
Index Component
- Logical representation of an index replica; one index component must be provisioned for each replica.
- Receives processed items from the content processing component and writes them to the index file.
- Receives queries from the query processing component and provides a result set in return.

Next, the index and query processes. An index component is a logical representation of an index replica and has a one-to-one relationship with it. The index component populates your index and also serves result sets to the query processing component. Index replicas are the means of achieving fault tolerance: say you have one index partition; you can house that partition and one of its replicas on two different servers, so if one goes down the query component simply talks to the other and the user still receives results. Index partitions are just what they sound like: pieces of the index that collectively make up the whole. Each partition is stored in a set of files on disk and has a limit of 10 million items, which is very much something to consider when planning your farm. Note also that it takes quite a while to split an existing partition, so if you have 8 million items and content growth of 20% a year, start out with more than one partition. Add index partitions as your content grows; add index replicas for fault tolerance and query load. A typical way to scale out search is to replicate one index partition across two servers or VMs; if you use VMs, place them on different physical hosts to actually achieve fault tolerance.
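The scale-out pattern described above (one partition, two replicas on different servers) can be sketched in PowerShell. Server names are placeholders, and this assumes the search service instance is already running on both servers:

```powershell
# Sketch: replicate index partition 0 across two servers for fault tolerance
$ssa    = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$clone  = New-SPEnterpriseSearchTopology -SearchApplication $ssa `
              -Clone -SearchTopology $active

foreach ($server in "SEARCH01", "SEARCH02") {
    $instance = Get-SPEnterpriseSearchServiceInstance -Identity $server
    # Same -IndexPartition number on both servers = two replicas of partition 0
    New-SPEnterpriseSearchIndexComponent -SearchTopology $clone `
        -SearchServiceInstance $instance -IndexPartition 0
}

# Promote the clone to become the active topology
Set-SPEnterpriseSearchTopology -Identity $clone
```

A different `-IndexPartition` number would instead create an additional partition rather than a replica.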
Query Processing Component
- Sits between the search front end and the index component.
- Analyzes and processes search queries and results.
- Performs linguistics processing.
- Submits processed queries to the index component and receives a result set in return.

This component serves up results to your front-end servers. It analyzes incoming queries and performs linguistics processing such as word breaking and stemming, then passes the processed query to the index component, which returns a result set. The query processing component processes that result set before returning it to the user.
Search Administration
- Made up of the search administration component and its database.
- The search administration component runs a number of system processes and carries out provisioning.
- The search administration database stores search configuration data.

The search administration component does what it sounds like: it keeps search running, handling the essential system processes that allow search to function. You can have more than one search administration component, but only one can be active at any given time. The search administration database is where configuration data is stored: your query rules, crawl rules, topology, and crawled-to-managed property mappings live here. Note that you can have only one administration database per Search service application.
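To see which components (including which administration component) are currently active and healthy, the search status cmdlet gives a quick text report:

```powershell
# Report the state of every component in the active search topology
$ssa = Get-SPEnterpriseSearchServiceApplication
Get-SPEnterpriseSearchStatus -SearchApplication $ssa -Text
```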
Analytics Processing Component
- Performs search analytics and usage analytics.
- Results from search analytics are added to the items in the search index.
- Results from usage analytics are stored in the analytics reporting database.

The analytics processing component performs both search and usage analytics, and uses the results to improve search relevance, create search reports, and generate recommendations and deep links. Search analytics extracts information from the link database: links, the number of times an item is clicked, anchor text, data related to people, and metadata, all of which is important to relevance. One major function of this component is making sure your users find what they need quickly. Usage analytics analyzes usage log information received from the front end via the event store; this is where your usage and statistics reports are generated. In short, this component analyzes what users are querying and how they interact with the results.
Link Database
- Stores information extracted by the content processing component.
- Stores information about search clicks.

The link database stores information extracted by the content processing component as well as search click data, that is, the number of times people click a result on the search results page. It is all stored unprocessed so the analytics processing component can analyze it.
Analytics Reporting Database
- Stores the results of usage analytics.
- Stores statistics information from the analyses.

SharePoint uses this information to create Excel reports showing different statistics.
Event Store
- Holds usage events that are captured on the front end.
- The events are stored as log files on the application server that hosts the analytics processing component.

An example of a captured event is the number of times an item is viewed.
Search Topology: Small Farm
This is an example of a small, fault-tolerant search farm topology. Notice it is just search: you have the option of building a farm whose sole purpose is to serve as the search service for other farms. I chose this graphic because it contains only what is relevant to this session. Note that no server has two of the same component; that is not allowed. This farm has one index partition, so it can handle only 10 million items. One thing the diagram doesn't show is disk: put your index partitions on dedicated disks/spindles if possible. They are very IO intensive, and anything else running on that disk will degrade performance. In addition, do NOT ever let your antivirus scan your index. One other point, mentioned multiple times at the SharePoint Conference last year along with the antivirus warning: don't use dynamic RAM allocation on search servers. Make sure each server has dedicated RAM; search is very resource intensive and needs it.
Search Topology: Medium Farm
Here we have a medium farm topology that can handle up to 40 million items in its index. Note that bulk processing (crawl, analytics, and content processing) has been split from query traffic. If you have the resources, this is what you want to do to achieve the best performance.
Search Topology: Large Farm
Here we have a large farm topology, capable of handling 100 million items: two sets of clustered or mirrored database servers, with every database you can have more than one of living on both sets.
Content Volume Scaling
Content volume scaling is a matter of staying within the item limits that Microsoft has specified:
- 10 million items per index partition
- 20 million items per crawl database

Content volume is easy: if you have more than 10 million items in your index, you need to create a new partition, which means a new index component and replica are required. If you need fault tolerance, span the partition across two servers via replicas; these can be VMs, but they need to be on separate physical hosts. Also remember to have one crawl database per 20 million items.
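The limits above reduce to simple arithmetic when planning. A back-of-the-envelope sketch (the item count and growth rate are example figures, not recommendations):

```powershell
# Plan partition and crawl database counts against the documented limits,
# projecting content growth so you start with enough partitions
$items      = 8000000    # current item count (example)
$growthRate = 0.20       # 20% growth per year (example)
$years      = 2          # planning horizon

$projected  = $items * [math]::Pow(1 + $growthRate, $years)   # 11,520,000
$partitions = [math]::Ceiling($projected / 10000000)          # 2 partitions
$crawlDBs   = [math]::Ceiling($projected / 20000000)          # 1 crawl DB

"Projected items: {0:N0}; index partitions: {1}; crawl databases: {2}" -f `
    $projected, $partitions, $crawlDBs
```

This is exactly the 8-million-items-at-20%-growth scenario from the index component slide: within two years you cross the 10 million limit, so you want two partitions from the start.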
Query Load
Query processing component considerations:
- CPU load: queries per second, query transformations
- Network load: number of index partitions, size of queries and results

There is a bit more to consider for query load: CPU and network. For CPU load, factor in queries per second and query transformations. The general guideline is 4 QPS per CPU core, so a quad-core box can handle about 16 queries per second. If you need to accommodate more than that, you have a couple of options: throw more resources at the server (in this case, more cores) or add another server with a new query processing component. For network load, the factors are the number of index partitions and the size of queries and results; if this is your issue, monitor traffic during high query volume to find your bottleneck.
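The 4-QPS-per-core guideline makes core sizing a one-liner. A sketch, with an assumed example peak load:

```powershell
# Rough CPU sizing from the ~4 queries/sec per core guideline
$peakQps     = 22   # example peak query load for your farm
$coresNeeded = [math]::Ceiling($peakQps / 4)   # 6 cores for 22 QPS

"Cores needed for query processing: $coresNeeded"
```

Whether those cores come from scaling up one server or adding a second query processing component is the scale-up vs. scale-out choice described above.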
Crawl Load
Index component considerations:
- CPU load: queries per second, item count
- Index disk IOPS

Crawl component considerations:
- Documents per second, link discovery, crawl management
- Network load
- Disk load

When scaling for crawl load, look at the index component, the crawl component, and the content processing component. For the index component, pay attention to queries per second and item count; the index component is a bit unique in that it spans both crawl and query load. For CPU load, monitor the server during crawls to determine whether you need additional CPU, and do the same during high query times. Also pay attention to disk IO; your best bet is to start out with the index on a dedicated drive/spindle. For crawl components, consider documents per second, link discovery, and crawl management. Monitor your CPU during crawls and adjust as needed; you may be able to scale up (add more resources) or out (add more servers). Monitor network traffic and tune as needed. Disk load is also a factor: the index needs its own disk, and so does the temporary index the crawl component creates while crawling. Just do this from the start; it is worth the expense and will prevent future headaches.
Crawl Load, Continued
Content processing component considerations:
- CPU load: documents per second, document size and complexity, feature extraction
- Network load: document size

For the content processing component, take CPU and network load into consideration when planning and scaling. If you have a large number of documents, or lots of large or complex documents, throw plenty of CPU at the server(s) housing your content processing component. Network load is driven by the same factors. If you experience slow response during crawls and your other servers seem fine, monitor the content processing component and address any issues you come across.
Analytics Load
Analytics processing component considerations:
- CPU load: number of items, site activity
- Network load: same factors as CPU load

For the analytics processing component, take CPU and network load into consideration when planning and scaling. If you have a large number of items and a very active site, give plenty of CPU to the server(s) housing your analytics processing component; network load is driven by the same factors. Remember that this component serves to make search better. It needs to be able to pull the resources required to do so, so give it what it needs to make your end users' search experience as good as possible.
Search Resources
Making Topology Changes
Steps to make topology changes:
1. Turn on the search service on all servers that will house search components.
2. Clone the existing search topology.
3. Add or delete search components.
4. Promote the cloned topology to active.

This slide is a bit brief because I'm about to walk through it in a demo, but it covers the process. Those of you who used the GUI to change the 2010 search topology might be sad to learn that is no longer an option: PowerShell is the only way to make these changes. The important thing to understand is that you can't make changes to an active topology, so you have to clone the topology in order to modify it.
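The four steps above map directly onto the search cmdlets. A minimal sketch, assuming a single Search service application; the server name is a placeholder:

```powershell
# 1. Make sure the search service instance is running on each target server
$instance = Get-SPEnterpriseSearchServiceInstance -Identity "SEARCH02"
Start-SPEnterpriseSearchServiceInstance -Identity $instance

# 2. Clone the active topology (an active topology cannot be edited)
$ssa    = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$clone  = New-SPEnterpriseSearchTopology -SearchApplication $ssa `
              -Clone -SearchTopology $active

# 3. Add (or delete) components on the clone
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $clone `
    -SearchServiceInstance $instance
# Deleting instead would use:
# Remove-SPEnterpriseSearchComponent -Identity <componentId> -SearchTopology $clone

# 4. Promote the clone to active
Set-SPEnterpriseSearchTopology -Identity $clone
```

Wait for the service instance to report Online before adding components to it, and check the result afterward with `Get-SPEnterpriseSearchStatus`.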
Demo
Search topology demonstration.
Please thank our sponsors!
- Item & Event Sponsors (Speaker Shirts, Attendee Shirts, SharePint, Speaker Dinner): Abel Solutions
- Platinum Sponsors
- Gold Sponsors
- Silver Sponsors
Twitter Contests! Mention @SPS_ATL & #SPSATL
Win prizes for best tweets, including $25 gift cards, with multiple prizes each session! You MUST mention a speaker (such as @B2B_Tech_TB) or a sponsor, plus #SPSATL, to qualify, and you must be present to win. The grand prize winner is selected from the session winners.
Tyler Bithell
Visit my blog at http://sharepointv15.wordpress.com
Follow me on Twitter: @B2B_Tech_TB. Mention me on Twitter using the hashtag #SPSATL. Scan the QR code to fill out a survey and potentially win prizes.
Join us for SharePint
Meehan's, 200 Peachtree Street, 6-8pm
Sponsored by: