Published by Rachel Perkins. Modified over 9 years ago.
1
Search Gotchas
Sharon Richardson
Joining Dots
2
Indexing Architecture
There can be only one… …indexing server
3
[Diagram: Single Server Deployment compared with Web Farm. Labels: web front-ends, front-end index, index server, queries, indexing, internal content, internal & external content sources.]
4
Large Web Farm
[Diagram labels: indexing server, query servers, web front-ends, internal & external content sources.]
5
Scaling to 50 million docs
Source: Estimate performance and capacity requirements for search environments
600 GB content created
100 GB index
Full crawl took 35 days (approx. 15 docs per sec)
If 2% of content changes, an incremental crawl will take approx. 8 to 12 hours
Content breakdown:
– SharePoint sites: 10 million items
– File shares: 15 million items
– Web content: 15 million items
– People profiles: 2.5 million
– Documents (auto-generated): 7.5 million
– Properties (metadata): 1 million
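The crawl figures on this slide can be sanity-checked with simple arithmetic. The sketch below is only a back-of-envelope check, assuming a constant crawl rate; the function names are made up for illustration and do not come from the source document.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def full_crawl_days(total_docs: int, docs_per_sec: float) -> float:
    """Days needed to crawl total_docs at a constant rate (an assumption)."""
    return total_docs / docs_per_sec / SECONDS_PER_DAY


def implied_docs_per_sec(total_docs: int, crawl_days: float) -> float:
    """Average crawl rate implied by a crawl that took crawl_days."""
    return total_docs / (crawl_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Figures quoted on the slide: 50 million docs, ~15 docs/sec, 35 days.
    print(round(full_crawl_days(50_000_000, 15), 1))
    print(round(implied_docs_per_sec(50_000_000, 35), 1))
```

At exactly 15 docs per second, 50 million documents take about 38.6 days; a 35-day crawl implies roughly 16.5 docs per second. Both are consistent with the slide's "approx." wording.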
6
Scaling to 50 million docs
Source: Estimate performance and capacity requirements for search environments
Test lab:
– 4 dual-core Intel Xeon 2.66 GHz processors
– 32 GB RAM
Recommended:
– Dual 3 GHz processors
– 4 GB RAM (for > 1m docs)
Index server disk space requirements (according to doc):
– Size of data crawled = Y
– Size of index = 5% to 12% of Y = X
– Initial disk space = a minimum of 2.5 * X
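The disk space rule of thumb on this slide is easy to turn into a small calculator. This is a minimal sketch of that formula only; the function name, parameter names, and the returned dictionary layout are assumptions made for illustration.

```python
def estimate_index_disk_gb(crawled_gb: float,
                           index_ratio_low: float = 0.05,
                           index_ratio_high: float = 0.12,
                           headroom: float = 2.5) -> dict:
    """Apply the slide's rule: X = 5% to 12% of Y, disk >= 2.5 * X.

    crawled_gb is Y, the size of the data crawled, in GB.
    Returns (low, high) ranges for the index size X and the minimum
    initial disk space.
    """
    index_low = crawled_gb * index_ratio_low
    index_high = crawled_gb * index_ratio_high
    return {
        "index_gb": (index_low, index_high),
        "disk_gb": (index_low * headroom, index_high * headroom),
    }


if __name__ == "__main__":
    # Example input: the 600 GB corpus from the 50-million-document test.
    print(estimate_index_disk_gb(600))
```

For 600 GB of crawled data this gives an index of roughly 30 to 72 GB and a minimum of roughly 75 to 180 GB of initial disk space. The 100 GB index reported for the test lab is above that range, so the percentages are best treated as a rough guide rather than a guarantee.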
7
Taxonomy Management
Still haven’t found what you’re looking for?
8
Taxonomy Management
Conceptual and related-term searches require classification
–Manual = user tagging
–Automatic = provide training set (Bayesian inference algorithm)
[Image: Rev. Thomas Bayes (1702–1761)]
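To make the "automatic" option concrete, here is a minimal sketch of a naive Bayes text classifier trained on a handful of labelled examples. It is a generic illustration of Bayesian classification, not the algorithm used by SharePoint or by any specific add-on; the category names, sample texts, and function names (train, classify) are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def train(training_set):
    """Build word counts per category from (text, category) pairs."""
    doc_counts = Counter()              # documents seen per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocabulary = set()
    for text, category in training_set:
        doc_counts[category] += 1
        for word in text.lower().split():
            word_counts[category][word] += 1
            vocabulary.add(word)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the category with the highest posterior log-probability."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Prior: share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training set: a few tagged documents per category.
TRAINING_SET = [
    ("annual leave holiday policy", "hr"),
    ("holiday request form", "hr"),
    ("expenses claim travel policy", "finance"),
    ("invoice payment budget", "finance"),
]

if __name__ == "__main__":
    model = train(TRAINING_SET)
    print(classify("holiday leave form", model))
    print(classify("travel budget", model))
```

With this toy training set, "holiday leave form" is classified as hr and "travel budget" as finance. A real deployment would need a much larger training set per category, which is the practical cost of the automatic approach.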
9
Taxonomy Options
Scopes
Keywords
Columns (metadata for internal content)
Customised results pages
Third-party add-ons
10
From Gotcha to Oscar
11
Social Searches
Social network
User profile
Diary and contact info
Organisation hierarchy
12
What vs Who?
Employees get 50%-75% of their relevant information directly from other people
More than 80% of enterprises’ digitized information resides on individual hard drives and in personal files
–Source: “The Knowledge Worker Investment Paradox”, Gartner research, 7/17/2002
13
Define Search …beyond queries
[Diagram labels: SharePoint, Platform, Services; Find, Use, Share; Web, Desktop, Intranet.]
14
References
Estimate performance and capacity requirements for search environments
http://technet2.microsoft.com/Office/en-us/library/5465aa2b-aec3-4b87-bce0-8601ff20615e1033.mspx?mfr=true
Third-party tools and add-ons
http://markharrison.co.uk/blog
My blog
http://www.joiningdots.net/blog
15
Thank you!
Sharon Richardson
Joining Dots
Email: sharonr@joiningdots.net