Published by Susanna Sherman. Modified over 8 years ago.
1
Why tune relevance?
Because we want to find the single best item among a large group of possible candidates.
2
What Is Relevancy?
– Quantified through a rank score
– A query's result set is sorted in descending order by rank score (or by any other sorting field or combination of sorting fields)
– Can be configured through:
  – Rank Profile (default model for the site or for sections of the site)
  – Search GUI (users can define the ordering, e.g. highly rated or popular results show up higher)
  – Search Business Center (business managers have fine-grained control of results for any query)
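The sorting behaviour described on this slide can be sketched in a few lines of Python. This is only an illustration, not FAST ESP code: the `Hit` class, the `rating` field, and `sort_hits` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    rank_score: float
    rating: float  # hypothetical alternative sorting field (e.g. user rating)


def sort_hits(hits, sort_fields=("rank_score",)):
    """Return hits in descending order, comparing sort_fields left to right."""
    return sorted(
        hits,
        key=lambda hit: tuple(getattr(hit, field) for field in sort_fields),
        reverse=True,
    )


hits = [Hit("a", 0.42, 4.5), Hit("b", 0.91, 3.0), Hit("c", 0.42, 4.9)]

# Default: descending by rank score only (ties keep their input order).
by_rank = [h.doc_id for h in sort_hits(hits)]
# Combination of sorting fields: rank score first, rating breaks ties.
by_rank_then_rating = [h.doc_id for h in sort_hits(hits, ("rank_score", "rating"))]
# Any other sorting field: here, rating alone.
by_rating = [h.doc_id for h in sort_hits(hits, ("rating",))]
```

Sorting by a tuple key is one simple way to express "a combination of sorting fields"; a real engine would typically do this inside the index rather than on a materialized list.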
3
Multiple levels of control
Relevancy ranking (precision, recall)
Levels of control: Business Rules, InPerspective™, Core Algorithmic Model, Application Model
4
FAST Relevancy Framework: Multiple levels of control
Levels of control: Business Rules, InPerspective™, Core Algorithmic Model, Application Model
Accessible to: Control mechanisms
– End users: sorting order, navigation, relevance feedback
– Business managers: query and document "boosting" (BMCP)
– Administrator: "Rank Profile"
– Developer: algorithm "weights"
5
FAST Relevancy Framework: InPerspective™
– Freshness: How fresh is the document compared to the time of the query?
– Completeness: How well does the query match superior contexts like the title or the URL? Example: for query="Mexico", is "Mexico" or "University of New Mexico" best?
– Authority: Is the document considered an authority for this query? Examples: web link cardinality, article references, product revenue, page impressions, ...
– Statistics: How well does the content of this document match the query overall? Examples: proximity, context weights, tf-idf, degree of linguistic normalization, ...
– Quality: What is the quality of the document? Examples: Homepage? Press release? ...
– Distance: What is the distance from where I am?
6
FAST Relevancy Framework: Rank Profile
Rank Profile: A Relevancy Mixing Board
Sliders: Authority, Freshness, Proximity, Context (Body, Description, URL, Keywords, Title)
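One way to picture the "mixing board" is as a weighted average of per-dimension scores, where each slider is a weight. The sketch below is an assumption made for illustration: the slide does not give the actual FAST formula, and the weight values, the 0-to-1 component scores, and the function name `rank_score` are all hypothetical.

```python
# Hypothetical slider settings; the slide shows the sliders but not their values.
DEFAULT_PROFILE = {"authority": 20, "freshness": 10, "proximity": 30, "context": 40}
NEWS_PROFILE = {"authority": 10, "freshness": 60, "proximity": 10, "context": 20}


def rank_score(components, profile):
    """Weighted average of per-dimension scores (each assumed to lie in [0, 1])."""
    total_weight = sum(profile.values())
    if total_weight == 0:
        raise ValueError("rank profile must have at least one non-zero weight")
    weighted = sum(weight * components.get(name, 0.0) for name, weight in profile.items())
    return weighted / total_weight


# Assumed per-dimension scores for one document and one query.
doc = {"authority": 0.5, "freshness": 1.0, "proximity": 0.2, "context": 0.8}

default_score = rank_score(doc, DEFAULT_PROFILE)  # (10 + 10 + 6 + 32) / 100
news_score = rank_score(doc, NEWS_PROFILE)        # (5 + 60 + 2 + 16) / 100
```

Moving a slider changes only the profile, not the document's component scores, which is why the same document can rank differently under a site-wide default profile and a section-specific one.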
7
Search Business Center (SBC)
8
FAST Unity™: What It Does, How It Works, and What Value It Provides
9
FAST Unity at a Glance
Diagram labels: Front-end Search Application; FAST ESP Federation; Internal Sources (Search Index, …); External Sources (e.g. another ESP instance; Web Search Engine, Web Site, …)
FAST Sources
– FAST ESP 5.x
– FAST Data Search 4.x
– FAST ImPulse
– FAST AdMomentum
– FAST RetrievalWare
External Sources
– Microsoft SharePoint 2003 & 2007
– Web search engines: Google, Yahoo, OpenSearch, Gigablast
– Web services: Match.com, PriceGrabber, Google Image
– Advertising services: Google AdSense
10
Look and feel - Unity
– Featured Content
– Calls-to-Action
– Ads
– User-generated Content
– Third-party Content
– Multimedia
– Subscription Feeds
11
Example Web 2.0 Model
– One query - multiple result sets
– Results are returned asynchronously
– Delivered directly to the browser
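The "one query, multiple result sets, returned asynchronously" model can be sketched with Python's `asyncio`. This is a minimal illustration under stated assumptions, not FAST Unity code: the source names, the simulated delays, and the functions `search_source` and `federated_search` are hypothetical, and a real front end would push each result set to the browser instead of collecting them in a list.

```python
import asyncio


async def search_source(name, query, delay):
    """Stand-in for a real backend call; the sleep simulates network latency."""
    await asyncio.sleep(delay)
    return name, [f"{name} result for {query!r}"]


async def federated_search(query, sources):
    """Send one query to every source and collect result sets as they arrive."""
    tasks = [
        asyncio.create_task(search_source(name, query, delay))
        for name, delay in sources.items()
    ]
    arrived = []
    for finished in asyncio.as_completed(tasks):
        name, results = await finished
        # A real application would deliver this result set to the browser here.
        arrived.append((name, results))
    return arrived


# Hypothetical sources with simulated response times in seconds.
SOURCES = {"index": 0.15, "web": 0.05, "ads": 0.10}

result_sets = asyncio.run(federated_search("mexico", SOURCES))
arrival_order = [name for name, _ in result_sets]
```

Because the result sets are handled in completion order, a slow source delays only its own panel of the page rather than the whole response.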
12
FAST ESP - Scalability
3D Scalability: #Documents - #Users - Index Latency
Diagram labels: Document Freshness, Scaling
Single Search Node Performance
– 20-50 million documents
– Up to 1 TB of information
– 100-500 queries per second
– 20-50 ms query response time
– Down to 50 ms indexing latency
– Indexing 50+ documents per second while maintaining search performance
Hardware: Dual Pentium 4, 3 GHz; 4 GB RAM; 3 x SCSI 15K rpm; HW RAID-0 derivative
FAST Scalability Facts:
– Deployments with >40 TB
– Deployments with >3B documents
– Deployments with 1 to 1000+ servers
– Deployments with 1000s of queries per second
– Deployments with >500 updates per second
– 20-50 ms query response time
– Sub-second indexing latency
– Crawling >200 documents per second per server
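The single-node figures above lend themselves to a back-of-the-envelope sizing calculation. The sketch below assumes a simple model that the slide does not spell out: documents are split across partitions, every query touches every partition, and each partition is replicated until the replicas together absorb the peak query load. The function name `size_cluster` and the default per-node figures (the conservative ends of the ranges on the slide) are choices made for this example.

```python
def size_cluster(total_docs, peak_qps, docs_per_node=20_000_000, qps_per_node=100):
    """Estimate partitions, replicas, and total nodes under the assumed model."""
    # Ceiling division without floats: -(-a // b).
    partitions = -(-total_docs // docs_per_node)  # scale with #documents
    replicas = -(-peak_qps // qps_per_node)       # scale with #users (query load)
    return partitions, replicas, partitions * replicas


# Example: 3 billion documents at a peak of 1000 queries per second.
partitions, replicas, nodes = size_cluster(3_000_000_000, 1000)
```

The estimate ignores index latency, the third scaling dimension on the slide, as well as headroom for failures, so it is a lower bound for illustration rather than a deployment guide.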
13
Query Performance of FAST Search vs. RDBMS
Proven High QPS, Low Latency Access – Database Offloading
Structured data: 5 million records; 13 fields per record
Structured queries: 22 SQL queries (representative of ERP)
Charts: QPS and latency, ESP5 vs. RDBMS
Results:
– #1: FAST ESP4 w/ disk: mean = 99 ms, st. dev. = 36 ms
– #2: Oracle w/ memory mapping: mean = 4,057 ms, st. dev. = 9,368 ms
Identical HW: single node, 2 CPUs, 4 GB RAM, 3 SCSI disks
Identical data: auction data from eBay, 3.6 million docs
Identical queries: 200 queries defined by Oracle
14
ESP5 Scalability
Efficiency Per Server & Linear Scaling
Diagram labels: Documents, Pluggable Content Dispatcher, Content Refinement, Search Index, Query, Query & Result Distribution, Query Processing, Result Processing
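The "Query & Result Distribution" box in this diagram suggests a scatter-gather pattern: the query goes to every index partition, each partition returns its best hits, and the partial lists are merged by rank score. The sketch below illustrates that pattern only. It is not ESP5 code; the data layout, the substring match, the fixed per-document scores, and the function names are all assumptions made for the example.

```python
import heapq
from itertools import islice


def search_partition(partition, query, k):
    """Return the partition's top-k (score, doc_id) hits, best first."""
    hits = [
        (score, doc_id)
        for doc_id, (text, score) in partition.items()
        if query in text  # naive substring match stands in for real query evaluation
    ]
    return heapq.nlargest(k, hits)


def distributed_search(partitions, query, k):
    """Scatter the query to all partitions, then merge the sorted partial results."""
    partial_results = [search_partition(p, query, k) for p in partitions]
    merged = heapq.merge(*partial_results, key=lambda hit: hit[0], reverse=True)
    return [doc_id for _, doc_id in islice(merged, k)]


# Two hypothetical partitions: doc_id -> (text, precomputed rank score).
PARTITIONS = [
    {"d1": ("new mexico travel", 0.70), "d2": ("texas travel", 0.90)},
    {"d3": ("mexico city guide", 0.95), "d4": ("university of new mexico", 0.40)},
]

top_hits = distributed_search(PARTITIONS, "mexico", k=2)
```

Each partition only needs to return its own top k, since no document outside a partition's top k can appear in the global top k. That keeps the merge cost independent of index size, which is one reason this layout can scale close to linearly.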
15
ESP5 – Raising the Bar
Enabling the Adaptive Information Warehouse
SCALABLE, HIGH PERFORMING, RELIABLE, FLEXIBLE
» Linear scaling of feeding capacity
» Archival solutions @ 40 PB
» 14G search solution (14X Google)
» Feed @ >6000 updates/s
» Querying @ >2000 QPS
» 100M documents per server
» >2X indexing throughput
» Consistent low latency
» Reduced disk footprint
» Feeding architecture improved
» Simplified state management
» Improved fault-tolerance
» Out-of-the-box monitoring
» End2End SOA philosophy
» Studio & programmatic extensibility
» Semantic index
» SAN/NAS optimizations
16
FAST ESP Competence Analysis
– Performance and scalability on commodity servers
– Support for 70+ languages
– Easy-to-use management tool and security control
– Relevancy/precision: find what users want
– Navigation: quickly find what users want within a few clicks
– Add-on applications including recommendation, advertising promotion, mobile access, DB cleansing/offloading, ...
– 200+ connectors to popular data silos
– Extensibility and integration through an open architecture
– Market-leading (#1)
– Large R&D investment and commitment