MICROSOFT SEMANTIC ENGINE
Unified Search, Discovery and Insight

Significant Content is Outside Structured Storage (RDBMS, OLAP, BI)
Integration of this Content is Prohibitively Expensive (Time, Money, Resources)
Extracting Insight, Analytics, and Recommendations is even harder
Situation is a Confluence of Search | Predictive Analytics | Large-Scale Collaborative Filtering

Having all forms of digital information on a single platform allows people to blend unstructured and structured content and to drive insight and decision making.
Microsoft Semantic Engine provides a combination of technologies to form a contextual understanding of all digital content.

Critical Business Need: Analysts gather documents, media and web content about “Business Analytics”, “Data Integration” and “Search and Discovery”
Core Machine Learning: Unsupervised learning infers “Unified Information Access” concept cluster based on automated analysis of content
Efficient Data Aggregation: Cluster gains in relevance from mining across unstructured and structured sources added from ERP and BI systems
User Relevance Boost: Users (BDM) relabel cluster as “Unified Search, Discovery and Insight” and engine adopts it, further boosting that cluster's relevance
Collaborative Boost: Analysts collate this content, requiring multi-resolution super-clusters with embedded sub-clusters
Business Decision Making: The CxO explores super-cluster and drafts business plan for her new division

Search and Collaboration | Personalized search, discovery and organization
Legal | Precedent and subject based search over large scale textual corpuses
Life Sciences | Systems biology with large volume data correlation and search
Government Services | Intelligence, real-time analytics, visualization, clustering
Social Networking | Social graph relevance mining, ranking criteria auto tuning

Unified Search, Discovery and Insight
  Automatic Clustering and Organization
  Meaning-Driven Indexing, Classification and Storage
  Scalable Content Processing over all Content Types
  Instant On Experience for Out of Box Value

Search, Discover and Organize features exposed via sample UX gallery
Seamless installation and indexing of desktop and web content
Fully documented Managed APIs used in UX gallery and JavaScript / C# samples

Streams | Descriptors (Properties) | Kinds (Concepts)
  Streams processed into contextualized and indexed concepts for search | discovery | organization
  [Diagram] KR_CLIENT_225.docx (STREAM); LEGAL DOCUMENT (CONCEPT); BILLABLE WORK (CONCEPT); EVIDENCE (CONCEPT); DEPOSITION (CONCEPT); EXTRACTED PROPERTIES (PROPERTY); LEGAL CASE [xxx] (CONCEPT CLUSTER); SEARCH AND SHARE (MDP)

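The stream / descriptor / kind vocabulary on this slide can be sketched as a small in-memory model. This is only an illustration: the class shapes, field names and the `link` helper are assumptions, not the engine's API, and the way the example wires the concepts to the LEGAL CASE [xxx] cluster is a guess at what the diagram shows.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """A property extracted from content (hypothetical shape)."""
    type: str
    attribute: str
    value: str


@dataclass
class Stream:
    """Raw content handed to the engine, identified by its URI."""
    uri: str
    format: str
    data: bytes = b""


@dataclass(eq=False)  # identity equality, so cyclic links stay safe to compare
class Kind:
    """A concept; it may own streams, descriptors and links to other kinds."""
    name: str
    streams: list[Stream] = field(default_factory=list)
    descriptors: list[Descriptor] = field(default_factory=list)
    links: list["Kind"] = field(default_factory=list)

    def link(self, other: "Kind") -> None:
        """Record an undirected association between two concepts."""
        if other not in self.links:
            self.links.append(other)
        if self not in other.links:
            other.links.append(self)


# Example wiring (assumed): one document concept plus three related
# concepts, all attached to a single concept cluster.
doc = Kind("LEGAL DOCUMENT")
doc.streams.append(Stream(
    uri="KR_CLIENT_225.docx",
    format="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
))
doc.descriptors.append(Descriptor("Metadata", "Name", "KR_CLIENT_225.docx"))

case = Kind("LEGAL CASE [xxx]")
for name in ("BILLABLE WORK", "EVIDENCE", "DEPOSITION"):
    case.link(Kind(name))
case.link(doc)

related = sorted(k.name for k in case.links)
```

Here `related` lists the concepts reachable from the cluster, which is the kind of lookup a discovery or organization feature would build on.
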
Engine consists of a self-contained set of pluggable services
  Text Processing
  Image Processing
  Video Processing
  Audio Processing
  Supervised Machine Learning
  Clustering
  MDI (RBV)
  Conceptual Search
  Inference
  Sequence Store (Suffix Tree)
  Distributed Content Store
  Ontology and Taxonomy Management
  Semantic Engine
  Search and Markup
  Trend and Predictive Analysis
  Automatic Organization
  Recommendation and Discovery

The logical architecture partitions analysis, indexing and storage
  [Diagram labels] API 1, API 2, API 3; Analysis 1, Analysis 2, Analysis 3; Staging, Core, Index, Stream; Store( ), Annotate( ), Index( ), Organize( ), Search( ), …; Text, Image, Audio, Video

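The partitioning named on this slide (analysis, indexing, storage) can be illustrated with a toy pipeline. Everything below is a hypothetical sketch: `ToyEngine`, its method names and the analyzer signature are invented for illustration, and a word tokenizer plus an inverted index stand in for the engine's real analysis and indexing, which the slide does not describe.

```python
import re
from collections import defaultdict
from typing import Callable

# Hypothetical analyzer signature: raw bytes in, list of concept terms out.
Analyzer = Callable[[bytes], list[str]]


def text_analyzer(data: bytes) -> list[str]:
    """Toy text analysis: lowercase word tokens stand in for concepts."""
    return re.findall(r"[a-z0-9]+", data.decode("utf-8", errors="ignore").lower())


class ToyEngine:
    """Keeps storage, analysis and indexing in separate structures."""

    def __init__(self) -> None:
        # Analyzers are pluggable per content type; only text is provided.
        self.analyzers: dict[str, Analyzer] = {"text": text_analyzer}
        self.staging: dict[str, tuple[str, bytes]] = {}       # storage
        self.annotations: dict[str, list[str]] = {}           # analysis output
        self.index: dict[str, set[str]] = defaultdict(set)    # term -> URIs

    def store(self, uri: str, content_type: str, data: bytes) -> None:
        """Stage the stream, then run analysis and indexing on it."""
        self.staging[uri] = (content_type, data)
        self.annotate(uri)
        self.index_stream(uri)

    def annotate(self, uri: str) -> None:
        content_type, data = self.staging[uri]
        analyzer = self.analyzers.get(content_type)
        self.annotations[uri] = analyzer(data) if analyzer else []

    def index_stream(self, uri: str) -> None:
        for term in self.annotations[uri]:
            self.index[term].add(uri)

    def search(self, query: str) -> list[str]:
        """Return URIs whose annotations contain every query term."""
        terms = text_analyzer(query.encode("utf-8"))
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


engine = ToyEngine()
engine.store("memo.txt", "text", b"Unified search and discovery")
engine.store("notes.txt", "text", b"Search over legal documents")
engine.store("photo.jpg", "image", b"\xff\xd8")  # no image analyzer registered

both = engine.search("search")
only_memo = engine.search("unified search")
```

Registering another entry in `analyzers` (for image, audio or video) is how a new content type would plug in under this sketch, without touching storage or search.
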
Designed to be hassle-free out of the box
  Several programming languages and frameworks supported
  CLR/.NET, JavaScript, T-SQL, C++

Sample of storing a stream in the system
  Initiates the content processing, classification, and indexing

Sample of search and recommendations
  Returns contextual results from the store and the web

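As a rough illustration of what search and recommendations over stored content could look like, the sketch below ranks items by how many query concepts they match and recommends neighbours by Jaccard overlap of concept labels. The store contents, function names and scoring are all assumptions made for this example. It covers only a local store, not web results, and it is not the engine's actual API or ranking method.

```python
# Hypothetical store: item URI -> set of concept labels assigned to it.
STORE: dict[str, set[str]] = {
    "contract.docx": {"legal", "billing", "client"},
    "deposition.docx": {"legal", "evidence", "client"},
    "invoice.xlsx": {"billing", "client"},
    "holiday.jpg": {"photo", "travel"},
}


def search(query: set[str], store: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank items by the number of query concepts they carry."""
    scored = [(uri, len(concepts & query)) for uri, concepts in store.items()]
    hits = [(uri, score) for uri, score in scored if score > 0]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(uri: str, store: dict[str, set[str]], top_n: int = 2) -> list[str]:
    """Suggest the items whose concept sets overlap most with the given item."""
    base = store[uri]
    scored = [(other, jaccard(base, concepts))
              for other, concepts in store.items() if other != uri]
    scored = [(other, score) for other, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:top_n]]


results = search({"legal", "billing"}, STORE)
suggestions = recommend("contract.docx", STORE)
```

In this toy data, `invoice.xlsx` is recommended ahead of `deposition.docx` because it shares two of three concepts with the contract (Jaccard 2/3) against two of four (Jaccard 1/2).
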
Seamless Integration in Windows Desktop Federated Search
  Expose Meaning-Driven Indexing and Semantic Actions
  Zero Learning Curve

[Diagram labels] Files; Importers; Plug-Ins; Semantic Engine; Database
  Database tables: Kind, Descriptor, Stream, KindLink, ListKind

Kind
KindID | SourceUri
 | C:\My Documents\Saint Germain Des Pres Cafe (Finest electro-jazz compilation)\05 Track 5.wma

Stream
StreamID | KindID | StreamUri | Format | Stream
 | | | audio/x-ms-wma | 0xFFD8FFE000104A …

Descriptor
DescriptorID | KindID | Type | Attribute | Value
 | | Classification | Audio |
 | | Metadata | Name | 05 Track 5.wma
 | | Metadata | Item Type | Windows Media Audio File
 | | Metadata | Length | 00:05:
 | | Metadata | WM/ProviderStyle | Electronica
 | | Audio | Tonality/Major |
 | | Audio | Tempo/Moderato | 0.79
 | | Classification | Music | .8

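The three tables on this slide can be approximated in SQLite to show how one imported file fans out into a kind, a stream and several descriptor rows. Table and column names come from the slide. The column types, keys and the use of SQLite are assumptions, and only descriptor rows whose values are fully legible on the slide are inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Kind (
    KindID    INTEGER PRIMARY KEY,
    SourceUri TEXT NOT NULL
);
CREATE TABLE Stream (
    StreamID  INTEGER PRIMARY KEY,
    KindID    INTEGER NOT NULL REFERENCES Kind(KindID),
    StreamUri TEXT,
    Format    TEXT,
    Stream    BLOB
);
CREATE TABLE Descriptor (
    DescriptorID INTEGER PRIMARY KEY,
    KindID       INTEGER NOT NULL REFERENCES Kind(KindID),
    Type         TEXT NOT NULL,
    Attribute    TEXT NOT NULL,
    Value        TEXT
);
""")

# One Kind row per imported source file.
source = (r"C:\My Documents\Saint Germain Des Pres Cafe "
          r"(Finest electro-jazz compilation)\05 Track 5.wma")
kind_id = conn.execute(
    "INSERT INTO Kind (SourceUri) VALUES (?)", (source,)
).lastrowid

# The slide shows only the first bytes of the stream, so only that prefix
# is stored here; StreamUri is not visible on the slide and is left NULL.
conn.execute(
    "INSERT INTO Stream (KindID, Format, Stream) VALUES (?, ?, ?)",
    (kind_id, "audio/x-ms-wma", bytes.fromhex("FFD8FFE000104A")),
)

# Descriptors accumulate as importers and analyzers process the stream.
rows = [
    ("Metadata", "Name", "05 Track 5.wma"),
    ("Metadata", "Item Type", "Windows Media Audio File"),
    ("Metadata", "WM/ProviderStyle", "Electronica"),
    ("Audio", "Tempo/Moderato", "0.79"),
]
conn.executemany(
    "INSERT INTO Descriptor (KindID, Type, Attribute, Value) VALUES (?, ?, ?, ?)",
    [(kind_id, t, a, v) for t, a, v in rows],
)

# Read back every metadata descriptor attached to this kind.
metadata = dict(conn.execute(
    "SELECT Attribute, Value FROM Descriptor WHERE KindID = ? AND Type = 'Metadata'",
    (kind_id,),
).fetchall())
```

Keeping descriptors as (Type, Attribute, Value) rows instead of fixed columns is what lets each analysis pass add new properties without a schema change, which matches the way the slide's table grows step by step.
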
Seamless Integration of Meaning-Driven Indexing in ALL SQL Tables
  Expose Meaning-Driven Indexing via T-SQL

PARTING THOUGHTS
  Unified Search, Discovery and Insight over Every Digital Artifact
  Extensible and Scalable Semantic Platform
  Zero Learning Curve

Built by Developers for Developers….
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.