1
Confidential. The material in this presentation is the property of Fair Isaac Corporation, is provided for the recipient only, and shall not be used, reproduced, or disclosed without Fair Isaac Corporation's express consent. © 2008 Fair Isaac Corporation.

HNC Data Alignment Research Direction
Richard Rohwer
Senior Principal Scientist, Advanced Technologies
HNC Software / Fair Isaac
2
Cognition needs Semantics; Semantics needs Massive Data.
[Diagram labels: Massive Data, Tacit Knowledge, Explicit Knowledge, KNOWLEDGE]
Statistics includes:
- Semantics / Meaning = Association Statistics
- Information Organization
- Reasoning
Theorem: Probability distributions are the UNIQUE logically consistent knowledge representation.
3
From massive data to machine cognition: the technical principles
Mathematical ingredients:
- Association-Grounded Semantics (AGS): to capture meaning mathematically.
- Semantically-Driven Segmentation (SDS): to extract the most meaningful patterns.
- Distributional Alignment (DA): to compare meanings abstractly.
- Semantically Enriched Reasoning Engine: to think in terms of meanings instead of symbols.
4
Association-Grounded Semantics (AGS): Meaning = Usage
[Figure (Cables): clusters of words with similar usage, including abbreviations and misspellings grouped with their standard forms, e.g. from / frm / fro, through / thru, over / ovr, between / btwn, across / acrs; month abbreviations (jan, feb, ..., sep, sept); ranks and titles (captain, capt, cpt, gen, maj, lt, sgt, msgt, ...).]
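A minimal sketch of "meaning = usage", under two assumptions that are ours rather than the slide's: a word's usage is the empirical distribution of the words within a fixed window around it, and two usages are compared with Jensen-Shannon divergence. The function names and the toy corpus are hypothetical; the slide does not say which association statistics AGS actually uses.

```python
from collections import Counter, defaultdict
import math

def context_distributions(tokens, window=1):
    """Map each word to the empirical distribution of words occurring within
    `window` positions of it. Assumption: this window-based 'usage' stands in
    for whatever association statistics AGS actually uses."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[word][tokens[j]] += 1
    dists = {}
    for word, ctx in counts.items():
        total = sum(ctx.values())
        dists[word] = {c: n / total for c, n in ctx.items()}
    return dists

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical usage, 1 = disjoint."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(d):
        # KL divergence from d to the mixture m (terms with zero mass skipped)
        return sum(v * math.log2(v / m[k]) for k, v in d.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

# Hypothetical toy corpus: "thru" is used exactly like "through".
tokens = ("we went thru the gate . we went through the gate . "
          "we met in jan . we met in feb .").split()
usage = context_distributions(tokens)
print(js_divergence(usage["thru"], usage["through"]))  # 0.0 (identical contexts)
print(js_divergence(usage["thru"], usage["jan"]))      # 1.0 (disjoint contexts)
```

Because the comparison looks only at context distributions, a misspelling such as "thru" ends up next to "through" without any dictionary, which is the effect the Cables figure illustrates.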
5
Distributional Alignment (DA)
Abstraction ~ Structural Commonality
- Align semantic spaces by the distribution of content; no need to understand the content.
- Transport meaning between languages, dialects, and cultures.
- Transport meaning metaphorically between topics.
transLign algorithm: no language knowledge, no tie words, no aligned corpora.
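The slide does not describe how transLign works, so the following is not transLign. It is a hedged stand-in that shows the underlying idea of aligning two semantic spaces from structure alone: given association matrices for two vocabularies, search for the word correspondence that best preserves the association strengths. No labels, tie words, or parallel text are consulted. The matrices and word lists are invented toy data, and exhaustive search over permutations only works for a handful of words.

```python
from itertools import permutations

def align_by_structure(a, b):
    """Find perm minimising sum_ij (a[i][j] - b[perm[i]][perm[j]])**2.
    perm[i] is the index in space B matched to item i of space A.
    Only association values are used; word labels are never consulted.
    Brute force: O(n!) and therefore a toy-sized illustration only."""
    n = len(a)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum((a[i][j] - b[perm[i]][perm[j]]) ** 2
                   for i in range(n) for j in range(n))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost

# Hypothetical toy association matrices (symmetric co-occurrence strengths).
words_a = ["river", "water", "money", "loan"]
a = [[0, 5, 0, 0],
     [5, 0, 1, 0],
     [0, 1, 0, 4],
     [0, 0, 4, 0]]
words_b = ["dinero", "rio", "prestamo", "agua"]  # same structure, shuffled order
b = [[0, 0, 4, 1],
     [0, 0, 0, 5],
     [4, 0, 0, 0],
     [1, 5, 0, 0]]

perm, cost = align_by_structure(a, b)
for i, j in enumerate(perm):
    print(words_a[i], "->", words_b[j])
```

The word lists are used only for printing; the match (river -> rio, water -> agua, money -> dinero, loan -> prestamo) is recovered purely from the pattern of association strengths.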
6
Alignment: Terminology
Language varieties: RP English, Cable English, Blog Dialects, Less Commonly Taught Languages, Institutional Dialects, Terror Cell Obfuscated Slang, Professional Dialects, Newswire English, Foreign Newswire.
- Polysemy (sense resolution): e.g. "bank" vs. "river bank" vs. "bank note" in an AGS semantic space. Good solutions from NIMD: entity disambiguation (5.5% error vs. 13.5% error in KDD).
- General terms; information loss (unequal expressive power).
- Automation: AGS techniques do not require manually constructed resources, but can use them when available.
[Figure labels: "fluffy snow"; "What 'cha call it?"; Naïve Bayes]
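The slide places Naïve Bayes alongside polysemy (sense resolution) but gives no details. As an illustration only, here is a minimal multinomial Naïve Bayes sense classifier with add-one smoothing that chooses a sense for "bank" from its context words. The sense labels, training contexts, and function names are hypothetical and are not taken from the NIMD work cited on the slide.

```python
from collections import Counter
import math

def train_nb(examples):
    """examples: list of (sense_label, context_words) pairs.
    Returns sense counts, per-sense word counts, and the vocabulary."""
    sense_counts = Counter()
    word_counts = {}
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def classify(model, context, alpha=1.0):
    """Return the sense maximising log P(sense) + sum log P(word | sense),
    with add-alpha (Laplace) smoothing over the training vocabulary."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_lp = None, -math.inf
    for sense, n in sense_counts.items():
        lp = math.log(n / total)
        denom = sum(word_counts[sense].values()) + alpha * len(vocab)
        for w in context:
            lp += math.log((word_counts[sense][w] + alpha) / denom)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

# Hypothetical labelled contexts for the ambiguous word "bank".
model = train_nb([
    ("river", ["water", "fishing", "shore"]),
    ("river", ["water", "boat"]),
    ("money", ["loan", "deposit", "account"]),
    ("money", ["deposit", "teller"]),
])
print(classify(model, ["fishing", "water"]))  # river
print(classify(model, ["deposit"]))           # money
```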
7
Alignment: Schemata
[Diagram: two schemas, each with a table name, column names, and instances, and each accompanied by natural-language corpora. Labels: Semantic Alignment; Instance Statistics (joined across schema); Structural Alignment; Schema Graph.]
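One reading of "Instance Statistics (joined across schema)" is that columns in two schemas can be matched by comparing the statistics of the values they hold, without trusting column names. A minimal sketch under that assumption, using histogram intersection of value frequencies as the similarity; the measure, the function names, and the toy tables are hypothetical choices, not the method from the slides.

```python
from collections import Counter

def value_distribution(values):
    """Relative frequency of each distinct value in a column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def overlap(p, q):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(p[k], q[k]) for k in p.keys() & q.keys())

def match_columns(schema_a, schema_b):
    """Map each column of schema_a to the schema_b column whose instance
    distribution is most similar. Column names are never compared."""
    dists_b = {col: value_distribution(vals) for col, vals in schema_b.items()}
    matches = {}
    for col_a, vals in schema_a.items():
        p = value_distribution(vals)
        matches[col_a] = max(dists_b, key=lambda col_b: overlap(p, dists_b[col_b]))
    return matches

# Hypothetical tables whose column names share nothing.
schema_a = {"cust_state": ["CA", "NY", "CA", "TX"],
            "cust_tier": ["gold", "silver", "gold", "bronze"]}
schema_b = {"level": ["silver", "gold", "gold"],
            "region": ["NY", "CA", "TX", "CA"]}
print(match_columns(schema_a, schema_b))
# {'cust_state': 'region', 'cust_tier': 'level'}
```

Each column is matched independently here, so two source columns could map to the same target; a one-to-one assignment would need a matching step such as the Hungarian algorithm on the overlap scores.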
8
Alignment: Ontologies
- More complex graph structure, reflecting multiple (transitive) relations: is-a, part-of, reports-to, prerequisite-for, …
- This implies more options for defining AGS statistics: more relations, more ways to define co-occurrence.
- Big-picture issue: ontological structure makes general statements about instances of relationships within data. So does AGS. How are these related?
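To illustrate "more relations, more ways to define co-occurrence", one hypothetical choice is to treat each (relation, related concept) pair as a context feature of a concept, after closing the relations declared transitive. This is a sketch under that assumption only; the triples and function names are invented, and the slide leaves open how AGS statistics should actually be defined over an ontology.

```python
from collections import Counter, defaultdict

def transitive_closure(pairs):
    """All (x, z) such that z is reachable from x in one or more hops."""
    succ = defaultdict(set)
    for x, y in pairs:
        succ[x].add(y)
    closure = set()
    for start in list(succ):
        stack, seen = list(succ[start]), set()
        while stack:  # depth-first walk from `start`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(succ.get(node, ()))
    return closure

def relation_contexts(triples, transitive=frozenset()):
    """Map each subject to a Counter of (relation, object) context features.
    Relations named in `transitive` are closed before counting."""
    by_rel = defaultdict(set)
    for s, r, o in triples:
        by_rel[r].add((s, o))
    contexts = defaultdict(Counter)
    for r, pairs in by_rel.items():
        if r in transitive:
            pairs = transitive_closure(pairs)
        for s, o in pairs:
            contexts[s][(r, o)] += 1
    return contexts

# Hypothetical ontology fragment.
triples = [("poodle", "is-a", "dog"), ("dog", "is-a", "mammal"),
           ("mammal", "is-a", "animal"), ("tail", "part-of", "dog")]
contexts = relation_contexts(triples, transitive={"is-a"})
print(sorted(contexts["poodle"]))
# [('is-a', 'animal'), ('is-a', 'dog'), ('is-a', 'mammal')]
```

Choosing which relations to close, and whether to combine features across relations, are exactly the extra options the slide points to; each choice yields a different co-occurrence statistic.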