An Ontological Approach to Financial Analysis and Monitoring
Application Architecture
Research Areas ● Data Extraction ● Data Disambiguation ● Semantic Association and Ranking ● MathML-S (MathML with Semantics)
Data Extraction
● Ontology schema designed to provide common terminology for a domain.
● Ontology instances represent actual data from a domain.
● Data is extracted from multiple resources and translated into instances of a single ontology.
● Original data sources include databases, XML files, HTML pages, text documents, etc.
Data Disambiguation Ongoing research to develop relationship and attribute based disambiguation techniques so that the ontology can be meaningfully populated. Simple Example: Is “Athens” the city in Georgia or the city in Greece?
Data Disambiguation Challenges ● Merging two or more databases/ontologies/XML files with multiple references to the same logical entity ● Adding new entities to an ontology when a similar entity already exists ● Variations in database/ontology/XML schemas ● Variations in information representation ● Incomplete information ● Use of abbreviations, misspellings, various naming conventions, format changes, etc.
Data Disambiguation
Schema (Person): SSN, TelNumber, FirstName, MiddleName, LastName, Generation, Marital Status, Applicant, dependent of, spouse of, works for, affiliated with, foreign influence event, address
Conflicting instances:
● Tim Robins: Single; People Soft; event place23
● Timothy Wallace Robinson (Timothy, Wallace, Robinson): Married person; Oracle; event place23
Disambiguation cues:
● Reconciling Oracle and PeopleSoft indicates the two person entities work for the same organization
● Marital status (Single vs. Married) is recognized as a time-sensitive attribute
● String similarity metrics
● The nature of an attribute indicates its relative importance: SSN is given a high weight in disambiguating person entities
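The weighting idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual technique: the attribute weights, the `EMPLOYER_ALIASES` table, and the use of `difflib.SequenceMatcher` as the string similarity metric are all assumptions made for the example. Attributes missing from either record are skipped and the remaining weights are renormalized, which is one simple way to cope with incomplete information.

```python
from difflib import SequenceMatcher

# Hypothetical weights: SSN dominates, as the slide suggests.
WEIGHTS = {"ssn": 0.6, "last_name": 0.2, "first_name": 0.1, "employer": 0.1}

# Hypothetical alias table encoding "PeopleSoft and Oracle are the same organization".
EMPLOYER_ALIASES = {"peoplesoft": "oracle"}


def canonical(attr: str, value: str) -> str:
    """Normalize case and whitespace, and map employer aliases to one name."""
    value = value.strip().lower()
    if attr == "employer":
        value = value.replace(" ", "")
        return EMPLOYER_ALIASES.get(value, value)
    return value


def similarity(attr: str, a: str, b: str) -> float:
    """String similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, canonical(attr, a), canonical(attr, b)).ratio()


def match_score(p: dict, q: dict) -> float:
    """Weighted average similarity over the attributes present in both records."""
    total = 0.0
    weight_sum = 0.0
    for attr, weight in WEIGHTS.items():
        if p.get(attr) and q.get(attr):
            total += weight * similarity(attr, p[attr], q[attr])
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# The two conflicting instances from the slide (no SSN available in either).
tim = {"first_name": "Tim", "last_name": "Robins", "employer": "People Soft"}
timothy = {"first_name": "Timothy", "last_name": "Robinson", "employer": "Oracle"}
score = match_score(tim, timothy)
```

A score near 1 suggests the two records describe the same person; where to draw the line is a tuning decision this sketch does not address.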
Semantic Associations and Ranking Semantic Associations ● Semantic associations are relationships or paths between concepts in an ontology Ranking ● Ranking based on multiple factors ● Number of links, types of links, location in ontology, etc. ● Ranking indicates degree of semantic “closeness”
Semantic Associations and Ranking Characterizing document content in terms of ontology “semantic annotation” ● Correlate words/phrases from document with entities/relationships in ontology ● Entity Identification ● Meta-data added to document (from associated ontological knowledge) ● Active area of research but practically useful technology now available ● Constrained to content of ontology
Semantic Associations and Ranking Semantic Relationships between Documents and Ontology ● Semantic associations: relationships between document concepts and ontology concepts are discovered and ranked ● Ranking based on multiple factors ● no. of links, types of links, location in ontology, … ● Ranking indicates degree of semantic “closeness”
Semantic Associations and Ranking Document Ranking ● Highly relevant ● Closely related ● Ambiguous ● Not relevant ● Undeterminable
Semantic Associations and Ranking Research Content ● Discovery & ranking of semantic associations ● Characterizing “need to know” in terms of ontological concepts & relationships ● Meta-data annotation of data and (semi-structured & unstructured) documents ● Correlation of document content & concepts in ontology
Semantic Associations and Ranking Research Challenges In this project we are addressing: ● Discovery of semantic associations per entity per document ● Input/visualization/management of the context of investigation ● Scalability in the number of documents & ontology size (performs well with a thousand documents) ● Ranking of documents
Semantic Associations and Ranking Ranking of Documents Relevance
“Closely related entities are more relevant than distant entities”
$E = \{\, e \mid e \in \text{Document} \,\}$
$E_k = \{\, f \mid \operatorname{distance}(f,\, e \in E) = k \,\}$
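One way to compute the sets E and E_k is a breadth-first search over the ontology graph, starting from the entities found in the document. The sketch below makes two assumptions that the slide does not state: the ontology is treated as an unweighted adjacency map, and the distance from f to E is taken as the shortest path to the nearest document entity. The graph and its node names are toy placeholders.

```python
from collections import deque


def distance_sets(graph, doc_entities):
    """Return {k: entities whose shortest distance to any document entity is k}.

    graph: dict mapping each entity to an iterable of neighbouring entities.
    doc_entities: the set E of entities annotated in the document (distance 0).
    """
    dist = {e: 0 for e in doc_entities}
    queue = deque(doc_entities)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:  # first visit is the shortest path in BFS
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    levels = {}
    for node, k in dist.items():
        levels.setdefault(k, set()).add(node)
    return levels


# Toy ontology: "e" is disconnected and therefore appears in no E_k.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"], "e": []}
levels = distance_sets(graph, ["a"])
```

Entities in sets with small k would then contribute more to a document's relevance than those with large k, in line with the quoted principle.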
Semantic Associations and Ranking Relevance Measures for Documents ● Relevance engine input ● the set of semantically annotated documents ● the context of investigation for the assignment ● the ontology schema represented in RDFS, and the ontology instances represented in RDF ● Relevance measure function used to verify whether the entity annotations in the annotated document can be fit into the entity classes, entity instances, and/or keywords specified in the context of investigation.
Semantic Associations and Ranking Ranking of Documents Relevance
Four groups for document ranking:
● Not Related Documents: unable to determine relation to context
● Ambiguously Related Documents: some relationship exists to the context
● Somehow Related Documents: entities are closely related to the context
● Highly Related Documents: entities are a direct match to the context
Semantic Associations and Ranking Ranking of Documents Relevance continued Cut-off values determine grouping of documents w.r.t. relevance ● These are customizable cut-off values (more control and more meaningful parameters compared to say automatic classification or statistical approaches) “Inspection” of a document is possible via (a) original document or (b) original document with highlighted entities
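The customizable cut-offs can be pictured as thresholds on a numeric relevance score. In the sketch below, the score range of 0 to 1 and the default cut-off values are assumptions chosen for illustration; only the four group names come from the slides.

```python
def classify(score, cutoffs=(0.25, 0.5, 0.75)):
    """Map a relevance score to one of the four document groups.

    cutoffs: (low, mid, high) thresholds; hypothetical defaults that an
    analyst would tune for a given context of investigation.
    """
    low, mid, high = cutoffs
    if score >= high:
        return "Highly Related"
    if score >= mid:
        return "Somehow Related"
    if score >= low:
        return "Ambiguously Related"
    return "Not Related"


# Tightening the cut-offs moves the same document into a lower group.
default_group = classify(0.6)
strict_group = classify(0.6, cutoffs=(0.5, 0.7, 0.9))
```

Because the thresholds are explicit parameters, an analyst can see exactly why a document landed in a group, which is harder with an automatic classifier.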
Semantic Associations and Ranking Ontology-driven Thematic Association Lifecycle
Building a scalable and high-performance capability with support for:
● Task domain ontology creation and maintenance
● Ontology “knowledge” based on trusted sources supporting document classification
● Ontology-driven semantic metadata extraction/annotation
● Utilizing semantic metadata and ontology to associate document theme(s) with analytical task
● Weighting process used to measure degree of relevance
Lifecycle stages (diagram): Task Domain Schema Creation; Ontology Population; Metadata Extraction and Annotation Enhancement; Thematic Association/Relationships Discovery; Semantic Relationship Rank Analysis. Diagram labels: Ontology API, MB, KB.
MathML-S ● An interface has been developed that allows the user to specify the set of things that need to be verified for any given individual. ● This kind of “ultimate flexibility” is possible due to the ontological approach used. ● An application of research in modeling rules for identifying financial irregularities using MathML (MathML-S) ● Traversals, formulas, rules and profiles represent the data, calculations and checks that need to be performed ● These are created using the graphical interface developed, and stored in the Component Library
MathML-S
● A traversal is a path through the ontology ending with a data-type value.
● A formula is a computation of a value using concepts found in the ontology. It may be constrained using data found in the ontology instance data.
● A rule is a verification (with a boolean value) performed on:
  ● a computed value that comes from a formula
  ● the existence of certain types of relationships
  Other types of rules may be added based on feedback.
● A profile is a collection of rules, where rules may be given different weights. A profile value can be computed for a person represented in the ontology.
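The relationship between rules and profiles can be sketched as follows. The slides do not say how a profile value is computed, so the weighted fraction of satisfied rules used here is an assumption, as are the `Rule` class, the dictionary layout of a person, and the example thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Person = Dict[str, object]  # hypothetical flat view of a person's ontology data


@dataclass
class Rule:
    name: str
    check: Callable[[Person], bool]  # boolean verification
    weight: float = 1.0


def profile_value(rules: List[Rule], person: Person) -> float:
    """Weighted fraction of rules the person satisfies (assumed scoring scheme)."""
    total = sum(rule.weight for rule in rules)
    passed = sum(rule.weight for rule in rules if rule.check(person))
    return passed / total if total else 0.0


person = {
    "total_liabilities": 20000.0,
    "relationships": {"works_for", "foreign_influence_event"},
}

rules = [
    # Rule on a computed value (the threshold is a made-up number).
    Rule("low_debt", lambda p: p["total_liabilities"] < 50000, weight=2.0),
    # Rules on the existence of certain relationship types.
    Rule("has_employer", lambda p: "works_for" in p["relationships"]),
    Rule("no_foreign_influence",
         lambda p: "foreign_influence_event" not in p["relationships"]),
]

value = profile_value(rules, person)
```

Here two of the three rules pass, and the heavier weight on `low_debt` pulls the profile value above the unweighted two-thirds.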
MathML-S Rule Example: Solvency Ratio Check
Traversals
● Asset_T = value of Asset(n)
● Liability_T = value of Liability(n)
Formulas
● Total_Assets = Asset_T1 + Asset_T2 + … + Asset_Tn
● Total_Liabilities = Liability_T1 + Liability_T2 + … + Liability_Tn
● Solvency_Ratio = Total_Assets / Total_Liabilities
Rule
● Solvency_Ratio_Check = Solvency_Ratio > 1.1
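The solvency ratio check translates directly into code. The sketch below is written in plain Python rather than MathML-S, and it assumes a person is a dictionary whose `assets` and `liabilities` entries are lists of items with a numeric `value`; that layout, the function names, and the treatment of zero liabilities are illustrative choices, not part of the original system.

```python
def traverse(person, kind):
    """Traversal: follow the person's `kind` relationships to their numeric values."""
    return [item["value"] for item in person.get(kind, [])]


def solvency_ratio(person):
    """Formula: Total_Assets / Total_Liabilities."""
    total_assets = sum(traverse(person, "assets"))
    total_liabilities = sum(traverse(person, "liabilities"))
    if total_liabilities == 0:
        # Assumption: a person with no liabilities is treated as fully solvent.
        return float("inf")
    return total_assets / total_liabilities


def solvency_ratio_check(person, threshold=1.1):
    """Rule: Solvency_Ratio > 1.1 (threshold taken from the slide)."""
    return solvency_ratio(person) > threshold


person = {
    "assets": [{"value": 150.0}, {"value": 70.0}],
    "liabilities": [{"value": 100.0}, {"value": 60.0}],
}
ratio = solvency_ratio(person)        # 220 / 160
passes = solvency_ratio_check(person)
```

Keeping the traversal, formula, and rule as separate functions mirrors the component split on the slide, so the same formula can feed several rules.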
MathML-S ● Data integrated in the ontology is queried to verify compliance with a set of customizable rules ● These rules include math calculations, verification of conditions, calculation and verification of ratios, etc. ● The ontological approach allows such rules to be defined using the concepts and relationship types of the ontology (and thus they also apply to sub-concepts)
MathML-S ● Semantic matching and the graph-like nature of the ontology provide flexibility in defining the rules, yet a few research issues remain to be addressed ● Operands for a formula are data items in the ontology. Some data values are retrieved by traversing specific sequences of relationships ● Formulas are self-contained so that they can be reused in various rules (thus making maintenance more flexible) ● A profile is a set of rules; verifying it involves querying the ontology while computing formula and rule values
MathML-S