Amit Sheth, CTO, Semagix Inc

Presentation on theme: "Amit Sheth, CTO, Semagix Inc"— Presentation transcript:

1 Amit Sheth, CTO, Semagix Inc http://www.semagix.com
Semantics Enabled Industrial and Scientific Applications: Research, Technology and Deployed Applications. Keynote, the First Online Metadata and Semantics Research Conference. Part I: Industrial Applications. November 23, 2005. Amit Sheth, CTO, Semagix Inc. 4/5/2019. 2005 SEMAGIX All rights reserved.

2 Outline
I will drive the talk with applications. In the process, we will review underlying processes, technologies and research challenges.
Part I: Industrial Semantic Technology Applications in Risk and Compliance
Part II: Health-care Semantic Web Application
Part III: Bioinformatics Semantic Web Applications
Part I relates to applications developed for Semagix’s customers using a technology that commercialized research at the University of Georgia’s LSDIS lab. Many slides have notes which provide additional material and pointers to related documents, papers and talks for further information.

3 Things to Consider About the Semantic (Web) Technologies
Build Ontology: build the schema (model-level representation); populate it with a knowledge base (people, locations, organizations, events).
Automatic Semantic Annotation (extract semantic metadata): any type of document, multiple sources of documents; metadata can be stored with or separately from documents.
Applications: search (ranked list of documents of interest, i.e. semantic search), integrate/portal, summarize/explain, analyze, make decisions.
Reasoning techniques: graph analysis, inferencing.
Types of content/documents; use of standards; scalability; performance; opscenter.
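The "build schema, then populate" steps on this slide can be illustrated with a minimal sketch. It is not the Semagix implementation: the `Ontology` class, the class and relation names, and the sample entities ("Jane Doe", "Acme Corp", "Atlanta") are all hypothetical and chosen only for illustration. The schema declares which classes and relations exist; population adds entities and assertions, and each assertion is checked against the schema.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: a schema (classes, relations) plus a populated knowledge base."""
    classes: set = field(default_factory=set)
    relations: dict = field(default_factory=dict)  # relation -> (domain class, range class)
    entities: dict = field(default_factory=dict)   # entity name -> class
    facts: set = field(default_factory=set)        # (subject, relation, object) assertions

    def add_entity(self, name, cls):
        if cls not in self.classes:
            raise ValueError(f"unknown class: {cls}")
        self.entities[name] = cls

    def add_fact(self, subj, rel, obj):
        # Reject assertions whose subject or object has the wrong class.
        domain, range_ = self.relations[rel]
        if self.entities.get(subj) != domain or self.entities.get(obj) != range_:
            raise ValueError(f"{subj} -{rel}-> {obj} violates the schema")
        self.facts.add((subj, rel, obj))


# Step 1: build the schema (hypothetical classes and relations).
onto = Ontology(
    classes={"Person", "Organization", "Location"},
    relations={
        "works_for": ("Person", "Organization"),
        "located_in": ("Organization", "Location"),
    },
)

# Step 2: populate the knowledge base (hypothetical entities).
onto.add_entity("Jane Doe", "Person")
onto.add_entity("Acme Corp", "Organization")
onto.add_entity("Atlanta", "Location")
onto.add_fact("Jane Doe", "works_for", "Acme Corp")
onto.add_fact("Acme Corp", "located_in", "Atlanta")
```

Keeping the schema separate from the populated facts mirrors the slide's distinction between the model-level representation and the knowledge base.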

4 Ontology-driven Information System Lifecycle
Semantic (Web) Technology State of the Art. Building a scalable and high-performance system with support for: ontology creation and maintenance; ontology-driven semantic metadata extraction/annotation; utilizing semantic metadata and ontology; semantic search/querying/browsing; information and application integration (normalization); analysis/mining/discovery (relationships).
[Figure: lifecycle diagram with the labels Schema Creation, Ontology Population, Metadata Extraction, Application Creation, Analytic Application Creation, Ontology API, MB and KB.]
Semantic Web in a nutshell: ontology as the centerpiece; metadata that associate meaning to content; computing (complex querying, inferencing, other reasoning) that supports semantic applications. Further discussion on this lifecycle can be found in BSBQ.

5 Types of Ontologies (or things close to ontology)
Upper ontologies: modeling of time, space, process, etc.
Broad-based or general-purpose ontologies/nomenclatures: Cyc, WordNet.
Domain-specific or industry-specific ontologies: News (politics, sports, business, entertainment; also see TAP and SWETO); Financial Market; Terrorism; Biology (Open Biomedical Ontologies, GlycO, ProPreO); Clinical (see Open Clinical); GO (nomenclature), NCI (schema), UMLS (knowledgebase), …
Application-specific and task-specific ontologies: Anti-money laundering, NeedToKnow, (Employee or Vendor Vetting); Equity Research; Repertoire Management.
CENTRAL ROLE OF ONTOLOGIES
An ontology represents agreement and common terminology/nomenclature. It is populated with extensive domain knowledge or known facts/assertions. It is the key enabler of semantic metadata extraction from all forms of content: unstructured text (and 150 file formats), semi-structured (HTML, XML) and structured data. The ontology is in turn the centerpiece that enables resolution of semantic heterogeneity, semantic integration, and semantically correlating/associating objects and documents. A large number of ontologies have been developed and many are in use. There are fundamentally different approaches to developing ontologies: schema vs. populated; community efforts vs. reusing knowledge sources.

6 Evolution of Meta Data
More sophisticated semantic technologies exploit ontologies and: provide scalability and flexibility; handle all types of data (unstructured, semi-structured, structured); create SmartData, enhancing raw data with context and relationships; accommodate SmartQuerying (flexible, intelligent querying); enable powerful enterprise decision making. See (Semantic Meta Data For Enterprise Information Integration).
Large-scale metadata extraction and semantic annotation is possible. IBM WebFountain [Dill et al 2003] demonstrates the ability to annotate on a Web scale (i.e., over 2.5 billion pages), while Semagix Freedom related technology [Hammond et al 2002] demonstrates capabilities that work for a few million documents per day per server. However, the general trade-off of depth versus scale applies. Storage and manipulation of metadata for millions to hundreds of millions of content items requires database techniques, with the challenge of improving performance and scale in the presence of more complex structures.

7 Semagix Semantic Enhancement Engine
Automatic Semantic Metadata Extraction from unstructured data. Semagix Semantic Enhancement Engine. See Hammond, Sheth, Kochut 2002: Semantic Enhancement Engine [Hammond, Sheth, Kochut 2002].

8 Semantic Annotation/ Metadata Extraction + Enhancement

9 Automatic Semantic Annotation
COMTEX Tagging: limited tagging (mostly syntactic).
Content ‘Enhancement’: rich semantic metatagging.
Value-added Semagix Semantic Tagging: value-added relevant metatags added by Semagix to existing COMTEX tags: private companies, type of company, industry affiliation, sector, exchange, company execs, competitors.
© Semagix, Inc.
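As a rough sketch of this kind of tag enhancement, the snippet below spots known company names in a text and adds ontology-derived metatags (type, sector, exchange, competitors) next to the tags the content already carries. Everything in it is an assumption made for illustration: the `KNOWLEDGE_BASE` contents, the company names, and the `enhance_tags` function are hypothetical, and simple substring matching stands in for the far more capable extraction the slide describes.

```python
# Hypothetical knowledge base: company name -> attributes held in the ontology.
KNOWLEDGE_BASE = {
    "Acme Corp": {
        "type": "public company",
        "sector": "Technology",
        "exchange": "NASDAQ",
        "competitors": ["Globex Inc"],
    },
    "Globex Inc": {
        "type": "private company",
        "sector": "Technology",
        "exchange": None,
        "competitors": ["Acme Corp"],
    },
}


def enhance_tags(text, existing_tags):
    """Return the existing tags plus metatags for each known company named in text."""
    enhanced = dict(existing_tags)  # keep the original (syntactic) tags untouched
    lowered = text.lower()
    # Naive spotting: case-insensitive substring match against known names.
    mentions = [name for name in KNOWLEDGE_BASE if name.lower() in lowered]
    enhanced["companies"] = {name: KNOWLEDGE_BASE[name] for name in mentions}
    return enhanced


article = "Acme Corp reported quarterly earnings today."
tags = enhance_tags(article, {"source": "newswire", "language": "en"})
```

The value added comes from the knowledge base rather than from the text itself: the article never mentions a sector or competitors, yet the enhanced tags carry both.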

10 Semagix Freedom Architecture
For building ontology-driven information systems. Further details of this technology can be found in Managing Semantic Content for the Web.

11 Global Bank
Aim: Legislation (PATRIOT ACT) requires banks to identify ‘who’ they are doing business with.
Problem: Volume of internal and external data needed to be accessed; complex name matching and disambiguation criteria; requirement to ‘risk score’ certain attributes of this data.
Approach: Creation of a ‘risk ontology’ populated from trusted sources (OFAC etc.); sophisticated entity disambiguation; semantic querying, rules specification and processing.
Solution: Rapid and accurate KYC checks; risk scoring of relationships allowing for prioritisation of results; full visibility of sources and trustworthiness.
Additional details can be found in

12 The Process
[Figure: ontology graph with entity types Watch list, Organization and Company, instances FBI Watchlist, Hamas and WorldCom, and the relationships "appears on Watchlist", "member of organization" and "works for Company" linking them to Ahmed Yaseer.]
Ahmed Yaseer: appears on Watchlist ‘FBI’; works for Company ‘WorldCom’; member of organization ‘Hamas’.
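The relationships on this slide can be written down as subject-relationship-object triples and queried by pattern. The sketch below is only illustrative: the triple encoding, the relationship identifiers (`appears_on`, `works_for`, `member_of`) and both helper functions are assumptions, not the representation used by the deployed system. Only the entities and their links come from the slide.

```python
# The slide's example, encoded as (subject, relationship, object) triples.
TRIPLES = {
    ("Ahmed Yaseer", "appears_on", "FBI Watchlist"),
    ("Ahmed Yaseer", "works_for", "WorldCom"),
    ("Ahmed Yaseer", "member_of", "Hamas"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def watchlisted_employees(company):
    """People who work for `company` and also appear on some watch list."""
    employees = {s for s, _, _ in match(predicate="works_for", obj=company)}
    return sorted(p for p in employees if match(subject=p, predicate="appears_on"))


hits = watchlisted_employees("WorldCom")
```

Combining two patterns in `watchlisted_employees` is the simplest form of the semantic querying mentioned earlier: the answer follows from relationships between entities, not from any single record.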

13 Global Investment Bank
[Figure: data sources: Law Enforcement, Public Records, World Wide Web content, blogs, RSS, Watch Lists, Regulators, semi-structured government data, unstructured text and semi-structured data.]
User will be able to navigate the ontology using a number of different interfaces.
Establishing New Account: scores the entity based on the content and entity relationships.
Requirements: (a) Serve global population of 500 users; (b) Complete all source checks in 20 seconds or less; (c) Integrate with enterprise single sign-on systems; (d) Meet complex name matching and disambiguation criteria; (e) Adhere to complex security requirements.
Results: Rapid, accurate KYC checks; automatic audit trails; reduction in false positives; streamlines and enhances due diligence of potential high-risk accounts.
Example of Fraud Prevention application used in financial services.
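The slide does not say how the name matching criteria were met. As one possible illustration, the sketch below normalizes names (case, punctuation, token order) and compares them with `difflib.SequenceMatcher` from the Python standard library. The functions, the 0.85 threshold and the sample names are all hypothetical; real watch-list screening also has to deal with transliteration, aliases and disambiguation by other attributes.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, drop punctuation, and sort tokens so 'Doe, Jane' equals 'jane doe'."""
    tokens = re.findall(r"\w+", name.casefold())
    return " ".join(sorted(tokens))


def name_similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen(candidate, watch_list, threshold=0.85):
    """Return (entry, score) pairs at or above the threshold, best match first."""
    hits = [(entry, round(name_similarity(candidate, entry), 3)) for entry in watch_list]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Hypothetical watch list and new-account applicant.
WATCH_LIST = ["Jane Doe", "Robert Smith"]
matches = screen("Doe, Jane", WATCH_LIST)
```

Raising the threshold trades missed matches for fewer false positives, which is the same balance the "reduction in false positives" result refers to.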

14 Law Enforcement Agency
Aim: Provision of an overarching intelligence system that provides a unified view of people and related information.
Problem: Need to create unique entities from across multiple disparate, non-standardised databases; requirement to disambiguate ‘dirty’ data; need to extract insight from unstructured text.
Approach: Multiple database extractors to disambiguate data and form relevant relationships; modelling of behaviours/patterns within a very large ontology (6Mn+ entities).
Solution: Merged and linked case data from multiple sources using effective identification, disambiguation, and link analysis; dynamic annotation of documents; single query across multiple datasets; 360-degree view of an individual and relevant associations.
Requirements: (a) Merge and link case data from multiple sources to a taxonomy using effective identification, disambiguation, and analysis; (b) Ability to use pre-defined/investigation-specific case studies for search and match; (c) Positive and negative searching of cases; (d) Ability to explore case data starting from any entity via link analysis.
Results: Superior, faster identification of prolific offenders; better prioritization of cases; greater investigator productivity and effectiveness.
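Requirement (d), exploring case data starting from any entity, can be sketched as a breadth-first search over the merged link data. This is an assumption about one simple way to do link analysis, not a description of the deployed system; the `LINKS` data, the entity labels and the function names are hypothetical.

```python
from collections import deque

# Hypothetical links merged from several case databases (treated as undirected).
LINKS = [
    ("Person A", "Vehicle 1"),
    ("Vehicle 1", "Case 100"),
    ("Case 100", "Person B"),
    ("Person B", "Address 9"),
    ("Person C", "Case 200"),
]


def build_graph(links):
    """Adjacency sets for an undirected graph of linked entities."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def explore(graph, start, max_hops):
    """Breadth-first search: map each entity within max_hops of start to its hop count."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


graph = build_graph(LINKS)
reachable = explore(graph, "Person A", max_hops=3)
```

Here Person B is found three hops from Person A through a shared vehicle and case, while Person C, who has no link to either, does not appear in the result.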

15 Free text searching across aggregated information sources
Profile Creation; Complex Querying; Summary of Results; Investigation. Gisondi, white ford expedition, main street, assault, traffic offences. Free text searching across aggregated information sources.

16 Profile Creation; Complex Querying; Summary of Results; Investigation
Unified view of direct and indirect results that best match the complex query and the profile.

17 Direct and indirect relationship scoring driven by risk weightings
Profile Creation; Complex Querying; Summary of Results; Investigation. Direct and indirect relationship scoring driven by risk weightings. Aggregated knowledge from disparate sources. Knowledge annotation of known entities from within free text.
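The slide gives no formula for relationship scoring, so the following is only a sketch under stated assumptions: each relationship type carries a hypothetical risk weight, a direct relationship contributes its full weight, and every additional hop is discounted by a hypothetical decay factor. The weights, the decay value, the sample entities and the `risk_score` function are all invented for illustration.

```python
# Hypothetical risk weightings per relationship type (scale 0-100).
RISK_WEIGHTS = {"appears_on_watchlist": 100, "member_of": 80, "works_for": 30}
INDIRECT_DECAY = 0.5  # assumed discount applied for each extra hop

# Hypothetical (subject, relationship, object) assertions.
RELATIONSHIPS = [
    ("Customer X", "works_for", "Company Y"),
    ("Company Y", "appears_on_watchlist", "Watch List Z"),
    ("Customer W", "appears_on_watchlist", "Watch List Z"),
]


def risk_score(entity, relationships, max_depth=2):
    """Sum weights along outgoing relationship paths, discounting indirect hops."""
    def walk(node, depth, factor, seen):
        total = 0.0
        for subj, rel, obj in relationships:
            if subj != node or obj in seen:
                continue  # skip unrelated assertions and avoid cycles
            total += factor * RISK_WEIGHTS.get(rel, 0)
            if depth < max_depth:
                total += walk(obj, depth + 1, factor * INDIRECT_DECAY, seen | {obj})
        return total

    return walk(entity, 1, 1.0, {entity})


# Rank entities so the riskiest results are reviewed first.
ranking = sorted(
    ((name, risk_score(name, RELATIONSHIPS)) for name in ("Customer X", "Customer W")),
    key=lambda pair: -pair[1],
)
```

Customer W scores higher because the watch-list link is direct, whereas Customer X inherits only a discounted share of it through an employer. Ranking by score is one way to support the prioritisation of results mentioned for the Global Bank application.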

18 Technical Capabilities
Ontology-driven Information Systems
Ontology Quality and Freshness: trusted knowledge sources, weekly to daily update
Populated Ontology Size: millions of assertions, sometimes exceeding 10 million
Data (Type and Amount): structured, semi-structured, unstructured
Metadata Extraction: automatic extraction, semantic metadata
Computation: query expressiveness (over metadata and ontology), rules, ranking
Visualization
Scalability and Performance: main-memory vs database based

19 QUESTIONS? http://www.semagix.com
A relevant article: A relevant conference:
