Amit Sheth, CTO, Semagix Inc

Presentation transcript:

Semantics Enabled Industrial and Scientific Applications: Research, Technology and Deployed Applications
Keynote, the First Online Metadata and Semantics Research Conference, http://www.metadata-semantics.org
Part I: Industrial Applications
November 23, 2005
Amit Sheth, CTO, Semagix Inc, http://www.semagix.com
2005 SEMAGIX All rights reserved.

Outline
I will drive the talk with applications. In the process, we will review underlying processes, technologies and research challenges.
Part I: Industrial Semantic Technology Applications in Risk and Compliance
Part II: Health-care Semantic Web Application
Part III: Bioinformatics Semantic Web Applications
Part I relates to applications developed for Semagix's customers using a technology that commercialized research at the University of Georgia's LSDIS lab. Many slides have notes that provide additional material and pointers to related documents, papers and talks for further information.

Things to Consider About the Semantic (Web) Technologies
Build Ontology: build the schema (model-level representation); populate it with a knowledgebase (people, locations, organizations, events).
Automatic Semantic Annotation (Extract Semantic Metadata): any type of document, multiple sources of documents; metadata can be stored with or separately from documents.
Applications: search (ranked list of documents of interest, i.e., semantic search), integrate/portal, summarize/explain, analyze, make decisions.
Reasoning techniques: graph analysis, inferencing.
Types of content/documents; use of standards; scalability; performance.

Semantic (Web) Technology State of the Art: Ontology-driven Information System Lifecycle
Building a scalable and high-performance system with support for:
Ontology creation and maintenance
Ontology-driven semantic metadata extraction/annotation
Utilizing semantic metadata and ontology: semantic search/querying/browsing; information and application integration (normalization); analysis/mining/discovery (relationships)
[Diagram labels: Schema Creation, Ontology Population, Metadata Extraction, Ontology API, MB, KB, BSBQ Application Creation, Analytic Application Creation]
Semantic Web in a Nutshell: ontology as the centerpiece; metadata that associates meaning with content; computing (complex querying, inferencing, other reasoning) that supports semantic applications.
Further discussion on this lifecycle can be found in http://www.semagix.com/documents/SA04VIBSemanticViz_000.pdf

Types of Ontologies (or things close to ontology)
Upper ontologies: modeling of time, space, process, etc.
Broad-based or general-purpose ontologies/nomenclatures: Cyc, WordNet
Domain-specific or industry-specific ontologies:
News: politics, sports, business, entertainment (also see TAP and SWETO)
Financial Market
Terrorism
Biology: Open Biomedical Ontologies, GlycO, ProPreO
Clinical (see Open Clinical)
GO (nomenclature), NCI (schema), UMLS (knowledgebase), …
Application-specific and task-specific ontologies: Anti-money laundering, NeedToKnow (Employee or Vendor Vetting), Equity Research, Repertoire Management
CENTRAL ROLE OF ONTOLOGIES
An ontology represents agreement and a common terminology/nomenclature.
An ontology is populated with extensive domain knowledge or known facts/assertions.
It is the key enabler of semantic metadata extraction from all forms of content: unstructured text (and 150 file formats), semi-structured data (HTML, XML) and structured data.
The ontology is in turn the centerpiece that enables resolution of semantic heterogeneity, semantic integration, and semantically correlating/associating objects and documents.
A large number of ontologies have been developed and many are in use. There are fundamentally different approaches to developing ontologies: schema vs populated; community efforts vs reusing knowledge sources.

Evolution of Meta Data
More sophisticated semantic technologies exploit ontologies and:
Provide scalability and flexibility
Handle all types of data (unstructured, semi-structured, structured)
Create SmartData: enhancing raw data with context and relationships
Accommodate SmartQuerying: flexible, intelligent querying
Enable powerful enterprise decision making
See http://www.dmreview.com/article_sub.cfm?articleId=6962 (Semantic Meta Data For Enterprise Information Integration)

Semagix Semantic Enhancement Engine: Automatic Semantic Metadata Extraction from unstructured data
See [Hammond, Sheth, Kochut 2002], Semantic Enhancement Engine: http://lsdis.cs.uga.edu/lib/download/HSK02-SEE.pdf

Semantic Annotation / Metadata Extraction + Enhancement
Large-scale metadata extraction and semantic annotation is possible. IBM WebFountain [Dill et al 2003] demonstrates the ability to annotate on a Web scale (i.e., over 2.5 billion pages), while Semagix Freedom related technology [Hammond et al 2002] demonstrates capabilities that work for a few million documents per day per server. However, the general trade-off of depth versus scale applies. Storage and manipulation of metadata for millions to hundreds of millions of content items requires database techniques, with the challenge of improving performance and scale in the presence of more complex structures.

Automatic Semantic Annotation
COMTEX Tagging: limited tagging (mostly syntactic)
Content 'Enhancement': rich semantic metatagging
Value-added Semagix Semantic Tagging: value-added relevant metatags added by Semagix to existing COMTEX tags: private companies, type of company, industry affiliation, sector, exchange, company execs, competitors
© Semagix, Inc.
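The tag enhancement described on this slide can be illustrated with a minimal sketch. It assumes a tiny in-memory ontology and a naive substring match; the company names, field names and the `enhance_tags` function are hypothetical and are not the Semagix or COMTEX APIs.

```python
# Hypothetical mini-ontology: company name -> known facts (all values are made up).
ONTOLOGY = {
    "Acme Corp": {
        "type_of_company": "Private",
        "industry": "Manufacturing",
        "sector": "Industrials",
        "exchange": None,  # private company, so no exchange listing
        "executives": ["J. Doe"],
        "competitors": ["Globex"],
    },
}


def enhance_tags(existing_tags, text, ontology=ONTOLOGY):
    """Return a copy of existing_tags plus ontology-derived metatags
    for every known company whose name occurs in the text."""
    enhanced = dict(existing_tags)
    companies = {}
    lowered = text.lower()
    for name, facts in ontology.items():
        if name.lower() in lowered:
            # Keep only the facts that actually have a value.
            companies[name] = {k: v for k, v in facts.items() if v}
    enhanced["companies"] = companies
    return enhanced


# Existing (mostly syntactic) tags are preserved; semantic tags are added.
tags = enhance_tags(
    {"source": "newswire", "date": "2005-11-23"},
    "Acme Corp announced a new plant.",
)
```

A production annotator would need real entity recognition and disambiguation rather than substring matching; the sketch only shows how ontology facts can be attached to existing tags.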

Semagix Freedom Architecture for building ontology-driven information systems
Further details of this technology can be found in Managing Semantic Content for the Web: http://lsdis.cs.uga.edu/lib/download/S+2002-SCORE-IC.pdf

Global Bank
Aim: Legislation (PATRIOT ACT) requires banks to identify 'who' they are doing business with.
Problem: Volume of internal and external data needed to be accessed; complex name matching and disambiguation criteria; requirement to 'risk score' certain attributes of this data.
Approach: Creation of a 'risk ontology' populated from trusted sources (OFAC etc.); sophisticated entity disambiguation; semantic querying, rules specification and processing.
Solution: Rapid and accurate KYC checks; risk scoring of relationships allowing for prioritisation of results; full visibility of sources and trustworthiness.
Additional details can be found in http://www.semagix.com/documents/SemagixCIRASFinal_004.pdf
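To make the name-matching problem concrete, here is a minimal sketch of screening a customer name against a list of names. It uses plain string similarity from Python's standard `difflib` as a stand-in; it is not the entity disambiguation used in the deployed system, and the names, the `threshold` value and the function names are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, drop punctuation and sort tokens,
    so that 'Smith, John' and 'John Smith' compare as equal."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in name.lower())
    return " ".join(sorted(cleaned.split()))


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen(customer, watchlist, threshold=0.85):
    """Return (entry, similarity) pairs at or above the threshold, best first."""
    hits = [(entry, name_similarity(customer, entry)) for entry in watchlist]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Hypothetical names: a spelling variant still matches, an unrelated name does not.
hits = screen("Jon Smith", ["John Smith", "Jane Doe"])
```

String similarity alone produces false positives for common names, which is why the slide pairs name matching with disambiguation against a populated ontology.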

The Process
Ahmed Yaseer:
Appears on Watchlist 'FBI'
Works for Company 'WorldCom'
Member of organization 'Hamas'
[Diagram: classes Watch list, Organization, Company; instances FBI Watchlist, Hamas, WorldCom; relationships from Ahmed Yaseer: appears on Watchlist, member of organization, works for Company]
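The relationships on this slide can be written down as subject-predicate-object assertions. The sketch below uses the slide's own example; the predicate spellings, the `TYPES` mapping and the helper functions are illustrative assumptions, not the actual ontology schema.

```python
# Assertions from the slide, as (subject, predicate, object) triples.
TRIPLES = [
    ("Ahmed Yaseer", "appears_on_watchlist", "FBI Watchlist"),
    ("Ahmed Yaseer", "works_for", "WorldCom"),
    ("Ahmed Yaseer", "member_of", "Hamas"),
]

# Class of each instance, following the slide's diagram.
TYPES = {
    "FBI Watchlist": "Watch list",
    "WorldCom": "Company",
    "Hamas": "Organization",
}


def relationships(entity, triples=TRIPLES):
    """All (predicate, object) pairs asserted for an entity."""
    return [(p, o) for s, p, o in triples if s == entity]


def on_watchlist(entity, triples=TRIPLES):
    """True if the entity has at least one watch-list assertion."""
    return any(s == entity and p == "appears_on_watchlist" for s, p, _ in triples)


for predicate, obj in relationships("Ahmed Yaseer"):
    print(f"{predicate} -> {obj} ({TYPES[obj]})")
```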

Global Investment Bank
Example of Fraud Prevention application used in financial services.
[Diagram: sources include Watch Lists, Law Enforcement, Regulators, Public Records, World Wide Web content, BLOGS and RSS; data types are semi-structured government data, unstructured text, and semi-structured data]
Establishing New Account: scores the entity based on the content and entity relationships. User will be able to navigate the ontology using a number of different interfaces.
Requirements: (a) Serve global population of 500 users; (b) Complete all source checks in 20 seconds or less; (c) Integrate with enterprise single sign-on systems; (d) Meet complex name matching and disambiguation criteria; (e) Adhere to complex security requirements.
Results: Rapid, accurate KYC checks; automatic audit trails; reduction in false positives; streamlines and enhances due diligence of potential high-risk accounts.

Law Enforcement Agency
Aim: Provision of an overarching intelligence system that provides a unified view of people and related information.
Problem: Need to create unique entities from across multiple disparate, non-standardised databases; requirement to disambiguate 'dirty' data; need to extract insight from unstructured text.
Approach: Multiple database extractors to disambiguate data and form relevant relationships; modelling of behaviours/patterns within a very large ontology (6 million+ entities).
Solution: Merged and linked case data from multiple sources using effective identification, disambiguation, and link analysis; dynamic annotation of documents; single query across multiple datasets; 360-degree view of an individual and relevant associations.
Requirements: (a) Merge and link case data from multiple sources to a taxonomy using effective identification, disambiguation, and analysis; (b) Ability to use pre-defined/investigation-specific case studies for search and match; (c) Positive and negative searching of cases; (d) Ability to explore case data starting from any entity via link analysis.
Results: Superior, faster identification of prolific offenders; better prioritization of cases; greater investigator productivity and effectiveness.

Investigation: Profile Creation, Complex Querying, Summary of Results
Free text searching across aggregated information sources
Gisondi, white ford expedition, main street, assault, traffic offences

Investigation: Profile Creation, Complex Querying, Summary of Results
Unified view of direct and indirect results that best match the complex query and the profile

Investigation: Profile Creation, Complex Querying, Summary of Results
Direct and indirect relationship scoring driven by risk weightings
Aggregated knowledge from disparate sources
Knowledge Annotation of known entities from within free text
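One way to picture scoring direct and indirect relationships with risk weightings is a bounded graph traversal in which each relationship type carries a weight and a path's strength is the product of its weights. This is only a sketch under assumed weights, entity names and scoring rule; the slides do not specify the actual scoring method.

```python
from collections import deque

# Hypothetical risk weight per relationship type (values are made up).
RISK_WEIGHTS = {
    "appears_on_watchlist": 1.0,
    "member_of": 0.8,
    "associate_of": 0.5,
    "works_for": 0.3,
}

# Hypothetical assertions: person_a is linked to a watch list only indirectly.
EDGES = [
    ("person_a", "associate_of", "person_b"),
    ("person_b", "appears_on_watchlist", "watchlist_x"),
    ("person_a", "works_for", "company_y"),
]

FLAGGED = {"watchlist_x"}


def risk_score(start, edges=EDGES, weights=RISK_WEIGHTS, flagged=FLAGGED, max_hops=2):
    """Strongest path from start to any flagged node within max_hops,
    where path strength is the product of the edge risk weights."""
    best = 0.0
    queue = deque([(start, 1.0, 0)])
    while queue:
        node, strength, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for subject, predicate, obj in edges:
            if subject != node:
                continue
            path_strength = strength * weights.get(predicate, 0.0)
            if obj in flagged:
                best = max(best, path_strength)
            queue.append((obj, path_strength, hops + 1))
    return best


direct = risk_score("person_b")    # one hop to the watch list
indirect = risk_score("person_a")  # two hops, via an associate
```

Multiplying weights makes indirect links count for less than direct ones, and the hop limit keeps the traversal bounded even when the graph contains cycles.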

Technical Capabilities: Ontology-driven Information Systems
Ontology Quality and Freshness: trusted knowledge sources, weekly to daily update
Populated Ontology Size: millions of assertions, sometimes exceeding 10 million
Data, Type and Amount: structured, semi-structured, unstructured
Metadata Extraction: automatic extraction, semantic metadata
Computation: query expressiveness (over metadata and ontology), rules, ranking
Visualization
Scalability and Performance: main-memory vs database based

QUESTIONS? http://www.semagix.com
A relevant article: http://68.236.189.240/article/stoy-20050401-05.html
A relevant conference:
