Connecting Knowledge Silos using Federated Text Mining Guy Singh Senior Manager, Product & Strategic Alliances ©2014 Linguamatics Ltd.

Data Silos
External content and internal content
– Structured, semi-structured or unstructured content
– Separate interfaces to access content
– Cannot query across the silos, or exchange content between them

Possible Approaches

Integration Using Workflow Tools
If each data source has an API, the sources can be linked together using tools specific to each data source
Particular workflows can be programmed to pull information together from different data sources
Advantages
– Can perform complex data manipulation
– Can exploit structure in the data sources, or use I2E to transform unstructured data
Disadvantages
– Workflows are fixed: users cannot easily navigate and explore connections between data
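
A fixed workflow of this kind might look like the Python sketch below. `fetch_compounds` and `fetch_adverse_events` are hypothetical in-memory stand-ins for per-source APIs, and the records are toy data; no real service or I2E interface is assumed. The join key and output shape are hard-coded, which is why such workflows are hard to re-purpose for open-ended exploration.

```python
from typing import Dict, List


def fetch_compounds() -> List[Dict[str, str]]:
    """Hypothetical stand-in for a structured compound database API (toy data)."""
    return [
        {"compound_id": "C1", "name": "aspirin"},
        {"compound_id": "C2", "name": "ibuprofen"},
    ]


def fetch_adverse_events() -> List[Dict[str, str]]:
    """Hypothetical stand-in for a safety database API (toy data)."""
    return [
        {"drug": "aspirin", "event": "gastric bleeding"},
        {"drug": "aspirin", "event": "tinnitus"},
        {"drug": "ibuprofen", "event": "nausea"},
    ]


def compound_event_workflow() -> Dict[str, List[str]]:
    """Fixed workflow: attach adverse events to compound IDs, joining on drug name.

    The join key and the output shape are hard-coded; answering a different
    question means programming a new workflow.
    """
    events_by_drug: Dict[str, List[str]] = {}
    for row in fetch_adverse_events():
        events_by_drug.setdefault(row["drug"], []).append(row["event"])
    return {
        compound["compound_id"]: events_by_drug.get(compound["name"], [])
        for compound in fetch_compounds()
    }
```

Calling `compound_event_workflow()` returns `{'C1': ['gastric bleeding', 'tinnitus'], 'C2': ['nausea']}`. Asking a different question, such as grouping by event instead of by compound, means writing another workflow.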

Connecting via Linked Data
Transform databases to RDF, or provide a conversion layer
Advantages
– Standardizes the data format
– Can exploit structure in structured data sources
– Can use I2E to transform unstructured data into RDF
– Can reason with the RDF
Disadvantages
– Transformations are fixed
– Have to predict what information you need from the unstructured text; typically only a small proportion of the original information is extracted
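
As a rough illustration of converting a database to RDF, the sketch below turns one row into N-Triples lines. The `http://example.org/` namespace, the predicate names and the column mapping are assumptions made for illustration. The fixed `COLUMN_TO_PREDICATE` mapping shows the disadvantage listed above: any column not anticipated in the mapping, such as free-text notes, never reaches the RDF.

```python
from typing import Dict, List

# Hypothetical base IRI; any namespace would do for this illustration.
BASE = "http://example.org/"

# The mapping decides up front which columns become RDF predicates.
# Columns not listed here (e.g. free-text notes) are left behind.
COLUMN_TO_PREDICATE: Dict[str, str] = {
    "name": BASE + "vocab/name",
    "target": BASE + "vocab/target",
}


def row_to_ntriples(table: str, row: Dict[str, str]) -> List[str]:
    """Convert one database row into N-Triples lines using the fixed mapping."""
    subject = f"<{BASE}{table}/{row['id']}>"
    triples: List[str] = []
    for column, predicate in COLUMN_TO_PREDICATE.items():
        if column in row:
            # Escape backslashes and quotes in the literal (simplified escaping;
            # the full N-Triples grammar also escapes newlines and other characters).
            value = row[column].replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{predicate}> "{value}" .')
    return triples
```

For a row with `id`, `name`, `target` and `notes` columns, the function emits two triples (name and target); the notes column is silently dropped.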

Using a Data Warehouse
Integrate the data into a data warehouse
– Extract, Transform and Load (ETL) each data source into a new database
Advantages
– Allows users to perform a single query across all the content
– Can use I2E to pull information out of unstructured text
– Can combine with human curation so the warehouse contains checked content
Disadvantages
– ETL can be a time-consuming and expensive process
– Information is lost:
  – Have to predict what information you need from the unstructured text; typically only a small proportion of the original information is extracted
  – Transformation of discrete fields can lose finer distinctions
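
A minimal ETL sketch using Python's built-in `sqlite3` module is shown below. The two source record lists, their field names and the glucose example are hypothetical. Converting mg/dL to mmol/L (dividing by about 18) and rounding to one decimal place during the transform step is a small instance of how transforming discrete fields can lose finer distinctions.

```python
import sqlite3
from typing import Dict, List, Tuple

# Two hypothetical source "silos" with different field names and units.
LAB_SYSTEM = [{"sample": "S1", "result_mg_dl": "95"},
              {"sample": "S2", "result_mg_dl": "130"}]
CLINIC_SYSTEM = [{"SampleID": "S3", "Glucose (mmol/L)": "7.2"}]


def extract() -> List[Tuple[str, Dict[str, str]]]:
    """Extract: pull raw records from each source, tagged with their origin."""
    return [("lab", r) for r in LAB_SYSTEM] + [("clinic", r) for r in CLINIC_SYSTEM]


def transform(source: str, record: Dict[str, str]) -> Tuple[str, str, float]:
    """Transform: map every source onto one schema (glucose in mmol/L)."""
    if source == "lab":
        # Glucose mg/dL -> mmol/L is roughly /18; rounding discards precision.
        return record["sample"], source, round(float(record["result_mg_dl"]) / 18.0, 1)
    return record["SampleID"], source, float(record["Glucose (mmol/L)"])


def load(conn: sqlite3.Connection) -> None:
    """Load: write all transformed rows into a single warehouse table."""
    conn.execute("CREATE TABLE glucose (sample TEXT, source TEXT, mmol_l REAL)")
    conn.executemany("INSERT INTO glucose VALUES (?, ?, ?)",
                     [transform(source, record) for source, record in extract()])
    conn.commit()
```

After `load(conn)` on an in-memory connection, `SELECT sample, mmol_l FROM glucose WHERE mmol_l > 7` returns samples S2 and S3, one from each original source, in a single query.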

Federated Text Mining for Data Silos
Use I2E to make data available for search, navigation and linking
– Keep data in its original format, without any data loss
– I2E queries become the conversion layer, dynamically transforming data into the required format when it is needed
– Ontologies convert between different identifiers, or different languages
– Configurable: just change the queries
Use other methods when their strengths are required
– RDF for reasoning with results
– Workflow tools for complex data analysis and manipulation
– Data warehouses for curated data
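
The idea of queries acting as the conversion layer can be sketched with plain regular expressions over untouched text. This is not I2E's query language; `DOCUMENTS`, the one-entry `ONTOLOGY` and the `output` parameter are hypothetical. The synonym list mixes English and French forms, mirroring the point that ontologies convert between identifiers or languages, and the shape of the output is chosen at query time rather than fixed in advance.

```python
import re
from typing import Dict, List

# Toy raw documents, kept exactly as written; nothing is pre-extracted.
DOCUMENTS: Dict[str, str] = {
    "doc1": "Patient started on acetylsalicylic acid after the event.",
    "doc2": "Le patient a pris de l'aspirine.",
    "doc3": "No relevant medication recorded.",
}

# Hypothetical mini-ontology: one concept ID with English and French synonyms.
ONTOLOGY: Dict[str, List[str]] = {
    "CONCEPT:ASPIRIN": ["aspirin", "acetylsalicylic acid", "aspirine"],
}


def run_query(concept: str, output: str = "ids") -> list:
    """Match any synonym of `concept` in the raw text at query time.

    `output` (a hypothetical parameter) lets the query, rather than a
    fixed ETL step, decide the shape of the results.
    """
    synonyms = sorted(ONTOLOGY[concept], key=len, reverse=True)  # prefer longest match
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, synonyms)) + r")\b", re.IGNORECASE)
    results: list = []
    for doc_id, text in DOCUMENTS.items():
        match = pattern.search(text)
        if match is None:
            continue
        if output == "ids":
            results.append(doc_id)
        else:  # "rows": (document, concept, matched surface form)
            results.append((doc_id, concept, match.group(0)))
    return results
```

`run_query("CONCEPT:ASPIRIN")` returns `['doc1', 'doc2']`, and with `output="rows"` it returns tuples such as `('doc2', 'CONCEPT:ASPIRIN', 'aspirine')`. Changing what is extracted means editing the query or the ontology, not re-running an ETL job.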

Road to Federated Text Mining
– Data Normalization
– Link the Content Servers
– Merge Results
– Federated Text Mining

Data Normalization: Virtual Indexes
(Diagram: the Pathology Reports Index and the Journal Abstracts Index presented together as a single Virtual Index.)
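
A virtual index can be pictured as a thin layer that fans a search out to several physical indexes. The sketch below uses a toy whitespace-tokenized inverted index; the `Index` and `VirtualIndex` classes are hypothetical and say nothing about how I2E implements virtual indexes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Index:
    """A toy inverted index over one document collection (hypothetical)."""
    name: str
    postings: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, doc_id: str, text: str) -> None:
        # Record each distinct lower-cased token once per document.
        for term in set(text.lower().split()):
            self.postings.setdefault(term, []).append(doc_id)

    def search(self, term: str) -> List[str]:
        return self.postings.get(term.lower(), [])


@dataclass
class VirtualIndex:
    """Presents several physical indexes as one searchable index."""
    members: List[Index]

    def search(self, term: str) -> List[Tuple[str, str]]:
        # Each hit keeps the name of its source index so results stay traceable.
        return [(index.name, doc) for index in self.members for doc in index.search(term)]
```

With `p1` ("Invasive ductal carcinoma grade 2") in a Pathology Reports index and `a1` ("Outcomes in ductal carcinoma in situ") in a Journal Abstracts index, searching the virtual index for `carcinoma` returns `[('Pathology Reports', 'p1'), ('Journal Abstracts', 'a1')]`.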

Data Normalization: Document Structure
(Figure panels: Pathology Reports; Journal Abstracts)
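
Normalizing document structure can be thought of as mapping each source's section labels onto a shared set of regions, so that a search restricted to, say, conclusions works on both pathology reports and abstracts. The section names in `SECTION_MAP` and the region names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical section labels used by each source, mapped to shared regions.
SECTION_MAP: Dict[str, Dict[str, str]] = {
    "pathology": {"FINAL DIAGNOSIS": "conclusion", "GROSS DESCRIPTION": "body",
                  "CLINICAL HISTORY": "background"},
    "abstracts": {"CONCLUSIONS": "conclusion", "RESULTS": "body",
                  "BACKGROUND": "background"},
}


def normalize_sections(source: str, sections: List[Tuple[str, str]]) -> Dict[str, str]:
    """Relabel a document's (label, text) sections with shared region names.

    Unmapped sections are kept under 'other' rather than discarded.
    """
    regions: Dict[str, str] = {}
    for label, text in sections:
        region = SECTION_MAP[source].get(label.upper(), "other")
        regions[region] = (regions.get(region, "") + " " + text).strip()
    return regions
```

`normalize_sections("pathology", [("Final Diagnosis", "Invasive carcinoma."), ("Comment", "See addendum.")])` yields `{'conclusion': 'Invasive carcinoma.', 'other': 'See addendum.'}`; keeping unmapped text under `other` avoids losing content.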

Data Normalization: Entities
(Figure panels: Journal Abstracts; Pathology Reports; Combined (Normalized))
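
Entity normalization can be sketched as mapping each source's surface forms to one canonical identifier before combining counts, which is what a combined, normalized view needs. The synonym table is a hypothetical fragment; HER2 and ERBB2 do name the same gene, but a real system would draw such mappings from a curated ontology rather than a hand-written dictionary.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical synonym table mapping lower-cased surface forms to canonical IDs.
CANONICAL: Dict[str, str] = {
    "her2": "ERBB2", "erbb2": "ERBB2", "her-2/neu": "ERBB2",
    "er": "ESR1", "estrogen receptor": "ESR1",
}


def combine(*sources: Iterable[str]) -> Dict[str, int]:
    """Count entity mentions from several sources under canonical IDs.

    Mentions with no entry in the table are counted under their own text.
    """
    counts: Counter = Counter()
    for mentions in sources:
        for mention in mentions:
            counts[CANONICAL.get(mention.lower(), mention)] += 1
    return dict(counts)
```

`combine(["HER2", "ER"], ["ERBB2", "estrogen receptor", "HER-2/neu"])` gives `{'ERBB2': 3, 'ESR1': 2}`, whereas counting the raw strings would split the same two entities across five labels.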

Road to Federated Text Mining: Link the Content Servers

I2E 4.1/4.2: Single Client, Multiple Results
(Diagram: I2E Server 1 holds Internal Documents on the internal network; I2E Server 2, a linked server, holds FDA Drug Labels on the external network.)
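
The single-client, multiple-servers pattern can be sketched with Python's `concurrent.futures`. Each server here is a hypothetical in-process function doing substring search; in a real deployment the call would cross the internal/external network boundary using whatever protocol the servers expose, which this sketch does not model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A hypothetical "server": takes query text, returns matching document IDs.
Server = Callable[[str], List[str]]


def make_server(docs: Dict[str, str]) -> Server:
    """Build a toy in-process server that does case-insensitive substring search."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return [doc_id for doc_id, text in docs.items() if q in text.lower()]
    return search


def query_all(servers: Dict[str, Server], query: str) -> Dict[str, List[str]]:
    """Send one query from a single client to every server concurrently.

    Results come back as separate sets keyed by server name; merging
    them is a separate step.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {name: pool.submit(search, query) for name, search in servers.items()}
        return {name: future.result() for name, future in futures.items()}
```

With an `internal` server holding one internal study and an `external` server holding two drug labels, `query_all(servers, "drug X")` returns one result list per server, ready for the merge step.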

Road to Federated Text Mining: Merge Results

Each Server Supplies a Separate Set of Results
(Diagram: Content Servers 1, 2, 3 and 4 each return their own results, which are merged into a single set of results.)
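
Merging separate result sets can be sketched as collapsing duplicates and re-ranking. The `merge_results` function below assumes each server returns `(document ID, score)` pairs and that scores are comparable across servers; both are assumptions, since independent servers may need score calibration before their results can be ranked together.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, float]  # (document ID, relevance score)


def merge_results(per_server: Dict[str, List[Result]]) -> List[Tuple[str, float, List[str]]]:
    """Merge separate result sets into one ranked list.

    Duplicates (the same document ID from several servers) are collapsed,
    keeping the highest score and recording every server that returned it.
    Ties in score are broken by document ID so the order is deterministic.
    """
    best: Dict[str, float] = {}
    origins: Dict[str, List[str]] = {}
    for server, results in per_server.items():
        for doc_id, score in results:
            best[doc_id] = max(score, best.get(doc_id, float("-inf")))
            origins.setdefault(doc_id, []).append(server)
    ranked = sorted(best, key=lambda doc: (-best[doc], doc))
    return [(doc, best[doc], origins[doc]) for doc in ranked]
```

For example, merging `{"server1": [("d1", 0.9), ("d2", 0.4)], "server2": [("d2", 0.7), ("d3", 0.5)]}` gives `d1`, then `d2` (score 0.7, returned by both servers), then `d3`.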

Road to Federated Text Mining: Final Step, Federated Text Mining

I2E Federated Text Mining
Connected Knowledge: extract and connect data in any format, wherever it resides