Mining Structured vs. Unstructured Data Where is the structure and where did the semantics go? Rahim Yaseen SAP Labs LLC.

Presentation transcript:

© SAP AG 2006, xuPA Mid-Term Strategy / Speaker Name / 2 internal/confidential

Why Mining works for structured data..

For relational data
– There is no separation of the semantic data model and the logical storage model
– Both are coincident in a single data model, and the data definition has limited semantics
– The semantics are captured in the richness of the queries, which form well-known associations based on expert knowledge of relationships in the data models

Diagram labels: Data; Relational Data Model; Queries; Reports
– For relational databases, the data model combines the data representation specification and its storage as relational data.
– Sometimes, views can express alternate representational models that differ from the underlying table structures.
– Rich semantics are usually expressed in queries and reports, which have a priori knowledge of the data models.
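
The point that the schema carries little meaning while the query carries most of it can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the `customer` and `sales_order` tables, their columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database whose schema says little about meaning."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales_order (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO sales_order VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
    """)
    return conn

# The business meaning ("revenue per customer") lives in the query, written
# by someone who already knows how the two tables relate.
REVENUE_BY_CUSTOMER = """
    SELECT c.name, SUM(o.amount) AS revenue
    FROM customer AS c
    JOIN sales_order AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY revenue DESC
"""

def revenue_by_customer(conn):
    """Run the predefined query and return (name, revenue) rows."""
    return conn.execute(REVENUE_BY_CUSTOMER).fetchall()
```

Nothing in the table definitions states that summing `amount` per customer is meaningful; that association is supplied by the query author's prior knowledge of the data model.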

What will it take to mine unstructured data?

Why free (text) search is not the answer..
– The data has no structural model to which meaningful semantics can be applied
– As a result, queries have limited semantics and are not rich enough to get the desired outcomes
– The limiting nature of ad hoc search (vs. the richness of pre-defined queries based on known structure/semantics) limits the relevance of the output

Converting unstructured data to structured data is also not the answer..
– Applying an ETL-like technique to convert data to a structured form is limiting
– This does not guarantee that all the data of interest can be captured
– It provides for only a single (fixed) interpretation of such unstructured data

Can overlaying a semantic model onto the data be the answer?
– Extract a semantic (meta) model of interest from the unstructured data
– Use the structure/semantics of this model to formulate rich search/query
– E.g., techniques used when searching and comparing products
  – Relevant attributes from product descriptions are extracted to form a model
  – These attributes are used to formulate rich searches/queries and comparisons
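
The product-comparison example can be sketched roughly as below. This is only an illustration under stated assumptions: the attribute names, the regular expressions, and the sample descriptions are all hypothetical, and a real system would use far more robust extraction than three hand-written patterns.

```python
import re

# Hypothetical attribute patterns standing in for an extracted (meta) model.
ATTRIBUTE_PATTERNS = {
    "screen_inches": re.compile(r"(\d+(?:\.\d+)?)[\s-]*inch", re.IGNORECASE),
    "ram_gb": re.compile(r"(\d+)\s*GB\s+RAM", re.IGNORECASE),
    "price_usd": re.compile(r"\$(\d+(?:\.\d+)?)"),
}

def extract_model(text):
    """Return the attributes found in one free-text description."""
    model = {}
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            model[name] = float(match.group(1))
    return model

def query(docs, predicate):
    """Return ids of documents whose extracted model satisfies the predicate."""
    return sorted(
        doc_id for doc_id, text in docs.items() if predicate(extract_model(text))
    )

# Hypothetical product descriptions; the original text is left untouched.
DOCS = {
    "a": "Lightweight 13 inch laptop with 8 GB RAM, now $899.",
    "b": "Our 15.6-inch workstation ships with 16GB RAM for $1499.",
    "c": "Great battery life and a sturdy case.",
}

# A structured query over unstructured text: at least 16 GB RAM under $1600.
matches = query(
    DOCS,
    lambda m: m.get("ram_gb", 0) >= 16 and m.get("price_usd", float("inf")) < 1600,
)
```

Document "c" yields an empty model, so the sketch also shows the limitation noted above: extraction does not guarantee that all data of interest is captured.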

Can Mining work for both structured/unstructured data?

– A separate logical data (meta) model, distinct from the underlying storage model
– Extracted from the data in a non-intrusive fashion and captured as meta-data
– A single data representation model can map to multiple storage models
– Structure and semantics of the meta-data help structure queries, search, and reports
– Are embedded tags in the data a possible approach to defining ontology structures?
– Is it feasible to extract such semantic models, and can mining based on them perform?

Diagram labels: Data; Data Storage Model(s); Simple Semantic (Meta) Data Model; Queries; Reports
– Data Storage Model(s): multiple storage models, including relational, XML, text, etc.
– Simple Semantic (Meta) Data Model: a simple semantic data representation model for modeling data (structured and unstructured). Meta-data based on ontologies is extracted from the underlying data.
– Queries and Reports: queries and search that can leverage the structure of the data model to specify queries and search that are rich in semantics
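
One way to picture a single representation model mapped onto several storage models is a set of small adapters that each emit records in the same shape. The sketch below is an assumption-laden illustration, not the approach proposed in the slides: the `supplier` entity, its `name` and `city` fields, and all sample data are hypothetical.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

def from_relational(conn):
    """Map rows of a (hypothetical) supplier table to the common model."""
    for name, city in conn.execute("SELECT name, city FROM supplier"):
        yield {"name": name, "city": city, "source": "relational"}

def from_xml(xml_text):
    """Map <supplier> elements to the common model."""
    root = ET.fromstring(xml_text)
    for node in root.iter("supplier"):
        yield {
            "name": node.findtext("name"),
            "city": node.findtext("city"),
            "source": "xml",
        }

# Hypothetical phrasing; real text would need much richer extraction.
TEXT_PATTERN = re.compile(r"supplier (\w+) is based in (\w+)", re.IGNORECASE)

def from_text(text):
    """Map matching sentences in free text to the common model."""
    for name, city in TEXT_PATTERN.findall(text):
        yield {"name": name, "city": city, "source": "text"}

def suppliers_in(city, *sources):
    """One query, phrased against the model, answered from every source."""
    return sorted(r["name"] for src in sources for r in src if r["city"] == city)

# Hypothetical sample data in three storage models.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?)", [("Acme", "Berlin"), ("Globex", "Paris")]
)
XML_DATA = (
    "<suppliers><supplier><name>Initech</name>"
    "<city>Berlin</city></supplier></suppliers>"
)
TEXT_DATA = "Our new supplier Umbrella is based in Berlin."

berlin = suppliers_in(
    "Berlin", from_relational(conn), from_xml(XML_DATA), from_text(TEXT_DATA)
)
```

The underlying data is read but never modified, which is one reading of "non-intrusive"; the query is written once against the model's fields rather than against any particular storage format.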