Contextual Text Cube Model and Aggregation Operator for Text OLAP 2015. 11.

Slides:



Advertisements
Similar presentations
Supervisor : Prof . Abbdolahzadeh
Advertisements

Chapter 5: Introduction to Information Retrieval
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
Ch 4: Information Retrieval and Text Mining
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Chapter 2Modeling 資工 4B 陳建勳. Introduction.  Traditional information retrieval systems usually adopt index terms to index and retrieve documents.
Semantic (Language) Models: Robustness, Structure & Beyond Thomas Hofmann Department of Computer Science Brown University Chief Scientist.
Computer comunication B Information retrieval. Information retrieval: introduction 1 This topic addresses the question on how it is possible to find relevant.
Vector Space Model CS 652 Information Extraction and Integration.
Chapter 19: Information Retrieval
By N.Gopinath AP/CSE. Two common multi-dimensional schemas are 1. Star schema: Consists of a fact table with a single table for each dimension 2. Snowflake.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
Business Intelligence System September 2013 BI.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
CS344: Introduction to Artificial Intelligence Vishal Vachhani M.Tech, CSE Lecture 34-35: CLIR and Ranking in IR.
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
1 LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora Chien-Chung Huang Shui-Lung Chuang Lee-Feng Chien Presented by: Vu LONG.
Information Retrieval and Knowledge Organisation Knut Hinkelmann.
Modern Information Retrieval: A Brief Overview By Amit Singhal Ranjan Dash.
Business Intelligence Software Business Intelligence A Software at your size.
Semantic Learning Instructor: Professor Cercone Razieh Niazi.
Querying Structured Text in an XML Database By Xuemei Luo.
WEB SEARCH PERSONALIZATION WITH ONTOLOGICAL USER PROFILES Data Mining Lab XUAN MAN.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
1 Data Warehouses BUAD/American University Data Warehouses.
Giorgos Giannopoulos (IMIS/”Athena” R.C and NTU Athens, Greece) Theodore Dalamagas (IMIS/”Athena” R.C., Greece) Timos Sellis (IMIS/”Athena” R.C and NTU.
Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Some OLAP Issues CMPT 455/826 - Week 9, Day 2 Jan-Apr 2009 – w9d21.
SDMX DATA STRUCTURE DEFINITION SDMX Training BANK INDONESIA SEPTEMBER 2015 YOGYAKARTA, INDONESIA.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Personalized Interaction With Semantic Information Portals Eric Schwarzkopf DFKI
Towards Contextual and Structural Relevance Feedback in XML Retrieval Lobna Hlaoua IRIT (Institut de Recherche en Informatique de Toulouse) Equipe SIG-RI.
Mining real world data Web data. World Wide Web Hypertext documents –Text –Links Web –billions of documents –authored by millions of diverse people –edited.
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
A Practical Web-based Approach to Generating Topic Hierarchy for Text Segments CIKM2004 Speaker : Yao-Min Huang Date : 2005/03/10.
1 Latent Concepts and the Number Orthogonal Factors in Latent Semantic Analysis Georges Dupret
Business Intelligence Transparencies 1. ©Pearson Education 2009 Objectives What business intelligence (BI) represents. The technologies associated with.
Multidimensional analysis model for a document warehouse that includes textual measures KIM JEONG RAE UOS.DML
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Improving the performance of personal name disambiguation.
© Copyright 2008 STI INNSBRUCK A Semantic Model of Selective Dissemination of Information for Digital Libraries.
CSE 5331/7331 F'071 CSE 5331/7331 Fall 2007 Dimensional Modeling Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University.
Business Intelligence Training Siemens Engineering Pakistan Zeeshan Shah December 07, 2009.
Term Weighting approaches in automatic text retrieval. Presented by Ehsan.
CS 157B: Database Management Systems II April 10 Class Meeting Department of Computer Science San Jose State University Spring 2013 Instructor: Ron Mak.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
UOS Personalized Search Zhang Tao 장도. Zhang Tao Data Mining Contents Overview 1 The Outride Approach 2 The outride Personalized Search System 3 Testing.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Semantic collaborative web caching Jean-Marc Pierson Lionel Brunie, David Coquil LISI, INSA de LYON
Supervisor : Prof . Abbdolahzadeh
Visual Information Retrieval
Introduction to Information Retrieval and Web Search
Discovering User Access Patterns on the World-Wide Web
Exploiting Synergy Between Ontologies and Recommender Systems
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Data Warehouse.
Search Techniques and Advanced tools for Researchers
Data Warehouse and OLAP
Presented by: Prof. Ali Jaoua
Personal Assistants for the Web: An MIT Perspective
Topics discussed in this section:
Topics discussed in this section:
Web Mining Department of Computer Science and Engg.
Introduction of Week 9 Return assignment 5-2
Introduction to Information Retrieval
Enriching Taxonomies With Functional Domain Knowledge
Data Warehouse and OLAP
Presentation transcript:

Contextual Text Cube Model and Aggregation Operator for Text OLAP

Introduction  Big Text Data  The textual data integration in OLAP analysis and aggregating them to improve the decision-making is a challenge in Business Intelligence systems.  important to define new aggregation techniques appropriate to textual data.  Proposed technique  IR techniques together with OLAP to better analyze textual data and extract their semantics.

Text Warehouse 예

Context definition in Text Warehousing  Document context:  contextual factors of the analyzed documents  topics, domains, metadata such as title, author, date, and etc  User context:  user profile which is based on his query log containing the history of queries  user preferences for the analysis dimensions.

ConteXtual Text cube model: CXT-Cube  Document representation  Term vector  d =  Measure definition  Represents each document d by several vectors of weighted concepts

Text Measures  Measure definition  the vector of weighted concepts of a document d in a vector space dimension specific to Dim r and w ci is the weight assigned to the concept c i.  Example : vectors of dimensions: TOPIC (Dim z ), LOCATION (Dim l ), and TIME (Dim t )

Relevance Propagation  Step 1  compute the weights of the document terms which exist in the concept hierarchy by using the method Term Frequency Tf computed by the following formula: UML Software Engineering Computer Science

Relevance Propagation  Step 2  For each leaf node which has a weight not null, the weights of his ancestors are computed by the formula below: UML Software Engineering Computer Science

Relevance Propagation more important

Text-OLAP based IR

CXT-Cube  For CV data  Fact table: CV documents  Dimension tables  Semantic dimensions: TOPIC, LOCATION  TOPIC : hierarchy of domain skills, ex) data science – computer science - science  LOCATION : ex) city-region-country