Multidimensional analysis model for a document warehouse that includes textual measures KIM JEONG RAE UOS.DML. 2015.11.27. 1.

Presentation transcript:

Multidimensional analysis model for a document warehouse that includes textual measures KIM JEONG RAE UOS.DML

Introduction
 Author: Martha Mendoza, Erwin Alegria, Manuel Maca, Carlos Cobos, Elizabeth Leon
 Location: Information Technology Research Group (GTI), etc., Colombia
 Title: Multidimensional analysis model for a document warehouse that includes textual measures
 Document Type: Decision Support Systems 72 (2015)
 Date: February

Contents 3
 Abstract
 Analysis Model
 Proposed document warehouse model
 Multi-dimensional model
 Textual measures and aggregation function
 OLAP document visualization
 Conclusion
    Evaluation results

Abstract (1/2) 4
 Motivation
    Business systems are increasingly required to handle substantial quantities of unstructured textual information.
 Problem
    How to manage unstructured text data stored in data warehouses.
 Approach
    A new multidimensional analysis model is proposed that includes textual measures as well as a topic hierarchy.
    The textual measures, which associate topics with text documents, are generated by Probabilistic Latent Semantic Analysis (PLSA); the topic hierarchy is created automatically by a clustering algorithm.
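The slide names Probabilistic Latent Semantic Analysis as the source of the textual measures but does not show the computation. As a minimal sketch only, and not the authors' implementation, the standard asymmetric PLSA model P(w|d) = Σ_z P(w|z) P(z|d) can be fitted with EM as below. The function name `plsa`, its parameters, and the toy term-count matrix are all hypothetical and chosen for illustration.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit asymmetric PLSA with EM (illustrative sketch).

    counts[d][w] is the frequency of word w in document d.
    Returns (p_z_d, p_w_z): P(topic | document) and P(word | topic).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # empty document or unused topic: fall back to uniform
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random positive initialization of both distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                s = sum(post)
                if s == 0:
                    continue
                for z in range(n_topics):
                    r = n * post[z] / s
                    acc_z_d[d][z] += r
                    acc_w_z[z][w] += r
        # M-step: renormalize the expected counts.
        p_z_d = [normalize(row) for row in acc_z_d]
        p_w_z = [normalize(row) for row in acc_w_z]
    return p_z_d, p_w_z


# Toy corpus (made-up counts): 4 documents over a 4-word vocabulary.
toy_counts = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]
doc_topic, topic_word = plsa(toy_counts, n_topics=2)
for d, row in enumerate(doc_topic):
    print(d, [round(p, 3) for p in row])
```

In this sketch, each row of `doc_topic` plays the role of a document's probabilities within topics, and each row of `topic_word` the role of word probabilities within a topic.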

Abstract (2/2) 5
 Result
    The model gained increasing acceptance with use, and its visualization was also well received by users.
 Contribution
    This paper proposes a multidimensional model that incorporates textual measures.
    The model allows documents to be queried using OLAP operations.

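Proposed document warehouse model 6
 Four main processes
    ① Topic hierarchy building
    ② Probabilistic measures calculation
    ③ ETL (Extract-Transform-Load)
    ④ Multidimensional cube building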

Proposed document warehouse model 7
 Topic Hierarchy Building ①
    Two algorithms are used
    Cosme (step 1)
    Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm)

Proposed document warehouse model 8
 Topic Hierarchy Building ①
    Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm): three levels

Proposed document warehouse model 9
 Topic Hierarchy Building ①
    IGBHSK algorithm [Ref.#2] for the topic hierarchy

Proposed document warehouse model 10
 Probabilistic measures calculation ②

Proposed document warehouse model 11

Proposed document warehouse model 12
 ETL (Extract-Transform-Load) ③

Multi-dimensional model 13  Relational DB Schema

Multi-dimensional model 14
 Standard dimensions
    Document dimension: name, document type
    Author dimension: name
    Date dimension: publish date
    Location dimension: city, country
    Word dimension: all words from the stored document set
    Topic dimension: topic hierarchy
 M-M relationships
    Author-Group Bridge, Topic-Document-Group Bridge, Topic-Word-Group Bridge
 Measures of the fact table and the topic and word dimension bridge tables
    Topics_Probab_TM: average probability of topics
    Documents_TM: probabilities of a document within topics
    Word_Probab_TM: probabilities of a word within topics
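The dimensions, bridges, and measures listed above could be laid out roughly as in the following sketch, written with Python's built-in `sqlite3` module. The slide gives only the names of the dimensions, bridge tables, and measures, so every table name, column name, key, and the placement of each measure here is an assumption made for illustration, not the schema from the paper.

```python
import sqlite3

# Hypothetical physical layout: all table/column names and keys are assumed.
SCHEMA = """
CREATE TABLE dim_document (document_id INTEGER PRIMARY KEY, name TEXT, document_type TEXT);
CREATE TABLE dim_author   (author_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, publish_date TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_word     (word_id     INTEGER PRIMARY KEY, word TEXT);
-- Topic hierarchy stored as a parent pointer (assumption).
CREATE TABLE dim_topic (
    topic_id        INTEGER PRIMARY KEY,
    label           TEXT,
    parent_topic_id INTEGER REFERENCES dim_topic(topic_id)
);

-- Bridge tables resolve the many-to-many relationships.
CREATE TABLE bridge_author_group (
    author_group_id INTEGER,
    author_id       INTEGER REFERENCES dim_author(author_id),
    PRIMARY KEY (author_group_id, author_id)
);
CREATE TABLE bridge_topic_document_group (
    topic_document_group_id INTEGER,
    topic_id                INTEGER REFERENCES dim_topic(topic_id),
    Documents_TM            REAL,  -- probability of the document within the topic
    PRIMARY KEY (topic_document_group_id, topic_id)
);
CREATE TABLE bridge_topic_word_group (
    topic_word_group_id INTEGER,
    topic_id            INTEGER REFERENCES dim_topic(topic_id),
    word_id             INTEGER REFERENCES dim_word(word_id),
    Word_Probab_TM      REAL,  -- probability of the word within the topic
    PRIMARY KEY (topic_word_group_id, topic_id, word_id)
);

-- Fact table: one row per document, pointing at its groups.
CREATE TABLE fact_document (
    document_id             INTEGER REFERENCES dim_document(document_id),
    date_id                 INTEGER REFERENCES dim_date(date_id),
    location_id             INTEGER REFERENCES dim_location(location_id),
    author_group_id         INTEGER,
    topic_document_group_id INTEGER,
    Topics_Probab_TM        REAL  -- aggregated as an average in the cube
);
"""


def build_schema(conn):
    """Create the sketch schema on an open sqlite3 connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
build_schema(conn)
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

The group-keyed bridge tables follow the common data-warehouse pattern for multivalued dimensions; whether the original schema keys its bridges this way is not stated on the slide.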

Proposed document warehouse model 15  Multidimensional cube building ④

Textual measures and aggregation function 16

Textual measures and aggregation function 17

Textual measures and aggregation function 18

OLAP document visualization 19  Topics_Probab_TM : Document dimension - Type of Document

OLAP document visualization 20  Topics_Probab_TM : Date Dimension - year

OLAP document visualization 21  Topics_Probab_TM : Document type(rows) and year attribute(columns)

OLAP document visualization 22  Topics_Probab_TM : Attribute of year and Document type  Slice – “Journal Article”

OLAP document visualization 23  Topics_Probab_TM : Attribute of year and Document type and author name  Dice operation
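The roll-up, slice, and dice views on these slides can be illustrated with a small sketch that averages a topic probability over chosen dimensions. This is not the cube engine used in the presentation: the function names `roll_up`, `slice_rows`, and `dice_rows`, the row layout, and every value in `FACTS` are made up for illustration, and the only thing taken from the slides is that Topics_Probab_TM is aggregated as an average.

```python
from collections import defaultdict

FIELDS = ("doc_type", "year", "author", "topic", "prob")

# Toy fact rows with invented probabilities (illustration only).
FACTS = [dict(zip(FIELDS, row)) for row in [
    ("Journal Article",  2013, "Author A", "T1", 0.8),
    ("Journal Article",  2013, "Author B", "T1", 0.4),
    ("Journal Article",  2014, "Author A", "T1", 0.6),
    ("Conference Paper", 2013, "Author A", "T1", 0.2),
    ("Conference Paper", 2014, "Author B", "T1", 0.4),
]]


def roll_up(rows, dims):
    """Average 'prob' per topic for each combination of the given dimensions."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for r in rows:
        key = tuple(r[d] for d in dims) + (r["topic"],)
        totals[key][0] += r["prob"]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def slice_rows(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]


def dice_rows(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows
            if all(r[d] in values for d, values in allowed.items())]


# Average topic probability by document type (rows) and year (columns).
print(roll_up(FACTS, ["doc_type", "year"]))

# Slice on "Journal Article", then view by year.
journal = slice_rows(FACTS, "doc_type", "Journal Article")
print(roll_up(journal, ["year"]))

# Dice on document type, year, and author name.
sub_cube = dice_rows(FACTS, doc_type={"Journal Article"}, year={2013},
                     author={"Author A"})
print(roll_up(sub_cube, ["doc_type", "year", "author"]))
```

A slice fixes one dimension to a single value, while a dice selects a sub-cube by restricting several dimensions at once; in both cases the average is recomputed over the remaining rows.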

OLAP document visualization 24  Document_TM : each Topic and Document

OLAP document visualization 25  Document_TM : each Topic and year and Document

Conclusion - Evaluation results 26  Execution time results

Conclusion - Evaluation results 27  Execution time results

Conclusion - Evaluation results 28  User satisfaction results  Statistical frequency analysis

Conclusion - Evaluation results 29  User satisfaction results  Multivariate analysis

Thank you 30

Proposed document warehouse model 31  Results Cosme : XML file(Metadata)

Proposed document warehouse model 32  Result IGBHSK