Published by Camron Campbell. Modified over 9 years ago.
1
Multidimensional analysis model for a document warehouse that includes textual measures
KIM JEONG RAE, UOS.DML, 2015.11.27
2
Introduction
Author: Martha Mendoza, Erwin Alegria, Manuel Maca, Carlos Cobos, Elizabeth Leon
Location: Information Technology Research Group (GTI), etc., Colombia
Title: Multidimensional analysis model for a document warehouse that includes textual measures
Document Type: Decision Support Systems 72 (2015) 44-59
Date: February 2015
3
Contents
Abstract
Analysis Model
Proposed document warehouse model
Multi-dimensional model
Textual measures and aggregation function
OLAP document visualization
Conclusion
Evaluation results
4
Abstract (1/2)
Motivation: Business systems are increasingly required to handle substantial quantities of unstructured textual information.
Problem: Managing unstructured text data stored in data warehouses.
Approach: A new multidimensional analysis model is proposed that includes textual measures as well as a topic hierarchy. The textual measures, which associate the topics with the text documents, are generated by Probabilistic Latent Semantic Analysis (PLSA), while the hierarchy is created automatically using a clustering algorithm.
5
Abstract (2/2)
Result: The model gained increasing acceptance with use, and its visualization was also well received by users.
Contribution: This paper proposes a multidimensional model that incorporates textual measures. The model allows documents to be queried using OLAP operations.
6
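Proposed document warehouse model
Four main processes: ① Topic hierarchy building, ② Probabilistic measures calculation, ③ ETL (Extract-Transform-Load), ④ Multidimensional cube building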
7
Proposed document warehouse model
Topic Hierarchy Building ①
Two-algorithm process:
Cosme (step 1)
Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm)
8
Proposed document warehouse model
Topic Hierarchy Building ①
Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm): three levels
9
Proposed document warehouse model
Topic Hierarchy Building ①
IGBHSK algorithm [Ref. #2] for the topic hierarchy
10
Probabilistic measures calculation ②
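The abstract states that the textual measures are generated by Probabilistic Latent Semantic Analysis. As a rough illustration of what that calculation can look like, here is a minimal sketch of the standard PLSA EM updates on a toy document-word count matrix. The function name `plsa`, the toy counts, the asymmetric P(z|d), P(w|z) formulation, and the iteration count are all assumptions made for illustration, not details taken from the paper. The sketch also assumes every document contains at least one word.

```python
import random

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-word count matrix (list of lists).

    Returns (p_z_d, p_w_z) where
      p_z_d[d][z] = P(topic z | document d)
      p_w_z[z][w] = P(word w | topic z)
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random positive initialisation; each row is a probability distribution.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(post)
                for z in range(n_topics):
                    expected = n_dw * post[z] / norm
                    # Accumulate expected counts for the M-step.
                    new_z_d[d][z] += expected
                    new_w_z[z][w] += expected
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z

# Toy corpus (hypothetical): 4 documents over a 4-word vocabulary.
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]
doc_topic, topic_word = plsa(counts, n_topics=2)
for d, row in enumerate(doc_topic):
    print(d, [round(p, 3) for p in row])
```

In this reading, each row of `doc_topic` would play the role of a document's probabilities within topics, and each row of `topic_word` the role of word probabilities within a topic.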
11
Proposed document warehouse model
12
ETL (Extract-Transform-Load) ③
13
Multi-dimensional model
Relational DB schema
14
Multi-dimensional model
Standard dimensions
  Document dimension: name, document type
  Author dimension: name, email
  Date dimension: publish date
  Location dimension: city, country
  Word dimension: all words from the stored document set
  Topic dimension: topic hierarchy
Many-to-many (M-M) relationships
  Author-Group Bridge, Topic-Document-Group Bridge, Topic-Word-Group Bridge
Measures of the fact table and of the topic and word dimension bridge tables
  Topics_Probab_TM: average probability of topics
  Documents_TM: probabilities of a document within topics
  Word_Probab_TM: probabilities of a word within topics
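The dimensions, bridge tables, and measures listed on this slide could be laid out as a star schema along the following lines. This is only a sketch: the slide's schema diagram is not reproduced in this transcript, so every table name, key, column type, and the placement of each measure (Topics_Probab_TM in the fact table, Documents_TM and Word_Probab_TM in the bridge tables) is an assumption. SQLite is used purely because it ships with Python.

```python
import sqlite3

# Hypothetical star schema: names, keys, and types are illustrative assumptions.
SCHEMA = """
CREATE TABLE document_dim (document_id INTEGER PRIMARY KEY, name TEXT, document_type TEXT);
CREATE TABLE author_dim   (author_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE date_dim     (date_id INTEGER PRIMARY KEY, publish_date TEXT);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE word_dim     (word_id INTEGER PRIMARY KEY, word TEXT);
-- parent_topic_id encodes the topic hierarchy (NULL for a root topic).
CREATE TABLE topic_dim    (topic_id INTEGER PRIMARY KEY, name TEXT,
                           parent_topic_id INTEGER REFERENCES topic_dim(topic_id));

-- Bridge tables resolve the many-to-many relationships.
CREATE TABLE author_group_bridge (
    author_group_id INTEGER,
    author_id INTEGER REFERENCES author_dim(author_id),
    PRIMARY KEY (author_group_id, author_id));
CREATE TABLE topic_document_group_bridge (
    topic_group_id INTEGER,
    topic_id INTEGER REFERENCES topic_dim(topic_id),
    documents_tm REAL,       -- probability of the document within the topic
    PRIMARY KEY (topic_group_id, topic_id));
CREATE TABLE topic_word_group_bridge (
    topic_id INTEGER REFERENCES topic_dim(topic_id),
    word_id INTEGER REFERENCES word_dim(word_id),
    word_probab_tm REAL,     -- probability of the word within the topic
    PRIMARY KEY (topic_id, word_id));

-- Fact table: one row per document, pointing at its dimensions and groups.
CREATE TABLE document_fact (
    document_id INTEGER REFERENCES document_dim(document_id),
    date_id INTEGER REFERENCES date_dim(date_id),
    location_id INTEGER REFERENCES location_dim(location_id),
    author_group_id INTEGER,
    topic_group_id INTEGER,
    topics_probab_tm REAL);  -- average probability of topics
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(document_fact)")]
print(tables)
print(fact_columns)
```

The group keys follow the common bridge-table pattern, in which a fact row references a group and the bridge lists the group's members, so a document with several authors or topics still occupies a single fact row.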
15
Proposed document warehouse model
Multidimensional cube building ④
16
Textual measures and aggregation function
17
Textual measures and aggregation function
18
Textual measures and aggregation function
19
OLAP document visualization
Topics_Probab_TM: Document dimension - type of document
20
OLAP document visualization
Topics_Probab_TM: Date dimension - year
21
OLAP document visualization
Topics_Probab_TM: document type (rows) and year attribute (columns)
22
OLAP document visualization
Topics_Probab_TM: year attribute and document type
Slice: "Journal Article"
23
OLAP document visualization
Topics_Probab_TM: year attribute, document type, and author name
Dice operation
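The pivot, slice, and dice views on these slides can be mimicked on a small in-memory fact list. The sketch below is only an illustration: the rows, author names, probabilities, and helper names (`dice`, `slice_`, `pivot_avg`) are hypothetical, each row holds the probability for a single unnamed topic, and averaging is assumed as the aggregation because Topics_Probab_TM is described as an average probability of topics.

```python
from collections import defaultdict

# Hypothetical fact rows: one topic probability per (document type, year, author).
rows = [
    {"doc_type": "Journal Article", "year": 2013, "author": "Alice", "prob": 0.75},
    {"doc_type": "Journal Article", "year": 2013, "author": "Bob", "prob": 0.25},
    {"doc_type": "Journal Article", "year": 2014, "author": "Alice", "prob": 0.5},
    {"doc_type": "Conference Paper", "year": 2013, "author": "Bob", "prob": 0.5},
    {"doc_type": "Conference Paper", "year": 2014, "author": "Carol", "prob": 1.0},
]

def dice(rows, **allowed):
    """Dice: keep rows whose value on each named dimension is in the allowed set."""
    return [r for r in rows
            if all(r[dim] in values for dim, values in allowed.items())]

def slice_(rows, dim, value):
    """Slice: fix a single dimension to a single value."""
    return dice(rows, **{dim: {value}})

def pivot_avg(rows, row_dim, col_dim, measure="prob"):
    """Cross-tabulate two dimensions, averaging the measure in each cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        key = (r[row_dim], r[col_dim])
        sums[key] += r[measure]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Document type on rows, year on columns.
table = pivot_avg(rows, "doc_type", "year")

# Slice on document type "Journal Article".
journal = slice_(rows, "doc_type", "Journal Article")

# Dice on year, document type, and author name at once.
diced = dice(rows, doc_type={"Journal Article"}, year={2013}, author={"Alice"})

print(table)
print(len(journal), diced)
```

A slice is written here as a dice with a single one-element set, which keeps the two operations consistent with each other.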
24
OLAP document visualization
Document_TM: each topic and document
25
OLAP document visualization
Document_TM: each topic, year, and document
26
Conclusion - Evaluation results
Execution time results
27
Conclusion - Evaluation results
Execution time results
28
Conclusion - Evaluation results
User satisfaction results: statistical frequency analysis
29
Conclusion - Evaluation results
User satisfaction results: multivariate analysis
30
Thank you
31
Proposed document warehouse model
Cosme result: XML file (metadata)
32
Proposed document warehouse model
IGBHSK result