Multidimensional analysis model for a document warehouse that includes textual measures KIM JEONG RAE UOS.DML. 2015.11.27. 1.


1 Multidimensional analysis model for a document warehouse that includes textual measures
KIM JEONG RAE, UOS.DML, 2015.11.27

2 Introduction
- Authors
  - Martha Mendoza, Erwin Alegria, Manuel Maca, Carlos Cobos, Elizabeth Leon
- Location
  - Information Technology Research Group (GTI), etc., Colombia
- Title
  - Multidimensional analysis model for a document warehouse that includes textual measures
- Document type
  - Decision Support Systems 72 (2015) 44-59
- Date
  - February 2015

3 Contents
- Abstract
- Analysis Model
- Proposed document warehouse model
- Multi-dimensional model
- Textual measures and aggregation function
- OLAP document visualization
- Conclusion
- Evaluation results

4 Abstract (1/2)
- Motivation
  - Business systems are increasingly required to handle substantial quantities of unstructured textual information.
- Problem
  - Managing unstructured text data stored in data warehouses
- Approach
  - A new multi-dimensional analysis model is proposed that includes textual measures as well as a topic hierarchy.
  - The textual measures that associate the topics with the text documents are generated by Probabilistic Latent Semantic Analysis (PLSA), while the hierarchy is created automatically using a clustering algorithm.
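To make the role of Probabilistic Latent Semantic Analysis more concrete, the following is a minimal sketch of the textbook PLSA model fitted by EM on a document-term count matrix. It is not the authors' implementation: the function names `plsa` and `normalise`, the toy counts, the number of topics, and the iteration count are all illustrative assumptions. The output `p_z_d` (topic given document) and `p_w_z` (word given topic) are the kind of probabilities the slides describe as textual measures.

```python
import random


def normalise(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM. counts[d][w] is how often word w occurs in document d."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random starting distributions: P(z | d) per document, P(w | z) per topic.
    p_z_d = [normalise([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalise([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = normalise([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # M-step accumulation: expected counts assigned to each topic.
                for z in range(n_topics):
                    new_zd[d][z] += counts[d][w] * post[z]
                    new_wz[z][w] += counts[d][w] * post[z]
        p_z_d = [normalise(row) for row in new_zd]
        p_w_z = [normalise(row) for row in new_wz]
    return p_z_d, p_w_z


# Toy corpus (made-up counts): 4 documents over a 4-word vocabulary.
counts = [
    [4, 3, 0, 0],
    [3, 5, 0, 1],
    [0, 0, 4, 2],
    [0, 1, 3, 5],
]
p_z_d, p_w_z = plsa(counts, n_topics=2)
```

Each row of `p_z_d` is a probability distribution over topics for one document, and each row of `p_w_z` is a distribution over words for one topic.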

5 Abstract (2/2)
- Result
  - The model gained increasing acceptance with use, and the visualization of the model was also well received by users.
- Contribution
  - This paper proposes a multidimensional model that incorporates textual measures.
  - The model allows documents to be queried using OLAP operations.

6 Proposed document warehouse model
- Four main processes (diagram with steps ①, ②, ③, ④)

7 Proposed document warehouse model
- Topic Hierarchy Building ①
  - Two-algorithm process
    - Cosme (step 1)
    - Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm)

8 Proposed document warehouse model
- Topic Hierarchy Building ①
  - Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm): three levels

9 Proposed document warehouse model
- Topic Hierarchy Building ①
  - IGBHSK algorithm [Ref. #2] for the topic hierarchy

10
- Probabilistic measures calculation ②

11 Proposed document warehouse model

12
- ETL (Extract-Transform-Load) ③

13 Multi-dimensional model
- Relational DB schema

14 Multi-dimensional model
- Standard dimensions
  - Document dimension: name, document type
  - Author dimension: name, email
  - Date dimension: publish date
  - Location dimension: city, country
  - Word dimension: all words from the stored document set
  - Topic dimension: topic hierarchy
- M-M relationships
  - Author-Group Bridge, Topic-Document-Group Bridge, Topic-Word-Group Bridge
- Measures of the fact table and the topic and word dimension bridge tables
  - Topics_Probab_TM: average probability of topics
  - Documents_TM: probabilities of a document within topics
  - Word_Probab_TM: probabilities of a word within topics
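The dimensions, bridge tables, and measures listed above can be illustrated with a small relational sketch. This is a simplified, hypothetical schema, not the one shown on the schema slide: the table and column names (`document_dim`, `topic_dim`, `topic_document_bridge`, `probability`) and all rows are invented for illustration, the group level of the bridge is omitted, and the aggregation is assumed to be a plain average, in the spirit of the "average probability of topics" description of Topics_Probab_TM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema: two dimensions plus one bridge table
# that carries the per-document topic probability.
conn.executescript("""
CREATE TABLE document_dim (
    doc_id   INTEGER PRIMARY KEY,
    name     TEXT,
    doc_type TEXT
);
CREATE TABLE topic_dim (
    topic_id INTEGER PRIMARY KEY,
    label    TEXT
);
CREATE TABLE topic_document_bridge (
    doc_id      INTEGER REFERENCES document_dim(doc_id),
    topic_id    INTEGER REFERENCES topic_dim(topic_id),
    probability REAL
);
""")

# Made-up rows: three documents, two topics.
conn.executemany(
    "INSERT INTO document_dim VALUES (?, ?, ?)",
    [(1, "doc_a", "Journal Article"),
     (2, "doc_b", "Journal Article"),
     (3, "doc_c", "Conference Paper")],
)
conn.executemany(
    "INSERT INTO topic_dim VALUES (?, ?)",
    [(1, "olap"), (2, "clustering")],
)
conn.executemany(
    "INSERT INTO topic_document_bridge VALUES (?, ?, ?)",
    [(1, 1, 0.75), (1, 2, 0.25),
     (2, 1, 0.5), (2, 2, 0.5),
     (3, 1, 0.125), (3, 2, 0.875)],
)

# Average topic probability per document type (assumed aggregation).
rows = conn.execute("""
SELECT d.doc_type, t.label, AVG(b.probability)
FROM topic_document_bridge AS b
JOIN document_dim AS d ON d.doc_id = b.doc_id
JOIN topic_dim AS t ON t.topic_id = b.topic_id
GROUP BY d.doc_type, t.label
ORDER BY d.doc_type, t.label
""").fetchall()

avg_topic_prob = {(doc_type, label): value for doc_type, label, value in rows}
```

Grouping by a different dimension attribute (for example a year column in a date dimension) would follow the same join-and-average pattern.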

15 Proposed document warehouse model
- Multidimensional cube building ④

16 Textual measures and aggregation function

17 Textual measures and aggregation function

18 Textual measures and aggregation function

19 OLAP document visualization
- Topics_Probab_TM: Document dimension, type of document

20 OLAP document visualization
- Topics_Probab_TM: Date dimension, year

21 OLAP document visualization
- Topics_Probab_TM: document type (rows) and year attribute (columns)

22 OLAP document visualization
- Topics_Probab_TM: year attribute and document type
  - Slice: “Journal Article”

23 OLAP document visualization
- Topics_Probab_TM: year attribute, document type, and author name
  - Dice operation
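The slice and dice operations named on the two preceding slides can be sketched on a tiny in-memory cube. Everything here is illustrative: the helper names `slice_cube`, `dice_cube`, and `average_by`, the record layout, the author labels, and the probability values are assumptions, and the measure is aggregated with a plain average.

```python
from statistics import mean

# Made-up fact records: one topic probability per (year, doc_type, author, topic).
facts = [
    {"year": 2013, "doc_type": "Journal Article", "author": "author_a", "topic": "olap", "prob": 0.5},
    {"year": 2013, "doc_type": "Journal Article", "author": "author_b", "topic": "olap", "prob": 0.25},
    {"year": 2014, "doc_type": "Journal Article", "author": "author_a", "topic": "olap", "prob": 0.75},
    {"year": 2014, "doc_type": "Conference Paper", "author": "author_c", "topic": "olap", "prob": 0.125},
    {"year": 2014, "doc_type": "Journal Article", "author": "author_b", "topic": "clustering", "prob": 0.5},
]


def slice_cube(records, dim, value):
    """Slice: fix a single dimension to a single value."""
    return [r for r in records if r[dim] == value]


def dice_cube(records, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in records
            if all(r[dim] in values for dim, values in allowed.items())]


def average_by(records, *dims):
    """Average the probability measure over every combination of the given dimensions."""
    groups = {}
    for r in records:
        groups.setdefault(tuple(r[d] for d in dims), []).append(r["prob"])
    return {key: mean(values) for key, values in groups.items()}


# Slice on document type, then view the measure by year and topic.
journal = slice_cube(facts, "doc_type", "Journal Article")
journal_by_year = average_by(journal, "year", "topic")

# Dice on year, document type, and author at once.
sub_cube = dice_cube(facts, year={2014},
                     doc_type={"Journal Article"},
                     author={"author_a"})
```

A slice removes one dimension from the view by pinning it, while a dice keeps all dimensions but narrows each to a subset of its members.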

24 OLAP document visualization
- Document_TM: each topic and document

25 OLAP document visualization
- Document_TM: each topic, year, and document

26 Conclusion - Evaluation results
- Execution time results

27 Conclusion - Evaluation results
- Execution time results

28 Conclusion - Evaluation results
- User satisfaction results
  - Statistical frequency analysis

29 Conclusion - Evaluation results
- User satisfaction results
  - Multivariate analysis

30 Thank you

31 Proposed document warehouse model
- Cosme results: XML file (metadata)

32 Proposed document warehouse model
- IGBHSK result

