Document Ontology Extractor (DOE). Research Team: Govind R Maddi, Jun Zhao, Chakravarthi S Velvadapu. Faculty: Dr. Sadanand Srivastava, Dr. James Gil De Lamadrid.


1 Document Ontology Extractor (DOE). Research Team: Govind R Maddi, Jun Zhao, Chakravarthi S Velvadapu. Faculty: Dr. Sadanand Srivastava, Dr. James Gil De Lamadrid. Joint project of University of Maryland, Baltimore County and Bowie State University. Sponsored by the Department of Defense.

2 OVERVIEW 1. The system takes text documents as its input. 2. It performs semantic analysis on these documents. 3. It generates a useful ontology. 4. It represents the ontology graphically.

3 GOAL To build an ontology using statistical methods, a small amount of user feedback, and automation.

4 Architecture of DOE [Figure: pipeline diagram with the stages Text Document, Pre-processing, Normalization, Latent Semantic Indexing (SVD), Document Ontology, Graph Construction, GUI]

5 INPUT Text documents

6 Pre-processing: Read in the text file. Extract meaningful terms. Count their frequencies.
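The pre-processing step can be sketched roughly as follows. The slides do not say how "meaningful" terms are chosen, so the stop-word list and the `term_frequencies` helper are assumptions made only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not specify the filtering rule.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}

def term_frequencies(text):
    """Return a Counter mapping each lower-cased term to its frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Keep only tokens that are not in the (assumed) stop-word list.
    return Counter(t for t in tokens if t not in STOP_WORDS)

counts = term_frequencies("The ontology of the document is an ontology graph")
```

Here `counts` holds one frequency per surviving term, which is the raw input to the normalization step.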

7 Normalization: Calculate the weight of each term i in document k using $w_{i,k} = \frac{\mathrm{frequency}_{i,k}}{\sum_{j=1}^{n_k} \mathrm{frequency}_{j,k}}$

8 Normalization (contd): Calculate the normalized weight using $W_{i,k} = \frac{w_{i,k}}{\sqrt{\sum_{j=1}^{n_k} w_{j,k}^{2}}}$, where $w_{i,k}$ is the term weight from the previous slide.
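As a minimal sketch of the two normalization slides, assuming the sums run over the terms of a single document (one column of a term-by-document frequency matrix), the weights could be computed with NumPy. The function name `normalize_weights` and the handling of empty documents are assumptions, not part of the slides.

```python
import numpy as np

def normalize_weights(freq):
    """freq[i, k] is the count of term i in document k (m x n array)."""
    freq = np.asarray(freq, dtype=float)
    # Step 1: relative frequency of each term within its document.
    col_sums = freq.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0  # assumption: empty documents stay all-zero
    w = freq / col_sums
    # Step 2: scale each document column to unit Euclidean length.
    norms = np.sqrt((w ** 2).sum(axis=0, keepdims=True))
    norms[norms == 0] = 1.0
    return w / norms

freq = np.array([[3, 0],
                 [4, 2]])
W = normalize_weights(freq)
```

After this step every non-empty document column of `W` has length 1, so long and short documents contribute on the same scale.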

9 Latent Semantic Indexing (LSI): A statistical method that represents documents by statistically independent concepts. Based on Singular Value Decomposition (SVD).

10 Singular Value Decomposition (SVD) A technique that decomposes a given matrix into three components – U, S and V.

11 SVD (contd) An m x n term-document matrix A, of rank r, can be expressed as the product $A = U S V^T$, where U is the m x r term matrix, S is the r x r diagonal matrix, and $V^T$ is the r x n document matrix.

12 SVD (contd) The diagonal of S contains the singular values of A in descending order.

13 SVD (contd) The LSI approximation of A, written $A_s$ here, is formed as follows: $A_s = U_s S_s V_s^T$, where $U_s$ is derived from U by removing all but the first s columns, $S_s$ is derived from S by removing all but the largest s singular values, and $V_s^T$ is derived from $V^T$ by removing all but the s corresponding rows.

14 SVD (contd) [Figure: A (m x n) = U (m x r) * S (r x r) * V^T (r x n), with the truncated parts U_s, S_s and V_s^T marked]
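The decomposition and truncation described on the SVD slides can be illustrated with NumPy's `numpy.linalg.svd`. This is only a sketch: the helper name `truncated_svd` and the toy matrix are assumptions. With `full_matrices=False` NumPy returns min(m, n) columns rather than exactly r, and it returns the singular values already sorted in descending order.

```python
import numpy as np

def truncated_svd(A, s):
    """Return (U_s, S_s, Vt_s) keeping only the s largest singular values."""
    U, sing, Vt = np.linalg.svd(A, full_matrices=False)
    U_s = U[:, :s]           # first s columns of U
    S_s = np.diag(sing[:s])  # s x s diagonal matrix
    Vt_s = Vt[:s, :]         # first s rows of V^T
    return U_s, S_s, Vt_s

# Toy 4-term x 3-document matrix (illustrative values only).
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0]])
U_s, S_s, Vt_s = truncated_svd(A, 2)
A_s = U_s @ S_s @ Vt_s  # rank-2 approximation of A
```

In this toy case the smallest singular value (1) is dropped, so `A_s` keeps only the entries 2 and 3 of `A`.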

15 Document Ontology Build Concept Nodes and Term Nodes using the document matrix (V) and term matrix (U).

16 Building concept nodes from term matrix (U): A concept node contains information about the concept name, the terms that belong to that concept, and the respective weights of the terms in that concept.

17 Building concept nodes from term matrix (U) (contd) Naming convention: The name is generated automatically as a hyphenated string of the five most frequent terms in that concept.

18 Building concept nodes from term matrix (U) (contd) A concept node represents a document. Each column in U corresponds to a concept node.

19 Building term nodes from term matrix (U): A term node contains information about the term name, the concepts to which it belongs, and its respective weight in each concept.

20 Building term nodes from term matrix (U) (contd) Naming convention: The name is generated automatically and is simply the term name.

21 Building term nodes from term matrix (U) (contd) A term node represents a term. Each row in U corresponds to a term node.
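One possible way to derive both node types from the term matrix is sketched below. Several details are assumptions: the absolute value of an entry of U is used as the term weight, a term "belongs" to a concept when that weight is non-zero, and the concept name is built from the highest-weighted terms as a stand-in for the "most frequent" terms mentioned on the slides. The helper `build_nodes` and the toy data are hypothetical.

```python
import numpy as np

def build_nodes(U_s, terms, name_len=5):
    """Return (concept_nodes, term_nodes) as nested dicts of weights."""
    concepts = {}
    term_nodes = {t: {} for t in terms}
    for c in range(U_s.shape[1]):          # one concept per column of U
        weights = np.abs(U_s[:, c])        # assumption: magnitude is the weight
        top = np.argsort(-weights, kind="stable")[:name_len]
        name = "-".join(terms[i] for i in top)
        members = {terms[i]: float(weights[i])
                   for i in range(len(terms)) if weights[i] > 0}
        concepts[name] = members
        for t, w in members.items():       # one term node per row of U
            term_nodes[t][name] = w
    return concepts, term_nodes

terms = ["graph", "node", "edge", "svd", "matrix", "rank"]
U_s = np.array([[0.9, 0.0], [0.3, 0.0], [0.2, 0.0],
                [0.0, 0.8], [0.0, 0.5], [0.0, 0.1]])
concepts, term_nodes = build_nodes(U_s, terms, name_len=2)
```

With `name_len=2` the two toy concepts are named `graph-node` and `svd-matrix`; the slides use five terms per name.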

22 Graph Construction A bipartite graph is constructed with concept nodes and term nodes A concept node is connected to all term nodes that belong to it. A term node is connected to all concept nodes to which it belongs.

23 Graph Construction (contd) [Figure: example bipartite graph with concept nodes Concept 1 and Concept 2 and term nodes Term 1 to Term 5]
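The bipartite structure can be sketched with two adjacency maps, one per side of the graph. The function `build_bipartite_graph` is hypothetical, and the edges in the example are invented for illustration, since the transcript does not record which terms the figure connects to which concepts.

```python
def build_bipartite_graph(concepts):
    """concepts: {concept_name: {term: weight}}.

    Returns (concept_to_terms, term_to_concepts), each mapping a node
    to the set of nodes on the other side that it is connected to.
    """
    concept_to_terms = {c: set(members) for c, members in concepts.items()}
    term_to_concepts = {}
    for c, members in concepts.items():
        for t in members:
            term_to_concepts.setdefault(t, set()).add(c)
    return concept_to_terms, term_to_concepts

# Hypothetical edges; "Term 3" is shared by both concepts.
example = {
    "Concept 1": {"Term 1": 0.9, "Term 2": 0.8, "Term 3": 0.7},
    "Concept 2": {"Term 3": 0.9, "Term 4": 0.8, "Term 5": 0.7},
}
concept_to_terms, term_to_concepts = build_bipartite_graph(example)
```

Edges only ever link a concept to a term, never two concepts or two terms, which is what makes the graph bipartite.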

24 Graphical User Interface (GUI)

25 GUI (contd) The GUI consists of a concepts list, a terms list, a display for the bipartite graph, and a display for the list of files in the ontology.

27 GUI: To view terms related to a concept, the user selects that concept from the concepts list. To view concepts related to a term, the user selects that term from the terms list.

29 GUI (contd) To view only terms related to a specific concept: Select that concept from the concepts list. Select the checkbox “Display Selected Ones Only”. Result: the GUI displays ONLY relations between selected terms and concepts.

30 GUI (contd) To view only concepts related to a term: Select that term from the terms list. Select the checkbox “Display Selected Ones Only”. Result: the GUI displays ONLY relations between selected terms and concepts.

32 GUI (contd) To highlight the relationship between a term and a concept: Select that term or concept from the terms or concepts list. Click on the line connecting the term and the concept.

34 GUI – File Operations: New, Open, Save, saveAs, Close, Exit

36 GUI – Ontology Updates: Add, Delete, changeSVDThreshold, changeConcThreshold, foldInDoc, defaultBuild

38 GUI – Ontology Updates Add: Click on Add. Select the file to be added from the file chooser popup menu. Choose whether to build now or not. If yes, the document is added and displayed. If no, the GUI remains unchanged.

42 GUI – Ontology Updates Delete: Click on Delete. Select the file to be deleted from the file chooser popup menu. Choose whether to build now or not. If yes, the document is deleted and the updated ontology is displayed. If no, the GUI remains unchanged.

46 GUI – Ontology Updates changeSVDThreshold: SVDThreshold controls how many of the largest singular values (s) are selected from S. The default value is 70%, i.e. only the singular values higher than 70% of the highest singular value are selected. The user can change this default value.
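The threshold rule on this slide amounts to a simple count. A minimal sketch, with the hypothetical helper `select_rank` and made-up singular values:

```python
import numpy as np

def select_rank(singular_values, threshold=0.70):
    """Count the singular values strictly above threshold * largest value."""
    sv = np.asarray(singular_values, dtype=float)
    return int(np.sum(sv > threshold * sv.max()))

# With the default 70% threshold only 10 and 8 exceed 0.7 * 10 = 7.
s = select_rank([10.0, 8.0, 6.0, 3.0])
# Lowering the threshold to 50% also admits 6.
s_loose = select_rank([10.0, 8.0, 6.0, 3.0], threshold=0.50)
```

A lower threshold keeps more singular values and therefore produces more concepts.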

49 GUI – Ontology Updates changeConcThreshold: Controls the number of terms related to a concept based upon term weight. The default value is 70%, i.e. only the terms with weights higher than 70% of the highest term weight are selected. The user can change this default value.
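The same kind of relative cutoff can be applied to the terms of one concept. The helper `select_terms` and the weights below are illustrative assumptions, not values from the slides.

```python
def select_terms(term_weights, threshold=0.70):
    """Keep terms whose weight is strictly above threshold * highest weight.

    term_weights: non-empty dict {term: weight} for a single concept.
    """
    top = max(term_weights.values())
    return {t: w for t, w in term_weights.items() if w > threshold * top}

weights = {"graph": 0.9, "node": 0.7, "edge": 0.2}
kept = select_terms(weights)  # cutoff is 0.7 * 0.9 = 0.63
```

Here `graph` and `node` stay attached to the concept while `edge` is dropped.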

52 GUI – Ontology Modifications Rename: renames a selected concept. DelTerm: deletes a selected term. Undo: discards the last modification and returns to the previous state.

58 Future Work: To investigate less expensive methods for adding new documents: Fold-In and SVD update.

59 Future Work Fold-In: A method to add new document(s) to an existing ontology. It uses the existing data in the document addition process. It is a less expensive process than the regular build method.
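The slides do not give a formula for Fold-In. The sketch below uses the projection commonly used for LSI fold-in, $\hat{d} = d^T U_s S_s^{-1}$, as an assumption about what the team has in mind; the function `fold_in` and the toy matrix are hypothetical.

```python
import numpy as np

def fold_in(d, U_s, S_s):
    """Project a new document's term vector d (length m) into concept space.

    Assumed formula: d_hat = d^T U_s S_s^{-1}. The existing U_s and S_s
    are reused, so no new decomposition is computed.
    """
    d = np.asarray(d, dtype=float)
    return d @ U_s @ np.linalg.inv(S_s)

# Existing ontology built from a toy 3-term x 2-document matrix.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
U, sing, Vt = np.linalg.svd(A, full_matrices=False)
s = 1
U_s, S_s, Vt_s = U[:, :s], np.diag(sing[:s]), Vt[:s, :]

# Folding in a document that is already in A recovers its column of V^T.
d_hat = fold_in(A[:, 0], U_s, S_s)
```

The cost is one small matrix product per new document, which is why it is cheaper than recomputing the SVD. The trade-off is that the concepts themselves are not updated by the new document.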

60 Acknowledgements: We express our appreciation to: Department of Defense; University of Maryland, Baltimore County; Advisors, Bowie State University.

