Automatic creation of concept map from unstructured text in a flective language Krunoslav Žubrinić, PhD University of Dubrovnik Prof. Damir Kalpić, PhD.


1 Automatic creation of concept map from unstructured text in a flective language
Krunoslav Žubrinić, PhD, University of Dubrovnik
Prof. Damir Kalpić, PhD, University of Zagreb, Faculty of Electrical Engineering and Computing

2 Contents
Introduction
– Motivation
– Goal and hypothesis for research
Procedure
– Automatic creation of concept map
– Creation of thesaurus
– Method for creation of the concept map
Results
– "Gold standard"
– Assessment of quality for the selected terms
– Assessment of quality for the created concept maps
– Usability assessment for the created concept maps
Conclusion

3 Motivation
Visualisation gives a structured view of a large quantity of data in a shorter time.
A concept map is a visualisation tool used successfully in education and business.
Creating a concept map is often difficult.
– How can important concepts and relationships be recognised?
– A mentor or an initial map of the field can help.
– A concept map can be created automatically from the related documents.
– Past research has yielded promising results.

4 Goal and hypothesis for research
Research goal:
– Design and verify a new method for automatic creation of a concept map from an unstructured textual document in Croatian, build a prototype and evaluate the results achieved.
Research hypothesis:
– From unstructured text in Croatian, an automatic procedure can create a concept map that represents the key elements of the original text.
– The hypothesis was formulated on the basis of results of similar research on creating concept maps in other languages.

5 Automatic creation of concept map

6 Creation of thesaurus
(Figure: procedure in four numbered steps, 1 to 4)

7 Creation of the thesaurus skeleton
1. Selection of terms: term frequency–inverse document frequency (TF-IDF)
2. Determination of connections:
– Apriori algorithm (RT (related term) similarity links between extracted terms and the terms of the thesaurus seed)
– Links within the thesaurus seed (RT, hierarchical links BT/NT (broader term/narrower term), links USE/UF (use/use for))
– Links within WordNet (hierarchical links BT/NT, links USE/UF)
3. Determination of concepts:
– Links USE/UF (concept name = USE, terms included in the concept = UF)
– Concept weights are calculated using the CF-IDF (concept frequency–inverse document frequency) measure
(Figure: excerpt from the created thesaurus)
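The term-selection step can be illustrated with a minimal TF-IDF sketch. This is not the authors' implementation: the weighting variant (relative term frequency times the natural log of N/df), the function names `tf_idf` and `top_terms`, and the toy tokenised documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenised document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical, already lemmatised toy documents.
docs = [
    ["muzej", "ravnatelj", "muzej", "zakon"],
    ["zakon", "ustanova", "proračun"],
    ["zakon", "vlada", "ustanova"],
]
scores = tf_idf(docs)
print(top_terms(scores[0], 2))
```

A term that occurs in every document, such as "zakon" in the toy data, gets a score of zero, which is why TF-IDF favours terms that are characteristic of a single document.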

8 Method for creation of the concept map
(Figure: steps 1 to 3)

9 Method for creation of the concept map
(Figure: step 4)

10 Detection of non-hierarchical links
Based on syntagmatic relationships among words in key sentences.
– A key sentence contains at least two concepts.
– The focus is on the subject (S), predicate (P) and object (O) of a sentence.
Rules for processing of simple sentences (figure labels: "Ideal case"; "In practice, more frequent")
Normalisation of terms
– The most frequent form is selected.
Complex sentences
– A proposition is created from each S-P-O set.
– Incomplete sets are not considered.
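The rules on this slide can be sketched roughly as follows. The sketch assumes that S-P-O sets have already been extracted by some parser (that step is not shown), that a missing element is represented as `None`, and that a proposition is kept only when both subject and object are known concepts. The names `normalise` and `propositions` and the sample data are hypothetical.

```python
from collections import Counter


def normalise(surface_forms):
    """Pick the most frequent surface form of a term as its label."""
    return Counter(surface_forms).most_common(1)[0][0]


def propositions(spo_sets, concepts):
    """Turn complete S-P-O sets into concept-link-concept propositions."""
    result = []
    for subject, predicate, obj in spo_sets:
        if not (subject and predicate and obj):
            continue  # incomplete set: not considered
        # Assumption: both ends of the link must be known concepts.
        if subject in concepts and obj in concepts:
            result.append((subject, predicate, obj))
    return result


# Hypothetical S-P-O sets taken from one complex sentence.
spo = [
    ("government", "establishes", "museum"),
    ("museum", "acquires", None),          # object missing
    ("director", "manages", "museum"),
    ("museum", "takes over", "property"),  # "property" is not a concept here
]
concept_set = {"government", "museum", "director"}
links = propositions(spo, concept_set)
print(links)
print(normalise(["muzej", "muzeja", "muzej"]))
```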

11 Method for creation of the concept map
(Figure: step 4)

12 Detection of hierarchical links
1. Use of the taxonomy in the thesaurus
2. Lexical dispersion
3. Distributional hypothesis
(Figure: conditional probabilities of co-occurrence of key concept pairs)
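The slide does not say how the conditional probabilities are computed or how they feed into the hierarchy. As a rough sketch, assuming co-occurrence is counted over text units such as sentences, P(b | a) can be estimated as the share of units containing a that also contain b. The function name and the sample units are hypothetical.

```python
from itertools import permutations


def conditional_probabilities(units, concepts):
    """Estimate P(b | a) for every ordered pair of distinct concepts.

    `units` is a list of sets, each holding the concepts found in one
    text unit (a sentence or a paragraph, for example).
    """
    probs = {}
    for a, b in permutations(concepts, 2):
        with_a = [u for u in units if a in u]
        if not with_a:
            continue  # a never occurs, so P(b | a) is undefined
        probs[(a, b)] = sum(1 for u in with_a if b in u) / len(with_a)
    return probs


# Hypothetical concept occurrences in four sentences.
units = [
    {"museum", "director"},
    {"museum"},
    {"museum", "government"},
    {"government"},
]
probs = conditional_probabilities(units, ["museum", "director", "government"])
print(probs[("director", "museum")], probs[("museum", "director")])
```

One possible reading, stated here only as an assumption, is that a strong asymmetry (for example "director" almost always appearing with "museum" but not the reverse) hints that "museum" is the broader concept.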

13 Method for creation of the concept map
(Figure: step 5)

14 Tree trimming to a given size
The given size of the map is 5 concepts.
(Figure labels: T=3.7; T=1.04; T=0.4; T=0.44; weight=3.7+3.7+1.04=8.44; weight=3.7+0.4+0.44=4.54; 1 connection; 2 connections; unlinked concept)

15 Example of a created concept map
Source text (translated from Croatian): Regulation on the establishment of the Croatian Sports Museum (Uredba o osnivanju Hrvatskoga športskog muzeja), abstract
On the basis of the Museums Act and the Institutions Act, the Government of the Republic of Croatia establishes the Croatian Sports Museum as a public institution of interest to the Republic of Croatia. The Museum carries out museum activities related to physical culture, physical exercise, sport and related areas of human activity, in accordance with the Museums Act. Funds for the Museum's activities are provided in the state budget of the Republic of Croatia, and the Museum may also acquire its own funds. The Museum is managed by a director, who is appointed through a public competition for a term of four years and may be reappointed to the same position. On the date of its establishment, the Museum takes over the existing property, assets, rights, funds and employees of the Croatian Sports Museum organisational unit of the Faculty of Kinesiology in Zagreb.
Key terms: hrvatski športski muzej (Croatian Sports Museum); vlada republike hrvatske (Government of the Republic of Croatia); javna ustanova (public institution); zakon o muzejima (Museums Act); ravnatelj (director)

16 Evaluation of results
Using the described procedure, 121 concept maps were created.
The quality of all created maps was evaluated.
A so-called "gold standard" was used for the evaluation.
(Figure: steps 1 to 3)

17 Gold standard
Key words and an informative-indicative abstract for each of the 121 documents.
Twelve people from science and higher education took part in its preparation.
– Every document was processed by at least two people.
The quality of a sample of the gold standard was evaluated.
– 55 evaluators, each evaluating 4 documents, gave 220 grades.
– The created abstracts and selected key words describe the source documents very well.

18 Evaluation of quality of selected key terms
Comparison of key terms selected from the text with the gold standard key words.
Comparison of 3 algorithms (TF-IDF, Apriori and KEA (Keyphrase Extraction Algorithm)) with the reference one.
Comparison of precision (P), recall (R) and their harmonic mean (F1).
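For reference, the three measures can be computed as in the sketch below. It assumes exact matching between selected terms and gold standard key words; the slide does not state the matching rule, and the term labels are placeholders.

```python
def precision_recall_f1(selected, gold):
    """Compare selected key terms with gold standard key words."""
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)  # terms found in both sets
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Placeholder terms: 2 of the 4 selected terms are in the gold standard.
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"])
print(p, r, f1)
```

Here P = 2/4, R = 2/3 and F1 = 4/7, which shows how F1 sits between the two values but closer to the lower one.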

19 Evaluation of quality of the created concept maps
Comparison of the map with the document abstract.
– Questionnaire with 5 Likert-type assertions:
  – The concept map contains the most important terms from the document.
  – Concepts are connected correctly.
  – Links among concepts are properly named.
  – Hierarchy of concepts is correct.
  – Concept map is useful for learning the document contents.
Respondents graded each statement from 1 to 5.
– 6,854 respondents from science and higher education
– 538 (7.9%) respondents fully completed their questionnaires.
– Every respondent evaluated five different randomly chosen maps.
– Altogether, 2,690 individual grades were collected.
– Each map was evaluated by 7 to 42 respondents.
– Maps evaluated by fewer than 15 respondents were excluded from the analysis.
– 115 maps and 2,625 individual grades remained.

20 Evaluation of quality of the created concept maps

21 Evaluation of quality of the created concept maps

22 Evaluation of quality of the created concept maps
Most frequent comments:
– The quality of created concept maps varies.
– Some maps are incomplete, unclear or hierarchically wrong.
– Link names should be improved.
– Good-quality maps should first be achieved for one type of document before other types are attempted.
– Promising, but still a lot to be done.
Are the created concept maps good enough for practical application?
The association between the characteristics of the original document, the observed characteristics of the respondents (gender, age and job position) and the grades given was tested using the χ² test.
Main conclusions:
– The length of the original document does not affect the results achieved.
– The observed characteristics of the respondents (gender, age and job position) do not affect the results achieved.

23 Assessment of applicability
The respondents had to answer the posed questions using three kinds of material:
– The created concept map, the abstract and the source document.
Three questions were selected to be answered using all three kinds of material.
Two forms of question: YES/NO answers and multiple expected answers:
– Correct answer: 100% of credits; partly correct answer: 50% of credits; wrong answer: 0 credits.
Besides answering the questions, the respondents rated how easy the answer was to find on a 5-level scale from 0 to 1 (I could not find the answer = 0; I could hardly find it = 0.25; rather difficult to find = 0.5; rather easy to find = 0.75; very easy to find = 1).
Number of credits = correctness * ease of finding the answer
Example: a partly correct answer that was found rather easily: 0.5 * 0.75 = 0.375
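The scoring rule on this slide can be written out as a short sketch. The numeric values come from the slide; the dictionary keys and the function name `credits` are shorthand introduced here.

```python
# Correctness levels and their share of credits (values from the slide).
CORRECTNESS = {"correct": 1.0, "partly correct": 0.5, "wrong": 0.0}

# Five-level scale for how easily the answer was found (values from the slide).
EASE = {
    "could not find": 0.0,
    "hardly found": 0.25,
    "rather difficult": 0.5,
    "rather easy": 0.75,
    "very easy": 1.0,
}


def credits(correctness, ease):
    """Number of credits = correctness * ease of finding the answer."""
    return CORRECTNESS[correctness] * EASE[ease]


# The slide's example: partly correct, found rather easily.
print(credits("partly correct", "rather easy"))  # 0.375
```

Because the two factors are multiplied, a correct answer that the respondent could not really locate, or an easily found but wrong answer, both score zero.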

24 Assessment of applicability
The statistical significance of differences among the respondents' results with the different materials was tested using the t-test for paired samples.
The materials used and the respondents' characteristics have a statistically significant impact on the results.
However, the difference is small and has little practical significance.
The worst results were achieved when using the source documents.
– Were answers difficult to find because of the length of the source text?
– Lack of concentration by respondents?
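As a reminder of what the paired-samples t-test computes, the sketch below derives the t statistic from the per-respondent differences using only the standard library. The credit values are invented for illustration and are not the study's data; a p-value would additionally require the t distribution (for example from a statistics package), which is left out here.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Invented credits of four respondents with two kinds of material.
with_map = [0.75, 0.5, 1.0, 0.375]
with_source = [0.5, 0.5, 0.75, 0.25]
t, df = paired_t_statistic(with_map, with_source)
print(round(t, 3), df)
```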

25 Assessment of applicability
(Figure: three panels relating ease of information retrieval to collected grades, for concept maps, abstracts and source documents. Reported values, in the order extracted: ρ=0.538, P<0.001; ρ=0.573, P<0.001; ρ=0.542, P<0.001)

26 Conclusion
The following contributions have been achieved:
– A new method for creating concept maps from unstructured text in a flective language (Croatian), combining statistical and machine learning methods with a dictionary of terms and with linguistic tools and resources specific to the Croatian language.
– A new method for determining the hierarchy level of concepts in a concept map, based on links to other concepts and on the positions in the document where the concept occurs.
– A proposal for a semi-automatic procedure for creating a dictionary of terms in a problem domain, to be used for recognising concepts and concept map links. A dictionary of terms in a selected area was formed by applying this procedure.
The results achieved using the prototype confirm the stated hypothesis.
Future work:
– Improvement of algorithms, implementation of critical processes, improvement of the graphical presentation of results and shortening of the processing time.

