Technology of Semantic Structuring of the Digital Library Content I.Filozova JINR, Dubna JINR (DUBNA), MAY 18, 2012 III JINR/CERN School of Information.

1 Technology of Semantic Structuring of the Digital Library Content. I. Filozova, JINR, Dubna. JINR (Dubna), May 18, 2012. III JINR/CERN School of Information Technology on GRID and Advanced Information Systems

2 Contents: Current Trends; Problematic Situation; Research Lines; Realization Ideas; QA-System on the Logic-Semantic Network Basis; Summary

3 CURRENT TRENDS Traditional publishing → digital-archive-based approach; The scientific community accumulates extensive digital information arrays → content integration at the metadata level → common data and information spaces; The number of open-access institutional repositories is growing. Number of repositories: 2,900; number of records: ~40,000,000, according to ROAR statistics (ROAR, http://roar.eprints.org)

4 HOW TO FIND

5 PROBLEMATIC SITUATION Creating effective mechanisms for finding answers to questions in digital information funds is a topical problem. The task is to find the information (the information source and/or the information itself). [Diagram: a question (V) is matched against the digital information fund (information sources) by methods and mechanisms for effective search (search technology), governed by information laws (marked "?" on the slide), giving the answer set $Q_V = Q_V^{R} \cup Q_V^{N}$ with pertinence (P); the expression for P is not preserved in the transcript.]

6 RESEARCH LINES (1) Development of a method and mechanism for effectively finding the set of relevant answers to a question. (2) Development of a technology for creating and maintaining the catalog service of the information fund, so that answers to questions can be found efficiently. (3) Software development: a cataloguer workstation for structuring the information fund.

7 REALIZATION IDEAS OF RESEARCH LINES

8 The method is based on describing scientific and technical information by a set of logic-semantic networks "Question-Answer-Reaction" (LSN QAR). The search engine is based on: user-controlled movement along the LSN; selection of LSN nodes (questions or answers) based on an ontological model of the user's question. The technology is based on describing the subject domain by a set of logic-semantic networks "Question-Answer-Reaction". The mechanism of the technology is the cataloguer's workstation (the cataloguer being the developer of the LSN QAR).

9 Cognitive Function of the Question Question: a thought query expressed as an interrogative sentence. Answer: a realization of the cognitive function of the question as a newly obtained judgment. [Diagram: a question arises from cognitive indeterminacy between the unknown and the known; its functions are to develop the knowledge (to extend it, to produce new knowledge), to refine the knowledge, and to supplement the knowledge.]

10 Process of Asking Questions and Searching for Answers [Diagram with the labels: Ask Question; Find Answer; Set Adequacy Question-Answer; Technology of Question Asking; Search Technology; Technology of Conformity Setting; Conformity Rules; Search Scope; The Object and Subject of Research; Question; Datum Question; Answer.]

11 Formal Structure of Question and Answer
The logical structure of the question (Q): QUESTION = {QUESTION THEME (QT), QUESTION CONTENT (QC), QUESTION VOLUME (QV)}
The logical structure of the answer (A): ANSWER = {ANSWER THEME (AT), ANSWER CONTENT (AC), ANSWER VOLUME (AV)}
The logical structure of the reaction (R): REACTION = {REACTION THEME (RT), REACTION CONTENT (RC), REACTION VOLUME (RV)}
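The three-part structure of question, answer and reaction can be sketched as a small data model. This is only an illustration in Python, not the author's implementation: it assumes that a theme is a plain string, that content is a set of key terms, and that volume is a set of items the element ranges over. The class name `Part`, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One element of an LSN: a question, an answer or a reaction.

    Assumed representation (the slides do not specify one):
    theme   -- a plain string,
    content -- the set of key terms,
    volume  -- the set of items the element ranges over.
    """
    kind: str                        # "Q", "A" or "R"
    theme: str
    content: frozenset = frozenset()
    volume: frozenset = frozenset()


# Illustrative values only, loosely based on the "What is JAVA?" example.
q1 = Part("Q", "Java", frozenset({"java"}), frozenset({"language", "platform"}))
a11 = Part("A", "Java", frozenset({"java"}), frozenset({"language"}))
```

Keeping all three kinds in one class mirrors the slide, where question, answer and reaction share the same theme/content/volume shape.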

12 Formal Relationship between Question and Answer A question and an answer form a consistent system if: the question theme is identical to the answer theme; the answer content does not exceed the question content (the number of key terms in the question is not less than the number of key terms in the answer, and the intersection of the set of question terms and the set of answer terms is not empty); the question volume is not less than the answer volume (the set of answers to the question within the datum question is larger than the set of answers in the search scope).
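The three consistency conditions can be read as a simple predicate. The sketch below is one possible reading, under stated assumptions: content is modelled as a set of key terms, volume as a set of candidate answers, and volumes are compared by size. The names `Unit` and `is_consistent` and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    theme: str
    content: frozenset   # key terms (assumed representation)
    volume: frozenset    # candidate answers covered (assumed representation)


def is_consistent(question: Unit, answer: Unit) -> bool:
    """Check the three conditions for a consistent question-answer pair."""
    # 1. The themes are identical.
    same_theme = question.theme == answer.theme
    # 2. The answer has no more key terms than the question,
    #    and the two term sets overlap.
    content_ok = (len(question.content) >= len(answer.content)
                  and bool(question.content & answer.content))
    # 3. The question volume is not less than the answer volume
    #    (compared by size here, which is an assumption).
    volume_ok = len(question.volume) >= len(answer.volume)
    return same_theme and content_ok and volume_ok


q = Unit("Java", frozenset({"java", "language"}), frozenset({"a11", "a12"}))
a_ok = Unit("Java", frozenset({"java"}), frozenset({"a11"}))
a_bad = Unit("Indonesia", frozenset({"island"}), frozenset({"x"}))

print(is_consistent(q, a_ok))   # True
print(is_consistent(q, a_bad))  # False: the theme differs and no terms overlap
```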

13 Logic-Semantic Network Question-Answer-Reaction
Logic-semantic network: a set of questions, answers and the relationships between them that forms a uniform system.
Question: a query expressed as a question statement, aimed at developing, refining or supplementing knowledge.
Answer: a realization of the cognitive function of the question in the form of a newly obtained judgment. The answer must be built in accordance with the content and structure of the question asked; only then is the answer regarded as relevant.
Reaction: a semantic description of the question and the answer. Types of reactions:
1. Question Reaction: a description of the datum question (to understand the environment and causes of the question and to establish semantic adequacy with the answer scope).
2. Answer Reaction: a description of the answer scope (to understand the question semantics and its relationship with the answer).

14 Reaction Example (1) Logical unit Question-Answer-Reaction:
Question 1 (Q1). What is JAVA?
Question 1 Reaction 1 (QR11). Two pronunciation standards have formed: one borrowed from English, /dʒɑːvə/, and the traditional «Ява» (in Russian), which matches the traditional pronunciation of the name of the island of Java.
Question 1 Reaction 2 (QR12). Java (Indonesian: Jawa) is an island of Indonesia with a population of 135 million. Area: 132,000 km² …
Question 1 Reaction 3 (QR13). Slide show, photo collage with views of the island of Java.

15 Reaction Example (2) Answer 1 to Question 1 (A11). Java is an object-oriented programming language developed by Sun Microsystems. Reaction 1 of Answer 1 to Question 1 (RA11). Why is the language called JAVA? According to one version, the language got its name from the coffee grown on the island of the same name. This drink is known to be much loved by some programmers, which is why a cup of steaming coffee is shown on the logo.

16 Reaction Example (3) Reaction 2 of Answer 1 to Question 1 (R2A11). Sun Microsystems, Inc. (now part of Oracle Corporation) is a U.S. company that produces software and hardware… Answer 2 to Question 1. Java is not only the language itself but also a platform for developing and running applications based on this language.

17 LSN Integrity
• The set of "Question-Answer-Reaction" units refers to a particular subject domain;
• The set of "Question-Answer-Reaction" units is hierarchically ordered by the principle "from general to particular";
• Questions are placed on odd levels of the hierarchy, answers on even levels;
• Questions on the i-th level are related only to answers on the (i+1)-th level;
• Questions on the (i+1)-th level may be related to answers on the i-th level;
• A question on the i-th level is semantically related to an answer on the (i+1)-th level if it satisfies criterion 'A' or 'B'. If criterion 'A' is satisfied, the answer is a terminal vertex; if criterion 'B' is satisfied, questions of the (i+2)-th level follow from this answer;
• Questions that are explained by level-2 answers (partially or completely covering the subject domain) are placed on level 1;
• Questions that supplement and refine the level-2 answers are placed on level 3.
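The level rules above lend themselves to an automatic check. The following Python sketch is a hypothetical illustration, not part of the described technology. It assumes that every node is stored with a kind ("Q" or "A") and a level number, and that every edge runs from a node on level i to a node on level i+1. Criteria 'A' and 'B' are not defined in the slides and are therefore not checked. The function name `check_integrity` and the sample nodes are assumptions.

```python
def check_integrity(nodes, edges):
    """Return a list of rule violations; an empty list means none were found.

    nodes -- dict: node name -> (kind, level), kind is "Q" or "A"
    edges -- iterable of (source name, target name) pairs
    """
    problems = []
    # Questions must sit on odd levels, answers on even levels.
    for name, (kind, level) in nodes.items():
        if kind == "Q" and level % 2 == 0:
            problems.append(f"question {name} is on even level {level}")
        if kind == "A" and level % 2 == 1:
            problems.append(f"answer {name} is on odd level {level}")
    # Every edge must join a question and an answer on adjacent levels.
    for src, dst in edges:
        src_kind, src_level = nodes[src]
        dst_kind, dst_level = nodes[dst]
        if src_kind == dst_kind or dst_level != src_level + 1:
            problems.append(f"edge {src}->{dst} breaks the level rules")
    return problems


nodes = {"Q10": ("Q", 1), "A21": ("A", 2), "Q31": ("Q", 3)}
good_edges = [("Q10", "A21"), ("A21", "Q31")]
bad_edges = [("Q10", "Q31")]   # question linked directly to a question

print(check_integrity(nodes, good_edges))  # []
print(check_integrity(nodes, bad_edges))
```

Returning a list of messages rather than a single boolean is a design choice for the sketch: a cataloguer would presumably want to see every violation at once.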

18 Graph LSN QAR [Figure: graph with the nodes Q10, A21, A22, A23, Q31, Q32, Q33, Q34, A41, A42, A43, the reactions R10, R21, R23, and edges numbered 1 to 17.]

19 Formal View of Subject Domain Subject domain: an area of scientific and practical human activity, characterized by the object and subject of study. Subject of research: the problems and tasks associated with the object. Let SD be the name of the subject domain, Tm_i the name of the i-th theme of the subject domain, and LSN_ij the name of the j-th LSN in the i-th theme. Then the subject domain is presented as: [formula not preserved in the transcript]

20 Navigation on LSN
№    Way (edges)
1    1, 4, 11
2    1, 4, 12
3    1, 5, 13
5    1, 5, 14
6    1, 6, 15
7    1, 6, 16
8    2, 7, 13
9    2, 7, 14
10   2, 8, 17
11   3, 9, 15
12   3, 9, 16
13   3, 10, 17
[Figure: the LSN QAR graph with the nodes Q10, A21, A22, A23, Q31, Q32, Q33, Q34, A41, A42, A43, the reactions R10, R21, R23, and edges numbered 1 to 17.]
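The ways in the table are the root-to-leaf paths of the graph, written as sequences of edge numbers. The Python sketch below enumerates them by depth-first traversal. The node names come from the graph slide, and which edges share a source is inferred from the way table. The concrete endpoints assigned to each edge are assumptions, since the transcript does not preserve the drawing, and the targets of edges 11 to 17 in particular are chosen arbitrarily. The names `EDGES` and `ways` are hypothetical.

```python
# edge number -> (source node, target node); the endpoints are assumed.
EDGES = {
    1: ("Q10", "A21"), 2: ("Q10", "A22"), 3: ("Q10", "A23"),
    4: ("A21", "Q31"), 5: ("A21", "Q32"), 6: ("A21", "Q33"),
    7: ("A22", "Q32"), 8: ("A22", "Q34"),
    9: ("A23", "Q33"), 10: ("A23", "Q34"),
    11: ("Q31", "A41"), 12: ("Q31", "A42"),
    13: ("Q32", "A41"), 14: ("Q32", "A42"),
    15: ("Q33", "A42"), 16: ("Q33", "A43"),
    17: ("Q34", "A43"),
}


def ways(start, edges=EDGES):
    """List every path from `start` to a terminal node as a tuple of edge numbers."""
    # Group the outgoing edges of each node, in ascending edge order.
    outgoing = {}
    for number, (src, dst) in sorted(edges.items()):
        outgoing.setdefault(src, []).append((number, dst))

    def walk(node, path):
        if node not in outgoing:        # terminal vertex: the way is complete
            yield tuple(path)
            return
        for number, dst in outgoing[node]:
            yield from walk(dst, path + [number])

    return list(walk(start, []))


for way in ways("Q10"):
    print(way)
```

With these assumed endpoints the traversal yields twelve ways, from (1, 4, 11) to (3, 10, 17), in the same order as the table.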

21 Analysis Method of Scientific Texts The expert studies the document in terms of:
1. Semantic matching of the title and the content;
2. A set of filters:
Filter 1 (F1), General part. F1 covers the analysis of the problem, its history, an overview and its topicality.
Filter 2 (F2), Author concept. F2 covers new terms introduced by the authors, traditional terms with the authors' interpretation, and narrowed semantics.
Filter 3 (F3), Examples and illustrations. These clarify difficult places in the text and reduce the text size under strict restrictions on volume.
Filter 4 (F4), The idea of the author. Describes and explains the author's main idea.
3. Formulation of the basic questions that correspond to the text.

22 Example of the Scientific Text Markup Belaga V.V., Semchukov P.D., Stetsenko M.S. DEVELOPMENT OF THE ENGINE FOR MULTIMEDIA EDUCATIONAL SOFTWARE // "System Analysis in Science and Education", International University of Nature, Society and Man "Dubna". 2009, issue 2. http://www.sanse.ru/archive/11 (in Russian)

23 Markup Fragment
Hypotheses:
H1: On the technology of designing the software engine (TPPO, from the Russian abbreviation ТППО).
H2: The TPPO has particular features that stem from the specifics of educational resources.
H3: The paper considers the structure of the software engine and the stages of its implementation....
P5. F. Author idea S5. This paper aims to consider the technical side of the task of creating a multimedia educational product.
Q1. What is the aim of this paper? RQ1={P1#1;P2#2}
A1. The aim of the paper is to consider the technical side of the process of creating a multimedia educational product P5. #5 RA1=P3#3;P4#4
P6. F. Examples and illustrations S6. A model of the learning process according to the IEEE P1484 standard is presented (Fig. 1).
P7. F. General part S7. The factors affecting the quality of education are considered.

24 Multilayer Related Set of Graphs

25 LSN + Visualisation topicality

26 Summary It is proposed:
• to create and maintain a "Catalog Service" for the funds (corpora);
• to create a Question-Answer Navigator that provides the following features:
- the ability to refine and deepen the understanding of the meaning of a question;
- the ability to refine, deepen and expand knowledge, or to obtain new knowledge, while searching for the answer to a question.
Implementing such a "Catalog Service" and Navigator makes it possible to study the DL content in the mode natural for a human, the question-answer mode: refinement, generalization and obtaining new knowledge. The main problem of the proposed question-answer system is to automate as much as possible the creation and maintenance of the fund's "Catalog Service".

27 Even the most foolish idea can be implemented masterfully. Leszek Kumor

