Marcos André Gonçalves Digital Library Research Laboratory

Presentation transcript:

From semantic networks, to ontologies, and concept maps: knowledge tools in digital libraries
Marcos André Gonçalves
Digital Library Research Laboratory, Virginia Tech

Outline
- Introduction
- Semantic Networks in Information Retrieval
  - The MARIAN system
- Digital Library Ontologies
- Concept maps: knowledge representation and visualization in DLs

Introduction
- Experiment with how new knowledge representation tools can be used in Digital Libraries
- Semantic networks
  - Representation, retrieval and inference of DL constructs and relationships
- Ontologies
  - Formalize, model and generate DLs
- Concept Maps
  - Visualization tool
  - Supporting collaborative work
  - Transforming information access into knowledge creation


Semantic Networks in DLs: MARIAN
Motivation
- Support rich DL information services which are:
  - Extensible
  - Tailorable
- Support large, diverse collections of digital objects which:
  - have complex internal structures
  - are in complex relationships with each other and with other non-library objects such as persons, institutions, and events

Design choices
Semantic networks
  Objective: Basic, unified representation of digital library structures
  Examples of use: Document and metadata structure; hierarchical relationships of classification systems; concept maps
Weighting schemes
  Objective: Support IR operations and services; quantitative representation of qualitative properties (similarity, uncertainty, quality)
  Examples of use: Weighted links representing indexes; multi-field, multi-word, fusion of weighted IR sets; degree of similarity among concepts in different ontologies
Object oriented class system
  Objective: Provide common behavior, extensibility, and opportunity for improved performance
  Examples of use: Shared methods for matching different types of nodes (terms, controlled, free texts) and link topologies; multilingual support and common presentation methods
Lazy evaluation
  Objective: Performance; management of large collections
  Examples of use: Reduced number of search results; enhanced merging algorithms for weighted sets of search results

Design choices: semantic networks
- Represent knowledge in patterns of interconnected nodes
- Graph representation to express knowledge or to support automated systems for reasoning
- Sowa’s classification:
  - Definitional networks: inheritance hierarchies
  - Assertional networks: assert propositions
  - Implicational networks: implication as the primary relation
  - Executable networks: mechanism to pass messages (tokens, weights)
  - Learning networks: modify internal representations (weights, structure); ability to measure similarity
  - Hybrid networks

Design choices: MARIAN semantic network
[Figure: the MARIAN semantic network for ETDs. Node classes shown: ETD Metadata, ETD Doc, Person, Abstract, Subject, Chapter, Section, Paragraph, Paper, term, id. Link classes shown: hasAuthor, hasAbstract, hasSubject, hasChapter, hasSection, hasParagraph, describes, cites, occursInAuthor, occursInAbstract, occursInSubject, occursInParagraph.]
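The idea of typed, weighted links between nodes can be illustrated with a minimal sketch. The link class names (hasAbstract, hasAuthor, occursInAbstract) come from the figure; the `Link` and `SemanticNetwork` classes, the node identifiers, and the weights are hypothetical and say nothing about how MARIAN actually stores its network.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A typed, weighted, directed link between two node identifiers."""
    link_class: str  # e.g. "hasAuthor", "occursInAbstract"
    source: str
    target: str
    weight: float = 1.0


class SemanticNetwork:
    """Toy in-memory network (hypothetical, not MARIAN's storage model)."""

    def __init__(self):
        # (source, link_class) -> list of links leaving that source
        self._out = defaultdict(list)

    def add(self, link_class, source, target, weight=1.0):
        self._out[(source, link_class)].append(
            Link(link_class, source, target, weight)
        )

    def follow(self, source, link_class):
        """Return {target: weight} for links of one class leaving `source`."""
        links = self._out.get((source, link_class), [])
        return {link.target: link.weight for link in links}


net = SemanticNetwork()
net.add("hasAbstract", "etd:1", "abstract:1")
net.add("hasAuthor", "etd:1", "person:1")
# Weighted links act as an index from terms to the texts they occur in.
net.add("occursInAbstract", "term:library", "abstract:1", weight=0.74)
net.add("occursInAbstract", "term:library", "abstract:2", weight=0.31)

abstracts = net.follow("term:library", "occursInAbstract")
```

Following a link class from a node yields a weighted set of nodes, which is the kind of value the search layer then combines.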

MARIAN API
[Figure: hierarchy of class managers. Names shown in the diagram: (Main) ClassMgr; termClassMgr, nodeClassMgr, linkClassMgr; unwtdLink, wtdLink, and nGram class managers; EnglishRoot, SpanishRoot, TextClassMgr, has*, and occursIn* class managers; controlledText, EnglishText, SpanishText, and ChineseText class managers.]

Architecture and Implementation (cont.)
The Search layer
- Mapping from abstract object description to weighted set of objects
- Types of search
  - Link activation
  - Search in context
- Searchers
  - OO search engines
  - Based on fusion
  - Examples: maximizing union searcher, summative union searcher
- Supported by
  - Tables: short-term memory of elements seen to date, checking each new element to keep or discard
  - Sequencers: take a set of incoming streams of weighted sets and produce a single output. Examples: PriQueueSequencer, MergeSequencer.
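A sequencer that merges several weighted streams lazily, with a table of elements seen so far, might look like the sketch below. It is an illustration under assumptions: the function name `merge_sequencer`, the `(object_id, weight)` pair format, and the rule of keeping only the first occurrence of each object are hypothetical, not MARIAN's MergeSequencer.

```python
import heapq
from itertools import islice


def merge_sequencer(*streams):
    """Lazily merge streams of (object_id, weight) pairs.

    Each input must already be sorted by descending weight. The `seen` set
    plays the role of the table: it remembers ids seen to date so that only
    the first (highest-weighted) occurrence of each object is kept.
    """
    seen = set()
    merged = heapq.merge(*streams, key=lambda pair: pair[1], reverse=True)
    for object_id, weight in merged:
        if object_id in seen:
            continue  # discard: a better-weighted copy was already emitted
        seen.add(object_id)
        yield object_id, weight


stream_a = iter([("doc:65655", 1.00), ("doc:989", 0.74)])
stream_b = iter([("doc:3000", 0.85), ("doc:65655", 0.80)])

# Only as much of each stream is consumed as the caller asks for.
top_two = list(islice(merge_sequencer(stream_a, stream_b), 2))
```

Because the merge is a generator, a caller that needs only the best few results never forces the rest of the input streams to be computed, which is the point of lazy evaluation for large collections.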

Architecture and Implementation (cont.)
The Search layer
[Figure: data flow of an example search, with steps numbered 1 to 6. Components shown: query, Parser (morphological matcher), OccursIn Abstract Searcher, OccursIn Advisor Searcher, hasAbstract Searcher, hasAdvisor Searcher, two Summative Union Searchers, and the final result set. Query text shown: "Digital", "Library", "E. A. Fox", with links occursInAbstract, occursInAdvisor, hasTitle, and hasAdvisor, and node identifiers #2006:60812, #2006:42369, and #2007:74667.]
Weighted sets shown in the figure:
  {#6029:65655:1.00, #6029:989:0.74, … }
  {#6029:3000:0.85, #6029:65655:0.8, … }
  {#6015:65655:0.90, #6015:3000:0.425, #6015:989:0.37, … }
  {#6031:45634:1.0, #6031:5678:0.9, … }
  {#6000:856:0.90, #6000:7890:0425, … }
  {#6000:54544:1.0, #6000:2987:0.9, #6000:003:0.74, … }
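The two fusion searchers named earlier can be sketched as functions over weighted sets. In the figure, the sets {65655:1.00, 989:0.74} and {3000:0.85, 65655:0.8} feed a Summative Union Searcher whose output is {65655:0.90, 3000:0.425, 989:0.37}; those numbers are consistent with summing each object's weights and dividing by the number of input sets, so the sketch assumes that rule. The function names and the dictionary representation are hypothetical.

```python
def maximizing_union(*weighted_sets):
    """Keep, for every object, the largest weight it has in any input set."""
    fused = {}
    for weighted_set in weighted_sets:
        for object_id, weight in weighted_set.items():
            fused[object_id] = max(weight, fused.get(object_id, 0.0))
    return fused


def summative_union(*weighted_sets):
    """Sum each object's weights, then divide by the number of input sets.

    The division is an assumption inferred from the numbers in the figure.
    """
    totals = {}
    for weighted_set in weighted_sets:
        for object_id, weight in weighted_set.items():
            totals[object_id] = totals.get(object_id, 0.0) + weight
    count = len(weighted_sets)
    return {object_id: total / count for object_id, total in totals.items()}


first = {65655: 1.00, 989: 0.74}
second = {3000: 0.85, 65655: 0.80}

summed = summative_union(first, second)
best = maximizing_union(first, second)
```

Under the summative rule an object matched by both inputs outranks one matched by a single input, while the maximizing rule ignores how many inputs agree.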

Future Work
- Testing of:
  - Efficiency
    - OO class-model vs. instance level semantic network
    - Lazy evaluation
    - Tables and sequencers
  - Effectiveness with:
    - Structured documents and metadata
    - Fulltext
- Supporting richer networks of relationships
  - Citation linking
  - Multi-language term relationships

Future Work
- Support for other types of networks and graph-based digital objects and structures
  - Belief networks
  - Topic/Concept maps
  - Ontologies, classification schemes
- Support for multimedia retrieval
- Support for cross-language information retrieval (CLIR)


Ontologies for DLs
Motivation
- DLs are an ill-understood phenomenon
- Lack of formal models for DLs
  - Ad hoc development, interoperability problems
Formal Ontologies for DLs
- specify relevant concepts (the types of things and their properties) and the semantic relationships that exist between those concepts in a particular domain
- use a language with a mathematically well-defined syntax and semantics to describe such concepts, properties, and relationships precisely

5S Model (informally)
Digital libraries are complex information systems that:
- help satisfy info needs of users (societies)
- provide info services (scenarios)
- organize info in usable ways (structures)
- present info in usable ways (spaces)
- communicate info with users (streams)

5S Model
Streams
  Examples: Text; video; audio; image
  Objectives: Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures
  Examples: Collection; catalog; hypertext; document; metadata; organization tools
  Objectives: Specifies organizational aspects of the DL content
Spaces
  Examples: Measure; measurable, topological, vector, probabilistic
  Objectives: Defines logical and presentational views of several DL components
Scenarios
  Examples: Searching, browsing, recommending
  Objectives: Details the behavior of DL services
Societies
  Examples: Service managers, learners, teachers, etc.
  Objectives: Defines managers, responsible for running DL services; actors, that use those services; and relationships among them

5S Model: Mathematical formal theory for DLs
Definitions
- Streams: sequences of elements of an arbitrary type
- Structures: labeled directed graphs
- Spaces: sets and operations on those sets
- Scenarios: sequences of events that modify states of a computation in order to accomplish some functional requirement
- Societies: sets of communities and relationships among them
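Three of these definitions map directly onto simple data types, as the following sketch suggests. It is only an illustration: the `Structure` class, the `run_scenario` helper, and the example labels and events are hypothetical, and spaces and societies are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """A labeled directed graph, the 5S notion of a structure."""
    edges: dict = field(default_factory=dict)  # (source, target) -> label

    def add_edge(self, source, target, label):
        self.edges[(source, target)] = label

    def successors(self, source):
        """Return {target: label} for edges leaving `source`."""
        return {t: lbl for (s, t), lbl in self.edges.items() if s == source}


def run_scenario(events, state):
    """A scenario: a sequence of events, each mapping a state to a new state."""
    for event in events:
        state = event(state)
    return state


# Stream: a sequence of elements of an arbitrary type (here, characters).
text_stream = list("5S")

# Structure: a tiny document outline as a labeled directed graph.
outline = Structure()
outline.add_edge("document", "chapter 1", "hasChapter")
outline.add_edge("chapter 1", "section 1.1", "hasSection")


# Scenario: a two-event searching scenario over a dictionary state.
def submit_query(state):
    return {**state, "query": "ontology"}


def return_results(state):
    return {**state, "results": ["doc-1", "doc-2"]}


final_state = run_scenario([submit_query, return_results], {"user": "learner"})
```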

[Figure: concept diagram of the 5S theory. Labels shown: 5S; streams, structures, spaces, scenarios, societies, services; sequence, graph, function, relation, tuple, state, event, grammar; measurable, measure, probability, vector, and topological spaces; structured stream, structural metadata specification, descriptive metadata specification; indexing service, browsing service, searching service; digital object, hypertext, metadata catalog, transmission, collection, repository, (minimal) digital library.]

Ontologies for DLs

Ontologies for DLs
Realizations of the theory/ontology
- Meta-model for a DL descriptive modeling language: 5SL (JCDL 2002)
- Meta-model for a DL visual modeling tool: 5SGraph (ECDL 2003)
- Meta-model for an XML log standard (ECDL 2002, JCDL 2003)

Realizations of the theory/ontology 5S Meta-Schema

Realizations of the theory/ontology 5SGraph Interface

Future Work
- Semantic relationships
  - Only “syntactic” ones were defined
  - Constraints and dependencies (in the form of axioms)
- Taxonomy of services
  - Composability, extensibility
- Formal definitions of properties of DL models/architectures and proofs
  - Completeness
  - Soundness
  - Equivalence


Concept maps: knowledge representation and visualization in DLs
- Challenges in Visual Interfaces for DLs (Chen & Borner)
  - Supporting collaborative work
  - Transforming information to knowledge creation
- Hypothesis: Concept maps can serve as a uniform visual abstraction to provide solutions for these problems.
The design of useful visual interfaces to access, understand, and manage DL content has become an active and challenging field of study. Chaomei Chen and Katy Borner have listed the top 10 problems in visual interfaces to DLs. One of them is supporting collaborative work, which is a challenging task in its own right: translating collaborative work into a visual process entails overcoming a variety of obstacles.
The second motivation is the need to transform information access into knowledge creation in DLs. Instead of serving as information providers, digital libraries could become knowledge repositories by effectively categorizing, analyzing, and organizing their contents [2].
The third motivation differs from the first two. It comes not from the needs of DL users but from the needs of DL designers, and concerns the modeling and building of DLs. Interest from non-experts who wish to build digital libraries is currently strong worldwide. However, since DLs are complex systems, it usually takes considerable time and effort to create and tailor a DL to satisfy the specific needs and requirements of target communities/societies. What is needed is a simplified modeling process and rapid generation of DLs. To enable this, DLs can be modeled with descriptive domain-specific languages. A visual tool would help non-experts model a DL without knowing the theoretical foundations and the syntactic details of the descriptive language.
To address these problems, we propose concept mapping tools.

What are concept maps
- Concept maps are a valuable pedagogical tool
Concept maps are tools for organizing and representing knowledge. They include concepts, usually enclosed in circles or boxes of some type, and relationships between concepts or propositions, indicated by a connecting line between two concepts. Another characteristic of concept maps is that the concepts are represented in a hierarchical fashion with the most inclusive, most general concepts at the top of the map and the more specific, less general concepts arranged hierarchically below. Another important characteristic of concept maps is the inclusion of "cross-links." These are relationships (propositions) between concepts in different domains of the concept map.

Applications:
- Knowledge organization and creation
- Collaborative learning
  - GetSmart Experience (JCDL2003)
- Domain summarization
- Browsing tool
You may have already seen the demo of GetSmart, or the long paper about GetSmart presented by Byron Marshall. GetSmart is a DL with support for concept mapping.

[Figure labels: Data; Information; DL; Information provider; Knowledge Repository.]
It is generally agreed by IT practitioners that there exists a continuum of data, information, and knowledge within any enterprise. Data are mostly structured, factual, and often numeric. Information is factual, but unstructured, and in many cases textual. Knowledge is inferential, abstract, and is needed to support business decisions. In addition to the IT view of the data-information-knowledge continuum, other researchers have taken a more academic view. For example, information scientists consider taxonomies, subject headings, and classification schemes as representations of knowledge. Artificial intelligence researchers have long sought ways to represent human knowledge, such as semantic nets, logic, production systems, and frames. Knowledge derives from information just as information derives from data. For digital library researchers, there is a clear need to transform information access into knowledge creation. Instead of serving as information providers, DLs could become knowledge repositories by effectively categorizing, analyzing, and organizing their contents.

GetSmart Experience (Cont.)
- Collaborative learning: Group maps
When students met to discuss a textbook chapter, each had previously prepared and submitted for grading their individual map. This ensured that they stayed on schedule and kept current with the reading schedule. Then, in class, they joined with 3-5 others to discuss their maps and to prepare a group map.

GetSmart Experience (Cont.)
- Summarization tool
Most of the concept maps the students created using GetSmart were related to the textbooks. Each student completed a concept map using GetSmart from home for each assigned chapter.

Summarization tool
- Supplement to document abstracts, both for one language and across languages: pilot experiment
Materials given to each group:
  English papers
    Group 1 (14): Original abstract
    Group 2 (14): Original abstract plus concept map
  Spanish papers
    Group 1 (14): Original abstract plus translated version
    Group 2 (14): Original abstract plus machine translated version plus translated concept map
We selected 8 short research papers, four in English and four in Spanish. The four English papers were on the topic of digital libraries, and the four Spanish papers were about distance learning. To test the hypothesis that CMs can be a useful supplement to an abstract, both for one language and across languages, we designed an experiment. Q1 and Q2 relate to the 4 English papers; Q3 and Q4 relate to the 4 Spanish papers. The subjects were divided into two groups, 14 in each. Group 1 was given only abstracts, while Group 2 was given both abstracts and concept maps. Subjects were given 4 questions, one at a time, and asked to rank the 4 papers from most to least relevant to that question. For the Spanish papers, all subjects were given the original Spanish language abstract and a machine translation (provided by Altavista’s Babel Fish Translation Service [1]). The subjects in Group 2 were also given concept maps of the original documents, with the nodes and links translated into English. To avoid learning and/or fatigue effects, half of each group was first given the English papers, and the other half was first given the Spanish papers.
For both groups, the time a subject took to answer each question was recorded with a stopwatch. The time taken for each question, and the total time, was roughly the same for both groups. In particular, the differences were not large enough to reject the assumption that the means are equal (at the 0.05 level). The rankings of the papers’ relevance were compared against the expert ranking. The distance from the expert ranking is ideally 0.
The students in both groups were asked to rate the perceived effectiveness of the abstracts and concept maps on a 5-point Likert scale. For Group 2, on the English papers, the participants reported that the concept maps were significantly more helpful than the abstracts (p=0.022). For Group 2, on the Spanish papers, the participants reported that the concept maps were significantly more helpful than the abstracts (p=0.001).
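The notes do not say which distance was used to compare a subject's ranking with the expert ranking. As an illustration only, the sketch below uses one common choice, the mean absolute difference in rank position, which is 0 for a ranking identical to the expert's. The function name and the paper labels are hypothetical.

```python
def rank_distance(ranking, expert_ranking):
    """Mean absolute difference between each paper's position in two rankings.

    This metric is an assumption; the source does not name the one it used.
    """
    if sorted(ranking) != sorted(expert_ranking):
        raise ValueError("both rankings must contain the same papers")
    expert_position = {paper: i for i, paper in enumerate(expert_ranking)}
    total = sum(
        abs(i - expert_position[paper]) for i, paper in enumerate(ranking)
    )
    return total / len(ranking)


expert = ["P1", "P2", "P3", "P4"]
identical = rank_distance(["P1", "P2", "P3", "P4"], expert)
one_swap = rank_distance(["P2", "P1", "P3", "P4"], expert)
reversed_order = rank_distance(["P4", "P3", "P2", "P1"], expert)
```

With four papers this distance ranges from 0 (identical order) to 2 (fully reversed order).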

Summarization tool (Cont.)
Pilot experiment results
                   Group 1 (14) average   Group 2 (14) average   P-value
Q1 (English)       1.6631                 1.3839                 0.527
Q2 (English)       1.6599                 1.1310                 0.185
Q3 (Spanish)       1.7085                 1.1039                 0.209
Q4 (Spanish)       1.6815                 0.9831                 0.030 *
Likert (English)   N/A                    3.6, 4.4               0.022 *
Likert (Spanish)   N/A                    2.7, 4.3               0.001 *

Automatic generation
- Motivation:
  - Drawing concept maps by hand is tedious and time-consuming
  - Novices will draw flawed or overly simplistic maps
  - Maintain uniformity
- Technique
  - Term co-occurrence (Gaines & Shaw)
Experience shows that drawing concept maps is tedious and time-consuming for domain experts, and that novices will often draw flawed or overly simplistic concept maps. Essentially we need a way to draw concept maps in large numbers while maintaining uniformity, which requires that they be produced automatically. Based on the results of this experiment, we were convinced that concept maps could, at least in theory, be a useful tool for knowledge discovery. We then began work on generating them automatically from text, for both English and Spanish documents.
The first work involving the automatic generation of concept maps was by Gaines and Shaw [4]. Their system, called “GNOSIS”, produced concept maps based on term co-occurrence. Our research in automatically generating concept maps has also centered on term co-occurrence. Figure 1 shows an example. The most commonly occurring terms (excluding stopwords) are shown linked to each other. The stronger the connection between the words, the shorter the link will be. Here is an automatically generated concept map of a Spanish document. This one is of the Argentine constitution.
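A minimal sketch of the term co-occurrence idea follows. It is not the GNOSIS algorithm or the authors' implementation: the stopword list, the choice of the sentence as the co-occurrence window, and the function name `cooccurrence_map` are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not the authors' list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for"}


def cooccurrence_map(text, top_n=5):
    """Return (top_terms, links); links maps a sorted term pair to a count.

    Two terms are linked once for every sentence in which both occur.
    """
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"[a-záéíóúüñ]+", s) if w not in STOPWORDS]
        for s in sentences
    ]
    frequency = Counter(w for words in tokenized for w in words)
    top_terms = [term for term, _ in frequency.most_common(top_n)]
    keep = set(top_terms)
    links = Counter()
    for words in tokenized:
        present = sorted(set(words) & keep)
        links.update(combinations(present, 2))
    return top_terms, links


sample = (
    "Digital libraries organize documents. "
    "Digital libraries support search. "
    "Search returns documents."
)
top_terms, links = cooccurrence_map(sample, top_n=4)
```

The counts in `links` give the connection strength between the most frequent terms, which a layout tool can then turn into shorter or longer edges.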

The most commonly occurring terms (excluding stopwords) are shown linked to each other. The stronger the connection between the words, the shorter the link will be.
Graphviz from AT&T:
- Neato: undirected graphs
- Dot: directed graphs
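One way to get shorter links for stronger connections is to write the map as DOT text and set the `len` edge attribute, which neato treats as the preferred edge length. The sketch below only builds the DOT text; the function name, the linear mapping from strength to length, and the length bounds are assumptions, not the authors' settings.

```python
def to_neato_dot(links, min_len=1.0, max_len=3.0):
    """Render weighted undirected links as DOT text for Graphviz's neato.

    `links` maps a (term, term) pair to a connection strength.
    """
    strongest = max(links.values())
    lines = ["graph concept_map {", "  node [shape=box];"]
    for (a, b), strength in sorted(links.items()):
        # Stronger connection -> shorter preferred edge length.
        length = max_len - (max_len - min_len) * (strength / strongest)
        lines.append(f'  "{a}" -- "{b}" [len={length:.2f}];')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_neato_dot(
    {("digital", "libraries"): 2, ("documents", "search"): 1}
)
```

Saving `dot_text` to a file such as `map.dot` and running `neato -Tpng map.dot -o map.png` should then lay out the undirected map; a directed map would use `digraph`, `->` edges, and the `dot` program instead.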

Automatic generation (Cont.)
Spanish documents
Procedure:
1. Determine part-of-speech for each word
2. Collapse all inflected forms to root form
3. Concatenate noun phrases into one “concept”
4. Remove some stopwords, keep others for use in crosslinks
We have also generated concept maps from Spanish documents. For Spanish, we have taken the automatic generation to the next level, which involves using surface natural language knowledge, such as part-of-speech tagging and combining inflected word forms into one root word. We use nouns, adjectives, and short noun phrases as the nodes, and verbs and prepositions as the link text. By combining this part-of-speech information with the word order information from the original documents, we have been able to generate labels for the links as well as the nodes. Figure 2 is an example.
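The way part-of-speech tags and word order can yield labeled links may be sketched as follows. The sketch assumes the tokens have already been tagged by some tagger, skips the step of collapsing inflected forms, and uses hypothetical tag names and a hypothetical function `labeled_links`; it is not the authors' pipeline.

```python
NODE_TAGS = {"NOUN", "ADJ"}   # words that become part of a concept
LINK_TAGS = {"VERB", "ADP"}   # verbs and prepositions become link text


def labeled_links(tagged_tokens):
    """Turn tagged tokens into (concept, link_text, concept) triples.

    Adjacent node-type words are concatenated into one concept; link-type
    words found between two concepts become the label of the link. Other
    words (for example determiners) are dropped.
    """
    triples, left, label, current = [], None, [], []

    def close_concept():
        nonlocal left, label, current
        if not current:
            return
        concept = " ".join(current)
        if left is not None and label:
            triples.append((left, " ".join(label), concept))
        left, label, current = concept, [], []

    for word, tag in tagged_tokens:
        if tag in NODE_TAGS:
            current.append(word)
        else:
            close_concept()
            if tag in LINK_TAGS:
                label.append(word)
    close_concept()
    return triples


# Hand-tagged example sentence: "La novela narra la historia de la familia".
tagged = [
    ("la", "DET"), ("novela", "NOUN"), ("narra", "VERB"),
    ("la", "DET"), ("historia", "NOUN"), ("de", "ADP"),
    ("la", "DET"), ("familia", "NOUN"),
]
triples = labeled_links(tagged)
```

Each triple reads left to right in document order, which is why the direction of a link is only as reliable as the word order it was taken from.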

Figure 2. An automatically generated concept map of a Spanish essay on “Cien Años de Soledad” by Gabriel García Márquez.

Since the text associated with nodes and links tends to be single words or short phrases, we have used our machine translator to create English concept maps from the Spanish ones. Figure 3 is the English translation of the previous Spanish concept map.
Figure 4. An automatically generated and translated concept map.
One problem with this scheme is that it is difficult to automatically determine the directionality of the links. In some cases (such as an ‘Is’ link), the link makes sense both ways, but in others (such as prepositions) it only makes sense in one direction. We are currently working on ways of making the direction of the links more meaningful.

Browsing tools
- Visual aid to navigate through complex collections of inter-related digital objects
- Support multi-hierarchy browsing

We have long understood the idea of creating style sheets to control the formatting and layout of information. Topic Maps introduces the concept of creating style sheets to control knowledge-based information access and navigation.
Given a resource collection with 2 classification schemes, we develop cross links between the 2 schemes and construct a global XTM that includes both schemes. The global XTM can be displayed as a node-arc graph. A retrieved resource set is a subset of the given resource collection and is related to some components of the global XTM. Those components can form an XTM that organizes the retrieved resources. The user can then manipulate the XTM just as the students make concept maps using GetSmart: edit the structure of the graph, add selected resources to a category, or move some out.
Given several resource collections with different knowledge schemes, an XTM will be dynamically generated according to the ontology of each collection:
1) construct an XTM for the ontology of each collection
2) generate cross links among those XTMs
3) combine those XTMs and return them to the user as a node-arc graph


Concept maps’ support for DLs (cont.)
- Browsing and searching assistant
Through our experience with the GetSmart project and our current work on the automatic generation of CMs, we found that CMs as visualization tools have great potential for DL applications.

Future Work
- Improve the quality of automatically created concept maps
- Create a repository of maps
- Provide services over the repository

Thank you