Knowledge Organization in Digital Libraries (II) Digital Libraries INFO 653 Week 6 Xia Lin College of Information Science and Technology Drexel University.



Approaches
- Keyword indexing: making search engines functional
- Metadata (bottom-up): extending traditional subject indexing
- Classification (top-down): using a structured classification frame to provide hierarchical browsing and access
- Ontology approach

Keyword Indexing
- A highly automated process.
- Uses every meaningful word to index documents.
- Makes search engines functional.
- Makes large amounts of information accessible.
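A minimal sketch of the keyword-indexing idea: every meaningful word becomes an index entry that points back to the documents containing it. The stop-word list, the tokenizer, and the sample documents are illustrative assumptions, not part of the lecture.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is"}

def tokenize(text):
    """Lowercase the text and keep only words that are not stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]

def build_index(docs):
    """Map each meaningful word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Sample documents (made up for illustration).
docs = {
    "d1": "Metadata for digital libraries",
    "d2": "Classification schemes in libraries",
    "d3": "Digital preservation of metadata",
}
index = build_index(docs)
```

Calling `search(index, "digital metadata")` intersects the posting sets for both words. Nothing here understands meaning: the index only knows which words occur where, which is why the other approaches add structure on top.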

MetaData Approach Digital Object Identifiers Dublin Core Subject tag Description tag RDF Data model Resource

Classification Approach
- Use current classification schemes: LC Classification, Dewey Classification. Most projects are not completed; coverage is a mile wide and an inch deep.
- Use ad hoc classification schemes: Yahoo-style hierarchical lists.
- Use automatic classification.

Ontology Approach Ontologies Define not only concepts but also relationships of concepts. Define both links and types of links.

Ontology An ontology is a specification of a conceptualization. An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. An ontology is a commitment to use the shared vocabulary in a coherent and consistent manner.

[Figure: Work Force Digital Library Ontology. Concept nodes: Cases that worked; Concepts (taxonomy and ontology); Lessons learned; Workforce Programs; Policy and regulation; Documents; Projects; Info Resources; Government; Guides, Handbooks; Document; Organizations; People; Presentations; Events (conferences, workshops, ...); Peter Creticos. Link types: example-of, represents, describes, refers-to, sponsors, uses, is-part-of, initiates, is-related-to, write, includes.]

Why Develop an Ontology? To enable a machine to use the knowledge in some application. To enable multiple machines to share their knowledge. To help yourself understand some area of knowledge better. To help other people understand some area of knowledge. To help people reach a consensus in their understanding of some area of knowledge.

Ontology and Thesaurus
- An ontology inherits the ideas, purposes, and functions of the thesaurus.
- An ontology extends relationships among concepts beyond those in a thesaurus (NT, BT, RT, synonyms).
- An ontology is intended to be consumed by both humans and machines.
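The difference between a thesaurus and an ontology can be sketched as a store of typed links: the same structure holds the fixed thesaurus relationships (BT, NT, RT) and any further link types an ontology defines. The class name and the concept pairings below are hypothetical; only the link-type names (`sponsors`, `example-of`) are borrowed from the workforce ontology figure.

```python
from collections import defaultdict

class ConceptGraph:
    """Concepts connected by typed links: (subject, link_type) -> objects."""

    def __init__(self):
        self.links = defaultdict(set)

    def add(self, subject, link_type, obj):
        self.links[(subject, link_type)].add(obj)

    def related(self, subject, link_type):
        """Concepts reachable from `subject` over one link of the given type."""
        return sorted(self.links.get((subject, link_type), set()))

    def link_types(self):
        """Every link type used in the graph."""
        return sorted({link_type for _, link_type in self.links})

g = ConceptGraph()
# Thesaurus-style relationships (BT = broader, NT = narrower, RT = related).
g.add("Metadata", "BT", "Knowledge organization")
g.add("Knowledge organization", "NT", "Metadata")
g.add("Metadata", "RT", "Cataloging")
# Ontology-style relationships: link types beyond BT/NT/RT.
# These pairings are assumed for illustration.
g.add("Government", "sponsors", "Workforce Programs")
g.add("Guides, Handbooks", "example-of", "Documents")
```

Because the link type is data rather than a fixed field, a program can ask "who sponsors what?" as easily as "what is broader?", which is what makes the structure usable by machines as well as people.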

Topic Maps
- A key component of the Semantic Web
- A new ISO standard: ISO 13250 Topic Maps
- XML-like syntax
- XML Schema
- XTM: XML Topic Maps

XTM Topic Maps
- XML Topic Maps (XTM) defines an abstract model and an XML grammar for topic maps.
- XTM does not define topic maps at the implementation level.
- Each implementation may interpret XTM differently or define its own “metadata” within the framework of XTM.

TAO of Topic Maps (Topics, Associations, Occurrences)
[Figure: element structure inside <topicmap> ... </topicmap>. Elements shown: TOPIC (topname: basename, dispname, sortname); OCCURS; ASSOC (assocrl); facet (fvalue); addthms.]

Topic Maps for Knowledge Representation Establishing an associative network between resources which represent concepts Organizing legacy resources into a new information/knowledge space, by relating them to topics, and associating those topics, in a structured way Enabling disparate sets of information resources to be used together, by interrelating them using a unifying conceptual framework

Topic Map Implementation
Why is topic map implementation hard?
- There are no “magic” solutions for content representation.
- Creating a complete TAO is labor-intensive and involves many manual activities.
- There are no good tools for topic map creation.
- XML is not designed to let end users work directly on objects contained in an XML file.

Topic Maps and Thesaurus
- Different directions of indexing. Thesaurus: assign descriptors to documents. Topic maps: associate occurrences to terms.
- Different structures. Thesaurus: mainly a hierarchy plus some cross-references. Topic maps: more link types.
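The two indexing directions can be sketched with plain data structures. In the thesaurus direction each document carries its descriptors; in the topic-map direction each topic carries its occurrences and typed associations. The class and method names are hypothetical, the descriptors are made up, and the URL is the one used in the student topic map later in these notes.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    occurrences: list = field(default_factory=list)   # (type, url) pairs

@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)        # name -> Topic
    associations: list = field(default_factory=list)  # (topic, link_type, topic)

    def add_occurrence(self, topic_name, occ_type, url):
        """Attach a resource to a topic, creating the topic if needed."""
        topic = self.topics.setdefault(topic_name, Topic(topic_name))
        topic.occurrences.append((occ_type, url))

    def associate(self, a, link_type, b):
        """Record a typed link between two topics."""
        for name in (a, b):
            self.topics.setdefault(name, Topic(name))
        self.associations.append((a, link_type, b))

IFLA_URL = "http://www.ifla.org/II/diglib.htm"

# Thesaurus direction: document -> descriptors (descriptors are illustrative).
thesaurus_index = {IFLA_URL: ["Digital libraries", "Bibliography"]}

# Topic-map direction: topic -> occurrences, plus typed associations.
tm = TopicMap()
tm.add_occurrence("Bibliography", "website", IFLA_URL)
tm.associate("Bibliography", "is-part-of", "General Resources")
```

The resource itself is never modified in the topic-map direction; the map sits outside the documents, which is why legacy resources can be organized without re-indexing them.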

ALL Together – Libraries Keyword indexing Classification Thesaurus Metadata Knowledge Organizing Ontology XML RDF Topic Maps Semantic Web

Personal Research Projects Explore solutions to make knowledge organizing practical Knowledge Class KEPT Knowledge Middleware

Knowledge Class Purposes to customize knowledge organization and access, to supplement and complement existing devices for Web users, and to explore the possibility of combining existing methods of knowledge organization with advanced Web technology.

Knowledge Class Design Principles balance of browsing and searching balance of manual indexing and automatic indexing balance of personal (topical) information space and the whole web space

Knowledge Class Three components an organizing framework a dynamic web interface Search strategies for each term

Knowledge Class Features
- A hierarchical structure of subject terms constructed on classification principles
- Multiple levels of knowledge organization: expandable and contractible branches of the hierarchy allow varying levels of depth
- Static links to remote resources and related sites or pages
- Dynamic links to target information through search engines such as Google, AltaVista, InfoSeek, Yahoo!, and Lycos
- Coded search strategies for terms
- Use of scope terms for classes and for branches

Knowledge Class Features Referral links among terms within a knowledge class and potentially among knowledge classes to assist cross reference. Instant switch among search engines available over the Web to allow access of a variety of resources covered by different search engines.

A Knowledge Class for Digital Libraries Developed by students two years ago

Yahoo Categories: References – Libraries – Digital Libraries: Cataloging Electronic Resources@ Conferences (5) Electronic Literature@ Electronic Theses and Dissertations (ETDs) (14) Metadata@ Organizations (2) Projects and Collections (33)

IFLA page: Resources and Projects Cataloguing & Indexing of Electronic Resources Electronic Text & Journal Archives Metadata Resources

Digital Libraries: a Selected Resource Guide Overview and general resources Project planning & management Architecture Technology Standards and guidelines Archiving & Preservation Metadata Intellectual property rights.

Northern Light folders Digital Libraries Special collections Conferences dlib.org dlib.org.ar uh.edu rutgers.edu stanford.edu stfx.ca vt.edu uni-trier.de ucla.edu Class notes & Assignments all others...

Digital libraries by William Y. Arms: Table of Contents 1 Libraries, Technology, and People 2 The Internet and the World Wide Web 3 Libraries and Publishers 4 Innovation and Research 5 People, Organizations, and Change 6 Economic and Legal Issues 7 Access Management and Security 8 User Interfaces and Usability 9 Text 10 Information Retrieval and Descriptive Metadata 11 Distributed Information Discovery 12 Object Models, Identifiers, and Structural Metadata 13 Repositories and Archives 14 Digital Libraries and Electronic Publishing Today

Practical Digital Libraries: Books, Bytes, and Bucks by Michael Lesk: Table of Contents 1 Evolution of Libraries 2 Text Access Methods 3 Images of Pages 4 Multimedia Storage and Access 5 Knowledge Representation Methods 6 Distribution 7 Usability and Retrieval Evaluation 8 Collections and Preservation 9 Economics 10 Intellectual Property Rights 11 International Activities 12 Future: Ubiquity, Diversity, Creativity, and Public Policy

How Do I Build a Thesaurus?
- Use existing dictionaries and thesauri to decide on the terms and their relationships.
- Collect a set of representative documents and try to index them; take the set of indexing terms as your preliminary list.
- Review and organize the preliminary term set: decide on preferred terms and make USE references from the variants and synonyms; build hierarchical and associative relationships among the preferred terms.
- Produce a draft list, test, and revise.
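The review step can be sketched as a small data structure: variants and synonyms point to a preferred term through USE references, and preferred terms carry hierarchical (BT) and associative (RT) relationships. The class is hypothetical, and the sample terms are illustrative (the investment terms echo the data-format example in these notes; "Investments" and "Stock markets" are made up).

```python
class Thesaurus:
    def __init__(self):
        self.use = {}      # lowercased variant -> preferred term (USE reference)
        self.broader = {}  # preferred term -> broader term (BT)
        self.related = {}  # preferred term -> set of related terms (RT)

    def add_preferred(self, term, variants=(), broader=None):
        """Register a preferred term, its variants, and an optional broader term."""
        self.use[term.lower()] = term
        for variant in variants:
            self.use[variant.lower()] = term
        if broader is not None:
            self.broader[term] = broader
        self.related.setdefault(term, set())

    def add_related(self, a, b):
        """Associative relationships are symmetric, so store both directions."""
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def preferred(self, term):
        """Follow a USE reference; returns None if the term is unknown."""
        return self.use.get(term.lower())

t = Thesaurus()
t.add_preferred("Mutual funds",
                variants=["Investment trusts", "Unit trusts"],
                broader="Investments")
t.add_related("Mutual funds", "Stock markets")
```

An indexer or a query front end would call `t.preferred(...)` on whatever the user typed, so that "unit trusts" and "Mutual funds" land on the same descriptor.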

Scope Terms
- Each knowledge class can have one scope term to limit the search scope. Example: in the knowledge class Digital Libraries, "Technology" will be searched as "technologies AND “digital libraries”".
- Each branch of a knowledge class can have one scope term. Example: "Issues" in the Technology branch will be searched as "Issues AND Technology AND digital libraries".

Data Format (first year)
Example record:
  --, mutual funds, mutual-funds Investment-trusts Unit-trusts, http://www.brill.com, 1
Fields:
  1. Hierarchical level
  2. Display term
  3. Search term (synonyms)
  4. URL
  5. Search strategy code
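A record in this five-field format can be read with a few lines of code. The sketch below assumes that fields are separated by commas, that the hierarchical level is encoded as the number of dashes, and that synonyms in the search-term field are separated by spaces; none of these details is stated in the notes, and a URL containing a comma would break the simple split.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    level: int          # hierarchical level (assumed: number of dashes)
    display_term: str   # term shown to the user
    search_terms: list  # synonyms sent to the search engine
    url: str            # static link for the term
    strategy: int       # search strategy code

def parse_entry(line):
    """Split one comma-separated record into its five fields."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    level, display, search, url, code = fields
    return Entry(level=level.count("-"),
                 display_term=display,
                 search_terms=search.split(),
                 url=url,
                 strategy=int(code))

entry = parse_entry(
    "--, mutual funds, mutual-funds Investment-trusts Unit-trusts, "
    "http://www.brill.com, 1"
)
```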

Second year: last year’s student project (fragment)
<topicmap title="Digital Libraries">
  <topic id="General Resources" type="Main category">
    <topic id="Bibliography">
      <topname>
        <basename>Bibliography</basename>
        <dispname>Bibliography</dispname>
        <sortname></sortname>
      </topname>
      <occurs> </occurs>
      <topic id="IFLA bibliography" type="reference">
        <basename>IFLA bibliography</basename>
        <dispname>IFLA bibliography</dispname>
        <occurs type="website" href="http://www.ifla.org/II/diglib.htm"> </occurs>
      </topic>

Third year: Visual Editing

Search Strategy
Keyword search:
  0   search term + branch scope term + class scope term
  1   search term + class scope term
  2   search term only
Phrase search:
  3   search term (as a phrase) + branch scope term + class scope term
  4   search term (as a phrase) + class scope term
  5   search term (as a phrase)
Hierarchical search:
  6   search term + all its children + branch scope term + class scope term
  7   search term + all its children + class scope term
  8   search term + all its children
No search:
  9   no search; no link for this display term (label only)
Search terms + display term:
  10  same as 0, except the display term is also added to the query
  11  same as 1, except the display term is also added to the query
  12  … …
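The strategy codes follow a regular pattern, so a query builder for them can be sketched compactly. The function name, the AND/OR syntax, and the use of double quotes for phrases are assumptions (search engines differ), and the sketch combines a term with its children using OR, which the notes do not specify. Codes 12 and above are not defined in the notes and are rejected.

```python
def build_query(code, term, display="", children=(),
                branch_scope="", class_scope=""):
    """Build a query string for strategy codes 0-11; code 9 means no search."""
    if code == 9:
        return None                    # label only, no link
    if not 0 <= code <= 11:
        raise ValueError("strategy codes above 11 are not defined in the notes")
    extra = []
    if code >= 10:                     # 10 and 11 mirror 0 and 1, plus display term
        extra.append(display)
        code -= 10
    # group: 0 keyword, 1 phrase, 2 hierarchical
    # scope: 0 branch + class, 1 class only, 2 no scope terms
    group, scope = divmod(code, 3)
    if group == 1:
        head = f'"{term}"'
    elif group == 2:
        head = "(" + " OR ".join([term, *children]) + ")"
    else:
        head = term
    parts = [head] + extra
    if scope == 0:
        parts.append(branch_scope)
    if scope <= 1:
        parts.append(class_scope)
    # Empty scope or display terms are simply left out of the query.
    return " AND ".join(p for p in parts if p)
```

For the scope-term example in these notes, `build_query(0, "Issues", branch_scope="Technology", class_scope='"digital libraries"')` yields `Issues AND Technology AND "digital libraries"`.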

Digital Libraries General Resources Technology Projects Indexing & Cataloging Knowledge representation Metadata Resources Collections and Repositories Digital Preservation Economic and legal issues Intellectual Property Rights People and organizations

Next Version Convert to XML Use topic map standards Improve the editing tool

Next Integration: KEPT, the Knowledge-Enabled Personalization Tool
[Figure: KEPT architecture. Labels shown: RDF-ISO Standards; OAI protocol; Knowledge Repository; Topic Map Editor; Information Resources; Drag and drop; Relational Database; Thesauri, Ontologies, Topic maps, ...; Hierarchical Generator; Co-occurrence Mapping; Web Browser; Schema; XML; XSLT; Searching/Browsing Interface; Search engines; XML Application Server; HTTP Server.]

New Interface
[Screenshot: search interface showing results for "Recycling".]
- Search: Recycling. Primary source options: TopicMap, ERIC Thesaurus, ERIC Database. Secondary source: MeSH.
- Related Terms: Conservation (Environment); Depleted Resources; Ecology; Natural Resources; Pollution; Recycling; Solid Wastes; Waste Disposal; Waste Water; Wastes; Water Treatment
- Broader Terms: Sanitation
- Co-occurrence Terms: Environmental Education; Waste Disposal; Conservation (Environment); Science Education; Natural Resources; Solid Wastes; Ecology; Pollution; Learning Activities; Higher Education; Wastes; Instructional Materials; Conservation Education; Energy; Environment
- MeSH terms matched “Pollution”: Air Pollution; Air Pollution, Indoor; Indoor Air Pollution; Air Pollution, Radioactive; Environmental Pollution; Pollution, Environmental; Tobacco Smoke Pollution; Air Pollution, Tobacco Smoke; Environmental Pollution, Tobacco Smoke; Environmental Smoke Pollution, Tobacco; Environmental Tobacco Smoke Pollution; Water Pollution; Thermal Water Pollution; Water Pollution, Thermal; Water Pollution, Chemical; Chemical Water Pollution; Water Pollution, Radioactive
- Other terms displayed: Recycling Ecology Wastes Waste Water Waste disposal Pollution Air pollution Water pollution Indoor pollution Energy Natural Resources Water Power Conservation Education Attitudes Motivations ……

Next Level: Building a Knowledge Middleware

The Knowledge Middleware
- A centralized repository that integrates diverse knowledge structures
- A set of mapping tools and protocols for crosswalks among various thesauri
- A dynamic knowledge base for semantic neighborhoods that uses term occurrences and co-occurrences
- A web-based authoring and editing tool for building personalized topic maps from existing knowledge structures in the repository
- A visual search interface for content-based searching with the help of knowledge structures in the repository
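One simple way to derive a semantic neighborhood from co-occurrences is to count how often two index terms are assigned to the same record and rank a term's neighbors by that count. The notes do not say how the middleware computes neighborhoods, so this is only a sketch under that assumption; the sample records are made up, with terms borrowed from the "Recycling" interface example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records):
    """Count how often each pair of index terms appears on the same record."""
    pairs = Counter()
    for terms in records:
        # Sort so that each unordered pair is always stored the same way.
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs

def neighborhood(pairs, term, top=3):
    """Terms co-occurring with `term`, most frequent first (ties alphabetical)."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == term:
            scores[b] += count
        elif b == term:
            scores[a] += count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top]]

# Hypothetical database records, each a list of assigned index terms.
records = [
    ["Recycling", "Waste Disposal", "Environmental Education"],
    ["Recycling", "Waste Disposal", "Pollution"],
    ["Recycling", "Environmental Education"],
    ["Pollution", "Ecology"],
]
pairs = cooccurrence(records)
```

Raw counts favor very common terms; a production system would likely normalize them, but the ranking idea stays the same.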

A semantic map for “Digital Libraries” in INSPEC database

Conclusions
- Knowledge organizing is one of the major challenges of digital libraries.
- There is increasing demand for formalized (marked-up) knowledge.
- There is a growing number of tools and specifications for subject access (or knowledge access) to the Web and to digital libraries.
