Union Catalog and Knowledge Engineering for TELDAP Keh-Jiann Chen Principal Investigator Core Platforms for Digital Contents Project, TELDAP Research Fellow.

Union Catalog and Knowledge Engineering for TELDAP Keh-Jiann Chen Principal Investigator Core Platforms for Digital Contents Project, TELDAP Research Fellow Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica

Outline: Introduction; Union catalog; Databases and metadata for digital contents and websites; Knowledge engineering; Future perspective

Introduction: The integration and management of digital content have become important issues as the amount of content produced by different projects and institutions grows rapidly. The goal of our project is to optimize the preservation, retrieval, and presentation of digital collections.

What is the union catalog? It is a catalog and portal for all digital collections of TELDAP. It is an integrated platform for browsing and searching all of TELDAP's digital content. Metadata provides the core description and licensing information of each digital collection.

Figure: Home page of the Union Catalog (browsing by topics; search by keywords)

Some improved functions for IR: keyword suggestion; keyword extension; recommendation of related collections

Keyword suggestion

Keyword extension

Figure callouts: Digital image; Recommendation of related collections; Hyperlink to database; Metadata; Citation; Social networking service; Licensing information

Metadata models for different types of objects
Archived digital items: Union catalog metadata model (Dublin Core+)
Web sites: DCCAP (Dublin Core Collections Application Profile); fields for internal use only: Unique Identifier, Format, Evaluation, Cataloging History
Documents: Document metadata (Dublin Core)

Metadata for digital items (over 4 million digital items and still increasing)
Element: Definition
Title: A name given to the resource
Creator: An entity primarily responsible for making the content of the resource
Subject and Keywords: The topic of the content of the resource
Description: An account of the content of the resource
Publisher: An entity responsible for making the resource available
Contributor: An entity responsible for making contributions to the content of the resource
Date: A date associated with an event in the life cycle of the resource
Resource Type: The nature or genre of the content of the resource
Format: The physical or digital manifestation of the resource
Resource Identifier: An unambiguous reference to the resource within a given context
Source: A reference to a resource from which the present resource is derived
Language: A language of the intellectual content of the resource
Relation: A reference to a related resource
Coverage: The extent or scope of the content of the resource
Rights Management: Information about rights held in and over the resource
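As a rough illustration of how a record built on the fifteen elements above might be checked, here is a minimal sketch. The function name `unknown_elements` and every value in `sample` are invented for the example; the slides do not describe TELDAP's actual record format or validation.

```python
# The 15 element names listed on the slide.
DUBLIN_CORE_ELEMENTS = (
    "Title", "Creator", "Subject and Keywords", "Description", "Publisher",
    "Contributor", "Date", "Resource Type", "Format", "Resource Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights Management",
)


def unknown_elements(record):
    """Return the keys of `record` that are not Dublin Core element names."""
    return sorted(key for key in record if key not in DUBLIN_CORE_ELEMENTS)


# Hypothetical record: all values are made up for illustration.
sample = {
    "Title": "Landscape scroll",
    "Creator": "Unknown",
    "Resource Type": "Image",
    "Resource Identifier": "example-0001",
    "Colour": "ink on silk",  # deliberately not a Dublin Core element
}

print(unknown_elements(sample))
```

A check of this kind is one simple way to keep contributed records aligned with a shared element set before they are merged into a union catalog.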

Metadata for websites: over 690 websites and still increasing. Metadata – DCCAP (Dublin Core Collections Application Profile) – Combines the standard with our requirements: 19 data fields

The Website Homepage Picture URL, Project Information Type, Name, Author, Subject, Description, Language, Item Type, Target Archived Information: URL, time, authorization Copyright, Purpose, Other Information Figure: Social networking service

Uses of Metadata: (1) search collections by matching keywords and features; (2) provide basic information on each collection; (3) dynamic categorization; (4) provide information for computing the similarity or relatedness of two collections; (5) extract keywords

(1) Chinese Keyword Search
Data flow: Keyword + (Features) → Synonyms, hyponyms → Matched Collections → Collections + Weights → Display Results
Processing steps: Keyword Extension, Keyword Matching, Ranking, Filtering
Resources: AAT-Taiwan & TELDAP Thesaurus, Keyword Dictionary

English Keyword Search
Data flow: English Keyword + (Features) → Translations, Synonyms, Hyponyms → Matched Collections → Collections + Weights → Display Results
Processing steps: Keyword Translation & Extension, Keyword Matching, Ranking, Filtering
Resources: AAT-Taiwan & TELDAP Thesaurus, Keyword Dictionary
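The extension and matching steps of the search flow can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy `THESAURUS` and `COLLECTIONS` data, and the function names `extend_keyword` and `match_collections`, are hypothetical and do not come from the TELDAP system. The translation step of the English search is left out.

```python
# Hypothetical thesaurus: keyword -> synonyms and hyponyms.
THESAURUS = {
    "pottery": {"synonyms": ["ceramics"], "hyponyms": ["porcelain", "stoneware"]},
}

# Hypothetical collections: collection id -> set of indexed keywords.
COLLECTIONS = {
    "c1": {"ceramics", "kiln"},
    "c2": {"painting"},
    "c3": {"porcelain"},
}


def extend_keyword(keyword, thesaurus):
    """Return the keyword followed by its synonyms and hyponyms, without duplicates."""
    entry = thesaurus.get(keyword, {})
    candidates = [keyword] + entry.get("synonyms", []) + entry.get("hyponyms", [])
    extended = []
    for term in candidates:
        if term not in extended:
            extended.append(term)
    return extended


def match_collections(terms, collections):
    """Return ids of collections that share at least one keyword with `terms`."""
    wanted = set(terms)
    return sorted(cid for cid, keywords in collections.items() if keywords & wanted)


terms = extend_keyword("pottery", THESAURUS)
print(terms)
print(match_collections(terms, COLLECTIONS))
```

Without the extension step, a query for "pottery" would match none of the three toy collections, which is the motivation for expanding the query before matching.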

Ranking Algorithm
Rank Value(item) = W1 * Association(Keyword, item) + W2 * Quality(item)
– Association(Keyword, item) = W1 * Topical Similarity(Topic(keyword), Topic(item)) + W2 * Importance of relation(Keyword, item)
– Quality(item) = W1 * Image quality(item) + W2 * Qualification of provider(item) + W3 * Metadata(item)
Topical Similarity(Topic(keyword), Topic(item)) = Ontology Distance(Topic(keyword), Topic(item))
Importance of relation(Keyword, item) = W1 * Keyword-from Value + W2 * Mutual Information(keyword, Topic(item))
Keyword-from Value = 1 if keyword is contained in title(item); 0.5 if keyword is contained in description(item)
Mutual Information(keyword, Topic(item)) = P(Keyword, Topic(item)) / {P(Keyword) * P(Topic(item))}
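The ranking formula can be traced with a small numerical sketch. Several things here are assumptions rather than part of the slides: all weight values are invented, the Keyword-from Value is taken to be 0 when the keyword appears in neither title nor description, and the topical similarity is passed in as a number because the ontology distance computation is not specified.

```python
def keyword_from_value(keyword, title, description):
    """1 if the keyword is in the title, 0.5 if it is in the description."""
    if keyword in title:
        return 1.0
    if keyword in description:
        return 0.5
    return 0.0  # assumption: the slide gives no value for this case


def mutual_information(p_joint, p_keyword, p_topic):
    """P(keyword, topic) / (P(keyword) * P(topic)), as written on the slide."""
    return p_joint / (p_keyword * p_topic)


def importance_of_relation(kf_value, mi, weights=(0.5, 0.5)):
    return weights[0] * kf_value + weights[1] * mi


def association(topical_similarity, importance, weights=(0.5, 0.5)):
    return weights[0] * topical_similarity + weights[1] * importance


def quality(image_quality, provider_qualification, metadata_score,
            weights=(0.5, 0.25, 0.25)):
    return (weights[0] * image_quality
            + weights[1] * provider_qualification
            + weights[2] * metadata_score)


def rank_value(assoc, qual, weights=(0.75, 0.25)):
    return weights[0] * assoc + weights[1] * qual


# Hypothetical item and probabilities, chosen only to exercise the formulas.
kf = keyword_from_value("jade", title="jade cabbage", description="a carving")
mi = mutual_information(p_joint=0.125, p_keyword=0.25, p_topic=0.25)
assoc = association(topical_similarity=0.5,
                    importance=importance_of_relation(kf, mi))
qual = quality(image_quality=0.8, provider_qualification=0.4, metadata_score=0.4)
print(rank_value(assoc, qual))
```

Each level of the formula has its own set of weights, so the sketch passes them as separate parameters instead of sharing one W1 and W2 across levels.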

Algorithm for Recommending Related Collections
i-th Item Vector = {Topic, Institute, Keyword1, Keyword2, …}
Similar(i-th item, j-th item) = W1 * Topic Similar(i-th item, j-th item) + W2 * Institute Similar(i-th item, j-th item) + Weight(Keyword1) * Delta(Keyword1) + Weight(Keyword2) * Delta(Keyword2) + …
where Delta(Keyword1) = 1 if Keyword1 of the i-th item is also a keyword of the j-th item; otherwise 0
Recommendation = Similar(i-th item, j-th item) + Evaluation(j-th item)
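A minimal sketch of the recommendation score follows. It assumes that Topic Similar and Institute Similar are simple exact-match indicators (1 if equal, otherwise 0), which the slides do not state. The `Item` class, the weight values, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    topic: str
    institute: str
    keywords: frozenset
    evaluation: float = 0.0


def similar(item_i, item_j, keyword_weight, w_topic=1.0, w_institute=0.5):
    """Weighted topic match + institute match + weights of shared keywords."""
    # Assumption: topic and institute similarity are exact-match indicators.
    score = w_topic * (item_i.topic == item_j.topic)
    score += w_institute * (item_i.institute == item_j.institute)
    for keyword in item_i.keywords:
        if keyword in item_j.keywords:  # Delta(keyword) = 1
            score += keyword_weight.get(keyword, 0.0)
    return score


def recommendation(item_i, item_j, keyword_weight):
    """Similar(i, j) plus the evaluation score of the candidate item j."""
    return similar(item_i, item_j, keyword_weight) + item_j.evaluation


# Hypothetical items and keyword weights.
weights = {"ink": 0.5, "landscape": 0.75}
a = Item("painting", "Museum A", frozenset({"ink", "landscape"}))
b = Item("painting", "Museum B", frozenset({"landscape", "scroll"}), evaluation=0.25)

print(similar(a, b, weights))
print(recommendation(a, b, weights))
```

To produce a list of related collections, the same score would be computed for every candidate j and the candidates sorted in descending order.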

(2) Dynamic categorization
User-oriented categorization: general public, elementary school students, high school students, researchers, etc.
Topic-based categorization: archaeology, painting, animal, plant, document, etc.
Function-based categorization: research, education, business, technology, etc.
Categorization based on institutions: Academia Sinica, Taiwan U., Palace Museum, etc.
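One way to read "dynamic" here is that the same collections are regrouped on demand by whichever metadata facet the user picks. The sketch below illustrates that reading only. The facet names, the `categorize` function, and the sample records are assumptions and are not taken from the TELDAP implementation.

```python
from collections import defaultdict


def categorize(collections, facet):
    """Group collection ids by the values of one metadata facet."""
    groups = defaultdict(list)
    for collection in collections:
        for value in collection.get(facet, []):
            groups[value].append(collection["id"])
    return dict(groups)


# Hypothetical collection records with four facets.
COLLECTIONS = [
    {"id": "c1", "audience": ["researchers"], "topic": ["archaeology"],
     "function": ["research"], "institution": ["Academia Sinica"]},
    {"id": "c2", "audience": ["elementary school students", "general"],
     "topic": ["animal"], "function": ["education"],
     "institution": ["Academia Sinica"]},
    {"id": "c3", "audience": ["general"], "topic": ["painting"],
     "function": ["education", "business"], "institution": ["Palace Museum"]},
]

print(categorize(COLLECTIONS, "institution"))
print(categorize(COLLECTIONS, "function"))
```

Because the grouping is computed from metadata at request time, adding a new category scheme only requires a new facet in the records, not a new hand-built directory.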

(3) Multiple uses of the core IR system and databases
TELDAP: whole collections; searched by institutes, domains, and media types (documents, images, videos, and web sites); monolingual
Digital Shop: whole collections or only fine arts; general search and search by licensing types; relies on the multilingual thesaurus
Taiwan Academy: fine arts; searched by institutes and domains; multilingual; relies on the multilingual thesaurus

Figure: Digitalarchives.tw

Purpose: Education Target: Elementary school student, Junior high school student, Teacher… Purpose: Creative applications Purpose: Academic research Subject: Animal, Archaeology, Anthropology… Digitalarchives.tw

Figure: Taiwan Academy

Categorization based on institutions Topical-based categorization Taiwan Academy

Plans for building knowledge structures for TELDAP: Construct metadata models for different objects. Establish hyperlinks between contexts and objects. Develop keyword extraction tools. Design automatic tagging tools. Construct the TELDAP ontology and thesaurus. Resources: Art & Architecture Thesaurus by Getty; Chinese WordNet

(1) Metadata models for different objects
Digital collections – Union catalog metadata model (Dublin Core+)
Web sites – DCCAP (Dublin Core Collections Application Profile) – Public fields – Private fields: Unique Identifier, Format, Evaluation, Cataloging History
Documents – Document metadata (Dublin Core)

(2) Create keyword dictionary: extract from metadata; collect from Google search terms; collect through social tagging; collect manually while tagging hyperlinks

Lexical Entry of Keyword Dictionary: Keyword id; Keyword; Synset id; Hypernym id; Hyponym id; Features; Related Collections + Association Strengths
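The lexical entry could be modelled as a small record type, as in the sketch below. The field types are assumptions, since the slides list only the field names: in particular, the hyponym field is modelled as a list of ids and the related collections as a mapping from collection id to association strength. The sample values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LexicalEntry:
    keyword_id: int
    keyword: str
    synset_id: Optional[int] = None
    hypernym_id: Optional[int] = None
    hyponym_ids: List[int] = field(default_factory=list)
    features: Dict[str, str] = field(default_factory=dict)
    # collection id -> association strength
    related_collections: Dict[str, float] = field(default_factory=dict)

    def top_collections(self, n):
        """Return the ids of the n most strongly associated collections."""
        ranked = sorted(self.related_collections.items(),
                        key=lambda pair: pair[1], reverse=True)
        return [collection_id for collection_id, _ in ranked[:n]]


# Hypothetical entry: ids, features, and strengths are made up.
entry = LexicalEntry(
    keyword_id=1,
    keyword="porcelain",
    synset_id=10,
    hypernym_id=2,
    features={"domain": "fine arts"},
    related_collections={"c1": 0.4, "c2": 0.9, "c3": 0.7},
)

print(entry.top_collections(2))
```

Storing association strengths with the entry lets the search step go from a matched keyword directly to weighted candidate collections.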

(2) Establish hyperlinks between contents and objects Identify keywords in contents. Tag keywords with related object hyperlinks.
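The two steps (identify keywords, then tag them with hyperlinks) can be sketched as follows. The sketch uses greedy longest-match lookup against a keyword-to-URL table in place of a real word segmentation system, so it is a simplification and not the approach described in the slides. The function `tag_hyperlinks`, the keywords, and the `example.org` URLs are hypothetical.

```python
def tag_hyperlinks(text, links):
    """Wrap each dictionary keyword found in `text` in an HTML hyperlink.

    Longer keywords are tried first, so "jade cabbage" wins over "jade".
    """
    keywords = sorted((k for k in links if k), key=len, reverse=True)
    pieces = []
    position = 0
    while position < len(text):
        for keyword in keywords:
            if text.startswith(keyword, position):
                pieces.append('<a href="{}">{}</a>'.format(links[keyword], keyword))
                position += len(keyword)
                break
        else:
            # No keyword starts here: copy one character and move on.
            pieces.append(text[position])
            position += 1
    return "".join(pieces)


# Hypothetical keyword -> collection URL table.
LINKS = {
    "jade": "https://example.org/collections/1",
    "jade cabbage": "https://example.org/collections/2",
}

print(tag_hyperlinks("The jade cabbage is famous.", LINKS))
```

Greedy matching cannot resolve segmentation ambiguities in running Chinese text, which is why a dedicated word segmentation step is needed in practice.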

Develop hyperlink tagging tools Word segmentation tools – Resolve word segmentation ambiguities and identify keywords. – CKIP word segmentation system:

Develop hyperlink tagging tools: TELDAP keyword dictionary – Extract keywords from metadata and establish object-keyword relations. Extract text from the XML data for each object. The text is classified by topic, title, description, author, location, era, etc. From each class of text file, extract keywords by automatic word segmentation, keyword extraction, and manual post-editing. – The current dictionary contains more than 120,000 keywords.

Prototype system for hyperlink tagger: Identify and select keywords from the input text

Prototype system for hyperlink tagger Produce text with hyperlinks

Prototype system for hyperlink tagger Hyperlinks point to the related digital collections

(3) Construct TELDAP ontology and thesaurus Establish association links between Chinese keywords and Getty AAT. Merge TELDAP keywords with Chinese AAT.

AAT Browsing trees of Taiwan Academy

AAT subject search of Taiwan Academy

Recommendation of related items

Future Perspective: Technology development – Construct multilingual thesauri (Getty AAT). – Maintain the TELDAP keyword-and-object relation database. – Construct name authority files, gazetteers, and universal calendars. – Design hyperlink taggers and keyword extension tools. – Design an authoring tool that automatically provides hyperlinks to keyword-related digital contents. – Design a knowledge-based content retrieval system.

Future Perspective: Content enrichment – Within TELDAP: Standardize the object metadata model and data format. Provide object metadata in a controlled vocabulary. Write scripts and stories for different topics with a Wiki-like knowledge structure. Enrich the digital collections. Establish hyperlinks between textbooks and TELDAP collections. – Extend the knowledge sources, e.g. Wikipedia