1
Bruxelles, 2006-03-10
Computer Aided Document Indexing System (CADIS) with Eurovoc
  Bojana Dalbelo Bašić, Faculty of Electrical Engineering and Computing, University of Zagreb, bojana.dalbelo@fer.hr
  Marko Tadić, Faculty of Humanities and Social Sciences, University of Zagreb, marko.tadic@ffzg.hr
2
Project AIDE
  idea for a project: September 2004, conference at JRC, Ispra
  interdisciplinary collaboration of 3 institutions:
    Croatian Information Documentation Referral Agency (HIDRA)
    Department of Electronics, Microelectronics, Computer and Intelligent Systems (ZEMRIS), Faculty of Electrical Engineering and Computing, University of Zagreb
    Institute of Linguistics (ZZL), Faculty of Humanities and Social Sciences, University of Zagreb
3
AIDE – collaborating institutions
  HIDRA
    collecting, processing, providing public access to, and promoting the official documentation of the Republic of Croatia
    coordinator: Maja Cvitaš, M.A.
  ZEMRIS
    research in the fields of artificial intelligence, neural networks, machine learning, data and text mining
    coordinators: prof. Bojana Dalbelo Bašić and Jan Šnajder
  ZZL
    computational linguistic research and building language technologies for Croatian
    coordinator: prof. Marko Tadić
4
AIDE – project objective
  Development of an intelligent system for automatic indexing of the official documentation of the Republic of Croatia with descriptors from the Eurovoc thesaurus
5
AIDE – how?
  automatic indexing, how?
    a program which “learns to index”
    Joint Research Centre of the EC (JRC), Ispra, Italy
    at least 10,000 manually indexed documents
    3-5 descriptors per document
    10-15 documents per descriptor
    indexed documents stored in XML format
    Steinberger (2003)
  compiling a corpus of Croatian indexed documents for machine learning of automatic indexing with Eurovoc descriptors
  situation with Croatian documentation in 2004:
    only a few hundred documents were indexed
    manual indexing: painfully slow
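A minimal Python sketch of how the corpus figures on this slide (number of documents, descriptors per document, documents per descriptor) could be checked. It assumes, hypothetically, that the indexed documents have already been parsed from XML into a mapping from document id to descriptor list; the function name `corpus_stats` and the toy data are illustrative and do not come from the presentation.

```python
from collections import Counter
from statistics import mean


def corpus_stats(indexed_docs):
    """Summarise a manually indexed corpus.

    indexed_docs maps a document id to the list of descriptors
    assigned to that document (hypothetical in-memory format).
    """
    # Count, for every descriptor, how many documents carry it.
    docs_per_descriptor = Counter(
        descriptor
        for descriptors in indexed_docs.values()
        for descriptor in set(descriptors)
    )
    return {
        "documents": len(indexed_docs),
        "mean_descriptors_per_document": mean(
            len(set(descriptors)) for descriptors in indexed_docs.values()
        ),
        "min_documents_per_descriptor": min(docs_per_descriptor.values()),
    }


# Toy data, invented for illustration only.
docs = {
    "doc1": ["fisheries", "trade policy", "customs tariff"],
    "doc2": ["fisheries", "environmental protection", "trade policy"],
}
stats = corpus_stats(docs)
print(stats)
```

Comparing these numbers with the figures listed on the slide shows how far a corpus is from being usable for training.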
6
AIDE – how?
  how could we speed up manual indexing?
  plan:
    develop a workstation for computer aided document indexing
    conduct research and development of algorithms in the field of computational linguistics/language technologies
    build that knowledge into the workstation and turn it into the Computer Aided Document Indexing System (CADIS)
7
CADIS: two windows
  Document window
  Eurovoc browser window
8
Document window
10
CADIS features
  Enhanced user interface
  list of descriptors appearing in the document
11
CADIS features
  Descriptors and non-descriptors marked in the document
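A rough Python sketch of what marking thesaurus terms in a document could look like. It is not the CADIS implementation: it uses plain case-insensitive whole-word matching, whereas an inflected language such as Croatian would need the linguistic module mentioned later in the presentation. The function name `find_terms`, the sample text, and the term list are assumptions made for illustration.

```python
import re


def find_terms(text, terms):
    """Return sorted (start, end, term) tuples for every
    case-insensitive whole-word occurrence of a term in text."""
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), term))
    return sorted(hits)


# Sample text and terms, invented for illustration only.
text = "The fisheries agreement changes the customs tariff."
terms = ["fisheries", "customs tariff"]
hits = find_terms(text, terms)

for start, end, term in hits:
    print(term, "->", text[start:end])
```

The returned character offsets are what a user interface would need in order to highlight each term in the document window.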
12
CADIS features
  Lists of n-grams
13
CADIS features
  Integration of corpus analysis
  greyed n-grams are statistically relevant in the corpus
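A small Python sketch of the idea behind n-gram lists combined with corpus analysis. The presentation does not say which statistic CADIS uses to decide that an n-gram is relevant, so a simple corpus frequency threshold stands in for it here; the tokenizer, the function names, the threshold, and the toy corpus are all assumptions.

```python
from collections import Counter


def tokenize(text):
    """Naive whitespace tokenizer (assumption; no lemmatization)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_corpus_ngrams(corpus, n):
    """Count n-gram occurrences over a list of corpus texts."""
    return Counter(g for doc in corpus for g in ngrams(tokenize(doc), n))


def relevant_ngrams(text, corpus_counts, n, min_count):
    """Document n-grams that occur at least min_count times in the corpus.

    The frequency threshold is a stand-in for whatever relevance
    measure the real system applies.
    """
    doc_grams = set(ngrams(tokenize(text), n))
    return sorted(g for g in doc_grams if corpus_counts[g] >= min_count)


# Toy corpus, invented for illustration only.
corpus = [
    "customs tariff applies to imported goods",
    "the customs tariff was amended",
    "imported goods are subject to inspection",
]
counts = count_corpus_ngrams(corpus, n=2)
result = relevant_ngrams(
    "new customs tariff for imported goods", counts, n=2, min_count=2
)
print(result)
```

In a user interface, the n-grams returned by `relevant_ngrams` would be the ones displayed differently from the rest of the list.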
14
CADIS features
  Manual marking of significant n-grams: an important step towards automatic indexing
15
Eurovoc browser window
16
Further development
  CADIS for other languages?
    already available for Croatian and English
    usable for other languages without the linguistic module
    cooperation with the respective language technology experts is needed to develop linguistic modules for other languages
    partners for EU project proposals for the next step
  AIDE
    research on machine learning and text mining
    use that knowledge to turn the workstation into an intelligent system for Automatic Indexing of Documents with Eurovoc
    establish a publicly accessible service for automatic indexing of the official documentation of the Republic of Croatia
17
http://textmining.zemris.fer.hr
18
Conclusion
  CADIS is unique in Europe
  Web info at:
    HIDRA: www.hidra.hr/hidra/aide/aide.htm
    ZEMRIS: textmining.zemris.fer.hr
  for download contact: bojana.dalbelo@fer.hr