Bruxelles, 2006-03-10 Computer Aided Document Indexing System (CADIS) with Eurovoc Bojana Dalbelo Bašić Faculty of Electrical Engineering and Computing.

Presentation transcript:

Computer Aided Document Indexing System (CADIS) with Eurovoc
Bojana Dalbelo Bašić, Faculty of Electrical Engineering and Computing, University of Zagreb
Marko Tadić, Faculty of Humanities and Social Sciences, University of Zagreb

Project AIDE
- idea for a project
  - September 2004, conference at JRC, Ispra
- interdisciplinary collaboration of 3 institutions
  - Croatian Information Documentation Referral Agency (HIDRA)
  - Department of Electronics, Microelectronics, Computer and Intelligent Systems (ZEMRIS), Faculty of Electrical Engineering and Computing, University of Zagreb
  - Institute of Linguistics (ZZL), Faculty of Humanities and Social Sciences, University of Zagreb

AIDE – collaborating institutions
- HIDRA
  - collecting, processing, providing public access to, and promoting the official documentation of the Republic of Croatia
  - coordinator: Maja Cvitaš, M.A.
- ZEMRIS
  - research in the field of artificial intelligence, neural networks, machine learning, data and text mining
  - coordinators: prof. Bojana Dalbelo Bašić and Jan Šnajder
- ZZL
  - computational linguistic research and building language technologies for Croatian
  - coordinator: prof. Marko Tadić

AIDE – project objective
Development of an intelligent system for automatic indexing of the official documentation of the Republic of Croatia with descriptors from the Eurovoc thesaurus

AIDE – how?
- automatic indexing, how?
  - program which “learns to index”
  - Joint Research Centre of the EC (JRC), Ispra, Italy
  - at least 10,000 manually indexed documents
  - 3-5 descriptors per document
  - documents per descriptor
  - indexed documents stored in XML format
  - Steinberger (2003)
- compiling a corpus of Croatian indexed documents for machine learning of automatic indexing with Eurovoc descriptors
- situation with Croatian documentation in
  - there were only a few hundred documents indexed
  - manual indexing: painfully slow
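The corpus requirements listed on this slide (number of manually indexed documents, descriptors per document, documents per descriptor) can be checked with a few lines of code. The sketch below is only an illustration: the slide says the indexed documents are stored in XML but does not show the schema, so the `corpus`, `document` and `descriptor` element names, the sample data, and the `corpus_stats` helper are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XML layout; the real schema is not given in the slides.
SAMPLE = """<corpus>
  <document id="d1">
    <descriptor>fishing industry</descriptor>
    <descriptor>customs tariff</descriptor>
    <descriptor>import</descriptor>
  </document>
  <document id="d2">
    <descriptor>customs tariff</descriptor>
    <descriptor>import</descriptor>
    <descriptor>export</descriptor>
    <descriptor>trade policy</descriptor>
  </document>
</corpus>"""


def corpus_stats(xml_text):
    """Return (number of documents, descriptors per document, documents per descriptor)."""
    root = ET.fromstring(xml_text)
    # Count the descriptors attached to each indexed document.
    per_doc = [len(doc.findall("descriptor")) for doc in root.iter("document")]
    # Count how many times each descriptor was assigned across the corpus.
    per_descriptor = Counter(d.text for d in root.iter("descriptor"))
    return len(per_doc), per_doc, per_descriptor


n_docs, per_doc, per_descriptor = corpus_stats(SAMPLE)
print(n_docs, per_doc, per_descriptor.most_common(2))
```

Run over a real corpus, the same counts would show how far a collection is from the thresholds named on the slide.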

AIDE – how?
- how could we speed up the manual indexing?
- plan:
  - develop a workstation for computer aided document indexing
  - conduct research and development of algorithms in the field of computational linguistics/language technologies
  - build that knowledge into the workstation and turn it into the Computer Aided Document Indexing System (CADIS)

CADIS: two windows
- Document window
- Eurovoc browser window

Document window

CADIS features
- Enhanced user interface
  - list of descriptors appearing in document

CADIS features
- Descriptors and non-descriptors marked in document
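To make the idea of marking descriptors and non-descriptors concrete, here is a minimal sketch. It is not the CADIS implementation: the two small term tables and the `mark_terms` helper are hypothetical, and matching is plain case-insensitive string search, without the linguistic module (for example morphological normalization) that an inflected language such as Croatian would need.

```python
import re

# Hypothetical miniature thesaurus, for illustration only.
# Descriptors are preferred terms; non-descriptors map to the
# descriptor that should be used in their place.
DESCRIPTORS = {"fishing industry", "customs tariff"}
NON_DESCRIPTORS = {"fishing sector": "fishing industry"}


def mark_terms(text):
    """Return sorted (start, end, kind, descriptor) spans for terms found in text."""
    spans = []
    for term in DESCRIPTORS:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), "descriptor", term))
    for term, preferred in NON_DESCRIPTORS.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), "non-descriptor", preferred))
    return sorted(spans)


text = "The fishing sector depends on the customs tariff."
for start, end, kind, descriptor in mark_terms(text):
    print(text[start:end], kind, descriptor)
```

Each span carries the descriptor it resolves to, so a user interface could highlight the match and offer the preferred term in one step.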

CADIS features
- Lists of n-grams
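A list of n-grams for a document can be produced roughly as follows. This is a sketch under simple assumptions: tokens are runs of word characters, text is lowercased, and the `ngram_counts` helper is a name introduced here, not part of CADIS.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count word n-grams in text using a naive lowercase word tokenizer."""
    tokens = re.findall(r"\w+", text.lower())
    # Zip n staggered copies of the token list to get consecutive n-tuples.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


counts = ngram_counts("official documentation of the official documentation", 2)
print(counts.most_common(3))
```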

CADIS features
- Integration of corpus analysis
  - greyed n-grams are statistically relevant in the corpus
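The slide does not say which statistic decides that an n-gram is relevant in the corpus. As one possible stand-in, the sketch below scores n-grams with a smoothed TF-IDF weight, so that n-grams frequent in the document but rare across the corpus rank highest. The `tfidf_scores` helper, its inputs and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(doc_ngrams, corpus_ngram_sets):
    """Score each n-gram of one document against a corpus.

    doc_ngrams: Counter of n-gram frequencies in the document.
    corpus_ngram_sets: list of sets, one per corpus document.
    """
    n_docs = len(corpus_ngram_sets)
    scores = {}
    for gram, tf in doc_ngrams.items():
        # Document frequency: in how many corpus documents the n-gram occurs.
        df = sum(1 for grams in corpus_ngram_sets if gram in grams)
        # Smoothed inverse document frequency (assumed variant).
        scores[gram] = tf * math.log((1 + n_docs) / (1 + df))
    return scores


corpus = [{"of the", "customs tariff"}, {"of the"}, {"of the"}]
doc = Counter({"of the": 2, "customs tariff": 2})
scores = tfidf_scores(doc, corpus)
print(sorted(scores, key=scores.get, reverse=True))
```

With this weighting, an n-gram that appears in every corpus document scores zero, which matches the intuition that it says nothing specific about the document at hand.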

CADIS features
- Manual marking of significant n-grams: an important step towards automatic indexing

Eurovoc browser window

Further development
- CADIS for other languages?
  - already available for Croatian and English
  - usable for other languages without the linguistic module
  - cooperation needed with the respective language technology experts to develop a linguistic module for other languages
  - partners for EU project proposals for the next step
- AIDE
  - research on machine learning and text mining
  - use that knowledge to turn the workstation into an intelligent system for Automatic Indexing of Documents with Eurovoc
  - establishing a publicly accessible service for automatic indexing of the official documentation of the Republic of Croatia


Conclusion
- CADIS is unique in Europe
- Web info at:
  - HIDRA:
  - ZEMRIS: textmining.zemris.fer.hr
- for download contact:
