1 ISPRA 2004
Automatic Eurovoc indexing: an experiment in the Czech Parliament
Anna Lhotská, Václav Sklenář
Office of the Chamber of Deputies, Parliament of the Czech Republic

2 History of the Czech version
1993 - preliminary translation of the second version into Czech
1995 - Czech version, edition 3.0
07/2003 - Czech version, edition 4.0
07/2004 - Czech version, edition 4.1
The Czech Eurovoc is fully compatible with the other official language versions.

3 Application of Eurovoc in the Information System of Parliament
library database - aRL
database of petitions - Lotus Notes
intellectual indexing of parliamentary documents
multilingual searching in parliamentary documentation

4 Manual indexing/1
All parliamentary documents that are publicly accessible in full text in electronic form via the Internet are indexed intellectually with Eurovoc terms. At present this amounts to 3,500 documents.
Retrospective indexing of older materials continues; a large number of older documents remain unindexed.

5 Manual indexing/2
document types - bills, budgets, agreements, agendas, parliamentary questions
classification - 127 Eurovoc microthesauri
indexing - descriptors from Eurovoc - multilingual searching
indexing - descriptors from the complementary thesaurus - searching in Czech only
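The difference between the two kinds of indexing can be illustrated with a minimal sketch. It assumes that a Eurovoc descriptor carries one label per language, while a complementary-thesaurus term has a Czech label only. The identifiers (`D1`, `C1`), labels, document names, and the `search` function are all hypothetical and are not taken from the parliamentary system.

```python
# Hypothetical descriptor records: identifier -> label per language.
# "D*" stand for Eurovoc descriptors, "C*" for complementary-thesaurus terms.
LABELS = {
    "D1": {"cs": "státní rozpočet", "en": "state budget", "fr": "budget de l'État"},
    "D2": {"cs": "daň", "en": "tax", "fr": "impôt"},
    "C1": {"cs": "jednací řád"},  # Czech label only
}

# Hypothetical index: document -> identifiers assigned by the indexer.
DOC_INDEX = {
    "doc-001": {"D1"},
    "doc-002": {"D1", "D2"},
    "doc-003": {"D2", "C1"},
}


def search(query, lang):
    """Return documents indexed with a term whose label in `lang` equals the query."""
    wanted = query.strip().lower()
    ids = {
        term_id
        for term_id, labels in LABELS.items()
        if labels.get(lang, "").lower() == wanted
    }
    return sorted(doc for doc, assigned in DOC_INDEX.items() if assigned & ids)
```

Because documents are linked to identifiers rather than to words, a query in any language that has a label for the descriptor reaches the same documents; a term without labels in other languages can only be found through its Czech label.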

6 Automatic indexing tool/1
SELECTION of terms from the document text
STOP-WORD LIST (negative dictionary)
LEMMATIZER - set of rules for grammatical alterations
COMPARISON of the set of basic-form words from the text with Eurovoc terms
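The steps on this slide can be sketched roughly as follows. This is not the tool described in the presentation: the stop-word list, the suffix rules, and the term set are hypothetical English stand-ins, and a rule set for Czech inflection would be considerably larger.

```python
import re

# Hypothetical stop-word list (the "negative dictionary").
STOP_WORDS = {"the", "of", "and", "on", "a", "in", "for"}

# Hypothetical lemmatizer rules: (suffix, replacement), tried in order.
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", "")]

# Hypothetical single-word thesaurus terms in basic form.
THESAURUS_TERMS = {"budget", "tax", "subsidy", "petition"}


def lemmatize(word):
    """Reduce a word to a basic form using the first matching suffix rule."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word


def candidate_terms(text):
    """Select words, drop stop words, lemmatize, and compare with thesaurus terms."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    lemmas = [lemmatize(w) for w in words if w not in STOP_WORDS]
    return sorted(set(lemmas) & THESAURUS_TERMS)
```

For example, `candidate_terms("The budgets and subsidies of the petitions on taxes")` yields the four basic forms that appear in the hypothetical term set.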

7 Automatic indexing tool/2
MULTIWORD EXPRESSION recognition
NON-DESCRIPTOR/DESCRIPTOR transformation
ABSOLUTE FREQUENCY WEIGHTING
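One way these three steps could fit together is sketched below, assuming the text has already been reduced to a list of basic-form words. The longest-match strategy, the sample terms, and the function name `index_terms` are assumptions made for illustration; the presentation does not say how the tool implements them.

```python
from collections import Counter

# Hypothetical thesaurus entries keyed by tuples of basic-form words.
DESCRIPTORS = {
    ("state", "budget"): "state budget",
    ("budget",): "budget",
    ("tax",): "tax",
}
# Hypothetical non-descriptors (synonyms) mapped to their preferred descriptor.
NON_DESCRIPTORS = {
    ("national", "budget"): "state budget",
    ("levy",): "tax",
}


def index_terms(lemmas, max_len=3):
    """Count descriptor occurrences, preferring the longest multiword match."""
    lookup = {**DESCRIPTORS, **NON_DESCRIPTORS}
    counts = Counter()
    i = 0
    while i < len(lemmas):
        # Try the longest window first so "state budget" beats "budget".
        for n in range(min(max_len, len(lemmas) - i), 0, -1):
            key = tuple(lemmas[i:i + n])
            if key in lookup:
                counts[lookup[key]] += 1  # absolute frequency of the descriptor
                i += n
                break
        else:
            i += 1  # no thesaurus term starts at this position
    return counts.most_common()
```

Here the weight of a descriptor is simply the number of times it, or one of its non-descriptors, occurs in the text, with no normalization for document length or term position.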

8 Limitations of the automatic indexing tool
insufficient term weighting system
term location within the text (non-structured texts)
insufficient recognition of multiword expressions
lack of automatic non-descriptor/descriptor proposals

9 Thank you for your attention
Anna Lhotská, lhotska@psp.cz
Václav Sklenář, sklenar@psp.cz

