Applying Text Classification in Conference Management: Some Lessons Learned Andreas Pesenhofer, Helmut Berger, Michael Dittenbach, Andreas Rauber.

1 Applying Text Classification in Conference Management: Some Lessons Learned Andreas Pesenhofer, Helmut Berger, Michael Dittenbach, Andreas Rauber

3 Overview
- Conference Management Systems
- Classification & Clustering
- Case Studies
  - ECDL 2005
  - ECR
- Conclusions

4 Conference Management Systems
- Set of tools to support conference workflow
- Basic support for paper submission & review collection
- Many tasks for further automation
  - Selection of the program committee
  - Topic assignment of submission
  - Paper to reviewer assignment
  - Support in review generation
  - Poster arrangement
  - Post-conference access to papers

5 Classification & Clustering
- Topic assignment of submission
  - Problem: authors uncertain about precise topic assignment (conference terminology)
  - Solution: support by automatic assignment
  - Method: ATC (automatic text classification) based on abstracts
- Poster arrangement & Post-conference access to papers
  - Problem: topic based arrangement
  - Solution: clustering
  - Method: SOM (self-organizing map) & Mnemonic SOM

6 ATC for topic assignment
- Train model based on previous conferences
- Abstract submission
- Automatic assignment
- Confirmation

7 Clustering for organization
- Arrange posters thematically
- Non-rectangular SOMs reflecting conference site
- Mnemonic SOMs simplify post-conference paper access
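The clustering idea on this slide can be illustrated with a very small self-organizing map. The sketch below is not the authors' Mnemonic SOM implementation, which the slides do not describe; it assumes that a non-rectangular map can be approximated by a boolean `mask` that switches grid cells on or off, and the function names, the mask, and the toy poster vectors are all hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights, key=lambda unit: math.dist(weights[unit], x))

def train_som(data, mask, epochs=50, lr0=0.5, seed=0):
    """Train a SOM whose units exist only where mask[row][col] is True."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Only masked-in cells become map units, giving a non-rectangular map.
    units = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    weights = {u: [rng.random() for _ in range(dim)] for u in units}
    radius0 = max(len(mask), len(mask[0])) / 2
    for epoch in range(epochs):
        decay = 1 - epoch / epochs          # linear decay of rate and radius
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.5)
        for x in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in input space.
                influence = math.exp(-math.dist(unit, bmu) ** 2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
    return weights

# Hypothetical floor plan: the top-right cell is not part of the site.
mask = [[True, True, False],
        [True, True, True]]
# Toy term-weight vectors standing in for four posters.
posters = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
som = train_som(posters, mask)
placement = [best_matching_unit(som, p) for p in posters]
```

Each poster is placed at the unit returned by `best_matching_unit`, so thematically similar posters tend to land on nearby cells of the masked grid.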

8 Overview
- Conference Management Systems
- Classification & Clustering
- Case Studies
  - ECDL 2005
  - ECR
- Conclusions

9 ECDL 2005 – ATC data
- English abstracts of previous ECDL conferences
- Topics of the conference call were used to define seven categories
- Pre-processing (removing all numbers, punctuation marks and special characters; transformation to lower case)
- tf-idf weighting
- 4,141 unique terms
- IG (information gain): 3,460 top-ranked terms, average accuracy over all categories of 58.60%
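The pre-processing and weighting steps listed on this slide can be sketched as follows. The slide does not say which tf-idf variant was used, so the formula tf * log(N / df) is an assumption, as are the function names, the ASCII-only character filter, and the two toy abstracts.

```python
import math
import re
from collections import Counter

def preprocess(text):
    """Lower-case the text and drop digits, punctuation and special characters."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # assumption: plain English (ASCII) abstracts
    return text.split()

def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy abstracts, invented for illustration only.
abstracts = [
    "Digital libraries: metadata for 3D objects!",
    "Metadata harvesting in open archives (OAI-PMH 2.0).",
]
vectors = tfidf(abstracts)
```

With this variant a term that occurs in every document, such as "metadata" in the toy data, receives weight zero, which is why the weighting favours terms that discriminate between abstracts.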

10 ECDL – training data
class-ID  class description                                                                           sum
1         Concepts of Digital Libraries, Concepts of Documents and Metadata                            34
2         System Architectures, Open Archives, Collection Building, Integration and Interoperability   40
3         Information Retrieval, Information Organization, Search and Usage                            67
4         User Studies, System Evaluation, Personalization, User Interfaces and User Centered Design   50
5         Digital Preservation, Web Archiving and Long Term Access                                     12
6         Digital Library Applications and Case Studies                                                65
7         Multimedia, Mixed Media, Audio, Video, 3D and non-traditional Objects                        43
sum over the selected abstracts                                                                       311

11 ECDL 2005 – classification results
class-ID     1    2    3    4    5    6    7  total  recall    F1
1            1    1    2    2    .    1    1      8    0.13  0.17
2            1   17    1    .    .    .    .     19    0.89  0.77
3            1    3   26    6    .    2    .     38    0.68  0.69
4            .    .    4   21    .    2    1     28    0.75  0.71
5            1    1    3    .    .    1    1      7    0.00
6            .    3    1    2    .   12    1     19    0.63  0.65
7            .    .    .    .    .    .    3      3    1.00  0.60
precision 0.25 0.68 0.70 0.68 0.00 0.67 0.43
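The recall, precision and F1 figures on this slide can be recomputed from the confusion matrix. The sketch below assumes that rows are the true classes and columns the assigned classes, which is consistent with recall being reported per row and precision per column; the function name is hypothetical, and dots in the slide are entered as zeros.

```python
# Confusion matrix as read from the slide (rows: true class, columns: assigned class).
matrix = [
    [1, 1, 2, 2, 0, 1, 1],
    [1, 17, 1, 0, 0, 0, 0],
    [1, 3, 26, 6, 0, 2, 0],
    [0, 0, 4, 21, 0, 2, 1],
    [1, 1, 3, 0, 0, 1, 1],
    [0, 3, 1, 2, 0, 12, 1],
    [0, 0, 0, 0, 0, 0, 3],
]

def per_class_metrics(m):
    """Return a (precision, recall, f1) tuple for every class."""
    n = len(m)
    results = []
    for k in range(n):
        correct = m[k][k]
        row_total = sum(m[k])                       # documents truly in class k
        col_total = sum(m[i][k] for i in range(n))  # documents assigned to class k
        recall = correct / row_total if row_total else 0.0
        precision = correct / col_total if col_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results

metrics = per_class_metrics(matrix)
for class_id, (p, r, f1) in enumerate(metrics, start=1):
    print(f"class {class_id}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Class 5 is never assigned and none of its documents is recognised, so its precision and F1 are undefined; the sketch reports them as 0.0 by convention.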

12 ECDL 2005 – SOM data
- Poster and Paper Organization:
  - full text of accepted posters of ECDL 2005
  - term selection based on minimal word length and document frequencies
  - 30 posters - 569 terms
- Post-conference access
  - 71 papers and posters – 5,654 terms
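Term selection by minimal word length and document frequency, as named on this slide, might look like the sketch below. The slide gives no thresholds, so `min_length`, `min_df` and `max_df_ratio` are hypothetical parameters with arbitrary defaults, and the toy documents are invented.

```python
from collections import Counter

def select_terms(tokenized_docs, min_length=3, min_df=2, max_df_ratio=0.9):
    """Keep terms that are long enough and neither too rare nor too common."""
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return sorted(
        term for term, count in df.items()
        if len(term) >= min_length and min_df <= count <= max_df_ratio * n_docs
    )

# Toy token lists standing in for poster full texts.
docs = [
    ["som", "of", "posters", "map"],
    ["som", "map", "library"],
    ["som", "posters", "of"],
    ["som", "library"],
]
selected = select_terms(docs)
```

In the toy data "of" is dropped for being too short and "som" for occurring in every document, leaving the terms that help tell the documents apart.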

13 ECDL 2005 – SOM

14 ECDL 2005 – SOM (2)

15 Overview
- Conference Management Systems
- Classification & Clustering
- Case Studies
  - ECDL 2005
  - ECR
- Conclusions

16 ECR - Data
- Abstracts of the ECR: European Congress of Radiology
- Training set: ECR 2003 & 2004 - 1,952 documents
- Test set: ECR 2005 - 924 documents
- Same steps as for the ECDL data
- Resulting in 14,887 unique terms
- IG (information gain): 5,720 top-ranked terms, average accuracy over all categories of 73.57%
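Ranking terms by information gain (IG) and keeping the top-ranked ones can be sketched as below. This uses the standard entropy-based definition of IG for the presence or absence of a term; whether the authors used exactly this formulation is not stated on the slides. The function names, the toy documents and their labels are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels, term):
    """IG(term) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t)."""
    classes = sorted(set(labels))
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    gain = entropy([labels.count(c) for c in classes])
    for subset in (with_term, without_term):
        if subset:
            weight = len(subset) / len(labels)
            gain -= weight * entropy([subset.count(c) for c in classes])
    return gain

def top_terms(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocabulary = sorted(set().union(*docs))
    return sorted(vocabulary,
                  key=lambda t: information_gain(docs, labels, t),
                  reverse=True)[:k]

# Toy data: each document is a set of terms; labels are invented.
docs = [{"mri", "brain"}, {"mri", "spine"}, {"stent", "artery"}, {"stent", "brain"}]
labels = ["Neuro", "Neuro", "Vascular", "Vascular"]
best = top_terms(docs, labels, 2)
```

In the toy data "mri" and "stent" separate the two classes perfectly and rank first, while "brain" occurs equally in both classes and carries no information.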

17 ECR – training data
class-ID  class description               2003  2004   sum
1         Abdominal and Gastrointestinal   160   119   279
2         Breast                            80    59   139
3         Cardiac                           70    70   140
4         Chest                             60    70   130
5         Computer Applications             30    30    60
6         Contrast Media                    40    39    79
7         Genitourinary                     70    60   130
8         Head and Neck                     40    40    80
9         Interventional Radiology         130   117   247
10        Musculoskeletal                   90    80   170
11        Neuro                             90    99   189
12        Pediatric                         30    40    70
13        Physics in Radiology              40    40    80
14        Radiographers                     10    10    20
15        Vascular                          69    70   139
sum over the selected abstracts           1009   943  1952

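18 ECR 2005 – classification results
class-ID     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15  total  recall    F1
1          111    1    .    1    2    2    2    .    2    1    2    .    1    .    1    126    0.88  0.79
2            1   61    .    .    .    .    .    .    1    .    .    .    6    .    .     69    0.88  0.87
3            1    .   73    .    .    .    .    .    .    1    .    .    3    .    2     80    0.91  0.86
4            6    .    5   49    1    .    .    .    3    .    1    .    .    .    5     70    0.70  0.77
5            2    2    .    3   10    .    .    .    .    .    3    .    7    .    3     30    0.33  0.43
6           12    .    .    1    .   26    2    .    1    2    1    .    1    .    3     49    0.53  0.61
7            5    .    .    .    .    1   38    .    5    3    2    .    3    .    1     58    0.66  0.73
8            4    .    .    1    .    2    4    8    2    2    4    .    2    .    1     30    0.27  0.39
9            2    4    2    .    1    3    .    .   99    2    2    .    .    .    5    120    0.83  0.81
10           2    2    .    1    1    1    .    .    2   60    5    1    2    .    1     78    0.77  0.78
11           1    .    1    1    .    .    .    1    4    .   64    2    1    .    4     79    0.81  0.73
12           4    .    1    .    .    .    .    1    1    1   10   11    .    .    1     30    0.37  0.50
13           .    1    3    .    .    .    .    .    .    1    2    .   39    .    2     48    0.81  0.68
14           2    .    .    .    1    .    .    .    .    3    .    .    .    2    .      8    0.25  0.40
15           2    .    4    .    .    1    .    1    3    .    .    .    1    .   37     49    0.76  0.64
precision 0.72 0.86 0.82 0.86 0.63 0.72 0.83 0.73 0.80 0.79 0.67 0.79 0.59 1.00 0.56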

19 Conclusions
- Classification quality grows with the number of training documents
- Structure of the classes (overlapping?)
- The bulk of submissions can be dealt with automatically
- May be used for session assignment
- Arrange posters & papers thematically
- Easy to memorize & find
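The first conclusion can be checked informally against the ECR figures reported in the deck. The sketch below pairs each class's training-set size (the "sum" column of the ECR training data) with its F1 score (ECR 2005 classification results) and computes a Pearson correlation; the choice of Pearson's r as the measure is an assumption made for illustration, not something the slides report.

```python
import math

# Per-class training documents (ECR 2003 + 2004) and F1 on ECR 2005, classes 1 to 15.
training_docs = [279, 139, 140, 130, 60, 79, 130, 80, 247, 170, 189, 70, 80, 20, 139]
f1_scores = [0.79, 0.87, 0.86, 0.77, 0.43, 0.61, 0.73, 0.39,
             0.81, 0.78, 0.73, 0.50, 0.68, 0.40, 0.64]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r = pearson(training_docs, f1_scores)
print(f"Pearson r between training size and F1: {r:.2f}")
```

A clearly positive coefficient is consistent with the slide's claim, although with only 15 classes, and with possible overlap between classes as the second bullet notes, it is an indication rather than proof.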

20 Questions?
E-Commerce Competence Center
Donau-City-Strasse 1
1220 Vienna
Austria
Phone: +43/1/522 71 71-20
Fax: +43/1/522 71 71-71
Internet: http://www.ec3.at/
E-Mail: office@ec3.at

22 ECDL 2005

