1
Applying Text Classification in Conference Management: Some Lessons Learned
Andreas Pesenhofer, Helmut Berger, Michael Dittenbach, Andreas Rauber
3
Overview
- Conference Management Systems
- Classification & Clustering
- Case Studies
  - ECDL 2005
  - ECR
- Conclusions
4
Conference Management Systems
- Set of tools to support conference workflow
- Basic support for paper submission & review collection
- Many tasks for further automation
  - Selection of the program committee
  - Topic assignment of submission
  - Paper to reviewer assignment
  - Support in review generation
  - Poster arrangement
  - Post-conference access to papers
5
Classification & Clustering
- Topic assignment of submission
  - Problem: authors uncertain about precise topic assignment (conference terminology)
  - Solution: support by automatic assignment
  - Method: ATC based on abstracts
- Poster arrangement & post-conference access to papers
  - Problem: topic-based arrangement
  - Solution: clustering
  - Method: SOM & Mnemonic SOM
6
ATC for topic assignment
- Train model based on previous conferences
- Abstract submission
- Automatic assignment
- Confirmation
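The four steps of this slide (train, submit, assign, confirm) can be sketched in a few lines. This is a minimal illustration only: the slides do not say which classifier was used, so a simple nearest-centroid model with cosine similarity is swapped in here, and all function names and the sample abstracts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep purely alphabetic tokens."""
    return [word for word in text.lower().split() if word.isalpha()]


def train(labelled_abstracts):
    """Build one term-count centroid per topic from (abstract, topic) pairs."""
    centroids = defaultdict(Counter)
    for text, topic in labelled_abstracts:
        centroids[topic].update(tokenize(text))
    return centroids


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def suggest_topic(centroids, abstract):
    """Automatic assignment: return the topic with the most similar centroid."""
    vector = Counter(tokenize(abstract))
    return max(centroids, key=lambda topic: cosine(vector, centroids[topic]))


def confirm(suggested, author_choice=None):
    """Confirmation: the author may override the suggested topic."""
    return author_choice if author_choice is not None else suggested


# Hypothetical training data standing in for abstracts of previous conferences.
previous = [
    ("metadata harvesting for open archives", "architectures"),
    ("user study of search interfaces", "users"),
]
model = train(previous)
topic = confirm(suggest_topic(model, "a new user study"))
```

Keeping the confirmation step separate mirrors the slide: the model only proposes a topic, and the author has the last word.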
7
Clustering for organization
- Arrange posters thematically
- Non-rectangular SOMs reflecting conference site
- Mnemonic SOMs simplify post-conference paper access
9
ECDL 2005 – ATC data
- English abstracts of previous ECDL conferences
- Topics of the conference call -> seven categories defined
- Pre-processing: removal of all numbers, punctuation marks and special characters; transformation to lower case
- tf-idf weighting
- 4,141 unique terms
- Information gain (IG): 3,460 top-ranked terms, average accuracy over all categories of 58.60%
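The pre-processing and weighting steps listed on this slide might look as follows. The slide does not state which tf-idf variant was used, so the plain tf * log(N / df) form below is an assumption, as are the function names. The regular expression also drops non-ASCII letters, which is a simplification for English abstracts.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lower-case the text and drop numbers, punctuation and special characters."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()


def tfidf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenised = [preprocess(doc) for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in Counter(tokens).items()}
        for tokens in tokenised
    ]


weights = tfidf(["Digital library", "Digital preservation"])
```

With this variant, a term that occurs in every document (here "digital") receives weight zero, so it contributes nothing to the classification.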
10
ECDL – training data

class-ID  class description                                                                             sum
1         Concepts of Digital Libraries, Concepts of Documents and Metadata                              34
2         System Architectures, Open Archives, Collection Building, Integration and Interoperability     40
3         Information Retrieval, Information Organization, Search and Usage                              67
4         User Studies, System Evaluation, Personalization, User Interfaces and User Centered Design     50
5         Digital Preservation, Web Archiving and Long Term Access                                       12
6         Digital Library Applications and Case Studies                                                  65
7         Multimedia, Mixed Media, Audio, Video, 3D and non-traditional Objects                          43
          sum over the selected abstracts                                                               311
11
ECDL 2005 – classification results

class-ID       1     2     3     4     5     6     7  total  recall    F1
1              1     1     2     2     .     1     1      8    0.13  0.17
2              1    17     1     .     .     .     .     19    0.89  0.77
3              1     3    26     6     .     2     .     38    0.68  0.69
4              .     .     4    21     .     2     1     28    0.75  0.71
5              1     1     3     .     .     1     1      7    0.00
6              .     3     1     2     .    12     1     19    0.63  0.65
7              .     .     .     .     .     .     3      3    1.00  0.60
precision   0.25  0.68  0.70  0.68  0.00  0.67  0.43
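The recall, precision and F1 figures on this slide follow directly from the confusion matrix: recall divides the diagonal cell by its row total, precision divides it by its column total. The sketch below recomputes them from the slide's numbers. The function name is illustrative, and reporting 0.0 where a score is undefined (class 5, to which no test document was assigned) is an assumption.

```python
# Rows are the true classes, columns the assigned classes;
# "." cells on the slide are entered as 0.
ECDL_CONFUSION = [
    [1, 1, 2, 2, 0, 1, 1],
    [1, 17, 1, 0, 0, 0, 0],
    [1, 3, 26, 6, 0, 2, 0],
    [0, 0, 4, 21, 0, 2, 1],
    [1, 1, 3, 0, 0, 1, 1],
    [0, 3, 1, 2, 0, 12, 1],
    [0, 0, 0, 0, 0, 0, 3],
]


def per_class_scores(matrix):
    """Return a (precision, recall, f1) tuple for every class."""
    scores = []
    for k, row in enumerate(matrix):
        true_pos = row[k]
        actual = sum(row)                         # row total
        predicted = sum(r[k] for r in matrix)     # column total
        recall = true_pos / actual if actual else 0.0
        precision = true_pos / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


scores = per_class_scores(ECDL_CONFUSION)
for class_id, (p, r, f1) in enumerate(scores, start=1):
    print(f"class {class_id}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```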
12
ECDL 2005 – SOM data
- Poster and paper organization
  - Full text of accepted posters of ECDL 2005
  - Term selection based on minimal word length and document frequencies
  - 30 posters, 569 terms
- Post-conference access
  - 71 papers and posters, 5,654 terms
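To make the idea of a non-rectangular map concrete, here is a minimal self-organizing map whose units are restricted to an arbitrary shape. It is a sketch under stated assumptions, not the authors' implementation: the slides give no SOM parameters, so the learning rate, radius, epoch count, the `FLOOR_PLAN` mask and all function names are hypothetical, and plain Euclidean distance between grid positions is used as a simplification.

```python
import math
import random

# Hypothetical L-shaped "conference site": '#' marks a map unit, '.' is empty space.
FLOOR_PLAN = ["###",
              "#..",
              "#.."]
units = [(x, y) for y, row in enumerate(FLOOR_PLAN)
         for x, ch in enumerate(row) if ch == "#"]


def best_unit(weights, vec):
    """Return the grid position whose weight vector is closest to vec."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], vec)))


def train_som(vectors, units, epochs=50, rate=0.5, radius=2.0, seed=0):
    """Train a SOM whose units are an arbitrary set of (x, y) grid positions."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {u: [rng.random() for _ in range(dim)] for u in units}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrink learning rate and radius
        sigma = max(radius * decay, 0.5)
        for vec in vectors:
            bmu = best_unit(weights, vec)
            for unit, w in weights.items():
                grid_dist2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                influence = rate * decay * math.exp(-grid_dist2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += influence * (vec[i] - w[i])
    return weights


# Toy term vectors standing in for poster representations.
posters = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
som = train_som(posters, units)
positions = [best_unit(som, p) for p in posters]
```

Each poster ends up at a position inside the shape, which is what allows the map to be laid over a floor plan or another memorable outline.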
13
ECDL 2005 – SOM
14
ECDL 2005 – SOM (2)
16
ECR - Data
- Abstracts of the ECR: European Congress of Radiology
- Training set: ECR 2003 & 2004, 1,952 documents
- Test set: ECR 2005, 924 documents
- Same pre-processing steps as for the ECDL data
- Resulting in 14,887 unique terms
- IG: 5,720 top-ranked terms, average accuracy over all categories of 73.57%
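Ranking terms by IG, read here as information gain, and keeping only the top-ranked ones could be sketched as below. The formulation over term presence or absence is the common textbook one and is an assumption about what was done for these experiments. The function names and the four toy documents are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(term, documents, labels):
    """Entropy reduction from knowing whether a document contains the term.

    documents is a list of term sets, labels the parallel list of classes.
    """
    with_term = [lab for doc, lab in zip(documents, labels) if term in doc]
    without = [lab for doc, lab in zip(documents, labels) if term not in doc]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_term, without) if part)
    return entropy(labels) - remainder


def top_terms(documents, labels, k):
    """Return the k terms with the highest information gain (ties broken by name)."""
    vocabulary = set().union(*documents)
    return sorted(vocabulary,
                  key=lambda t: (-information_gain(t, documents, labels), t))[:k]


# Toy data: the body region separates the classes, the modality does not.
docs = [{"mri", "brain"}, {"mri", "knee"}, {"ct", "brain"}, {"ct", "knee"}]
classes = ["neuro", "musculoskeletal", "neuro", "musculoskeletal"]
selected = top_terms(docs, classes, 2)
```

In the toy data, "brain" and "knee" fully determine the class while "mri" and "ct" carry no class information, so only the former survive the selection.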
17
ECR – training data

class-ID  class description                  2003  2004   sum
1         Abdominal and Gastrointestinal      160   119   279
2         Breast                               80    59   139
3         Cardiac                              70    70   140
4         Chest                                60    70   130
5         Computer Applications                30    30    60
6         Contrast Media                       40    39    79
7         Genitourinary                        70    60   130
8         Head and Neck                        40    40    80
9         Interventional Radiology            130   117   247
10        Musculoskeletal                      90    80   170
11        Neuro                                90    99   189
12        Pediatric                            30    40    70
13        Physics in Radiology                 40    40    80
14        Radiographers                        10    10    20
15        Vascular                             69    70   139
          sum over the selected abstracts    1009   943  1952
18
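ECR 2005 – classification results

class-ID       1     2     3     4     5     6     7     8     9    10    11    12    13    14    15  total  recall    F1
1            111     1     .     1     2     2     2     .     2     1     2     .     1     .     1    126    0.88  0.79
2              1    61     .     .     .     .     .     .     1     .     .     .     6     .     .     69    0.88  0.87
3              1     .    73     .     .     .     .     .     .     1     .     .     3     .     2     80    0.91  0.86
4              6     .     5    49     1     .     .     .     3     .     1     .     .     .     5     70    0.70  0.77
5              2     2     .     3    10     .     .     .     .     .     3     .     7     .     3     30    0.33  0.43
6             12     .     .     1     .    26     2     .     1     2     1     .     1     .     3     49    0.53  0.61
7              5     .     .     .     .     1    38     .     5     3     2     .     3     .     1     58    0.66  0.73
8              4     .     .     1     .     2     4     8     2     2     4     .     2     .     1     30    0.27  0.39
9              2     4     2     .     1     3     .     .    99     2     2     .     .     .     5    120    0.83  0.81
10             2     2     .     1     1     1     .     .     2    60     5     1     2     .     1     78    0.77  0.78
11             1     .     1     1     .     .     .     1     4     .    64     2     1     .     4     79    0.81  0.73
12             4     .     1     .     .     .     .     1     1     1    10    11     .     .     1     30    0.37  0.50
13             .     1     3     .     .     .     .     .     .     1     2     .    39     .     2     48    0.81  0.68
14             2     .     .     .     1     .     .     .     .     3     .     .     .     2     .      8    0.25  0.40
15             2     .     4     .     .     1     .     1     3     .     .     .     1     .    37     49    0.76  0.64
precision   0.72  0.86  0.82  0.86  0.63  0.72  0.83  0.73  0.80  0.79  0.67  0.79  0.59  1.00  0.56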
19
Conclusions
- Quality is proportional to the amount of training documents
- Structure of the classes (overlapping?)
- The bulk of submissions can be dealt with automatically
- May be used for session assignment
- Arrange posters & papers thematically
- Easy to memorize & find
20
Questions?
E-Commerce Competence Center
Donau-City-Strasse 1
1220 Vienna
Austria
Phone: +43/1/522 71 71-20
Fax: +43/1/522 71 71-71
Internet: http://www.ec3.at/
E-Mail: office@ec3.at
22
ECDL 2005