Automatic Syllabus Classification JCDL – Vancouver – 22 June 2007 Edward A. Fox (presenting co-author), Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Perez-Quinones,


1 Automatic Syllabus Classification JCDL – Vancouver – 22 June 2007 Edward A. Fox (presenting co-author), Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Perez-Quinones, William Cameron, GuoFang Teng, and Lillian (“Boots”) Cassel

2 Why Study the Syllabus Genre? ► Educational resource ► Importance to the educational community ▪ Educators ▪ Students ▪ Self-learners ► Thanks to NSF DUE grant 5328255 (personalization support for NSDL)

3 Where to look for a specific syllabus? ► Non-standard publishing mechanisms: ▪ Instructor’s website ▪ CMSs (courseware management systems, e.g., Sakai) ▪ Catalogs ► Limited access outside the university ► Search on the Web ▪ Many non-relevant links in search results

4 Syllabus Library ► Bootstrapping ▪ Identify true syllabi from search results ▪ Store in a repository ▪ Develop tools & applications ► Scaling up ▪ Encourage contributions from educational communities

5 An Essential Step towards Syllabus Library: Classification ► Classification Objects: ▪ Potential syllabi in Computer Science, found by searching the Web with syllabus keywords, restricted to educational domains ► Class Definition ► Feature Selection ► Model Selection ► Training and Testing

6 Four Classes ► Full Syllabus ► Partial Syllabus ► Entry Page ► Noise

7 Full Syllabus

8 Partial Syllabus

9 Entry Page

10 Noise

11 Syllabus Components ► course code ► title ► class time & location ► offering institution ► teaching staff ► course description ► objectives ► web site ► prerequisite ► textbook ► grading policy ► schedule ► assignment ► exam and resources

12 Features ► 84 Genre-specific Features ▪ the occurrences of keywords ▪ the positions of keywords, and ▪ the co-occurrences of keywords and links ► A series of keywords for each syllabus component
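
The three feature families on this slide could be computed along the following lines. This is only a minimal sketch: the keyword lists, the `extract_features` function, and the feature names are hypothetical, since the slides do not list the actual 84 features or their keywords.

```python
import re

# Hypothetical keyword lists per syllabus component (not taken from the slides).
COMPONENT_KEYWORDS = {
    "prerequisite": ["prerequisite", "pre-requisite"],
    "textbook": ["textbook", "required text"],
    "grading": ["grading", "grade"],
}


def extract_features(text, link_texts):
    """Return occurrence, position, and keyword-link co-occurrence features."""
    lowered = text.lower()
    length = max(len(lowered), 1)
    links = [link.lower() for link in link_texts]
    features = {}
    for component, keywords in COMPONENT_KEYWORDS.items():
        # Character offsets of every keyword match in the page text.
        positions = [
            match.start()
            for keyword in keywords
            for match in re.finditer(re.escape(keyword), lowered)
        ]
        # Occurrence feature: how often the component's keywords appear.
        features[component + "_count"] = len(positions)
        # Position feature: relative offset of the first match (1.0 if absent).
        features[component + "_first_pos"] = (
            min(positions) / length if positions else 1.0
        )
        # Co-occurrence feature: does any keyword appear inside link text?
        features[component + "_in_link"] = int(
            any(keyword in link for keyword in keywords for link in links)
        )
    return features
```

A page whose keywords mostly appear inside link text would score high on the `_in_link` features, which is one plausible signal for the entry page class.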

13 Classification Models ► Discriminative Models ▪ Support Vector Machines (SVM) ▪ SMO-L: Sequential Minimal Optimization, accelerating the training process of SVM ▪ SMO-P: SMO with a polynomial kernel ► Generative Models ▪ Naïve Bayes (NB) ▪ NB-K: Applying kernel methods to estimate the distribution of numeric attributes in NB modeling
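
To illustrate the difference between NB and NB-K, the sketch below contrasts two ways of estimating the class-conditional density of one numeric attribute: a single fitted Gaussian versus a Gaussian kernel density estimate. The function names and the fixed `bandwidth` are assumptions made for illustration, not the configuration used in the study.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def single_gaussian_density(x, values):
    """Plain NB style: fit one Gaussian to the attribute values of a class."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = max(math.sqrt(variance), 1e-3)  # floor avoids division by zero
    return gaussian_pdf(x, mean, std)


def kernel_density(x, values, bandwidth=1.0):
    """NB-K style: average one Gaussian kernel centred on each training value."""
    return sum(gaussian_pdf(x, v, bandwidth) for v in values) / len(values)
```

For a bimodal attribute such as `[0, 0, 10, 10]`, the kernel estimate puts its mass near 0 and 10, while the single Gaussian puts its peak at 5, where no training value lies.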

14 Evaluation ► Training corpus: 1020 out of the 8000+ potential syllabi ► All in HTML, PDF, PostScript, or Text ► Manual tagging on the training corpus ▪ Unanimous agreement by three co-authors ► Evaluation strategy: ten-fold cross validation ► Metrics: F1 (an overall measure of classification performance)
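
The evaluation setup can be sketched in a few lines. This assumes the standard definition of F1 as the harmonic mean of precision and recall, computed per class, and a simple unshuffled fold assignment; the slides do not say how the folds were drawn, and the helper names are hypothetical.

```python
def f1_score(true_labels, predicted_labels, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for each of k folds."""
    for fold in range(k):
        # Item i belongs to the test set of fold (i mod k); no shuffling here.
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test
```

With 1020 documents and ten folds, each fold holds out 102 documents and trains on the remaining 918.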

15 Results with random set Best items are in purple boxes. Acc_tr: Classification accuracy on the training set.

16 Results (Cont’d) ► On average, SVM outperforms NB on our syllabus classification task. ► All classifiers fail to identify the partial syllabus class. ► The kernel settings for NB do not help in the syllabus classification task. ► Classification accuracy on the training data is not very good.

17 Future Work ► Feature selection ▪ Add general feature selection methods from text classification ▪ e.g., Document Frequency, Information Gain, and Mutual Information ▪ Hybrid: combine our genre-specific features with the general features
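
As a rough illustration of two of the general feature selection methods named here, the sketch below scores a term by document frequency and by information gain over term presence. It assumes each document is represented as a set of tokens; the function names are hypothetical and this is not the authors' planned implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def document_frequency(term, documents):
    """Number of documents (token sets) that contain the term."""
    return sum(1 for doc in documents if term in doc)


def information_gain(term, documents, labels):
    """Reduction in class entropy from knowing whether the term is present."""
    with_term = [label for doc, label in zip(documents, labels) if term in doc]
    without_term = [label for doc, label in zip(documents, labels) if term not in doc]
    total = len(labels)
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:
            conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional
```

Terms would then be ranked by score and the top ones kept as general features alongside the genre-specific ones.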

18 Future Work (Cont’d) ► Syllabus Library ▪ Welcome to http://doc.cs.vt.edu ▪ Share your favorite course resources – not limited to the syllabus genre. ► Information Extraction ▪ Semantic search ► Personalization

19 Summary ► Towards a syllabus library ▪ Starting from search results on the web ▪ Classification of the search results for true syllabi ► SVM is a better choice for our syllabus classification task. ► Towards an educational on-line community around the syllabus library

20 Q & A

