Vertical Search for Courses of UIUC
by Jessica Bell, Alexander Loeb, Sharon Paradesi, Michael Paul, Jing Xia, Jie Zhang
Demo
Goals of the project
- Construct a database of UIUC courses across all departments, ultimately creating a centralized knowledge base about each course.
- Augment the database by drawing relations between courses, both within and between departments, and by finding similar courses outside the University of Illinois.
Architecture
- Data sources: Course Catalog, Book Store, Webpages, Other Universities
- Collection and processing: PHP script, Java script, AgentIDE, Heritrix, WEKA
- Database: Basic Course Info, Book Info, Course homepage, Keywords, Related Courses
- Query by: Course Name, Instructor, Description, …
- Front end: PHP
Tools used
- Web crawling: Wget, AgentIDE and Heritrix
- Parsers: Python and Java
- Learning tools: WEKA
- Website design: PHP and MySQL
Tasks finished
- Data mining:
  - Basic course information
  - Similar course recommendation
  - Prerequisite course list
  - Recommended book information
- Learning:
  - Clustering
  - Classification
Keywords
- Pull from course descriptions
- Remove uninformative/common words
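The two keyword steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the stop list `STOPWORDS` and the function name `extract_keywords` are hypothetical, and the slides do not say which words were treated as uninformative.

```python
import re
from collections import Counter

# Hypothetical stop list; the slides do not specify which words were removed.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is",
             "this", "course", "students", "will", "with", "on"}

def extract_keywords(description, stopwords=STOPWORDS):
    """Count the lowercase words of a course description, minus stop words."""
    words = re.findall(r"[a-z]+", description.lower())
    return Counter(w for w in words if w not in stopwords)

# Example description (made up for illustration).
keywords = extract_keywords(
    "Introduction to the design and analysis of algorithms and data structures."
)
```

The resulting counts could then be stored per course in the Keywords part of the database.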
Keywords (contd.)
Search
- Search by name, instructor, or content
- Clean up the search string
  - “cs125” becomes “CS 125”
  - “real-time” becomes “real time realtime”
- Split the search string into individual words and query the database for word matches
- Score and rank results by match frequencies and keyword informativeness scores
- Look at the distribution of scores and display the top results
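A rough sketch of the clean-up and scoring steps above, under stated assumptions. The regular expressions only reproduce the two examples on the slide; the in-memory `index` (course to word frequencies) and `informativeness` (word to weight) dictionaries are hypothetical stand-ins for the MySQL tables, and the score formula (frequency times weight, summed over query words) is one plausible reading of the slide rather than the project's confirmed method. The final step, choosing a cutoff from the score distribution, is not modelled here.

```python
import re

def clean_query(query):
    """Normalize a raw search string following the two slide examples."""
    # "cs125" -> "CS 125": split a department code glued to a course number.
    query = re.sub(r"\b([A-Za-z]{2,4})(\d{3})\b",
                   lambda m: m.group(1).upper() + " " + m.group(2), query)
    # "real-time" -> "real time realtime": keep both parts and the joined form.
    query = re.sub(r"\b(\w+)-(\w+)\b",
                   lambda m: f"{m.group(1)} {m.group(2)} {m.group(1)}{m.group(2)}",
                   query)
    return query

def rank_courses(query, index, informativeness):
    """Return (course, score) pairs sorted from best to worst match."""
    words = clean_query(query).lower().split()
    scores = {}
    for course, freqs in index.items():
        # Assumed scoring: match frequency weighted by keyword informativeness.
        score = sum(freqs.get(w, 0) * informativeness.get(w, 1.0) for w in words)
        if score > 0:
            scores[course] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data for illustration only.
index = {
    "CS 125": {"cs": 1, "125": 1, "programming": 2},
    "CS 431": {"real": 1, "time": 2, "systems": 1},
}
informativeness = {"real": 0.5, "time": 0.5, "realtime": 2.0, "programming": 1.5}
results = rank_courses("real-time", index, informativeness)
```

Expanding “real-time” into three tokens lets the query match descriptions that spell the term either way.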
Classification
- NBTree classifier
- Training set: 34 instances
- Test set: 38 instances
- Attributes: 17
- Metrics: Accuracy %, Precision, Recall, F-Measure - .947
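For reference, the precision, recall and F-measure named above follow the standard single-class definitions, sketched below. The counts in the example are hypothetical and are not the project's results; WEKA may also report per-class or weighted-average figures, and the slide does not say which was used.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical counts for illustration only.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```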
Clustering
- Cobweb clustering algorithm
- Instances: 20
- Attributes: 112
- Number of clusters: 17
- Incorrectly clustered instances: 7.0 (i.e. 35%)