
Automatic Syllabus Classification JCDL – Vancouver – 22 June 2007 Edward A. Fox (presenting co-author), Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Perez-Quinones, William Cameron, GuoFang Teng, and Lillian (“Boots”) Cassel

Why Study the Syllabus Genre? ► Educational resource ► Importance to the educational community  Educators  Students  Self-learners ► Thanks to NSF DUE grant (personalization support for NSDL)

Where to look for a specific syllabus? ► Non-standard publishing mechanisms:  Instructor’s website  CMSs (courseware management systems, e.g., Sakai)  Catalogs ► Limited access outside the university ► Search on the Web  Many non-relevant links in search results

Syllabus Library ► Bootstrapping  Identify true syllabi from search results  Store in a repository  Develop tools & applications ► Scaling up  Encourage contributions from educational communities

An Essential Step towards Syllabus Library: Classification ► Classification Objects:  Potential syllabi in Computer Science: search on the Web, using syllabus keywords, only in the educational domains ► Class Definition ► Feature Selection ► Model Selection ► Training and Testing

Four Classes ► Full Syllabus ► Partial Syllabus ► Entry Page ► Noise

Syllabus Components ► course code ► title ► class time & location ► offering institution ► teaching staff ► course description ► objectives ► web site ► prerequisite ► textbook ► grading policy ► schedule ► assignment ► exam ► resources

Features ► 84 Genre-specific Features  the occurrences of keywords  the positions of keywords, and  the co-occurrences of keywords and links ► A series of keywords for each syllabus component
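The three feature families above can be illustrated with a minimal Python sketch. The keyword list, the function name `genre_features`, and the regular expression for links are assumptions made for illustration; they are not the 84 features used in the study.

```python
import re

# Hypothetical keywords for a few syllabus components (not the study's actual list).
KEYWORDS = ["syllabus", "prerequisite", "textbook", "grading", "schedule"]
LINK_RE = re.compile(r"<a\s[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)

def genre_features(html):
    """Return occurrence, position, and keyword-in-link features for one page."""
    text = html.lower()
    link_texts = [m.lower() for m in LINK_RE.findall(html)]
    features = {}
    for kw in KEYWORDS:
        first = text.find(kw)
        # Occurrence: how many times the keyword appears.
        features["count_" + kw] = text.count(kw)
        # Position: relative offset of the first occurrence (1.0 if absent).
        features["pos_" + kw] = first / len(text) if first >= 0 else 1.0
        # Co-occurrence with links: number of anchor texts containing the keyword.
        features["link_" + kw] = sum(kw in t for t in link_texts)
    return features

page = '<h1>CS 101 Syllabus</h1><p>Textbook: TBA</p><a href="s.html">Schedule</a>'
feats = genre_features(page)
```

Each page becomes a fixed-length numeric vector, which is the form of input that both SVM and Naïve Bayes classifiers expect.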

Classification Models ► Discriminative Models  Support Vector Machines (SVM)  SMO-L: Sequential Minimal Optimization, accelerating the training process of SVM  SMO-P: SMO with a polynomial kernel ► Generative Models  Naïve Bayes (NB)  NB-K: Applying kernel methods to estimate the distribution of numeric attributes in NB modeling
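To illustrate what the NB-K variant changes, the sketch below contrasts a single-Gaussian estimate of a numeric attribute's class-conditional density with a Gaussian kernel density estimate. The sample values, the bandwidth of 1.0, and the function names are assumptions for illustration, not settings from the study.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def single_gaussian_density(x, values):
    """Plain NB: fit one Gaussian to all training values of the attribute."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return gaussian_pdf(x, mean, math.sqrt(var))

def kernel_density(x, values, bandwidth=1.0):
    """Kernel variant: average one Gaussian kernel centred on each training value."""
    return sum(gaussian_pdf(x, v, bandwidth) for v in values) / len(values)

# Hypothetical keyword counts for one class, clustered around 1 and around 9.
counts = [0, 1, 1, 2, 8, 9, 9, 10]
p_single = single_gaussian_density(5, counts)
p_kernel = kernel_density(5, counts)
```

On this bimodal sample the single Gaussian peaks at 5, where no training value lies, while the kernel estimate follows the two clusters. Whether that extra flexibility helps depends on the data.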

Evaluation ► Training corpus: 1020 out of the potential syllabi ► All in HTML, PDF, PostScript, or Text ► Manual tagging on the training corpus  Unanimous agreement by three co-authors ► Evaluation strategy: ten-fold cross validation ► Metrics: F1 (an overall measure of classification performance)
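As a minimal sketch of the evaluation setup, the code below computes F1 for one class and splits 1020 items into ten folds. The label strings and the round-robin fold assignment are assumptions; the slides do not say whether the folds were shuffled or stratified.

```python
def f1_score(true_labels, predicted_labels, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; each item is tested exactly once."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

folds = list(k_fold_indices(1020, k=10))
# Hypothetical labels for four documents.
truth = ["full", "full", "noise", "partial"]
guess = ["full", "noise", "noise", "full"]
score = f1_score(truth, guess, positive="full")
```

With 1020 documents, each fold holds out 102 for testing and trains on the remaining 918.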

Results with random set ► [Results table not captured in the transcript; the best items were marked with purple boxes.] ► Acc_tr: classification accuracy on the training set.

Results (Cont’d) ► On average, SVM outperforms NB on our syllabus classification task. ► All classifiers fail to identify the partial syllabus class. ► The kernel settings for NB do not help in the syllabus classification task. ► Classification accuracy on the training data is not that good.

Future Work ► Feature selection  Add general feature selection methods from text classification  e.g., Document Frequency, Information Gain, and Mutual Information  Hybrid: combine our genre-specific features with the general features
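Two of the general feature selection measures named above can be sketched in a few lines of Python. The toy corpus, the labels, and the function names are assumptions for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def document_frequency(term, docs):
    """Number of documents that contain the term."""
    return sum(term in doc for doc in docs)

def information_gain(term, docs, labels):
    """Reduction in class entropy from knowing whether the term is present."""
    with_term = [y for doc, y in zip(docs, labels) if term in doc]
    without = [y for doc, y in zip(docs, labels) if term not in doc]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_term, without) if part)
    return entropy(labels) - remainder

# Toy corpus: each document is a set of terms; labels are hypothetical.
docs = [{"syllabus", "grading"}, {"syllabus", "exam"}, {"news", "exam"}, {"news", "sports"}]
labels = ["syllabus", "syllabus", "noise", "noise"]
ig_syllabus = information_gain("syllabus", docs, labels)
ig_exam = information_gain("exam", docs, labels)
```

In this toy corpus "syllabus" separates the two classes perfectly, while "exam" appears equally often in both and carries no information, so a selector based on information gain would keep the former and drop the latter.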

Future Work (Cont’d) ► Syllabus Library  Welcome to  Share your favorite course resources – not limited to the syllabus genre. ► Information Extraction  Semantic search ► Personalization

Summary ► Towards a syllabus library  Starting from search results on the web  Classification of the search results for true syllabi ► SVM is a better choice for our syllabus classification task. ► Towards an educational on-line community around the syllabus library

Q & A