prerequisite chain learning and the introduction of LectureBank


How should I learn: prerequisite chain learning and the introduction of LectureBank
Irene Li, Alexander R. Fabbri, Robert R. Tung, and Dragomir Radev
27 Apr 2018, LILY Workshop

Introduction
Learning a new concept as a student. Example concept: Conditional Random Field.
Figure source: Prerequisite Relation Learning for Concepts in MOOCs

Related Works
RefD (Reference Distance) models the relationship between two concepts by measuring how differently they refer to each other. The measure is meant to reflect properties of prerequisite relations such as asymmetry and irreflexivity.
In the RefD formula, w(ci, A) weights the importance of concept ci to A, and r(ci, A) is an indicator showing whether ci refers to A. A reference can be a link in Wikipedia, a mention in a book, or a citation.
Liang, Chen, et al. "Measuring prerequisite relations among concepts." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015.
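The RefD formula itself did not survive in this transcript. The sketch below follows the definition as I understand it from Liang et al. (2015): RefD(A, B) is the weighted fraction of A's related concepts that refer to B, minus the weighted fraction of B's related concepts that refer to A. The toy link graph and the binary choices for w and r are assumptions made only for illustration.

```python
# Toy link graph (hypothetical): concept -> set of concepts its page links to.
links = {
    "CRF": {"HMM", "Viterbi"},
    "HMM": {"Markov chain", "Viterbi"},
    "Viterbi": {"HMM"},
    "Markov chain": set(),
}

def weight(c, concept):
    """w(c, concept): 1 if concept's page links to c, else 0 (binary weighting assumed)."""
    return 1.0 if c in links[concept] else 0.0

def refers(c, concept):
    """r(c, concept): 1 if c's page links to concept, else 0."""
    return 1.0 if concept in links[c] else 0.0

def ref_d(a, b, related):
    """Reference distance between concepts a and b over the candidate concepts in `related`."""
    def weighted_reference_rate(source, target):
        # Weighted share of source's related concepts that refer to target.
        total = sum(weight(c, source) for c in related)
        if total == 0:
            return 0.0
        hits = sum(refers(c, target) * weight(c, source) for c in related)
        return hits / total

    return weighted_reference_rate(a, b) - weighted_reference_rate(b, a)

concepts = list(links)
score_ab = ref_d("CRF", "HMM", concepts)
score_ba = ref_d("HMM", "CRF", concepts)
```

Swapping the two arguments flips the sign of the score, which is how the measure encodes the asymmetry mentioned on the slide.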

Approach
Step 1: concept representation
  Clustering methods, LDA topic modelling.
  Topic labelling: determine which topic each cluster corresponds to (keyword rank or TF-IDF).
Step 2: prerequisite relationship learning
  Binary classifiers: NB, LR, SVM, etc.
  Graph Convolutional Network (GCN): exciting
TutorialBank: rich content. Word embeddings and document embeddings.
LectureBank: keywords and phrases. Graph-based method.
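As a rough illustration of the TF-IDF option for topic labelling in Step 1, the sketch below ranks the terms of each cluster by TF-IDF and takes the top term as the label. It treats each cluster as one bag of tokens; the function name, the input layout and the toy clusters are assumptions, not the authors' pipeline.

```python
import math
from collections import Counter

def tfidf_labels(clusters, top_k=1):
    """Pick the top_k TF-IDF terms of each cluster as its label.

    clusters: dict mapping a cluster id to the list of tokens in that cluster.
    """
    num_clusters = len(clusters)
    # Number of clusters in which each term occurs at least once.
    cluster_freq = Counter()
    for tokens in clusters.values():
        cluster_freq.update(set(tokens))

    labels = {}
    for cluster_id, tokens in clusters.items():
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(num_clusters / cluster_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels[cluster_id] = ranked[:top_k]
    return labels

toy_clusters = {
    0: "the crf model labels the sequence crf".split(),
    1: "the lda topic model finds the topic".split(),
}
toy_labels = tfidf_labels(toy_clusters)
```

Terms that occur in every cluster, such as "the" and "model" here, get an IDF of zero and therefore never become labels.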

Preliminary Results
Gensim doc2vec on TutorialBank. Concatenate the two concept vectors directly.
210 pairs (70/30 train/test split)

Method               Precision  Recall  F1
Logistic Regression  0.720      0.758   0.739
Naive Bayes          0.593      0.5634  0.613
SVM                  0.624      0.569   0.596

TODO list:
  Two vectors: K-L divergence
  Language models: NLTK distribution (similarity metric)

NewAAN
Slide taxonomy (not in our taxonomy)
Use the current AAN as training data + Drago's lectures: classify
Info from adjacent slides + same presentation.
Pre-req: orders.
Harmonic functions (every slide is a node, define similarity, spectral method)
Label propagation
AAN Lectures. AAN Introduction
Semi-supervised learning
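The setup described on this slide (concatenate the two concept vectors, then split the pairs 70/30) can be sketched as follows. Plain Python lists stand in for the doc2vec vectors, and the helper names, the random seed and the dummy data are illustrative assumptions; the classifier itself is left out.

```python
import random

def build_pair_features(pairs, vectors):
    """Turn (concept_a, concept_b, label) triples into (feature_vector, label) examples.

    The feature vector is the two concept vectors concatenated in order, so
    (a, b) and (b, a) produce different features, which suits a directed relation.
    """
    return [(vectors[a] + vectors[b], label) for a, b, label in pairs]

def split_train_test(examples, train_fraction=0.7, seed=0):
    """Shuffle the examples reproducibly and split them into train and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Dummy data: 15 concepts with 4-dimensional vectors give 15 * 14 = 210 ordered pairs.
toy_vectors = {f"c{i}": [float(i)] * 4 for i in range(15)}
toy_pairs = [
    (a, b, int(a < b)) for a in toy_vectors for b in toy_vectors if a != b
]
examples = build_pair_features(toy_pairs, toy_vectors)
train_set, test_set = split_train_test(examples)
```

With 210 pairs this gives 147 training and 63 test examples, each with a feature vector twice the length of a single concept vector.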

LectureBank
Lecture notes from 26 courses
Total number of slides: 527
Total number of tokens: 997074
Total number of sentences: 354218
Total number of pages: 21344
Average tokens per page: 46.7145
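The reported average follows directly from the token and page totals. A minimal check, with variable names chosen only for illustration:

```python
# Corpus totals as reported on the slide.
total_tokens = 997074
total_pages = 21344

# Average number of tokens per slide page, rounded to four decimals.
average_tokens_per_page = round(total_tokens / total_pages, 4)
```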

Concept Extraction and annotations
Annotate prerequisite relations of concept pairs (~25%): sparse!
LectureBank concepts + TutorialBank concepts (~210 concepts)
LectureBank concept extraction: header section -> preprocess -> re-ranking -> list of concepts
Up to now: 998 concepts in total (210 from TutorialBank): a lot!
To annotate 25% of the pairs: ~166550 (?)
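The slide marks its own annotation estimate as uncertain. The sketch below does not try to reproduce that figure; it only shows how quickly the number of candidate pairs grows with the number of concepts, under two possible readings of "pair" (ordered, since prerequisite relations are directed, or unordered). The function name is hypothetical.

```python
def count_concept_pairs(num_concepts):
    """Return (ordered, unordered) counts of distinct concept pairs."""
    ordered = num_concepts * (num_concepts - 1)  # (A, B) and (B, A) counted separately
    unordered = ordered // 2                     # {A, B} counted once
    return ordered, unordered

ordered_998, unordered_998 = count_concept_pairs(998)
ordered_210, unordered_210 = count_concept_pairs(210)
```

Going from 210 to 998 concepts multiplies the number of candidate pairs by more than twenty, which is why annotating even a fraction of them is costly.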

Graph-based approach on LectureBank
Find out the topic of each slide page: Harmonic Functions, Label Propagation
[Figure: graph of slide pages, some with a known topic ("Other topic..") and some unknown ("?"); edges are marked "close" or "far" according to word-vector distance.]
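The harmonic-function idea can be illustrated with a small iterative sketch: every slide page is a node, labelled nodes keep their scores, and each unlabelled node is repeatedly set to the weighted average of its neighbours. This is a simplified iterative variant, not the closed-form solution of Zhu et al. (2003), and the similarity weights and the chain graph are made up for illustration.

```python
def harmonic_label_propagation(weights, labels, iterations=200):
    """Propagate scores from labelled nodes to unlabelled ones.

    weights: symmetric similarity matrix as a list of lists (weights[i][j] >= 0).
    labels:  dict mapping a labelled node index to its fixed score in [0, 1].
    Returns one score per node.
    """
    num_nodes = len(weights)
    scores = [labels.get(i, 0.5) for i in range(num_nodes)]
    for _ in range(iterations):
        updated = list(scores)
        for i in range(num_nodes):
            if i in labels:
                continue  # labelled nodes stay clamped to their known score
            total = sum(weights[i])
            if total == 0:
                continue  # isolated node: nothing to average over
            updated[i] = sum(w * s for w, s in zip(weights[i], scores)) / total
        scores = updated
    return scores

# Four slide pages in a chain: page 0 is on the topic (1.0), page 3 is not (0.0).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
chain_scores = harmonic_label_propagation(chain, {0: 1.0, 3: 0.0})
```

On this chain the two unlabelled pages settle at scores between those of their labelled neighbours, with the page nearer to the on-topic slide scoring higher.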

References
Schlichtkrull, Michael, et al. "Modeling Relational Data with Graph Convolutional Networks." arXiv preprint arXiv:1703.06103 (2017).
Moody, Christopher E. "Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec." arXiv preprint arXiv:1605.02019 (2016).
Pan, Liangming, et al. "Prerequisite Relation Learning for Concepts in MOOCs." Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017.
Zhu, Xiaojin, Zoubin Ghahramani, and John D. Lafferty. "Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions." Proceedings of the 20th International Conference on Machine Learning (ICML-03). 2003.

Thanks
irene.li@yale.edu
ireneli.eu