Concept-Based Analysis of Scientific Literature Chen-Tse Tsai, Gourab Kundu, Dan Roth UIUC.


Understanding Research Communities
Consider the following questions:
- What are the key applications studied by the community?
- Which applications have matured enough to be used as techniques in other applications?
- What methods were developed to solve a particular problem?
In this paper:
- Extract concepts from scientific papers. A concept is a cluster of possible mentions, e.g. {svm, support vector machines, maximal margin classifiers, …}.
- Analyze computational linguistic research by answering the questions above.

Outline
- Computational Approach
  - Concept Mention Extraction
  - Citation-Context Based Concept Clustering
- Evaluation of Algorithms
- Understanding Computational Linguistic Research

Concept Mention Extraction
Identify and categorize mentions of concepts (Gupta and Manning, 2011).
- Categories: TECHNIQUE and APPLICATION
  Example: “We apply support vector machines on text classification.”
- Unsupervised bootstrapping algorithm (Yarowsky, 1995; Collins and Singer, 1999)
The proposed algorithm:
1. Extract noun phrases (Punyakanok and Roth, 2001).
2. For each category, initialize a decision list with seeds.
3. For several rounds:
   1. Annotate NPs using the decision lists.
   2. Extract the top features from the newly annotated phrases and add them to the decision lists.
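The bootstrapping loop above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the feature templates (`word=`, `left=`, `right=`), the scoring rule (count in one category minus count in the other), the seed features, the second example sentence, and the `rounds` and `top_k` parameters are all assumptions made for the example. Noun phrase spans are given by hand rather than produced by a chunker.

```python
from collections import Counter, defaultdict

CATEGORIES = ("TECHNIQUE", "APPLICATION")


def np_features(tokens, start, end):
    """Features of the noun phrase tokens[start:end]: its words plus neighbours."""
    feats = {f"word={w}" for w in tokens[start:end]}
    if start > 0:
        feats.add(f"left={tokens[start - 1]}")
    if end < len(tokens):
        feats.add(f"right={tokens[end]}")
    return feats


def bootstrap(phrases, seeds, rounds=3, top_k=2):
    """phrases: one feature set per noun phrase; seeds: category -> seed features."""
    decision_lists = {c: set(seeds[c]) for c in CATEGORIES}
    labels = {}
    for _ in range(rounds):
        # Step 3.1: annotate NPs that match the decision list of exactly one category.
        labels = {}
        for i, feats in enumerate(phrases):
            matched = [c for c in CATEGORIES if feats & decision_lists[c]]
            if len(matched) == 1:
                labels[i] = matched[0]
        # Step 3.2: count features of the annotated NPs per category.
        counts = defaultdict(Counter)
        for i, c in labels.items():
            counts[c].update(phrases[i])
        known = set().union(*decision_lists.values())
        for c in CATEGORIES:
            other = sum((counts[o] for o in CATEGORIES if o != c), Counter())
            # Assumed scoring: frequency in this category minus the other one.
            scored = sorted(
                ((counts[c][f] - other[f], f) for f in counts[c] if f not in known),
                key=lambda pair: (-pair[0], pair[1]),
            )
            decision_lists[c].update(f for score, f in scored[:top_k] if score > 0)
    return decision_lists, labels


# Hand-chunked toy input; the second sentence is made up for illustration.
s1 = "we apply support vector machines on text classification".split()
s2 = "we use support vector machines for parsing".split()
phrases = [np_features(s1, 2, 5), np_features(s1, 6, 8),
           np_features(s2, 2, 5), np_features(s2, 6, 7)]
seeds = {"TECHNIQUE": {"left=apply"}, "APPLICATION": {"left=on"}}
decision_lists, labels = bootstrap(phrases, seeds)
```

In this toy run the second occurrence of "support vector machines" has no seed feature in its context, but it is picked up once word features learned from the first occurrence enter the TECHNIQUE decision list.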

Citation-Context Based Concept Clustering (CitClus)
Cluster mentions into semantically coherent concepts:
1. Group concept mentions by citation context.
2. Merge clusters based on lexical similarity between the mentions in the clusters.
[Figure: four example papers whose text mentions "support vector machine", "c4.5", "svm-based classification", "decision_trees", "maximal_margin_classifiers", and "svm" next to the citations (Cortes, 1995), (Vapnik, 1995), and (Quinlan, 1993)]
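A rough sketch of the two CitClus steps, assuming that "citation context" means the set of citations appearing near a mention and that lexical similarity is token-level Jaccard overlap. Both choices, the `threshold` value, and the pairing of mentions with citations in the toy data are assumptions for illustration, not details taken from the slides.

```python
from itertools import combinations


def find(parent, x):
    """Return the cluster representative of x (union-find with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def jaccard(a, b):
    """Token overlap between two mentions; hyphens and underscores split tokens."""
    ta = set(a.replace("-", " ").replace("_", " ").split())
    tb = set(b.replace("-", " ").replace("_", " ").split())
    return len(ta & tb) / len(ta | tb)


def cit_clus(mention_citations, threshold=0.3):
    """mention_citations: mention -> set of citations seen in its context."""
    mentions = sorted(mention_citations)
    parent = {m: m for m in mentions}
    # Step 1: mentions that share a cited paper go into the same cluster.
    first_seen = {}
    for m in mentions:
        for cite in mention_citations[m]:
            if cite in first_seen:
                union(parent, first_seen[cite], m)
            else:
                first_seen[cite] = m
    # Step 2: merge clusters that contain lexically similar mentions.
    for a, b in combinations(mentions, 2):
        if jaccard(a, b) >= threshold:
            union(parent, a, b)
    clusters = {}
    for m in mentions:
        clusters.setdefault(find(parent, m), set()).add(m)
    return sorted(sorted(c) for c in clusters.values())


# Toy data: mention/citation pairings are assumed, not read off the figure.
toy = {
    "support vector machine": {"Cortes, 1995"},
    "svm": {"Cortes, 1995", "Vapnik, 1995"},
    "maximal margin classifiers": {"Vapnik, 1995"},
    "svm-based classification": set(),
    "c4.5": {"Quinlan, 1993"},
    "decision trees": {"Quinlan, 1993"},
}
clusters = cit_clus(toy)
```

Here "svm-based classification" has no citation in its context, so only the lexical merge in step 2 attaches it to the SVM cluster.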


Evaluation of Mention Extraction
- Corpus: ACL Anthology Network Corpus (Radev et al., 2009)
- Training data: 11,005 abstracts
- Test data: 474 abstracts (Gupta and Manning, 2011)
[Table: Pre., Rec., and F1 for Technique and Application; rows: GM, Our approach]

Evaluation of Concept Clustering
Manually cluster the extracted mentions from 1,000 full-text papers.
- CitClus: the proposed approach
- LexClus: groups the concept mentions by lexical similarity
CitClus groups:
- “maximal entropy classifier” and “logistic classifier”
- “topic modeling” and “latent dirichlet allocation”
[Table: results on Technique and Application for LexClus and CitClus]


Trends Analysis
[Figure: trend curves from CitClus, LexClus, and LDA, annotated with "The emergence of SVM" and "The emergence of Topic modeling"]
Topic modeling is high in the 90's because LDA cannot generate a tight enough cluster for a specific concept.

Predictive Quality
For a concept, predict the number of papers in a year given the number of papers in the previous three years.
- Linear regression over every three consecutive years
- The better the mentions are grouped into coherent concepts, the more stable the trend graph is.
[Table: results for LexClus and CitClus on SVM, Decision Tree, Topic Modeling, and Sentiment Analysis]
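One way to read the regression setup is sketched below: every window of three consecutive yearly counts is a training example whose target is the following year's count, and a least-squares model with a bias term is fit over all windows. The exact formulation used in the paper is not given on the slide, so this is an assumption, and the yearly counts are synthetic (generated by a simple recurrence so that the fit is exact), not real paper counts.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def windows(counts, size=3):
    """Turn a yearly count series into (bias + previous `size` years, next year) pairs."""
    X = [[1.0] + [float(v) for v in counts[i:i + size]]
         for i in range(len(counts) - size)]
    y = [float(counts[i + size]) for i in range(len(counts) - size)]
    return X, y


def fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    d = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)


def predict(w, previous_years):
    return w[0] + sum(wi * v for wi, v in zip(w[1:], previous_years))


# Synthetic series following count[t] = count[t-1] + count[t-3].
counts = [1, 2, 4, 5, 7, 11, 16, 23, 34, 50]
X, y = windows(counts)
w = fit(X, y)
next_year = predict(w, counts[-3:])
```

With real, noisy counts the fit would not be exact, and the prediction error per concept could then be compared between clustering methods.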

Relations Between Concept Categories
For a given concept, calculate the ratio between the number of application mentions and the number of technique mentions.
Three concepts in the ACL community:
- Support vector machines, machine translation, POS tagging
[Figure: three panels: SVM (#app/#tech), MT (#tech/#app), POS tagging (#tech/#app)]
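The ratio can be computed directly from categorized mentions. The sketch below assumes mentions arrive as `(year, concept, category)` tuples and that the ratio is tracked per year; the tuple layout, the per-year grouping, and the toy data are assumptions for illustration.

```python
from collections import Counter


def category_ratio(mentions, concept,
                   numerator="APPLICATION", denominator="TECHNIQUE"):
    """Return {year: #numerator mentions / #denominator mentions} for one concept."""
    per_year = {}
    for year, name, category in mentions:
        if name == concept:
            per_year.setdefault(year, Counter())[category] += 1
    # Years with no denominator mentions are skipped to avoid division by zero.
    return {year: c[numerator] / c[denominator]
            for year, c in sorted(per_year.items()) if c[denominator] > 0}


# Made-up mentions of one concept in two years.
mentions = (
    [(2000, "svm", "TECHNIQUE")] * 4 + [(2000, "svm", "APPLICATION")] * 1
    + [(2005, "svm", "TECHNIQUE")] * 2 + [(2005, "svm", "APPLICATION")] * 1
)
svm_ratio = category_ratio(mentions, "svm")
```

Swapping `numerator` and `denominator` gives the #tech/#app ratio used for the machine translation and POS tagging panels.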

Relations Between Concept Categories
For a given application, which techniques have been applied to it?
[Figure: techniques applied to machine translation and named entity recognition; labels: "Phrase-based and MERT", "Decision Tree", "Decision Tree disappears", "CRF"]

Conclusion
This work proposed algorithms for identifying, categorizing, and clustering mentions of scientific concepts. These tools can provide a fairly deep understanding of, and useful insight into, research communities.