Published by Lenard Tucker. Modified over 9 years ago.
This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20155.

Consider the following questions:
What are the key applications studied by the community?
What applications have matured enough to be used as a technique in other applications?
What methods were developed to solve a particular problem?

In this paper:
We extract concepts (Techniques & Applications) from scientific papers, where a concept is a cluster of possible mentions (e.g., {svm, support vector machines, maximal margin classifiers, …}).
We analyze computational linguistics research by answering the aforementioned questions.

Identify and categorize mentions of concepts (Gupta and Manning, 2011)
Categories: TECHNIQUE and APPLICATION
Example: “We apply support vector machines on text classification.”
Unsupervised bootstrapping algorithm (Yarowsky, 1995; Collins and Singer, 1999)
The proposed algorithm:
1. Extract noun phrases (Punyakanok and Roth, 2001).
2. For each category, initialize a decision list with seeds.
3. For several rounds,
i. Annotate NPs using the decision list.
ii. Extract top features from the newly annotated phrases, and add them to the decision list.

Cluster mentions into semantically coherent concepts
1. Group concept mentions that share a citation context.
2. Merge clusters based on lexical similarity between the mentions in the clusters to form the final clustering.

This paper studies the importance of identifying and categorizing scientific concepts as a way to achieve a deeper understanding of the research literature of a scientific community. To reach this goal, we propose an unsupervised bootstrapping algorithm for identifying and categorizing mentions of concepts. We then propose a new clustering algorithm that uses the context of citations to cluster the extracted mentions into coherent concepts. Our evaluation of the algorithms against gold standards shows significant improvement over state-of-the-art results.
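The bootstrapping steps listed above (seed a decision list per category, annotate noun phrases, then grow the list from the newly annotated phrases) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the decision list is reduced to an unweighted set of features, the features are just the words of the phrase plus a few context words, and the function name `bootstrap`, the parameter `top_k`, and the toy data are all assumptions made for the example.

```python
from collections import Counter

def bootstrap(phrases, seeds, rounds=3, top_k=2):
    """phrases: list of (noun_phrase, context_words) pairs.
    seeds: dict mapping each category to a set of seed features.
    Returns (labels, decision_list). Hypothetical simplification:
    a 'decision list' here is an unweighted set of features."""
    decision_list = {cat: set(feats) for cat, feats in seeds.items()}
    labels = {}

    def features(np_text, context):
        return set(np_text.lower().split()) | set(context)

    for _ in range(rounds):
        # Step i: annotate NPs using the current decision list.
        for np_text, context in phrases:
            feats = features(np_text, context)
            scores = {cat: len(feats & dl) for cat, dl in decision_list.items()}
            best = max(scores, key=scores.get)
            # Label only when exactly one category has the top, non-zero score.
            unique = list(scores.values()).count(scores[best]) == 1
            if scores[best] > 0 and unique:
                labels[np_text] = best
        # Step ii: add the top features of newly annotated phrases.
        for cat in decision_list:
            counts = Counter()
            for np_text, context in phrases:
                if labels.get(np_text) == cat:
                    counts.update(features(np_text, context))
            others = set().union(
                *(dl for c, dl in decision_list.items() if c != cat)
            )
            # Sort by count, then alphabetically, so ties are deterministic.
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            new = [f for f, _ in ranked
                   if f not in decision_list[cat] and f not in others]
            decision_list[cat].update(new[:top_k])
    return labels, decision_list

# Toy data (invented for illustration): each NP with one context word.
phrases = [
    ("support vector machines", ["apply"]),
    ("text classification", ["on"]),
    ("conditional random fields", ["apply"]),
    ("named entity recognition", ["on"]),
]
seeds = {"TECHNIQUE": {"machines"}, "APPLICATION": {"classification"}}
labels, decision_list = bootstrap(phrases, seeds)
```

In this toy run the seeds label only the first two phrases in round one; the context words learned from them let later rounds label the remaining two.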
More importantly, we analyze the computational linguistics literature using the proposed algorithms and show four different ways to summarize and understand the research community that are difficult to obtain using existing techniques.

ACL Anthology Network Corpus (Radev et al., 2009)
Training data: 11,005 abstracts
Test data: 474 abstracts (Gupta and Manning, 2011)

Approach        Technique                Application
                Pre.    Rec.    F1       Pre.    Rec.    F1
GM 2011         30.5    46.7    36.9     27.6    57.5    37.3
Our approach    48.2    48.8    48.5     44.0    47.3    45.6

Manually cluster the extracted mentions from 1000 full-text papers.
LexClus: group the concept mentions by lexical similarity.
CitClus groups “maximal entropy classifier” with “logistic classifier”, and “topic modeling” with “latent dirichlet allocation”.

Approach    Technique    Application
LexClus     1.72         1.62
CitClus     1.28         1.49

For a given concept, calculate the ratio between the number of application mentions and the number of technique mentions.
Three concepts in the ACL community:
SVM always serves as a technique, because # technique mentions >> # application mentions.
Machine Translation is an important application, since # application mentions >> # technique mentions.
The rise of POS tagging indicates its maturity (a shift from an application to a technique).
Plots: SVM, #app/#tech; Machine Translation, #tech/#app; POS tagging, #tech/#app

For a given application, what techniques have been applied to it, and how do they change over time?
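The application-to-technique ratio described above is simple to compute once each mention has a year, a concept, and a category. The sketch below assumes mentions arrive as (year, concept, category) tuples; that record layout, the function name `app_tech_ratio_by_year`, and the toy counts are assumptions for illustration, not data from the poster.

```python
from collections import Counter, defaultdict

def app_tech_ratio_by_year(mentions, concept):
    """Return {year: #application mentions / #technique mentions}
    for one concept. The ratio is None for years with no technique
    mentions, to avoid dividing by zero."""
    counts = defaultdict(Counter)
    for year, name, category in mentions:
        if name == concept:
            counts[year][category] += 1
    ratios = {}
    for year in sorted(counts):
        app = counts[year]["APPLICATION"]
        tech = counts[year]["TECHNIQUE"]
        ratios[year] = app / tech if tech else None
    return ratios

# Toy mentions (invented): POS tagging moves from application to technique.
mentions = (
    [(1995, "pos tagging", "APPLICATION")] * 3
    + [(1995, "pos tagging", "TECHNIQUE")] * 1
    + [(2005, "pos tagging", "APPLICATION")] * 1
    + [(2005, "pos tagging", "TECHNIQUE")] * 4
    + [(2005, "svm", "TECHNIQUE")] * 5
)
ratios = app_tech_ratio_by_year(mentions, "pos tagging")
```

A ratio that falls over the years, as in the toy POS tagging data, corresponds to the shift from application to technique noted above; the inverse ratio (#tech/#app) is obtained by swapping the two counts.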
Plot trends of 4 concepts in the ACL community and compare the trends obtained from 3 different clustering algorithms:
CitClus: the proposed citation-context-based clustering.
LexClus: uses only lexical similarity for clustering, so it cannot group all possible expressions of a given concept.
LDA: the curve of topic modeling is already high in the 90's, because LDA cannot generate clusters tight enough to represent specific concepts.
Plots: SVM; Topic modeling

This work proposed algorithms for identifying, categorizing and clustering mentions of scientific concepts. These tools can provide a deeper understanding of, and useful insight into, research communities.

For a concept, predict the number of papers in a year, given the number of papers in the previous three years.
Linear regression over every three consecutive years.
Relative errors:

Approach    SVM     Decision Tree    Topic Modeling    Sentiment Analysis
LexClus     0.97    0.83             0.73              0.48
CitClus     0.52    0.37                               0.46

The better the grouping of mentions into coherent concepts, the more stable the trend graph.

Plots: Machine translation; Named entity recognition
Plot annotations: the rise of Phrase-based and MERT; Decision Tree was a popular method; CRF
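The trend-stability check described above (predict a year's paper count from the previous three years by linear regression, then measure the relative error) can be sketched as below. The poster does not say how the per-year errors are aggregated, so averaging them is an assumption, as are the function names and the toy series.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_relative_error(counts):
    """counts: papers per year for one concept, in chronological order.
    For each year with three predecessors, fit a line to those three
    counts, extrapolate one year ahead, and record |pred - actual| / actual.
    Returns the mean error (assumed aggregation), or None if no year
    can be scored."""
    errors = []
    for i in range(3, len(counts)):
        slope, intercept = fit_line([0, 1, 2], counts[i - 3:i])
        predicted = slope * 3 + intercept
        if counts[i]:  # skip years with zero papers to avoid dividing by zero
            errors.append(abs(predicted - counts[i]) / counts[i])
    return sum(errors) / len(errors) if errors else None

# Toy series (invented): a smooth trend versus a jumpy one.
smooth = mean_relative_error([10, 20, 30, 40, 50])
jumpy = mean_relative_error([10, 30, 10, 40])
```

A smooth series is predicted almost exactly, while a jumpy one yields a large error, which matches the reading that lower relative error indicates a more coherent grouping of mentions.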