This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20155.

• Consider the following questions:
  • What are the key applications studied by the community?
  • Which applications have matured enough to be used as techniques in other applications?
  • What methods were developed to solve a particular problem?
• In this paper:
  • We extract concepts (techniques and applications) from scientific papers, where a concept is a cluster of possible mentions, e.g., {svm, support vector machines, maximal margin classifiers, …}.
  • We analyze computational linguistics research by answering the questions above.

• Identify and categorize mentions of concepts (Gupta and Manning, 2011)
  • Categories: TECHNIQUE and APPLICATION. In "We apply support vector machines on text classification.", "support vector machines" is a TECHNIQUE mention and "text classification" is an APPLICATION mention.
  • Unsupervised bootstrapping algorithm (Yarowsky, 1995; Collins and Singer, 1999)
  • The proposed algorithm:
    1. Extract noun phrases (Punyakanok and Roth, 2001).
    2. For each category, initialize a decision list with seeds.
    3. For several rounds:
       i. Annotate NPs using the decision list.
       ii. Extract the top features from the newly annotated phrases and add them to the decision list.
• Cluster mentions into semantically coherent concepts
    1. Group concept mentions that share a citation context.
    2. Merge clusters based on lexical similarity between the mentions in the clusters to form the final clustering.

This paper studies the importance of identifying and categorizing scientific concepts as a way to achieve a deeper understanding of the research literature of a scientific community. To reach this goal, we propose an unsupervised bootstrapping algorithm for identifying and categorizing mentions of concepts. We then propose a new clustering algorithm that uses citation contexts to cluster the extracted mentions into coherent concepts.
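The bootstrapping loop (seed a decision list, annotate noun phrases, harvest the top features, repeat) can be illustrated with a minimal sketch. Noun-phrase extraction is assumed to have happened upstream. The feature set (phrase words plus the word to the left), the rule that a phrase is labeled only when all matching rules agree, the count-based feature ranking, and the names `phrase_features` and `bootstrap` are all assumptions made for illustration; the poster does not specify these details.

```python
from collections import Counter


def phrase_features(phrase, left_word):
    """Assumed feature set: each word of the phrase plus the word to its left."""
    feats = {"w=" + w for w in phrase.lower().split()}
    feats.add("left=" + left_word.lower())
    return sorted(feats)  # sorted so that repeated runs behave identically


def bootstrap(noun_phrases, seeds, rounds=3, top_k=3):
    """noun_phrases: list of (phrase, left_word); seeds: {category: seed features}."""
    # Step 2: a decision list (feature -> category), initialised with the seeds.
    decision_list = {f: cat for cat, feats in seeds.items() for f in feats}
    labels = {}  # index of a noun phrase -> assigned category
    for _ in range(rounds):  # Step 3: several rounds
        newly_labeled = []
        # Step 3.i: annotate still-unlabeled noun phrases with the decision list.
        for idx, (phrase, left) in enumerate(noun_phrases):
            if idx in labels:
                continue
            votes = {decision_list[f] for f in phrase_features(phrase, left)
                     if f in decision_list}
            if len(votes) == 1:  # assumption: label only if matching rules agree
                labels[idx] = votes.pop()
                newly_labeled.append(idx)
        if not newly_labeled:
            break
        # Step 3.ii: count unseen features of the newly annotated phrases per
        # category and add the top_k most frequent ones to the decision list.
        counts = {cat: Counter() for cat in seeds}
        for idx in newly_labeled:
            phrase, left = noun_phrases[idx]
            counts[labels[idx]].update(
                f for f in phrase_features(phrase, left) if f not in decision_list)
        for cat, counter in counts.items():
            for feat, _ in counter.most_common(top_k):
                decision_list.setdefault(feat, cat)
    return labels, decision_list


# Toy input, invented for illustration: (noun phrase, word to its left).
noun_phrases = [
    ("support vector machines", "apply"),
    ("text classification", "on"),
    ("conditional random fields", "apply"),
    ("sentiment classification", "for"),
    ("decision trees", "use"),
]
seeds = {"TECHNIQUE": {"w=machines"}, "APPLICATION": {"w=classification"}}
labels, rules = bootstrap(noun_phrases, seeds)
```

In this toy run, "conditional random fields" matches no seed, but it is picked up in the second round because the context feature `left=apply` was learned from "support vector machines". "decision trees" never matches a rule and stays unlabeled.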
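The two-step clustering (group mentions that share a citation context, then merge on lexical similarity) can be sketched as follows. This is a hedged illustration, not the authors' implementation: token-level Jaccard similarity, the 0.5 threshold, single-link merging, and the citation keys in the toy data are assumptions.

```python
from collections import defaultdict


def jaccard(a, b):
    """Token-level Jaccard similarity between two mention strings (assumed measure)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_mentions(mentions, threshold=0.5):
    """mentions: list of (mention_text, citation_key or None)."""
    texts = sorted({text for text, _ in mentions})
    parent = {t: t for t in texts}

    # Step 1: mentions that occur in the context of the same citation
    # are placed in the same cluster.
    first_for_citation = {}
    for text, citation in mentions:
        if citation is None:
            continue
        if citation in first_for_citation:
            parent[find(parent, text)] = find(parent, first_for_citation[citation])
        else:
            first_for_citation[citation] = text

    # Step 2: merge clusters that contain a pair of lexically similar mentions
    # (single-link merging, an assumption of this sketch).
    for i, a in enumerate(texts):
        for b in texts[i + 1:]:
            if jaccard(a, b) >= threshold:
                parent[find(parent, a)] = find(parent, b)

    clusters = defaultdict(set)
    for t in texts:
        clusters[find(parent, t)].add(t)
    return sorted(sorted(c) for c in clusters.values())


# Toy mentions with hypothetical citation keys.
mentions = [
    ("support vector machines", "Vapnik1995"),
    ("svm", "Vapnik1995"),
    ("support vector machine", None),
    ("latent dirichlet allocation", "Blei2003"),
    ("topic modeling", "Blei2003"),
    ("topic models", None),
]
concepts = cluster_mentions(mentions)
```

Here "svm" joins "support vector machines" only through the shared citation, since the two strings have no tokens in common. "topic models" stays separate because its token overlap with "topic modeling" is below the assumed threshold.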
Our evaluation of the algorithms against gold standards shows significant improvement over state-of-the-art results. More importantly, we analyze the computational linguistics literature using the proposed algorithms and show four different ways to summarize and understand the research community that are difficult to obtain using existing techniques.

• ACL Anthology Network Corpus (Radev et al., 2009)
  • Training data: 11,005 abstracts
  • Test data: 474 abstracts (Gupta and Manning, 2011)
[Table: precision (Pre.), recall (Rec.), and F1 for Technique and Application mentions; rows: GM, Our approach; values not preserved.]

• Manually cluster the extracted mentions from 1,000 full-text papers.
  • LexClus: groups the concept mentions by lexical similarity.
  • CitClus groups:
    • "maximal entropy classifier" and "logistic classifier"
    • "topic modeling" and "latent dirichlet allocation"
[Table: columns Technique, Application; rows: LexClus, CitClus; values not preserved.]

• For a given concept, calculate the ratio between the number of application mentions and the number of technique mentions.
• Three concepts in the ACL community:
  • SVM always serves as a technique, because # technique mentions >> # application mentions.
  • Machine translation is an important application, since # application mentions >> # technique mentions.
  • The rise of POS tagging indicates its maturity (a shift from an application to a technique).
[Figure panels: SVM, #app/#tech; Machine Translation, #tech/#app; POS tagging, #tech/#app.]

• For a given application, what techniques have been applied to it, and how does this change over time?
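The ratio between application and technique mentions of a concept can be computed with a short sketch like the one below. Grouping by year, the tuple layout, the function name `app_tech_ratio`, and the toy counts are assumptions made for illustration; none of the numbers come from the poster.

```python
from collections import Counter


def app_tech_ratio(mentions, concept):
    """mentions: iterable of (year, concept, category).

    Returns {year: #application mentions / #technique mentions} for one concept.
    """
    counts = Counter((year, cat) for year, c, cat in mentions if c == concept)
    ratios = {}
    for year in sorted({year for year, _ in counts}):
        app = counts[(year, "APPLICATION")]
        tech = counts[(year, "TECHNIQUE")]
        # Assumption: report infinity when a year has no technique mentions.
        ratios[year] = app / tech if tech else float("inf")
    return ratios


# Invented toy data: POS tagging is mostly an application early on and
# mostly a technique later.
mentions = (
    [(1995, "pos tagging", "APPLICATION")] * 3
    + [(1995, "pos tagging", "TECHNIQUE")] * 1
    + [(2005, "pos tagging", "APPLICATION")] * 1
    + [(2005, "pos tagging", "TECHNIQUE")] * 3
    + [(2005, "svm", "TECHNIQUE")] * 5
)
ratios = app_tech_ratio(mentions, "pos tagging")
```

A ratio that falls below 1 over time, as in this toy series, is the kind of shift from application to technique that the POS tagging bullet describes.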
[Figure panels: Machine translation; Named entity recognition. Annotations: the rise of phrase-based methods and MERT; decision trees were a popular method; CRF.]

• Plot trends of 4 concepts in the ACL community and compare the trends obtained from 3 different clustering algorithms:
  • CitClus: the proposed citation-context-based clustering.
  • LexClus: clustering by lexical similarity alone cannot group all possible expressions of a given concept.
  • LDA: the curve for topic modeling is already high in the 1990s, because LDA cannot generate clusters tight enough to represent specific concepts.
[Figure panels: SVM; Topic modeling.]

• This work proposed algorithms for identifying, categorizing, and clustering mentions of scientific concepts.
• These tools can provide a rather deep understanding of, and useful insight into, research communities.

• For a concept, predict the number of papers in a year, given the number of papers in each of the previous three years.
  • Linear regression over every three consecutive years
  • Relative errors
  • The better the grouping of mentions into coherent concepts, the more stable the trend graph.
[Table: columns SVM, Decision Tree, Topic Modeling, Sentiment Analysis; rows: LexClus, CitClus; values not preserved.]
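The trend-stability check (fit a line to three consecutive years, predict the fourth, measure the relative error) can be sketched as below. This is one plausible reading of the bullets, not a confirmed reproduction: averaging the relative errors over all windows, the function names, and the toy counts are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_relative_error(counts):
    """counts: papers per year for one concept, in chronological order.

    Needs at least four years, and every predicted year must have a
    non-zero count so that the relative error is defined.
    """
    errors = []
    for i in range(len(counts) - 3):
        # Fit on three consecutive years (x = 0, 1, 2), predict the next (x = 3).
        slope, intercept = fit_line([0, 1, 2], counts[i:i + 3])
        predicted = slope * 3 + intercept
        actual = counts[i + 3]
        errors.append(abs(predicted - actual) / actual)
    return sum(errors) / len(errors)


# Invented toy series: a smooth trend and an erratic one.
smooth = [10, 12, 14, 16, 18]
erratic = [10, 30, 5, 40, 8]
smooth_error = mean_relative_error(smooth)
erratic_error = mean_relative_error(erratic)
```

A lower mean relative error corresponds to a more stable trend graph, which is how the bullets relate clustering quality to trend stability.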