NJUST @ CLSciSumm-2018


Shutian Ma; Heng Zhang; Jin Xu; Chengzhi Zhang
Department of Information Management, Nanjing University of Science and Technology, Nanjing, China, 210094

What to submit

For Task 1A: runs from multi-classifier voting systems, plus another 10 runs based on single classifiers.
New classifier: XGBOOST, an efficient and scalable implementation of the gradient boosting framework.

Task Framework

Task 1A
Feature selection: fixed features + selected features. Iteratively evaluate a candidate subset of the selected feature set.
Classifiers: Decision tree (DT), Logistic regression (LG), SVM (Linear, RBF), XGBOOST.
Combination: multi-classifiers + voting systems.

Voting weights of the precision-, recall-, and F1-oriented 3-classifier voting systems

Classifier    Precision-oriented    Recall-oriented    F1-oriented
SVM (RBF)     0.3116                0.2192             0.2565
DT            0.2617                0.5236             0.4233
LG            0.4268                0.2572             0.3202

[Figures: average F1 when using the precision-oriented, recall-oriented, and F1-oriented voting systems, each with 3 classifiers and with 4 classifiers]

Task 1B
Strategies: Manual dictionary; LLDA; XGBOOST; Manual dictionary + LLDA; POS dictionary + LLDA.
LLDA: Assume each identified facet is a topic label and that each citation sentence is a mixture of the expert-assigned topics that can be learned.
+LLDA strategy: Use dictionary-labeled testing data to be testing data for prediction.
Dictionary strategy: Find the best order of judging facets.
POS dictionary: words whose POS tags are VB or JJ and that meet a frequency threshold.
[Table: examples when utilizing rules to expand labeled citation text]

Task 2
Group sentences into clusters based on their similarity with different parts of the abstract. Extract sentences from each cluster based on their scores and combine them into the summary.
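The Task 2 procedure (score sentences, then take sentences from each cluster) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the weights follow the Score_i formula given on the poster, while the input layout (`index`, `cluster`, `features`), the assumption that each feature score is already computed and scaled to [0, 1], and the choice of one sentence per cluster are all hypothetical.

```python
# Weights taken from the poster's Score_i formula.
WEIGHTS = {
    "jaccard": 2.5,
    "idf": 2.5,
    "tfidf": 2.5,
    "length": 1.25,
    "position": 1.25,
}


def sentence_score(features):
    """Weighted sum of the five per-sentence feature scores."""
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


def build_summary(sentences, per_cluster=1):
    """Pick the top-scoring sentence(s) in each cluster, in document order.

    `sentences` is a list of dicts with the hypothetical keys
    "index" (position in the paper), "cluster" and "features".
    """
    by_cluster = {}
    for sent in sentences:
        by_cluster.setdefault(sent["cluster"], []).append(sent)

    chosen = []
    for members in by_cluster.values():
        ranked = sorted(
            members,
            key=lambda s: sentence_score(s["features"]),
            reverse=True,
        )
        chosen.extend(ranked[:per_cluster])

    # Return sentence indices so the summary keeps the original order.
    return sorted(s["index"] for s in chosen)


# Made-up feature values, for illustration only.
sentences = [
    {"index": 0, "cluster": 0,
     "features": {"jaccard": 0.4, "idf": 0.2, "tfidf": 0.2, "length": 0.8, "position": 1.0}},
    {"index": 1, "cluster": 0,
     "features": {"jaccard": 0.2, "idf": 0.2, "tfidf": 0.2, "length": 0.4, "position": 0.4}},
    {"index": 2, "cluster": 1,
     "features": {"jaccard": 0.6, "idf": 0.4, "tfidf": 0.2, "length": 0.4, "position": 0.0}},
    {"index": 3, "cluster": 1,
     "features": {"jaccard": 0.2, "idf": 0.2, "tfidf": 0.4, "length": 0.8, "position": 0.8}},
]

selected = build_summary(sentences)
print(selected)  # prints [0, 3]
```

Returning indices rather than text keeps the sketch independent of how sentences are stored; how the clusters themselves are formed from the abstract is left out because the poster does not specify it.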
Task 1A

[Figures: average F1 when #Negative/#Positive is 1, 2, 3, 4, 5, and 6]

Voting weights of the precision-, recall-, and F1-oriented 4-classifier voting systems

Classifier      Precision-oriented    Recall-oriented    F1-oriented
SVM (Linear)    0.2160                0.0699             0.0954
SVM (RBF)       0.2443                0.2039             0.2320
DT              0.2051                0.4870             0.3829
LG              0.3346                0.2392             0.2897

Task 1B

For Task 1B, we select the specific order according to F1. For the manual dictionary strategy, we choose the top 3. For the LLDA strategy, we pick the top 4. For the XGBOOST strategy, we also select the top 4.

Task 2

People write summaries starting with some fixed phrases, such as "this paper", "in this paper" or "we". Meanwhile, the last sentence is usually about results or conclusions.

$$\mathrm{Score}_i = 2.5\,S_{\mathrm{Jaccard}} + 2.5\,S_{\mathrm{IDF}} + 2.5\,S_{\mathrm{TFIDF}} + 1.25\,S_{\mathrm{Length}} + 1.25\,S_{\mathrm{Position}}$$

Acknowledgement

Deepest gratitude goes to my greatest teammate.

Contact Information

Shutian Ma: mashutian0608@hotmail.com
Heng Zhang: 525696532@qq.com
Jin Xu: xujin@njust.edu.cn
Chengzhi Zhang: zhangcz@njust.edu.cn

Tools

Porter Stemmer
Gensim: Word2Vec and Doc2Vec models
Scikit-learn: classifiers
LDA and XGBOOST are applied with Python packages
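As a rough illustration of how per-classifier voting weights like those reported for Task 1A might be applied, the sketch below combines binary predictions from four classifiers. The weights are the precision-oriented 4-classifier values from the poster; the decision rule (label a sentence positive when the weighted support exceeds half of the total weight) and the prediction data are assumptions, since the poster does not state how the votes are combined.

```python
# Precision-oriented 4-classifier voting weights from the poster.
PRECISION_ORIENTED_4 = {
    "SVM (Linear)": 0.2160,
    "SVM (RBF)": 0.2443,
    "DT": 0.2051,
    "LG": 0.3346,
}


def weighted_vote(predictions, weights, threshold=0.5):
    """Combine binary predictions (1 = positive) by weighted voting.

    `predictions` maps a classifier name to its list of 0/1 labels.
    A sample is labelled positive when the summed weight of the
    classifiers voting 1 exceeds `threshold` times the total weight.
    The threshold rule is an assumption, not taken from the poster.
    """
    total = sum(weights[name] for name in predictions)
    n_samples = len(next(iter(predictions.values())))
    labels = []
    for i in range(n_samples):
        support = sum(
            weights[name]
            for name, preds in predictions.items()
            if preds[i] == 1
        )
        labels.append(1 if support > threshold * total else 0)
    return labels


# Made-up predictions for three candidate sentences.
predictions = {
    "SVM (Linear)": [1, 0, 0],
    "SVM (RBF)": [1, 0, 1],
    "DT": [0, 1, 1],
    "LG": [1, 0, 0],
}

print(weighted_vote(predictions, PRECISION_ORIENTED_4))  # prints [1, 0, 0]
```

In the third sample two of the four classifiers vote positive, but their combined weight (0.4494) stays below half of the total, so the weighted vote is negative where an unweighted tie would be ambiguous.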