The Computational Linguistics Summarization Pilot, TAC 2014
Kokil Jaidka †, Muthu Kumar Chandrasekaran *‡, Min-Yen Kan *‡, Ankur Khanna ‡
† Nanyang Technological University
* Dept. of Computer Science, National University of Singapore
‡ Web IR / NLP Group, National University of Singapore

Scientific Document Summarization
"I have an abstract. I am done!"
(Photo credits: Dennis Jarvis)
TAC Biomedsumm Track - The Computational Linguistics Pilot Task, 06 October 2015

Outline
- Citation-based extractive summaries
- Faceted summaries
- Automatic literature review
- CL development corpus
- Annotation
- TAC 2015: CL-Summ track
- Acknowledgements

Scientific Document Summarization: growth in the number of publications

Scientific Document Summarization
- Abstracts: the authors' own summary.
- Citation summaries: the scientific community summarizes a research paper whenever it cites the paper, but…
- Faceted summaries: capture all aspects of a paper.

Citation summary & facets
(Image credits: Ken Ammi)

Faceted summaries
Structured abstracts are common in the medicine, biomedicine and bioinformatics domains.

Facets & argumentative zones

Scientific Document Summarization: citation-based extractive summaries
- Scope of citation:
  Qazvinian, V., & Radev, D. R. "Identifying non-explicit citing sentences for citation-based summarization" (ACL 2010)
  Abu-Jbara, A., & Radev, D. R. "Reference scope identification in citing sentences" (ACL 2012)
- Coherence:
  Abu-Jbara, A., & Radev, D. R. "Coherent citation-based summarization of scientific papers" (ACL 2011)
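One simple way to turn citation links into an extractive summary is to rank reference-paper sentences by how many citances point to them. The sketch below illustrates that idea only. It is not the method of the papers listed above. The input format (one set of cited sentence indices per citance, for example the output of a span-identification step) and the function name `citation_summary` are hypothetical.

```python
from collections import Counter


def citation_summary(rp_sentences, citance_links, k=3):
    """Pick the k reference-paper (RP) sentences cited most often.

    rp_sentences: list of RP sentences, addressed by position.
    citance_links: hypothetical input, one set of cited RP sentence
        indices per citance.
    """
    counts = Counter()
    for linked in citance_links:
        counts.update(set(linked))  # count each sentence at most once per citance
    # Rank by citation count; break ties by earlier position in the RP.
    top = sorted(counts, key=lambda i: (-counts[i], i))[:k]
    # Emit the selected sentences in original document order for readability.
    return [rp_sentences[i] for i in sorted(top)]


rp = ["S0", "S1", "S2", "S3"]
links = [{1, 2}, {2}, {3}, {2, 3}]
print(citation_summary(rp, links, k=2))  # → ['S2', 'S3']
```

Restoring document order after ranking is a design choice. It keeps the extract closer to the RP's own flow, and coherence is the concern the last reference above addresses.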

Scientific Document Summarization & Automatic Literature Review

Free to access at:

SciSumm Corpus
- 10 reference papers (topics) randomly sampled from the ACL ARC corpus.
- Up to 10 citing papers per reference paper, including citing papers from outside the ACL ARC.

Annotation pipeline
[Figure: pipeline diagram for automatic summarization of scientific documents]
1. OCR & section parsing: ParsCit's SectLabel module
2. Annotation
3. Post-processing to Biomedsumm format:
   - Scripts from U. Colorado (Prabha)
   - Sentence-segmented version from U. Mich (Rahul)

Annotating the SciSumm corpus
- 3 annotators in all.
- The released data has one gold-standard annotation per topic (reference paper).
- The discourse facets differ slightly from Biomedsumm's categories.

Tasks
Task 1A: For each citance, identify the spans of text (cited text spans) in the reference paper (RP) that most accurately reflect the citance.
[Figure: citing papers pointing to the Reference Paper (RP); the citing text is called a citance.]
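A common unsupervised baseline for a task like 1A scores every RP sentence by its TF-IDF cosine similarity to the citance and returns the top-scoring sentences. The sketch below is a generic illustration under that assumption. It is not the pilot's reference system or any participant's method. The function names, the smoothed IDF formula, and the `top_n` cutoff are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; "phrase-based" becomes ["phrase", "based"].
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(sentences):
    """Smoothed IDF over the RP sentences; unseen terms get the largest weight."""
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(tokenize(s)))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def vectorize(text, idf):
    tf = Counter(tokenize(text))
    return {t: c * idf(t) for t, c in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_cited_spans(citance, rp_sentences, top_n=2):
    """Return indices of the top_n RP sentences most similar to the citance."""
    idf = build_idf(rp_sentences)
    query = vectorize(citance, idf)
    scored = [(cosine(query, vectorize(s, idf)), i) for i, s in enumerate(rp_sentences)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:top_n] if score > 0]


rp = [
    "We propose a phrase-based translation model.",
    "Our parser uses a dependency grammar.",
    "The translation model improves BLEU scores.",
]
citance = "They introduced a phrase-based translation model that improves BLEU."
print(rank_cited_spans(citance, rp))  # → [0, 2]
```

IDF is computed over the RP's own sentences because the candidate spans come from that document. A real system would likely also handle stop words and multi-sentence spans.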

Tasks
Task 1B: For each cited text span, identify which facet of the paper it belongs to, from a predefined set of facets.
[Figure: mark the cited text in the Reference Paper (RP) and provide its facet; the citing text in the citing papers is called a citance.]
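Task 1B is a classification problem over a fixed label set. As a deliberately naive illustration, a cited text span can be assigned the facet whose cue phrases it mentions most often. This is an assumption for illustration, not the approach used in the pilot. The facet labels and cue lists in `FACET_CUES` are hypothetical placeholders and do not reproduce the pilot's predefined facet set.

```python
import re

# Hypothetical facet labels and cue phrases, for illustration only; the
# pilot's predefined facet set is not reproduced here.
FACET_CUES = {
    "Aim": ["we propose", "we present", "this paper", "our goal", "aim"],
    "Method": ["algorithm", "we use", "model", "approach", "trained"],
    "Results": ["accuracy", "outperform", "improve", "f-score", "results"],
    "Implication": ["suggest", "implies", "therefore", "future work"],
}


def classify_facet(span):
    """Return the facet whose cue phrases occur most often in span, or None."""
    text = span.lower()
    scores = {
        # \b anchors each cue at a word start, so "aim" does not fire inside
        # "claim", while "outperform" still matches "outperforms".
        facet: sum(len(re.findall(r"\b" + re.escape(cue), text)) for cue in cues)
        for facet, cues in FACET_CUES.items()
    }
    best = max(scores, key=scores.get)  # ties go to the earlier facet in FACET_CUES
    return best if scores[best] > 0 else None


print(classify_facet("Our system outperforms the baseline and improves accuracy."))  # → Results
```

A trained classifier would replace the cue lists with learned features. Returning `None` rather than a default label keeps spans that match no cue visible for manual review.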

Evaluation
- Small corpus: 10-fold cross-validated evaluation over the 10 documents.
- Task 1A scored by overlap with citances.
- Task 1B scored by overlap with reference text spans.
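Suppose both system output and gold annotation are expressed as sets of reference-paper sentence IDs. Overlap scoring then reduces to set precision, recall and F1. With 10 topics, 10-fold cross-validation means each fold holds out exactly one topic. The sketch below shows both pieces. The function names and the sentence-ID representation are hypothetical. The pilot's official scorer may measure overlap differently, for example at the word or character level.

```python
def overlap_prf(predicted, gold):
    """Set-overlap precision, recall and F1 between predicted and gold sentence IDs."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def k_fold_splits(topics, k):
    """Yield (train, test) topic lists; with k == len(topics) this is leave-one-out."""
    folds = [topics[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, test


topics = [f"T{i:02d}" for i in range(10)]  # hypothetical topic IDs
for train, test in k_fold_splits(topics, k=10):
    print(test, len(train))  # one held-out topic, nine training topics
print(overlap_prf([1, 2, 3], [2, 3, 4]))  # each score is 2/3
```

Averaging `overlap_prf` over the held-out topic of every fold gives one cross-validated score per system.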

Task & evaluation: highlights
- The first corpus in CL that incorporates prior research findings on citation-based summaries.
- 10 teams from 5 countries participated in the evaluation.

Limitations
- No gold-standard summaries yet.
- OCR errors: we hope to have corrected them manually.
- But mainly, we need more annotated data!

TAC 2015: CL-Summ shared task
We plan to roll out a full-fledged official shared task for the CL corpus:
- 20 training topics
- 10 test topics
- 3 annotations per summary

TAC 2015: We need your help!
We seek support from
- the summarization community in general, and
- the CL community in particular
to provide manpower for annotating the corpus. It would be great to have all participating teams contribute!

Acknowledgements
Hoa Dang, NIST
Lucy Vanderwende, MSR
All Biomedsumm track participants.
This research is partially supported by CSIDM.
Questions? Thank you!