The Computational Linguistics Summarization Pilot TAC 2014 Kokil Jaidka †, Muthu Kumar Chandrasekaran* ‡, Min-Yen Kan* ‡, Ankur Khanna ‡ Nanyang.

The Computational Linguistics Summarization Pilot Task @ TAC 2014
Kokil Jaidka †, Muthu Kumar Chandrasekaran* ‡, Min-Yen Kan* ‡, Ankur Khanna ‡
† Nanyang Technological University; * Dept. of Computer Science, National University of Singapore; ‡ Web, IR / NLP Group, National University of Singapore

Scientific Document Summarization: "I have an abstract. I am done!" (Photo credits: Dennis Jarvis @flickr) TAC Biomedsumm Track - The Computational Linguistics Pilot Task, 06 October 2015

Outline: Citation-based extractive summaries; Facetted summaries; Automatic literature review; CL development corpus; Annotation; TAC 2015: CL-Summ track; Acknowledgements

Scientific Document Summarization: Growth in the number of publications.

Scientific Document Summarization: Abstracts: the authors' own summary. Citation summaries: the scientific community creates summaries of research papers whenever it cites them, but… Facetted summaries: capture all aspects of a paper.

Citation summary & facets (Image credits: Ken Ammi @flickr)

Facetted summaries: Structured abstracts are common in the medicine, biomedicine and bioinformatics domains.

Facets & argumentative zones

Scientific Document Summarization: Citation-based extractive summaries. Scope of citation: Qazvinian, V., & Radev, D. R. "Identifying non-explicit citing sentences for citation-based summarization" (ACL 2010); Abu-Jbara, A., & Radev, D. R. "Reference scope identification in citing sentences" (ACL 2012). Coherence: Abu-Jbara, A., & Radev, D. R. "Coherent citation-based summarization of scientific papers" (ACL 2011).

Scientific Document Summarization & Automatic Literature Review

Free to access at: http://acl-arc.comp.nus.edu.sg/

SciSumm corpus: 10 reference papers (topics) randomly sampled from the ACL ARC corpus, with up to 10 citing papers per reference paper, including citing papers from outside the ACL ARC.
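As a rough illustration of the corpus layout described above, the sketch below models a topic as one reference paper plus a capped list of citing papers, and draws the reference papers with a seeded random sample. The `Topic` class, the `sample_topics` function and the paper-ID strings are hypothetical; they are not the released corpus's actual file format.

```python
import random
from dataclasses import dataclass, field

MAX_CITING = 10  # slide: "up to 10 citing papers per reference paper"


@dataclass
class Topic:
    """Hypothetical model of one SciSumm topic: a reference paper (RP) and its citing papers."""
    reference_paper: str
    # Citing papers may come from outside the ACL ARC, so IDs are not checked against the pool.
    citing_papers: list[str] = field(default_factory=list)

    def add_citing(self, paper_id: str) -> None:
        # Enforce the cap on citing papers per reference paper.
        if len(self.citing_papers) >= MAX_CITING:
            raise ValueError(
                f"topic {self.reference_paper} already has {MAX_CITING} citing papers"
            )
        self.citing_papers.append(paper_id)


def sample_topics(arc_ids: list[str], n: int = 10, seed: int = 0) -> list[Topic]:
    """Randomly sample n distinct reference papers from a pool of (hypothetical) ACL ARC IDs."""
    rng = random.Random(seed)
    return [Topic(rp) for rp in rng.sample(arc_ids, n)]
```

A fixed seed keeps the sample reproducible, which matters when the same topics must be shared with every participating team.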

Annotation pipeline: OCR and section parsing with ParsCit's SectLabel module; annotation; post-processing to the Biomedsumm format using (1) scripts from U. Colorado (Prabha) and (2) a sentence-segmented version from U. Mich (Rahul).

Annotating the SciSumm corpus: 3 annotators in all. The released data has one gold-standard annotation per topic (reference paper). The discourse facet categories differ slightly from Biomedsumm's.

Tasks: The citing text in a citing paper is called a citance. Task 1A: For each citance, identify the spans of text (cited text spans) in the Reference Paper (RP) that most accurately reflect the citance.
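One simple way to approach Task 1A, offered here only as a hypothetical baseline and not as any organizer's or participant's method, is to rank the RP's sentences by word overlap (Jaccard similarity) with the citance and return the indices of the top-k sentences. The function names and the crude regex tokenizer are assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cited_text_spans(citance: str, rp_sentences: list[str], k: int = 1) -> list[int]:
    """Hypothetical baseline: indices of the k RP sentences most similar to the citance."""
    c = tokens(citance)
    scores = [(jaccard(c, tokens(s)), i) for i, s in enumerate(rp_sentences)]
    # Highest score first; ties go to the earlier sentence in the RP.
    ranked = sorted(scores, key=lambda si: (-si[0], si[1]))
    return [i for _, i in ranked[:k]]


if __name__ == "__main__":
    rp = [
        "We introduce a new parsing algorithm.",
        "The parser runs in cubic time.",
        "Results improve on the Penn Treebank.",
    ]
    print(cited_text_spans("Their parser runs in cubic time [3].", rp))  # → [1]
```

Pure lexical overlap misses paraphrased citances, which is exactly why the citation-scope work cited earlier matters; the sketch only shows the shape of the input and output.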

Tasks: Task 1B: For each cited text span, identify which facet of the paper it belongs to, from a predefined set of facets. In the Reference Paper (RP), mark the cited text and provide its facet; in the citing papers, the citing text is called a citance.
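Task 1B is a classification problem over a fixed label set. The toy classifier below assigns a span the facet whose cue words it contains most often. The facet names and cue-word lists are invented for illustration (the predefined facet set is not listed here), so treat this as a hypothetical baseline only.

```python
import re

# Hypothetical facet labels and cue words, for illustration only.
FACET_CUES = {
    "Aim": {"aim", "goal", "propose", "address"},
    "Method": {"algorithm", "approach", "method", "model", "use", "using"},
    "Results": {"accuracy", "improve", "improves", "outperform", "outperforms", "results"},
    "Implication": {"suggest", "suggests", "implies", "therefore"},
}
DEFAULT_FACET = "Method"  # arbitrary fallback when no cue word matches


def classify_facet(span: str) -> str:
    """Return the facet with the most cue-word hits; ties go to the earlier facet in FACET_CUES."""
    words = re.findall(r"[a-z]+", span.lower())
    counts = {facet: sum(w in cues for w in words) for facet, cues in FACET_CUES.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else DEFAULT_FACET
```

For example, "Our results improve accuracy by 2 points." is labelled `Results`, while a span with no cue words falls back to `DEFAULT_FACET`.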

Evaluation: Small corpus, so 10-fold cross-validated evaluation over the 10 documents. Task 1A scored by overlap with citances. Task 1B scored by overlap with reference text spans.
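With 10 topics, 10-fold cross-validation amounts to leave-one-topic-out evaluation. The sketch below generates those folds and computes set-overlap precision, recall and F1 between predicted and gold sentence IDs. The exact overlap measure is not defined here, so the sentence-ID set formulation and the function names `loo_folds` and `overlap_prf` are assumptions.

```python
from typing import Iterator


def loo_folds(topics: list[str]) -> Iterator[tuple[list[str], list[str]]]:
    """Yield (train, test) splits; with 10 topics this is 10-fold CV, one topic held out per fold."""
    for i, held_out in enumerate(topics):
        yield topics[:i] + topics[i + 1:], [held_out]


def overlap_prf(predicted: set[int], gold: set[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted sentence IDs against gold sentence IDs."""
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For instance, predicting sentences {1, 2, 3} against gold {2, 3, 4} gives precision, recall and F1 of 2/3 each; per-fold scores would then be averaged across the 10 folds.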

Task & evaluation: highlights. The first corpus in CL to incorporate prior research findings on citation-based summaries. 10 teams from 5 different countries participated in the evaluation.

Limitations: No gold-standard summaries yet. OCR errors: we hope to have corrected them manually. But mainly, we need more annotated data!

TAC 2015: CL-Summ shared task. Plans to roll out a full-fledged official shared task for the CL corpus: 20 training topics, 10 test topics, 3 annotations per summary.

TAC 2015: We need your help! We seek support from the summarization community in general, and the CL community in particular, to provide manpower for annotating the corpus. It would be great to have all participating teams contribute!

Acknowledgements: Hoa Dang, NIST; Lucy Vanderwende, MSR; all Biomedsumm track participants. This research is partially supported by CSIDM. Questions? Thank you!
