Published by Hilary Gibbs. Modified over 9 years ago.
1
The Computational Linguistics Summarization Pilot Task @ TAC 2014
Kokil Jaidka †, Muthu Kumar Chandrasekaran * ‡, Min-Yen Kan * ‡, Ankur Khanna ‡
† Nanyang Technological University
* Dept. of Computer Science, National University of Singapore
‡ Web, IR / NLP Group, National University of Singapore
2
Scientific Document Summarization
I have an abstract. I am done!
Photo credits: Dennis Jarvis @flickr
TAC Biomedsumm Track - The Computational Linguistics Pilot Task 06 October 2015
3
Outline
– Citation based extractive summaries
– Facetted summaries
– Automatic literature review
– CL development corpus
– Annotation
– TAC 2015: CL-Summ track
– Acknowledgements
4
Scientific Document Summarization: Growth in # publications.
5
Scientific Document Summarization
Abstracts
– Authors’ own summary.
Citation summary
– The scientific community creates summaries of research papers when citing them, but…
Facetted summaries
– Capture all aspects of a paper.
6
Citation summary & facets
Image credits: Ken Ammi @flickr
7
Facetted summaries
Structured abstract: common in Medicine, Biomed, Bioinformatics domains.
8
Facets & Argumentative zones
9
Scientific Document Summarization: citation based extractive summaries
Scope of citation
– Qazvinian, V., & Radev, D. R. “Identifying non-explicit citing sentences for citation-based summarization” (ACL, 2010)
– Abu-Jbara, A., & Radev, D. R. “Reference scope identification in citing sentences” (ACL, 2012)
Coherence
– Abu-Jbara, A., & Radev, D. R. “Coherent citation-based summarization of scientific papers” (ACL, 2011)
10
Scientific Document Summarization & Automatic Literature Review
11
Scientific Document Summarization & Automatic Literature Review
12
Free to access at: http://acl-arc.comp.nus.edu.sg/
13
SciSumm Corpus
10 reference papers (topics) randomly sampled from the ACL ARC corpus.
Up to 10 citing papers per reference paper, including those outside ACL ARC.
14
Annotation pipeline
[Figure: pipeline diagram; document labels "AUTOMATIC SUM" and "SCI DOC SUMM"]
Annotation!
Post-processing to Biomedsumm format:
1. Scripts from U. Colorado (Prabha)
2. Sentence-segmented version from U. Mich (Rahul)
OCR & section parse: ParsCit’s SectLabel module
15
Annotating the SciSumm corpus
3 annotators in all.
Released data has one gold standard annotation per topic or reference paper.
Discourse facet has a minor change from Biomedsumm’s categories.
16
Tasks
Task 1A: For each citance, identify the spans of text (cited text spans) in the RP that most accurately reflect the citance.
Diagram labels: Reference Paper (RP); citing papers. The citing text is called a citance.
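To make Task 1A concrete, here is a minimal baseline sketch, not any participating system's method. It assumes the RP is already split into sentences and ranks them by Jaccard token overlap with the citance. The function names, the `top_k` parameter, and the toy sentences are hypothetical and do not come from the corpus.

```python
import re


def tokenize(text):
    """Lowercase a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_cited_spans(citance, rp_sentences, top_k=2):
    """Return indices of the top_k RP sentences most similar to the citance."""
    cit_tokens = tokenize(citance)
    scored = [(jaccard(cit_tokens, tokenize(s)), i)
              for i, s in enumerate(rp_sentences)]
    # Highest similarity first; ties go to the earlier sentence.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_k]]


# Hypothetical toy data, not taken from the SciSumm corpus.
rp_sentences = [
    "We describe a parser for dependency structures.",
    "The parser is trained with a maximum entropy model.",
    "Results show improved accuracy on the test set.",
]
citance = "Smith (2005) trained their parser with a maximum entropy model."
best = rank_cited_spans(citance, rp_sentences, top_k=1)
```

Lexical overlap is only a starting point: citances often paraphrase the RP, so a real system would likely need stemming, weighting, or semantic similarity.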
17
Tasks
Task 1B: For each cited text span, identify what facet of the paper it belongs to, from a predefined set of facets.
Diagram labels: Reference Paper (RP): mark the cited text in the RP and provide its facet; citing papers. The citing text is called a citance.
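Task 1B is a classification problem over a fixed label set. The sketch below is a cue-word baseline under loud assumptions: the slides do not list the facet inventory, so the labels, the cue words in `FACET_CUES`, and the `DEFAULT_FACET` fallback are placeholders for illustration only.

```python
import re

# Placeholder facet labels and cue words: assumptions for illustration,
# not the pilot task's actual facet inventory.
FACET_CUES = {
    "method": {"algorithm", "approach", "model", "train", "trained", "method"},
    "results": {"accuracy", "results", "outperforms", "improvement", "score"},
    "hypothesis": {"hypothesize", "hypothesis", "assume", "expect"},
}
DEFAULT_FACET = "method"  # hypothetical fallback when no cue word matches


def classify_facet(span_text):
    """Pick the facet whose cue words overlap most with the cited text span."""
    tokens = set(re.findall(r"[a-z]+", span_text.lower()))
    counts = {facet: len(tokens & cues) for facet, cues in FACET_CUES.items()}
    best_facet = max(counts, key=counts.get)
    return best_facet if counts[best_facet] > 0 else DEFAULT_FACET


facet = classify_facet("Results show improved accuracy on the test set.")
```

A trained classifier over the annotated spans would replace the hand-written cue lists once enough labelled data is available.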
18
Evaluation
Small corpus: 10-fold cross-validated evaluation over the 10 documents.
Task 1A scored by overlap with citances.
Task 1B scored by overlap with reference text spans.
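The slide says only that scoring is "by overlap", so the exact metric is not given here. As a sketch, assume spans are sets of sentence IDs and overlap is reported as set-based precision, recall, and F1. With 10 documents, 10-fold cross-validation amounts to holding out one document per fold. The topic IDs and function names below are hypothetical.

```python
def overlap_scores(predicted, gold):
    """Precision, recall and F1 between two sets of sentence IDs."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def leave_one_out(topics):
    """Yield (train, held_out) splits, holding out one topic per fold."""
    for i, held_out in enumerate(topics):
        yield topics[:i] + topics[i + 1:], held_out


topics = [f"topic_{n:02d}" for n in range(1, 11)]  # hypothetical topic IDs
folds = list(leave_one_out(topics))
p, r, f1 = overlap_scores(predicted=[3, 4, 7], gold=[4, 7, 9, 10])
```

In this example two of the three predicted sentences are in the gold set of four, so precision is 2/3 and recall is 1/2.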
19
Task & evaluation: highlights
First corpus in the CL domain that incorporates prior research findings on citation based summaries.
10 teams from 5 different countries participated in the evaluation.
20
Limitations
No gold standard summaries yet.
OCR errors: we hope to have corrected them manually.
But mainly, we need more annotated data!
21
TAC 2015: CL-Summ shared task
Plans to roll out a full-fledged official shared task for the CL corpus.
20 training topics
10 test topics
3 annotations per summary.
22
TAC 2015: We need your help!
We seek support from
– the summarization community in general and
– the CL community in particular
to provide manpower for annotating the corpus.
Great to have all participating teams contribute!
23
Acknowledgements
Hoa Dang, NIST
Lucy Vanderwende, MSR
All Biomedsumm track participants.
This research is partially supported by CSIDM.
Questions? Thank you!