Using Social Networking Techniques in Text Mining Document Summarization
Using Social Networking Techniques in Summarization
Definition: Text document summarization is the task of extracting thematically or topically important sentences from one or more documents.
Points: The traditional summarization steps are:
1. Identify the signature terms.
2. Rank the sentences in the document or document set by their weight.
3. Choose the most highly ranked sentences.
Use of Social Networking Techniques: Social-networking-based techniques help in identifying signature terms and ranking the sentences, e.g.:
1. TextRank (Mihalcea and Tarau, 2004)
2. Degree centrality (Erkan and Radev, 2004)
3. LexRank with threshold (Erkan and Radev, 2004)
4. Continuous LexRank (Erkan and Radev, 2004)
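The three traditional steps can be sketched in a few lines of Python. This is only a minimal illustration, not a method taken from the cited papers: it assumes that signature terms are simply the most frequent non-stop-words and that a sentence's weight is the number of signature-term occurrences it contains. The names `STOP_WORDS`, `tokenize`, and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "and", "to", "on"}

def tokenize(sentence):
    """Lowercase a sentence and return its non-stop-word tokens."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]

def summarize(sentences, num_terms=3, num_sentences=1):
    # Step 1: treat the most frequent terms as signature terms (an assumption).
    counts = Counter(w for s in sentences for w in tokenize(s))
    signature = {term for term, _ in counts.most_common(num_terms)}
    # Step 2: weight each sentence by its signature-term occurrences.
    weights = [sum(1 for w in tokenize(s) if w in signature) for s in sentences]
    # Step 3: keep the highest-weighted sentences, restored to document order.
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]

docs = [
    "Graph ranking scores sentences in a graph.",
    "The weather was pleasant.",
    "Sentences with a high graph ranking enter the summary.",
]
summary = summarize(docs)
summary_two = summarize(docs, num_sentences=2)
```

The graph-based techniques listed above replace the simple frequency weighting in step 2 with a centrality score computed on a sentence graph.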
TextRank Figure: Representing text as a graph
TextRank
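As a rough illustration of how text becomes a weighted graph, the sketch below computes the sentence similarity that Mihalcea and Tarau (2004) use for edge weights: the number of words two sentences share, divided by the sum of the logarithms of their lengths. The function names, the whitespace tokenization, and the guards for empty or one-word sentences are assumptions made for this example.

```python
import math

def textrank_similarity(words_a, words_b):
    """Shared-word count normalized by log sentence lengths."""
    if not words_a or not words_b:
        return 0.0
    overlap = len(set(words_a) & set(words_b))
    norm = math.log(len(words_a)) + math.log(len(words_b))
    # Two one-word sentences give log(1) + log(1) = 0, so guard the division.
    return overlap / norm if norm > 0 else 0.0

def edge_weights(sentences):
    """Symmetric matrix of pairwise similarities; the diagonal stays 0."""
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = textrank_similarity(sentences[i], sentences[j])
    return weights

sentences = [
    "graph based ranking".split(),
    "graph ranking model works".split(),
    "completely unrelated words".split(),
]
weights = edge_weights(sentences)
```

Each sentence is a vertex and each nonzero entry of `weights` is an edge; a PageRank-style iteration over this matrix then yields the sentence scores.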
Degree centrality (Erkan and Radev, 2004)
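Degree centrality can be sketched as follows: two sentences are linked when their cosine similarity exceeds a threshold, and a sentence's score is its number of links. This is a simplified sketch under stated assumptions: it uses a plain term-frequency cosine, excludes self-links, and picks a threshold of 0.3 purely for illustration. The function names are hypothetical.

```python
import math
from collections import Counter

def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (raw term frequencies)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def degree_centrality(sentences, threshold=0.3):
    """Count, for each sentence, the other sentences more similar than the threshold."""
    n = len(sentences)
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(sentences[i], sentences[j]) > threshold:
                degrees[i] += 1
                degrees[j] += 1
    return degrees

sentences = [
    "graph methods rank sentences".split(),
    "sentences form a graph".split(),
    "rank the sentences".split(),
    "lunch was late".split(),
]
degrees = degree_centrality(sentences)
```

Here the first sentence is linked to two others and the unrelated last sentence to none. The threshold matters: lowering it to 0.2 would also link the second and third sentences.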
LexRank
Weighted LexRank
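Assuming "Weighted LexRank" refers to the continuous LexRank variant, in which edges carry cosine similarities instead of being thresholded to 0 or 1, the ranking can be sketched as a power iteration over p(u) = d/N + (1 - d) * sum over v of [sim(u, v) / sum over z of sim(z, v)] * p(v). The convention for d, its value of 0.15, the function name, and the example matrix are assumptions for illustration; the matrix is assumed symmetric with self-similarity 1 on the diagonal, so no row sums to zero.

```python
def continuous_lexrank(sim, damping=0.15, tol=1e-8, max_iter=200):
    """Power iteration for a similarity-weighted, PageRank-style sentence score."""
    n = len(sim)
    row_sums = [sum(row) for row in sim]
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new_p = []
        for u in range(n):
            # Score flowing into u: each v passes on a share proportional
            # to its similarity with u, normalized by v's total similarity.
            flow = sum(sim[v][u] / row_sums[v] * p[v] for v in range(n) if row_sums[v] > 0)
            new_p.append(damping / n + (1 - damping) * flow)
        delta = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p

# Hypothetical pairwise cosine similarities for four sentences.
sim = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.2, 0.1],
    [0.5, 0.2, 1.0, 0.1],
    [0.4, 0.1, 0.1, 1.0],
]
scores = continuous_lexrank(sim)
best = max(range(len(scores)), key=lambda i: scores[i])
```

The scores sum to 1, and the first sentence, which is the most similar to all the others, receives the highest score. Replacing each similarity by 1 or 0 according to a threshold would give the thresholded LexRank variant under the same iteration.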
Test Your Understanding
1. Point out the major differences between TextRank and Weighted LexRank. (Note: differentiate only between the ranking schemes.)
2. Can you prove the correctness of LexRank's ranking formula? (Hint: read or watch a video about the correctness of the PageRank-based equation.)
References
Mihalcea, Rada and Paul Tarau. TextRank: Bringing Order into Texts. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411.
Erkan, Günes and Dragomir R. Radev. LexPageRank: Prestige in Multi-Document Text Summarization. EMNLP 2004.