Topic Extraction From Turkish News Articles Anıl Armağan Fuat Basık Fatih Çalışır Arif Usta.

Slides:



Advertisements
Similar presentations
News and Blog Analysis with Lydia Steven Skiena Dept. of Computer Science SUNY Stony Brook
Advertisements

Seher Acer, Başak Çakar, Elif Demirli, Şadiye Kaptanoğlu.
2015 SLA IT Webinar Using Analytics to Understand Social Media Activity Michelle Chen School of Information San José State University February 4 th, 2015.
1 Multi-topic based Query-oriented Summarization Jie Tang *, Limin Yao #, and Dewei Chen * * Dept. of Computer Science and Technology Tsinghua University.
A review on “Answering Relationship Queries on the Web” Bhushan Pendharkar ASU ID
Final Project Presentation Name: Samer Al-Khateeb Instructor: Dr. Xiaowei Xu Class: Information Science Principal/ Theory (IFSC 7321) TOPIC MODELING FOR.
GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.
Web Document Clustering: A Feasibility Demonstration Hui Han CSE dept. PSU 10/15/01.
H YPERLINKING DIGITAL LIBRARIES ON THE WEB Juan Camilo Zapata ITEC – 810 Supervisor Robert Dale 1.
1 Ranked Queries over sources with Boolean Query Interfaces without Ranking Support Vagelis Hristidis, Florida International University Yuheng Hu, Arizona.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
TopicTrend By: Jovian Lin Discover Emerging and Novel Research Topics.
Siemens Big Data Analysis GROUP 3: MARIO MASSAD, MATTHEW TOSCHI, TYLER TRUONG.
Some studies on Vietnamese multi-document summarization and semantic relation extraction Laboratory of Data Mining & Knowledge Science 9/4/20151 Laboratory.
Introduction The large amount of traffic nowadays in Internet comes from social video streams. Internet Service Providers can significantly enhance local.
SaariStory: A framework to represent the medieval history of Saarland Michael Barz, Jonas Hempel, Cornelius Leidinger, Mainack Mondal Course supervisor:
MediaEval Workshop 2011 Pisa, Italy 1-2 September 2011.
Assisting cloze test making with a web application Ayako Hoshino ( 星野綾子 ) Hiroshi Nakagawa ( 中川裕志 ) University of Tokyo ( 東京大学 ) Society for Information.
CS 6604 Middle Term Report Computational Linguistics PJ -Explore Correlation between Newswires and Twitter by Tianyu Geng, Wei Huang, Ji Wang, and Xuan.
Custom driven scientific information extraction from digital libraries using integrated text mining services Betim Çiço, Adrian Besimi, Visar Shehu 14th.
1 Wikification CSE 6339 (Section 002) Abhijit Tendulkar.
Adaptive News Access Daniel Billsus Presented by Chirayu Wongchokprasitti.
KDD 2012, Beijing, China Community Discovery and Profiling with Social Messages Wenjun Zhou Hongxia Jin Yan Liu.
Concept-Based Feature Generation and Selection for Information Retrieval OFER EGOZI, EVGENIY GABRILOVICH AND SHAUL MARKOVITCH.
Automatic Detection of Tags for Political Blogs Khairun-nisa Hassanali Vasileios Hatzivassiloglou The University.
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford, CA 94305, USA ImprovingWord.
April 14, 2003Hang Cui, Ji-Rong Wen and Tat- Seng Chua 1 Hierarchical Indexing and Flexible Element Retrieval for Structured Document Hang Cui School of.
1 Linmei HU 1, Juanzi LI 1, Zhihui LI 2, Chao SHAO 1, and Zhixing LI 1 1 Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua.
Topic Modelling: Beyond Bag of Words By Hanna M. Wallach ICML 2006 Presented by Eric Wang, April 25 th 2008.
TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal.
Hidden Topic Markov Models Amit Gruber, Michal Rosen-Zvi and Yair Weiss in AISTATS 2007 Discussion led by Chunping Wang ECE, Duke University March 2, 2009.
Recommending Twitter Users to Follow Using Content and Collaborative Filtering Approaches John HannonJohn Hannon, Mike Bennett, Barry SmythBarry Smyth.
BioSumm A novel summarizer oriented to biological information Elena Baralis, Alessandro Fiori, Lorenzo Montrucchio Politecnico di Torino Introduction text.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Natural language processing tools Lê Đức Trọng 1.
1 Sentence Extraction-based Presentation Summarization Techniques and Evaluation Metrics Makoto Hirohata, Yousuke Shinnaka, Koji Iwano and Sadaoki Furui.
GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns Author : Stamatina Thomaidou, Konstantinos Leymonis, and Michalis Vazirgiannis.
Automatic Identification of Pro and Con Reasons in Online Reviews Soo-Min Kim and Eduard Hovy USC Information Sciences Institute Proceedings of the COLING/ACL.
Using Semantic Relations to Improve Passage Retrieval for Question Answering Tom Morton.
DOCUMENT UPDATE SUMMARIZATION USING INCREMENTAL HIERARCHICAL CLUSTERING CIKM’10 (DINGDING WANG, TAO LI) Advisor: Koh, Jia-Ling Presenter: Nonhlanhla Shongwe.
Evgeniy Gabrilovich and Shaul Markovitch
 Goal recap  Implementation  Experimental Results  Conclusion  Questions & Answers.
1 CSC 594 Topics in AI – Text Mining and Analytics Fall 2015/16 3. Word Association.
Multi-Abstraction Concern Localization Tien-Duy B. Le, Shaowei Wang, and David Lo School of Information Systems Singapore Management University 1.
Do Summaries Help? A Task-Based Evaluation of Multi-Document Summarization Kathleen McKeown, Rebecca J. Passonneau David K. Elson, Ani Nenkova, Julia Hirschberg.
Dynamic Multi-Faceted Topic Discovery in Twitter Date : 2013/11/27 Source : CIKM’13 Advisor : Dr.Jia-ling, Koh Speaker : Wei, Chang 1.
Link Distribution on Wikipedia [0407]KwangHee Park.
TRANS: T ransportation R esearch A nalysis using N LP Technique S Hyoungtae Cho, Melissa Egan, Ferhan Ture Final Presentation December 9, 2009.
LEARNING FROM THE PAST: ANSWERING NEW QUESTIONS WITH PAST ANSWERS Date: 2012/11/22 Author: Anna Shtok, Gideon Dror, Yoelle Maarek, Idan Szpektor Source:
A Multilingual Hierarchy Mapping Method Based on GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of.
Reading a Newspaper Article An Interactive Guide to Understanding.
Spam Detection Kingsley Okeke Nimrat Virk. Everyone hates spams!! Spam s, also known as junk s, are unwanted s sent to numerous recipients.
Project Deliverable-1 -Prof. Vincent Ng -Girish Ramachandran -Chen Chen -Jitendra Mohanty.
Text Similarity: an Alternative Way to Search MEDLINE James Lewis, Stephan Ossowski, Justin Hicks, Mounir Errami and Harold R. Garner Translational Research.
Link Distribution on Wikipedia [0422]KwangHee Park.
Collaborative Deep Learning for Recommender Systems
Making Sense of Large Volumes of Unstructured Responses K. M. P. N. Jayathilaka Department of Statistics University of Colombo.
Measuring Monolinguality
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Sentiment analysis algorithms and applications: A survey
Final Course Project Presentation
A Deep Learning Technical Paper Recommender System
Power of Social Media Analytics
Title of the presentation
Mining the Data Charu C. Aggarwal, ChengXiang Zhai
Multi-Dimensional Data Visualization
Team 7 → Final Presentation
Sentiment Analysis In Student Learning Experience By Obinna Obeleagu
Sentiment Analysis In Student Learning Experience By Obinna Obeleagu
Summarizing Use the following slides in order to organize your understanding of the article. After filling in the graphic organizer, then write your summary.
Presentation transcript:

Topic Extraction From Turkish News Articles Anıl Armağan Fuat Basık Fatih Çalışır Arif Usta

Agenda Introduction Motivation and Goal Topic Extraction and Extraction Based Summarization Defining the Most Important Sentence Work Done Future Work Conclusion

Introduction Increasing Volume of Online Data To be Up to Date Turkish News

Motivation and Goal Topic Extraction, News Summarization, Text Mining Getting Familiar with Text Mining Tools Turkish, as an Agglutinative language A novel system that summarizes Turkish News on daily basis

Topic Extraction and Extraction Based Summarization Summarization Techniques Extraction-Based Abstraction-Based Maximum Entropy Based Summarization Aided Summarization Extraction Based Summarization Topic Extraction LDA Top K Words

Defining Most Important Sentence In extraction based summarization: Combining the extracted topics as summary requires NLP. Therefore, we select the sentence, that represents the document best. Which one is the best?

Defining Most Important Sentence First Step: Find term based importance If the tf-idf value of a term represents importance of a term. Sum tf-idf values of terms in a sentence: Higher the summation, more important the sentence is. Second Step: More attack on sentences Sentences that are at the begining and at the end of documents, Sentences that contains numerical attributes, Are tend to be more important.

Defining Most Important Sentence Third Step: Eliminating junk terms Applying just first and second step, might return a sentence which is too long and all terms contained are junk. Therefore, we will find Top-K words. Eliminate words with respect to them. Apply first and second step after elimination. To find Top-K words: We applied LDA(Latent Dirichlet Allocation), found 100 topics For each topic we selected top 5 words In total we have top 500 words

Work Done Parse the data. Preprocess the data, apply stemming, stop word removal, typo fixing. Used Zemberek. Apply LDA and define top 500 words. Used MALLET.

Future Work Eliminate terms w.r.t top 500 words. Find tf-idf value of each term in the dataset. Find total sum of tf-idf values of terms for each sentence in each document. Define most important sentence in each document. Create a user interface.

Future Work

Conlusion Develop a Novel Summarization System of News Work on Turkish Data