CS 6604 Middle Term Report Computational Linguistics PJ -Explore Correlation between Newswires and Twitter by Tianyu Geng, Wei Huang, Ji Wang, and Xuan.

Slides:



Advertisements
Similar presentations
News and Blog Analysis with Lydia Steven Skiena Dept. of Computer Science SUNY Stony Brook
Advertisements

Product Review Summarization Ly Duy Khang. Outline 1.Motivation 2.Problem statement 3.Related works 4.Baseline 5.Discussion.
WEB MINING. Why IR ? Research & Fun
New Technologies Supporting Technical Intelligence Anthony Trippe, 221 st ACS National Meeting.
Integrated Digital Event Web Archive and Library (IDEAL) and Aid for Curators Archive-It Partner Meeting Montgomery, Alabama Mohamed Farag & Prashant Chandrasekar.
SEM II : Marketing Research
Comparing Twitter Summarization Algorithms for Multiple Post Summaries David Inouye and Jugal K. Kalita SocialCom May 10 Hyewon Lim.
Information | Analytics | Expertise SOCIAL MEDIA INTELLIGENCE Practical Strategies for Using Social Media to Enhance Security AUGUST 2014 © 2014 IHS IHS.
Topic Extraction From Turkish News Articles Anıl Armağan Fuat Basık Fatih Çalışır Arif Usta.
IVITA Workshop Summary Session 1: interactive text analytics (Session chair: Professor Huamin Qu) a) HARVEST: An Intelligent Visual Analytic Tool for the.
WEBQUEST Let’s Begin TITLE AUTHOR:. Let’s continue Return Home Introduction Task Process Conclusion Evaluation Teacher Page Credits This document should.
CIS392Semester Projects1 CIS392 Text Processing, Retrieval, and Mining Overview of Semester Projects.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
Forecasting with Twitter data Presented by : Thusitha Chandrapala MARTA ARIAS, ARGIMIRO ARRATIA, and RAMON XURIGUERA.
More than words: Social networks’ text mining for consumer brand sentiments A Case on Text Mining Key words: Sentiment analysis, SNS Mining Opinion Mining,
TwitterSearch : A Comparison of Microblog Search and Web Search
Some studies on Vietnamese multi-document summarization and semantic relation extraction Laboratory of Data Mining & Knowledge Science 9/4/20151 Laboratory.
Rui Yan, Yan Zhang Peking University
Web Archives, IDEAL, and PBL Overview Edward A. Fox Digital Library Research Laboratory Dept. of Computer Science Virginia Tech Blacksburg, VA, USA 21.
Introduction The large amount of traffic nowadays in Internet comes from social video streams. Internet Service Providers can significantly enhance local.
Emerging Topic Detection on Twitter (Cataldi et al., MDMKDD 2010) Padmini Srinivasan Computer Science Department Department of Management Sciences
THE NEW INTELLIGENT HOME CONTROL SYSTEM BASED ON THE DYNAMIC AND INTELLIGENT GATEWAY Beijing Key Laboratory of Intelligent Telecommunications Software.
Deriving Topics and Opinions from Microblogs Feng Jiang Supervisors: Jixue Liu & Jiuyong Li.
MediaEval Workshop 2011 Pisa, Italy 1-2 September 2011.
Put the Lesson Title Here A webquest for xth grade Designed by Put your You may include graphics, a movie, or sound to any of the slides. Introduction.
Automatic Detection of Tags for Political Blogs Khairun-nisa Hassanali and Vasileios Hatzivassiloglou Human Language Technology Research Institute The.
Introduction to Text and Web Mining. I. Text Mining is part of our lives.
University of Economics Prague Information Extraction (WP6) Martin Labský MedIEQ meeting Helsinki, 24th October 2006.
Automatic Detection of Tags for Political Blogs Khairun-nisa Hassanali Vasileios Hatzivassiloglou The University.
Comparative Text Mining Q. Mei, C. Liu, H. Su, A. Velivelli, B. Yu, C. Zhai DAIS The Database and Information Systems Laboratory. at The University of.
CPE 432: Computer Design Research Project.
Research Topics CSC Parallel Computing & Compilers CSC 3990.
MODEL ADAPTATION FOR PERSONALIZED OPINION ANALYSIS MOHAMMAD AL BONI KEIRA ZHOU.
Augmenting Focused Crawling using Search Engine Queries Wang Xuan 10th Nov 2006.
Politics and Social media: The Political Blogosphere and the 2004 U.S. election: Divided They Blog Crystal: Analyzing Predictive Opinions on the Web Swapna.
ICT-enabled Agricultural Science for Development Scenarios, Opportunities, Issues by ICTs transforming agricultural science, research & technology generation.
Shoveling tweets: An analysis of the microblogging engagement of traditional news organizations Marcus Messner Maureen Linke Asriel Eford School of Mass.
Using Linguistic Analysis and Classification Techniques to Identify Ingroup and Outgroup Messages in the Enron Corpus.
Event-Based Extractive Summarization E. Filatova and V. Hatzivassiloglou Department of Computer Science Columbia University (ACL 2004)
A RESEARCH SUPPORT SYSTEM FRAMEWORK FOR WEB DATA MINING Jin Xu, Yingping Huang, Gregory Madey Department of Computer Science and Engineering University.
How to Write an Abstract Gwendolyn MacNairn Computer Science Librarian.
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
2014 Lexicon-Based Sentiment Analysis Using the Most-Mentioned Word Tree Oct 10 th, 2014 Bo-Hyun Kim, Sr. Software Engineer With Lina Chen, Sr. Software.
1 Researchers : 10 altogether; 3 from Denmark, 2 from Finland, 2 from Norway, and 3 from Sweden. Scientific board: professor Antero Malin, Finland (project.
Alan P. Janssen, MSPH Health Communication Specialist National Centers for Immunization and Respiratory Diseases Centers for Disease Control and Prevention.
Communicating Crisis & Risk Messages that Motivate Resilience Brooke Liu, Ph.D. Associate Professor of Communication & Director of the Risk Communication.
Project Deliverable-1 -Prof. Vincent Ng -Girish Ramachandran -Chen Chen -Jitendra Mohanty.
Central Idea and Objective Summary. Central “Main” Idea and Detail Main Idea- the topic and controlling point of a paragraph; what the paragraph is about.
Using Social Media to Enhance Emergency Situation Awareness
Market Intelligence Analysis
Community Leaders Toolkit: Contents Preview
Memory Standardization
Wei Wei, PhD, Zhanglong Ji, PhD, Lucila Ohno-Machado, MD, PhD
中国计算机学会学科前沿讲习班:信息检索 Course Overview
Central Idea and Objective Summary
Computational Linguistics PJ
Restrict Range of Data Collection for Topic Trend Detection
The Team Ernesto Cortes Kipp Dunn Sar Gregorczyk Alex Schmidt
NRV Tweets Midterm Presentation VT CS4624, Blacksburg, VA
NETWORK-BASED MODEL OF LEARNING
Computational Imaging and Display Project Title
TDM=Text Mining “automated processing of large amounts of structured digital textual content for purposes of information retrieval, extraction, interpretation.
A Network Science Approach to Fake News Detection on Social Media
Effective Entity Recognition and Typing by Relation Phrase-Based Clustering
PURE Learning Plan Richard Lee, James Chen,.
Sentiment Analysis In Student Learning Experience By Obinna Obeleagu
Sentiment Analysis In Student Learning Experience By Obinna Obeleagu
CS565: Intelligent Systems and Interfaces
Put the Lesson Title Here
Yingze Wang and Shi-Kuo Chang University of Pittsburgh
Presentation transcript:

CS 6604 Middle Term Report Computational Linguistics PJ -Explore Correlation between Newswires and Twitter by Tianyu Geng, Wei Huang, Ji Wang, and Xuan Zhang March, 6th, 2014 Blacksburg, VA Client:Mohamed Magdy Farag

●Introduction ●Solution ●Progress Table of Contents

1. Summarize info in news and tweets 2. Explore correlation between news & tweets 3. Mine opinions in tweets Problem to Solve Mainstream NewsTweets Key points in news? Relation between news & tweets? Major attitudes of audience? Motivation: Much news, much tweets, little connection… Objects:

1.Fetch text from news & tweets respectively 2.Preprocess texts: stemming, stop-word… 3.Extract events from news Event: [Topic, Named entities(who, what, where, when)] 4.Map tweets to events (correlation model) 5.Mine major opinions around events Solution Overview

Who What When Where Topic Event 2 Solution: Link Tweets to Events Who What When Where Topic Event 1 Pre- processor LDA NER Pre- processor NER Correlation analyzer

Progress: Text Extraction

Progress: Tweets Analysis Topic 0 PKeyword 0.028nuclear 0.018weapons 0.015iranian 0.011stop 0.010menlu iran 0.008executions Topic 1 PKeyword 0.027iran 0.019time 0.015now 0.013opposition 0.012u 0.012feb 0.011seeking Topic 2 PKeyword 0.018iran 0.017via help 0.011killed 0.010sanctions 0.009syria 0.009guards Topic 3 PKeyword 0.054camp 0.053liberty 0.020never 0.015martin 0.014kobler 0.013mr 0.012adequate We ran LDA on a sample of the #Iran collection from IDEAL ●50,000 tweets ●Feb 13, :58:30 ~ Feb 15, :00:03 ●4 topics

Progress: News Analysis Iran Sanction Obama Who: Obama warn When: N/A Where: Iran Event 1 UN urge human Who: UN right What: N/A When: N/A Where: Iran Event 2 News Articles Pre- processor LDA NER HTML Parser Text Extractor Iran What: N/A Dataset: 1) 2762 news about “Iran Election”. --Only news titles used for topic modeling 2) News articles from CTRnet PJ Tools: GibbsLDA, Stanford NLP

Future Work: Opinion Mining Who What When Where Topics Event 2 Who What When Where Topics Event 1 Opinion 1 Opinion 2 … Opinion N Event 1 Opinion 1 Opinion 2 … Opinion M Event 2 Opinion Mining

Appendix

1.Analysis between Tweets and News Articles ●Fact: News providers report events earlier, but Twitter contains more details ●Algorithm: LDA, cosine similarity, sentiment analysis 1.Summary based on Templates ●Systems: SUMMARIST, Artequakt, etc. ●Topic signature is used for selecting summarizing sentences ●Using Apple Pie Parser, GATE and WordNet for knowledge extraction Literature Review

The Characteristics of HTML

[1] Petrovic S, Osborne M, McCreadie R, et al. Can twitter replace newswire for breaking news[C]//Seventh International AAAI Conference on Weblogs and Social Media [2] Balahur A, Tanev H. Detecting Event-Related Links and Sentiments from Social Media Texts[J]. ACL 2013, 2013: 25. [3] Lobzhanidze A, Zeng W, Gentry P, et al. Mainstream media vs. social media for trending topic prediction-an experimental study[C]//Consumer Communications and Networking Conference (CCNC), 2013 IEEE. IEEE, 2013: [4] Introne J E, Drescher M. Analyzing the flow of knowledge in computer mediated teams[C]//Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 2013: [5] Hovy E, Lin C Y. Automated text summarization and the SUMMARIST system[C]//Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, Association for Computational Linguistics, 1998: Reference

[6] Lin C Y, Hovy E. The automated acquisition of topic signatures for text summarization[C]//Proceedings of the 18th conference on Computational linguistics-Volume 1. Association for Computational Linguistics, 2000: [7] Luhn H P. The automatic creation of literature abstracts[J]. IBM Journal of research and development, 1958, 2(2): [8] Alani H, Kim S, Millard D E, et al. Automatic ontology-based knowledge extraction from web documents[J]. Intelligent Systems, IEEE, 2003, 18(1): [9] El Hamali S, Nouali O, Nouali-Taboudjemat N. Knowledge extraction by Internet monitoring to enhance crisis management[R]. CERIST, Reference