Presentation by: ABHISHEK KAMAT ABHISHEK MADHUSUDHAN SUYAMEENDRA WADKI

Presentation transcript:

Using text mining on State of the Union addresses to gain political insights
Presentation by: ABHISHEK KAMAT ABHISHEK MADHUSUDHAN SUYAMEENDRA WADKI

Introduction
- Data mining: mining data to find interesting patterns, useful insights, and relationships, for example in customer data.
- Text mining: finding useful insights in a dataset made up of text.
- Examples: sentiment analysis; search engines (Google); hashtags (Facebook, Instagram).
- This project: text mining on State of the Union addresses to gain political insights.
- Project findings (trends and issues) are presented on interactive dashboards.

Big Problem
- Text mining involves writing programs that analyze text data to retrieve something useful from it.
- Approaches:
  - Bag of Words: uses the entire collection of words that constitute the text, ignoring their order, to determine the sentiment.
  - TF-IDF: takes each word's frequency relative to the total word count of the document and weights it by how rare the word is across the whole collection. Stop words are excluded.
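As a rough illustration of the TF-IDF approach named above, the sketch below computes weights for a few toy documents using only the Python standard library. The stop-word list, the tokenizer, the function names, and the sample sentences are all assumptions made for this example; they are not taken from the project, and the idf variant shown (plain log(N / df)) is only one of several in common use.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "our", "we"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / total terms in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Invented toy "speeches", for illustration only.
speeches = [
    "The economy is growing and the economy is strong.",
    "We must secure the border and protect the economy.",
    "Security of our nation is the priority.",
]
scores = tf_idf(speeches)
```

With this weighting, a word that occurs in every document gets a weight of zero, which is why frequent but uninformative words drop out even when they are not on the stop-word list.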

Small Problem
- Use text mining algorithms to extract political insights from the State of the Union addresses of every president since 1790.
- Present these insights and trends on interactive dashboards.
- Find a correlation between the most frequent words in the State of the Union addresses and the trends in the issues facing our country.
- Assumption: emphasis on a particular word in a speech implies an important trend or issue in that year.

Proposed Solution
- No ready-made dataset is available.
- Build a Python scraper with the Beautiful Soup library to scrape the State of the Union website.
- Clean the data.
- Process it on Hadoop's MapReduce platform, which divides the entire data into key-value pairs and determines the frequency of each word per year.
- Use this information to deduce the trend of topics in each year's State of the Union address.
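The per-year word count described above can be sketched as a map step and a reduce step. The code below simulates both in a single Python process rather than on Hadoop; the function names, the choice of (year, word) as the key, and the sample input are assumptions for illustration, not the project's actual job.

```python
from collections import defaultdict

def mapper(year, text):
    """Emit ((year, word), 1) for each word in one address."""
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()").lower()
        if word:
            yield (year, word), 1

def reducer(key, values):
    """Sum all counts emitted for a single (year, word) key."""
    return key, sum(values)

def run_job(addresses):
    """Simulate map -> shuffle -> reduce in a single process."""
    grouped = defaultdict(list)
    for year, text in addresses:
        for key, value in mapper(year, text):
            grouped[key].append(value)  # shuffle: group values by key
    return dict(reducer(k, v) for k, v in grouped.items())

# Invented sample input, not real speech text.
addresses = [
    (1999, "Peace and prosperity. Peace at home."),
    (2002, "Security first. Security for all."),
]
word_counts = run_job(addresses)
```

On a real cluster the grouping step is done by the framework between the map and reduce phases; here a dictionary of lists stands in for it.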

Proposed Solution - Dashboards
- Store the mined data in a database and then present it on various dashboards.
- Planned charting library: D3.js or Chart.js.
- Planned dashboards:
  - Changes in trends between two presidents who served consecutively.
  - Changes in trends over a single president's entire term.
  - Major trends over a period of time.
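One possible way to prepare data for the "two consecutive presidents" dashboard is to compare each word's share of all words before and after the change of president. The sketch below is an assumption about how that could be done, not the project's method; the function names and the counts are invented for illustration.

```python
from collections import Counter

def relative_freq(counts):
    """Convert raw word counts into each word's share of all words."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def trend_shift(before, after, top_n=3):
    """Return the words whose share changed most between two count tables."""
    f_before, f_after = relative_freq(before), relative_freq(after)
    vocab = set(f_before) | set(f_after)
    deltas = {w: f_after.get(w, 0.0) - f_before.get(w, 0.0) for w in vocab}
    # Largest absolute change first; ties broken alphabetically.
    return sorted(deltas.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top_n]

# Invented counts for two hypothetical consecutive presidents.
president_a = Counter({"economy": 6, "jobs": 3, "security": 1})
president_b = Counter({"economy": 2, "jobs": 2, "security": 6})
shifts = trend_shift(president_a, president_b)
```

The resulting list of (word, change) pairs could then be serialized to JSON and handed to whichever charting library is chosen.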

Data analysis and experimental work plan to evaluate the proposed solution
- There is no dedicated training and test set.
- Effectiveness is measured by comparing the results of our model with major events in history.
- Examples: the 9/11 attacks of 2001; Donald J. Trump's speeches, where we expect to see trends related to borders, security, the wall, Mexicans, Muslims, etc.
- These comparisons show how well the dashboards reflect the actual events.

Related Work
- A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. Dimensions and features.
- Beyond TFIDF Weighting for Text Categorization in the Vector Space Model. How to weigh the word?
- An Improved Feature Space for Sentiment Analysis. Congressional bill approvals.

Related Work (Contd…)
- Stemming and its effects on TFIDF Ranking. Why stemming? Word isolation.
- Refinement of TF-IDF Schemes for Web Pages using their Hyperlinked Neighboring Pages. Better classification.
- An improved TF-IDF approach for text classification. Confidence, language independent.

Conclusion
- How would text mining algorithms help in extracting political insights?
- Who would use them? Journalists, politicians.