Kiran Garimella. • News • Scientific papers • Email • Search Queries • Twitter ◦ Gender ◦ Relationships ◦ Migration ◦ Politics.

Presentation transcript:

Kiran Garimella

• News
• Scientific papers
• Email
• Search Queries
• Twitter
  ◦ Gender
  ◦ Relationships
  ◦ Migration
  ◦ Politics

• I’m a..
• Just kidding!

• Link structure
• Connected text
• Hidden structure/patterns
• This talk
  ◦ Summarizing scientific articles
  ◦ Political trends from search queries
  ◦ Romantic relationship breakups on Twitter

• Motivation
  ◦ Not many existing systems
  ◦ Completely different from news document summarization
  ◦ Many topics
  ◦ Strong citation network
  ◦ Precise structure
• Introduction
• Related work
• Experiments, etc.

[Figure: classification-based summarization. Labels: Papers, Model, New paper, Relevant Sentences, Irrelevant Sentences, Categories (Aim, Own, Background, Contrast, Other), Categorized Sentences, Final Summary]

• Manual annotation is a tedious and difficult job
• The final summary depends on the classification accuracy
• The summary might depend on the training data

• Make use of the strong citation network

• PageRank?

[Figure: Papers A, B and C with their citations (X1, X2, X3, ...); X1 appears under more than one paper]

[Figure: citation-based summarization. Labels: Paper to be Summarized (X), search, Citation 1, Citation 2, Extracted Citation Sentences, classify, Topics, +ve, -ve, Summary Sentences from X]

Contains the negative points of a paper too.
Different viewpoints covered.
Can be useful to create a survey.
Did not work:
• Not many negative statements made
• Difficult to classify as positive or negative

Example:

• Split text into sentences? Paragraphs?
• Text tiling to the rescue
• A technique for automatically subdividing texts into multi-paragraph units that represent passages, or subtopics.
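
A much simplified sketch of the idea behind text tiling, assuming that a subtopic boundary falls where the vocabulary of adjacent sentence blocks overlaps least. The names `tokenize`, `cosine` and `tile_boundaries`, the block size and the fixed threshold are illustrative assumptions, not settings from the talk; the published TextTiling algorithm works on smoothed depth scores rather than a fixed similarity cutoff.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tile_boundaries(sentences, block=2, threshold=0.1):
    """Return gap indices i (a boundary just before sentence i) where the
    `block` sentences on either side of the gap share little vocabulary."""
    bags = [Counter(tokenize(s)) for s in sentences]
    boundaries = []
    for i in range(block, len(sentences) - block + 1):
        left = sum(bags[i - block:i], Counter())
        right = sum(bags[i:i + block], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries


sentences = [
    "Chunking is a widely used technique in language processing.",
    "Many learning approaches have been proposed for chunking.",
    "Citation networks connect scientific papers.",
    "Papers with shared citation links often cover related topics.",
]
print(tile_boundaries(sentences))  # gap indices where a new tile would start
```

Each returned index marks the first sentence of a new tile, so the sentences between two consecutive indices form one multi-sentence passage.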

Various Machine Learning approaches have been proposed for chunking. (a,b,c,d)
Chunking is a widely used technique in Natural language processing.
Under the same shallow structure..
Step I – Extract text tiles
Step II – Cluster cited papers

Various Machine Learning approaches have been proposed for chunking. (a,b,c,d)
Step III – Extract keywords from text tiles
Step IV – Search for keywords in the clusters obtained in Step II
Step V – Rank relevant sentences and present to the user
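
Steps III to V can be sketched roughly as follows. This is a minimal illustration under stated assumptions: keywords are taken to be the most frequent non-stopword words of a tile, and sentences are ranked by how many keywords they mention. The names `STOPWORDS`, `words`, `extract_keywords` and `rank_sentences` are hypothetical, and the talk does not say which keyword extractor or ranking function was actually used.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "and", "are", "been", "for", "have", "in",
             "is", "of", "the", "to", "we", "with"}


def words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(tile, k=3):
    """Step III (sketch): the k most frequent non-stopword words of a text tile."""
    return [w for w, _ in Counter(words(tile)).most_common(k)]


def rank_sentences(keywords, sentences):
    """Steps IV and V (sketch): keep sentences that mention at least one
    keyword and order them by the number of distinct keywords they contain."""
    scored = []
    for s in sentences:
        hits = sum(1 for w in set(words(s)) if w in keywords)
        if hits:
            scored.append((hits, s))
    scored.sort(key=lambda pair: -pair[0])  # stable sort: ties keep input order
    return [s for _, s in scored]


tile = ("Machine learning approaches have been proposed for chunking. "
        "Chunking with machine learning is widely used. Chunking is fast.")
cluster = [
    "The corpus contains news articles.",
    "Parsing is slower than chunking.",
    "We train a chunking model with machine learning.",
]
for sentence in rank_sentences(extract_keywords(tile), cluster):
    print(sentence)
```

In this sketch `cluster` stands in for the sentences of one cluster of cited papers from Step II.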

[Figure: Pipeline. Modules: Search Module, Paper Viewing Module, Text tiling Module (Generate Text Tiles), Clustering Module (Cluster cited papers, Extract Context), Ranking Module (Rank Sentences), Summary Presentation Module. Other labels: User, Search, Paper, Citation Sentences]
Link: bin/summarization/summarizer.html

Left-leaning blogs (387) | Right-leaning blogs (644)
From Benkler and Shaw “A tale of two blogospheres” (2010) and Wonkosphere Blog Directory

Use self-provided age and gender and ZIP-derived estimates
People clicking on right-leaning blogs:
– Are older (50 vs. 45 years)
– Are more male (63% vs. 55%)
– Are more white (81% vs. 78%)
– More likely to study at La Sapienza (92.3% vs. 11.4%)
All these trends agree with voters’ demographics

From Blogs to Queries
“huffingtonpost.com” is left-leaning → a click on it counts as a left-leaning vote for “pizza is a vegetable”
Aggregate votes across all clicks on political blogs to compute overall leaning
v_L = left clicks for the query
V_L = total left clicks
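
One way the vote counts could be aggregated into a per-query leaning score is sketched below. It assumes right-leaning counterparts of v_L and V_L (called `v_right` and `total_right` here, which do not appear in the transcript) and assumes that each side's clicks are normalized by that side's total before comparison. The exact formula used in the talk is not given, so the function `query_leaning` is a hypothetical illustration only.

```python
def query_leaning(v_left, v_right, total_left, total_right):
    """Share of a query's normalized click volume that goes to left-leaning blogs.

    v_left, v_right: clicks on left / right blogs that followed this query.
    total_left, total_right: clicks on left / right blogs over all queries.
    Returns a value in [0, 1]; 0.5 means no measurable leaning.
    """
    left_share = v_left / total_left
    right_share = v_right / total_right
    if left_share + right_share == 0:
        return 0.5  # the query never led to a click on a political blog
    return left_share / (left_share + right_share)


# A query with 90 left clicks and 10 right clicks, equal overall volumes.
print(round(query_leaning(90, 10, 1000, 1000), 2))
```

Dividing by the per-side totals keeps the score from being skewed when one side simply receives more clicks overall.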

Some background first

• Largest known knowledge repository
• Covers wide range of domains
• Manually tagged hierarchical categorization system
• Frequently updated
• Well built link structure
• Categories
  ◦ Pages
• Links


Examples using Wikipedia mapping for 6 months of data, July 4, 2011 – January 8,
Queries for Wikipedia entity “Patient Protection & Affordable Care Act”:
obama healthcare bill text (.91) | who pays for obamacare (.04)
obama health care privileges (.83) | obamacare reaches the supreme court (.09)
is affordable care act unconstitutional (.78) | is obamacare constitutional (.16)
Queries for Wikipedia category “Occupy”:
who started occupy wall street (.94) | occupy wall street rape (.09)
we are the 99% (.91) | occupy movement violence (.25)
occupy movement supporters (.78) | crime in occupy movement (.44)

Mapping Queries to Statements
“cost obama trip to india”
364 distinct queries mapped to true facts
574 distinct queries mapped to false facts

• Small pieces of text, which may not give a lot of information, can be enhanced using external knowledge sources.

* Fake profiles
28-hour snapshot of Twitter from July 2013.

Timeline: Nov 4, 2013 · Nov 11, 2013 · Nov 25, 2013 · …… · Feb 23, 2014 (BREAKUP) · Apr 24, 2014
Tweets, mutual friendships and profile information collected every week.
Data collected for 24 weeks.

Before breakup | After breakup

Source: After?

Before breakup | After breakup

• Don’t break up and fight publicly
• Word clouds are an easy way to get an overview

• Use entity extraction on the abstracts.
• Co-occurring entities might indicate something.
• Create an entity co-occurrence graph.
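
A minimal sketch of the last step, assuming the entities of each abstract have already been extracted by some entity extractor that is outside this example. The function name `cooccurrence_graph` and the sample entity lists are hypothetical; the graph is stored as a plain mapping from entity pairs to the number of abstracts in which both appear.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(entity_lists):
    """Weighted undirected graph as {(entity_a, entity_b): count}, with
    entity_a < entity_b so that each pair is stored only once.

    entity_lists: one list of entity strings per abstract.
    """
    edges = Counter()
    for entities in entity_lists:
        # Deduplicate within an abstract, then count every unordered pair once.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


abstracts = [
    ["Twitter", "sentiment", "politics"],
    ["Twitter", "politics"],
    ["Wikipedia", "politics"],
]
graph = cooccurrence_graph(abstracts)
print(graph.most_common(1))  # the most frequently co-occurring pair
```

Heavier edges correspond to entity pairs that appear together in more abstracts, which is one simple way to surface the co-occurrences that "might indicate something".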

@gvrkiran