Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to the Disappearing Web

Presentation transcript:

Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to the Disappearing Web
Hany M. SalahEldeen & Michael L. Nelson
Old Dominion University, Department of Computer Science, Web Science and Digital Libraries Lab
TPDL 2013

Six Socially Significant Events
From Twitter, websites, and books:
- The Egyptian revolution
From Twitter only:
- Stanford's SNAP dataset: Iranian elections, H1N1 virus outbreak, Michael Jackson's death, Obama's Nobel Peace Prize
- Twitter API: The Syrian uprising

Social Events Having a Bimodal Time Distribution

Resources Missing & Archived

Collection        Percentage Missing   Percentage Archived
H1N1 Outbreak     23.49%               41.65%
Michael Jackson   36.24%               39.45%
Iran              26.98%               43.08%
Obama             24.59%               47.87%
Egypt             10.48%               20.18%
Syria             7.04%                5.35%
(unlabeled)       31.62%               30.78%
(unlabeled)       24.47%               36.26%
(unlabeled)       25.64%               43.87%
(unlabeled)       26.15%               46.15%

Missing and Archived Percentages Across Time

Previous Conclusions
- Measured 21,625 resources from 6 data sets in the archives and on the live web.
- One year after publishing, about 11% of the content shared on social media will be gone.
- After that, we lose roughly 0.02% of the content daily.
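
The two figures above can be read as a simple piecewise-linear estimate. The sketch below is only an illustration of that reading: the function name is hypothetical, and the linear ramp during the first year is an assumption (the slide gives only the one-year figure and the daily rate after it).

```python
def estimated_percent_missing(age_days: float) -> float:
    """Rough estimate of the share of shared resources that are gone.

    Uses only the two figures from the slide: about 11% lost after the
    first year, then roughly 0.02% more per day.
    """
    first_year_loss = 11.0  # percent, from the slide
    daily_loss = 0.02       # percent per day after the first year, from the slide
    if age_days <= 365:
        # Assumption: loss accrues linearly over the first year.
        return first_year_loss * age_days / 365
    return min(100.0, first_year_loss + daily_loss * (age_days - 365))
```

For a two-year-old resource this gives 11 + 0.02 × 365 ≈ 18.3%.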

New Research Questions
- Validity: is our estimation model still valid?
- Existence Stability: do resources on the live web remain missing?
- Archival Stability: do resources in public archives persist?
- Social Context: how can we extract social context of missing resources and potential replacements?

Revisiting Existence
From previous study:
Rerunning after a year:

Events: MJ, Iran, H1N1, Obama, Egypt, Syria
Measured    37.10%  37.50%  28.17%  30.56%  26.29%  31.62%  32.47%  24.64%   7.55%  12.68%
Predicted   31.72%  31.42%  31.96%  30.98%  30.16%  29.68%  29.60%  28.36%  19.80%  11.54%
Error        5.38%   6.08%   3.79%   0.42%   3.87%   1.94%   2.87%   3.72%  12.25%   1.14%

Average Prediction Error = 4.15%

Revisiting Archival
From previous study:
Rerunning after a year:

Events: MJ, Iran, H1N1, Obama, Egypt, Syria
Measured    48.61%  40.32%  60.80%  55.04%  47.97%  52.14%  48.38%  40.58%  23.73%   0.56%
Predicted   61.78%  61.18%  62.26%  60.30%  58.66%  57.70%  57.54%  55.06%  37.94%  21.42%
Error       13.17%  20.86%   1.46%   5.26%  10.69%   5.56%   9.16%  14.48%  14.21%  20.86%

Average Prediction Error = 11.57%
In all cases, our archival predictions were too optimistic.
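
The average prediction errors on the existence and archival slides can be recomputed from the measured and predicted rows. The sketch below assumes the reported error is the mean absolute difference, which matches the per-column Error values; the variable and function names are illustrative.

```python
# Measured and predicted percentages, copied from the two slides.
existence_measured = [37.10, 37.50, 28.17, 30.56, 26.29, 31.62, 32.47, 24.64, 7.55, 12.68]
existence_predicted = [31.72, 31.42, 31.96, 30.98, 30.16, 29.68, 29.60, 28.36, 19.80, 11.54]
archival_measured = [48.61, 40.32, 60.80, 55.04, 47.97, 52.14, 48.38, 40.58, 23.73, 0.56]
archival_predicted = [61.78, 61.18, 62.26, 60.30, 58.66, 57.70, 57.54, 55.06, 37.94, 21.42]


def mean_absolute_error(measured, predicted):
    """Average of |measured - predicted| over paired values."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


existence_error = round(mean_absolute_error(existence_measured, existence_predicted), 2)
archival_error = round(mean_absolute_error(archival_measured, archival_predicted), 2)
# True when every archival prediction exceeds the measured value.
archival_always_optimistic = all(p > m for m, p in zip(archival_measured, archival_predicted))
```

Rounded to two decimals, the results agree with the 4.15% and 11.57% reported on the slides.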

Measured Vs. Predicted

Interesting Phenomenon: Reappearance On The Live Web And Disappearance From The Archives

Event                          MJ      Iran    Obama   H1N1   Egypt  Syria  Average
% Re-appearing on the web      11.29%  11.48%   6.63%  3.68%  4.21%  1.97%  6.54%
% Disappearing from archives    9.98%  11.17%  15.65%  5.46%  2.81%  2.25%  7.89%
% Going from 1 memento to 0     2.72%   2.89%   4.24%  1.96%  0.23%  0.28%  2.05%

Reappearing And Disappearance Predictions

Tweet Existence
Problem: We don't have the URIs of most of the tweets.
Solution: Compute the loss of other tweets linking to the URI. For each resource in the datasets:
- Extract all the tweets that link to the resource using the Topsy API (up to 500 tweets).
- Check the existence of each tweet.
- (Topsy (mostly) does not delete indexed tweets.)

Using Topsy API
- Get all the tweets having the same URI.
- Check how many still exist on the live web.
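
The counting step can be sketched as follows. The slides do not show how tweets were retrieved or checked, so `exists` is a hypothetical callable supplied by the caller (for example, an HTTP status check), and `percent_missing` is an illustrative name, not part of any real API.

```python
from typing import Callable, Iterable


def percent_missing(tweet_uris: Iterable[str], exists: Callable[[str], bool]) -> float:
    """Percentage of tweets that no longer exist on the live web.

    `exists` is a hypothetical, caller-supplied check. At most 500 tweets
    are considered, mirroring the retrieval cap mentioned in the slides.
    """
    uris = list(tweet_uris)[:500]
    if not uris:
        return 0.0
    missing = sum(1 for uri in uris if not exists(uri))
    return 100.0 * missing / len(uris)
```

Passing the check in as a parameter keeps the counting logic testable without network access.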

Tweet Existence

Event                        MJ      Iran    Obama   H1N1   Egypt   Syria  Average
Average % of missing posts   14.43%  14.59%  10.03%  7.38%  15.08%  0.53%  10.34%

Tweets Disappearing Across Time

Context Discovery And Shared Resource Replacement
Problem: The 140-character limit restricts how much a tweet can say about the linked resource. If the resource goes missing, can we get the next best thing?
Solution: Shared links typically have several tweets, responses, and retweets. We can mine these traces for context and viable replacements.

Context Discovery
Linking to:

Use Topsy to Discover Tweets with the Same Link

Tweet Text Replacement
- From all extracted tweets, extract the best replacement tweet: the one having the longest common N-gram.
- Replace the original tweet with the more descriptive one.
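
One possible reading of this step is sketched below. The slide does not define "longest common N-gram" precisely, so the sketch assumes word-level N-grams shared by at least `min_support` tweets, and it uses tweet length as a stand-in for "more descriptive". Both choices, and all function names, are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def longest_common_ngram(tweets, min_support=2):
    """Longest word n-gram found in at least `min_support` tweets."""
    tokenized = [t.lower().split() for t in tweets]
    max_len = max((len(t) for t in tokenized), default=0)
    for n in range(max_len, 0, -1):
        counts = Counter()
        for tokens in tokenized:
            counts.update(set(ngrams(tokens, n)))  # count each tweet once
        common = [(c, g) for g, c in counts.items() if c >= min_support]
        if common:
            count, gram = max(common)  # most widely shared n-gram of this length
            return " ".join(gram), count
    return "", 0


def best_replacement(tweets, min_support=2):
    """Longest tweet containing the shared phrase (length as a proxy for descriptiveness)."""
    phrase, _ = longest_common_ngram(tweets, min_support)
    candidates = [t for t in tweets if phrase and phrase in " ".join(t.lower().split())]
    return max(candidates, key=len, default=None)
```

The search starts from the longest possible N-gram and stops at the first length with enough support, so it returns a longest shared phrase rather than the most frequent short one.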

Resource Replacement
Assume that the resource linked in the tweet disappeared. We mine the list of tweets for:
- Hashtags
- User mentions
- Co-tweeted URIs
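
A minimal sketch of this mining step, assuming the tweets are available as plain strings. The regular expressions are simplified approximations of Twitter's hashtag, mention, and link syntax, and `mine_context` is a hypothetical helper name.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URI = re.compile(r"https?://\S+")


def mine_context(tweets, missing_uri):
    """Count hashtags, user mentions, and co-tweeted URIs across tweets."""
    hashtags, mentions, co_uris = Counter(), Counter(), Counter()
    for text in tweets:
        uris = URI.findall(text)
        # Remove URIs first so '#' fragments inside links are not counted as hashtags.
        stripped = URI.sub(" ", text)
        hashtags.update(h.lower() for h in HASHTAG.findall(stripped))
        mentions.update(m.lower() for m in MENTION.findall(stripped))
        co_uris.update(u for u in uris if u != missing_uri)
    return {"hashtags": hashtags, "mentions": mentions, "co_tweeted_uris": co_uris}
```

The most frequent co-tweeted URI is then a natural first replacement candidate, for example via `result["co_tweeted_uris"].most_common(1)`.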

Co-Tweeted Resources
A missing resource could be described or replaced by another resource that has been shared within the same tweet.


Generated Social Context Extraction:
{
  "URI": "
  "Related Tweet Count": 500,
  "Related Hashtags": "#tran #citizensx #arabspring #visualstorytelling #collaborativerevolution #feb11http://t.co/qxusp70...",
  "Users who talked @webdocumentario:...",
  "All associated unique links:": "
  "All other links associated:": " ",
  "Most frequent link appearing:": "
  "Number of times the Most frequent link appearing:": 49,
  "Most frequent tweet posted and reposted:": "Check out 18DaysInEgypt - A crowd sourced documentary project =================
  "Number of times the Most frequent tweet appearing:": 46,
  "The longest common phrase appearing:": "RT 2ke0rEjP is an interactive documentary website that YOU can help create Get your Jan25 stories ready! Pl RT",
  "Number of times the Most common phrase appearing:": 18
}

Shared Resource Replacement Recommendation
From the contextual information extracted, can we obtain a viable replacement resource?
- Co-tweeted resources
- Tweet signature as query to search engines

Build a Tweet Document
A tweet document represents the concatenation of all extracted tweets:
"do you have a story to tell about your 18 days of revolution? share it or contact sara 18days brand new interactive storytelling project on egyptian revolution a very creative platform to tell your story daysinegypt marches heading to tahrir square now from all over cairo it's all over again use the website to document your revolutionary stories and share them with the world! check out awesome documentary project crowdsourcing a people's narrative of the egyptian revolution …"

Tweet Signatures
From the tweet document earlier:
- "Tweet signature" = the top 5 most frequent words
- Use the tweet signature as a query to Google search
- Extract the top 10 resulting resources

Tweet Signatures
Tweet Signature = top 5 most frequent terms from the Tweet Document:
documentary project daysinegypt check sourced
Tweet Document:
"do you have a story to tell about your 18 days of revolution? share it or contact sara 18days brand new interactive storytelling project on egyptian revolution a very creative platform to tell your story daysinegypt marches heading to tahrir square now from all over cairo it's all over again use the website to document your revolutionary stories and share them with the world! check out awesome documentary project crowdsourcing a people's narrative of the egyptian revolution …"
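
Extracting a signature of this kind can be sketched as a word-frequency count. The slides do not say how words were tokenized or whether common words were filtered, so the small stopword list, the minimum word length, and the function name below are assumptions for illustration only.

```python
import re
from collections import Counter

# Assumption: a small illustrative stopword list; the slides do not specify one.
STOPWORDS = {
    "a", "about", "all", "and", "do", "from", "have", "it", "it's", "of", "on",
    "or", "out", "over", "the", "them", "to", "use", "very", "with", "you", "your",
}


def tweet_signature(tweet_document, k=5, stopwords=STOPWORDS):
    """Top-k most frequent non-stopword terms of a tweet document."""
    words = re.findall(r"[a-z0-9']+", tweet_document.lower())
    counts = Counter(w for w in words if w not in stopwords and len(w) > 2)
    # most_common keeps first-seen order among equally frequent terms.
    return [word for word, _ in counts.most_common(k)]
```

The resulting terms would be joined with spaces to form the search query, for example `" ".join(tweet_signature(doc))`.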

Query Google w/ Tweet Signature


Search Engine Results
- The original resource
- Others are good replacement candidates

Recommendation Evaluation
We extract a dataset of resources that are currently available:
- Pretend these resources no longer exist (for a baseline).
- Each of the resources is text-based.
- Each resource has at least 30 retrievable tweets.
Extracted 731 unique resources.

Recommendation Evaluation
We use a boilerplate removal library to remove the template from:
- the linked resources
- the top 10 retrieved results from Google
We use cosine similarity to compare the documents.
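
For reference, cosine similarity between two text documents can be computed as below. The slides do not state the term weighting used, so this sketch assumes raw term frequencies over lowercase word tokens; the function names are illustrative.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Raw term-frequency vector over lowercase word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the two documents' term vectors."""
    va, vb = term_vector(doc_a), term_vector(doc_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

Under this measure a candidate page would count as a close replacement when its similarity to the original text reaches a chosen threshold such as 0.7.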

Similarity Measures In Resource Replacements

Results
- In 41% of the test cases we can find a replacement page with at least 70% similarity to the original missing resource.
- The search results provide a mean reciprocal rank of 0.43.
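
Mean reciprocal rank averages 1/rank of the first relevant result over all queries, counting 0 when no result is relevant. The sketch below leaves the relevance test to the caller, since the slides do not spell out which result counts as the hit; the function names are illustrative.

```python
def reciprocal_rank(results, is_relevant):
    """1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, item in enumerate(results, start=1):
        if is_relevant(item):
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(result_lists, is_relevant):
    """Average reciprocal rank over all queries."""
    lists = list(result_lists)
    if not lists:
        return 0.0
    return sum(reciprocal_rank(results, is_relevant) for results in lists) / len(lists)
```

As a toy example, three queries whose first relevant results sit at rank 1, rank 2, and nowhere give (1 + 0.5 + 0) / 3 = 0.5.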

Conclusions
- We validated our model for predicting resource existence on the current web, with ~4% error after one year.
- The archival prediction, on the other hand, produced a larger error of ~11.5%.
- We explored the phenomenon of resources reappearing on the web after disappearing (6.54%) and of resources disappearing from the archives (7.89%).
- The removal of search engine caches in the most recent Memento revision could be a possible explanation for the disappearance from the archives.

Conclusions (cont.)
- Measured the estimated percentage of missing tweets: ~10.5%.
- Utilized the Topsy API to extract context information about tweets with missing resources.
- Investigated the viability of finding a replacement resource for the missing one.
- In 41% of the cases we were able to extract a replacement resource that is 70% or more similar to the missing resource.