we.b: The web of short URLs. Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis, Elias Athanasopoulos, Sotiris Ioannidis, Evangelos P. Markatos, Thomas Karagiannis.

Presentation transcript:

we.b: The web of short URLs
Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis, Elias Athanasopoulos, Sotiris Ioannidis, Evangelos P. Markatos (FORTH-ICS), Thomas Karagiannis (Microsoft Research)
WWW 2011, March
Presented by Somin Kim

Outline
• Introduction
• URL Shortening Services
• Data Collection
• The Web of Short URLs
• Evolution and Lifetime
• Publishers
• Short URLs and Web Performance
• Conclusion

Introduction
• URL shortening services make URLs easier to share by providing a short equivalent for each one
• Usage of short URLs has increased significantly
  – A result of their extensive use in Online Social Networks (OSNs)
• Understanding how short URLs are used is important
  – To provide insight into the interests of OSN and IM users
  – To assess the performance, scalability, and reliability of URL shortening services
  – To define a proper architecture for URL shortening services


URL Shortening Services (1/3)
• Popularity of URL shortening services
  – The rapid adoption of OSNs has led to increased demand for short URLs
  – Short URLs are also useful in traditional systems
    • such as IMs, SMSes, and emails
[Figure: a long URL (g.url.com/indeed.html) is submitted to a URL shortening service (bit.ly), which returns a short URL; the short URL is published, and a user who accesses it is redirected to the original URL]

URL Shortening Services (2/3)
• Some of these services provide statistics about accesses to their short URLs
  – The number of hits
  – The referrer sites the hits came from
  – The visitors' countries
  – …
• Users can create many short URLs for the same long URL
  – If a user shortens a long URL that already has a short URL, the service creates a different hash and gives it to that user
  – For each unique long URL, bit.ly provides a unique global hash with an information page
  – Overall statistics are still kept on the global hash's information page
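The relationship between per-user hashes and the global hash can be sketched as follows. This is a toy, in-memory illustration only: the class `ToyShortener`, its hashing scheme, and the six-character hash length are assumptions made for the example, not bit.ly's actual implementation.

```python
import hashlib
from collections import defaultdict


class ToyShortener:
    """Hypothetical in-memory shortener; not bit.ly's real implementation."""

    def __init__(self):
        self.target = {}              # any hash -> long URL
        self.global_of = {}           # long URL -> its global hash
        self.hits = defaultdict(int)  # global hash -> aggregated hit count

    def _hash(self, text):
        # Six hex characters are enough for a toy keyspace (assumed length).
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:6]

    def shorten(self, long_url, user):
        # One global hash per unique long URL, created the first time it is seen.
        if long_url not in self.global_of:
            g = self._hash(long_url)
            self.global_of[long_url] = g
            self.target[g] = long_url
        # A distinct hash for each (user, long URL) pair.
        h = self._hash(user + "|" + long_url)
        self.target[h] = long_url
        return h

    def resolve(self, short_hash):
        # Redirect lookup; every hit is counted against the global hash.
        long_url = self.target[short_hash]
        self.hits[self.global_of[long_url]] += 1
        return long_url
```

Two users shortening the same long URL receive different hashes, while hits through either hash accumulate under the single global hash, which mirrors the behavior described on the slide.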

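URL Shortening Services (3/3)
[Figure: statistics page showing the global information. Presenter's note, translated from Korean: "Capture the statistics page and show it?"]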

Outline: Data Collection
  – Collection methodology
  – Collected data

Data Collection (1/3): Collection Methodology
• Twitter crawling
  – Crawling Twitter returns links "gossiped" in a social network
  – We collected tweets that contain HTTP URLs
  – Only 13% of the HTTP URLs were not shortened by any URL shortening service
  – 50% of the HTTP URLs from Twitter were bit.ly URLs

Data Collection (2/3): Collection Methodology
• Brute force
  – Hashes can be obtained irrespective of the medium in which they were published and of how recent they are
  – We gathered the metadata provided by the shortening service
  – We monitored the evolution of the keyspace in the ow.ly system
    • Ow.ly serially iterates over the available short URL space
    • About [figure missing] new short URLs created each day
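A rough sketch of how a serially allocated keyspace can be enumerated and measured is given below. The base-62 alphabet, its ordering, and the helper names (`candidate_hashes`, `to_index`, `created_between`) are assumptions for illustration; the slides do not specify the hash format the services use, and the sketch only generates strings without contacting any service.

```python
import itertools
import string

# Assumed alphabet and ordering: digits, then lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def candidate_hashes(length, alphabet=ALPHABET):
    """Yield every hash of the given length, in alphabet order."""
    for chars in itertools.product(alphabet, repeat=length):
        yield "".join(chars)


def to_index(short_hash, alphabet=ALPHABET):
    """Position of a hash in a serially allocated keyspace (assumed numbering)."""
    base = len(alphabet)
    index = 0
    for ch in short_hash:
        index = index * base + alphabet.index(ch)
    return index


def created_between(first_hash, last_hash):
    """Number of hashes handed out between two observations of a serial allocator."""
    return to_index(last_hash) - to_index(first_hash)
```

If the newest hash is sampled once a day, `created_between` applied to two consecutive samples gives an estimate of that day's creation volume, provided the allocator really is serial.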

Data Collection (3/3): Collected Data
• For the Twitter and bit.ly traces, all the metadata accompanying each short URL was also collected

Outline: The Web of Short URLs
  – Where do short URLs come from?
  – Where do short URLs point to?
  – Location
  – Popularity

The Web of Short URLs (1/7): Where do short URLs come from?
• Short URLs do not frequently appear in traditional web pages
  – The vast majority of users arrive at bit.ly from non-web applications
  – Users who arrive through web applications mostly come from social networking channels (Twitter, Facebook)

The Web of Short URLs (2/7): Where do short URLs point to?
• Most popular types of content behind short URLs
  – News and informative content come first
  – 4% of the most accessed URLs in the ow.ly trace pointed to shortening services
    • Spammers pack short URLs inside other short URLs to avoid exposing the long URL

The Web of Short URLs (3/7): Location
• The penetration of short URL use differs significantly from that of the Internet/web
  – Most accesses come from the United States, Japan, and Great Britain
  – No accesses from China or India were observed
    • China and India rank among the top five countries by number of Internet users

The Web of Short URLs (4/7): Popularity
• URL popularity
  – Large systems that provide content to users typically exhibit power-law behavior
    • A small fraction of the content is very popular
    • Most of it is considered uninteresting

The Web of Short URLs (5/7): Popularity
• URL popularity (cont.)
  – We split short URLs into active and inactive
    • Inactive: no hit was observed during the last 7 days of the trace
  – 10% of the short URLs are responsible for about 90% of the total hits seen in the trace
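The two measurements on this slide can be sketched in a few lines. The function names and the integer day numbering are assumptions for the example; the hit counts used in the usage note are made up and are not taken from the traces.

```python
def top_share(hit_counts, fraction=0.10):
    """Fraction of all hits received by the most-hit `fraction` of URLs."""
    ranked = sorted(hit_counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # size of the top group, at least one URL
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def is_inactive(last_hit_day, trace_end_day, window=7):
    """True if no hit fell within the last `window` days of the trace.

    Days are integer day numbers, and the trace end day is inclusive.
    """
    return last_hit_day <= trace_end_day - window
```

For ten hypothetical URLs with hit counts `[900, 20, 20, 20, 10, 10, 10, 5, 3, 2]`, the single most popular URL (the top 10%) accounts for 900 of 1000 hits, the kind of skew the slide reports.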

The Web of Short URLs (6/7): Popularity
• Content popularity
  – Besides familiar websites, lesser-known or less popular websites were also observed
    • Pollpigeon.com (short opinion polls), Mashable.com (social media news), Twibbon.com (Twitter campaigns)
  – Short polls are popular content
    • They are very common on social networking sites

The Web of Short URLs (7/7): Popularity
• Content popularity (cont.)
  – Do popular websites change significantly over time?
    • About 6 sites appear in the top-100 on every single day of April 2010
      – 22 sites for March 2010
    • About 400 sites enjoy short bursts of popularity

Outline: Evolution and Lifetime
  – Life span of short URLs
  – Temporal evolution

Evolution and Lifetime (1/5): Life span of short URLs
• The lifetime of a URL is the number of days between its first and last observed hits
• Lifetime CDF of the traces (twitter2, bitly)
  – 50% of the short URLs are not ephemeral
  – Inactive URLs have a shorter lifespan
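The lifetime definition and the CDF plotted on this slide can be sketched as below. The helper names are assumptions, and the dates in the usage note are invented for illustration rather than drawn from the traces.

```python
from datetime import date


def lifetime_days(hit_dates):
    """Days between the first and last observed hit of one short URL."""
    return (max(hit_dates) - min(hit_dates)).days


def empirical_cdf(values):
    """Sorted (value, fraction of samples <= value) pairs, one per distinct value."""
    ordered = sorted(values)
    n = len(ordered)
    cdf = {}
    for i, v in enumerate(ordered):
        # Later duplicates overwrite earlier ones, leaving the count of samples <= v.
        cdf[v] = (i + 1) / n
    return sorted(cdf.items())
```

A URL hit on 1, 3, and 8 April 2010 has a lifetime of 7 days; a URL with all hits on one day has a lifetime of 0.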

Evolution and Lifetime (2/5): Temporal evolution
• The daily change in the number of hits for each short URL
  – The number of accesses for a typical short URL varies by as much as 40% from one day to the next
  – As less popular URLs are included, larger daily changes are observed
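One plausible way to compute the daily change is the relative difference between consecutive days. The slide does not give the exact formula, so the definition below is an assumption, as is the function name.

```python
def daily_changes(daily_hits):
    """Relative change |h[t+1] - h[t]| / h[t] for each pair of consecutive days.

    Pairs whose first day has zero hits are skipped to avoid dividing by zero.
    """
    changes = []
    for prev, cur in zip(daily_hits, daily_hits[1:]):
        if prev > 0:
            changes.append(abs(cur - prev) / prev)
    return changes
```

For a hypothetical URL with 100, 60, and 90 hits on three consecutive days, the daily changes are 40% and 50%.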

Evolution and Lifetime (3/5): Temporal evolution
• The evolution of the hit rate across the lifetime of short URLs
  – Inactive URLs
    • On average, 60% of hits are observed during the first day
    • After that, the hit rate drops sharply
  – Active URLs
    • The first-day effect is also evident
    • A significant hit rate is also observed on recent days

Evolution and Lifetime (4/5): Temporal evolution
• Daily hit rate versus lifetime for inactive short URLs
  – The daily hit rate shows no obvious dependence on a short URL's lifetime

Evolution and Lifetime (5/5): Temporal evolution
• Total number of hits as a function of the short URL's lifetime
  – Active short URLs (bottom) appear to exhibit a linear relationship on a log-log scale


Publishers (1/4)
• Twitter effect
  – Short URLs referred from Twitter enjoy significantly higher popularity

Publishers (2/4)
• CCDF of posted short URLs per Twitter user
  – Most users published a handful of tweets with short URLs
  – The majority of tweets with short URLs are original Twitter messages (not retweets)

Publishers (3/4)
• Users' daily publish rate of short URLs
  – The median rate is 1 short URL per day
  – 98% of users publish no more than 5 short URLs per day

Publishers (4/4)
• Correlation between a user's publish rate and total number of hits
  – As the number of URLs published by a poster increases, the expected hit rate drops
    • Spamming-type behavior
    • Only a few short URLs from each publisher enjoy high hit rates

Outline: Short URLs and Web Performance
  – Space reduction
  – Latency

Short URLs and Web Performance (1/3): Space reduction
• Space gain for the short URL
  – URL shortening services are quite effective at reducing URL size
    • For roughly 50% of the URLs, a 91% reduction in size is observed
  – In the Twitter trace, only 31% of the long versions of short URLs remained under the character limit
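The space gain can be computed as the fraction of characters saved. The sketch below assumes size is measured in characters; the function names are hypothetical, and the character limit is left as a parameter because the slide does not state its value.

```python
def size_reduction(long_url, short_url):
    """Fraction of characters saved by using the short URL instead of the long one."""
    return 1 - len(short_url) / len(long_url)


def fits_limit(url, limit):
    """True if the URL alone stays within `limit` characters (limit supplied by the caller)."""
    return len(url) <= limit
```

As an invented example, replacing a 200-character long URL with an 18-character short URL saves 91% of the characters.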

Short URLs and Web Performance (2/3): Latency
• URL shortening services impose an additional overhead on the user's web request
• We periodically accessed the 10 most popular short URLs
  – Fb.me and ow.ly exhibit bimodal behavior
  – Bit.ly appears to be the slowest but shows more consistent behavior

Short URLs and Web Performance (3/3): Latency
• The redirection overhead of bit.ly
  – For more than 50% of the accesses, the URL shortening redirection imposes a relative overhead of 54%
  – In a significant fraction of accesses, this additional delay is comparable to the access time of the final web page
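A minimal sketch of the overhead calculation follows, assuming that relative overhead means the redirection delay divided by the access time of the final page. The slide does not define the ratio precisely, so both the definition and the function names are assumptions, and the timings in the usage note are invented.

```python
from statistics import median


def relative_overhead(redirect_seconds, page_seconds):
    """Redirection delay as a fraction of the final page's access time (assumed definition)."""
    return redirect_seconds / page_seconds


def median_overhead(samples):
    """Median relative overhead over (redirect_seconds, page_seconds) measurements."""
    return median(relative_overhead(r, p) for r, p in samples)
```

A redirection that takes 0.27 s in front of a page that loads in 0.5 s has a relative overhead of 54% under this definition.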


Conclusion
• We have presented a large-scale study of URL shortening services
  – Exploring traces from the services themselves and from Twitter
• Summary
  – Short URLs appear mostly in ephemeral media, with profound effects on their popularity, lifetime, and access patterns
  – A small number of URLs receive a very large number of accesses
  – A large percentage of short URLs are not ephemeral
  – The most popular websites change slowly over time
  – The popular sites differ from those that are popular among the broader web community
  – URL shortening services are extremely effective at saving space but add overhead to accessing the web page