The Mobile Web is Structurally Different
Apoorva Jindal (USC), Chris Crutchfield (MIT), Samir Goel (Google Inc.), Ravi Jain (Google Inc.), Ravi Kolluri (Google Inc.)

The Mobile Web?
- Web pages designed for consumption on mobile wireless devices
  - CHTML, XHTML, WML
- All other pages are referred to as the fixed web
- Becoming more important
  - Better devices
  - Better networks
  - Cheaper plans
- Different from the fixed web?
  - Smaller pages
  - Fewer hyperlinks
  - Fewer images

Web graph  pages ↔ nodes  hyperlinks ↔ edges Properties of this graph  In-degree distribution  Out-degree distribution  Strongly connected component size distribution  …. Importance  Used in basic algorithms to implement search Crawling Ranking the web pages Studied in detail for fixed web INFOCOM 2008 Structurally? The Mobile Web is Structurally Different The Mobile Web is EDAS

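Bow-tie Structure [Broder et al. 2000]
- Model to describe the structure of the fixed web
- Components, defined relative to the largest strongly connected component (SCC):
  - SCC: pages that can all reach one another
  - IN: pages that can reach the SCC but cannot be reached from it
  - OUT: pages reachable from the SCC that cannot reach it
  - Tendrils: pages hanging off IN or OUT without passing through the SCC
  - Disconnected: pages with no connection to the SCC, even ignoring link direction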
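The bow-tie decomposition can be computed from reachability alone. The Python sketch below is one possible implementation, not the COSIN tools used in the study: it finds the largest SCC with Kosaraju's algorithm, labels nodes that can reach it as IN and nodes reachable from it as OUT, and, as a simplifying assumption, assigns every other node in the same weakly connected component to Tendrils (tubes included) and everything else to Disconnected. The function names `bow_tie`, `_largest_scc` and `_reach` are hypothetical.

```python
from collections import defaultdict, deque

def _reach(starts, adj):
    """Return every node reachable from `starts` by following `adj` (breadth-first)."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def _largest_scc(nodes, succ, pred):
    """Kosaraju's algorithm with explicit stacks; returns the largest SCC as a set."""
    visited, order = set(), []
    for s in nodes:  # pass 1: record DFS finish order on the original graph
        if s in visited:
            continue
        visited.add(s)
        stack = [(s, iter(succ[s]))]
        while stack:
            v, children = stack[-1]
            for w in children:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(succ[w])))
                    break
            else:  # every child explored: v is finished
                stack.pop()
                order.append(v)
    assigned, best = set(), set()
    for s in reversed(order):  # pass 2: decreasing finish time, on reversed edges
        if s in assigned:
            continue
        assigned.add(s)
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in pred[v]:
                if w not in assigned:
                    assigned.add(w)
                    comp.add(w)
                    stack.append(w)
        if len(comp) > len(best):
            best = comp
    return best

def bow_tie(edges):
    """Classify every node of a directed graph into bow-tie components."""
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
        nodes.update((u, v))
    scc = _largest_scc(sorted(nodes), succ, pred)
    out = _reach(scc, succ) - scc  # reachable from the SCC, cannot reach it
    inn = _reach(scc, pred) - scc  # can reach the SCC, not reachable from it
    undirected = {v: succ[v] | pred[v] for v in nodes}
    weak = _reach(scc, undirected)  # weakly connected to the SCC
    return {
        "SCC": scc,
        "IN": inn,
        "OUT": out,
        "Tendrils": weak - scc - inn - out,  # includes tubes in this sketch
        "Disconnected": nodes - weak,
    }

if __name__ == "__main__":
    links = [("a", "b"), ("b", "c"), ("c", "a"),  # SCC {a, b, c}
             ("x", "a"), ("c", "y"),              # x in IN, y in OUT
             ("x", "t"),                          # t hangs off IN: a tendril
             ("p", "q")]                          # not connected to the SCC
    parts = bow_tie(links)
    n = sum(len(s) for s in parts.values())
    for name, members in parts.items():
        print(f"{name:13s} {100 * len(members) / n:5.1f}%  {sorted(members)}")
```

The percentages printed per component correspond to the columns of the bow-tie tables in this talk.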

Methodology
- Collapse all pages in a domain to one node, using tools based on MapReduce
- Data sets
  - Google's mobile web index, June 2007: CHTML; XHTML + WML
  - WebBase 2001
  - Google's fixed web index, July 2007
- In-degree & out-degree distributions
  - Tools based on MapReduce
  - Use [Clauset et al. 2006] to infer the power-law coefficient
- Determine bow-tie structure properties
  - Use COSIN tools [Donato et al. 2004]
- Limitations
  - Cannot handle Google fixed web 2007 at the page level (so page-level comparisons use WebBase 2001)
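[Clauset et al. 2006] estimate a power-law exponent by maximum likelihood rather than by fitting a line to a log-log histogram. For integer data such as node degrees, a commonly used approximation of their estimator is alpha ≈ 1 + n / Σ ln(x_i / (x_min - 1/2)), summed over the n observations with x_i ≥ x_min. The Python sketch below implements only this closed form for a caller-chosen `x_min`; the full method also selects `x_min` by minimizing a Kolmogorov-Smirnov distance, which is omitted here. The function name `powerlaw_alpha` and the toy degrees are hypothetical.

```python
import math

def powerlaw_alpha(degrees, x_min):
    """Approximate discrete MLE of the power-law exponent alpha.

    alpha = 1 + n / sum(ln(x / (x_min - 0.5))) over the n values x >= x_min.
    Choosing x_min is left to the caller in this sketch.
    """
    if x_min < 1:
        raise ValueError("x_min must be >= 1 for degree data")
    tail = [x for x in degrees if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    # Every term is positive because x >= x_min > x_min - 0.5.
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum

if __name__ == "__main__":
    in_degrees = [1, 1, 1, 2, 2, 4, 8]  # toy data, not from the paper
    print(round(powerlaw_alpha(in_degrees, x_min=1), 3))  # 1.721
```

A steeper (larger) alpha means the degree distribution falls off faster, which is how a statement such as "out-degree distribution falls off faster" can be quantified.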

Page-level Graph Properties – Degree Distributions
Table: average node degree and power-law coefficient (in-degree, out-degree) for XHTML+WML, CHTML and WebBase
- Mobile web is sparser
- CHTML lies between XHTML+WML and the fixed web
- Out-degree distribution falls off faster for the mobile web
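One way to check which degree distribution "falls off faster" is to compare empirical complementary cumulative distributions, P(degree ≥ k), and to compare average degree (edges per node) as a measure of sparseness. The Python sketch below computes both; the function names and both degree lists are hypothetical and do not come from the corpora in the table.

```python
from collections import Counter

def ccdf(degrees):
    """Return [(k, P(D >= k))] for each distinct degree k, in increasing k."""
    n = len(degrees)
    counts = Counter(degrees)
    result, remaining = [], n
    for k in sorted(counts):
        result.append((k, remaining / n))  # fraction of nodes with degree >= k
        remaining -= counts[k]
    return result

def average_degree(out_degrees):
    """Edges per node; over all nodes this equals both mean in-degree and mean out-degree."""
    return sum(out_degrees) / len(out_degrees)

if __name__ == "__main__":
    sparse = [0, 1, 1, 1, 2, 2, 3]   # hypothetical "mobile-like" out-degrees
    dense = [1, 2, 3, 5, 8, 13, 21]  # hypothetical "fixed-like" out-degrees
    print(average_degree(sparse), average_degree(dense))
    print(ccdf(sparse))
    print(ccdf(dense))
```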

Mobile web  Smaller SCC  Larger IN and smaller OUT  Bigger Disconnected + Tendrils Connectivity: Fixed Web > CHTML > XHTML/WML Page-level Graph properties – Bow-tie structure CorpusSCCINOUTTendrilsDiscon nected XHTML +WML 10.5%18%10.4%18.3%42.7% CHTML22%25.9%14.2%22%15.8% Webba se 33%11%39%13%4%

Language Properties
- Sub-graph of pages that share a common trait
  - Such as a keyword or location
  - Called Thematically Unified Clusters (TUCs)
  - In the fixed web, they retain the structural properties of the entire graph
- Mobile web?
Corpus | Language | Fraction of nodes
XHTML  | Chinese  | 42.6%
       | English  | 22.3%
       | Russian  | 13.4%
       | French   | 3.4%
       | German   | 2.3%
CHTML  | Japanese | 92.3%
       | English  | 5.9%
Corpus / sub-graph | SCC   | IN  | OUT   | Tendrils | Disconnected
XHTML+WML (all)    | 10.5% | 18% | 10.4% | 18.3%    | 42.7%
Chinese            | 13%   | 22% | 9%    | 14%      | 42%
English            | 2%    | 3%  | 7%    | 25%      | 63%
Russian            | 22%   | 40% | 8%    | 11%      | 19%
- Japanese is not studied separately: it makes up 92.3% of CHTML, so its properties are the same as CHTML
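A language sub-graph (a TUC) can be taken as the subgraph induced by the pages carrying one language label. The Python sketch below assumes each page already has a language label from some detector (hypothetical here) and keeps only links whose two endpoints share the chosen language; the node set is returned too, because isolated pages still count toward the Disconnected share when a bow-tie routine is run on the result. The function names and sample data are hypothetical.

```python
from collections import Counter

def language_fractions(lang_of):
    """Fraction of nodes per language, given a mapping page -> language label."""
    counts = Counter(lang_of.values())
    total = sum(counts.values())
    return {lang: c / total for lang, c in counts.most_common()}

def induced_subgraph(edges, lang_of, lang):
    """Return (edges, nodes) of the subgraph induced by pages labelled `lang`."""
    keep = {page for page, label in lang_of.items() if label == lang}
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return kept_edges, keep

if __name__ == "__main__":
    lang_of = {"p1": "zh", "p2": "zh", "p3": "en", "p4": "ru", "p5": "zh"}
    links = [("p1", "p2"), ("p2", "p5"), ("p2", "p3"), ("p4", "p1")]
    print(language_fractions(lang_of))  # {'zh': 0.6, 'en': 0.2, 'ru': 0.2}
    zh_edges, zh_nodes = induced_subgraph(links, lang_of, "zh")
    print(zh_edges, sorted(zh_nodes))
```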

Domain-level Graph Properties Domain-level graph  Collapse all nodes for a domain into a single super-node Compare mobile web 2007 and fixed web 2007 Advantages  Allows us to understand the differences at a much coarser level  Allows us to compare present day fixed and mobile webs CorpusAvg Node Degree SCCINOUTTendrils + Disconn. XHTML +WML %40.7%2.73%15.9% CHTML5.5683%16.4%0.22%0.36% Fixed web %5.62%0.4%0.03% Observations  Domain-level graphs are better connected.  XHMTL + WML has a much larger Disconnected component  CHTML properties lies between XTHML+WML and Fixed web. Structural differences between domain-level fixed web and mobile web same as the differences between page-level fixed web and mobile web.
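Collapsing pages into domain super-nodes fits the MapReduce pattern: the map step rewrites each page-level link as a (source domain, target domain) pair, and the reduce step groups identical pairs. The Python sketch below simulates both phases in memory; it is not Google's MapReduce API. It assumes the URL host name is the "domain" and drops intra-domain links (which would become self-loops); both are assumptions, since the slides do not say how domains were defined or whether such links were kept. Function names and URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def map_link(src_url, dst_url):
    """Map phase: emit ((src_domain, dst_domain), 1), or nothing for intra-domain links."""
    src, dst = urlsplit(src_url).hostname, urlsplit(dst_url).hostname
    if src and dst and src != dst:
        yield (src, dst), 1

def reduce_links(pairs):
    """Reduce phase: one entry per domain pair; the value counts the page-level links."""
    grouped = defaultdict(int)
    for key, count in pairs:
        grouped[key] += count
    return dict(grouped)

def collapse_to_domains(page_links):
    """Run the map phase over all page-level links, then the reduce phase."""
    mapped = (kv for src, dst in page_links for kv in map_link(src, dst))
    return reduce_links(mapped)

if __name__ == "__main__":
    page_links = [
        ("http://m.example.com/a", "http://m.example.com/b"),  # intra-domain, dropped
        ("http://m.example.com/a", "http://wap.example.org/x"),
        ("http://m.example.com/b", "http://wap.example.org/y"),
        ("http://wap.example.org/x", "http://m.example.com/a"),
    ]
    print(collapse_to_domains(page_links))
    # {('m.example.com', 'wap.example.org'): 2, ('wap.example.org', 'm.example.com'): 1}
```

The keys of the result form the edge set of the domain-level graph; the counts can be discarded if only connectivity matters.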

Application: Impact on Crawling
- Crawling is resource-intensive
  - Efficiency is important
- Higher level of disconnectedness
  - Need a larger and more diverse seed set
  - Covering the IN component requires special care: IN pages cannot be reached from the SCC, so they must be found through seeds
  - A depth-first strategy risks spending a disproportionate amount of time in the Tendrils and Disconnected components
- Different languages have different levels of disconnectedness
  - English pages require a larger seed set than Russian pages
  - Crawl depth can be reduced for the Russian sub-graph
- Sparseness can also be an advantage
  - The chance of encountering the same page again during a crawl is smaller
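The seed-set and crawl-depth points can be illustrated with a toy crawler. The Python sketch below runs a breadth-first crawl from a seed set with an optional depth limit, and picks seeds greedily, each time taking the page whose crawl adds the most uncovered pages (a standard set-cover heuristic chosen for illustration, not the authors' method). The function names and the toy graph are hypothetical; note that starting inside the SCC never reaches the IN page or the disconnected pages.

```python
from collections import deque

def bfs_crawl(links, seeds, max_depth=None):
    """Breadth-first crawl from `seeds`; returns the set of pages fetched.

    `links` maps a page to the pages it links to. Pages at depth max_depth are
    fetched but their links are not followed; max_depth=None means unlimited.
    """
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        page, depth = frontier.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for nxt in links.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

def greedy_seeds(links, budget):
    """Pick up to `budget` seeds, each adding the most not-yet-covered pages."""
    pages = set(links) | {v for outs in links.values() for v in outs}
    covered, seeds = set(), []
    for _ in range(budget):
        best, gain = None, 0
        for p in sorted(pages - covered):
            new = len(bfs_crawl(links, [p]) - covered)
            if new > gain:
                best, gain = p, new
        if best is None:  # everything is already covered
            break
        seeds.append(best)
        covered |= bfs_crawl(links, [best])
    return seeds, covered

if __name__ == "__main__":
    web = {"in1": ["s1"], "s1": ["s2"], "s2": ["s1", "out1"], "iso1": ["iso2"]}
    print(sorted(bfs_crawl(web, ["s1"])))  # ['out1', 's1', 's2']
    seeds, covered = greedy_seeds(web, budget=3)
    print(seeds, len(covered))  # ['in1', 'iso1'] 6
```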

Conclusions
- Mobile web graph is structurally different
  - Sparser, more disconnected
  - Smaller SCC and OUT
- CHTML properties lie between XHTML+WML and the fixed web
- Surprising preponderance of Chinese pages
- English sub-graph is extremely disconnected

Future Work
- Only a first step
- Results motivate the need for a deeper and more extensive analysis
  - Propose alternatives to the bow-tie model for the mobile web
  - Better understanding of language sub-graphs
  - Quantitatively characterize the impact of structural differences on different search algorithms