The Mobile Web is Structurally Different Apoorva Jindal USC Chris Crutchfield MIT Samir Goel Google Inc Ravi Jain Google Inc Ravi Kolluri Google Inc.


1 The Mobile Web is Structurally Different
Apoorva Jindal (USC), Chris Crutchfield (MIT), Samir Goel (Google Inc), Ravi Jain (Google Inc), Ravi Kolluri (Google Inc)

2 The Mobile Web?
Web pages designed for consumption on mobile wireless devices
  - CHTML, XHTML, WML
All other pages are referred to as the fixed web
Becoming more important
  - Better devices
  - Better networks
  - Cheaper plans
Different from fixed web?
  - Smaller pages
  - Fewer hyperlinks
  - Fewer images

3 Structurally?
Web graph
  - pages ↔ nodes
  - hyperlinks ↔ edges
Properties of this graph
  - In-degree distribution
  - Out-degree distribution
  - Strongly connected component size distribution
  - ...
Importance
  - Used in basic algorithms to implement search
      Crawling
      Ranking the web pages
Studied in detail for fixed web
INFOCOM 2008
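
The graph properties listed above can be illustrated with a minimal sketch. It assumes the web graph is given as a list of (linking page, linked page) pairs; the function name `degree_distributions` and the toy edges are made up for illustration and are not from the talk.

```python
from collections import Counter

def degree_distributions(edges):
    """Return (in_hist, out_hist): number of nodes having each in-/out-degree."""
    in_deg, out_deg = Counter(), Counter()
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        out_deg[src] += 1   # one more hyperlink leaving src
        in_deg[dst] += 1    # one more hyperlink pointing at dst
    # Nodes with no incoming (or outgoing) links count as degree 0.
    in_hist = Counter(in_deg[n] for n in nodes)
    out_hist = Counter(out_deg[n] for n in nodes)
    return in_hist, out_hist

# Hypothetical toy graph: pages are nodes, hyperlinks are directed edges.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
in_hist, out_hist = degree_distributions(edges)
avg_degree = len(edges) / len({n for e in edges for n in e})
```

Here `avg_degree` is edges per node; whether the deck's "Avg Node Degree" is defined the same way is an assumption.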

4 Bow-tie Structure [Broder et al 2000]
Model to describe the structure of the fixed web.
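
As a rough illustration of the bow-tie model, the sketch below splits a directed graph into the largest strongly connected component (SCC), IN (nodes that can reach the SCC), OUT (nodes reachable from the SCC), tendrils, and disconnected nodes. It is not the COSIN tooling or the authors' code. The function names and the toy graph are assumptions, and the "tendrils" bucket here also absorbs tubes, meaning everything else that is weakly connected to the core.

```python
from collections import defaultdict, deque

def reachable(starts, adj):
    """Breadth-first search: every node reachable from `starts` via `adj`."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def largest_scc(nodes, fwd, bwd):
    """Kosaraju's algorithm with an explicit stack; returns the largest SCC."""
    order, visited = [], set()
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(fwd.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(fwd.get(nxt, ()))))
                    break
            else:
                order.append(node)  # all successors explored: node is finished
                stack.pop()
    best, assigned = set(), set()
    for root in reversed(order):    # decreasing finish time, reverse graph
        if root in assigned:
            continue
        comp = reachable([root], {n: [p for p in bwd.get(n, ()) if p not in assigned]
                                  for n in nodes})
        assigned |= comp
        if len(comp) > len(best):
            best = comp
    return best

def bow_tie(edges):
    fwd, bwd, und = defaultdict(list), defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        fwd[src].append(dst)
        bwd[dst].append(src)
        und[src].append(dst)
        und[dst].append(src)
    scc = largest_scc(sorted(nodes), fwd, bwd)
    in_part = reachable(scc, bwd) - scc     # can reach the core
    out_part = reachable(scc, fwd) - scc    # reachable from the core
    weak = reachable(scc, und)              # weakly connected to the core
    return {
        "SCC": scc,
        "IN": in_part,
        "OUT": out_part,
        "Tendrils": weak - scc - in_part - out_part,
        "Disconnected": nodes - weak,
    }

# Hypothetical toy graph: core a->b->c->a, IN node i, OUT node o,
# tendril t hanging off IN, and a disconnected pair x->y.
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("i", "a"),
       ("c", "o"), ("i", "t"), ("x", "y")]
parts = bow_tie(toy)
```

The rebuild of the filtered reverse adjacency inside `largest_scc` keeps the sketch short but is quadratic in the worst case; a production tool would mark nodes in place instead.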

5 Methodology
Google's mobile web index, June 2007
  - CHTML
  - XHTML + WML
Webbase 2001
Google's fixed web index, July 2007
In-degree & out-degree distributions
  - Tools based on MapReduce
  - Use [Clauset et al 2006] to infer the power-law coefficient
Determine bow-tie structure properties
  - Use COSIN tools [Donato et al 2004]
Limitations
  - Cannot handle Google fixed web 2007 at page level
  - Collapse all pages in a domain to one node; use tools based on MapReduce
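
To make "infer the power-law coefficient" concrete, here is a minimal sketch of the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)). It is not necessarily the exact procedure used in the talk: degree data are discrete, and choosing `x_min` is a separate step that this sketch takes as given. The function name and the synthetic data are assumptions for illustration.

```python
import math
import random

def power_law_alpha(samples, x_min):
    """Continuous MLE of the power-law exponent for samples >= x_min."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need samples strictly above x_min to fit a tail")
    return 1.0 + len(tail) / log_sum

# Synthetic check: draw from a known power law by inverse-transform sampling
# and confirm the estimator recovers the exponent approximately.
random.seed(0)
true_alpha, x_min = 2.5, 1.0
samples = [x_min * (1.0 - random.random()) ** (-1.0 / (true_alpha - 1.0))
           for _ in range(20000)]
estimate = power_law_alpha(samples, x_min)
```

A larger exponent means the distribution falls off faster, which is how the out-degree comparison between the mobile and fixed web is read.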

6 Page-level Graph properties: Degree Distributions
Mobile web is sparser

                                 Coefficient of power-law distribution
Corpus       Avg Node Degree     In-degree     Out-degree
XHTML+WML    3.75                2.00          3.49
CHTML        5.06                1.99          4.06
Webbase      7.0                 2.1           2.7

CHTML lies between XHTML+WML and fixed web
Out-degree distribution falls off faster for mobile web

7 Page-level Graph properties: Bow-tie structure
Mobile web
  - Smaller SCC
  - Larger IN and smaller OUT
  - Bigger Disconnected + Tendrils
Connectivity: Fixed Web > CHTML > XHTML/WML

Corpus       SCC      IN       OUT      Tendrils   Disconnected
XHTML+WML    10.5%    18%      10.4%    18.3%      42.7%
CHTML        22%      25.9%    14.2%    22%        15.8%
Webbase      33%      11%      39%      13%        4%

8 Language Properties
Sub-graph of pages that share a common trait
  - For example, a keyword or a location.
  - Called Thematically Unified Clusters (TUCs).
  - In fixed web, they retain the structural properties of the entire graph.
Mobile web?

Corpus    Language    Fraction of Nodes
XHTML     Chinese     42.6%
          English     22.3%
          Russian     13.4%
          French      3.4%
          German      2.3%
CHTML     Japanese    92.3%
          English     5.9%

Corpus       SCC      IN      OUT      Tendrils   Disconnected
XHTML+WML    10.5%    18%     10.4%    18.3%      42.7%
Chinese      13%      22%     9%       14%        42%
English      2%       3%      7%       25%        63%
Russian      22%      40%     8%       11%        19%

Japanese is not studied separately: its properties are the same as CHTML
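
A language sub-graph of this kind can be sketched as an induced sub-graph: keep only the pages with the chosen trait and the links whose two endpoints are both kept. The per-page language labels, the function name, and the toy data below are assumptions for illustration; how the talk assigned a language to each page is not stated.

```python
def induced_subgraph(edges, labels, wanted):
    """Return (nodes, edges) restricted to pages whose label equals `wanted`."""
    keep = {node for node, label in labels.items() if label == wanted}
    kept_edges = [(src, dst) for src, dst in edges if src in keep and dst in keep]
    return keep, kept_edges

# Hypothetical pages with a language label each.
labels = {"p1": "en", "p2": "en", "p3": "ru", "p4": "ru", "p5": "en"}
links = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p3"), ("p5", "p3")]
en_nodes, en_links = induced_subgraph(links, labels, "en")
ru_nodes, ru_links = induced_subgraph(links, labels, "ru")
```

Links that cross languages are dropped, so a page that is reachable only through pages in another language ends up disconnected in its own language sub-graph.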

9 Domain-level Graph Properties
Domain-level graph
  - Collapse all nodes for a domain into a single super-node
Compare mobile web 2007 and fixed web 2007
Advantages
  - Allows us to understand the differences at a much coarser level
  - Allows us to compare present-day fixed and mobile webs

Corpus           Avg Node Degree   SCC      IN       OUT      Tendrils + Disconn.
XHTML+WML        3.91              40.6%    40.7%    2.73%    15.9%
CHTML            5.56              83%      16.4%    0.22%    0.36%
Fixed web 2007   35.75             93.9%    5.62%    0.4%     0.03%

Observations
  - Domain-level graphs are better connected.
  - XHTML+WML has a much larger Disconnected component.
  - CHTML properties lie between XHTML+WML and fixed web.
Structural differences between the domain-level fixed web and mobile web are the same as the differences between the page-level fixed web and mobile web.
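
Collapsing a page-level graph into a domain-level graph can be sketched as follows. The sketch assumes a "domain" is the URL's hostname, that links within a domain are dropped, and that parallel links between two domains are merged into one edge. None of these choices is confirmed by the slides, and the URLs are made up.

```python
from urllib.parse import urlsplit

def domain_graph(page_edges):
    """Collapse (source URL, target URL) pairs into a set of domain-level edges."""
    domain_edges = set()
    for src, dst in page_edges:
        src_host = urlsplit(src).hostname
        dst_host = urlsplit(dst).hostname
        # Skip unparsable URLs and links that stay inside one domain.
        if src_host and dst_host and src_host != dst_host:
            domain_edges.add((src_host, dst_host))
    return domain_edges

# Hypothetical page-level links.
page_edges = [
    ("http://m.example.com/a", "http://m.example.com/b"),
    ("http://m.example.com/a", "http://wap.example.org/x"),
    ("http://m.example.com/b", "http://wap.example.org/y"),
    ("http://wap.example.org/x", "http://m.example.com/a"),
]
collapsed = domain_graph(page_edges)
```

Because many page-level links merge into one super-node pair, the collapsed graph is far smaller, which is consistent with the deck's use of it to handle the 2007 fixed web index.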

10 Application: Impact on Crawling
Crawling is resource-intensive.
  - Efficiency is important
Higher level of disconnectedness
  - Need a larger and more diverse seed set
Covering the IN component requires special care
Depth-first strategy risks spending a disproportionate amount of time in Tendrils and Disconnected components
Different languages have different levels of disconnectedness
  - Require a larger seed set for English pages than for Russian pages
  - Crawl depth can be reduced for the Russian sub-graph
Sparseness can also give an advantage
  - The chance of encountering the same page again during a crawl is smaller
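
The effect of the seed set on crawl coverage can be sketched with a breadth-first crawl over a toy graph. The function name, the optional `max_depth` parameter, and the graph are assumptions for illustration, not the crawler discussed in the talk.

```python
from collections import deque

def crawl_coverage(edges, seeds, max_depth=None):
    """Fraction of nodes reached by a breadth-first crawl from `seeds`."""
    adj, nodes = {}, set()
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        nodes.update((src, dst))
    if not nodes:
        return 0.0
    depth = {s: 0 for s in seeds if s in nodes}
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        if max_depth is not None and depth[node] >= max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in adj.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return len(depth) / len(nodes)

# Hypothetical graph: core a->b->c->a, IN node i, OUT node o,
# tendril t hanging off i, and a disconnected pair x->y.
web = [("a", "b"), ("b", "c"), ("c", "a"), ("i", "a"),
       ("c", "o"), ("i", "t"), ("x", "y")]
core_only = crawl_coverage(web, ["a"])          # misses IN, tendril, disconnected
diverse = crawl_coverage(web, ["i", "x"])       # seeds in IN and disconnected part
shallow = crawl_coverage(web, ["i"], max_depth=1)
```

A seed inside the core never reaches the IN node or the disconnected pair, which is the reason a more disconnected graph calls for a larger and more diverse seed set.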

11 Conclusions
Mobile web graph is structurally different
  - Sparser, more disconnected
  - Smaller SCC and OUT
CHTML properties lie between XHTML+WML and fixed web
Surprising preponderance of Chinese pages
English sub-graph is extremely disconnected

12 Future Work
Only a first step
Results motivate the need for a deeper and more extensive analysis
Propose alternatives to the bow-tie model for the mobile web
Better understanding of language sub-graphs
Quantitatively characterize the impact of differences in structure on different search algorithms

