Web Markov Skeleton Processes and Applications Zhi-Ming Ma 10 June, 2013, St.Petersburg


Y. Liu, Z. M. Ma, C. Zhou: Web Markov Skeleton Processes and Their Applications, Tohoku Math J. 63 (2011)
Y. Liu, Z. M. Ma, C. Zhou: Further Study on Web Markov Skeleton Processes, in Stochastic Analysis and Applications to Finance, World Scientific, 2012
C. Zhou: Some Results on Mirror Semi-Markov Processes, manuscript

Web Markov Skeleton Process Markov Chain conditionally independent given

Define by : WMSP

Simple WMSP: Many simple WMSPs are Non-Markov Processes

[LMZ2011a,b]

Mirror Semi-Markov Process
A mirror semi-Markov process is not a Markov skeleton process in the sense of Hou and Liu, i.e. it does not satisfy

WMSP Multivariate Point Process associated with WMSP

Let

Consequently where Define We can prove that

where

Time-homogeneous mirror semi-Markov processes are all independent of n

More properties of time homogeneity
Renewal theory
Contribution probability
Staying times and first entry times
Limit distribution for semi-Markov processes
Limit distribution for mirror semi-Markov processes
Reconstruction of mirror semi-Markov processes

Why is it called a Web Markov Skeleton Process?

A simple Markov Skeleton Process
PageRank is a ranking algorithm used by the Google search engine, introduced in 1998 by Sergey Brin and Larry Page at Stanford University. From a probabilistic point of view, PageRank is the stationary distribution of a Markov chain.

Markov chain describing surfing behavior

Web surfers usually have two basic ways to access web pages:
1. With probability α, they visit a web page by clicking a hyperlink.
2. With probability 1-α, they visit a web page by entering its URL.
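The two access modes above can be turned into a small numerical sketch. The Python code below is an illustration only, not the implementation from the talk: it assumes a 0/1 link matrix, a uniform choice among a page's out-links when clicking, a uniform choice over all pages when entering a URL, and a uniform jump from pages without out-links. The function name `pagerank`, the default `alpha=0.85`, and the toy graph `links` are all hypothetical choices made for this example.

```python
import numpy as np

def pagerank(adjacency, alpha=0.85, tol=1e-12, max_iter=1000):
    """Stationary distribution of the surfing chain by power iteration.

    adjacency[i][j] = 1 if page i links to page j (assumed 0/1 matrix).
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    # Avoid division by zero for pages without out-links.
    safe_degree = np.where(out_degree > 0, out_degree, 1.0)
    # Click matrix: uniform over out-links; dangling pages jump uniformly
    # (an assumption of this sketch).
    P = np.where(out_degree[:, None] > 0, A / safe_degree[:, None], 1.0 / n)
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # With probability alpha follow a link, otherwise pick a page uniformly.
        new_pi = alpha * (pi @ P) + (1.0 - alpha) / n
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    return pi

# Hypothetical toy web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
links = [[0, 1, 1],
         [0, 0, 1],
         [1, 0, 0]]
ranks = pagerank(links)
```

In this toy graph page 2 receives links from both other pages, so it ends up ranked above page 1, which only receives half of page 0's clicks.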

where

Weak points of PageRank
– It uses only the static web graph structure.
– It reflects only the will of web managers and ignores the will of users, e.g. the staying time of users on a web page.
– It cannot effectively counter spam and junk pages.

Data Mining

Browsing Process Markov property Time-homogeneity

Computation of the Stationary Distribution
– Stationary distribution:
– is the mean of the staying time on page i. The more important a page is, the longer the staying time on it.
– is the mean of the first re-visit time at page i. The more important a page is, the shorter the re-visit time and the higher the visit frequency.
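As a rough illustration of how staying times enter the stationary distribution, the sketch below uses a standard fact about continuous-time Markov processes: if ν is the stationary distribution of the embedded jump chain and m_i is the mean staying time on page i, the stationary distribution of the process is proportional to ν_i m_i. This is a minimal sketch under the assumption of an irreducible embedded chain, and it is not claimed to be the exact procedure of the talk. The function name `stationary_from_embedded_chain` and the two-page example are hypothetical.

```python
import numpy as np

def stationary_from_embedded_chain(P, mean_stay):
    """Stationary distribution of a continuous-time browsing process.

    P         : transition matrix of the embedded jump chain (rows sum to 1),
                assumed irreducible.
    mean_stay : mean staying time on each page.
    """
    P = np.asarray(P, dtype=float)
    m = np.asarray(mean_stay, dtype=float)
    n = P.shape[0]
    # Solve nu (P - I) = 0 together with sum(nu) = 1 in the least-squares sense.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    nu, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Weight each page's visit frequency by its mean staying time.
    weighted = nu * m
    return weighted / weighted.sum()

# Hypothetical example: two pages visited alternately; users stay
# three times longer on page 1 than on page 0.
pi = stationary_from_embedded_chain([[0.0, 1.0], [1.0, 0.0]], [1.0, 3.0])
```

Both pages are visited equally often here, so the difference in importance comes entirely from the staying times: page 1 gets three quarters of the stationary mass.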

BrowseRank: Letting Web Users Vote for Page Importance
Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang, Zhiming Ma, Shuyuan He, and Hang Li
The 31st Annual International ACM SIGIR Conference on Research & Development on Information Retrieval, July 23, 2008, Singapore
Best student paper!

BrowseRank, the next PageRank, says Microsoft

Browsing Processes will be a Basic Mathematical Tool in Internet Information Retrieval
Beyond:
-- General framework of Browsing Processes?
-- How about inhomogeneous processes?
-- Marked point process
-- Mobile Web: not really Markovian

ExtBrowseRank and semi-Markov processes

MobileRank and Mirror Semi-Markov Processes

[10] B. Gao, T. Liu, Z. M. Ma, T. Wang, and H. Li: A General Markov Framework for Page Importance Computation, in Proceedings of CIKM 2009
[11] B. Gao, T. Liu, Y. Liu, T. Wang, Z. M. Ma, and H. Li: Page Importance Computation Based on Markov Processes, Information Retrieval, online first
Web Markov Skeleton Process

Research on Random Complex Networks and Information Retrieval: In recent years we have been involved in the research direction of Random Complex Networks and Information Retrieval. Below are some of the related outputs by our group (in collaboration with Microsoft Research Asia).

right continuous, piecewise constant functions More properties of time homogeneity

Theorem [LMZ 2011a] for all n Theorem [LMZ 2011b] General case

The statistical properties of a time-homogeneous mirror semi-Markov process are completely determined by:

Reconstruction of Mirror Semi-Markov Processes We can construct such that Given:,, Theorem [LMZ 2011b]

uniformly

Limit distribution for semi-Markov process

Limit distribution for mirror semi-Markov processes

Staying times and first entry times Staying time on the state j: First entry time into the state k: into k where Distribution Expectation Distribution Expectation

Contribution probability from state i to state j:

Renewal Theory Proposition

Renewal Equation [LMZ2011a]

Renewal functional: where Below are the results on the renewal functional [LMZ2011a]

Thank you !

Time Homogeneous WMSP

right continuous, piecewise constant functions

More properties of time homogeneity Theorem [LMZ 2011b] for all

Write is expressed as Reconstruction of WMSP [LMZ2011b]

Ranking Websites, a Probabilistic View. Ying Bao, Gang Feng, Tie-Yan Liu, Zhi-Ming Ma, and Ying Wang. Internet Mathematics, Volume 3 (2007), Issue 3
AggregateRank: Bringing Order to Web Sites. G. Feng, T. Y. Liu, Ying Wang, Y. Bao, Z. M. Ma et al. 29th Annual International Conference on Research & Development on Information Retrieval (SIGIR'06)