Zhenjiang Lin, Michael R. Lyu and Irwin King

Similar presentations
Weiren Yu 1, Jiajin Le 2, Xuemin Lin 1, Wenjie Zhang 1 On the Efficiency of Estimating Penetrating Rank on Large Graphs 1 University of New South Wales.
1 Weiren Yu 1,2, Xuemin Lin 1, Wenjie Zhang 1 1 University of New South Wales 2 NICTA, Australia Towards Efficient SimRank Computation over Large Networks.
Multi-label Classification using Adaptive Neighborhoods Tanwistha Saha, Huzefa Rangwala and Carlotta Domeniconi Department of Computer Science George.
Collaborative QoS Prediction in Cloud Computing Department of Computer Science & Engineering The Chinese University of Hong Kong Hong Kong, China Rocky.
Question Identification on Twitter 1 The Chinese University of Hong Kong, Shatin, N.T., Hong Kong 2 Google Research, Beijing, China 3 AT&T Labs Research,
DATA MINING LECTURE 12 Link Analysis Ranking Random walks.
1 Content Based Image Retrieval Using MPEG-7 Dominant Color Descriptor Student: Mr. Ka-Man Wong Supervisor: Dr. Lai-Man Po MPhil Examination Department.
Chen Cheng1, Haiqin Yang1, Irwin King1,2 and Michael R. Lyu1
CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.
Context-Aware Query Classification Huanhuan Cao 1, Derek Hao Hu 2, Dou Shen 3, Daxin Jiang 4, Jian-Tao Sun 4, Enhong Chen 1 and Qiang Yang 2 1 University.
Multimedia Databases SVD II. Optimality of SVD Def: The Frobenius norm of a n x m matrix M is (reminder) The rank of a matrix M is the number of independent.
Efficient Convex Relaxation for Transductive Support Vector Machine Zenglin Xu 1, Rong Jin 2, Jianke Zhu 1, Irwin King 1, and Michael R. Lyu 1 4. Experimental.
Data-rich Section Extraction from HTML pages Introducing the DSE-Algorithm Original Paper from: Jiying Wang and Fred H. Lochovsky Department of Computer.
1 Extending Link-based Algorithms for Similar Web Pages with Neighborhood Structure Allen, Zhenjiang LIN CSE, CUHK 13 Dec 2006.
Constructing a Large Node Chow-Liu Tree Based on Frequent Itemsets Kaizhu Huang, Irwin King, Michael R. Lyu Multimedia Information Processing Laboratory.
1 PageSim: A Link-based Similarity Measure for the World Wide Web Zhenjiang Lin, Irwin King, and Michael, R., Lyu Computer Science & Engineering, The Chinese.
MPEG-7 DCD Based Relevance Feedback Using Merged Palette Histogram Ka-Man Wong and Lai-Man Po ISIMP 2004 Poly U, Hong Kong Department of Electronic Engineering.
Chapter 8 Web Structure Mining Part-1 1. Web Structure Mining Deals mainly with discovering the model underlying the link structure of the web Deals with.
Large-Scale Cost-sensitive Online Social Network Profile Linkage.
Yin Yang (Hong Kong University of Science and Technology) Nilesh Bansal (University of Toronto) Wisam Dakka (Google) Panagiotis Ipeirotis (New York University)
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.
Piyush Kumar (Lecture 2: PageRank) Welcome to COT5405.
CC Procesamiento Masivo de Datos (Massive Data Processing), Autumn 2015, Lecture 8: Information Retrieval II, Aidan Hogan
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
1 On Querying Historical Evolving Graph Sequences Chenghui Ren $, Eric Lo *, Ben Kao $, Xinjie Zhu $, Reynold Cheng $ $ The University of Hong Kong $ {chren,
We introduce the use of Confidence c as a weighted vote for the voting machine to avoid low confidence Result r of individual expert from affecting the.
Table 3:Yale Result Table 2:ORL Result Introduction System Architecture The Approach and Experimental Results A Face Processing System Based on Committee.
1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.
CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
Graph Data Management Lab, School of Computer Science Add title here: Large graph processing
ICML2004, Banff, Alberta, Canada Learning Larger Margin Machine Locally and Globally Kaizhu Huang Haiqin Yang, Irwin King, Michael.
P-Rank: A Comprehensive Structural Similarity Measure over Information Networks CIKM’ 09 November 3 rd, 2009, Hong Kong Peixiang Zhao, Jiawei Han, Yizhou.
An Analytical Study of Puzzle Selection Strategies for the ESP Game Ling-Jyh Chen, Bo-Chun Wang, Kuan-Ta Chen Academia Sinica Irwin King, and Jimmy Lee.
Link-based Similarity Measurement Techniques and Applications Department of Computer Science & Engineering The Chinese University of Hong Kong Zhenjiang.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Publication Spider Wang Xuan 07/14/2006. What is publication spider Gathering publication pages Using focused crawling With the help of Search Engine.
Question Routing in Community Question Answering: Putting Category in Its Place 1 The Chinese University of Hong Kong, Shatin, N.T., Hong Kong 2 AT&T Labs.
Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning Mingzhen Mo and Irwin King Department of Computer Science and Engineering.
1 Effect of Spatial Locality on An Evolutionary Algorithm for Multimodal Optimization EvoNum 2010 Ka-Chun Wong, Kwong-Sak Leung, and Man-Hon Wong Department.
A User Experience-based Cloud Service Redeployment Mechanism KANG Yu Yu Kang, Yangfan Zhou, Zibin Zheng, and Michael R. Lyu {ykang,yfzhou,
1 Authors: Glen Jeh, Jennifer Widom (Stanford University) KDD, 2002 Presented by: Yuchen Bian SimRank: a measure of structural-context similarity.
Multi-level Bootstrapping for Extracting Parallel Sentence from a Quasi-Comparable Corpus Pascale Fung and Percy Cheung Human Language Technology Center,
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Hongbo Deng, Michael R. Lyu and Irwin King
Recommender Systems with Social Regularization Hao Ma, Dengyong Zhou, Chao Liu Microsoft Research Michael R. Lyu The Chinese University of Hong Kong Irwin.
Computer Science 1 Using Clustering Information for Sensor Network Localization Haowen Chan, Mark Luk, and Adrian Perrig Carnegie Mellon University
- Murtuza Shareef Authoritative Sources in a Hyperlinked Environment More specifically “Link Analysis” using HITS Algorithm.
1 CS 430: Information Discovery Lecture 5 Ranking.
MMM2005 The Chinese University of Hong Kong 1 Video Summarization Using Mutual Reinforcement Principle and Shot.
1 Learning to Impress in Sponsored Search Xin Supervisors: Prof. King and Prof. Lyu.
Similarity Measurement and Detection of Video Sequences Chu-Hong HOI Supervisor: Prof. Michael R. LYU Marker: Prof. Yiu Sang MOON 25 April, 2003 Dept.
Glen Jeh & Jennifer Widom KDD. Many applications require a measure of “similarity” between objects: Web search, Shopping Recommendations, Search.
CS791 - Technologies of Google Spring A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
SimRank: A Measure of Structural-Context Similarity Glen Jeh and Jennifer Widom Stanford University ACM SIGKDD 2002 January 19, 2011 Taikyoung Kim SNU.
The Chinese University of Hong Kong Learning Larger Margin Machine Locally and Globally Dept. of Computer Science and Engineering The Chinese University.
1 CS 430 / INFO 430: Information Retrieval Lecture 20 Web Search 2.
Hao Ma, Dengyong Zhou, Chao Liu Microsoft Research Michael R. Lyu
A Collaborative Quality Ranking Framework for Cloud Components
CPS : Information Management and Mining
WSRec: A Collaborative Filtering Based Web Service Recommender System
CIKM’ 09 November 3rd, 2009, Hong Kong
Video Summarization by Spatial-Temporal Graph Optimization
Diversified Top-k Subgraph Querying in a Large Graph
Identify Different Chinese People with Identical Names on the Web
Junghoo “John” Cho UCLA
A Classification-based Approach to Question Routing in Community Question Answering Tom Chao Zhou 22, Feb, 2010 Department of Computer.
Mingzhen Mo and Irwin King
Three steps are separately conducted
Web Page Classification with Heterogeneous Data Fusion
Presentation transcript:

MatchSim: A Novel Neighbor-based Similarity Measure with Maximum Neighborhood Matching
Zhenjiang Lin, Michael R. Lyu and Irwin King
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong

Overview
We propose a neighbor-based similarity measure, called MatchSim, for computing the similarity between objects in a graph. The method recursively refines the similarity between two objects by finding the maximum matching of similarities between their neighbors. Experimental results demonstrate the effectiveness of the proposed method.

Motivation
Effectively and efficiently explore the similarity between objects by exploiting only the relationships (the links) among them.

Main Contributions
A neighbor-based similarity measure, MatchSim, which recursively extracts the similarity between two pages by finding the maximum matching between their similar neighbors.
Experiments on two real-world datasets.

Basic Idea
SimRank (an existing neighbor-based method): sim(a, b) is the average similarity between their neighbors.
MatchSim (proposed method): sim(a, b) is the average similarity of the maximum matching between their neighbors.

Example: MatchSim vs. SimRank
Figure: Measuring the similarity between a and b based on their neighbors a1, a2 and b1, b2 (sim(a1, b1) = 0.6, sim(a1, b2) = sim(a2, b1) = 0.1, sim(a2, b2) = 0.8).
SimRank (ignoring its decay factor): sim(a, b) = Σ sim(a_i, b_j) / 4 = (0.6 + 0.1 + 0.1 + 0.8) / 4 = 0.4. If the most similar pair of neighbors (a2, b2) is dropped, sim(a, b) increases to sim(a1, b1) / 1 = 0.6, which is clearly counterintuitive.
MatchSim: finds the maximum matching between the neighbors and takes the average similarity of the matched pairs as sim(a, b). Here the maximum matching is {(a1, b1), (a2, b2)} with weight 1.4 (the alternative {(a1, b2), (a2, b1)} weighs only 0.2), so sim(a, b) = (sim(a1, b1) + sim(a2, b2)) / 2 = 0.7.

MatchSim Definition and Iteration
sim(a, b) is the fixed point of the iteration. W(a, b) is the sum of the similarities of the maximum matching between the neighbors of a and b, i.e., between I(a) and I(b). W_k(a, b) is computed from the scores sim_k(*, *) of iteration k.
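To make the matching step concrete, here is a minimal Python sketch that reproduces the toy numbers above and runs the fixed-point iteration on a tiny graph. It rests on assumptions not stated in the transcript: W_k(a, b) is normalized by max(|I(a)|, |I(b)|), a node with no neighbors gets score 0 against every other node, and the loop runs a fixed 15 iterations (matching the reported experimental convergence). The function names and the example graph are hypothetical, and the brute-force matching is a stand-in for a proper assignment algorithm such as the Hungarian method.

```python
from itertools import permutations


def max_matching_weight(left, right, score):
    """Total weight of a maximum-weight matching between two neighbor lists.

    Brute force over all injective assignments of the shorter list into the
    longer one, so it is only practical for tiny neighborhoods.
    """
    if not left or not right:
        return 0.0
    if len(left) <= len(right):
        return max(sum(score(x, y) for x, y in zip(left, perm))
                   for perm in permutations(right, len(left)))
    return max(sum(score(x, y) for x, y in zip(perm, right))
               for perm in permutations(left, len(right)))


# Toy example: neighbors a1, a2 of a and b1, b2 of b.
toy = {("a1", "b1"): 0.6, ("a1", "b2"): 0.1,
       ("a2", "b1"): 0.1, ("a2", "b2"): 0.8}
A, B = ["a1", "a2"], ["b1", "b2"]

# SimRank-style average over all neighbor pairs (decay factor omitted).
simrank_style = sum(toy.values()) / (len(A) * len(B))
# The same average after dropping the most similar pair (a2, b2).
simrank_dropped = toy[("a1", "b1")] / 1
# MatchSim-style: maximum matching weight over the larger neighborhood size.
matchsim_toy = max_matching_weight(A, B, lambda x, y: toy[(x, y)]) / max(len(A), len(B))
print(round(simrank_style, 3), round(simrank_dropped, 3), round(matchsim_toy, 3))


def matchsim(neighbors, iterations=15):
    """Iterate MatchSim-style scores on a graph given as {node: [neighbor, ...]}.

    Assumed (hypothetical) choices: normalization by max(|I(a)|, |I(b)|),
    score 0 when either node has no neighbors, fixed iteration count.
    """
    nodes = list(neighbors)
    # Initialization: 1 on the diagonal, 0 elsewhere.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        prev = sim
        sim = {}
        for a in nodes:
            for b in nodes:
                ia, ib = neighbors[a], neighbors[b]
                if a == b:
                    sim[(a, b)] = 1.0
                elif not ia or not ib:
                    sim[(a, b)] = 0.0
                else:
                    # W_k(a, b): maximum matching weight under the previous scores.
                    w = max_matching_weight(ia, ib, lambda x, y: prev[(x, y)])
                    sim[(a, b)] = w / max(len(ia), len(ib))
    return sim


# Hypothetical graph: each node maps to its in-neighbors I(node).
graph = {"p": [], "q": [], "r": [], "a": ["p", "q"], "b": ["p", "r"], "c": ["q"]}
scores = matchsim(graph)
print(scores[("a", "b")], scores[("a", "c")], scores[("b", "c")])
```

The first print shows 0.4, 0.6 and 0.7, mirroring the SimRank anomaly and the MatchSim result in the example. Brute force keeps the sketch dependency-free; for realistic neighborhood sizes a polynomial assignment solver would replace `max_matching_weight`.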
The iteration starts with sim(a, b) = 1 for a = b and 0 otherwise.

Properties
Symmetric: sim(a, b) = sim(b, a).
Bounded: 0 ≤ sim(a, b) ≤ 1.
sim(a, b) reaches 1 if and only if a and b are identical.

Convergence
The convergence has been proved theoretically. Experimentally, the iteration converges within 15 iterations.

Experimental Evaluation
The Google Scholar (GS) dataset: a citation graph crawled from http://scholar.google.com, containing 20,000 papers and 87,717 citations. Ground truth: the "Related Articles" returned by Google Scholar.
The CSE Website (CW) dataset: a web graph crawled from http://cse.cuhk.edu.hk, containing 22,615 web pages and 120,947 hyperlinks. Ground truth: cosine TF-IDF similarity scores.
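The CW ground truth relies on cosine TF-IDF similarity between page texts. The transcript does not say which TF-IDF variant was used, so the following Python sketch assumes whitespace tokenization, raw term counts, and idf = log(N / df); the function names and page texts are hypothetical stand-ins, not the authors' pipeline.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: raw term count times log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical page texts standing in for crawled web pages.
pages = ["graph similarity measure", "graph similarity search", "course timetable"]
vecs = tfidf_vectors(pages)
print(round(cosine(vecs[0], vecs[1]), 3), cosine(vecs[0], vecs[2]))
```

The two graph-related pages share weighted terms and score above 0, while the timetable page shares none and scores 0. Note that under this idf choice a term present in every page gets weight 0, which is one reason the exact variant matters when reproducing a ground truth.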