Computer Science Department, University of Toronto
1 Seminar Series: Social Information Systems
Toronto, Spring 2007
Manos Papagelis, Department of Computer Science, University of Toronto

2 Presentation Outline
Part I: Exploiting Social Networks for Internet Search
Part II: An Experimental Study of the Coloring Problem on Human Subject Networks

3 Part I: Exploiting Social Networks for Internet Search
Alan Mislove, Krishna Gummadi, and Peter Druschel, HotNets 2006

4 Introduction
• Social Networking (SN): a new form of publishing and locating information
• Objective: to understand whether social links can be exploited by search engines to provide better results
• Contributions:
  - Comparison of the mechanisms in the Web and in online SNs for
    - Publishing: mechanisms that make information available to users
    - Locating: mechanisms for finding information
  - Results from an experiment in social network-based Web search
  - Challenges and opportunities in using social networks for Internet search

5 Web vs. SN (1/2)
Web
• Publishing: by placing documents on a Web server (and then relying on incoming links from other pages)
• Locating: via search engines (exploiting the link graph)
Pros
• Very effective (incoming links are good indicators of importance)
Limitations
• No fresh results (new content appears only after it has been crawled and linked)
• No personalized results
• Unlinked pages are not indexed

6 Web vs. SN (2/2)
Social Networks
• Publishing: no explicit links between content items (photos, videos, blogs), but implicit links between content through the explicit links between users
• Locating:
  - Navigating the social network and browsing users' content
  - Keyword-based search over textual or tagged content
  - "Top-10" lists
Pros
• Helps a user find timely, relevant information by browsing adjacent regions of the network, populated by users with similar interests
• Content is rated rapidly (through the comments and feedback of a community)

7 Integration of Web Search and SN
• Web and SN information are disjoint
• There is no unified search tool that locates information across the different systems

8 PeerSpective: SN-based Web Search
• Technology: Lucene text search engine and FreePastry P2P overlay
• A lightweight HTTP proxy transparently indexes all URLs visited by the user
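To make the proxy's job concrete, here is a minimal sketch of a per-user inverted index that such a proxy could fill as pages are viewed. It is a toy Python stand-in, not Lucene's API; the class `LocalIndex`, its methods, and the term-frequency scoring are hypothetical choices for illustration only.

```python
import re
from collections import defaultdict


class LocalIndex:
    """Hypothetical toy stand-in for one user's PeerSpective index (the real system uses Lucene)."""

    def __init__(self):
        # term -> {url: number of occurrences of the term in that page}
        self.postings = defaultdict(dict)

    @staticmethod
    def tokenize(text):
        # Lowercase alphanumeric tokens; a real engine would also stem and drop stopwords.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_page(self, url, text):
        # Assumed to be called by the proxy whenever the user views an indexable page.
        for term in self.tokenize(text):
            self.postings[term][url] = self.postings[term].get(url, 0) + 1

    def search(self, query):
        # Score each URL by the summed frequency of the query terms (a crude substitute for Lucene scoring).
        scores = defaultdict(int)
        for term in self.tokenize(query):
            for url, tf in self.postings.get(term, {}).items():
                scores[url] += tf
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, after indexing "Bus schedules and bus routes" under one URL and "PCI bus architecture" under another, the query "bus routes" ranks the first page (score 3) above the second (score 1).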

9 Searching Process
• A user submits a query to Google
• The proxy transparently forwards the query both to Google and to the proxies of the other users in the network
• Each proxy executes the query on its local index
• The results are collated and presented alongside the Google results
• PeerSpective ranking: Lucene score + PageRank + scores from users who previously viewed the result
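The slide lists the three ranking signals but not how they are combined. The sketch below assumes a simple weighted sum with equal default weights, treats each peer that returns a URL as one previous viewer, and normalizes the viewer count by the maximum; the function name `collate` and all of these choices are hypothetical, not PeerSpective's actual formula.

```python
def collate(peer_results, weights=(1.0, 1.0, 1.0)):
    """Merge result lists from peer proxies and rank them by a weighted sum of signals.

    peer_results: one list per responding peer of (url, text_score, pagerank) tuples,
    with text_score and pagerank assumed to be normalized to [0, 1] and each URL
    appearing at most once per peer list.
    Returns a list of (url, combined_score), best first.
    """
    text, pagerank, viewers = {}, {}, {}
    for results in peer_results:
        for url, text_score, pr in results:
            text[url] = max(text.get(url, 0.0), text_score)  # keep the best text match
            pagerank[url] = pr
            viewers[url] = viewers.get(url, 0) + 1            # one more peer saw this URL
    if not text:
        return []
    max_viewers = max(viewers.values())
    w_text, w_pr, w_view = weights
    scores = {
        url: w_text * text[url] + w_pr * pagerank[url] + w_view * viewers[url] / max_viewers
        for url in text
    }
    # Sort by descending score, breaking ties by URL for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With two peers where only one returns "u1" but both return "u2", the shared URL gains the full viewer bonus and moves to the top.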

10 Search Results Example

11 Experiments
• 10 graduate students shared the Web content they downloaded or viewed
• The experiment ran for one month
• Of the distinct URLs recorded, 25% were of type text/html or application/pdf (so they can be indexed)
Reports on:
• Limits of hyperlink-based search
• Benefits of SN-based search

12 Limits of hyperlink-based search
• Reports the fraction of visited URLs that are not indexed by Google, for reasons such as:
  - Pages that are too new (blogs)
  - Deep Web
  - Dark Web (pages with no incoming links)
Results
• About 1/3 of requested URLs could not be retrieved by Google
• PeerSpective's indices cover 30% of the requested URLs
• 13.3% of URLs were contained in PeerSpective but not in Google's index
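The three results above are ratios over the set of requested URLs. The sketch below shows one way to compute them with set operations; the function name and the toy URL sets are hypothetical, and the study may have counted requests rather than unique URLs for some of its figures.

```python
def coverage_report(requested, google_indexed, peer_indexed):
    """Fractions of requested URLs covered by each index.

    All arguments are sets of URLs (hypothetical inputs, e.g. from proxy logs).
    """
    n = len(requested)
    return {
        # Requested URLs that Google's index does not contain.
        "not_in_google": len(requested - google_indexed) / n,
        # Requested URLs that at least one PeerSpective index contains.
        "in_peerspective": len(requested & peer_indexed) / n,
        # Requested URLs found by PeerSpective but missing from Google.
        "peerspective_only": len((requested & peer_indexed) - google_indexed) / n,
    }
```

With six requested URLs, four known to Google and two to PeerSpective (one of them also in Google), the report is 2/6, 2/6 and 1/6.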

13 Random samples of URLs not in Google and potential reasons

14 Benefits of SN-based Search
• Experiments on clicks on first-page results
  - 1730 queries, of which 1079 resulted in clicks
Results
• 86.5% of the clicked results were returned only by Google
• 5.7% of the clicked results were returned by both
• 7.7% of the clicked results were returned only by PeerSpective
Conclusions
• A 7.7% gain is considered significant by the standards of Web search engineering
• There is an inherent advantage in using social links for Web search
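The three click shares partition the clicked results by which system returned them on the first page. A minimal sketch of that bookkeeping follows; the input format (one tuple per click holding the clicked URL and the two first-page result sets) is a hypothetical choice.

```python
from collections import Counter


def attribute_clicks(clicks):
    """Share of clicked results by the source that returned them.

    clicks: iterable of (clicked_url, google_results, peer_results), where the two
    result collections are sets of first-page URLs (a hypothetical log format).
    """
    counts = Counter()
    for url, google, peer in clicks:
        in_google, in_peer = url in google, url in peer
        if in_google and in_peer:
            counts["both"] += 1
        elif in_google:
            counts["google_only"] += 1
        elif in_peer:
            counts["peerspective_only"] += 1
        else:
            counts["neither"] += 1  # should not occur for clicks on shown results
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.items()}
```

Because every click falls into exactly one bucket, the shares add up to 1 (on the slide, 86.5% + 5.7% + 7.7% is 99.9% due to rounding).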

15 Reasons for Clicks on PeerSpective
• Disambiguation: a community tends to share definitions or interpretations of popular terms (e.g., "bus")
• Ranking: SN information can bias the ranking algorithm toward the interests of users (e.g., "CoolStreaming")
• Serendipity: ample opportunity to find interesting things without searching for them

16 Example of URLs found in PeerSpective

17 Opportunities and Challenges
• Privacy
  - Willingness of users to disclose information
  - Need for mechanisms to control information flow and anonymity
• Membership and clustering of SNs
  - Users may participate in many networks
  - Need for searching with respect to the different clusters
• Content rating and ranking
  - New approaches to ranking search results
• System architecture: centralized or distributed?

18 Part II: An Experimental Study of the Coloring Problem on Human Subject Networks
Michael Kearns, Siddharth Suri, Nick Montfort, Science 313, Aug 2006

19 Experimental Study on Human Subject Networks
• Theoretical work suggests that the structural properties of naturally occurring networks are important in shaping behavior and dynamics
  - E.g., hubs in networks are important in routing information
• Empirical structural properties established by many disciplines:
  - Small diameter (the "six degrees of separation")
  - Local clustering of connectivity
  - Heavy-tailed distribution of connectivity (power-law distributions)
• Empirical studies of networks
  - Limitation: the networks are fixed and given (no alternatives can be explored)
  - Alternative approach: controlled laboratory studies

20 Experiment
• Experimental scenario: distributed problem solving from local information
• Experimental setting:
  - 38 human subjects (the network vertices)
  - Each subject controls the color of one vertex in a network
  - Networks: simple and more complex
  - Goal: each subject selects a color different from the colors of all its neighbors
  - Problem: graph coloring
  - Information available: variable (low, medium, high)

21 Graph Coloring Problem
• Graph coloring: an assignment of "colors" to certain objects in a graph (e.g., its vertices) such that no two adjacent objects are assigned the same color
• Graph coloring problem: find the minimum number of colors for an arbitrary graph (NP-hard)
• Chromatic number: the least number of colors needed to color the graph
• Example: vertex coloring
  - A 3-coloring suits this graph, but with fewer colors some adjacent vertices would share a color
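The definitions above can be checked mechanically. The sketch below tests whether a coloring is proper and finds the chromatic number by exhaustive search; it is exponential in the number of vertices, consistent with the problem being NP-hard, so it is only meant for tiny illustrative graphs. The function names are hypothetical.

```python
from itertools import product


def is_proper(edges, coloring):
    """True if no edge joins two vertices of the same color (coloring is indexed by vertex)."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(n, edges):
    """Smallest k such that vertices 0..n-1 admit a proper k-coloring.

    Brute force over all k**n assignments, so usable only for very small graphs.
    """
    for k in range(1, n + 1):
        for colors in product(range(k), repeat=n):
            if is_proper(edges, colors):
                return k
    return 0  # only reached for the empty graph (n == 0)
```

A triangle or a 5-cycle needs 3 colors, while a 4-cycle needs only 2: odd cycles cannot be 2-colored.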

22 Network Topologies (figure panels): Leader Cycle, Pref. Att. v=2, Pref. Att. v=3, Simple Cycle, 5-Chord Cycle, 20-Chord Cycle
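The topology families can be approximated with standard generators. The sketch below builds a simple cycle, a cycle with k random chords, and a Barabási-Albert-style preferential attachment graph in which each new vertex adds v degree-proportional links. These are assumed constructions, not the exact procedures of the study (the Leader Cycle is omitted because its construction is not described here), and the small starting clique is a hypothetical choice.

```python
import random


def simple_cycle(n):
    """Edges of the cycle 0-1-...-(n-1)-0, each stored as a sorted pair."""
    return {tuple(sorted((i, (i + 1) % n))) for i in range(n)}


def chord_cycle(n, chords, seed=None):
    """A cycle plus `chords` extra edges between random distinct vertex pairs."""
    rng = random.Random(seed)
    edges = simple_cycle(n)
    while len(edges) < n + chords:
        u, v = rng.sample(range(n), 2)
        edges.add(tuple(sorted((u, v))))  # the set silently drops duplicate edges
    return edges


def preferential_attachment(n, v, seed=None):
    """Each new vertex links to v distinct existing vertices, chosen with
    probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed seed graph: a clique on v + 1 vertices, so every vertex starts with degree v.
    edges = {(i, j) for i in range(v + 1) for j in range(i + 1, v + 1)}
    # Each vertex appears here once per incident edge, giving degree-proportional sampling.
    endpoints = [x for e in edges for x in e]
    for new in range(v + 1, n):
        targets = set()
        while len(targets) < v:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.add((t, new))
            endpoints.extend((t, new))
    return edges
```

With 38 vertices, these produce 38, 43 and 58 edges for the simple, 5-chord and 20-chord cycles, and 73 or 108 edges for the preferential attachment graphs with v=2 and v=3.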

23 Information View (figure panels): each view marks the subject's own vertex ("YOU") and shows overall progress; Low (color of each neighbor), Medium (also the number of links of each neighbor), All (the entire network)

24 Graph Properties and Experimental Results (table): for each network (Simple Cycle, 5-Chord Cycle, 20-Chord Cycle, Leader Cycle, Pref. Att. v=2, Pref. Att. v=3), graph properties (colors required, min links, max links, avg. distance) and experimental results (avg. experiment duration in sec, # experiments solved, no. of changes)

25 1: Collective Performance
• Subjects could indeed solve the coloring problem across a wide range of networks
  - 31/38 experiments ended in a solution in less than 300 seconds
  - 82 sec mean completion time
• Collective performance is affected by network structure
  - Preferential attachment networks are harder than cycle-based networks
  - Cycle-based networks: monotonic relationship between solution time and average network distance (smaller distance leads to shorter solution times)
  - Addition of random chords systematically reduces solution time
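"Average network distance" is the mean shortest-path length between vertex pairs. The sketch below computes it with one breadth-first search per vertex and shows that adding a single chord to a 6-cycle lowers it; it illustrates only the distance side of the finding, not solution times. The function name and edge-list format are assumptions.

```python
from collections import deque


def average_distance(n, edges):
    """Mean shortest-path length over ordered pairs of distinct vertices 0..n-1.

    Assumes the graph is connected (true for all the cycle-based networks).
    """
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = 0
    for src in range(n):
        # Breadth-first search gives hop distances from src in an unweighted graph.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
    return total / (n * (n - 1))
```

For a 6-cycle the result is 1.8; adding the chord (0, 3) brings it down to about 1.67.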

26 2: Human Performance vs. Artificial Distributed Heuristics
Heuristic considered:
• A vertex is selected at random
  - If some colors are unused in the vertex's neighborhood, a color is selected at random from those available
  - If no color is unused, a color is selected at random from all colors
Comparison measure:
• Number of vertex color changes
Findings:
• Results are exactly reversed: lower average distance increases the difficulty for the heuristic
• Preferential attachment networks are easier for the heuristic
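The heuristic above can be simulated directly, counting color changes as the comparison measure. This is a sketch under stated assumptions: the random initial coloring, the step cap `max_steps`, and the name `run_heuristic` are hypothetical, and the simulation stops as soon as the coloring is proper.

```python
import random


def run_heuristic(n, edges, k, seed=None, max_steps=100_000):
    """Simulate the randomized distributed coloring heuristic with k colors.

    Returns (coloring, color_changes, solved).
    """
    rng = random.Random(seed)
    nbrs = {i: set() for i in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    color = [rng.randrange(k) for _ in range(n)]  # assumed random starting colors
    changes = 0

    def proper():
        return all(color[u] != color[v] for u, v in edges)

    for _ in range(max_steps):
        if proper():
            return color, changes, True
        x = rng.randrange(n)                      # pick a vertex at random
        used = {color[y] for y in nbrs[x]}
        unused = [c for c in range(k) if c not in used]
        # Prefer a color absent from the neighborhood; otherwise pick any color.
        new = rng.choice(unused) if unused else rng.randrange(k)
        if new != color[x]:
            changes += 1
            color[x] = new
    return color, changes, proper()
```

Running it over many seeds on the generated topologies and averaging `changes` would give a comparison in the spirit of the slide, though not the study's exact protocol.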

27 3: Effects of Varying the Locality of the Information View
• Variable-locality information provided to subjects:
  - Low: their own and their neighbors' colors are visible
  - Medium: their own and their neighbors' colors are visible, plus information on the connectivity of the neighbors
  - High: the global coloring state at all times
Findings:
• An increased amount of information
  - reduces solution times for cycle-based networks
  - increases solution times for preferential attachment networks
  - leads to rapid convergence to one of the two solutions in cycle-based networks

28 Information View Effect 1: Pref. Att. vs. Cycle-based Networks

29 Information View Effect 2: Cycle-based Solution Convergence
• Low information view: the population oscillates between approaches to the two solutions
• High information view: rapid convergence to one of the two possible solutions

30 Individual Strategies
• Choosing colors that result in the fewest local conflicts
• Attempting to avoid conflicts with highly connected subjects
• Signaling behavior of subjects
• Introducing conflicts to escape local minima
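The first two strategies can be combined into one local decision rule. This hypothetical sketch picks the color with the fewest conflicting neighbors and breaks ties by preferring to clash with less-connected neighbors; signaling and deliberately introduced conflicts are not modeled. The function name and argument formats are assumptions.

```python
def choose_color(vertex, nbrs, color, k):
    """Pick a color for `vertex` among 0..k-1.

    Ranks colors by (number of neighbors already using it,
    total degree of those neighbors, color index) and returns the smallest.
    nbrs: dict vertex -> set of neighbors; color: mapping vertex -> current color.
    """
    def cost(c):
        clashing = [y for y in nbrs[vertex] if color[y] == c]
        return (len(clashing), sum(len(nbrs[y]) for y in clashing), c)

    return min(range(k), key=cost)
```

If vertex 0 has a hub neighbor (degree 3) colored 0 and a leaf neighbor colored 1, then with two colors both choices cause one conflict, and the rule picks 1 so as to clash with the leaf rather than the hub.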

31 Questions?

32 Thanks!