Partitioning Social Networks for Time-dependent Queries Berenice Carrasco, Yi Lu and Joana M. F. da Trindade - University of Illinois - EuroSys11 – Workshop.

Presentation transcript:

Partitioning Social Networks for Time-dependent Queries Berenice Carrasco, Yi Lu and Joana M. F. da Trindade - University of Illinois - EuroSys11 – Workshop on Social Network Systems

My colleague’s Facebook home page!

What is visible to Joana? – Messages in a two-hop network [Figure: users Adarsh, Jona, Nandana, Joana and Naseer]

Why is partitioning important?
– Different types of queries in social networks: photo tags, marketplace, news feed
– Queries retrieve small records (personalized content)
– Multiple records from different users
– Time-dependent
– Home page refresh at Facebook: the most common query

Existing approaches
– Partition based solely on friendship (1-hop network)
– Power-law degree distribution: highly interconnected data, small fraction of nodes with very large degrees
– General approach: horizontal partitioning + replication

Existing approaches: Hash-based horizontal partitioning
Key: user name
[Figure: users Adarsh, Jona, Nandana, Joana and Naseer hashed across partitions p1, p2 and p3]
– Multiple records in different servers
– Bad response time
– Inefficient network usage
– High packet overhead for such small data
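The drawback described on this slide can be illustrated with a minimal sketch. It assumes a stable hash of the user name taken modulo the number of partitions; the choice of MD5 and the helper names `hash_partition` and `partitions_touched` are illustrative assumptions, not part of the original talk.

```python
import hashlib


def hash_partition(user: str, num_partitions: int) -> int:
    """Map a user name to a partition index with a stable hash.

    MD5 is an arbitrary choice for this sketch; any stable hash would do.
    """
    digest = hashlib.md5(user.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def partitions_touched(users, num_partitions):
    """Return the set of partitions a query must contact to read
    one record from each of the given users."""
    return {hash_partition(u, num_partitions) for u in users}


# The five user names from the slide's figure, spread over three partitions.
friends = ["Adarsh", "Jona", "Nandana", "Joana", "Naseer"]
touched = partitions_touched(friends, 3)
print(f"A query over {len(friends)} users contacts {len(touched)} partition(s)")
```

Because the hash ignores who interacts with whom, records that are read together tend to be scattered, so a single home page query may fan out to several servers.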

Existing approaches: Replication – Great amount of extra storage

Existing approaches: Query-based partitioning – Assumes queries do not change with time. Curino et al., “SCHISM: A workload-driven approach to database replication and partitioning”, 2010

The challenge for social networks
– Friendship-based or query-based partitioning does not work well
– The underlying network varies over time: friends are added or deleted, and interaction levels change
– Only 30% of Facebook user pairs interact consistently from one month to the next

Our approach
– Partition not only the friendship network but also along the time dimension
– Interaction: activity network with weighted links (strong vs. weak); power-law with a much lighter tail; maximal degree around 100
– This partitioning results in fewer cross-edges and a reduced need for replication
– Goal: provide frequent users with high data locality and faster response to queries

Our algorithm
1. Construct an Activity Prediction Graph (APG)
2. Compute cost of local partitions
3. Partition the APG with KMETIS
4. Greedy algorithm for partitioning the current period
– Differentiate between 1) the period used for prediction and 2) the current period to partition
– Look at the interaction and predict the strength of the relationship; then use this strength to determine what data can be accessed together
– Identify links from past traces and capture relationships with strong activity
– Assign a cost that determines how costly it would be to cut one edge or another

Our algorithm: We propose a way to compute weights in this APG [Figure labels: user nodes, message nodes, two-hop network]

Our algorithm: We propose a way to compute weights in this APG [Formula labels: message node weights, user node weights, decay factor, number of messages exchanged]
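The transcript does not preserve the weight formulas, only that they involve a decay factor and the number of messages exchanged. As one plausible reading, the sketch below discounts older periods geometrically; the function name `decayed_weight`, the default `decay=0.5` and the geometric form itself are assumptions for illustration, not the authors' formula.

```python
def decayed_weight(msgs_per_period, decay=0.5):
    """Combine per-period message counts into a single weight.

    msgs_per_period: counts ordered from oldest to newest period.
    decay: factor in (0, 1]; each step back in time multiplies the
           contribution by this factor (assumed form, see lead-in).
    """
    n = len(msgs_per_period)
    return sum(count * decay ** (n - 1 - i)
               for i, count in enumerate(msgs_per_period))


# Two user pairs with the same total number of messages (7) but
# different recency: the recently active pair gets the larger weight.
old_activity = decayed_weight([4, 2, 1])
recent_activity = decayed_weight([1, 2, 4])
print(old_activity, recent_activity)
```

Under this assumed form, a pair that interacted recently is treated as a stronger link than a pair with the same message count concentrated in older periods.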

Our algorithm: Cost of local partitions [Figure labels: message node weights, user node weights, edge weights, messages accessible to user X, remote message weights, Partition 1, Partition 2]
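The slide's labels suggest that the cost of a placement is tied to the weight of messages a user can access but that live on another partition. The sketch below computes that quantity for one user; the dictionary layout and the name `remote_message_weight` are hypothetical, and the talk's actual cost function may combine the weights differently.

```python
def remote_message_weight(user, user_partition, message_partition,
                          accessible, message_weight):
    """Sum the weights of messages accessible to `user` that are stored
    on a partition other than the user's own (assumed cost term)."""
    home = user_partition[user]
    return sum(message_weight[m]
               for m in accessible[user]
               if message_partition[m] != home)


# Toy placement: user X lives on partition 0; two of the three
# accessible messages are stored remotely on partition 1.
user_partition = {"X": 0}
message_partition = {"m1": 0, "m2": 1, "m3": 1}
accessible = {"X": ["m1", "m2", "m3"]}
message_weight = {"m1": 2.0, "m2": 1.5, "m3": 0.5}

cost = remote_message_weight("X", user_partition, message_partition,
                             accessible, message_weight)
print(cost)
```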

Evaluation: Graph Partitioning Data set: – Facebook New Orleans network Jan2005 to Dec users and wall posts APG: Jan2005 to Nov2006 Fixed period: Dec-2006, with wall posts

Evaluation of Data Locality
– We mimic real Facebook page downloads for all wall posts in Dec2006
– Each query requests the 6 most recent wall posts in the user’s two-hop network
– We compare our algorithm to two hash-based horizontal partitioning algorithms: Hash_p1 and Hash_p1_p2
– Number of partitions used: up to 20
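The query used in the evaluation can be sketched as follows. The sketch assumes the two-hop network consists of the user, the user's friends and their friends, and that a post is a `(timestamp, author, post_id)` tuple; both the data layout and the function names are illustrative assumptions.

```python
def two_hop_network(user, friends):
    """Return the user plus everyone within two friendship hops."""
    network = {user}
    one_hop = set(friends.get(user, ()))
    network |= one_hop
    for friend in one_hop:
        network.update(friends.get(friend, ()))
    return network


def home_page_query(user, friends, posts, k=6):
    """Return the k most recent posts authored inside the two-hop network.

    posts: iterable of (timestamp, author, post_id) tuples.
    """
    network = two_hop_network(user, friends)
    visible = [p for p in posts if p[1] in network]
    return sorted(visible, key=lambda p: p[0], reverse=True)[:k]


friends = {
    "Joana": ["Jona"],
    "Jona": ["Joana", "Adarsh"],
    "Adarsh": ["Jona"],
    "Naseer": [],
}
posts = [(1, "Adarsh", "m1"), (2, "Naseer", "m2"), (3, "Jona", "m3")]
print(home_page_query("Joana", friends, posts))
```

In this toy graph Naseer is outside Joana's two-hop network, so the post `m2` is not returned.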

Evaluation of Data Locality Proportion of queries that access only 1 partition

Evaluation of Data Locality Proportion of queries that access at most 3 partitions
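The two locality measures reported here (queries served by a single partition, and queries served by at most 3 partitions) are the same computation with different thresholds. A minimal sketch, assuming each query has already been reduced to the set of partitions it contacted; the name `proportion_within` is hypothetical.

```python
def proportion_within(partition_sets, max_partitions):
    """Fraction of queries that contacted at most `max_partitions` partitions.

    partition_sets: one set of partition ids per query.
    """
    if not partition_sets:
        return 0.0
    hits = sum(1 for touched in partition_sets
               if len(touched) <= max_partitions)
    return hits / len(partition_sets)


# Four toy queries and the partitions each one had to contact.
queries = [{0}, {0, 1}, {0, 1, 2, 3}, {2}]
print(proportion_within(queries, 1))
print(proportion_within(queries, 3))
```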

Conclusion and Future Work
– Our algorithm partitions social network data according to interaction levels at different times
– Our activity prediction graph significantly improved data locality compared to hashing
– Future work: placement of data across different periods

Backup Slides

Existing approaches: Hash-based horizontal partitioning
– Gizzard: range partitioning
– Cassandra: consistent hashing
– Dynamo: modified consistent hashing

Our approach Replication with time-dependency


Greedy Algorithm
Use a greedy algorithm for messages in the non-predicted month (Dec2006), which fall into three cases:
– Both the initiator and the receiver of the message exist in the APG, but they have no previous interaction
– Exactly one of the initiator and the receiver of the message exists in the APG
– Neither the initiator nor the receiver exists in the APG
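The case analysis on this slide can be written down as a small classifier. The transcript does not say how each case is then placed, so the sketch stops at classification; the representation of the APG as a node set plus a set of `frozenset` user pairs, and the returned labels, are assumptions for illustration.

```python
def classify_message(initiator, receiver, apg_nodes, apg_edges):
    """Classify a current-period message by how much the APG knows about it.

    apg_nodes: set of users present in the Activity Prediction Graph.
    apg_edges: set of frozenset({u, v}) pairs with previous interaction.
    """
    initiator_known = initiator in apg_nodes
    receiver_known = receiver in apg_nodes
    if initiator_known and receiver_known:
        if frozenset((initiator, receiver)) in apg_edges:
            return "covered_by_apg"  # already handled by the APG partitioning
        return "both_known_no_interaction"
    if initiator_known or receiver_known:
        return "one_known"
    return "neither_known"


apg_nodes = {"Joana", "Jona", "Adarsh"}
apg_edges = {frozenset(("Joana", "Jona"))}
print(classify_message("Joana", "Adarsh", apg_nodes, apg_edges))
print(classify_message("Joana", "Naseer", apg_nodes, apg_edges))
print(classify_message("Nandana", "Naseer", apg_nodes, apg_edges))
```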