Association Rule Mining in Peer-to-Peer Systems. Ran Wolff, Assaf Schuster. Department of Computer Science, Technion I.I.T., Haifa 32000, Israel.



Difficulties of Distributed DB
- Impracticality of global communications and global synchronization
- Dynamic topology changes of the network
- On-the-fly data updates
- Resource sharing with other applications
- Frequent failure and recovery of resources

The Algorithm Requirements
- Entirely asynchronous
- Imposes very little communication overhead
- Transparently tolerates network topology changes and node failures
- Quickly adjusts to changes in the data as they occur

Problems in LSD-ARM
- There can be no global synchronization
- Nodes must act independently
- There is no point in time at which the algorithm is known to have finished
- Nodes have no way of knowing that the information they possess is final and accurate

Solution
- Each node maintains an assumption of the correct result
- Each node updates the result whenever new data arrives
- Nodes compute the result through local negotiation with their immediate neighbors

Dynamic nature of LSD system
- Suppose the mean time between failures of a single node is 20,000 hours
- A system of 100,000 nodes would then see about five node failures per hour
- Whenever a node departs, the global DB and the result of the computation change
- A similar problem occurs when new nodes join
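The five-per-hour figure follows from simple arithmetic, assuming (this is not stated on the slide) that nodes fail independently at a constant rate of one failure per MTBF:

```latex
\[
\text{expected failures per hour}
  = \frac{\text{number of nodes}}{\text{MTBF per node}}
  = \frac{100{,}000}{20{,}000\ \text{hours}}
  = 5 .
\]
```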

The majority voting protocol
- Requires no synchronization between the computing nodes
- Each node communicates only with its immediate neighbors
- Locality implies that the algorithm scales to very large networks

Notation definition
- database at time t
- partition of node u at time t
- the group of machines reachable from u at time t
- solution of the LSD-ARM problem for node u at time t, which is a set of rules

LSD-Majority
- LSD-Majority: an entirely different majority voting protocol
- The purpose is to ensure that each node converges toward the correct majority
- The ad-hoc solution of node u is:
  - 1 when the majority of the bits held by the nodes reachable from u are set
  - 0 when the majority of those bits are unset

The nodes communicate by sending messages containing two integers
- Count: the number of bits this message reports
- Sum: the number of those bits that are equal to one

- Cu is, for now, one
- Δu measures the number of excess set bits u has been informed of
- Δuv measures the number of excess set bits u and v have last reported to one another

- Δu is recalculated each time Su changes, a message is received, or a node connects to or disconnects from u
- Δuv is recalculated each time a message is sent to or received from v
- As long as Δu ≥ Δuv ≥ 0 and Δv ≥ Δvu ≥ 0, there is no need for u and v to exchange data

Algorithm 1: LSD-Majority
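A minimal Python sketch of a protocol in this spirit, reconstructed from the bullets above rather than copied from the original listing. Several things here are assumptions, not taken from the slides: the excess (written Δu and Δuv here) is computed as `lam_d * sum - lam_n * count` for a threshold `lam_n / lam_d`; an edge with no traffic yet is read as "majority of unset bits"; the network is a tree; and the names `Node`, `react`, and `run` are hypothetical.

```python
from collections import deque


class Node:
    """One peer holding a single bit; lam_n / lam_d is the majority threshold."""

    def __init__(self, name, bit, lam_n=1, lam_d=2):
        self.name, self.s, self.c = name, bit, 1   # c = 1: one vote per node
        self.lam_n, self.lam_d = lam_n, lam_d
        self.recv = {}   # neighbor -> (sum, count) last received from it
        self.sent = {}   # neighbor -> (sum, count) last sent to it

    def connect(self, v):
        self.recv[v] = (0, 0)
        self.sent[v] = (0, 0)

    def _excess(self, total_sum, total_count):
        # ASSUMPTION: excess set bits = lam_d * sum - lam_n * count (integers only).
        return self.lam_d * total_sum - self.lam_n * total_count

    def delta(self):
        # Excess over everything this node knows: its own bit plus all reports.
        return self._excess(self.s + sum(s for s, _ in self.recv.values()),
                            self.c + sum(c for _, c in self.recv.values()))

    def delta_edge(self, v):
        # Excess over what this node and v last reported to one another.
        return self._excess(self.sent[v][0] + self.recv[v][0],
                            self.sent[v][1] + self.recv[v][1])

    def output(self):
        return 1 if self.delta() >= 0 else 0

    def react(self):
        """Return [(neighbor, (sum, count))] for each edge that needs an update."""
        out = []
        d = self.delta()
        for v in self.recv:
            d_uv = self.delta_edge(v)
            if self.sent[v][1] + self.recv[v][1] > 0:
                # Stay silent while d >= d_uv >= 0 (or d <= d_uv < 0).
                must_send = (d_uv >= 0 and d < d_uv) or (d_uv < 0 and d > d_uv)
            else:
                # ASSUMPTION: a silent edge is read as "majority of unset bits".
                must_send = d >= 0
            if must_send:
                # Report everything known except what v itself reported.
                msg = (self.s + sum(s for w, (s, _) in self.recv.items() if w != v),
                       self.c + sum(c for w, (_, c) in self.recv.items() if w != v))
                self.sent[v] = msg
                out.append((v, msg))
        return out


def run(bits, edges):
    """Deliver messages in FIFO order until no node has anything left to send."""
    nodes = {name: Node(name, bit) for name, bit in bits.items()}
    for u, v in edges:
        nodes[u].connect(v)
        nodes[v].connect(u)
    queue = deque()
    for u in nodes.values():
        queue.extend((u.name, v, msg) for v, msg in u.react())
    while queue:
        src, dst, msg = queue.popleft()
        nodes[dst].recv[src] = msg
        queue.extend((dst, v, m) for v, m in nodes[dst].react())
    return {name: n.output() for name, n in nodes.items()}
```

For example, `run({"a": 1, "b": 0, "c": 0}, [("a", "b"), ("b", "c")])` simulates three peers on a path. The FIFO queue stands in for the network only to keep the sketch deterministic; nothing in `react` relies on global ordering or synchronization.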

Generalize LSD-Majority for frequency counts
- Cu: size of the local database
- Su: local support of an itemset
- λ: MinFreq
- The resulting protocol thus decides whether or not an itemset is frequent in the database

- Cu: the number of transactions in the local database that include X
- Su: the number of these transactions that include both X and Y
- λ: MinConf
- The result thus decides whether or not a rule X → Y is confident

- Deciding whether a rule is correct or false requires that each node run two instances of the protocol, one for frequency and one for confidence
- This way LSD-Majority efficiently decides whether a candidate rule is correct or false
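The two input mappings above (Cu and Su for frequency, Cu and Su for confidence) can be sketched for a single node's local data as follows. This is an illustration only: the function names are hypothetical, thresholds are passed as `Fraction` values, and the final check is applied to local counts, whereas in the protocol each (Cu, Su) pair would feed its own LSD-Majority instance.

```python
from fractions import Fraction


def frequency_vote(transactions, itemset):
    """(C, S) for 'is itemset frequent?': C = local DB size, S = local support."""
    itemset = set(itemset)
    return len(transactions), sum(1 for t in transactions if itemset <= set(t))


def confidence_vote(transactions, lhs, rhs):
    """(C, S) for 'is lhs -> rhs confident?': C counts lhs, S counts lhs and rhs."""
    lhs, both = set(lhs), set(lhs) | set(rhs)
    c = sum(1 for t in transactions if lhs <= set(t))
    s = sum(1 for t in transactions if both <= set(t))
    return c, s


def passes(c, s, lam):
    # ASSUMPTION: a vote passes when S - lam * C >= 0, i.e. S / C >= lam.
    return s - lam * c >= 0


def rule_is_correct(transactions, lhs, rhs, min_freq, min_conf):
    """Two decisions per rule: one on frequency, one on confidence."""
    freq_ok = passes(*frequency_vote(transactions, set(lhs) | set(rhs)), min_freq)
    conf_ok = passes(*confidence_vote(transactions, lhs, rhs), min_conf)
    return freq_ok and conf_ok
```

Both decisions reduce to the same comparison of a sum against a fraction of a count, which is why one voting protocol can serve both.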

Majority-Rule
- Each node must take into account not only the local data, but also data brought to it by LSD-Majority
- An algorithm which never really finishes discovering all itemsets must generate rules on the fly

Majority-Rule

Conclusion
- LSD-Majority: a distributed majority voting protocol used as part of the algorithm
- Majority-Rule: an algorithm that mines association rules on distributed systems of unlimited size
- Its key quality is locality
- It also offers fast convergence of the result and low communication demands