Yung-Ting Chuang's Ph.D. Defense
Yung-Ting Chuang
Electrical and Computer Engineering
University of California, Santa Barbara
May 3, 2013
Committee Members:
- Professor P. Michael Melliar-Smith, Chair
- Professor Louise E. Moser
- Professor Timothy P. Sherwood
- Professor Volkan Rodoplu

Outline
- Motivation
- Trustworthy Distributed Search and Retrieval
- Protecting against Malicious Attacks in iTrust
- Membership Management for iTrust
- Statistical Inference and Dynamic Adaptation for iTrust
- Conclusions and Future Work

Motivation
- Information is accessed over the Internet using centralized search engines
  - Benefits: efficient, robust, and scalable
  - Drawbacks: depends on administrators remaining benign
- Thus, we present a decentralized and distributed search and retrieval system
  - Benefits: prevents censorship and filtering of information
  - Drawbacks: needs more network bandwidth; difficult to infer membership size and malicious nodes

1. Related Work
2. Design of iTrust
3. Implementation of iTrust
4. User Interface of iTrust
5. Performance Evaluation of iTrust
6. Summary

1. Related Work
- Surveys by Mischke and Risson on distributed search:
  - Structured: require nodes to be organized in an overlay network
    - Distributed Hash Table (DHT), Ring, Tree, Skip Lists
  - Unstructured: typically gossip-based, and use randomization
    - Flooding / Broadcast => Gnutella
    - Random walk and data replication => Sarshar, GIA, Lv
    - Key-based routing => Freenet
    - Direct routing => Pub-2-Sub
    - Square root function => Cohen, Zhong, Ferreira
- P2P systems concerned with security, privacy, and trust:
  - Quasar: uses a structured overlay and protects users' sensitive information
  - OneSwarm: uses a combination of trusted and untrusted peers and protects the privacy of the users
  - GOSSPLE: fully decentralized system for social acquaintances using a gossip protocol

2. Design of iTrust: a) Distribution of Metadata
Diagram labels: Source of Information

2. Design of iTrust: b) Distribution of a Request
Diagram labels: Source of Information, Requester of Information, Request Encounters Metadata

2. Design of iTrust: c) Retrieval of Information
Diagram labels: Source of Information, Requester of Information, Request Matched

3. Implementation of the iTrust System

4. User Interface of iTrust

5. Performance Evaluation of iTrust: a) Analytical Model
Notation:
- n: number of participating nodes in the membership
- x: proportion of participating nodes that are operational
- m: number of nodes to which the metadata are distributed
- r: number of nodes to which the requests are distributed
- k: number of nodes that report matches to a requesting node (for the same metadata and the same request)

5. Performance Evaluation of iTrust: a) Analytical Model
- Probability of k matches is: [equation not preserved in the transcript]
- Probability of one or more matches is: [equation not preserved in the transcript]
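The two probabilities named on this slide can be sketched under a stated assumption. The transcript does not preserve the equations, so the sketch below assumes a hypergeometric model: of the m nodes that received the metadata, round(m * x) are operational, and the r requested nodes are drawn without replacement from the n members. The function names are illustrative, not taken from iTrust.

```python
from math import comb


def match_probability(n: int, m: int, r: int, x: float, k: int) -> float:
    """P(exactly k matches) under the assumed hypergeometric model."""
    holders = round(m * x)  # assumption: operational nodes holding the metadata
    if k < 0 or k > min(holders, r):
        return 0.0
    # Choose k of the holders and r - k of the remaining nodes.
    return comb(holders, k) * comb(n - holders, r - k) / comb(n, r)


def prob_at_least_one(n: int, m: int, r: int, x: float) -> float:
    """P(one or more matches) = 1 - P(no match)."""
    return 1.0 - match_probability(n, m, r, x, 0)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for x in (1.0, 0.7, 0.4, 0.2):
        print(x, round(prob_at_least_one(250, 60, 60, x), 4))
```

Lowering x in this sketch lowers the match probability, which is the effect the later detection and defense slides rely on.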

5. Performance Evaluation of iTrust: a) Analytical Model

5. Performance Evaluation of iTrust: b) Analysis vs. Emulation

6. Summary
- Problem we are trying to solve:
  - Centralized search engines can be tampered with to bias the results, or to conceal or censor information
- Our solutions and contributions:
  - We have implemented iTrust, a decentralized distributed search and retrieval system with no centralized mechanisms and no centralized control
  - We have demonstrated that the match probability is high, even if some participating nodes are subverted or non-operational

1. Background
2. Related Work
3. Foundations
4. Detecting Malicious Attacks
5. Defending against Malicious Attacks
6. Performance Evaluation
7. Summary

1. Background
- Potential attacks:
  - Nodes do not match requests
  - Nodes do not return responses to the requester
- Effect of such attacks:
  - Probability of a match is decreased
- Existing work that addresses attacks:
  - Place nodes on a blacklist (Jesi)
  - Maintain a reputation or trust score (Condie)
- Our solution to such attacks:
  - Estimate the proportion of malicious nodes
  - Increase the number of nodes to which requests are distributed in order to restore the match probability

2. Related Work
- Work related to our detection algorithm:
  - Exponential Weighted Moving Average (EWMA)
    - Roberts et al.: for discovering anomalies and issuing alerts
  - Chi-squared test
    - Goonatilake: for detecting intrusions
    - Press et al.: for balancing weights of buckets
    - Belen and Heckert: for determining similarity between two models
  - EWMA and chi-squared test
    - Ye and Chen: for anomaly detection and intrusion detection
- Work related to our defensive adaptation algorithm:
  - Morselli: uses a feedback mechanism to adjust the replicas to improve search results
  - Leng: uses a maintainer to determine, update, and eliminate the data replicas

3. Foundations: a) Normalization
- We cannot use requests that return k = 0 responses, because there might be no metadata to match
- The probability of k matches is negligibly small when k is large
- Thus, we exclude requests for k = 0 and for k > K
- Our normalization equation is: [equation not preserved in the transcript]
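A minimal sketch of the normalization step, assuming it simply rescales the probabilities for k = 1..K so that they sum to 1. The slide's actual equation is not preserved, so this form and the name `normalize` are assumptions.

```python
def normalize(probs, K):
    """Keep the entries for k = 1..K and rescale them to sum to 1.

    probs[k] is the probability (or relative frequency) of k matches,
    indexed from k = 0. Entries for k = 0 and k > K are discarded.
    """
    kept = probs[1:K + 1]
    total = sum(kept)
    if total == 0:
        raise ValueError("no probability mass in buckets 1..K")
    return [p / total for p in kept]


if __name__ == "__main__":
    # Hypothetical distribution over k = 0, 1, 2, 3.
    print(normalize([0.5, 0.3, 0.1, 0.1], K=2))
```

The same rescaling can be applied to both the empirical and the analytical distributions so that they are compared over the same buckets.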

3. Foundations: b) Exponential Weighted Moving Average
- The EWMA method is computed as follows: [equation not preserved in the transcript]
- where c is the weighting factor for the EWMA method
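For reference, the textbook EWMA recurrence can be sketched as follows. The slide's own equation is missing from the transcript, so this assumes the standard form, new = c * observation + (1 - c) * previous, with c as the weighting factor; the function names are illustrative.

```python
def ewma_update(previous: float, observation: float, c: float) -> float:
    """One step of the standard EWMA recurrence (assumed form)."""
    if not 0.0 < c <= 1.0:
        raise ValueError("weighting factor c must be in (0, 1]")
    return c * observation + (1.0 - c) * previous


def ewma_series(observations, c, initial=0.0):
    """Apply the recurrence to a sequence and return every smoothed value."""
    smoothed, current = [], initial
    for obs in observations:
        current = ewma_update(current, obs, c)
        smoothed.append(current)
    return smoothed


if __name__ == "__main__":
    print(ewma_series([1, 1, 1], c=0.5))
```

A larger c weights recent observations more heavily, so the estimate reacts faster but is noisier.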

3. Foundations: c) Chi-Squared vs. Modified Chi-Squared
- Pearson's chi-squared statistic: [equation not preserved in the transcript]
- Pearson's modified chi-squared statistic: [equation not preserved in the transcript]
where:
- o_k: the actual number of observations that fall into the kth bucket
- e_k: the expected number of observations for the kth bucket
- K: the number of buckets into which the observations fall
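The two statistics can be sketched with the slide's symbols o_k and e_k. Pearson's statistic is the standard sum of (o_k - e_k)^2 / e_k. The transcript does not preserve the modified formula, so the second function assumes one common modified form that divides by the observed count o_k instead; skipping buckets with a zero denominator is also an assumption of this sketch.

```python
def pearson_chi_squared(observed, expected):
    """Standard Pearson statistic: sum over buckets of (o - e)^2 / e."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def modified_chi_squared(observed, expected):
    """Assumed modified form: sum over buckets of (o - e)^2 / o."""
    return sum((o - e) ** 2 / o for o, e in zip(observed, expected) if o > 0)


if __name__ == "__main__":
    # Hypothetical bucket counts, for illustration only.
    obs, exp = [10, 20], [15, 15]
    print(pearson_chi_squared(obs, exp), modified_chi_squared(obs, exp))
```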

3. Foundations: d) Chi-Squared vs. Modified Chi-Squared

4. Detecting Malicious Attacks: a) Detection Algorithm
A requesting node:
1. Collects the responses to its requests using the EWMA method
2. Normalizes the empirical probabilities
3. Uses the modified chi-squared test to compare the empirical probabilities against the analytical probabilities for x = 1.0, 0.7, 0.4, and 0.2
4. Chooses the value of x with the smallest chi-squared value as its estimate x'
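Steps 2 to 4 of the detection algorithm can be sketched as below. This is a sketch under assumptions, not the thesis implementation: the analytical probabilities use an assumed hypergeometric model, the modified chi-squared statistic is assumed to divide by the observed value, and `empirical` is assumed to be already smoothed and normalized over k = 1..K with K no larger than r. Only the candidate values of x come from the slide.

```python
from math import comb


def analytical_distribution(n, m, r, x, K):
    """Normalized P(k matches) for k = 1..K (assumed hypergeometric model)."""
    holders = round(m * x)  # assumption: operational nodes holding the metadata
    raw = [comb(holders, k) * comb(n - holders, r - k) / comb(n, r)
           for k in range(1, K + 1)]
    total = sum(raw)
    return [p / total for p in raw]


def estimate_x(empirical, n, m, r, candidates=(1.0, 0.7, 0.4, 0.2)):
    """Return the candidate x whose analytical distribution fits best."""
    K = len(empirical)

    def score(x):
        expected = analytical_distribution(n, m, r, x, K)
        # Assumed modified chi-squared: divide by the observed value.
        return sum((o - e) ** 2 / o for o, e in zip(empirical, expected) if o > 0)

    return min(candidates, key=score)


if __name__ == "__main__":
    # Hypothetical check: feed the x = 0.7 curve back in as the observation.
    observed = analytical_distribution(250, 60, 60, 0.7, K=10)
    print(estimate_x(observed, 250, 60, 60))
```

Restricting the search to a few candidate values keeps the comparison cheap, at the cost of a coarse estimate of x.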

4. Detecting Malicious Attacks: b) Example

5. Defending against Malicious Attacks: a) Defensive Adaptation Algorithm
1. Initialize r ← 0
2. Calculate y_o based on the current r with the given n, m, and x
3. Determine whether y_o is greater than the expected match probability
   A. If not, increase r by 1 and go back to step 2
   B. If so, return r
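The loop above can be sketched as follows, assuming the match probability is computed from a hypergeometric model (the slide does not show the formula). The names `required_r` and `target` are illustrative; `target` stands for the expected match probability in step 3.

```python
from math import comb


def prob_at_least_one(n, m, r, x):
    """P(one or more matches) under the assumed hypergeometric model."""
    holders = round(m * x)  # assumption: operational nodes holding the metadata
    return 1.0 - comb(n - holders, r) / comb(n, r)


def required_r(n, m, x, target):
    """Smallest r whose match probability exceeds the target (steps 1 to 3)."""
    r = 0
    while r <= n:
        if prob_at_least_one(n, m, r, x) > target:
            return r
        r += 1
    raise ValueError("target match probability is not reachable")


if __name__ == "__main__":
    # Hypothetical parameters: a lower x calls for a larger r.
    for x in (1.0, 0.7, 0.4, 0.2):
        print(x, required_r(250, 60, x, target=0.98))
```

Because the probability grows monotonically with r in this model, a linear search terminates at the smallest sufficient r; a binary search would also work for large n.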

5. Defending against Malicious Attacks: b) Example

6. Performance Evaluation: a) Varying the number of nodes

6. Performance Evaluation

7. Summary
- Problem we are trying to solve in this chapter:
  - Absence of centralized control makes it difficult to determine the proportion of non-operational nodes in the network
- Our solution and contributions:
  - A node can estimate the proportion of non-operational nodes in the network based on the responses to its requests
  - A node calculates the number of nodes to which the requests are distributed to maintain a high match probability
  - A node infers useful but unobservable information about the network as a whole by observing aspects of the behaviors of individual nodes that are visible to it

1. Background
2. Related Work
3. iTrust Membership Protocols
4. Foundations
5. Performance Evaluation
6. Extended Scenario
7. Summary

1. Background
- Churn: nodes joining and leaving the membership
- Challenging tasks:
  - Estimating membership and membership size
  - Estimating churn
- Existing work that addresses churn:
  - Passive monitoring (Sen et al., Gummadi et al.)
  - Active probing (Chu et al., Liang, Bhagwan et al.)
  - Gossiping (Binzenhöfer, Pruteanu et al.)
- Our approach to address churn:
  - Nodes do not predict churn characteristics in advance
  - Each node maintains its local view of the membership and uses statistical inference to update its view

2. Related Work
- Work related to membership management:
  - Zage: biases neighbor selections toward beneficial nodes
  - SCAMP: nodes discover joining and leaving nodes through gossiping
  - CYCLON: nodes maintain a small, fixed-size neighbor list, with a shuffling protocol for large networks
  - Newscast: each node periodically selects a peer to exchange and update its membership list
- Work related to churn:
  - Binzenhöfer and Pruteanu et al.: estimate the churn rate through gossiping
  - Stutzbach and Rejaie: study churn characteristics and highlight problems that cause biased peer selections
  - Paulo et al.: maintain a dynamic mapping of flows according to the current set of neighbors
  - Liu: presents an age-based membership protocol with a conservative neighbor maintenance scheme under churn
  - Horowitz et al.: rely on the departure and arrival of nodes to estimate the current network size, without requiring any additional communication

3. iTrust Membership Protocols: a) Joining the Membership
Diagram labels: Joining Node, Bootstrapping Node

3. iTrust Membership Protocols: b) Leaving the Membership
Diagram labels: Leaving Node

3. iTrust Membership Protocols: c) Distributing Metadata
Diagram labels: Source Node, Discover Leaving Node, Discover New Node

3. iTrust Membership Protocols: d) Distributing Requests
Diagram labels: Requesting Node, Discover Leaving Node, Discover New Node, Redistribute Metadata

4. Foundations: a) Metrics
- LND: Leaves Not Detected
- JND: Joins Not Detected
- MA: Membership Accuracy
- MP: Match Probability for a request
- RT: Response Time required for a request
- MC: Message Cost per time unit

5. Performance Evaluation: a) Retry R Membership Protocol
- Motivation: When a node distributes a request message to R nodes, it might detect some leaving nodes. Therefore, it might not receive exactly R responses.
- Solution: We allow a node to keep sending its message to more than R nodes until it receives exactly R responses.
- Our input variables for the Retry R Membership Protocol:
  - Try: the number of times that a requesting node sends its request message in an attempt to receive R responses
  - TryMax: the maximum Try value
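A minimal sketch of the Retry R idea, under stated assumptions: the requesting node samples nodes from its local view without reuse, and a hypothetical `is_alive` callback stands in for sending a request and waiting for a response. Neither the sampling policy nor the callback is described in the transcript.

```python
import random


def retry_r(view, R, try_max, is_alive, rng):
    """Contact nodes until R responses arrive or try_max rounds are used.

    Returns (responded, left, tries): nodes that answered, nodes detected
    as having left, and the number of rounds actually used.
    """
    remaining = list(view)
    rng.shuffle(remaining)  # assumption: random sampling from the local view
    responded, left, tries = [], [], 0
    while tries < try_max and len(responded) < R and remaining:
        need = R - len(responded)  # only ask for the missing responses
        batch, remaining = remaining[:need], remaining[need:]
        for node in batch:
            (responded if is_alive(node) else left).append(node)
        tries += 1
    return responded, left, tries


if __name__ == "__main__":
    # Hypothetical membership in which the odd-numbered nodes have left.
    ok, gone, used = retry_r(range(10), R=3, try_max=10,
                             is_alive=lambda node: node % 2 == 0,
                             rng=random.Random(0))
    print(ok, gone, used)
```

The nodes collected in `left` are the leaves that the requesting node detects as a side effect of its ordinary request traffic.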

5. Performance Evaluation: b) Adaptive RR Membership Protocol
- Our Churn Estimator (CE) is: [equation not preserved in the transcript]
  where:
  - Left: number of nodes that were detected as non-operational
  - Joined: number of nodes that were discovered to have joined
  - NumNodes: number of nodes to which a requesting node sent its request
- The Requesting Rate (RR) is:
  if CE > RRMin / RRMax then RR ← RRMax x CE
  else RR ← RRMin
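The requesting-rate rule can be sketched directly from the slide. The churn estimator formula itself is missing from the transcript; the sketch assumes CE = (Left + Joined) / NumNodes, which is consistent with the three variables the slide defines but is not confirmed by it.

```python
def churn_estimator(left: int, joined: int, num_nodes: int) -> float:
    """Assumed form: fraction of contacted nodes that left or joined."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return (left + joined) / num_nodes


def requesting_rate(ce: float, rr_min: float, rr_max: float) -> float:
    """RR rule from the slide: scale RRMax by CE, but never go below RRMin."""
    if ce > rr_min / rr_max:
        return rr_max * ce
    return rr_min


if __name__ == "__main__":
    # Hypothetical values: 2 leaves and 3 joins seen among 50 contacted nodes.
    ce = churn_estimator(2, 3, 50)
    print(ce, requesting_rate(ce, rr_min=1, rr_max=100))
```

With this rule, a node in a quiet network falls back to the minimum rate, while a node that observes heavy churn requests more often and so refreshes its view sooner.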

5. Performance Evaluation: c) Message Cost vs. Membership Accuracy

5. Performance Evaluation: d) Combined Adaptive Membership
Start infinite loop
    if current time reaches nextTime
        while Try <= 2 and resRec < R
            make request to (R - resRec) nodes and get responses array
            determine left, joined, N, responded from responses array
            resRec = resRec + responded
            Try = Try + 1
        CE = (left + joined) / (R + R - resRec)
        if CE > 1 / RRMax
            RR = RRMax x CE
        else
            RR = 1
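One pass of the loop body can be sketched as below. The `request` callback is hypothetical: it stands in for contacting `count` nodes and returning the numbers of detected leaves, discovered joins, and responses. The sketch also interprets the CE denominator as the total number of nodes contacted across the tries, which equals R + R - resRec when resRec is taken after the first try; that reading is an assumption.

```python
def combined_adaptive_round(R, rr_max, request, try_max=2):
    """One scheduled round: retry up to try_max times, then adapt RR.

    request(count) -> (left, joined, responded) is a hypothetical callback.
    Returns (CE, RR).
    """
    if R <= 0:
        raise ValueError("R must be positive")
    res_rec = left = joined = contacted = 0
    tries = 1
    while tries <= try_max and res_rec < R:
        count = R - res_rec  # only ask for the missing responses
        new_left, new_joined, responded = request(count)
        contacted += count
        left += new_left
        joined += new_joined
        res_rec += responded
        tries += 1
    ce = (left + joined) / contacted  # assumed denominator: nodes contacted
    rr = rr_max * ce if ce > 1 / rr_max else 1
    return ce, rr


if __name__ == "__main__":
    # Hypothetical replies: the first try finds 2 leaves and 1 join.
    replies = iter([(2, 1, 8), (0, 0, 2)])
    print(combined_adaptive_round(10, 50, lambda count: next(replies)))
```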

5. Performance Evaluation: e) Performance Tuning
Combined Adaptive with Try = 2, RRMax = 100, 50, 30

5. Performance Evaluation: e) Message Cost vs. Membership Accuracy

6. Extended Scenario: a) Combined Adaptive Membership Protocol

7. Summary
- Problem we are trying to solve in this chapter:
  - We cannot accurately estimate the joining or leaving rates, or maintain an accurate view of the membership, when the system has high membership churn
- Our solution and contributions:
  - We presented an adaptive membership management protocol, which uses random sampling to discover newly joining and leaving nodes
  - Based on the responses it receives to its requests, a node calculates the churn estimator and dynamically adjusts its requesting rate to update its local view of the membership
  - Our membership protocol exploits the messages already required by the messaging protocol

1. Background
2. Model for iTrust
3. Dynamic Adaptation Algorithm
4. Performance Evaluation
5. Summary

1. Background
- Problems that co-exist in a fully distributed system:
  - High membership churn
  - Large proportion of malicious nodes
- Our approach to address both problems:
  - Use random sampling
  - Apply statistical inference techniques to estimate:
    - Membership churn with a large proportion of malicious nodes
    - Proportion of malicious nodes in the presence of high membership churn

2. Model for iTrust: a) System and Fault Model
- We consider the following scenarios:
  - A node leaves the membership voluntarily
  - A node leaves the membership involuntarily
  - A malicious node responds to a request but does not report a match
- Parameters for membership churn:
  - JR: Joining Rate
  - LR: Leaving Rate
- Parameters for detecting malicious nodes:
  - X: proportion of non-malicious nodes

3. Dynamic Adaptation Algorithm: a) Parameters and Variables
- n: size of the node's current view of the membership
- m: number of nodes to which the metadata are distributed
- r: number of nodes to which the requests are distributed
- IE: intersection estimator obtained by random sampling:
  - nIE: estimate of n in I
  - mIE: estimate of m in I
  - rIE: estimate of r in I
- left: number of nodes that were detected as non-operational
- numNodes: number of nodes to which a requesting node sent its request

3. Dynamic Adaptation Algorithm
1. Newly joining node distributes join messages
2. Start infinite loop
    if current time reaches nextTime
        if a node is a source node
            distribute metadata to m nodes
        if a node is a requesting node
            distribute request messages to r nodes
            calculate empirical array O based on the responses it obtained
            calculate estimator IE, then nIE, mIE, rIE, then update n
            estimate x' based on nIE, mIE, rIE, kMax, O
            estimate r' based on x', n, m, y_o
            if a node is a source node, calculate and send more metadata
            calculate CE and rmr

5. Summary
- Problem we are trying to solve in this chapter:
  - Inferring the proportion of malicious nodes and the size of the membership when the network has a lot of churn
- Our solution and contributions:
  - We use random sampling and statistical inference for iTrust in the presence of both membership churn and malicious nodes, which are not directly observable
  - We have demonstrated that the dynamic adaptation algorithm is sufficiently accurate and timely to be used to estimate both metrics

1. Trustworthy Distributed Search and Retrieval
2. Protecting against Malicious Attacks in iTrust
3. Membership Management for iTrust
4. Statistical Inference and Dynamic Adaptation for iTrust

1. Trustworthy Distributed Search and Retrieval
- Conclusion:
  - We presented iTrust, a distributed search and retrieval system for the Internet that allows people to share information without worrying about censorship of information
  - We have demonstrated that, for an appropriate choice of the parameters, the probability of obtaining a match is high
- Future Work:
  - Investigate the efficiency, scalability, and reliability in Emulab
  - Investigate different classes of nodes, effects of geographical location, and network and processing loads
  - Evaluate the ease of installation and use of iTrust
  - Apply the ideas of iTrust to other applications

2. Protecting against Malicious Attacks in iTrust
- Conclusion:
  - We have presented novel statistical algorithms for detecting and defending against malicious attacks
  - We recognize that multiple responses to a request provide valuable information about the network
  - We use statistical inference techniques to infer the characteristics of the network that are not measurable directly
  - Experimental results show the effectiveness of the algorithms for detecting and defending against malicious attacks
- Future Work:
  - Investigate other kinds of malicious attacks
  - Develop other detection and defensive algorithms
  - Investigate the detection algorithm with different sets of metadata

3. Membership Management for iTrust
- Conclusion:
  - We have presented membership algorithms that allow each member to maintain its own local view of the membership and keep that view close to the actual membership
  - We exploit messages already required by the messaging protocol, rather than requiring extra messages for membership
  - A requesting node discovers newly joining nodes and leaving nodes, and adjusts its requesting rate accordingly
  - We have demonstrated that our membership algorithm is effective in estimating churn
- Future Work:
  - Refine the algorithms to handle millions of nodes
  - Investigate the performance of the membership protocols in other scenarios

4. Statistical Inference and Dynamic Adaptation for iTrust
- Conclusion:
  - We have presented a dynamic adaptive algorithm that uses random sampling and statistical inference to infer information that is not easy to detect or that is expensive to collect
  - The algorithm dynamically adjusts r and rmr to obtain reasonable accuracy, response time, message cost, and match probability
  - We have demonstrated that our dynamic adaptive algorithm is effective in maintaining a high match probability and reasonable membership accuracy
- Future Work:
  - Apply these statistical inference and dynamic adaptation techniques to other fields
  - Create other dynamic adaptation algorithms using random sampling and statistical inference for distributed systems and computer networks

Questions? Comments?
- Our iTrust Web Site: http://itrust.ece.ucsb.edu
- Contact information: Yung-Ting Chuang, ytchuang@ece.ucsb.edu
- Our project is supported by NSF CNS 10-16193
