Security and Replication … and Course Wrap-up Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems September 13, 2015 PNUTS slide content courtesy of Brian Cooper

2 Secure Transactions
- Authentication using public/private key pairs is essential today
- Consider every Web transaction: we want to know whom we’re conversing with!
  - … versus ending up with a phishing attack!

3 Secure Sockets Layer (SSL)
Relies on a trusted third party
- Certificate authority (CA) issues certificates to certify a server and its public key
- Verisign is perhaps the best known of these
A server S generates a public-private keypair
- Sends the public key and other info (plus $$$) to Verisign (etc.)
- Gets back a certificate with:
  - CA name
  - S’s name, URL, public key
  - Timestamp and expiration info

4 Example Certificate
Owner: CN=GTE CyberTrust Root, O=GTE Corporation, C=US
Issuer: CN=GTE CyberTrust Root, O=GTE Corporation, C=US
Serial number: 1a3
Valid from: Fri Feb 23 23:01:00 GMT 1996 until: Thu Feb 23 23:59:00 GMT 2006
Certificate fingerprints:
  MD5: C4:D7:F0:B2:A3:C5:7D:61:67:F0:04:CD:43:D3:BA:58
  SHA1: 90:DE:DE:9E:4C:4E:9F:6F:D8:86:17:57:9D:D3:91:BC:65:A6:89:64

5 The SSL Protocol
Client C connects to server S from enterprise E
- S sends E’s certificate (cleartext)
- C validates the certificate using the public key of the CA (e.g., Verisign)
- C generates a session key, encrypts it with E’s public key, and sends it to S
Java has built-in support for SSL (Java Secure Socket Extension, integrated in 1.4) and a tool for managing certificates (keytool)
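The validation step above can be sketched from the client's side. This is a minimal illustration using Python's standard `ssl` module rather than the Java JSSE API the slide mentions; the function names `make_verifying_context` and `fetch_peer_certificate` are made up for this example, and the session-key exchange itself happens inside the library's handshake.

```python
import socket
import ssl


def make_verifying_context():
    """Build a client-side context that validates the server's certificate."""
    # create_default_context() loads the system's trusted CA certificates and
    # enables both certificate verification and hostname checking.
    return ssl.create_default_context()


def fetch_peer_certificate(host, port=443):
    """Connect to host, complete the handshake, and return the validated certificate."""
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # wrap_socket performs the handshake. It raises
        # ssl.SSLCertVerificationError if the certificate does not chain to a
        # trusted CA or does not match the requested hostname.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.getpeercert()
```

Calling `fetch_peer_certificate` with a real hostname needs network access; the returned dictionary includes the subject, issuer, and validity dates, the same kinds of fields shown on the example certificate slide.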

6 So…
- The client and server know each other given SSL
- How do we go ahead and make a purchase?
- Most commonly: you enter your credit card number
  - Sometimes this is stored in the retailer’s system for future purposes!
- Best case:
  - The CC info is stored in a special, firewalled server, not part of the web site
  - Web server has other account info about you
  - When a transaction goes through, web site sends order to this special server, which combines it with CC info and sends it onward

7 Replication… Core of the Cloud
- The vision of the “cloud”: a “computing utility” that is geographically distributed
- At its core: geographical replication as well as partitioning
  - What to replicate (including granularity)
  - Where to replicate
  - How to maintain consistency (and how fresh data needs to be)

8 What to Replicate
There is a cost to maintaining consistency if data is changing
- Larger objects, slower networks, frequent updates, freshness requirements → replication is more expensive
- May be able to send a “diff” instead of the whole object
Thus, LAN and WAN replication differ:
- Local-area / cluster:
  - Single-writer, multiple-reader data is often replicated
  - e.g., CNN
- Wide-area:
  - Need to limit replication to seldom-updated data, or relax the freshness or consistency constraints
  - e.g., Akamai (images, video), Google index
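The idea of sending a “diff” instead of the whole object can be sketched as follows. This is only an illustration under the assumption that a replicated object is a flat dictionary of fields; `make_diff` and `apply_diff` are hypothetical helper names.

```python
def make_diff(old, new):
    """Describe how to turn the old version of an object into the new one."""
    # Fields that were added or whose value changed.
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    # Fields that no longer exist in the new version.
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}


def apply_diff(replica, diff):
    """Return a copy of the replica's object with the diff applied."""
    updated = dict(replica)
    updated.update(diff["changed"])
    for key in diff["removed"]:
        updated.pop(key, None)
    return updated
```

If only a small counter changes inside a large object, the diff carries just that field, which matters most on slow wide-area links.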

9 Where to Place Replicas in the Internet
Want to place them at points where they can handle many requests and reduce traffic in bottlenecks
- Commonly, at least one replica in Europe, Asia, US West Coast, US East Coast
[Figure: Server 1 and Server 2, clients C1 through C9, and a congested or failure-prone link]

10 Schemes for Maintaining Consistency
Goal is to trade off performance vs. consistency guarantees
- Lock-based protocols
- Invalidation
- Lease
- Time-to-live

11 Lock-Based Protocols
- Guarantee strong consistency
- Similar to a distributed version of what’s done in a database
  - Client request for an item requires a read lock at its handling server
  - Update to an item requires a write lock
  - Multiple read locks can be held concurrently; write lock must be exclusive
What are the potential pitfalls of this approach? Is it resilient to network partition?
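The read/write lock rules above can be sketched with a small, single-process lock table. This is a simplification: a real distributed protocol must also handle messaging, timeouts, and failed lock holders, which is where the pitfalls the slide asks about come from. `LockTable` and its method names are hypothetical.

```python
class LockTable:
    """Tracks which clients hold read or write locks on each item."""

    def __init__(self):
        self._readers = {}  # item -> set of client ids holding read locks
        self._writer = {}   # item -> client id holding the write lock

    def try_read_lock(self, item, client):
        """Grant a read lock unless another client holds the write lock."""
        holder = self._writer.get(item)
        if holder is not None and holder != client:
            return False
        self._readers.setdefault(item, set()).add(client)
        return True

    def try_write_lock(self, item, client):
        """Grant the write lock only if no other client holds any lock."""
        holder = self._writer.get(item)
        if holder is not None and holder != client:
            return False
        other_readers = self._readers.get(item, set()) - {client}
        if other_readers:
            return False
        self._writer[item] = client
        return True

    def release(self, item, client):
        """Drop every lock this client holds on the item."""
        self._readers.get(item, set()).discard(client)
        if self._writer.get(item) == client:
            del self._writer[item]
```

Note that a client that takes a lock and then becomes unreachable blocks everyone else until its lock is somehow reclaimed, which is one reason this approach copes poorly with network partitions.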

12 Invalidation Protocols
- If a server is to update an item, it can multicast this to all replicas
  - Requires servers to know who all of the other parties are
  - May be somewhat weaker than lock-based models: why?
Common variation: lease-based protocol
- A replicated item is “leased” for a particular period
  - If the item is updated during its lease, it is invalidated/refreshed
  - After it expires, it is dropped
What are the pros and cons of these protocols?
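A lease-based protocol along these lines might look like the sketch below. It assumes a single origin server that remembers which replicas hold a lease on each key, and it uses an injected clock so the behavior is easy to follow; `Origin` and `LeasedReplica` are hypothetical names, and invalidation is a direct method call standing in for a network message.

```python
class LeasedReplica:
    """Replica-side cache in which every entry carries a lease expiry time."""

    def __init__(self, lease_seconds, clock):
        self.lease_seconds = lease_seconds
        self.clock = clock      # zero-argument function returning the current time
        self.entries = {}       # key -> (value, lease expiry time)

    def install(self, key, value):
        self.entries[key] = (value, self.clock() + self.lease_seconds)

    def invalidate(self, key):
        self.entries.pop(key, None)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if self.clock() >= expiry:
            # The lease ran out: drop the copy instead of serving it.
            del self.entries[key]
            return None
        return value


class Origin:
    """Holds the authoritative data and knows who currently leases each key."""

    def __init__(self):
        self.data = {}
        self.leases = {}        # key -> list of replicas that were granted a lease

    def grant(self, key, replica):
        replica.install(key, self.data[key])
        self.leases.setdefault(key, []).append(replica)

    def update(self, key, value):
        self.data[key] = value
        # Invalidate every leased copy; replicas must ask for a new lease.
        for replica in self.leases.pop(key, []):
            replica.invalidate(key)
```

Because leases expire on their own, the origin only has to track replicas for the lease period, but an invalidation that is lost in transit still leaves a stale copy until the lease ends.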

13 Time-to-Live-Based Replication
- Generally used when freshness constraints aren’t severe
- Replicas are provided with an expectation of how long they are likely to remain current
- After the “time-to-live” expires, they need to be revalidated
How does this compare to the previous approaches?
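For comparison, a time-to-live cache needs no messages from the origin at all. The sketch below assumes a `fetch` callback that retrieves the current value from the origin; `TtlCache` and its parameters are hypothetical names.

```python
import time


class TtlCache:
    """Serves cached values until their time-to-live expires, then refetches."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # function: key -> current value at the origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            # Still within the TTL: serve the copy, even if the origin has changed.
            return entry[0]
        # Missing or expired: revalidate by fetching from the origin.
        value = self._fetch(key)
        self._entries[key] = (value, now + self._ttl)
        return value
```

The origin does not need to know who holds copies, but a reader may see a value that is stale by up to one TTL.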

14 Replication in “Cloud” Services
- Yahoo’s PNUTS, Google’s BigTable are based on the notion that there is locality of data access
- Consider consistency within each record but ignore cross-record consistency
  - e.g., in a social network, we should coordinate accesses to the same user (but don’t care about consistency with unrelated friends)
  - … but even here, we might be able to tolerate relaxed consistency among the users

15 Yahoo’s PNUTS Platform
- Parallel database
- Geographic replication
- Indexes and views
- Structured, flexible schema
[Figure: a table of six records (A 42342 E, B 42521 W, C 66354 W, D 12352 E, E 75656 C, F 15677 E) shown in three copies]

16 Query model
Per-record operations
- Get
- Set
- Delete
Multi-record operations
- Multiget
- Scan
- Getrange
Web service (RESTful) API

17 System Architecture
[Figure: components shown are clients, REST API, routers, tablet controller, storage units, and YMB, spanning a local region and remote regions]

18 Tablet splitting and balancing
Each storage unit has many tablets (horizontal partitions of the table)
Tablets may grow over time
Overfull tablets split
Storage unit may become a hotspot
Shed load by moving tablets to other servers
[Figure labels: Storage unit, Tablet]
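Splitting and load shedding can be sketched as below. This is not how PNUTS implements it; it assumes a tablet is a dictionary of records, that an overfull tablet splits at its median key, and that load is measured simply by the number of tablets on a storage unit. `split_tablet` and `shed_load` are hypothetical names.

```python
def split_tablet(tablet, max_records):
    """Split an overfull tablet into two halves at its median key."""
    if len(tablet) <= max_records:
        return [tablet]
    keys = sorted(tablet)
    mid = len(keys) // 2
    left = {k: tablet[k] for k in keys[:mid]}
    right = {k: tablet[k] for k in keys[mid:]}
    return [left, right]


def shed_load(units):
    """Move one tablet from the most loaded storage unit to the least loaded."""
    # units maps a storage unit name to the list of tablets it hosts.
    hot = max(units, key=lambda name: len(units[name]))
    cold = min(units, key=lambda name: len(units[name]))
    # Only move a tablet if doing so actually reduces the imbalance.
    if len(units[hot]) - len(units[cold]) > 1:
        units[cold].append(units[hot].pop())
```

A real system would measure load by request rate or data size rather than tablet count, and would have to update the routers' view of where each tablet lives after a move.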

19 Accessing data
[Figure: four numbered steps; steps 1 and 2 carry “Get key k” to the storage unit (SU), and steps 3 and 4 return “Record for key k”]

20 Range queries
Router’s interval map:
  MIN-Canteloupe → SU1
  Canteloupe-Lime → SU3
  Lime-Strawberry → SU2
  Strawberry-MAX → SU1
Keys stored across Storage unit 1, Storage unit 2, Storage unit 3: Apple, Avocado, Banana, Blueberry, Canteloupe, Grape, Kiwi, Lemon, Lime, Mango, Orange, Strawberry, Tomato, Watermelon
The query “Grapefruit…Pear?” is split into “Grapefruit…Lime?” and “Lime…Pear?”
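The router's lookup can be sketched with a sorted list of boundary keys and binary search, using the interval map from the slide. The sketch assumes each interval includes its lower bound and excludes its upper bound, and that keys compare as plain strings; `unit_for_key` and `split_range` are hypothetical names.

```python
import bisect

# Boundary keys between consecutive intervals, in sorted order.
BOUNDARIES = ["Canteloupe", "Lime", "Strawberry"]
# UNITS[i] serves the i-th interval: MIN-Canteloupe, Canteloupe-Lime,
# Lime-Strawberry, Strawberry-MAX.
UNITS = ["SU1", "SU3", "SU2", "SU1"]


def unit_for_key(key):
    """Return the storage unit whose interval contains the key."""
    return UNITS[bisect.bisect_right(BOUNDARIES, key)]


def split_range(low, high):
    """Split the range low…high into one (start, end, unit) piece per interval."""
    pieces = []
    start = low
    i = bisect.bisect_right(BOUNDARIES, low)
    # Cut the range at every boundary that falls strictly inside it.
    while i < len(BOUNDARIES) and BOUNDARIES[i] < high:
        pieces.append((start, BOUNDARIES[i], UNITS[i]))
        start = BOUNDARIES[i]
        i += 1
    pieces.append((start, high, UNITS[i]))
    return pieces
```

With this map, `split_range("Grapefruit", "Pear")` produces one piece for SU3 (Grapefruit to Lime) and one for SU2 (Lime to Pear), matching the two sub-queries on the slide.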

21 Updates
[Figure: an update passes through routers, a storage unit (SU), and message brokers in eight numbered steps; message labels: “Write key k”, “SUCCESS”, “Sequence # for key k”]

22 Asynchronous replication and consistency

23 Asynchronous Replication

24 Consistency Model
- Goal: make it easier for applications to reason about updates and cope with asynchrony
- Consider a single record for Brian Cooper’s Facebook entry:
[Figure: timeline of one record from insertion through updates to deletion, with versions v. 1 through v. 8 in Generation 1]

25 Consistency Model
[Figure: version timeline v. 1 to v. 8 (Generation 1), marking a stale version and the current version; operation shown: Read (local)]

26 Consistency Model
[Figure: same version timeline; operation shown: Read up-to-date]

27 Consistency Model
[Figure: same version timeline; operation shown: Read ≥ v.6]

28 Consistency Model
[Figure: same version timeline; operation shown: Write]

29 Consistency Model
[Figure: same version timeline; operation shown: Write if = v.7, resulting in ERROR]

30 Consistency Model
[Figure: same version timeline; operation shown: Write if = v.7, resulting in ERROR]
Mechanism: per record mastership
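The five operations in this sequence of slides can be sketched for a single record. This is a toy model, not the PNUTS API: it assumes one master copy that orders all writes (per-record mastership) and a replica that has applied only a prefix of the versions. All class and method names are hypothetical, and “catching up” is a direct read of the master standing in for network traffic.

```python
class StaleWriteError(Exception):
    """Raised when a conditional write names a version that is no longer current."""


class MasterRecord:
    """The master copy of one record; every write is ordered here."""

    def __init__(self, value):
        self.versions = [value]          # versions[0] is v. 1

    @property
    def current_version(self):
        return len(self.versions)

    def write(self, value):
        self.versions.append(value)
        return self.current_version

    def write_if_version(self, expected, value):
        # Conditional write: succeed only if nobody has written since `expected`.
        if expected != self.current_version:
            raise StaleWriteError(
                f"expected v.{expected}, current is v.{self.current_version}")
        return self.write(value)


class Replica:
    """A copy that may lag behind the master by some number of versions."""

    def __init__(self, master, applied):
        self.master = master
        self.applied = applied           # highest version applied locally

    def read_local(self):
        """Cheapest read: return whatever is here, possibly stale."""
        return self.applied, self.master.versions[self.applied - 1]

    def read_at_least(self, min_version):
        """Return a version no older than min_version, catching up if needed."""
        if self.applied < min_version:
            self.applied = self.master.current_version
        return self.read_local()

    def read_up_to_date(self):
        """Most expensive read: always return the current version."""
        self.applied = self.master.current_version
        return self.read_local()
```

In the slides' scenario the current version is v. 8, so a write conditioned on v. 7 fails; the application would typically reread the record and retry.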

31 PNUTS Recap
- An interesting compromise between consistency and performance/availability
- Used underneath many of Yahoo’s properties
- … And an exemplar of the new generation of cloud services

32 Experiments: Show It’s So!
- The general goal: to help demonstrate and show why a real-world artifact provides a benefit
  - Versus some benchmark or naïve strategy
  - We also want to understand why there’s a benefit
- Some common kinds of experiments:
  - Usability: some sort of user tests, versus a benchmark
  - Performance: as we increase the workload, what happens?
  - Scalability: as we increase the data, devices, nodes, what happens?
  - Complexity: especially for things like code, what happens as we make the task harder or bigger?

33 Experimentation
- In general, experiments should follow the scientific method:
  - Hypothesis (e.g., our method will do better than XYZ on workloads like QWV, which are representative of domain ABC)
  - Experiment (examine this; may need many trials, random workloads, etc.)
  - Conclusion (show, with statistically significant measurements, that the hypothesis is true)
- Often, the hypothesis almost goes unsaid in computer science (it’s implicit in the choice of the problem), but it is there!
- Note that many attributes, e.g., elegance, style, are not very amenable to experiments
  - Others, like expressiveness, generally need to be proven rather than run

34 Experimental Workloads
- There are generally three kinds of systems experiments:
  - Synthetic microbenchmark: experimental runs are done over inputs that are generated to stress a specific factor, but are not particularly realistic
    - Examples: a hard disk random access test; a web server’s maximum throughput
    - Really shows the factor of interest; can be tweaked, scaled, etc.
  - Synthetic based on real behavior: experimental runs are done over inputs that are modeled after real data, but perhaps generated randomly
    - Examples: SPEC benchmarks; TPC-W web transaction benchmark
    - Enables us to generate more inputs, testing scalability, etc.
  - Real-world: traces are collected of real system behavior over real data
    - Disadvantage: hard to quantify or control the different factors

35 Experimental Methodology
- Consider the important factors that you wish to examine (and demonstrate)
  - Scalability: can typically be in terms of running time, size of the problem, space consumed, etc.
  - Here: performance is what matters
- Break it down into individual parameters
  - Crawl & index time; time to answer a query; etc.
- Consider a workload that helps measure the parameter
  - Crawl 1000 documents; run 50 queries 10 times apiece; etc.
- Vary one parameter at a time, study effects
  - Number of machines; number of threads per machine; etc.
- Run experiment multiple times; average and show 95% confidence intervals in line (continuous) or bar (discrete) chart
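The last step, averaging repeated runs and reporting a 95% confidence interval, can be sketched as below. The sketch uses the normal approximation, which is an assumption that becomes less accurate with only a handful of runs (Student's t distribution gives a wider interval for small samples); `mean_with_ci` is a hypothetical helper name.

```python
import math
import statistics


def mean_with_ci(samples, confidence=0.95):
    """Return (mean, half_width) so the interval is mean ± half_width."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.fmean(samples)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * std_err
```

For example, five timings of one query (in seconds) could be passed as `mean_with_ci([10.0, 12.0, 11.0, 13.0, 9.0])`; the half-width is what an error bar on the chart would show.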

36 Course Recap (Until Next Week’s Midterm 2!)
- Distributed, Web-scale systems are here to stay!
- They create many issues that are not totally resolved, and for which there is no one answer:
  - Heterogeneity
  - Timing
  - Partitioning and replication
  - Consistency and integrity
  - Etc.
- This course tried to give you a sense of the issues and state-of-the-art, as well as the skills to go out and work in this domain
- I hope the amount of work we all sank into the material (and the homeworks) will pay off for you!
- And stay tuned: there’s lots more to come!
  - Sensor networks, semantic Web, mobile systems, location-based services, …
