CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA

After the Internet
We’re living at the end of history…
For government types, that refers to the fall of the Berlin Wall and the collapse of the USSR
For us, it refers to the .COM boom and bust
The Internet had infinite promise until 2000… now it is approaching maturity
What do we know how to do?
What major challenges do we face as we look at the “after the Internet” picture?

Critical Infrastructure: A Rapidly Expanding Web of Dependency
A massive rollout is underway:
Control of the restructured power grid
New medical information systems that link hospitals to other providers and reach right into the home
The telephony infrastructure
Financial systems: eMoney replaces cash!
Disaster response and coordination
The future military will be extremely dependent on information resources and solutions

Tangled Interdependencies
[Diagram: the power grid, telephony, and banking all depend on the Internet, which in turn rests on software and a COTS technology base]

Multiple Concerns
Infrastructure industries have been dangerously naïve about the challenges of using Internet and computing technologies in critical ways
Nationally critical information systems are poorly protected, fragile, and easily disrupted
This stems from the pervasive use of COTS components
Vendors are poorly motivated to address the issue
Yet academic research is having little impact
No sense of “excitement” or importance
Few significant technology-transition successes

Most serious issue? Loss of public interest and enthusiasm
Government shares this view: “It’s just software; we buy it from Microsoft”
Academic researchers are often seen as freeloading at the taxpayer’s expense
Critical infrastructure components often look “less critical” when considered in isolation
Ten thousand networked medical care systems would worry us, but not individual instances

Concrete Examples of Threats?
The power system requires a new generation of technology for preventing cascaded failures and implementing load-following power contracts
Industry requires solutions but has no idea how to build them; the technical concern is “masked” by politics, and the DOE effort is completely inadequate
Three branches of the military are separately developing real-time information support tools
Their scale will be orders of magnitude beyond anything ever done with Internet technologies
The goals recall the FAA’s AAS fiasco (which lost $6B!)

Concrete examples of threats? The 2003 East Coast blackout
Restructuring of the power grid broke it into multiple competing producers and consumers
But the technology to monitor and control the restructured grid lagged the need
Consequences of this deficiency? Operators were unable to make sense of a slowly cascading instability that ultimately engulfed the whole East Coast!

Vendor Perspective? Little interest in better security
“You have zero privacy anyway. Get over it.” (Scott McNealy, CEO of Sun Microsystems, 1/99)
In contrast, Bill Gates has often stated that MSFT needs to improve
But he doesn’t have critical infrastructure in mind, and he doesn’t point to Internet issues
Internet technology is adequate for the most commercially lucrative Web functions
But its reliability and security are inadequate for other emerging needs, including CIP requirements
The issue is that the market is the main driver for product evolution, and the market for critical solutions is small

Security: Often mistaken for the whole story
Even today, most CIP work emphasizes security and denial-of-service attacks
But critical applications must also work:
Correctly
When and where required
Even when components fail or are overloaded
Even when the network size grows or the application itself is used on a large scale
Even when the network is disrupted by failures

Market failure
Refers to situations in which a good technology is unsuccessful as a product
For example, everyone wants reliability, and many people like group communication
But how much will they pay for it?
One metric: “as a fraction of their total software investment for the same machines”
Probably not more than 5-10%
The revenue stream may be too small to sustain healthy markets and product growth

Let’s get technical
A digression to illustrate both the potential for progress and the obstacles we confront!

Scalability: Achilles’ Heel of a Networked World?
1980s: Client-server architectures. One server, 10’s of simultaneous clients
1990s: Web servers. A small server cluster in a data center or farm; 1000’s of simultaneous clients
The first decade of the 2000s? Server “geoplex”: large farms in a WAN setting; 10’s of 1000’s of simultaneous clients
Emergence of peer-to-peer applications: “live” collaboration and sharing of objects
Wireless clients could add another factor of 10 to the client load

Technologies need to keep pace
We want predictable, stable performance, reliability, and security… despite:
Large numbers of users
Large physical extent of the network
Increasing rates of infrastructure disruption (purely because of the growing span of the network)
A wide range of performance profiles
Growth in the actual volume of work applications are being asked to do

Scalable Publish-Subscribe
A popular paradigm; we’ll use it to illustrate our points
Used to link large numbers of information sources, in commercial or military settings, to even larger numbers of consumers
Track down the right servers
Updates arrive in real time as data changes
It happens to be a top military priority, so one could imagine the government tackling it…

Server cluster [diagram]
A subscriber must identify the best servers. Subjects are partitioned among servers, hence one subscriber may need multiple connections.
A publisher offers new events to a proxy server. Subjects are partitioned among the server sets; in this example there are four partitions: blue, green, yellow, and red. The server set and the partition function can adjust dynamically.
Like the subscribers, each publisher connects to the “best” proxy (or proxies) given its own location in the network. The one selected must belong to the partition handling the subject of the event.
Events are logged as they are published.
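
To make the partitioning scheme concrete, here is a minimal sketch in Python of how a client might map a subject to its partition and then choose the “best” proxy within it. All names and the hash-based partition function are illustrative assumptions, not the actual system from the slide:

```python
import hashlib

# Illustrative setup: four partitions, each served by a few proxies.
PARTITIONS = ["blue", "green", "yellow", "red"]
PROXIES = {
    "blue":   ["proxy-b1", "proxy-b2"],
    "green":  ["proxy-g1", "proxy-g2"],
    "yellow": ["proxy-y1", "proxy-y2"],
    "red":    ["proxy-r1", "proxy-r2"],
}

def partition_for(subject: str) -> str:
    """Deterministically map a subject to one partition."""
    digest = hashlib.sha1(subject.encode()).digest()
    return PARTITIONS[digest[0] % len(PARTITIONS)]

def best_proxy(subject: str, latency_ms: dict) -> str:
    """Pick the lowest-latency proxy in the subject's partition.
    latency_ms is assumed to come from some measurement service."""
    candidates = PROXIES[partition_for(subject)]
    return min(candidates, key=lambda p: latency_ms.get(p, float("inf")))

# A subscriber to two subjects may need two connections, because the
# subjects can land in different partitions.
measured = {"proxy-b1": 12, "proxy-b2": 40, "proxy-g1": 8, "proxy-g2": 9}
for subj in ["quotes.IBM", "radar.track.17"]:
    print(subj, "->", partition_for(subj), best_proxy(subj, measured))
```

Note that if the partition function changes dynamically, as the slide allows, clients must re-resolve their subjects; a real system would gossip or push the new mapping to them.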

Large-scale applications with similar technical requirements
The restructured electric power grid
Large-scale financial applications
Disaster response
Community medical systems
Large-scale online information provision
Decentralized stock markets
Network monitoring and control

Poor Scalability
Long “rumored” for distributed computing technologies and tools
A famous study by Jim Gray points to scalability issues in distributed databases
Things that scale well:
Tend to be stateless or based on soft state
Have weak reliability semantics
Are loosely coupled

Do current technologies scale?

Category | Typical large use | Limits?
Client-server and object-oriented environments | LAN system, perhaps 250 simultaneous clients | Server capacity limits scale
Web-like architectures | Internet, hundreds of clients | No reliability guarantees
Publish-subscribe / group multicast | About 50 receivers, 500 in hierarchies | Throughput becomes unstable with scale; multicast storms
Many-many DSM | Rarely seen except in small clusters | Update costs grow with cluster size
Shared database | Farm: …; RACS: 100’s; RAPS: 10’s | Few successes with rapidly changing real-time data

Recall the Stock Exchange Problem: virtually synchronous multicast is too “fragile”
[Figure: most members are healthy… but one is slow]

With 32 processes…
[Figure: for virtually synchronous Ensemble multicast protocols, average throughput on non-perturbed members vs. perturb rate; actual throughput falls away from the ideal]

The problem got worse as the system scaled up
[Figure: the same plot at group sizes 32, 64, and larger; the bigger the group, the faster throughput on non-perturbed members degrades]
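
A toy model suggests why flow control produces this collapse. In a virtually synchronous protocol the sender may hold only a bounded window of unacknowledged messages, so the sustainable rate is gated by the slowest receiver; as the group grows, the chance that some member is currently slow approaches certainty. The Python sketch below is my own illustration of that effect, not the Ensemble measurement itself:

```python
import random

def expected_throughput(base_rate, n, p_slow, slowdown, trials=10000):
    """Expected multicast rate when each of n members is independently
    'perturbed' (running at base_rate * slowdown) with probability
    p_slow, and flow control gates the sender at the slowest member."""
    total = 0.0
    for _ in range(trials):
        rates = [base_rate * (slowdown if random.random() < p_slow else 1.0)
                 for _ in range(n)]
        total += min(rates)   # bounded window: slowest member sets the pace
    return total / trials

for n in (32, 64, 96):
    print(f"n={n}: ~{expected_throughput(200.0, n, 0.05, 0.25):.0f} msgs/sec")
```

With these (arbitrary) parameters, throughput drops toward the perturbed member’s rate as n grows, mirroring the qualitative shape of the plots above.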

Why doesn’t anything scale?
With weak semantics…
Faulty behavior may occur more often as system size increases (think “the Internet”)
With strong semantics…
We encounter a system-wide cost (e.g. membership reconfiguration, congestion control)
That cost can be triggered more often as a function of scale (more failures, more network “events”, bigger latencies)
Gray’s O(n²) database degradation reflects very similar issues… a new law of nature?

Serious issue for our scalable publish-subscribe technology
What if we build it for the military or some other critical use, and it works in the laboratory but not in the field?
Early evaluation has ruled out most off-the-shelf networking technologies: they just don’t have the necessary scalability!
In fact, this happened with the Navy’s Cooperative Engagement Capability (CEC)
They built it… but it melts down under stress!

Fight fire with fire!
Turn to randomized protocols… with probabilistic reliability goals
This overcomes the scalability problems just seen
Then think about how to “present” the mechanism to the user

Tools in our toolkit
Traditional deterministic tools:
Virtual synchrony (only in small groups)
Paxos
Transactions
New-age probabilistically reliable ones:
Bimodal multicast
Astrolabe
DHTs
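
The probabilistic tools all rest on the same epidemic idea: each process periodically relays what it knows to a few peers chosen at random. The sketch below is a generic push-gossip simulation in Python, not Cornell’s Bimodal Multicast implementation (which combines an unreliable multicast with gossip-based repair), but it shows the key property: dissemination time grows roughly logarithmically with group size, and no single slow process can stall the others:

```python
import math
import random

def gossip_rounds(n: int, fanout: int = 3, seed: int = 1) -> int:
    """Simulate push gossip: count rounds until all n processes
    have the message that process 0 starts with."""
    random.seed(seed)
    infected = {0}
    rounds = 0
    while len(infected) < n:
        rounds += 1
        newly = set()
        for _ in infected:
            # Each informed process pushes to `fanout` random peers.
            newly.update(random.sample(range(n), fanout))
        infected |= newly
    return rounds

for n in (100, 1000, 10000):
    print(f"n={n}: spread in {gossip_rounds(n)} rounds "
          f"(log2 n ~ {math.log2(n):.1f})")
```

The reliability guarantee is probabilistic: a process might be missed in any one round, but the chance it is never reached shrinks exponentially with the number of rounds, which is closely related to the “bimodal” delivery guarantee these tools exploit.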

Server cluster, revisited [the same diagram, now annotated]
We can use Bimodal Multicast here.
This replication problem looks like an instance of virtual synchrony.
Perhaps this client can use Astrolabe to pick a server.

Server cluster, completed [diagram]
The subscriber must identify the best servers; the publisher uses Astrolabe to identify the correct set of receivers.
Bimodal Multicast carries the published events to the logging servers.
Astrolabe manages configuration and connection parameters, and tracks system membership and state.
The combined technologies solve the initial problem!
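
A toy sketch of the Astrolabe side of this picture: each server advertises a small row of attributes, gossip merges the per-node rows (keeping the freshest version of each), and a client evaluates an aggregation query, here “least-loaded server”, over whatever table it currently holds. The attribute names and structure are invented for illustration; real Astrolabe is a hierarchical database with SQL-style aggregation:

```python
import random

def merge(table_a, table_b):
    """Merge two gossip tables, keeping the newest version of each row."""
    merged = dict(table_a)
    for name, (version, attrs) in table_b.items():
        if name not in merged or version > merged[name][0]:
            merged[name] = (version, attrs)
    return merged

def gossip_round(tables):
    """Every node exchanges state with one randomly chosen peer."""
    names = list(tables)
    for a in names:
        b = random.choice(names)
        tables[a] = tables[b] = merge(tables[a], tables[b])

def least_loaded(table):
    """Aggregation query: name of the server with the smallest load."""
    return min(table.items(), key=lambda kv: kv[1][1]["load"])[0]

# Five servers, each initially knowing only its own (version, attrs) row.
random.seed(2)
tables = {f"s{i}": {f"s{i}": (1, {"load": random.random()})} for i in range(5)}
for _ in range(4):
    gossip_round(tables)
print("server s0 would route new clients to:", least_loaded(tables["s0"]))
```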

A glimpse inside a data center
Pub-sub combined with point-to-point communication technologies like TCP
[Diagram: load-balancing (LB) services sit in front of “front-end applications”, web sites, and web services, with routers and legacy systems behind them]

Cornell: the QuickSilver platform in a datacenter
[Diagram: query sources and update sources feed services hosted at data centers A and B, backed by a server pool; services are accessible system-wide]
To send a query, a client needs a way to “map” to the appropriate partition of the target service and then to locate a suitable representative of the appropriate cluster.
To send an update, we not only need to find the cluster, but also to initiate some form of replication protocol: a multicast, chain update, 1SR transaction, etc.
Notice the potentially huge number of replication “groups”: the selected technology must not only be fault-tolerant and fast, but it also needs to scale in the number of distribution patterns… a dimension as yet unexplored by the research community and overlooked in most products!
System administrators will need a way to monitor the state of all these services. This hierarchical database is a good match with Astrolabe, an example of a P2P solution Cornell has been exploring.
They also need a way to update various control parameters at what may be tens of thousands of locations. The resulting “scalable” reliable multicast problem is also one Cornell has looked at recently.
The best hope for dealing with legacy components is to somehow “wrap” them in a software layer designed to integrate them with the monitoring and control infrastructure and bring autonomic benefits to bear on them where practical. By intercepting inputs or replicating checkpoints, we may be able to harden these to some degree.
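
Of the replication options the slide lists, a chain update is perhaps the simplest to sketch. The Python below is a generic chain-replication illustration (not QuickSilver’s actual protocol): an update enters at the head, each replica applies it and forwards it to its successor, and the acknowledgment from the tail means every replica holds the new value:

```python
class Replica:
    """One link in a replication chain (illustrative sketch)."""
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.next = None          # successor in the chain, if any

    def update(self, key, value):
        """Apply the update locally, then forward it down the chain.
        Returns only once the tail has applied it."""
        self.store[key] = value
        if self.next is not None:
            return self.next.update(key, value)
        return f"ack from tail {self.name}"

# A three-replica chain: head -> middle -> tail.
head, mid, tail = Replica("r1"), Replica("r2"), Replica("r3")
head.next, mid.next = mid, tail

print(head.update("x", 42))   # ack implies all three replicas hold x=42
print(tail.store)             # strongly consistent reads go to the tail
```

The appeal in this setting is that each replica talks to only one successor, so the per-update fan-out stays constant even as the number of replication groups explodes.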

Good things?
We seem to have technologies that can overcome Internet limitations using randomized P2P gossip
(However, Internet routing can “defeat” our clever solutions unless we know the network topology)
These have great scalability and can survive under stress
And both are backed by formal models as well as real code and experimental data
Indeed, the analysis is “robust” too!

Bad things?
These are middleware, and the bottom line is that only MSFT can sell middleware!
The current commercial slump doesn’t help; nobody is buying anything
Indeed, while everything else advances at “Internet speed”… the Internet architecture has somehow gotten stuck circa 1985!
Is this an instance of a market failure?
The modern Internet: unsafe at any speed?

The Internet “policy”
Assumes almost everything uses TCP
TCP is designed to be greedy: it ratchets bandwidth up until congestion occurs
Routers are designed to drop packets: they use RED (Random Early Detection)
Throw away packets at random until TCP gets the point and slows down
Our problem? We’re not running TCP, and this policy penalizes us, although it “works” for TCP…
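
Concretely, classic RED keeps an exponentially weighted moving average of the queue length and drops arriving packets with a probability that ramps up between two thresholds. This is a textbook sketch with illustrative parameter values (real routers also track the count of packets since the last drop, omitted here):

```python
import random

class RedQueue:
    """Simplified Random Early Detection (parameters illustrative)."""
    def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002):
        self.min_th, self.max_th = min_th, max_th
        self.max_p, self.weight = max_p, weight
        self.avg = 0.0            # EWMA of the queue length
        self.queue = []

    def enqueue(self, pkt) -> bool:
        """Returns False if the packet is (probabilistically) dropped."""
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if self.avg < self.min_th:
            drop_p = 0.0          # short queue: accept everything
        elif self.avg >= self.max_th:
            drop_p = 1.0          # persistent congestion: drop everything
        else:                     # in between: ramp the drop probability up
            drop_p = (self.max_p * (self.avg - self.min_th)
                      / (self.max_th - self.min_th))
        if random.random() < drop_p:
            return False          # a TCP sender reads this loss as "slow down"
        self.queue.append(pkt)
        return True
```

Notice that the drop decision never asks which protocol the packet belongs to: a reliability protocol that does not treat loss as a congestion signal, the situation the slide describes, simply loses data.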

The Internet itself: the main weak point
Our hardest open problems arise in the Internet
Astrolabe and Bimodal Multicast don’t do much for security
They need to know the network topology… but the Internet conceals this information
We could perhaps use these tools to detect and react to a DoS attack at the application layer, but in fact such an attack can only be stopped in the network itself
Butler Lampson: “The Internet and the Web are successful precisely because they don’t need to work very well to succeed”

The Internet got stuck in 1985
Critical Infrastructure Protection is hostage to a perception that the Internet is perfect!
We must somehow recapture the enthusiasm of the field and the commercial sector for evolution and change
Scalability: building massive systems that work really well and yet make full use of COTS
Awesome performance, even under stress
A better Internet: time for a “Supernet”?

Lagging public interest
An extremely serious problem
The Internet boomed… then it melted down. And we’re Internet people
It’s even worse in the CIP area
We predicted disaster in 1996… 1999… 2000: cyberterrorists… “the Internet will melt down”
We’re the people who keep crying wolf
Realistically, we can’t fight this perception
This argues that CIP success will have to come from other pressures, not a direct public clamor!

A missing “pipeline”
[Diagram of the pipeline: long-term research (fundamental questions on a 10-year time horizon; new practical options 5 years from products), industry stakeholders ready to apply good ideas in real settings, and companies interested in ideas for new products; labeled examples are researchers at Cornell, researchers at SRI, and developers at the Electric Power Research Institute, linking basic needs and practical needs to COTS solutions]

Best hope?
Government must work with all three communities: CIP stakeholders, researchers, and vendors
A tricky role: consider the MSFT initiative on security. Will MSFT trigger a wave of commercial products? Or will the 800-lb gorilla just crush the whole market?
Reexamine the legal basis for “hold harmless” clauses that indemnify software vendors against damages even when products are defective through outright negligence
The growing need for military and homeland defense helps
But this must be balanced against an understandable inclination to keep such programs “black”

Conclusions
CIP is hostage to complacency as an undramatic threat slowly grows!
Nationally critical infrastructure is exposed to security and reliability problems; this exposure is growing, yet it is largely ignored
The research effort has contracted around an overly theoretical security community
The current trend is a recipe for economic stagnation: inadequate technology blocks new markets