After the Internet Ken Birman Professor, Dept. of Computer Science Cornell University.


My Background in CIP Professor at Cornell since 1982: 19 years of research on reliable, secure communications software. Author of 150 papers and 2 books; founder and CEO of two companies. My software is used by the New York and Swiss Stock Exchanges, the French air traffic control system, and the AEGIS warship. Led the 1995 DARPA ISAT study of Critical Infrastructure Assurance; its recommendations were used to retarget DARPA ITO programs.

Critical Infrastructure: Rapidly Expanding Web of Dependency. Massive rollout underway: control of the restructured power grid; new medical information systems link hospitals to other providers and reach right into the home; telephony infrastructure; financial systems, where eMoney replaces cash! Disaster response and coordination. The future military will be extremely dependent on information resources and solutions.

Tangled Interdependencies (diagram: the power grid, telephony and banking all depend on the Internet, which in turn rests on a shared base of COTS software technology).

Multiple Concerns Infrastructure industries have been dangerously naïve about challenges of using Internet and computing technologies in critical ways Nationally critical information systems poorly protected, fragile, easily disrupted Stems from pervasive use of COTS components Vendors poorly motivated to address the issue Yet academic research is having little impact No sense of “excitement” or importance Few significant technology transition successes

Most serious issue? Loss of public interest and enthusiasm. Government shares this view: "It's just software; we buy it from Microsoft." Academic researchers are often seen as freeloading at taxpayers' expense. Critical infrastructure components often look "less critical" considered in isolation: ten thousand networked medical care systems would worry us, but not individual instances.

Concrete Examples of Threats? Power system requires new generation of technology for preventing cascaded failures, implementing load-following power contracts Industry requires solutions but has no idea how to build them. Technical concern “masked” by politics DOE effort is completely inadequate Three branches of military are separately developing real-time information support tools. Scale will be orders of magnitude beyond anything ever done with Internet technologies Goals recall the FAA’s AAS fiasco (lost $6B!)

Vendor Perspective? Little interest in better security “You have zero privacy anyway. Get over it.” Scott McNealy, CEO Sun Microsystems; 1/99 Gates recently suggested that perhaps MSFT needs to improve, but doesn’t have critical infrastructure in mind and didn’t point to Internet issues. Internet technology is adequate for the most commercially lucrative Web functions But inadequate reliability, security for other emerging needs, including CIP requirements Issue is that market is the main driver for product evolution, and market for critical solutions is small

Security: Often mistaken for the whole story Even today, most CIP work emphasizes security and denial of service attacks But critical applications must also work Correctly When and where required Even when components fail or are overloaded Even when the network size grows or the application itself is used on a large scale Even when the network is disrupted by failures

Let’s get technical A digression to illustrate both the potential for progress but also the obstacles we confront!

Scalability: Achilles' Heel of a Networked World? 1980s: client-server architectures; 1 server, 10's of simultaneous clients. 1990s: Web servers; a small server cluster in a data center or farm; 1000's of simultaneous clients. First decade of 2000? Server "geoplex": large farms in a WAN setting; 10's of 1000's of simultaneous clients; emergence of peer-to-peer applications for "live" collaboration and sharing of objects. Wireless clients could add another factor of 10 in client load.

Technologies need to keep pace We want predictable, stable performance, reliability, security … despite Large numbers of users Large physical extent of network Increasing rates of infrastructure disruption (purely because of growing span of network) Wide range of performance profiles Growth in actual volume of work applications are being asked to do

Scalable Publish Subscribe A popular paradigm; we’ll use it to illustrate our points Used to link large numbers of information sources in commercial or military settings to even larger numbers of consumers Track down the right servers Updates in real-time as data changes Happens to be a top military priority, so one could imagine the government tackling it…

Server cluster (diagram): the subscriber must identify the best servers, and since subjects are partitioned among servers, one subscriber may need multiple connections. The publisher offers new events to a proxy server; subjects are partitioned among the server sets (in this example four partitions: blue, green, yellow and red), and the server set and partition function can adjust dynamically. Like the subscribers, each publisher connects to the "best" proxy (or proxies) given its own location in the network; the one selected must belong to the partition handling the subject of the event. (Diagram labels: log, publish.)
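The partitioning idea on this slide can be sketched in a few lines. This is a hypothetical illustration, not the actual system's design: the subject names, the hash choice, and the four color-named partitions are all assumptions. The point is that a deterministic function maps each subject onto one partition, so a subscriber needs a connection to each distinct partition its subjects hash to.

```python
import hashlib

# Illustrative partition set, echoing the slide's four colors.
PARTITIONS = ["blue", "green", "yellow", "red"]

def partition_for(subject: str) -> str:
    """Hash a subject onto one partition; every node computes the same answer."""
    digest = int(hashlib.sha1(subject.encode()).hexdigest(), 16)
    return PARTITIONS[digest % len(PARTITIONS)]

def connections_needed(subjects):
    """A subscriber needs one connection per distinct partition
    covering the subjects it follows."""
    return {partition_for(s) for s in subjects}

print(connections_needed(["quotes/IBM", "quotes/MSFT", "weather/Ithaca"]))
```

Because the mapping is deterministic, publishers and subscribers independently agree on which partition owns each subject without consulting a central directory.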

Large-scale applications with similar technical requirements Restructured Electric Power Grid Large-scale financial applications Disaster response Community medical systems Large-scale online information provision Decentralized stock markets Network monitoring, control

Poor Scalability Long “rumored” for distributed computing technologies and tools Famous study by Jim Gray points to scalability issues in distributed databases Things that scale well: Tend to be stateless or based on soft state Have weak reliability semantics Are loosely coupled

Do current technologies scale?

Category | Typical large use | Limits?
Client-server and object-oriented environments | LAN system, perhaps 250 simultaneous clients | Server capacity limits scale
Web-like architectures | Internet, hundreds of clients | No reliability guarantees
Publish-subscribe (group multicast) | About 50 receivers, 500 in hierarchies | Throughput becomes unstable with scale; multicast storms
Many-many DSM | Rarely seen except in small clusters | Update costs grow with cluster size
Shared database | Farm: ; RACS: 100's, RAPS: 10's | Few successes with rapidly changing real-time data

Stock Exchange Problem: virtually synchronous multicast is too "fragile". Most members are healthy… but one is slow.

With 32 processes… (Graph: average throughput on non-perturbed members vs. perturb rate, for virtually synchronous Ensemble multicast protocols; ideal and actual curves.)

The problem gets worse as the system scales up. (Graph: average throughput on non-perturbed members vs. perturb rate for virtually synchronous Ensemble multicast, at group sizes 32, 64, and larger.)
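The collapse in these graphs has a simple flow-control intuition, sketched below as a toy model (this is an illustration of the reasoning, not the Ensemble implementation): a virtually synchronous sender must buffer every message until all members acknowledge it, so its sustained rate is pinned to the slowest member, whatever the group size.

```python
# Toy model: reliability forces the sender to hold each message until every
# member has acknowledged it, so the flow-control window fills up against
# the slowest receiver and steady-state throughput equals the minimum ack rate.

def group_throughput(ack_rates):
    """Steady-state multicast rate of a sender that must wait for all acks."""
    return min(ack_rates)

healthy = [1000.0] * 31                        # msgs/sec each member can ack
print(group_throughput(healthy))               # 1000.0
print(group_throughput(healthy + [50.0]))      # 50.0: one slow member drags everyone
```

Larger groups make a slow or perturbed member more likely, which is why the measured curves degrade as group size grows.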

Why doesn't anything scale? With weak semantics… faulty behavior may occur more often as system size increases (think "the Internet"). With strong semantics… we encounter a system-wide cost (e.g. membership reconfiguration, congestion control) that can be triggered more often as a function of scale (more failures, or more network "events", or bigger latencies). Gray's O(n²) database degradation reflects very similar issues… a new law of nature?

Serious issue for our scalable publish-subscribe technology What if we build it for the military or some other critical use, and it works in the laboratory but not in the field? Early evaluation has ruled out most off-the-shelf networking technologies They just don’t have the necessary scalability! In fact, this happened with Navy’s Cooperative Engagement Capability (CEC) They built it… but it melts down under stress!

Fight fire with fire! Turn to randomized protocols… … with probabilistic reliability goals This overcomes the scalability problems just seen Then think about how to “present” mechanism to user

Cornell Scalability Research Spinglass: Scalable technologies that step into the gap Includes two major components: Astrolabe: A resource location technology Bimodal Multicast: A scalable multicast

Astrolabe Goal is to create a dynamic database showing continuously evolving state of the programs comprising some system We’ll use this to build better systems Approach: “peer to peer gossip”. Basically, each machine has a piece of a jigsaw puzzle. Assemble it on the fly

Astrolabe in a single domain. (Table: one row per machine, with rows for swift, falcon and cardinal and columns Name, Load, Weblogic?, SMTP?, Word Version, …) A row can have many columns. Total size should be k-bytes, not megabytes. A configuration certificate determines what data is pulled into the table (and can change).
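The per-machine rows and the gossip that reconciles replicas of the domain table can be sketched as follows. This is a hedged illustration: the column set echoes the slide, and the integer version counter merely stands in for whatever freshness metadata the real system attaches to rows.

```python
# Each machine owns one row of the domain table; replicas of the table are
# reconciled by anti-entropy gossip that keeps the fresher row per machine.

def make_row(load, weblogic, smtp, word_version, version):
    return {"version": version, "load": load, "weblogic": weblogic,
            "smtp": smtp, "word_version": word_version}

def merge(table_a, table_b):
    """Merge two replicas: for each machine, keep whichever replica
    holds the row with the higher version number."""
    merged = dict(table_a)
    for name, row in table_b.items():
        if name not in merged or row["version"] > merged[name]["version"]:
            merged[name] = row
    return merged

a = {"swift": make_row(0.3, True, True, "6.2", version=5)}
b = {"swift": make_row(0.8, True, True, "6.2", version=7),
     "falcon": make_row(0.1, False, True, "4.1", version=2)}

t = merge(a, b)
print(t["swift"]["load"])    # 0.8: the fresher row won
```

Repeated pairwise merges like this are the "jigsaw puzzle assembly" of the previous slide: no machine ever holds the authoritative table, yet every replica converges on the same rows.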

Build a hierarchy using a P2P protocol that "assembles the puzzle" without any servers. (Diagram: leaf tables for the San Francisco domain, with rows swift, falcon, cardinal, and the New Jersey domain, with rows gazelle, zebra, gnu, each with columns Name, Load, Weblogic?, SMTP?, Word Version, …, roll up into a parent table with columns Name, Avg Load, WL contact, SMTP contact and rows SF, NJ, Paris.) An SQL query "summarizes" the data; the dynamically changing query output is visible system-wide.
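The summarization step can be illustrated with ordinary SQL. The schema, domain name and choice of aggregates below are assumptions for illustration, not Astrolabe's actual format: the idea is only that one query condenses a leaf domain's table into the single row that appears in the parent table.

```python
import sqlite3

# A leaf domain's table, loosely modeled on the slide's San Francisco domain.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sf (name TEXT, load REAL, smtp_contact TEXT)")
conn.executemany("INSERT INTO sf VALUES (?, ?, ?)", [
    ("swift", 0.3, "swift"),
    ("falcon", 0.9, "falcon"),
    ("cardinal", 0.6, "cardinal"),
])

# One summary row for the 'SF' domain: average load plus a contact machine.
summary = conn.execute(
    "SELECT 'SF' AS name, AVG(load), MIN(smtp_contact) FROM sf"
).fetchone()
print(summary)
```

When a leaf row changes, re-running the query yields a new summary row, which is what lets the dynamically changing output propagate up the hierarchy.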

(1) Query goes out… (2) Compute locally… (3) Results flow to the top level of the hierarchy. (Diagram: the same San Francisco and New Jersey leaf tables and the SF/NJ/Paris summary table as on the previous slide.)

Hierarchy is virtual… data is replicated. (Diagram: the same leaf and summary tables; the aggregated rows are held as replicas on the participating machines rather than on any server.)

Examples? A flexible, user-programmable mechanism Which sensors are reporting detection of low- levels of chemical warfare agents? Which soldiers are downwind from location X,Y? Where can I find intelligence about the building located at coordinates X,Y? Which machines are running WebLogic v 3.2? Think of aggregation functions as small agents that look for information When changes occur, aggregated table reflects those changes within seconds

Astrolabe summary Scalable: technology can support hundreds of thousands of participants Flexible: can easily extend domain hierarchy, define new columns or eliminate old ones Secure: Uses keys for authentication and can even encrypt Handles firewalls gracefully, including issues of IP address re-use behind firewalls Performs well: updates propagate in seconds Cheap to run: tiny load, small memory impact

Contrast with most P2P schemes: our peer-to-peer approach is implemented using pseudo-random gossip. In contrast, most peer-to-peer architectures are specifically intended to support file systems and don't use pseudo-random P2P patterns; any hierarchical structure in them is "real", while ours is an abstraction constructed by the protocol itself.

Bimodal Multicast Work we did before developing Astrolabe A multicast, hence more directly comparable with older replication technology Also uses peer-to-peer gossip. We’ll review it to illustrate the approach

Reminder: Multicast scaling issue. (Graph, repeated from earlier: average throughput on non-perturbed members vs. perturb rate for virtually synchronous Ensemble multicast, at group sizes 32, 64, and larger.)

Start by using unreliable multicast to rapidly distribute the message. But some messages may not get through, and some processes may be faulty. So initial state involves partial distribution of multicast(s) Uses IP multicast (unreliable) to distribute data

Periodic gossip used to repair message loss. (In reality, it isn’t synchronized, and we do all sorts of things to avoid excessive gossip over WAN links or slow connections) Rounds of gossip repair gaps

Bimodal Multicast uses gossip. Dates back to NNTP and Clearinghouse; the best papers are by Demers et al. Periodically, each process picks some other process and the two merge states. The mathematics is that of epidemic infection. Must tune to deal with bandwidth, latency issues, and other pragmatics; for example, if a region is missing some data, we re-multicast it locally.
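The two phases described on the last few slides can be put together in a toy simulation. The parameters here are illustrative, not Bimodal Multicast's tuned values: an unreliable multicast delivers each message to each process with some probability, then rounds of anti-entropy gossip repair the gaps by pulling from randomly chosen peers.

```python
import random

random.seed(42)                     # deterministic run for illustration
N_PROCS, N_MSGS, LOSS = 32, 100, 0.2
msgs = set(range(N_MSGS))

# Phase 1: lossy IP multicast. The sender (process 0) keeps everything it
# sent; every other process misses each message independently with prob. LOSS.
have = [set(msgs)] + [
    {m for m in msgs if random.random() > LOSS} for _ in range(N_PROCS - 1)
]

# Phase 2: periodic gossip until everyone has everything.
rounds = 0
while any(h != msgs for h in have):
    rounds += 1
    for p in range(N_PROCS):
        peer = random.randrange(N_PROCS)
        have[p] |= have[peer]       # pull whatever the peer has that we lack

print("all", N_PROCS, "processes repaired after", rounds, "gossip rounds")
```

Even with 20% loss, a handful of gossip rounds suffices, and the per-round work done by each process is constant regardless of group size; that is the scalability property the next slides exploit.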

Bimodal Multicast is amenable to formal analysis. (Figure 5: graphs of analytical results.)
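The style of analysis behind such graphs can be illustrated with the standard push-gossip epidemic recurrence; this is a common textbook model, and the paper's exact formulation may differ. With $n$ processes, of which $I_t$ have the message after round $t$, and each of them gossiping to one uniformly random peer:

```latex
\[
  \mathbb{E}[I_{t+1}] \;=\; I_t \;+\; (n - I_t)\left(1 - \left(1 - \frac{1}{n}\right)^{I_t}\right)
\]
```

Early growth is roughly exponential, so the message reaches everyone in $O(\log n)$ expected rounds, and the outcome is bimodal: with overwhelming probability either almost all or almost no processes deliver the message.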

Unlimited scalability! Probabilistic gossip “routes around” congestion And probabilistic reliability model lets the system move on if a computer lags behind Results in: Constant communication costs Constant loads on links Steady behavior even under stress

Server cluster (diagram, repeated): the subscriber must identify the best servers; subjects are partitioned among the server sets (blue, green, yellow, red), so one subscriber may need multiple connections. Publishers offer new events to a proxy in the partition handling the event's subject, and the server set and partition function can adjust dynamically. (Diagram labels: log, publish.)

Server cluster (diagram): the subscriber must identify the best servers; the publisher uses Astrolabe to identify the correct set of receivers, and Bimodal Multicast carries the events. (Diagram label: log.) Astrolabe manages configuration and connection parameters and tracks system membership and state. The combined technologies solve the initial problem!

Good things? Both technologies overcome Internet limitations using randomized P2P gossip However, Internet routing can “defeat” our clever solutions unless we know network topology Both have great scalability and can survive under stress And both are backed by formal models as well as real code and experimental data Indeed, analysis is “robust” too!

Bad things? These are middleware, and the bottom line is that only MSFT can sell middleware! Current commercial slump doesn’t help; nobody is buying anything Indeed, while everything else advances at “Internet speed”… the Internet architecture has somehow gotten stuck circa 1985! The Internet: Unsafe at any speed?

The Internet “policy” Assumes almost everything uses TCP TCP is designed to be greedy Ratchet bandwidth up until congestion occurs Routers are designed to drop packets They use RED (Random Early Detection) Throw away packets at random until TCP gets the point and slows down Our problem? We’re not running TCP and this policy penalizes us, although it “works” for TCP….
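The RED mechanism the slide refers to is easy to sketch. The thresholds, weight and maximum drop probability below are illustrative values, not any router's defaults: the router tracks a moving average of queue length and drops incoming packets with a probability that rises as that average climbs, so greedy TCP flows back off before the queue overflows.

```python
# Sketch of classic RED (Random Early Detection). Parameters are illustrative.
MIN_TH, MAX_TH, MAX_P, WEIGHT = 5.0, 15.0, 0.1, 0.002

def drop_probability(avg_queue_len):
    """No drops below the min threshold, certain drop at or above the max,
    and a probability rising linearly (up to MAX_P) in between."""
    if avg_queue_len < MIN_TH:
        return 0.0
    if avg_queue_len >= MAX_TH:
        return 1.0
    return MAX_P * (avg_queue_len - MIN_TH) / (MAX_TH - MIN_TH)

def update_avg(avg, instantaneous_len, weight=WEIGHT):
    """RED gates on an exponentially weighted moving average of queue length,
    so short bursts are tolerated but sustained congestion is not."""
    return (1 - weight) * avg + weight * instantaneous_len

print(drop_probability(3))     # 0.0
print(drop_probability(10))    # 0.05
print(drop_probability(20))    # 1.0
```

The policy works because TCP interprets a drop as congestion and slows down; a protocol that does not react to drops the way TCP does simply loses packets, which is exactly the penalty the slide complains about.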

Internet itself: Main weak point Our hardest open problems arise in the Internet Astrolabe and Bimodal Multicast don’t do much for security They need to know network topology… but the Internet conceals this information We could perhaps use these tools to detect and react to a DOS attack at the application layer, but in fact such an attack can only be stopped in the network itself Butler Lampson: “The Internet and the Web are successful precisely because they don’t need to work very well to succeed”

The Internet got stuck in 1985 Critical Infrastructure Protection hostage to a perception that the Internet is perfect! Must somehow recapture the enthusiasm of the field and the commercial sector for evolution and change Scalability: building massive systems that work really well and yet make full use of COTS Awesome performance, even under stress Better Internet: Time for a “Supernet”?

Lagging public interest An extremely serious problem The Internet boomed… then it melted down And we’re Internet people Even worse in the CIP area We predicted disaster in 1996… 1999… 2000 Cyberterrorists… “Internet will melt down” We’re the people who keep crying wolf Realistically, can’t fight this perception Argues that CIP success will have to come from other pressures, not a direct public clamor!

Best hope? Government could require that suppliers implement best-practice security and "defense" techniques. Will MSFT's emphasis on reliability and security trigger a wave of commercial products in these areas? Or will the 800-lb gorilla just crush the whole market? Reexamine the legal basis for "hold harmless" clauses that indemnify software vendors against damages even when products are defective through outright negligence. The growing need for homeland defense tools might help.

Conclusions: CIP is hostage to complacency as an undramatic threat slowly grows! Nationally critical infrastructure is exposed to security & reliability problems, and this exposure is growing, yet it is largely ignored. The research effort has contracted around an overly theoretical security community. The current trend is a recipe for economic stagnation: inadequate technology blocks new markets.