A P2P-based Storage Platform for Storing Session Data in Internet Access Networks
T. Bahls, D. Duchow
  Nokia Siemens Networks, Broadband Access Division
  Greifswald, Germany
Peter Danielis, M. Gotzmann, D. Timmermann
  University of Rostock, Germany
  Institute of Applied Microelectronics and Computer Engineering
World Telecommunication Congress 2010
Network & Service Management Reliability
September
Outline
  Introduction & Motivation
  Utilization of P2P Technology
  Erasure Resilient Codes (ERCs) for High Data Availability
  Realization of the P2P-based Storage Platform
  Summary
Introduction & Motivation
  Internet Service Providers (ISPs) provide Internet access
  Access nodes (ANs) = essential network elements
    E.g., DSLAMs (Digital Subscriber Line Access Multiplexers)
    ANs have to be powerful but cost-effective
    ANs ≠ servers! Make do with the available resources!
    ANs need resets (or may fail), but data must not be lost!
    AN configuration data needs to be saved persistently!
  But there's more…
Introduction & Motivation
  Data - called session data - …
    … comprises MAC/IP addresses and IP lease times of customers
    … is required for data forwarding/traffic filtering
    … has to be always available, so persistent storage is needed
    … is highly volatile due to continuous changes
  Example (figure):
    DHCP Request: I have MAC address E1-15-A0!
    DHCP Response: Your IP address is for 60 min!
    Session entry:
      MAC address: E1-15-A0
      IP address:
      Lease Time: 60 min
      Active: No, later Yes
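A minimal sketch of how such a session entry might be held in memory and used for traffic filtering. The names `SessionEntry`, `on_dhcp_ack`, and `may_forward` are hypothetical, and the address 192.0.2.10 is a documentation placeholder because the slide's IP value is not available.

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    """One customer session as described on the slide (field names are assumptions)."""
    mac: str
    ip: str
    lease_min: int
    active: bool = False


def on_dhcp_ack(table, mac, ip, lease_min):
    """Record or refresh a session after a DHCP response was seen."""
    entry = SessionEntry(mac=mac, ip=ip, lease_min=lease_min, active=True)
    table[mac] = entry
    return entry


def may_forward(table, mac, ip):
    """Traffic filter: forward only if MAC and IP match an active session."""
    entry = table.get(mac)
    return entry is not None and entry.active and entry.ip == ip


table = {}
# 192.0.2.10 is a placeholder address, not a value from the slides.
on_dhcp_ack(table, "E1-15-A0", "192.0.2.10", 60)
```

Every DHCP exchange rewrites an entry like this, which is why the data changes so often.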
Introduction & Motivation
  Today: ANs store session data in persistent flash memory
  Problem: Flash memory has limited availability/rewritability
    ISPs are reluctant to "sacrifice" flash memory for session data
  Solution: Use available volatile RAM resources of ANs!
Introduction & Motivation
  Average AN, e.g., PowerQUICC III (Freescale Semiconductor)
    RAM capacity = 1 Gbyte + unlimited rewritability
    Computing capacity = 1234 Dhrystone MIPS
  Problem: How to efficiently utilize available resources?
What options does P2P offer?
  ...beyond its notorious applications, of course.
  New networking paradigm
    No clients and servers anymore
    All peers form a self-organizing network
  Network = storage resource
  Network = computing resource
  Scalability and resilience = intrinsic features
  Proven concept (BitTorrent, Zattoo, Joost)
Utilization of P2P Technology
  Networking paradigm
    Each AN is part of a logical P2P overlay on its uplink
  Network = storage resource
    Each AN stores just a piece of session data
  Network = computing resource
    Each AN implements the P2P protocol
  But ANs may become unavailable…
  Problem: How to ensure high data availability?
  (Figure: storage capacity of ANs)
Erasure Resilient Codes (ERCs) for High Data Availability
  Objective: High session data availability = %
  Simple replication wastes memory resources
  Reed-Solomon Codes
    Split session data of each AN into m data chunks
    Encoding: Add k interleaved coding chunks, giving n = m + k chunks
    Decoding: Restore session data from any m of n chunks
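The "any m of n" property can be illustrated with a small Reed-Solomon-style erasure code. This is a sketch under stated assumptions, not the platform's coder: it works over the prime field GF(257) rather than the GF(2^8) arithmetic typical of production codecs, it does not interleave the coding chunks, and all function names are hypothetical.

```python
from itertools import combinations

P = 257  # prime field size; coding symbols may take the value 256


def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, m, k):
    """Split data into m data chunks and append k coding chunks (n = m + k)."""
    size = -(-len(data) // m)  # ceiling division
    padded = data.ljust(size * m, b"\0")
    chunks = [list(padded[i * size:(i + 1) * size]) for i in range(m)]
    xs = list(range(m))
    for c in range(m, m + k):
        # Symbol j of coding chunk c is the column polynomial evaluated at x = c.
        chunks.append([lagrange_eval(xs, [chunks[i][j] for i in range(m)], c)
                       for j in range(size)])
    return chunks


def decode(available, m, length):
    """Restore the data from any m chunks, given as {chunk index: chunk}."""
    xs = sorted(available)[:m]
    if len(xs) < m:
        raise ValueError("need at least m chunks")
    size = len(available[xs[0]])
    out = bytearray()
    for i in range(m):
        for j in range(size):
            out.append(lagrange_eval(xs, [available[x][j] for x in xs], i))
    return bytes(out[:length])


data = b"MAC=E1-15-A0;lease=60"
m, k = 4, 2
chunks = encode(data, m, k)
# Check every possible set of m surviving chunks.
all_recovered = all(
    decode({i: chunks[i] for i in keep}, m, len(data)) == data
    for keep in combinations(range(m + k), m)
)
```

With m = 4 and k = 2, any two of the six chunks may be lost, at a storage overhead of n/m = 1.5 instead of the factor 3 that three full replicas would need.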
Kad-based Realization
  Connection of access nodes (ANs) with P2P-based overlay
  P2P protocol: Kad-based Distributed Hash Table (DHT) ring
  Structured chunk storage via DHT ring
    Assignment of hash values to ANs and session data chunks
    ANs save session data chunks with similar hash values
  (Figure label: Admin)
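A rough sketch of the chunk placement idea, assuming that "similar hash values" means the smallest XOR distance (the metric named on the Kad backup slide). MD5 is used only as a convenient 128-bit stand-in hash, and the identifiers `AN-0` … `AN-7` and the chunk naming scheme are hypothetical.

```python
import hashlib


def kad_id(name):
    """Map a name to a 128-bit identifier (MD5 is a stand-in, not necessarily Kad's hash)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")


def closest_node(node_ids, key):
    """Return the node ID with the smallest XOR distance to key."""
    return min(node_ids, key=lambda nid: nid ^ key)


# Eight hypothetical access nodes on the DHT ring.
nodes = {kad_id(f"AN-{i}"): f"AN-{i}" for i in range(8)}

# Place the six chunks of AN-0's session data on the nodes closest to each chunk hash.
placement = {}
for c in range(6):
    key = kad_id(f"AN-0/chunk-{c}")
    placement[c] = nodes[closest_node(nodes, key)]
```

Because hash values are spread uniformly, chunks tend to be distributed over many ANs, so a single AN failure affects only a few chunks of any one data set.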
Block Diagram
  The main components are…
    (1) module with controlling functionality
    (2) memory with own session data
    (3) Kad block with ERC functionality
    (4) routing table
    (5) memory with session data chunks of other nodes
  (Figure labels: DHCP Server; "Save Session Data!"; "Time to Save Session Data!")
Summary
  Successful development of P2P-based storage platform
  Utilization of free RAM instead of scarce flash memory
  Connection of access nodes by P2P overlay
  High scalability and resilience towards network errors
  Efficient sharing of RAM and computing resources
  ERCs for high data availability & low redundancy
  Completion of fully functional prototype
Thank you! Any questions?
Backup: Related Work
  J. Kubiatowicz et al., "OceanStore: An architecture for global-scale persistent storage", 2000
  Schwarz, Xin, Miller, "Availability in Global Peer-to-Peer Storage Systems", 2004
  Sattler, Hauswirth, Schmidt, "UniStore: Querying a DHT-based Universal Storage", 2007
  Morariu, "DIPStorage: Distributed Storage of IP Flow Records", 2008
Backup: Kad-based DHT
  Kad (eMule): 128-bit address space
  Distances between hash values are calculated by the XOR metric
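A small worked example of the XOR metric, using 4-bit values instead of 128-bit hashes for readability. The function name `xor_distance` is an illustrative choice.

```python
def xor_distance(a, b):
    """XOR metric: the distance is the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


a, b, c = 0b1011, 0b0010, 0b1110
d_ab = xor_distance(a, b)  # 0b1001
d_bc = xor_distance(b, c)  # 0b1100
d_ac = xor_distance(a, c)  # 0b0101
```

The metric is symmetric, is zero only for identical IDs, and satisfies the triangle inequality (here d_ac ≤ d_ab + d_bc).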
Backup: Kad Routing Table
  Binary tree with XOR distances of other peers to itself
  Organized into k-buckets
  Each peer knows many close peers
  Each peer knows only few distant peers
  Each known peer has a lifetime
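One way to picture k-buckets is the generic Kademlia-style layout sketched below, which may differ from eMule Kad's actual routing tree. The bucket capacity `K = 2` and the class name `RoutingTable` are assumptions for the demo, and contact lifetimes are left out.

```python
ID_BITS = 128
K = 2  # hypothetical bucket capacity, kept small for the demo


class RoutingTable:
    """Contacts grouped by the position of the highest differing ID bit."""

    def __init__(self, own_id):
        self.own_id = own_id
        self.buckets = [[] for _ in range(ID_BITS)]

    def bucket_index(self, peer_id):
        # Bucket i covers XOR distances in [2**i, 2**(i + 1)).
        return (self.own_id ^ peer_id).bit_length() - 1

    def add(self, peer_id):
        """Insert a contact; return False if it is ourselves, known, or the bucket is full."""
        if peer_id == self.own_id:
            return False
        bucket = self.buckets[self.bucket_index(peer_id)]
        if peer_id in bucket or len(bucket) >= K:
            return False
        bucket.append(peer_id)
        return True


table = RoutingTable(own_id=0b0000)
added = [table.add(p) for p in (0b0001, 0b0010, 0b0011, 0b1000, 0b1001, 0b1010)]
```

Each bucket covers twice the ID range of the previous one but holds the same number of contacts, so a peer knows its close neighbourhood densely and distant regions only sparsely.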
Backup: Kad Bootstrapping & Maintenance
  Bootstrapping
    New peer contacts a known peer and inserts itself on the ring
  Maintenance
    Contact peers from routing table whose lifetime has expired
    Contact other peers periodically to learn new contacts
Backup: Kad Lookup Process
  Searching peer selects peers close to target
  These peers are contacted via a request
  Some respond with new peers
  Some of the new peers are contacted
  Some of them respond
  Responding peers within a defined search tolerance receive an action request: Execute the action!
  If they send an action response, a counter is increased
  If counter == defined value, the lookup terminates
  Otherwise, it is terminated via a timeout
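The lookup steps can be sketched as a small in-memory simulation. This is an illustration under simplifying assumptions: every contacted peer responds, the `network` dictionary stands in for real request/response messages, running out of candidates stands in for the timeout, and the parameter names `alpha`, `tolerance`, and `needed` are hypothetical.

```python
def lookup(start_contacts, target, network, alpha=2, tolerance=4, needed=2):
    """Iteratively approach target; return the peers that executed the action.

    network maps a peer ID to the list of peer IDs it reports when asked.
    """
    known = set(start_contacts)
    asked = set()
    acted = []  # its length plays the role of the response counter
    while len(acted) < needed:
        # Select the closest peers (XOR metric) that were not contacted yet.
        candidates = sorted(known - asked, key=lambda p: p ^ target)[:alpha]
        if not candidates:
            break  # nothing left to ask: stands in for the timeout
        for peer in candidates:
            asked.add(peer)
            known.update(network.get(peer, []))  # response with new peers
            # Peers within the search tolerance receive the action request.
            if peer ^ target < tolerance and len(acted) < needed:
                acted.append(peer)
    return acted


# Toy network with 4-bit IDs; target 0b1010 is not known at the start.
network = {
    1: [4, 12], 4: [12, 14], 12: [8, 14, 11], 14: [10, 11],
    8: [9, 10], 11: [10], 10: [11], 9: [],
}
acted = lookup([1], 0b1010, network)
```

In this toy run the search moves from peer 1 via peers 12 and 4 to peers 11 and 8, which lie within the tolerance and execute the action, so the counter reaches the defined value of 2.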
Backup: Prototype
Backup: Related Issues
  Benefit from using ERCs instead of data replication
    Moderate quantitative memory savings
    But significantly higher data availability
  Kad network: open source is high quality!
  Minimal traffic overhead introduced by Kad maintenance
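The trade-off between replication and ERCs can be estimated with a simple model, assuming every AN is available independently with probability p. The value p = 0.9 and the parameters m = 4, k = 3 are hypothetical illustrations, not figures from the prototype.

```python
from math import comb


def replication_availability(p, copies):
    """Data is available if at least one of the full copies is reachable."""
    return 1 - (1 - p) ** copies


def erc_availability(p, m, k):
    """Data is available if at least m of the n = m + k chunks are reachable."""
    n = m + k
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(m, n + 1))


p = 0.9                                     # hypothetical per-node availability
repl = replication_availability(p, 2)       # storage overhead 2.0
erc = erc_availability(p, 4, 3)             # storage overhead 7/4 = 1.75
```

Under these assumptions the ERC variant needs somewhat less memory than two full copies, while its unavailability is several times lower.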
Backup: Memory requirements & performance
  Currently, prototype is being ported to a Xilinx FPGA board
  Long-term test/simulation of prototype at our institute intended
    Functional verification
    Determination of performance
    Determination of memory requirements
    Determination of CPU utilization