1
Pub/Sub Internetworking Prototype
PSIRP BLACKHAWK
Pub/Sub Internetworking Prototype
Jimmy Kjällman, Ericsson NomadicLab
2
TOPICS
- Overview of Blackhawk
- Recap of the PSIRP architecture
- Implementation description
  - High-level view of objects and API
  - Overall system architecture
  - Blackboard internals
  - Networking part
- Application programming
- Demos
  - Pub/sub programming
  - Web applications
© Ericsson AB 2010
3
What is Blackhawk?
- Research prototype for FreeBSD 7 and 8
- Basic functionality of PSIRP; proof-of-concept; reference
  - Pure information-centric pub/sub API
  - OS-integrated blackboard
  - zFilter-based networking with local rendezvous
- Open source releases
  - BSD/GPLv2 dual license
  - Kernel module, API libraries, helper apps, examples
  - (current) xx-xx (work in progress)
  - Source code, preinstalled VM images, documentation: code.psirp.org
(Concrete information.)
4
PSIRP ARCHITECTURE
5
Identifiers, Component Wheel, APIs
[Figure: identifier relationships, the component wheel, and the API stack for Publish / Subscribe]
- Identifiers: Scope Identifiers (SId), Application Identifiers (AId), Rendezvous Identifiers (RId), Forwarding Identifiers (FId)
- Other elements: Data, Meta, Network Transit Paths
- Relationship labels: "Associated with...", "Includes…", "Resolved to...", "Define..."
- APIs: Low-level page API, Memory object API, Channel API, Higher-level APIs
6
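Path Computation, Forwarding
[Figure: a publisher (Pub’r) and a subscriber (Sub’r) connected through forwarding (Fwd) and edge nodes within an AS]
- Functions: Rendezvous, Topology (Topo), Data forwarding
- Labeled steps: Publish, Subscribe, Create delivery path, Configure forwarding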
7
BLACKHAWK IMPLEMENTATION
8
ARCHITECTURE vs Implementation
Note: The implementation and architecture are not 1-to-1
- Only some of the architectural ideas have been implemented
- Implementation details differ from the architecture documents
Reasons...
- Architecture and implementation have evolved simultaneously
- Some concepts can be implemented in different ways (the architecture is not an exact specification)
- Simplifications (getting it to work; doing what resources allow)
- Efficiency
- Etc.
9
BLACKHAWK IMPLEMENTATION
Basic Concepts: Memory Objects, Pub/Sub API
10
Publications as Memory Objects
- A publication is an object in the blackboard, i.e., in the computer’s memory
  - A (concept) publication is identified by an RId
  - A version is a specific piece of data identified by a vRId
  - A page is a block of data identified by a pRId
- Sub-object relationships
  - Concept publications can have several different versions
  - Versions have a specific set of pages in a specific order
- Scopes are special publications that are identified by SIds and store collections of RIds
11
Object Hierarchy Example
[Figure: example object hierarchy, from top to bottom]
- Root scope: Scope 0
- Subscopes: Scope 1, Scope 2
- Publications: Pub 1 to Pub 4
- Versions: Version 1 to Version 5
- Pages (i.e., actual data): Page 1 to Page 12, ...
12
Conceptual API
    handle := create(size)
    publish(sid, rid, handle)
    handle := subscribe(sid, rid)
    events := listen(handles[])
A handle holds pointers to the data and metadata of a memory object.
13
Conceptual API
- create() a memory object to be published
- publish() a new (read-only) version of an object
- subscribe() to an object or specific version
- listen() to publication events, i.e., new versions
Compare to, e.g.: malloc(), mmap(), socket(), send(), recv(), select()
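The four conceptual calls can be illustrated with a toy, in-memory stand-in for the blackboard. This is only a sketch under stated assumptions: `ToyBlackboard` and `Subscription` are hypothetical names, nothing here is the libpsirp API, and `listen()` polls instead of blocking on kernel events.

```python
class Subscription:
    """Hypothetical handle returned by subscribe(): latest data plus bookkeeping."""

    def __init__(self, key, data, seen):
        self.key = key      # (sid, rid) pair this handle refers to
        self.data = data    # content of the most recent version delivered
        self.seen = seen    # number of versions already delivered


class ToyBlackboard:
    """In-memory model of the conceptual API; not the real implementation."""

    def __init__(self):
        self.versions = {}  # (sid, rid) -> list of read-only versions

    def create(self, size):
        # A new memory object is modeled as a writable buffer.
        return bytearray(size)

    def publish(self, sid, rid, handle):
        # Publishing freezes the current content as a new read-only version.
        self.versions.setdefault((sid, rid), []).append(bytes(handle))

    def subscribe(self, sid, rid):
        published = self.versions.get((sid, rid))
        if not published:
            raise LookupError("nothing published under this SId/RId")
        return Subscription((sid, rid), published[-1], len(published))

    def listen(self, handles):
        # Return the handles that have a newer version, updating them in place.
        events = []
        for h in handles:
            published = self.versions[h.key]
            if len(published) > h.seen:
                h.data, h.seen = published[-1], len(published)
                events.append(h)
        return events
```

A publisher would call `create()`, fill the buffer, and `publish()` it; a subscriber would call `subscribe()` once and then `listen()` for later versions.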
14
BLACKHAWK IMPLEMENTATION
System Architecture
15
System Architecture: Kernel parts
- Implemented as a FreeBSD kernel module
- Blackboard-based publication repository
  - Publications stored as virtual memory objects/pages
  - Events and data access through file system
- Node-internal rendezvous
- System call interface to user space
[Figure: user space above kernel space; kernel space contains the file system, the pub/sub kernel module with the blackboard, and the virtual memory system]
16
System Architecture: Library and applications
- Library
  - pub/sub API
  - C (native)
  - Python, Ruby (wrapped)
- Applications
  - Can communicate via the blackboard through the library
[Figure: in user space, applications on top of the pub/sub library; in kernel space, the file system, the pub/sub kernel module with the blackboard, and the virtual memory system]
17
System Architecture: Helper applications
- Daemons corresponding to functions in the component wheel
  - Node-local scope management
  - Local network rendezvous
  - Network I/O and packet forwarding
[Figure: in user space, applications, the rendezvous helper, the scope helper, and the network I/O helper on top of the pub/sub library; in kernel space, the file system, the pub/sub kernel module with the blackboard, the socket system, the virtual memory system, and network drivers]
18
BLACKHAWK IMPLEMENTATION
Blackboard Internals
19
VM System Integration
- Motivation: We want to achieve efficiency and a natural interface
- FreeBSD’s Virtual Memory System (abstraction)
  - Pages: vm_page_t (default page size: 4096 bytes)
  - Objects: vm_object_t
- In our system, each publication has one VM object for metadata and one for data
20
VM System Integration
- Metadata object
  - One page (at least currently)
  - Object’s own RId, its size, etc.
  - List of sub-object RIds
    - Pub: versions
    - Version: pages
- Data object
  - Pages contain actual content
21
VM System Integration
- Objects mapped to applications’ memory spaces
  - Metadata can be read through accessors (library feature)
  - Data is seen as a pointer to a memory area
- Data is copy-on-write
  - Can be changed and re-published as a new version of a publication
    - results in a new shadow object
    - unmodified pages are shared
  - Others can re-subscribe to get the new version
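Page sharing between versions can be pictured with ordinary Python references. This is a rough sketch, assuming a version is just an ordered list of page buffers; `split_pages` and `republish` are hypothetical helpers, and the real mechanism is FreeBSD's shadow objects, not Python lists.

```python
PAGE_SIZE = 4096  # default page size mentioned in the slides


def split_pages(data, page_size=PAGE_SIZE):
    """Cut a byte string into page-sized blocks (the last one may be shorter)."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def republish(old_pages, page_index, new_page):
    """Build the page list of a new version; only the modified page is replaced."""
    new_pages = list(old_pages)        # copies references, not page contents
    new_pages[page_index] = new_page   # the old version keeps its original page
    return new_pages
```

After `republish()`, both versions refer to the same objects for every untouched page, which mirrors the "unmodified pages are shared" behavior.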
22
File System Integration
- For each publication, we have a vnode in the kernel
- Applications get an open file descriptor in the handle
  - After publish or in subscribe
- Enables kevents
  - FreeBSD’s kernel event notification mechanism
  - kevent() is a bit similar to select()
  - Examples of normal use:
    - get notified when a write occurs on a file
    - get notified when a socket has data to read
  - We use it to get notified when somebody publishes something
    - EVFILT_VNODE, NOTE_PUBLISH
    - fd and pointer to handle
23
File System Integration
- File system view to the blackboard
  - E.g.: /pubsub/sid/rid/vrid/prid/data
  - Data/metadata can be accessed on different levels in the object hierarchy
- In theory, we can map file system ops to pub/sub ops
  - support for legacy apps
  - ls, cat, etc. can be used currently
  - ops that call write() probably cannot be used yet
- FS integration would also make demand paging over the network possible
  - Initially "empty" publications with only metadata known
  - Page faults trigger data subscriptions
  - Proof-of-concept implementation in another prototype
[Figure: directory tree rooted at /pubsub, showing /sid1, /sid2, /rid1, data, meta, /vrid1, ...]
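The path layout in the example above can be sketched as a small helper that builds a path for any level of the hierarchy. `blackboard_path` is a hypothetical function, and the way identifiers are spelled as directory names is an assumption; only the `/pubsub/sid/rid/vrid/prid/data` shape and the `data` and `meta` leaf names come from the slide.

```python
import posixpath


def blackboard_path(sid, rid=None, vrid=None, prid=None, leaf="data", root="/pubsub"):
    """Build /pubsub/<sid>[/<rid>[/<vrid>[/<prid>]]]/<leaf> from identifier strings."""
    parts = [sid]
    for component in (rid, vrid, prid):
        if component is None:
            break
        parts.append(component)
    # Reject gaps such as a vRId without an RId.
    given = [c for c in (rid, vrid, prid) if c is not None]
    if len(given) != len(parts) - 1:
        raise ValueError("a deeper identifier was given without its parent")
    return posixpath.join(root, *parts, leaf)
```

For instance, `blackboard_path("sid1", "rid1", leaf="meta")` yields the metadata path of a publication, which a legacy tool such as cat could then read.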
24
Identifiers
- Scope Id, Rendezvous Id, Version-RId, Page-RId, Forwarding Id
- All are currently 256 bits
- Examples:
  - de8a847bff8c343d69b853a215e6ee775ef2ef96
  - 12:: (abbreviated form where :: is filled with 0s)
  - :: is the SId for the node-local root scope, "Scope 0"
- FId = LIPSIN zFilter
- RId = opaque byte string
- SId = scope’s RId
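The abbreviated identifier form can be expanded along the following lines. This is a sketch that assumes the notation works like IPv6 zero compression on hex bytes (digits before and after `::`, zeros in between); `expand_id` is a hypothetical helper, not a libpsirp function. Note that the 40-digit example above is only 160 bits long, so it would need `length=20` here.

```python
ID_LEN = 32  # 256 bits, as stated in the slides


def expand_id(text, length=ID_LEN):
    """Expand an identifier such as '12::' or 'aa::bb' into `length` raw bytes."""
    if "::" in text:
        head, _, tail = text.partition("::")
        head_bytes = bytes.fromhex(head)
        tail_bytes = bytes.fromhex(tail)
        fill = length - len(head_bytes) - len(tail_bytes)
        if fill < 0:
            raise ValueError("identifier is longer than the expected length")
        return head_bytes + bytes(fill) + tail_bytes
    raw = bytes.fromhex(text)
    if len(raw) != length:
        raise ValueError("full-form identifier has the wrong length")
    return raw
```

Under this reading, `expand_id("::")` is the all-zero SId of Scope 0.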
25
Identifiers
- pRId = hash of the data in a page (e.g., SHA-1)
- vRId = root hash of a (skewed) Merkle hash tree of the data
  - Binary tree: data blocks are leaf nodes, other nodes are hashes of their children
  - pRIds are used as block hashes
- In other words, these RIds are tied to the content of a publication
  - Can be used for, e.g., implementing data deduplication (page sharing), network-level caching, content authentication, etc.
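A content-derived pRId and vRId could be computed roughly as follows. The slides do not spell out the exact tree layout, so this sketch assumes a left-skewed tree in which each internal node hashes the running root together with the next page hash; `prid` and `vrid` are hypothetical helper names, and padding the 160-bit SHA-1 output to 256 bits is left out.

```python
import hashlib


def prid(page):
    """Page identifier: SHA-1 over the page content."""
    return hashlib.sha1(page).digest()


def vrid(pages):
    """Version identifier: root of a left-skewed hash tree over the page hashes."""
    hashes = [prid(p) for p in pages]
    if not hashes:
        raise ValueError("a version needs at least one page")
    root = hashes[0]
    for block_hash in hashes[1:]:
        # Each internal node is the hash of its two children.
        root = hashlib.sha1(root + block_hash).digest()
    return root
```

Because both values depend only on content, two identical pages get the same pRId (which is what enables page sharing), and changing any page changes the vRId.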
26
In-kernel Rendezvous
- Publication Index (pubi)
  - Each publication, version, and page has one
    - A node can thus have a very large number of these
  - An additional data structure for in-kernel metadata
  - Holds pointers to metadata and data VM objects, vnode, etc.
- Publication Index Table (PIT)
  - Hash table with SId/RId/vRId/pRId → pubi mappings
  - Used for object lookups in the blackboard
    - for example when we subscribe and check the scope
    - or when we re-publish and check for existing versions
  - (Designed to be swappable)
27
In-kernel Rendezvous
[Figure]
28
In-kernel Rendezvous
- As we just said:
  - PIT entry with SId/RId/vRId/pRId points to pubi
  - pubi points to data and metadata
- Scope data contains RIds
  - If we find the scope, an RId can be used for a new PIT lookup
- Publication metadata contains vRIds
  - If we find the publication, a vRId can be used for a new PIT lookup
- Version metadata contains pRIds
  - If we find the version, a pRId can be used for a new PIT lookup
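The chained lookups can be sketched with a dictionary standing in for the PIT. This is a simplified model: `Pubi` and `lookup` are hypothetical names, the table is keyed by a single identifier per entry, and the `children` list merges what the slides describe as scope data and publication or version metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Pubi:
    """Toy publication index entry."""
    children: list = field(default_factory=list)  # ids listed in data/metadata
    data: bytes = b""                              # page content, if any


def lookup(pit, *ids):
    """Walk SId -> RId -> vRId -> pRId, one PIT lookup per level.

    Returns the final Pubi, or None if an id is missing from the PIT or is
    not listed in its parent.
    """
    entry = pit.get(ids[0])
    for child_id in ids[1:]:
        if entry is None or child_id not in entry.children:
            return None
        entry = pit.get(child_id)
    return entry
```

Subscribing to a publication then amounts to `lookup(pit, sid, rid)`, and each deeper level repeats the same step with the identifier found one level up.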
29
Scope Helper
- A user space daemon that creates and updates scope publications
- Gets notified about every publish-event in a node
- Subscribes to a publication with the given SId
  - If the scope publication does not yet exist in the blackboard, the helper creates a new scope, adds the given RId to it, and publishes it
  - If the scope publication is found, the helper checks if the RId is already in the scope
    - if the RId is found, the helper doesn’t need to do anything
    - if the RId is not found, the helper adds the RId to the publication and re-publishes the scope
- Example of a simple pub/sub application
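The scope helper's decision logic can be sketched as below. This models the blackboard as a plain dictionary from SId to the list of RIds in that scope; `on_publish_event` is a hypothetical name, and the real helper works on publications through the pub/sub library.

```python
def on_publish_event(scopes, sid, rid):
    """Handle one publish-event; return True if the scope was (re-)published.

    scopes: dict mapping an SId to the list of RIds stored in that scope.
    """
    scope = scopes.get(sid)
    if scope is None:
        # Scope does not exist yet: create it with the new RId and publish it.
        scopes[sid] = [rid]
        return True
    if rid in scope:
        # RId already listed: nothing to do.
        return False
    # RId missing: add it and re-publish the scope as a new version.
    scopes[sid] = scope + [rid]
    return True
```

Building a new list on re-publish, instead of appending in place, loosely reflects that published versions are read-only.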
30
Some Current Limitations
- Scopes have only a single memory page, and publications cannot be removed from scopes
  - Limits the maximum number of publications in a scope
- Metadata objects also have only one page, and no garbage collection has been implemented
  - Limits the maximum number of sub-objects
    - limited number of publication versions (vRId list becomes full)
    - limited publication size (pRId list becomes full)
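A rough upper bound on these limits follows from the page and identifier sizes given earlier. The numbers below are only ceilings: the metadata page also stores the object's own RId, its size, and so on, and that overhead is not stated in the slides, so `reserved_bytes` is a hypothetical parameter.

```python
PAGE_SIZE = 4096        # bytes per page (from the slides)
ID_SIZE = 256 // 8      # bytes per identifier (from the slides)


def max_ids_per_page(reserved_bytes=0):
    """Upper bound on identifiers that fit into a single page."""
    return (PAGE_SIZE - reserved_bytes) // ID_SIZE


# Ceiling on RIds per scope, vRIds per publication, or pRIds per version.
max_subobjects = max_ids_per_page()

# Ceiling on the size of one version: one full page per listed pRId.
max_publication_bytes = max_subobjects * PAGE_SIZE
```

With no reserved bytes this gives at most 128 identifiers per page and therefore at most 512 KiB per version; the actual limits would be somewhat lower.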
31
BLACKHAWK IMPLEMENTATION
Network I/O, Forwarding, and Rendezvous
32
Network Communication
- Network I/O daemon (netiod)
- Sends and receives packets
  - Currently only Ethernet frames as broadcast
  - Networking over UDP planned
- Implements packet forwarding with zFilters
  - Only basic functionality: match FIds to link-Ids (LIds)
  - Static configuration: /etc/netiod.conf
  - Each node also has an LId pointing to itself
- Handles publication fragmentation and assembly
  - Normal Ethernet links can only send and receive ~1500 bytes per frame, not whole 4096-byte pages + headers
- Publishes received, complete publications in the local node
  - Received metadata is dispatched to the local rendezvous helper
- Currently user space app, but could/should be moved to the kernel
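Matching FIds to LIds follows the LIPSIN zFilter rule: a packet is forwarded on a link when every bit set in the link's LId is also set in the packet's FId. The sketch below uses small integers in place of 256-bit identifiers; the link names and bit patterns are made up for illustration, and `build_fid` and `matching_links` are hypothetical helpers, not netiod code.

```python
def build_fid(lids):
    """Combine the LIds along a delivery path into one zFilter with bitwise OR."""
    fid = 0
    for lid in lids:
        fid |= lid
    return fid


def matching_links(fid, link_ids):
    """Return the names of links whose LId is fully contained in the FId."""
    return [name for name, lid in link_ids.items() if (fid & lid) == lid]


# Illustrative link table; "local" stands for the LId pointing to the node itself.
LINKS = {"if0": 0b00011, "if1": 0b01100, "local": 0b10000}
```

For example, `matching_links(build_fid([LINKS["if0"], LINKS["local"]]), LINKS)` selects `if0` and local delivery but not `if1`. Since the filter is a Bloom filter, an FId with many bits set can also match links that were never added to it (false positives).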
33
Network I/O Daemon
[Figure: netiod and its interactions with laird and the network]
- netiod parts: IPC, Rendezvous, Forwarding, socket
- Interactions with laird via IPC: publish commands, listen to events, send data
- Network interfaces: if0, if1
34
Packet formats
- Current format on a high level
- Forwarding header
  - FId, TTL
- Rendezvous header
  - SId, RId, vRId, sequence number
- Metadata
  - SId, RId, vRId, return-FId, data length, signal type
- Data
  - 1024 bytes (or less) of publication content
Layout: FwdH | RzvH | Metadata / Data
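Fragmentation and reassembly of publication content into data packets could look like this. The 1024-byte payload limit is from the slide; numbering chunks from 0 with the sequence number is an assumption, and `fragment` and `reassemble` are hypothetical helpers that leave the actual headers out.

```python
CHUNK_SIZE = 1024  # maximum publication content per data packet (from the slides)


def fragment(content, chunk_size=CHUNK_SIZE):
    """Split content into (sequence number, payload) pairs."""
    offsets = range(0, len(content), chunk_size)
    return [(seq, content[off:off + chunk_size]) for seq, off in enumerate(offsets)]


def reassemble(fragments):
    """Rebuild the content from fragments received in any order."""
    return b"".join(payload for _, payload in sorted(fragments))
```

A full 4096-byte page therefore travels as four data packets, each of which fits into a ~1500-byte Ethernet frame together with its headers.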
35
Local Network Rendezvous
- Local network rendezvous helper (laird) deals with pub/sub operations between nodes in a LAN
- Listens to all subscribe-events that occur in the local node
- Also listens to updates to all local publications
  - Initially to Scope 0
  - Recursively registers to listen to all sub-scopes
  - Recursively registers to listen to all publications in those scopes
- Issues publish and subscribe commands to the network
  - A local rendezvous node is notified about new publications
  - State is kept for pending subscriptions until a publication appears locally
  - A default FId to a local RZV node is used if the destination is unknown
- Receives publish and subscribe commands from the network
  - Caches received metadata
  - Rendezvous nodes send cached metadata to subscribers, and relay data subscriptions to data sources
36
Local Area Intra-Domain Rendezvous Daemon
[Figure: laird and its interactions with the blackboard and netiod]
- laird state: Publication Metadata, Pending subscriptions, Event handlers
- Blackboard objects shown: Scope 00::00, Scope cc::dd, Pub aa::bb, /pubsub/subs
- Labeled interactions: listen to updates (Pubs), listen to subscriptions (Subs), listen to events and publish commands (IPC with netiod)
37
IPC between Networking Helpers
- Blackboard-based IPC is used for communication between the network I/O and local rendezvous helpers in a node
- Both helpers have two common RIds in Scope 0
  - Each helper subscribes and listens to one of these RIds, and publishes updates with the other RId
  - A common IPC publication format is known by both parties
- laird publishes information about publications and subscriptions to netiod
- netiod publishes received information (metadata) to laird
38
Message exchange example
[Figure: message sequence between Publisher, Local RZV node, and Subscriber]
- Labels, in the order they appear: (collect return-FId), pub(), publish metadata, store metadata, subscribe metadata, sub(), publish metadata, subscribe data, subscribe data, publish data
39
Some Current Limitations
- Subscribe-before-publish is not possible
  - Can be worked around in the local case by monitoring scopes; at least Scope 0 can always be subscribed to
- Getting more than one version of the same publication from the network is not possible
  - If we have a local publication with a specific SId/RId pair, we will just return that one
- The number of simultaneous pending data subscriptions is limited
- No reliable transport
These issues will be addressed in future versions
40
APPLICATION PROGRAMMING
41
General Principles
- Instead of sending and receiving...
  - Publish data that you want to make public (within a scope)
  - Subscribe to data you need
  - Use events to get notifications about updates
- Event-driven programming
  - Possible to avoid excessive use of threads by sharing the same event queue between publications
42
API
- Native C API: the libpsirp pub/sub library
- Wrappers for Python and Ruby
  - Generated with SWIG and additional C and Python/Ruby code
  - The API for Python is more object-oriented and has more features than the one for Ruby
- Documentation at code.psirp.org
- Examples and test apps provided with the source code
43
C API
- Header
  - #include <libpsirp.h>
- Types
  - Identifiers: psirp_id_t (array)
  - Handle: psirp_pub_t (pointer)
- Primitives
  - psirp_create(), psirp_subscribe(), psirp_subscribe_sync(), psirp_publish(), psirp_free()
- Accessors for data, length, identifiers, fd, …
  - psirp_pub_data(psirp_pub_t pub), psirp_pub_data_len(psirp_pub_t pub), ...
- Events
  - psirp_kq_t or standard kqueue() and kevent() calls
44
C API: Primitives
- psirp_create(int length, psirp_pub_t *pub)
  - Allocates memory for a publication
- psirp_subscribe(psirp_id_t *sid, psirp_id_t *rid, psirp_pub_t *pub)
  - Subscribes to a publication
  - Returns immediately
- psirp_subscribe_sync(psirp_id_t *sid, psirp_id_t *rid, psirp_pub_t *pub, struct timeval *timeout)
  - Waits for a publication to appear
  - Especially useful when subscribing to something from the network
45
C API: Primitives
- psirp_publish(psirp_id_t *sid, psirp_id_t *rid, psirp_pub_t pub)
  - Publishes data
- psirp_free(psirp_pub_t pub)
  - Frees memory allocated and mapped for the current process
  - Does not remove anything from the blackboard
46
C API: Events
- Types
  - Event: struct psirp_event
  - Event queue: psirp_kq_t
  - Event list: psirp_kevl_t
- Functions
  - psirp_kq_t psirp_create_kq(void)
  - psirp_delete_kq(psirp_kq_t *ph)
  - psirp_wait_kq(psirp_kq_t *ph, int max_secs, struct psirp_event *events, int *num_events)
  - psirp_add_kq_listener(psirp_kq_t *kh, psirp_pub_t pub, int filter, psirp_callback_t callback, void *opaque)
47
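C API: Events, Alternative Way

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <unistd.h>
    #include <libpsirp.h>

    int listen_to_new_versions(psirp_pub_t pub)
    {
        struct kevent kev;
        int kq;
        int err;

        kq = kqueue();
        if (kq < 0)
            return -1;

        /* Watch the publication's fd; the handle travels along as udata */
        EV_SET(&kev, psirp_pub_fd(pub), EVFILT_VNODE, EV_ADD | EV_CLEAR,
               NOTE_PUBLISH | NOTE_UNMAP, 0, pub);

        /* Register new event listener */
        err = kevent(kq, &kev, 1, NULL, 0, NULL);
        if (err < 0) {
            close(kq);
            return -1;
        }

        /* Listen to new events (blocks until one arrives) */
        err = kevent(kq, NULL, 0, &kev, 1, NULL);
        close(kq);
        if (err < 0)
            return -1;

        /* do something with the pub */
        return 0;
    }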
48
Python API
- Module
  - from psirp.libpsirp import *
- Publication class and subclasses
  - pub.buffer gives read/write access to data in publication pub
  - pub.publish(sid, rid) publishes the publication
- Functions
  - create(len) returns a new publication instance of length len
  - subscribe(sid, rid) subscribes to a publication
- Events
  - PubSubKQueue class with register(pub) and listen() functions
49
CONCLUSIONS
- Some essential parts of PSIRP’s design have been implemented in this prototype
- The prototype is integrated with the OS and enables pure information-centric pub/sub programming and networking
- Further development and integration work is ongoing