gevent network library Denis Bilenko gevent.org
Problem statement

from urllib2 import urlopen

response = urlopen('
body = response.read()

How to manage concurrent connections?
Problem statement

Possible answer: an async framework (Twisted, asyncore, ...)

def on_response_read(response):
    d = response.read()
    d.addCallbacks(on_body_read, on_error)

def on_error(error):
    ...

def on_body_read(body):
    ...

d = readURL('
d.addCallbacks(on_response_read, on_error)
reactor.run()
simplicity is lost
Problem statement

Possible answer: threads

from threading import Thread

def read_url(url):
    response = urllib2.urlopen(url)
    body = response.read()

t1 = Thread(target=read_url, args=('
t1.start()
t2 = Thread(target=read_url, args=('
t2.start()
t1.join()
t2.join()
resource hog
Memory required for 10k connections
gevent (greenlet + libevent)

from gevent import monkey; monkey.patch_all()

def read_url(url):
    response = urllib2.urlopen(url)
    body = response.read()

a = gevent.spawn(read_url, '
b = gevent.spawn(read_url, '
gevent.joinall([a, b])
concurrent fetch
Memory required for 10k connections
greenlet
>>> from greenlet import greenlet
>>> def myfunction(arg):
...     return arg + 1
>>> g = greenlet(myfunction)
>>> g.switch(2)
3
>>> from greenlet import greenlet
>>> MAIN = greenlet.getcurrent()
>>> def myfunction(arg):
...     MAIN.switch('hello')
...     return arg + 1
>>> g = greenlet(myfunction)
>>> g.switch(2)
'hello'
>>> g.switch('hello to you')
3
switching deep down the stack

>>> def myfunction(arg):
...     MAIN.switch('hello')
...     return arg + 1
>>> def top_function(arg):
...     return myfunction(arg)
>>> g = greenlet(top_function)
>>> g.switch(2)
'hello'
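If greenlet is not installed, plain Python generators give a rough stdlib analogue of the explicit switching shown above: send() plays the role of switch(), and yield hands control back to the caller. This is only an illustrative sketch of the control flow, not how greenlet works internally (greenlet switches real C stacks and can switch from deep inside nested calls, which generators cannot).

```python
# Rough generator analogue of explicit greenlet switching.
def myfunction():
    arg = yield                    # wait for the first "switch" in
    yield 'hello'                  # "switch" back to the caller with 'hello'
    yield arg + 1                  # final "switch": hand back arg + 1

g = myfunction()
next(g)                            # prime the generator
print(g.send(2))                   # 'hello'
print(g.send('hello to you'))      # 3
```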
from greenlet import greenlet

- primitive pseudothreads that share the same OS thread
- switched explicitly via switch() and throw()
- organized in a tree: each greenlet has .parent, except MAIN
- switch(), throw() and .parent are reserved for gevent
How gevent uses greenlet

[diagram: HUB and MAIN at the root, spawned greenlets below]
Hub: the greenlet that runs the event loop

from gevent import core

class Hub(greenlet.greenlet):
    def run(self):
        core.dispatch()  # wrapper for event_dispatch()

def get_hub():
    # return the global Hub instance,
    # creating one if it does not exist

(gevent/hub.py)
Event loop

- libevent 1.4.x or beta
- gevent.core: wraps the libevent API (like pyevent)

>>> def print_hello():
...     print 'hello'
>>> gevent.core.timer(1, print_hello)
>>> gevent.core.dispatch()
hello
1  # return value (no more events)
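The timer-and-dispatch pattern can be sketched with nothing but the stdlib. The toy loop below is an assumption-laden illustration of what core.dispatch() does (run callbacks as their deadlines arrive, return when no events remain); it is not gevent's actual implementation, which wraps libevent and also handles I/O events.

```python
import heapq
import time

# Toy timer-only event loop in the spirit of gevent.core.dispatch().
class Loop:
    def __init__(self):
        self._timers = []   # heap of (deadline, seq, callback)
        self._seq = 0       # tie-breaker so callbacks are never compared

    def timer(self, seconds, callback):
        heapq.heappush(self._timers,
                       (time.monotonic() + seconds, self._seq, callback))
        self._seq += 1

    def dispatch(self):
        while self._timers:
            deadline, _, callback = heapq.heappop(self._timers)
            delay = deadline - time.monotonic()
            if delay > 0:
                time.sleep(delay)   # nothing to do until the next deadline
            callback()
        return 1  # "no more events", like event_dispatch()

loop = Loop()
loop.timer(0.01, lambda: print('hello'))
print(loop.dispatch())  # prints hello, then 1
```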
Implementation of gevent.sleep()

def sleep(seconds=0):
    """Put the current greenlet to sleep"""
    switch = getcurrent().switch
    timer = core.timer(seconds, switch)
    try:
        get_hub().switch()
    finally:
        timer.cancel()
Cooperative socket

gevent.socket: compatible synchronous interface that wraps a non-blocking socket

def recv(self, size):
    while True:
        try:
            return self._sock.recv(size)
        except error, ex:
            if ex[0] == EWOULDBLOCK:
                wait_read(self.fileno())
            else:
                raise
Cooperative socket

def wait_read(fileno):
    switch = getcurrent().switch
    event = core.read_event(fileno, switch)
    try:
        get_hub().switch()
    finally:
        event.cancel()

(gevent/socket.py)
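The "wait until readable" idea can be demonstrated with plain select.select. The crucial difference: this version parks the whole OS thread, while gevent's wait_read registers a libevent read event and parks only the current greenlet, so other greenlets keep running. The socketpair and timeout arguments here are for the demo only.

```python
import select
import socket

# Thread-blocking sketch of "wait until this fd is readable".
def wait_read(fileno, timeout=None):
    readable, _, _ = select.select([fileno], [], [], timeout)
    return bool(readable)

a, b = socket.socketpair()
print(wait_read(a.fileno(), 0))   # False: nothing to read yet
b.send(b'x')
print(wait_read(a.fileno(), 1))   # True: now readable
print(a.recv(1))                  # b'x'
```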
Cooperative socket

- gevent.socket: DNS queries are resolved through libevent-dns (getaddrinfo, gethostbyname)
- gevent.ssl
Monkey patching

from gevent import monkey; monkey.patch_all()

def read_url(url):
    response = urllib2.urlopen(url)
    body = response.read()

a = gevent.spawn(read_url, '
b = gevent.spawn(read_url, '
gevent.joinall([a, b])
Monkey patching

Patches:
- socket and ssl modules
- time.sleep, select.select
- thread and threading

Beware:
- libraries that wrap C libraries (e.g. MySQLdb)
- disk I/O
- things not yet patched: subprocess, os.system, sys.stdin

Tested with httplib, urllib2, mechanize, mysql-connector, SQLAlchemy, ...
Greenlet objects

from gevent import monkey; monkey.patch_all()

def read_url(url):
    response = urllib2.urlopen(url)
    body = response.read()

a = gevent.spawn(read_url, '
b = gevent.spawn(read_url, '
gevent.joinall([a, b])
Greenlet objects

def read_url(url):
    response = urllib2.urlopen(url)
    body = response.read()

g = Greenlet(read_url, url)
g.start()

# wait for it to complete
g.join()

# or raise an exception and wait for it to exit
g.kill()

Greenlet(...) + start() = spawn
Greenlet objects

def read_url(url):
    response = urllib2.urlopen(url)
    body = response.read()

g = Greenlet(read_url, url)
g.start()

# wait for it to complete (or until the timeout expires)
g.join(timeout=2)

# or raise an exception and wait for it to exit (or until the timeout expires)
g.kill(timeout=2)

Greenlet(...) + start() = spawn
Timeouts

with gevent.Timeout(5):
    response = urllib2.urlopen(url)
    for line in response:
        print line
# raises Timeout if not done after 5 seconds

with gevent.Timeout(5, False):
    response = urllib2.urlopen(url)
    for line in response:
        print line
# exits the block silently if not done after 5 seconds

Beware: catch-all "except:" clauses, and code that never yields
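The context-manager semantics of the second form can be sketched in a few lines: the timeout object suppresses its own exception when constructed with exception=False. This is a toy stand-in only; the real gevent.Timeout also arms a libevent timer that raises the exception in the current greenlet when the deadline passes (no timer is armed here).

```python
# Toy stand-in for gevent.Timeout: context-manager semantics only.
class Timeout(Exception):
    def __init__(self, seconds, exception=True):
        self.seconds = seconds
        self.exception = exception

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # suppress only our own timeout, and only if asked to be silent
        return exc is self and not self.exception

with Timeout(5, False) as t:
    raise t          # pretend the timer fired inside the block
print('block exited silently')
```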
API
- socket, ssl
- Greenlet
- Timeout
- Event, AsyncResult
- Queue (also JoinableQueue, PriorityQueue, LifoQueue); Queue(0) is a synchronous channel
- Pool
- StreamServer: TCP and SSL servers
- WSGI servers
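gevent's Queue deliberately mirrors the stdlib queue interface, so the producer/consumer pattern it enables can be shown with a thread-based analogue; under gevent the worker would be a greenlet rather than an OS thread, and the sentinel-based shutdown below is just one common convention, not part of the API.

```python
import queue
import threading

# Thread-based analogue of a gevent Queue producer/consumer pipeline.
tasks = queue.Queue()
results = []

def worker():
    while True:
        item = tasks.get()
        if item is None:          # sentinel: shut down
            break
        results.append(item * 2)  # "process" the item

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    tasks.put(i)
tasks.put(None)
t.join()
print(results)  # [0, 2, 4]
```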
gevent.wsgi
- uses libevent-http
- efficient, but lacks important features

gevent.pywsgi
- uses gevent sockets

green unicorn (gunicorn.org)
- its own parser or gevent's server
- pre-fork workers
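All of these servers run the same WSGI contract. A minimal application, invoked here by hand with a fake start_response just to show the interface (no server or network involved; the app and helper names are for illustration only):

```python
# Minimal WSGI application: a callable taking environ and start_response,
# returning an iterable of byte strings.
def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello world\n']

# Drive it by hand, the way a server would.
captured = {}

def start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers

body = b''.join(app({}, start_response))
print(captured['status'], body)
```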
Caveat emptor

Reduced portability:
- no Jython, no IronPython
- not all platforms supported by CPython

PyThreadState is shared:
- exc_info (saved/restored by gevent)
- tracing and profiling info
Future plans
- alternative coroutine libraries: Stackless, swapcontext
- more libevent: http client, buffered socket operations, priorities
- process handling (gevent.subprocess)
- an even more stable API with 1.0
Examples
- bitbucket.org/denis/gevent/src/tip/examples/
- chat.gevent.org
- omegle.com
- ProjectsUsingGevent: gevent-mysql, psycopg2
- bit.ly/use-gevent: websockets, web crawlers, facebook apps
Summary
- coroutines are easy-to-use threads
- as efficient as async libraries
- works well if the app is I/O bound
- simple API, many things familiar
- works with unsuspecting 3rd-party modules
Thank you!