12/19/2008

Evolution of Codependency in Antagonistic Relationships

I'm reading Out of Control: The New Biology of Machines, Social Systems, & the Economic World, a most excellent book about complexity. This quote caught my eye:



In defending itself so thoroughly against the monarch, the milkweed became inseparable from the butterfly. And vice versa. Any long-term antagonistic relationship seemed to harbor this kind of codependency. (p 74)


This made me realize something about the nature of governments and war: Governments evolved to protect resources and people from the threat of outside invasion. An organizing structure was required to create and maintain a fighting force capable of resisting invasion from neighbors. However, it's now obvious that governments are in a codependent relationship with war: If there were no more war, then there would be no need for a government's ability to organize a fighting force. Therefore it's in a government's best interest to ensure that war never ceases.



However, just like any other codependent relationship, a lot of denial takes place. I doubt most politicians would come out and say that a prime function of government is to create war. Actions speak louder than words, though, and it's clear that in the thousands of years of human civilization there have been plenty of wars.



lxml + eventlet mashup

Since Ian was kind enough to give me instructions that gave me a working lxml (I had never been able to compile it before), I thought I'd write a quick scraper by mashing lxml together with eventlet.



The result is a thing of beauty:




from os import path
import sys

from eventlet import coros
from eventlet import httpc
from eventlet import util

from lxml import html

## Make httpc work -- I'll make it work without this soon
util.wrap_socket_with_coroutine_socket()

def get(linknum, url):
    print "[%s] downloading %s" % (linknum, url)
    file(path.basename(url), 'wb').write(httpc.get(url))

def scrape(url):
    root = html.parse(url).getroot()
    pool = coros.CoroutinePool(max_size=8)
    linknum = 0
    for link in root.cssselect('a'):
        url = link.get('href', '')
        if url.endswith('.mp3'):
            linknum += 1
            pool.execute(get, linknum, url)
    pool.wait_all()

if __name__ == '__main__':
    if len(sys.argv) == 2:
        scrape(sys.argv[1])
    else:
        print "usage: %s url" % (sys.argv[0], )


This script manages to max out my bandwidth -- 800KB/sec at home and 2.5MB/sec at work -- without breaking a sweat. It oscillates between about 10% and 20% CPU on my MacBook Pro. Nice!



12/06/2008

ptth (Reverse HTTP) implementation in a browser using Long Poll COMET

ptth is an idea I have been planning to implement for a few years now. The basic idea is that you take normal HTTP semantics and reverse them: the client (from the TCP perspective) acts like a server (from the application perspective), and the server (from the TCP perspective) acts like a client (from the application perspective), making requests on the client whenever it feels like it. This is distinguished from most normal COMET semantics in that ptth retains all of http's characteristics even though the underlying transport looks radically different at the TCP level.



When I was at Linden Lab, I advocated using this technique in the Second Life Viewer as a refinement of the Plain Old COMET implementation currently in use (which I also helped implement). I wrote a wiki page describing how the http Upgrade: header can be used to initiate a ptth connection, effectively turning a socket that the client opened to the server around, allowing the server to make requests on the client as if the server had opened a connection to the client (even though it didn't). I even did an implementation in Python showing how once the Upgrade: has been performed the semantics are exactly the same as normal http. This means with a little hackery it's possible (and in the Python case, almost trivial) to reuse existing http client and server libraries. All you have to mess around with is the setup of the socket; once both sides have an open socket and have agreed to Upgrade:, you just grab the underlying socket and pass it to the client or server library and away you go.
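The turnaround described above can be sketched end to end. This is my own toy reconstruction, simulated over a socketpair rather than a real network connection; the header values and the "PTTH/1.0" token are illustrative assumptions, not the wiki page's actual spec.

```python
# Hypothetical sketch of the Upgrade: handshake that turns the
# connection around. Simulated over a socketpair; not real ptth code.
import socket

def upgrade_handshake():
    client, server = socket.socketpair()

    # 1. The TCP client asks to reverse the roles on this connection.
    client.sendall(b"GET /ptth HTTP/1.1\r\n"
                   b"Upgrade: PTTH/1.0\r\nConnection: Upgrade\r\n\r\n")
    assert b"Upgrade: PTTH/1.0" in server.recv(4096)

    # 2. The TCP server agrees; from here on, roles are reversed.
    server.sendall(b"HTTP/1.1 101 Switching Protocols\r\n\r\n")
    client.recv(4096)

    # 3. The TCP server now issues an ordinary HTTP request...
    server.sendall(b"GET /status HTTP/1.1\r\n\r\n")
    reversed_request = client.recv(4096)

    # 4. ...and the TCP client answers it like any HTTP server would.
    if reversed_request.startswith(b"GET /status"):
        client.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    return server.recv(4096)

print(upgrade_handshake().decode())
```

Once step 2 completes, each side just hands its end of the socket to an off-the-shelf http client or server library, which is the whole trick.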



Even though I didn't get the chance to implement and deploy this technique in the Second Life Viewer and Server before I left Linden for Mochi, I still hope this gets implemented someday, as I think it is a very elegant and efficient technique. While implementing the real ptth Upgrade: in C++ will be more challenging than doing a quick Python prototype, once the dirty business of extracting sockets and injecting them into the client and server libraries used is complete, it should be a very reliable technique since at that point everything is exactly the same as normal http.



However, it won't be possible to do this type of Upgrade: shenanigans when we are in the browser's Javascript environment and don't have access to low-level details like socket APIs. Therefore, I also specced out what ptth would look like running over a Plain Old COMET Long Poll style transport. The wiki page describes encoding the reverse request and response as JSON for ease of parsing and generating in Javascript, but other content-types could be used (application/x-http-request and application/x-http-response perhaps, or maybe the message/http mime type could simply be used, or modified to be message/http+request and message/http+response?).
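To make the JSON encoding concrete, here is a guess at what a ptth message round-trip might look like. The field names ("method", "path", "status", and so on) are my own illustration, not the encoding from the wiki page.

```python
# Illustrative JSON encoding of a reversed request and its response.
# Field names are assumptions, not the documented ptth wire format.
import json

def encode_request(method, path, headers, body):
    """Server side: build a reverse request to push down the long poll."""
    return json.dumps({"method": method, "path": path,
                       "headers": headers, "body": body})

def handle(encoded):
    """Browser side: dispatch the reverse request, return a response."""
    request = json.loads(encoded)
    # A real handler would dispatch on request["method"]/["path"].
    response = {"status": 200,
                "headers": {"Content-Type": "text/plain"},
                "body": "hello from the browser"}
    return json.dumps(response)

wire = encode_request("GET", "/title", {}, "")
print(handle(wire))
```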



On Saturday the 6th we had a Mochi Hack Day at our office, and I was hacking on my perpetual hacking project, Pavel. If you don't know me personally and haven't heard me talk about Pavel, someday I'll flesh out the ideas behind it more fully in a series of web pages, but for now you can read this old blog post to get a rough idea of what it is. The post uses the term "graphical multiuser networked programming environment" to describe the basic idea. From the very beginning I conceived ptth as a vehicle for driving updates of the user interface to Pavel, so I decided to get down to it and actually implement it. Since I have implemented so many COMET servers at this point that I have lost count, it turned out to be almost trivially easy, and I had something working in a few hours.



And now the part everyone has been waiting for: the demo. The demo takes place in firebug, where you can see the Javascript side of the Long Poll operating, and in the eventlet backdoor running inside of a terminal. The eventlet backdoor gives me a Python interactive REPL into the process which is serving the server side of the Long Poll, and allows me to manually inject ptth messages into the system which then get delivered to the browser, which then responds to the request. The first thing you see me doing is building a simple ptth request by hand, encoded in JSON. I then inject this message into the ptth system, copying and pasting the uuid of the user who is connected via the Firefox browser in the background. You can see the debug printing in Firebug showing that the request was delivered to the browser, and the result in the backdoor's REPL is the response that was generated in Javascript and sent back to the server.





This means that I now need to implement some sort of web framework in Javascript. I know Dojo has already done this and I'm sure other people will start to experiment with this idea as well, but for my purposes I'll probably come up with something super simple. The idea that immediately came to my mind is to have URIs represent XPath into the html document, and PUT replacing the selected node with the content fragment from the request body of the PUT.
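The XPath-PUT idea can be sketched in a few lines. Here the stdlib ElementTree stands in for the browser DOM, and the handler shape is my own invention for illustration, not Dojo's or anyone else's framework.

```python
# Sketch: the URI names an XPath into the document, and PUT replaces
# the matched node with the fragment from the request body.
import xml.etree.ElementTree as ET

def put(document, xpath, fragment):
    """Replace the first node matching xpath with the parsed fragment."""
    new_node = ET.fromstring(fragment)
    for parent in document.iter():
        for index, child in enumerate(list(parent)):
            # identity check: is this child one of the xpath matches?
            if child in document.findall(xpath):
                parent.remove(child)
                parent.insert(index, new_node)
                return document
    raise KeyError(xpath)

doc = ET.fromstring("<html><body><div id='status'>old</div></body></html>")
put(doc, ".//div[@id='status']", "<div id='status'>updated</div>")
print(ET.tostring(doc).decode())
```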



7/29/2008

Eventlet 0.7 and Spawning 0.7 Released

Eventlet 0.7



Eventlet 0.7 fixes some very long-standing bugs. First of all, there was a CPU leak in the select hub which would cause an http keep-alive connection to consume 100% CPU while it was open. The problem was that every file descriptor was being passed in to select, even if the callback for the readiness mode was None. This bug had been there since the very beginning of eventlet, and it's great to have it fixed!
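The shape of the fix looks something like this. This is a from-scratch illustration, not eventlet's actual hub code, and the names are made up.

```python
# Illustration of the select-hub fix: only hand select() descriptors
# that actually have a callback registered for that readiness mode.
import select

def wait(readers, writers, timeout):
    # Before the fix, every known fd was passed to select even when its
    # callback was None, so an idle keep-alive socket always looked
    # "ready" and the loop spun at 100% CPU.
    rs = [fd for fd, cb in readers.items() if cb is not None]
    ws = [fd for fd, cb in writers.items() if cb is not None]
    if not rs and not ws:
        return [], []
    r, w, _ = select.select(rs, ws, [], timeout)
    return r, w
```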



Second, another old bug is fixed: it's now possible to use Eventlet's SSL client to talk to Eventlet's SSL server. There was a subtle bug where SSL sockets would raise an error under some conditions instead of returning '' to indicate the connection was closed.



Finally, some memory leaks in the libevent and libev hubs (fairly new code) were fixed, so if you're using Eventlet with libevent or libev try it out and see how it performs for you.



Also, this release pulls in a bunch of API additions from the Linden SVN repository. Ryan Williams is now maintaining an HG repository which is synched with the SVN repository, so integrating patches between branches will now be much easier.



Update July 30, 2008: This release of eventlet also supports stackless-pypy again. I had to check for the absence of the socket.ssl object and re-enable the poll hub. To try this out, check out and translate pypy-c following the instructions on the pypy site, and then run one of the eventlet examples (for example, "./pypy-c /Users/donovan/src/eventlet/examples/wsgi.py").



Download Eventlet 0.7 from PyPI: http://pypi.python.org/pypi/eventlet/0.7

Spawning 0.7



Spawning has improved a lot since I last wrote about it. It now has a command line script, "spawn", which makes it easy to quickly serve any wsgi application. The concurrency strategy is also now extremely flexible and can be configured for a plethora of use cases.



The default is to use one non-blocking i/o process with a threadpool, which makes it easy to use with any existing wsgi applications out there that assume shared memory and the ability to block.



However, it's possible to independently configure the number of i/o processes, the number of threads, and even configure it to be single-process, single-thread, with fully non-blocking i/o (thanks to eventlet's monkey patching abilities).



Update July 30, 2008: This release of Spawning also has an experimental Django factory. To run a Django app under Spawning, run "spawn --factory=spawning.django_factory.config_factory mysite.settings".



Take a look at the Spawning PyPI entry for more information: http://pypi.python.org/pypi/Spawning/0.7



6/16/2008

Spawning 0.1 Released

Spawning is an experimental mashup between Paste and eventlet. It provides a server_factory for Paste Deploy that uses eventlet.wsgi. It also has some other nice features, such as the ability to run multiple processes to take advantage of multicore processors and multiprocessor machines, and graceful code reloading when modules change or the svn revision of a directory changes. Graceful reloading means new processes are immediately started which start serving new incoming requests, but old processes hang around processing the old requests until those requests are complete.



This is very early still. The code is currently hard-coded to run one process, but once I figure out how to use Paste Deploy's configuration files a bit better I will make it configurable. I mostly wanted to get it out quickly because Ian Bicking asked for it in the comments of my last blog post, and to get feedback. I'd like more of this code to be shared between Spawning and mulib's 'mud' server. I also need a better name than Spawning.



You can download a tarball here or you can clone the Mercurial repository here.




6/12/2008

Eventlet 0.5 Released

The last release of eventlet was 0.2, which we did when we re-open-sourced the fork of eventlet I worked on while I was at Linden Lab. 0.2 was released quite a while ago, and eventlet has seen significant improvement in the meantime.



The main change in this release is the ability to use libevent as the multiplexing api instead of raw select or poll. If libevent and its Python wrapper are not installed, eventlet still falls back gracefully: it first checks for the presence of poll, and uses select if poll is not available.
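The fallback order can be sketched as below. This is a guess at the shape of the logic, not eventlet's actual hub-loading code; in particular the "event" module name for the libevent wrapper is an assumption.

```python
# Sketch of the hub fallback order: libevent if its wrapper imports,
# then poll, then select. Not eventlet's real hub-selection code.
import select

def choose_hub():
    try:
        import event  # hypothetical name for the libevent wrapper
        return "libevent"
    except ImportError:
        pass
    if hasattr(select, "poll"):
        return "poll"
    return "select"

print(choose_hub())
```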



Another major change in this release is a much improved eventlet.wsgi server. The wsgi server now supports Transfer-Encoding: chunked as well as Expect: 100-continue, and is quite fast. I tested it against an eventlet based wsgi server I wrote using wsgiref (from the Python 2.5 standard library), and my informal tests showed eventlet.wsgi serving a "Hello, World!" wsgi application several hundred requests per second faster.
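For reference, the kind of minimal "Hello, World!" wsgi app used in a benchmark like that looks like this, driven here with a hand-rolled start_response so it runs standalone:

```python
# A minimal wsgi application of the sort used for hello-world benchmarks.
def hello_world(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, World!']

# Exercise the wsgi contract once without a server:
captured = {}
def start_response(status, headers):
    captured['status'] = status

body = b''.join(hello_world({}, start_response))
print(captured['status'], body)
```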



This release also features significant refactoring, cleaner code, support for cooperative operations on pipes (and unix domain sockets) as well as sockets, more tests, and docstrings for pretty much everything. The documentation, which was non-existent before, is now pretty comprehensive.



To install, just "easy_install eventlet" and start hacking!







6/02/2008

REST + Actors

I had a really good idea over the weekend for using eventlet and mulib to combine the concepts of REST and Actors. Eventlet has had an Actor class for a while now, but I haven't really used it for anything. After otakup0pe twittered a link to the Reia language (everyone knows how much of a language geek I am), I started thinking about Actors again and how I could have applied them to various work problems I solved in the last few years. The last time I really tried to do anything serious with Actors was when I wrote the latest version of Pavel on top of the just-written (at the time) eventlet. I also tried to mix in a prototype object system, and the actor coroutines were implicit in the semantics of usage (an Actor calling a method on another Actor would implicitly cause a switch into the other Actor's coroutine), which in retrospect was perhaps a bit too ambitious.



Ryan Williams wrote the current eventlet Actor (eventlet.coros.Actor) and it's much simpler and more straightforward: You override the received method to handle messages, and other actors call the cast method to send messages. This is different from my previous implementation (and also what my ideal would be) in that you get called back for every message, meaning the main coroutine is generic and there's no need to keep track of where the Actor's coroutine is to serialize an actor. This means it would be possible to request a representation of an Actor at any time between messages. The state would include all the Python instance variables along with all the unhandled messages currently in the Actor's mailbox.
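A toy version of that Actor shape might look like the following. Plain queues stand in for eventlet's coroutines here, so this is a synchronous sketch of the received/cast contract, not the real eventlet.coros.Actor API.

```python
# Toy Actor in the described shape: override received() to handle
# messages, call cast() to send them. Synchronous stand-in, not eventlet.
from collections import deque

class Actor(object):
    def __init__(self):
        self.mailbox = deque()

    def cast(self, message):
        """Enqueue a message; the sender never blocks on handling."""
        self.mailbox.append(message)

    def pump(self):
        # In eventlet this loop runs in the actor's own coroutine;
        # here we drain the mailbox synchronously for illustration.
        while self.mailbox:
            self.received(self.mailbox.popleft())

    def received(self, message):
        raise NotImplementedError

class Counter(Actor):
    def __init__(self):
        Actor.__init__(self)
        self.total = 0

    def received(self, message):
        self.total += message

c = Counter()
c.cast(2)
c.cast(3)
c.pump()
print(c.total)  # → 5
```

Because the main loop is generic, the actor's full state between messages is just its instance variables plus whatever is still sitting in the mailbox, which is what makes it serializable at any point.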



So, with that realization, it suddenly becomes trivial to write a mulib handler for the Actor class. GET and PUT with the appropriate content types (application/json for example) would get or set the current state of the Actor. DELETE would delete it. POST enqueues a message in the actor's mailbox (it just calls cast with the body of the request). Simple and straightforward. I'm totally going to do this soon -- it probably would have been faster to just do the implementation rather than blog about it :-)
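The verb mapping above can be sketched without mulib at all. The dispatch below is a generic stand-in (I'm not reproducing mulib's handler API), but the GET/PUT/POST/DELETE semantics are the ones just described.

```python
# Sketch of the REST-to-Actor mapping: GET/PUT move the actor's state
# and mailbox as JSON, POST is cast(), DELETE removes the actor.
import json

class RestActor:
    def __init__(self, state=None):
        self.state = state or {}
        self.mailbox = []

    def handle(self, method, body=None):
        if method == "GET":
            # state plus unhandled messages is the full representation
            return json.dumps({"state": self.state, "mailbox": self.mailbox})
        if method == "PUT":
            incoming = json.loads(body)
            self.state = incoming["state"]
            self.mailbox = incoming["mailbox"]
            return "200 OK"
        if method == "POST":
            self.mailbox.append(json.loads(body))  # i.e. cast()
            return "202 Accepted"
        if method == "DELETE":
            self.state, self.mailbox = {}, []
            return "204 No Content"

actor = RestActor({"count": 1})
actor.handle("POST", json.dumps({"op": "incr"}))
print(actor.handle("GET"))
```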



Oh, one more thing -- to enhance the experience of actually using these semantics, the cast method should become a generic method that dispatches based on pattern matching (using mulib.shaped). I haven't figured out what an efficient implementation of this would look like yet, but I'm going to try a brute-force implementation just for fun.
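A brute-force version of that pattern-matched cast might look like this. The matcher is written from scratch in the spirit of mulib.shaped, not mulib's actual implementation: try each (pattern, handler) pair in registration order and run the first that matches.

```python
# Brute-force pattern dispatch for cast(): a dict pattern matches a
# message dict key-by-key, a type pattern matches by isinstance.
def matches(pattern, message):
    if isinstance(pattern, dict):
        return (isinstance(message, dict) and
                all(k in message and matches(v, message[k])
                    for k, v in pattern.items()))
    if isinstance(pattern, type):
        return isinstance(message, pattern)
    return pattern == message

class PatternActor:
    def __init__(self):
        self.handlers = []

    def receive(self, pattern, handler):
        self.handlers.append((pattern, handler))

    def cast(self, message):
        for pattern, handler in self.handlers:
            if matches(pattern, message):
                return handler(message)
        raise LookupError("no pattern matched %r" % (message,))

a = PatternActor()
a.receive({"verb": "ping"}, lambda m: "pong")
a.receive({"verb": str}, lambda m: "unknown verb " + m["verb"])
print(a.cast({"verb": "ping"}))  # → pong
```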



5/14/2008

Template on PUT

I just had a cool idea. Usually, people run HTML templating engines on GET. They fetch some data, load an HTML template, and then mash the two together. My idea is to instead run the templating engine on PUT. The body of the PUT would have the data to be templated. The URL that was PUT to would determine which template to use. The response from the PUT would contain the fully templated output, equivalent to what the client would get by doing a GET to that url at any point afterwards.
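A toy version of template-on-PUT: the URL picks the template, the PUT body supplies the data, and the response is the rendered page. string.Template stands in for a real templating engine, and the URL-to-template table is invented for illustration.

```python
# Sketch of template-on-PUT: render on write instead of on read.
from string import Template

# Hypothetical mapping from PUT-able URLs to their templates.
TEMPLATES = {"/greeting": Template("<h1>Hello, $name!</h1>")}

def put(url, data):
    rendered = TEMPLATES[url].substitute(data)
    # A real server would also store `rendered` so that later GETs to
    # the same url return exactly this output.
    return rendered

print(put("/greeting", {"name": "world"}))  # → <h1>Hello, world!</h1>
```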