Guido doesn't want non-portable assembly in Python and it's understandable

(I wrote a long comment in the Hacker News discussion of Guido's slides about his plans for async io and asymmetric coroutines in Python 3.4, but I thought it was good enough to deserve a blog post)

From a certain perspective [Guido's desire to keep non-portable stack slicing assembly out of Python] is a rational decision. Because the CPython API relies so heavily on the C stack, either some platform-specific assembly is required to slice up the C stack to implement green threads, or the entire CPython API would have to be redesigned to not keep the Python stack state on the C stack.

Way back in the day, the proposal for merging Stackless into mainline Python involved removing Python's stack state from the C stack. However, there were complications with calling from C extensions back into Python that ultimately killed this approach.

After this, Stackless evolved to be a much less modified fork of the Python codebase with a bit of platform-specific assembly that performed "stack slicing". Basically, when a coro starts, the contents of the stack pointer register are recorded, and when a coro wishes to switch, the slice of the stack from the recorded stack pointer value to the current stack pointer value is copied off onto the heap. The stack pointer is then adjusted back to the saved value and another task can run in that same stack space, or a stack slice that was stored on the heap previously can be copied back onto the stack and the stack pointer adjusted so that the task resumes where it left off.

Then around 2005 the Stackless stack slicing assembly was ported into a CPython extension as part of py.lib. (By Armin Rigo. A million thanks from me for this.) This was known as greenlet. Unfortunately all the original codespeak.net py.lib pages are 404 now, but here's a blog post from around that time that talks about it.

Finally the relevant parts of greenlet were extracted from py.lib into a standalone greenlet module, and eventlet, gevent, et cetera grew up around this packaging of the Stackless stack slicing code.

So you see, using the Stackless strategy in mainline Python would have required either breaking a bunch of existing C extensions and placing limitations on how C extensions could call back into Python, or custom low-level stack slicing assembly that has to be maintained for each processor architecture. CPython does not contain any assembly, only portable C, so using greenlet in core would mean that CPython itself would become less portable.

Generators, on the other hand, get around the issue of CPython's dependence on the C stack by unwinding both the C and Python stack on yield. The C and Python stack state is lost, but the generator's own frame, with its local variables and program counter, is kept so that the next time the generator is resumed, execution continues in the middle of the function instead of at the beginning.
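To make the resume-in-the-middle behavior concrete, here is a minimal sketch using only plain Python. The function name countdown is made up for illustration.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, keeping `n` alive between resumptions."""
    while n > 0:
        # Execution leaves the function here and control returns to the
        # caller, but the generator keeps its locals and its resume point.
        yield n
        n -= 1


gen = countdown(3)
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes just after the yield, not at the top
rest = list(gen)     # drains whatever is left
```

Note that the second call to next() does not restart the loop with n equal to 3; it picks up where the first call left off.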

There are problems with this approach: the previous stack state is lost, so stack traces have less information in them; the entire call stack must be unwound back up to the main loop instead of a deeply nested call being able to switch without the callers being aware that the switch is happening; and special syntax (yield or yield from) must be explicitly used to call out a switch.
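A rough sketch of what that looks like in practice: the only real switch point is the bare yield in the innermost function, but every caller between it and the main loop has to spell out yield from. All of the names here (read_value, handler, task, run_all) are hypothetical, and the scheduler is a toy round-robin loop, not any real framework's event loop.

```python
from collections import deque


def read_value(name):
    # The only place a switch actually happens: a bare yield hands control
    # all the way back up to the scheduler loop.
    yield
    return name.upper()


def handler(name):
    # An intermediate caller must use `yield from`, otherwise the switch
    # inside read_value() cannot propagate through it.
    value = yield from read_value(name)
    return value + "!"


def task(name, log):
    result = yield from handler(name)
    log.append(result)


def run_all(tasks):
    """Toy round-robin scheduler: resume each task until it finishes."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished, do not reschedule it
        queue.append(current)


log = []
run_all([task("a", log), task("b", log)])
```

With greenlet-style switching, handler and task could be ordinary functions and only read_value would need to know a switch was happening.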

But at least generators don't require breaking changes to the CPython API or non-portable stack slicing assembly. So maybe now you can see why Guido prefers them.

Myself, I decided that the advantages of transparent stack switching and interoperability outweighed the disadvantages of relying on non-portable stack slicing assembly. However, Guido just sees things in a different light, and I understand his perspective.


Your giant proprietary (or at least silo) codebase is a huge liability

There has been a lot of news this week about vulnerabilities in very low-level platform code being used in production by a great many people. First there was a Ruby exploit, and now today I see that there is a new Java zero-day.

The truth is, these kinds of exploits are absolutely everywhere. When off-the-shelf libraries are assembled together to make a whole that is greater than the sum of the parts, strange interactions are possible that the original integrators never conceived of.

In the case of the Ruby exploit, from what I read, it seems to be something like this: part of the web decoding machinery that could decode URL-encoded parameters was extended to be able to decode XML. The XML decoding machinery was then extended to be able to decode YAML.

YAML has a syntax for serializing arbitrary Ruby objects, and when that YAML file is deserialized, a new instance of that object is created. With careful crafting of the input file, arbitrary code execution is possible.

This is also the reason it is not a good idea to use pickle as a network serialization format in Python. You might think, "oh, I'll use marshal. Marshal doesn't support arbitrary class serialization." But take a look at the list of object types marshal does support:

None, integers, long integers, floating point numbers, strings, Unicode objects, tuples, lists, sets, dictionaries, and code objects

Code objects. I rest my case. Of course, you would have to be executing the objects returned from the marshal module in order for a code object constructed by an attacker to run on your server, but some hacker somewhere is probably going to figure out some crazy way.
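Here is a small, deliberately harmless sketch of both points, using only the standard library. The Payload class is a hypothetical stand-in for attacker-crafted data, and sorted is a placeholder for whatever callable an attacker would actually pick.

```python
import marshal
import pickle
import types


class Payload:
    """Stand-in for attacker-crafted data; `sorted` is a harmless placeholder."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # replays that call when the data is loaded.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # calls sorted([3, 1, 2]) during deserialization

# marshal round-trips code objects, but loading one does not run it.
code = compile("result = 6 * 7", "<untrusted>", "exec")
restored = marshal.loads(marshal.dumps(code))
namespace = {}
exec(restored, namespace)  # only this explicit step executes the code
```

What comes back from pickle.loads is not a Payload at all, it is the return value of a call chosen by whoever produced the bytes. With marshal, the code object sits inert until something executes it.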

Which brings me to my main point: I've observed over the years that, for some reason, business-type people and even some programmers seem to think that a large proprietary codebase that nobody else is allowed to look at is an asset. It's not; it's a liability!

You don't understand what's in your code. You don't understand what's in the code of the large number of libraries that you use every day. Codebases are written over weeks, months, years, by different people, in different frames of mind.

There are solutions to this code complexity problem. We can break large, complex codebases into small parts that are very explicit and careful about validating their input. We can completely isolate these parts from each other so that they can't accidentally (or maliciously) break something.

Libraries could strive for simplicity and explicitness rather than kitchen-sink-itis. If a surgeon wants to do surgery, they are going to choose a light, sharp, well-balanced scalpel, not an old Swiss Army knife.

Code that only a few people have to look at doesn't have to be clear. Only those few people have to bear the mental burden of holding that nasty code in their head. Code that a lot of people need to look at has a higher probability of being clear. This is one advantage of open source; obviously, it's not enough.

My suggestion for reducing the complexity in interactions like these is to create simpler, more well-defined libraries and isolate these libraries from each other in different processes.

Processes evolved in the 70s to isolate users from each other, but now it is 2013 and we could start isolating more and more libraries from each other. For languages that don't use reference counting, fork with copy-on-write may be good enough to allow us to actually use a great many UNIX processes for a single application without consuming too many resources.
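As a minimal sketch of the idea, assuming the "library" is just a word counter, the untrusted input can be handed to a separate process that is only allowed to reply with plain JSON. WORKER_SOURCE and count_words_isolated are names invented for this example. This gives address-space isolation only; a real deployment would also want to drop privileges or sandbox the child.

```python
import json
import subprocess
import sys

# Code that runs in the child process. It reads untrusted text from stdin
# and writes nothing but plain JSON to stdout.
WORKER_SOURCE = """
import json, sys
text = sys.stdin.read()
words = text.split()
json.dump({"word_count": len(words)}, sys.stdout)
"""


def count_words_isolated(untrusted_text, timeout=10):
    """Run the worker in a separate process and return its validated reply."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=untrusted_text,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    result = json.loads(completed.stdout)
    # Validate the reply instead of trusting the child blindly.
    if not isinstance(result, dict) or not isinstance(result.get("word_count"), int):
        raise ValueError("unexpected reply from worker")
    return result["word_count"]


count = count_words_isolated("strange interactions are possible")
```

If the worker crashes or is subverted, the parent sees a failed process or a reply that does not validate, rather than corrupted state in its own memory.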


Getting Started Developing for Firefox OS Screencasts

I've been working on Boot 2 Gecko (Firefox OS) for the last 9 months now, and it has been both a completely insane project and an awesome project. Insane amount of work, awesome implementation.

Writing an OS from the ground up is no easy task. Luckily, we're not doing that. We are building on top of the Linux kernel and Gecko, both open source projects that have had lots of effort put into them.

It is really starting to firm up now, especially after a few weeks ago when we had the feature freeze. There is still a lot to do, however, and we are going to be trying to bring in developers from other areas of the company to help fix bugs and make this thing stable.

Luckily, the development process just got a lot easier with two things that recently landed. One is that the b2g desktop nightly builds now include a build of Gaia, so you can just download a nightly build, double-click, and go. The other is that the remote debugger gained the ability to load code over-the-wire as part of the debugger protocol, so the way Gaia packages up apps and refers to them using app:// URLs is now debuggable without nasty workarounds.

As we are ramping up newer developers to help with the project, we need clear documentation of the development process. The Gaia/Hacking page is the canonical reference for how to do absolutely everything, but it's overwhelming. To help with this, I made a series of 5 screencasts that cover the basics of using b2g desktop nightly builds, remote debugging with b2g desktop, hacking on Gaia itself in b2g desktop, flashing a phone with Gaia changes, and what to do if Firefox OS asks you to choose from two homescreens or if remote debugging does not show the source for your app.

As an aside, I find it hilarious that there are all these incorrect rumors about the speed and the memory of the phone, when the correct specs were actually *announced* in February. I guess people would rather speculate and spread rumors than read press releases.

B2G Desktop Intro (Firefox OS)

B2G Desktop Intro (Firefox OS) from Donovan Preston on Vimeo.

Debugging Gaia (Firefox OS) with the Remote Debugger

Debugging Gaia (Firefox OS) with the Remote Debugger from Donovan Preston on Vimeo.

Hacking on Gaia in Debug Mode

Hacking on Gaia in Debug Mode from Donovan Preston on Vimeo.

Flashing Gaia onto a Firefox OS Phone and Remotely Debugging a Firefox OS Phone

Flashing Gaia to a Firefox OS Phone and Remotely Debugging a Firefox OS Phone from Donovan Preston on Vimeo.

Firefox OS Tips

Firefox OS Tips from Donovan Preston on Vimeo.


Where are the Peer-to-Peer web apps?

Perhaps the reason we have not seen single-page HTML applications that connect directly to peers without an intermediate server is that browsers cannot easily listen on a local port. They can open outgoing connections all day long, but WebRTC may be the first web standard that allows the browser to listen on a port. If there are others, please let me know.

Of course, the WebRTC spec looks overly complicated for the incredibly simple thing I want to do. I just want the browser to be able to listen on a port like any other process on the machine can. Sure, there are security implications, but these exist for everything the browser exposes to web applications, and there's an entire class of Peer-to-Peer web apps that simply cannot easily be written using current web technologies.

There are many examples of a Peer-to-Peer experience being delivered to users using a client-server architecture: ChatRoulette, Omegle, and, more recently, products like Google Hangouts. These applications must be implemented by using servers in the middle to connect the peers, making scaling them much harder than it would be if browsers could just listen.

There is an opportunity to explore a generic Client-Agent-Peer architecture, where Clients (Browsers) talk to an Agent server using HTTP to configure the state of the Agent, which would then be contacted by the Peer on behalf of the Client when the Browser is not online. When the Browser is online, the Agent can refer the Peer directly to the Client. When the Browser is offline, the Agent can handle the request itself using a cached copy of the material the Browser was sharing, or it can just decline to fulfill the request.
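The routing decision the Agent makes can be sketched in a few lines. This is only an in-memory illustration of the three cases described above (refer, serve from cache, decline); the Agent class, its method names, and the example address are all hypothetical, and no actual HTTP is involved.

```python
class Agent:
    """Hypothetical agent that fronts for a browser client."""

    def __init__(self):
        self.client_address = None  # set only while the browser is online
        self.cache = {}             # material the browser chose to share

    def configure(self, path, content):
        # In the described design this would arrive from the Client over HTTP.
        self.cache[path] = content

    def client_online(self, address):
        self.client_address = address

    def client_offline(self):
        self.client_address = None

    def handle_peer_request(self, path):
        """Return (action, payload) describing how the Peer gets served."""
        if self.client_address is not None:
            return ("refer", self.client_address)
        if path in self.cache:
            return ("serve", self.cache[path])
        return ("decline", None)


agent = Agent()
agent.configure("/photo", "cached bytes")
offline_hit = agent.handle_peer_request("/photo")
offline_miss = agent.handle_peer_request("/other")
agent.client_online("198.51.100.7:8080")
online = agent.handle_peer_request("/photo")
```

The hard part, of course, is the "refer" case: it only helps if the browser at that address can actually accept the incoming connection.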

Reverse HTTP was my attempt to push out the simplest thing that could possibly work to get browsers to talk to each other. It didn't really go anywhere in terms of being implemented in actual browsers. Coincidentally, someone else had the same idea around the same time, and implemented Reverse HTTP in terms of actual HTTP Requests encoded in Responses, and vice versa. This makes it possible to write a pure JavaScript client rather than needing the browser to support the Upgrade protocol itself.

Really, though, it would be very nice if all of the hacks and tricks and workarounds weren't necessary, and I could just listen on a port with JavaScript. I'll keep dreaming.


Coverage and Profile Information Gathered from -D

The disassembly information provided by SpiderMonkey's -D switch is much richer than the plain coverage data I gathered with my trace hook. However, the disassembly is printed straight to stdout, which makes it more difficult to separate from test output and harder to parse. So, I wrote a small patch which makes -D take a filename and write the disassembly there instead of to stdout.


I need to get my situation with the mozilla-central repository figured out so I can create a branch and commit. In the meantime, there's the small patch.

Then, I rewrote my coverage_parser.py script in dom.js to parse the -D output and was able to generate nice coverage files, including displaying the number of total bytecodes executed on each line, and nice profile files, sorted from the lines which executed the most bytecodes down to those that executed the least.
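The aggregation step is simple once the disassembly has been parsed. The sketch below is not the actual coverage_parser.py, and it does not attempt to parse real -D output; it assumes, hypothetically, that parsing has already produced (line number, bytecodes executed) pairs, possibly several per source line. The summarize function and the sample records are made up for illustration.

```python
from collections import Counter


def summarize(records, total_lines):
    """Aggregate (line_number, bytecodes_executed) pairs.

    Returns the fraction of lines that executed at least one bytecode and a
    profile sorted from the hottest line down to the coldest.
    """
    counts = Counter()
    for line_number, bytecodes in records:
        counts[line_number] += bytecodes
    covered = [line for line, n in counts.items() if n > 0]
    coverage = len(covered) / total_lines
    profile = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return coverage, profile


# Hypothetical records: line 1 shows up twice, line 4 never executed.
records = [(1, 4), (2, 10), (1, 3), (4, 0)]
coverage, profile = summarize(records, total_lines=4)
```

Keeping the raw per-line totals around means the same data feeds both the coverage view and the profile view.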

With the test suites we scraped together from various places, we have almost 50% coverage of dom.js right off the bat.