DRW — Vlad Cainamisir

§ 1 Context

Jun
W1

W2
proto

W3
bugs

W4
Arrow

W5
prod?

Aug
<100ms

DRW's quant researchers write their code in Python because it's flexible, expressive, and fast to iterate. But Python as everyone knows is notoriously slow.

However:

Someone has to always rewrite the code in C++ to get a decent enough performance to even run a comprehensive backtest.

My moonshot project was to compile Python

Obviously, I am not the first person to try this, and for a few days I read the docs of all major Python compilers and interpreters.

I decided to go with Codon, not because it solved my problem, but because it had pointers, memory management and a half decent C FFI.

Within 2 weeks I had a working prototype interfacing with all the libraries my team wanted, figured out all the interoperability with C++, built some auto-sugar for most the common things QRs use

I even found bugs in Codon and reported them

One of my coworkers said that was the first time he saw an intern debug assembly — apparently Codon has weird struct layout conventions.

I also managed to clobber the stack after passing more than 12 params to a function... later discovering Codon hardcodes function params...

I ended up going very deep into Arrow internals to figure out how to use this janky bootstrapped compiler on top of Codon to interface with Arrow tables with zero copy. #DataMovement

With half the internship time left, I was onto using my compiler in production

I got access to some super spicy secret volatility forecasting algorithm so I had to understand some stats

... Because I wanted to reimplement the algorithm itself not just translate it and compile it

I ended up creating a Dynamic Programming algorithm for this which could run on a circular buffer and make the most use out of caches.

The previous service ran in a couple of hours while mine ran in under 100ms — which enabled me to do many experiments and ended up improving the R² by 0.005 as well

Problem

Python interpreter overhead in hot paths

Solution

Python → C++ compiler with zero-copy interop

Latency

~1 hour → <100ms

§ 2 A much better looking graph than the one I had

Basically what I had

Very simplified, and only for illustration purposes

§ 3 Volatility Forecasting Race

Both systems start simultaneously. Each processes one year of daily data and returns a volatility forecast. The ball reaches the finish line when the computation completes.

Python Researcher Code slow as hell

0.0s waiting

Python Compiled No interpreter · <100ms

0ms waiting

§ 4 How It Works

Python strategy code is parsed to an AST, type-inferred, then lowered to a typed IR. From there, C++ source is emitted and compiled to a native shared library via LLVM. Strategy engineers write ordinary Python; the runtime runs native code.
A bidirectional interoperability layer allows compiled functions to call back into Python objects — and Python to call compiled functions — without data copying or serialisation. No FFI boilerplate required from the strategy author's side.
NumPy arrays, Arrow record batches, and Pandas DataFrames are passed across the language boundary as zero-copy views of existing memory buffers.
The volatility forecasting service backtests a full year of daily closes and produces a forecast in under 100ms. R² improved by 0.005 over the interpreted baseline.