Work · Summer 2025
DRW
Research Engineer Intern · Chicago, IL
Duration: Jun to Aug 2025
Team: Research Engineering
Stack: Python · C++ · LLVM
Result: <100ms · R² +0.005
§ 1 Context
Timeline: Jun · W1 · W2 proto · W3 bugs · W4 Arrow · W5 prod? · Aug <100ms

DRW's quant researchers write their code in Python because it's flexible, expressive, and fast to iterate on.

However:

Python is notoriously slow

Someone always has to rewrite the code in C++ to get decent enough performance to even run a comprehensive backtest.

My moonshot project was to compile Python

Obviously, I am not the first person to try this, and for a few days I read the docs of all major Python compilers and interpreters.

I decided to go with Codon, not because it solved my problem, but because it had pointers, memory management, and a half-decent C FFI.

Within 2 weeks I had a working prototype that interfaced with all the libraries my team wanted. I had also figured out the interoperability with C++ and built some auto-sugar for most of the common things QRs use.

I even found bugs in Codon and reported them

One of my coworkers said that was the first time he had seen an intern debug assembly. Apparently Codon has weird struct layout conventions.

I also managed to clobber the stack by passing more than 12 params to a function... and later discovered that Codon hardcodes function params...

I ended up going very deep into Arrow internals to figure out how to use this janky bootstrapped compiler on top of Codon to interface with Arrow tables with zero copy. #DataMovement
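For anyone unfamiliar with the term: zero copy means both sides of a language boundary read the same memory buffer instead of serialising or duplicating it. The sketch below shows the idea using only the Python standard library. It is not the Arrow interop layer described above, and the names `prices` and `as_c_doubles` are made up for illustration.

```python
import ctypes
from array import array


def as_c_doubles(buf: array):
    """Return a ctypes double array that aliases buf's memory (no copy)."""
    if buf.typecode != "d":
        raise TypeError("expected an array of C doubles (typecode 'd')")
    # from_buffer shares the underlying memory via the buffer protocol.
    return (ctypes.c_double * len(buf)).from_buffer(buf)


prices = array("d", [101.5, 102.0, 99.75])  # hypothetical data
view = as_c_doubles(prices)

# A write through the C-typed view is visible in the original array,
# because both names refer to the same bytes.
view[1] = 250.0
same_memory = ctypes.addressof(view) == prices.buffer_info()[0]
```

A native function that accepts a `double*` could be handed `view` directly, so the data never leaves the buffer Python already owns.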

With half the internship time left, I moved on to using my compiler in production

I got access to a super spicy secret volatility forecasting algorithm, so I had to understand some stats

... because I wanted to reimplement the algorithm itself, not just translate and compile it.

I ended up creating a dynamic programming algorithm for this that could run on a circular buffer and make the most of the caches.
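The actual algorithm is not public, so the following is only a stand-in that shows the general shape of the idea: a rolling standard deviation of returns computed over a preallocated circular buffer, where each step reuses the previous step's running sums instead of rescanning the window. The class name `RollingVol` and the choice of statistic are assumptions made for illustration.

```python
import math
from array import array


class RollingVol:
    """Rolling sample std dev over a fixed window (hypothetical stand-in)."""

    def __init__(self, window: int) -> None:
        if window < 2:
            raise ValueError("window must be at least 2")
        self.window = window
        self.buf = array("d", [0.0]) * window  # contiguous, preallocated
        self.head = 0        # next slot to overwrite
        self.count = 0       # valid samples, capped at window
        self.total = 0.0     # running sum of samples in the window
        self.total_sq = 0.0  # running sum of squared samples

    def push(self, x: float) -> float:
        """Add one sample; return the current std dev (nan until 2 samples)."""
        if self.count == self.window:
            # Buffer is full: evict the oldest sample from the running sums.
            old = self.buf[self.head]
            self.total -= old
            self.total_sq -= old * old
        else:
            self.count += 1
        self.buf[self.head] = x
        self.total += x
        self.total_sq += x * x
        self.head = (self.head + 1) % self.window
        if self.count < 2:
            return math.nan
        mean = self.total / self.count
        var = (self.total_sq - self.count * mean * mean) / (self.count - 1)
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


rv = RollingVol(window=3)
vols = [rv.push(r) for r in (1.0, 2.0, 3.0, 4.0)]
```

Each `push` does a constant amount of work and touches one slot of a small contiguous array, which is what keeps the working set cache-friendly. The running sum of squares can lose precision on long streams, so a production version would likely need a more stable update.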

The previous service ran in a couple of hours while mine ran in under 100ms. That let me run many experiments, which ended up improving the R² by 0.005 as well.

Problem: Python interpreter overhead in hot paths
Solution: Python → C++ compiler with zero-copy interop
Latency: ~1 hour → <100ms
§ 2 A much better looking graph than the one I had
Basically what I had

Very simplified, and only for illustration purposes

§ 3 Volatility Forecasting Race

Both systems start simultaneously. Each processes one year of daily data and returns a volatility forecast. The ball reaches the finish line when the computation completes.

Lane 1: Python Researcher Code (slow as hell)
Lane 2: Python Compiled (no interpreter, <100ms)
§ 4 How It Works
  • Python strategy code is parsed to an AST, type-inferred, then lowered to a typed IR. From there, C++ source is emitted and compiled to a native shared library via LLVM. Strategy engineers write ordinary Python; the runtime runs native code.
  • A bidirectional interoperability layer allows compiled functions to call back into Python objects — and Python to call compiled functions — without data copying or serialisation. No FFI boilerplate required from the strategy author's side.
  • NumPy arrays, Arrow record batches, and Pandas DataFrames are passed across the language boundary as zero-copy views of existing memory buffers.
  • The volatility forecasting service backtests a full year of daily closes and produces a forecast in under 100ms. R² improved by 0.005 over the interpreted baseline.
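The first bullet's pipeline (parse to AST, resolve types, emit C++) can be illustrated with a toy that handles only a tiny subset of Python. This is a sketch under heavy assumptions, not the compiler described above: it supports `float` parameters and arithmetic only, leans on C++ `auto` in place of a real type-inference pass, and skips the typed IR and the LLVM compile step entirely. The names `emit_expr`, `emit_function`, and `scaled_diff` are invented for the example, and the `extern "C"` linkage is an assumed convention for exporting an unmangled symbol from a shared library.

```python
import ast

CPP_TYPES = {"float": "double"}  # assumed mapping; floats only in this toy
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def emit_expr(node: ast.expr) -> str:
    """Lower a small subset of Python expressions to C++ text."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        # Emit every number as a double so `/` keeps Python's semantics.
        return repr(float(node.value))
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        op = OPS[type(node.op)]
        return f"({emit_expr(node.left)} {op} {emit_expr(node.right)})"
    raise NotImplementedError(ast.dump(node))


def emit_function(source: str) -> str:
    """Translate one fully annotated Python function into C++ source."""
    fn = ast.parse(source).body[0]
    if not isinstance(fn, ast.FunctionDef):
        raise TypeError("expected a single function definition")
    params = ", ".join(
        f"{CPP_TYPES[a.annotation.id]} {a.arg}" for a in fn.args.args
    )
    lines = [f'extern "C" {CPP_TYPES[fn.returns.id]} {fn.name}({params}) {{']
    declared = {a.arg for a in fn.args.args}
    for stmt in fn.body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            name = stmt.targets[0].id
            # The first assignment declares the local via `auto`.
            prefix = "" if name in declared else "auto "
            declared.add(name)
            lines.append(f"    {prefix}{name} = {emit_expr(stmt.value)};")
        elif isinstance(stmt, ast.Return) and stmt.value is not None:
            lines.append(f"    return {emit_expr(stmt.value)};")
        else:
            raise NotImplementedError(ast.dump(stmt))
    lines.append("}")
    return "\n".join(lines)


SOURCE = '''
def scaled_diff(a: float, b: float, k: float) -> float:
    d = a - b
    return d * k
'''

cpp = emit_function(SOURCE)
```

Here `cpp` holds a three-line C++ function body wrapped in an `extern "C"` signature. Anything outside the supported subset raises `NotImplementedError` rather than being silently mistranslated, which seems like the safer default for a compiler that researchers have to trust.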