Both methods use IVF clustering to select nprobe clusters. IVF-Flat reads every candidate vector in full from NVMe. LayoutDB keeps a compact m-dimensional prefix of each vector in DRAM and checks it first — only vectors that pass the prefix filter trigger a disk read of the suffix.
LLM applications depend on vector databases to retrieve relevant context and reduce hallucinations. A modern LLM like DeepSeek V3 is trained on trillions of tokens — storing embeddings for even its training set at 1600 dimensions would require roughly 380 TB of RAM. Purely in-memory indexes like HNSW are infeasible at that scale.
Disk-based solutions exist, but they suffer a fundamental I/O bottleneck. IVF-Flat keeps full vectors on NVMe and must read every candidate in a cluster in full. At 100 μs per disk access, a 1M-vector dataset with nprobe=10 and 1000-vector clusters costs a full second per query in data movement alone — before any distance computation.
Quantization-based methods like IVF-PQ compress vectors into 8–32-byte codes, trading recall for storage savings. A true neighbor can be misranked by the approximate code distance and never reach the refinement stage. Practitioners must inflate nprobe or expansion factors to recover recall — incurring extra I/O without a principled guarantee of correctness.
The key observation: only the first few dozen dimensions are needed to accept or reject most candidates. If the partial distance on the prefix already exceeds the current k-th best distance, the full vector can never be a neighbor — no disk read required. LayoutDB stores just enough in DRAM to eliminate 98% of candidates before touching disk.
Both methods scan the same 1000 candidates across nprobe=10 clusters. IVF-Flat fetches every vector in full from NVMe — 1000 disk reads, nothing skipped. LayoutDB streams m-dimensional prefix tiles from DRAM: if the partial distance already exceeds the current k-th best, the candidate is eliminated instantly. Only ~20 survivors trigger a disk read.
LayoutDB (Tiled+Snapshot) is 6.8× faster than IVF-Flat on close-query workloads and 3× faster on normal queries, while maintaining identical recall@1. Prefix pruning is conservative: raw-prefix distance is a strict lower bound on full L2, guaranteeing zero false negatives. Snapshot mode uses a JLL-scaled projection (expected value matches full distance) with an empirically tuned scaling factor.
| Workload | Mode | Avg. Latency | Recall@1 | vs IVF-Flat |
|---|---|---|---|---|
| Close | IVF-Flat | 157.1 ms | 1.00 | — |
| Close | TiledStorage | 42.2 ms | 1.00 | 3.7× |
| Close | Tiled+Snapshot | 23.0 ms | 1.00 | 6.8× |
| Normal | IVF-Flat | 159.0 ms | 0.90 | — |
| Normal | TiledStorage | 118.8 ms | 0.90 | 1.3× |
| Normal | Tiled+Snapshot | 52.9 ms | 0.90 | 3.0× |
- Header-only C++17, ~2k lines. AVX-512 SIMD kernels accelerate prefix distance computation. Zero-copy disk access via `mmap` — the OS page cache handles hot data automatically. No background compaction or training required after index construction.
- IVF clustering via FAISS. An offline `prepareData` phase calls FAISS's k-means to partition the dataset into nlist Voronoi cells. Each cluster writes two binary files: `tiles.bin` (m-dimensional prefix tiles, laid out row-by-row for SIMD scanning) and `rows.bin` (the remaining d−m suffix dimensions). Tiles are sized to land on hardware page boundaries so a single page read fetches 16–64 tiles.
- Two prefix modes. Raw-prefix stores the first m dimensions verbatim — the partial squared distance ‖q1:m − x1:m‖² is a strict lower bound on the full squared distance, guaranteeing zero false negatives. Snapshot mode stores the projection Px, where P is an m×d matrix with orthonormal rows; the scaled distance ‖Pq − Px‖² · D/m has expected value equal to the full squared distance, enabling aggressive JLL-backed filtering at the cost of a small, controllable false-negative rate.
- Three configurable RAM knobs. `tile_cache_size` pins a fraction of prefix tiles in DRAM. `anchor_cache_size` keeps full hot vectors as in-memory "anchors" — zero disk traffic for the most-accessed data. `optimize_query` reorders the scan to process all in-memory vectors first, tightening the k-th best threshold before any disk is touched, which increases selectivity on disk-resident candidates.
- At 18% RAM: 98% recall, 1400 QPS. With modest caching (18% of dataset floats in RAM), LayoutDB discards over 99.99% of candidates at prefix time, issues one small read per survivor, and achieves P99 latencies approaching an all-in-memory FAISS baseline — while scaling to datasets that far exceed available DRAM. This represents a 100× throughput improvement over IVF-Flat under simulated disk delays.