PROPOSAL

Benchmarking eCP-FS


Supervisor: Omar Shahbaz Khan
Semester: Spring 2026
Tags: High-dimensional Indexing, Vector Store, Rust, Python, Approximate Nearest Neighbor

The extended Cluster Pruning (eCP) index is a hierarchical approximate nearest neighbor (ANN) index.
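
To make the idea of cluster pruning concrete, here is a minimal single-level sketch in pure Python. It is not the eCP-FS implementation: the function names (build_index, search), the squared Euclidean distance, and the parameter b (number of clusters scanned per query) are all assumptions chosen for illustration. A hierarchical index would presumably apply the same leader selection again over the leaders themselves.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_index(vectors, num_leaders, seed=0):
    """Pick random vectors as cluster leaders and assign every vector
    to its nearest leader. Returns {leader_id: [member_ids]}."""
    rng = random.Random(seed)
    leader_ids = rng.sample(range(len(vectors)), num_leaders)
    clusters = {lid: [] for lid in leader_ids}
    for vid, vec in enumerate(vectors):
        nearest = min(leader_ids, key=lambda lid: squared_l2(vec, vectors[lid]))
        clusters[nearest].append(vid)
    return clusters


def search(vectors, clusters, query, k, b):
    """Scan only the b clusters whose leaders are closest to the query
    and return the ids of the k nearest vectors found there."""
    nearest_leaders = heapq.nsmallest(
        b, clusters, key=lambda lid: squared_l2(query, vectors[lid])
    )
    candidates = [vid for lid in nearest_leaders for vid in clusters[lid]]
    return heapq.nsmallest(
        k, candidates, key=lambda vid: squared_l2(query, vectors[vid])
    )
```

Raising b scans more clusters, trading latency for result quality; with b equal to the number of leaders the search degenerates into an exact scan of the whole collection.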

eCP-FS is a “white-box” implementation of the index as a file structure using Zarr. Index building is done in Python, while index loading and search are implemented both in pure Python and as a Rust-Python package.

Reading the index from disk is slower than with other disk-based ANN solutions, because it is stored as a large number of small files rather than as one or a few binary files. In return, this layout gives more control over what is loaded and is easy to extend, and once in memory the index is on par with other memory-based indexes.

Beyond latency, the result quality of eCP has been shown to be decent for very large collections and for exploratory or diversified search.

However, to better understand how the index behaves in various scenarios, it needs to be evaluated on more datasets and a wider range of tasks.

The aim of this project is to refine the eCP-FS codebase and make it suitable for highly search-oriented benchmarks such as ANN-Benchmarks and VIBE.
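
Benchmarks of this kind typically report result quality against throughput, for example recall versus queries per second. The sketch below shows one way such a measurement could be taken. It is an illustration only: recall_at_k and measure are hypothetical helpers, search_fn stands in for whatever search entry point eCP-FS exposes, and recall is computed as simple id overlap, which may differ from the exact definition a given benchmark uses.

```python
import time


def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true k nearest neighbors that were retrieved,
    averaged over all queries. Both arguments are lists of id lists."""
    hits = 0
    for found, truth in zip(retrieved, ground_truth):
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (k * len(ground_truth))


def measure(search_fn, queries, ground_truth, k):
    """Run every query through search_fn(query, k) and return
    (recall@k, queries per second)."""
    start = time.perf_counter()
    retrieved = [search_fn(query, k) for query in queries]
    elapsed = time.perf_counter() - start
    qps = len(queries) / elapsed if elapsed > 0 else float("inf")
    return recall_at_k(retrieved, ground_truth, k), qps
```

Sweeping a search parameter, such as the number of clusters scanned, and recording one (recall, queries per second) pair per setting gives the trade-off curve that makes different indexes comparable.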