[multi-vector] Multi-vectors with support for per-row metadata by arkrishn94 · Pull Request #785 · microsoft/DiskANN

arkrishn94 · 2026-02-18T22:13:20Z

This change adds support for multi-vectors where every vector is a full-precision-like vector with some metadata. It builds on the Slice implementation that supports the owned, referenced and mutable referenced versions of these vectors.

Primarily this is done by introducing a new struct SliceMatRepr<T, M> (defined for any primitive datatype T: Pod and metadata M : Pod) and providing implementations of Repr and its owned and mut variants for it. This is the bulk of the change in this PR.

This matrix type has been used to support computing max sim and chamfer distance between full precision queries and minmax quantized vectors.

Other changes -

Added the unsafe method from_canonical_mut_unchecked to SliceMut for constructing from raw data.
Fixed the lifetime parameter in the rows method of MatRef to ensure the caller is guaranteed that the row yielded by the iterator lives as long as the underlying MatRef.

merge main

Copilot

Pull request overview

Adds a new multi-vector matrix representation whose rows are canonical meta::slice::Slice views (vector elements + per-row metadata), and wires it into MinMax’s asymmetric (full-query vs quantized-doc) distance path.

Changes:

Introduces SliceMatRepr<T, M> + Repr/ReprMut/ReprOwned + constructors for owned/borrowed matrices backed by canonical [u8] layout.
Extends MinMax multi-vector MaxSim/Chamfer to accept arbitrary query Repr (enabling full-precision query rows with metadata).
Fixes MatRef::rows() iterator lifetime and adds SliceMut::from_canonical_mut_unchecked.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
diskann-quantization/src/multi_vector/slice.rs	New `SliceMatRepr` matrix representation for canonical slice rows (vector + per-row metadata), with extensive tests.
diskann-quantization/src/multi_vector/mod.rs	Exposes `SliceMatRepr` and related errors; adds module wiring and docs table update.
diskann-quantization/src/multi_vector/matrix.rs	Adjusts `MatRef::rows()` return lifetime to match the underlying `MatRef`.
diskann-quantization/src/minmax/multi/mod.rs	Updates docs/example to show full-query matrix + asymmetric distances.
diskann-quantization/src/minmax/multi/max_sim.rs	Generalizes MinMax MaxSim/Chamfer to generic query `Repr`; adds `FullQueryMatRef` alias and multi-vector `CompressInto` for full-query rows.
diskann-quantization/src/minmax/mod.rs	Re-exports `FullQueryMatRef`.
diskann-quantization/src/meta/slice.rs	Adds `SliceMut::from_canonical_mut_unchecked` and reuses it from the checked constructor.

Comments suppressed due to low confidence (1)

diskann-quantization/src/minmax/multi/max_sim.rs:67

MinMaxKernel::max_sim_kernel doesn’t special-case an empty doc: when doc.num_vectors() == 0, min_distance remains f32::MAX and the callback is invoked with that value for every query row. This makes MaxSim/Chamfer behave very differently from the standard SimpleKernel (which returns early and leaves scores unchanged / chamfer=0). Consider adding an early return when doc has zero rows (or defining/ documenting the intended semantics and adjusting callers/tests accordingly).

        for (i, q_ref) in query.rows().enumerate() {
            // Find min distance (IP returns negated, so min = max similarity)
            let mut min_distance = f32::MAX;

            for d_ref in doc.rows() {
                let dist = MinMaxIP::evaluate(q_ref, d_ref)?;
                min_distance = min_distance.min(dist);
            }

            f(i, min_distance);
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-18T22:19:27Z

diskann-quantization/src/multi_vector/slice.rs

+            std::ptr::copy_nonoverlapping(v.ptr.as_ptr(), buffer.as_mut_ptr(), total);
+        }
+
+        // SAFETY: `buffer` has the correct length and alignment from above cheks.


Typo in comment: "cheks" → "checks".

Suggested change

// SAFETY: `buffer` has the correct length and alignment from above cheks.

// SAFETY: `buffer` has the correct length and alignment from above checks.

Copilot · 2026-02-18T22:19:27Z

diskann-quantization/src/multi_vector/slice.rs

+    /// Returns the cached byte stride per row.
+    const fn stride(ncols: usize) -> usize {
+        slice::SliceRef::<T, M>::canonical_bytes(ncols)
+    }


SliceMatRepr uses SliceRef::canonical_bytes(ncols) as the row stride, but canonical_bytes does not include any trailing padding to ensure the next row starts at SliceRef::canonical_align(). For metadata with higher alignment than T (e.g. #[repr(align(8))]), rows after the first can become misaligned, and SliceRef::from_canonical_unchecked will then create misaligned references (UB). Consider defining a padded row stride (e.g. canonical_bytes(ncols).next_multiple_of(canonical_align)) for pointer stepping, while still passing a canonical_bytes-sized subslice to SliceRef/SliceMut for each row; update total_bytes/layout/check_slice accordingly and document the padding.

Copilot · 2026-02-18T22:19:28Z

diskann-quantization/src/multi_vector/slice.rs

+    pub fn new(nrows: usize, ncols: usize) -> Result<Self, SliceMatReprError> {
+        let stride = Self::stride(ncols);
+
+        // Check that total bytes don't overflow or exceed isize::MAX.
+        let total = nrows
+            .checked_mul(stride)
+            .ok_or(SliceMatReprError::Overflow {


SliceMatRepr::new relies on SliceRef::canonical_bytes(ncols) for stride, but canonical_bytes computes size_of::<T>() * ncols without checked arithmetic and can overflow usize for large ncols, potentially letting new succeed with an incorrect (wrapped) stride and leading to OOB/UB later. It would be safer to compute the stride with checked operations (or explicitly validate ncols against usize::MAX / size_of::<T>() and addend overflow) before the nrows * stride check.

Copilot · 2026-02-18T22:19:28Z

diskann-quantization/src/multi_vector/slice.rs

+    fn check_slice(&self, slice: &[u8]) -> Result<(), SliceMatError> {
+        let expected = self.total_bytes();
+        if slice.len() != expected {
+            return Err(SliceMatError::LengthMismatch {
+                expected,
+                found: slice.len(),
+            });
+        }
+
+        let align = Self::alignment().raw();
+        if !(slice.as_ptr() as usize).is_multiple_of(align) {
+            return Err(SliceMatError::NotAligned { expected: align });
+        }


check_slice enforces base-pointer alignment even when expected == 0. For empty matrices (e.g. nrows == 0), callers may reasonably pass an empty &[u8]/&mut [u8] that has a dangling pointer not meeting canonical_align(), causing MatRef::new/MatMut::new to fail even though no memory will be dereferenced. Consider treating expected == 0 as a special case and skipping the alignment check (or accepting any pointer) so zero-sized matrices can be constructed from empty slices.

Copilot · 2026-02-18T22:19:28Z

diskann-quantization/src/meta/slice.rs

+    ///
+    /// This invariant is checked in debug builds and will panic if not satisfied.
+    pub unsafe fn from_canonical_mut_unchecked(data: &'a mut [u8], dim: usize) -> Self {
+        debug_assert_eq!(data.len(), Self::canonical_bytes(dim));


from_canonical_mut_unchecked docs state that both length and alignment invariants are checked in debug builds, but the implementation only debug_assert_eq!s the length. Consider adding a debug_assert! on data.as_ptr() alignment to Self::canonical_align() (or adjusting the docs) so the safety contract matches behavior and misaligned callers are caught earlier.

Suggested change

debug_assert_eq!(data.len(), Self::canonical_bytes(dim));

let expected_align = Self::canonical_align().raw();

let expected_len = Self::canonical_bytes(dim);

debug_assert!((data.as_ptr() as usize).is_multiple_of(expected_align));

debug_assert_eq!(data.len(), expected_len);

codecov-commenter · 2026-02-18T23:00:53Z

Codecov Report

❌ Patch coverage is 94.36451% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.05%. Comparing base (e5c83f5) to head (63fb95d).

Files with missing lines	Patch %	Lines
diskann-quantization/src/multi_vector/slice.rs	95.23%	31 Missing ⚠️
diskann-quantization/src/minmax/multi/max_sim.rs	91.01%	16 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #785      +/-   ##
==========================================
+ Coverage   89.00%   89.05%   +0.04%     
==========================================
  Files         428      429       +1     
  Lines       78417    79194     +777     
==========================================
+ Hits        69795    70525     +730     
- Misses       8622     8669      +47

Flag	Coverage Δ
miri	`89.05% <94.36%> (+0.04%)`	⬆️
unittests	`89.05% <94.36%> (+0.04%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
diskann-quantization/src/meta/slice.rs	`99.51% <100.00%> (+<0.01%)`	⬆️
diskann-quantization/src/multi_vector/matrix.rs	`97.28% <100.00%> (ø)`
diskann-quantization/src/minmax/multi/max_sim.rs	`94.24% <91.01%> (-3.82%)`	⬇️
diskann-quantization/src/multi_vector/slice.rs	`95.23% <95.23%> (ø)`

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

arkrishn94 added 10 commits February 13, 2026 17:11

slice_repr first draft

3951e50

tests

fd4fd02

fmt 2024 edition

a36f79f

merge main

cb2db3d

fixed expect

144ba54

test for full

4fec0be

compress into for full

cc9687a

some more docs

64dbd48

Merge remote-tracking branch 'origin/main' into u/adkrishnan/slice-mat

15d4166

merge main

restore bad formatting

63fb95d

arkrishn94 requested review from a team and Copilot February 18, 2026 22:13

Copilot started reviewing on behalf of arkrishn94 February 18, 2026 22:14 View session

Copilot AI reviewed Feb 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[multi-vector] Multi-vectors with support for per-row metadata #785

[multi-vector] Multi-vectors with support for per-row metadata #785
arkrishn94 wants to merge 10 commits intomainfrom
u/adkrishnan/slice-mat

arkrishn94 commented Feb 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 18, 2026

Uh oh!

Copilot AI Feb 18, 2026

Uh oh!

Copilot AI Feb 18, 2026

Uh oh!

Copilot AI Feb 18, 2026

Uh oh!

Copilot AI Feb 18, 2026

Uh oh!

codecov-commenter commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

	// SAFETY: `buffer` has the correct length and alignment from above cheks.
	// SAFETY: `buffer` has the correct length and alignment from above checks.

-        debug_assert_eq!(data.len(), Self::canonical_bytes(dim));
+        let expected_align = Self::canonical_align().raw();
+        let expected_len = Self::canonical_bytes(dim);
+        debug_assert!((data.as_ptr() as usize).is_multiple_of(expected_align));
+        debug_assert_eq!(data.len(), expected_len);

Conversation

arkrishn94 commented Feb 18, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Feb 18, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments