fix: preserve cloud storage URIs in LanceDB vector store config validation #2233

Open
ohharsen wants to merge 1 commit into microsoft:main from ohharsen:fix/cloud-uri-vector-store

Conversation

@ohharsen

Description

_validate_vector_store_db_uri() unconditionally calls Path(db_uri).resolve() on the LanceDB db_uri, which corrupts cloud storage URIs. For example, gs://my-bucket/indexes/lancedb is resolved to /current/working/dir/gs:/my-bucket/indexes/lancedb: the double slash after the scheme collapses to a single slash and the CWD is prepended. This silently breaks LanceDB's native support for GCS, S3, and Azure cloud storage backends.
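The mangling can be reproduced with pathlib alone, without GraphRAG or LanceDB installed. The snippet below is a minimal sketch of that behavior; the variable names are illustrative and the exact resolved prefix depends on the current working directory and platform.

```python
from pathlib import Path

cloud_uri = "gs://my-bucket/indexes/lancedb"

# Path() treats the URI as a relative filesystem path, so the "//" after
# the scheme is collapsed and resolve() prepends the current working directory.
mangled = str(Path(cloud_uri).resolve())

print(mangled)
```

On a POSIX system the printed value is an absolute local path ending in gs:/my-bucket/indexes/lancedb, which is no longer a valid GCS URI.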

Related Issues

N/A — discovered while using LanceDB with a gs:// URI for Google Cloud Storage.

Proposed Changes

  • graphrag/config/models/graph_rag_config.py: Skip Path(db_uri).resolve() when db_uri starts with a recognized cloud storage scheme (gs://, s3://, az://, abfs://). Local file paths continue to be resolved as before.
 def _validate_vector_store_db_uri(self) -> None:
     """Validate the vector store configuration."""
     store = self.vector_store
     if store.type == VectorStoreType.LanceDB:
         if not store.db_uri or store.db_uri.strip == "":
             store.db_uri = graphrag_config_defaults.vector_store.db_uri
-        store.db_uri = str(Path(store.db_uri).resolve())
+        if not store.db_uri.startswith(("gs://", "s3://", "az://", "abfs://")):
+            store.db_uri = str(Path(store.db_uri).resolve())
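The guard in the diff can be exercised in isolation. The sketch below pulls the same check into a standalone function; `normalize_db_uri` and `CLOUD_URI_PREFIXES` are hypothetical names used only for illustration and are not part of GraphRAG, and the prefix list is copied from the diff rather than from any authoritative list of schemes LanceDB accepts.

```python
from pathlib import Path

# Prefixes copied from the diff above; assumed, not an exhaustive list.
CLOUD_URI_PREFIXES = ("gs://", "s3://", "az://", "abfs://")


def normalize_db_uri(db_uri: str) -> str:
    """Hypothetical helper mirroring the guarded resolve() in the diff."""
    if db_uri.startswith(CLOUD_URI_PREFIXES):
        # Cloud URIs are passed through untouched.
        return db_uri
    # Local paths are still resolved to an absolute path, as before.
    return str(Path(db_uri).resolve())


print(normalize_db_uri("gs://my-bucket/path/to/lancedb"))
print(normalize_db_uri("output/lancedb"))
```

A prefix check like this is case-sensitive and only covers the listed schemes, so a URI such as GS://... or one using a different scheme would still be resolved as a local path.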

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

Reproduction steps:

  1. Configure VectorStoreConfig with a cloud URI: db_uri="gs://my-bucket/path/to/lancedb"
  2. Pass it to GraphRagConfig(vector_store=...)
  3. Observe that config.vector_store.db_uri is now a local filesystem path (e.g. /cwd/gs:/my-bucket/path/to/lancedb)
  4. local_search and drift_search fail with AttributeError: 'LanceDBVectorStore' object has no attribute 'document_collection' because lancedb.connect() can't find any tables at the mangled path

The @validate_call decorator on local_search/drift_search compounds the problem: it re-validates the config (creating a Pydantic copy), so even manually overriding db_uri after initialization doesn't work.

LanceDB has supported cloud storage URIs natively since v0.6 (see the LanceDB docs). This small fix enables that integration to work through GraphRAG's config layer.

@ohharsen ohharsen requested a review from a team as a code owner February 17, 2026 03:02